TranscriptFlow

1 Hour → 12 Minutes • Read Videos Instead of Watching

The Problem

My "watch later" list had 100+ videos, and I never got to them. So I built an overnight automation: whenever I see a video worth watching, I send its YouTube link to Notion. While I sleep, AI transcribes and formats it, and in the morning I have a clean, readable article. A 1‑hour video takes me about 12 minutes to read. With full‑text search, highlights, and notes, I'm finally extracting knowledge from the videos I save.

🏫 A note on fairness: When I save knowledge from a video, I keep the video playing (tab muted) so the author still gets a view. Feels fair.

Before / After

Before: Save a 60‑minute conference talk to "watch later" → never watch → it sits there forever → knowledge locked in video

After: Share the link to my Telegram bot → it lands in Notion → I go to sleep → at 7:00 a.m. a 12‑minute readable transcript is waiting → I highlight insights → search across all saved videos → actually learn

Impact: 5× time savings (60 min → 12 min). 95%+ transcription accuracy. Instant search for a term like "RAG" across 50 videos. Handles 3‑hour lectures without issues. ~$0.20 per hour of video.

How It Works

Step 1: Any time I find a video, I share the link to my Telegram bot; it goes to the Notion database "Videos to Process."

Step 2: At 2:00 a.m. the automation runs. It downloads the audio, and Whisper transcribes it (long lectures are split into chunks). A second model, Claude, removes fillers ("um," "like") and adds paragraph breaks and section headings.

Step 3: The formatted transcript appears on the same Notion page. In the morning it's ready.

Result: A 1‑hour video becomes a 12‑minute read. I can search all transcripts, highlight, and annotate like an article. Queue 10 videos and everything gets processed overnight.

Technical Architecture

1) Notion Monitoring & Job Queue:

  • Nightly cron job (or on‑demand trigger)
  • Queries Notion API for pages with YouTube links and empty transcript
  • Builds a processing queue (URL, duration, page ID)
  • Handles rate limits and batches
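A minimal sketch of the queue‑building step. The property names ("URL", "Processed") and the `Job` fields are assumptions for illustration; the real database schema isn't described here. The filter dict is what would go in the body of a Notion database query, and `build_queue` turns the returned page objects into jobs.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed property names; the actual Notion schema is not given in this write-up.
URL_PROP = "URL"
PROCESSED_PROP = "Processed"


def build_filter() -> dict:
    """Filter for a Notion database query: only pages not yet marked as processed."""
    return {"property": PROCESSED_PROP, "checkbox": {"equals": False}}


@dataclass
class Job:
    page_id: str
    url: str
    duration_s: Optional[float] = None  # unknown until the audio metadata is read


def build_queue(pages: list[dict]) -> list[Job]:
    """Turn Notion page objects into jobs, skipping pages without a YouTube link."""
    jobs = []
    for page in pages:
        url = page.get("properties", {}).get(URL_PROP, {}).get("url")
        if url and ("youtube.com/" in url or "youtu.be/" in url):
            jobs.append(Job(page_id=page["id"], url=url))
    return jobs
```

Checking the link on the client side keeps the query itself simple; pages that were saved without a usable URL are just skipped instead of failing the whole run.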

2) Video Download Pipeline:

  • yt‑dlp for reliable YouTube downloads
  • Audio‑only to reduce size and speed up processing
  • Validates download before proceeding
  • Temporary storage with automatic cleanup
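A rough sketch of the download step using yt‑dlp's Python API. The mp3 target, the 192 quality setting, the 1 KB validation threshold, and the helper names are assumptions, not the exact settings used. `yt_dlp` is imported lazily so the option and validation helpers can be used on their own.

```python
import tempfile
from pathlib import Path


def audio_only_options(out_dir: str) -> dict:
    """yt-dlp options for an audio-only download, converted to mp3 via ffmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3", "preferredquality": "192"}
        ],
        "noplaylist": True,
        "quiet": True,
    }


def is_valid_download(path: Path, min_bytes: int = 1024) -> bool:
    """Basic sanity check: the file exists and is not suspiciously small."""
    return path.is_file() and path.stat().st_size >= min_bytes


def download_audio(url: str, out_dir: str) -> Path:
    """Download the audio track and validate it before anything else runs."""
    import yt_dlp  # third-party; needs ffmpeg on PATH for the mp3 conversion

    with yt_dlp.YoutubeDL(audio_only_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    path = Path(out_dir) / f"{info['id']}.mp3"
    if not is_valid_download(path):
        raise RuntimeError(f"download failed validation: {url}")
    return path


def process_with_cleanup(url: str, handler):
    """Run handler(audio_path) inside a temp dir that is removed afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        return handler(download_audio(url, tmp))
```

Using `tempfile.TemporaryDirectory` means cleanup happens even when a later step raises, so failed jobs don't leave audio files behind.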

3) Smart Audio Chunking:

  • Duration check: ≤30 min → whole; >30 min → split
  • 10‑minute splits via ffmpeg
  • Preserves quality to avoid Whisper degradation
  • Keeps each chunk under the Whisper API's upload size limit (25 MB)
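The chunking rule can be sketched as a small planning function plus the ffmpeg command that would do the split. Treating exactly 30 minutes as "whole" and using stream copy (`-c copy`) to avoid re‑encoding are assumptions here, as are the function names.

```python
CHUNK_SECONDS = 10 * 60            # 10-minute splits
SPLIT_THRESHOLD_SECONDS = 30 * 60  # anything longer than this gets split


def plan_chunks(duration_s: float) -> list[tuple[float, float]]:
    """Return (start, length) pairs in seconds covering the whole recording."""
    if duration_s <= SPLIT_THRESHOLD_SECONDS:
        return [(0.0, float(duration_s))]
    chunks = []
    start = 0.0
    while start < duration_s:
        chunks.append((start, min(CHUNK_SECONDS, duration_s - start)))
        start += CHUNK_SECONDS
    return chunks


def ffmpeg_split_command(src: str, out_pattern: str) -> list[str]:
    """ffmpeg segment-muxer command; -c copy splits without re-encoding."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(CHUNK_SECONDS),
        "-c", "copy", out_pattern,
    ]
```

For a 65‑minute lecture, `plan_chunks(3900)` yields six full 10‑minute chunks and one 5‑minute remainder. The command list can be passed to `subprocess.run` with an output pattern such as `chunk_%03d.mp3`.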

4) Whisper Transcription:

  • Each chunk sent to OpenAI Whisper API
  • Automatic language detection (EN, RU, ES, etc.)
  • Returns timestamped text with punctuation
  • Parallel chunk processing for speed
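One way to run chunks in parallel while keeping them in order is a thread pool, since the work is network‑bound. This is a sketch under assumptions: the worker count and function names are made up, and `whisper_transcribe` shows a plain call to the OpenAI transcription endpoint with the `whisper-1` model, which may differ from the options actually used (for example, requesting timestamps).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def transcribe_chunks(
    paths: Sequence[str],
    transcribe: Callable[[str], str],
    max_workers: int = 4,
) -> list[str]:
    """Transcribe chunks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of `paths`, whichever finishes first.
        return list(pool.map(transcribe, paths))


def whisper_transcribe(path: str) -> str:
    """Send one audio chunk to the OpenAI transcription API and return its text."""
    from openai import OpenAI  # third-party; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

Passing the transcriber in as a callable makes it easy to swap in a stub for testing, or to wrap the real call in retry logic without touching the pool code.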

5) Transcript Assembly & Structuring:

  • Merge chunks into a single transcript
  • Claude analyzes full text for logical structure
  • Detects topic shifts, inserts paragraphs and section headings
  • Removes filler words for readability
  • Preserves meaning while improving flow
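The merge step can be sketched as follows, assuming each chunk comes back as a list of timestamped segments and that chunks are a fixed 10 minutes long. The `Segment` type is hypothetical. Timestamps are relative to each chunk, so they need an offset before they line up with the original video. The structuring pass itself (Claude) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the chunk (or video, after merging)
    end: float
    text: str


def merge_chunks(chunks: list[list[Segment]], chunk_seconds: float = 600.0) -> list[Segment]:
    """Concatenate per-chunk segments, shifting timestamps to video time."""
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for seg in segments:
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text.strip()))
    return merged


def to_plain_text(segments: list[Segment]) -> str:
    """Join segment texts into one string, ready for the structuring pass."""
    return " ".join(seg.text for seg in segments if seg.text)
```

Keeping the shifted timestamps around makes it possible to link a paragraph back to the right moment in the video later, even though the structuring pass only needs the plain text.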

6) Notion Integration:

  • Updates the original page with the formatted transcript
  • Marks the page as processed to prevent duplicates
  • Adds metadata: duration, word count, processing date
  • Keeps the video link at the top
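Writing a long transcript back to Notion runs into two API limits: a single rich‑text content string can hold at most 2,000 characters, and one "append block children" request accepts at most 100 blocks. A minimal sketch of the splitting and batching, with assumed helper names and paragraphs separated by blank lines:

```python
MAX_TEXT_CHARS = 2000         # Notion limit for one rich-text content string
MAX_BLOCKS_PER_REQUEST = 100  # Notion limit for children in one append request


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` chars, breaking at spaces if possible."""
    pieces = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut
        piece = text[:cut].rstrip()
        if piece:
            pieces.append(piece)
        text = text[cut:].lstrip()
    if text:
        pieces.append(text)
    return pieces


def paragraph_blocks(transcript: str) -> list[dict]:
    """Convert a transcript into Notion paragraph block objects."""
    blocks = []
    for paragraph in transcript.split("\n\n"):
        for piece in split_text(paragraph.strip()):
            blocks.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {"rich_text": [{"type": "text", "text": {"content": piece}}]},
            })
    return blocks


def batches(blocks: list[dict], size: int = MAX_BLOCKS_PER_REQUEST) -> list[list[dict]]:
    """Group blocks so each group fits in one append request."""
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]
```

Each batch would then be sent as the `children` array of a separate append request against the page ID, in order.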

Key Features

  • Fully Automated: save a link → wake up to a transcript. Zero manual steps
  • Batch Processing: queue 10 videos overnight
  • High Accuracy: Whisper reaches 95%+ accuracy, even on technical content
  • Readable Formatting: structured into logical paragraphs, not a text wall
  • Searchable: full‑text search across all transcripts in Notion
  • Highlight & Annotate: work like it's an article
  • Knowledge Base Integration: extract key points into permanent notes
  • 5× Faster Learning: 1‑hour video → 10–12 minutes of reading
  • Long‑Form Ready: processes 3‑hour lectures reliably

Real Numbers

Performance:

  • 1‑hour video → 10–15 minutes of reading (about 5× faster)
  • Runs overnight while I sleep
  • 95%+ transcription accuracy (Whisper)
  • ~$0.10–0.30 per hour of video

What Actually Changed:

  • Before: 100+ "watch later" videos I never watched
  • After: save link → read next morning → actually extract knowledge
  • Search across all video content in Notion
  • No more "I saw this in a video but can't find it"
  • Text can be highlighted and annotated, which video can't

Value & Scale

Solved for: 1 person (me) with 100+ unwatched educational videos

Potential market: 2B YouTube users; millions save to "watch later" and never watch. Tech, research, and education are drowning in video.

Time saved: 1‑hour video → 12‑minute read = 80% savings (48 minutes per video). At 5 one‑hour videos/week: 5 × 48 min = 4 hours saved weekly, 208 hours/year.

Cost: ~$0.20 per hour of video (Whisper API). Processing runs overnight. 99% success rate with retries.
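The retry behavior can be sketched as a small wrapper with exponential backoff. The attempt count, delays, and function name are assumptions; the actual retry policy isn't specified here.

```python
import time


def with_retries(func, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, 2x, 4x, ... and try again.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Wrapping each network step (download, transcription, Notion update) separately, for example `with_retries(lambda: download(url))` with a hypothetical `download`, means one flaky call doesn't force the whole video to be reprocessed.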

What Makes It Different

It runs while you sleep. It goes beyond raw transcription: AI formats the text like a real article, with headings and paragraphs. It turns the "watch later" graveyard into a searchable knowledge base.

Skills Demonstrated

  • Video Processing & Audio Extraction (yt‑dlp, ffmpeg)
  • Speech‑to‑Text Integration (Whisper API)
  • NLP & Text Structuring (Claude)
  • Notion API Automation & Database Management
  • Batch Processing & Job Queue Design
  • Cron Scheduling & Server Automation
  • Error Handling & Retry Logic
  • Knowledge Management System Design

Tech Stack

Technologies: Python, yt‑dlp, ffmpeg, OpenAI Whisper, Claude 3.5 Sonnet, Notion API, Cron

Processing: YouTube URL → audio download → chunking → Whisper → assembly → AI structuring → Notion (15–30 minutes of processing per hour of video)

Complexity: 7/10 (video processing, chunking logic, API orchestration, error handling)