The Problem
My "watch later" list had 100+ videos — I never got to them. I built an overnight automation: whenever I see a video worth watching, I drop its YouTube link into Notion → while I sleep, AI transcribes and formats it → in the morning, I have a clean, readable article. I read a 1‑hour video in 12 minutes. Full‑text search, highlights, notes — finally extracting knowledge from saved videos.
🏫 A note on fairness: When I save knowledge from a video, I keep the video playing (tab muted) so the author still gets a view. Feels fair.
Before / After
Before: Save a 60‑minute conference talk to "watch later" → never watch → it sits there forever → knowledge locked in video
After: Share the link to my Telegram bot → it lands in Notion → I go to sleep → at 7:00 a.m. a 12‑minute readable transcript is waiting → I highlight insights → search across all saved videos → actually learn
Impact: 5× time savings (60 min → 12 min). 95% transcription accuracy. Instant search for "RAG" across 50 videos. Handles 3‑hour lectures without issues. ~$0.20 per hour of video.
How It Works
Step 1: Any time I find a video, I share the link to my Telegram bot; it goes to the Notion database "Videos to Process."
Step 2: At 2:00 a.m. the automation runs. It downloads the audio, Whisper transcribes it (long lectures are chunked), and Claude removes fillers ("um," "like") and adds paragraph breaks and section headings.
Step 3: The formatted transcript appears on the same Notion page. In the morning it's ready.
Result: A 1‑hour video becomes a 12‑minute read. I can search all transcripts, highlight, and annotate like an article. Queue 10 videos — everything gets processed overnight.
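A minimal sketch of the link-capture part of Step 1: pulling a YouTube video ID out of whatever text gets shared to the bot. The function name, the set of URL shapes handled, and the 11-character ID pattern are assumptions for illustration; the Telegram and Notion calls themselves are left out.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_URL_RE = re.compile(r"https?://\S+")
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")  # assumed ID shape (common convention)


def extract_video_id(message_text: str) -> Optional[str]:
    """Return the first YouTube video ID found in a shared message, or None."""
    for raw in _URL_RE.findall(message_text):
        url = urlparse(raw.rstrip(".,)"))  # drop trailing punctuation from prose
        host = (url.hostname or "").lower()
        for prefix in ("www.", "m."):
            if host.startswith(prefix):
                host = host[len(prefix):]
        if host == "youtu.be":
            candidate = url.path.lstrip("/").split("/")[0]
        elif host == "youtube.com" and url.path == "/watch":
            candidate = parse_qs(url.query).get("v", [""])[0]
        elif host == "youtube.com" and url.path.startswith(("/shorts/", "/live/")):
            candidate = url.path.split("/")[2]
        else:
            continue  # not a YouTube link, keep scanning
        if _ID_RE.fullmatch(candidate):
            return candidate
    return None
```

Storing the bare ID alongside the URL would also make it easy to spot the same video queued twice.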
Technical Architecture
1) Notion Monitoring & Job Queue:
- Nightly cron job (or on‑demand trigger)
- Queries the Notion API for pages that have a YouTube link and an empty transcript
- Builds a processing queue (URL, duration, page ID)
- Handles rate limits and batches
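The queue-building step can be sketched as below. The property names ("URL", "Transcript", "Duration") and the choice of a rich-text property for the transcript are assumptions; a real database may be laid out differently. The filter body follows the shape of Notion's database query filters, and the code only builds and parses data, so no network call is made here.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumed property names; adjust to the actual database schema.
URL_PROP = "URL"
TRANSCRIPT_PROP = "Transcript"
DURATION_PROP = "Duration"  # hypothetical number property, in seconds

# A crontab line such as `0 2 * * * python3 /path/to/run.py` (path hypothetical)
# would start the job at 02:00 every night.


def pending_filter() -> dict[str, Any]:
    """Request body for a database query: has a URL, transcript still empty."""
    return {
        "filter": {
            "and": [
                {"property": URL_PROP, "url": {"is_not_empty": True}},
                {"property": TRANSCRIPT_PROP, "rich_text": {"is_empty": True}},
            ]
        },
        "page_size": 100,
    }


@dataclass(frozen=True)
class Job:
    page_id: str
    url: str
    duration_s: Optional[int]


def build_queue(pages: list[dict[str, Any]]) -> list[Job]:
    """Turn the `results` array of a query response into processing jobs."""
    jobs = []
    for page in pages:
        props = page.get("properties", {})
        url = props.get(URL_PROP, {}).get("url")
        if not url:
            continue  # defensive: skip pages without a link
        duration = props.get(DURATION_PROP, {}).get("number")
        jobs.append(Job(page_id=page["id"], url=url, duration_s=duration))
    return jobs


def batched(jobs: list[Job], size: int) -> list[list[Job]]:
    """Split the queue into fixed-size batches to pace API usage."""
    return [jobs[i:i + size] for i in range(0, len(jobs), size)]
```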
2) Video Download Pipeline:
- yt‑dlp for reliable YouTube downloads
- Audio‑only to reduce size and speed up processing
- Validates download before proceeding
- Temporary storage with automatic cleanup
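A minimal sketch of the download step, assuming yt-dlp and ffmpeg are installed and on PATH. The mp3 target format, the output template, and the 1 KB minimum-size check are illustrative choices, not the original configuration.

```python
import subprocess
import tempfile
from pathlib import Path


def ytdlp_command(url: str, out_dir: Path) -> list[str]:
    """Build an audio-only yt-dlp invocation."""
    return [
        "yt-dlp",
        "--extract-audio",          # keep only the audio track
        "--audio-format", "mp3",    # assumed target format
        "--no-playlist",            # one video per job
        "--output", str(out_dir / "%(id)s.%(ext)s"),
        url,
    ]


def is_valid_download(path: Path, min_bytes: int = 1024) -> bool:
    """Cheap sanity check: the file exists and is not (nearly) empty."""
    return path.is_file() and path.stat().st_size >= min_bytes


def download_audio(url: str, out_dir: Path) -> Path:
    """Run yt-dlp and return the audio file, raising if validation fails."""
    subprocess.run(ytdlp_command(url, out_dir), check=True)
    files = sorted(out_dir.glob("*.mp3"))
    if not files or not is_valid_download(files[0]):
        raise RuntimeError(f"download produced no usable audio for {url}")
    return files[0]
```

Wrapping a job in `with tempfile.TemporaryDirectory() as tmp:` and passing `Path(tmp)` as `out_dir` gives the automatic cleanup: the directory and the audio in it are removed when the block exits, even on error.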
3) Smart Audio Chunking:
- Duration check: ≤30 min → process whole; >30 min → split
- 10‑minute splits via ffmpeg
- Preserves quality to avoid Whisper degradation
- Keeps each chunk within the Whisper API's file size limit (25 MB per upload) and request time limits
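The chunking rule can be sketched as a small planning function plus an ffmpeg command builder. Treating a video of exactly 30 minutes as "whole" is an assumption, as are the function names and the output file pattern. The command uses ffmpeg's segment muxer with stream copy, so the audio is not re-encoded.

```python
from pathlib import Path

WHOLE_FILE_LIMIT_S = 30 * 60  # up to this length, send the file as one piece
CHUNK_S = 10 * 60             # otherwise split into 10-minute chunks


def plan_chunks(duration_s: float) -> list[tuple[float, float]]:
    """Return (start, length) pairs in seconds covering the whole recording."""
    if duration_s <= WHOLE_FILE_LIMIT_S:
        return [(0.0, float(duration_s))]
    chunks = []
    start = 0.0
    while start < duration_s:
        chunks.append((start, min(CHUNK_S, duration_s - start)))
        start += CHUNK_S
    return chunks


def ffmpeg_split_command(src: Path, out_dir: Path) -> list[str]:
    """Build an ffmpeg call that cuts `src` into CHUNK_S-second files."""
    return [
        "ffmpeg",
        "-i", str(src),
        "-f", "segment",
        "-segment_time", str(CHUNK_S),
        "-c", "copy",  # copy the stream: no re-encoding, no quality loss
        str(out_dir / f"chunk_%03d{src.suffix}"),
    ]
```

With stream copy the cut points land on packet boundaries, so chunk lengths can differ slightly from the plan; for speech transcription that is usually acceptable.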
4) Whisper Transcription:
- Each chunk sent to OpenAI Whisper API
- Automatic language detection (EN, RU, ES, etc.)
- Returns timestamped text with punctuation
- Parallel chunk processing for speed
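A sketch of the parallel transcription step, assuming the `openai` Python package (v1 client) and an `OPENAI_API_KEY` in the environment. The worker count and function names are illustrative. The transcriber is passed in as a parameter so the fan-out logic can be exercised without calling the API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Sequence


def transcribe_with_whisper(path: Path) -> str:
    """Send one audio chunk to the OpenAI transcription endpoint."""
    from openai import OpenAI  # imported lazily; only needed for real calls

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with path.open("rb") as audio:
        # No `language` argument, so the language is detected automatically.
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text


def transcribe_chunks(
    paths: Sequence[Path],
    transcribe: Callable[[Path], str] = transcribe_with_whisper,
    max_workers: int = 4,  # assumed; tune to the account's rate limits
) -> list[str]:
    """Transcribe chunks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe, paths))
```

Threads fit here because each call spends its time waiting on the network; `pool.map` keeps the output aligned with the chunk order, which the merge step relies on.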
5) Transcript Assembly & Structuring:
- Merge chunks into a single transcript
- Claude analyzes full text for logical structure
- Detects topic shifts, inserts paragraphs and section headings
- Removes filler words for readability
- Preserves meaning while improving flow
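The merge step can be sketched as follows, assuming each chunk comes back as a list of timestamped segments whose times are relative to the start of that chunk. The `Segment` type, the fixed chunk length, and the prompt wording are hypothetical; the actual prompt sent to Claude is not shown in this write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_chunks(chunks: list[list[Segment]], chunk_length_s: float = 600.0) -> list[Segment]:
    """Shift each chunk's timestamps by its offset and concatenate in order."""
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_length_s
        for seg in segments:
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text.strip()))
    return merged


def plain_text(segments: list[Segment]) -> str:
    """Join segment texts into one string for the structuring pass."""
    return " ".join(seg.text for seg in segments if seg.text)


# Hypothetical instruction for the structuring model; wording is illustrative.
STRUCTURING_PROMPT = (
    "Rewrite this transcript as a readable article. Remove filler words, "
    "add paragraph breaks and section headings at topic shifts, and do not "
    "change the meaning.\n\nTranscript:\n{transcript}"
)


def build_prompt(segments: list[Segment]) -> str:
    return STRUCTURING_PROMPT.format(transcript=plain_text(segments))
```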
6) Notion Integration:
- Updates the original page with the formatted transcript
- Marks the page as processed to prevent duplicates
- Adds metadata: duration, word count, processing date
- Keeps the video link at the top
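One practical detail when writing a long transcript back: Notion caps a single rich-text content string at 2,000 characters and a single append request at 100 child blocks. A minimal sketch of splitting the text to fit, with hypothetical helper names; the actual API call is omitted.

```python
from typing import Any

MAX_TEXT_CHARS = 2000         # Notion limit per rich-text content string
MAX_BLOCKS_PER_REQUEST = 100  # Notion limit per append-children request


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split on blank lines, then cut oversized paragraphs at a space."""
    pieces = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        while len(paragraph) > limit:
            cut = paragraph.rfind(" ", 0, limit)
            if cut <= 0:
                cut = limit  # no space to break on: hard cut
            pieces.append(paragraph[:cut])
            paragraph = paragraph[cut:].lstrip()
        if paragraph:
            pieces.append(paragraph)
    return pieces


def paragraph_block(text: str) -> dict[str, Any]:
    """Wrap one piece of text in a Notion paragraph block object."""
    return {
        "object": "block",
        "type": "paragraph",
        "paragraph": {"rich_text": [{"type": "text", "text": {"content": text}}]},
    }


def request_batches(blocks: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
    """Group blocks so each append request stays within the per-call cap."""
    step = MAX_BLOCKS_PER_REQUEST
    return [blocks[i:i + step] for i in range(0, len(blocks), step)]
```

Each batch would then be sent as the `children` of the page in its own request, after which the page can be marked as processed.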
Key Features
- Fully Automated: save a link → wake up to a transcript. Zero manual steps
- Batch Processing: queue 10 videos overnight
- High Accuracy: Whisper reaches 95%+ accuracy even on technical content
- Readable Formatting: structured into logical paragraphs, not a text wall
- Searchable: full‑text search across all transcripts in Notion
- Highlight & Annotate: work with the transcript as if it were an article
- Knowledge Base Integration: extract key points into permanent notes
- 5× Faster Learning: 1‑hour video → 10–15 minutes of reading
- Long‑Form Ready: processes 3‑hour lectures reliably
Real Numbers
Performance:
- 1‑hour video → 10–15 minutes of reading (5× faster)
- Runs overnight while I sleep
- 95%+ transcription accuracy (Whisper)
- ~$0.10–0.30 per hour of video
What Actually Changed:
- Before: 100+ "watch later" videos I never watched
- After: save link → read next morning → actually extract knowledge
- Search across all video content in Notion
- No more "I saw this in a video but can't find it"
- Reading enables highlights and note‑taking — impossible in video
Value & Scale
Solved for: 1 person (me) with 100+ unwatched educational videos
Potential market: 2B YouTube users; millions save to "watch later" and never watch. Tech, research, and education are drowning in video
Time saved: 1‑hour video → 12‑minute read = 80% savings. At 5 videos/week: 4 hours saved weekly, 208 hours/year
Cost: ~$0.20 per hour of video (Whisper API). Overnight processing. 99% success rate with retries
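The time-saved figures above follow from simple arithmetic, spelled out here as a small worked example. The function name and the 52-week year are assumptions for illustration.

```python
def time_savings(video_min: float, read_min: float, videos_per_week: int, weeks: int = 52) -> dict:
    """Derive speedup, percentage saved, and weekly/yearly hours saved."""
    saved_per_video_min = video_min - read_min
    weekly_hours = saved_per_video_min * videos_per_week / 60
    return {
        "speedup": video_min / read_min,
        "saved_fraction": saved_per_video_min / video_min,
        "weekly_hours": weekly_hours,
        "yearly_hours": weekly_hours * weeks,
    }


# 60-minute video read in 12 minutes, 5 videos per week:
# 48 minutes saved per video, 240 minutes (4 hours) per week.
stats = time_savings(video_min=60, read_min=12, videos_per_week=5)
```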
What Makes It Different
Runs while you sleep. Not just raw transcription — AI formats it like a real article with headings and paragraphs. Turns the "watch later" graveyard into a searchable knowledge base.
Skills Demonstrated
- Video Processing & Audio Extraction (yt‑dlp, ffmpeg)
- Speech‑to‑Text Integration (Whisper API)
- NLP & Text Structuring (Claude)
- Notion API Automation & Database Management
- Batch Processing & Job Queue Design
- Cron Scheduling & Server Automation
- Error Handling & Retry Logic
- Knowledge Management System Design
Tech Stack
Technologies: Python, yt‑dlp, ffmpeg, OpenAI Whisper, Claude 3.5 Sonnet, Notion API, Cron
Processing: YouTube URL → audio download → chunking → Whisper → assembly → AI structuring → Notion (15–30 minutes per hour of video)
Complexity: 7/10 (video processing, chunking logic, API orchestration, error handling)
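The error handling and retry logic mentioned above can be sketched as a generic retry wrapper with exponential backoff, applied to each stage of the processing chain. The attempt count, the delays, and both function names are assumptions; the real stages would be the download, chunking, transcription, structuring, and Notion update steps.

```python
import time
from typing import Any, Callable, Iterable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on any exception with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


def run_stages(value: Any, stages: Iterable[Callable[[Any], Any]], **retry_kwargs: Any) -> Any:
    """Feed `value` through each stage in order, retrying stages individually."""
    for stage in stages:
        value = with_retries(lambda s=stage, v=value: s(v), **retry_kwargs)
    return value
```

Retrying per stage rather than per video means a failed Notion update does not trigger a second download or a second paid transcription.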