The Problem
I followed 20+ AI/tech Telegram channels. I was spending 2+ hours a day scrolling through hype and promos to find 2–3 truly valuable posts. In 5 days I built an ML filter. I rate posts from ⭐ to ⭐⭐⭐⭐⭐ — the AI learns my taste. Now it cuts 80% of the noise. I read only the signal in 15 minutes.
Before / After
Before: 20+ channels, 100+ posts/day → 2+ hours of scrolling → find 2–3 good posts → exhausted, miss important stuff
After: The bot aggregates everything → for 2 weeks I rate ⭐–⭐⭐⭐⭐⭐ → the AI learned: "likes RAG systems, dislikes LLM benchmarks" → now it shows 15–20 posts/day (all relevant) → 15 minutes of reading
Impact: 85% accuracy after 6 weeks. I discovered I skip benchmark posts but read every RAG article — a pattern I wasn't consciously aware of. 88% time saved (2h → 15 min).
How It Works
Step 1: The bot monitors your 20+ channels and sends all posts into a single Telegram feed.
Step 2: You rate each post on a ⭐–⭐⭐⭐⭐⭐ scale. After ~100 ratings (2 weeks), the ML model sees patterns. It's not just keywords — it understands meaning. "This person prefers technical RAG implementation posts, not RAG product announcements."
For example, for the "Sanchal" channel: ⭐⭐⭐⭐⭐ — deep RAG implementation breakdowns; ⭐⭐⭐ — overview posts; ⭐–⭐⭐ — promo drops and hype roundups.
Step 3: The bot starts filtering. It shows only posts similar to those you rated ⭐⭐⭐⭐–⭐⭐⭐⭐⭐. It hides the noise. It gets smarter with every rating. You can toggle "show hidden" to audit what was filtered.
Result: 100+ daily posts → 15–20 curated ones. All signal, zero noise. Saves 88% of your time.
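A minimal sketch of that decision rule in Python (names here are illustrative; the 70% confidence threshold is detailed in the architecture below): ratings of ⭐⭐⭐⭐ and above become positive training labels, and a new post is delivered only if the model's predicted chance of such a rating clears the threshold.

```python
# Sketch of the rating-to-label mapping and the deliver/hide decision.
# predict_interest() stands in for the hybrid classifier described under
# "Technical Architecture"; the names and structure are illustrative.

INTEREST_THRESHOLD = 0.70  # deliver only posts with >70% predicted interest

def rating_to_label(stars: int) -> int:
    """4-5 stars count as signal (positive class), 1-3 as noise."""
    return 1 if stars >= 4 else 0

def should_deliver(post_text: str, predict_interest) -> bool:
    """Deliver the post only if the model expects a 4-5 star rating."""
    p_interest = predict_interest(post_text)  # probability in [0, 1]
    return p_interest > INTEREST_THRESHOLD
```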
Technical Architecture
1. Content Ingestion Pipeline:
- Telegram bot monitors subscribed channels (via Telegram API)
- Captures text, media, metadata (source, timestamp, author)
- Forwards all content to a personal aggregation bot
- Deduplication: detects cross-posts across channels
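The exact deduplication method isn't specified above; one lightweight approach, sketched here with illustrative names, is to hash a normalized version of each post's text and drop anything already seen.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Strip links, collapse whitespace, and lowercase, so trivial edits
    to a cross-posted message still produce the same fingerprint."""
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def fingerprint(text: str) -> str:
    """Stable fingerprint used to detect cross-posts across channels."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

seen: set[str] = set()

def is_duplicate(text: str) -> bool:
    fp = fingerprint(text)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```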
2. Storage and Embeddings:
- PostgreSQL: stores messages, ratings, metadata, source info
- Qdrant Vector DB: stores embeddings for semantic similarity
- Each message is embedded via the OpenAI Embeddings API
- Enables semantic search to find content similar to highly-rated messages
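A minimal sketch of this storage layer, assuming the openai and qdrant-client Python packages; the collection name and the text-embedding-3-small model are illustrative choices, since the exact embedding model isn't specified above.

```python
# Sketch: embed each post with OpenAI and store it in Qdrant so that
# semantic neighbors of highly rated posts can be retrieved later.
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

openai_client = OpenAI()                        # reads OPENAI_API_KEY from env
qdrant = QdrantClient(url="http://localhost:6333")
COLLECTION = "telegram_posts"                   # illustrative collection name

qdrant.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def store_post(post_id: int, text: str, channel: str) -> None:
    qdrant.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(id=post_id, vector=embed(text),
                            payload={"channel": channel, "text": text})],
    )

def similar_posts(text: str, limit: int = 10):
    """Semantic neighbors of a new post, used later as classifier features."""
    return qdrant.search(collection_name=COLLECTION, query_vector=embed(text), limit=limit)
```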
3. Ratings and Feedback Loop:
- User receives a message → rates ⭐–⭐⭐⭐⭐⭐ via inline buttons
- Rating stored with timestamp, confidence, and context
- System tracks which topics you rate higher, which sources you trust, and which content formats you prefer
- Feedback loop updates the model in real time
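A sketch of the inline-button rating flow with aiogram 3.x; the callback-data format and the save_rating() placeholder (an INSERT into PostgreSQL in the real system) are assumptions for illustration.

```python
# Sketch of the star-rating flow: each forwarded post carries an inline
# keyboard with 1-5 stars; tapping a button stores the rating.
from aiogram import Dispatcher, F
from aiogram.types import CallbackQuery, InlineKeyboardButton, InlineKeyboardMarkup

dp = Dispatcher()

def rating_keyboard(post_id: int) -> InlineKeyboardMarkup:
    row = [
        InlineKeyboardButton(text="⭐" * stars, callback_data=f"rate:{post_id}:{stars}")
        for stars in range(1, 6)
    ]
    return InlineKeyboardMarkup(inline_keyboard=[row])

@dp.callback_query(F.data.startswith("rate:"))
async def handle_rating(callback: CallbackQuery) -> None:
    _, post_id, stars = callback.data.split(":")
    save_rating(post_id=int(post_id), stars=int(stars), user_id=callback.from_user.id)
    await callback.answer(f"Saved: {stars}★")   # feeds the nightly retraining set

def save_rating(post_id: int, stars: int, user_id: int) -> None:
    """Placeholder: in the real system this is an INSERT into PostgreSQL."""
    print(f"user={user_id} rated post={post_id} with {stars} stars")
```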
4. Machine Learning Model:
- Hybrid: vector similarity + metadata features
- Vector component: semantic similarity to previously high-rated content (Qdrant nearest neighbors)
- Metadata features: source credibility, topic categories, message length, link density
- Binary classification: "Will the user give ⭐⭐⭐⭐–⭐⭐⭐⭐⭐?"
- Confidence threshold: show only messages with >70% predicted interest
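A sketch of the hybrid scoring step with scikit-learn; the exact feature set and classifier aren't specified above, so logistic regression over one vector-similarity feature plus a few of the listed metadata features is an assumed, lightweight stand-in.

```python
# Sketch of the hybrid "vector similarity + metadata" classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(sim_to_liked: float, source_score: float,
                   text_len: int, link_density: float) -> np.ndarray:
    """sim_to_liked: mean cosine similarity to the nearest previously
    4-5 star posts (from Qdrant); the rest are simple metadata features."""
    return np.array([sim_to_liked, source_score, np.log1p(text_len), link_density])

# X: one feature row per historical post; y: 1 if the user rated it 4-5 stars.
X = np.array([
    build_features(0.82, 0.9, 1400, 0.01),   # deep RAG breakdown -> liked
    build_features(0.35, 0.4,  300, 0.20),   # promo drop -> skipped
    build_features(0.75, 0.8,  900, 0.02),
    build_features(0.20, 0.3,  150, 0.35),
])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

def predicted_interest(features: np.ndarray) -> float:
    """Probability that the user would rate this post 4-5 stars."""
    return float(model.predict_proba(features.reshape(1, -1))[0, 1])

new_post = build_features(sim_to_liked=0.78, source_score=0.85,
                          text_len=1200, link_density=0.02)
show_it = predicted_interest(new_post) > 0.70   # the >70% confidence threshold
```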
5. Adaptive Delivery:
- Cold start (first 2 weeks): show more content, collect training data
- Learning phase (weeks 3–6): start filtering conservatively (include borderline cases)
- Optimized phase (6+ weeks): aggressive filtering, high-confidence only
- User can toggle "show hidden" to audit filtering decisions
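The phase boundaries (2 and 6 weeks, ~100 ratings) and the >70% figure come from the notes above; the intermediate threshold in this sketch is an assumption.

```python
# Sketch of the phase-dependent filtering policy.
from datetime import date

def delivery_threshold(start: date, today: date, n_ratings: int) -> float:
    weeks = (today - start).days / 7
    if weeks < 2 or n_ratings < 100:
        return 0.0    # cold start: deliver everything, collect training data
    if weeks < 6:
        return 0.5    # learning phase: filter conservatively, keep borderline posts
    return 0.7        # optimized phase: high-confidence posts only

def deliver(p_interest: float, threshold: float) -> bool:
    return p_interest >= threshold
```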
6. Continuous Refinement:
- Nightly retraining on new ratings
- Tracks accuracy: predicted interest vs. actual ratings
- Adjusts confidence thresholds based on performance
- Detects interest drift and adapts as preferences change
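A sketch of the nightly job, assuming scikit-learn; scheduling, database access, and the specific threshold-adjustment rule are illustrative rather than taken from the description above.

```python
# Sketch of nightly retraining: refit on all ratings collected so far, check
# how well yesterday's predictions matched the actual ratings, and nudge the
# delivery threshold accordingly (refitting also absorbs interest drift).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def nightly_retrain(X: np.ndarray, y: np.ndarray,
                    yesterday_pred: np.ndarray, yesterday_true: np.ndarray,
                    threshold: float) -> tuple[LogisticRegression, float]:
    model = LogisticRegression().fit(X, y)              # retrain on all ratings to date

    acc = accuracy_score(yesterday_true, yesterday_pred)  # predicted vs. actual interest
    if acc < 0.80:
        threshold = max(0.5, threshold - 0.05)          # under-performing: filter less aggressively
    elif acc > 0.90:
        threshold = min(0.85, threshold + 0.05)         # confident: filter more aggressively
    return model, threshold
```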
What Makes It Special
It learns nuances, not just topics. It doesn't filter "all AI content" — it filters "AI product announcements," while keeping "technical RAG implementation posts." It discovers patterns in your taste you didn't know you had.
Real Numbers
Performance:
- Filters 75–85% of noise (from 100+ daily messages down to ~15–20 relevant ones)
- Reading time: 15 minutes of focused reading vs. 2+ hours of scrolling
- Model accuracy: 85%+ after 6 weeks of training
- Useful filtering begins after ~100 ratings (2 weeks)
What Actually Changed:
- Before: 2+ hours of scrolling for the "signal"
- After: 15 minutes of curated, relevant reading
- Stay informed without FOMO or overload
- The system surfaced my hidden preference pattern (e.g., I skip LLM benchmarks but read every RAG systems post)
Value and Scale
Solved for: 1 person (me) drowning in AI/tech content
Potential market: anyone following 10+ channels/newsletters (millions of engineers, researchers, investors)
Time saved: 2 hours → 15 minutes daily = 88% savings (about 1 hour 45 minutes a day, roughly 12 hours/week, 600+ hours/year).
Key insight: the system uncovered my unconscious preferences. I didn't realize I skip ALL benchmark posts but read EVERY RAG implementation article. The filter learned this automatically.
Skills Demonstrated
- ML for content recommendation
- Vector embeddings and semantic search (Qdrant)
- Telegram bot development and channel parsing
- Hybrid ML architecture (vector + metadata features)
- Feedback loop design and online learning
- Cold-start problem solving
- Personal knowledge management systems (PKM)
- Production ML and continuous retraining
Tech Stack
Technologies: Python, aiogram (Telegram), OpenAI Embeddings, Qdrant, PostgreSQL, scikit-learn
Data: 20+ Telegram channels monitored, 1000+ ratings collected, 6+ months of learning
Complexity: 8/10 (model training, online learning, feedback loops, content parsing)