SignalAI

2 Hours → 15 Minutes • A Content Filter That Learns YOUR Taste

The Problem

I followed 20+ AI/tech Telegram channels. I was spending 2+ hours a day scrolling through hype and promos to find 2–3 truly valuable posts. In 5 days I built an ML filter. I rate posts from ⭐ to ⭐⭐⭐⭐⭐ — the AI learns my taste. Now it cuts 80% of the noise. I read only the signal in 15 minutes.

Before / After

Before: 20+ channels, 100+ posts/day → 2+ hours of scrolling → find 2–3 good posts → exhausted, miss important stuff

After: The bot aggregates everything → for 2 weeks I rate ⭐–⭐⭐⭐⭐⭐ → the AI learned: "likes RAG systems, dislikes LLM benchmarks" → now it shows 15–20 posts/day (all relevant) → 15 minutes of reading

Impact: 85% accuracy after 6 weeks. I discovered I skip benchmark posts but read every RAG article — a pattern I wasn't consciously aware of. 88% time saved (2h → 15 min).

How It Works

Step 1: The bot monitors your 20+ channels and sends all posts into a single Telegram feed.

Step 2: You rate each post on a ⭐–⭐⭐⭐⭐⭐ scale. After ~100 ratings (2 weeks), the ML model sees patterns. It's not just keywords — it understands meaning. "This person prefers technical RAG implementation posts, not RAG product announcements."

For example, for the "Sanchal" channel: ⭐⭐⭐⭐⭐ — deep RAG implementation breakdowns; ⭐⭐⭐ — overview posts; ⭐–⭐⭐ — promo drops and hype roundups.

Step 3: The bot starts filtering. It shows only posts similar to those you rated ⭐⭐⭐⭐–⭐⭐⭐⭐⭐. It hides the noise. It gets smarter with every rating. You can toggle "show hidden" to audit what was filtered.

Result: 100+ daily posts → 15–20 curated ones. All signal, zero noise. Saves 88% of your time.

Technical Architecture

1. Content Ingestion Pipeline:

  • Telegram bot monitors subscribed channels (via Telegram API)
  • Captures text, media, metadata (source, timestamp, author)
  • Forwards all content to a personal aggregation bot
  • Deduplication: detects cross-posts across channels
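A minimal sketch of this ingestion step, assuming aiogram 3.x with the bot added as an admin in each monitored channel (bots cannot read channels otherwise). The feed chat ID, the in-memory dedup set, and the hashing scheme are illustrative placeholders, not the project's actual implementation:

```python
# Sketch: channel ingestion with aiogram 3.x (assumes the bot is an admin in each channel).
# FEED_CHAT_ID and the in-memory dedup set are placeholders; in production dedup lives in PostgreSQL.
import hashlib

from aiogram import Bot, Router
from aiogram.types import Message

router = Router()
seen_hashes: set[str] = set()
FEED_CHAT_ID = -1001234567890  # personal aggregation chat (placeholder)


def content_hash(text: str) -> str:
    """Normalize text and hash it so cross-posted copies collapse to one key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()


@router.channel_post()
async def on_channel_post(message: Message, bot: Bot) -> None:
    text = message.text or message.caption or ""
    key = content_hash(text)
    if not text or key in seen_hashes:  # skip empty posts and cross-channel duplicates
        return
    seen_hashes.add(key)
    # Forward into the single aggregation feed; source and timestamp travel with the message
    await bot.forward_message(
        chat_id=FEED_CHAT_ID,
        from_chat_id=message.chat.id,
        message_id=message.message_id,
    )
```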

2. Storage and Embeddings:

  • PostgreSQL: stores messages, ratings, metadata, source info
  • Qdrant Vector DB: stores embeddings for semantic similarity
  • Each message embedded with OpenAI Embeddings
  • Enables semantic search to find content similar to highly-rated messages
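A rough sketch of the embed-and-store path, using the official openai and qdrant-client packages. The collection name, embedding model, and payload fields are assumptions chosen for illustration:

```python
# Sketch: embed a message with OpenAI and upsert it into Qdrant (model/collection names are assumptions).
import uuid

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

openai_client = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")
COLLECTION = "telegram_posts"

# text-embedding-3-small returns 1536-dimensional vectors; cosine distance for semantic similarity
qdrant.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)


def index_message(text: str, source: str, message_id: int) -> None:
    """Embed one post and upsert it together with its metadata payload."""
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    qdrant.upsert(
        collection_name=COLLECTION,
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=vector,
                payload={"source": source, "message_id": message_id, "text": text},
            )
        ],
    )
```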

3. Ratings and Feedback Loop:

  • User receives a message → rates ⭐–⭐⭐⭐⭐⭐ via inline buttons
  • Rating stored with timestamp, confidence, and context
  • System tracks which topics you rate higher, which sources you trust, and which content formats you prefer
  • Feedback loop updates the model in real time
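The rating UI can be as simple as five inline buttons whose callback data carries the score. A sketch with aiogram 3.x follows; the `save_rating` persistence helper is a hypothetical stand-in for the PostgreSQL write:

```python
# Sketch: inline star-rating buttons and the callback that records the score (save_rating is a placeholder).
from aiogram import F, Router
from aiogram.types import CallbackQuery, InlineKeyboardButton, InlineKeyboardMarkup

router = Router()


def rating_keyboard(post_id: int) -> InlineKeyboardMarkup:
    """One row of 1..5 star buttons; callback data encodes the post id and the score."""
    return InlineKeyboardMarkup(
        inline_keyboard=[[
            InlineKeyboardButton(text="⭐" * score, callback_data=f"rate:{post_id}:{score}")
            for score in range(1, 6)
        ]]
    )


@router.callback_query(F.data.startswith("rate:"))
async def on_rating(callback: CallbackQuery) -> None:
    _, post_id, score = callback.data.split(":")
    save_rating(post_id=int(post_id), score=int(score))  # stored with timestamp and context
    await callback.answer(f"Saved: {'⭐' * int(score)}")


def save_rating(post_id: int, score: int) -> None:
    ...  # INSERT INTO ratings (post_id, score, rated_at) VALUES (...)
```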

4. Machine Learning Model:

  • Hybrid: vector similarity + metadata features
  • Vector component: semantic similarity to previously high-rated content (Qdrant nearest neighbors)
  • Metadata features: source credibility, topic categories, message length, link density
  • Binary classification: "Will the user give ⭐⭐⭐⭐–⭐⭐⭐⭐⭐?"
  • Confidence threshold: show only messages with >70% predicted interest
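A simplified sketch of the hybrid scoring, assuming a scikit-learn logistic regression over one vector-similarity feature plus a few metadata features; the concrete feature set and model choice are assumptions beyond the outline above:

```python
# Sketch: hybrid "will this get 4-5 stars?" classifier (feature names and model choice are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

THRESHOLD = 0.70  # show only posts with >70% predicted interest


def build_features(similarity: float, source_score: float,
                   length: int, link_density: float) -> np.ndarray:
    """similarity = mean Qdrant cosine score against the user's highly-rated posts."""
    return np.array([similarity, source_score, np.log1p(length), link_density])


def train(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """X: one feature row per rated post; y: 1 if the user gave 4-5 stars, else 0."""
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X, y)
    return model


def should_show(model: LogisticRegression, features: np.ndarray) -> bool:
    p_interest = model.predict_proba(features.reshape(1, -1))[0, 1]
    return p_interest > THRESHOLD
```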

5. Adaptive Delivery:

  • Cold start (first 2 weeks): show more content, collect training data
  • Learning phase (weeks 3–6): start filtering conservatively (include borderline cases)
  • Optimized phase (6+ weeks): aggressive filtering, high-confidence only
  • User can request "show suppressed" to audit filtering decisions
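The phase schedule reduces to a confidence threshold that tightens as ratings accumulate. A hypothetical sketch; the rating-count cutoffs mirror the phases above but the exact values are assumptions:

```python
# Sketch: confidence threshold by learning phase (cutoff values are illustrative, not measured).
def delivery_threshold(num_ratings: int) -> float:
    """Looser filtering while training data is thin, stricter once the model is trusted."""
    if num_ratings < 100:   # cold start (~first 2 weeks): show almost everything
        return 0.0
    if num_ratings < 400:   # learning phase (weeks 3-6): keep borderline cases
        return 0.50
    return 0.70             # optimized phase (6+ weeks): high-confidence only
```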

6. Continuous Refinement:

  • Nightly retraining on new ratings
  • Tracks accuracy: predicted interest vs. actual ratings
  • Adjusts confidence thresholds based on performance
  • Detects interest drift and adapts as preferences change
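Nightly retraining can be a small scheduled job: pull fresh ratings and features from PostgreSQL, refit the classifier, and track predicted-vs-actual accuracy on the newest ratings. A sketch under those assumptions; the library choice (psycopg 3), table, and column names are made up for illustration:

```python
# Sketch: nightly retraining job (library choice, schema, and column names are assumptions).
import numpy as np
import psycopg
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def nightly_retrain(conn_str: str) -> tuple[LogisticRegression, float]:
    # Pull every rated post with its precomputed features and the 4-5 star label, oldest first
    with psycopg.connect(conn_str) as conn:
        rows = conn.execute(
            "SELECT similarity, source_score, length, link_density, (score >= 4)::int "
            "FROM rated_posts ORDER BY rated_at"
        ).fetchall()
    data = np.array(rows, dtype=float)
    X, y = data[:, :4], data[:, 4].astype(int)

    # Hold out the most recent ratings to measure accuracy (predicted interest vs. actual stars)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy  # accuracy drives threshold adjustment and drift detection
```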

What Makes It Special

It learns nuances, not just topics. It doesn't filter "all AI content" — it filters "AI product announcements," while keeping "technical RAG implementation posts." It discovers patterns in your taste you didn't know you had.

Real Numbers

Performance:

  • Filters 75–85% of noise (from 100+ daily messages down to ~15–20 relevant ones)
  • Time saved: 15 minutes of focused reading vs. 2+ hours of scrolling
  • Model accuracy: 85%+ after 6 weeks of training
  • Useful filtering begins after ~100 ratings (2 weeks)

What Actually Changed:

  • Before: 2+ hours of scrolling for the "signal"
  • After: 15 minutes of curated, relevant reading
  • Stay informed without FOMO or overload
  • The system surfaced my hidden preference pattern (e.g., I skip LLM benchmarks but read every RAG systems post)

Value and Scale

Solved for: 1 person (me) drowning in AI/tech content

Potential market: anyone following 10+ channels/newsletters (millions of engineers, researchers, investors)

Time saved: 2 hours → 15 minutes daily = 88% savings. About 12 hours/week, over 600 hours/year.

Key insight: the system uncovered my unconscious preferences. I didn't realize I skip ALL benchmark posts but read EVERY RAG implementation article. The filter learned this automatically.

Skills Demonstrated

  • ML for content recommendation
  • Vector embeddings and semantic search (Qdrant)
  • Telegram bot development and channel parsing
  • Hybrid ML architecture (vector + metadata features)
  • Feedback loop design and online learning
  • Cold-start problem solving
  • Personal knowledge management systems (PKM)
  • Production ML and continuous retraining

Tech Stack

Technologies: Python, aiogram (Telegram), OpenAI Embeddings, Qdrant, PostgreSQL, scikit-learn

Data: 20+ Telegram channels monitored, 1000+ ratings collected, 6+ months of learning

Complexity: 8/10 (model training, online learning, feedback loops, content parsing)