The Problem
I followed 20+ AI/tech Telegram channels. I was spending 2+ hours a day scrolling through hype and promos to find 2–3 truly valuable posts. In 5 days I built an ML filter. I rate posts from ⭐ to ⭐⭐⭐⭐⭐ — the AI learns my taste. Now it cuts 80% of the noise. I read only the signal in 15 minutes.
Before / After
Before: 20+ channels, 100+ posts/day → 2+ hours of scrolling → find 2–3 good posts → exhausted, miss important stuff
After: The bot aggregates everything → for 2 weeks I rate ⭐–⭐⭐⭐⭐⭐ → the AI learned: "likes RAG systems, dislikes LLM benchmarks" → now it shows 15–20 posts/day (all relevant) → 15 minutes of reading
Impact: 85% accuracy after 6 weeks. I discovered I skip benchmark posts but read every RAG article — a pattern I wasn't consciously aware of. 88% time saved (2h → 15 min).
How It Works
Step 1: The bot monitors your 20+ channels and sends all posts into a single Telegram feed.
Step 2: You rate each post on a ⭐–⭐⭐⭐⭐⭐ scale. After ~100 ratings (2 weeks), the ML model sees patterns. It's not just keywords — it understands meaning. "This person prefers technical RAG implementation posts, not RAG product announcements."
For example, for the "Sanchal" channel: ⭐⭐⭐⭐⭐ — deep RAG implementation breakdowns; ⭐⭐⭐ — overview posts; ⭐–⭐⭐ — promo drops and hype roundups.
Step 3: The bot starts filtering. It shows only posts similar to those you rated ⭐⭐⭐⭐–⭐⭐⭐⭐⭐. It hides the noise. It gets smarter with every rating. You can toggle "show hidden" to audit what was filtered.
Result: 100+ daily posts → 15–20 curated ones. All signal, zero noise. Saves 88% of your time.
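A minimal sketch of that decision rule in Python (names here are illustrative; the 70% confidence threshold is detailed in the architecture below): ratings of ⭐⭐⭐⭐ and above become positive training labels, and a new post is delivered only if the model's predicted chance of such a rating clears the threshold.

```python
# Sketch of the rating-to-label mapping and the deliver/hide decision.
# predict_interest() stands in for the hybrid classifier described under
# "Technical Architecture"; the names and structure are illustrative.

INTEREST_THRESHOLD = 0.70  # deliver only posts with >70% predicted interest

def rating_to_label(stars: int) -> int:
    """4-5 stars count as signal (positive class), 1-3 as noise."""
    return 1 if stars >= 4 else 0

def should_deliver(post_text: str, predict_interest) -> bool:
    """Deliver the post only if the model expects a 4-5 star rating."""
    p_interest = predict_interest(post_text)  # probability in [0, 1]
    return p_interest > INTEREST_THRESHOLD
```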
Technical Architecture
1. Content Ingestion Pipeline:
- Telegram bot monitors subscribed channels (via Telegram API)
- Captures text, media, metadata (source, timestamp, author)
- Forwards all content to a personal aggregation bot
- Deduplication: detects cross-posts across channels
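The exact deduplication method isn't specified above; one lightweight approach, sketched here with illustrative names, is to hash a normalized version of each post's text and drop anything already seen.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Strip links, collapse whitespace, and lowercase, so trivial edits
    to a cross-posted message still produce the same fingerprint."""
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def fingerprint(text: str) -> str:
    """Stable fingerprint used to detect cross-posts across channels."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

seen: set[str] = set()

def is_duplicate(text: str) -> bool:
    fp = fingerprint(text)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```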
2. Storage and Embeddings:
- PostgreSQL: stores messages, ratings, metadata, source info
- Qdrant Vector DB: stores embeddings for semantic similarity
- Each message is embedded via the OpenAI Embeddings API
- Enables semantic search to find content similar to highly-rated messages
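A minimal sketch of this storage layer, assuming the openai and qdrant-client Python packages; the collection name and the text-embedding-3-small model are illustrative choices, since the exact embedding model isn't specified above.

```python
# Sketch: embed each post with OpenAI and store it in Qdrant so that
# semantic neighbors of highly rated posts can be retrieved later.
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

openai_client = OpenAI()                        # reads OPENAI_API_KEY from env
qdrant = QdrantClient(url="http://localhost:6333")
COLLECTION = "telegram_posts"                   # illustrative collection name

qdrant.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def store_post(post_id: int, text: str, channel: str) -> None:
    qdrant.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(id=post_id, vector=embed(text),
                            payload={"channel": channel, "text": text})],
    )

def similar_posts(text: str, limit: int = 10):
    """Semantic neighbors of a new post, used later as classifier features."""
    return qdrant.search(collection_name=COLLECTION, query_vector=embed(text), limit=limit)
```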
3. Ratings and Feedback Loop:
- User receives a message → rates ⭐–⭐⭐⭐⭐⭐ via inline buttons
- Rating stored with timestamp, confidence, and context
- System tracks which topics you rate higher, which sources you trust, and which content formats you prefer
- Feedback loop updates the model in real time
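A sketch of the inline-button rating flow with aiogram 3.x; the callback-data format and the save_rating() placeholder (an INSERT into PostgreSQL in the real system) are assumptions for illustration.

```python
# Sketch of the star-rating flow: each forwarded post carries an inline
# keyboard with 1-5 stars; tapping a button stores the rating.
from aiogram import Dispatcher, F
from aiogram.types import CallbackQuery, InlineKeyboardButton, InlineKeyboardMarkup

dp = Dispatcher()

def rating_keyboard(post_id: int) -> InlineKeyboardMarkup:
    row = [
        InlineKeyboardButton(text="⭐" * stars, callback_data=f"rate:{post_id}:{stars}")
        for stars in range(1, 6)
    ]
    return InlineKeyboardMarkup(inline_keyboard=[row])

@dp.callback_query(F.data.startswith("rate:"))
async def handle_rating(callback: CallbackQuery) -> None:
    _, post_id, stars = callback.data.split(":")
    save_rating(post_id=int(post_id), stars=int(stars), user_id=callback.from_user.id)
    await callback.answer(f"Saved: {stars}★")   # feeds the nightly retraining set

def save_rating(post_id: int, stars: int, user_id: int) -> None:
    """Placeholder: in the real system this is an INSERT into PostgreSQL."""
    print(f"user={user_id} rated post={post_id} with {stars} stars")
```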
4. Machine Learning Model:
- Hybrid: vector similarity + metadata features
- Vector component: semantic similarity to previously high-rated content (Qdrant nearest neighbors)
- Metadata features: source credibility, topic categories, message length, link density
- Binary classification: "Will the user give ⭐⭐⭐⭐–⭐⭐⭐⭐⭐?"
- Confidence threshold: show only messages with >70% predicted interest
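A sketch of the hybrid scoring step with scikit-learn; the exact feature set and classifier aren't specified above, so logistic regression over one vector-similarity feature plus a few of the listed metadata features is an assumed, lightweight stand-in.

```python
# Sketch of the hybrid "vector similarity + metadata" classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(sim_to_liked: float, source_score: float,
                   text_len: int, link_density: float) -> np.ndarray:
    """sim_to_liked: mean cosine similarity to the nearest previously
    4-5 star posts (from Qdrant); the rest are simple metadata features."""
    return np.array([sim_to_liked, source_score, np.log1p(text_len), link_density])

# X: one feature row per historical post; y: 1 if the user rated it 4-5 stars.
X = np.array([
    build_features(0.82, 0.9, 1400, 0.01),   # deep RAG breakdown -> liked
    build_features(0.35, 0.4,  300, 0.20),   # promo drop -> skipped
    build_features(0.75, 0.8,  900, 0.02),
    build_features(0.20, 0.3,  150, 0.35),
])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

def predicted_interest(features: np.ndarray) -> float:
    """Probability that the user would rate this post 4-5 stars."""
    return float(model.predict_proba(features.reshape(1, -1))[0, 1])

new_post = build_features(sim_to_liked=0.78, source_score=0.85,
                          text_len=1200, link_density=0.02)
show_it = predicted_interest(new_post) > 0.70   # the >70% confidence threshold
```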
5. Adaptive Delivery:
- Cold start (first 2 weeks): show more content, collect training data
- Learning phase (weeks 3–6): start filtering conservatively (include borderline cases)
- Optimized phase (6+ weeks): aggressive filtering, high-confidence only
- User can toggle "show hidden" to audit filtering decisions
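The phase boundaries (2 and 6 weeks, ~100 ratings) and the >70% figure come from the notes above; the intermediate threshold in this sketch is an assumption.

```python
# Sketch of the phase-dependent filtering policy.
from datetime import date

def delivery_threshold(start: date, today: date, n_ratings: int) -> float:
    weeks = (today - start).days / 7
    if weeks < 2 or n_ratings < 100:
        return 0.0    # cold start: deliver everything, collect training data
    if weeks < 6:
        return 0.5    # learning phase: filter conservatively, keep borderline posts
    return 0.7        # optimized phase: high-confidence posts only

def deliver(p_interest: float, threshold: float) -> bool:
    return p_interest >= threshold
```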
6. Continuous Refinement:
- Nightly retraining on new ratings
- Tracks accuracy: predicted interest vs. actual ratings
- Adjusts confidence thresholds based on performance
- Detects interest drift and adapts as preferences change
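A sketch of the nightly job, assuming scikit-learn; scheduling, database access, and the specific threshold-adjustment rule are illustrative rather than taken from the description above.

```python
# Sketch of nightly retraining: refit on all ratings collected so far, check
# how well yesterday's predictions matched the actual ratings, and nudge the
# delivery threshold accordingly (refitting also absorbs interest drift).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def nightly_retrain(X: np.ndarray, y: np.ndarray,
                    yesterday_pred: np.ndarray, yesterday_true: np.ndarray,
                    threshold: float) -> tuple[LogisticRegression, float]:
    model = LogisticRegression().fit(X, y)              # retrain on all ratings to date

    acc = accuracy_score(yesterday_true, yesterday_pred)  # predicted vs. actual interest
    if acc < 0.80:
        threshold = max(0.5, threshold - 0.05)          # under-performing: filter less aggressively
    elif acc > 0.90:
        threshold = min(0.85, threshold + 0.05)         # confident: filter more aggressively
    return model, threshold
```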
What Makes It Special
It learns nuances, not just topics. It doesn't filter "all AI content" — it filters "AI product announcements," while keeping "technical RAG implementation posts." It discovers patterns in your taste you didn't know you had.
Real Numbers
Performance:
- Filters 75–85% of noise (from 100+ daily messages down to ~15–20 relevant ones)
- Reading time: 15 minutes of focused reading vs. 2+ hours of scrolling
- Model accuracy: 85%+ after 6 weeks of training
- Useful filtering begins after ~100 ratings (2 weeks)
What Actually Changed:
- Before: 2+ hours of scrolling for the "signal"
- After: 15 minutes of curated, relevant reading
- Stay informed without FOMO or overload
- The system surfaced my hidden preference pattern (e.g., I skip LLM benchmarks but read every RAG systems post)
Value and Scale
Solved for: 1 person (me) drowning in AI/tech content
Potential market: anyone following 10+ channels/newsletters (millions of engineers, researchers, investors)
Time saved: 2 hours → 15 minutes daily = 88% savings (about 1 hour 45 minutes a day, roughly 12 hours/week, 600+ hours/year).
Key insight: the system uncovered my unconscious preferences. I didn't realize I skip ALL benchmark posts but read EVERY RAG implementation article. The filter learned this automatically.
Skills Demonstrated
- ML for content recommendation
- Vector embeddings and semantic search (Qdrant)
- Telegram bot development and channel parsing
- Hybrid ML architecture (vector + metadata features)
- Feedback loop design and online learning
- Cold-start problem solving
- Personal knowledge management systems (PKM)
- Production ML and continuous retraining
Tech Stack
Technologies: Python, aiogram (Telegram), OpenAI Embeddings, Qdrant, PostgreSQL, scikit-learn
Data: 20+ Telegram channels monitored, 1000+ ratings collected, 6+ months of learning
Complexity: 8/10 (model training, online learning, feedback loops, content parsing)