30 minutes → 3 seconds • Medical records you can actually use
A friend spent 10+ years visiting doctors without a diagnosis. Each specialist saw only their own fragment, so connections between symptoms across clinics and years were missed. Built HealthRAG in 12 days. Processed 870 documents. Now she answers any doctor’s question about her history in 3 seconds instead of spending 30 minutes searching through papers.
Before/After
Before: Doctor asks “What was your hemoglobin six months ago?” → 30 minutes searching papers, trying to remember which clinic
After: Ask the Telegram bot → 3 seconds, complete answer with dates and trend
Impact: 870 documents from 10+ years, instantly searchable. Trends visible that were impossible to see before (e.g., “vitamin D dropping for 2 years”).
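One way to flag a trend such as “vitamin D dropping for 2 years” is to fit a least-squares slope to one marker's readings inside a time window. Below is a minimal Python sketch. It assumes readings are stored as (date, value) pairs. `trend_per_year` and the sample values are hypothetical and are not taken from HealthRAG's code:

```python
from datetime import date
from typing import Optional


def trend_per_year(readings: list[tuple[date, float]], since: date) -> Optional[float]:
    """Least-squares slope (units per year) of readings taken on or after `since`."""
    points = sorted((d, v) for d, v in readings if d >= since)
    if len(points) < 2:
        return None  # a trend needs at least two readings
    t0 = points[0][0]
    # Express each date as fractional years since the first reading in the window.
    xs = [(d - t0).days / 365.25 for d, _ in points]
    ys = [v for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all readings fall on the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical vitamin D readings (ng/mL), not real patient data.
    vitamin_d = [
        (date(2022, 1, 10), 42.0),
        (date(2022, 7, 5), 36.0),
        (date(2023, 1, 12), 31.0),
        (date(2023, 7, 20), 27.0),
        (date(2024, 1, 15), 22.0),
    ]
    slope = trend_per_year(vitamin_d, since=date(2022, 1, 1))
    print(f"{slope:+.1f} ng/mL per year")
```

A negative slope means the value is falling. A threshold, or a comparison with the lab's reference range, would still be needed to decide when a drift is worth reporting. The sketch leaves that choice open.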
How It Works
Step 1: Send a photo of a medical document to the Telegram bot
Step 2: AI reads it (including handwriting) and extracts test results, dates, and diagnoses. When it encounters an unknown abbreviation, it asks you once and then remembers the answer permanently.
Step 3: The data is organized into two systems: a structured database for trends (“show cholesterol last 2 years”) and a search engine for context (“find all mentions of thyroid”)
Result: Ask any question about your medical history and get an answer in 3 seconds, with specific numbers and dates.
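The “ask once, remember forever” behaviour in Step 2 can be sketched as a small persistent dictionary. This is an assumption about the design, not the project's actual implementation. `AbbreviationMemory` and its JSON file are hypothetical. `ask` is a callback that, in the bot, would send the question to the user through Telegram:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class AbbreviationMemory:
    """Hypothetical store for user-confirmed abbreviation expansions, persisted as JSON."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload everything learned in earlier sessions, if the file exists.
        self.known: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def resolve(self, abbr: str, ask: Callable[[str], str]) -> str:
        """Return the expansion of `abbr`, calling `ask` only the first time it is seen."""
        key = abbr.strip().upper()  # "tsh" and "TSH" share one entry
        if key not in self.known:
            self.known[key] = ask(abbr)
            # Save immediately so the answer survives restarts.
            self.path.write_text(
                json.dumps(self.known, ensure_ascii=False, indent=2), encoding="utf-8"
            )
        return self.known[key]


if __name__ == "__main__":
    store_path = Path(tempfile.mkdtemp()) / "abbreviations.json"
    memory = AbbreviationMemory(store_path)
    # In the bot, `ask` would message the user; a fixed answer stands in here.
    print(memory.resolve("TSH", lambda a: "thyroid-stimulating hormone"))
```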
Technical Architecture
Data Pipeline: Medical Document → Google Document AI (OCR) → Claude AI (Structuring + Normalization) → JSON Schema Validation → Parallel Write to BigQuery (structured data) + Qdrant (embeddings) → Conversational AI Layer
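The validation and parallel-write stages of this pipeline can be sketched with the standard library. This version rests on several assumptions. The record fields in `REQUIRED_FIELDS` are hypothetical. The hand-rolled `validate` function stands in for a real JSON Schema validator. The two sinks are plain callables, whereas the real pipeline would wrap BigQuery and Qdrant clients, and its vector sink would also compute embeddings before writing:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Record = dict[str, Any]
Sink = Callable[[list[Record]], None]

# Hypothetical minimal schema for one extracted lab result;
# a stand-in for a full JSON Schema document and validator.
REQUIRED_FIELDS: dict[str, tuple[type, ...]] = {
    "test_name": (str,),
    "value": (int, float),
    "unit": (str,),
    "date": (str,),  # ISO 8601, e.g. "2023-04-17"
    "source_doc": (str,),
}


def validate(record: Record) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, allowed in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], allowed):
            names = "/".join(t.__name__ for t in allowed)
            errors.append(f"{field}: expected {names}, got {type(record[field]).__name__}")
    return errors


def write_parallel(records: list[Record], structured_sink: Sink, vector_sink: Sink) -> list[Record]:
    """Write valid records to both stores concurrently; return the rejected ones."""
    valid = [r for r in records if not validate(r)]
    rejected = [r for r in records if validate(r)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(structured_sink, valid), pool.submit(vector_sink, valid)]
        for future in futures:
            future.result()  # re-raise an exception from either sink
    return rejected
```

A record that fails validation reaches neither store. This prevents a malformed extraction from appearing in trend queries while missing from search, or the reverse. A failure inside one sink would still need retry handling, which the sketch omits.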
Key Technical Decisions:
- Google Document AI: OCR that copes well with the complex layouts of medical documents (~$19 total for all 870 documents)
- Claude 3.5 Sonnet: Strong at medical entity extraction and normalization
- BigQuery: SQL analytics engine for querying health trends, cost-effective at scale
- Qdrant: Self-hosted vector database for privacy-sensitive medical data
- Hybrid Storage: Structured data in BigQuery (fast queries), full context in Qdrant (semantic search)
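On the Qdrant side, semantic search means finding the stored text chunks whose embeddings are closest to the query's embedding. The sketch below does this with brute-force cosine similarity over toy 3-dimensional vectors. The chunk texts and vectors are made up. Qdrant itself does not scan every vector; it uses an HNSW index for approximate search:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every chunk, keep the k most similar."""
    scored = [(text, cosine(query, vec)) for text, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones come from an embedding model.
    chunks = {
        "TSH elevated, thyroid ultrasound ordered": [0.9, 0.1, 0.0],
        "Ferritin below reference range": [0.0, 1.0, 0.1],
        "Vitamin D 22 ng/mL": [0.1, 0.2, 0.9],
    }
    query = [1.0, 0.0, 0.0]  # pretend embedding of "find all mentions of thyroid"
    for text, score in top_k(query, chunks, k=2):
        print(f"{score:.2f}  {text}")
```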
Real Numbers
- 95% OCR accuracy, 90% extraction accuracy
- 15-30 seconds per document (OCR + structuring + storage)
- ~$0.05-0.15 per document processed
- < 2 seconds to answer questions
- 870 documents processed, 10+ years of history
- ~21,000 lines of code across 35 modules. Built in 12 days.
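Scaling the per-document figures to all 870 documents gives the following totals:

- $43.50 to $130.50 in processing cost.
- About 3.6 to 7.25 hours of processing if the documents were handled one at a time. The numbers above do not say whether processing ran in parallel.
- About $0.022 per document for OCR, a small share of the $0.05-0.15 per-document total.

The arithmetic:

```python
DOCS = 870
COST_PER_DOC = (0.05, 0.15)  # USD per document, from the figures above
SECONDS_PER_DOC = (15, 30)   # OCR + structuring + storage
OCR_TOTAL = 19.0             # USD, approximate Document AI cost for all documents

total_cost = tuple(round(DOCS * c, 2) for c in COST_PER_DOC)
# Upper bound: assumes documents are processed sequentially.
total_hours = tuple(DOCS * s / 3600 for s in SECONDS_PER_DOC)
ocr_per_doc = OCR_TOTAL / DOCS

print(f"Total processing cost: ${total_cost[0]:.2f}-${total_cost[1]:.2f}")
print(f"Sequential processing time: {total_hours[0]:.1f}-{total_hours[1]:.2f} h")
print(f"OCR cost per document: ${ocr_per_doc:.3f}")
```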