🔍 Design Type-Ahead Search — System Design Interview Guide
Medium · Search & Autocomplete
Design a type-ahead (autocomplete) search system like Google's search bar that suggests completions as users type, ranked by relevance and popularity, with < 100ms response time.
Open the interactive Type-Ahead Search design on PrepGrind → Drag load balancers, caches, databases, and queues onto a canvas, run a live traffic simulation to watch latency and bottlenecks under load, and follow the full interview walkthrough below — free, in your browser.
Functional requirements
- Return top-10 query completions for any typed prefix
- Suggestions ranked by search frequency (most searched first)
- Support for personalized suggestions (recent searches)
- Handle 100+ languages and special characters (Unicode)
- New trending queries appear in suggestions within 1 hour
Non-functional requirements & scale
- 10M queries per second at peak (Google scale)
- Latency < 100ms P99 (user notices delays > 100ms)
- 5B unique search queries in the suggestion corpus
- Suggestion data updated in near real-time for trending topics
- System should gracefully handle partial outages
Capacity estimation
10M QPS means data must be served from memory. Trie can hold all queries but is memory-intensive (5B entries). Partition by prefix first character (26 for English). Cache top completions per prefix node. Pre-compute top-K completions at each trie node offline.
Core entities
- SearchQuery — queryText, frequency, lastSearched, trending (bool)
- TrieNode — prefix, topKCompletions[], children{char→nodeId}
- UserSearch — userId, queryText, timestamp (for personalization)
API design
GET /api/v1/suggest?q=appl&limit=10— Returns top completions for prefix. Served from Redis/in-memory trie.POST /api/v1/searches— Log a completed search query. Used for frequency aggregation.
High-level design
Suggestion Service holds pre-computed trie in memory. Each node stores top-10 completions by frequency. Client keystrokes hit CDN-cached suggestions (single chars), then Suggestion Service for longer prefixes. Trie rebuilt from Kafka aggregation pipeline every hour.
Deep dives
🌳 Trie Data Structure
Trie (prefix tree): each node = one character. Path from root to node = prefix. Each node stores top-K completions (not all — pruned). Memory: 5B queries × avg 10 chars × 4 bytes = 200GB. Too large for one server. Partition by first 2 chars (676 partitions). Each partition fits in one server's RAM. Alternatively: store trie in Redis as sorted sets (prefix → ZSET of completions by score).
📊 Frequency Counting
Every search query published to Kafka. Flink job: time-windowed count (last 7 days sliding window). Output: query → frequency map. Update trie node top-K if query enters top-10 for a prefix. Trending: short window (1h) with higher weight for recency. Challenge: count distinct queries across distributed Kafka partitions — use count-min sketch for approximate counting.
⚡ 10M QPS Strategy
CDN caches responses for short prefixes (1-3 chars) — same completions for everyone. Cache hit rate ~70%. For longer prefixes: Suggestion Service with in-memory trie (nanosecond lookup). No DB calls in hot path. Rate limiting at gateway. Client-side debounce: wait 100ms of no typing before sending request. Client caches recent responses locally.
🌍 Personalization
User's recent searches stored in Redis (LPUSH, LTRIM to 50). On suggestion request: fetch user's recent queries matching prefix. Merge with global top-K: weight personal 60% + global 40%. Only for logged-in users. Privacy: recent searches never sent to Kafka (stays client-side or encrypted per-user shard).
Scaling considerations
- Partition trie by prefix hash — each Suggestion Server owns a key range
- Consistent hashing for request routing to correct trie partition
- CDN responses for top-1000 prefixes cover 70% of traffic
- Flink sliding window aggregation rebuilds trie segments every 15 minutes
- Graceful degradation: if trie service down, return empty suggestions (not error)
What interviewers expect by level
- Junior: Describe trie data structure and prefix lookup. Know why DB is too slow for < 100ms.
- Mid: Trie partitioning, frequency aggregation with Kafka, CDN for short prefixes, debounce on client.
- Senior: Count-min sketch, trie update pipeline, personalization merge, sharding strategy for 10M QPS.
- Staff: Multi-language support (CJK trie vs Latin trie), global rollout with regional trie builds, cost optimization.
Practice more system design case studies
- Design URL Shortener
- Design Social Media Feed
- Design Chat System
- Design Video Streaming
- Design Ride-Sharing Platform
- Design E-Commerce Platform
- Design UPI Payment Gateway
- Design Google Docs
- Design Tinder
- Design Google Drive / Dropbox
- Design Instagram
- Design Web Crawler
- Design Ticket Booking (BookMyShow)
- Design Pastebin
- Design Notification System
- Design Rate Limiter (Standalone)
- Design Simple Web App
- Design Food Delivery (Swiggy)
- Design Stock Trading System
- Design Live Streaming (Twitch)
- Design Distributed Key-Value Store
- Design Ad Click Aggregation
- Design Monitoring / Metrics (Datadog)
- Design Online Judge (LeetCode)
- Design FB Post Search
- Design Yelp
- Design Cache Layer
- Design Message Queue
- Design Full Production Stack
PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…