🔍 Design Type-Ahead Search — System Design Interview Guide

Medium · Search & Autocomplete

Design a type-ahead (autocomplete) search system like Google's search bar that suggests completions as users type, ranked by relevance and popularity, with < 100ms response time.

Open the interactive Type-Ahead Search design on PrepGrind → Drag load balancers, caches, databases, and queues onto a canvas, run a live traffic simulation to watch latency and bottlenecks under load, and follow the full interview walkthrough below — free, in your browser.

Functional requirements

Return top-10 query completions for any typed prefix
Suggestions ranked by search frequency (most searched first)
Support for personalized suggestions (recent searches)
Handle 100+ languages and special characters (Unicode)
New trending queries appear in suggestions within 1 hour

Non-functional requirements & scale

10M queries per second at peak (Google scale)
Latency < 100ms P99 (user notices delays > 100ms)
5B unique search queries in the suggestion corpus
Suggestion data updated in near real-time for trending topics
System should gracefully handle partial outages

Capacity estimation

10M QPS means data must be served from memory. Trie can hold all queries but is memory-intensive (5B entries). Partition by prefix first character (26 for English). Cache top completions per prefix node. Pre-compute top-K completions at each trie node offline.

Core entities

SearchQuery — queryText, frequency, lastSearched, trending (bool)
TrieNode — prefix, topKCompletions[], children{char→nodeId}
UserSearch — userId, queryText, timestamp (for personalization)

API design

GET /api/v1/suggest?q=appl&limit=10 — Returns top completions for prefix. Served from Redis/in-memory trie.
POST /api/v1/searches — Log a completed search query. Used for frequency aggregation.

High-level design

Suggestion Service holds pre-computed trie in memory. Each node stores top-10 completions by frequency. Client keystrokes hit CDN-cached suggestions (single chars), then Suggestion Service for longer prefixes. Trie rebuilt from Kafka aggregation pipeline every hour.

Deep dives

🌳 Trie Data Structure

Trie (prefix tree): each node = one character. Path from root to node = prefix. Each node stores top-K completions (not all — pruned). Memory: 5B queries × avg 10 chars × 4 bytes = 200GB. Too large for one server. Partition by first 2 chars (676 partitions). Each partition fits in one server's RAM. Alternatively: store trie in Redis as sorted sets (prefix → ZSET of completions by score).

📊 Frequency Counting

Every search query published to Kafka. Flink job: time-windowed count (last 7 days sliding window). Output: query → frequency map. Update trie node top-K if query enters top-10 for a prefix. Trending: short window (1h) with higher weight for recency. Challenge: count distinct queries across distributed Kafka partitions — use count-min sketch for approximate counting.

⚡ 10M QPS Strategy

CDN caches responses for short prefixes (1-3 chars) — same completions for everyone. Cache hit rate ~70%. For longer prefixes: Suggestion Service with in-memory trie (nanosecond lookup). No DB calls in hot path. Rate limiting at gateway. Client-side debounce: wait 100ms of no typing before sending request. Client caches recent responses locally.

🌍 Personalization

User's recent searches stored in Redis (LPUSH, LTRIM to 50). On suggestion request: fetch user's recent queries matching prefix. Merge with global top-K: weight personal 60% + global 40%. Only for logged-in users. Privacy: recent searches never sent to Kafka (stays client-side or encrypted per-user shard).

Scaling considerations

Partition trie by prefix hash — each Suggestion Server owns a key range
Consistent hashing for request routing to correct trie partition
CDN responses for top-1000 prefixes cover 70% of traffic
Flink sliding window aggregation rebuilds trie segments every 15 minutes
Graceful degradation: if trie service down, return empty suggestions (not error)

What interviewers expect by level

Junior: Describe trie data structure and prefix lookup. Know why DB is too slow for < 100ms.
Mid: Trie partitioning, frequency aggregation with Kafka, CDN for short prefixes, debounce on client.
Senior: Count-min sketch, trie update pipeline, personalization merge, sharding strategy for 10M QPS.
Staff: Multi-language support (CJK trie vs Latin trie), global rollout with regional trie builds, cost optimization.

Practice more system design case studies

PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…