🔍 Design Type-Ahead Search — System Design Interview Guide

Medium · Search & Autocomplete

Design a type-ahead (autocomplete) search system like Google's search bar that suggests completions as users type, ranked by relevance and popularity, with < 100ms response time.

Open the interactive Type-Ahead Search design on PrepGrind → Drag load balancers, caches, databases, and queues onto a canvas, run a live traffic simulation to watch latency and bottlenecks under load, and follow the full interview walkthrough below — free, in your browser.

Functional requirements

Non-functional requirements & scale

Capacity estimation

10M QPS means data must be served from memory. Trie can hold all queries but is memory-intensive (5B entries). Partition by prefix first character (26 for English). Cache top completions per prefix node. Pre-compute top-K completions at each trie node offline.

Core entities

API design

High-level design

Suggestion Service holds pre-computed trie in memory. Each node stores top-10 completions by frequency. Client keystrokes hit CDN-cached suggestions (single chars), then Suggestion Service for longer prefixes. Trie rebuilt from Kafka aggregation pipeline every hour.

Deep dives

🌳 Trie Data Structure

Trie (prefix tree): each node = one character. Path from root to node = prefix. Each node stores top-K completions (not all — pruned). Memory: 5B queries × avg 10 chars × 4 bytes = 200GB. Too large for one server. Partition by first 2 chars (676 partitions). Each partition fits in one server's RAM. Alternatively: store trie in Redis as sorted sets (prefix → ZSET of completions by score).

📊 Frequency Counting

Every search query published to Kafka. Flink job: time-windowed count (last 7 days sliding window). Output: query → frequency map. Update trie node top-K if query enters top-10 for a prefix. Trending: short window (1h) with higher weight for recency. Challenge: count distinct queries across distributed Kafka partitions — use count-min sketch for approximate counting.

⚡ 10M QPS Strategy

CDN caches responses for short prefixes (1-3 chars) — same completions for everyone. Cache hit rate ~70%. For longer prefixes: Suggestion Service with in-memory trie (nanosecond lookup). No DB calls in hot path. Rate limiting at gateway. Client-side debounce: wait 100ms of no typing before sending request. Client caches recent responses locally.

🌍 Personalization

User's recent searches stored in Redis (LPUSH, LTRIM to 50). On suggestion request: fetch user's recent queries matching prefix. Merge with global top-K: weight personal 60% + global 40%. Only for logged-in users. Privacy: recent searches never sent to Kafka (stays client-side or encrypted per-user shard).

Scaling considerations

What interviewers expect by level

Practice more system design case studies

PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…