🎯 Design Recommendation System — System Design Interview Guide
Hard · AI & ML Systems
Design a large-scale recommendation system (feed/products/video) that selects, from millions of items, the handful a user is most likely to engage with — in tens of milliseconds.
Functional requirements
- Personalized ranking of items per user request
- Two-stage: candidate generation → heavy ranking
- Incorporate real-time signals (recent clicks) and long-term profile
- Business rules: dedup, diversity, freshness, blocked items
- Exploration of new items (avoid feedback loops)
Non-functional requirements & scale
- 100M users, 10M+ items, 50K recommendation requests/sec
- End-to-end ranking latency p99 < 100ms
- Models and features refreshed continuously so recommendations do not go stale or reinforce feedback loops
- Online metrics (CTR, watch time) tied to offline training
- Consistent features between training and serving (no skew)
Capacity estimation
You cannot score 10M items per request in 100ms, so recommendation is a funnel: cheap candidate generation narrows millions of items to a few hundred, then an expensive ML ranker scores only those. Embeddings plus approximate nearest neighbor (ANN) search power retrieval; a feature store feeds the ranker. The hard problems are latency, train/serve feature consistency, and avoiding feedback loops.
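The funnel argument can be made concrete with back-of-envelope arithmetic. The sketch below uses the request rate and catalog size from the requirements above; the embedding dimension (128) and float32 storage are assumptions chosen for illustration, not figures from this guide.

```python
# Back-of-envelope numbers for the funnel.
REQUESTS_PER_SEC = 50_000       # from the stated scale
NUM_ITEMS = 10_000_000          # from the stated scale
CANDIDATES_PER_REQUEST = 500    # "~500" after candidate generation
EMBEDDING_DIM = 128             # ASSUMED, not from the guide
BYTES_PER_FLOAT = 4             # ASSUMED float32 storage


def scorings_per_sec(items_scored_per_request: int) -> int:
    """Total model evaluations per second across all requests."""
    return REQUESTS_PER_SEC * items_scored_per_request


brute_force = scorings_per_sec(NUM_ITEMS)              # score every item
funnel = scorings_per_sec(CANDIDATES_PER_REQUEST)      # score candidates only
index_bytes = NUM_ITEMS * EMBEDDING_DIM * BYTES_PER_FLOAT

print(f"brute force: {brute_force:,} scorings/sec")   # 500,000,000,000
print(f"funnel:      {funnel:,} scorings/sec")        # 25,000,000
print(f"reduction:   {brute_force // funnel:,}x")     # 20,000x
print(f"raw item embeddings: {index_bytes / 1e9:.2f} GB")  # 5.12 GB
```

Under these assumptions the raw item embeddings fit in the memory of a single machine, which is part of why precomputing an ANN index is practical.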
Core entities
- User — userId, profileEmbedding, recentEvents[], demographics
- Item — itemId, itemEmbedding, metadata, freshness, popularity
- Interaction — userId, itemId, type (click/like/watch), timestamp, context
- Feature — entityId, featureName, value, version (for the feature store)
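As a rough illustration, the entities above could be modeled as Python dataclasses. Field names are snake_case versions of the ones listed; the concrete types (for example, floats for freshness and popularity) are assumptions, since the guide does not specify them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Interaction:
    user_id: str
    item_id: str
    type: str                      # "click", "like", or "watch"
    timestamp: int                 # ASSUMED: Unix epoch seconds
    context: dict[str, Any] = field(default_factory=dict)


@dataclass
class User:
    user_id: str
    profile_embedding: list[float]
    recent_events: list[Interaction] = field(default_factory=list)
    demographics: dict[str, Any] = field(default_factory=dict)


@dataclass
class Item:
    item_id: str
    item_embedding: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)
    freshness: float = 0.0         # ASSUMED: numeric score
    popularity: float = 0.0        # ASSUMED: numeric score


@dataclass
class Feature:
    entity_id: str                 # a user or item id
    feature_name: str
    value: Any
    version: int                   # lets training and serving pin one definition


user = User(user_id="u1", profile_embedding=[0.1, 0.2])
user.recent_events.append(
    Interaction(user_id="u1", item_id="i1", type="click", timestamp=1_700_000_000)
)
```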
API design
- GET /api/v1/recommendations: Params { userId, context, count }. Returns a ranked item list.
- POST /api/v1/events: Log an interaction (click/like/watch) for training + real-time features.
- POST /api/v1/items: Register/update an item; triggers embedding + indexing.
High-level design
On request, the Rec Service fetches user features from the feature store, runs candidate generation (ANN over item embeddings + popular/recent sources), then scores the ~hundreds of candidates with a ranking model, applies business rules (diversity/dedup/freshness), and returns the top-N. Interaction events stream into a real-time feature pipeline and an offline training pipeline that periodically ships new embeddings and ranker models.
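The request path described above can be sketched as a single function with its dependencies injected. Everything here is hypothetical scaffolding: `recommend`, `fetch_features`, `candidate_sources`, and `score` are illustrative names, and the only business rules shown are dedup and blocked-item filtering.

```python
from typing import Callable, Sequence


def recommend(
    user_id: str,
    count: int,
    fetch_features: Callable[[str], dict],
    candidate_sources: Sequence[Callable[[dict], list[str]]],
    score: Callable[[dict, str], float],
    blocked: set[str],
) -> list[str]:
    # 1. Fetch user features (stands in for the online feature store).
    features = fetch_features(user_id)

    # 2. Candidate generation: union of all sources, minus duplicates
    #    and blocked items.
    candidates: list[str] = []
    seen: set[str] = set()
    for source in candidate_sources:
        for item_id in source(features):
            if item_id not in seen and item_id not in blocked:
                seen.add(item_id)
                candidates.append(item_id)

    # 3. Ranking: score each surviving candidate, highest first.
    ranked = sorted(candidates, key=lambda i: score(features, i), reverse=True)

    # 4. Return the top-N.
    return ranked[:count]


result = recommend(
    "u1",
    2,
    fetch_features=lambda uid: {"liked": {"a", "c"}},
    candidate_sources=[lambda f: ["a", "b"], lambda f: ["b", "c", "d"]],
    score=lambda f, item: 1.0 if item in f["liked"] else 0.0,
    blocked={"c"},
)
print(result)  # ['a', 'b']
```

In a real service each step would carry its own timeout so that one slow dependency cannot consume the whole latency budget.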
Deep dives
🪜 Two-Stage Funnel
Stage 1 (candidate generation / retrieval): cheaply reduce 10M items to ~500 by merging multiple sources: an ANN lookup of the user embedding against item embeddings (two-tower model), trending, recently viewed, and the follow graph. Optimize for recall, not precision, since the ranker can only reorder what retrieval surfaces. Stage 2 (ranking): a heavier model scores those ~500 with rich features (user, item, context, cross features) to predict click/watch probability. This split is what makes sub-100ms recommendations over millions of items possible.
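One way to merge several retrieval sources into a fixed candidate budget is a round-robin interleave, so that no single source crowds out the others. This is only one possible merging policy, and `merge_sources` is a hypothetical helper rather than anything prescribed by this guide.

```python
from itertools import zip_longest


def merge_sources(sources: dict[str, list[str]], budget: int) -> list[str]:
    """Interleave sources round-robin, dropping duplicates, up to `budget` items."""
    if budget <= 0:
        return []
    merged: list[str] = []
    seen: set[str] = set()
    # zip_longest yields the 1st item of every source, then the 2nd, and so on,
    # padding exhausted sources with None.
    for tier in zip_longest(*sources.values()):
        for item_id in tier:
            if item_id is None or item_id in seen:
                continue
            seen.add(item_id)
            merged.append(item_id)
            if len(merged) == budget:
                return merged
    return merged


candidates = merge_sources(
    {
        "ann": ["a", "b", "c"],
        "trending": ["b", "d"],
        "recently_viewed": ["e"],
    },
    budget=4,
)
print(candidates)  # ['a', 'b', 'e', 'd']
```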
🗄️ Feature Store & Train/Serve Skew
The #1 silent killer of rec quality is computing a feature one way in training (batch) and another at serving (online). A feature store provides the same definitions to both: an offline store for training datasets and a low-latency online store (Redis) for serving. Point-in-time-correct joins prevent label leakage; freshness of real-time features (last-5-min clicks) drives responsiveness.
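A point-in-time-correct join can be illustrated with a lookup that returns the feature value as it stood just before a training example's timestamp. This is a minimal in-memory sketch; `feature_as_of` is a hypothetical helper, and using only values strictly earlier than the event is an assumption made here to stay on the safe side of leakage.

```python
from bisect import bisect_left
from typing import Optional


def feature_as_of(history: list[tuple[int, float]], ts: int) -> Optional[float]:
    """Return the latest value written strictly before `ts`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Values written at or after `ts` are ignored so that a training row
    never sees information from its own event or from the future.
    """
    timestamps = [t for t, _ in history]
    idx = bisect_left(timestamps, ts)   # count of entries with timestamp < ts
    return history[idx - 1][1] if idx else None


# Hypothetical history of a "clicks in last 5 minutes" feature for one user.
clicks_5m = [(100, 0.0), (200, 3.0), (300, 7.0)]

print(feature_as_of(clicks_5m, 250))  # 3.0  (the write at t=300 is in the future)
print(feature_as_of(clicks_5m, 200))  # 0.0  (the write at t=200 is not yet visible)
print(feature_as_of(clicks_5m, 50))   # None (no value existed yet)
```

Joining against the latest value instead (7.0 for every row) is exactly the kind of leak that makes offline metrics look better than online results.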
🔀 Two-Tower Retrieval
Train a user tower and an item tower to embed both into one space so relevance ≈ dot product. Precompute all item embeddings into an ANN index offline; at request time embed the user once and do an ANN lookup. This scales retrieval to millions of items in milliseconds and is the workhorse of modern candidate generation.
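The serving side of two-tower retrieval reduces to a top-k search by dot product. The sketch below does an exact brute-force scan over a toy index as a stand-in for ANN; a production system would replace the scan with an approximate index, and the embeddings here are made-up values rather than outputs of trained towers.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    user_embedding: list[float],
    item_embeddings: dict[str, list[float]],
    k: int,
) -> list[str]:
    """Return the ids of the k items with the highest dot-product relevance."""
    return heapq.nlargest(
        k,
        item_embeddings,  # iterates over item ids
        key=lambda item_id: dot(user_embedding, item_embeddings[item_id]),
    )


# Toy precomputed item index (in practice built offline from the item tower).
item_index = {
    "a": [0.9, 0.1],
    "b": [0.1, 0.9],
    "c": [0.5, 0.5],
}
user_vec = [1.0, 0.0]  # in practice computed once per request by the user tower

print(retrieve(user_vec, item_index, k=2))  # ['a', 'c']
```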
🎲 Exploration vs Exploitation
Always serving the top predicted items creates feedback loops: the model never learns about items it never shows. Inject exploration (epsilon-greedy, Thompson sampling, or another bandit policy) and log propensity scores (the probability with which each shown item was chosen) so you can debias training. Add diversity/dedup rules and freshness boosts so the feed is not repetitive or stale.
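Epsilon-greedy with propensity logging can be sketched for a single slot as follows. This is a simplified illustration: `epsilon_greedy` is a hypothetical helper, and real feeds choose whole slates, which makes propensities harder to compute than in this one-item case.

```python
import random


def epsilon_greedy(
    scores: dict[str, float], epsilon: float, rng: random.Random
) -> tuple[str, float]:
    """Pick one item and return (item_id, propensity of that pick).

    With probability epsilon, pick uniformly at random (explore);
    otherwise pick the top-scored item (exploit).
    """
    best = max(scores, key=scores.get)
    if rng.random() < epsilon:
        chosen = rng.choice(list(scores))
    else:
        chosen = best
    # Every item can be reached through exploration (epsilon / n);
    # the best item is additionally reached through exploitation (1 - epsilon).
    propensity = epsilon / len(scores)
    if chosen == best:
        propensity += 1 - epsilon
    return chosen, propensity


predicted = {"a": 0.9, "b": 0.5, "c": 0.1}
item, p = epsilon_greedy(predicted, epsilon=0.1, rng=random.Random(42))
# Log (user, item, p) alongside the eventual reward for later debiasing.
print(item, round(p, 4))
```

Logging the propensity at decision time matters because it cannot be reconstructed reliably after the model or epsilon changes.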
📈 Online/Offline Evaluation
Offline metrics (AUC, NDCG, recall@K) guide iteration but do not always move business metrics. Validate with online A/B tests on CTR, watch time, and retention. Counterfactual/off-policy evaluation using logged propensities lets you estimate a new policy before shipping. Guard against metric gaming (clickbait) with long-term objectives.
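Off-policy evaluation with logged propensities is often introduced through the inverse propensity scoring (IPS) estimator: reweight each logged reward by how much more (or less) likely the new policy is to take the logged action. The sketch below is a bare-bones version with made-up log data; `ips_estimate` and the log tuple layout are assumptions for illustration, and practical systems usually add weight clipping or normalization to control variance.

```python
from typing import Callable, Iterable


def ips_estimate(
    logs: Iterable[tuple[str, str, float, float]],
    new_policy_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward of a new policy from logged data.

    Each log row is (context, action, reward, logging_propensity), where
    logging_propensity is the probability the logging policy chose `action`.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        weight = new_policy_prob(context, action) / propensity
        total += reward * weight
        n += 1
    return total / n if n else 0.0


# Made-up logs from a policy that picked "a" or "b" uniformly at random.
logged = [
    ("u1", "a", 1.0, 0.5),
    ("u2", "b", 0.0, 0.5),
    ("u3", "a", 0.0, 0.5),
    ("u4", "b", 1.0, 0.5),
]


def always_a(context: str, action: str) -> float:
    """Candidate policy that deterministically shows item 'a'."""
    return 1.0 if action == "a" else 0.0


print(ips_estimate(logged, always_a))  # 0.5
```

Rows where the new policy would never take the logged action get weight zero, which is why exploration in the logging policy is a precondition for this kind of estimate.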
Scaling considerations
- Precompute item embeddings + ANN index offline; refresh as items/models change
- Online feature store (Redis) with tight p99; degrade gracefully to cached/popular on miss
- Rank only a few hundred candidates per request to hold the latency budget
- Kafka events fan out to both real-time feature updates and offline training
- Shadow/A-B new models; roll out via traffic splitting with guardrail metrics
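Traffic splitting for model rollouts is commonly done by hashing the user id into a stable bucket, so a user sees the same variant on every request without any stored assignment. The sketch below shows one such scheme; `assign_variant` and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_pct: float) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Salting the hash with the experiment name keeps assignments
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # First 8 bytes as an integer, scaled to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_pct else "control"


# Ramp a hypothetical new ranker to roughly 10% of users.
users = [f"user-{i}" for i in range(10_000)]
share = sum(assign_variant(u, "ranker_v2", 0.10) == "treatment" for u in users) / len(users)
print(f"treatment share: {share:.3f}")  # close to 0.10
```

Raising `treatment_pct` only moves users from control to treatment, never the other way, so a ramp-up does not reshuffle users who already have the new model.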
What interviewers expect by level
- Junior: Explain candidate generation vs ranking and why we can't score every item.
- Mid: Design the two-stage funnel, embeddings + ANN retrieval, a feature store, event logging.
- Senior: Two-tower retrieval, train/serve skew prevention, real-time features, exploration, A/B evaluation.
- Staff: Off-policy evaluation, feedback-loop mitigation, multi-objective ranking, end-to-end ML platform + model governance.