📸 Design Instagram — System Design Interview Guide

Medium · Social & Media

Design Instagram — a photo and video sharing platform where users post content, follow others, see a personalized feed, and interact via likes, comments, and stories.


Functional requirements

Non-functional requirements & scale

Capacity estimation

100M uploads/day calls for an aggressive CDN and storage strategy. Photo processing: each upload is stored in multiple sizes (thumbnail, standard, HD). Feed generation: Instagram has used an ML-ranked feed rather than a chronological one since 2016. Stories are TTL-based ephemeral posts. Read scale: 2B users × avg 5 posts viewed/day = 10B feed post loads/day, roughly 100 reads for every upload.
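The daily totals above are easier to reason about as per-second averages. The following is a back-of-the-envelope sketch using only the figures from the text, plus one labeled assumption: the 2 MB average upload size is illustrative and not a number from this guide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the estimate above.
uploads_per_day = 100_000_000
users = 2_000_000_000
posts_viewed_per_user_per_day = 5

feed_loads_per_day = users * posts_viewed_per_user_per_day
avg_upload_qps = uploads_per_day / SECONDS_PER_DAY
avg_feed_read_qps = feed_loads_per_day / SECONDS_PER_DAY
read_write_ratio = feed_loads_per_day / uploads_per_day

# ASSUMPTION (not from the text): 2 MB average original upload size.
ASSUMED_AVG_UPLOAD_BYTES = 2 * 1000**2
raw_storage_per_day_tb = uploads_per_day * ASSUMED_AVG_UPLOAD_BYTES / 1000**4

print(f"feed loads/day:      {feed_loads_per_day:,}")
print(f"avg upload QPS:      {avg_upload_qps:,.0f}")
print(f"avg feed read QPS:   {avg_feed_read_qps:,.0f}")
print(f"read:write ratio:    {read_write_ratio:.0f}:1")
print(f"raw storage/day:     {raw_storage_per_day_tb:,.0f} TB (under the assumed size)")
```

These are averages; peak traffic will be higher, so an interviewer will usually expect a stated peak multiplier on top of them.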

Core entities

API design

High-level design

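Write path (media): photo upload → S3 → Lambda resize → CDN. Write path (feed): new post → Kafka → Fanout service writes the post ID into each follower's feed cache. Read path: fetch the feed from a Redis sorted set ordered by ML score, then hydrate post details from Cassandra and attach CDN photo URLs.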
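The fanout-on-write and feed-read flow above can be sketched in memory. This is a minimal illustration, not a production design: `FeedStore` is a hypothetical stand-in for per-user Redis sorted sets, the `post_table` dict stands in for Cassandra, and the CDN base URL and `photo_key` field are made up for the example.

```python
class FeedStore:
    """In-memory stand-in for per-user Redis sorted sets (post_id -> score)."""

    def __init__(self):
        self._feeds = {}

    def add(self, user_id, post_id, score):
        self._feeds.setdefault(user_id, {})[post_id] = score

    def top(self, user_id, n):
        feed = self._feeds.get(user_id, {})
        # Highest score first, like reading a sorted set in reverse order.
        return sorted(feed, key=feed.get, reverse=True)[:n]


def fan_out(post_id, author_id, score, followers, feed_store):
    """Write the new post ID into every follower's feed cache."""
    for follower_id in followers.get(author_id, []):
        feed_store.add(follower_id, post_id, score)


def read_feed(user_id, n, feed_store, post_table, cdn_base):
    """Read ranked post IDs, then hydrate details and build photo URLs."""
    feed = []
    for post_id in feed_store.top(user_id, n):
        post = post_table[post_id]
        feed.append({
            "post_id": post_id,
            "author_id": post["author_id"],
            "photo_url": f"{cdn_base}/{post['photo_key']}",
        })
    return feed


# Usage with hypothetical data.
followers = {"alice": ["bob", "carol"]}
post_table = {
    "p1": {"author_id": "alice", "photo_key": "photos/p1/medium.jpg"},
    "p2": {"author_id": "alice", "photo_key": "photos/p2/medium.jpg"},
}
store = FeedStore()
fan_out("p1", "alice", 0.2, followers, store)
fan_out("p2", "alice", 0.9, followers, store)
bob_feed = read_feed("bob", 10, store, post_table, "https://cdn.example.com")
```

Fanout-on-write makes reads cheap at the cost of one write per follower, which is why accounts with very large follower counts are commonly handled separately.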

Deep dives

📱 Photo Processing Pipeline

Upload: the client sends the photo to the Upload Service, which stores the original in S3 and returns a postId. Async: an S3 event triggers a Lambda that resizes the image into 4 variants (thumbnail 150px, small 320px, medium 640px, HD 1080px) and stores them back in S3 under fixed paths. On first access, the nearest CDN edge fetches the variant from S3 and caches it; later requests are served from that edge. Target: p99 < 100ms for cached images.
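The resize step can be sketched as a pure planning function that decides the output size and storage key for each of the four variants listed above. Several details here are assumptions rather than anything the guide specifies: the pixel values are treated as target widths, aspect ratio is preserved, small originals are never upscaled, and the `photos/{post_id}/{variant}.jpg` key layout is hypothetical.

```python
# Variant names and sizes from the text; treated here as target widths in pixels.
VARIANT_WIDTHS = {"thumbnail": 150, "small": 320, "medium": 640, "hd": 1080}


def variant_key(post_id, variant):
    """Fixed, predictable storage path (hypothetical layout)."""
    return f"photos/{post_id}/{variant}.jpg"


def plan_resizes(post_id, orig_width, orig_height):
    """Return one resize plan per variant, preserving aspect ratio."""
    plans = []
    for variant, target_width in VARIANT_WIDTHS.items():
        width = min(target_width, orig_width)  # assumption: never upscale
        height = max(1, orig_height * width // orig_width)
        plans.append({
            "variant": variant,
            "key": variant_key(post_id, variant),
            "width": width,
            "height": height,
        })
    return plans


for plan in plan_resizes("p42", 4000, 3000):
    print(plan)
```

Because the keys are derived only from the postId and variant name, clients and the CDN can build image URLs without a lookup.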

⏰ Stories with TTL

Stories expire in 24h. Three approaches: (1) A scheduled job scans the DB for expired stories; this doesn't scale. (2) Redis TTL: store the storyId in Redis with TTL = 86400s. On expiry, a Redis key-expiry event (pub/sub) triggers a cleanup job; these events are fire-and-forget, so a missed event leaves the story uncleaned. (3) Store expiresAt in Cassandra and filter at read time (simple, no cleanup needed for correctness). Production: use approach 3 plus an async cleanup batch to reclaim storage.
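Approach 3 above, combined with the async cleanup batch, can be sketched as follows. This is a minimal in-memory illustration: the `Story` fields and function names are hypothetical, and a plain list stands in for the Cassandra table.

```python
from dataclasses import dataclass

STORY_TTL_SECONDS = 86_400  # 24h, as in the text


@dataclass(frozen=True)
class Story:
    story_id: str
    author_id: str
    created_at: float
    expires_at: float


def new_story(story_id, author_id, now):
    """Stamp expires_at at write time so reads need no extra lookup."""
    return Story(story_id, author_id, now, now + STORY_TTL_SECONDS)


def active_stories(stories, now):
    """Read path: expired stories are simply filtered out."""
    return [s for s in stories if s.expires_at > now]


def cleanup_batch(stories, now):
    """Async batch: split into (kept stories, purged story IDs)."""
    kept = [s for s in stories if s.expires_at > now]
    purged_ids = [s.story_id for s in stories if s.expires_at <= now]
    return kept, purged_ids


# Usage: one story posted at t=0, one at t=50,000 seconds.
table = [new_story("s1", "alice", 0), new_story("s2", "bob", 50_000)]
visible = active_stories(table, now=90_000)
table, purged = cleanup_batch(table, now=90_000)
```

Correctness depends only on the read-time filter; the batch exists to reclaim storage and can run late without users ever seeing an expired story.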

🔍 Hashtag Search

Index posts by hashtag in Elasticsearch. When a post is tagged, an async worker indexes hashtag→postId. A search for #travel returns the top posts from Elasticsearch, ranked by recency and engagement. Trending hashtags: count hashtag appearances over the last 1h using Kafka + Flink windowed aggregation, then store the top-K by count in Redis. Explore page = trending hashtags + recommendations based on the user's engagement history.
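The trending computation above is a windowed count followed by a top-K selection. The sketch below shows that logic in a single process; it is a stand-in for the Kafka + Flink job, not Flink code. The `TrendingHashtags` class is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingHashtags:
    """Sliding-window hashtag counts with top-K lookup."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, tag), oldest first
        self._counts = Counter()  # tag -> occurrences inside the window

    def record(self, tag, ts):
        # Assumption: timestamps are non-decreasing across calls.
        self._events.append((ts, tag))
        self._counts[tag] += 1

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_tag = self._events.popleft()
            self._counts[old_tag] -= 1
            if self._counts[old_tag] == 0:
                del self._counts[old_tag]

    def top_k(self, k, now):
        """Drop events older than the window, then return the k largest counts."""
        self._evict(now)
        return self._counts.most_common(k)


trending = TrendingHashtags(window_seconds=3600)
for ts in (0, 10, 20):
    trending.record("travel", ts)
for ts in (30, 40):
    trending.record("food", ts)
trending.record("cats", 3500)

early = trending.top_k(2, now=3590)  # every event is still inside the window
late = trending.top_k(1, now=3615)   # the travel events at t=0 and t=10 have aged out
```

In the distributed version the counts are partitioned by hashtag and merged, and only the resulting top-K list is written to Redis for the Explore page to read.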

🏆 Feed Ranking (ML)

Instagram moved from a chronological to an ML-ranked feed in 2016. Signal features: content type (video vs photo), author relationship strength (DM frequency, tag history), predicted likes/comments, time since post, and hashtag overlap with the user's interests. Model: neural network. Inference runs at feed request time on the top-K candidates from the feed cache. The model is updated daily with fresh engagement data.
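The request-time step above (score the cached candidates, return the best few) can be sketched with a deliberately simple scorer. A hand-weighted linear formula with time decay is swapped in for the neural network here purely for illustration; the weights, the 12-hour half-life, and the `Candidate` fields are all assumptions, not Instagram's actual model.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: str
    is_video: bool
    relationship_strength: float  # 0..1, e.g. derived from DM frequency, tags
    predicted_engagement: float   # 0..1, predicted likes/comments
    hashtag_overlap: float        # 0..1, overlap with the viewer's interests
    hours_since_post: float


# ASSUMPTION: illustrative weights standing in for a trained model.
WEIGHTS = {"relationship": 0.4, "engagement": 0.4, "hashtag": 0.15, "video": 0.05}
HALF_LIFE_HOURS = 12.0  # ASSUMPTION: score halves every 12 hours


def score(c):
    base = (
        WEIGHTS["relationship"] * c.relationship_strength
        + WEIGHTS["engagement"] * c.predicted_engagement
        + WEIGHTS["hashtag"] * c.hashtag_overlap
        + WEIGHTS["video"] * float(c.is_video)
    )
    decay = 0.5 ** (c.hours_since_post / HALF_LIFE_HOURS)
    return base * decay


def rank(candidates, n):
    """Return the post IDs of the n highest-scoring candidates."""
    return [c.post_id for c in heapq.nlargest(n, candidates, key=score)]


candidates = [
    Candidate("a", False, 0.9, 0.8, 0.5, hours_since_post=1),
    Candidate("b", False, 0.9, 0.8, 0.5, hours_since_post=48),
    Candidate("c", True, 0.1, 0.2, 0.0, hours_since_post=1),
]
ranked = rank(candidates, 2)
```

Scoring only the top-K candidates from the feed cache, rather than every eligible post, is what keeps inference affordable at request time.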

Scaling considerations

What interviewers expect by level

