🎮 Design Live Streaming (Twitch) — System Design Interview Guide

Hard · Media & Real-Time

Design a live streaming platform like Twitch where content creators broadcast in real-time to potentially millions of concurrent viewers with low latency, while supporting live chat and reactions.


Functional requirements

Non-functional requirements & scale

Capacity estimation

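Live streaming differs from VOD: content is generated in real time. The streamer pushes RTMP to an ingest server → transcode → distribute. Latency depends on segment length: standard HLS with traditional 6-10s segments runs 15-30s behind live, 2s segments bring that down to roughly 6-10s, and LL-HLS gets below 3s. Chat: WebSocket, thousands of messages per second on a popular stream.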

Core entities

API design

High-level design

The streamer pushes RTMP → Ingest Server → Transcoder (multiple resolutions) → S3 + CDN (HLS segments). Viewers request the playlist, and the CDN serves the segments it lists. Chat runs on a WebSocket cluster backed by Redis Pub/Sub.
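To make the "multiple resolutions" step concrete, here is a minimal sketch of the HLS master playlist a player would fetch first in order to pick a rendition. The rendition names, the bitrates, and the `<name>/playlist.m3u8` path layout are assumptions for illustration, not values from this design; `#EXT-X-STREAM-INF` with `BANDWIDTH` and `RESOLUTION` is the standard HLS tag for listing variants.

```python
def master_playlist(renditions):
    """Build an HLS master playlist.

    `renditions` is a list of (name, width, height, bandwidth_bps) tuples.
    Each entry becomes one variant stream the player can switch to.
    """
    lines = ["#EXTM3U"]
    for name, width, height, bandwidth in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        # Hypothetical path layout: one media playlist per rendition.
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


# Illustrative bitrates only; real values depend on the encoder settings.
RENDITIONS = [
    ("360p", 640, 360, 800_000),
    ("720p", 1280, 720, 3_000_000),
    ("1080p", 1920, 1080, 6_000_000),
]

if __name__ == "__main__":
    print(master_playlist(RENDITIONS))
```

The player measures its own throughput and moves between the listed variants, which is why the transcoder has to produce all of them for every stream.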

Deep dives

📡 RTMP Ingest & Transcoding

The streamer uses OBS or similar streaming software to push RTMP (Real-Time Messaging Protocol) to an ingest server. The ingest server decodes the stream, and FFmpeg transcodes it into HLS segments at 360p, 720p, and 1080p simultaneously. Each 2s HLS segment is written to S3 as soon as it is complete, and the manifest (.m3u8) is updated to list it. The CDN origin-pulls each new segment from S3 the first time a viewer requests it, then serves later requests from cache.
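The manifest update described above can be sketched as a sliding-window media playlist: each new segment is appended, the oldest one drops off, and the media sequence number advances. The class name, the window size, and the segment file names are assumptions for illustration; the tags themselves are standard HLS. This is a rough model, not how any particular packager implements it.

```python
import math
from collections import deque


class LivePlaylist:
    """Sliding-window HLS media playlist for one rendition (illustrative)."""

    def __init__(self, target_duration=2.0, window=5):
        self.target_duration = target_duration
        self.window = window          # how many segments stay listed (assumed)
        self.segments = deque()       # (uri, duration) pairs currently listed
        self.media_sequence = 0       # sequence number of the first listed segment

    def add_segment(self, uri, duration):
        """Append a finished segment and evict the oldest if the window is full."""
        self.segments.append((uri, duration))
        if len(self.segments) > self.window:
            self.segments.popleft()
            self.media_sequence += 1

    def render(self):
        """Return the playlist text. No #EXT-X-ENDLIST: the stream is still live."""
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{math.ceil(self.target_duration)}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.media_sequence}",
        ]
        for uri, duration in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    playlist = LivePlaylist(target_duration=2.0, window=3)
    for i in range(5):
        playlist.add_segment(f"seg{i}.ts", 2.0)
    print(playlist.render())
```

Because the playlist changes every segment while segments themselves never change, the playlist needs a very short cache lifetime on the CDN and the segments can be cached for a long time.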

⏱️ Latency Reduction

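Standard HLS: 2-second segments × 3-segment player buffer = 6s of buffering, or 6-10s end to end once encoding and CDN delivery are added. Low-Latency HLS (LL-HLS): each segment is split into partial segments of about 200ms that are published as soon as they are encoded. Chunked transfer encoding is a related technique that sends a segment's bytes before the segment is complete. The CDN must support LL-HLS. Twitch uses a custom low-latency protocol: 1-2s glass-to-glass latency for gaming streams where reaction time matters.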
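The latency figures above follow from simple arithmetic, sketched below. The function name and the `pipeline_overhead_s` parameter (standing in for encode, upload, and CDN time) are assumptions for illustration, and the three-part buffer for LL-HLS is a guess at a typical player setting rather than a number from the spec.

```python
def buffer_latency_s(part_duration_s, buffered_parts, pipeline_overhead_s=0.0):
    """Approximate how far behind live a player sits.

    part_duration_s: length of the unit the player buffers (segment or partial segment)
    buffered_parts: how many of those units the player holds before starting playback
    pipeline_overhead_s: assumed extra delay for encoding, upload, and CDN delivery
    """
    return part_duration_s * buffered_parts + pipeline_overhead_s


if __name__ == "__main__":
    print("2s segments:   ", buffer_latency_s(2.0, 3))
    print("6s segments:   ", buffer_latency_s(6.0, 3))
    print("200ms partials:", buffer_latency_s(0.2, 3))
```

With 2s segments and a three-segment buffer the player is 6s behind before any overhead, and with 6s segments it is 18s behind, which is why shrinking the buffered unit is the main lever for low latency.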

💬 Chat at Scale

Consider 1M concurrent viewers in one stream. Naive: 1M WebSocket connections on one server, which no single machine can hold. Solution: key chat by streamId and spread a stream's connections across multiple chat servers, with one Redis Pub/Sub channel per streamId. A message is published once to the channel → delivered to every chat server subscribed to that stream → pushed to each viewer connected there. Rate limit: max 20 messages per 30s per user to prevent spam.
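The fan-out path and the rate limit can be sketched together. Everything here is an in-memory model under stated assumptions: `PubSub` is a stand-in for a Redis Pub/Sub channel per streamId (it makes no Redis calls), `ChatServer` holds plain lists instead of WebSocket connections, and the limiter uses a sliding-window log, which is one of several ways to enforce 20 messages per 30s.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` messages per `window_s` seconds per user."""

    def __init__(self, limit=20, window_s=30.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.sent = defaultdict(deque)  # user_id -> timestamps of accepted messages

    def allow(self, user_id):
        now = self.clock()
        stamps = self.sent[user_id]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


class ChatServer:
    """One chat node holding a slice of a stream's viewer connections."""

    def __init__(self):
        self.connections = defaultdict(dict)  # stream_id -> {viewer_id: inbox}

    def connect(self, stream_id, viewer_id):
        inbox = []  # stand-in for a WebSocket connection
        self.connections[stream_id][viewer_id] = inbox
        return inbox

    def deliver(self, stream_id, message):
        for inbox in self.connections[stream_id].values():
            inbox.append(message)


class PubSub:
    """In-memory stand-in for one Redis Pub/Sub channel per streamId."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream_id -> subscribed chat servers

    def subscribe(self, stream_id, server):
        self.subscribers[stream_id].append(server)

    def publish(self, stream_id, message):
        """Deliver to every subscribed server; return how many received it."""
        for server in self.subscribers[stream_id]:
            server.deliver(stream_id, message)
        return len(self.subscribers[stream_id])
```

A sender's chat server would call `allow(user_id)` before `publish(stream_id, message)`, so a rejected message never reaches the channel. In a real deployment the limiter state would need to be shared or pinned per user, since the same user can reconnect to a different chat server.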

📊 Viewer Count

A real-time viewer count is displayed on the stream. Counting 1M concurrent connections accurately is hard. Approach: each chat/viewer server keeps a local connection count per streamId and reports the change since its last report to Redis every 5s (INCRBY), so repeated reports do not double count. For unique viewers, a Redis HyperLogLog per stream gives an approximate distinct count of viewer IDs. A central aggregator reads both, caches the viewer count, and pushes it to clients via SSE every 10s.
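The periodic reporting step can be sketched as follows. `CounterStore` is an in-memory stand-in that mimics Redis INCRBY semantics and makes no Redis calls; the `viewers:<streamId>` key format and the class names are assumptions for illustration. The point of the sketch is that each server sends only the delta since its last flush, which keeps a shared counter correct when many servers report on their own schedules.

```python
from collections import defaultdict


class CounterStore:
    """In-memory stand-in for Redis counters (INCRBY semantics)."""

    def __init__(self):
        self.values = defaultdict(int)

    def incrby(self, key, amount):
        self.values[key] += amount
        return self.values[key]


class ViewerServer:
    """Tracks local connections per stream and reports deltas to the store."""

    def __init__(self, store):
        self.store = store
        self.live = defaultdict(int)      # stream_id -> current local connections
        self.reported = defaultdict(int)  # stream_id -> count as of the last flush

    def on_connect(self, stream_id):
        self.live[stream_id] += 1

    def on_disconnect(self, stream_id):
        self.live[stream_id] -= 1

    def flush(self):
        """Called on a timer (every 5s in the text): push only the change."""
        for stream_id, count in self.live.items():
            delta = count - self.reported[stream_id]
            if delta:
                self.store.incrby(f"viewers:{stream_id}", delta)
                self.reported[stream_id] = count


if __name__ == "__main__":
    store = CounterStore()
    a, b = ViewerServer(store), ViewerServer(store)
    for _ in range(3):
        a.on_connect("abc")
    for _ in range(2):
        b.on_connect("abc")
    a.flush()
    b.flush()
    print(store.values["viewers:abc"])
```

One gap this sketch leaves open: a server that crashes never subtracts its connections, so a production version would need per-server keys with expiry or a periodic reconciliation pass.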

Scaling considerations

What interviewers expect by level
