🎮 Design Live Streaming (Twitch) — System Design Interview Guide

Hard · Media & Real-Time

Design a live streaming platform like Twitch where content creators broadcast in real-time to potentially millions of concurrent viewers with low latency, while supporting live chat and reactions.

Functional requirements

Streamers broadcast video/audio in real-time
Viewers watch live streams with adaptive bitrate
Live chat alongside the stream
Viewer count displayed in real-time
Stream recording and VOD (video on demand) after stream ends
Stream discovery: browse by game, category, viewer count

Non-functional requirements & scale

10M concurrent live streams; 100M concurrent viewers
Stream ingestion latency (streamer to CDN) < 5 seconds
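Viewer latency < 15 seconds (achievable with standard HLS and 2s segments)
Auto-scale delivery capacity (CDN, playlist, and chat tiers) for viral streams (0 to 1M viewers in 1 min); ingest and transcode capacity scale with the number of streams, not viewers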
Chat messages delivered < 500ms to all participants
Stream must not drop even if one ingest server fails

Capacity estimation

Live streaming differs from VOD because content is generated in real time, so nothing can be pre-encoded or pre-cached. The streamer pushes RTMP to an ingest server → transcode → distribute. Standard HLS with 2s segments gives roughly 6-10s of latency (15-30s with the older 6-10s segments). LL-HLS: < 3s latency. Chat: WebSocket, thousands of messages/sec on a popular stream.
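The scale targets above can be turned into rough bandwidth numbers. The sketch below is back-of-envelope arithmetic only: the 6 Mbps ingest bitrate and the 3 Mbps average viewer bitrate are assumptions for illustration, not figures from this guide.

```python
# Back-of-envelope capacity estimate. Bitrates are assumed values, not measurements.
CONCURRENT_STREAMS = 10_000_000    # from the non-functional requirements
CONCURRENT_VIEWERS = 100_000_000   # from the non-functional requirements
INGEST_MBPS = 6.0                  # assumption: average RTMP upload bitrate per stream
VIEWER_MBPS = 3.0                  # assumption: average bitrate delivered per viewer
SEGMENT_SECONDS = 2                # segment length used in this guide
RENDITIONS = 3                     # 360p / 720p / 1080p


def tbps(megabits_per_second: float) -> float:
    """Convert Mbps to Tbps."""
    return megabits_per_second / 1_000_000


ingest_tbps = tbps(CONCURRENT_STREAMS * INGEST_MBPS)
egress_tbps = tbps(CONCURRENT_VIEWERS * VIEWER_MBPS)
segments_per_second = CONCURRENT_STREAMS * RENDITIONS / SEGMENT_SECONDS

print(f"Ingest bandwidth: {ingest_tbps:.0f} Tbps")           # 60 Tbps
print(f"Viewer egress:    {egress_tbps:.0f} Tbps")           # 300 Tbps
print(f"Segment writes/s: {segments_per_second:,.0f}")       # 15,000,000
```

Under these assumptions viewer egress is five times the ingest bandwidth, which is why delivery is pushed to a CDN rather than served from origin.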

Core entities

Stream — streamId, streamerId, title, category, viewerCount, status, startedAt, streamKey
StreamSegment — segmentId, streamId, resolution, s3Key, duration, sequenceNum, createdAt
ChatMessage — msgId, streamId, userId, content, emotes[], timestamp
VOD — vodId, streamId, title, duration, s3Key, viewCount, createdAt
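As a rough illustration, two of the entities above could be modelled as Python dataclasses. The status values and the S3 key layout in `segment_key` are assumptions made for this sketch; the guide only names the fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class StreamStatus(Enum):
    # Assumed values; the guide only says a Stream has a "status".
    LIVE = "live"
    ENDED = "ended"


@dataclass
class Stream:
    stream_id: str
    streamer_id: str
    title: str
    category: str
    stream_key: str                       # secret used for RTMP auth, never shown to viewers
    status: StreamStatus = StreamStatus.LIVE
    viewer_count: int = 0
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class StreamSegment:
    segment_id: str
    stream_id: str
    resolution: str                       # e.g. "720p"
    s3_key: str
    duration: float                       # seconds
    sequence_num: int
    created_at: datetime


def segment_key(stream_id: str, resolution: str, sequence_num: int) -> str:
    """Hypothetical S3 key layout: one prefix per stream and rendition."""
    return f"streams/{stream_id}/{resolution}/seg_{sequence_num:08d}.ts"
```

Zero-padding the sequence number keeps keys in playback order when listed, which helps when segments are later stitched into a VOD.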

API design

POST /api/v1/streams — Start stream. Returns { streamKey, rtmpUrl }.
GET /api/v1/streams/:streamId/playlist.m3u8 — HLS manifest with live segment URLs.
WS wss://chat.twitch.tv/:streamId — Join stream chat.
GET /api/v1/streams?category=Gaming&sort=viewers — Browse live streams.
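A minimal sketch of what the start-stream endpoint might do behind the HTTP layer. The ingest hostname, the key format, the in-memory `STREAMS` store, and the extra `streamId` field in the response are all assumptions for illustration.

```python
import secrets
import uuid

INGEST_HOST = "ingest.example.com"   # hypothetical ingest hostname
STREAMS = {}                         # in-memory stand-in for the streams table


def start_stream(streamer_id: str, title: str, category: str) -> dict:
    """Create a stream record and return the credentials the encoder needs."""
    stream_id = uuid.uuid4().hex
    # Unguessable key: anyone who holds it can broadcast as this streamer.
    stream_key = f"live_{streamer_id}_{secrets.token_urlsafe(24)}"
    STREAMS[stream_id] = {
        "streamerId": streamer_id,
        "title": title,
        "category": category,
        "streamKey": stream_key,
        "status": "live",
    }
    return {
        "streamId": stream_id,       # assumed extra field so clients can build the playlist URL
        "streamKey": stream_key,
        "rtmpUrl": f"rtmp://{INGEST_HOST}/live",
    }


response = start_stream("u123", "Speedrun practice", "Gaming")
print(response["rtmpUrl"])           # rtmp://ingest.example.com/live
```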

High-level design

Streamer pushes RTMP → Ingest Server → Transcoder (multiple resolutions) → S3 + CDN (HLS segments). Viewers request playlist → CDN serves segments. Chat via WebSocket cluster with Redis Pub/Sub.

Deep dives

📡 RTMP Ingest & Transcoding

Streamer uses OBS or similar streaming software to push RTMP (Real-Time Messaging Protocol) to an ingest server. Ingest decodes the stream → FFmpeg transcodes it to HLS segments at 360p/720p/1080p simultaneously. Each 2s HLS segment is written to S3 as soon as it is complete, and the manifest (.m3u8) is updated to list it. The CDN pulls each new segment from S3 on the first viewer request (a cache miss) and serves every later request from the edge.
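The manifest update step can be sketched as a sliding-window media playlist: each new segment is appended, the oldest falls off, and the media sequence number advances. This is a simplified illustration for one rendition; the window size and segment URIs are assumptions.

```python
import math
from collections import deque


class LivePlaylist:
    """Sliding-window HLS media playlist for one rendition of one stream."""

    def __init__(self, window: int = 5):
        # Each entry is (sequence_num, duration_seconds, uri); old entries drop off.
        self.segments = deque(maxlen=window)

    def add_segment(self, sequence_num: int, duration: float, uri: str) -> None:
        self.segments.append((sequence_num, duration, uri))

    def render(self) -> str:
        if not self.segments:
            raise ValueError("no segments yet")
        # Target duration must be an integer no smaller than any segment duration.
        target = max(math.ceil(d) for _, d, _ in self.segments)
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{target}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.segments[0][0]}",
        ]
        for _, duration, uri in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)
        # No #EXT-X-ENDLIST tag: its absence tells players the stream is still live.
        return "\n".join(lines) + "\n"


playlist = LivePlaylist(window=3)
for seq in range(100, 105):
    playlist.add_segment(seq, 2.0, f"720p/seg_{seq}.ts")
print(playlist.render())
```

After five segments with a window of three, the rendered playlist starts at media sequence 102 and lists segments 102 to 104 only.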

⏱️ Latency Reduction

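Standard HLS: 2-second segments × 3-segment player buffer = 6s of buffering, or about 6-10s end to end once encoding, upload, and CDN propagation are included. Low-Latency HLS (LL-HLS): each segment is split into partial segments of roughly 200ms that are published as soon as they are encoded, so the player does not wait for a full segment. Chunked transfer encoding serves the same goal by sending a segment's bytes while it is still being written. The CDN must support LL-HLS. Twitch uses a custom low-latency protocol: 1-2s glass-to-glass latency for gaming streams where reaction time matters.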
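The latency arithmetic above can be written as a one-line model: player buffer plus pipeline overhead. The overhead values (0 to 4 seconds for encode, upload, and CDN propagation) are assumptions chosen to reproduce the 6-10s range quoted above.

```python
def hls_latency_seconds(segment_s: float, buffered_segments: int, overhead_s: float) -> float:
    """Rough glass-to-glass latency: player buffer plus encode/upload/CDN overhead."""
    return segment_s * buffered_segments + overhead_s


print(hls_latency_seconds(2.0, 3, 0.0))   # 6.0  (2s segments, buffer only)
print(hls_latency_seconds(2.0, 3, 4.0))   # 10.0 (2s segments, 4s assumed overhead)
print(hls_latency_seconds(6.0, 3, 0.0))   # 18.0 (6s segments, buffer only)
```

The model makes the lever obvious: segment length is multiplied by the buffer depth, so shrinking the unit the player waits for is what cuts latency.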

💬 Chat at Scale

1M concurrent viewers in one stream. Naive: 1M WebSocket connections to one server, which no single machine can hold. Solution: spread each stream's connections across many chat servers and use one Redis Pub/Sub channel per streamId. Each chat server subscribes to the channels of the streams its viewers are watching. A message is published once → Redis delivers it to every subscribed chat server → each server pushes it to its local viewers. Rate limit: max 20 messages per 30s per user to prevent spam.
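The fan-out and rate limit can be sketched in-process. Here `Broker` is an in-memory stand-in for Redis Pub/Sub, viewer connections are modelled as plain lists, and the limiter is shared between servers for simplicity; all class names are hypothetical, and a real deployment would keep limiter state in a shared store.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Sliding window: at most `limit` messages per `window_s` seconds per user."""

    def __init__(self, limit: int = 20, window_s: float = 30.0):
        self.limit = limit
        self.window_s = window_s
        self.sent = defaultdict(deque)        # user_id -> timestamps of recent messages

    def allow(self, user_id: str, now: float) -> bool:
        recent = self.sent[user_id]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()                  # drop timestamps that left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


class Broker:
    """In-memory stand-in for Redis Pub/Sub: one channel per streamId."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream_id -> subscribed chat servers

    def subscribe(self, stream_id, server):
        if server not in self.subscribers[stream_id]:
            self.subscribers[stream_id].append(server)

    def publish(self, stream_id, message):
        for server in self.subscribers[stream_id]:
            server.deliver(stream_id, message)


class ChatServer:
    """Holds a slice of a stream's viewer connections (modelled as inbox lists)."""

    def __init__(self, broker, limiter):
        self.broker = broker
        self.limiter = limiter
        self.viewers = defaultdict(dict)      # stream_id -> {user_id: inbox}

    def join(self, stream_id, user_id):
        self.viewers[stream_id][user_id] = []
        self.broker.subscribe(stream_id, self)

    def send(self, stream_id, user_id, text, now):
        if not self.limiter.allow(user_id, now):
            return False                      # rejected by the rate limit
        self.broker.publish(stream_id, {"userId": user_id, "content": text})
        return True

    def deliver(self, stream_id, message):
        for inbox in self.viewers[stream_id].values():
            inbox.append(message)


broker, limiter = Broker(), RateLimiter()
server_a, server_b = ChatServer(broker, limiter), ChatServer(broker, limiter)
server_a.join("s1", "alice")
server_b.join("s1", "bob")
server_a.send("s1", "alice", "hello", now=0.0)
print(len(server_b.viewers["s1"]["bob"]))     # 1: bob is on another server but still gets it
```

Each message crosses the broker once per chat server rather than once per viewer, so broker load grows with the number of servers, not with audience size.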

📊 Viewer Count

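Real-time viewer count displayed on stream. Counting 1M concurrent connections accurately is hard. Approach: each chat/viewer server maintains a local connection count per streamId → every 5s it reports the change since its last report to Redis (INCRBY with the delta, so repeated reports are not double counted). A central aggregator reads the totals from Redis. Unique viewers are tracked separately: servers PFADD viewer IDs into a HyperLogLog per stream and the aggregator reads PFCOUNT. The viewer count is cached and pushed via SSE every 10s.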
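One way to make the periodic reports safe against double counting is for each server to overwrite its own slot instead of incrementing a shared counter. The sketch below uses that variant (a swapped-in alternative to INCRBY, not the method described above), with an in-memory dict standing in for a Redis hash per stream; the staleness threshold is an assumption.

```python
class ViewerCountStore:
    """In-memory stand-in for a per-stream hash: serverId -> (count, reported_at)."""

    def __init__(self, stale_after_s: float = 15.0):
        self.stale_after_s = stale_after_s    # assumed: three missed 5s reports
        self.reports = {}                     # stream_id -> {server_id: (count, ts)}

    def report(self, stream_id, server_id, count, now):
        """Each server overwrites its own slot, so repeated reports never double count."""
        self.reports.setdefault(stream_id, {})[server_id] = (count, now)

    def total(self, stream_id, now):
        """Sum the latest counts, skipping servers that stopped reporting."""
        slots = self.reports.get(stream_id, {})
        return sum(c for c, ts in slots.values() if now - ts <= self.stale_after_s)


store = ViewerCountStore()
store.report("s1", "chat-1", 400, now=0.0)
store.report("s1", "chat-2", 250, now=0.0)
store.report("s1", "chat-1", 420, now=5.0)    # overwrite, not add
print(store.total("s1", now=5.0))             # 670
print(store.total("s1", now=16.0))            # 420: chat-2 has gone stale
```

Ignoring stale slots also covers crashed servers, whose connections would otherwise stay in the total forever.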

Scaling considerations

Multiple ingest servers per region: streams hash to a specific ingest server (consistent hashing), and if that server fails the encoder reconnects to the next server on the ring
Transcoder auto-scales based on active streams (Kubernetes + Spot fleet)
CDN serves 100M viewers: the origin sends each segment to the CDN roughly once per edge location, not once per viewer
Chat Redis cluster sharded by streamId hash
VOD: after stream ends, concatenate HLS segments in S3 → create VOD playlist
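The consistent-hashing assignment of streams to ingest servers from the list above can be sketched as a hash ring with virtual nodes. Server names and the virtual-node count are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping stream IDs to ingest servers."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes                  # virtual nodes per server smooth the load
        self.ring = []                        # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def lookup(self, stream_id: str) -> str:
        if not self.ring:
            raise LookupError("no ingest servers available")
        # First ring position at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self.ring, (self._hash(stream_id), ""))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["ingest-1", "ingest-2", "ingest-3"])
owner = ring.lookup("stream-42")
ring.remove(owner)                            # simulate that ingest server failing
fallback = ring.lookup("stream-42")           # the stream moves to a surviving server
print(owner != fallback)                      # True
```

Removing a server only remaps the streams it owned; streams on the surviving servers keep their assignment, which limits reconnect storms during a failure.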

What interviewers expect by level

Junior: Describe RTMP ingest → transcode → HLS serve flow. Know difference between live and VOD.
Mid: HLS segment creation, CDN for viewer scale, WebSocket chat with Redis Pub/Sub, viewer count.
Senior: LL-HLS implementation, ingest failover, chat sharding at 1M viewers, transcode auto-scaling.
Staff: Sub-second latency pipeline, multi-CDN strategy, cost at 100M concurrent viewers, content moderation at scale.
