🎮 Design Live Streaming (Twitch) — System Design Interview Guide
Hard · Media & Real-Time
Design a live streaming platform like Twitch where content creators broadcast in real-time to potentially millions of concurrent viewers with low latency, while supporting live chat and reactions.
Functional requirements
- Streamers broadcast video/audio in real-time
- Viewers watch live streams with adaptive bitrate
- Live chat alongside the stream
- Viewer count displayed in real-time
- Stream recording and VOD (video on demand) after stream ends
- Stream discovery: browse by game, category, viewer count
Non-functional requirements & scale
- 10M concurrent live streams; 100M concurrent viewers
- Stream ingestion latency (streamer to CDN) < 5 seconds
- Viewer latency < 15 seconds (HLS standard)
- Absorb viral spikes (0 to 1M viewers on one stream within 1 min) by auto-scaling delivery and chat capacity; ingest load grows with the number of streamers, not viewers
- Chat messages delivered < 500ms to all participants
- Stream must not drop even if one ingest server fails
Capacity estimation
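Live streaming differs from VOD in that content is generated in real time: the streamer pushes RTMP to an ingest server, which hands off to transcoding and then distribution. Viewer latency depends mostly on segment length: standard HLS with 6s segments typically lands at 15-30s, 2s segments bring that down to roughly 6-10s, and LL-HLS targets under 3s. Chat runs over WebSocket, with thousands of messages per second on a popular stream.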
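The scale targets above can be turned into rough bandwidth and segment-rate figures. The sketch below is a back-of-envelope calculation only: the viewer and ingest bitrates are assumed values chosen for illustration, not numbers from the guide or from any real platform.

```python
# Back-of-envelope capacity estimate. Values marked ASSUMED are illustrative.
CONCURRENT_VIEWERS = 100_000_000   # from the requirements
CONCURRENT_STREAMS = 10_000_000    # from the requirements
SEGMENT_SECONDS = 2                # from the guide: 2s HLS segments
RENDITIONS = 3                     # from the guide: 360p / 720p / 1080p
AVG_VIEWER_BITRATE_MBPS = 3.0      # ASSUMED: average rendition a viewer pulls
INGEST_BITRATE_MBPS = 6.0          # ASSUMED: bitrate a streamer uploads

MBPS_PER_TBPS = 1_000_000


def estimate() -> dict:
    """Return rough egress, ingest, and segment-write rates."""
    return {
        # Total bandwidth the CDN edge must deliver to viewers.
        "egress_tbps": CONCURRENT_VIEWERS * AVG_VIEWER_BITRATE_MBPS / MBPS_PER_TBPS,
        # Total bandwidth arriving at ingest servers from streamers.
        "ingest_tbps": CONCURRENT_STREAMS * INGEST_BITRATE_MBPS / MBPS_PER_TBPS,
        # Segment objects written to storage per second across all renditions.
        "segment_writes_per_sec": CONCURRENT_STREAMS * RENDITIONS / SEGMENT_SECONDS,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions, viewer egress dominates ingest by a wide margin, which is why delivery is pushed to the CDN rather than served from origin.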
Core entities
- Stream — streamId, streamerId, title, category, viewerCount, status, startedAt, streamKey
- StreamSegment — segmentId, streamId, resolution, s3Key, duration, sequenceNum, createdAt
- ChatMessage — msgId, streamId, userId, content, emotes[], timestamp
- VOD — vodId, streamId, title, duration, s3Key, viewCount, createdAt
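As a rough illustration, two of the entities above could be modeled as plain data classes. The field names come from the list; the field types, the `StreamStatus` values, and the `segments_in_order` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class StreamStatus(Enum):
    # ASSUMED status values; the guide only names a "status" field.
    LIVE = "live"
    ENDED = "ended"


@dataclass
class Stream:
    streamId: str
    streamerId: str
    title: str
    category: str
    viewerCount: int
    status: StreamStatus
    startedAt: datetime
    streamKey: str  # secret used by the broadcaster; never returned to viewers


@dataclass
class StreamSegment:
    segmentId: str
    streamId: str
    resolution: str      # e.g. "720p"
    s3Key: str
    duration: float      # seconds
    sequenceNum: int
    createdAt: datetime


def segments_in_order(segments: List[StreamSegment]) -> List[StreamSegment]:
    """Hypothetical helper: order segments for playlist generation."""
    return sorted(segments, key=lambda s: s.sequenceNum)
```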
API design
- POST /api/v1/streams: Start stream. Returns { streamKey, rtmpUrl }.
- GET /api/v1/streams/:streamId/playlist.m3u8: HLS manifest with live segment URLs.
- WS wss://chat.twitch.tv/:streamId: Join stream chat.
- GET /api/v1/streams?category=Gaming&sort=viewers: Browse live streams.
High-level design
Streamer pushes RTMP → Ingest Server → Transcoder (multiple resolutions) → S3 + CDN (HLS segments). Viewers request playlist → CDN serves segments. Chat via WebSocket cluster with Redis Pub/Sub.
Deep dives
📡 RTMP Ingest & Transcoding
Streamer uses OBS or other streaming software to push RTMP (Real-Time Messaging Protocol) to an ingest server. Ingest decodes the stream → FFmpeg transcodes it to HLS segments at 360p/720p/1080p simultaneously. Each 2s HLS segment is written to S3 as soon as it is complete, and the manifest (.m3u8) is updated to list it. The CDN pulls each new segment from the S3 origin on the first viewer request and caches it at the edge, so later requests for that segment never reach the origin.
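The manifest update step can be sketched as a sliding-window media playlist: each new segment is appended and the oldest one drops off. This is a minimal illustration, not how any particular packager implements it; the `LivePlaylist` class and the three-segment window are assumptions, while the tags are standard HLS playlist tags.

```python
from collections import deque


class LivePlaylist:
    """Sliding-window HLS media playlist for one rendition (illustrative)."""

    def __init__(self, target_duration: int = 2, window: int = 3):
        self.target_duration = target_duration
        # Each entry is (sequence_num, uri, duration_seconds).
        # deque(maxlen=...) evicts the oldest segment automatically.
        self.segments = deque(maxlen=window)

    def add_segment(self, sequence_num: int, uri: str, duration: float) -> None:
        self.segments.append((sequence_num, uri, duration))

    def render(self) -> str:
        # MEDIA-SEQUENCE must equal the sequence number of the first listed segment.
        first_seq = self.segments[0][0] if self.segments else 0
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
        ]
        for _, uri, duration in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)
        # No #EXT-X-ENDLIST tag: its absence tells players the stream is still live.
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    playlist = LivePlaylist()
    for seq in range(4):
        playlist.add_segment(seq, f"720p/seg{seq}.ts", 2.0)
    print(playlist.render())
```

When the stream ends, appending `#EXT-X-ENDLIST` to a playlist that lists every segment is one way to turn the same segments into a VOD playlist.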
⏱️ Latency Reduction
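Standard HLS: 2-second segments × 3-segment player buffer = 6s of buffering, or 6-10s end to end once encoding, upload, and CDN hops are added. Low-Latency HLS (LL-HLS) publishes partial segments of roughly 200ms, so players can fetch media before the full segment is complete. Chunked transfer encoding is a related technique that streams a segment to the client while it is still being written. The CDN must support LL-HLS. Twitch reportedly runs a custom low-latency pipeline with glass-to-glass latency in the 1-2s range for gaming streams, where reaction time matters.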
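The latency arithmetic above can be written as a small function. This is a simplified model: the fixed `pipeline_overhead_s` (encode, upload, CDN fetch) is an assumed value, and real players add further variance.

```python
def playback_latency_s(
    segment_s: float,
    buffered_segments: int,
    pipeline_overhead_s: float = 1.0,  # ASSUMED: encode + upload + CDN fetch
) -> float:
    """Approximate glass-to-glass latency for segment-based delivery."""
    # The player holds `buffered_segments` worth of media before it starts
    # playing; everything upstream adds a roughly constant overhead.
    return segment_s * buffered_segments + pipeline_overhead_s


if __name__ == "__main__":
    print("HLS, 2s segments:  ", playback_latency_s(2.0, 3))
    print("LL-HLS, 200ms parts:", playback_latency_s(0.2, 3))
```

The model makes the lever explicit: shrinking the unit the player buffers (segments down to parts) reduces latency far more than trimming pipeline overhead.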
💬 Chat at Scale
1M concurrent viewers in one stream. Naive: 1M WebSocket connections to one server, which no single machine can hold. Solution: spread a stream's connections across many chat servers and tie them together with one Redis Pub/Sub channel per streamId. A message is published once → delivered to every chat server subscribed to that channel → pushed by each server to its own connected viewers. Rate limit: max 20 messages per 30s per user to prevent spam.
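The fan-out and rate-limit ideas can be sketched in memory. In the sketch below, `InMemoryBroker` is a stand-in for Redis Pub/Sub and per-viewer lists stand in for WebSocket connections; all class names and the `chat:<streamId>` channel naming are assumptions for illustration.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: one channel per streamId."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)


class ChatServer:
    """One of many chat servers; holds only its own viewers' connections."""

    def __init__(self, broker):
        self.broker = broker
        self.viewers = defaultdict(list)  # stream_id -> list of viewer inboxes

    def join(self, stream_id):
        # Subscribe to the stream's channel once, on the first local viewer.
        if stream_id not in self.viewers:
            self.broker.subscribe(
                f"chat:{stream_id}",
                lambda msg, s=stream_id: self._deliver(s, msg),
            )
        inbox = []  # stands in for a WebSocket connection
        self.viewers[stream_id].append(inbox)
        return inbox

    def _deliver(self, stream_id, message):
        for inbox in self.viewers[stream_id]:
            inbox.append(message)

    def send(self, stream_id, message):
        # Publish once; the broker fans out to every subscribed server.
        self.broker.publish(f"chat:{stream_id}", message)


class ChatRateLimiter:
    """Sliding-window limiter: at most max_messages per window_s per user."""

    def __init__(self, max_messages=20, window_s=30.0):
        self.max_messages = max_messages
        self.window_s = window_s
        self._sent = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id, now):
        sent = self._sent[user_id]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window_s:
            sent.popleft()
        if len(sent) >= self.max_messages:
            return False
        sent.append(now)
        return True
```

A chat server would call `allow` before `send`, so spam is rejected before it costs a publish and a fan-out to every subscriber.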
📊 Viewer Count
Real-time viewer count displayed on stream. Counting 1M concurrent connections exactly is hard, so an approximate count is acceptable. Approach: each chat/viewer server maintains a local connection count per streamId and reports the change since its last report to Redis every 5s (INCRBY). For unique viewers, servers add user IDs to a per-stream HyperLogLog (PFADD) and a central aggregator reads the estimate (PFCOUNT). The viewer count is cached and pushed to clients via SSE every 10s.
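One way to aggregate per-server counts without double counting is to store each server's latest absolute count and sum them, ignoring servers that have stopped reporting. The sketch below does this in memory as a stand-in for Redis; the `ViewerCountAggregator` class and the 15s staleness cutoff are assumptions, and it is a variation on the delta-based reporting rather than the same scheme.

```python
class ViewerCountAggregator:
    """Sums the latest per-server connection counts for each stream."""

    def __init__(self, stale_after_s: float = 15.0):
        # ASSUMED cutoff: three missed 5s reports means the server is gone.
        self.stale_after_s = stale_after_s
        self._reports = {}  # stream_id -> {server_id: (count, reported_at)}

    def report(self, stream_id: str, server_id: str, count: int, now: float) -> None:
        # Overwrite rather than add, so a repeated report cannot double count.
        self._reports.setdefault(stream_id, {})[server_id] = (count, now)

    def total(self, stream_id: str, now: float) -> int:
        reports = self._reports.get(stream_id, {})
        # Skip servers whose last report is too old (crashed or partitioned).
        return sum(
            count
            for count, reported_at in reports.values()
            if now - reported_at <= self.stale_after_s
        )


if __name__ == "__main__":
    agg = ViewerCountAggregator()
    agg.report("stream-1", "chat-1", 100, now=0.0)
    agg.report("stream-1", "chat-2", 50, now=0.0)
    print(agg.total("stream-1", now=5.0))
```

Absolute counts make a crashed server's viewers age out on their own, which a pure increment counter cannot do without a separate correction step.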
Scaling considerations
- Multiple ingest servers per region — streams hash to specific ingest (consistent hashing)
- Transcoder auto-scales based on active streams (Kubernetes + Spot fleet)
- CDN serves 100M viewers: each segment is fetched from origin once per edge cache, so origin bandwidth scales with streams and renditions, not with viewers
- Chat Redis cluster sharded by streamId hash
- VOD: after stream ends, concatenate HLS segments in S3 → create VOD playlist
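The consistent hashing mentioned for ingest routing can be sketched as a hash ring with virtual nodes. This is a minimal illustration under assumed names (`IngestRing`, server names like `ingest-a`, 100 virtual nodes per server); it is not taken from any specific platform.

```python
import bisect
import hashlib


class IngestRing:
    """Consistent hash ring mapping stream keys to ingest servers."""

    def __init__(self, servers, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server: str, vnodes: int = 100) -> None:
        # Virtual nodes spread each server around the ring for better balance.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, stream_key: str) -> str:
        if not self._ring:
            raise LookupError("no ingest servers available")
        # First ring position at or after the key's hash, wrapping to the start.
        idx = bisect.bisect_right(self._ring, (self._hash(stream_key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = IngestRing(["ingest-a", "ingest-b", "ingest-c"])
    print(ring.lookup("stream-42"))
    ring.remove("ingest-b")  # simulate an ingest server failure
    print(ring.lookup("stream-42"))
```

When a server is removed, only the streams that were mapped to it move to a new server; every other stream keeps its assignment, which limits reconnects during ingest failover.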
What interviewers expect by level
- Junior: Describe RTMP ingest → transcode → HLS serve flow. Know difference between live and VOD.
- Mid: HLS segment creation, CDN for viewer scale, WebSocket chat with Redis Pub/Sub, viewer count.
- Senior: LL-HLS implementation, ingest failover, chat sharding at 1M viewers, transcode auto-scaling.
- Staff: Sub-second latency pipeline, multi-CDN strategy, cost at 100M concurrent viewers, content moderation at scale.