🚦 Design Rate Limiter (Standalone) — System Design Interview Guide
Medium · Infrastructure & Algorithms
Design a rate limiter that can enforce configurable request rate limits per user, IP, or API key, across a distributed fleet of servers, with low latency impact.
Functional requirements
- Limit requests per user/IP/API key (e.g., 100 req/min)
- Different limits per API endpoint or user tier
- Return HTTP 429 with Retry-After header when limit exceeded
- Limits configurable without code deployment
- Support for burst allowance (allow brief spikes above average)
Non-functional requirements & scale
- Rate limit check must add < 5ms to every request
- System must work across 100+ distributed API servers
- Accuracy: allow exactly N requests per window (not N+50%)
- Graceful degradation: if the rate limiter is down, fail open (allow traffic)
- 99.99% availability for the rate limit check path
Capacity estimation
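The rate limiter is deployed as middleware on every API server. Each server handles 10K req/sec, so a 100-server fleet generates on the order of 1M rate limit checks per second. Rate limit state (counters) must be either centralized (Redis) or synchronized between servers. Local counters are fast but inaccurate across servers; Redis is accurate but adds a network round trip to every check.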
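The estimate above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch: the server count and per-server rate come from the requirements, while the per-shard throughput of 100K ops/sec is purely an assumed planning figure, not a measured Redis number.

```python
import math

def estimate_redis_load(servers, req_per_server_per_sec, ops_per_shard_per_sec):
    """Return (rate limit checks per second, Redis shards needed)."""
    # Every request triggers one counter check if all state lives in Redis.
    checks_per_sec = servers * req_per_server_per_sec
    # ops_per_shard_per_sec is an assumption; measure it for a real deployment.
    shards = math.ceil(checks_per_sec / ops_per_shard_per_sec)
    return checks_per_sec, shards

checks, shards = estimate_redis_load(
    servers=100,
    req_per_server_per_sec=10_000,
    ops_per_shard_per_sec=100_000,  # assumed figure
)
print(checks, shards)
```

Under these assumptions the fleet needs roughly ten shards for rate limiting alone, which is why the later sections look at local counters and batching.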
Core entities
- RateLimitRule — ruleId, key (userId/ip/apiKey), endpoint, maxRequests, windowSeconds, burstSize
- Counter — key (userId:endpoint:window), count, windowStart (in Redis)
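A minimal sketch of the two entities as Python types, assuming a fixed-window counter key. The field names mirror the list above; `key_type` and the `counter_key` helper are illustrative names, not part of any real library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RateLimitRule:
    rule_id: str
    key_type: str        # "userId", "ip", or "apiKey"
    endpoint: str
    max_requests: int
    window_seconds: int
    burst_size: int

def counter_key(identity: str, endpoint: str, window_seconds: int, now: float) -> str:
    """Build the counter key as identity:endpoint:windowStart."""
    # Align the timestamp to the start of its fixed window.
    window_start = int(now // window_seconds) * window_seconds
    return f"{identity}:{endpoint}:{window_start}"

rule = RateLimitRule("r1", "userId", "/search", 100, 60, 20)
print(counter_key("u42", rule.endpoint, rule.window_seconds, now=125.0))
```

Putting the window start in the key means a new window automatically gets a fresh counter, and old keys can simply expire.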
API design
- Internal middleware check(userId, endpoint): returns ALLOW or DENY with the remaining count and reset time.
- GET /admin/rules: lists all rate limit rules.
- PUT /admin/rules/:ruleId: updates a rate limit config. Takes effect within 30s.
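One way the result of check() could be mapped to an HTTP 429 response is sketched below. The `Decision` type and the `X-RateLimit-*` header names are assumptions (those headers are a common convention, not a requirement of this design); `Retry-After` is the header named in the functional requirements.

```python
import math
from dataclasses import dataclass

@dataclass
class Decision:
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # epoch seconds when the current window ends

def deny_response(decision: Decision, now: float):
    """Build (status, headers) for a request that was rate limited."""
    # Round up so the client never retries before the window has reset.
    retry_after = max(1, math.ceil(decision.reset_at - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(decision.limit),          # assumed header name
        "X-RateLimit-Remaining": str(decision.remaining),  # assumed header name
    }
    return 429, headers

status, headers = deny_response(Decision(False, 100, 0, reset_at=1060.0), now=1000.5)
print(status, headers["Retry-After"])
```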
High-level design
The rate limiter middleware intercepts every request and checks the current counter in Redis, using a Lua script that performs INCR + EXPIRE atomically. If count > limit, it returns 429. If Redis is unavailable, it fails open and allows the request. Rules are fetched from the config DB and cached locally for 30s.
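The check flow can be sketched as a fixed-window counter. This is an in-memory stand-in, not a Redis client: the dictionary plays the role of the Redis counter with its expiry, and in the real design the increment and expiry would run inside one Lua script. The class name, the injectable `clock`, and the `check_fail_open` wrapper are illustrative assumptions.

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counts = {}  # key -> (count, expires_at), mimics INCR + EXPIRE

    def check(self, identity):
        now = self.clock()
        # Drop expired counters, as Redis would do via the key TTL.
        self._counts = {k: v for k, v in self._counts.items() if v[1] > now}
        window_start = int(now // self.window) * self.window
        key = f"{identity}:{window_start}"
        count, expires_at = self._counts.get(key, (0, window_start + self.window))
        count += 1  # equivalent of INCR
        self._counts[key] = (count, expires_at)
        return count <= self.limit  # count > limit means HTTP 429

def check_fail_open(limiter, identity):
    """Allow the request if the counter store cannot be reached."""
    try:
        return limiter.check(identity)
    except ConnectionError:
        return True  # fail open, as the design prefers availability
```

A fixed window is the simplest variant and can admit up to twice the limit across a window boundary; the deep dives cover algorithms that avoid this.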
Deep dives
🪣 Token Bucket Algorithm
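Token bucket: the bucket holds at most N tokens and refills at R tokens/sec. Each request consumes 1 token; if the bucket is empty, the request is rejected. This allows bursts of up to N requests while holding the long-run average to R req/sec. Implementation: store {tokens, lastRefill} in Redis. On each request: time_passed = now - lastRefill; tokens = min(N, tokens + time_passed × R); lastRefill = now; if tokens >= 1: tokens -= 1 and allow; else deny. Run the whole read-modify-write as a single Lua script so it is atomic.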
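The refill-and-consume steps can be sketched in a few lines. This version keeps {tokens, lastRefill} in process memory for a single key, assuming an injectable `clock` for testing; in the distributed design the same logic would live in a Redis Lua script.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # N: maximum burst size
        self.refill_rate = refill_rate  # R: tokens added per second
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        elapsed = max(0.0, now - self.last_refill)
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Refilling lazily on each request avoids a background timer per bucket: the state is only two numbers, and idle keys cost nothing.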
🔢 Sliding Window Log
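Store the timestamp of each request in a Redis Sorted Set. On each request: ZREMRANGEBYSCORE key 0 (now - window) to drop expired entries; count = ZCARD key; if count >= limit: deny; otherwise ZADD key now with a unique member per request, and allow. Counting before adding means denied requests are not logged and the current request is not counted against itself. Run the steps atomically (for example in a Lua script). Pros: precise. Cons: O(N) memory per user, where N is the number of requests in the window, so it is not suitable for users sending 100K req/min. Use the Sliding Window Counter (hybrid) in production: it keeps only the current and previous fixed-window counts and weights the previous count by how much of it still overlaps the sliding window.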
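Both variants are sketched below with in-memory structures standing in for Redis: a deque replaces the Sorted Set, and a dictionary replaces the per-window counters. Class names and the `clock` parameter are illustrative, and the counter sketch never evicts old windows, which a real store would handle with key expiry.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit, self.window, self.clock = limit, window_seconds, clock
        self._log = defaultdict(deque)  # identity -> request timestamps

    def allow(self, identity):
        now = self.clock()
        log = self._log[identity]
        # Equivalent of ZREMRANGEBYSCORE key 0 (now - window).
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:  # equivalent of ZCARD
            return False
        log.append(now)             # equivalent of ZADD
        return True

class SlidingWindowCounter:
    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit, self.window, self.clock = limit, window_seconds, clock
        self._counts = {}  # (identity, window index) -> count

    def allow(self, identity):
        now = self.clock()
        idx = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        prev = self._counts.get((identity, idx - 1), 0)
        curr = self._counts.get((identity, idx), 0)
        # Weight the previous window by the share still inside the sliding window.
        estimated = prev * (1 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self._counts[(identity, idx)] = curr + 1
        return True
```

The counter variant stores two integers per key instead of one entry per request, at the cost of assuming requests in the previous window were evenly spread.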
🌐 Distributed Rate Limiting
Problem: if 100 API servers each keep only local counters, a user can get up to 100× the limit. Solution: centralized counters in Redis, at the cost of 1-3ms of network round trip per request. Optimizations: (1) Local counter + periodic sync: each server admits up to 10% of the global limit locally and syncs its count to Redis every 100ms, trading some accuracy for fewer Redis calls. (2) Rate limit at the API Gateway level (single chokepoint). (3) Redis pipelining to batch counter updates.
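Option (1) can be sketched as follows. `GlobalCounter` is an in-memory stand-in for Redis, and `LocalSyncLimiter` is a hypothetical name; the sketch assumes the key already encodes the window, and it syncs lazily on the next request rather than from a background timer.

```python
import time

class GlobalCounter:
    """In-memory stand-in for the shared Redis counters."""
    def __init__(self):
        self.counts = {}

    def incr_by(self, key, n):
        self.counts[key] = self.counts.get(key, 0) + n
        return self.counts[key]

class LocalSyncLimiter:
    def __init__(self, store, limit, local_fraction=0.10,
                 sync_interval=0.1, clock=time.monotonic):
        self.store, self.limit, self.clock = store, limit, clock
        # Requests this server may admit per key between syncs.
        self.local_budget = max(1, int(limit * local_fraction))
        self.sync_interval = sync_interval
        self.pending = {}       # key -> admitted locally since last sync
        self.known_global = {}  # key -> global count seen at last sync
        self.last_sync = clock()

    def allow(self, key):
        if self.clock() - self.last_sync >= self.sync_interval:
            self.sync()
        pending = self.pending.get(key, 0)
        if self.known_global.get(key, 0) + pending >= self.limit:
            return False  # global limit reached as far as this server knows
        if pending >= self.local_budget:
            return False  # local budget spent until the next sync
        self.pending[key] = pending + 1
        return True

    def sync(self):
        # One batched update per key instead of one round trip per request.
        for key, n in self.pending.items():
            self.known_global[key] = self.store.incr_by(key, n)
        self.pending.clear()
        self.last_sync = self.clock()
```

In this sketch the worst-case overshoot between syncs is roughly the number of servers times the local budget, so the local fraction has to shrink as the fleet grows if the accuracy requirement is strict.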
⚠️ Handling Redis Failure
If the Redis cluster is down: fail open (allow all requests), which is preferred for availability, or fail closed (reject all requests) for security-sensitive APIs. Circuit breaker pattern: if the Redis error rate exceeds 5% over 10s, open the circuit and fail open without calling Redis for 60s, then try Redis again. Monitoring: alert immediately when the circuit opens. Health check: ping Redis every 5s.
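A minimal sketch of the fail-open circuit breaker, using the thresholds from the text (5% errors over a 10s window, 60s open period). The class name is hypothetical, and `min_samples` is an added assumption so that a single early error does not open the circuit on its own.

```python
import time
from collections import deque

class FailOpenBreaker:
    def __init__(self, error_threshold=0.05, window_seconds=10,
                 open_seconds=60, min_samples=20, clock=time.monotonic):
        self.threshold = error_threshold
        self.window = window_seconds
        self.open_seconds = open_seconds
        self.min_samples = min_samples  # assumption, not from the text
        self.clock = clock
        self.events = deque()           # (timestamp, succeeded)
        self.open_until = None

    def call(self, check):
        """Run check() against Redis unless the circuit is open."""
        now = self.clock()
        if self.open_until is not None and now < self.open_until:
            return True  # circuit open: skip Redis and allow the request
        try:
            allowed = check()
        except ConnectionError:
            self._record(now, False)
            return True  # fail open for this request as well
        self._record(now, True)
        return allowed

    def _record(self, now, succeeded):
        self.events.append((now, succeeded))
        # Keep only events inside the trailing error-rate window.
        while self.events[0][0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.min_samples:
            return
        errors = sum(1 for _, ok in self.events if not ok)
        if errors / len(self.events) > self.threshold:
            self.open_until = now + self.open_seconds
            self.events.clear()
```

For a fail-closed API the two `return True` lines would return False instead; the breaker logic stays the same.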
Scaling considerations
- Redis Cluster with multiple shards — counter key hashed to shard
- Lua script for atomic INCR + compare + EXPIRE (one RTT)
- Local counter cache per server, synced to Redis periodically, to reduce Redis pressure
- Rules config cached locally; background refresh every 30s
- Separate Redis cluster for rate limiting (isolated from app cache)
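The local rules cache could look like the sketch below. It refreshes lazily on read once the 30s TTL has passed, which is a simplification of the background refresh described above; `RuleCache` and the `loader` callable (standing in for a config DB query) are hypothetical names.

```python
import time

class RuleCache:
    def __init__(self, loader, ttl_seconds=30, clock=time.monotonic):
        self.loader = loader  # callable returning {rule_id: rule}
        self.ttl = ttl_seconds
        self.clock = clock
        self._rules = {}
        self._loaded_at = None

    def get(self, rule_id):
        now = self.clock()
        if self._loaded_at is None or now - self._loaded_at >= self.ttl:
            try:
                self._rules = self.loader()
                self._loaded_at = now
            except Exception:
                if self._loaded_at is None:
                    raise  # nothing cached yet, so the error must surface
                # Otherwise keep serving the stale rules and retry on a later read.
        return self._rules.get(rule_id)
```

Serving stale rules when the config DB is unreachable keeps the request path independent of that database, consistent with the fail-open stance elsewhere in the design.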
What interviewers expect by level
- Junior: Describe fixed window counter. Know token bucket concept. Understand why distributed rate limiting needs centralized state.
- Mid: Token bucket vs leaky bucket vs sliding window — trade-offs. Redis Lua for atomicity, fail-open strategy.
- Senior: Distributed rate limiting with local sync, sliding window counter, rule engine with hot reload, Redis cluster failure handling.
- Staff: Global rate limiting across regions, adaptive limits (dynamic based on system load), DDoS protection integration.