🚦 Design Rate Limiter (Standalone) — System Design Interview Guide

Medium · Infrastructure & Algorithms

Design a rate limiter that can enforce configurable request rate limits per user, IP, or API key, across a distributed fleet of servers, with low latency impact.

Functional requirements

Limit requests per user/IP/API key (e.g., 100 req/min)
Different limits per API endpoint or user tier
Return HTTP 429 with Retry-After header when limit exceeded
Limits configurable without code deployment
Support for burst allowance (allow brief spikes above average)
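A minimal sketch of how a limiter decision could be mapped to the HTTP response required above. `Decision` and `to_http_response` are hypothetical names. `Retry-After` is the standard HTTP header (delay-in-seconds form). `X-RateLimit-Remaining` is a widely used but non-standard convention and is assumed here for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Decision:
    """Result of one rate-limit check (hypothetical shape)."""
    allowed: bool
    remaining: int        # requests left in the current window
    reset_after: float    # seconds until the limit resets


def to_http_response(decision: Decision) -> Tuple[int, Dict[str, str]]:
    """Map a decision to (status code, headers).

    200 stands in for whatever the downstream handler returns; denied
    requests get 429 plus a Retry-After header.
    """
    headers = {"X-RateLimit-Remaining": str(max(decision.remaining, 0))}
    if decision.allowed:
        return 200, headers
    # Retry-After is whole seconds; round up so clients never retry too early.
    headers["Retry-After"] = str(max(1, math.ceil(decision.reset_after)))
    return 429, headers


status, headers = to_http_response(Decision(allowed=False, remaining=0, reset_after=12.3))
print(status, headers["Retry-After"])  # 429 13
```

Rounding up matters: a client that honors a rounded-down `Retry-After` would hit the limit again and waste a request.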

Non-functional requirements & scale

Rate limit check must add < 5ms to every request
System must work across 100+ distributed API servers
Accuracy: allow N requests per window with minimal overshoot (e.g., not N+50%)
Graceful degradation: if rate limiter down, fail open (allow traffic)
99.99% availability for the rate limit check path

Capacity estimation

Deployed as middleware on every API server. Each server handles 10K req/sec, so a fleet of 100 servers performs roughly 1M rate-limit checks/sec. Rate-limit state (counters) must be centralized (Redis) or synchronized: local counters are fast but inaccurate across servers, while Redis is accurate but adds a network round trip (RTT) to every check.

Core entities

RateLimitRule — ruleId, key (userId/ip/apiKey), endpoint, maxRequests, windowSeconds, burstSize
Counter — key (userId:endpoint:window), count, windowStart (in Redis)
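The two entities could be modeled as below. This is a sketch: the Python field names are hypothetical spellings of the ones listed above. The counter key follows the `userId:endpoint:window` format, with `window` assumed to be the start timestamp of the fixed window that contains the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimitRule:
    rule_id: str
    key_type: str                      # "userId", "ip", or "apiKey"
    endpoint: str
    max_requests: int
    window_seconds: int
    burst_size: Optional[int] = None   # only used by token-bucket style limiters


def counter_key(identity: str, endpoint: str, now: int, rule: RateLimitRule) -> str:
    """Build the counter key 'identity:endpoint:windowStart'.

    All requests in the same fixed window share one key; the next window
    gets a fresh key, so old counters can simply expire.
    """
    window_start = (now // rule.window_seconds) * rule.window_seconds
    return f"{identity}:{endpoint}:{window_start}"


rule = RateLimitRule("r1", "userId", "/search", max_requests=100, window_seconds=60)
print(counter_key("user42", "/search", 1_700_000_030, rule))  # user42:/search:1699999980
```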

API design

Internal Middleware check(userId, endpoint) — Returns ALLOW or DENY with remaining count and reset time.
GET /admin/rules — List all rate limit rules.
PUT /admin/rules/:ruleId — Update rate limit config. Takes effect within 30s.
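The "takes effect within 30s" guarantee follows from each server caching rules for at most 30 seconds. A minimal sketch, assuming a hypothetical `load_rules` callable that stands in for the config-DB query and returns `{rule_id: max_requests}`. The design text describes a background refresh; this sketch refreshes lazily on read instead, which gives the same 30-second bound without a thread.

```python
import time
from typing import Callable, Dict


class RuleCache:
    """Per-server cache of rate-limit rules, reloaded at most every `ttl` seconds."""

    def __init__(self, load_rules: Callable[[], Dict[str, int]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_rules = load_rules          # hypothetical config-DB loader
        self._ttl = ttl
        self._clock = clock
        self._rules: Dict[str, int] = {}
        self._loaded_at = float("-inf")        # forces a load on first access

    def get(self, rule_id: str) -> int:
        now = self._clock()
        if now - self._loaded_at >= self._ttl:
            try:
                self._rules = self._load_rules()
                self._loaded_at = now
            except Exception:
                # Config DB unreachable: keep serving the last known rules
                # (retrying on the next call); fail only if none were loaded.
                if not self._rules:
                    raise
        return self._rules[rule_id]


# Usage with a fake config DB and a controllable clock.
config_db = {"search": 100}
fake_now = [0.0]
cache = RuleCache(lambda: dict(config_db), ttl=30.0, clock=lambda: fake_now[0])
print(cache.get("search"))   # 100
config_db["search"] = 50     # e.g. PUT /admin/rules/search
fake_now[0] = 10.0
print(cache.get("search"))   # 100 (still cached)
fake_now[0] = 30.0
print(cache.get("search"))   # 50 (reloaded)
```

Serving stale rules when the config DB is down keeps the check path independent of the config store's availability.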

High-level design

Rate limiter middleware intercepts every request. Checks Redis for current counter. Atomic INCR + EXPIRE via Lua script. If count > limit: return 429. If Redis unavailable: fail open (allow request). Rules fetched from config DB, cached locally for 30s.
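A minimal, runnable sketch of this check path, assuming a fixed-window counter. The Lua string is the common INCR-then-EXPIRE pattern. `InMemoryFixedWindowStore`, `incr_with_expire`, and `check` are hypothetical names, and the in-memory store only emulates Redis so the example runs without a server. As with the script, the window starts at the key's first increment.

```python
import time
from typing import Callable, Dict, Tuple

# Atomic server-side script: increment, and set the TTL on the first increment.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


class InMemoryFixedWindowStore:
    """Stand-in for Redis that mimics the script above."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}   # key -> (count, expires_at)

    def incr_with_expire(self, key: str, window_seconds: int) -> int:
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:             # missing or expired key
            count, expires_at = 0, now + window_seconds
        count += 1
        self._data[key] = (count, expires_at)
        return count


def check(store, key: str, limit: int, window_seconds: int) -> bool:
    """Return True to ALLOW, False to DENY (HTTP 429); store errors fail open."""
    try:
        count = store.incr_with_expire(key, window_seconds)
    except ConnectionError:
        return True       # fail open: the limiter must not take the API down
    return count <= limit


t = [0.0]
store = InMemoryFixedWindowStore(clock=lambda: t[0])
print([check(store, "user42:/search", 3, 60) for _ in range(4)])  # [True, True, True, False]
t[0] = 60.0
print(check(store, "user42:/search", 3, 60))  # True (new window)
```

Against a real Redis, the same script could be run with redis-py's `eval(script, numkeys, *keys_and_args)`, e.g. `r.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)`. The fail-open `except` would then need to catch the client library's error types rather than the built-in `ConnectionError`.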

Deep dives

🪣 Token Bucket Algorithm

Token bucket: the bucket holds at most N tokens and refills at R tokens/sec. Each request consumes 1 token; if the bucket is empty, the request is rejected. This allows bursts of up to N requests. Implementation: store {tokens, lastRefill} in Redis. On each request: time_passed = now - lastRefill; tokens = min(N, tokens + time_passed × R); lastRefill = now; if tokens >= 1: tokens -= 1 and allow, else deny. Run the whole read-modify-write in a Lua script so it is atomic.
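The steps above translate directly into a small class. This is a single-process sketch: the injectable `clock` parameter is an assumption added for testability. In the distributed design, the same arithmetic would run inside the Redis Lua script over the stored `{tokens, lastRefill}` pair.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds at most `capacity` tokens, refilled continuously at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self.tokens = capacity           # start full: an initial burst of `capacity` is allowed
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self.last_refill
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


t = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: t[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
t[0] = 1.0
print(bucket.allow())                      # True (one token refilled)
```

`capacity` plays the role of the rule's `burstSize`, and `rate` is the sustained limit (e.g., 100 req/min is a rate of 100/60 tokens/sec).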

🔢 Sliding Window Log

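Store the timestamp of each request in a Redis Sorted Set. On each request: ZREMRANGEBYSCORE key 0 (now - window) to evict expired entries; count = ZCARD key; if count >= limit, deny; otherwise ZADD key now <unique member> and allow. The member must be unique (e.g., timestamp plus request ID) because sorted-set members are unique, and all steps should run in one Lua script so they are atomic. Pros: precise. Cons: O(N) memory per user, where N is the limit, so it is not suitable for users allowed 100K req/min. For production, use the Sliding Window Counter (hybrid), which keeps only the current and previous fixed-window counts and estimates the sliding count as current + previous × (fraction of the previous window still inside the sliding window).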
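Both variants are sketched below in memory, with hypothetical class names. A `deque` stands in for the sorted set: it evicts first, counts, and records only allowed requests, so rejected calls do not consume quota. The counter variant keeps two fixed-window counts per key and applies the weighted estimate described above.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLog:
    """Precise: one timestamp per allowed request (O(limit) memory per key)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._log: Dict[str, Deque[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        log = self._log.setdefault(key, deque())
        # Like ZREMRANGEBYSCORE key 0 (now - window): drop entries at or before the cutoff.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:       # like ZCARD, checked before recording
            return False
        log.append(now)                  # like ZADD, only for allowed requests
        return True


class SlidingWindowCounter:
    """Approximate: current + previous fixed-window counts, weighted by overlap."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._counts: Dict[str, Dict[int, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        counts = self._counts.setdefault(key, {})
        current = int(now // self.window)
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimate = counts.get(current, 0) + counts.get(current - 1, 0) * overlap
        if estimate >= self.limit:
            return False
        counts[current] = counts.get(current, 0) + 1
        for old in [w for w in counts if w < current - 1]:
            del counts[old]              # only two windows are ever needed
        return True


log = SlidingWindowLog(limit=2, window=10)
print([log.allow("u", t) for t in (0, 1, 2, 10)])  # [True, True, False, True]
```

With a limit of 10 per 60 s, a user who sent 10 requests in the first window and then arrives halfway into the next one is estimated at 10 × 0.5 = 5, so 5 more requests are allowed.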

🌐 Distributed Rate Limiting

Problem: if each of 100 API servers keeps only local counters, a user whose requests are spread across servers gets up to 100× the limit. Solution: centralized Redis, which adds 1-3ms per request. Optimizations: (1) Local counter + periodic sync: each server allows up to 10% of the global limit locally and syncs to Redis every 100ms, trading a small, bounded overshoot for far fewer Redis calls. (2) Rate limit at the API Gateway level (a single chokepoint). (3) Use Redis pipelining to batch counter updates.
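One possible reading of optimization (1), sketched under stated assumptions. `GlobalStore` is a hypothetical stand-in for the central Redis counter, with `add` playing the role of INCRBY. Each server may grant up to its local slice of the limit between syncs, and `sync()` (called, e.g., every 100ms) pushes local counts and stops local grants once the global limit is reached. Keys are assumed to include the window start, so a new window means a new key. Overshoot is bounded by (number of servers) × (local slice) per sync interval.

```python
from typing import Dict


class GlobalStore:
    """Stand-in for the central Redis counters."""

    def __init__(self):
        self.counts: Dict[str, int] = {}

    def add(self, key: str, n: int) -> int:
        self.counts[key] = self.counts.get(key, 0) + n   # like INCRBY
        return self.counts[key]


class LocalQuotaLimiter:
    """Per-server limiter that grants a local slice of the global limit."""

    def __init__(self, store: GlobalStore, global_limit: int, local_fraction: float = 0.10):
        self.store = store
        self.global_limit = global_limit
        self.local_budget = max(1, int(global_limit * local_fraction))
        self.pending: Dict[str, int] = {}      # requests allowed since the last sync
        self.exhausted: Dict[str, bool] = {}   # global limit reached as of the last sync

    def allow(self, key: str) -> bool:
        if self.exhausted.get(key, False):
            return False
        if self.pending.get(key, 0) >= self.local_budget:
            return False                       # local slice used up until the next sync
        self.pending[key] = self.pending.get(key, 0) + 1
        return True

    def sync(self) -> None:
        # Push local counts and refresh the exhausted flag for every known key.
        for key in set(self.pending) | set(self.exhausted):
            total = self.store.add(key, self.pending.get(key, 0))
            self.exhausted[key] = total >= self.global_limit
        self.pending.clear()


store = GlobalStore()
server_a = LocalQuotaLimiter(store, global_limit=100)
print(sum(server_a.allow("user42") for _ in range(15)))  # 10 (10% local slice)
server_a.sync()
print(store.counts["user42"])                            # 10
```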

⚠️ Handling Redis Failure

If the Redis cluster is down, fail open (allow all requests), which is preferred for availability, or fail closed (reject all) for security-sensitive APIs. Circuit breaker pattern: if the Redis error rate exceeds 5% over 10s, open the circuit and fail open for 60s before retrying Redis. Monitoring: alert immediately when the circuit opens. Health check: PING Redis every 5s.
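A minimal sketch of that circuit breaker. It interprets "error rate > 5% for 10s" as the error rate over a trailing 10-second window. It adds a hypothetical `min_calls` guard so that a single failed call cannot open the circuit. It also closes the circuit directly after 60s rather than modeling a separate half-open state. `redis_check` is a hypothetical callable that performs the Redis lookup and returns True to allow.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class RedisCircuitBreaker:
    """Fail-open circuit breaker around the Redis rate-limit call."""

    def __init__(self, window: float = 10.0, threshold: float = 0.05,
                 open_seconds: float = 60.0, min_calls: int = 20,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.threshold = threshold
        self.open_seconds = open_seconds
        self.min_calls = min_calls              # hypothetical guard against tiny samples
        self._clock = clock
        self._outcomes: Deque[Tuple[float, bool]] = deque()   # (timestamp, failed)
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.open_seconds:
            self._opened_at = None              # timeout elapsed: close and retry Redis
            self._outcomes.clear()
            return False
        return True

    def _record(self, failed: bool) -> None:
        now = self._clock()
        self._outcomes.append((now, failed))
        while self._outcomes and self._outcomes[0][0] <= now - self.window:
            self._outcomes.popleft()            # keep only the trailing window
        failures = sum(1 for _, f in self._outcomes if f)
        if (len(self._outcomes) >= self.min_calls
                and failures / len(self._outcomes) > self.threshold):
            self._opened_at = now               # an alert would be raised here

    def check(self, redis_check: Callable[[], bool]) -> bool:
        """Return True to allow the request."""
        if self.is_open():
            return True                         # fail open without touching Redis
        try:
            allowed = redis_check()
        except ConnectionError:
            self._record(failed=True)
            return True                         # fail open for this request too
        self._record(failed=False)
        return allowed
```

Skipping Redis entirely while the circuit is open keeps the check path within its latency budget instead of waiting on timeouts from a failing cluster.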

Scaling considerations

Redis Cluster with multiple shards — counter key hashed to shard
Lua script for atomic INCR + compare + EXPIRE (one RTT)
Local counter cache per server (sliding sync) to reduce Redis pressure
Rules config cached locally; background refresh every 30s
Separate Redis cluster for rate limiting (isolated from app cache)
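To illustrate "counter key hashed to shard": Redis Cluster maps each key to one of 16384 hash slots using CRC16 (XMODEM variant) of the key modulo 16384. If the key contains a non-empty `{...}` hash tag, only the tag is hashed. Cluster-aware clients compute this automatically. The sketch below, with hypothetical function names, only shows how a tag such as `{user42}` could keep all of one user's counters on the same shard.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots, honoring a non-empty {hash tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # empty "{}" means hash the whole key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_hash_slot("{user42}:/search:1699999980") == key_hash_slot("{user42}:/upload:1699999980"))  # True
```

Co-locating a user's keys allows multi-key Lua scripts per user. The trade-off is that one very hot user concentrates load on a single shard.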

What interviewers expect by level

Junior: Describe fixed window counter. Know token bucket concept. Understand why distributed rate limiting needs centralized state.
Mid: Token bucket vs leaky bucket vs sliding window — trade-offs. Redis Lua for atomicity, fail-open strategy.
Senior: Distributed rate limiting with local sync, sliding window counter, rule engine with hot reload, Redis cluster failure handling.
Staff: Global rate limiting across regions, adaptive limits (dynamic based on system load), DDoS protection integration.
