🚦 Design Rate Limiter (Standalone) — System Design Interview Guide

Medium · Infrastructure & Algorithms

Design a rate limiter that can enforce configurable request rate limits per user, IP, or API key, across a distributed fleet of servers, with low latency impact.

Functional requirements

Non-functional requirements & scale

Capacity estimation

Deployed as middleware on every API server, each handling 10K req/sec. Rate limit state (counters) must be either centralized (Redis) or synchronized between servers. Local counters are fast but inaccurate across servers; Redis is accurate but adds a network round trip (RTT) to every request.

Core entities

API design

High-level design

Rate limiter middleware intercepts every request and checks the current counter in Redis, using an atomic INCR + EXPIRE in a Lua script. If count > limit: return 429 (Too Many Requests). If Redis is unavailable: fail open (allow the request). Rules are fetched from the config DB and cached locally for 30s.
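
The request path above can be sketched as a fixed-window counter. This is a minimal illustration, not a production implementation: `InMemoryCounterStore`, `incr_with_expiry`, and `check_request` are hypothetical names, and the in-memory store only stands in for Redis so the example runs on its own. The Lua source shows one common way to pair INCR with EXPIRE atomically; it is included for reference and is not executed here.

```python
import time
from typing import Callable, Dict, Tuple

# Lua a real deployment might register with Redis so that INCR and EXPIRE
# run as one atomic step. Reference only; this sketch never sends it to Redis.
INCR_EXPIRE_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis: counters that expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr_with_expiry(self, key: str, ttl_seconds: float) -> int:
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # First hit in a new window: start a fresh counter with a TTL.
            count, expires_at = 0, now + ttl_seconds
        count += 1
        self._data[key] = (count, expires_at)
        return count


def check_request(store, key: str, limit: int, window_seconds: float) -> int:
    """Return the HTTP status the middleware would produce: 200 or 429."""
    try:
        count = store.incr_with_expiry(key, window_seconds)
    except Exception:
        # Store unreachable: fail open and let the request through.
        return 200
    return 429 if count > limit else 200
```

Passing the store in as an argument keeps the fail-open branch easy to exercise: any object whose `incr_with_expiry` raises behaves like an unreachable Redis.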

Deep dives

🪣 Token Bucket Algorithm

Token bucket: the bucket holds at most N tokens and refills at R tokens/sec. Each request consumes 1 token; if the bucket is empty, reject. Allows bursts of up to N requests. Implementation: store {tokens, lastRefill} in Redis. On request: time_passed = now - lastRefill; tokens = min(N, tokens + time_passed × R); lastRefill = now; if tokens >= 1: tokens--; allow. Else: deny. Atomic via Lua script.
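
A minimal single-process sketch of the refill-and-consume step, assuming the state lives in a Python object rather than in Redis. The class name `TokenBucket` and the injectable `clock` parameter are illustrative choices, added so the behavior can be checked without real waiting. Note that the refill timestamp is advanced on every call, so elapsed time is never credited twice.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket with capacity N and refill rate R tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self.last_refill
        # Credit tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a distributed setup the same arithmetic would have to run atomically next to the stored state, which is why the text places it in a Lua script.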

🔢 Sliding Window Log

Store the timestamp of each request in a Redis Sorted Set. On each request: ZREMRANGEBYSCORE key 0 (now-window) to drop expired entries; count = ZCARD key. If count >= limit: deny. Otherwise: ZADD key with score = now and a unique member, then allow. Atomic via Lua script. Pros: precise. Cons: O(N) memory per user, so not suitable for 100K req/min users. Use Sliding Window Counter (hybrid) for production.
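
The log approach can be illustrated in memory with one deque of timestamps per key standing in for the sorted set. This is a sketch under assumptions: `SlidingWindowLog` is a hypothetical name, and it prunes and counts before recording the request, so a rejected request never occupies a slot in the log and exactly `limit` requests fit in any window.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """Keeps one timestamp per accepted request, per key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._logs: Dict[str, Deque[float]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        log = self._logs.setdefault(key, deque())
        cutoff = now - self.window_seconds
        # Equivalent of ZREMRANGEBYSCORE key 0 cutoff (inclusive upper bound).
        while log and log[0] <= cutoff:
            log.popleft()
        # Equivalent of ZCARD: entries still inside the window.
        if len(log) >= self.limit:
            return False
        # Equivalent of ZADD: record this accepted request.
        log.append(now)
        return True
```

The memory cost is visible here: the deque grows to `limit` entries per key, which is the O(N) overhead the text warns about.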

🌐 Distributed Rate Limiting

Problem: if 100 API servers each maintain their own local counters, a user can get up to 100× the limit. Solution: centralized Redis. But Redis adds 1-3ms per request. Optimizations: (1) Local counter + periodic sync: each server admits up to 10% of the global limit locally and syncs to Redis every 100ms, accepting some overshoot between syncs. (2) Rate limit at the API Gateway level (single chokepoint). (3) Redis pipeline for batch counter updates.
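
Option (1) might look roughly like the sketch below. All names are hypothetical: `SharedCounter` stands in for the central Redis counter, and `LocalSyncLimiter` is one server's view. The interpretation that each server may admit at most `local_quota` requests between syncs is an assumption, and window expiry is left out to keep the example short.

```python
import time
from typing import Callable, Dict


class SharedCounter:
    """Hypothetical stand-in for the central Redis counter."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def add(self, key: str, amount: int) -> int:
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]


class LocalSyncLimiter:
    """One server's limiter: counts locally, pushes to the shared counter periodically."""

    def __init__(self, shared: SharedCounter, global_limit: int, local_quota: int,
                 sync_interval: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._shared = shared
        self._global_limit = global_limit
        self._local_quota = local_quota      # e.g. global_limit // 10
        self._sync_interval = sync_interval
        self._clock = clock
        self._pending: Dict[str, int] = {}       # accepted locally, not yet pushed
        self._global_seen: Dict[str, int] = {}   # global total as of the last sync
        self._last_sync: Dict[str, float] = {}

    def _sync(self, key: str, now: float) -> None:
        # Push local increments and read back the fleet-wide total in one call.
        total = self._shared.add(key, self._pending.get(key, 0))
        self._pending[key] = 0
        self._global_seen[key] = total
        self._last_sync[key] = now

    def allow(self, key: str) -> bool:
        now = self._clock()
        if now - self._last_sync.get(key, float("-inf")) >= self._sync_interval:
            self._sync(key, now)
        pending = self._pending.get(key, 0)
        if self._global_seen.get(key, 0) + pending >= self._global_limit:
            return False  # fleet-wide limit reached, as far as this server knows
        if pending >= self._local_quota:
            return False  # local allowance used up until the next sync
        self._pending[key] = pending + 1
        return True
```

Because each server only learns about its peers at sync time, the fleet can briefly exceed the global limit by up to the sum of the unsynced local allowances. That is the accuracy being traded for skipping a Redis round trip on every request.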

⚠️ Handling Redis Failure

If the Redis cluster is down: fail open (allow all requests), preferred for availability, OR fail closed (reject all), for security-sensitive APIs. Circuit breaker pattern: if the Redis error rate exceeds 5% for 10s, open the circuit, stop calling Redis, and fail open for 60s before retrying. Monitor: alert immediately when the circuit opens. Health check: ping Redis every 5s.
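
A rough sketch of the circuit breaker, using the thresholds from the text (5% errors, 10s window, 60s open period). The class `RedisCircuitBreaker` and its parameters are hypothetical. Two details are assumptions rather than part of the description above: the error rate is measured over a rolling 10s window, and a `min_calls` floor keeps a single early failure from tripping the breaker.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class RedisCircuitBreaker:
    """Wraps a Redis-backed check; skips Redis while the circuit is open."""

    def __init__(self, error_threshold: float = 0.05, window_seconds: float = 10.0,
                 open_seconds: float = 60.0, min_calls: int = 20,
                 fail_open: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._error_threshold = error_threshold
        self._window_seconds = window_seconds
        self._open_seconds = open_seconds
        self._min_calls = min_calls
        self._fail_open = fail_open      # True: allow on failure; False: reject
        self._clock = clock
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._open_seconds:
            # Open period elapsed: close the circuit and start with a clean window.
            self._opened_at = None
            self._events.clear()
            return False
        return True

    def _record(self, failed: bool) -> None:
        now = self._clock()
        self._events.append((now, failed))
        cutoff = now - self._window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        total = len(self._events)
        failures = sum(1 for _, was_failure in self._events if was_failure)
        if total >= self._min_calls and failures / total > self._error_threshold:
            self._opened_at = now  # trip the breaker; an alert would fire here

    def call(self, check: Callable[[], bool]) -> bool:
        """Run `check` (True = allow the request) unless the circuit is open."""
        if self.is_open():
            return self._fail_open
        try:
            allowed = check()
        except Exception:
            self._record(failed=True)
            return self._fail_open
        self._record(failed=False)
        return allowed
```

Setting `fail_open=False` gives the fail-closed behavior for security-sensitive APIs without changing the rest of the logic.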

Scaling considerations

What interviewers expect by level
