📋 Design Pastebin — System Design Interview Guide

Easy · Storage & Encoding

Design a text-sharing service like Pastebin where users can paste text/code and share it via a short URL, with optional expiry, syntax highlighting, and access control.

Functional requirements

Users can paste any text (code, logs, notes) and get a short URL
Pastes can be public, private, or unlisted
Optional expiry: 10 min, 1 hour, 1 day, 1 week, never
Syntax highlighting for 50+ programming languages
View count and paste history for registered users

Non-functional requirements & scale

5M new pastes per day; 50M reads per day (10:1 read:write)
Paste content size: up to 10MB
Short URL must not conflict with existing pastes
Read latency < 50ms for popular pastes (cached)
Data durability: no paste loss unless explicitly expired

Capacity estimation

Similar to a URL shortener, but each record stores content (up to 10MB) rather than just a URL. 5M pastes/day × 10MB max = 50TB/day theoretical maximum; a realistic 10KB average gives 50GB/day, or roughly 18TB/year. Use object storage (S3) for content and a relational DB for metadata. The 10:1 read/write ratio (about 580 reads/s against 58 writes/s on average) favors aggressive caching.
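
The arithmetic above can be checked with a few lines of Python. This is a back-of-envelope sketch only: the constants come from the requirements, decimal units (1 MB = 10^6 bytes) are assumed, and the rates are daily averages rather than peaks.

```python
# Back-of-envelope figures taken from the stated requirements.
PASTES_PER_DAY = 5_000_000
READS_PER_DAY = 50_000_000
MAX_PASTE_BYTES = 10 * 10**6   # 10 MB upper bound (decimal units assumed)
AVG_PASTE_BYTES = 10 * 10**3   # 10 KB assumed realistic average
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: int) -> float:
    """Average rate over a day; real traffic peaks will be higher."""
    return events_per_day / SECONDS_PER_DAY


def daily_storage_bytes(pastes_per_day: int, bytes_per_paste: int) -> int:
    """Bytes of new content written per day."""
    return pastes_per_day * bytes_per_paste


write_qps = per_second(PASTES_PER_DAY)
read_qps = per_second(READS_PER_DAY)
worst_case = daily_storage_bytes(PASTES_PER_DAY, MAX_PASTE_BYTES)
realistic = daily_storage_bytes(PASTES_PER_DAY, AVG_PASTE_BYTES)
yearly = realistic * 365

print(f"{write_qps:.0f} writes/s, {read_qps:.0f} reads/s")
print(f"{worst_case / 10**12:.0f} TB/day worst case")
print(f"{realistic / 10**9:.0f} GB/day realistic, {yearly / 10**12:.2f} TB/year")
```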

Core entities

Paste — pasteId (Base62), userId?, title, language, visibility, expiresAt, viewCount, createdAt
PasteContent — s3Key (= pasteId), content (stored in S3, keyed by pasteId)
User — userId, username, email, apiKey, plan (free/pro)

API design

POST /api/v1/pastes — Create paste. Body: { content, title?, language?, visibility, expiresIn? }. Returns { pasteId, url }.
GET /:pasteId — View paste. Returns metadata + content (from S3 or cache). Increments viewCount async.
DELETE /api/v1/pastes/:pasteId — Delete paste (owner only). Removes from S3, DB, and cache.
GET /api/v1/users/me/pastes — List user's pastes with pagination.
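
As an illustration of the create endpoint's input handling, the sketch below validates a request body and turns expiresIn into an absolute expiresAt. The wire values ("10m", "1h", and so on), the function name validate_create_request, and the binary interpretation of the 10MB limit are assumptions made for the example, not part of the API described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed wire values for expiresIn; the guide only lists the durations.
EXPIRY_OPTIONS = {
    "10m": timedelta(minutes=10),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "never": None,
}
VISIBILITIES = {"public", "private", "unlisted"}
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # 10MB limit, binary units assumed


def validate_create_request(body: dict, now: datetime) -> dict:
    """Return a normalized paste record or raise ValueError."""
    content = body.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content is required")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds 10MB")

    visibility = body.get("visibility")
    if visibility not in VISIBILITIES:
        raise ValueError("visibility must be public, private, or unlisted")

    expires_in = body.get("expiresIn", "never")
    if expires_in not in EXPIRY_OPTIONS:
        raise ValueError("unsupported expiresIn value")
    delta = EXPIRY_OPTIONS[expires_in]
    expires_at: Optional[datetime] = now + delta if delta is not None else None

    return {
        "content": content,
        "title": body.get("title"),
        "language": body.get("language"),
        "visibility": visibility,
        "expiresAt": expires_at,
    }


if __name__ == "__main__":
    record = validate_create_request(
        {"content": "print('hi')", "visibility": "public", "expiresIn": "1h"},
        now=datetime.now(timezone.utc),
    )
    print(record["expiresAt"])
```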

High-level design

Create: generate pasteId (Base62 of Snowflake ID), write metadata to MySQL, upload content to S3.
Read: check Redis cache (pasteId → content); on a miss, fetch from S3 and populate the cache.
Expiry: a background TTL worker scans for expired pastes and deletes them from S3 and the DB.
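
The read path is a cache-aside pattern. A minimal sketch follows, with a plain dict standing in for Redis and a callable standing in for the S3 fetch; the class name PasteReader and its counters are hypothetical and exist only to make the flow visible.

```python
from typing import Callable, Dict, Optional


class PasteReader:
    """Cache-aside read: try the cache, fall back to the store, then populate."""

    def __init__(
        self,
        cache: Dict[str, str],
        fetch_from_store: Callable[[str], Optional[str]],
    ) -> None:
        self.cache = cache            # stand-in for Redis (pasteId -> content)
        self.fetch = fetch_from_store  # stand-in for an S3 GET
        self.hits = 0
        self.misses = 0

    def read(self, paste_id: str) -> Optional[str]:
        content = self.cache.get(paste_id)
        if content is not None:
            self.hits += 1
            return content
        self.misses += 1
        content = self.fetch(paste_id)
        if content is not None:
            # Populate only on success so missing IDs are not cached here.
            self.cache[paste_id] = content
        return content


if __name__ == "__main__":
    store = {"abc123": "hello world"}
    reader = PasteReader(cache={}, fetch_from_store=store.get)
    reader.read("abc123")  # miss, fetched from the store
    reader.read("abc123")  # hit, served from the cache
    print(reader.hits, reader.misses)
```

A real deployment would also set a TTL on the cached entry and invalidate it on delete; both are omitted here.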

Deep dives

🔑 ID Generation

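Same as URL Shortener: Snowflake ID → Base62 encode. A full 64-bit Snowflake ID encodes to up to 11 Base62 characters; 8 characters gives 62^8 ≈ 218 trillion combinations, which suits a random ID or a narrower custom ID layout. Alternative: random 8-char Base62 string, which is simpler but needs collision handling. A unique index on pasteId with retry on conflict is safer than a DB SELECT before INSERT, which is racy on its own. At 5M/day the per-insert collision probability stays small with 8 chars (about 1 in 120,000 after a year of pastes), so retries are rare. Snowflake is preferred for distributed generation without coordination.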
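
A minimal sketch of the Snowflake-to-Base62 step. The alphabet ordering, the function names, and the default epoch are assumptions; the bit layout (41-bit millisecond timestamp, 10-bit worker, 12-bit sequence) follows the commonly described Twitter Snowflake scheme.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


def snowflake_id(timestamp_ms: int, worker_id: int, sequence: int,
                 epoch_ms: int = 0) -> int:
    """Pack timestamp, worker, and sequence into one integer (Snowflake-style)."""
    if not 0 <= worker_id < 1024:
        raise ValueError("worker_id must fit in 10 bits")
    if not 0 <= sequence < 4096:
        raise ValueError("sequence must fit in 12 bits")
    return ((timestamp_ms - epoch_ms) << 22) | (worker_id << 12) | sequence


if __name__ == "__main__":
    raw = snowflake_id(timestamp_ms=1_700_000_000_000, worker_id=7, sequence=42)
    paste_id = base62_encode(raw)
    print(paste_id, len(paste_id))
    print(base62_decode(paste_id) == raw)
```

Because a Snowflake ID carries a millisecond timestamp in its high bits, the encoded string is longer than 8 characters unless the epoch is recent or the layout is narrowed.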

🗄️ Content Storage Strategy

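Small pastes (< 1KB): store inline in MySQL for fast retrieval. Larger pastes (≥ 1KB): store in S3, with MySQL holding the s3Key. S3 key = pasteId, which is simpler than content-addressed keys (hash of the content) at the cost of deduplication. CDN caches public paste content at the edge with Cache-Control: max-age=3600, kept short because pastes may be edited or deleted; a paste that expires sooner needs a max-age no longer than its remaining lifetime. Private pastes: skip the CDN and serve through a short-lived S3 presigned URL after the access check.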
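
The two decisions in this section (where to store a paste, and how to let the CDN cache it) can be sketched as pure functions. The names storage_target and cache_control are hypothetical, treating unlisted pastes like private ones is an assumption, and capping max-age at the remaining lifetime is an added safeguard rather than something the design above specifies.

```python
from typing import Optional

INLINE_THRESHOLD_BYTES = 1024   # 1KB cutoff from the storage strategy
DEFAULT_MAX_AGE_SECONDS = 3600  # Cache-Control max-age for public pastes


def storage_target(content: str) -> str:
    """Pick the backing store based on encoded content size."""
    size = len(content.encode("utf-8"))
    return "mysql_inline" if size < INLINE_THRESHOLD_BYTES else "s3"


def cache_control(visibility: str,
                  seconds_until_expiry: Optional[int] = None) -> str:
    """Build a Cache-Control header value for a paste response."""
    if visibility != "public":
        # Assumption: anything not public bypasses shared caches entirely.
        return "private, no-store"
    max_age = DEFAULT_MAX_AGE_SECONDS
    if seconds_until_expiry is not None:
        # Never let the edge serve a paste past its own expiry.
        max_age = max(0, min(max_age, seconds_until_expiry))
    return f"public, max-age={max_age}"


if __name__ == "__main__":
    print(storage_target("short note"))
    print(storage_target("x" * 5000))
    print(cache_control("public"))
    print(cache_control("public", seconds_until_expiry=600))
    print(cache_control("private"))
```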

⏰ Paste Expiry

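Store expiresAt in MySQL. Three approaches: (1) TTL in Redis: automatic, but it only removes the cached copy, not the S3 object or the DB row. (2) Background scanner job: SELECT pasteId FROM pastes WHERE expiresAt < NOW() LIMIT 1000 every minute. This needs an index on expiresAt, and one batch of 1,000 per minute clears at most 1.44M pastes/day, below the 5M/day creation rate, so the job must loop until no rows remain or run in parallel. (3) Lazy deletion: check expiresAt on every read, return 404 if expired, clean up async. Production: lazy deletion for correctness plus a periodic scanner for storage reclamation.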
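
Lazy deletion, approach (3), can be sketched as follows. The in-memory dict stands in for the metadata table, the list stands in for an async cleanup queue, and the function names are hypothetical; a None return corresponds to a 404.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Stand-in for the metadata table: pasteId -> (content, expiresAt or None).
Metadata = Dict[str, Tuple[str, Optional[datetime]]]


def read_paste(paste_id: str, metadata: Metadata,
               cleanup_queue: List[str], now: datetime) -> Optional[str]:
    """Return content, or None (a 404) if the paste is missing or expired."""
    row = metadata.get(paste_id)
    if row is None:
        return None
    content, expires_at = row
    if expires_at is not None and expires_at <= now:
        # Do not delete on the read path; hand off to background cleanup.
        cleanup_queue.append(paste_id)
        return None
    return content


def run_cleanup(metadata: Metadata, cleanup_queue: List[str]) -> int:
    """Drain the queue and delete expired rows; returns the number removed."""
    removed = 0
    while cleanup_queue:
        paste_id = cleanup_queue.pop()
        if metadata.pop(paste_id, None) is not None:
            removed += 1
    return removed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    table: Metadata = {
        "old": ("stale", now - timedelta(minutes=1)),
        "new": ("fresh", now + timedelta(hours=1)),
        "keep": ("forever", None),
    }
    queue: List[str] = []
    print(read_paste("old", table, queue, now))
    print(read_paste("new", table, queue, now))
    print(run_cleanup(table, queue))
```

Pastes that are never read again are never queued this way, which is why the periodic scanner is still needed for storage reclamation.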

📊 Analytics & Rate Limiting

View count: async INCR in Redis → batch flush to MySQL every 5 minutes (avoids write storm). Rate limiting: free users = 10 pastes/hour, pro = 1000/hour. Implement with Redis sliding window counter. API key required for high-volume usage. Spam detection: same IP + same content hash within 60s = duplicate request, return existing paste.
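
A sliding window counter approximates a true sliding window by weighting the previous fixed window's count by how much of it still overlaps the current one. The sketch below keeps counts in a Python dict; in Redis the same idea would typically use one counter key per user per window with an expiry. The class name and the choice to pass the clock in explicitly are assumptions for testability.

```python
from typing import Dict, Tuple


class SlidingWindowCounter:
    """Approximate sliding-window rate limiter using two fixed windows."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # (key, window index) -> request count. Old windows are never
        # evicted in this sketch; a real store would expire them.
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        previous = self.counts.get((key, index - 1), 0)
        current = self.counts.get((key, index), 0)
        # Weight the previous window by the share still inside the window.
        estimated = previous * (1.0 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self.counts[(key, index)] = current + 1
        return True


if __name__ == "__main__":
    free_tier = SlidingWindowCounter(limit=10, window_seconds=3600)
    results = [free_tier.allow("user-1", now=0.0) for _ in range(11)]
    print(results.count(True), results.count(False))
    # Halfway through the next hour, half of the old count still applies.
    print(free_tier.allow("user-1", now=5400.0))
```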

Scaling considerations

S3 for content removes storage capacity as a scaling concern
Redis cache for hot pastes (LRU eviction, 80/20 rule applies)
MySQL sharded by pasteId hash for write distribution
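CDN for public paste reads: a cache hit avoids the round trip to origin
Background cleanup job for expired pastes; S3 lifecycle rules can help, but they expire objects by prefix or tag after a fixed age, not by an arbitrary per-paste timestamp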
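
Sharding by pasteId hash, from the list above, can be sketched with a stable hash. The function name shard_for and the shard count are illustrative; SHA-256 is used only because Python's built-in hash() is randomized per process and would route the same ID differently across servers.

```python
import hashlib
from collections import Counter


def shard_for(paste_id: str, num_shards: int) -> int:
    """Map a pasteId to a shard index with a process-independent hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(paste_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    ids = [f"paste{i}" for i in range(10_000)]
    distribution = Counter(shard_for(pid, 4) for pid in ids)
    print(sorted(distribution.items()))
```

Plain modulo sharding remaps most keys when the shard count changes; consistent hashing or a directory of logical shards is the usual way around that, at the cost of more moving parts.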

What interviewers expect by level

Junior: Describe create + read flow, S3 for content storage, MySQL for metadata, short URL generation.
Mid: Redis caching strategy, expiry with TTL/scanner, rate limiting, S3 vs inline storage decision.
Senior: Full pipeline with CDN, view count anti-patterns, lazy deletion trade-offs, dedup optimization.
Staff: Cost analysis: S3 Intelligent-Tiering for cold pastes, CDN cost vs origin cost at 50M reads/day.
