📋 Design Pastebin — System Design Interview Guide
Easy · Storage & Encoding
Design a text-sharing service like Pastebin where users can paste text/code and share it via a short URL, with optional expiry, syntax highlighting, and access control.
Functional requirements
- Users can paste any text (code, logs, notes) and get a short URL
- Pastes can be public, private, or unlisted
- Optional expiry: 10 min, 1 hour, 1 day, 1 week, never
- Syntax highlighting for 50+ programming languages
- View count and paste history for registered users
Non-functional requirements & scale
- 5M new pastes per day; 50M reads per day (10:1 read:write)
- Paste content size: up to 10MB
- Short URL must not conflict with existing pastes
- Read latency < 50ms for popular pastes (cached)
- Data durability: no paste loss unless explicitly expired
Capacity estimation
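Similar to URL Shortener, but each record stores content (up to 10MB) rather than just a URL. Worst case: 5M pastes/day × 10MB = 50TB/day. A more realistic average of 10KB per paste gives 5M × 10KB = 50GB/day, roughly 18TB/year. Averaged over a day, traffic is about 58 writes/s and 580 reads/s. Use object storage (S3) for content and a relational DB for metadata. The 10:1 read:write ratio makes aggressive caching worthwhile.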
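The arithmetic above can be checked with a short script. This is a back-of-envelope sketch: the 10KB average is the guide's own assumption, decimal units (1KB = 1000 bytes) are assumed here, and the variable names are illustrative.

```python
# Back-of-envelope capacity figures derived from the stated requirements.
PASTES_PER_DAY = 5_000_000
READS_PER_DAY = 50_000_000
MAX_PASTE_BYTES = 10 * 10**6   # 10MB upper bound (decimal units assumed)
AVG_PASTE_BYTES = 10 * 10**3   # 10KB assumed average
SECONDS_PER_DAY = 86_400

# Average request rates, ignoring peaks.
write_qps = PASTES_PER_DAY / SECONDS_PER_DAY
read_qps = READS_PER_DAY / SECONDS_PER_DAY

# Daily storage growth: worst case vs. the assumed average.
worst_case_tb_per_day = PASTES_PER_DAY * MAX_PASTE_BYTES / 10**12
typical_gb_per_day = PASTES_PER_DAY * AVG_PASTE_BYTES / 10**9
typical_tb_per_year = typical_gb_per_day * 365 / 1000

print(f"writes/s ~ {write_qps:.0f}, reads/s ~ {read_qps:.0f}")
print(f"worst case: {worst_case_tb_per_day:.0f} TB/day")
print(f"typical: {typical_gb_per_day:.0f} GB/day, {typical_tb_per_year:.2f} TB/year")
```

These are daily averages; real traffic is bursty, so a peak multiplier would be applied on top in an interview.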
Core entities
- Paste — pasteId (Base62), userId?, title, language, visibility, expiresAt, viewCount, createdAt
- PasteContent — s3Key (= pasteId), content (stored in S3, keyed by pasteId)
- User — userId, username, email, apiKey, plan (free/pro)
API design
- Create paste: POST /api/v1/pastes. Body: { content, title?, language?, visibility, expiresIn? }. Returns { pasteId, url }.
- View paste: GET /:pasteId. Returns metadata + content (from S3 or cache). Increments viewCount asynchronously.
- Delete paste (owner only): DELETE /api/v1/pastes/:pasteId. Removes the paste from S3, DB, and cache.
- List the user's pastes, with pagination: GET /api/v1/users/me/pastes.
High-level design
Create: generate pasteId (Base62 of Snowflake ID), write metadata to MySQL, upload content to S3. Read: check Redis cache (pasteId → content), miss → fetch from S3 + populate cache. Expiry: background TTL worker scans expired pastes and deletes from S3 + DB.
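A minimal sketch of the create and read paths, using in-memory dictionaries as stand-ins for MySQL, S3, and Redis. All names here (create_paste, read_paste, the three stores) are hypothetical, and the counter-based ID is only a placeholder for the real ID generator. One deliberate choice in this sketch: content is written before metadata, so a metadata row never points at a missing object.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

@dataclass
class PasteMeta:
    paste_id: str
    language: Optional[str]
    visibility: str

metadata_db = {}    # stand-in for MySQL: paste_id -> PasteMeta
object_store = {}   # stand-in for S3: paste_id -> content
cache = {}          # stand-in for Redis: paste_id -> content

_counter = itertools.count(1)  # placeholder for the real ID generator

def create_paste(content, language=None, visibility="public"):
    """Store content, then metadata, and return the new paste ID."""
    paste_id = f"p{next(_counter)}"
    object_store[paste_id] = content
    metadata_db[paste_id] = PasteMeta(paste_id, language, visibility)
    return paste_id

def read_paste(paste_id):
    """Cache-aside read: try the cache, fall back to the object store."""
    if paste_id in cache:
        return cache[paste_id]
    if paste_id not in metadata_db:
        return None  # unknown paste; the API layer would return 404
    content = object_store[paste_id]
    cache[paste_id] = content  # populate the cache on a miss
    return content
```

The first read of a paste misses the cache and fills it; later reads are served from the cache until the entry is evicted or invalidated.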
Deep dives
🔑 ID Generation
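Same as URL Shortener: Snowflake ID → Base62 encode. An 8-character Base62 string gives 62^8 ≈ 218 trillion combinations, but a full 64-bit Snowflake ID needs up to 11 Base62 characters, so 8-character IDs require a shorter ID (under about 47 bits). Alternative: a random 8-char Base62 string. It is simpler but needs a collision check; a unique constraint on pasteId with retry on conflict avoids the race inherent in SELECT-before-INSERT. At 5M/day, the per-insert collision probability with 8 chars is small (about 1 in 120,000 after a year, with roughly 1.8 billion IDs in use) but not zero, so the check is still required. Snowflake is preferred for distributed generation without coordination.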
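A minimal Base62 encoder and decoder, as a sketch. The alphabet ordering (digits, then uppercase, then lowercase) is an assumption; any fixed ordering of the 62 symbols works as long as encoder and decoder agree.

```python
import string

# Assumed symbol order: 0-9, A-Z, a-z (62 symbols in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def base62_decode(s):
    """Decode a Base62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

With this encoding, the largest value that fits in 8 characters is 62^8 - 1, while the largest 63-bit integer encodes to 11 characters, which is worth keeping in mind when quoting URL lengths for Snowflake-style IDs.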
🗄️ Content Storage Strategy
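Small pastes (< 1KB): store inline in MySQL for fast retrieval. Larger pastes (1KB and above): store in S3, with MySQL holding the s3Key. S3 key = pasteId: this is simpler than content-addressed keys (a hash of the content), at the cost of losing deduplication. The CDN caches public paste content at the edge. Cache-Control: max-age=3600 keeps edge copies short-lived, since pastes may be edited or deleted; a paste that expires sooner than that needs max-age capped at its remaining lifetime. Private pastes: skip the CDN and serve from S3 via a pre-signed URL.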
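The storage and caching decision can be sketched as a single function. The function name, the returned dictionary shape, and the handling of unlisted pastes (treated like private ones) are assumptions made for illustration.

```python
INLINE_THRESHOLD_BYTES = 1024  # the 1KB cut-off from the text

def storage_plan(content, visibility):
    """Decide where a paste body lives and how it may be cached."""
    size = len(content.encode("utf-8"))
    backend = "mysql_inline" if size < INLINE_THRESHOLD_BYTES else "s3"
    if visibility == "public":
        # Public pastes may be cached at the CDN edge for up to an hour.
        cache_control = "public, max-age=3600"
        via_cdn = True
    else:
        # Assumption: unlisted pastes are treated like private ones here.
        cache_control = "private, no-store"
        via_cdn = False
    return {
        "backend": backend,
        "size_bytes": size,
        "via_cdn": via_cdn,
        "cache_control": cache_control,
    }
```

Measuring the encoded byte length rather than the character count matters for non-ASCII text, where one character can take several bytes.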
⏰ Paste Expiry
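Store expiresAt in MySQL. Three approaches: (1) TTL in Redis: automatic, but it only removes the cached copy, not the S3 object or the DB row. (2) Background scanner job: run SELECT pasteId FROM pastes WHERE expiresAt < NOW() LIMIT 1000 every minute (table name illustrative). This needs an index on expiresAt to avoid a full table scan, and a 1000-row batch per minute (1.44M/day) falls behind if most of the 5M daily pastes carry an expiry. (3) Lazy deletion: check expiresAt on every read, return 404 if expired, clean up async. Production: lazy deletion + periodic scanner for storage reclamation.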
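Lazy deletion can be sketched as a check on the read path. The option keys ("10min", "1hour", and so on), the dictionary-shaped paste record, and the list used as a cleanup queue are all hypothetical stand-ins.

```python
from datetime import datetime, timedelta, timezone

# Expiry choices from the requirements; the key spellings are assumed.
EXPIRY_OPTIONS = {
    "10min": timedelta(minutes=10),
    "1hour": timedelta(hours=1),
    "1day": timedelta(days=1),
    "1week": timedelta(weeks=1),
    "never": None,
}

def compute_expires_at(expires_in, now):
    """Translate an expiry option into an absolute timestamp (or None)."""
    delta = EXPIRY_OPTIONS[expires_in]
    return None if delta is None else now + delta

def is_expired(expires_at, now):
    return expires_at is not None and expires_at <= now

def read_with_lazy_expiry(paste, now, cleanup_queue):
    """Return (status, content); expired pastes yield 404 and are queued."""
    if is_expired(paste["expiresAt"], now):
        cleanup_queue.append(paste["pasteId"])  # actual deletion happens async
        return 404, None
    return 200, paste["content"]
```

Readers never see an expired paste even if the scanner is behind, and the scanner only has to reclaim storage, not enforce correctness.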
📊 Analytics & Rate Limiting
View count: async INCR in Redis → batch flush to MySQL every 5 minutes (avoids write storm). Rate limiting: free users = 10 pastes/hour, pro = 1000/hour. Implement with Redis sliding window counter. API key required for high-volume usage. Spam detection: same IP + same content hash within 60s = duplicate request, return existing paste.
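A sliding window counter can be sketched in memory as follows. It estimates the request count over the last window by weighting the previous fixed window's count by how much of it still overlaps. The class name and the plain dictionary are assumptions; in the design above the per-window counts would live in Redis keys with a TTL, which this sketch does not model.

```python
class SlidingWindowCounter:
    """Approximate sliding-window rate limiter using two fixed windows."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window_index) -> request count

    def allow(self, key, now):
        """Return True and record the request if the key is under its limit."""
        idx = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        prev = self.counts.get((key, idx - 1), 0)
        curr = self.counts.get((key, idx), 0)
        # Weight the previous window by the share still inside the sliding window.
        estimated = prev * (1 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self.counts[(key, idx)] = curr + 1
        return True

# Limits from the text: free = 10 pastes/hour, pro = 1000 pastes/hour.
free_limiter = SlidingWindowCounter(limit=10, window_seconds=3600)
pro_limiter = SlidingWindowCounter(limit=1000, window_seconds=3600)
```

Compared with a fixed window, this avoids the burst of up to twice the limit around a window boundary, while storing only two counters per user rather than a timestamp per request.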
Scaling considerations
- S3 for content removes storage capacity as a scaling concern (cost still grows with retained data)
- Redis cache for hot pastes (LRU eviction, 80/20 rule applies)
- MySQL sharded by pasteId hash for write distribution
- CDN for public paste reads — cache-hit eliminates server roundtrip
- Background cleanup job for expired pastes; S3 object lifecycle policies (filtered by tag or key prefix) can cover the day-granularity expiry tiers
What interviewers expect by level
- Junior: Describe create + read flow, S3 for content storage, MySQL for metadata, short URL generation.
- Mid: Redis caching strategy, expiry with TTL/scanner, rate limiting, S3 vs inline storage decision.
- Senior: Full pipeline with CDN, view count anti-patterns, lazy deletion trade-offs, dedup optimization.
- Staff: Cost analysis: S3 Intelligent-Tiering for cold pastes, CDN cost vs origin cost at 50M reads/day.