📋 Design Pastebin — System Design Interview Guide

Easy · Storage & Encoding

Design a text-sharing service like Pastebin where users can paste text/code and share it via a short URL, with optional expiry, syntax highlighting, and access control.


Functional requirements

Non-functional requirements & scale

Capacity estimation

Similar to a URL shortener, but each paste stores content (up to 10MB) rather than just a URL. 5M pastes/day × 10MB max = 50TB/day theoretical maximum; a realistic 10KB average gives 50GB/day. Use object storage (S3) for content and a relational DB for metadata. The 10:1 read/write ratio makes the workload read-heavy, so cache aggressively.
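The arithmetic above can be checked with a short script. This is a back-of-the-envelope sketch: it assumes decimal units (1KB = 1,000 bytes) and evenly spread traffic, and the per-second and per-year figures are derived here rather than stated in the guide.

```python
PASTES_PER_DAY = 5_000_000
MAX_PASTE_BYTES = 10 * 10**6   # 10MB upper bound per paste (decimal units assumed)
AVG_PASTE_BYTES = 10 * 10**3   # 10KB realistic average
READ_WRITE_RATIO = 10          # 10 reads for every write
SECONDS_PER_DAY = 86_400


def capacity_estimate() -> dict:
    """Return daily storage and per-second request estimates."""
    writes_per_sec = PASTES_PER_DAY / SECONDS_PER_DAY
    realistic_gb_per_day = PASTES_PER_DAY * AVG_PASTE_BYTES / 10**9
    return {
        "worst_case_tb_per_day": PASTES_PER_DAY * MAX_PASTE_BYTES / 10**12,
        "realistic_gb_per_day": realistic_gb_per_day,
        "realistic_tb_per_year": realistic_gb_per_day * 365 / 1000,
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": writes_per_sec * READ_WRITE_RATIO,
    }


if __name__ == "__main__":
    for name, value in capacity_estimate().items():
        print(f"{name}: {value:,.2f}")
```

On average this works out to roughly 58 writes and 580 reads per second, so peak traffic rather than average throughput is what sizes the cache and the database.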

Core entities

API design

High-level design

Create: generate a pasteId (Base62 of a Snowflake ID), write metadata to MySQL, and upload the content to S3. Read: check the Redis cache (pasteId → content); on a miss, fetch from S3 and populate the cache. Expiry: a background TTL worker scans for expired pastes and deletes them from S3 and the DB.
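The create and read paths can be sketched with in-memory stand-ins. Everything here is illustrative: the three dictionaries stand in for MySQL, S3, and Redis, the function names are hypothetical, and the placeholder IDs are not real Base62 Snowflake IDs. Uploading the content before writing metadata is one possible ordering, chosen so a metadata row never points at a missing object.

```python
import itertools

metadata_db = {}   # stand-in for MySQL: pasteId -> metadata row
object_store = {}  # stand-in for S3: key -> content bytes
cache = {}         # stand-in for Redis: pasteId -> content bytes
_id_counter = itertools.count(1)


def create_paste(content: bytes) -> str:
    """Store content and metadata, returning the new paste ID."""
    paste_id = f"p{next(_id_counter)}"  # placeholder for a Base62 Snowflake ID
    object_store[paste_id] = content    # upload content first (assumed ordering)
    metadata_db[paste_id] = {"s3_key": paste_id, "size": len(content)}
    return paste_id


def read_paste(paste_id: str):
    """Read-through cache: serve from cache, else load from storage and populate."""
    if paste_id in cache:
        return cache[paste_id]
    meta = metadata_db.get(paste_id)
    if meta is None:
        return None                     # unknown paste, would map to HTTP 404
    content = object_store[meta["s3_key"]]
    cache[paste_id] = content           # populate the cache on a miss
    return content
```

A first call to `read_paste` for a new ID misses the cache and loads from the object store; later calls are served from the cache.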

Deep dives

🔑 ID Generation

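Same as the URL Shortener: Snowflake ID → Base62 encode. 8 characters give 62^8 ≈ 218 trillion combinations. Note that a full 64-bit Snowflake ID can take up to 11 Base62 characters, so an 8-character ID requires a value below 62^8 (about 47 bits). Alternative: a random 8-char Base62 string, which is simpler but needs a collision check (a DB SELECT before INSERT, or a unique constraint with retry on conflict). At 5M/day, the chance that any single new ID collides stays small with 8 chars (on the order of 1 in 100,000 even after a year of pastes), so retries are rare. Snowflake is preferred for distributed generation without coordination.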
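A minimal Base62 encoder and decoder makes the length trade-off concrete. The alphabet ordering (digits, uppercase, lowercase) is an assumption, since the guide does not specify one; any fixed ordering of the 62 characters works as long as the encoder and decoder agree.

```python
# Assumed alphabet ordering: digits, then uppercase, then lowercase.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def base62_decode(s: str) -> int:
    """Decode a Base62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

The largest value that fits in 8 characters is 62^8 - 1, while the largest positive 64-bit value (2^63 - 1) encodes to 11 characters.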

🗄️ Content Storage Strategy

Small pastes (< 1KB): store inline in MySQL for fast retrieval. Larger pastes (≥ 1KB): store in S3, with MySQL holding the s3Key. S3 key = pasteId (simpler than content-addressed keys, but gives up deduplication). A CDN caches public paste content at the edge, with Cache-Control: max-age=3600 kept short because pastes may be edited or deleted. Private pastes: skip the CDN and serve via an S3 pre-signed URL.
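The placement rule can be expressed as a small decision function. This is a sketch under assumptions: the 1KB threshold is taken as 1,024 bytes, `plan_storage` and its field names are hypothetical, and the `private, no-store` header for private pastes is one reasonable way to keep them out of shared caches, not something the guide prescribes.

```python
INLINE_THRESHOLD_BYTES = 1024  # assumed meaning of "1KB"


def plan_storage(paste_id: str, content: bytes, is_private: bool) -> dict:
    """Decide where a paste's content lives and which cache header to send."""
    inline = len(content) < INLINE_THRESHOLD_BYTES
    return {
        # Small pastes are stored in the metadata row itself.
        "inline_content": content if inline else None,
        # Larger pastes go to object storage, keyed by pasteId.
        "s3_key": None if inline else paste_id,
        # Public pastes may be cached at the edge for an hour;
        # private pastes bypass shared caches (assumed header).
        "cache_control": "private, no-store" if is_private
        else "public, max-age=3600",
    }
```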

⏰ Paste Expiry

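Store expiresAt in MySQL. Three approaches: (1) TTL in Redis: automatic, but it only evicts from the cache, not from S3 or MySQL. (2) Background scanner job: SELECT pasteId FROM pastes WHERE expiresAt < NOW() LIMIT 1000 every minute (table name illustrative). This needs an index on expiresAt, and 1000 rows per minute is only about 1.44M deletions/day, below the 5M/day creation rate, so the batch size or frequency must grow with volume. (3) Lazy deletion: check expiresAt on every read, return 404 if expired, and clean up asynchronously. Production: lazy deletion plus a periodic scanner for storage reclamation.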
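The production combination (lazy deletion on read plus a periodic sweep) might look like the sketch below. It uses an in-memory dictionary in place of MySQL and S3, and the names `read_with_expiry`, `sweep_expired`, and `cleanup_queue` are hypothetical. The current time is passed in explicitly to keep the logic testable.

```python
from datetime import datetime, timedelta, timezone

pastes = {}         # stand-in for MySQL + S3: pasteId -> {"content", "expires_at"}
cleanup_queue = []  # stand-in for an async cleanup queue


def _is_expired(row: dict, now: datetime) -> bool:
    # A paste with expires_at = None never expires.
    return row["expires_at"] is not None and row["expires_at"] < now


def read_with_expiry(paste_id: str, now: datetime):
    """Lazy deletion: an expired paste reads as 404 and is queued for cleanup."""
    row = pastes.get(paste_id)
    if row is None:
        return 404, None
    if _is_expired(row, now):
        cleanup_queue.append(paste_id)
        return 404, None
    return 200, row["content"]


def sweep_expired(now: datetime, batch_size: int = 1000) -> int:
    """Periodic scanner: reclaim storage for up to batch_size expired pastes."""
    expired = [pid for pid, row in pastes.items() if _is_expired(row, now)]
    for pid in expired[:batch_size]:
        del pastes[pid]
    return min(len(expired), batch_size)
```

Readers never see an expired paste even if the sweep is running behind, which is why the sweep's cadence only affects storage cost and not correctness.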

📊 Analytics & Rate Limiting

View count: async INCR in Redis, then batch flush to MySQL every 5 minutes (avoids a write storm). Rate limiting: free users = 10 pastes/hour, pro = 1000/hour, implemented with a Redis sliding window counter. An API key is required for high-volume usage. Spam detection: the same IP plus the same content hash within 60s is treated as a duplicate request and returns the existing paste.
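The sliding window counter can be sketched in memory to show the estimate it computes. This is an illustration, not a Redis implementation: the class name and the dictionary are assumptions, and in Redis the per-window counters would be keys incremented with INCR and given an expiry so old windows disappear on their own.

```python
class SlidingWindowCounter:
    """Approximate sliding window using two adjacent fixed-window counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window_index) -> requests seen in that window

    def allow(self, key: str, now: float) -> bool:
        idx = int(now // self.window)
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window) / self.window
        current = self.counts.get((key, idx), 0)
        previous = self.counts.get((key, idx - 1), 0)
        # Weight the previous window by how much of it still overlaps
        # the sliding window ending at `now`.
        estimated = current + previous * (1 - elapsed)
        if estimated >= self.limit:
            return False
        self.counts[(key, idx)] = current + 1
        return True


free_tier = SlidingWindowCounter(limit=10, window_seconds=3600)
```

With the free-tier limit, a user who creates 10 pastes at the start of one hour is allowed 5 more at the halfway point of the next hour, because half of the previous window still counts against them.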

Scaling considerations

What interviewers expect by level
