☁️ Design Google Drive / Dropbox — System Design Interview Guide
Hard · File Storage & Sync
Design a cloud file storage service like Google Drive or Dropbox where users can upload, sync, share, and collaborate on files across multiple devices.
Functional requirements
- Upload files of any type and size (up to 5GB)
- Sync files across multiple devices automatically
- Share files/folders with other users (view/edit permissions)
- Version history: restore previous versions
- Collaborative editing for supported formats (Docs, Sheets)
- Offline access with delta sync when reconnecting
Non-functional requirements & scale
- 1B users; 15 GB free storage per user = 15 EB total
- Upload throughput: support 1GB file upload reliably
- Sync latency: change on one device visible on others < 5 seconds
- Storage efficiency: deduplicate identical files (same content hash)
- Files must never be lost: durable storage (11 nines of durability, the level S3 is designed for)
- Download p99 < 500ms for files in CDN cache
Capacity estimation
Core challenge: sync. When a user edits a file on a laptop, their phone must see the change in under 5 seconds. For large files, don't re-upload the whole file: split it into blocks and sync only the blocks that changed (the idea behind rsync). Deduplication: identical content across users is stored once (content-addressed storage).
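The scale figures above can be sanity-checked with quick arithmetic. The sketch below uses decimal units and assumes every user fills the free quota (an upper bound) and that blocks are 4 MB; the variable names are illustrative only.

```python
# Back-of-envelope capacity check (decimal units, upper bound).
MB = 10**6
GB = 10**9
EB = 10**18

users = 1_000_000_000
quota_per_user = 15 * GB
block_size = 4 * MB  # assumed block size

total_bytes = users * quota_per_user      # raw capacity if every quota is full
total_blocks = total_bytes // block_size  # block records the metadata layer must index

print(total_bytes / EB)        # capacity in exabytes
print(f"{total_blocks:.2e}")   # number of 4 MB blocks
```

The block count matters as much as the byte count: trillions of block records is what pushes the metadata store toward sharding.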
Core entities
- File — fileId, ownerId, name, mimeType, size, parentFolderId, currentVersionId, createdAt
- FileVersion — versionId, fileId, blockIds[] (content hash list), size, createdAt, createdBy
- Block — blockId (SHA-256 of content), content, size, refCount (dedup reference count)
- Folder — folderId, ownerId, name, parentId, sharedWith[], createdAt
- SyncEvent — eventId, userId, deviceId, fileId, changeType, timestamp
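As a rough illustration, three of the entities above could be modelled as Python dataclasses. This is a sketch rather than a schema: field names are snake_case versions of the ones listed, and the `from_content` helper is an assumption added for convenience.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Block:
    block_id: str       # SHA-256 hex digest of the content
    size: int
    ref_count: int = 0  # how many file versions reference this block

    @classmethod
    def from_content(cls, content: bytes) -> "Block":
        # Hypothetical helper: derive the ID from the bytes themselves.
        return cls(hashlib.sha256(content).hexdigest(), len(content))


@dataclass
class FileVersion:
    version_id: str
    file_id: str
    block_ids: List[str]  # ordered list of content hashes
    size: int
    created_by: str
    created_at: datetime = field(default_factory=_now)


@dataclass
class File:
    file_id: str
    owner_id: str
    name: str
    mime_type: str
    parent_folder_id: Optional[str]
    current_version_id: Optional[str] = None
    created_at: datetime = field(default_factory=_now)


block = Block.from_content(b"hello")
version = FileVersion("v1", "f1", [block.block_id], block.size, created_by="u1")
doc = File("f1", "u1", "notes.txt", "text/plain", parent_folder_id=None,
           current_version_id=version.version_id)
```

Keeping the block list on the version rather than on the file is what makes version history cheap: restoring an old version only repoints `current_version_id`.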
API design
- POST /api/v1/files/upload-session: Initialize resumable upload. Returns { uploadId, blockUrls[] } for chunked upload.
- PUT /upload/:uploadId/blocks/:blockId: Upload individual block. Content-addressed: skip if block already exists.
- GET /api/v1/sync/delta?cursor=: Get all changes since cursor (Dropbox-style delta sync). Returns { changes[], newCursor }.
- GET /api/v1/files/:fileId/download: Get pre-signed S3 URL for file download.
High-level design
Upload: chunk file → hash each block → upload only new blocks to S3 → commit file metadata. Sync: long-poll or WebSocket for change notifications → client fetches delta → downloads changed blocks from CDN.
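The "downloads changed blocks" step comes down to comparing two block lists. A minimal sketch, assuming the client knows the block IDs it already holds and has fetched the block list of the new version (`blocks_to_fetch` is a hypothetical name):

```python
def blocks_to_fetch(local_blocks, remote_blocks):
    """Return the remote block IDs the client lacks, in order, without duplicates."""
    have = set(local_blocks)
    needed = []
    for block_id in remote_blocks:
        if block_id not in have:
            needed.append(block_id)
            have.add(block_id)  # avoid requesting a repeated block twice
    return needed


old_version = ["a1", "b2", "c3"]
new_version = ["a1", "x9", "c3", "d4"]
print(blocks_to_fetch(old_version, new_version))  # ['x9', 'd4']
```

With fixed-size blocks, an insertion near the start of a file shifts every later block boundary, so most hashes change; rsync's rolling checksum and content-defined chunking exist to limit that effect.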
Deep dives
🧩 Chunked Upload & Dedup
Split the file into 4 MB blocks and compute the SHA-256 hash of each one. Check whether each block already exists by querying the block metadata DB, and upload only the new blocks. The S3 key is the block hash (content-addressed). Result: if 1,000 users upload the same 100 MB file, it is stored once (100 MB) instead of 1,000 times (100 GB), a 99.9% saving for that file. The gain is largest on widely duplicated files (OS images, shared assets).
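The chunk, hash, and skip-if-present flow can be sketched in a few lines. Here `BlockStore` is an in-memory stand-in for the block metadata DB plus S3, and the example uses a small block size so it runs instantly; both are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB


class BlockStore:
    """In-memory stand-in for the block metadata DB and object storage."""

    def __init__(self):
        self.blocks = {}     # block_id -> bytes
        self.ref_count = {}  # block_id -> number of references

    def put(self, data):
        block_id = hashlib.sha256(data).hexdigest()
        is_new = block_id not in self.blocks
        if is_new:
            self.blocks[block_id] = data  # only new content is stored
        self.ref_count[block_id] = self.ref_count.get(block_id, 0) + 1
        return block_id, is_new


def upload(store, content, block_size=BLOCK_SIZE):
    """Return (ordered block IDs, bytes actually transferred)."""
    block_ids, transferred = [], 0
    for offset in range(0, len(content), block_size):
        chunk = content[offset:offset + block_size]
        block_id, is_new = store.put(chunk)
        block_ids.append(block_id)
        if is_new:
            transferred += len(chunk)
    return block_ids, transferred


store = BlockStore()
payload = b"".join(bytes([i]) * 4096 for i in range(4))  # four distinct 4 KiB blocks
ids_a, first = upload(store, payload, block_size=4096)
ids_b, second = upload(store, payload, block_size=4096)
print(first, second)  # 16384 0
```

The second upload transfers nothing because every block hash is already known. A real client would ask the server which hashes are missing before sending any bytes.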
🔄 Delta Sync Algorithm
The client stores a local cursor marking its last sync point. On reconnect: GET /api/v1/sync/delta?cursor=<cursor>. The server returns all file events (create/update/delete) since that cursor, plus a new cursor. The client applies the changes: fetch the block list for each created or updated file, download the missing blocks from the CDN, and delete removed files. This handles offline periods of days. The cursor is a server-issued opaque token, not a raw timestamp.
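One way to picture the cursor is as an encoded position in an append-only change log. The sketch below assumes a per-user in-memory log and a base64-encoded sequence number; `ChangeLog` and its methods are hypothetical, and base64 only hides the format from clients (a real service would likely sign or encrypt the token).

```python
import base64
import json


class ChangeLog:
    """Append-only change log; the cursor encodes a position in it."""

    def __init__(self):
        self.events = []

    def append(self, file_id, change_type):
        self.events.append({"fileId": file_id, "changeType": change_type})

    @staticmethod
    def _encode(seq):
        raw = json.dumps({"seq": seq}).encode("utf-8")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @staticmethod
    def _decode(cursor):
        if not cursor:
            return 0  # no cursor means "from the beginning"
        return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["seq"]

    def delta(self, cursor=None):
        seq = self._decode(cursor)
        return {
            "changes": self.events[seq:],
            "newCursor": self._encode(len(self.events)),
        }


log = ChangeLog()
log.append("f1", "create")
first = log.delta()                      # initial sync
log.append("f1", "update")
log.append("f2", "delete")
second = log.delta(first["newCursor"])   # only what happened since
print([c["changeType"] for c in second["changes"]])  # ['update', 'delete']
```

Because the token is opaque, the server can later change what it encodes (a sequence number, a log offset, a shard map) without breaking clients.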
📂 Shared Folders
A shared folder appears as a virtual copy (a mount) in each member's namespace, while the underlying storage is shared (same block IDs). On change: notify every user with access. Check permissions on every API call (the caller must be the owner or appear in the shared ACL). Conflict: simultaneous edits produce a conflict copy (OriginalFile (Conflicted Copy by John).docx).
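The conflict rule can be expressed as an optimistic version check: an edit is accepted only if it was based on the file's current version, otherwise it is saved under a conflict name. A minimal sketch, with the file represented as a plain dict and both function names hypothetical:

```python
import os


def conflict_copy_name(filename, user):
    stem, ext = os.path.splitext(filename)
    return f"{stem} (Conflicted Copy by {user}){ext}"


def commit_edit(file, base_version, new_version, user):
    """Return the name the edit was saved under."""
    if base_version == file["currentVersionId"]:
        file["currentVersionId"] = new_version  # fast path: no concurrent edit
        return file["name"]
    # Someone else committed first: keep both by writing a separate copy.
    return conflict_copy_name(file["name"], user)


doc = {"name": "OriginalFile.docx", "currentVersionId": "v1"}
print(commit_edit(doc, "v1", "v2", "Alice"))  # OriginalFile.docx
print(commit_edit(doc, "v1", "v3", "John"))   # OriginalFile (Conflicted Copy by John).docx
```

This never merges content, which is why it suits opaque binary files; formats with collaborative editing need a merge strategy instead.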
🔁 Resumable Uploads
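A large file upload may fail midway. Solution: an upload session. POST /api/v1/files/upload-session returns an uploadId plus an upload URL per block. The client uploads blocks independently (parallelizable). On failure, it resumes with the blocks that have not yet been uploaded (the client tracks progress locally). If blocks are stored via S3 multipart upload, note that S3 keeps the parts of an incomplete upload (and bills for them) until the upload is completed or aborted; a bucket lifecycle rule can abort incomplete uploads automatically, for example after 7 days.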
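The resume logic is simple once progress is tracked per block. In the sketch below, `send` stands for whatever function performs the block PUT, and `flaky_send` simulates one network drop; all names are assumptions for illustration.

```python
def upload_blocks(block_ids, uploaded, send):
    """Send every block not yet in `uploaded`; stop at the first failure.

    Returns True once all blocks have been uploaded.
    """
    for block_id in block_ids:
        if block_id in uploaded:
            continue  # already done in an earlier attempt
        try:
            send(block_id)
        except ConnectionError:
            return False  # caller can retry later with the same `uploaded` set
        uploaded.add(block_id)
    return True


sent = []
failures = {"b3"}  # fail once on this block


def flaky_send(block_id):
    if block_id in failures:
        failures.discard(block_id)
        raise ConnectionError("simulated network drop")
    sent.append(block_id)


blocks = ["b1", "b2", "b3", "b4"]
progress = set()
print(upload_blocks(blocks, progress, flaky_send))  # False
print(upload_blocks(blocks, progress, flaky_send))  # True
print(sent)  # ['b1', 'b2', 'b3', 'b4']
```

Each block is transferred exactly once across both attempts. Persisting `progress` to disk would let the client resume after a restart as well.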
Scaling considerations
- Content-addressed block storage (S3) deduplicates at the block level: an identical block is stored once no matter how many files reference it
- MySQL sharded by userId for file/folder metadata
- Long-polling or WebSocket for sync notifications — choose per client capability
- CDN with large TTLs for blocks (content hash = immutable), short TTL for manifests
- Kafka partitioned by userId — ordered change events per user for correct sync
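Both the metadata sharding and the event partitioning above rely on mapping a userId to a stable bucket. A minimal sketch of that mapping follows; it uses SHA-256 purely for illustration and is not Kafka's own partitioner (the Java client's default hashes the key with murmur2). `partition_for` is a hypothetical name.

```python
import hashlib


def partition_for(user_id, num_partitions):
    """Map a user ID to a stable partition (or shard) index."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# All events for one user land in the same partition, so their order is preserved.
events = [("user-42", "create"), ("user-7", "update"), ("user-42", "delete")]
for user_id, change in events:
    print(user_id, change, partition_for(user_id, 12))
```

Ordering is guaranteed only within a partition, so keying by userId gives per-user ordering. Changing `num_partitions` remaps most users, which is one reason the partition count is usually fixed up front.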
What interviewers expect by level
- Junior: Describe file upload to S3, metadata in DB, basic sharing. Know why chunking is needed for large files.
- Mid: Block-level deduplication, resumable uploads, delta sync with cursor, CDN for downloads.
- Senior: Content-addressed storage, conflict handling, shared folder fan-out, offline sync edge cases.
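- Staff: 15 EB at 11-nines durability, cross-region replication strategy, cost-per-GB optimization, GDPR deletion under deduplication (a block shared with other users cannot simply be wiped; remove the user's references and delete the block once its refCount reaches zero).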