☁️ Design Google Drive / Dropbox — System Design Interview Guide

Hard · File Storage & Sync

Design a cloud file storage service like Google Drive or Dropbox where users can upload, sync, share, and collaborate on files across multiple devices.


Functional requirements

Upload files of any type and size (up to 5GB)
Sync files across multiple devices automatically
Share files/folders with other users (view/edit permissions)
Version history: restore previous versions
Collaborative editing for supported formats (Docs, Sheets)
Offline access with delta sync when reconnecting

Non-functional requirements & scale

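1B users; 15 GB free storage per user = up to 15 EB total (worst case: every user fills the quota, before deduplication)
Upload reliability: files of 1 GB and larger (up to the 5 GB limit) must upload reliably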
Sync latency: change on one device visible on others < 5 seconds
Storage efficiency: deduplicate identical files (same content hash)
Files must never be lost: durable storage targeting 11 nines (99.999999999%), the durability S3 is designed for
Download p99 < 500ms for files in CDN cache

Capacity estimation

Core challenge: sync. When a user edits a file on a laptop, the mobile client must see the change in under 5 seconds. For large files, do not re-upload the whole file: split it into blocks and sync only the changed blocks (similar in spirit to rsync). Deduplication: identical content across users is stored once (content-addressed storage).
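The headline numbers from the requirements can be checked with a few lines of arithmetic. This is a back-of-envelope sketch that assumes decimal units (1 GB = 10^9 bytes), every user filling the full quota, and the 4 MB block size used later in the dedup deep dive; the function names are illustrative only.

```python
# Back-of-envelope capacity estimate using decimal units (1 GB = 10**9 bytes).
USERS = 1_000_000_000      # 1B users (from the requirements)
QUOTA_BYTES = 15 * 10**9   # 15 GB free quota per user
BLOCK_BYTES = 4 * 10**6    # 4 MB blocks (assumed here to be decimal megabytes)


def worst_case_storage_bytes(users: int, quota: int) -> int:
    """Upper bound: every user fills the entire quota and nothing deduplicates."""
    return users * quota


def block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_bytes)


total = worst_case_storage_bytes(USERS, QUOTA_BYTES)
print(total / 10**18, "EB")                       # 15.0 EB
print(block_count(total, BLOCK_BYTES), "blocks")  # 3750000000000 blocks
```

Real usage sits well below this bound, since most users do not fill their quota and deduplication removes repeated content, but the bound shows that block metadata alone runs to trillions of rows.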

Core entities

File — fileId, ownerId, name, mimeType, size, parentFolderId, currentVersionId, createdAt
FileVersion — versionId, fileId, blockIds[] (content hash list), size, createdAt, createdBy
Block — blockId (SHA-256 of content), content, size, refCount (dedup reference count)
Folder — folderId, ownerId, name, parentId, sharedWith[], createdAt
SyncEvent — eventId, userId, deviceId, fileId, changeType, timestamp
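A minimal sketch of how three of these entities might look as Python dataclasses. Field names mirror the list above in snake_case; the restore_version helper is a hypothetical illustration of version history, assuming a restore simply repoints the file at an older version instead of copying any blocks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    block_id: str        # SHA-256 hex digest of the block content
    size: int
    ref_count: int = 0   # how many file versions reference this block


@dataclass
class FileVersion:
    version_id: str
    file_id: str
    block_ids: List[str]  # ordered content hashes that make up the file
    size: int
    created_by: str


@dataclass
class File:
    file_id: str
    owner_id: str
    name: str
    parent_folder_id: str
    current_version_id: str = ""
    versions: List[FileVersion] = field(default_factory=list)


def commit_version(file: File, version: FileVersion) -> None:
    """Record a new version and make it the current one."""
    if version.file_id != file.file_id:
        raise ValueError("version belongs to a different file")
    file.versions.append(version)
    file.current_version_id = version.version_id


def restore_version(file: File, version_id: str) -> None:
    """Hypothetical restore: repoint the file at an existing older version."""
    if all(v.version_id != version_id for v in file.versions):
        raise KeyError(version_id)
    file.current_version_id = version_id
```

Because a version is only a list of block hashes, keeping history is cheap: unchanged blocks are shared between versions.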

API design

POST /api/v1/files/upload-session — Initialize resumable upload. Returns { uploadId, blockUrls[] } for chunked upload.
PUT /upload/:uploadId/blocks/:blockId — Upload individual block. Content-addressed — skip if block already exists.
GET /api/v1/sync/delta?cursor= — Get all changes since cursor (Dropbox-style delta sync). Returns { changes[], newCursor }.
GET /api/v1/files/:fileId/download — Get pre-signed S3 URL for file download.

High-level design

Upload: chunk file → hash each block → upload only new blocks to S3 → commit file metadata. Sync: long-poll or WebSocket for change notifications → client fetches delta → downloads changed blocks from CDN.

Deep dives

🧩 Chunked Upload & Dedup

Split the file into 4 MB blocks. Compute the SHA-256 hash of each block. Check whether each block already exists by querying the block metadata DB. Upload only new blocks. S3 key = block hash (content-addressed). Result: if 1,000 users upload the same 100 MB file, it is stored once (100 MB) instead of 1,000 times (100 GB), a 99.9% saving for that file. Savings are largest on widely duplicated content (OS images, shared assets).
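The chunk, hash, and skip-if-present flow can be sketched with an in-memory store. BlockStore here is a hypothetical stand-in for S3 plus the block metadata DB, not a real client; the block size is a parameter so the example can run on tiny inputs.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks


class BlockStore:
    """In-memory stand-in for S3 (blocks) plus the block metadata DB (ref counts)."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.ref_counts: Dict[str, int] = {}

    def put_file(self, data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
        """Split data into blocks, store only unseen blocks, return the block ID list."""
        block_ids: List[str] = []
        for offset in range(0, len(data), block_size):
            chunk = data[offset:offset + block_size]
            block_id = hashlib.sha256(chunk).hexdigest()
            if block_id not in self.blocks:  # content-addressed: skip known blocks
                self.blocks[block_id] = chunk
            self.ref_counts[block_id] = self.ref_counts.get(block_id, 0) + 1
            block_ids.append(block_id)
        return block_ids

    def get_file(self, block_ids: List[str]) -> bytes:
        """Reassemble a file from its ordered block IDs."""
        return b"".join(self.blocks[b] for b in block_ids)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.blocks.values())
```

Uploading the same bytes twice returns the same block ID list and leaves stored_bytes() unchanged; only the reference counts grow.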

🔄 Delta Sync Algorithm

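Client stores the cursor it received on its last sync. On reconnect: GET /api/v1/sync/delta?cursor=<cursor>. Server returns all file events (create/update/delete) since that cursor, plus a new cursor. Client applies the changes: fetch the updated block list for each new or changed file, download the blocks it is missing from the CDN, and delete removed files. This handles offline periods of days. The cursor is a server-issued opaque token, not a raw timestamp, so the client stores it and sends it back without interpreting it.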
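A minimal sketch of the cursor mechanism, assuming the server keeps an append-only change log and encodes a sequence number inside the token. ChangeLog and its methods are hypothetical names; a production cursor would normally be signed or encrypted so clients cannot forge it.

```python
import base64
import json
from typing import Dict, List, Tuple


class ChangeLog:
    """Append-only per-user change log with opaque, server-issued cursors."""

    def __init__(self) -> None:
        self._events: List[Dict] = []

    def append(self, file_id: str, change_type: str) -> None:
        self._events.append(
            {"seq": len(self._events) + 1, "fileId": file_id, "changeType": change_type}
        )

    @staticmethod
    def _encode(seq: int) -> str:
        raw = json.dumps({"seq": seq}).encode("utf-8")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @staticmethod
    def _decode(cursor: str) -> int:
        if not cursor:  # empty cursor means "sync from the beginning"
            return 0
        return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["seq"]

    def delta(self, cursor: str = "") -> Tuple[List[Dict], str]:
        """Return (changes since cursor, new cursor)."""
        seq = self._decode(cursor)
        changes = self._events[seq:]
        return changes, self._encode(len(self._events))
```

A client calls delta() with its stored cursor, applies the returned changes in order, then persists the new cursor only after the changes have been applied, so a crash mid-sync replays instead of skipping events.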

📂 Shared Folders

A shared folder appears in each member's namespace as a virtual copy (a reference, not a physical duplicate); the underlying storage is shared (same block IDs). On change, notify all users with access. Check permissions on every API call (the caller must own the item or appear in its shared ACL). Conflict: simultaneous edits produce a conflict copy, for example OriginalFile (Conflicted Copy by John).docx.
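The permission check and the conflict-copy rule can be sketched as below. All names are hypothetical, and the conflict test assumes each write carries the version ID it was based on, so a stale base version signals a concurrent edit.

```python
import os
from typing import Dict

# Higher rank implies the lower permissions as well (edit implies view).
PERMISSION_RANK = {"view": 1, "edit": 2}


def can_access(user_id: str, owner_id: str, shared_with: Dict[str, str], required: str) -> bool:
    """Owner always passes; otherwise the ACL entry must grant at least `required`."""
    if user_id == owner_id:
        return True
    granted = shared_with.get(user_id)
    return granted is not None and PERMISSION_RANK[granted] >= PERMISSION_RANK[required]


def conflict_copy_name(filename: str, editor: str) -> str:
    """Insert the conflict marker before the file extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem} (Conflicted Copy by {editor}){ext}"


def name_for_write(base_version_id: str, current_version_id: str, filename: str, editor: str) -> str:
    """Save in place if the write is based on the current version, else as a conflict copy."""
    if base_version_id == current_version_id:
        return filename
    return conflict_copy_name(filename, editor)
```

Writing a conflict copy never discards either user's work, at the cost of leaving the merge to a human.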

🔁 Resumable Uploads

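A large file upload may fail midway. Solution: an upload session. POST /api/v1/files/upload-session returns an uploadId plus a per-block upload target (blockUrls[]). Client uploads blocks independently (parallelizable). On failure: resume from the last successfully uploaded block (client tracks local progress). If blocks go through S3 multipart upload, note that S3 does not expire incomplete uploads on its own: add a lifecycle rule (AbortIncompleteMultipartUpload, for example after 7 days) so abandoned parts are cleaned up.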
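A sketch of the session bookkeeping that makes resumption possible. UploadSession is a hypothetical class standing in for whichever side tracks progress; it verifies each block against its content hash and treats repeated uploads of the same block as harmless.

```python
import hashlib
from typing import List, Set


class UploadSession:
    """Tracks which blocks of a planned upload have arrived."""

    def __init__(self, upload_id: str, block_ids: List[str]) -> None:
        self.upload_id = upload_id
        self.expected: List[str] = list(block_ids)  # ordered block hashes
        self.received: Set[str] = set()

    def put_block(self, block_id: str, content: bytes) -> None:
        """Accept one block; idempotent, so retries after a timeout are safe."""
        if block_id not in self.expected:
            raise KeyError(f"block {block_id} is not part of this session")
        if hashlib.sha256(content).hexdigest() != block_id:
            raise ValueError("content does not match its block hash")
        self.received.add(block_id)

    def missing(self) -> List[str]:
        """Blocks still to upload, in file order; this is the resume point."""
        return [b for b in self.expected if b not in self.received]

    def is_complete(self) -> bool:
        return not self.missing()
```

After a dropped connection the client asks for missing() and uploads only those blocks; the file metadata is committed once is_complete() returns True.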

Scaling considerations

Content-addressed block storage (S3) deduplicates across all users: each unique block is stored once no matter how many files reference it
MySQL sharded by userId for file/folder metadata
Long-polling or WebSocket for sync notifications — choose per client capability
CDN with large TTLs for blocks (content hash = immutable), short TTL for manifests
Kafka partitioned by userId — ordered change events per user for correct sync
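Both the MySQL sharding and the Kafka partitioning above rely on the same idea: a stable hash of userId picks the shard or partition, so all of one user's rows and events land in one place and stay ordered. The sketch below uses SHA-256 purely for illustration; Kafka's default partitioner applies a murmur2 hash to the message key, and partition_for is a hypothetical helper.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a userId to a stable partition (or shard) index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the count.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With plain modulo hashing, changing the partition count remaps most users, so the count is usually fixed up front or replaced with consistent hashing when resharding is expected.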

What interviewers expect by level

Junior: Describe file upload to S3, metadata in DB, basic sharing. Know why chunking is needed for large files.
Mid: Block-level deduplication, resumable uploads, delta sync with cursor, CDN for downloads.
Senior: Content-addressed storage, conflict handling, shared folder fan-out, offline sync edge cases.
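Staff: 15 EB at 11-nines durability, cross-region replication strategy, cost per GB optimization, GDPR deletion (with deduplicated blocks, remove the user's references and delete a block only when its refCount reaches 0; zeroing a shared block would corrupt other users' files).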
