📝 Design Google Docs — System Design Interview Guide

Hard · Real-Time Collaboration

Design a collaborative document editor like Google Docs where multiple users can edit the same document simultaneously with real-time updates and conflict resolution.

Functional requirements

Non-functional requirements & scale

Capacity estimation

Core challenge: concurrent edits from multiple users. If User A deletes the char at pos 5 while User B concurrently inserts at pos 6, applying both ops as received (or resolving with naive last-write-wins) corrupts the document: B's insert lands one character off, or one user's edit is silently lost. Operational Transformation (OT) and CRDTs both solve convergence. At Google scale: 10M concurrent WebSocket connections.
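The divergence in the pos 5 / pos 6 scenario can be reproduced in a few lines. This is only an illustrative sketch: the helpers `delete_at` and `insert_at` and the sample string are assumptions made for the demo, not part of any real editor API.

```python
# Each replica applies its own op first, then the remote op exactly as
# received, with no position adjustment. The two replicas end up different.

def delete_at(text: str, pos: int) -> str:
    """Remove the character at 0-based index pos."""
    return text[:pos] + text[pos + 1:]

def insert_at(text: str, pos: int, ch: str) -> str:
    """Insert ch so that it ends up at 0-based index pos."""
    return text[:pos] + ch + text[pos:]

base = "abcdefgh"

# User A's replica: local delete at 5, then B's insert at 6 as received.
replica_a = insert_at(delete_at(base, 5), 6, "X")

# User B's replica: local insert at 6, then A's delete at 5 as received.
replica_b = delete_at(insert_at(base, 6, "X"), 5)

print(replica_a)  # abcdegXh
print(replica_b)  # abcdeXgh
```

B meant to insert "X" right after "f". Once "f" is gone, the intended result is "abcdeXgh", so replica A is both wrong and out of sync with replica B.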

Core entities

API design

High-level design

Client connects via WebSocket to Doc Service. User types → generates Operation → sent to server → OT applied against concurrent ops → broadcast to all collaborators → persisted to DB. Periodic snapshots reduce replay time on load.
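The flow above can be sketched as a single `submit` method on the server. This is a minimal in-memory sketch under stated assumptions: the class name `DocService`, the insert-only op shape `(position, text)`, the `shift_insert` transform, and the callback-based broadcast are all hypothetical stand-ins for the real WebSocket, OT, and database layers.

```python
from typing import Callable, List, Tuple

Insert = Tuple[int, str]  # (position, text); insert-only to keep the sketch short

def shift_insert(op: Insert, earlier: Insert) -> Insert:
    """Insert-only transform: move op right if `earlier` landed at or before it."""
    pos, text = op
    e_pos, e_text = earlier
    return (pos + len(e_text), text) if e_pos <= pos else op

class DocService:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.log: List[Insert] = []  # stands in for the persisted op log
        self.subscribers: List[Callable[[int, Insert], None]] = []

    def submit(self, op: Insert, base_revision: int) -> int:
        """Accept an op that the client created against base_revision."""
        # 1. Transform against every op the client had not seen yet.
        for earlier in self.log[base_revision:]:
            op = shift_insert(op, earlier)
        # 2. Apply to the live document and append to the log.
        pos, text = op
        self.text = self.text[:pos] + text + self.text[pos:]
        self.log.append(op)
        revision = len(self.log)
        # 3. Broadcast the transformed op to all collaborators.
        for notify in self.subscribers:
            notify(revision, op)
        return revision

svc = DocService("hello")
received: List[Tuple[int, Insert]] = []
svc.subscribers.append(lambda rev, op: received.append((rev, op)))

svc.submit((0, "A"), base_revision=0)   # first client prepends "A"
svc.submit((5, "!"), base_revision=0)   # second client appends "!" to "hello"
print(svc.text)  # Ahello!
```

The second client wrote its op against revision 0, so the server shifts it from position 5 to 6 before applying and broadcasting it.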

Deep dives

🔀 Operational Transformation (OT)

OT transforms concurrent operations so that every replica converges to the same document even when replicas apply those operations in different orders. Example: A deletes the char at pos 3; B concurrently inserts "x" at pos 5. When B's op arrives at the server after A's delete, the server transforms B's position to 4. Jupiter algorithm: the server serializes all ops; each client tracks the server revision and its local revision, and incoming ops are transformed against the ops by which the two sides have diverged.
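A minimal sketch of a character-level transform function, assuming only single-character inserts and deletes. The `Op` shape, the `site` tie-breaker, and the function names are illustrative assumptions; this is not the Jupiter algorithm itself, only the position-adjustment step it relies on.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Op:
    kind: str       # "ins" or "del"
    pos: int        # 0-based index
    ch: str = ""    # inserted character; unused for deletes
    site: int = 0   # hypothetical tie-breaker for inserts at the same position

def apply(text: str, op: Op) -> str:
    if op.kind == "ins":
        return text[:op.pos] + op.ch + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]

def transform(op: Op, against: Op) -> Optional[Op]:
    """Rewrite op so it can be applied after `against` has been applied.
    Returns None when op becomes a no-op (both sides deleted the same char)."""
    if against.kind == "ins":
        if op.kind == "ins":
            # Same position: the lower site id keeps the left slot.
            shift = against.pos < op.pos or (
                against.pos == op.pos and against.site < op.site)
        else:
            shift = against.pos <= op.pos
        return replace(op, pos=op.pos + 1) if shift else op
    # `against` is a delete.
    if op.kind == "del" and op.pos == against.pos:
        return None
    if against.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    return op

base = "abcdefgh"
a = Op("del", 3)                # A deletes the char at pos 3
b = Op("ins", 5, "x", site=2)   # B inserts "x" at pos 5

b_after_a = transform(b, a)     # B's insert, adjusted for A's delete
a_after_b = transform(a, b)     # A's delete, adjusted for B's insert

site_1 = apply(apply(base, a), b_after_a)
site_2 = apply(apply(base, b), a_after_b)
print(b_after_a.pos, site_1, site_2)  # 4 abcexfgh abcexfgh
```

Both application orders produce the same text, which is the convergence property the transform is meant to provide.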

📦 CRDT Alternative

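Conflict-free Replicated Data Types (CRDTs) like LSEQ or Logoot assign each character a unique, densely ordered position identifier (conceptually a fractional index). Characters are never moved. Many sequence CRDTs (e.g. WOOT, RGA) handle deletion by marking the character as a tombstone; Logoot and LSEQ can drop deleted characters outright but pay with position identifiers that grow longer over time. Merge = union of all character sets, sorted by position. Advantage: no central server needed (P2P possible). Disadvantage: metadata (tombstones or long identifiers) grows without bound unless it is periodically garbage-collected or rebalanced.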
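A toy sketch of the "fractional position plus tombstone" idea, assuming rational-number positions and a string site id for uniqueness. Real LSEQ and Logoot use variable-length identifier paths rather than fractions, so every name here (`ElemId`, `insert`, `merge`, `render`) is a hypothetical simplification.

```python
from fractions import Fraction
from typing import Dict, Tuple

ElemId = Tuple[Fraction, str]              # (fractional position, site id)
Replica = Dict[ElemId, Tuple[str, bool]]   # id -> (char, is_tombstone)

def insert(rep: Replica, left: Fraction, right: Fraction,
           ch: str, site: str) -> ElemId:
    """Place ch midway between two neighbouring positions."""
    elem_id = ((left + right) / 2, site)
    rep[elem_id] = (ch, False)
    return elem_id

def delete(rep: Replica, elem_id: ElemId) -> None:
    """Mark as tombstone; the element itself is never removed or moved."""
    ch, _ = rep[elem_id]
    rep[elem_id] = (ch, True)

def merge(a: Replica, b: Replica) -> Replica:
    """Union of both element sets; a delete on either side wins."""
    merged = dict(a)
    for elem_id, (ch, dead) in b.items():
        already_dead = merged.get(elem_id, (ch, False))[1]
        merged[elem_id] = (ch, dead or already_dead)
    return merged

def render(rep: Replica) -> str:
    """Sort by (position, site) and skip tombstones."""
    return "".join(ch for _, (ch, dead) in sorted(rep.items()) if not dead)

START, END = Fraction(0), Fraction(1)

shared: Replica = {}
a_id = insert(shared, START, END, "a", "s0")    # position 1/2
c_id = insert(shared, a_id[0], END, "c", "s0")  # position 3/4

rep_a, rep_b = dict(shared), dict(shared)
insert(rep_a, a_id[0], c_id[0], "b", "A")       # A inserts "b" between a and c
insert(rep_b, a_id[0], c_id[0], "x", "B")       # B inserts "x" in the same gap
delete(rep_b, c_id)                             # B also deletes "c"

print(render(merge(rep_a, rep_b)))  # abx
```

Both inserts pick position 5/8, and the site id breaks the tie deterministically, so merging in either direction gives the same text. The tombstone for "c" stays in the merged state, which is the growth the paragraph above warns about.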

💾 Revision History

Store every operation in an append-only log (e.g. Spanner or Bigtable). On load: fetch the latest snapshot + replay the ops recorded since that snapshot. Create a new snapshot every 1000 ops or 1 hour, whichever comes first. Snapshot = full document state at that revision. Version comparison: diff between two snapshot revisions. Storage: each op ~200 bytes; 1M ops × 200 bytes = 200 MB of op log per doc (for large, heavily edited docs).
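The snapshot-plus-replay load path can be sketched as follows. This is an in-memory sketch under assumptions: `RevisionStore`, the tuple op shape, and the list-backed log are hypothetical, and only the 1000-op trigger from the text is modelled (the 1-hour trigger is left out).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SNAPSHOT_EVERY_OPS = 1000      # threshold taken from the text

Op = Tuple[str, int, str]      # ("ins" | "del", position, char)

def apply(text: str, op: Op) -> str:
    kind, pos, ch = op
    if kind == "ins":
        return text[:pos] + ch + text[pos:]
    return text[:pos] + text[pos + 1:]

@dataclass
class RevisionStore:
    # Append-only log: revision n is the state after the first n ops.
    ops: List[Op] = field(default_factory=list)
    # (revision, full text) pairs, oldest first; revision 0 is the empty doc.
    snapshots: List[Tuple[int, str]] = field(default_factory=lambda: [(0, "")])

    def append(self, op: Op) -> int:
        self.ops.append(op)
        revision = len(self.ops)
        if revision - self.snapshots[-1][0] >= SNAPSHOT_EVERY_OPS:
            self.snapshots.append((revision, self.load(revision)))
        return revision

    def load(self, revision: Optional[int] = None) -> str:
        """Start from the newest snapshot at or before `revision`, then replay."""
        revision = len(self.ops) if revision is None else revision
        snap_rev, text = next(
            s for s in reversed(self.snapshots) if s[0] <= revision)
        for op in self.ops[snap_rev:revision]:
            text = apply(text, op)
        return text

store = RevisionStore()
for i in range(2500):
    store.append(("ins", i, "a"))   # append one char per op

print([rev for rev, _ in store.snapshots])  # [0, 1000, 2000]
print(len(store.load()))                    # 2500
```

Loading the latest revision replays only the 500 ops after the revision-2000 snapshot, and loading an older revision such as 1500 starts from the revision-1000 snapshot.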

📴 Offline Editing

Client stores ops locally (IndexedDB). On reconnect: client sends all offline ops together with the baseRevision they were created against. Server transforms the offline ops against every op committed while the client was offline. Conflict: OT resolves text conflicts automatically; for structural conflicts (e.g. a table was deleted on the server and edited offline) → prompt user.
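Rebasing a queue of offline ops can be sketched as below. It assumes insert-only text ops and a rule that the server's op wins position ties; `Insert`, `xform`, and `rebase` are hypothetical names, and the structural-conflict prompt is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

def xform(op: Insert, other: Insert, other_wins_tie: bool) -> Insert:
    """Shift op right if `other` inserts before it, or at the same
    position when `other` wins the tie."""
    if other.pos < op.pos or (other.pos == op.pos and other_wins_tie):
        return Insert(op.pos + len(other.text), op.text)
    return op

def rebase(pending: List[Insert], server_ops: List[Insert]) -> List[Insert]:
    """Transform offline ops (all created on top of baseRevision) so they
    apply cleanly after the server ops committed in the meantime."""
    rebased = []
    for p in pending:
        next_server = []
        for s in server_ops:
            p_new = xform(p, s, other_wins_tie=True)    # server wins ties
            s_new = xform(s, p, other_wins_tie=False)
            p = p_new
            next_server.append(s_new)
        rebased.append(p)
        # Later pending ops already include p, so compare them against
        # server ops that have been moved past p as well.
        server_ops = next_server
    return rebased

base = "hello"                                     # document at baseRevision
offline = [Insert(5, " world"), Insert(11, "!")]   # typed while offline
server = [Insert(5, "?")]                          # committed meanwhile

doc = base
for op in server:
    doc = apply(doc, op)
for op in rebase(offline, server):
    doc = apply(doc, op)
print(doc)  # hello? world!
```

The inner loop transforms in both directions because each later offline op was written on top of the earlier ones, so the server ops must be carried past every offline op already handled.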

Scaling considerations

What interviewers expect by level
