Design a real-time collaborative text editor (like Google Docs).
The core problem is conflict resolution on concurrent edits, solved with OT (Operational Transformation) or CRDTs. Around that: a sync server (WebSockets), local-first optimistic edits, presence/cursors, a document model with persistence, offline support, and per-user undo/redo. CRDTs (e.g. Yjs) are the modern go-to.
A collaborative editor's hard problem isn't the UI — it's conflict resolution: two people editing the same spot at the same time must converge to the same document.
The core: OT vs CRDT
Naively syncing raw text positions breaks instantly — if I insert a character at index 5 while you delete index 2, my "index 5" is now wrong. Two solutions:
Operational Transformation (OT) — represent edits as operations (insert, delete at position). When a remote op arrives that was made concurrently with local ops, you transform it against them so it applies correctly. Powerful but complex to implement correctly; usually needs a central server to order operations. (This is what Google Docs historically used.)
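To make the index-shifting problem and the transform step concrete, here is a minimal sketch restricted to single-character inserts and deletes on a plain string. The `Op` type, `applyOp`, `transform`, and the `otherWinsTies` flag are all hypothetical names chosen for illustration; production OT systems handle ranges, rich text, and server-assigned ordering, which this toy does not.

```typescript
// Toy single-character operations; hypothetical types, not a library API.
type Op =
  | { kind: "insert"; pos: number; ch: string }
  | { kind: "delete"; pos: number };

function applyOp(doc: string, op: Op): string {
  if (op.kind === "insert") {
    return doc.slice(0, op.pos) + op.ch + doc.slice(op.pos);
  }
  return doc.slice(0, op.pos) + doc.slice(op.pos + 1);
}

// Rewrites `op` so it can run after the concurrent `other` has been applied.
// Returns null when `op` has become a no-op. `otherWinsTies` settles two
// inserts at the same position; the two sites must pass opposite values.
function transform(op: Op, other: Op, otherWinsTies: boolean): Op | null {
  if (op.kind === "insert") {
    if (other.kind === "insert") {
      const shift = other.pos < op.pos || (other.pos === op.pos && otherWinsTies);
      return { kind: "insert", pos: shift ? op.pos + 1 : op.pos, ch: op.ch };
    }
    // A character deleted strictly before the insertion point pulls it left.
    return { kind: "insert", pos: other.pos < op.pos ? op.pos - 1 : op.pos, ch: op.ch };
  }
  if (other.kind === "insert") {
    // An insert at or before the target character pushes it right.
    return { kind: "delete", pos: other.pos <= op.pos ? op.pos + 1 : op.pos };
  }
  if (other.pos === op.pos) return null; // both sides deleted the same character
  return { kind: "delete", pos: other.pos < op.pos ? op.pos - 1 : op.pos };
}

const base = "abcdefgh";
const myInsert: Op = { kind: "insert", pos: 5, ch: "X" };
const yourDelete: Op = { kind: "delete", pos: 2 };

// Your replica: the delete is already applied when my insert arrives.
const naive = applyOp(applyOp(base, yourDelete), myInsert); // "X" lands one slot too far right
const moved = transform(myInsert, yourDelete, false);
const yours = moved ? applyOp(applyOp(base, yourDelete), moved) : applyOp(base, yourDelete);

// My replica: the insert is already applied when your delete arrives.
const movedDelete = transform(yourDelete, myInsert, true);
const mine = movedDelete ? applyOp(applyOp(base, myInsert), movedDelete) : applyOp(base, myInsert);
```

With the transform, both replicas end at the same string; replaying the raw insert after the delete does not. Even this four-case function needs a tie-break rule, which hints at why full OT is hard to get right.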
CRDTs (Conflict-free Replicated Data Types) are data structures designed so that concurrent edits always merge deterministically, regardless of the order in which they arrive and without a central coordinator. Each character gets a unique, stable identity, so positions don't shift out from under you. Yjs and Automerge are mature CRDT libraries and a common modern default: convergence is built into the data structure, they support offline and peer-to-peer editing, and you don't hand-roll the transform logic. The cost is extra per-character metadata.
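The "unique, stable identity" idea can be sketched with a toy sequence CRDT that gives every character an ID and a fractional position key, loosely in the spirit of position-identifier designs. This is an illustration under stated assumptions, not how Yjs or Automerge work internally: `CrdtItem` and `Replica` are hypothetical names, and floating-point keys would run out of precision in a real editor.

```typescript
// Hypothetical toy CRDT item: identity and position never change once created.
interface CrdtItem {
  id: string;       // globally unique: site name + local counter
  pos: number;      // fractional position key between 0 and 1
  ch: string;
  deleted: boolean; // tombstone; deletion is sticky
}

class Replica {
  private items = new Map<string, CrdtItem>();
  private counter = 0;
  private site: string;

  constructor(site: string) {
    this.site = site;
  }

  // Deterministic order on every replica: position key first, then ID.
  private ordered(): CrdtItem[] {
    return Array.from(this.items.values()).sort((a, b) => {
      if (a.pos !== b.pos) return a.pos - b.pos;
      return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
    });
  }

  private visible(): CrdtItem[] {
    return this.ordered().filter((item) => !item.deleted);
  }

  text(): string {
    return this.visible().map((item) => item.ch).join("");
  }

  // Local insert: pick a key halfway between the visible neighbours.
  insertAt(index: number, ch: string): CrdtItem {
    const vis = this.visible();
    const left = vis[index - 1]?.pos ?? 0;
    const right = vis[index]?.pos ?? 1;
    const item: CrdtItem = {
      id: `${this.site}:${this.counter++}`,
      pos: (left + right) / 2,
      ch,
      deleted: false,
    };
    this.items.set(item.id, item);
    return { ...item };
  }

  // Local delete: mark the character, keep its identity.
  deleteAt(index: number): CrdtItem {
    const target = this.visible()[index];
    if (!target) throw new RangeError(`no character at index ${index}`);
    target.deleted = true;
    return { ...target };
  }

  // Remote update: safe to apply in any order, any number of times.
  merge(remote: CrdtItem): void {
    const local = this.items.get(remote.id);
    if (local) local.deleted = local.deleted || remote.deleted;
    else this.items.set(remote.id, { ...remote });
  }
}

const alice = new Replica("alice");
const bob = new Replica("bob");
const itemH = alice.insertAt(0, "H");
const itemI = alice.insertAt(1, "i");
bob.merge(itemH);
bob.merge(itemI);

// Concurrent edits: alice appends "!" while bob deletes "H".
const bang = alice.insertAt(2, "!");
const gone = bob.deleteAt(0);
bob.merge(bang);
alice.merge(gone);
alice.merge(gone); // duplicate delivery is harmless
```

Because updates refer to IDs rather than indexes, neither replica has to transform anything: merging is a set union plus a sticky tombstone, so delivery order does not matter.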
The rest of the architecture
- Transport: WebSockets for low-latency, bidirectional sync of ops/updates. A sync server relays updates between clients; with CRDTs it can be a "dumb" relay, since merging happens on the clients.
- Local-first / optimistic edits: apply the user's edit to their local document immediately (the editor must feel instant), then sync in the background. Never wait for the server round-trip.
- Document model: a structured rich-text representation rather than a raw string, covering formatting, blocks, and so on. ProseMirror or Slate typically serves as the editor layer, often paired with Yjs.
- Persistence: the server persists the document (snapshots plus recent ops/updates) so it survives server restarts and client disconnects, and new joiners can load it.
- Presence: show who's online and their cursors/selections in real time. This is ephemeral state, synced separately from the document (Yjs has an "awareness" channel for it) and never persisted.
- Undo/redo: must be per-user. Undoing should revert my changes, not my collaborator's. CRDT libraries provide scoped undo managers.
- Offline support: local-first plus CRDT means edits made offline merge automatically on reconnect.
- Access control & versioning: permissions, document history/snapshots.
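The local-first and offline points in the list above can be sketched as a small outbox pattern: the edit lands in local state first, and the network is only a background concern. `SyncTransport`, `trySend`, and `LocalFirstText` are hypothetical names for illustration; a real WebSocket send is asynchronous and would need acknowledgements, which this simplified version replaces with a boolean.

```typescript
// Hypothetical transport: returns false when the connection is down.
interface SyncTransport {
  trySend(update: string): boolean;
}

class LocalFirstText {
  private content = "";
  private outbox: string[] = [];
  private transport: SyncTransport;

  constructor(transport: SyncTransport) {
    this.transport = transport;
  }

  // Called on every keystroke: local state changes before any network I/O.
  typeChar(ch: string): void {
    this.content += ch;
    this.outbox.push(ch);
    this.flush();
  }

  // Best-effort drain in order; call again when the connection returns.
  flush(): void {
    while (this.outbox.length > 0) {
      const next = this.outbox[0];
      if (next === undefined || !this.transport.trySend(next)) return;
      this.outbox.shift();
    }
  }

  text(): string {
    return this.content;
  }

  unsent(): number {
    return this.outbox.length;
  }
}

let online = false;
const sentToServer: string[] = [];
const transport: SyncTransport = {
  trySend(update: string): boolean {
    if (!online) return false;
    sentToServer.push(update);
    return true;
  },
};

const editor = new LocalFirstText(transport);
editor.typeChar("h"); // offline, but visible locally right away
editor.typeChar("i");
online = true;
editor.flush(); // reconnect: queued edits go out in order
```

The user sees "hi" while still offline, and the server receives the same two updates once the connection returns. With a CRDT underneath, those late updates would merge with whatever collaborators did in the meantime.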
The framing
"The hard part is conflict resolution on concurrent edits — raw position syncing breaks immediately. Two approaches: Operational Transformation, which transforms concurrent ops against each other but is complex and server-coordinated; or CRDTs, data structures where concurrent edits merge deterministically regardless of order. CRDTs — Yjs, Automerge — are the modern default: robust, offline-capable, no hand-rolled transform logic. Around that: WebSockets for sync, local-first optimistic edits so typing feels instant, a structured rich-text document model, server persistence with snapshots, a separate ephemeral channel for presence and cursors, and per-user scoped undo/redo."
Follow-up questions
- Why does naive position-based syncing break?
- OT vs CRDT: what are the trade-offs?
- Why must undo/redo be per-user?
- Why is presence synced separately from the document?
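On the last question, a short sketch may help show why presence is treated differently: it is last-write-wins, kept only in memory, and simply expires. `PeerPresence`, `PresenceTable`, and the 30-second default are assumptions made for this example and do not describe the Yjs awareness API.

```typescript
// Hypothetical presence record: only the latest value per user matters.
interface PeerPresence {
  user: string;
  cursor: number;
  seenAt: number; // timestamp in milliseconds
}

class PresenceTable {
  private peers = new Map<string, PeerPresence>();
  private ttlMs: number;

  constructor(ttlMs: number = 30000) {
    this.ttlMs = ttlMs;
  }

  // Overwrite the previous value; a dropped update is fixed by the next one.
  update(user: string, cursor: number, now: number): void {
    this.peers.set(user, { user, cursor, seenAt: now });
  }

  // Peers that stopped reporting are forgotten; nothing is written to storage.
  active(now: number): PeerPresence[] {
    const alive: PeerPresence[] = [];
    this.peers.forEach((peer, user) => {
      if (now - peer.seenAt > this.ttlMs) this.peers.delete(user);
      else alive.push(peer);
    });
    return alive;
  }
}

const presence = new PresenceTable(30000);
presence.update("ana", 4, 0);
presence.update("ben", 9, 1000);
presence.update("ana", 7, 20000); // newer cursor replaces the old one
const stillHere = presence.active(35000); // ben last reported 34 seconds ago
```

Contrast this with document updates, where every edit must be delivered and persisted. Losing a cursor position costs nothing, so presence can ride a cheaper, lossy channel.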
Common mistakes
- Syncing raw text/positions without OT or CRDT.
- Waiting for the server before applying the user's own edit (laggy typing).
- Persisting ephemeral presence/cursor state with the document.
- Global undo that reverts other people's changes.
- Underestimating OT's implementation complexity.
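For the global-undo mistake listed above, the fix is to scope undo history by author. The sketch below is a deliberately simplified, single-process, append-only model with hypothetical names (`TextItem`, `SharedText`); CRDT libraries ship their own scoped undo managers, and this is not their API.

```typescript
// Hypothetical shared text where every character remembers who typed it.
interface TextItem {
  ch: string;
  author: string;
  visible: boolean;
}

class SharedText {
  private items: TextItem[] = [];
  // One undo stack per author, holding indexes of that author's own inserts.
  private undoStacks = new Map<string, number[]>();

  append(author: string, ch: string): void {
    const index = this.items.length;
    this.items.push({ ch, author, visible: true });
    const stack = this.undoStacks.get(author) ?? [];
    stack.push(index);
    this.undoStacks.set(author, stack);
  }

  // Reverts the caller's most recent insert, never anyone else's.
  undo(author: string): boolean {
    const index = this.undoStacks.get(author)?.pop();
    if (index === undefined) return false;
    const item = this.items[index];
    if (item) item.visible = false;
    return true;
  }

  text(): string {
    return this.items.filter((item) => item.visible).map((item) => item.ch).join("");
  }
}

const shared = new SharedText();
shared.append("alice", "A");
shared.append("bob", "B"); // bob's edit is the most recent one overall
shared.undo("alice");      // still removes alice's "A", not bob's "B"
```

A single global stack would have popped Bob's "B" here, which is the behaviour users find surprising.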
Performance considerations
- Local-first edits take the network round-trip out of the typing path, so keystrokes render without waiting on the server.
- CRDT metadata grows with edit count; periodic snapshotting and garbage collection keep document size in check.
- Batch and debounce updates over the wire. Presence is high-frequency but ephemeral, so it can be lossy.
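Batching over the wire can be sketched as a small buffer that turns many tiny updates into fewer messages. `UpdateBatcher` and its `maxSize` parameter are hypothetical; the timer that would normally trigger `flush` is left out so the example stays deterministic.

```typescript
// Collects updates and hands them to `sendBatch` as one message.
class UpdateBatcher<T> {
  private buffer: T[] = [];
  private sendBatch: (batch: T[]) => void;
  private maxSize: number;

  constructor(sendBatch: (batch: T[]) => void, maxSize: number = 50) {
    this.sendBatch = sendBatch;
    this.maxSize = maxSize;
  }

  // Queue an update; send early if the buffer is full.
  push(update: T): void {
    this.buffer.push(update);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  // Send whatever is queued as a single batch.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sendBatch(batch);
  }
}

const wire: string[][] = [];
const batcher = new UpdateBatcher<string>((batch) => {
  wire.push(batch);
}, 3);

"hello".split("").forEach((ch) => batcher.push(ch)); // one full batch goes out
batcher.flush(); // the remainder goes out as a second batch
```

In an editor, `flush` would typically be called from a short timer (a few tens of milliseconds is a common starting point, though that figure is an assumption to tune), so five keystrokes cost two messages instead of five.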
Edge cases
- Two users editing the exact same character simultaneously.
- A user editing offline for a long time, then reconnecting.
- Network partition / reconnection and catching up on missed updates.
- A late joiner needing the full current document state.
- Large documents: snapshotting vs replaying all ops.
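The late-joiner, catch-up, and large-document cases above share one mechanism: a snapshot plus a short tail of recent updates. The sketch below uses a toy append-only update and hypothetical names (`DocUpdate`, `DocStore`, `since`); it shows the bookkeeping only, not a real storage layer.

```typescript
// Toy update: append one character, tagged with a sequence number.
interface DocUpdate {
  seq: number;
  ch: string;
}

class DocStore {
  private snapshot = { seq: 0, text: "" };
  private tail: DocUpdate[] = [];

  append(ch: string): DocUpdate {
    const last = this.tail[this.tail.length - 1];
    const update: DocUpdate = { seq: (last ? last.seq : this.snapshot.seq) + 1, ch };
    this.tail.push(update);
    return update;
  }

  // Fold the tail into the snapshot so loads don't replay the full history.
  compact(): void {
    this.snapshot = this.load();
    this.tail = [];
  }

  // Late joiner: snapshot plus whatever happened after it.
  load(): { seq: number; text: string } {
    let text = this.snapshot.text;
    let seq = this.snapshot.seq;
    for (const update of this.tail) {
      text += update.ch;
      seq = update.seq;
    }
    return { seq, text };
  }

  // Reconnecting client: updates after `seq`, or null if that history
  // was compacted away and the client must reload from the snapshot.
  since(seq: number): DocUpdate[] | null {
    if (seq < this.snapshot.seq) return null;
    return this.tail.filter((update) => update.seq > seq);
  }
}

const store = new DocStore();
store.append("a");
store.append("b");
store.append("c");
store.compact();
store.append("d");

const joined = store.load();   // late joiner gets the full current state
const missed = store.since(3); // short disconnect: just the missing update
const stale = store.since(1);  // long disconnect: history is gone, reload
```

With a CRDT, a client that comes back after a long offline stretch would also send its own pending edits and let them merge, rather than discarding local work on reload.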
Real-world examples
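- Google Docs (OT historically); Figma, Linear, and many other apps using CRDTs or CRDT-inspired sync engines. Figma's multiplayer, for example, is CRDT-inspired with a central server rather than a pure CRDT.
- Yjs + ProseMirror/Slate + a WebSocket provider as a common production stack.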