Build a collaborative text editor (Google Docs style)
Don't roll your own. Use a CRDT (Yjs) or an OT engine for conflict-free concurrent edits, a rich-text framework (ProseMirror via TipTap, Slate, or Lexical) for the editor, and a WebSocket or WebRTC transport with awareness for cursors. Render presence and remote cursors via decorations, and persist the document server-side. In 2026, CRDTs dominate where offline tolerance or decentralized topologies matter; OT remains the fit for centralized servers such as Google Docs.
This is a system-design question wrapped in a coding one. Three layers:
┌───────────────────────────────────────────────┐
│ UI: React + rich-text framework (TipTap)      │
│   ├─ contenteditable, selection, formatting   │
│   └─ Renders presence (cursors, avatars)      │
├───────────────────────────────────────────────┤
│ Sync engine: CRDT (Yjs) or OT (ShareDB)       │
│   ├─ Local mutation → emit op                 │
│   ├─ Remote op → merge into local doc         │
│   └─ Awareness (cursors, selection per user)  │
├───────────────────────────────────────────────┤
│ Transport: WebSocket (y-websocket) + server   │
│   ├─ Broadcast ops to other clients           │
│   └─ Persist to DB (Postgres / S3)            │
└───────────────────────────────────────────────┘
CRDTs vs OT.
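- OT (Operational Transformation, used by Google Docs): every op is transformed against the intervening ops on the server. In practice this requires a central authority to order ops, and it assumes clients stay connected with bounded latency, so long offline sessions are awkward.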
- CRDT (Conflict-free Replicated Data Type): ops are commutative — apply in any order, converge. Works peer-to-peer, offline-first, no central server required. Yjs is the dominant CRDT impl for text in 2026.
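To make the transform step concrete, here is a minimal sketch of OT for two concurrent plain-text inserts. It is a toy under stated assumptions, not how Google Docs or ShareDB implement it: `InsertOp`, `applyInsert`, and `transformInsert` are hypothetical names, deletes and rich text are ignored, and ties at the same position are broken by comparing site IDs.

```typescript
// Toy OT sketch: plain-text inserts only. All names are hypothetical.
type InsertOp = { pos: number; text: string; site: string };

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `applied`, which was made
// concurrently against the same base document.
function transformInsert(op: InsertOp, applied: InsertOp): InsertOp {
  const appliedGoesFirst =
    applied.pos < op.pos ||
    (applied.pos === op.pos && applied.site < op.site); // deterministic tie-break
  return appliedGoesFirst ? { ...op, pos: op.pos + applied.text.length } : op;
}

const base = "helo";
const fromAlice: InsertOp = { pos: 3, text: "l", site: "alice" };
const fromBob: InsertOp = { pos: 4, text: "!", site: "bob" };

// Each side applies its own op first, then the transformed remote op.
const viaAlice = applyInsert(applyInsert(base, fromAlice), transformInsert(fromBob, fromAlice));
const viaBob = applyInsert(applyInsert(base, fromBob), transformInsert(fromAlice, fromBob));
```

Both orders end at the same string, which is the convergence property the server-side transform exists to guarantee. A real engine also has to transform against deletes and against every op the client has not yet seen.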
For new projects, CRDT is the default. Yjs has a ~30KB runtime, mature bindings for ProseMirror/Slate/Quill/CodeMirror, and a tested WebSocket provider.
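The "apply in any order, converge" property can be illustrated with a toy state-based sequence CRDT. This is a sketch for intuition only and is not the Yjs algorithm: `Atom`, `Replica`, and the helper functions are hypothetical, positions are plain fractional numbers, and deletes are tombstones.

```typescript
// Toy sequence CRDT: a set of uniquely identified characters plus tombstones.
type Atom = { key: number; site: string; ch: string; deleted: boolean };
type Replica = Map<string, Atom>;

const atomId = (a: Atom): string => `${a.key}@${a.site}`;

function insertAtom(r: Replica, key: number, site: string, ch: string): Replica {
  const atom: Atom = { key, site, ch, deleted: false };
  return new Map(r).set(atomId(atom), atom);
}

function deleteAtom(r: Replica, id: string): Replica {
  const atom = r.get(id);
  return atom ? new Map(r).set(id, { ...atom, deleted: true }) : r;
}

// Merge is a set union where a tombstone on either side wins, so the
// operation is commutative, associative, and idempotent.
function mergeReplicas(a: Replica, b: Replica): Replica {
  const out = new Map(a);
  b.forEach((atom, id) => {
    const mine = out.get(id);
    out.set(id, mine ? { ...mine, deleted: mine.deleted || atom.deleted } : atom);
  });
  return out;
}

function renderText(r: Replica): string {
  return Array.from(r.values())
    .filter((a) => !a.deleted)
    .sort((x, y) => x.key - y.key || (x.site < y.site ? -1 : x.site > y.site ? 1 : 0))
    .map((a) => a.ch)
    .join("");
}

// Shared starting point: the text "ac".
let seed: Replica = new Map();
seed = insertAtom(seed, 1, "seed", "a");
seed = insertAtom(seed, 2, "seed", "c");

// Alice inserts "b" between the two characters; Bob concurrently deletes "c".
const aliceReplica = insertAtom(seed, 1.5, "alice", "b");
const bobReplica = deleteAtom(seed, "2@seed");

const mergedAliceFirst = renderText(mergeReplicas(aliceReplica, bobReplica));
const mergedBobFirst = renderText(mergeReplicas(bobReplica, aliceReplica));
```

Both merge orders render the same text without any server deciding an order. Production CRDTs such as Yjs use far more compact identifiers and encodings, but the convergence argument has the same shape.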
Minimal Yjs + TipTap example.
import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";
import { useEditor, EditorContent } from "@tiptap/react";
import Collaboration from "@tiptap/extension-collaboration";
import CollaborationCursor from "@tiptap/extension-collaboration-cursor";
import StarterKit from "@tiptap/starter-kit";

// One shared Yjs document per editor session; the provider syncs it over WebSocket.
// "wss://yws/" and "doc-id" are placeholders for your server URL and room name.
const ydoc = new Y.Doc();
const provider = new WebsocketProvider("wss://yws/", "doc-id", ydoc);

function Editor({ user }) {
  const editor = useEditor({
    extensions: [
      // Turn off the built-in history so Yjs handles undo
      // (option name as in TipTap v2; check it against your TipTap version).
      StarterKit.configure({ history: false }),
      // Bind the editor content to the shared document.
      Collaboration.configure({ document: ydoc }),
      // Render remote carets and selections; `user` carries the display name and color.
      CollaborationCursor.configure({ provider, user }),
    ],
  });
  return <EditorContent editor={editor} />;
}
That's it for the happy path. Yjs handles merging, TipTap handles rendering, the provider handles sync. Cursors and selections appear automatically.
The hard parts (where staff-level questions go).
1. Persistence and snapshots. Yjs can write its binary update stream to disk. But replaying 100k tiny ops on load is slow. Periodically snapshot the doc state (Y.encodeStateAsUpdate(doc)) and store the rolled-up version; new clients fetch the snapshot + ops since.
2. Awareness (cursors, presence). Separate from the document — ephemeral, broadcast, no persistence. Yjs's "awareness" protocol handles this. Show user color, name, live caret position.
3. Offline support. A user types offline; ops queue locally. On reconnect, sync up. CRDTs handle the merge automatically — that's the headline feature. Persist Yjs state in IndexedDB via y-indexeddb for survival across page reload.
4. Permissions / access control. Per-doc ACLs on the server. The WebSocket provider must authenticate; the server validates that the client may read/write before forwarding ops.
5. Conflict resolution semantics. CRDTs converge to some state — that doesn't mean it's the state any user expected. Concurrent "delete this paragraph" + "edit this paragraph" → depending on the CRDT, the edit may end up orphaned. Surface conflicts via revision history, not just final state.
6. Undo across users. Y.UndoManager scopes undo to the local user's changes — undo doesn't revert someone else's edit. This is what people expect; building it manually is the hard part.
7. Scale. As a rough rule of thumb, a single Yjs document is comfortable up to around 1MB, and multi-MB documents start to lag on mobile. Split very long docs into sub-documents. Shard rooms across servers and load-balance WebSockets by room ID.
8. Rendering performance. A naive integration re-renders on every remote op. Batch remote updates to roughly 30fps, and use ProseMirror's decoration API to draw cursors without re-rendering the doc body.
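For the rendering point in item 8, one way to cap render frequency is to coalesce remote updates and flush them once per scheduled tick. The sketch below is an assumption about how you might wire this up, not part of Yjs or ProseMirror: `createUpdateBatcher` and `Schedule` are hypothetical names, and the scheduler is injected so the same code works with a timer, an animation frame, or a manual queue as in the demo.

```typescript
// Coalesce many incoming updates into one flush per scheduled tick.
type Schedule = (run: () => void) => void;

function createUpdateBatcher<T>(
  flush: (batch: T[]) => void,
  schedule: Schedule,
): (update: T) => void {
  let pending: T[] = [];
  let scheduled = false;
  return function push(update: T): void {
    pending.push(update);
    if (scheduled) return; // a flush is already queued for this tick
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = [];
      scheduled = false;
      flush(batch);
    });
  };
}

// Demo with a manual scheduler so the behavior is easy to follow.
const rendered: string[][] = [];
const frameQueue: Array<() => void> = [];
const pushRemoteOp = createUpdateBatcher<string>(
  (batch) => { rendered.push(batch); },
  (run) => { frameQueue.push(run); },
);

pushRemoteOp("op-1");
pushRemoteOp("op-2");
pushRemoteOp("op-3");
frameQueue[0](); // one "frame" later, all three ops are rendered together
```

In a browser, passing `(run) => setTimeout(run, 33)` as the scheduler gives roughly 30 flushes per second, and `(run) => requestAnimationFrame(run)` ties flushes to the display refresh instead.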
Why not contenteditable directly? Browsers' contenteditable is inconsistent and produces weird DOM (Firefox vs WebKit differ on Enter handling, bold nesting, paste behavior). Frameworks like ProseMirror and Slate replace the editing model — they listen to events, mutate their own state, and re-render the DOM. Lexical is Meta's newer alternative.
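Build vs buy. Don't build the editor core. Use TipTap (ProseMirror + extensions), Lexical (Meta), or Slate. Pair with Yjs for collaboration. If you don't want to run sync infrastructure, pair with a hosted backend (Liveblocks, TipTap's hosted collaboration service, Convex); if you would rather self-host, Hocuspocus is the open-source Yjs WebSocket server to start from.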
Senior framing. The candidate's job in this question is to not pretend to invent CRDTs in an interview. Name the layers, name the choices (Yjs vs ShareDB, TipTap vs Lexical), discuss the hard parts (offline, permissions, presence, undo, snapshots), and pick build-vs-buy based on the team size.
Follow-up questions
- Yjs (CRDT) vs OT: when does each model win?
- How do you implement per-user undo in a collaborative editor?
- How would you scale WebSocket connections for millions of concurrent docs?
- What does Yjs's awareness protocol do?
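For the WebSocket scaling question, one common answer is to route by room ID so that every client editing a document lands on the same sync node. The sketch below uses rendezvous (highest-random-weight) hashing; it is an illustration, not a description of any particular provider, and `fnv1a`, `pickNode`, and the node names are hypothetical. The hash is FNV-1a applied to UTF-16 code units, which is good enough for spreading rooms but not a cryptographic choice.

```typescript
// 32-bit FNV-1a style hash over UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Rendezvous hashing: score every (node, room) pair and take the highest.
// Removing a node only moves the rooms that node owned.
function pickNode(roomId: string, nodes: string[]): string {
  if (nodes.length === 0) throw new Error("no sync nodes available");
  let best = "";
  let bestScore = -1;
  for (let i = 0; i < nodes.length; i++) {
    const score = fnv1a(nodes[i] + "|" + roomId);
    if (score > bestScore) {
      best = nodes[i];
      bestScore = score;
    }
  }
  return best;
}

const syncNodes = ["sync-a", "sync-b", "sync-c"];
const ownerOfDoc42 = pickNode("doc-42", syncNodes);
```

A load balancer or gateway would call `pickNode` with the room ID from the connection URL and proxy the WebSocket to the chosen node. When a node is removed, only its rooms are reassigned, which keeps reconnect storms small.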
Common mistakes
- Trying to implement CRDT merge logic from scratch in an interview.
- Building on top of `contenteditable` without a framework.
- Using two histories (local + remote) instead of trusting the CRDT.
- Not separating ephemeral presence from persistent document data.
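The last mistake, together with the access-control point made earlier, can be sketched as a single server-side relay function: presence messages are fanned out but never stored, and document updates are persisted only for clients allowed to write. Everything here is a hypothetical shape for illustration (`Role`, `Message`, `Room`, `relay`); real Yjs updates are binary `Uint8Array` payloads rather than strings, and the in-memory `log` stands in for durable storage.

```typescript
type Role = "viewer" | "editor";

type Message =
  | { kind: "update"; payload: string } // persistent document change
  | { kind: "awareness"; payload: string }; // ephemeral cursor and presence state

interface Room {
  log: string[]; // stand-in for durable storage
  broadcast: (msg: Message) => void; // fan-out to the other clients in the room
}

function relay(room: Room, role: Role, msg: Message): "ok" | "rejected" {
  if (msg.kind === "awareness") {
    room.broadcast(msg); // presence is broadcast only, never persisted
    return "ok";
  }
  if (role !== "editor") return "rejected"; // viewers may read but not write
  room.log.push(msg.payload); // persist before fanning out
  room.broadcast(msg);
  return "ok";
}

// Demo room that records what would have been sent to other clients.
const sentToPeers: Message[] = [];
const demoRoom: Room = { log: [], broadcast: (msg) => { sentToPeers.push(msg); } };

const cursorResult = relay(demoRoom, "viewer", { kind: "awareness", payload: "cursor@12" });
const viewerWrite = relay(demoRoom, "viewer", { kind: "update", payload: "insert 'x'" });
const editorWrite = relay(demoRoom, "editor", { kind: "update", payload: "insert 'y'" });
```

Keeping the two message kinds on separate code paths makes it hard to persist cursor noise by accident, and it puts the write check in the one place every update must pass through.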
Performance considerations
- Snapshot Yjs state periodically to avoid replaying full history on load.
- Decoration-based cursor rendering avoids re-renders of the doc body.
- Shard WebSocket rooms across nodes; pin clients in the same doc to the same node.
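The snapshot bullet can be sketched as a small storage policy: append each incoming update to a tail, and once the tail grows past a threshold, fold snapshot and tail into a new snapshot. The code is a generic illustration, not a Yjs API: `DocStore`, `appendUpdate`, and `loadDoc` are hypothetical, and the `merge` function is injected. With Yjs, a function with this shape is `Y.mergeUpdates`, applied to `Uint8Array` updates; the demo uses string concatenation purely to keep the example dependency-free.

```typescript
interface DocStore<U> {
  snapshot: U | null; // rolled-up state, or null before the first compaction
  tail: U[]; // updates received since the snapshot was taken
}

type Merge<U> = (updates: U[]) => U;

function appendUpdate<U>(store: DocStore<U>, update: U, merge: Merge<U>, maxTail: number): DocStore<U> {
  const tail = store.tail.concat([update]);
  if (tail.length < maxTail) return { snapshot: store.snapshot, tail };
  // Tail is long enough: fold everything into a fresh snapshot.
  const all = store.snapshot === null ? tail : [store.snapshot].concat(tail);
  return { snapshot: merge(all), tail: [] };
}

// What a newly connected client needs: the snapshot plus the updates since.
function loadDoc<U>(store: DocStore<U>, merge: Merge<U>): U | null {
  const all = store.snapshot === null ? store.tail : [store.snapshot].concat(store.tail);
  return all.length === 0 ? null : merge(all);
}

// Demo with strings standing in for binary updates.
const concatMerge: Merge<string> = (updates) => updates.join("");
let demoStore: DocStore<string> = { snapshot: null, tail: [] };
for (const update of ["a", "b", "c", "d"]) {
  demoStore = appendUpdate(demoStore, update, concatMerge, 3);
}
const loadedDoc = loadDoc(demoStore, concatMerge);
```

The threshold trades write amplification against load time: a small `maxTail` compacts often and loads fast, a large one writes less but replays more on open.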
Edge cases
- Network partition: both sides edit, then reconnect; the CRDT merges, but users may need a diff view.
- Large paste of 10MB content: chunk into ops to avoid blocking the main thread.
- Embeds (videos, attachments): store as block-level nodes with stable IDs.
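The large-paste case can be handled by slicing the pasted text and inserting one slice per tick, so input and painting get a turn between slices. The sketch below is one possible shape, with hypothetical names (`chunkText`, `pasteInChunks`, `Defer`); the insert callback and the scheduler are injected, so it makes no claim about any specific editor API. It avoids cutting a UTF-16 surrogate pair in half, which would otherwise corrupt emoji at chunk boundaries.

```typescript
// Split text into chunks of at most `chunkSize` UTF-16 code units,
// never separating the two halves of a surrogate pair.
function chunkText(text: string, chunkSize: number): string[] {
  if (chunkSize < 2) throw new Error("chunkSize must be at least 2");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    const lastUnit = text.charCodeAt(end - 1);
    const endsOnHighSurrogate = lastUnit >= 0xd800 && lastUnit <= 0xdbff;
    if (end < text.length && endsOnHighSurrogate) end -= 1;
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

type Defer = (next: () => void) => void;

// Insert one chunk, then hand control back to the scheduler before the next.
function pasteInChunks(
  text: string,
  insertAt: (offset: number, chunk: string) => void,
  chunkSize: number,
  defer: Defer,
): void {
  const chunks = chunkText(text, chunkSize);
  let index = 0;
  let offset = 0;
  const step = (): void => {
    if (index >= chunks.length) return;
    const chunk = chunks[index];
    index += 1;
    insertAt(offset, chunk);
    offset += chunk.length;
    defer(step);
  };
  step();
}
```

In a browser, `defer` could be `(next) => setTimeout(next, 0)`, and `insertAt` would wrap whatever insert call your editor or shared text type exposes. Each chunk then becomes its own op, which also keeps individual sync messages small.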
Real-world examples
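- Google Docs uses OT. Figma describes its multiplayer engine as CRDT-inspired but server-authoritative rather than a pure CRDT. Linear and Notion run their own sync engines, so check their current engineering write-ups before citing them as CRDT examples.
- Hocuspocus is an open-source Yjs WebSocket backend from the TipTap team; Liveblocks is a commercial collaboration service with Yjs support.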