Designing an offline-first frontend
Treat the local store as the source of truth; the network is an eventual-consistency partner. Cache shell assets via a Service Worker, persist data in IndexedDB (Dexie), queue mutations for replay on reconnect, and resolve merge conflicts via versioning or CRDTs. Surface offline state in the UI; never block the user on a request. Plan for partial connectivity (slow, flaky, captive portal), not just on/off.
Offline-first reframes the architecture: the network is unreliable; the local cache is authoritative. Every read hits local first, every write happens locally first and syncs in the background.
The four layers
┌────────────────────────────────────────────────┐
│ UI: reads from local store, never blocks       │
│ on network. Shows sync status.                 │
├────────────────────────────────────────────────┤
│ Local store: IndexedDB (Dexie / idb)           │
│ - structured data, mutations queue             │
├────────────────────────────────────────────────┤
│ Sync engine: background fetch + reconciliation │
│ - retry with backoff                           │
│ - conflict resolution                          │
├────────────────────────────────────────────────┤
│ Service Worker: app shell + asset cache        │
│ - cache-first for static, stale-while-         │
│   revalidate for HTML                          │
└────────────────────────────────────────────────┘
Layer 1: app shell via Service Worker
Cache the HTML, JS, CSS, fonts so the app loads with no network at all.
// sw.js
self.addEventListener("install", e => {
  e.waitUntil(caches.open("shell-v1").then(c => c.addAll(["/", "/app.js", "/app.css"])));
});

self.addEventListener("fetch", e => {
  if (e.request.destination === "document") {
    // stale-while-revalidate for the HTML
    e.respondWith(staleWhileRevalidate(e.request));
  } else if (e.request.destination === "script" || e.request.destination === "style") {
    // cache-first for hashed assets
    e.respondWith(caches.match(e.request).then(r => r || fetch(e.request)));
  }
});

Use Workbox in production — handles versioning, navigation preload, expirations.
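The fetch handler above calls staleWhileRevalidate without defining it; a minimal sketch against the same shell-v1 cache:

// Serve from cache immediately, refresh the cache in the background.
async function staleWhileRevalidate(request) {
  const cache = await caches.open("shell-v1");
  const cached = await cache.match(request);
  const network = fetch(request)
    .then(response => {
      // Only cache successful responses.
      if (response.ok) cache.put(request, response.clone());
      return response;
    })
    .catch(() => cached); // offline: fall back to whatever we had
  return cached || network;
}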
Layer 2: data in IndexedDB
localStorage is sync, capped at ~5MB, string-only. Use IndexedDB via a wrapper:
// Dexie
const db = new Dexie("app");
db.version(1).stores({
  tasks: "id, projectId, updatedAt",
  mutations: "++id, createdAt",
});

await db.tasks.add({ id: "1", title: "buy milk", updatedAt: Date.now() });
const myTasks = await db.tasks.where({ projectId: "p1" }).toArray();

UI reads through the local store and uses subscriptions / liveQuery so writes (local or remote-synced) propagate to the UI automatically.
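A sketch of that subscription with Dexie's liveQuery (renderTaskList is an illustrative placeholder for whatever updates the view):

import { liveQuery } from "dexie";

// Re-runs the query whenever the tables it touched change,
// whether the write came from the UI or from the sync engine.
const tasks$ = liveQuery(() => db.tasks.where({ projectId: "p1" }).toArray());

const subscription = tasks$.subscribe({
  next: tasks => renderTaskList(tasks), // illustrative UI update
  error: err => console.error(err),
});

// Later, when the view unmounts:
subscription.unsubscribe();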
Layer 3: mutation queue
Every user action becomes a mutation that's stored locally and applied to the local DB optimistically:
async function createTask(t: Task) {
  await db.transaction("rw", db.tasks, db.mutations, async () => {
    await db.tasks.add(t);
    await db.mutations.add({ op: "create-task", payload: t, createdAt: Date.now() });
  });
  flushQueue(); // fire-and-forget
}

async function flushQueue() {
  const pending = await db.mutations.orderBy("id").toArray();
  for (const m of pending) {
    try {
      await api.send(m);
      await db.mutations.delete(m.id);
    } catch {
      break; // retry on next reconnect
    }
  }
}

Trigger flushQueue on:
- App start.
- Network back online (window.addEventListener("online", flushQueue)).
- After each user action.
- Periodically (every 30s) for partial-connectivity resilience.
Background Sync API (registration.sync.register("sync-mutations")) lets the Service Worker flush even after the tab is closed — great for mobile.
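A sketch of that wiring, assuming the flushQueue above is also available inside the worker (Dexie runs in workers):

// Page: after queuing a mutation, ask the browser to flush when it can,
// even if the tab is closed before connectivity returns.
async function requestBackgroundFlush() {
  const reg = await navigator.serviceWorker.ready;
  if ("sync" in reg) {
    await reg.sync.register("sync-mutations");
  } else {
    flushQueue(); // no Background Sync (e.g. iOS Safari): flush in the page
  }
}

// sw.js: the browser fires "sync" once it believes connectivity is back.
self.addEventListener("sync", e => {
  if (e.tag === "sync-mutations") {
    e.waitUntil(flushQueue());
  }
});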
Layer 4: conflict resolution
When two clients edit the same record while offline, who wins?
1. Last-write-wins (LWW).
Each record has an updatedAt timestamp. On sync, the server keeps the newer one. Simple, but loses data silently if clocks drift.
2. Version vectors / Lamport timestamps.
Each record has a version per client. A conflict means both versions advanced; surface it to the user for manual resolution.
3. CRDTs.
The principled solution. Each operation is commutative — apply in any order, converge. Yjs / Automerge for structured docs. Higher up-front cost; eliminates manual conflict UI.
4. Optimistic with revert.
Apply locally, send to server. If server rejects (validation, conflict), roll back the local change and show a toast.
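A sketch of options 1 and 2, assuming each record carries an updatedAt timestamp and a per-client versions map (field and type names are illustrative):

type Versioned = { id: string; updatedAt: number; versions: Record<string, number> };

// 1. Last-write-wins: the newer timestamp replaces the older. Silent if clocks drift.
function mergeLWW<T extends Versioned>(local: T, remote: T): T {
  return remote.updatedAt > local.updatedAt ? remote : local;
}

// 2. Version vectors: a real conflict only when both sides advanced independently;
//    otherwise one side strictly dominates the other.
function compareVectors(a: Record<string, number>, b: Record<string, number>) {
  const clients = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false, bAhead = false;
  for (const c of clients) {
    if ((a[c] ?? 0) > (b[c] ?? 0)) aAhead = true;
    if ((b[c] ?? 0) > (a[c] ?? 0)) bAhead = true;
  }
  if (aAhead && bAhead) return "conflict"; // surface to the user
  if (!aAhead && !bAhead) return "equal";
  return aAhead ? "local-newer" : "remote-newer";
}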
UI: surface state
- Online / offline indicator in the app shell.
- Per-item sync state — pending, synced, error.
- Disable destructive actions offline only when truly required (most can queue).
- Optimistic counts ("3 unsynced changes").
- Conflict banner when manual merge is needed.
Don't lie. Don't show "Saved" when it's just in the local queue — say "Saved locally, syncing…"
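A tiny sketch of that honest copy, assuming a per-item syncState maintained by the queue (names are illustrative):

type SyncState = "pending" | "synced" | "error";

// Map sync state to copy the user can trust.
function statusLabel(state: SyncState): string {
  switch (state) {
    case "pending": return "Saved locally, syncing…";
    case "synced":  return "Saved";
    case "error":   return "Couldn't sync, will retry";
  }
}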
The catches
1. Storage quotas. IndexedDB has per-origin limits (~10% of disk on desktop, less on mobile). Use navigator.storage.estimate() to monitor; navigator.storage.persist() to request persistent storage so it isn't evicted under pressure (the browser may decline).
2. Cross-tab coordination. Two tabs both flushing the queue → duplicate mutations. Use BroadcastChannel + a single-writer lock (navigator.locks.request()); a sketch follows this list.
3. Auth. What if the refresh token expires while offline? The mutation queue holds work that requires a new login on reconnect. UX: explicit re-login banner; never silently drop work.
4. Schema migrations. A user opens the app after months — local DB is on schema v1, app expects v3. Migration scripts must run before any data is read.
5. Privacy / multi-user devices. Offline data persists across logouts. Clear IndexedDB on logout.
6. Captive portals & flaky networks. navigator.onLine === true doesn't mean you have internet — it means you have a link. The real test is whether a small ping to your API succeeds. Build a reachability probe (sketched below); don't trust the flag alone.
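A sketch of the single-writer guard from point 2, using the Web Locks API (the lock name is an assumption):

// Only one tab at a time runs the flush; others skip instead of waiting,
// because the lock holder will drain the shared queue anyway.
async function flushQueueExclusive() {
  await navigator.locks.request(
    "mutation-queue-flush",
    { ifAvailable: true },
    async lock => {
      if (!lock) return; // another tab holds the lock
      await flushQueue();
    },
  );
}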
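And a sketch of the reachability probe from point 6 (the /api/health endpoint and timeout are assumptions):

// navigator.onLine only says "there is a link"; actually hit the API.
async function isReachable(timeoutMs = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch("/api/health", {
      method: "HEAD",
      cache: "no-store",
      signal: controller.signal,
    });
    // A stricter probe can validate a known response body to defeat
    // captive portals that answer 200 for everything.
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}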
When NOT to go offline-first
- Read-only consumer products where users are effectively always connected (news, video streaming) — a Service Worker for the shell is enough.
- Highly collaborative real-time apps where the source of truth is server state — offline is a degraded mode, not the default.
- Cost: offline-first roughly doubles the implementation effort. Adopt when the product requires it (field workers, mobile-flaky markets, "works on a plane" UX).
Senior framing. Mention: (1) Service Worker for shell, (2) IndexedDB + queue, (3) optimistic UI, (4) conflict strategy chosen consciously (LWW / CRDT), (5) cross-tab coordination, (6) auth-while-offline, (7) UI honesty about sync state. The "we use a Service Worker" answer is junior; the architecture above is senior.
Follow-up questions
- Why is `navigator.onLine` unreliable?
- When would you choose CRDTs over LWW for conflict resolution?
- How do you coordinate sync across multiple tabs?
- What happens to the mutation queue when the user logs out?
Common mistakes
- Trusting `navigator.onLine` as a connectivity check.
- Using localStorage for app data — sync, capped, string-only.
- Letting two tabs flush the queue concurrently.
- Silent LWW conflict resolution that drops user changes.
Performance considerations
- IndexedDB writes are async but can be slow on mobile — batch in transactions.
- Service Worker cache strategies — stale-while-revalidate balances freshness and speed.
- Don't pre-cache the entire dataset; sync on demand by entity.
Edge cases
- Schema migrations from old offline data.
- Storage quotas hit mid-write.
- Background Sync isn't supported on iOS Safari — fall back to flushing on app open and at intervals.
Real-world examples
- Notion, Linear, Things — local-first with sync engines.
- Google Docs — online-first with offline as degraded mode.
- Replicache, Liveblocks, Triplit — commercial sync engines.