Designing a frontend feature flag system
Two flag types: release flags (kill-switch, gradual rollout — short-lived) and experiment flags (A/B with variants and metrics — bounded life). Evaluate server-side when possible to avoid first-paint flicker; deliver to the client via a context provider and a stable hashing rule on user/session id. Bake in: targeting rules, percentage rollouts, sticky assignment, defaults that fail safe, dashboards, and a process to remove dead flags.
Feature flags are easy to add and impossible to remove. Designing the system means designing the lifecycle, not just the runtime check.
Two flag types — keep them separate
Release flags. Ship code dark; toggle on for some / all users; remove the flag once stable. Short-lived (days/weeks). Used for: gradual rollout, kill-switch, ops control.
Experiment flags. A/B test variants, measure metrics, pick a winner. Bounded life (1–4 weeks typical), then promote winner and remove.
Mixing them — "is this user in the new flow?" being both a rollout gate AND an experiment — is the path to flag chaos. Use different namespaces, dashboards, and ownership.
The evaluation question: server, edge, or client?
| Location | Pros | Cons |
|---|---|---|
| Server (during SSR) | No flash, encoded in HTML | Need user id at request time |
| Edge (CDN with vary key) | Fast, no SSR cost | Vary keys multiply cache entries |
| Client (after first paint) | Simple, low infra | UI flicker, or first paint blocked waiting for flags |
Default to server-side evaluation. Resolve flags during SSR (on the first server-rendered HTML), embed the resolved values in the page, and hydrate from them. This avoids the "user briefly sees the old UI, then it flips" experience.
If you must evaluate client-side, bootstrap with defaults that are safe (the conservative path) so first paint isn't blocked.
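A minimal sketch of that path, assuming a `resolveFlags` rule evaluator and a `renderApp` SSR function (both names are placeholders, not a specific SDK's API): the server resolves flags, renders with them, and inlines the values so the client can bootstrap its provider without a second fetch.

```ts
// Sketch only: resolveFlags / renderApp are assumed helpers, not a vendor API.
type Flags = Record<string, boolean | string>;

// Server: evaluate targeting rules + percentage buckets, render with the final
// values, and inline them so hydration sees exactly what the server saw.
async function renderPage(
  userId: string,
  resolveFlags: (id: string) => Promise<Flags>,
  renderApp: (flags: Flags) => string
): Promise<string> {
  const flags = await resolveFlags(userId);
  const appHtml = renderApp(flags); // SSR already uses the resolved values: no flicker
  return `<!doctype html>
<div id="root">${appHtml}</div>
<script>window.__FLAGS__ = ${JSON.stringify(flags)}</script>`; // escape "</script>" in production
}

// Client: bootstrap from the inlined values; fall back to safe defaults if missing.
function bootstrapFlags(): Flags {
  return (window as unknown as { __FLAGS__?: Flags }).__FLAGS__ ?? {};
}
```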
Data flow
```
flag service (LaunchDarkly / GrowthBook / Statsig / self-hosted)
        ↓ rule definitions (push or pull)
SSR server → HTML with resolved flags inlined
        ↓
Browser (FlagProvider context)
        ↓
useFlag("new-checkout")
```

Sticky assignment. A user must get the same variant across sessions, devices, and re-fetches. Hash a stable key (user id, or anonymous id from a long-lived cookie) into the bucket:
```ts
// Assumes a stable 32-bit string hash shared by server and client (murmur3 is a
// common choice); declared here rather than imported to stay library-agnostic.
declare function murmur3(input: string): number;

function bucket(userId: string, flag: string, pct: number): boolean {
  const hash = murmur3(`${flag}:${userId}`); // same key → same bucket, every time
  return hash % 100 < pct;
}
```

It is critical that the hash function is stable across server and client — otherwise SSR shows one variant and the client renders another.
The API surface
```tsx
// hook
const showNewCheckout = useFlag("new-checkout", { default: false });

// component (for code splitting)
<Flag name="new-checkout" fallback={<OldCheckout />}>
  <NewCheckout />
</Flag>

// imperative (for non-React, e.g., analytics middleware)
flags.isEnabled("new-checkout", { userId });
```

Defaults that fail safe. The default should be the boring, known-good path. If the flag service is down or the value is missing, render the legacy experience.
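A sketch of what sits behind that surface, assuming the server-resolved values from earlier are handed to a React context provider (names and shapes are illustrative, not a specific SDK):

```tsx
import React, { createContext, useContext } from "react";

type FlagValues = Record<string, boolean | string>;
const FlagContext = createContext<FlagValues>({});

// Mount once near the root with the values resolved on the server.
export function FlagProvider(props: { flags: FlagValues; children: React.ReactNode }) {
  return <FlagContext.Provider value={props.flags}>{props.children}</FlagContext.Provider>;
}

// Fail safe: a missing or unknown flag resolves to the caller's known-good default.
export function useFlag<T extends boolean | string>(name: string, opts: { default: T }): T {
  const value = useContext(FlagContext)[name];
  return (value === undefined ? opts.default : value) as T;
}

export function Flag(props: { name: string; fallback: React.ReactNode; children: React.ReactNode }) {
  const enabled = useFlag(props.name, { default: false });
  return <>{enabled ? props.children : props.fallback}</>;
}
```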
Targeting rules
The runtime needs to evaluate against attributes:
- User id (for sticky pct rollout).
- User attributes (plan tier, region, beta opt-in).
- Session attributes (device, locale, app version).
- A per-flag salt, so control/treatment assignment in one experiment is independent of every other experiment.
Rule examples: "Enable for 5% of pro users in US"; "Enable for users in this allow-list".
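A sketch of how the runtime might combine attribute matching with the percentage bucket above; the rule shape is illustrative, not any vendor's schema.

```ts
type User = { id: string; plan?: string; country?: string; betaOptIn?: boolean };

type FlagRule = {
  match: Partial<Omit<User, "id">>; // attribute conditions; all must match
  rolloutPct: number;               // percentage of matching users to enable
};

declare function bucket(userId: string, flag: string, pct: number): boolean; // sticky hash from earlier

// "Enable for 5% of pro users in US" → { match: { plan: "pro", country: "US" }, rolloutPct: 5 }
function evaluate(flag: string, rule: FlagRule, user: User): boolean {
  const attributesMatch = (Object.keys(rule.match) as (keyof typeof rule.match)[])
    .every((key) => user[key] === rule.match[key]);
  return attributesMatch && bucket(user.id, flag, rule.rolloutPct);
}
```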
Lifecycle — the part everyone skips
A flag system without a cleanup process accumulates flags forever. After 18 months: 200 flags, 30% dead code paths, nobody knows what's safe to remove.
Process to enforce.
- Required metadata per flag: owner, created date, expected removal date, purpose (release / experiment / ops) (see the sketch after this list).
- Stale-flag alerts: any flag older than 90 days at 100% / 0% rollout is dead — a bot opens a removal PR.
- Two-step removal: first remove the flag's check in code (always-on or always-off); deploy; then remove the definition in the flag service. Avoids race during deploy.
- Code review rule: every new flag PR must include a removal plan.
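The required metadata from the first item, plus the stale-flag rule, might look like this (a sketch; field names are illustrative):

```ts
type FlagKind = "release" | "experiment" | "ops";

interface FlagDefinition {
  key: string;
  kind: FlagKind;
  owner: string;      // team or person accountable for removal
  createdAt: Date;
  removeBy: Date;     // expected removal date, set when the flag is created
  purpose: string;
  rolloutPct: number; // current rollout percentage
}

const MS_PER_DAY = 86_400_000;

// "Older than 90 days and pinned at 100% or 0%" is the stale signal that
// would trigger the bot-opened removal PR.
function isStale(flag: FlagDefinition, now = new Date()): boolean {
  const ageDays = (now.getTime() - flag.createdAt.getTime()) / MS_PER_DAY;
  const pinned = flag.rolloutPct === 0 || flag.rolloutPct === 100;
  return ageDays > 90 && pinned;
}
```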
The release-flag workflow
1. PR adds new code behind flag, default off.
2. Merge to main; deploy. Code is dark.
3. Enable for internal users (employee email allow-list).
4. Enable for 1% → 5% → 25% → 50% → 100% over days.
5. Monitor error rates and product metrics per cohort.
6. At 100% stable for a week, schedule removal.
7. PR removes the flag check; code becomes default.
8. Delete the flag definition.

The experiment-flag workflow
1. Define hypothesis + primary metric.
2. Set variant split (e.g., 50/50).
3. Run for N users / N days for statistical power.
4. Analyze; pick winner.
5. Ship the winner; delete the loser path.

You need a stats layer (confidence intervals, p-values, sequential testing). Don't roll your own — use GrowthBook / Statsig / Optimizely.
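Variant assignment for experiments is the same stable-hash idea as `bucket`, just mapped onto weighted variants instead of a boolean (a sketch; weights are assumed to sum to 100):

```ts
declare function murmur3(input: string): number; // same stable hash assumed earlier

type Variant = { name: string; weight: number }; // weights sum to 100

function assignVariant(userId: string, flag: string, variants: Variant[]): string {
  const point = murmur3(`${flag}:${userId}`) % 100; // sticky: same user, same variant
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.weight;
    if (point < cumulative) return variant.name;
  }
  return variants[variants.length - 1].name; // guard against rounding gaps
}

// 50/50 split:
// assignVariant(user.id, "exp-checkout-copy", [
//   { name: "control", weight: 50 },
//   { name: "treatment", weight: 50 },
// ]);
```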
Pitfalls
- Flag depth. Code paths nested 3 flags deep are untestable. Limit nesting; refactor into composable branches when needed.
- Flag-coupled state. Storing data shapes that differ per variant is dangerous — both branches must read each other's data on rollback.
- Hash drift. Changing the hash function or seed reshuffles users; an experiment in flight gets invalidated.
- Client-side only. Easy mode; opens the door to flicker, bundle-bloat, and tampering (user toggles flag in DevTools). Server-side at minimum for sensitive flows.
- Default-on flags. These make a deploy non-rollbackable for users already on the new path. Keep defaults off until tested.
- No removal owner. "I'll clean up later" never happens. Owner + expiry up front.
Tooling — build vs buy
| Need | Build | Buy |
|---|---|---|
| 10 flags, 10 engineers | YAML config in repo, deploy on change | overkill |
| 100 flags, 50 engineers | Self-host (Unleash, GrowthBook OSS) | LaunchDarkly / Statsig |
| Experiment statistics | Hard — use a library | GrowthBook / Statsig built-in |
Build when flags are simple, infrequent, or compliance-restricted. Buy when you need targeting rules, gradual rollouts with monitoring, and experimentation in one place.
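For the small "build" row, the in-repo config can be as simple as a checked-in module that changes via PR and takes effect on deploy (the table says YAML; a TypeScript module is sketched here to keep one language across examples, and the values are illustrative):

```ts
export const flags = {
  "new-checkout": { enabled: false, owner: "payments", removeBy: "2025-03-01" },
  "new-search":   { enabled: true,  owner: "platform", removeBy: "2025-01-15" },
} as const;

export function isEnabled(name: keyof typeof flags): boolean {
  return flags[name].enabled;
}
```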
Senior framing. The candidate who can describe (1) flag types separation, (2) server-side evaluation to avoid flicker, (3) sticky hashing for consistent assignment, (4) lifecycle / cleanup process, (5) build-vs-buy with reasons — is senior. The "we use process.env.FEATURE_X" answer is junior.
Follow-up questions
- Why is server-side evaluation usually preferred?
- What metadata does every flag need to avoid permanent accumulation?
- How would you handle a long-running experiment fairly across new users joining mid-experiment?
- What's the trade-off between LaunchDarkly and self-hosted Unleash?
Common mistakes
- Mixing release flags and experiment flags in one system.
- Client-only evaluation causing flicker.
- No cleanup process — flag debt accumulates.
- Defaults that fail-open into unfinished code paths.
Performance considerations
- Edge evaluation needs cache vary keys per flag combination — can balloon CDN entries.
- Client SDKs have a small initial payload + a streaming connection for live updates — budget the kB.
Edge cases
- Anonymous → logged-in user assignment continuity (see the sketch after this list).
- Flag rollout during a deploy — code expects the flag to exist before the value is published.
- Multi-region: flag changes propagate asynchronously; brief inconsistency between regions.
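For the first edge case, one common approach (a sketch, not a specific SDK's alias API) is to keep bucketing on the first id ever seen for the person and record an alias at login so later sessions resolve to the same key:

```ts
// aliasStore is an assumption: a cookie, localStorage entry, or server-side mapping.
interface AliasStore {
  get(userId: string): string | undefined; // userId -> original anonymous id
  set(userId: string, anonId: string): void;
}

function bucketingKey(userId: string | null, anonId: string, aliases: AliasStore): string {
  if (!userId) return anonId;      // not logged in yet: bucket on the anonymous id
  const original = aliases.get(userId);
  if (original) return original;   // seen before: keep the original key
  aliases.set(userId, anonId);     // first login: alias so assignment stays sticky
  return anonId;
}
```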
Real-world examples
- LaunchDarkly, GrowthBook, Statsig, Optimizely, Vercel Toolbar flags.
- GitHub's `feature.enabled` system, Shopify's stages of rollout.