How would you implement A/B testing without affecting current users?
Gate variants behind feature flags / an experimentation platform, assign users to stable buckets via consistent hashing, evaluate server-side (or pre-paint) to avoid flicker, keep the control path unchanged, instrument metrics, and ensure a clean kill switch and exposure logging.
"Without affecting current users" means: the control group's experience is unchanged, there's no flicker, assignment is stable, and you can kill it instantly. A/B testing is feature flags + bucketing + measurement.
1. Assignment — stable and consistent
- Consistent hashing — hash a stable id (user id, or a persistent anonymous id) → bucket. The same user always gets the same variant, across sessions and devices. Never random-per-render or per-session (see the sketch after this list).
- Define the population — who's eligible (new users? a region? logged-in?). Everyone else stays on control, untouched.
- Percentage rollout — start small (1–5%), ramp up. The other 95–99% are unaffected by definition.
- Respect held-out control groups for clean measurement.
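A minimal sketch of consistent bucketing, assuming Node's crypto module; the `checkout-redesign` experiment key and the 10,000-bucket space are illustrative choices, not any specific platform's scheme:

```ts
import { createHash } from "crypto";

type Variant = "control" | "variant";

// Hash a stable id plus the experiment key so buckets are independent
// across experiments, then map into a fixed bucket space.
function bucketOf(userId: string, experimentKey: string): number {
  const digest = createHash("sha256")
    .update(`${experimentKey}:${userId}`)
    .digest();
  // First 4 bytes of the digest -> an integer in [0, 10000).
  return digest.readUInt32BE(0) % 10_000;
}

// A 5% rollout puts buckets 0-499 in the variant; the other 95%
// fall through to control and are untouched by definition.
function assign(userId: string, rolloutPercent: number): Variant {
  const b = bucketOf(userId, "checkout-redesign"); // hypothetical experiment key
  return b < rolloutPercent * 100 ? "variant" : "control";
}
```

Because the hash is keyed on (experiment key, user id), the same user lands in the same bucket on every render, and ramping from 5% to 20% only adds buckets 500–1999 without reshuffling existing assignments.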
2. Delivery — no flicker, control untouched
- Evaluate server-side / at the edge where possible — the user gets the right variant in the initial HTML, no flash of the control then a snap to variant (the classic A/B "flicker"/FOUC).
- If client-side, assign before first paint (a blocking head script or bootstrapped flag values) and gate rendering on flag readiness.
- Don't fork the control path — variant code is additive and behind a flag; if the flag is off or evaluation fails, you fall through to the exact existing code path. Control users literally run the same code as before (sketched below, after this list).
- Code-split variant code so control users don't even download large variant bundles unnecessarily (or accept a small shared cost).
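A hedged sketch of server-side evaluation with a safe fall-through, reusing `assign()` and `Variant` from the bucketing sketch above; `renderControl`/`renderVariant` are hypothetical stand-ins for the real page renderers. The point is the shape: flag off or evaluation error means the exact existing path runs.

```ts
// Hypothetical stand-ins for the existing and experimental renderers.
const renderControl = (userId: string) => `<html><!-- existing page for ${userId} --></html>`;
const renderVariant = (userId: string) => `<html><!-- variant page for ${userId} --></html>`;

async function handlePage(userId: string): Promise<string> {
  let variant: Variant = "control";
  try {
    // Cheap local hash (see the bucketing sketch above), not a network call.
    variant = assign(userId, 5);
  } catch {
    // Any evaluation failure falls through to control: the unchanged code path.
    variant = "control";
  }
  return variant === "variant" ? renderVariant(userId) : renderControl(userId);
}
```

The variant is decided before the initial HTML is sent, so the user never sees the control render first — the "flicker" can't happen.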
3. Use an experimentation platform
LaunchDarkly, Statsig, Optimizely, GrowthBook, Split, or in-house. They provide: bucketing, targeting, gradual rollout, exposure logging (who saw what, when), metric association, and statistical analysis. Don't hand-roll the stats.
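The surface these platforms expose tends to look roughly like the interface below; this is a hypothetical composite for orientation, not any vendor's actual SDK.

```ts
// Hypothetical composite of a typical experimentation SDK surface;
// no specific vendor's API is implied.
interface ExperimentClient {
  // Deterministic bucketing + targeting rules, evaluated locally.
  getVariant(userId: string, experimentKey: string): string;
  // Records that the user was actually exposed to a variant.
  logExposure(userId: string, experimentKey: string, variant: string): void;
  // Runtime kill switch, flippable without a deploy.
  isEnabled(experimentKey: string): boolean;
}
```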
4. Measurement — the whole point
- Exposure events — log when a user is actually bucketed into the experiment (not just eligible), so analysis is sound; a sketch follows this list.
- Define metrics upfront — primary success metric + guardrail metrics (don't improve conversion while tanking performance or error rate).
- Tie into analytics; let the platform run significance testing. Don't peek-and-stop early.
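A sketch of exposure logging at the moment of bucketing, reusing `assign()` and `Variant` from the sketches above; the event shape and the `/analytics/exposure` endpoint are illustrative assumptions.

```ts
interface ExposureEvent {
  experimentKey: string;
  userId: string;
  variant: Variant;
  timestamp: string;
}

// Log exposure only when the user is actually served a variant decision,
// not merely when they become eligible.
function exposeAndAssign(userId: string): Variant {
  const variant = assign(userId, 5);
  const event: ExposureEvent = {
    experimentKey: "checkout-redesign", // hypothetical key from the sketches above
    userId,
    variant,
    timestamp: new Date().toISOString(),
  };
  // Fire-and-forget keeps the hot path cheap; real SDKs batch these.
  void fetch("/analytics/exposure", { // illustrative endpoint
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(event),
  });
  return variant;
}
```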
5. Safety
- Kill switch — turn the experiment off instantly without a deploy if a variant misbehaves. This is the core "without affecting users" guarantee for variant users too (see the code sketch below).
- Monitor error rates and performance per variant — a broken variant gets auto-flagged.
- Clean teardown — when the experiment concludes, ship the winner as the default and delete the flag and the losing branch (experiment flags are tech debt if they linger).
- Avoid overlapping experiments that confound each other (mutual exclusion groups).
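A minimal kill-switch sketch: a runtime-readable flag checked before any assignment, defaulting to off, so flipping it (or losing the flag store entirely) sends everyone to control without a deploy. The in-memory Map is a stand-in for a real flag service.

```ts
// In-memory stand-in for a runtime flag store; a real one would be
// a flag service or config system updatable without a deploy.
const flagStore = new Map<string, boolean>([["checkout-redesign", true]]);

function isExperimentLive(experimentKey: string): boolean {
  // Missing flag or unreachable store defaults to off: everyone gets control.
  return flagStore.get(experimentKey) ?? false;
}

function safeAssign(userId: string): Variant {
  if (!isExperimentLive("checkout-redesign")) return "control";
  return assign(userId, 5); // assign() from the bucketing sketch above
}
```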
The framing
"A/B testing is feature flags + consistent bucketing + measurement. Control users are unaffected because variant code is additive behind a flag — flag off means the original code path runs untouched. I'd assign via consistent hashing on a stable id, evaluate server-side to avoid flicker, start at a small percentage, log exposure events, define success + guardrail metrics, keep a kill switch, and clean up the flag when it's done."
Follow-up questions
- Why is consistent hashing important for bucket assignment?
- How do you avoid the A/B test 'flicker' of control showing before the variant?
- What are guardrail metrics and why do they matter?
- Why must you log exposure events, not just eligibility?
Common mistakes
- Random per-session assignment, so a user flips between variants.
- Client-side evaluation causing a flash of the control before the variant.
- Forking the control path, so a bug in the experiment affects control users.
- No kill switch; no cleanup of finished experiment flags.
- Peeking at results and stopping early; no guardrail metrics.
Performance considerations
- Server-side/edge evaluation removes a client request and the flicker.
- Code-split variant bundles so control users aren't penalized.
- Exposure logging must be lightweight.
- Bucketing should be a cheap local hash, not a network call per render.
Edge cases
- Anonymous users with no stable id (need a persistent client id).
- A user crossing devices mid-experiment.
- Overlapping experiments confounding each other.
- A variant that breaks — must be killable instantly.
Real-world examples
- Statsig/LaunchDarkly/GrowthBook running a gradual rollout with consistent bucketing and exposure logging.
- SSR-evaluated experiments so users never see the control flash before their variant.