How would you implement A/B testing without affecting current users?
Gate variants behind feature flags / an experimentation platform, assign users to stable buckets via consistent hashing, evaluate server-side (or pre-paint) to avoid flicker, keep the control path unchanged, instrument metrics, and ensure a clean kill switch and exposure logging.
"Without affecting current users" means: the control group's experience is unchanged, there's no flicker, assignment is stable, and you can kill it instantly. A/B testing is feature flags + bucketing + measurement.
1. Assignment — stable and consistent
- Consistent hashing — hash a stable id (user id, or a persistent anonymous id) → bucket. The same user always gets the same variant, across sessions and devices. Never random-per-render or per-session (see the sketch after this list).
- Define the population — who's eligible (new users? a region? logged-in?). Everyone else stays on control, untouched.
- Percentage rollout — start small (1–5%), ramp up. The other 95–99% are unaffected by definition.
- Respect held-out control groups for clean measurement.
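A minimal sketch of consistent bucketing, assuming Node's crypto module; the `checkout-redesign` experiment key and the 10,000-bucket space are illustrative choices, not any specific platform's scheme:

```ts
import { createHash } from "crypto";

type Variant = "control" | "variant";

// Hash a stable id plus the experiment key so buckets are independent
// across experiments, then map into a fixed bucket space.
function bucketOf(userId: string, experimentKey: string): number {
  const digest = createHash("sha256")
    .update(`${experimentKey}:${userId}`)
    .digest();
  // First 4 bytes of the digest -> an integer in [0, 10000).
  return digest.readUInt32BE(0) % 10_000;
}

// A 5% rollout puts buckets 0-499 in the variant; the other 95%
// fall through to control and are untouched by definition.
function assign(userId: string, rolloutPercent: number): Variant {
  const b = bucketOf(userId, "checkout-redesign"); // hypothetical experiment key
  return b < rolloutPercent * 100 ? "variant" : "control";
}
```

Because the hash is keyed on (experiment key, user id), the same user lands in the same bucket on every render, and ramping from 5% to 20% only adds buckets 500–1999 without reshuffling existing assignments.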
2. Delivery — no flicker, control untouched
- Evaluate server-side / at the edge where possible — the user gets the right variant in the initial HTML, no flash of the control then a snap to variant (the classic A/B "flicker"/FOUC).
- If client-side, assign before first paint (a blocking head script or bootstrapped flag values) and gate rendering on flag readiness.
- Don't fork the control path — variant code is additive and behind a flag; if the flag is off or evaluation fails, you fall through to the exact existing code path. Control users literally run the same code as before (sketched below, after this list).
- Code-split variant code so control users don't even download large variant bundles unnecessarily (or accept a small shared cost).
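A hedged sketch of server-side evaluation with a safe fall-through, reusing `assign()` and `Variant` from the bucketing sketch above; `renderControl`/`renderVariant` are hypothetical stand-ins for the real page renderers. The point is the shape: flag off or evaluation error means the exact existing path runs.

```ts
// Hypothetical stand-ins for the existing and experimental renderers.
const renderControl = (userId: string) => `<html><!-- existing page for ${userId} --></html>`;
const renderVariant = (userId: string) => `<html><!-- variant page for ${userId} --></html>`;

async function handlePage(userId: string): Promise<string> {
  let variant: Variant = "control";
  try {
    // Cheap local hash (see the bucketing sketch above), not a network call.
    variant = assign(userId, 5);
  } catch {
    // Any evaluation failure falls through to control: the unchanged code path.
    variant = "control";
  }
  return variant === "variant" ? renderVariant(userId) : renderControl(userId);
}
```

The variant is decided before the initial HTML is sent, so the user never sees the control render first — the "flicker" can't happen.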
3. Use an experimentation platform
LaunchDarkly, Statsig, Optimizely, GrowthBook, Split, or in-house. They provide: bucketing, targeting, gradual rollout, exposure logging (who saw what, when), metric association, and statistical analysis. Don't hand-roll the stats.
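The surface these platforms expose tends to look roughly like the interface below; this is a hypothetical composite for orientation, not any vendor's actual SDK.

```ts
// Hypothetical composite of a typical experimentation SDK surface;
// no specific vendor's API is implied.
interface ExperimentClient {
  // Deterministic bucketing + targeting rules, evaluated locally.
  getVariant(userId: string, experimentKey: string): string;
  // Records that the user was actually exposed to a variant.
  logExposure(userId: string, experimentKey: string, variant: string): void;
  // Runtime kill switch, flippable without a deploy.
  isEnabled(experimentKey: string): boolean;
}
```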
4. Measurement — the whole point
- Exposure events — log when a user is actually bucketed into the experiment (not just eligible), so analysis is sound; a sketch follows this list.
- Define metrics upfront — primary success metric + guardrail metrics (don't improve conversion while tanking performance or error rate).
- Tie into analytics; let the platform run significance testing. Don't peek-and-stop early.
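A sketch of exposure logging at the moment of bucketing, reusing `assign()` and `Variant` from the sketches above; the event shape and the `/analytics/exposure` endpoint are illustrative assumptions.

```ts
interface ExposureEvent {
  experimentKey: string;
  userId: string;
  variant: Variant;
  timestamp: string;
}

// Log exposure only when the user is actually served a variant decision,
// not merely when they become eligible.
function exposeAndAssign(userId: string): Variant {
  const variant = assign(userId, 5);
  const event: ExposureEvent = {
    experimentKey: "checkout-redesign", // hypothetical key from the sketches above
    userId,
    variant,
    timestamp: new Date().toISOString(),
  };
  // Fire-and-forget keeps the hot path cheap; real SDKs batch these.
  void fetch("/analytics/exposure", { // illustrative endpoint
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(event),
  });
  return variant;
}
```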
5. Safety
- Kill switch — turn the experiment off instantly without a deploy if a variant misbehaves. This is the core "without affecting users" guarantee for variant users too (see the code sketch below).
- Monitor error rates and performance per variant — a broken variant gets auto-flagged.
- Clean teardown — when the experiment concludes, ship the winner as the default and delete the flag and the losing branch (experiment flags are tech debt if they linger).
- Avoid overlapping experiments that confound each other (mutual exclusion groups).
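A minimal kill-switch sketch: a runtime-readable flag checked before any assignment, defaulting to off, so flipping it (or losing the flag store entirely) sends everyone to control without a deploy. The in-memory Map is a stand-in for a real flag service.

```ts
// In-memory stand-in for a runtime flag store; a real one would be
// a flag service or config system updatable without a deploy.
const flagStore = new Map<string, boolean>([["checkout-redesign", true]]);

function isExperimentLive(experimentKey: string): boolean {
  // Missing flag or unreachable store defaults to off: everyone gets control.
  return flagStore.get(experimentKey) ?? false;
}

function safeAssign(userId: string): Variant {
  if (!isExperimentLive("checkout-redesign")) return "control";
  return assign(userId, 5); // assign() from the bucketing sketch above
}
```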
The framing
"A/B testing is feature flags + consistent bucketing + measurement. Control users are unaffected because variant code is additive behind a flag — flag off means the original code path runs untouched. I'd assign via consistent hashing on a stable id, evaluate server-side to avoid flicker, start at a small percentage, log exposure events, define success + guardrail metrics, keep a kill switch, and clean up the flag when it's done."
Follow-up questions
- Why is consistent hashing important for bucket assignment?
- How do you avoid the A/B test 'flicker' of control showing before the variant?
- What are guardrail metrics and why do they matter?
- Why must you log exposure events, not just eligibility?
Common mistakes
- Random per-session assignment, so a user flips between variants.
- Client-side evaluation causing a flash of the control before the variant.
- Forking the control path, so a bug in the experiment affects control users.
- No kill switch; no cleanup of finished experiment flags.
- Peeking at results and stopping early; no guardrail metrics.
Performance considerations
- Server-side/edge evaluation removes a client request and the flicker.
- Code-split variant bundles so control users aren't penalized.
- Exposure logging must be lightweight.
- Bucketing should be a cheap local hash, not a network call per render.
Edge cases
- Anonymous users with no stable id (need a persistent client id).
- A user crossing devices mid-experiment.
- Overlapping experiments confounding each other.
- A variant that breaks — must be killable instantly.
Real-world examples
- Statsig/LaunchDarkly/GrowthBook running a gradual rollout with consistent bucketing and exposure logging.
- SSR-evaluated experiments so users never see the control flash before their variant.