Users report intermittent UI glitches in different browsers: how would you troubleshoot?
Reproduce systematically — gather environment data, use RUM/session replay, test across browsers. Intermittent + browser-specific points to race conditions, CSS/JS engine differences, third-party scripts, or extensions. Isolate, instrument, and narrow with a binary search.
"Intermittent" + "browser-specific" is a hard combination — you can't fix what you can't see. The whole game is turning it reproducible.
1. Gather data — make the invisible visible
You can't reproduce it locally, so collect from the field:
- Exact environment from reporters — browser + version, OS, device, screen size, network, extensions, steps.
- RUM + error monitoring (Sentry, etc.) — JS errors segmented by browser/version. Is one engine over-represented?
- Session replay (LogRocket, FullStory) — watch the glitch happen in the user's actual session. Often the single most useful tool here.
- Look for patterns — specific browser? viewport? slow network? logged-in vs out? after a specific action?
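One way to answer "is one engine over-represented?" is to compute a glitch rate per browser from whatever field data you export. The sketch below assumes a hypothetical `SessionSample` shape; real RUM and error-monitoring tools have their own schemas, so treat the field names as placeholders.

```typescript
// Hypothetical shape of one exported session; not any specific tool's schema.
interface SessionSample {
  browser: string;    // e.g. "Safari 17"
  hadGlitch: boolean; // whether the glitch was observed in this session
}

interface BrowserRate {
  browser: string;
  sessions: number;
  glitches: number;
  rate: number; // glitches / sessions
}

// Compute the glitch rate per browser, sorted from most to least affected.
function glitchRateByBrowser(samples: SessionSample[]): BrowserRate[] {
  const totals = new Map<string, { sessions: number; glitches: number }>();
  for (const s of samples) {
    const t = totals.get(s.browser) ?? { sessions: 0, glitches: 0 };
    t.sessions += 1;
    if (s.hadGlitch) t.glitches += 1;
    totals.set(s.browser, t);
  }
  return Array.from(totals.entries())
    .map(([browser, t]) => ({
      browser,
      sessions: t.sessions,
      glitches: t.glitches,
      rate: t.glitches / t.sessions,
    }))
    .sort((a, b) => b.rate - a.rate);
}
```

Comparing rates rather than raw counts matters here: the most popular browser will usually have the most reports even if it is not the most affected.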
2. Reason about likely causes (intermittent + cross-browser)
- Race conditions — async ordering, effects firing in unexpected order, requests resolving out of order, animations vs data. "Intermittent" screams timing.
- Browser engine differences: CSS support/quirks (flex/grid edge cases, :has(), container queries), JS API availability, default styles, event timing, font rendering.
- Layout timing: measuring layout before it settles; CLS; race between fonts/images loading and JS.
- Third-party scripts / extensions — ad blockers, password managers, translation tools mutating the DOM; analytics scripts loading non-deterministically.
- Caching/versioning — stale chunks after a deploy, stale service worker.
- State leakage — uncleaned listeners, stale closures, memory growth over a long session.
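To make the "requests resolving out of order" case concrete, here is a minimal sketch of a latest-wins guard. `latestOnly` and `fakeSearch` are hypothetical names introduced for illustration, and the timer-based delay stands in for real network latency.

```typescript
// Wrap an async function so that only the most recent call's result is delivered.
// Results from earlier, slower calls are dropped instead of overwriting newer state.
function latestOnly<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  onResult: (result: R) => void,
): (...args: A) => Promise<void> {
  let latestCall = 0;
  return async (...args: A) => {
    const callId = ++latestCall;
    const result = await fn(...args);
    if (callId === latestCall) onResult(result); // ignore stale responses
  };
}

// Hypothetical request with variable latency, standing in for a real fetch.
const fakeSearch = (query: string, delayMs: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`results for ${query}`), delayMs));

async function demo(): Promise<string[]> {
  const rendered: string[] = [];
  const search = latestOnly(fakeSearch, (r) => rendered.push(r));
  // The first request is slower. Without the guard it would resolve last
  // and overwrite the newer result.
  await Promise.all([search("a", 50), search("ab", 10)]);
  return rendered;
}
```

The glitch only appears when the older request happens to be slower, which is why it looks intermittent and tends to correlate with network conditions rather than with any one code path.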
3. Reproduce and isolate
- Match the environment — exact browser version, real device (BrowserStack/Sauce Labs for the ones you don't have), throttled network/CPU.
- Test with extensions disabled vs enabled; incognito.
- Binary search the cause: disable third-party scripts, toggle features/flags, bisect recent deploys (git bisect against the regression window).
- Add targeted instrumentation: log timing/ordering around the suspected area, ship it, watch RUM. Make the intermittent visible.
- Stress it — rapid interactions, slow network, repeated actions to force the race.
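The binary search over deploys can be sketched as a small function. It assumes what git bisect also assumes: the list is ordered, and once a deploy is bad every later one is bad too. `firstBad` and the `isBad` predicate are hypothetical; in practice the check is often a manual or scripted reproduction attempt and would be asynchronous.

```typescript
// Return the index of the first "bad" item in an ordered list, or -1 if none is bad.
// Assumes every item before the first bad one is good and every item after it is bad.
function firstBad<T>(items: T[], isBad: (item: T) => boolean): number {
  let lo = 0;
  let hi = items.length; // items before lo are good; items from hi onward are bad
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(items[mid])) {
      hi = mid; // the first bad item is at mid or earlier
    } else {
      lo = mid + 1; // the first bad item is after mid
    }
  }
  return lo < items.length ? lo : -1;
}
```

Because an intermittent bug can pass a check by luck, each `isBad` check should repeat the reproduction enough times that a "good" verdict is trustworthy; one false "good" sends the search to the wrong half.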
4. Fix and verify
- Fix the root cause — guard the race (cancel stale requests, sequence effects, await properly), add the CSS fallback/prefix, defensively handle the third-party DOM mutation.
- Cross-browser test the fix; add a regression test (E2E across browsers via Playwright).
- Confirm via RUM that the error rate drops; follow up with reporters.
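For the cross-browser regression test, a Playwright config along these lines runs the same suite against all three engines. The `testDir` path and the repeat count are assumptions for illustration; adjust them to the project.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests', // assumed location of the E2E tests
  retries: 0,         // no retries, so an intermittent failure is not masked
  repeatEach: 20,     // assumed count; repeat each test to give a race a chance to show up
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

Repeating with retries turned off is a deliberate choice for this kind of bug: a retry that passes hides exactly the flakiness the test is meant to catch.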
The framing
"Intermittent and browser-specific tells me race condition or browser-engine difference. I can't fix what I can't see, so step one is RUM + session replay + environment data to make it reproducible. Then match the environment exactly, binary-search the cause (extensions, third-party scripts, recent deploys), instrument the suspect area, fix the root cause, and add a cross-browser regression test."
Follow-up questions
- Why does 'intermittent' often point to a race condition?
- How does session replay help with bugs you can't reproduce?
- How would you isolate whether a browser extension is the cause?
- How do you binary-search for which deploy introduced a regression?
Common mistakes
- Trying to fix it without reproducing it first.
- Only testing in your own browser.
- Ignoring extensions and third-party scripts as suspects.
- Patching the symptom instead of the timing/root cause.
- No regression test, so it comes back.
Performance considerations
- Many intermittent glitches are timing/perf-related: layout measured too early, races between fonts/images/JS, jank under throttled CPU. Reproducing under throttling often surfaces them.
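As a sketch of avoiding "layout measured too early", the function below waits for font loading before measuring. It uses small structural types instead of DOM typings so it stays self-contained; in a browser, `document.fonts` (whose `ready` property is a promise) and any element would satisfy them. `measureAfterFonts` is a hypothetical helper name.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface FontSource {
  ready: Promise<unknown>; // same shape as document.fonts.ready
}
interface Measurable {
  getBoundingClientRect(): { width: number; height: number };
}

// Wait for fonts to finish loading, then measure, so the size reflects the
// final typeface rather than the fallback font.
async function measureAfterFonts(
  fonts: FontSource,
  el: Measurable,
): Promise<{ width: number; height: number }> {
  await fonts.ready;
  const { width, height } = el.getBoundingClientRect();
  return { width, height };
}
```

In a browser this would be called as `measureAfterFonts(document.fonts, element)`. Images and late-arriving data can still shift layout afterwards, so this covers only the font part of the race.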
Edge cases
- Bug only on a specific browser version that later auto-updates away.
- Caused by a user's extension you can't control.
- Stale cached chunks after a deploy.
- Only reproduces on slow networks or low-end CPUs.
Real-world examples
- A glitch traced via session replay to a password-manager extension injecting DOM.
- An intermittent layout flash fixed by awaiting font load before measuring; git bisect pinpointing the deploy.