Every product team is bolting streaming AI output onto their interfaces in 2026. Chatbots, copilots, AI-assisted editors, code generators — they all stream tokens in real time. And most of them do it badly.
The problem is not the AI. It is the interface. Streaming content behaves fundamentally differently from static content, and the standard web patterns we have relied on for two decades simply were not designed for text that materialises word by word. The result? Layout shifts, scroll hijacking, elements jumping around, and users losing their place mid-read.
If your team is building anything with streaming AI output — and statistically, you probably are — here is how to get the UX right.
TL;DR
- Streaming AI content creates UX problems that static interfaces never had to solve: layout shifts, scroll hijacking, and container instability
- Reserve space for streaming containers using CSS min-height and aspect-ratio to prevent content below from jumping
- Use intersection-observer-based scroll anchoring instead of aggressive scrollToBottom patterns
- Respect reduced-motion preferences and offer a “show complete” toggle for accessibility
- Skeleton loaders and typing indicators set user expectations before tokens arrive
- Test streaming UX with throttled connections and variable token speeds, not just local dev
The Layout Shift Problem
When a streaming response begins, your container is empty. As tokens arrive, the container grows, pushing everything below it downward. If the user has scrolled to see other content — a previous response, a sidebar element, a form — that content jumps unpredictably. Google penalises layout shifts in Core Web Vitals (Cumulative Layout Shift), and streaming AI output is one of the worst offenders.
The fix is deceptively simple: reserve space before streaming begins. Set a sensible min-height on your streaming container based on the expected response length. For chat interfaces, 120–200px is a reasonable default. For longer-form generation (article drafts, code blocks), use a taller reservation.
.stream-container {
  min-height: 200px;
  transition: min-height 0.3s ease;
  contain: layout;
}

.stream-container[data-streaming="complete"] {
  min-height: auto;
}
The contain: layout declaration is crucial: layout containment tells the browser that the element's contents are laid out independently of the rest of the page, so the constant churn of incoming tokens does not force wider layout recalculations. Combined with the min-height reservation, this eliminates most of the visual instability that makes streaming interfaces feel janky.
Scroll Anchoring Done Right
The most common scroll pattern in streaming interfaces is scrollToBottom() on every token. This is aggressive and hostile. If the user scrolls up to re-read something, the interface yanks them back down with every new word. It is the digital equivalent of someone grabbing your book and flipping to the last page while you are reading.
A better approach uses the Intersection Observer API to detect whether the user is actually at the bottom of the conversation:
const sentinel = document.querySelector('.scroll-sentinel');
let userAtBottom = true;

const observer = new IntersectionObserver(
  ([entry]) => {
    userAtBottom = entry.isIntersecting;
  },
  { threshold: 0.1 }
);
observer.observe(sentinel);

function onNewToken(token) {
  appendToken(token);
  if (userAtBottom) {
    sentinel.scrollIntoView({ behavior: 'smooth', block: 'end' });
  }
}
Place an invisible sentinel element at the bottom of your scrollable container. When it is visible (the user is at the bottom), auto-scroll. When the user scrolls up, stop auto-scrolling and show a “Jump to latest” button instead. This is the same pattern Slack and Discord use, and users already understand it intuitively.
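Wiring the button itself is only a few lines on top of the observer above — a rough sketch, assuming a .jump-to-latest element (the selector is illustrative):

const jumpButton = document.querySelector('.jump-to-latest'); // assumed button element

// In the observer callback above, also toggle the button's visibility:
//   userAtBottom = entry.isIntersecting;
//   jumpButton.hidden = userAtBottom;

jumpButton.addEventListener('click', () => {
  sentinel.scrollIntoView({ behavior: 'smooth', block: 'end' });
});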
The Typing Indicator Is Not Optional
There is an awkward gap between the user hitting “send” and the first token arriving. Depending on the model, the prompt size, and the network conditions, this can be anywhere from 200 milliseconds to several seconds. Without feedback, users assume the interface is broken.
A simple three-dot typing indicator or a skeleton loader bridges this gap. But here is the nuance: transition smoothly from the indicator to the actual content. If the typing indicator is in one position and the streamed text appears in another, you have created a visual jump. The indicator should occupy the exact space where the first tokens will appear, then fade out as real content replaces it.
.typing-indicator {
  min-height: 1.5em;
  opacity: 1;
  transition: opacity 0.15s ease;
}

.typing-indicator[data-replaced] {
  opacity: 0;
  position: absolute;
  pointer-events: none;
}
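Connecting that CSS to the stream is just a matter of flipping the attribute when the first token lands — a minimal sketch, assuming an indicator element using the class above:

const indicator = document.querySelector('.typing-indicator');
let indicatorReplaced = false;

function onFirstToken() {
  if (indicatorReplaced) return;
  indicatorReplaced = true;
  // Triggers the fade-out above; position: absolute takes the indicator out
  // of flow so the first tokens render exactly where it was sitting.
  indicator.setAttribute('data-replaced', '');
}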
Handling Code Blocks and Structured Output
Streaming plain text is relatively straightforward. Streaming structured content — code blocks, tables, lists, markdown — is where things get properly difficult. A code block mid-stream is syntactically incomplete: the opening fence has arrived but the closing fence has not. Your syntax highlighter may choke, your container may not know its final height, and copy-to-clipboard buttons are useless until the block is complete.
The pragmatic approach is to buffer structured elements. When you detect the start of a code fence or table, collect tokens in a buffer rather than rendering them incrementally. Display a skeleton placeholder (“Generating code…”) until the block is complete, then render the entire block at once with proper syntax highlighting. This trades a small amount of perceived latency for a dramatically better visual experience.
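A rough sketch of that buffering, assuming tokens arrive as plain markdown text; showPlaceholder, hidePlaceholder, renderCodeBlock, and renderText are hypothetical rendering helpers standing in for your own:

let codeBuffer = null; // null = not currently inside a fenced block

function onNewToken(token) {
  if (codeBuffer === null && token.includes('```')) {
    // Opening fence detected: start buffering instead of rendering.
    // (Real tokenisers can split "```" across tokens, so production code
    // should run this check over the accumulated text, not single tokens.)
    codeBuffer = token;
    showPlaceholder('Generating code…');
    return;
  }
  if (codeBuffer !== null) {
    codeBuffer += token;
    if (codeBuffer.split('```').length >= 3) {
      // Closing fence arrived: render the whole block at once.
      hidePlaceholder();
      renderCodeBlock(codeBuffer);
      codeBuffer = null;
    }
    return;
  }
  renderText(token); // Plain prose still streams word by word.
}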
For markdown rendering more generally, the most robust approach is usually to re-parse the accumulated buffer on each render pass. Parsers such as marked and markdown-it are fast enough for chat-length content, render whatever structure is complete so far, and leave ambiguous constructs as plain text until more tokens clarify them.
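For the prose and lists around those buffered blocks, one workable pattern looks like this, sketched with marked (the messageEl element is assumed, and in production the HTML should be sanitised before insertion):

import { marked } from 'marked';

let markdownBuffer = '';

function onNewToken(token) {
  markdownBuffer += token;
  // Re-parse the full accumulated buffer on each update; incomplete
  // constructs simply render as text until later tokens complete them.
  messageEl.innerHTML = marked.parse(markdownBuffer);
}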
Accessibility: The Forgotten Dimension
Screen readers and streaming content have a complicated relationship. When new text appears in a container marked as a live region (aria-live="polite"), the screen reader announces it. If you are appending word by word, the screen reader announces every. Single. Word. This is unusable.
The solution is to batch screen reader announcements. Instead of marking the streaming container as a live region, use a separate visually-hidden element that you update in sentence-length chunks:
// Assumes a visually-hidden element with aria-live="polite", e.g. <div class="sr-live">
const liveRegion = document.querySelector('.sr-live');

let announcementBuffer = '';
let announceTimeout = null;

function onNewToken(token) {
  renderToken(token); // Visual update, word by word
  announcementBuffer += token;
  if (!announceTimeout) {
    announceTimeout = setTimeout(() => {
      liveRegion.textContent = announcementBuffer;
      announcementBuffer = '';
      announceTimeout = null;
    }, 1500); // Flush to the live region roughly every 1.5 seconds
  }
}
Additionally, always respect prefers-reduced-motion. Some users find the word-by-word appearance of text genuinely disorienting. Offer a “Show complete response” toggle that waits for the full response before displaying it. This is also useful for users on slow connections who would rather wait five seconds and read normally than watch text trickle in.
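A minimal sketch of that check, assuming a showCompleteOnly flag in your own settings state and the renderToken / renderText helpers from earlier:

const prefersReduced = window.matchMedia('(prefers-reduced-motion: reduce)').matches;
let heldBack = '';

function onNewToken(token) {
  if (prefersReduced || settings.showCompleteOnly) { // showCompleteOnly: your own toggle state
    heldBack += token; // Hold the response back rather than animating it in
    return;
  }
  renderToken(token);
}

function onStreamComplete() {
  if (heldBack) {
    renderText(heldBack); // Present the complete response in one update
    heldBack = '';
  }
}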
Performance: Token Rendering at Scale
Appending a DOM node for every token sounds trivial until you have a conversation with fifty messages, each containing hundreds of tokens. Naive implementations create thousands of text nodes, each triggering a layout recalculation. On lower-powered devices — and your users are on lower-powered devices more often than you think — this causes visible jank.
Two patterns help here. First, batch DOM updates using requestAnimationFrame. Collect tokens that arrive between frames and append them in a single operation:
let tokenQueue = [];
let frameRequested = false;

function onNewToken(token) {
  tokenQueue.push(token);
  if (!frameRequested) {
    frameRequested = true;
    requestAnimationFrame(flushTokens);
  }
}

function flushTokens() {
  const batch = tokenQueue.splice(0); // Take everything queued since the last frame
  container.textContent += batch.join(''); // container = the streaming message element
  frameRequested = false;
}
Second, consider virtualising completed messages. Once a message has finished streaming, replace its token-by-token DOM structure with a single pre-rendered HTML block. This reduces the node count dramatically and improves scroll performance for long conversations.
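A rough sketch of that flattening step, assuming each message lives in its own element and you keep the final text around; renderMarkdown stands in for whatever renderer you already use:

function onMessageComplete(messageEl, finalText) {
  // Swap the token-by-token DOM for a single pre-rendered block; the browser
  // now tracks one subtree instead of thousands of individual text nodes.
  messageEl.innerHTML = renderMarkdown(finalText);
}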
Testing Streaming UX
Here is where most teams fall down: they test streaming on localhost with sub-millisecond token delivery. The experience feels smooth because there is no latency, no variable token speed, and no network jitter. Real-world conditions are messier.
Build a mock streaming provider that lets you control token speed, inject pauses (simulating model “thinking”), and simulate network interruptions mid-stream. Test with Chrome DevTools throttled to “Slow 3G”. Test on a three-year-old Android phone. Test with a screen reader enabled. The streaming UX that works on your M3 MacBook Pro with a fibre connection is not the streaming UX your users will experience.
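A throwaway mock provider along these lines makes those conditions reproducible — a sketch, with the delay and pause options as illustrative knobs and sampleResponse standing in for any canned response text:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Per-token delay range plus an occasional long pause to simulate "thinking".
async function* mockStream(text, { minDelay = 20, maxDelay = 120, pauseEvery = 40, pauseMs = 1500 } = {}) {
  const tokens = text.split(/(?<=\s)/); // crude word-level "tokens"
  for (let i = 0; i < tokens.length; i++) {
    if (i > 0 && i % pauseEvery === 0) await sleep(pauseMs);
    await sleep(minDelay + Math.random() * (maxDelay - minDelay));
    yield tokens[i];
  }
}

// Drive the real UI with the mock instead of the live endpoint:
for await (const token of mockStream(sampleResponse)) {
  onNewToken(token);
}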
The Bigger Picture
Streaming AI output is not a temporary UI pattern — it is becoming as fundamental as form inputs or image loading. The teams that treat it as a first-class UX challenge rather than an afterthought will build products that feel polished whilst their competitors' still feel like prototypes.
At REPTILEHAUS, we have been building AI-powered interfaces for clients across SaaS, fintech, and enterprise. The streaming UX patterns above come from real production experience, not theory. If your team is integrating AI features and the interface does not feel right, get in touch — it is exactly the kind of problem we solve.