For years, adding AI features to a web application meant the same thing: send user data to a cloud API, wait for the response, pay per token, and hope your provider’s privacy policy holds up. That model isn’t going away, but a second option has quietly arrived — and it changes the calculus for a surprising number of use cases.
Chrome and Microsoft Edge now ship built-in AI APIs that run entirely on-device. No API keys. No round trips. No data leaving the browser. If you build for the web, this is worth understanding right now.
TL;DR
- Chrome and Edge now include built-in AI APIs (Translator, Summarizer, Language Detector, Writer, Rewriter, Prompt, Proofreader) that run locally on-device using models like Gemini Nano and Phi-4-mini.
- Three APIs — Translator, Summarizer, and Language Detector — are production-ready today; four more are available on an opt-in experimental basis.
- On-device inference means zero latency from network round trips, complete data privacy, and no per-token API costs.
- The trade-offs are real: initial model downloads are in the gigabyte range, only Chromium-based browsers are supported, and model capabilities are limited compared to cloud-hosted LLMs.
- Smart development teams are adopting a hybrid architecture — browser-native AI for lightweight tasks, cloud APIs for heavy reasoning — to cut costs and improve user experience.
What’s Actually Available Right Now
As of April 2026, Chrome and Edge offer two tiers of browser-native AI capability.
Production-Ready APIs
Three APIs are stable and available to all users immediately:
- Translator API — Translates text between language pairs, assuming the relevant model is downloaded. Available in both Chrome and Edge.
- Language Detector API — Identifies the language of input text. Currently Chrome-only, with Edge support planned.
- Summarizer API — Condenses text into headlines, summaries, key points, or TL;DRs. Supports configurable output types and lengths. Available in both browsers.
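To make the first two concrete, here's a hedged sketch that chains them: detect the language of a snippet, then translate it to a target language. The `detectAndTranslate` helper is our own naming; the `LanguageDetector` and `Translator` shapes follow the current Chrome documentation and may still shift:

```javascript
// Sketch: detect a snippet's language, then translate it.
// Assumes the stable LanguageDetector and Translator globals described above,
// and that the relevant models are already downloaded.
async function detectAndTranslate(text, targetLanguage) {
  const detector = await LanguageDetector.create();
  // detect() returns candidates ranked by confidence; take the top one
  const [best] = await detector.detect(text);
  const translator = await Translator.create({
    sourceLanguage: best.detectedLanguage,
    targetLanguage,
  });
  return translator.translate(text);
}
```

In production code you'd check availability() on both APIs first, exactly as shown for the Summarizer below; this sketch skips that for brevity.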
Experimental APIs (Opt-In)
Four additional APIs are available behind flags for developers who want to start building today:
- Writer API — Generates text from prompts.
- Rewriter API — Revises existing text based on instructions.
- Prompt API — Enables natural language requests to the on-device model.
- Proofreader API — Checks spelling and grammar.
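As an illustration of the experimental tier, here is roughly what a Prompt API call looks like behind the flag. The global has been renamed between experiment versions (recent Chrome builds expose `LanguageModel`), so treat the exact surface as an assumption rather than a stable contract:

```javascript
// Sketch of a Prompt API call (experimental, behind a flag).
// The LanguageModel global and its option shapes are assumptions based on
// the current Chrome documentation and may change before stabilisation.
async function ask(question) {
  const session = await LanguageModel.create({
    initialPrompts: [
      { role: 'system', content: 'Answer in one short sentence.' },
    ],
  });
  try {
    return await session.prompt(question);
  } finally {
    session.destroy(); // sessions hold on-device resources; release them
  }
}
```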
Under the bonnet, Chrome runs Gemini Nano while Edge uses Phi-4-mini. Both share the Chromium codebase, so the API surface is largely identical — the difference is the model doing the work.
How It Works in Practice
The developer experience is refreshingly simple. Here’s the pattern, using the Summarizer API as an example:
Step 1: Check availability. You verify the API exists in the browser and confirm the model is ready:
if ('Summarizer' in self) {
  const status = await Summarizer.availability();
  // 'unavailable', 'downloadable', 'downloading', or 'available'
}
Step 2: Create the summarizer with your parameters — output type (teaser, TL;DR, headline, key points), length (short, medium, long), and optional shared context:
const summarizer = await Summarizer.create({
  type: 'key-points',
  length: 'medium',
  sharedContext: 'Technical blog post about web development'
});
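If the model isn't on the device yet, create() can also accept a monitor callback (per the current Chrome documentation) so you can show a progress bar while the download runs. A sketch, where `onProgress` is our own hypothetical callback parameter:

```javascript
// Sketch: create a summarizer while reporting model-download progress.
// The monitor/downloadprogress surface follows Chrome's documentation;
// onProgress is a hypothetical caller-supplied callback receiving 0..1.
async function createSummarizerWithProgress(onProgress) {
  return Summarizer.create({
    type: 'key-points',
    length: 'medium',
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        onProgress(e.loaded); // e.loaded is a fraction between 0 and 1
      });
    },
  });
}
```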
Step 3: Stream the output, just like you would with a cloud API:
const stream = summarizer.summarizeStreaming(articleText);
for await (const chunk of stream) {
  outputElement.textContent += chunk; // each chunk is a delta, so append
}
No API key management. No CORS configuration. No billing dashboard. The model runs in the browser process itself.
The Real Advantages
Privacy by Architecture
When the model runs on the user’s device, sensitive data never leaves. This isn’t a policy promise — it’s a structural guarantee. For applications handling medical notes, legal documents, financial data, or personal communications, that distinction matters enormously. GDPR compliance becomes simpler when there’s no data transfer to account for.
Zero-Latency Inference
Cloud AI APIs typically add 200-800ms of network latency before the model even starts generating. On-device inference eliminates that entirely. For real-time features — auto-complete, live translation, instant summarisation — the difference is visceral.
Cost Elimination
There are no per-token costs for browser-native AI. If you’re currently paying for cloud API calls to handle translation, summarisation, or text classification, the maths is straightforward: those costs drop to zero for supported use cases.
Offline Capability
Once the model is downloaded, these APIs work without an internet connection. Combined with service workers and local storage, you can build genuinely offline-capable AI features — something that’s been essentially impossible with cloud-dependent architectures.
The Trade-Offs You Need to Know
This isn’t a replacement for cloud AI. The limitations are real and worth understanding before you commit to an architecture.
Model Downloads Are Heavy
The on-device models are in the gigabyte range; there's no lightweight "micro" version. Users need to download a model before first use. Calling create() triggers the download and a monitor callback can report progress, but there's no way from JavaScript to evict a model or manage its storage — that happens through browser internals. For users on slow connections or constrained devices, this is a meaningful barrier.
Chromium Only
Firefox and Safari don’t support these APIs. As of April 2026, you’re building for roughly 75-80% of desktop users and a larger share of mobile (via Chrome for Android). That’s a strong majority, but it’s not universal. You’ll need fallback strategies.
Capability Ceiling
Gemini Nano and Phi-4-mini are impressive for their size, but they’re not Claude or GPT-4. Complex reasoning, long-context analysis, and nuanced generation still require cloud-hosted models. Browser-native AI excels at well-scoped tasks: translation, summarisation, classification, simple generation.
Limited Debugging
The tooling is still maturing. Model management is only accessible through chrome://on-device-internals/, and there’s limited visibility into inference performance or model status from JavaScript. You’re somewhat flying blind compared to the observability you’d get with a cloud provider.
The Hybrid Architecture: Where This Is Heading
The smartest teams we’re working with aren’t choosing between browser-native and cloud AI — they’re using both. The pattern emerging is a tiered inference architecture:
- Tier 1 (Browser-native): Translation, summarisation, language detection, spell-checking, simple text generation. Fast, free, private.
- Tier 2 (Cloud API): Complex reasoning, long-context analysis, code generation, multi-modal tasks. More capable, but slower and costlier.
The browser-native capability check becomes your routing logic:
async function summarise(text) {
  if ('Summarizer' in self) {
    const status = await Summarizer.availability();
    if (status === 'available') {
      return browserSummarise(text); // Free, instant, private
    }
  }
  return cloudSummarise(text); // Fallback to API
}
This pattern — try local first, fall back to cloud — gives you the best of both worlds. Users with capable browsers get instant, private results. Everyone else still gets a working feature.
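In practice you'll also want to guard against runtime failures, since model creation or inference can still reject after availability() reported 'available'. A hedged elaboration of the routing logic, where `browserSummarise` and `cloudSummarise` are the same assumed helpers as above:

```javascript
// Robust variant of local-first routing: fall back to the cloud not just
// when the API is missing, but also if on-device inference fails at runtime.
// browserSummarise and cloudSummarise are assumed helpers, as above.
async function summariseWithFallback(text) {
  if ('Summarizer' in self && (await Summarizer.availability()) === 'available') {
    try {
      return await browserSummarise(text);
    } catch (err) {
      console.warn('On-device summarisation failed, falling back:', err);
    }
  }
  return cloudSummarise(text);
}
```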
What This Means for Web Development Teams
Browser-native AI isn’t a gimmick. It’s the beginning of a genuine platform shift. The Web Machine Learning Community Group is working towards standardisation, which means these capabilities will eventually move beyond Chromium.
If you’re planning a product roadmap, here’s what we’d recommend:
- Audit your current cloud AI usage. Which API calls handle simple, well-scoped tasks that browser-native models could manage?
- Prototype with the stable APIs. The Translator and Summarizer APIs are production-ready. Build a proof of concept and measure the UX improvement.
- Design for graceful degradation. Your architecture should work without browser-native AI. Treat it as a progressive enhancement.
- Watch the experimental APIs. The Prompt API and Writer API will open up significantly more use cases once they stabilise.
The era of AI-as-a-web-primitive is here. The question isn’t whether to use it — it’s how quickly you can integrate it into your stack.
Need Help Building AI Into Your Web Application?
At REPTILEHAUS, we specialise in building modern web applications that leverage cutting-edge browser capabilities — from progressive web apps to AI-powered features. Whether you’re exploring browser-native AI, building hybrid inference architectures, or integrating AI agents into your product, our team can help you ship faster and smarter. Get in touch.
📷 Photo by Zulfugar Karimov on Unsplash



