The browser interface AI agents deserve.

AWI is an MCP server that gives AI agents a compact, semantic view of any web page — instead of 40,000 bytes of DOM they can't reason over.

Stable element IDs
Token-efficient snapshots
No raw DOM noise

$ npx agent-web-interface install

View on GitHub

agent · semantic snapshot

▸ navigate "example.com/checkout"

<page url="…/checkout"
      title="Checkout">
  <main>
    <heading level="1">Your Cart (3)</heading>
    <button eid="btn-checkout"
            label="Proceed to Payment" />
  </main>
</page>

Same page, what's sent

Raw DOM: 41 KB

AWI: 0.4 KB

stable across calls

Open-source
MIT-licensed
Works with Claude, GPT-4o, Gemini, Ollama
Requires Node.js + Chrome

The Problem

Browser automation is easy for scripts.

Hard for agents.

Current tools dump thousands of tokens of noise into the agent's context. Tasks fail mid-flow. Selectors break on the next deploy. And there's no stable way to reference an element across calls.

Raw DOM

<div class="wrapper xyz-123">
  <div class="inner" data-v="8">
    <button type="submit"
      class="btn btn-primary
      x-4 y-2 md:x-6">
      Submit
    </button>
  </div>
</div>

40,000+ bytes of noise. Brittle class selectors. Context exhaustion.

Accessibility Tree

RootWebArea "Checkout"
  generic ""
    generic "wrapper xyz-123"
      generic "inner"
        button "Submit"
          StaticText "Submit"

Better, but still verbose. No stable IDs to reference across calls.

Agent Web Interface

<page title="Checkout">
  <region role="main">
    <button
      eid="btn-1"
      label="Submit" />
  </region>
</page>

Compact. Stable eid. Only what the agent needs.

Live Example

See what your agent sees.

Every MCP tool call returns a structured semantic snapshot — not raw DOM.

agent-web-interface

input → tool call

{
  "tool": "navigate",
  "params": {
    "url": "https://example.com/checkout"
  }
}

output → snapshot

<page url="https://example.com/checkout"
      title="Checkout">
  <region role="main">
    <heading level="1">Your Cart (3 items)</heading>
    <list>
      <item eid="p1">Widget Pro — $29.99</item>
      <item eid="p2">Widget Lite — $9.99</item>
    </list>
    <button eid="btn-checkout"
            label="Proceed to Payment" />
  </region>
</page>

The agent references btn-checkout by eid across every call that follows — no selector hunting, no re-finding elements after navigation.
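
As a rough sketch of what that looks like from the agent side, the TypeScript below builds the navigate call from the example and a follow-up click that reuses the eid. The { tool, params } shape is copied from the example above; the "eid" parameter name on click and the buildCall helper are assumptions for illustration, not AWI's documented schema.

```typescript
// Shape taken from the example tool call above: { tool, params }.
interface ToolCall {
  tool: string;
  params: Record<string, string>;
}

// Hypothetical helper that builds a tool-call payload. Not part of AWI.
function buildCall(tool: string, params: Record<string, string>): ToolCall {
  return { tool, params };
}

// Step 1: navigate, exactly as in the example above.
const navigateCall = buildCall("navigate", {
  url: "https://example.com/checkout",
});

// Step 2: act on the button using the eid returned in the snapshot.
// ASSUMPTION: the click tool takes an "eid" parameter; check the real tool schema.
const clickCall = buildCall("click", { eid: "btn-checkout" });

console.log(JSON.stringify([navigateCall, clickCall], null, 2));
```

The point of the sketch is that the second call carries only the eid: no CSS selector and no fresh lookup.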

How It Works

Five steps. One install command.

Agent calls a browser tool via MCP

navigate, click, find, type, screenshot…

AWI handles the call as an MCP server

A local server launched with npx — no daemon to manage

Puppeteer drives Chrome

Local Chrome via CDP — real rendering, full JS

Page reduced to semantic XML

Headings, buttons, links, forms — no raw markup

Agent receives stable eids

Reference the same button across 10 tool calls. No re-finding elements.

AWI runs locally and needs Node.js and a Chrome install. For serverless or shared deployments, see AWI Cloud.
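
To make the "page reduced to semantic XML" step concrete, here is a minimal sketch of the idea in TypeScript. It is not AWI's implementation: the DomNode type, the reduce and toSnapshot functions, and the counter-based eids are all hypothetical, and the sketch handles only headings and buttons (links and forms would follow the same pattern).

```typescript
// Minimal stand-in for a rendered DOM node (hypothetical; not AWI's types).
interface DomNode {
  tag: string;
  text?: string;
  children?: DomNode[];
}

const HEADINGS = new Set(["h1", "h2", "h3", "h4", "h5", "h6"]);

// Escape characters that are unsafe in XML text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Depth-first walk: emit a line for each heading or button and
// flatten wrapper elements such as <div> by emitting nothing for them.
function reduce(node: DomNode, counter: { n: number }, out: string[]): void {
  const label = escapeXml((node.text ?? "").trim());
  if (HEADINGS.has(node.tag)) {
    out.push(`  <heading level="${node.tag.slice(1)}">${label}</heading>`);
  } else if (node.tag === "button") {
    counter.n += 1;
    out.push(`  <button eid="btn-${counter.n}" label="${label}" />`);
  }
  for (const child of node.children ?? []) reduce(child, counter, out);
}

function toSnapshot(title: string, root: DomNode): string {
  const lines: string[] = [];
  reduce(root, { n: 0 }, lines);
  return [`<page title="${escapeXml(title)}">`, ...lines, "</page>"].join("\n");
}

// Nested wrappers like the Raw DOM example collapse to two semantic lines.
const dom: DomNode = {
  tag: "div",
  children: [
    { tag: "h1", text: "Your Cart (3 items)" },
    { tag: "div", children: [{ tag: "button", text: "Submit" }] },
  ],
};

const snapshot = toSnapshot("Checkout", dom);
console.log(snapshot);
```

A real reducer would presumably work from the rendered page and its accessibility data rather than tag names alone; the sketch only shows why the output is so much smaller than the input.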

Features

What agents get that Playwright doesn't provide.

Semantic Snapshots

Regions, headings, links, buttons — not 40,000 bytes of DOM. Agents reason over structure, not markup.

Stable Element IDs

Every interactive element gets a stable eid. Reference it across 10 tool calls: CSS classes change on the next deploy, but the eid doesn't.
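
One way an ID can survive a deploy is to derive it from what the element means rather than how it is styled. The TypeScript sketch below illustrates that idea only; how AWI actually assigns eids is not described here, and ElementInfo, fnv1a, and deriveEid are hypothetical names.

```typescript
// Identity in this sketch is the semantic role plus the visible label.
interface ElementInfo {
  role: string;
  label: string;
  className: string; // present in the DOM, deliberately ignored below
}

// FNV-1a style 32-bit hash over UTF-16 code units: small, deterministic,
// and dependency-free.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical derivation: the ID depends on role and label, never on classes.
function deriveEid(el: ElementInfo): string {
  const key = `${el.role}|${el.label.trim().toLowerCase()}`;
  return `${el.role}-${fnv1a(key)}`;
}

// Same button before and after a deploy that rewrote its class names.
const beforeDeploy: ElementInfo = {
  role: "button",
  label: "Submit",
  className: "btn btn-primary x-4 y-2",
};
const afterDeploy: ElementInfo = {
  role: "button",
  label: "Submit",
  className: "btn-v2 px-6",
};

console.log(deriveEid(beforeDeploy) === deriveEid(afterDeploy)); // prints true
```

Two elements with the same role and label would collide under this scheme, so a real implementation would need an extra disambiguator, such as position within the enclosing region.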

Token-Efficient

Snapshots return only the structure an agent needs to act — not the full page on every call. Longer task horizons. Lower token spend.
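
For a back-of-the-envelope feel for the savings, the TypeScript sketch below uses the 41 KB and 0.4 KB figures from the "Same page, what's sent" comparison. The 4-bytes-per-token ratio, the 1,024-byte kilobyte, and the 10-step task are assumptions for illustration; real tokenizers and pages will vary.

```typescript
// Sizes from the "Same page, what's sent" comparison on this page.
// ASSUMPTION: 1 KB = 1024 bytes.
const rawDomBytes = 41 * 1024;
const awiBytes = 0.4 * 1024;

// ASSUMPTION: roughly 4 bytes per token, a common rule of thumb.
const BYTES_PER_TOKEN = 4;

const estimateTokens = (bytes: number): number =>
  Math.ceil(bytes / BYTES_PER_TOKEN);

// ASSUMPTION: a 10-step task that receives one page view per step.
const steps = 10;
const rawTotal = estimateTokens(rawDomBytes) * steps;
const awiTotal = estimateTokens(awiBytes) * steps;

console.log({ rawTotal, awiTotal, ratio: Math.round(rawTotal / awiTotal) });
```

Under these assumptions the raw-DOM route spends on the order of a hundred times more tokens on page content over the same task.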

Network Inspection

See every request that followed an action — verify form submissions, trace auth flows, and debug redirects without a devtools tab.

Canvas Inspection

When the page is a canvas, chart, or image, capture a screenshot or read canvas data directly. Semantic snapshots where they work; pixels where they don't.

Model-Agnostic

Works with any MCP-compatible agent runtime: Claude, GPT-4o, Gemini, local models. No vendor lock-in.

What People Build

Agents that act on the live web.

Web Research Agents

Navigate, read, and extract from live pages — pricing monitors, news aggregators, and competitor trackers that work on real rendered content, not stale APIs.

QA & Flow Automation

Walk through sign-up, checkout, and onboarding flows, fill forms, and verify responses — without the fragile CSS selectors that break Playwright suites.

Data Entry & Form Filling

Log in, find fields by their label, and submit — even on pages where the DOM shifts between visits and class names change on every deploy.

Managed Cloud

Don't want to manage Chrome?

AWI Cloud runs headless Chrome for you. Connect via API key in seconds, share sessions across your team, and pay only for the browser time you use.

No local Chrome or Node.js to install — just an API key
Share browser sessions and credentials across your team
Pay by the minute — no seat fees, no commitments
Start free, upgrade when you scale

Get Started

One command. Agents browsing.

$ npx agent-web-interface install

Open-source · MIT-licensed · Works with Claude, GPT-4o, Gemini, and any MCP-compatible agent.