Insights · 10 min read

Playwright MCP, Chrome MCP, Agent Browser, Stagehand: What Actually Matters for Testing

By qtrl Team · Engineering

Every week there's a new tool that lets AI agents control a browser. Playwright MCP, Chrome MCP, Vercel's Agent Browser, Stagehand by Browserbase. The space is moving fast, and if you're building or running a QA team, it's worth understanding what each one actually does.

The good news: these tools are real, they work, and they solve meaningfully different problems. The nuance is that they're all focused on the automation layer: getting an AI agent to navigate pages, click elements, and verify state. That's a big piece of the puzzle, but it's not the whole picture.

Here's a breakdown of each tool, where it shines, and what sits beyond their scope.

First, What Is MCP?

MCP stands for Model Context Protocol. Anthropic open-sourced it in late 2024 as a standard way for AI models to interact with external tools and services. Think of it as a universal adapter: instead of every AI agent needing custom integrations with every tool, MCP provides a shared protocol that any tool can implement and any agent can consume.

For browser automation specifically, an MCP server wraps a browser engine and exposes actions (click, type, navigate, screenshot) as structured tools that an LLM can call. The AI sends a "click the login button" instruction; the MCP server translates it into actual browser commands and sends back the result.
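Concretely, a tool call in MCP is a JSON-RPC 2.0 message with the method `tools/call`. A minimal sketch of what that looks like on the wire — the tool name `browser_click` and its arguments are illustrative here, since each server defines its own tool set:

```typescript
// Shape of the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
// The tool name and argument keys are illustrative, not from any specific server.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The agent asks the server to click an element it saw in a page snapshot.
const req = buildToolCall(1, "browser_click", { element: "Login button" });
console.log(JSON.stringify(req));
```

The server executes the action and replies with a result message; the agent never touches the browser directly.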

It's a clean architecture. But the real question is: which MCP server should you actually use?

Playwright MCP: The Mature Pick from Microsoft

Playwright MCP is the official MCP server from Microsoft's Playwright team, and it's probably the most mature option in this space. If your team already uses Playwright, this is the natural starting point.
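Getting started is typically a few lines of MCP client config along these lines (check the project README for the current package name and flags, as they evolve):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```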

Its core design principle: use accessibility tree snapshots instead of screenshots. When an AI agent needs to understand what's on a page, Playwright MCP sends back a structured representation of the page's accessibility tree rather than a pixel image. This means you don't need a vision model. A standard text-based LLM can read the tree and decide what to click, type, or navigate to. It's faster and cheaper per action than screenshot-based approaches.

You get the full Playwright feature set under the hood: cross-browser support (Chrome, Firefox, WebKit, Edge), network interception, device emulation, trace recording, and video capture. It can also output TypeScript Playwright test scripts as it goes, which means you can convert AI-driven explorations into deterministic tests later.

There's also a newer CLI mode (the @playwright/cli package) that reduces token consumption by roughly 4x compared to the MCP protocol, saving snapshots and screenshots to disk instead of streaming them into the LLM context window. It's already integrated into GitHub Copilot's coding agent, Cursor, VS Code, and Cline.

The trade-off: it's designed for isolated, clean browser sessions. If you need to interact with a browser where a user is already logged in with existing cookies and session state, you'll need to work around its default ephemeral approach. There's also a learning curve to configure it properly for your specific environment.

Chrome MCP: From Google's Official Server to Community Extensions

"Chrome MCP" isn't one thing. There's Google's official offering, and then there are several community-built alternatives. They solve different problems.

Google's Chrome DevTools team released Chrome DevTools MCP in September 2025. It uses Puppeteer and the Chrome DevTools Protocol (CDP) under the hood, exposing roughly 29 tools across navigation, input, debugging, network inspection, performance tracing, and even Lighthouse audits. This is the deep diagnostics option. It can record performance traces, extract Core Web Vitals, inspect full request/response bodies, and run accessibility audits. If Playwright MCP tells you what happened from a user's perspective, Chrome DevTools MCP tells you why it happened from the browser's perspective.
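It plugs into the same MCP client config slot as any other server, along these lines (again, verify the package name and options against the current README):

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```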

On the community side, mcp-chrome by hangwin is a Chrome extension-based MCP server that connects to your daily browser. It can use your existing login sessions, cookies, extensions, and bookmarks. For tasks where SSO or complex auth is required, this is a real advantage over tools that spin up clean browser instances.

The biggest strength across the board: access to things other tools can't reach. Google's backing on the DevTools side means it's not going anywhere, and the extension-based variants give you a uniquely convenient path for tasks that need your real browser session.

The trade-offs: Chrome only, naturally. No Firefox, WebKit, or Safari. Chrome DevTools MCP also consumes roughly 18,000 tokens just for tool definitions, which is 6x more than minimal alternatives. And the community extensions, while handy for local automation, lack session isolation, so you'll want to be careful that your test run isn't affected by whatever else you have open.

Vercel Agent Browser: Built for Speed and AI Agents

Vercel's Agent Browser takes a different approach entirely. It's not an MCP server. It's a Rust-native CLI tool designed specifically for AI coding agents like Claude Code.

The core idea is a snapshot-and-reference system. When the agent takes a snapshot of a page, Agent Browser returns an accessibility tree where every interactive element gets a unique reference (like @e1, @e2). The agent then targets elements by reference instead of using CSS selectors or XPath. No fragile selectors. No "element not found" because the class name changed.

Every browser action is a single CLI command. Open a URL, take a snapshot, click an element, fill a form, take a screenshot. Commands can be chained from any language or framework. The Rust implementation gives it sub-millisecond startup, which matters when an AI agent is executing hundreds of commands per task.
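The snapshot-and-reference idea can be sketched in a few lines. The snapshot format below is invented for illustration, not Agent Browser's actual output; the point is that the agent resolves stable references instead of brittle selectors:

```typescript
// Illustrative snapshot: each interactive element carries a stable reference.
// (This data shape is invented for the sketch; Agent Browser's real output differs.)
const snapshot = [
  { ref: "@e1", role: "textbox", name: "Email" },
  { ref: "@e2", role: "textbox", name: "Password" },
  { ref: "@e3", role: "button", name: "Sign in" },
];

// The agent targets elements by reference, never by CSS selector or XPath,
// so a renamed class or reshuffled DOM doesn't break the action.
function resolve(ref: string) {
  const el = snapshot.find((e) => e.ref === ref);
  if (!el) throw new Error(`unknown reference ${ref}`);
  return el;
}

console.log(resolve("@e3").name); // the element the agent would click
```

References stay valid for the lifetime of the snapshot; after the page changes, the agent takes a fresh snapshot and gets fresh references.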

The result is extremely fast. The Rust-native daemon eliminates browser launch overhead on subsequent commands, and the whole thing is designed for AI agents from the ground up: structured JSON output, annotated screenshots for multimodal models, comprehensive error messages that help agents self-correct. No server configuration needed. With 14,000+ GitHub stars, it's seen strong community adoption.

The trade-off: it's focused on giving AI coding agents browser access, not on being a testing framework. There's no built-in test runner, no assertion library, no reporting. You're getting raw browser control and building everything else yourself.

Stagehand: The AI-Native Automation Framework

Stagehand by Browserbase sits in a different category. While the tools above give AI agents raw browser access, Stagehand is a full automation framework that blends traditional code with AI-powered actions.

It exposes three atomic primitives: act (perform an action), extract (pull data from the page), and observe (understand what's on screen). You write code that mixes deterministic Playwright-style commands with natural language instructions. When you know exactly what element to click, you write code. When the page is unfamiliar or the layout might change, you use natural language and let the AI figure it out.
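That code-first, AI-fallback pattern can be sketched without the framework. The function names below are hypothetical, not Stagehand's API; the idea is a deterministic selector first, with a natural-language action as the fallback:

```typescript
// Hypothetical sketch of the deterministic-first, AI-fallback pattern.
// (Names invented here; see Stagehand's docs for its actual act/extract/observe API.)
type ClickBySelector = (selector: string) => Promise<boolean>;
type ActByInstruction = (instruction: string) => Promise<void>;

async function clickLogin(clickSel: ClickBySelector, act: ActByInstruction) {
  // Known, stable element: use a deterministic selector, no LLM call needed.
  if (await clickSel("#login-button")) return "selector";
  // Unfamiliar or shifted layout: fall back to a natural-language action.
  await act("click the login button");
  return "ai";
}

// Stub implementations so the sketch runs standalone.
const clickSel: ClickBySelector = async (s) => s === "#login-button";
const act: ActByInstruction = async () => {};

clickLogin(clickSel, act).then((path) => console.log(path)); // "selector"
```

The deterministic path is fast and free; the AI path only pays LLM latency and cost when the page actually surprises you.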

Version 3 (released in 2025) dropped the Playwright dependency in favor of a CDP-native architecture. It introduced self-healing: when the DOM shifts or the layout changes, Stagehand adapts automatically instead of failing. It also caches discovered elements so subsequent runs skip LLM inference entirely for known paths, which cuts both cost and latency.

That combination of deterministic code for known paths and AI for unpredictable ones is Stagehand's real selling point. It's model-agnostic (works with any LLM or computer-use agent), supports SDKs across TypeScript, Java, Rust, C#, Go, and more, and has 10,000+ GitHub stars. Browserbase, the company behind it, raised $40M in Series B at a $300M valuation in 2025.

The trade-off: for production use, it's tightly coupled to Browserbase's cloud infrastructure. You can run it locally, but the managed browser sessions, stealth mode, and proxy rotation that make it production-ready are paid features. And like the other tools here, it's an automation layer. It doesn't tell you what to test or whether you tested enough.

So Which One Should You Pick?

It depends on what you're optimizing for. If you want cross-browser coverage with strong IDE integration, Playwright MCP is the natural starting point. If you need deep performance diagnostics and debugging, pair it with Chrome DevTools MCP. For quick local automation that uses your existing browser session and login state, one of the community Chrome MCP extensions will get you there fast. If you're building with AI coding agents and speed matters most, Vercel Agent Browser is purpose-built for that. And if you want a full automation framework with self-healing and the flexibility to mix code with natural language, Stagehand is the most complete option, especially paired with Browserbase's infrastructure.

You could even combine a few of these. They're not mutually exclusive.

What They're Great At, and What's Still Missing

These tools have genuinely moved the needle on browser automation. Getting an AI agent to navigate your app, click through flows, and verify page state used to require serious custom engineering. Now you can set it up in an afternoon. That's a real win.

But browser automation is one layer of the testing story. Once you have an agent that can click buttons, a whole set of questions opens up. Which tests should you run for this release? Did the AI actually cover the right flows? How do you run 200 browser tests in parallel without them stepping on each other? When a test fails, is it a real bug or did the AI hallucinate a wrong assertion? Who reviews the results? Where's the audit trail?

These are test management and orchestration problems, and they're outside the scope of what browser automation tools are designed to do. That's not a criticism. Playwright MCP is excellent at driving browsers. Stagehand is excellent at self-healing automation. They're focused tools, and that focus is what makes them good.

The gap is in everything that wraps around them: test case tracking, run management, coverage visibility, guardrails for AI agents, and the infrastructure to execute at scale. That's a different layer entirely, and it's where teams tend to get stuck once the initial "AI can control my browser" excitement wears off.

Where qtrl Fits

qtrl isn't another browser automation tool competing with Playwright MCP or Stagehand. It's the test management and execution infrastructure that sits above them.

qtrl gives you structured test management (organized cases, tracked runs, full traceability) combined with the infrastructure to run AI-powered tests at scale. That means parallel execution across environments, so your 200-test regression suite finishes in minutes, not hours. It means guardrails for AI agents, so they stay focused on the right test paths and produce results you can actually trust. And it means visibility: dashboards that answer the release-readiness question with data instead of gut feeling.

The browser automation tools are getting better every month. Pick the one that fits your stack. qtrl is the layer that ties it all together: what to test, proof that it was tested, and the infrastructure to do it at the speed your release cycle demands. See how it works.