Browser-Use: Teaching LLMs to Drive the Web with Playwright and Vision

Hook

With 81,972 GitHub stars, Browser-Use has become one of the fastest-growing browser automation frameworks—not by replacing Playwright, but by giving it a brain.

Context

Traditional browser automation requires developers to write explicit scripts for every click, form field, and navigation step. Playwright and Selenium excel at this deterministic approach, but they crumble when faced with dynamic UIs, A/B tests, or workflows that vary by context. Meanwhile, LLMs have proven capable of reasoning about tasks in natural language, but lack the grounding to interact with actual web pages.

Browser-Use bridges this gap by wrapping Playwright in an agent loop: the LLM observes the current DOM state, reasons about what action to take next, and executes browser commands through a structured action space. Instead of writing 50 lines of selectors to fill a job application, you write one line: task="Fill in this job application with my resume". The library handles the perception-action loop, retry logic, and state management. The project comes in two tiers: an open-source library for local control, and a paid cloud service offering stealth browsing and a proprietary ChatBrowserUse model that the project claims completes tasks 3-5x faster than other models at state-of-the-art accuracy.

Technical Insight

[Figure: system architecture diagram (auto-generated)]

The architecture follows a classic agent pattern with three phases: perception, reasoning, and action. On each iteration, Browser-Use captures the current browser state by extracting a simplified DOM representation (text content, clickable elements with indices, form fields) and optionally takes a screenshot. This state gets serialized into the LLM’s context window alongside the task description and conversation history.
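
To make the perception step concrete, here is a minimal sketch of how indexed page state might be flattened into prompt text. The Element class, the serialize_state function, and the output layout are all hypothetical names chosen for illustration; they are not Browser-Use's actual internals or prompt format.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one interactive element found in the DOM."""
    tag: str
    text: str


def serialize_state(url: str, elements: list[Element], task: str) -> str:
    """Render the task, URL, and an indexed element list as plain text."""
    lines = [f"Task: {task}", f"URL: {url}", "Interactive elements:"]
    for index, element in enumerate(elements):
        # The index is what the model refers to later, instead of a selector.
        lines.append(f"[{index}] <{element.tag}> {element.text}")
    return "\n".join(lines)


elements = [Element("input", "Search"), Element("button", "Submit")]
prompt = serialize_state("https://example.com", elements, "Submit the form")
print(prompt)
```

The point of the sketch is the indexing: the model only ever sees small integers and short labels, which keeps the context compact compared with raw HTML.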

Here’s the minimal setup from the README:

from browser_use import Agent, Browser, ChatBrowserUse
import asyncio

async def main():
    browser = Browser()
    agent = Agent(
        task="Find the number of stars of the browser-use repo",
        llm=ChatBrowserUse(),
        browser=browser,
    )
    await agent.run()

asyncio.run(main())

Under the hood, the Browser class manages a Playwright browser instance. The Agent orchestrates the loop: it feeds the current state to the LLM, parses the structured response (which specifies an action type like click, type, or navigate with parameters), executes that action via Playwright, and repeats until the task is complete or max steps are reached.
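
The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: run_agent and its observe, decide, and execute callbacks are hypothetical, and a scripted iterator stands in for the LLM so the example runs without a browser or an API key.

```python
from typing import Callable

Action = dict  # e.g. {"type": "click", "index": 5}


def run_agent(
    task: str,
    observe: Callable[[], str],
    decide: Callable[[str, str, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> list:
    """Observe, decide, act; stop on a 'done' action or after max_steps."""
    history: list = []
    for _ in range(max_steps):
        state = observe()                        # perception
        action = decide(task, state, history)    # reasoning (the LLM call)
        history.append(action)
        if action["type"] == "done":
            break
        execute(action)                          # action (the browser call)
    return history


# A scripted "LLM" that replays three fixed decisions.
scripted = iter([
    {"type": "navigate", "url": "https://example.com"},
    {"type": "click", "index": 5},
    {"type": "done"},
])
executed: list = []
history = run_agent(
    task="demo",
    observe=lambda: "page state",
    decide=lambda task, state, hist: next(scripted),
    execute=executed.append,
)
print(len(history), len(executed))
```

Note that the max_steps cap is the only thing that guarantees termination; a model that never emits a done action would otherwise run forever.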

The action space is deliberately constrained. Instead of exposing raw Playwright APIs, Browser-Use provides a fixed set of high-level commands: click(index), type(text), navigate(url), scroll(direction), extract_content(), and done(). Elements are identified by integer indices assigned during DOM extraction, avoiding the fragility of CSS selectors or XPath in dynamic UIs. This abstraction means the LLM doesn’t need to understand selector syntax—it just sees “Element 5 is a ‘Submit’ button” and decides whether to click it.
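
A constrained action space also makes model output easy to validate before anything touches the browser. The sketch below assumes the model replies with a JSON object whose "type" field names one of the actions listed above; that wire format, the ALLOWED table, and the parse_action function are assumptions for illustration rather than Browser-Use's real schema.

```python
import json

# Action name -> required parameter names, taken from the list in the text.
ALLOWED = {
    "click": {"index"},
    "type": {"text"},
    "navigate": {"url"},
    "scroll": {"direction"},
    "extract_content": set(),
    "done": set(),
}


def parse_action(raw: str, element_count: int) -> dict:
    """Parse a JSON action and reject anything outside the fixed action space."""
    action = json.loads(raw)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    name = action.get("type")
    if name not in ALLOWED:
        raise ValueError(f"unknown action: {name!r}")
    missing = ALLOWED[name] - set(action)
    if missing:
        raise ValueError(f"{name} is missing parameters: {sorted(missing)}")
    if name == "click":
        index = action["index"]
        # Reject indices that do not refer to an element on the current page.
        if not isinstance(index, int) or not 0 <= index < element_count:
            raise ValueError(f"element index out of range: {index!r}")
    return action


ok = parse_action('{"type": "click", "index": 5}', element_count=20)
print(ok)
```

Because every action is checked against a small table, an invalid reply can be turned into an error message for the model's next turn instead of a failed browser call.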

The library supports multiple LLM providers through a unified interface. You can swap ChatBrowserUse() for ChatGoogle(model='gemini-3-flash-preview') or ChatAnthropic(model='claude-sonnet-4-6') without changing agent logic. The proprietary ChatBrowserUse model is optimized specifically for browser tasks.

Extensibility comes via custom tools. The README mentions template options including a tools template that demonstrates injecting custom actions into the agent’s capabilities. This lets you add domain-specific operations (like “check_inventory” for an e-commerce workflow) that the LLM can invoke alongside standard browser actions. The template system (uvx browser-use init --template default|advanced|tools) generates boilerplate code for common patterns.
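
The README excerpt quoted here does not show what the tools template generates, so the following is a generic sketch of the idea rather than the library's API. ToolRegistry, its action decorator, and the check_inventory function with its in-memory stock table are all hypothetical.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical registry mapping tool names to descriptions and functions."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def action(self, description: str) -> Callable:
        """Decorator that registers a function under its own name."""
        def decorator(fn: Callable) -> Callable:
            self._tools[fn.__name__] = (description, fn)
            return fn
        return decorator

    def describe(self) -> str:
        """Text listing of tools, suitable for including in an LLM prompt."""
        return "\n".join(
            f"{name}: {description}"
            for name, (description, _) in sorted(self._tools.items())
        )

    def call(self, name: str, **params) -> str:
        """Invoke a registered tool by name with keyword parameters."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        _, fn = self._tools[name]
        return fn(**params)


tools = ToolRegistry()


@tools.action("Look up the stock level for a SKU")
def check_inventory(sku: str) -> str:
    stock = {"A-100": 3}  # made-up data standing in for a real inventory system
    return f"{sku}: {stock.get(sku, 0)} in stock"


result = tools.call("check_inventory", sku="A-100")
print(tools.describe())
print(result)
```

The description string matters as much as the function: it is the only thing the model sees when deciding whether the tool applies.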

A standout feature is the CLI for persistent browser sessions. Commands like browser-use open https://example.com, browser-use state, and browser-use click 5 let you manually drive a browser session while the agent maintains state. This is invaluable for debugging—you can step through an agent’s thought process, inspect what elements it’s seeing, and manually override actions when it gets stuck. The README shows:

browser-use open https://example.com
browser-use state                       # See clickable elements
browser-use click 5                     # Click element by index
browser-use type "Hello"
browser-use screenshot page.png
browser-use close

The cloud offering (Browser(use_cloud=True)) addresses two pain points: stealth browsing and infrastructure. Many sites block automation via bot detection (Cloudflare, PerimeterX, etc.). The cloud service appears to manage residential proxies, browser fingerprinting, and CAPTCHA handling based on the stealth-enabled browser description. It also scales horizontally—instead of managing Playwright instances on your servers, you outsource execution to Browser-Use’s infrastructure and pay per task.

Gotcha

Browser-Use inherits all the non-determinism and cost of LLM-driven systems. A task that takes 10 lines of Playwright code (like clicking a known button) now requires multiple LLM calls, each costing tokens and adding latency. The README doesn’t publish detailed benchmarks; the proprietary ChatBrowserUse model is claimed to complete tasks 3-5x faster than other models, but relying on it means vendor lock-in and ongoing API costs.

Reliability is fundamentally probabilistic. The agent can get stuck in loops (repeatedly clicking the wrong element), hallucinate actions that don’t exist (trying to click element 99 when only 20 exist), or misinterpret ambiguous UIs. Multi-step workflows with conditional logic (“if the price is over $100, apply coupon code X”) are fragile—the LLM might skip steps or apply incorrect reasoning. The DOM extraction likely works well for simple pages but may struggle with heavily obfuscated markup, infinite scroll (elements not yet rendered aren’t visible to the agent), and SPAs that mutate state without navigation events. You’ll need fallback logic and monitoring for production use cases.
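
One piece of the fallback logic mentioned above is detecting when the agent repeats itself. The sketch below shows one simple way to do that from the outside; LoopGuard and its window parameter are hypothetical and not part of Browser-Use.

```python
import json
from collections import deque


class LoopGuard:
    """Flags an agent that has issued the same action `window` times in a row."""

    def __init__(self, window: int = 3) -> None:
        self._window = window
        self._recent: deque = deque(maxlen=window)

    def record(self, action: dict) -> bool:
        """Store the action; return True if the last `window` actions match."""
        # Canonical JSON makes dicts with the same content compare equal.
        self._recent.append(json.dumps(action, sort_keys=True))
        return len(self._recent) == self._window and len(set(self._recent)) == 1


guard = LoopGuard(window=3)
click = {"type": "click", "index": 7}
flags = [guard.record(click) for _ in range(3)]
print(flags)
```

When the guard fires, a caller could abort the run, hand control to a human, or fall back to a deterministic script. Which of those is appropriate depends on the workflow.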

The CLI’s persistent session feature, while useful for debugging, isn’t extensively documented beyond the basic command list shown in the README. There’s no mention of session persistence across restarts, state serialization, or detailed integration with the agent loop. The template system generates helpful starter code, but the README doesn’t provide detailed explanations of what configuration options exist in the advanced template or what specific custom tools are demonstrated in the tools template—you have to generate them to find out.

Verdict

Use Browser-Use if you’re prototyping web automation for unpredictable UIs (job applications, form filling, data extraction from varied sites) and want to skip writing brittle selectors. It shines for one-off tasks, internal tools where latency and cost aren’t critical, and workflows where human-in-the-loop oversight is acceptable. The cloud service makes sense if you’re hitting bot detection on production sites and don’t want to manage proxy infrastructure.

Skip it if you need deterministic, production-grade automation where every action must succeed predictably; stick with vanilla Playwright for that. Also skip it if you’re cost-sensitive (LLM calls add up fast on high-volume tasks) or require fine-grained control over timing, network interception, or advanced Playwright features that aren’t exposed through the agent’s action space. This is a high-level abstraction that trades control for convenience, so know which you need before committing.
