Building AI Agents That Remember: Inside browser-use/web-ui’s Persistent Browser Architecture
Hook
Many AI web agents stumble on simple tasks like checking your email because they can’t persist login sessions. browser-use/web-ui addresses this by letting you connect to your actual Chrome browser, with its cookies and sessions intact.
Context
AI agents that interact with websites face a fundamental challenge: authentication. Traditional browser automation creates isolated browser contexts that forget everything when they close. Need to check your Gmail? The agent has to log in from scratch. Want to pull data from a site with 2FA? That’s a problem.
The browser-use library was designed to make websites more accessible to AI agents by combining browser automation with LLM reasoning capabilities. browser-use/web-ui, built by WarmShao and the browser-use community (with over 15,700 GitHub stars), addresses the session persistence problem by adding a critical feature: the ability to connect AI agents to your existing Chrome browser profile. The payoff is concrete: your agent can interact with authenticated sessions, access sites behind paywalls you’re subscribed to, and build upon previous interactions instead of starting from zero each time.
Technical Insight
The architecture is built in layers. At the foundation is Playwright for browser automation. The browser-use library provides AI orchestration on top of that. browser-use/web-ui wraps everything in a Gradio interface that exposes configuration for both the LLM provider and browser behavior.
The key innovation is in how it handles browser context management. In standard mode, the system uses Playwright’s managed browser instances—isolated contexts that launch, execute tasks, and terminate cleanly, losing all state.
But when you enable “Use Own Browser” mode and configure BROWSER_PATH and BROWSER_USER_DATA in your .env file, the system switches strategies. Instead of launching a fresh browser instance, it connects to Chrome using your existing user data directory:
# Windows example from the README
BROWSER_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
BROWSER_USER_DATA="C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data"
The practical consequence is that the AI agent inherits your full browser state: cookies, saved logins, and open sessions. When you ask the agent to “check my GitHub notifications,” it doesn’t need GitHub credentials because it’s already logged in as you. The README explicitly warns you to close all Chrome windows before running the agent in this mode, because Chrome locks a user data directory while it is in use and won’t let a second instance open the same profile.
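To make the mode switch concrete, here is a minimal sketch of opening an existing profile with Playwright’s launch_persistent_context. It is not web-ui’s actual code: the helper names build_launch_kwargs and open_persistent_browser are hypothetical, and the only things taken from the article are the two variable names BROWSER_PATH and BROWSER_USER_DATA.

```python
import os
from typing import Mapping


def build_launch_kwargs(env: Mapping[str, str]) -> dict:
    """Translate the two .env settings into Playwright launch arguments.

    Hypothetical helper for illustration; web-ui's real wiring may differ.
    """
    browser_path = env.get("BROWSER_PATH", "").strip()
    user_data = env.get("BROWSER_USER_DATA", "").strip()
    if not browser_path or not user_data:
        raise ValueError("Set both BROWSER_PATH and BROWSER_USER_DATA")
    return {
        "user_data_dir": user_data,       # existing profile: cookies, sessions
        "executable_path": browser_path,  # your installed Chrome binary
        "headless": False,                # show the real browser window
    }


def open_persistent_browser(env: Mapping[str, str] = os.environ):
    """Open Chrome on the existing profile (requires `pip install playwright`)."""
    from playwright.sync_api import sync_playwright  # imported lazily

    playwright = sync_playwright().start()
    context = playwright.chromium.launch_persistent_context(
        **build_launch_kwargs(env)
    )
    # The caller should call context.close() and playwright.stop() when done.
    return playwright, context
```

Keeping the argument building separate from the launch means the configuration can be validated without starting a browser.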
The LLM integration is provider-agnostic. The system supports OpenAI’s GPT models, Anthropic’s Claude, Google’s Gemini, Azure OpenAI, DeepSeek (including the reasoning-enhanced DeepSeek-r1 model added in January 2025), and local models via Ollama. You configure the provider through environment variables, and the Gradio UI exposes model selection. The changelog specifically mentions DeepSeek-r1’s “deep thinking” capabilities—referring to enhanced reasoning that enables more sophisticated multi-step web workflows.
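Provider selection through environment variables could look roughly like the sketch below. The variable names in API_KEY_VARS and the function resolve_llm_config are assumptions made for illustration; check the project’s own .env example for the names it actually reads.

```python
from typing import Mapping

# Assumed environment variable names, for illustration only.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "azure_openai": "AZURE_OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}
KEYLESS_PROVIDERS = {"ollama"}  # local models need no API key


def resolve_llm_config(provider: str, model: str, env: Mapping[str, str]) -> dict:
    """Return the settings needed to build an LLM client for one provider."""
    provider = provider.lower()
    if provider in KEYLESS_PROVIDERS:
        return {"provider": provider, "model": model, "api_key": None}
    if provider not in API_KEY_VARS:
        raise ValueError(f"Unknown provider: {provider}")
    var_name = API_KEY_VARS[provider]
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise KeyError(f"{var_name} is not set")
    return {"provider": provider, "model": model, "api_key": api_key}
```

Calling resolve_llm_config("ollama", "llama3", os.environ) succeeds with no key set, while a hosted provider fails early with the name of the missing variable.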
The Docker deployment adds another layer of practicality. The Docker setup includes a VNC server running on port 6080, allowing you to watch the browser interactions in real-time through a web-based VNC viewer:
# Standard Docker setup
docker compose up --build
# ARM64 systems (Apple Silicon)
TARGETPLATFORM=linux/arm64 docker compose up --build
Access the UI at http://localhost:7788 and the VNC viewer at http://localhost:6080/vnc.html with the password set via VNC_PASSWORD in your .env file (defaults to “youvncpassword”). This makes browser-use/web-ui viable for remote deployments where you need to debug agent behavior without local access.
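Before opening either URL on a remote host, it can help to confirm that both ports are actually listening. The following is a small standard-library sketch; is_port_open is a hypothetical helper, and the ports are the ones quoted above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    for name, port in {"Gradio UI": 7788, "noVNC viewer": 6080}.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful TCP connection only shows that something is listening; it does not prove the service behind the port is healthy.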
Another standout feature introduced in the January 2025 changelog is the “keep browser open between tasks” option (contributed by @casistack). Previously, the browser would close after each agent run, losing all context. Now you can maintain the browser session across multiple tasks, letting the agent build upon previous interactions. Ask it to “research competitors,” then in a second task “create a comparison spreadsheet”—the tabs and state from the first task remain available.
Gotcha
The custom browser feature is powerful but requires careful setup. The README explicitly warns that you must manually close all Chrome windows before launching the agent, and instructs you to open the web UI in a different browser (Firefox or Edge), not Chrome, when using Chrome as your persistent browser. This creates a workflow where you’re juggling browser instances, and a single stray Chrome window or background process that still holds the profile can cause conflicts.
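A quick pre-flight check can catch a profile that is still in use. The sketch below looks for Chrome’s lock files inside the user data directory. The file names are an assumption based on commonly observed Chrome behavior ("SingletonLock" on Linux and macOS, "lockfile" on Windows), and profile_looks_locked is a hypothetical helper, so treat the result as a hint and not a guarantee.

```python
import os
from pathlib import Path

# Assumed lock-file names; not documented by the web-ui README.
LOCK_NAMES = ("SingletonLock", "lockfile")


def profile_looks_locked(user_data_dir: str) -> bool:
    """Heuristically report whether Chrome seems to hold this profile."""
    root = Path(user_data_dir)
    # lexists() also detects dangling symlinks, which is how the
    # Linux/macOS lock is typically represented.
    return any(os.path.lexists(root / name) for name in LOCK_NAMES)
```

A lock left behind by a crashed Chrome can produce a false positive, so a warning is a safer reaction than refusing to start.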
Platform support reveals some friction. ARM64 systems (like Apple Silicon Macs) require setting the TARGETPLATFORM environment variable for Docker builds. The installation also requires running playwright install chromium --with-deps, where the --with-deps flag pulls in system-level dependencies, a sign that the bundled browser relies on OS libraries that may be missing on a minimal system.
The Gradio UI provides a convenient interface but abstracts away the underlying Python API. If your use case needs custom error handling or programmatic control beyond what the UI exposes, you’ll need to work with the browser-use library directly. The README focuses on setup and configuration but doesn’t detail how the system handles common real-world issues like network failures or unexpected page states.
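If you do drop down to the library, custom error handling can be as simple as a retry wrapper around the agent run. The sketch below is one possible approach, not something the README prescribes. run_with_retries and check_notifications are hypothetical names, and the Agent(task=..., llm=...) call follows the browser-use quickstart as of early 2025, so it may differ in newer releases.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_retries(
    task_factory: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await task_factory() up to `attempts` times with exponential backoff."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await task_factory()
        except Exception as exc:  # broad on purpose: this is a sketch
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error


async def check_notifications():
    """Assumes `pip install browser-use langchain-openai` and OPENAI_API_KEY."""
    from browser_use import Agent            # imported lazily
    from langchain_openai import ChatOpenAI

    agent = Agent(
        task="Open GitHub and summarize my unread notifications",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    return await agent.run()
```

With the dependencies installed, asyncio.run(run_with_retries(check_notifications)) would retry the whole agent run after a failure such as a dropped network connection.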
Verdict
Use if: You need a zero-code way to experiment with AI web agents, especially for tasks requiring authenticated sessions (checking personal accounts, interacting with subscription sites, building on previous agent work across multiple runs). It’s well suited to rapid prototyping, demonstrating AI capabilities to non-technical stakeholders, or one-off automation tasks. The persistent browser feature solves a real pain point: the agent reuses logins you already have, so you don’t need to script credentials or 2FA into every run.
Skip if: You need production reliability for business-critical workflows, want programmatic control over agent behavior (use the browser-use library directly instead), or require custom integration with existing codebases. The UI is a convenient wrapper, but convenience comes at the cost of flexibility. Also skip if you’re uncomfortable with the security implications of giving an AI agent full access to your authenticated browser sessions—the tool uses your actual browser profile with all its permissions and data.