
Magentic-UI: The Web Agent That Shows Its Work Before Acting



Hook

Most AI web agents are black boxes that silently execute your requests—sometimes booking the wrong flight or ordering 100 pizzas instead of one. Magentic-UI takes the opposite approach: it shows you its plan before doing anything risky.

Context

Browser automation has evolved from simple Selenium scripts to LLM-powered agents that can navigate arbitrary websites, but this power creates a new problem: trust. When an agent can fill forms, make purchases, or modify data, how do you prevent catastrophic mistakes? Existing solutions like Anthropic’s Computer Use or simple LangChain + Playwright combinations operate autonomously—they execute actions and only report back when done (or when something breaks).

Microsoft Research’s Magentic-UI addresses this by treating web automation as a collaborative activity rather than a delegation task. Built on the AutoGen multi-agent framework, it’s designed for complex web tasks that require both transparency and human judgment: customizing food orders with specific preferences, navigating unindexed content like internal dashboards, or combining web scraping with code execution to generate visualizations. The system runs as a research prototype focused on human-agent interaction patterns, explicitly prioritizing controllability over full autonomy.

Technical Insight

[Figure: System architecture (auto-generated). Components: User via React UI, Planning Agent, Web Browsing Agent, Playwright, Code Execution Agent, Docker Container, Target Websites, Execution Results, within the AutoGen Framework. Flow labels: Submit Task, Generate Plan, Approve/Edit Plan, Coordinate Actions, DOM Manipulation, Browser Events, Execute Code, Isolated Environment, Real-time Status, Task Completion.]

Magentic-UI’s architecture centers on three specialized agents orchestrated by AutoGen: a web browsing agent using browser automation for DOM manipulation, a code execution agent running inside isolated Docker containers, and a planning agent that coordinates the workflow. What makes this interesting is how the system enforces human checkpoints throughout execution.

The planning phase happens entirely before any action executes. When you submit a task like “Find vegetarian pizza options under $20 and add one to cart,” the planner generates a step-by-step plan visible in the UI through what the system calls “co-planning.” You can edit this plan directly using the plan editor—reordering steps, adding constraints, or removing actions you don’t want. The system won’t proceed until you explicitly approve.
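The approval gate described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Magentic-UI's code: the `Plan` class, its methods, and `execute` are hypothetical names invented for illustration. The point it shows is that edits invalidate approval and nothing runs without it.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical plan object: an ordered list of step descriptions."""
    steps: list
    approved: bool = False

    def remove_step(self, index: int) -> None:
        # Any edit invalidates a previous approval.
        del self.steps[index]
        self.approved = False

    def add_step(self, index: int, text: str) -> None:
        self.steps.insert(index, text)
        self.approved = False

    def approve(self) -> None:
        self.approved = True


def execute(plan: Plan) -> list:
    """Refuse to run an unapproved plan; 'running' a step is just logging here."""
    if not plan.approved:
        raise PermissionError("plan must be approved before execution")
    return [f"done: {step}" for step in plan.steps]


plan = Plan([
    "Open the pizza site",
    "Filter for vegetarian options",
    "Sort by popularity",
    "Add one pizza to cart",
])
plan.remove_step(2)                             # user drops a step they don't want
plan.add_step(2, "Keep only items under $20")   # user adds a constraint
plan.approve()                                  # explicit sign-off
results = execute(plan)
```

Resetting `approved` on every edit is a design choice of this sketch: it guarantees that the plan that runs is exactly the plan the user last looked at.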

Here’s how you launch the system with custom model configuration:

# Install with Azure support
# (quote the extra so shells like zsh don't treat the brackets as a glob)
pip install "magentic-ui[azure]"

# Create a config file for your LLM client
# (the quoted 'EOF' keeps the shell from expanding anything inside the heredoc)
cat > azure_config.json << 'EOF'
{
  "model": "gpt-4",
  "api_type": "azure",
  "api_version": "2024-02-15-preview",
  "azure_endpoint": "https://your-endpoint.openai.azure.com"
}
EOF

# Launch with config
export AZURE_OPENAI_API_KEY="your-key"
magentic-ui --port 8081 --config azure_config.json

During execution, the browser agent operates in a visible browser instance. Unlike headless automation, you can watch every click, form fill, and navigation in real time. If the agent goes off track (say, it’s searching the wrong category), you can interrupt mid-execution through what the system calls “co-tasking,” either by interacting directly with the browser or by sending a correction through the chat interface. The agent reacts to these interventions, replanning as needed.
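One way to picture co-tasking is an execution loop that checks for a user correction before each step and replans the remaining steps when one arrives. The sketch below is an assumption-heavy toy: `run_with_interrupts`, the `corrections` mapping (keyed by how many steps have completed), and the string-replacing `replan` are all hypothetical stand-ins, not Magentic-UI's API.

```python
def run_with_interrupts(steps, corrections, replan):
    """Run steps in order, applying any user correction before the next step.

    corrections maps "number of steps already completed" to a user note,
    which simulates a message arriving mid-run.
    """
    corrections = dict(corrections)   # copy so the caller's dict is untouched
    pending = list(steps)
    done = []
    while pending:
        note = corrections.pop(len(done), None)
        if note is not None:
            # Only the remaining steps are replanned; finished work is kept.
            pending = replan(pending, note)
            continue
        done.append(pending.pop(0))
    return done


def replan(remaining, note):
    # Toy replanner: swap the wrong category for the one the user named.
    return [step.replace("Meat Lovers", note) for step in remaining]


steps = [
    "Open menu",
    "Browse 'Meat Lovers' category",
    "Add cheapest item to cart",
]
# The user interrupts after the first step has completed.
done = run_with_interrupts(steps, {1: "Vegetarian"}, replan)
```

A real agent would replan with an LLM rather than a string replace, but the control flow (pause, fold in the correction, continue from where it stopped) is the part this sketch is meant to show.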

Action guards provide a second safety layer. Before executing sensitive operations (form submissions, purchases, file deletions), the system pauses and requests explicit approval. This creates a natural checkpoint for irreversible actions without slowing down the entire workflow.
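As a rough illustration of the idea, an action guard can be modeled as a check that runs before each action and asks for approval only when the action falls into a sensitive category. Everything here is hypothetical: the action dictionaries, the `SENSITIVE_KINDS` set (modeled on the categories named above), and the `approve` callback that stands in for a UI prompt.

```python
# Assumed categories, modeled on the article's examples:
# form submissions, purchases, and file deletions.
SENSITIVE_KINDS = {"submit_form", "purchase", "delete_file"}


def guarded_run(actions, approve):
    """Run actions in order; sensitive ones need approve(action) to return True."""
    log = []
    for action in actions:
        # approve() is only consulted for sensitive kinds (short-circuit `and`).
        if action["kind"] in SENSITIVE_KINDS and not approve(action):
            log.append(f"skipped {action['kind']}")
            continue
        log.append(f"ran {action['kind']}")
    return log


actions = [
    {"kind": "navigate", "target": "https://example.com/menu"},
    {"kind": "click", "target": "#vegetarian"},
    {"kind": "purchase", "target": "#checkout"},
]


def deny_all(action):
    # Stand-in for a user who rejects every approval prompt.
    return False


log = guarded_run(actions, approve=deny_all)
# log == ["ran navigate", "ran click", "skipped purchase"]
```

Because the guard is consulted only for sensitive kinds, routine navigation and clicks proceed without interruption, which matches the "checkpoint without slowing the whole workflow" behavior described above.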

The most novel feature is plan learning and retrieval. Successful task executions get saved to a “gallery” with metadata about the task type and steps taken. When you submit a new task, the system uses semantic similarity to retrieve relevant past plans. If you’ve ordered pizza three times before, the fourth attempt can reuse that proven workflow, just adapting parameters. You can retrieve plans manually from the gallery or let the system auto-suggest them.
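To make the retrieval step concrete, here is a minimal sketch of similarity-based plan lookup. It deliberately swaps in bag-of-words cosine similarity as a dependency-free stand-in for whatever semantic embedding the real system uses; the gallery contents, the `retrieve` function, and the 0.3 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercased word counts: a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(task, gallery, threshold=0.3):
    """Return (saved_task, steps, score) for the best match, or None if too weak."""
    query = vectorize(task)
    scored = [(cosine(query, vectorize(saved)), saved) for saved in gallery]
    score, best = max(scored, default=(0.0, None))
    if best is None or score < threshold:
        return None
    return best, gallery[best], score


# Hypothetical gallery: task description -> saved plan steps.
gallery = {
    "order a vegetarian pizza under $20": [
        "Open the pizza site", "Filter vegetarian", "Add one item to cart",
    ],
    "check flight prices to Seattle": [
        "Open the flight search", "Enter destination", "Read lowest fare",
    ],
}

match = retrieve("order a pepperoni pizza under $15", gallery)
no_match = retrieve("translate this document", gallery)
```

Here `match` points at the saved pizza plan, whose steps can then be adapted to the new parameters, while `no_match` is `None` because nothing in the gallery clears the threshold. The threshold matters: without it, every new task would inherit some saved plan, however unrelated.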

For long-running tasks, Magentic-UI introduces a “Tell me When” mode that supports monitoring over minutes to days. You can set up tasks like “Notify me when GitHub stars exceed 10,000” or “Check flight prices daily until they drop below $400.” The system periodically executes the monitoring logic (web scraping or API calls) and triggers actions when conditions are met:

# CLI mode for monitoring tasks
magentic-cli --work-dir ./monitoring-data
# Then in the interface:
# "Tell me when microsoft/magentic-ui reaches 10,000 stars"
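Conceptually, this kind of monitoring reduces to a polling loop: run a check, test a condition, and sleep between attempts. The sketch below shows that loop in plain Python under stated assumptions; `tell_me_when` and its parameters are hypothetical names, and the star counts are simulated rather than fetched from any real API.

```python
import time


def tell_me_when(check, condition, interval_seconds, max_checks, sleep=time.sleep):
    """Poll check() until condition(value) holds; return the value, else None."""
    for attempt in range(max_checks):
        value = check()
        if condition(value):
            return value
        if attempt < max_checks - 1:
            # No need to wait after the final attempt.
            sleep(interval_seconds)
    return None


# Simulated readings stand in for a real scrape or API call.
readings = iter([9_800, 9_950, 10_020])

hit = tell_me_when(
    check=lambda: next(readings),
    condition=lambda stars: stars > 10_000,
    interval_seconds=0,   # a real monitor would wait minutes or hours
    max_checks=5,
)
# hit == 10020: the third reading is the first to exceed 10,000
```

Passing `sleep` as a parameter is a small design choice that keeps the loop testable: a test can substitute a recorder for `time.sleep` instead of actually waiting.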

The architecture also supports Model Context Protocol (MCP) agents through the “MCP Agents” feature, letting you extend capabilities with your favorite MCP servers. The Airbnb price analysis demo shows this—an MCP agent fetches listing data while the code execution agent generates comparative charts, all orchestrated through the planning layer.

Integration with Microsoft’s Fara-7B agentic model is particularly interesting for developers who want to run a smaller, locally hosted model instead of cloud APIs. Magentic-UI’s reported scores of 42.52% on the GAIA benchmark and 82.2% on WebVoyager are competitive with larger commercial models, which suggests that the multi-agent architecture compensates for individual model limitations through specialized task decomposition.

Gotcha

The biggest limitation is right in the description: this is a “research prototype,” not production software. Expect rough edges, incomplete error handling, and occasional failures that require restarting the Docker containers. Microsoft hasn’t committed to long-term maintenance or a production roadmap.

Docker is mandatory for the full feature set because code execution happens in isolated containers. This adds infrastructure overhead and complicates deployment. You can run with --run-without-docker, but you lose the code execution agent entirely—no data processing, no chart generation, no file manipulation beyond browsing. Windows users face additional friction since the entire stack requires WSL2, creating a Linux subsystem dependency that can conflict with native Windows tooling.

The plan learning feature sounds powerful, but it depends entirely on the quality of your saved plans: if early runs produce suboptimal workflows, the gallery becomes a library of bad patterns. No built-in mechanism for rating or filtering low-quality plans appears to be documented, so you will likely need to curate the gallery manually.

Finally, the human-in-the-loop design is a feature for sensitive tasks but becomes a bottleneck for simple automation. If you just need to scrape 100 product pages, pausing for plan approval and for any guarded actions wastes time compared to a fully autonomous agent.

Verdict

Use Magentic-UI if you’re automating web tasks where mistakes are costly—purchasing workflows, form submissions with financial impact, or data collection requiring validation before processing. The transparent co-planning and action guards prevent the “runaway agent” problem that plagues autonomous systems. It’s ideal for repetitive tasks that benefit from plan learning, like weekly report generation that combines web scraping and data visualization. The monitoring mode excels at tasks like price tracking or content change detection that need to run over days.

Skip it if you need production reliability or can’t tolerate experimental software instability. Avoid it for simple, one-off automation where the Docker setup overhead exceeds the task complexity. If you’re on Windows and can’t use WSL2, or if your deployment environment restricts Docker, look elsewhere. Finally, skip it if you want true “set and forget” automation—the human-in-the-loop design assumes you’re available to approve actions and course-correct, which defeats the purpose of full delegation.
