Building Stateful AI Agents at the Edge with Cloudflare’s Durable Objects
Hook
What if each of your users could have their own dedicated AI agent running 24/7, but you only pay when they’re actually using it? Cloudflare just made that possible with agents that hibernate like processes on a mainframe.
Context
AI agents have a state problem. Most frameworks treat agents as ephemeral—spin up a chain of LLM calls, execute tools, return results, then throw away everything. If you want persistence, you bolt on Redis or Postgres and manage serialization yourself. If you want real-time updates to connected clients, you’re wiring up WebSockets manually. If you want to scale to millions of concurrent agents, you’re provisioning infrastructure and watching costs spiral.
Cloudflare Agents takes a radically different approach by building on Durable Objects, Cloudflare’s distributed coordination primitive. Instead of treating agents as stateless functions, each agent becomes a long-lived object with its own SQLite storage, in-memory state, and lifecycle. Agents hibernate when idle—consuming zero resources—and wake instantly when called. This isn’t just a framework abstraction; it’s infrastructure-level support for persistent agents, and it enables economics that were previously impractical: one agent per user, one per chat session, one per game instance, all costing nothing while dormant.
Technical Insight
The architecture centers on decorated TypeScript classes that compile to Durable Objects. The @callable decorator transforms methods into type-safe RPC endpoints, while the Agent base class manages WebSocket connections, state synchronization, and lifecycle. Here’s what a basic agent looks like:
```typescript
import { Agent, callable } from '@cloudflare/agents';

export class ChatAgent extends Agent {
  private messages: Array<{ role: string; content: string }> = [];

  @callable
  async sendMessage(content: string) {
    this.messages.push({ role: 'user', content });
    const response = await this.env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: this.messages
    });
    this.messages.push({ role: 'assistant', content: response.response });
    return response.response;
  }

  @callable
  getHistory() {
    return this.messages;
  }
}
```
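Before this class can receive traffic, it has to be bound as a Durable Object in the Worker’s configuration. A minimal wrangler.toml sketch (binding and migration names here are illustrative, not prescribed by the SDK):

```toml
# Bind the agent class as a Durable Object namespace
[[durable_objects.bindings]]
name = "ChatAgent"        # how the Worker references the namespace (env.ChatAgent)
class_name = "ChatAgent"  # the exported class implementing the agent

# Durable Objects require a migration declaring the class;
# new_sqlite_classes opts the class into SQLite-backed storage
[[migrations]]
tag = "v1"
new_sqlite_classes = ["ChatAgent"]
```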
When you call agent.sendMessage() from a client, the RPC layer routes it to the correct Durable Object instance (creating it if necessary), executes the method, and automatically syncs the updated state to all connected WebSocket clients. The this.messages array persists in memory between calls. When the agent goes idle, Durable Objects serialize state to SQLite and hibernate the instance. Next request? It wakes with state intact.
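The actual wire format is internal to the SDK, but conceptually each @callable invocation travels as a small JSON envelope over the WebSocket, with a correlation id so responses can be matched to in-flight calls. A hand-rolled sketch of that idea (all names here are hypothetical, not the SDK’s types):

```typescript
// Hypothetical RPC envelope: method name, arguments, and a correlation id
// so responses can be matched to in-flight calls over a single socket.
interface RpcRequest {
  id: number;
  method: string;
  args: unknown[];
}

interface RpcResponse {
  id: number;
  result?: unknown;
  error?: string;
}

let nextId = 0;

// Client side: serialize a method call into a wire message
function encodeCall(method: string, args: unknown[]): string {
  const req: RpcRequest = { id: nextId++, method, args };
  return JSON.stringify(req);
}

// Agent side: decode, dispatch to the named method, wrap the result
async function dispatch(
  raw: string,
  target: Record<string, (...a: unknown[]) => unknown>
): Promise<RpcResponse> {
  const req = JSON.parse(raw) as RpcRequest;
  try {
    const result = await target[req.method](...req.args);
    return { id: req.id, result };
  } catch (e) {
    return { id: req.id, error: String(e) };
  }
}
```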
The React integration eliminates boilerplate. The useAgent hook manages connection lifecycle, method invocation, and reactive state updates:
```tsx
import { useState } from 'react';
import { useAgent } from '@cloudflare/agents-react';

function ChatUI() {
  const agent = useAgent<ChatAgent>('chat-session-123');
  const [input, setInput] = useState('');

  const handleSend = async () => {
    await agent.sendMessage(input);
    setInput('');
  };

  // agent.state automatically updates when server state changes
  return (
    <div>
      {agent.state?.messages.map((msg, i) => (
        <div key={i}>{msg.role}: {msg.content}</div>
      ))}
      <input value={input} onChange={e => setInput(e.target.value)} />
      <button onClick={handleSend}>Send</button>
    </div>
  );
}
```
Behind the scenes, useAgent establishes a WebSocket to the agent’s Durable Object. When you call agent.sendMessage(), it sends an RPC message over the socket. The agent processes it, updates state, and broadcasts changes to all connected clients. Your React component re-renders automatically because the hook wraps state in a proxy that triggers updates.
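The reactive piece can be illustrated in a few lines of plain TypeScript, independent of the SDK: a Proxy traps property writes and notifies a subscriber, which is roughly the mechanism a hook can use to trigger re-renders.

```typescript
// Minimal reactive-state sketch: wrap an object in a Proxy so every
// property write notifies a subscriber -- conceptually how a hook can
// force a React re-render when synced state changes.
function reactive<T extends object>(state: T, onChange: () => void): T {
  return new Proxy(state, {
    set(target, prop, value) {
      (target as Record<string | symbol, unknown>)[prop] = value;
      onChange(); // e.g. a setState call that schedules a re-render
      return true;
    }
  });
}

let renders = 0;
const counter = reactive({ count: 0 }, () => renders++);
counter.count = 1; // write goes through the trap, subscriber fires
counter.count = 2; // fires again
```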
The Model Context Protocol (MCP) integration is where things get interesting for sophisticated workflows. Agents can expose MCP servers (making their capabilities available to LLMs as tools) or act as MCP clients (consuming external tools). This enables compositional architectures where agents call other agents or external services:
```typescript
import { Agent, callable, MCPClient } from '@cloudflare/agents';

export class ResearchAgent extends Agent {
  private mcpClient: MCPClient;

  async onStart() {
    this.mcpClient = new MCPClient({
      transport: 'cloudflare-do',
      servers: {
        database: { id: 'database-agent-1' },
        search: { id: 'search-agent-1' }
      }
    });
  }

  @callable
  async research(topic: string) {
    // The LLM can call tools exposed by other agents via MCP
    const tools = await this.mcpClient.listTools();
    return await this.env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [{ role: 'user', content: `Research: ${topic}` }],
      tools: tools,
      tool_choice: 'auto'
    });
  }
}
```
The hibernate/wake semantics change how you think about resource usage. Traditional serverless functions bill for execution time. Durable Objects bill per request plus active wall-clock time, but hibernation means an agent serving 100 requests scattered across an hour pays for only seconds of active processing. For applications with bursty user activity—chatbots that wait for user input, game sessions with pauses, monitoring agents that react to events—the cost model is transformative. You can genuinely run millions of agents because 99% are asleep at any moment.
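To make that concrete, here is the back-of-the-envelope duty-cycle arithmetic (the per-request processing time is a placeholder, not a measured figure):

```typescript
// Duty-cycle math: how much of the hour is an agent actually active?
function activeSecondsPerHour(requests: number, avgActiveMs: number): number {
  return (requests * avgActiveMs) / 1000;
}

// 100 requests/hour at ~200 ms of active processing each
const active = activeSecondsPerHour(100, 200); // 20 seconds of billable activity
const dutyCycle = active / 3600;               // fraction of the hour spent awake
```

At a 0.6% duty cycle, the other 99.4% of the hour is hibernation, which is what makes one-agent-per-user economics plausible.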
Cloudflare also added workflows with human-in-the-loop support, enabling agents to pause execution and wait for external input before continuing. This is critical for approval flows, multi-step processes, or any scenario where the agent needs to hand control back to a human mid-stream without losing context or state.
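The SDK’s workflow API isn’t shown here, but the pause-for-approval pattern itself is easy to sketch in plain TypeScript: a step parks behind a promise, and a later human-facing call resolves it. All names below are hypothetical; in the real system the pending state would also need to be persisted so it survives hibernation.

```typescript
// Hypothetical pause/resume sketch: a workflow step blocks on a promise
// that a human-facing decide() call later resolves.
class ApprovalGate {
  private resolvers = new Map<string, (approved: boolean) => void>();

  // Called by the workflow: suspends until a decision arrives
  waitForApproval(stepId: string): Promise<boolean> {
    return new Promise(resolve => this.resolvers.set(stepId, resolve));
  }

  // Called from the human-facing endpoint (e.g. an approval UI)
  decide(stepId: string, approved: boolean): void {
    this.resolvers.get(stepId)?.(approved);
    this.resolvers.delete(stepId);
  }
}
```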
Gotcha
The platform lock-in is absolute. This isn’t “Cloudflare-optimized” code you could port with effort—it’s fundamentally coupled to Workers, Durable Objects, and Cloudflare’s runtime. Local development works via Wrangler emulation, but you’re always targeting Cloudflare’s deployment model. If your organization has multi-cloud requirements or you’re building on AWS/GCP/Azure, this is a non-starter.
The SDK is evolving fast, and Cloudflare explicitly states they’re not accepting external contributions while stabilizing the API. Experimental features like code mode (where LLMs generate and execute TypeScript dynamically) and upcoming capabilities (voice agents, web browsing, sandboxed execution) signal the surface area is still growing. If you need API stability for a large production system, you’re signing up for potential refactoring as the framework matures. The 4,400+ stars suggest strong interest, but the closed contribution model means you’re dependent on Cloudflare’s roadmap priorities. Also worth noting: Durable Objects have per-object rate limits (around 1,000 requests per second per instance). For extremely high-throughput scenarios on a single agent, you’ll need sharding strategies.
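A common way around a per-object throughput ceiling is deterministic sharding: hash the hot routing key across N agent instances and fan requests out. A sketch of the routing side (the shard count and naming scheme are design choices you’d make, not SDK features):

```typescript
// Deterministic sharding sketch: spread one logical agent's traffic
// across N Durable Object instances by hashing a routing key.
const SHARDS = 8; // sized so each shard stays under the per-object limit

function shardFor(key: string): string {
  // FNV-1a: small, stable, non-cryptographic hash
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return `agent-shard-${hash % SHARDS}`;
}
```

The same key always maps to the same shard, so per-key state stays consistent, while aggregate throughput scales with the shard count.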
Verdict
Use if: You’re already on Cloudflare Workers and need persistent, stateful AI agents with real-time synchronization—especially for multi-tenant applications where each user or session gets isolated state (chat platforms, collaborative tools, gaming, personalized assistants). The economics for bursty workloads are genuinely compelling, and the React integration is exceptionally clean. Also use it if you’re building MCP-based agent systems and want tight integration with Cloudflare’s AI inference.

Skip if: You need cloud portability, require API stability for critical production systems, or want community-driven extensibility. Also skip it if your agents need extremely high sustained throughput on single instances (thousands of requests per second per agent), or if you’re heavily invested in another cloud ecosystem and don’t want to pay platform migration costs.