How Cloudflare's MCP Server Cuts Context Usage by 99.5% with Code Mode
Hook
What if exposing 2,500 API endpoints to an AI agent consumed the same tokens as exposing two? Cloudflare's MCP server proves it's possible—and challenges how we think about AI-to-API integration.
Context
The Model Context Protocol (MCP) from Anthropic promised to solve a real problem: standardizing how AI agents connect to external tools and data sources. The traditional approach is straightforward—register each API endpoint as a tool with a detailed JSON schema describing parameters, responses, and examples. The agent sees all available tools in its context window and calls them as needed.
But this approach implodes at scale. Cloudflare's API spans over 2,500 endpoints covering everything from DNS management to Workers deployment to Analytics queries. Representing each endpoint as an MCP tool requires approximately 244,000 tokens just to describe what's available, before the agent does anything useful. For context, that is more than Claude 3.5 Sonnet's entire 200k context window, consumed by tool definitions alone. Even with a larger window, agents that compose multiple MCP servers or maintain conversation history cannot afford this token budget. The cloudflare/mcp repository introduces a radically different pattern called "Code Mode" that reduces this to roughly 1,000 tokens while maintaining full API coverage.
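The token arithmetic is easy to check. The sketch below uses only the figures quoted in this article; the per-endpoint average is derived from them rather than measured, and the function name is illustrative.

```javascript
// Figures quoted in this article. The per-endpoint average is derived, not measured.
const ENDPOINTS = 2500;
const TRADITIONAL_TOKENS = 244_000;
const CODE_MODE_TOKENS = 1_000;
const CONTEXT_WINDOW = 200_000;

function tokenBudget() {
  return {
    // Average cost of describing one endpoint as a traditional tool.
    tokensPerEndpoint: TRADITIONAL_TOKENS / ENDPOINTS,
    // Percentage of tool-definition tokens saved by Code Mode.
    reductionPercent: (1 - CODE_MODE_TOKENS / TRADITIONAL_TOKENS) * 100,
    // Fraction of a 200k window used by each approach (above 1 means it does not fit).
    traditionalShareOfWindow: TRADITIONAL_TOKENS / CONTEXT_WINDOW,
    codeModeShareOfWindow: CODE_MODE_TOKENS / CONTEXT_WINDOW,
  };
}

console.log(tokenBudget());
```

That works out to roughly 98 tokens per endpoint and a reduction of about 99.6%, in line with the headline figure.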
Technical Insight
The core innovation is deceptively simple: instead of registering 2,500+ individual tools, expose exactly two. The first tool, search, lets the agent query the OpenAPI specification to discover relevant endpoints. The second tool, execute, runs agent-generated JavaScript code in an isolated Cloudflare Worker that makes the actual API calls. The OpenAPI spec never enters the agent's context window—it lives server-side and is only queried on-demand.
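To make the search side concrete, here is a minimal sketch of a server-side lookup over an OpenAPI-style document. The spec fragment, the `searchSpec` name, and the keyword matching are all assumptions for illustration; the repository's actual search tool may work quite differently.

```javascript
// Hypothetical, heavily trimmed stand-in for the OpenAPI document kept server-side.
const spec = {
  paths: {
    '/zones': {
      get: { summary: 'List zones' },
    },
    '/zones/{zone_id}/dns_records': {
      get: { summary: 'List DNS records' },
      post: { summary: 'Create DNS record' },
    },
  },
};

// Hypothetical search: return only the operations whose path or summary matches,
// so the agent's context receives a handful of entries instead of the whole spec.
function searchSpec(spec, query) {
  const needle = query.toLowerCase();
  const matches = [];
  for (const [path, operations] of Object.entries(spec.paths)) {
    for (const [method, operation] of Object.entries(operations)) {
      const haystack = `${path} ${operation.summary}`.toLowerCase();
      if (haystack.includes(needle)) {
        matches.push({ method: method.toUpperCase(), path, summary: operation.summary });
      }
    }
  }
  return matches;
}

console.log(searchSpec(spec, 'dns'));
```

The point of the sketch is the shape of the exchange: the full document stays on the server, and only the matching entries cost tokens.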
Here's what agent-generated code looks like for a simple DNS record creation:
const response = await cfApi(
  'POST',
  '/zones/{zone_id}/dns_records',
  {
    path: { zone_id: 'abc123...' },
    body: {
      type: 'A',
      name: 'api.example.com',
      content: '192.0.2.1',
      ttl: 3600,
      proxied: true
    }
  }
);
return response.result;
The cfApi function is injected into the execution environment by the MCP server and handles authentication, account ID resolution, and the actual HTTP request to Cloudflare's API. The agent writes the code based on information retrieved via the search tool, executes it via the execute tool, and receives only the result. This architectural shift transforms the token economics: the 244k-token specification becomes an on-demand database rather than an ever-present context burden.
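One way to picture what a helper like cfApi has to do before sending a request is sketched below. The `resolvePath` and `buildRequest` names and the placeholder token are hypothetical, and this is not the server's actual implementation; only the public API base URL and the bearer-token header are standard Cloudflare conventions.

```javascript
// Hypothetical helper: fill `{name}` placeholders in a path template.
function resolvePath(template, pathParams = {}) {
  return template.replace(/\{(\w+)\}/g, (_, name) => {
    if (!(name in pathParams)) {
      throw new Error(`Missing path parameter: ${name}`);
    }
    return encodeURIComponent(pathParams[name]);
  });
}

// Hypothetical request builder. The token would be supplied server-side,
// so it never appears in the agent-generated code.
function buildRequest(method, template, { path, body } = {}, token = 'TOKEN_PLACEHOLDER') {
  return {
    url: `https://api.cloudflare.com/client/v4${resolvePath(template, path)}`,
    method,
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    // Only serialize a body when one was provided (GET requests have none).
    body: body === undefined ? undefined : JSON.stringify(body),
  };
}

console.log(buildRequest('GET', '/zones/{zone_id}/dns_records', { path: { zone_id: 'abc123' } }));
```

Keeping credentials and URL construction on the server side is what lets the agent's code stay short and free of secrets.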
The execution happens through Cloudflare's Dynamic Worker Loader API, which spins up isolated JavaScript sandboxes. Each execution is ephemeral and sandboxed from other executions, providing strong security boundaries. The agent can't persist state between calls or access sensitive internals—it simply generates code, the server executes it in a fresh Worker, and returns the result. This is crucial because you're essentially letting an AI agent run arbitrary code against your infrastructure.
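The generate-then-execute flow can be imitated locally with plain JavaScript. The sketch below is an assumption-laden illustration: `runAgentCode` is a hypothetical name, and the `AsyncFunction` constructor is emphatically not a security sandbox, unlike the isolated Workers described above. It only shows how a helper can be injected and how each run starts from a fresh function scope.

```javascript
// Obtain the AsyncFunction constructor, which is not exposed as a global.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical runner: compiles the agent's source into a new function on every
// call, so local variables from one run are not visible to the next.
// NOTE: this is NOT isolation. The code can still reach process-wide globals.
async function runAgentCode(source, cfApi) {
  try {
    // `cfApi` is the only name handed to the generated code as a parameter.
    const fn = new AsyncFunction('cfApi', source);
    return { ok: true, result: await fn(cfApi) };
  } catch (error) {
    // Syntax errors and runtime errors both come back as structured failures.
    return { ok: false, error: String(error.message ?? error) };
  }
}
```

Returning a structured failure instead of throwing gives the agent an error message it can read and use to correct its next attempt.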
Authentication supports both OAuth and API tokens. The OAuth implementation is particularly thoughtful—when users authorize access, they see Cloudflare's standard permission selection UI, providing granular control over what the agent can do. API tokens work similarly but with a clever addition: the server automatically resolves the account ID from the token, so agents don't need to hardcode account-specific identifiers.
For complex workflows, agents can chain multiple API calls in a single execution:
// Get all zones, filter for example.com, then list its DNS records
const zonesResponse = await cfApi('GET', '/zones');
const targetZone = zonesResponse.result.find(z => z.name === 'example.com');
if (!targetZone) {
  throw new Error('Zone not found');
}
const dnsResponse = await cfApi(
  'GET',
  '/zones/{zone_id}/dns_records',
  { path: { zone_id: targetZone.id } }
);
return dnsResponse.result.map(record => ({
  name: record.name,
  type: record.type,
  content: record.content
}));
This kind of exploration and chaining would require multiple back-and-forth exchanges in a traditional tool-based approach, each consuming tokens and latency. With Code Mode, it's a single execution. The agent discovers what it needs via search, writes the logic, and gets the answer.
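A chained workflow like this can be dry-run without touching a real account by swapping in a stub. In the sketch below, the fixture data, `createMockCfApi`, and `listDnsRecords` are all hypothetical names introduced for illustration; the stub only mimics the call shape shown in the examples above.

```javascript
// Hypothetical fixture data standing in for real API responses.
const fixtures = {
  'GET /zones': {
    result: [
      { id: 'zone-1', name: 'example.com' },
      { id: 'zone-2', name: 'example.org' },
    ],
  },
  'GET /zones/{zone_id}/dns_records': {
    result: [
      { id: 'rec-1', name: 'api.example.com', type: 'A', content: '192.0.2.1', ttl: 3600 },
    ],
  },
};

// Hypothetical stub with the same call shape as cfApi; it records every call.
function createMockCfApi(responses) {
  const calls = [];
  const cfApi = async (method, path, options = {}) => {
    calls.push({ method, path, options });
    const response = responses[`${method} ${path}`];
    if (!response) {
      throw new Error(`No fixture for ${method} ${path}`);
    }
    return response;
  };
  return { cfApi, calls };
}

// The chained workflow, wrapped in a function so the stub can be passed in.
async function listDnsRecords(cfApi, zoneName) {
  const zonesResponse = await cfApi('GET', '/zones');
  const targetZone = zonesResponse.result.find(z => z.name === zoneName);
  if (!targetZone) {
    throw new Error('Zone not found');
  }
  const dnsResponse = await cfApi('GET', '/zones/{zone_id}/dns_records', {
    path: { zone_id: targetZone.id },
  });
  return dnsResponse.result.map(record => ({
    name: record.name,
    type: record.type,
    content: record.content,
  }));
}
```

After one run, the recorded `calls` array holds two entries: two API requests made inside what would be a single execute call.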
The MCP server also handles Cloudflare's GraphQL Analytics API alongside REST endpoints through the same unified interface. This is non-trivial because GraphQL requires fundamentally different request structures, but the abstraction holds: the agent still writes JavaScript using the same cfApi function, just with GraphQL query strings.
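A GraphQL call through the same helper might look roughly like the sketch below. Everything specific here is an assumption to verify against the repository and Cloudflare's GraphQL schema: the `/graphql` path as seen by cfApi, the `{ query, variables }` body shape, the `queryDailyRequests` name, and the dataset and field names in the query.

```javascript
// Illustrative query only. Dataset, field, and variable type names are assumptions
// and should be checked against Cloudflare's published GraphQL schema.
const DAILY_REQUESTS_QUERY = `
  query DailyRequests($zoneTag: string, $since: Date, $until: Date) {
    viewer {
      zones(filter: { zoneTag: $zoneTag }) {
        httpRequests1dGroups(limit: 7, filter: { date_geq: $since, date_leq: $until }) {
          dimensions { date }
          sum { requests }
        }
      }
    }
  }
`;

// Hypothetical wrapper: assumes cfApi accepts a '/graphql' path and forwards
// a standard GraphQL request body containing the query string and its variables.
async function queryDailyRequests(cfApi, zoneTag, since, until) {
  return cfApi('POST', '/graphql', {
    body: {
      query: DAILY_REQUESTS_QUERY,
      variables: { zoneTag, since, until },
    },
  });
}
```

The call shape matches the REST examples; only the body changes, which is the unification the paragraph above describes.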
Under the hood, the server is built with TypeScript and leverages the official MCP SDK. The implementation is relatively compact—a few hundred lines—because most complexity is delegated to the execution environment. The server doesn't need to understand Cloudflare's API semantics; it just needs to safely proxy code execution and API calls. This architectural separation means updates to Cloudflare's API are automatically reflected without server changes, since the OpenAPI spec is the source of truth.
Gotcha
The most significant limitation is that Code Mode requires agents to write syntactically correct, semantically meaningful JavaScript. Not all LLMs excel at code generation, and even capable models occasionally produce buggy code—missing error handling, malformed parameters, or incorrect logic. When an agent generates broken code, the execution fails, and it must retry. For simple, single-endpoint operations like "list my zones," this adds unnecessary complexity compared to a traditional tool call with a structured schema that guides the agent toward correctness.
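The retry cost described above can be sketched as a loop that feeds each failure back into the next generation attempt. Both callbacks are hypothetical stand-ins: `generateCode` represents the LLM and `execute` represents the execute tool. Real agent frameworks handle this loop in their own way.

```javascript
// Hypothetical retry loop. `generateCode(lastError)` stands in for the LLM and
// `execute(source)` for the execute tool; neither is a real API.
async function executeWithRetry(generateCode, execute, maxAttempts = 3) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    // Pass the previous error back so the next attempt can correct it.
    const source = await generateCode(lastError);
    try {
      return { attempts: attempt, result: await execute(source) };
    } catch (error) {
      lastError = error.message;
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts: ${lastError}`);
}
```

Every failed attempt costs a generation and an execution, which is the overhead a schema-guided tool call avoids for simple operations.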
IP-filtered API tokens are explicitly unsupported. If your Cloudflare account uses Client IP Address Filtering for API tokens, this MCP server won't work because execution happens on Cloudflare's infrastructure, not your local machine. This is a showstopper for security-conscious organizations that rely on IP allowlisting as a defense layer.
Additionally, if you disable Code Mode (which is possible for compatibility), you lose the entire token efficiency benefit: reverting to 244k tokens defeats the purpose of using this server.
Finally, because Code Mode is a relatively new pattern, debugging can be less intuitive than traditional tools. When something goes wrong, you're troubleshooting agent-generated JavaScript execution in a remote sandbox rather than inspecting a failed tool call with clear parameter validation.
Verdict
Use if: You need comprehensive Cloudflare API access for AI agents without exhausting context budgets, you're building exploratory workflows where the agent discovers and chains operations dynamically, you trust your LLM's code generation capabilities (GPT-4/Claude-class models), or you're composing multiple MCP servers and need to minimize per-server token overhead. This server is particularly valuable for automation agents that manage complex Cloudflare configurations (DNS orchestration, Workers deployment pipelines, analytics data extraction) where predicting all required endpoints upfront is impractical.
Skip if: You only need 5-10 specific Cloudflare endpoints (build a custom traditional MCP server instead), your use case involves simple, repetitive operations where structured tool schemas provide better reliability, you require IP-filtered API tokens for compliance reasons, or you're using an LLM with weak code generation abilities. The Code Mode pattern is brilliant for large API surfaces but overkill for narrow integrations.