How Cloudflare Fit 2,500 API Endpoints Into Two MCP Tools

[ View on GitHub ]

Hook

Most MCP servers create one tool per API endpoint. Applied to Cloudflare's API, that pattern would need 2,594 tools and consume roughly 244,000 tokens just to describe them. Instead, Cloudflare built two tools that do everything.

Context

The Model Context Protocol (MCP) has a scaling problem. The traditional pattern—one tool per API endpoint—works beautifully for small APIs. Need to send emails? Create a send_email tool with a clean schema. Need to query a database? Add a run_query tool. But this approach collapses under its own weight when you encounter APIs like Cloudflare’s.

Cloudflare’s API spans their entire product catalog: Workers, Pages, KV storage, R2 object storage, D1 databases, DNS management, WAF rules, analytics, and dozens more services. Their OpenAPI specification contains 2,594 endpoints and weighs in at 2 million tokens. Using the standard MCP pattern would mean defining 2,594 individual tools, consuming between 244,000 and 1.2 million tokens of your agent’s context window just to describe what’s available—before doing any actual work. For many models, that’s the entire context budget. Even for frontier models with million-token windows, it’s a massive tax that leaves little room for conversation history, retrieved documents, or other tools.

Technical Insight

[System architecture (auto-generated diagram): the AI Agent sends a search query to the MCP Server, which matches it against the 2-million-token OpenAPI spec and returns endpoint results. The agent then calls execute with JavaScript code, which the server loads and runs in an isolated sandbox inside a Cloudflare Worker security context; there, cloudflare.request calls the Cloudflare API, and the response flows back to the agent as the execution result.]

Cloudflare’s solution introduces what they call the “Code Mode” pattern: instead of exposing thousands of tools, expose two meta-tools that let the agent write JavaScript code to explore and interact with the API. The search tool queries the OpenAPI specification to discover endpoints, while the execute tool runs JavaScript that makes actual API calls. The full 2-million-token spec never enters the agent’s context—it stays server-side, queried on demand.

Here’s what agent interaction looks like. First, the agent searches for relevant endpoints:

// Agent uses the 'search' tool with a query
{
  "tool": "search",
  "query": "workers script upload"
}

// Returns matching endpoints from the OpenAPI spec
[
  {
    "path": "/accounts/{account_id}/workers/scripts/{script_name}",
    "method": "PUT",
    "summary": "Upload a Worker script"
  },
  {
    "path": "/accounts/{account_id}/workers/scripts/{script_name}/content",
    "method": "PUT",
    "summary": "Upload Worker script content"
  }
]

Then the agent writes code to make the actual API call:

// Agent uses the 'execute' tool with JavaScript code
const response = await cloudflare.request(
  'PUT',
  '/accounts/{account_id}/workers/scripts/my-worker',
  {
    body: scriptContent,
    headers: {
      'Content-Type': 'application/javascript'
    }
  }
);

return response;

Under the hood, the server executes this code using Cloudflare’s Dynamic Worker Loader API. This isn’t just eval() in a sandbox—it’s full Worker isolation. Each execution runs in the same security context that hosts production Cloudflare Workers, with the same resource limits and isolation guarantees. The agent-generated code can only access the cloudflare.request() function, which is pre-configured with the user’s authentication credentials (either OAuth token or API key).
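As a rough sketch of what that injected helper might look like, here is a minimal fetch wrapper pre-bound to the user's credentials. The function name and exact shape are our own illustration, not Cloudflare's implementation; only the base URL and bearer-token scheme come from Cloudflare's public v4 API.

```javascript
// Hypothetical sketch of the helper injected into the sandbox: a thin
// fetch wrapper pre-configured with the user's API token, so agent code
// never sees the credential itself.
function makeCloudflareClient(apiToken) {
  const BASE = 'https://api.cloudflare.com/client/v4';
  return {
    async request(method, path, { body, headers } = {}) {
      const res = await fetch(BASE + path, {
        method,
        headers: { Authorization: `Bearer ${apiToken}`, ...headers },
        body,
      });
      return res.json();
    },
  };
}
```

The agent's sandbox would then expose `makeCloudflareClient(token)` as the `cloudflare` global, keeping the token itself out of agent-visible scope.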

The architecture is surprisingly straightforward. The MCP server maintains a persistent connection to the agent (via stdio or SSE transport), holds the OpenAPI spec in memory, and maintains an authenticated Cloudflare API client. When search is called, it performs a simple text search over spec.paths and returns matching endpoint metadata. When execute is called, it spawns a Worker, injects the code, provides the cloudflare global, and streams the response back.
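A minimal sketch of that server-side lookup, assuming a standard OpenAPI `paths` structure (the real server's matching logic may be more sophisticated):

```javascript
// Simple text search over an OpenAPI spec's paths. The full spec stays in
// server memory; only matching endpoint metadata is returned to the agent.
function searchSpec(spec, query) {
  const terms = query.toLowerCase().split(/\s+/);
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(methods)) {
      const haystack = `${path} ${op.summary ?? ''}`.toLowerCase();
      if (terms.every((t) => haystack.includes(t))) {
        results.push({ path, method: method.toUpperCase(), summary: op.summary });
      }
    }
  }
  return results;
}

// Example against a tiny spec fragment:
const spec = {
  paths: {
    '/accounts/{account_id}/workers/scripts/{script_name}': {
      put: { summary: 'Upload a Worker script' },
    },
    '/zones/{zone_id}/dns_records': {
      get: { summary: 'List DNS records' },
    },
  },
};
console.log(searchSpec(spec, 'workers upload')); // one match: the script upload endpoint
```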

Authentication supports two modes. For interactive use, the server implements a full OAuth flow with granular permission selection—users can approve exactly which Cloudflare resources the agent can access. For programmatic use, it accepts API tokens directly. There’s a clever detail here: when using account-scoped API tokens, the server automatically detects the account ID by calling the verification endpoint, so users don’t need to manually specify it.
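That auto-detection step can be sketched as follows. The endpoint paths match Cloudflare's public v4 API (`/user/tokens/verify` and `/accounts`); the injectable `fetchImpl` parameter and minimal error handling are our own additions to keep the sketch testable.

```javascript
// Hedged sketch of account-ID auto-detection: verify the token, then list
// the accounts it can see and take the first ID.
async function detectAccountId(apiToken, fetchImpl = fetch) {
  const headers = { Authorization: `Bearer ${apiToken}` };
  const base = 'https://api.cloudflare.com/client/v4';

  const verify = await fetchImpl(`${base}/user/tokens/verify`, { headers });
  if (!verify.ok) throw new Error('Token verification failed');

  const accounts = await fetchImpl(`${base}/accounts`, { headers });
  const { result } = await accounts.json();
  if (!result || result.length === 0) throw new Error('No accounts visible to this token');
  return result[0].id;
}
```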

The token math is remarkable. A traditional MCP implementation would serialize all 2,594 endpoints into tool definitions, consuming 244,000+ tokens. With Code Mode, the agent’s context contains only:

  • Two tool definitions (~500 tokens)
  • Recent conversation history (variable)
  • Search results from queries (typically 100-500 tokens per query)
  • Code execution results (typically <500 tokens)

Total: roughly 1,000-2,000 tokens for the entire Cloudflare API surface. That's a 99.5% reduction. The tradeoffs are latency (agents must search first, then execute) and reliability (code generation can fail where structured tool calls wouldn't). But for APIs measured in thousands of endpoints, the math is overwhelming.
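A back-of-the-envelope check of that claim, using a representative ~1,200-token Code Mode context (two tool definitions plus a couple of search and execution results):

```javascript
// Verify the claimed context-window savings.
const traditionalTokens = 244000; // serialized definitions for 2,594 tools
const codeModeTokens = 1200;      // representative Code Mode context
const reduction = 1 - codeModeTokens / traditionalTokens;
console.log(`${(reduction * 100).toFixed(1)}% reduction`); // → 99.5% reduction
```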

Gotcha

The Code Mode pattern shifts complexity from context management to code generation reliability. Your agent must now write valid JavaScript that correctly constructs API requests, handles responses, and manages errors. In testing, this works remarkably well with frontier models (GPT-4, Claude 3.5), but smaller or older models struggle. They generate code with syntax errors, forget to await promises, or construct malformed request bodies. Traditional MCP tools with strict schemas catch these errors before execution—with Code Mode, you only discover them at runtime.
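Since schema validation can't catch these failures up front, the server has to surface them at runtime. A minimal sketch (names are illustrative, not Cloudflare's implementation) of wrapping agent code so failures come back to the agent as structured results rather than crashes:

```javascript
// Run agent-supplied code and return errors as data, so the agent can
// read the failure and retry with corrected code.
async function runAgentCode(fn) {
  try {
    return { ok: true, result: await fn() };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}
```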

Security deserves careful consideration. Worker isolation prevents truly dangerous operations (filesystem access, network requests to arbitrary hosts), but the agent can still make any API call permitted by the authentication token. If you grant broad permissions, a confused agent could delete DNS records, purge cache, or modify security rules. The OAuth flow helps by letting users grant narrow permissions, but API token users need to carefully scope their tokens. There’s no additional safety layer—if the token can do it, the agent can do it. This isn’t a limitation unique to Code Mode, but the indirection makes it less obvious what operations the agent might attempt compared to seeing explicit tool names like delete_dns_record.

Verdict

Use if: You’re building agents that need comprehensive Cloudflare API access, working with context-constrained models, or combining Cloudflare operations with many other tools where context budget is tight. The Code Mode pattern shines brightest with large APIs (1,000+ endpoints) and capable code-generating models. It’s also ideal for exploratory workflows where the agent needs to discover and try different API endpoints dynamically rather than following a predetermined script.

Skip if: You only need 5-10 specific Cloudflare endpoints (just build a traditional MCP server with those tools; you’ll get better type safety and validation), your agent struggles with code generation, or you require guaranteed schema validation before execution. Also skip if you’re working in a high-security environment where you can’t tolerate the risk of an agent writing unintended API calls within the granted permission scope, or if the added latency of search-then-execute doesn’t fit your performance requirements.
