Chipotlai Max: Reverse-Engineering Corporate Chatbots for Free LLM Inference

Hook

What if you could get free GPT-4-class coding assistance by pretending to ask Chipotle about guacamole upcharges? Chipotlai Max suggests that at least some corporate customer service chatbots are, in effect, accidentally open LLM endpoints.

Context

AI coding assistants have become essential developer tools, but they're expensive. GitHub Copilot costs $10/month, Cursor $20/month, and raw API access to GPT-4 or Claude can run hundreds of dollars for heavy users. Meanwhile, corporations have quietly deployed sophisticated LLM-powered chatbots on their customer service pages—anonymous, public-facing widgets designed to answer questions about store hours and return policies.

Chipotlai Max, a satirical fork of the OpenCode AI coding agent, exposes a fascinating security gap: these corporate chatbots are often backed by the same LLM technology powering premium coding assistants, yet they're accessible without authentication and guarded only by per-session rate limits. The project demonstrates how Chipotle's 'Pepper AI' chatbot, built on IPsoft's Amelia platform, can be reverse-engineered into a drop-in replacement for OpenAI's API. It's proof-of-concept exploitation disguised as a meme project, and it reveals uncomfortable truths about how corporations are deploying AI infrastructure.

Technical Insight

The core architecture is a TypeScript Express proxy server that translates between two incompatible worlds: Chipotle's WebSocket-based STOMP protocol and OpenAI's REST API. The proxy maintains a pool of anonymous sessions to circumvent per-session rate limits, essentially treating Chipotle's customer service infrastructure as a distributed compute cluster.

Here's how the session pooling works:

class SessionPool {
  private sessions: ChipotleSession[] = [];
  private readonly MAX_POOL_SIZE = 5;
  
  async acquireSession(): Promise<ChipotleSession> {
    if (this.sessions.length < this.MAX_POOL_SIZE) {
      const session = await this.createAnonymousSession();
      this.sessions.push(session);
      return session;
    }
    
    // Pool is full: reuse a randomly chosen existing session
    return this.sessions[Math.floor(Math.random() * this.sessions.length)];
  }
  
  private async createAnonymousSession(): Promise<ChipotleSession> {
    const ws = new WebSocket('wss://chipotle.com/pepper/sockjs');
    const stompClient = Stomp.over(ws);
    
    return new Promise((resolve, reject) => {
      stompClient.connect({}, () => {
        resolve(new ChipotleSession(stompClient));
      }, reject);
    });
  }
}
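The listing above picks a pooled session with Math.random(), and because it awaits the connection before pushing, concurrent callers could in principle create more than MAX_POOL_SIZE sessions. As a minimal sketch of the same idea in isolation, assuming strict rotation is what's wanted, a generic bounded pool might look like the following. BoundedPool and its members are hypothetical names for illustration, not code from the project, and nothing here touches the network.

```typescript
// Generic bounded pool: creates members lazily up to maxSize,
// then hands them out in strict rotation.
class BoundedPool<T> {
  private readonly members: T[] = [];
  private readonly factory: () => T;
  private readonly maxSize: number;
  private cursor = 0;

  constructor(factory: () => T, maxSize: number) {
    if (maxSize < 1) throw new Error('maxSize must be at least 1');
    this.factory = factory;
    this.maxSize = maxSize;
  }

  acquire(): T {
    if (this.members.length < this.maxSize) {
      // The slot is reserved synchronously, so callers cannot overshoot maxSize.
      const member = this.factory();
      this.members.push(member);
      return member;
    }
    // Pool is full: return the next member and advance the cursor.
    const member = this.members[this.cursor];
    this.cursor = (this.cursor + 1) % this.members.length;
    return member;
  }
}

// Usage: a counter stands in for an expensive resource.
let created = 0;
const demoPool = new BoundedPool<number>(() => created++, 3);
const order: number[] = [];
for (let i = 0; i < 7; i++) order.push(demoPool.acquire());
console.log(order.join(',')); // prints "0,1,2,0,1,2,0"
```

If the resource is created asynchronously, T can be a Promise of the resource, which keeps the synchronous slot reservation while letting callers await the result.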

The genius is in the protocol translation layer. When a coding assistant sends a standard OpenAI chat completion request, the proxy reformats it as a customer service inquiry, sends it through STOMP, and converts Pepper's streaming response back into Server-Sent Events:

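app.post('/v1/chat/completions', async (req, res) => {
  const { messages, stream = false } = req.body;

  if (!stream) {
    // Only the streaming path is shown here; answer explicitly instead of
    // leaving non-streaming requests hanging without a response.
    res.status(501).json({ error: { message: 'Only stream: true is supported' } });
    return;
  }

  const session = await sessionPool.acquireSession();

  // Convert OpenAI messages to Pepper's format
  const pepperPrompt = messages
    .map(m => `${m.role}: ${m.content}`)
    .join('\n');

  res.setHeader('Content-Type', 'text/event-stream');

  const subscription = session.stompClient.subscribe('/topic/responses', (message) => {
    const chunk = JSON.parse(message.body);
    // Translate Amelia's response format to OpenAI's
    const sseChunk = {
      id: 'chatcmpl-burrito',
      object: 'chat.completion.chunk',
      model: 'pepper-chipotle-2026',
      choices: [{
        delta: { content: chunk.text },
        index: 0,
        finish_reason: chunk.done ? 'stop' : null
      }]
    };
    res.write(`data: ${JSON.stringify(sseChunk)}\n\n`);
    if (chunk.done) {
      // OpenAI streams end with a literal [DONE] sentinel
      res.write('data: [DONE]\n\n');
      res.end();
      // Drop the listener so later requests on this pooled session
      // do not write to an already closed response
      subscription.unsubscribe();
    }
  });

  session.send(pepperPrompt);
});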
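The translation step can be separated from the transport. The sketch below pulls the two conversions out as pure functions: flattening a chat history into a single prompt, and wrapping one upstream chunk as an OpenAI-style Server-Sent Events frame. The type names, function names, and the { text, done } chunk shape are assumptions modeled on the listing above, not the project's actual code.

```typescript
// Hypothetical types for this sketch.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface UpstreamChunk {
  text: string;
  done: boolean;
}

// Flatten a chat history into one prompt, one "role: content" line per message.
function flattenMessages(messages: ChatMessage[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join('\n');
}

// Wrap one upstream chunk as a chat.completion.chunk SSE frame.
function toSseFrame(chunk: UpstreamChunk, id: string, model: string): string {
  const payload = {
    id,
    object: 'chat.completion.chunk',
    model,
    choices: [
      {
        index: 0,
        delta: { content: chunk.text },
        finish_reason: chunk.done ? 'stop' : null,
      },
    ],
  };
  // Each SSE event is a "data:" line followed by a blank line.
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// OpenAI-compatible streams close with this sentinel frame.
const DONE_FRAME = 'data: [DONE]\n\n';

const prompt = flattenMessages([
  { role: 'system', content: 'Be brief.' },
  { role: 'user', content: 'Hi' },
]);
console.log(prompt === 'system: Be brief.\nuser: Hi'); // prints true
```

Keeping these functions free of I/O makes the format conversion testable without opening a socket, and the DONE_FRAME constant covers the end-of-stream marker that many OpenAI clients wait for.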

The project uses git submodules to vendor Gonzih's chipotle-llm-provider, which contains the actual reverse-engineering work. This separation is clever—it isolates the legally dubious proxy logic from the innocent OpenCode fork, making it clear where the 'research' happens versus the user-facing tool.

What makes this architecturally interesting is the OpenAI compatibility layer. By implementing the /v1/chat/completions endpoint exactly as OpenAI specifies, any tool that works with OpenAI's API—including the official SDKs, LangChain, and countless coding assistants—can transparently use Chipotle's infrastructure. The OpenCode CLI doesn't know it's talking to a fast-food chain's customer service bot; it just sees an OpenAI-compatible endpoint at localhost:3000.
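From the client's side, compatibility comes down to sending the same JSON body to a different base URL. The following sketch builds such a request without sending it. prepareChatCompletion and PreparedRequest are hypothetical names, and the /v1 prefix on localhost:3000 is an assumption based on the endpoint path described above.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build an OpenAI-style chat completion request for any compatible base URL.
function prepareChatCompletion(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  stream = false,
): PreparedRequest {
  // Strip trailing slashes so the joined path has exactly one separator.
  const url = `${baseUrl.replace(/\/+$/, '')}/chat/completions`;
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages, stream }),
    },
  };
}

const prepared = prepareChatCompletion('http://localhost:3000/v1/', 'pepper-chipotle-2026', [
  { role: 'user', content: 'Explain closures in one sentence.' },
]);
console.log(prepared.url); // prints "http://localhost:3000/v1/chat/completions"
// A caller would then pass prepared.url and prepared.init to fetch().
```

Only the base URL differs between providers here, which is why tools that accept a configurable base URL need no other changes.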

The multi-provider roadmap hints at a plugin architecture where each corporate chatbot gets its own adapter. The configuration structure looks like this:

interface ProviderConfig {
  name: string;
  websocketUrl: string;
  protocol: 'stomp' | 'sockjs' | 'raw-ws';
  messageTransformer: (input: OpenAIMessage[]) => any;
  responseParser: (raw: any) => string;
}

const providers: ProviderConfig[] = [
  {
    name: 'chipotle-pepper',
    websocketUrl: 'wss://chipotle.com/pepper/sockjs',
    protocol: 'stomp',
    messageTransformer: chipotleTransformer,
    responseParser: pepperParser
  },
  // Home Depot, Lowe's, Target configs are stubs
];
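One way to make a registry like this safer is to mark stub entries explicitly and to type raw responses as unknown, so parsers have to validate what they receive. The sketch below is a hypothetical variant of the structure above: ProviderAdapter, the implemented flag, resolveProvider, and the demo entries are all assumptions for illustration, not part of the project.

```typescript
interface OpenAIMessage {
  role: string;
  content: string;
}

// Hypothetical variant of ProviderConfig: `implemented` is an assumed flag,
// and `unknown` replaces `any` so parsers must check the payload shape.
interface ProviderAdapter {
  name: string;
  implemented: boolean;
  messageTransformer: (input: OpenAIMessage[]) => string;
  responseParser: (raw: unknown) => string;
}

// Narrow an untyped payload to its `text` field, or fail loudly.
function parseTextField(raw: unknown): string {
  if (typeof raw === 'object' && raw !== null && 'text' in raw) {
    const text = (raw as { text: unknown }).text;
    if (typeof text === 'string') return text;
  }
  throw new Error('Unexpected response shape');
}

const joinLines = (input: OpenAIMessage[]): string =>
  input.map((m) => `${m.role}: ${m.content}`).join('\n');

// Placeholder entries; the names are illustrative only.
const adapters: ProviderAdapter[] = [
  { name: 'demo-working', implemented: true, messageTransformer: joinLines, responseParser: parseTextField },
  { name: 'demo-stub', implemented: false, messageTransformer: joinLines, responseParser: parseTextField },
];

// Look up a provider by name and refuse to return stubs.
function resolveProvider(list: ProviderAdapter[], name: string): ProviderAdapter {
  const adapter = list.find((a) => a.name === name);
  if (!adapter) throw new Error(`Unknown provider: ${name}`);
  if (!adapter.implemented) throw new Error(`Provider "${name}" is a stub with no adapter code`);
  return adapter;
}

console.log(resolveProvider(adapters, 'demo-working').responseParser({ text: 'hi' })); // prints "hi"
```

Failing at lookup time turns a silently non-functional provider into an immediate, descriptive error.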

This generalizes the exploit pattern: any anonymous corporate chatbot becomes a potential LLM endpoint if you can reverse-engineer its wire protocol. The implementation also suggests that IPsoft's Amelia platform, marketed primarily for customer service, is surprisingly competent at coding tasks. If that holds, enterprise chatbot vendors are pricing their technology far below the $0.06 per 1K output tokens that OpenAI charged for GPT-4 at launch.

Gotcha

The entire project is built on sand. Chipotle can kill this in hours by adding CAPTCHA challenges, requiring authentication tokens from their main site, or implementing device fingerprinting. The proxy also has no notion of tenants: all users share the same five anonymous sessions, with no isolation and no queuing, so a single malicious user can exhaust the pool for everyone.

More importantly, only Chipotle's Pepper actually works. The README lists seven other providers (including Home Depot's Magic Apron, Lowe's DIY Assistant, and Target's Style Chat), but these are vaporware: stub configurations with no adapter code. The 'wired' status badges are misleading; you'll waste hours trying to enable providers that don't exist. The project is also an unmaintained fork of OpenCode, a repository with 120k+ stars, so upstream security patches and feature updates won't land here. And of course, there's the legal dimension: this explicitly violates terms of service, potentially crossing into Computer Fraud and Abuse Act territory if Chipotle decides to pursue legal action.

Verdict

Use if: You're a security researcher studying API abuse patterns and rate-limit evasion techniques, an offensive security practitioner learning how to reverse-engineer WebSocket protocols, or a developer who is building OpenAI-compatible proxy layers and needs a reference implementation. The session pooling pattern and protocol translation logic are genuinely instructive for understanding how to bridge incompatible APIs.

Skip if: You need reliable coding assistance (use Cursor or Continue.dev instead), you're risk-averse about terms of service violations, or you believe the multi-provider roadmap is real.

This is a teaching tool for understanding exploit development, not a production coding assistant. The real value is studying Gonzih's proxy implementation to understand corporate API vulnerabilities, not actually using stolen compute for your day job.
