CodeBurn: Tracking Where Your AI Coding Budget Actually Goes

Hook

If you use AI coding assistants daily, you're probably spending hundreds of dollars a month without knowing which features drive those costs. CodeBurn found that 30% of AI coding tokens in typical workflows go to 'waste': suggestions that get immediately rejected or reverted.

Context

The explosion of AI coding assistants in 2023-2024 created a new operational blind spot for development teams. Tools like Cursor, GitHub Copilot, Claude Code, and Windsurf became essential parts of developer workflows, but unlike traditional SaaS products with predictable per-seat pricing, many AI assistants bill by token consumption. A single complex refactoring session with Claude 3.5 Sonnet can burn through $5-10 in minutes if you're iterating on a large codebase. Monthly bills can easily hit $500-2,000 per developer, yet most teams have no visibility into what drives those costs.

The problem isn't just the total spend; it's the lack of attribution. When your Anthropic bill shows $1,200 for the month, you can't tell whether that came from valuable architectural work or from someone accidentally asking Claude to refactor the same file 50 times. Provider dashboards show aggregate token counts, but they don't break down costs by project, by task type (autocomplete vs. chat vs. command), or by which models actually delivered value. CodeBurn emerged to close this observability gap by analyzing the session logs that AI coding tools already write to your disk, turning raw usage data into actionable cost insights without requiring any changes to your development workflow.

Technical Insight

CodeBurn's architecture is refreshingly simple: it's a filesystem scraper with a TUI frontend. The tool doesn't intercept API calls, proxy requests, or require API keys. Instead, it reads the session logs that AI coding tools naturally persist to disk, parses them with provider-specific extractors, calculates costs using LiteLLM's pricing database, and renders everything through an interactive terminal interface built with React Ink.

The core insight is that most AI coding assistants are chattier than you'd expect with their local storage. Cursor writes a SQLite database to ~/.cursor-tutor, Windsurf logs JSONL to ~/.codeium, and Claude Code saves session data as JSON files. CodeBurn includes 18 provider-specific parsers that understand these formats. Here's a simplified version of how the Cursor parser works:

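// Assumes the better-sqlite3 package, whose synchronous prepare().all() API matches the calls below
const os = require('node:os');
const path = require('node:path');
const Database = require('better-sqlite3');

// inferModel, estimateTokens, and calculateCost are CodeBurn helpers not shown in this excerpt.

// Cursor stores sessions in SQLite with a workspaceChats table.
// `yield` is only valid inside a generator, so the loop is wrapped in one.
function* parseCursorSessions(startDate) {
  const dbPath = path.join(os.homedir(), '.cursor-tutor', 'tutor.db');
  const db = new Database(dbPath, { readonly: true }); // never write to Cursor's own data
  try {
    const sessions = db.prepare(`
      SELECT
        id,
        conversationId,
        model,
        bubbles, -- JSON array of messages
        createdAt
      FROM workspaceChats
      WHERE createdAt > ?
    `).all(startDate);

    for (const session of sessions) {
      const messages = JSON.parse(session.bubbles);
      for (const msg of messages) {
        // Cursor often logs model as 'Auto' - infer actual model from context
        const actualModel = inferModel(msg.model, msg.provider);
        const tokens = estimateTokens(msg.text, actualModel);
        const cost = calculateCost(tokens, actualModel, msg.type);

        yield {
          provider: 'cursor',
          model: actualModel,
          tokens,
          cost,
          type: msg.type, // 'chat', 'edit', 'generate'
          timestamp: session.createdAt
        };
      }
    }
  } finally {
    db.close(); // runs even if the consumer stops iterating early
  }
}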

The token estimation layer is where CodeBurn gets clever. Providers like GitHub Copilot and Windsurf don't log actual token counts, so CodeBurn estimates them from the logged text, using the tiktoken library for OpenAI models and character-length approximations for others. For Cursor's notorious 'Auto' mode, where the UI doesn't reveal which model handled your request, CodeBurn assumes Claude 3.5 Sonnet pricing since that's Cursor's default. It's not perfect, but testing against known API bills shows estimates within 5-10% of the actual totals.
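To make the fallback logic concrete, here is a minimal sketch of a character-length estimator combined with an 'Auto' model fallback. The function names, the four-characters-per-token ratio, and the `claude-3-5-sonnet` label are all assumptions for illustration; CodeBurn's real helpers may look quite different.

```javascript
// Hypothetical sketch: names and constants are assumptions, not CodeBurn's code.
const CHARS_PER_TOKEN = 4; // rough rule of thumb for English text and source code
const DEFAULT_AUTO_MODEL = 'claude-3-5-sonnet'; // assumed label for the fallback model

// Resolve the model name, falling back when the log is empty or only says 'Auto'.
function resolveModel(loggedModel) {
  if (!loggedModel || loggedModel.toLowerCase() === 'auto') {
    return DEFAULT_AUTO_MODEL;
  }
  return loggedModel;
}

// Estimate tokens from character length when the log carries no token counts.
function estimateTokensFromText(text) {
  if (!text) return 0;
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

console.log(resolveModel('Auto'));
console.log(estimateTokensFromText('const x = 1;'));
```

A fixed ratio like this is the crudest possible approximation; a real tokenizer such as tiktoken would replace `estimateTokensFromText` for models it supports.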

The caching strategy is particularly smart for large datasets. When you first run CodeBurn against Cursor's database (which might contain months of history across dozens of projects), it can take 30-60 seconds to parse everything. CodeBurn writes parsed results to its own SQLite cache (~/.codeburn/cache.db) with checksums of the source files. Subsequent runs only parse new sessions, making the TUI feel instant even with gigabytes of historical data.
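The checksum idea can be sketched in a few lines. This version is an assumption-laden illustration: it keeps the cache in an in-memory `Map` rather than SQLite, re-parses the whole file when its hash changes rather than only the new sessions, and uses hypothetical names (`loadWithCache`, `checksumOf`).

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical in-memory stand-in for an on-disk cache: path -> { checksum, records }
const cache = new Map();

// SHA-256 of the file contents, used to detect changes between runs.
function checksumOf(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// Re-parse only when the file content changed since it was last cached.
function loadWithCache(filePath, parse) {
  const checksum = checksumOf(filePath);
  const hit = cache.get(filePath);
  if (hit && hit.checksum === checksum) {
    return { records: hit.records, fromCache: true };
  }
  const records = parse(fs.readFileSync(filePath, 'utf8'));
  cache.set(filePath, { checksum, records });
  return { records, fromCache: false };
}

// Usage with a throwaway JSONL file in the OS temp directory
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const file = path.join(dir, 'session.jsonl');
const parseJsonl = (text) => text.split('\n').filter(Boolean).map((line) => JSON.parse(line));

fs.writeFileSync(file, '{"tokens":10}\n');
const first = loadWithCache(file, parseJsonl);  // cold: parses the file
const second = loadWithCache(file, parseJsonl); // unchanged: served from cache
fs.appendFileSync(file, '{"tokens":20}\n');
const third = loadWithCache(file, parseJsonl);  // changed: parses again
```

Hashing still reads the whole file, so the saving comes from skipping the parse step; a cheaper variant could compare file size and modification time first.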

The terminal UI uses React Ink to render an interactive dashboard with keyboard navigation. You can drill down from a monthly overview ($1,243 spent, 15M tokens) into per-project breakdowns, then filter by task type or model. The 'optimize' view is especially useful—it shows potential waste like autocomplete suggestions that were immediately deleted or chat sessions where you generated code but never saved the file. This isn't just cost tracking; it's workflow intelligence.
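Under the hood, a drill-down like this is a grouping operation over parsed usage records. The sketch below assumes a record shape with `project`, `model`, `type`, `tokens`, and `cost` fields and a hypothetical `summarizeBy` helper; the data is made up for illustration.

```javascript
// Hypothetical usage records; field names and values are illustrative assumptions.
const records = [
  { project: 'api', model: 'claude-3-5-sonnet', type: 'chat', tokens: 1200, cost: 0.5 },
  { project: 'api', model: 'gpt-4o', type: 'edit', tokens: 800, cost: 0.25 },
  { project: 'web', model: 'claude-3-5-sonnet', type: 'chat', tokens: 400, cost: 0.25 },
];

// Sum tokens and cost for each distinct value of `key`.
function summarizeBy(rows, key) {
  const totals = new Map();
  for (const row of rows) {
    const entry = totals.get(row[key]) ?? { tokens: 0, cost: 0 };
    entry.tokens += row.tokens;
    entry.cost += row.cost;
    totals.set(row[key], entry);
  }
  return totals;
}

// Top-level view: spend per project
const byProject = summarizeBy(records, 'project');

// Drill-down: within one project, spend per model
const apiByModel = summarizeBy(records.filter((r) => r.project === 'api'), 'model');

console.log(byProject.get('api'));
console.log(apiByModel.get('gpt-4o'));
```

Filtering first and grouping second keeps each view a pure function of the record list, which suits a UI that re-renders on every keypress.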

One architectural limitation worth noting: CodeBurn is fundamentally retrospective. It reads completed session logs, so it can't prevent runaway costs in real-time. If a teammate accidentally spawns 100 parallel Claude sessions, you'll only discover the damage after the fact. The tool is designed for post-hoc analysis and budget planning, not active cost controls. That said, for most teams, the visibility alone changes behavior—once developers see that multi-file refactoring sessions cost $8 each, they naturally become more deliberate about when to invoke them.

Gotcha

CodeBurn's biggest limitation is its dependency on provider logging implementations, which vary wildly in quality and consistency. Cursor's SQLite database is well-structured and comprehensive, but GitHub Copilot's local cache is sparse: CodeBurn can't distinguish between ghost text suggestions (shown but not accepted) and actual completions, so its Copilot numbers can be miscounted. Windsurf sometimes omits model information entirely, forcing CodeBurn to guess based on response characteristics. When providers update their logging formats or storage locations, CodeBurn breaks until maintainers add support for the new schema. This happened with Cursor's v0.40 release, which restructured the SQLite schema and broke parsing for two weeks.

The tool also struggles with enterprise deployments and custom AI provider configurations. If your company routes all AI requests through an internal proxy or uses a custom fork of an open-source coding assistant, CodeBurn won't find the session logs in expected locations. There's no plugin system for adding custom parsers—you'd need to fork the repo and write your own provider extractor. Additionally, CodeBurn assumes you're using standard model pricing. If you have negotiated enterprise rates with Anthropic or OpenAI, the cost calculations will be inflated, though you can manually adjust pricing constants in the codebase.
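The effect of negotiated rates is easy to see with a small worked example. Everything here is hypothetical: the model name, the prices (placeholders in USD per million tokens, not real rates), and the `buildPriceTable` and `costUsd` helpers, which are not part of CodeBurn.

```javascript
// Placeholder list prices in USD per million tokens; not real provider rates.
const LIST_PRICES = {
  'example-model': { input: 3, output: 15 },
};

// Merge hypothetical negotiated-rate overrides on top of the list prices.
function buildPriceTable(listPrices, overrides = {}) {
  const table = {};
  for (const [model, price] of Object.entries(listPrices)) {
    table[model] = { ...price, ...(overrides[model] ?? {}) };
  }
  return table;
}

// Cost of one request given input and output token counts.
function costUsd(table, model, inputTokens, outputTokens) {
  const price = table[model];
  if (!price) throw new Error(`No price for model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

const listTable = buildPriceTable(LIST_PRICES);
const negotiatedTable = buildPriceTable(LIST_PRICES, {
  'example-model': { input: 2, output: 10 },
});

const listCost = costUsd(listTable, 'example-model', 1_000_000, 200_000);
const negotiatedCost = costUsd(negotiatedTable, 'example-model', 1_000_000, 200_000);
console.log(listCost, negotiatedCost);
```

With these placeholder numbers the same usage costs 6 at list price and 4 at the negotiated rate, which is the size of gap a report built on list prices would silently carry.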

Token estimation accuracy is another gotcha. For providers that do expose actual token counts (Claude Code, Codex), CodeBurn's numbers are reliable. But for tools like Kiro and GitHub Copilot that only log text content, the estimates can drift 10-20% from reality depending on tokenization quirks. Non-English codebases and files with heavy Unicode or special characters tend to be underestimated since the approximation algorithm is optimized for ASCII-heavy source code.
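One way to see why non-ASCII text gets underestimated is to compare a character-based estimate with a UTF-8 byte-based one. This is only an illustration under an assumed ratio of four units per token; neither function is CodeBurn's algorithm, and real tokenizers behave differently from both.

```javascript
// Assumed ratio; real tokenizers do not use a fixed units-per-token value.
const UNITS_PER_TOKEN = 4;

// Counts UTF-16 code units, so a CJK character counts the same as an ASCII letter.
const estimateByChars = (text) => Math.ceil(text.length / UNITS_PER_TOKEN);

// Counts UTF-8 bytes, so multi-byte characters weigh more.
const estimateByBytes = (text) =>
  Math.ceil(Buffer.byteLength(text, 'utf8') / UNITS_PER_TOKEN);

const ascii = 'let total = 0;';
const cjk = '// 合計を計算する';

console.log(estimateByChars(ascii), estimateByBytes(ascii));
console.log(estimateByChars(cjk), estimateByBytes(cjk));
```

For the ASCII line both estimates agree; for the Japanese comment the byte-based figure is twice the character-based one. A large gap between the two is a cheap signal that a character-length heuristic may be undercounting that text.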

Verdict

Use CodeBurn if you're spending $200+ monthly across multiple AI coding tools and need to understand cost attribution by project, model, or task type—it's the fastest way to audit AI coding spend without modifying your workflow or routing traffic through proxies. It's especially valuable for engineering managers doing budget planning, freelancers tracking client project costs, or anyone investigating surprise bills from providers like Cursor or Windsurf. The zero-config experience means you get insights in under 60 seconds. Skip it if you only use a single AI assistant casually (the native provider dashboard is sufficient), you need real-time cost controls rather than retrospective analysis, or you're working in an enterprise environment with custom AI infrastructure that deviates from standard installation paths. Also skip if your team uses providers outside the supported 18—CodeBurn won't magically parse unknown log formats.
