How code-review-graph Cuts AI Code Review Tokens by 8.2× Using Persistent Knowledge Graphs
Hook
Your AI coding assistant re-reads your entire codebase for every code review, burning tokens like a developer who skips the docs. What if it only read the 12 files that actually matter?
Context
AI coding assistants have a dirty secret: they're embarrassingly inefficient. Every time you ask Claude or Cursor to review a pull request, they scan hundreds—sometimes thousands—of files to understand context. A two-line bug fix in a React component triggers a full traversal of your frontend, backend API contracts, and test suites. The cost isn't just API bills; it's latency. Large context windows take 10+ seconds to process, and you hit token limits on medium-sized repos.
The fundamental problem is statelessness. LLMs don't maintain memory between requests, so conventional AI tools compensate by cramming as much code as possible into each prompt. Some use embeddings for similarity search, but similarity is semantic, not structural: it can't tell you which files actually call the function you modified. Others use git diffs plus heuristics, but miss transitive dependencies. code-review-graph attacks this differently: it builds a persistent, queryable map of your codebase's structure (functions, classes, imports, calls, inheritance) and exposes it to AI assistants through the Model Context Protocol (MCP). Instead of reading everything, the AI queries the graph for blast radius: the set of files that a change can affect.
Technical Insight
The architecture is refreshingly straightforward. code-review-graph uses Tree-sitter, the same incremental parsing library that powers GitHub's code navigation, to build a syntax tree for each source file. From each tree it extracts nodes (functions, classes, methods), edges (calls, imports, inheritance), and metadata (test coverage associations). The resulting graph is stored in SQLite with tables for files, symbols, relationships, and change history.
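As a rough illustration of that storage layer, the sketch below creates four SQLite tables matching the categories named above and runs one caller lookup. The table names, column names, and the helpers `open_graph` and `callers_of` are assumptions made for illustration, not the project's actual schema or API.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not code-review-graph's real layout.
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    sha256 TEXT NOT NULL
);
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id),
    name TEXT NOT NULL,
    kind TEXT NOT NULL          -- 'function', 'class', 'method'
);
CREATE TABLE relationships (
    src_symbol INTEGER NOT NULL REFERENCES symbols(id),
    dst_symbol INTEGER NOT NULL REFERENCES symbols(id),
    kind TEXT NOT NULL          -- 'calls', 'imports', 'inherits'
);
CREATE TABLE change_history (
    file_id INTEGER NOT NULL REFERENCES files(id),
    old_sha256 TEXT,
    new_sha256 TEXT NOT NULL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_graph(db_path=":memory:"):
    """Create a fresh graph database with the sketch schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def callers_of(conn, symbol_name):
    """Return (file path, symbol name) pairs for direct callers of symbol_name."""
    rows = conn.execute(
        """
        SELECT f.path, s.name
        FROM relationships r
        JOIN symbols s ON s.id = r.src_symbol
        JOIN symbols t ON t.id = r.dst_symbol
        JOIN files f ON f.id = s.file_id
        WHERE r.kind = 'calls' AND t.name = ?
        ORDER BY f.path, s.name
        """,
        (symbol_name,),
    )
    return rows.fetchall()
```

Keeping edges in a single `relationships` table with a `kind` column is one way to let calls, imports, and inheritance share the same traversal queries.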
Here's what a typical MCP tool call looks like from Claude's perspective:
# AI assistant queries: "What's affected by changes to user_auth.py?"
# code-review-graph executes:
result = graph.get_blast_radius(
    changed_files=['src/auth/user_auth.py'],
    depth=2,  # Traverse 2 levels of callers
    include_tests=True
)
# Returns:
# {
#   "direct_dependents": ["src/api/login.py", "src/api/register.py"],
#   "transitive_dependents": ["src/middleware/session.py"],
#   "related_tests": ["tests/auth/test_user_auth.py", "tests/api/test_login.py"],
#   "token_count": 2847  # vs 19,230 for full codebase
# }
The blast-radius algorithm is the real innovation. When you modify a function, the system walks the graph backward through call edges, identifies all callers, then traces their dependents. It follows import relationships bidirectionally and maps test files through naming conventions (test_*.py) and import patterns. The result is a minimal set of files that could be affected by your change.
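The backward walk can be sketched as a depth-limited breadth-first search over reverse dependency edges. This is a minimal illustration under stated assumptions, not the project's implementation: the function `blast_radius`, the `REVERSE_DEPS` map, and the path `src/app.py` are hypothetical, and the graph is simplified to file-level edges.

```python
from collections import deque


def blast_radius(reverse_deps, changed_files, depth=2):
    """Breadth-first walk over reverse dependency edges.

    reverse_deps maps a file to the files that call or import it.
    Returns {file: distance} for every dependent within `depth` hops.
    """
    distance = {f: 0 for f in changed_files}
    queue = deque(changed_files)
    while queue:
        current = queue.popleft()
        if distance[current] >= depth:
            continue  # do not expand beyond the requested depth
        for dependent in reverse_deps.get(current, ()):
            if dependent not in distance:  # also guards against cycles
                distance[dependent] = distance[current] + 1
                queue.append(dependent)
    # Drop the changed files themselves; only dependents are reported.
    return {f: d for f, d in distance.items() if d > 0}


# Hypothetical reverse-dependency map reusing the example paths above.
REVERSE_DEPS = {
    "src/auth/user_auth.py": ["src/api/login.py", "src/api/register.py"],
    "src/api/login.py": ["src/middleware/session.py"],
    "src/middleware/session.py": ["src/app.py"],
}

radius = blast_radius(REVERSE_DEPS, ["src/auth/user_auth.py"], depth=2)
direct = sorted(f for f, d in radius.items() if d == 1)
transitive = sorted(f for f, d in radius.items() if d == 2)
```

With `depth=2`, the walk stops before reaching `src/app.py`, which is three hops away. The depth limit is what keeps the result small on tightly coupled codebases.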
Incremental updates solve the staleness problem. Every file is hashed with SHA-256, and the digest is stored alongside the graph. On each update cycle (triggered by a file watcher or a manual refresh), code-review-graph compares hashes, identifies changed files, and re-parses only those files. For a 2,900-file Django project, incremental updates complete in under 2 seconds. Invalidation cascades selectively: if you change a function signature, the callers of that function are invalidated, but unrelated modules are not.
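The hash comparison step might look roughly like the following. It is a sketch under stated assumptions: `sha256_of` and `detect_changes` are hypothetical helper names, and the stored digests are modeled as a plain dictionary rather than the SQLite table the tool uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def detect_changes(paths, stored_hashes):
    """Compare current digests against stored ones.

    Returns (files_to_reparse, updated_hashes). New files count as changed;
    files missing from `paths` are dropped from the updated mapping.
    """
    updated = {}
    to_reparse = []
    for path in paths:
        key = str(path)
        digest = sha256_of(path)
        updated[key] = digest
        if stored_hashes.get(key) != digest:
            to_reparse.append(key)
    return sorted(to_reparse), updated
```

Only the files returned in the first element would need to go back through the parser; everything else keeps its existing nodes and edges.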
The MCP integration is what makes this universally useful. Instead of writing custom plugins for Cursor, Zed, Continue, and a dozen other tools, code-review-graph exposes a standardized MCP server that any compliant AI assistant can query. You run one command—npx @code-review-graph/mcp init—and it auto-detects your editor, injects the configuration, and starts the graph server. From the AI's perspective, it gains new tools like analyze_blast_radius, get_symbol_references, and find_test_coverage.
Language support is comprehensive because Tree-sitter does the heavy lifting. The system bundles grammars for Python, JavaScript, TypeScript, Go, Rust, Java, C++, and 16 more languages. It even handles Jupyter notebooks by parsing each code cell independently and tracking cross-cell dependencies. The graph schema is language-agnostic—a Python function and a Rust fn are both stored as callable symbols with similar edge types.
The token savings are measurable and dramatic. In benchmarks against real repositories, code-review-graph achieved a 6.8× reduction on PR reviews (median across 50 PRs) and up to 49× on monorepos where changes touched core utilities with hundreds of downstream dependents. The express.js repo showed 0.7× efficiency on single-file changes, the only case where graph overhead exceeded raw file size, but averaged 4.2× across typical multi-file PRs.
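For reference, a reduction factor of this kind is simply baseline tokens divided by graph-scoped tokens. The helper below is a hypothetical illustration, applied to the two token counts from the tool-call example earlier (19,230 for the full codebase, 2,847 for the blast radius).

```python
def reduction_factor(baseline_tokens, graph_tokens):
    """How many times fewer tokens the graph-scoped context uses.

    Values below 1.0 mean the graph context cost more than the baseline.
    """
    if graph_tokens <= 0:
        raise ValueError("graph_tokens must be positive")
    return baseline_tokens / graph_tokens


# Token counts taken from the tool-call example earlier in the article.
example = reduction_factor(19_230, 2_847)
print(f"{example:.1f}x")
```

A result below 1.0, such as the 0.7× express.js case, means the structured graph payload was larger than the raw source it replaced.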
Gotcha
The token math reverses on tiny projects. For codebases under 100 files, the graph metadata—table schemas, relationship indices, symbol signatures—actually consumes more tokens than just reading the raw source. The express.js benchmark revealed this: single-file changes in small repos hit 0.7× efficiency because you're sending structured graph data instead of plain text. The break-even point sits around 200-300 files, depending on coupling density.
Tree-sitter parsing isn't perfect, especially for cutting-edge language features. If you're using TypeScript 5.0's standard decorators or Python 3.12's type parameter syntax, the grammar might lag behind. The system falls back gracefully (unparseable files get marked as blobs), but you lose graph edges for those nodes. Complex macro systems (Rust, C++) and dynamic imports (JavaScript) can confuse the static analysis. The blast radius becomes a best-effort approximation rather than a guarantee, and you might miss dependencies that only exist at runtime. The project is young enough (despite 15k stars) that edge cases in large polyglot monorepos remain unexplored territory.
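The blob fallback can be pictured with a small sketch. Note the substitution: this uses Python's standard-library `ast` module as a stand-in for Tree-sitter, and the function `index_source` and its return shape are assumptions for illustration only.

```python
import ast


def index_source(path, source):
    """Parse one Python file; fall back to an opaque blob on syntax errors.

    Uses the standard-library `ast` module as a stand-in for Tree-sitter.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # No symbols or edges can be extracted, so the file is kept as a blob.
        return {"path": path, "kind": "blob", "symbols": []}
    symbols = sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    )
    return {"path": path, "kind": "parsed", "symbols": symbols}
```

A blob entry still appears in the file list, but with no symbols it contributes no call or import edges, which is why dependencies through such files go undetected.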
Verdict
Use if: You're working on codebases over 500 files where AI assistant latency and token costs have become painful, especially monorepos with deep dependency chains. The 8.2× average token reduction translates directly to faster reviews, lower API bills (critical if you're paying per-token), and better context utilization. Teams standardizing on AI-assisted development should adopt this immediately: the MCP integration means one setup works across Cursor, Zed, Continue, and Claude Code. It's particularly valuable for PR review workflows where blast-radius analysis shines.
Skip if: Your project is under 200 files, or you're doing mostly greenfield development without frequent reviews. The overhead isn't justified for small codebases, and the token savings evaporate. Also skip if your stack uses heavy metaprogramming (Rails magic, C++ templates) where static analysis struggles: you'll get incomplete graphs and miss dependencies. Finally, if you're not already using AI coding assistants, this is a solution in search of a problem.