Code2Prompt: Engineering LLM Context from Entire Codebases with Rust-Powered Precision
Hook
Many developers now spend more time crafting prompts for LLMs than writing commit messages. Yet most still copy-paste files by hand, losing Git context and blowing past token limits without warning.
Context
As LLMs became coding assistants, a new workflow bottleneck emerged: getting your codebase into the model's context window. Early adopters cobbled together bash scripts—chaining find, cat, and tree commands—to concatenate files. But these brittle solutions ignored .gitignore rules, couldn't handle different file formats, and provided no feedback about token consumption until ChatGPT or Claude rejected your 200K token prompt.
The problem intensified with AI agents and RAG pipelines. Tools like AutoGPT and LangChain needed programmatic codebase ingestion, not one-off manual exports. Developers building documentation generators, code review bots, and migration assistants all faced the same challenge: transforming directory trees into structured, token-aware prompts that respected version control semantics. Code2Prompt emerged as a Rust-first solution to this exact pain point—treating codebase-to-prompt conversion as a legitimate infrastructure problem requiring performance, correctness, and ecosystem integration.
Technical Insight
Code2Prompt's architecture goes well beyond simple file concatenation. At its core sits code2prompt-core, a Rust library implementing a four-stage pipeline: scan, filter, template, and output. The scan phase uses the ignore crate's gitignore parser (the same battle-tested library used by ripgrep), so your .gitignore and .git/info/exclude rules are applied the way Git itself would apply them. This isn't trivial: gitignore parsing has edge cases around negation patterns, directory-specific rules, and precedence that naive implementations get wrong.
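The negation and precedence edge cases can be illustrated with a deliberately simplified matcher. This is a sketch, not the ignore crate's algorithm: the function name is_ignored is hypothetical, it relies on Python's fnmatch (whose * also crosses /, unlike real gitignore globs), and it skips anchored and directory-only rules as well as Git's restriction on re-including files under an excluded directory.

```python
from fnmatch import fnmatch

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore check: the LAST matching pattern decides."""
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments carry no rule
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        # Simplification: a pattern without a slash is matched against the file name only.
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if fnmatch(target, pattern):
            ignored = not negated
    return ignored

rules = ["*.log", "!keep.log"]
print(is_ignored("logs/debug.log", rules))  # matched by *.log only
print(is_ignored("logs/keep.log", rules))   # re-included by the later negation
```

Swapping the two rules changes the answer for keep.log, which is exactly the ordering sensitivity that a "match any pattern" implementation misses.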
The filter stage supports both inclusion and exclusion via glob patterns, with an interesting architectural decision: filters are compiled once and applied during traversal, not post-collection. This means for a 10,000-file monorepo, you're not building a massive Vec and then filtering—you're rejecting files during the walk. Here's how you'd use it to grab only TypeScript files from a specific directory:
code2prompt \
  --path ./src \
  --include "**/*.ts" \
  --exclude "**/*.test.ts" \
  --exclude "**/node_modules/**" \
  --tokens
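The difference between filtering during the walk and filtering after collection can be sketched in a few lines of Python. This is an illustration of the idea only, not Code2Prompt's Rust implementation; walk_filtered and its parameters are hypothetical names, and the matching is limited to file-name globs and directory names.

```python
import os
import tempfile
from fnmatch import fnmatch
from typing import Iterator

def walk_filtered(root: str, include: list[str], exclude_dirs: set[str]) -> Iterator[str]:
    """Yield matching file paths, skipping excluded directories during the walk."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them,
        # so excluded subtrees are never listed at all.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if any(fnmatch(name, pat) for pat in include):
                yield os.path.relpath(os.path.join(dirpath, name), root)

# Tiny demo tree built in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for rel in ["src/app.ts", "src/readme.md", "node_modules/lib/index.ts"]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = sorted(walk_filtered(root, ["*.ts"], {"node_modules"}))
    print(found)
```

Because the generator yields paths one at a time, nothing under node_modules is ever visited and no intermediate list of rejected files is built.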
The --tokens flag hooks into the tiktoken-rs library for GPT-family models and reports the cost before you paste anything, with feedback along the lines of "This prompt will consume 45,239 tokens (GPT-4 Turbo context limit: 128,000)". This seemingly simple feature required careful design: tokenization happens in a streaming fashion to avoid loading the entire output into memory before counting.
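A minimal sketch of streaming token counting follows. It assumes a stand-in tokenizer (a regex that splits words and punctuation) rather than tiktoken, and the names naive_tokenize, count_tokens_streaming, and lines_of are hypothetical. With a real BPE tokenizer, counts summed over chunks can differ slightly from a single pass when a token would span a chunk boundary.

```python
import re
from typing import Callable, Iterable

def naive_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: words and single punctuation marks. NOT a real BPE."""
    return re.findall(r"\w+|[^\w\s]", text)

def count_tokens_streaming(
    chunks: Iterable[str],
    tokenize: Callable[[str], list[str]] = naive_tokenize,
) -> int:
    """Sum token counts chunk by chunk so the full prompt is never held in memory."""
    total = 0
    for chunk in chunks:
        total += len(tokenize(chunk))
    return total

def lines_of(text: str) -> Iterable[str]:
    # A generator stands in for reading files lazily from disk.
    yield from text.splitlines(keepends=True)

source = 'fn main() {\n    println!("hi");\n}\n'
print(count_tokens_streaming(lines_of(source)))
```

Chunking on line boundaries keeps this toy tokenizer's streamed total identical to a whole-text count, since none of its tokens can span a newline.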
But the most powerful feature is Git-aware context injection. Code2Prompt can include commit history, diffs, and branch comparisons directly in your prompt:
code2prompt \
  --path . \
  --git-diff "main..feature-branch" \
  --git-log-branch "feature-branch" \
  --template ./templates/code-review.hbs
This surfaces in the template context as structured data. Your Handlebars template receives variables like {{git_branch}}, {{git_diff}}, and {{git_log}}, plus {{files}}, an array of objects with path, content, extension, and lines properties. The template system enables output customization without recompiling:
# Code Review Request
## Branch: {{git_branch}}
### Changes:
{{git_diff}}
### Modified Files:
{{#each files}}
**{{this.path}}** ({{this.lines}} lines)
```{{this.extension}}
{{this.content}}
```
{{/each}}
### Recent Commits:
{{git_log}}
This template-driven approach means the same tool serves documentation generation (using a docs-focused template), code migration (with before/after sections), and security audits (emphasizing sensitive patterns). The Rust implementation keeps this flexible system performant—processing a 50MB codebase typically takes under 2 seconds on modern hardware.
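To make the shape of the template context concrete, here is a small Python sketch that renders the same review layout from a plain dictionary. It assumes the variable names used in the template above; render_review is a hypothetical helper, the values are made up, and ordinary string formatting stands in for a real Handlebars engine.

```python
FENCE = "`" * 3  # built at runtime so this listing's own fence stays intact

def render_review(ctx: dict) -> str:
    """Render a review prompt from a context shaped like the template variables."""
    parts = [
        "# Code Review Request",
        f"## Branch: {ctx['git_branch']}",
        "### Changes:",
        ctx["git_diff"],
        "### Modified Files:",
    ]
    for f in ctx["files"]:  # mirrors {{#each files}}
        parts.append(f"**{f['path']}** ({f['lines']} lines)")
        parts.append(f"{FENCE}{f['extension']}\n{f['content']}\n{FENCE}")
    parts += ["### Recent Commits:", ctx["git_log"]]
    return "\n".join(parts)

# Illustrative values only; none of this comes from a real repository.
context = {
    "git_branch": "feature-branch",
    "git_diff": "+ added line",
    "git_log": "abc123 Example commit",
    "files": [{"path": "src/a.py", "extension": "py", "content": "x = 1", "lines": 1}],
}
print(render_review(context))
```

Swapping in a different layout means changing only the rendering step; the context dictionary stays the same, which is the point of keeping templates separate from the scan and filter stages.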
The Python bindings via PyO3 expose this functionality programmatically for AI agent builders:
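```python
from code2prompt import Code2Prompt

# Configure the generator: project root, include globs, and a Handlebars template.
generator = Code2Prompt(
    path="./my-project",
    include=["**/*.py"],
    template="templates/refactor.hbs",
)

prompt = generator.generate()
token_count = generator.count_tokens()

# Only send the prompt when it fits the budget; `llm` is a placeholder
# for whatever client object your application already uses.
if token_count < 100_000:
    response = llm.complete(prompt)
```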
This SDK pattern makes Code2Prompt a building block rather than just a command-line utility. Teams are embedding it in CI pipelines to generate documentation on every merge, in Slack bots that answer codebase questions, and in VS Code extensions that feed context to local LLMs. The recent addition of an MCP (Model Context Protocol) server takes this further: it exposes Code2Prompt as a resource that Claude Desktop and other MCP clients can query directly, turning static codebases into queryable knowledge sources for agentic workflows.
Gotcha
Code2Prompt's text-centric design means it gracefully ignores what it can't parse. Binary files, compiled artifacts, and images are skipped without error—which is usually what you want, but can be confusing when you expect a binary configuration file to appear in your prompt. The tool makes no attempt to extract metadata from PDFs, analyze image assets, or decompile binaries. If your codebase documentation lives in binary formats or you need to include database schemas stored as binary dumps, you'll need preprocessing steps.
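If you want to know in advance which files are likely to be skipped, a common heuristic is to look for a NUL byte in the first few kilobytes of a file. The sketch below assumes that heuristic; it is not taken from Code2Prompt's source, and looks_binary, the sample size, and the file names are hypothetical.

```python
import os
import tempfile

def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(sample_size)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "config.toml")
    bin_path = os.path.join(tmp, "schema.dump")
    with open(text_path, "w", encoding="utf-8") as fh:
        fh.write("name = 'demo'\n")
    with open(bin_path, "wb") as fh:
        fh.write(b"\x89DUMP\x00\x01\x02")
    results = {
        "config.toml": looks_binary(text_path),
        "schema.dump": looks_binary(bin_path),
    }
    print(results)
```

Running a check like this over the tree before generating a prompt gives you a list of files that need a preprocessing step, such as exporting a binary dump to SQL text.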
Token counts are only exact for the tokenizer that produced them. Code2Prompt uses tiktoken-rs, which is accurate for GPT-family tokenization, but Claude and other models use different tokenizers. A prompt that Code2Prompt reports as 95,000 tokens might be 102,000 tokens in Claude's accounting, potentially causing rejected API calls. The tool doesn't yet support pluggable tokenizers, so cross-model accuracy requires manual verification. Template complexity can also inflate output size unpredictably: a template with aggressive formatting might double your token count compared to raw concatenation, and there's no pre-render token estimate to warn you before generation completes.
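One way to guard against tokenizer drift is to leave headroom when comparing a reported count to a model's limit. The sketch below works through the arithmetic of the example above; fits_budget is a hypothetical helper and the 10% default margin is an assumption, not a measured figure for any particular model.

```python
def fits_budget(reported_tokens: int, context_limit: int, margin: float = 0.10) -> bool:
    """Return True if the count, inflated by a safety margin, still fits the limit."""
    return reported_tokens * (1 + margin) <= context_limit

# Drift in the example above: 102,000 / 95,000 is roughly 7% over the reported count.
drift = 102_000 / 95_000 - 1
print(f"{drift:.1%}")

print(fits_budget(95_000, 100_000))  # 95,000 plus 10% exceeds a 100,000 limit
print(fits_budget(95_000, 128_000))  # comfortably inside a 128,000 limit
```

A margin somewhat larger than the drift you have actually observed for your target model is a reasonable starting point; verify against the provider's own token counter before relying on it.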
Verdict
Use Code2Prompt if you're building AI agents, RAG pipelines, or automation that needs programmatic codebase access; if you regularly feed large codebases to LLMs for documentation, migration, or analysis tasks; if you value Git-aware context (diffs, logs, branch comparisons) in your prompts; or if you need template-driven output customization across different LLM workflows. The Rust performance and Python SDK make it particularly compelling for production systems where speed and programmability matter.
Skip it if you only occasionally share small code snippets (manual copy-paste is faster); if your workflow requires binary file analysis or non-text artifact processing; if you need exact token counts across multiple LLM providers (the approximations may cause API rejections); or if you're building a GUI-heavy tool for non-technical users (Handlebars templates assume developer comfort with templating languages).