repo2file: The Zero-Dependency Context Dumper for LLM-Assisted Development
Hook
You’ve hit the context window limit mid-conversation with Claude, frantically copying files one by one. There’s a better way, and it’s a simple Python script.
Context
The explosion of LLM-assisted development has created a mundane but frustrating workflow problem: how do you give Claude, ChatGPT, or other models enough context about your codebase without manually copying dozens of files? Developers resort to tedious copy-paste sessions, losing precious time and often forgetting critical files that would help the model understand their architecture.
This context-sharing problem is particularly acute for one-off tasks—code reviews, architecture discussions, debugging sessions, or “explain this codebase to me” queries. While sophisticated RAG pipelines and IDE integrations solve this for ongoing work, they’re overkill when you just need to dump a small project into a prompt window. repo2file targets this specific pain point: a Python script that traverses your repository, respects your .gitignore patterns, and produces a structured text file ready to paste into any LLM interface.
Technical Insight
The architecture of repo2file is refreshingly straightforward, which is precisely why it works. The tool performs two primary operations: building a visual tree representation of your directory structure, and concatenating file contents with clear delimiters.
The basic invocation works as documented:
python dump.py /path/to/your/repo output.txt .gitignore py js tsx
This command starts at your repo root, reads exclusion patterns from .gitignore, and keeps only files with the py, js, and tsx extensions (Python, JavaScript, and TypeScript React components). The output file contains two distinct sections: a tree-like structure showing your project hierarchy, followed by complete file contents with path headers.
The tree generation produces output like:
Directory Structure:
-------------------
/
├── package.json
├── next.config.js
├── public/
│   └── images/
│       ├── astro.png
│       └── astro-logo.svg
├── src/
│   ├── app/
│   │   ├── layout.tsx
│   │   └── page.tsx
This tree serves a purpose beyond aesthetics: it gives the model a map of your codebase. When you ask “where should I add authentication?”, the model can consult the tree to understand your routing layout before suggesting specific files.
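To make the tree step concrete, here is a minimal sketch of how such a listing can be rendered in Python. It is not taken from repo2file's source; the function name render_tree and the alphabetical ordering are assumptions made for illustration.

```python
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return box-drawing tree lines for everything under root.

    Hypothetical helper for illustration, not repo2file's own code.
    """
    lines = []
    # Sort by name so the output is stable between runs (an assumption).
    entries = sorted(root.iterdir(), key=lambda p: p.name)
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"{prefix}{connector}{entry.name}{suffix}")
        if entry.is_dir():
            # Children of the last entry need no vertical guide line.
            extension = "    " if is_last else "│   "
            lines.extend(render_tree(entry, prefix + extension))
    return lines
```

Calling "\n".join(render_tree(Path("."))) yields a listing in the same style as the one above. The sketch does not filter ignored paths or guard against symlink loops; a real tool would need both.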
The gitignore integration is the most notable feature for such a minimal tool. According to the README, it respects .gitignore patterns to exclude unnecessary files, so build artifacts, node_modules, and .env files stay out of your dump as long as your .gitignore actually lists them. This matters for both output size and security when sharing code context with external LLM services.
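As a rough illustration of ignore-file filtering, the sketch below matches paths against simple patterns with Python's standard fnmatch module. It is a deliberate simplification and an assumption on my part, not repo2file's actual matching logic: real .gitignore semantics (negation with "!", anchored patterns, "**") are richer than what this handles. The names load_patterns and is_ignored are hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath


def load_patterns(gitignore_text: str) -> list[str]:
    """Keep non-empty, non-comment lines; drop trailing slashes."""
    patterns = []
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(relative_path: str, patterns: list[str]) -> bool:
    """True if the whole path or any single path component matches a pattern.

    Simplified matching only: negation and anchoring are NOT supported.
    """
    path = PurePosixPath(relative_path)
    for pattern in patterns:
        if fnmatch.fnmatch(relative_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in path.parts):
            return True
    return False
```

With patterns loaded from a file containing node_modules/ and .env, is_ignored("node_modules/react/index.js", patterns) is true while is_ignored("src/app/page.tsx", patterns) is false. Whatever the tool does internally, it is worth opening the dump and checking for secrets before pasting it anywhere.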
The file concatenation strategy uses clear delimiters to help models distinguish between files:
File: src/app/page.tsx
--------------------------------------------------
Content of src/app/page.tsx:
[actual file contents]
File: src/app/layout.tsx
--------------------------------------------------
Content of src/app/layout.tsx:
[actual file contents]
This delimiter approach likely works because LLM training data includes similar formats (documentation, concatenated code examples, log files). In practice, models tend to parse these boundaries without requiring special markup or XML-style tags.
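The concatenation format shown above can be reproduced with a few lines of Python. This is a sketch under assumptions, not the tool's implementation: the names format_file and dump_files are hypothetical, the separator width of 50 is a guess at the sample output, and decoding with errors="replace" is my choice.

```python
from pathlib import Path

SEPARATOR = "-" * 50  # width is an assumption based on the sample output


def format_file(root: Path, path: Path) -> str:
    """Render one file as a header, a separator line, and its contents."""
    relative = path.relative_to(root).as_posix()
    # Undecodable bytes become U+FFFD instead of raising an exception.
    body = path.read_text(encoding="utf-8", errors="replace")
    return (
        f"File: {relative}\n"
        f"{SEPARATOR}\n"
        f"Content of {relative}:\n"
        f"{body}\n"
    )


def dump_files(root: Path, paths: list[Path]) -> str:
    """Join the per-file sections with a blank line between them."""
    return "\n".join(format_file(root, p) for p in paths)
```

Repeating the relative path in both the header and the "Content of" line is redundant for a human reader, but it gives the model two chances to associate a block of code with its location.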
The optional file extension filtering is deceptively powerful. For a large monorepo, you might dump only TypeScript files for a frontend question, then create a separate dump of Python files for a backend query. This surgical approach helps you stay within token limits while maintaining relevant context:
# Frontend context dump
python dump.py ./monorepo frontend-context.txt .gitignore tsx ts
# Backend context dump
python dump.py ./monorepo backend-context.txt .gitignore py
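How the trailing extension arguments narrow a dump can be sketched as a simple suffix check. The function matches_extensions is a hypothetical name, and treating an empty list as "include everything" is an assumption about how an optional filter would behave.

```python
from pathlib import Path


def matches_extensions(path: Path, extensions: list[str]) -> bool:
    """True if no filter is given or the file's suffix is in the list.

    Hypothetical helper; extensions are given without the dot, e.g. "tsx".
    """
    if not extensions:
        return True  # assumption: no filter means every file is kept
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return path.suffix.lower().lstrip(".") in wanted
```

Under this sketch, matches_extensions(Path("src/app/page.tsx"), ["tsx", "ts"]) is true and matches_extensions(Path("api/main.py"), ["tsx", "ts"]) is false. Extensionless files such as Makefile never match a non-empty filter.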
The beauty of repo2file’s architecture is what it doesn’t do. There’s no token counting, no chunking strategies, no vector embeddings, no API integrations. It’s a Unix-philosophy tool that does one thing well: transform repository structure into LLM-digestible text.
Gotcha
The simplicity that makes repo2file useful also defines its limitations. The tool appears to read files as-is, so any binary file that slips through could end up as garbage in your output. There’s no documented file size checking either, so you’re responsible for making sure your .gitignore patterns and extension filters exclude problematic files.
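If you want a guard of your own before dumping, a pre-check along these lines is one option. Nothing here comes from repo2file: the NUL-byte test is a common heuristic rather than a guarantee, and MAX_BYTES, looks_binary, and should_skip are hypothetical names with an arbitrary threshold.

```python
from pathlib import Path

MAX_BYTES = 200_000  # hypothetical cap; tune it for your project


def looks_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: a NUL byte in the first chunk suggests a binary file."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(sample_size)


def should_skip(path: Path) -> bool:
    """Skip files that exceed the size cap or that look binary."""
    return path.stat().st_size > MAX_BYTES or looks_binary(path)
```

Running should_skip over the candidate files first gives you a list of paths to add to .gitignore or to leave out via the extension filter.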
The bigger issue is scalability. The README’s best practices section warns to “be mindful of sensitive information” and notes that “for large repositories, consider dumping only relevant sections to stay within LLM token limits.” Beyond that general warning, repo2file provides no chunking, no smart truncation, and no token count, so with a substantial repository you may find out you’ve exceeded the context window only after pasting the entire dump and receiving an error. For ongoing development work or large codebases, this workflow may break down, and you might need tools with built-in context management.
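A rough size check before pasting can save a round trip. The sketch below uses the common rule of thumb of about four characters per token, which is only an approximation: real counts depend on the model's tokenizer and on the content, and code often tokenizes less efficiently than prose. The names estimate_tokens and fits_budget are hypothetical, and the budget is whatever your model allows.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count by rounding len(text) / 4 upward."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_budget(text: str, token_budget: int) -> bool:
    """True if the estimate is within the budget you pass in."""
    return estimate_tokens(text) <= token_budget
```

Read the generated dump into a string and pass it to fits_budget with a budget comfortably below the model's advertised limit, since you also need room for your question and the reply.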
Verdict
Use if: You’re working with small-to-medium projects, need to share context for one-off LLM conversations like architecture reviews or bug investigations, want zero setup time and minimal dependencies (Python only), or need to create periodic snapshots of specific subsections of larger repositories using file extension filtering. It’s perfect for consultants reviewing client code, educators preparing teaching materials, or developers doing focused work with AI assistance.
Skip if: Your repository is very large or you’re regularly hitting context window limits, you need ongoing AI integration during active development, you’re working with repositories containing many binary files or large data files, or you want semantic search and intelligent context selection.
repo2file is brilliantly simple for its specific purpose: dumping repository structure and content into a single LLM-ready text file.