o1-engineer: Teaching OpenAI's Reasoning Model to Write Your Codebase

Hook

What if you could say ‘build me a REST API with authentication’ and watch an AI not just generate code snippets but create the entire directory structure, write the files, and explain its architectural decisions?

Context

The explosion of AI coding assistants has created a fragmented landscape. GitHub Copilot completes your functions. ChatGPT answers your questions. IDEs like Cursor embed AI into your editor. But there’s a conspicuous gap: tools that translate high-level project intent into actual file system operations. You can ask GPT-4 to design a microservice architecture, but you’re still manually creating folders, copying code snippets, and stitching everything together.

o1-engineer emerged as a command-line solution to this specific friction point. Built around OpenAI’s o1 model—specifically designed for enhanced reasoning rather than just pattern completion—it provides a conversational interface where developers describe what they want to build, and the tool orchestrates file creation, editing, and project structure. Created by Doriandarko and now sporting nearly 3,000 GitHub stars, it represents a different approach: rather than augmenting your editor or completing your code, it acts as a pair programmer who can actually touch your file system.

Technical Insight

[Figure: System architecture (auto-generated). Components: User CLI Input, Rich Terminal UI, Command Parser, Conversation Context Store, Added Files Cache, Prompt Builder, OpenAI/Grok API, File System Executor.]

At its core, o1-engineer is a stateful conversation loop that maintains context about your project while translating natural language into file operations. The architecture is deliberately simple: a single Python script that orchestrates three key components—user input handling via Rich library prompts, OpenAI API calls with conversation history, and file system operations based on AI-generated instructions.

The interesting part is how it structures prompts for the o1 model. When you use the /add command to include files in context, o1-engineer keeps their contents in memory and includes them, together with the conversation history, in every subsequent prompt, so the model’s view of your project persists across the conversation. Here’s how a typical interaction flows, in simplified form:

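# Simplified conceptual flow from o1-engineer
# (get_user_command, extract_path, read_file, build_prompt, parse_instructions
#  and execute_file_operations are illustrative placeholders, not the script's real helpers)
import openai  # openai>=1.0; the module-level client reads OPENAI_API_KEY

conversation_history = []
added_files = {}

while True:
    user_input = get_user_command()

    if user_input.startswith('/add'):
        file_path = extract_path(user_input)
        added_files[file_path] = read_file(file_path)
        # AI now knows about this file in subsequent calls

    elif user_input.startswith('/create'):
        # Build context-aware prompt
        prompt = build_prompt(
            task=user_input,
            existing_files=added_files,
            history=conversation_history
        )

        # o1 model returns structured instructions
        response = openai.chat.completions.create(
            model="o1-preview",
            messages=[{"role": "user", "content": prompt}]
        )
        reply = response.choices[0].message.content

        # Remember the exchange so later commands build on it
        conversation_history.append({"role": "user", "content": user_input})
        conversation_history.append({"role": "assistant", "content": reply})

        # Parse the reply text for file operations
        instructions = parse_instructions(reply)
        execute_file_operations(instructions)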
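The loop above relies on helpers it never defines. Here is a minimal sketch of what `build_prompt` and `parse_instructions` could look like, assuming a made-up `### FILE: <path>` response format in which each file's full contents follow its header line. The names, the prompt wording, and the format are hypothetical; o1-engineer's actual prompts and parsing differ. In this sketch `parse_instructions` takes the reply text rather than the raw response object.

```python
import re

# Hypothetical response format: each file starts with a "### FILE: <path>"
# header line, followed by its full contents until the next header.
FILE_HEADER = re.compile(r"^### FILE: (.+)$", re.MULTILINE)


def build_prompt(task, existing_files, history):
    """Assemble one user message from prior turns, added files, and the task."""
    parts = [
        "You are a software engineer. Respond only with files, "
        "each introduced by a line '### FILE: <path>'."
    ]
    for turn in history:
        parts.append(f"[{turn['role']}]\n{turn['content']}")
    for path, content in existing_files.items():
        parts.append(f"Existing file {path}:\n{content}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


def parse_instructions(reply):
    """Split a reply into (path, contents) pairs using the FILE header."""
    matches = list(FILE_HEADER.finditer(reply))
    instructions = []
    for i, match in enumerate(matches):
        start = match.end() + 1  # skip the newline after the header
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        body = reply[start:end].rstrip("\n") + "\n"
        instructions.append((match.group(1).strip(), body))
    return instructions
```

Asking for a fixed, easily parsed format is the key design choice here. A delimiter-based format like this one breaks if a generated file itself contains a header line, which is one reason a real tool should validate paths and contents before writing anything.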

The /planning command demonstrates why the o1 model choice matters. Unlike GPT-4, which optimizes for speed and fluency, o1 is designed to ‘think before speaking’—performing multi-step reasoning before generating responses. When you ask for project planning, o1-engineer leverages this to produce architectural decisions with justifications:

$ o1-engineer
> /planning Create a Flask API with JWT authentication, PostgreSQL database, and Docker deployment

[AI generates structured plan]
1. Project Structure:
   - /app (Flask application)
   - /config (environment configs)
   - /migrations (Alembic migrations)
   - /tests (pytest suite)
   
2. Technology Choices:
   - Flask-JWT-Extended for auth (simpler than rolling custom JWT)
   - SQLAlchemy ORM (team familiarity, migration support)
   - Docker Compose (local dev parity with production)
   
3. Implementation Order:
   - Database models first (schema is foundation)
   - Auth endpoints (security before features)
   - Business logic (once auth is solid)
   - Containerization (after local validation)

> /create
[Executes the plan, creating all files and structure]

What makes this approach powerful is the separation between planning and execution. The AI isn’t just dumping code: it’s reasoning about dependencies, explaining trade-offs, and then systematically implementing its own plan. The conversation history ensures each /create or /edit command builds on previous context, enabling iterative refinement without re-explaining your project structure.
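To make the planning/execution hand-off concrete, here is a minimal sketch of how a plan produced by /planning can reach a later /create. The plan is stored as an assistant message, so the next request carries it automatically. The `Conversation` class and its 20-turn cap are assumptions for illustration, not o1-engineer’s code.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical history store: every turn is kept as a chat message."""
    messages: list = field(default_factory=list)
    max_turns: int = 20  # assumption: cap history so prompts stay within context

    def record(self, user_text, assistant_text):
        """Store one command and the model's reply as a user/assistant pair."""
        self.messages.append({"role": "user", "content": user_text})
        self.messages.append({"role": "assistant", "content": assistant_text})
        # Drop the oldest pairs once the cap is exceeded (always an even count)
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            del self.messages[:excess]

    def next_request(self, user_text):
        """Messages for the next API call: full history plus the new command."""
        return self.messages + [{"role": "user", "content": user_text}]
```

After `conv.record("/planning ...", plan_text)`, calling `conv.next_request("/create")` returns the plan followed by the new command, which is what lets /create act on a plan it never saw in the same request.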

The file editing mechanism uses a diff-style approach where the AI generates before/after code blocks. When you run /edit app.py, the tool sends the current file content to o1, receives modification instructions, and applies them. This is more reliable than asking the AI to regenerate entire files, which often introduces subtle bugs or loses existing logic.
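A minimal sketch of the before/after idea, assuming the model returns pairs of exact "before" and "after" snippets for each change. The function names and the strict one-match rule are our own assumptions, not necessarily how o1-engineer applies its edits.

```python
def apply_edit(source: str, before: str, after: str) -> str:
    """Replace exactly one occurrence of `before` with `after`.

    Refusing zero or multiple matches keeps a misquoted or ambiguous
    snippet from silently editing the wrong place.
    """
    count = source.count(before)
    if count == 0:
        raise ValueError("'before' block not found; file may have changed")
    if count > 1:
        raise ValueError(f"'before' block is ambiguous ({count} matches)")
    return source.replace(before, after, 1)


def apply_edits(source: str, edits: list[tuple[str, str]]) -> str:
    """Apply (before, after) pairs in order; any failure aborts the whole edit."""
    for before, after in edits:
        source = apply_edit(source, before, after)
    return source
```

Because untouched lines are never regenerated, the model cannot accidentally drop logic outside the snippets it names. The trade-off is that an edit fails outright when the model misquotes the original text, which is usually preferable to a silent mis-edit.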

One architectural choice deserves attention: o1-engineer uses synchronous API calls with streaming disabled. This makes the interface feel slower than tools that stream responses token by token, but it is actually the better fit for file operations. You want the complete, reasoned response before touching your file system, not partial writes that might fail halfway through.
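The same principle can be carried down to the write itself. Below is a minimal sketch, assuming the response has already been parsed into (path, content) pairs. Each file is staged in a temporary file beside its target and then moved into place with `os.replace`, which performs an atomic rename on POSIX systems when source and destination share a filesystem. This illustrates the technique; it is not o1-engineer’s own write path.

```python
import os
import tempfile
from pathlib import Path


def write_files_atomically(operations):
    """Stage every (path, content) pair, then swap them into place.

    Nothing is moved until all files have been staged successfully, so a
    failure while staging leaves existing files untouched.
    """
    staged = []
    try:
        for path, content in operations:
            target = Path(path)
            target.parent.mkdir(parents=True, exist_ok=True)
            # Temp file lives beside the target so os.replace stays on one filesystem
            fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
            staged.append((tmp, target))
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(content)
    except BaseException:
        # Clean up every staged temp file, then re-raise the original error
        for tmp, _ in staged:
            os.unlink(tmp)
        raise
    for tmp, target in staged:
        os.replace(tmp, target)  # atomic rename on POSIX
```

Staging everything first means a malformed or failed response never leaves a half-written project behind. Only the final rename loop can still be interrupted partway, and each individual rename is all-or-nothing.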

Gotcha

The single biggest limitation is the absence of safety rails. o1-engineer will execute whatever file operations the AI suggests, with no sandboxing, confirmation prompts for destructive changes, or rollback mechanism. If the AI misinterprets your instruction and decides to delete files or overwrite critical code, there’s no undo button. You’re trusting the AI’s reasoning completely, which is a risky proposition even with o1’s enhanced capabilities. Always run it in git-tracked directories and review diffs before committing.
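None of these rails exist in the tool, but they are not hard to bolt on. Here is a minimal sketch of one: a write helper that asks before overwriting an existing file and keeps a timestamped backup first. `guarded_write` and the `.o1-backups` directory are hypothetical names, and a helper like this complements git rather than replacing it.

```python
import shutil
import time
from pathlib import Path


def guarded_write(path, content, confirm=input, backup_root=".o1-backups"):
    """Write `content` to `path`, asking first and backing up anything overwritten.

    `confirm` is any callable that takes a prompt and returns the user's answer;
    the default asks on the terminal. New files are written without asking.
    Returns True if the file was written.
    """
    target = Path(path)
    if target.exists():
        answer = confirm(f"Overwrite {target}? [y/N] ")
        if answer.strip().lower() != "y":
            return False
        # Keep a timestamped copy so the change can be rolled back by hand
        backup = Path(backup_root) / f"{target.name}.{time.time_ns()}.bak"
        backup.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(target, backup)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return True
```

Taking `confirm` as a parameter keeps the prompt swappable: a stricter wrapper could refuse all overwrites, and a test can pass a lambda instead of reading from the terminal.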

The security model is also concerning for any shared or production environment. API keys are expected to be hardcoded in the script rather than managed through environment variables or secure vaults. The tool provides no audit logging, so you can’t trace which AI decisions led to which file changes. And because it’s a single-file script without modular architecture, extending it to add these safety features requires significant refactoring. This isn’t a tool you want running with elevated permissions or in CI/CD pipelines—it’s strictly for local development experimentation where you’re actively monitoring what it does.
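Both gaps have standard fixes. Here is a minimal sketch, assuming the key lives in the conventional `OPENAI_API_KEY` environment variable and that operations are logged to a JSON Lines file of your choosing. `load_api_key` and `audit` are hypothetical helpers, not part of o1-engineer.

```python
import json
import os
import time


def load_api_key(var="OPENAI_API_KEY"):
    """Read the key from the environment instead of hardcoding it in the script."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running the tool")
    return key


def audit(log_path, command, path, action):
    """Append one JSON line per file operation so changes can be traced later."""
    entry = {"ts": time.time(), "command": command, "path": path, "action": action}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
```

With the official `openai` Python package (v1 and later), `OpenAI()` already falls back to `OPENAI_API_KEY` when no `api_key` argument is passed, so `OpenAI(api_key=load_api_key())` mainly buys an earlier, clearer error. Calling `audit` once per write gives you the trail from command to changed file that the tool lacks.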

Verdict

Use if: You’re prototyping new projects and want to move from idea to scaffold in minutes rather than hours. You’re comfortable with git and will review every AI-generated change before committing. You value conversational project planning and want an AI that explains its architectural reasoning instead of just generating code. You work solo or in small teams where everyone understands the tool is experimental.

Skip if: You need production-grade safety controls, audit trails, or team collaboration features. You’re working on established codebases where destructive changes could cause significant problems. You require IDE integration or prefer visual interfaces over command-line workflows. You’re uncomfortable with the security implications of hardcoded API keys and unrestricted file system access. For those use cases, stick with Cursor for IDE integration or Aider for safer git-aware code modification.
