o1-engineer: When Your Terminal Becomes a Conversational IDE

Hook

What if you could scaffold an entire microservice architecture by simply describing it in plain English to your terminal—no IDE, no plugins, just a conversation that writes code?

Context

The AI coding assistant landscape has exploded with IDE extensions and web-based tools, but they all share a common constraint: they're tethered to heavyweight editors or browser tabs. For developers who live in the terminal—managing servers via SSH, working in resource-constrained environments, or simply preferring the raw speed of CLI workflows—these solutions break the flow. You're writing infrastructure as code in vim, then context-switching to VS Code just to ask Copilot a question.

o1-engineer emerged from this friction. It's a Python CLI tool that brings conversational AI assistance directly to your command line, treating your terminal as a stateful development environment. Rather than producing one-off code generations, it maintains conversation history, tracks which files you've added to context, and translates slash commands into file operations. The creator designed it around OpenAI's o1 reasoning models (hence the name), but it now supports other LLM backends, including xAI's Grok, positioning itself as a model-agnostic development companion for terminal-first workflows.

Technical Insight

At its core, o1-engineer implements a surprisingly elegant architecture: a REPL loop that intercepts slash commands before passing natural language to the LLM. The conversation state is maintained in a simple list structure that accumulates messages, while a separate dictionary tracks files added to context via /add. This stateful approach differentiates it from simpler prompt-and-response tools.
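To make that loop concrete, here is a minimal sketch of slash-command interception. It assumes a hypothetical Session class holding the two pieces of state described above (the message list and the dictionary of added files), a hypothetical route function, and a send_to_llm callback standing in for the real API call. It illustrates the pattern and is not the project's source.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical container for the REPL's state
    history: list = field(default_factory=list)      # chat messages sent to the model
    added_files: dict = field(default_factory=dict)  # path -> file contents


def add_command(session, arg):
    # Stand-in for /add: remember the file and put its text into the history
    with open(arg, "r", encoding="utf-8") as f:
        content = f.read()
    session.added_files[arg] = content
    session.history.append({"role": "user", "content": f"File: {arg}\n\n{content}"})
    return f"Added {arg} to context"


# Hypothetical command table; a real tool would register /create, /edit, etc.
COMMANDS = {"/add": add_command}


def route(session, line, send_to_llm):
    """Handle slash commands locally; forward everything else to the model."""
    if line.startswith("/"):
        name, _, arg = line.partition(" ")
        handler = COMMANDS.get(name)
        if handler is None:
            return f"Unknown command: {name}"
        return handler(session, arg.strip())
    session.history.append({"role": "user", "content": line})
    reply = send_to_llm(session.history)
    session.history.append({"role": "assistant", "content": reply})
    return reply
```

The key property is that commands never reach the model as text: they mutate local state, and only natural-language lines grow the conversation.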

The command routing system is straightforward but effective. Here's how the /create command works under the hood:

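# Simplified illustration of the /create workflow (not the project's actual source)
import os
import re

def parse_file_blocks(response_text):
    """Extract {path: code} pairs from <file path='...'>...</file> blocks."""
    pattern = r"<file path='([^']+)'>\n?(.*?)</file>"
    return dict(re.findall(pattern, response_text, re.DOTALL))

def handle_create_command(conversation_history, api_client):
    # AI has already generated the folder structure in /planning mode
    prompt = """Based on our conversation, generate the complete code
    for each file in the planned structure. Format as:
    <file path='src/main.py'>
    # code here
    </file>"""

    conversation_history.append({"role": "user", "content": prompt})
    # api_client.chat stands in for whatever call returns the reply text
    response = api_client.chat(conversation_history)

    # Parse XML-style tags from the response
    files = parse_file_blocks(response)

    # Write directly to the filesystem: no confirmation, no rollback
    for filepath, content in files.items():
        directory = os.path.dirname(filepath)
        if directory:  # os.makedirs("") raises for top-level files
            os.makedirs(directory, exist_ok=True)
        with open(filepath, "w") as f:
            f.write(content)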

This reveals both the tool's strength and its Achilles' heel. The strength: minimal abstraction between intent and execution. You describe what you want, the LLM generates structured output, and files materialize. The weakness: the write loop overwrites existing files with zero safety rails. No git check, no backup, no dry-run mode.
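The cheapest of those missing rails is a backup before overwriting. A sketch, assuming a hypothetical safe_write helper and a ".bak" suffix convention (neither exists in o1-engineer):

```python
import os
import shutil


def safe_write(filepath, content, backup_suffix=".bak"):
    """Write content to filepath, copying any existing file aside first.

    Returns the backup path, or None if there was nothing to back up.
    """
    backup_path = None
    if os.path.exists(filepath):
        backup_path = filepath + backup_suffix
        shutil.copy2(filepath, backup_path)  # keeps the old contents and metadata
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(content)
    return backup_path
```

A second write to the same path clobbers the previous backup, so this is a single-level undo at best; git remains the better safety net.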

The /planning command showcases where o1-engineer excels. When you describe a project architecture, it leverages the reasoning capabilities of models like o1-preview to break down requirements into folder structures and file stubs:

$ o1-engineer
You: /planning Create a FastAPI microservice with Redis caching, 
PostgreSQL database, and Docker deployment

Assistant: I'll structure this as:
/app
  /api
    /routes
      - users.py
      - health.py
    /models
      - user.py
    /services
      - cache.py
      - database.py
  - main.py
  - config.py
/tests
  - test_users.py
- Dockerfile
- docker-compose.yml
- requirements.txt

You: /create
[Generates all files with boilerplate code]

The context management system deserves attention. When you /add src/main.py, the tool doesn't just reference the filename—it reads the entire file content into the conversation history. This means the LLM sees your actual code when you ask for modifications:

def add_file_to_context(filepath, conversation_history):
    with open(filepath, 'r') as f:
        content = f.read()
    
    conversation_history.append({
        "role": "user",
        "content": f"File: {filepath}\n\n{content}"
    })

This simple approach works brilliantly for small files but becomes a token-burning liability with large codebases. Add a 2,000-line module, and you've consumed a significant share of the context window, space that stays occupied for the entire conversation. There's no chunking, no summarization, no intelligent context pruning: just raw file dumping.
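Basic pruning of the kind the tool lacks does not need to be elaborate. The sketch below assumes file messages carry the "File: " prefix used by add_file_to_context above, and it estimates tokens with a rough four-characters-per-token heuristic rather than a real tokenizer (OpenAI's tiktoken library would give exact counts). The function names and the max_tokens budget are hypothetical.

```python
def estimate_tokens(text):
    # Rough heuristic: about four characters per token for English and code
    return max(1, len(text) // 4)


def history_tokens(history):
    return sum(estimate_tokens(m["content"]) for m in history)


def prune_file_messages(history, max_tokens):
    """Drop the oldest 'File: ...' messages until the estimate fits the budget.

    Ordinary conversation turns are never removed. Returns a new list.
    """
    pruned = list(history)
    while history_tokens(pruned) > max_tokens:
        for i, message in enumerate(pruned):
            if message["role"] == "user" and message["content"].startswith("File: "):
                del pruned[i]
                break
        else:
            break  # no file messages left to drop
    return pruned
```

Dropping whole files oldest-first is the crudest policy; summarizing or chunking them would preserve more information at the cost of an extra model call.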

The multi-model support is implemented through a provider abstraction, though calling it an "abstraction" is generous. It's essentially branching logic that swaps API endpoints and authentication headers based on a flag. For OpenAI:

from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="o1-preview",
    messages=conversation_history,
)

For Grok, it switches to xAI's endpoint but keeps the same message structure. This works because both providers implement OpenAI-compatible APIs, but it means you're locked into that interface pattern. Want to use Claude, whose Messages API takes the system prompt as a separate top-level parameter rather than as a message? You'd need to fork the code.
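Because both backends speak the OpenAI wire format, the provider switch can be reduced to a lookup table of constructor arguments for the official openai client, whose OpenAI(api_key=..., base_url=...) constructor accepts a custom endpoint. This is a sketch rather than how o1-engineer is organized; the xAI base URL, the XAI_API_KEY variable name, and the grok-beta model name are assumptions to check against xAI's current documentation.

```python
import os

# Hypothetical provider table; the xAI values are assumptions
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY", "model": "o1-preview"},
    "grok": {"base_url": "https://api.x.ai/v1", "key_env": "XAI_API_KEY", "model": "grok-beta"},
}


def client_settings(provider):
    """Return (client kwargs, model name) for an OpenAI-compatible provider."""
    try:
        spec = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None
    api_key = os.environ.get(spec["key_env"])
    if not api_key:
        raise RuntimeError(f"Set {spec['key_env']} to use {provider}")
    kwargs = {"api_key": api_key}
    if spec["base_url"]:
        kwargs["base_url"] = spec["base_url"]  # omitted for OpenAI's default endpoint
    return kwargs, spec["model"]
```

Either provider is then used identically: client = OpenAI(**kwargs), followed by client.chat.completions.create(model=model, messages=conversation_history). A provider with a different request shape, like Claude, would not fit in this table, which is exactly the lock-in described above.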

The streaming response implementation deserves mention for user experience. Rather than waiting for complete responses, it yields tokens as they arrive:

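from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
stream = client.chat.completions.create(
    model="o1-preview",
    messages=conversation_history,
    stream=True,
)
for chunk in stream:
    # Skip chunks with no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)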

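This makes long code generations feel responsive, turning what would be 30-second black-box waits into incremental feedback. One caveat: with o1 models the hidden reasoning phase still runs before the first visible token arrives, so streaming shortens the perceived wait rather than removing it. It's a small detail that significantly improves the developer experience during complex operations.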
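The streaming loop prints tokens but discards them, while a stateful REPL also needs the complete reply to append to its history. A minimal sketch that does both, assuming chunks shaped like the OpenAI SDK's streaming objects (chunk.choices[0].delta.content); the collect_stream name and its write callback are hypothetical, and accepting any iterable lets the function be exercised without a network call.

```python
def _echo(text):
    # Flush so each fragment appears immediately in the terminal
    print(text, end="", flush=True)


def collect_stream(stream, write=_echo):
    """Echo streamed text as it arrives and return the complete reply."""
    parts = []
    for chunk in stream:
        # Skip chunks with no choices or an empty delta
        # (the first chunk typically carries only the role)
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            write(text)
            parts.append(text)
    return "".join(parts)
```

In use, the return value goes straight back into the conversation: reply = collect_stream(stream), then conversation_history.append({"role": "assistant", "content": reply}).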

Gotcha

The security model is concerning for any shared environment. API keys are either hardcoded in the script or loaded from a .env file in the repository directory. There's no XDG config directory support, no system keychain integration, no scoped permissions. If that .env file sits in a shared project directory and isn't listed in .gitignore, one careless "git add ." commits your OpenAI key to version control.
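A safer default keeps the key out of the project tree entirely: read it from an environment variable, and fall back to a per-user file under $XDG_CONFIG_HOME (which defaults to ~/.config under the XDG Base Directory specification). The o1-engineer/api_key path below is hypothetical; the tool does not read it. The permission check assumes POSIX file modes.

```python
import os
from pathlib import Path


def config_dir(app="o1-engineer"):
    # XDG Base Directory spec: $XDG_CONFIG_HOME, defaulting to ~/.config
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / app


def load_api_key(env_var="OPENAI_API_KEY"):
    """Prefer the environment; otherwise read a user-only key file."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    key_file = config_dir() / "api_key"  # hypothetical location
    if key_file.is_file():
        # Refuse files that grant any group/other permissions (POSIX only)
        if key_file.stat().st_mode & 0o077:
            raise PermissionError(f"{key_file} is accessible by other users; run chmod 600")
        return key_file.read_text(encoding="utf-8").strip()
    raise RuntimeError(f"Set {env_var} or create {key_file}")
```

Because the file lives in the user's home directory rather than the repository, no amount of "git add ." can pick it up.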

Token consumption is completely opaque. The tool provides no cost estimation, no token counting, no warnings before expensive operations. When you /add five large files to context and start a lengthy conversation, you're burning through tokens with zero visibility, and because the full history is resent with every request, those files are billed again on every turn. At $15 per million input tokens for o1-preview, a single session analyzing a medium-sized codebase could cost several dollars without any indication. The lack of conversation reset or context pruning commands means your only option is to restart the entire session, losing all accumulated context.
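The several-dollar figure follows from how chat APIs work: they are stateless, so each turn sends the whole accumulated history as input again. A back-of-the-envelope sketch, using the $15 per million input tokens quoted above and ignoring output and reasoning tokens (which o1-preview bills at its higher output rate). The function name and the token counts in the example are illustrative, not measurements.

```python
def session_input_cost(context_tokens, turn_tokens, turns, usd_per_million=15.0):
    """Estimate input-token cost when the full history is resent every turn.

    context_tokens: tokens of files added up front (resent on every request)
    turn_tokens:    tokens each exchange adds to the history (prompt + reply)
    """
    total_input = 0
    for turn in range(turns):
        # Request number `turn` carries the files plus every earlier exchange
        total_input += context_tokens + turn * turn_tokens
    return total_input * usd_per_million / 1_000_000
```

For example, 25,000 tokens of added files discussed over ten turns that each add about 1,000 tokens gives session_input_cost(25_000, 1_000, 10) = 4.425, roughly $4.40 in input tokens alone. The turn_tokens term grows quadratically with the number of turns, which is why long sessions get expensive faster than intuition suggests.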

File operations lack any safety mechanisms. The /create command will happily overwrite existing files, the /edit command applies diffs without confirmation, and there's no integration with git to at least show you what changed. You're essentially trusting the LLM's output blindly. For a tool designed to modify code, the absence of a --dry-run flag or preview mode is a glaring omission. Experienced developers will instinctively run this only in git repositories with clean working trees, but nothing in the tool enforces or encourages this practice.
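Both missing checks can be approximated with the standard library: difflib.unified_diff to preview what a write would change, and "git status --porcelain" to detect a dirty working tree. The function names, and the idea of gating /create or /edit on them, are hypothetical; this is a sketch of what a --dry-run mode might look like, not something o1-engineer ships.

```python
import difflib
import os
import subprocess


def preview_write(filepath, new_content):
    """Return a unified diff of what writing new_content would change."""
    old_content = ""
    if os.path.exists(filepath):
        with open(filepath, "r", encoding="utf-8") as f:
            old_content = f.read()
    diff = difflib.unified_diff(
        old_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=f"a/{filepath}",
        tofile=f"b/{filepath}",
    )
    return "".join(diff)


def git_tree_is_clean(repo_dir="."):
    """True if repo_dir is inside a git work tree with no uncommitted changes."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return False  # git is missing, or this is not a repository
    return result.stdout.strip() == ""
```

A cautious command would print preview_write(...) for every file it intends to touch, ask for confirmation, and decline to proceed unless git_tree_is_clean() returns True.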

Verdict

Use if: You're a terminal-native developer who prototypes rapidly in isolated environments, you maintain strict git hygiene anyway, and you want conversational AI assistance without leaving the CLI. It's exceptional for scaffolding new projects from scratch, generating boilerplate that you'll review and refine, or getting quick code reviews on experimental branches. The planning-to-creation workflow genuinely accelerates going from architecture ideas to runnable code.

Skip if: You need guardrails for production code modifications, work in shared environments where API key security matters, require cost controls for API usage, or want sophisticated features like git integration, undo functionality, or multi-file refactoring with preview. For those scenarios, reach for Aider (git-native with better safety) or stick with IDE-integrated assistants that provide rollback mechanisms and token usage transparency.
