Building an AI Command Completion Plugin: Inside ZSH Codex's Architecture

Hook

What if your terminal could understand 'find all PDFs modified in the last week and compress them' and translate that into a working command? ZSH Codex does exactly this by putting AI completion one keystroke away.

Context

The command line has remained stubbornly unchanged for decades—you either know the exact syntax of commands or you spend time searching Stack Overflow and man pages. This friction multiplies when dealing with complex tools like find, awk, or ffmpeg where constructing the right incantation often requires consulting documentation repeatedly. While GUI applications benefited from autocomplete and intelligent suggestions for years, terminal users were stuck with basic tab completion and command history.

When OpenAI released Codex (the model powering GitHub Copilot) and made it available via API, tom-doerr saw an opportunity to bring code generation directly into the shell. ZSH Codex emerged as one of the first practical implementations of AI-assisted command construction, letting developers describe what they want in natural language and get back a candidate shell command within a few seconds. Rather than building a separate CLI tool that requires explicit invocation, the plugin integrates directly into the ZSH prompt, making AI assistance feel native to the shell experience.

Technical Insight

ZSH Codex uses a surprisingly simple architecture that demonstrates how to bridge shell environments with external APIs effectively. At its core, the plugin consists of two components: a ZSH keybinding that captures buffer state and a Python script that handles API communication.

The ZSH integration lives in a single function that binds to Ctrl+X. When triggered, it reads the current command line buffer (using ZSH's $BUFFER variable), calls the Python backend, and inserts the completion back into the buffer. Here's the essential pattern:

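# Widget: send the current command line to the Python backend.
create_completion() {
    local completion
    # print -r avoids the backslash mangling that echo can introduce.
    completion=$(print -r -- "$BUFFER" | python3 /path/to/zsh_codex.py) || return 1
    # Keep the typed text if the backend returned nothing.
    [[ -n $completion ]] || return 1
    BUFFER=$completion
    CURSOR=$#BUFFER    # move the cursor to the end of the new buffer
    zle reset-prompt
}
zle -N create_completion          # register the function as a ZLE widget
bindkey '^X' create_completion    # bind it to Ctrl+X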

This approach keeps the ZSH side minimal—it's purely responsible for I/O between the shell and Python. All API logic lives in Python, which provides better error handling, HTTP client libraries, and configuration parsing than you'd get with pure shell scripting.
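To make the stdin/stdout contract concrete, here is a minimal sketch of how the Python side of that exchange could look. The names `run_backend` and `complete` are hypothetical and are not taken from the plugin's source; the fallback behavior (returning the buffer unchanged on any error) is an assumption about what a careful implementation might want, not a description of what ZSH Codex does.

```python
import sys
from typing import Callable, TextIO


def run_backend(
    complete: Callable[[str], str],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> None:
    """Read the shell buffer from stdin and write the new buffer to stdout.

    `complete` is a hypothetical callable that turns the typed buffer into a
    finished command. Any failure falls back to echoing the buffer unchanged,
    so the text the user typed is never lost.
    """
    buffer = stdin.read().rstrip("\n")
    try:
        result = complete(buffer)
    except Exception:
        result = buffer
    # Command substitution in the shell strips trailing newlines, so none is added.
    stdout.write(result or buffer)
```

Passing the streams as parameters keeps the function testable without a real terminal; in the script itself it would be called once with the real provider function.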

The Python backend implements a flexible provider system that abstracts different LLM services behind a common interface. Configuration happens through an INI file where each section defines a provider with its API key, endpoint, and model:

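import configparser
import os
import sys

import openai  # legacy SDK (openai<1.0); Completion.create was removed in 1.0

def complete_command(command_buffer):
    config = configparser.ConfigParser()
    # configparser does not expand "~", so expand it explicitly.
    config.read(os.path.expanduser('~/.zsh_codex/config.ini'))

    active_provider = config.get('settings', 'active_provider')
    provider_config = config[active_provider]

    if provider_config['type'] == 'openai':
        openai.api_key = provider_config['api_key']
        completion = openai.Completion.create(
            model=provider_config.get('model', 'code-davinci-002'),
            prompt=f"# {command_buffer}\n#",
            max_tokens=100,
            temperature=0
        )
        return completion.choices[0].text.strip()
    raise ValueError(f"unsupported provider type: {provider_config['type']}")

if __name__ == '__main__':
    # The ZSH widget pipes $BUFFER in on stdin and reads the result from stdout.
    print(complete_command(sys.stdin.read().rstrip('\n')), end='')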
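For reference, the INI layout described above might look something like the following. This is a sketch under assumptions: the section names (`openai_main`, `local`), the key values, and the helper `load_active_provider` are illustrative placeholders, not a verified copy of the plugin's real configuration schema.

```python
import configparser

# Hypothetical config contents. Section and key names follow the article's
# description, not the plugin's actual schema; all values are placeholders.
EXAMPLE_INI = """
[settings]
active_provider = openai_main

[openai_main]
type = openai
api_key = sk-placeholder
model = example-model

[local]
type = ollama
endpoint = http://localhost:11434
model = example-local-model
"""


def load_active_provider(ini_text: str) -> dict:
    """Return the active provider's settings as a plain dict."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    name = config.get("settings", "active_provider")
    if not config.has_section(name):
        raise KeyError(f"active_provider {name!r} has no matching section")
    return {"name": name, **config[name]}


provider = load_active_provider(EXAMPLE_INI)
print(provider["type"], provider["model"])  # prints: openai example-model
```

Switching providers then means changing a single `active_provider` line rather than editing code.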

The prompt engineering is deliberately minimal: wrapping the user's input in comment syntax (# user input\n#) signals to the model that it should complete the thought as executable code. Setting temperature to 0 makes the output largely repeatable for a given prompt (the model always picks its most likely next token, though providers do not guarantee bit-identical results), which suits command generation, where correctness matters more than creativity.
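A short sketch of what that request looks like for a concrete input may help. The helper name `build_request` and the `example-model` default are hypothetical; only the prompt shape, `max_tokens=100`, and `temperature=0` come from the article.

```python
def build_request(command_buffer: str, model: str = "example-model") -> dict:
    """Assemble completion parameters mirroring the values quoted above."""
    return {
        "model": model,
        # Comment line with the user's intent, then an open comment marker.
        "prompt": f"# {command_buffer}\n#",
        "max_tokens": 100,
        "temperature": 0,
    }


request = build_request("find all PDFs modified in the last week")
print(repr(request["prompt"]))
# prints '# find all PDFs modified in the last week\n#'
```

Keeping the parameters in one place makes it easy to see that nothing beyond the user's own words (and any injected context) is sent to the model.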

One of the more sophisticated features is optional context injection through pre-execution hooks. The plugin can run shell commands before generating completions to provide environmental context:

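# In config.ini:
# [context_commands]
# git = git diff --cached
# files = ls -la

import subprocess

context = ""
# Tolerate a config file with no [context_commands] section.
if config.has_section('context_commands'):
    for cmd_name, cmd in config['context_commands'].items():
        try:
            # shell=True runs the configured string through the shell as-is.
            result = subprocess.run(cmd, shell=True, capture_output=True,
                                    timeout=2, text=True)
            context += f"\n# {cmd_name}:\n{result.stdout}\n"
        except subprocess.TimeoutExpired:
            pass  # skip context commands that take longer than 2 seconds

full_prompt = context + f"# {command_buffer}\n#"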

This allows prompts like 'commit these changes' to work because the model receives the actual diff output as context. However, this feature is a double-edged sword—automatically executing commands on keystroke can have unintended consequences if you're not careful about what's configured.
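One way to reduce that risk, sketched here as an assumption rather than anything the plugin provides, is to run context commands without a shell, restrict them to an explicit allowlist of executables, and cap how much output reaches the prompt. The function `collect_context`, the `allowed` parameter, and the `MAX_CONTEXT_CHARS` limit are all hypothetical.

```python
import shlex
import subprocess
from typing import Iterable, Mapping

MAX_CONTEXT_CHARS = 2000  # assumed cap on output per command


def collect_context(
    commands: Mapping[str, str],
    allowed: Iterable[str],
    timeout: float = 2.0,
) -> str:
    """Run only allowlisted context commands and return prompt-ready text."""
    allowed_set = set(allowed)
    parts = []
    for name, cmd in commands.items():
        argv = shlex.split(cmd)
        # Skip empty entries and any executable not explicitly allowlisted.
        if not argv or argv[0] not in allowed_set:
            continue
        try:
            # No shell=True: the string is never interpreted by a shell,
            # so pipes, redirects, and substitutions are not honored.
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except (subprocess.TimeoutExpired, OSError):
            continue  # slow or missing commands contribute no context
        parts.append(f"# {name}:\n{result.stdout[:MAX_CONTEXT_CHARS]}")
    return "\n".join(parts)
```

With `allowed={"git", "ls"}`, the two commands from the config example would still run, while anything else added to the section later would be skipped. The trade-off is that shell features such as pipes stop working inside context commands.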

The plugin's provider-agnostic design means you can hot-swap between OpenAI, Gemini, Groq, or self-hosted models without changing your workflow. This matters because API pricing, rate limits, and model capabilities vary significantly across providers. You might use GPT-4 for complex transformations but switch to a local Ollama instance for simple completions to avoid API costs.
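A provider-agnostic backend of this kind is often structured as a small registry keyed by provider type. The sketch below illustrates that general pattern under stated assumptions; `PROVIDERS`, `register`, `complete`, and the offline `echo` provider are hypothetical names for illustration and do not describe the plugin's actual code.

```python
from typing import Callable, Dict

# A provider takes its settings dict and a prompt, and returns raw model text.
ProviderFn = Callable[[dict, str], str]
PROVIDERS: Dict[str, ProviderFn] = {}


def register(provider_type: str) -> Callable[[ProviderFn], ProviderFn]:
    """Decorator that files a provider function under its config 'type'."""
    def decorator(fn: ProviderFn) -> ProviderFn:
        PROVIDERS[provider_type] = fn
        return fn
    return decorator


@register("echo")
def echo_provider(settings: dict, prompt: str) -> str:
    """Stand-in provider for offline testing; returns a canned command."""
    return settings.get("canned", "")


def complete(settings: dict, prompt: str) -> str:
    """Dispatch to whichever provider the active config section names."""
    provider_type = settings["type"]
    try:
        handler = PROVIDERS[provider_type]
    except KeyError:
        raise ValueError(f"unknown provider type: {provider_type!r}") from None
    return handler(settings, prompt).strip()
```

Adding a real service would mean registering one more function; the shell widget and the rest of the script never need to know which provider answered.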

Gotcha

The most significant limitation is the lack of filesystem awareness. When you type 'compress all videos in this directory,' the AI doesn't actually know what files exist unless you explicitly list them in your prompt or configure a context command to run ls. This creates a disconnect—the AI can generate syntactically perfect commands for scenarios that don't match your actual filesystem state. You might get a beautiful ffmpeg pipeline that operates on files that don't exist.

The pre-execution context feature, while powerful, is genuinely dangerous if misconfigured. Because it runs every configured command automatically when you press Ctrl+X, an accidental keypress can trigger expensive operations or state changes. Imagine configuring docker ps as a context command: every press of Ctrl+X, including a fat-fingered one while you are composing a destructive Docker command, runs it against your Docker daemon just to gather context, and a context command with side effects would be executed the same way. There's no sandbox or dry-run mode; context commands execute with your full shell permissions.

The plugin also adds latency to your typing experience. Every completion requires a network round-trip to an API (unless you use a local model), which typically takes 1-3 seconds. This breaks the flow of command construction and, for users who already have a rough idea of the command structure they need, makes trial-and-error iteration slower than consulting documentation.

Verdict

Use if: you frequently work with complex command-line tools where syntax is difficult to remember (ffmpeg, find, awk, git), you're already comfortable with API-based AI tools and their cost/privacy implications, you use ZSH and oh-my-zsh extensively, or you want to experiment with integrating LLMs into developer workflows. The plugin shines when translating high-level intent into low-level commands, especially for tools you use occasionally but not enough to memorize their flags.

Skip if: you're primarily using simple commands that don't benefit from AI completion, you're concerned about API costs and don't want to self-host models, you need guaranteed offline functionality, or you're uncomfortable with the security implications of context pre-execution.

For users who need shell assistance but want more control, consider invoking a tool like Shell-GPT explicitly rather than having it bound to a keystroke: you'll lose the inline experience but gain predictability and avoid accidental API calls.
