Open Interpreter: How LLMs Execute Code on Your Local Machine Without Sandboxing

Hook

Open Interpreter has 62,817 GitHub stars for doing something that sounds terrifying: giving GPT-4 the ability to execute arbitrary Python, JavaScript, and shell commands directly on your local machine with your full system permissions.

Context

When OpenAI launched Code Interpreter in 2023, developers finally had a way to ask ChatGPT to analyze data, manipulate files, and write executable code. But the implementation was frustratingly constrained: no internet access, a 100MB file size limit, 120-second runtime caps, and a locked-down package environment that reset after every session. For developers who wanted the conversational interface of ChatGPT combined with the full power of their local development environment—complete with custom libraries, unlimited runtime, and filesystem access—there was no solution.

Open Interpreter emerged to fill that gap. Rather than asking “what if we made Code Interpreter safer,” it asked “what if we removed the guardrails entirely?” The result is a natural language interface that lets any LLM (GPT-4, Claude, or local models) generate and execute code on your machine with the same permissions you have. It’s ChatGPT’s Code Interpreter running locally, with full internet access, no pre-imposed time limits, and the ability to install any package or access any file you can.

Technical Insight

System architecture (auto-generated diagram): user input passes through the LiteLLM abstraction layer to the language model (GPT-4/Claude/local); the response's code blocks are extracted by the code block parser and run by the code executor (Python/JS/shell); execution results flow into the conversation history, which maintains state and sends context plus results back to the model, with output displayed to the user.

Open Interpreter’s architecture is deceptively simple: it’s an interactive loop that maintains conversation state, sends prompts to an LLM, parses code blocks from responses, executes them locally, and feeds results back to the model. The critical design decision is using LiteLLM as an abstraction layer, which means the same codebase works with OpenAI’s GPT-4, Anthropic’s Claude, or local models running through Ollama or LM Studio without modification.
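The provider-routing idea behind that abstraction layer can be sketched with a toy dispatcher. The function, endpoint table, and defaults below are illustrative stand-ins, not LiteLLM's actual API; the point is that a `provider/model` string is all the call site needs to change:

```python
def route_model(model_string):
    """Toy version of provider routing: the prefix before '/' selects the
    backend, mirroring how a LiteLLM-style layer lets one codebase talk to
    hosted or local models. Endpoints here are illustrative defaults."""
    if "/" in model_string:
        provider, name = model_string.split("/", 1)
    else:
        provider, name = "openai", model_string  # bare names default to OpenAI
    endpoints = {
        "openai": "https://api.openai.com/v1",
        "anthropic": "https://api.anthropic.com/v1",
        "ollama": "http://localhost:11434",  # Ollama's default local port
    }
    return provider, name, endpoints.get(provider, "unknown")

print(route_model("gpt-4"))          # routed to the hosted OpenAI endpoint
print(route_model("ollama/llama3"))  # routed to a local server, same call site
```

Because the routing happens below the interactive loop, switching from a hosted model to a local one doesn't touch the parsing or execution logic at all.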

Here’s how you can use it programmatically to build stateful, multi-turn interactions:

from interpreter import interpreter

# Single-shot execution
interpreter.chat("Plot AAPL and META's normalized stock prices")

# Streaming responses for real-time feedback
message = "What operating system are we on?"
for chunk in interpreter.chat(message, display=False, stream=True):
    print(chunk)

# Multi-turn conversation with memory
interpreter.chat("My name is Killian.")
interpreter.chat("What's my name?")  # Remembers context

# Save and restore conversation state
messages = interpreter.chat("Analyze this dataset")
interpreter.messages = []  # Reset
interpreter.messages = messages  # Restore previous context

The system message is fully customizable, allowing you to modify the LLM’s behavior and permissions. Want to skip confirmation prompts for shell commands? You can inject instructions directly:

interpreter.system_message += """
Run shell commands with -y so the user doesn't have to confirm them.
"""

Under the hood, Open Interpreter parses the LLM’s response for code blocks (marked with language identifiers like python or javascript), executes them in the appropriate interpreter, and captures stdout, stderr, and return values. It then appends this execution result to the conversation history and sends it back to the LLM, creating a feedback loop where the model can see the results of its code and iterate.
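That parse-execute-feedback loop can be sketched with only the standard library. The regex, role names, and 30-second timeout below are my own choices for illustration, not Open Interpreter's internals:

```python
import re
import subprocess
import sys

CODE_BLOCK = re.compile(r"```(\w+)\n(.*?)```", re.DOTALL)

def extract_code_blocks(llm_response):
    """Pull (language, code) pairs out of a fenced-code-style LLM response."""
    return CODE_BLOCK.findall(llm_response)

def execute(language, code):
    """Run one code block and capture stdout/stderr; Python and shell only here."""
    if language == "python":
        cmd = [sys.executable, "-c", code]
    elif language in ("shell", "bash", "sh"):
        cmd = ["sh", "-c", code]
    else:
        return f"[unsupported language: {language}]"
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

# Feedback loop: each execution result is appended to the history, so it
# would be visible to the model on the next turn.
fence = "`" * 3
response = f"Here you go:\n{fence}python\nprint(2 + 2)\n{fence}"
messages = []
for language, code in extract_code_blocks(response):
    output = execute(language, code)
    messages.append({"role": "computer", "content": output})

print(messages[0]["content"].strip())  # → 4
```

In the real system an approval prompt sits between extraction and execution; this sketch skips it to keep the loop visible.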

The multi-language execution engine is what sets this apart from simple Python REPL wrappers. You can execute Python for data analysis, JavaScript for browser automation, and shell commands for system administration—all in the same conversation thread. The LLM decides which language to use based on context:

interpreter.chat("Control a Chrome browser to research the latest Python releases")
# LLM generates appropriate code for browser control

interpreter.chat("Now analyze those release dates with pandas")
# LLM switches to Python for data manipulation

The approval gate is the sole built-in security mechanism: before any code executes, you're prompted to review and approve it. This puts trust entirely in the operator's ability to spot malicious or buggy code.

Conversation state persists across executions through the interpreter.messages array, which stores the full dialogue history: user prompts, LLM responses, and execution results. This stateful design enables complex workflows that would be impossible with stateless API calls—the LLM builds context across multiple code executions, learning from errors and refining its approach.
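Because that state is just a list of role-tagged messages, persisting a session across restarts reduces to serializing the list. The three-role shape below is illustrative; Open Interpreter's exact schema may differ:

```python
import json

# Illustrative message history in the three-role shape described above
# (user prompt, assistant response, execution result).
messages = [
    {"role": "user", "content": "Plot the data"},
    {"role": "assistant", "content": "plot(df)"},
    {"role": "computer", "content": "Figure saved to plot.png"},
]

# Saving and restoring state between sessions is plain serialization.
saved = json.dumps(messages)
restored = json.loads(saved)
assert restored == messages
```

Restoring the list before the next chat call gives the model the same context it had when the earlier session ended.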

Gotcha

The security model is effectively “trust but verify,” and that verification step is entirely manual. Open Interpreter executes code with your full user permissions—it can delete files, make network requests, install packages, and modify system settings. The approval prompt before execution is your primary defense, which means you need to carefully read every code block and understand what it does. For developers comfortable reviewing code, this is manageable. For anyone else, it’s a potential risk.

Performance and quality are completely dependent on the underlying LLM. GPT-4 generates relatively robust, well-structured code. Local models running through Ollama or LM Studio might produce code with varying quality. The system doesn’t validate code before execution—it just runs it and returns the error if it fails. This means you may spend time debugging LLM mistakes with weaker models. The 100MB upload limit and 120-second runtime cap that OpenAI imposed weren’t arbitrary—they prevented runaway processes and resource exhaustion. Open Interpreter, by running in your local environment without these specific restrictions, allows processes to consume available resources until you manually intervene.
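If you want a runtime cap back, you can re-impose one yourself when running generated code out-of-process. This is a sketch with an arbitrary budget, not a feature of Open Interpreter:

```python
import subprocess
import sys

def run_with_limit(code, seconds=10):
    """Re-impose a runtime cap: kill the child process if it exceeds the
    budget, a local stand-in for the 120-second cap OpenAI enforced."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=seconds,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return "[killed: exceeded time limit]"

print(run_with_limit("print('fast')"))               # completes normally
print(run_with_limit("while True: pass", seconds=1)) # killed after ~1s
```

A memory ceiling (for example via the resource module on Unix) would take a similar wrapper; the point is that any limits are now yours to build.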

Verdict

Use Open Interpreter if you need AI-assisted automation for personal productivity tasks—data analysis, file manipulation, system administration—where you can carefully review each execution and you're working in a development environment where mistakes aren't catastrophic. It's perfect for exploratory data science, automating repetitive local tasks, or prototyping workflows where the conversational interface saves time over writing scripts manually. The ability to use local models via Ollama or LM Studio makes it viable for privacy-sensitive work where sending data to OpenAI isn't acceptable.

Skip it if you're building production systems that need deterministic behavior, working in environments where code execution requires audit trails and formal approval processes, or if you're not comfortable reading and verifying code in multiple languages before it runs. Note also that the project uses the AGPL license, which may have implications for how you can use it in your own projects. For those use cases, stick with OpenAI's Code Interpreter despite its limitations, or use more constrained tools like GitHub Copilot CLI that suggest commands without executing them.
