
smolagents: Why Hugging Face Built an Agent Framework That Writes Python Code Instead of Calling Functions


Hook

What if your AI agent could write actual Python code to solve problems instead of being limited to predefined function calls? That’s exactly what Hugging Face’s smolagents does—and it fits in ~1,000 lines of code.

Context

Traditional agent frameworks like LangChain and AutoGPT operate on a function-calling paradigm: you define tools, the LLM decides which to invoke, and the framework handles execution. This works well for simple, predictable workflows, but creates constraints when tasks require creative problem-solving or multi-step reasoning that your predefined tools didn’t anticipate.

Hugging Face’s smolagents takes a radically different approach. Instead of asking “which function should I call?”, its CodeAgent asks “what Python code should I write?” The agent generates executable code, runs it in a sandboxed environment, observes results, and iterates. This code-first paradigm trades some reliability for massive flexibility—your agent isn’t limited to your tool definitions, it can compose solutions in arbitrary ways using standard Python libraries and logic.

Technical Insight

[System architecture diagram: a user task flows into the CodeAgent, which generates Python code via a model adapter over interchangeable LLM backends (OpenAI, Anthropic, LiteLLM, local models). The agent draws tools from a registry fed by the Hugging Face Hub, LangChain tools, MCP servers, and HF Spaces, executes its code in a sandbox (E2B/Modal/Docker/WASM), and feeds tool execution results back into the loop until it emits a final answer.]

At its core, smolagents is remarkably simple. The entire agent logic lives in ~1,000 lines in agents.py, providing minimal abstraction over the code generation loop. Here’s what a basic agent looks like:

from smolagents import CodeAgent, WebSearchTool, InferenceClientModel

model = InferenceClientModel()
agent = CodeAgent(tools=[WebSearchTool()], model=model, stream_outputs=True)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")

Under the hood, the CodeAgent generates Python code that uses the provided tools, executes it in a sandbox, and continues until the task completes. The architecture separates three critical concerns: model inference, tool definitions, and execution environments.
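That loop is easy to sketch. Below is a minimal, self-contained simulation of the generate-execute-observe cycle; `fake_llm` and `web_search` are stand-ins (not smolagents APIs) so the sketch runs anywhere, but the shape mirrors what happens in agents.py: the model proposes code, the code runs against the exposed tools, and the observation (including any error) is fed back until `final_answer` is called.

```python
# Toy simulation of a CodeAgent-style loop. `fake_llm` and `web_search`
# are stand-ins for a real model backend and tool.

def web_search(query: str) -> str:
    """Stub tool: a real agent would call a search API here."""
    return "Pont des Arts is 155 m long; a leopard's top speed is ~58 km/h."

def fake_llm(history: list[str]) -> str:
    """Stub model: emits search code on step 1, a final answer on step 2."""
    if not any("Observation" in h for h in history):
        return "result = web_search('leopard speed Pont des Arts')\nprint(result)"
    return "final_answer(155 / (58 / 3.6))"  # distance / speed in m/s

def run_agent(task: str, max_steps: int = 5) -> float:
    history = [f"Task: {task}"]
    answer = None

    def final_answer(value):
        nonlocal answer
        answer = value

    for _ in range(max_steps):
        code = fake_llm(history)                      # 1. model writes code
        namespace = {"web_search": web_search,
                     "final_answer": final_answer}    # 2. tools exposed to the code
        try:
            exec(code, namespace)                     # 3. execute (sandboxed in real smolagents)
        except Exception as e:
            history.append(f"Observation: error: {e}")  # errors feed back too
            continue
        if answer is not None:                        # 4. stop once final_answer fires
            return answer
        history.append("Observation: code ran successfully")
    raise RuntimeError("max steps exceeded")

print(run_agent("Leopard crossing Pont des Arts"))  # ≈ 9.62 seconds
```

The error branch is the important part: a failed execution becomes just another observation, which is how the real agent recovers from its own buggy code.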

For model inference, smolagents is genuinely agnostic. You can use local transformers models, 100+ LLMs via LiteLLM integration, Hugging Face’s inference providers, or any OpenAI-compatible server:

from smolagents import LiteLLMModel
import os

model = LiteLLMModel(
    model_id="anthropic/claude-4-sonnet-latest",
    temperature=0.2,
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

The tool system is equally flexible. You can source tools from the Hugging Face Hub, convert LangChain tools, connect to MCP servers, or even use an entire Hugging Face Space as a tool. Sharing is trivial—push your agent to the Hub as a Space:

agent.push_to_hub("m-ric/my_agent")
# Later: agent.from_hub("m-ric/my_agent", trust_remote_code=True)
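Tools themselves are lightweight: in smolagents, a tool can be a plain Python function with type hints and a docstring, wrapped with the library's `@tool` decorator, which turns the hints and docstring into the schema shown to the LLM. Here is a sketch of such a function body (the decorator is omitted so the snippet stays self-contained; the name and logic are illustrative):

```python
# Illustrative tool body. In smolagents you would wrap this with the
# @tool decorator (from smolagents import tool); the type hints and
# docstring become the tool description the agent sees.

def get_travel_time(distance_m: float, speed_kmh: float) -> float:
    """Compute how long it takes to cover a distance at a given speed.

    Args:
        distance_m: distance to cover, in meters.
        speed_kmh: speed, in kilometers per hour.

    Returns:
        Travel time in seconds.
    """
    return distance_m / (speed_kmh / 3.6)

print(round(get_travel_time(155, 58), 1))  # ≈ 9.6
```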

But the most critical architectural decision is sandboxed execution. Because CodeAgent generates arbitrary Python code, running it directly would be catastrophic from a security perspective. smolagents addresses this by supporting multiple sandboxing backends: Blaxel, E2B, Docker, and even a Pyodide+Deno WebAssembly sandbox. This means generated code runs in isolated environments that protect your system from malicious or buggy agent-written code.
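The principle behind those sandboxes can be illustrated even in-process: run model-written code against an allowlisted namespace so it cannot reach dangerous builtins. The toy sketch below is not smolagents' actual executor, and replacing `__builtins__` is not a real security boundary (Python `exec` can be escaped, which is exactly why smolagents delegates to E2B, Docker, Modal, or WASM runtimes), but it shows the idea:

```python
# Toy illustration of restricted execution: untrusted code only sees an
# allowlisted namespace. NOT a real sandbox; smolagents uses isolated
# runtimes (E2B, Docker, Modal, WASM) for genuine protection.

ALLOWED_BUILTINS = {"len": len, "range": range, "sum": sum, "print": print}

def run_untrusted(code: str, tools: dict) -> dict:
    namespace = {"__builtins__": ALLOWED_BUILTINS, **tools}
    exec(code, namespace)
    return namespace

# Benign agent-written code works:
ns = run_untrusted("total = sum(range(10))", tools={})
print(ns["total"])  # 45

# Code touching the filesystem fails, because `open` was never exposed:
try:
    run_untrusted("open('/etc/passwd').read()", tools={})
except NameError as e:
    print("blocked:", e)
```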

The code-first approach enables genuinely creative problem-solving. Instead of being constrained to predefined tool sequences, the agent can write loops, conditionals, data transformations, and complex logic that combines tools in ways you never anticipated. The tradeoff is that LLMs can generate syntactically incorrect or inefficient code—but the agent can observe errors and iterate, often fixing its own mistakes through the feedback loop.
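To make that flexibility concrete, here is the kind of code a CodeAgent might emit for a task like "which of these rivers is longest, in miles?": a loop over inputs, one tool call per item, a unit conversion, and a comparison, all composed in a single step without any predefined "compare rivers" tool. The tool is stubbed here so the snippet is self-contained; in a real run it would be a registered smolagents tool.

```python
# Stub standing in for a registered lookup tool; a real agent run would
# call an actual search tool instead.
RIVER_KM = {"Seine": 777, "Loire": 1006, "Rhone": 813}

def river_length_km(name: str) -> float:
    return RIVER_KM[name]

# Code of the kind a CodeAgent emits: loop + unit conversion + comparison
# composed on the fly.
lengths_miles = {}
for river in ["Seine", "Loire", "Rhone"]:
    lengths_miles[river] = river_length_km(river) * 0.621371

longest = max(lengths_miles, key=lengths_miles.get)
print(longest, round(lengths_miles[longest]))  # Loire 625
```

A function-calling agent would need a round-trip to the LLM for each lookup; the code agent does the whole computation in one generated snippet.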

The framework also supports multimodal inputs out of the box. You can pass text, images, video, or even audio to agents, making it suitable for vision-based tasks like web browsing or document analysis. This modality-agnostic design reflects Hugging Face’s broader ecosystem philosophy: models and tools should compose without friction.

Gotcha

The code generation paradigm introduces failure modes that structured function-calling avoids. LLMs can produce syntactically invalid Python, inefficient algorithms, or code that makes incorrect assumptions about data structures. While the agent can often recover through iteration, this makes CodeAgent less reliable than traditional agents for straightforward tasks where a simple function call would suffice. You’re trading determinism for flexibility.

Sandbox setup may add operational complexity depending on your chosen backend. While smolagents supports multiple backends—whether that’s Docker containers, cloud-based E2B environments, or WebAssembly runtimes—you’ll need to configure your execution environment. This setup and potential performance cost (sandbox initialization, network latency) can make smolagents heavier than in-process function execution frameworks. For production deployments, you’ll need to carefully manage sandbox lifecycle and resource limits.

Verdict

Use smolagents if you need maximum agent flexibility and want to solve complex, multi-step problems that benefit from code-level reasoning rather than predefined tool sequences. It’s ideal for research, prototyping, and creative problem-solving where you value transparency (that ~1,000-line core is genuinely readable) and want to leverage Hugging Face’s ecosystem for model/tool sharing. The multimodal support and model agnosticism make it particularly compelling if you’re experimenting with different LLMs or working with vision/audio tasks. Skip it if you need production-grade reliability for simple workflows, prefer the safety guarantees of structured function-calling, or want extensive batteries-included integrations without building your own tooling. The sandbox requirement and potential for code generation failures make it less suitable for deterministic enterprise workflows where consistency matters more than flexibility.
