Pydantic AI: Type-Safe GenAI Agents with Dependency Injection and Durable Execution
Hook
The same team that built the validation layer used by OpenAI, Anthropic, and virtually every major LLM SDK just released their own agent framework—and it’s designed to make GenAI apps feel as ergonomic as FastAPI.
Context
If you’ve built any production GenAI application in Python, you’ve almost certainly used Pydantic—even if you didn’t realize it. The OpenAI SDK, Anthropic SDK, LangChain, LlamaIndex, and dozens of other frameworks rely on Pydantic for validation. Yet despite this ubiquity, the Pydantic team found themselves frustrated when building LLM-powered features for Pydantic Logfire. Existing frameworks felt heavy, type-unsafe, and nothing like the delightful developer experience FastAPI had pioneered.
Pydantic AI emerged from this gap: a GenAI agent framework built by the people who understand validation and type safety better than anyone. It’s not trying to be everything to everyone—instead, it focuses on bringing Python’s modern type system, dependency injection, and structured validation to agent development. The result is a framework where your IDE can catch errors before runtime, where streaming outputs are validated incrementally, and where production concerns like observability and durable execution are first-class features rather than afterthoughts.
Technical Insight
The architecture of Pydantic AI centers on agents configured with models, system prompts, and tools—but the implementation details reveal why it feels different from alternatives. Every interaction is type-safe by default, leveraging Python’s type hints to provide IDE autocomplete and static analysis throughout your agent code.
Here’s a minimal agent that demonstrates the core pattern:
```python
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-3-5-sonnet-latest',
    instructions='You are a helpful assistant.',
)

result = agent.run_sync('What is the capital of France?')
print(result.output)
```
The real power emerges when you add tools and structured outputs. Tools in Pydantic AI are plain Python functions registered with a decorator—the framework introspects their type hints and docstrings to generate LLM-compatible schemas automatically. Dependency injection allows tools to access runtime context (databases, APIs, user sessions) without coupling your agent logic to specific implementations:
```python
from dataclasses import dataclass

from pydantic import BaseModel
from pydantic_ai import Agent, RunContext


@dataclass
class DatabaseDeps:
    conn: DatabaseConnection  # your own async DB wrapper
    user_id: str


class WeatherResult(BaseModel):
    location: str
    temperature: float
    conditions: str


agent = Agent(
    'openai:gpt-4o',
    deps_type=DatabaseDeps,
    output_type=WeatherResult,
)


@agent.tool
async def get_user_location(ctx: RunContext[DatabaseDeps]) -> str:
    # Access injected dependencies with full type safety
    return await ctx.deps.conn.fetch_location(ctx.deps.user_id)


@agent.tool_plain
async def fetch_weather(location: str) -> dict:
    # Tools that don't need the run context use tool_plain
    return await weather_api.get(location)


# Inside an async function:
result = await agent.run(
    "What's the weather like for me?",
    deps=DatabaseDeps(conn=db, user_id='123'),
)
# result.output is a validated WeatherResult instance
print(result.output.temperature)
```
The RunContext pattern is elegant: tools receive typed context that your IDE understands, enabling autocomplete for ctx.deps fields. The separation between agent definition and runtime dependencies makes testing trivial—you can inject mock dependencies without modifying agent code.
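The testing benefit can be shown without the framework at all. The sketch below uses a hypothetical `FakeConnection` and a bare-bones `FakeContext` standing in for `RunContext`—none of it is Pydantic AI code—to illustrate why a tool that only touches `ctx.deps` can be exercised with fakes and no changes to the tool body:

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins for the DatabaseConnection and RunContext
# used above -- illustrative only, not part of Pydantic AI.
@dataclass
class FakeConnection:
    locations: dict

    async def fetch_location(self, user_id: str) -> str:
        return self.locations[user_id]


@dataclass
class DatabaseDeps:
    conn: FakeConnection
    user_id: str


@dataclass
class FakeContext:
    deps: DatabaseDeps


async def get_user_location(ctx: FakeContext) -> str:
    # Same body as the real tool: it only talks to ctx.deps,
    # so swapping in a fake requires no changes here.
    return await ctx.deps.conn.fetch_location(ctx.deps.user_id)


async def main() -> str:
    deps = DatabaseDeps(conn=FakeConnection({'123': 'Paris'}), user_id='123')
    return await get_user_location(FakeContext(deps=deps))


print(asyncio.run(main()))  # -> Paris
```

In a real test suite you would build the deps object with mocks and pass it via `deps=`, leaving the agent definition untouched.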
Streaming support extends beyond text generation to structured outputs. As the LLM generates a response, Pydantic AI validates each chunk incrementally, letting you access partial results with confidence that they’ll match your schema:
```python
async with agent.run_stream('Generate weather data') as result:
    async for text in result.stream_text(delta=True):
        print(text, end='', flush=True)
    # Final validated result available after streaming completes
    validated_output = await result.get_output()
```
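The underlying idea of incremental validation can be sketched without the framework: as chunks of a structured response arrive, attempt to validate the accumulated buffer and surface a partial result only once it parses cleanly. Pydantic AI validates against your output schema; in this stdlib-only illustration, `json.loads` stands in for the validator and the chunks are made up:

```python
import json

# Simulated streamed fragments of a structured LLM response.
chunks = ['{"location": "Par', 'is", "temperature"', ': 18.5}']

buffer = ''
partial = None
for chunk in chunks:
    buffer += chunk
    try:
        partial = json.loads(buffer)  # validate what we have so far
    except json.JSONDecodeError:
        continue  # not yet a complete value; keep accumulating

print(partial)
```

The same loop shape, with schema-aware partial validation in place of `json.loads`, is what lets you act on intermediate results with confidence.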
Durable execution, a standout feature for production systems, allows agents to survive failures and restarts. By persisting conversation state and tool call results, long-running workflows can resume exactly where they left off—critical for agents that interact with humans on unpredictable timelines or perform multi-step operations that might encounter transient failures.
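The principle behind durable execution can be sketched in a few lines. This is not Pydantic AI's actual persistence mechanism (which stores conversation state and tool results); it is a minimal stdlib illustration of checkpointing, where each completed step is written to disk so a restarted process skips finished work:

```python
import json
from pathlib import Path

# Illustrative checkpoint file; a real system would use durable storage.
CHECKPOINT = Path('workflow_state.json')


def run_step(name: str) -> str:
    return f'result-of-{name}'  # placeholder for a real tool call


def run_workflow(steps: list[str]) -> dict:
    # Resume from whatever was persisted before a crash or restart.
    state = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for step in steps:
        if step in state:
            continue  # already completed in a previous run
        state[step] = run_step(step)
        CHECKPOINT.write_text(json.dumps(state))  # durable after each step
    return state


state = run_workflow(['fetch', 'summarize', 'notify'])
print(state)
CHECKPOINT.unlink()  # clean up the demo checkpoint
```

Rerunning `run_workflow` after a failure replays nothing that already succeeded—the property that matters for multi-step agents waiting on humans or flaky APIs.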
The observability story integrates tightly with Pydantic Logfire, though the framework supports any OpenTelemetry-compatible backend. Every agent run, tool call, and validation error is traced automatically, giving you visibility into token usage, latency, and decision paths. For teams already invested in observability infrastructure, the OpenTelemetry support means you’re not locked into a specific vendor.
Pydantic AI is model-agnostic with adapters for every major provider: OpenAI, Anthropic, Google, DeepSeek, and platforms like Bedrock, Vertex AI, and Ollama. The adapter pattern means switching providers is a one-line change, and implementing custom adapters for proprietary models is straightforward. The framework also integrates emerging standards like the Model Context Protocol (MCP) for external tool access and Agent2Agent (A2A) for inter-agent communication, positioning it well for evolving GenAI architectures.
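The adapter idea behind `'provider:model'` strings can be illustrated framework-free. The registry and `EchoModel` below are invented for the sketch—they are not Pydantic AI internals—but they show why switching providers reduces to changing one string:

```python
from typing import Callable, Protocol


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    # Toy adapter: a real one would wrap a provider SDK client.
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f'[{self.name}] {prompt}'


# One factory per provider; all satisfy the same Model protocol.
REGISTRY: dict[str, Callable[[str], Model]] = {
    'openai': EchoModel,
    'anthropic': EchoModel,
}


def infer_model(spec: str) -> Model:
    provider, _, name = spec.partition(':')
    return REGISTRY[provider](name)


model = infer_model('openai:gpt-4o')  # swap to 'anthropic:...' in one line
print(model.complete('hi'))  # -> [gpt-4o] hi
```

Because every adapter presents the same interface, agent code never branches on the provider.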
Gotcha
The type-safety that makes Pydantic AI powerful requires buy-in to Python’s type hint ecosystem. If your team isn’t using mypy or IDE type checking, you’ll miss half the value proposition. Setting up proper tooling—type checkers, IDE plugins—is essential but not automatic. Developers unfamiliar with Pydantic’s validation model will face a learning curve around BaseModel definitions, field validators, and how schemas map to LLM calls.
The framework is young. While the project has garnered 15,660 GitHub stars indicating strong interest, the ecosystem of community examples, integration patterns, and battle-tested recipes is still forming. LangChain, despite its complexity, has years of Stack Overflow answers and blog posts covering edge cases. With Pydantic AI, you’re more likely to be charting new territory. The tight integration with Pydantic Logfire for observability is excellent if you’re starting fresh, but teams committed to DataDog, New Relic, or proprietary platforms may feel pressure to adopt another tool—even though OpenTelemetry support theoretically allows flexibility.
Verdict
Use Pydantic AI if you’re building production GenAI applications in Python where reliability and type safety matter more than ecosystem maturity. It’s the right choice for teams already using Pydantic or FastAPI, for applications requiring durable execution or human-in-the-loop workflows, and for developers who want their IDE to catch agent errors at write-time instead of discovering them in production logs. The dependency injection pattern alone makes it worthwhile for complex agents with multiple data sources. Skip it if you need a mature ecosystem with thousands of examples and integrations, if your team doesn’t use type hints or static analysis tools, if you’re locked into non-OpenTelemetry observability platforms, or if you’re building simple LLM wrappers that don’t justify the architectural overhead. For quick prototypes or exploratory projects, lighter-weight libraries might get you running faster—but when you’re ready to ship agents to production, Pydantic AI’s design decisions start paying dividends.