Inside the AI Agent Framework Wars: A Hands-On Comparison Repository
Hook
There are now more AI agent frameworks than JavaScript testing libraries—and just like those early testing wars, most developers are choosing based on Twitter hype rather than actual code. One repository is trying to fix that.
Context
The AI agent ecosystem has exploded in the past 18 months. What started with LangChain and AutoGen has fragmented into dozens of frameworks, each claiming to be the best way to build autonomous agents. CrewAI promises role-based collaboration. AG2 touts conversational patterns. Google's ADK brings Google's infrastructure muscle. The Claude Agent SDK offers Anthropic's opinionated approach. For developers trying to choose, the problem is that each framework's official documentation naturally presents a rose-tinted view of its own capabilities.
The traditional approach to framework evaluation—reading documentation, watching conference talks, scrolling through Twitter threads—burns days of research time without providing the tactile understanding that comes from running actual code. You need to see how each framework handles tool calling, manages multi-agent conversations, orchestrates workflows, and integrates with LLMs. The martimfasantos/ai-agents-frameworks repository emerged as a structured learning environment where developers can compare these frameworks through runnable examples rather than marketing materials. It's not another framework attempting to unify them all; it's an educational playground that respects each framework's unique approach while making comparison frictionless.
Technical Insight
The repository's architecture is deliberately simple: isolated directories for each framework, version-locked dependencies, and parallel example implementations that demonstrate the same concepts across different paradigms. This isn't accidental—it's a teaching tool optimized for comparison rather than production deployment.
Consider how the repository approaches tool usage, one of the most fundamental agent patterns. In AG2 (formerly AutoGen), tools are registered to agents through function definitions with type hints:
import os

from autogen import ConversableAgent, register_function


def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"The weather in {location} is sunny, 72°F"


agent = ConversableAgent(
    name="assistant",
    llm_config={"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")},
)

# caller is the agent that proposes the tool call; executor is the agent that
# runs it. This example uses the same agent for both roles.
register_function(
    get_weather,
    caller=agent,
    executor=agent,
    description="Get current weather for a location",
)
Contrast this with CrewAI's approach, which wraps tools in a more declarative, role-based structure:
from crewai import Agent, Task, Crew
# Recent CrewAI releases expose the decorator from crewai.tools;
# older examples import it from crewai_tools instead.
from crewai.tools import tool


@tool("Weather Tool")
def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"The weather in {location} is sunny, 72°F"


weather_agent = Agent(
    role="Weather Specialist",
    goal="Provide accurate weather information",
    backstory="Expert meteorologist with real-time data access",
    tools=[get_weather],
    verbose=True,
)

task = Task(
    description="Get weather for San Francisco",
    agent=weather_agent,
    expected_output="Weather report",
)

# A crew bundles agents and tasks; kickoff() runs the tasks and returns the result.
crew = Crew(agents=[weather_agent], tasks=[task])
result = crew.kickoff()
The same underlying operation—calling a function to get weather data—requires completely different mental models. AG2 treats agents as conversational entities that can call functions during dialogue. CrewAI frames everything around roles and tasks, with agents as specialized team members. Neither approach is inherently superior; they solve different organizational problems.
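Both registration styles lean on the function's type hints and docstring to tell the model what the tool does. As a rough, framework-free sketch of that idea, the snippet below turns a plain function into a JSON-Schema-style description using only the standard library. The names `describe_tool` and `_JSON_TYPES` are hypothetical, and this is not how AG2 or CrewAI implement it internally; it only illustrates the kind of metadata a registration step can derive.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a JSON-Schema-style description of a function (illustrative only)."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # only parameters belong in the schema
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"The weather in {location} is sunny, 72°F"


schema = describe_tool(get_weather)
print(schema)
```

Seen this way, the frameworks differ less in what they extract from the function than in where the resulting tool is attached: to a conversing agent in one case, to a role with tasks in the other.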
The repository's version-locking strategy deserves attention. Each framework directory includes pinned dependencies (AG2 0.11.5, CrewAI 1.14.2, etc.) because AI agent frameworks are moving fast enough that examples break between minor versions. This is pragmatic engineering—reproducibility matters more than cutting-edge features when you're learning core patterns. The tradeoff is maintenance burden, but it ensures that when you clone the repository and run an example, it actually works.
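One way to make pinning visible before running an example is to compare the installed package versions against the pins. The sketch below uses the standard library's `importlib.metadata`. The version numbers are the ones quoted above, but the distribution names (`ag2`, `crewai`) and the helper names `installed_version` and `check_pins` are assumptions for illustration, not something taken from the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, str]:
    """Return package -> problem description for every pin that is not met."""
    problems: dict[str, str] = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found is None:
            problems[package] = f"not installed (expected {expected})"
        elif found != expected:
            problems[package] = f"found {found}, expected {expected}"
    return problems


# Versions quoted in the article; the distribution names are assumptions.
PINS = {"ag2": "0.11.5", "crewai": "1.14.2"}

if __name__ == "__main__":
    for package, problem in check_pins(PINS).items():
        print(f"{package}: {problem}")
```

The `lookup` parameter exists only so the check can be exercised without installing anything; in normal use the default is enough.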
Multi-agent patterns reveal even deeper architectural differences. AG2 excels at modeling conversations between agents with different capabilities, using group chat patterns and nested conversations. CrewAI structures collaboration through sequential or hierarchical task delegation. Google's ADK takes a more procedural approach with explicit workflow definitions. By implementing similar multi-agent scenarios across frameworks, the repository exposes these philosophical differences in ways that documentation often obscures. You see that choosing a framework isn't about feature checklists—it's about which mental model matches your problem domain.
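To make the contrast concrete without depending on any framework, here is a deliberately tiny sketch of two of these coordination styles in plain Python. Agents are modelled as ordinary functions, and `group_chat`, `sequential_tasks`, `researcher`, and `writer` are hypothetical names. Real frameworks add LLM calls, speaker selection, and delegation logic on top, so treat this as a picture of the control flow only.

```python
from typing import Callable

# In this sketch an "agent" is just a function from a message to a reply.
Agent = Callable[[str], str]


def group_chat(
    agents: dict[str, Agent], opening: str, max_turns: int
) -> list[tuple[str, str]]:
    """Conversation style: agents take turns replying to the latest message."""
    transcript: list[tuple[str, str]] = []
    message = opening
    names = list(agents)
    for turn in range(max_turns):  # the turn cap prevents endless loops
        speaker = names[turn % len(names)]  # simple round-robin selection
        message = agents[speaker](message)
        transcript.append((speaker, message))
    return transcript


def sequential_tasks(steps: list[tuple[str, Agent]], initial_input: str) -> str:
    """Task style: each role's output becomes the next role's input."""
    result = initial_input
    for _role, agent in steps:
        result = agent(result)
    return result


def researcher(message: str) -> str:
    return f"notes({message})"


def writer(message: str) -> str:
    return f"draft({message})"


chat_log = group_chat(
    {"researcher": researcher, "writer": writer}, "weather in SF", max_turns=3
)
report = sequential_tasks(
    [("Researcher", researcher), ("Writer", writer)], "weather in SF"
)
print(chat_log)
print(report)
```

The conversation version produces a transcript whose length depends on the turn budget, while the task version produces exactly one result per pipeline run. That difference in shape is a large part of what "mental model" means here.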
The repository also covers integration patterns with different LLM providers. Some frameworks are tightly coupled to OpenAI's API structure, while others abstract provider details more cleanly. This matters enormously for production systems where you might need to switch providers for cost, latency, or capability reasons. Seeing how each framework handles model configuration, streaming responses, and error handling in runnable code provides signal that documentation can't.
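A minimal sketch of what "abstracting provider details" can look like is shown below: agent code depends on a small interface, and concrete providers are swapped behind it. Everything here is hypothetical (`ChatProvider`, `EchoProvider`, `FailingProvider`, `complete_with_fallback`), and the providers are stand-ins that make no network calls. No framework in the repository is claimed to use this exact shape.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical minimal interface that agent code would depend on."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in for a real provider client; it makes no network call."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class FailingProvider:
    """Stand-in that simulates an outage."""

    name: str

    def complete(self, prompt: str) -> str:
        raise ConnectionError(f"{self.name} is unavailable")


def complete_with_fallback(providers: list[ChatProvider], prompt: str) -> str:
    """Try providers in order and return the first successful reply."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.complete(prompt)
        except ConnectionError as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


reply = complete_with_fallback(
    [FailingProvider("primary"), EchoProvider("backup")], "hello"
)
print(reply)  # [backup] hello
```

When a framework is tightly coupled to one provider's request format, this kind of swap tends to require touching agent code rather than just the provider list.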
Gotcha
The repository's greatest strength, being a comparative learning tool, is also its primary limitation. These are introductory examples, not production templates. You won't find production-grade error handling strategies, retry logic, observability integration, or cost management patterns. There's no guidance on scaling agent systems beyond toy examples or managing the inevitable failures when LLMs hallucinate tool schemas or agents get stuck in conversation loops.
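For a sense of what the missing retry logic might involve, here is a small, framework-independent sketch of retrying a flaky tool call with exponential backoff. The helper `call_with_retry` and the simulated `flaky_tool` are hypothetical. A production version would likely retry only on specific exception types and add jitter and logging.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


# Simulated tool that fails twice before succeeding.
calls = {"count": 0}


def flaky_tool() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "sunny, 72°F"


# Record the delays instead of sleeping so the example runs instantly.
delays: list[float] = []
result = call_with_retry(flaky_tool, attempts=3, sleep=delays.append)
print(result, delays)
```

Injecting `sleep` is a design choice made for the sketch: it keeps the backoff schedule observable and lets the example run without real waiting.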
Maintenance is the other elephant in the room. AI agent frameworks are evolving rapidly, with breaking changes arriving in minor version bumps. The version-locked approach provides stability for learners but means examples can drift from current best practices. A framework that looked clunky in version 0.8 might have elegant patterns in 1.2 that aren't reflected in the repository. The maintainer is fighting against entropy—every framework update potentially invalidates examples. For users, this means you're learning patterns that might already be outdated, and you'll need to verify approaches against current framework documentation before building anything serious. The repository is a starting point for evaluation, not a definitive guide to production deployment.
Verdict
Use if: You're choosing between AI agent frameworks for a new project and need hands-on code comparison beyond marketing claims. You learn better from running code than reading documentation. You want to understand architectural differences between conversation-based, role-based, and workflow-based agent patterns before committing to one paradigm. You're teaching or learning AI agent concepts and want a structured progression through multiple implementations.
Skip if: You've already selected a framework and need deep expertise in its advanced features; official documentation and community examples will serve you better. You need production-ready templates with error handling, observability, and deployment patterns. You require bleeding-edge framework features rather than stable, educational examples. You want a unified abstraction layer that lets you switch frameworks without rewriting code; this repository explicitly avoids creating such abstractions in favor of showing each framework's authentic approach.