MotleyCrew: Multi-Framework Agent Orchestration with Knowledge Graph-Backed Task Management
Hook
Most multi-agent frameworks lock you into their ecosystem. What if you could orchestrate a LangChain research agent, a CrewAI writer, and an AutoGen code executor in the same workflow—without writing adapter code?
Context
The explosion of AI agent frameworks has created a new problem: vendor lock-in at the orchestration layer. If you build your multi-agent system with CrewAI, you're stuck with CrewAI agents. Choose LangChain, and you're limited to their agent implementations. This fragmentation forces teams to either commit early to a single framework (risking future limitations) or build custom integration layers to combine the best tools from different ecosystems.
MotleyCrew emerged as an integration framework that sits above these competing standards. Instead of forcing you to choose between LangChain's extensive tool ecosystem, CrewAI's role-based abstractions, or AutoGen's conversational patterns, it provides a common orchestration layer. The key architectural decision: using a knowledge graph as the universal data store and task manager, allowing complex dependencies that go beyond simple DAG-based workflows. This approach positions MotleyCrew as middleware for multi-agent systems, similar to how Apache Airflow orchestrates data pipelines without caring which specific tools run inside each task.
Technical Insight
At its core, MotleyCrew wraps agents from different frameworks in a unified Task abstraction backed by a knowledge graph (implemented via Kuzu or other graph databases). This design allows you to chain heterogeneous agents using simple Python operators while maintaining sophisticated execution dependencies under the hood.
Here's what a basic multi-framework workflow looks like:
from motleycrew import MotleyCrew
from motleycrew.agents.langchain import LangChainMotleyAgent
from motleycrew.agents.crewai import CrewAIMotleyAgent
from motleycrew.tasks import SimpleTask
from langchain_community.tools import DuckDuckGoSearchRun
from crewai import Agent as CrewAgent
# Create a LangChain agent for research
research_agent = LangChainMotleyAgent(
name="Researcher",
tools=[DuckDuckGoSearchRun()],
verbose=True
)
# Create a CrewAI agent for writing
writer = CrewAIMotleyAgent(
agent=CrewAgent(
role="Technical Writer",
goal="Write clear technical content",
backstory="Expert at explaining complex topics"
)
)
# Define tasks and chain them
crew = MotleyCrew()
research_task = SimpleTask(
crew=crew,
agent=research_agent,
name="Research topic",
description="Research recent developments in {topic}"
)
writing_task = SimpleTask(
crew=crew,
agent=writer,
name="Write article",
description="Write an article based on: {research_output}"
)
# Chain tasks using the >> operator
research_task >> writing_task
# Execute with automatic dependency resolution
result = crew.run()
The >> operator is syntactic sugar over knowledge graph edges. When you write task_a >> task_b, MotleyCrew creates a dependency relationship in the graph, ensuring task_b receives task_a's output as context. This is more powerful than simple function composition because the graph can track task state, cache results, and handle partial failures.
The knowledge graph backend serves three purposes: it stores task definitions and their relationships, maintains execution state across runs, and acts as a queryable data store for agent outputs. Unlike frameworks that use in-memory state or simple databases, the graph structure naturally represents complex workflows where tasks might have multiple dependencies or conditional execution paths.
MotleyCrew's HTTP caching layer (motleycache) deserves special attention. During development, you'll frequently re-run agent workflows while tweaking prompts or logic. Without caching, each run makes expensive LLM API calls. MotleyCrew automatically caches HTTP requests (including OpenAI, Anthropic, and other LLM APIs) based on request signatures:
from motleycrew import MotleyCrew
from motleycrew.caching import enable_cache
# Enable caching for all HTTP requests
enable_cache(cache_dir=".motley_cache")
crew = MotleyCrew()
# Subsequent identical LLM calls hit cache instead of API
# Saves money and speeds up debugging significantly
This simple feature has outsized impact: a multi-agent workflow that costs $2 in API calls during the first run might cost $0.10 on subsequent debugging runs, with execution time dropping from 30 seconds to 2 seconds.
The framework implements LangChain's Runnable protocol throughout, making every component compatible with LCEL (LangChain Expression Language). This means you can drop MotleyCrew tasks into existing LangChain chains or vice versa:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# Mix MotleyCrew tasks with pure LangChain runnables
chain = (
RunnablePassthrough()
| research_task # MotleyCrew task
| StrOutputParser() # LangChain parser
| writing_task # Another MotleyCrew task
)
result = chain.invoke({"topic": "vector databases"})
This interoperability is crucial for gradual adoption—you don't need to rewrite existing LangChain infrastructure to start using multi-framework orchestration.
The observability integration with Lunary provides distributed tracing across agent execution. Each agent action, tool call, and LLM interaction gets logged with timing and token usage data, creating a complete audit trail. For production deployments where debugging multi-agent interactions becomes critical, this built-in instrumentation saves weeks of custom logging implementation.
Gotcha
The knowledge graph abstraction adds real complexity that you'll feel immediately. For simple sequential workflows (agent A → agent B → agent C), you're trading straightforward function calls for graph traversal logic. The overhead isn't just conceptual—you need to run a graph database backend (Kuzu by default), which adds deployment dependencies and potential failure points.
Documentation remains sparse for advanced patterns. The repository shows basic chaining and parallel execution, but you're largely on your own for sophisticated graph structures or error recovery strategies. With only 405 GitHub stars, the community is small—expect to read source code rather than finding Stack Overflow answers. Production deployment stories are essentially nonexistent in the wild, making it hard to anticipate scaling issues or operational gotchas. The framework also makes strong assumptions about agent output formats for task chaining; if your agents produce complex structured outputs, you'll need custom glue code to map between task boundaries, somewhat undermining the seamless integration promise.
Verdict
Use if: You're building genuinely complex multi-agent systems that benefit from different frameworks' strengths (LangChain's tools + CrewAI's role abstractions + AutoGen's conversational patterns), you need sophisticated task dependency graphs beyond linear DAGs, or you're prototyping agent workflows and want built-in caching to reduce API costs during development. The knowledge graph backend particularly shines when task relationships are dynamic or you need queryable execution history. Skip if: Your workflow is primarily sequential with 3-4 agents where simple function composition suffices, you need battle-tested production reliability with extensive community support and debugging resources, or you're committed to a single framework ecosystem where native solutions (like LangGraph for LangChain) provide deeper integration. The operational overhead of running a graph database isn't justified for straightforward agent chains that could run on basic task queues.