
NPC-Powered AI Teams: A Deep Dive into npcpy's Multi-Provider Agent Framework

[ View on GitHub ]


Hook

While most agent frameworks lock you into a single LLM provider or force you to rewrite code when switching between local and cloud models, npcpy lets you swap from Ollama’s local models to GPT-4 by changing a single parameter—without touching your agent logic.

Context

The LLM tooling ecosystem suffers from fragmentation. Build an agent with OpenAI’s API, and you’re committed to their pricing and rate limits. Experiment with local Ollama models, and you lose consistent function-calling support. Want to add a second agent? You’re stitching together multiple libraries. npcpy emerged as a research-focused framework that treats LLM providers as interchangeable backends while layering multi-agent orchestration, tool calling, and multimodal generation into a unified API. It’s designed for rapid prototyping and experimentation, targeting researchers and developers who need to compare model behaviors, run hybrid local/cloud workflows, or teach AI development without framework lock-in.

The framework centers on the NPC abstraction—AI personas defined by directives, equipped with tools, and capable of delegation. This isn’t just anthropomorphic naming; it reflects a design philosophy borrowed from game development where NPCs have explicit roles, scoped capabilities, and interact through defined protocols. By encoding agents as version-controlled YAML files and providing built-in code execution, npcpy reduces the boilerplate between “idea” and “working multi-agent system” to a few dozen lines.
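Because agent definitions are plain YAML, they diff and review cleanly in version control. A hypothetical persona file might look like the following sketch (the exact field names are assumptions for illustration, not documented npcpy syntax):

```yaml
# npc_team/architect.npc -- hypothetical layout; field names are assumptions
name: architect
primary_directive: |
  Design system architecture. Delegate implementation to @backend and @frontend.
model: qwen3:4b
provider: ollama
```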

Technical Insight

Provider Abstraction Layer

At npcpy’s core sits get_llm_response, a unified function that normalizes interactions across OpenAI, Gemini, Ollama, Anthropic, and others. The implementation uses a simple provider routing pattern:

from npcpy import get_llm_response

# Local model via Ollama
response = get_llm_response(
    "Explain the CAP theorem",
    model='qwen3:4b',
    provider='ollama'
)

# Same code, cloud provider
response = get_llm_response(
    "Explain the CAP theorem",
    model='gpt-4o',
    provider='openai'
)

This abstraction extends to streaming, JSON mode, and structured outputs via Pydantic schemas. The format parameter deserves special attention—it handles both simple JSON and Pydantic models by injecting schema definitions directly into the LLM prompt:

from pydantic import BaseModel
from typing import List

class CodeReview(BaseModel):
    issues: List[str]
    severity: str
    suggested_fixes: List[str]

review = get_llm_response(
    "Review this function for race conditions: [code]",
    model='qwen3.5:2b',
    provider='ollama',
    format=CodeReview
)
# review['response'] holds the parsed output; no manual json.loads needed

The framework automatically converts the Pydantic schema to JSON Schema, appends it to the system prompt, and parses the response. This eliminates the manual schema-wrangling common in raw API calls.
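Under the hood, this step amounts to turning type hints into a JSON Schema and appending it to the prompt. A minimal stdlib sketch of the idea, where schema_from_hints is an illustrative helper and not npcpy's actual internals:

```python
import json
import typing

def schema_from_hints(cls) -> dict:
    """Build a minimal JSON Schema from a class's type hints (illustrative only)."""
    simple = {str: "string", int: "integer", float: "number", bool: "boolean"}
    props = {}
    for name, hint in typing.get_type_hints(cls).items():
        if hint in simple:
            props[name] = {"type": simple[hint]}
        elif typing.get_origin(hint) is list:
            (item,) = typing.get_args(hint)
            props[name] = {"type": "array", "items": {"type": simple[item]}}
    return {"type": "object", "properties": props, "required": list(props)}

class CodeReview:
    issues: typing.List[str]
    severity: str
    suggested_fixes: typing.List[str]

schema = schema_from_hints(CodeReview)
# The schema text gets appended to the system prompt before the LLM call
system_suffix = "Respond with JSON matching this schema:\n" + json.dumps(schema)
```

The response is then parsed back against the same schema, which is what lets the caller skip manual validation.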

The NPC and Agent Hierarchy

NPCs represent stateless AI personas. Agents extend NPCs with tool-calling capabilities. The framework provides three agent flavors:

  1. Agent: Comes with default tools (shell execution, Python REPL, file editing, web search)
  2. ToolAgent: Lets you add custom tools alongside defaults
  3. CodingAgent: Automatically executes code blocks from LLM responses

Here’s where the design gets interesting. Tools are just Python functions with type hints and docstrings:

from npcpy import ToolAgent
import subprocess

def run_benchmarks(suite: str = "all") -> str:
    """Execute performance benchmarks and return timing results."""
    result = subprocess.run(
        ["pytest", "benchmarks/", "-k", suite, "--benchmark-only"],
        capture_output=True, text=True
    )
    return result.stdout

perf_agent = ToolAgent(
    name='perf_engineer',
    primary_directive='Analyze performance regressions and suggest optimizations.',
    tools=[run_benchmarks],
    model='qwen3.5:2b',
    provider='ollama'
)

result = perf_agent.run("Check if the latest commit slowed down query processing")

The agent introspects the function signature to build tool schemas, calls the LLM with available tools, and executes selected functions—all without explicit tool registration boilerplate.
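The introspection step itself is straightforward. Here is a stdlib sketch of how a signature and docstring can become an OpenAI-style tool schema; build_tool_schema is illustrative, and npcpy's real implementation may differ:

```python
import inspect

def build_tool_schema(fn) -> dict:
    """Derive an OpenAI-style tool schema from a function's signature and docstring."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the LLM must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def run_benchmarks(suite: str = "all") -> str:
    """Execute performance benchmarks and return timing results."""
    return ""

schema = build_tool_schema(run_benchmarks)
```

Because the schema is derived rather than declared, adding a tool is just writing a well-annotated function.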

Team Orchestration and the Jinx System

Teams coordinate multiple NPCs through explicit delegation. Unlike frameworks that use implicit message-passing, npcpy uses @mentions for routing:

from npcpy import NPC, Team

architect = NPC(
    name='architect',
    primary_directive='Design system architecture. Delegate implementation to @backend and @frontend.'
)
backend = NPC(name='backend', primary_directive='Implement APIs and databases.')
frontend = NPC(name='frontend', primary_directive='Build user interfaces.')

team = Team(npcs=[architect, backend, frontend], forenpc='architect')
result = team.orchestrate("Build a real-time chat application")

The forenpc parameter designates the coordinator. When the architect’s response contains @backend, the team routes that message to the backend NPC, maintaining conversation context.
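The routing itself can be as simple as scanning the coordinator's reply for known names. A minimal sketch of that pattern, where route_mentions is a hypothetical helper rather than npcpy's API:

```python
import re

def route_mentions(message: str, npc_names: set) -> list:
    """Return @mentions that correspond to known NPCs, in order of appearance."""
    return [m for m in re.findall(r"@(\w+)", message) if m in npc_names]

npcs = {"architect", "backend", "frontend"}
reply = "Schema looks good. @backend, expose a /messages endpoint; @frontend, poll it."
targets = route_mentions(reply, npcs)  # each target receives the message with shared context
```

Explicit routing like this makes delegation traceable in logs, unlike implicit message-passing schemes.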

Jinxes are YAML-defined tools with Jinja templating, stored in jinxes/ directories within team folders. They can be bound to specific NPCs via the npc: field, and can access team-level context variables defined in team.ctx. This enables shared state across agents without global variables—particularly useful for maintaining database connections, API credentials, or shared knowledge bases.
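A hypothetical jinx illustrating these pieces (the npc: binding and Jinja placeholders follow the description above; the other field names are assumptions, not documented syntax):

```yaml
# jinxes/query_db.jinx -- hypothetical example; field names are assumptions
jinx_name: query_db
npc: backend            # bind this jinx to the backend NPC only
description: Run a read-only SQL query against the shared team database
steps:
  - code: |
      # db_path comes from team.ctx, shared across all NPCs on the team
      import sqlite3
      conn = sqlite3.connect("{{ db_path }}")
      rows = conn.execute("{{ query }}").fetchall()
```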

Multimodal Generation

Unlike frameworks that treat text, images, and audio as separate concerns, npcpy provides unified generation functions:

from npcpy.llm_funcs import gen_image, gen_video
from npcpy.gen.audio_gen import text_to_speech

# Image generation—abstracts DALL-E, Gemini Imagen, or local diffusers
images = gen_image(
    "A network topology diagram showing microservices architecture",
    model='dall-e-3',
    provider='openai'
)
images[0].save("architecture.png")

# Audio generation for accessibility or voice interfaces
audio = text_to_speech(
    "Deployment completed successfully",
    engine="elevenlabs",
    voice="professional_male"
)

The video generation support via Gemini’s Veo models is particularly novel for an agent framework, enabling workflows like “generate a tutorial video from this documentation” entirely within the same codebase.

Gotcha

Documentation Depth

While the README provides working examples for core features, some advanced capabilities require experimentation to fully understand. For instance, the specifics of Jinx syntax beyond basic examples, the full range of team context variables, and provider-specific configuration options may require source code inspection. Features like Ollama cloud model support (model='minimax-m2.7:cloud') are demonstrated in examples but not extensively documented.

Code Execution Considerations

CodingAgent automatically executes LLM-generated code, and the default Agent includes shell execution (sh) capabilities. While this enables rapid prototyping and is well-suited to research environments, developers should be aware of these execution capabilities when designing their systems. For production deployments or when processing untrusted prompts, consider the appropriate security boundaries for your use case—such as containerization or execution approval workflows.
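One lightweight safeguard is an approval gate between tool selection and execution. The following sketch is independent of npcpy's API; gated and the approve callback are illustrative names:

```python
def gated(tool_fn, approve):
    """Wrap a tool so each call must pass an approval check before executing."""
    def wrapped(*args, **kwargs):
        # approve() might prompt an operator, check a policy, or log and deny
        if not approve(tool_fn.__name__, args, kwargs):
            return "Execution denied."
        return tool_fn(*args, **kwargs)
    return wrapped

def delete_logs(path: str) -> str:
    return f"deleted {path}"

# Deny-all policy: useful as a safe default when processing untrusted prompts
safe_delete = gated(delete_logs, lambda name, a, kw: False)
```

Wrapped tools keep the same signature, so they can be passed to a ToolAgent in place of the raw functions.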

Framework Maturity

With 1,270 GitHub stars and active development, npcpy appears to be a growing framework. As with any evolving project, developers building production systems should evaluate whether the current feature set and stability align with their requirements. The framework’s research focus means it prioritizes flexibility and experimentation over rigid API guarantees.

Verdict

Use npcpy if:

  1. You’re researching multi-agent systems and need to compare how different LLMs handle delegation (Ollama’s Qwen vs. GPT-4 vs. Gemini).
  2. You want version-controlled agent definitions via YAML files for reproducible experiments.
  3. You’re building prototypes that mix local and cloud models—maybe Ollama for development, OpenAI for production.
  4. You need quick multimodal generation (text → image → audio) without integrating multiple separate libraries.
  5. You’re teaching AI development and want students to experiment with agents without learning provider-specific APIs.

Consider alternatives if:

  1. You require deep integration with a specific provider’s features (like OpenAI’s Assistant API with persistent threads).
  2. You need a framework with extensive documentation covering every edge case and configuration option—you may need to supplement the README with source code exploration.
  3. You only need simple, single-agent LLM calls where the abstraction overhead doesn’t provide clear benefits.
  4. You’re building in environments where the default code execution capabilities don’t align with your security requirements without additional safeguards.
