
Building Multi-Agent Systems with AWS Bedrock: Inside Omnimesh's Signal-Based Orchestration


Hook

Most multi-agent systems fail because they treat agent coordination as a prompt engineering problem. Omnimesh proves that production-grade agent orchestration requires something entirely different: a signal-based communication protocol wrapped in enterprise authentication.

Context

The explosion of agentic AI has created a new challenge: how do you coordinate multiple specialized AI agents without devolving into an unpredictable mess of LLM calls? Single-agent systems hit capability ceilings quickly: your chatbot can't simultaneously be an expert in infrastructure troubleshooting, database optimization, and service desk workflows. The naive solution is to throw everything into one massive prompt, but that creates inconsistent behavior and astronomical token costs.

AWS released Omnimesh as a reference implementation for multi-agent orchestration using their Bedrock AgentCore primitives. Built around an enterprise IT support use case, it demonstrates how to coordinate domain-specific agents (Infrastructure, Development Tools, Database, Service Desk) through a central orchestrator. The architecture addresses real production concerns that most agent frameworks ignore: JWT authentication for inbound requests, OAuth 2.0 for outbound API calls, session isolation across conversations, and deterministic routing when you already know which agent should handle a request. This isn’t a toy framework—it’s a blueprint for the authentication, memory, and coordination logic that enterprise deployments actually need.

Technical Insight

System architecture (auto-generated diagram): a User Request that carries service context takes the Direct Agent Route straight to a domain agent; otherwise the Orchestrator Agent performs a semantic match against the DynamoDB Agent Registry to select one. The Infrastructure, Database, and Dev Tools agents sit behind the AgentCore Gateway's MCP Interface Layer, AgentCore Memory maintains conversation context, and agents report back through the signals out_of_scope, more_info_needed, and complete.

Omnimesh’s core innovation is its signal-based agent communication protocol. Instead of relying on LLMs to magically coordinate through natural language, agents emit explicit signals: out_of_scope, more_info_needed, complete, and error. This creates a state machine for handoffs. When the Infrastructure agent receives a database query, it doesn’t hallucinate an answer—it returns out_of_scope and the orchestrator routes to the Database agent. When an agent needs clarification, more_info_needed keeps the conversation with that specialist until resolution.
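This signal set maps naturally onto a small state machine. A minimal sketch in Python (the enum and the `next_action` helper are illustrative names, not identifiers from the repo):

```python
from enum import Enum

class Signal(Enum):
    OUT_OF_SCOPE = "out_of_scope"
    MORE_INFO_NEEDED = "more_info_needed"
    COMPLETE = "complete"
    ERROR = "error"

def next_action(signal: Signal) -> str:
    # Map an agent's signal to the orchestrator's next step.
    return {
        Signal.OUT_OF_SCOPE: "reroute",       # pick a different domain agent
        Signal.MORE_INFO_NEEDED: "stay",      # keep the conversation with this specialist
        Signal.COMPLETE: "release",           # clear the sticky session
        Signal.ERROR: "escalate",             # surface the failure instead of guessing
    }[signal]
```

Because every transition is an explicit dictionary entry rather than an LLM's interpretation of prose, handoff behavior is testable and auditable.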

The routing architecture implements two distinct paths. For requests with service context metadata (like a Slack message mentioning ‘Jenkins’), deterministic routing bypasses the orchestrator’s reasoning entirely and jumps straight to the Development Tools agent. For context-free queries, the orchestrator uses semantic matching against agent descriptions stored in DynamoDB. Here’s how agent registration looks:

# Agent registry entry in DynamoDB
{
    "agent_id": "dev-tools-agent",
    "agent_name": "Development Tools Agent",
    "description": "Handles Jenkins CI/CD, GitHub issues, build failures",
    "services": ["jenkins", "github", "gitlab"],
    "mcp_endpoint": "https://api.example.com/dev-tools",
    "openapi_spec_url": "https://api.example.com/dev-tools/openapi.json"
}

The Model Context Protocol (MCP) acts as the standardized interface layer. Each domain agent exposes its capabilities as MCP tools with OpenAPI specifications. The AgentCore Gateway translates incoming MCP tool calls into Bedrock Runtime API invocations. This abstraction means you can swap agent implementations without touching the orchestrator—as long as they speak MCP and return the expected signals.
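Concretely, an MCP tool is described by a name, a description, and a JSON Schema for its input. The descriptor below is hypothetical (the tool name and parameters are not taken from the Omnimesh repo), but shows the shape the Gateway translates into Bedrock Runtime invocations:

```python
# Illustrative MCP tool descriptor, as an agent might return from tools/list.
# The tool name and parameters here are assumptions, not from the repo.
get_build_logs_tool = {
    "name": "get_build_logs",
    "description": "Fetch recent Jenkins build logs for a job",
    "inputSchema": {
        "type": "object",
        "properties": {
            "job_name": {"type": "string"},
            "build_number": {"type": "integer"},
        },
        "required": ["job_name"],
    },
}
```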

Session management leverages AgentCore Memory for persistence. The active_plugin_session concept maintains stickiness—once a user engages with the Database agent, subsequent turns in that conversation stay with that agent until it emits a complete signal. This prevents context loss from ping-ponging between agents mid-troubleshooting session:

# Session stickiness check in orchestrator
if session.get('active_plugin_session'):
    agent_id = session['active_plugin_session']
    response = invoke_agent(agent_id, user_message, session_id)

    if response['signal'] == 'complete':
        # Specialist finished; release the sticky session
        session.pop('active_plugin_session')
    elif response['signal'] == 'out_of_scope':
        # Specialist can't help; hand back to the orchestrator for routing
        session.pop('active_plugin_session')
        return orchestrator_route(user_message)
    # more_info_needed falls through: the session stays sticky

    return response

The authentication flow showcases enterprise-grade security layering. Inbound requests authenticate via AWS Cognito JWT tokens validated at the API Gateway. When agents need to call external services (like fetching Jenkins build logs), the Gateway handles OAuth 2.0 token exchange. This dual authentication pattern—JWT for user identity, OAuth for service access—is boilerplate code that every production agent system needs but most frameworks ignore.
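On the JWT leg, a token is just three base64url segments, and the claims segment carries the user identity. The sketch below decodes claims without verifying them, purely to show the structure; production code must verify the signature against Cognito's published JWKS keys and check the `aud`, `iss`, and `exp` claims before trusting anything inside:

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode (NOT verify) the claims segment of a JWT.
    Illustration only: real validation checks the signature
    against the issuer's JWKS before trusting these claims."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```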

What makes this architecture particularly valuable is the DynamoDB-based agent registry. As you add new domain agents, you register them with their service mappings and MCP endpoints. The orchestrator discovers them dynamically. For an infrastructure team managing dozens of services, this beats hardcoding agent routing logic into your orchestrator prompt. When you onboard a new monitoring tool, you register a new agent entry—no prompt surgery required.

Gotcha

The elephant in the README: this is explicitly demonstration code, not production-ready. The security model assumes you’ll add your own access controls, input validation, and rate limiting. There’s no cost management—an infinite loop in agent handoffs could rack up massive Bedrock API bills. The error handling is bare-bones; production systems need circuit breakers when agents fail repeatedly.
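The handoff-loop risk in particular is cheap to mitigate with a per-turn hop budget. A hedged sketch (the cap, `invoke`, and `pick_agent` are invented stand-ins for the repo's real routing calls):

```python
MAX_HOPS = 3  # illustrative cap, not a value from the repo

def route_with_budget(user_message: str, invoke, pick_agent):
    """Try agents until one handles the request, aborting after MAX_HOPS
    out_of_scope handoffs so a routing loop can't burn Bedrock spend."""
    excluded: set[str] = set()
    for _ in range(MAX_HOPS):
        agent_id = pick_agent(user_message, excluded)
        response = invoke(agent_id, user_message)
        if response["signal"] != "out_of_scope":
            return response
        excluded.add(agent_id)  # never retry an agent that already bowed out
    raise RuntimeError("handoff budget exhausted; escalate to a human")
```

Excluding agents that have already declined also prevents the degenerate two-agent ping-pong case.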

The AWS lock-in is real. This architecture is married to Bedrock AgentCore, Cognito, and DynamoDB. If you’re on GCP or Azure, you’re rewriting 60% of the codebase. Even within AWS, you’re committing to Bedrock’s pricing model and regional availability. The agent registry in DynamoDB requires manual maintenance of service-to-agent mappings—as your service catalog grows to hundreds of entries, that operational overhead compounds. You’ll likely need tooling to auto-generate registry entries from your service catalog.

Verdict

Use if: You're building multi-agent systems on AWS that need enterprise authentication, session management across conversations, and sophisticated agent handoff logic. This is the reference architecture for Bedrock AgentCore; study it if you're using those primitives at scale. The signal-based communication pattern is worth stealing even if you're not on AWS.

Skip if: You need production code today without security hardening, want a cloud-agnostic solution for multi-cloud deployments, or you're building simple single-agent assistants where this orchestration overhead is unnecessary. This is a blueprint for senior engineers designing agent platforms, not a drop-in library for quick MVPs.
