Caldera: Building Adversary Emulation with Fact-Based Planning Engines

Hook

Most C2 frameworks execute scripted playbooks. Caldera's planning engine instead makes tactical decisions mid-operation based on discovered credentials and network topology, which makes it brilliant for purple teaming and a liability for operational security.

Context

Traditional red team operations follow predetermined playbooks: enumerate users, dump credentials, move laterally to target X. This rigidity creates a testing gap—defenders optimize detections against known sequences rather than adversary decision-making patterns. MITRE built Caldera to solve a research problem: how do you systematically test defensive controls against adaptive adversaries that make context-aware decisions? The MITRE ATT&CK framework catalogs hundreds of techniques, but stringing them into realistic attack chains required either expensive red team labor or brittle automation scripts.

Caldera emerged from MITRE's need to automate adversary emulation at scale. Rather than hardcoding "run Mimikatz, then PsExec to host Y," Caldera models operations as planning problems. Abilities (atomic commands) declare what facts they need to run (preconditions) and what facts they produce (postconditions). The planner evaluates which abilities are eligible based on discovered facts, executes them via deployed agents, and chains subsequent techniques using newly discovered information. This mirrors how real adversaries operate: they don't follow scripts, they adapt to what they find. The result is a platform that prioritizes flexibility and extensibility over operational security—perfect for defensive research, problematic for actual engagements.
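The precondition/postcondition loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Ability` class, its `requires`/`produces` fields, the `plan` function, and the trait names are hypothetical and do not mirror Caldera's ability schema or planner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ability:
    # Hypothetical model, not Caldera's schema
    name: str
    requires: frozenset  # fact traits that must already exist (preconditions)
    produces: frozenset  # fact traits the ability emits (postconditions)


def plan(abilities, initial_facts):
    """Run every eligible ability each cycle until nothing new becomes eligible.

    Returns ability names in execution order. Execution is simulated: an
    ability always "succeeds" and adds its produced traits to the fact set.
    """
    facts = set(initial_facts)
    executed = []
    while True:
        eligible = [a for a in abilities
                    if a.name not in executed and a.requires <= facts]
        if not eligible:
            return executed
        for ability in eligible:
            executed.append(ability.name)
            facts |= ability.produces  # new facts unlock abilities next cycle


abilities = [
    Ability("lateral_move", frozenset({"domain.user.name", "remote.host.fqdn"}),
            frozenset({"host.compromised"})),
    Ability("find_domain_admins", frozenset(), frozenset({"domain.user.name"})),
    Ability("enumerate_hosts", frozenset(), frozenset({"remote.host.fqdn"})),
]
print(plan(abilities, set()))
# ['find_domain_admins', 'enumerate_hosts', 'lateral_move']
```

Note that `lateral_move` is listed first but runs last: the order falls out of which facts exist, not out of the list.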

Technical Insight

[Figure: System architecture (auto-generated). Components: VueJS frontend (Magma plugin), REST API, core server, planner engine (decision logic), fact database, YAML storage, abilities (ATT&CK techniques), deployed agents (Sandcat/Manx), and the plugin system (git submodules).]

Caldera's architecture centers on three components: agents (deployed implants), abilities (atomic commands mapped to ATT&CK techniques), and planners (decision engines that chain abilities into operations). The innovation lives in how these interact through the fact database. When an ability executes, it outputs data that the fact parser transforms into structured facts. Subsequent abilities declare preconditions using regex patterns against this fact database, creating emergent behavior chains.
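The output-to-fact step can be illustrated with a small regex-based parser. This is a hypothetical stand-in: Caldera's real parsers are plugin modules (such as the `plugins.stockpile.app.parsers.basic` parser used in the ability definition), while the `Fact` class, `parse_facts` function, and sample output here are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    trait: str  # e.g. "domain.user.name"
    value: str


def parse_facts(output: str, trait: str, pattern: str) -> list:
    """Convert raw command output into deduplicated facts.

    Each regex match becomes one fact: the first capture group if the
    pattern has one, otherwise the whole match. Order is preserved.
    """
    seen = set()
    facts = []
    for match in re.finditer(pattern, output, flags=re.MULTILINE):
        value = match.group(1) if match.groups() else match.group(0)
        if value not in seen:
            seen.add(value)
            facts.append(Fact(trait, value))
    return facts


sample = "user: alice\nuser: bob\nuser: alice\n"
print(parse_facts(sample, "domain.user.name", r"^user: (\S+)$"))
# [Fact(trait='domain.user.name', value='alice'), Fact(trait='domain.user.name', value='bob')]
```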

Here's how an ability defines fact relationships in YAML:

- id: 7c42a30c-c8c7-44c5-80a8-862d364ac1e4
  name: Find domain admin
  description: Locate domain administrators via net group
  tactic: discovery
  technique:
    attack_id: T1069.002
    name: Domain Groups
  platforms:
    windows:
      psh:
        command: |
          net group "Domain Admins" /domain
        parsers:
          plugins.stockpile.app.parsers.basic:
            - source: domain.user.name
              edge: has_admin
              target: domain.group

The parsers block tells Caldera to extract usernames and create domain.user.name facts tagged with the has_admin relationship. A lateral movement ability can then declare a precondition requiring domain.user.name facts, and it becomes eligible only after enumeration succeeds. The planner doesn't hardcode "run enumeration then lateral movement"—it evaluates all abilities against current facts every planning cycle.
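Caldera ability commands reference facts through `#{trait}` placeholders. A minimal sketch of the eligibility idea, assuming a plain dict of trait to values as the fact store: an ability is runnable only when every placeholder can be filled, and it expands into one command per combination of known values. `expand_command` and the sample command are hypothetical; Caldera's real link generation does additional filtering (requirement modules, links it has already run).

```python
import itertools
import re

# Matches Caldera-style #{trait.name} placeholders in a command template
VARIABLE = re.compile(r"#\{([^}]+)\}")


def expand_command(template: str, facts: dict) -> list:
    """Return one concrete command per combination of known fact values.

    An empty list means the ability is not yet eligible: at least one
    referenced trait has no values in the fact store.
    """
    traits = list(dict.fromkeys(VARIABLE.findall(template)))  # unique, in order
    if any(not facts.get(t) for t in traits):
        return []
    commands = []
    for combo in itertools.product(*(facts[t] for t in traits)):
        bindings = dict(zip(traits, combo))
        # Default argument binds this iteration's values into the callback
        commands.append(VARIABLE.sub(lambda m, b=bindings: b[m.group(1)], template))
    return commands


facts = {"domain.user.name": ["alice", "bob"]}
template = "ssh #{domain.user.name}@#{remote.host.fqdn} whoami"
print(expand_command(template, facts))  # [] until a host is discovered
facts["remote.host.fqdn"] = ["dc01.corp.local"]
print(expand_command(template, facts))
# ['ssh alice@dc01.corp.local whoami', 'ssh bob@dc01.corp.local whoami']
```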

The plugin architecture extends this model dramatically. Plugins are git submodules with independent data directories, API routes, and even separate frontends. The Builder plugin demonstrates this power by embedding a complete Go toolchain:

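# From plugins/builder/app/builder_api.py
import asyncio
import os
import uuid

async def compile_agent(self, platform, headers):
    # Unique output path so concurrent builds don't overwrite each other
    output_path = f"/tmp/agent-{platform}-{uuid.uuid4()}"
    env = os.environ.copy()
    env['GOOS'] = platform  # Go target OS, e.g. 'windows', 'linux', 'darwin'
    env['GOARCH'] = 'amd64'

    # Bake the server address into the binary at link time; run the compiler
    # as an async subprocess so the event loop isn't blocked during the build
    proc = await asyncio.create_subprocess_exec(
        'go', 'build', '-o', output_path,
        '-ldflags', f'-X main.server={headers["server"]}',
        './plugins/sandcat/gocat',
        env=env,
    )
    if await proc.wait() != 0:
        raise RuntimeError(f'go build failed for GOOS={platform}')
    return output_path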

This compiles agents on demand with operation-specific configuration baked in, so there is no static payload for defenders to write signatures against. The Builder plugin also generates HTA files, Excel macros, and shellcode variants using templates, turning Caldera into a payload factory. But this convenience has costs: the full Docker image balloons to 3GB+ with all compiler toolchains, and you need 2GB RAM just for Go compilation.
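The cross-compilation step can be separated into a pure function that only assembles the build command. This is a hypothetical sketch, not Builder or Sandcat code: the platform mapping, the `build_spec` name, and the `-s -w` and `CGO_ENABLED=0` choices are assumptions (standard Go linker and cgo settings for smaller, statically linked cross-compiled binaries); only `GOOS`/`GOARCH` and the `-X main.server=` flag come from the Builder snippet.

```python
import os
import uuid

# Hypothetical mapping from UI platform names to Go's GOOS values
GOOS = {"windows": "windows", "linux": "linux", "macos": "darwin"}


def build_spec(platform: str, server: str, arch: str = "amd64"):
    """Return (argv, env) for a `go build` that bakes `server` into the agent."""
    goos = GOOS[platform]
    suffix = ".exe" if goos == "windows" else ""
    # A fresh path per build so concurrent compiles never overwrite each other
    output = f"/tmp/agent-{goos}-{uuid.uuid4()}{suffix}"
    env = {**os.environ, "GOOS": goos, "GOARCH": arch, "CGO_ENABLED": "0"}
    argv = [
        "go", "build", "-o", output,
        # -s -w strip symbol and debug tables; -X sets main.server at link time
        "-ldflags", f"-s -w -X main.server={server}",
        "./plugins/sandcat/gocat",
    ]
    return argv, env


argv, env = build_spec("windows", "http://caldera:8888")
print(env["GOOS"], argv[-2])  # windows -s -w -X main.server=http://caldera:8888
```

Keeping argument assembly separate from process execution makes the per-operation configuration testable without a Go toolchain installed.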

The v5 rewrite separated the VueJS frontend (Magma plugin) from the Python backend, exposing everything via REST. Operations become API orchestration:

import requests

base_url = 'http://caldera:8888'
session = requests.Session()
session.auth = ('red', 'admin')  # credentials from your Caldera config

# Create operation from adversary profile
resp = session.post(f'{base_url}/api/v2/operations', json={
    'name': 'Automated Emulation',
    'adversary': {'adversary_id': 'de07f52d-9928-4071-9142-cb1d3bd851e8'},
    'auto_close': False,
    'planner': {'id': 'e1bad1e4-541b-4b08-aef3-e7ee4fc849f6'},  # Buckets planner
    'source': {'id': 'ed32b9c3-9593-4c33-b0db-e2007315096b'},  # Basic fact source
    'jitter': '2/8',  # min/max seconds of jitter between agent check-ins
})
resp.raise_for_status()  # fail loudly instead of parsing an error page
operation = resp.json()

# Start operation
resp = session.patch(f"{base_url}/api/v2/operations/{operation['id']}",
                     json={'state': 'running'})
resp.raise_for_status()

This API-first design enables headless automation, CI/CD integration, and third-party tooling. The CTID plugin (Emu) uses this to import adversary profiles from the Center for Threat-Informed Defense, and the Compass plugin visualizes operation coverage against ATT&CK Navigator heatmaps. The plugin ecosystem transforms Caldera from a C2 framework into an adversary emulation platform.
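For CI use, the piece that usually follows starting an operation is waiting for it to end. A minimal polling sketch, assuming the operation object exposes a `state` field and that `finished` is the terminal state you care about; the function name and its defaults are hypothetical, and the state fetcher is injected so the loop can be tested without a server.

```python
import time
from typing import Callable, Iterable


def wait_for_operation(get_state: Callable[[], str],
                       terminal: Iterable[str] = ("finished",),
                       timeout: float = 3600.0,
                       interval: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll an operation's state until it is terminal or the timeout expires.

    get_state is any zero-argument callable returning the current state
    string, e.g. a wrapper around GET /api/v2/operations/{id}.
    """
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still {state!r} after {timeout}s")
        sleep(interval)


# Wiring it to a requests session (assumes the GET response JSON has "state"):
# final = wait_for_operation(lambda: session.get(
#     f"{base_url}/api/v2/operations/{operation['id']}").json()["state"])
```

A CI job can then fail the build on `TimeoutError` rather than hanging on a stalled operation.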

The planner logic itself is pluggable. The "Buckets" planner organizes abilities by ATT&CK tactic, executing all eligible discovery abilities before privilege escalation. The "Atomic" planner sends one ability at a time to each agent, in the order the adversary profile lists them. Advanced users can implement custom planners in Python that optimize for specific objectives: maximizing technique coverage, prioritizing stealth, or focusing on specific detection gaps. This is where Caldera's research DNA shows. Commercial tools don't expose adversary decision logic because operators want control, but Caldera treats planning as a first-class research problem.
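Much of the difference between planning strategies comes down to how eligible abilities are ordered. A sketch of two orderings: a tactic-bucket order in the spirit of the Buckets planner, and a coverage-first order of the kind a custom planner might use. It deliberately ignores Caldera's actual planner plugin interface; the ability dicts, `TACTIC_ORDER`, and both function names are assumptions for illustration.

```python
from collections import defaultdict

# Assumed tactic order for illustration; not taken from the Buckets planner source
TACTIC_ORDER = ["discovery", "privilege-escalation", "credential-access",
                "lateral-movement"]


def buckets_order(eligible):
    """Group eligible abilities by tactic and emit buckets in TACTIC_ORDER."""
    buckets = defaultdict(list)
    for ability in eligible:
        buckets[ability["tactic"]].append(ability)
    ordered = []
    for tactic in TACTIC_ORDER:
        ordered.extend(buckets.pop(tactic, []))
    for leftover in buckets.values():  # tactics not in TACTIC_ORDER run last
        ordered.extend(leftover)
    return ordered


def coverage_order(eligible, covered):
    """Prefer abilities whose ATT&CK technique has not been exercised yet.

    sorted() is stable, so ties keep their original relative order.
    """
    return sorted(eligible, key=lambda a: a["technique"] in covered)


abilities = [
    {"name": "dump", "tactic": "credential-access", "technique": "T1003"},
    {"name": "whoami", "tactic": "discovery", "technique": "T1033"},
    {"name": "exfil", "tactic": "exfiltration", "technique": "T1041"},
    {"name": "groups", "tactic": "discovery", "technique": "T1069.002"},
]
print([a["name"] for a in buckets_order(abilities)])
# ['whoami', 'groups', 'dump', 'exfil']
```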

Gotcha

Caldera's security posture is fundamentally incompatible with production red teaming. CVE-2025-27364, an unauthenticated remote code execution flaw in the dynamic agent compilation feature, highlights that the codebase isn't hardened. The default configuration uses basic authentication over HTTP, session tokens aren't rotated, and the documentation explicitly warns against internet exposure. For a C2 framework used in adversarial contexts, this is disqualifying: a compromised Caldera server gives attackers access to your entire operation's fact database, agent callbacks, and target network topology.

Agent OpSec is equally problematic. Sandcat, the primary Go agent, checks in every 60 seconds by default with no jitter configuration in older versions, uses predictable HTTP endpoints (/sand/instructions), and the default build is 15MB unobfuscated. There's no sleep obfuscation, no in-memory execution options, and no domain fronting support. Manx (reverse shell agent) is even more basic—it's literally a bash/PowerShell one-liner with no evasion primitives. The contact methods (HTTP, TCP, DNS) implement basic protocols without mimicry or traffic shaping. Modern EDR solutions catch these agents trivially.
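Fixed-interval check-ins are easy to spot because the gaps between callbacks barely vary. A hypothetical defender-side heuristic illustrates why: compute the coefficient of variation (standard deviation divided by mean) of inter-arrival times and flag series that are nearly constant. The function name and thresholds are arbitrary assumptions, and this is not a description of how any particular EDR product works.

```python
import statistics


def looks_periodic(timestamps, max_cv=0.1, min_events=5):
    """Flag a callback series whose inter-arrival times are nearly constant.

    Uses the coefficient of variation (population stdev / mean) of the gaps
    between check-ins; a fixed-interval beacon scores 0.
    """
    if len(timestamps) < min_events:
        return False
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean = statistics.mean(gaps)
    if mean <= 0:
        return False
    return statistics.pstdev(gaps) / mean <= max_cv


print(looks_periodic([0, 60, 120, 180, 240, 300]))  # True: fixed 60 s interval
print(looks_periodic([0, 55, 117, 175, 236, 300]))  # True: a few seconds of jitter is still periodic
print(looks_periodic([0, 5, 70, 80, 200, 210]))     # False: irregular gaps
```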

Performance limitations emerge at scale. The fact database is an in-memory Python dictionary with no indexing, so operations with thousands of facts slow the planner to a crawl. Ability execution is serial per agent by default, so an agent with a slow-running enumeration command blocks all other abilities until it completes. The planning loop recalculates all ability eligibility on every cycle by regex-matching every fact against every ability's preconditions, which costs O(n·m) per cycle for n facts and m abilities. I've seen operations with 100+ abilities on 20 agents take 6+ hours when they should complete in 30 minutes with parallel execution.
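A common alternative to the every-fact-against-every-ability scan described above is to index facts by trait so each requirement becomes a dictionary lookup. Sketched here as a hypothetical (not a Caldera patch or its actual data model): checking an ability then costs time proportional to its number of required traits, independent of how many facts have been collected.

```python
from collections import defaultdict


class FactIndex:
    """Hypothetical fact store indexed by trait for O(1) average lookups."""

    def __init__(self):
        self._by_trait = defaultdict(set)

    def add(self, trait: str, value: str) -> None:
        self._by_trait[trait].add(value)

    def values(self, trait: str) -> frozenset:
        # .get avoids inserting empty sets for unknown traits
        return frozenset(self._by_trait.get(trait, ()))

    def satisfies(self, required_traits) -> bool:
        # One lookup per required trait; the number of stored facts doesn't matter
        return all(self._by_trait.get(t) for t in required_traits)


def eligible(abilities, index: FactIndex):
    """Return names of abilities whose required traits all have a value."""
    return [name for name, required in abilities.items()
            if index.satisfies(required)]


idx = FactIndex()
needs = {"lateral": {"domain.user.name", "remote.host.fqdn"}, "whoami": set()}
print(eligible(needs, idx))  # ['whoami']
idx.add("domain.user.name", "alice")
idx.add("remote.host.fqdn", "dc01")
print(eligible(needs, idx))  # ['lateral', 'whoami']
```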

The Docker deployment story is frustrating. Builder plugin explicitly fails in containers because it expects writable filesystem paths outside /tmp. Volume persistence requires manually matching the internal /usr/src/app/data structure or your abilities/adversaries vanish on restart. The container doesn't handle SIGTERM gracefully—killing it mid-operation leaves abilities running on target hosts with no way to clean up. For a platform designed for automation, the operational ergonomics are surprisingly poor.
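What graceful shutdown could look like, as a hypothetical sketch rather than Caldera's actual shutdown path: install a SIGTERM handler (a process running as PID 1 in a container with no handler ignores SIGTERM until Docker falls back to SIGKILL) and, on the signal, replay each executed ability's cleanup command in reverse order. `run_until_sigterm`, `cleanup_commands`, and `run_cleanup` are assumptions; `loop.add_signal_handler` is Unix-only.

```python
import asyncio
import signal


async def run_until_sigterm(cleanup_commands, run_cleanup):
    """Block until SIGTERM arrives, then undo executed abilities newest-first.

    cleanup_commands: cleanup commands recorded as abilities ran (hypothetical)
    run_cleanup: coroutine function that dispatches one cleanup command
    """
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    loop.add_signal_handler(signal.SIGTERM, stop.set)  # Unix only
    try:
        await stop.wait()
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
    # Reverse order so later changes are reverted before the ones they built on
    for command in reversed(cleanup_commands):
        await run_cleanup(command)
```

In a real server the wait would run alongside the operation loop rather than alone, but the shape is the same: catch the signal, then clean up before exiting.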

Verdict

Use Caldera if you're building detection engineering pipelines that need systematic ATT&CK coverage testing, conducting security research that requires customizable adversary logic, training SOC analysts on attack chains with visual ATT&CK mapping, or running tabletop exercises where automated emulation demonstrates defensive gaps. The fact-based planning engine and plugin ecosystem provide unmatched flexibility for modeling adversary decision-making patterns, and the integration with Atomic Red Team, CTID adversary profiles, and ATT&CK Navigator creates a comprehensive purple team environment. It's particularly valuable in academic settings teaching threat intelligence concepts or defensive organizations that need repeatable, documented emulation runs for compliance.

Skip Caldera if you need production-grade C2 infrastructure for real engagements—the security vulnerabilities and weak agent OpSec make it a liability on mature networks. Skip it for red team operations requiring stealth, evasion, or advanced post-exploitation (choose Mythic or Sliver instead). Skip it if you're operating against aggressive EDR deployments where agent detection ends your operation immediately. Skip it for enterprise-scale testing with 100+ agents or time-sensitive engagements where performance matters. Also skip it if you want turnkey operation without ATT&CK framework expertise—Caldera punishes users who don't understand the planner's precondition logic, and you'll spend days reading YAML ability definitions before productive use. The learning curve is steep, and the operational payoff only materializes if you're optimizing for defensive research rather than offensive execution.
