MITRE Caldera: Building Automated Adversary Emulation on ATT&CK
Hook
Most security teams know the ATT&CK framework by heart but struggle to actually execute those techniques at scale. MITRE Caldera transforms that static knowledge base into a living, breathing adversary emulation engine.
Context
The cybersecurity industry has long faced a fundamental gap: we understand adversary tactics through frameworks like MITRE ATT&CK, but translating that knowledge into realistic security testing requires expensive commercial tools or fragmented open-source scripts. Red teams manually chain exploits together. Blue teams struggle to validate their defenses against real attack patterns. Purple team exercises require extensive coordination and custom tooling.
MITRE Caldera emerged from this friction as an active research project at MITRE Corporation—the same organization behind the ATT&CK framework. Rather than building yet another penetration testing toolkit, Caldera creates a platform where ATT&CK techniques become modular, executable operations. It bridges three historically separate workflows: manual red team operations, automated adversary emulation, and incident response validation. The result is a system that can run both human-guided attacks and fully autonomous operations against deployed agents, all mapped directly to ATT&CK’s taxonomy.
Technical Insight
Caldera’s architecture has two parts: a core asynchronous Python server that provides the REST API and web interface, and a set of specialized plugins that add discrete capabilities. The plugin architecture is not an afterthought; it is the fundamental design pattern that makes Caldera extensible.
The core system handles command-and-control (C2) operations, operation planning, and the plugin framework itself. Everything else—agents, TTPs, reporting, even the documentation—lives in separate plugin repositories. Want cross-platform agents? Install Sandcat. Need a library of ATT&CK techniques? Add Stockpile. Building custom payloads? Bring in Builder for dynamic GoLang compilation. This separation means you’re not loading capabilities you don’t need, and extending Caldera doesn’t require forking the core.
The installation process reveals this modular thinking. Cloning recursively pulls all available plugins:
git clone https://github.com/mitre/caldera.git --recursive
cd caldera
pip3 install -r requirements.txt
python3 server.py --insecure --build
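The --insecure flag starts the server with the default configuration values, which is acceptable in a lab but not on anything exposed. The --build flag matters more for day-to-day work: it installs the VueJS UI dependencies and bundles the interface for the new v5 Magma UI plugin. You only need to pass it again after adding plugins or modifying the interface, which keeps iteration cycles fast during operation development.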
Operations in Caldera are best understood as chains of ATT&CK techniques linked by facts, not as fixed scripts. The platform doesn’t just execute commands: as an agent runs techniques, what it collects (hostnames, usernames, credentials) is stored as facts, and those facts determine which techniques can run next. An agent compromising a host might collect credentials, which become facts that unlock lateral movement techniques against newly discovered targets. This fact-based planning lets autonomous operations adapt to what they discover, rather than blindly executing a predefined sequence.
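The fact-unlocks-technique idea can be sketched in a few lines of Python. This is a toy model, not Caldera’s planner: the Ability class, the plan function, and the fact trait names are all hypothetical and exist only to show how collected facts can gate later techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ability:
    """Hypothetical stand-in for an executable ATT&CK technique."""
    name: str
    technique_id: str                  # ATT&CK technique ID
    requires: frozenset = frozenset()  # fact traits needed before it can run
    produces: frozenset = frozenset()  # fact traits it is assumed to yield


def plan(abilities, initial_facts=()):
    """Greedy loop: run every ability whose required facts are known,
    add the facts it produces, and repeat until nothing new unlocks."""
    facts = set(initial_facts)
    order = []
    pending = list(abilities)
    progressed = True
    while pending and progressed:
        progressed = False
        for ability in list(pending):  # iterate over a copy while removing
            if ability.requires <= facts:
                order.append(ability.name)
                facts |= ability.produces
                pending.remove(ability)
                progressed = True
    return order, facts


# Illustrative abilities; trait names are made up for this sketch.
ABILITIES = [
    Ability("lateral_movement", "T1021",
            requires=frozenset({"remote.host", "host.credential"})),
    Ability("discover_hosts", "T1018", produces=frozenset({"remote.host"})),
    Ability("dump_credentials", "T1003", produces=frozenset({"host.credential"})),
]

if __name__ == "__main__":
    order, facts = plan(ABILITIES)
    print(order)
```

Lateral movement is listed first but runs last, because it stays blocked until the discovery and credential abilities have produced the facts it requires. An ability whose requirements are never met is simply left out of the plan.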
The plugin ecosystem demonstrates serious extensibility. The Atomic plugin integrates Atomic Red Team’s test library. Response flips the platform into blue team mode for incident response procedures. Training provides capture-the-flag exercises for learning the platform. GameBoard visualizes joint red-blue operations. There’s even Caldera for OT extending capabilities into industrial control systems. The Skeleton plugin generator scaffolds new plugins, complete with boilerplate for agents, data services, and API endpoints.
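To give a sense of how small a plugin entry point can be, here is a minimal sketch of a plugin’s hook module. It assumes the convention of module-level name, description, and address values plus an async enable(services) function that the server calls at startup. The plugin name, the enabled_with list, and the service names passed in the demo are hypothetical, so check the Skeleton plugin and the official plugin documentation for the actual contract.

```python
import asyncio

# Module-level metadata a plugin is assumed to expose (values are made up).
name = "Example"
description = "Hypothetical plugin used only for illustration"
address = None  # no GUI page in this sketch

# Illustrative only: records which service names enable() was handed.
enabled_with = []


async def enable(services):
    """Entry point the server is assumed to call with its service registry.

    A real plugin would look up services here and register routes or data;
    this sketch only records the names it received.
    """
    enabled_with.extend(sorted(services))


if __name__ == "__main__":
    # Stand-in registry; a running server would supply real service objects.
    asyncio.run(enable({"app_svc": object(), "data_svc": object()}))
    print(name, enabled_with)
```

Because the entry point is just a coroutine that receives the service registry, a plugin can add capabilities without touching the core server code.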
Agent diversity matters for realistic emulation. Sandcat serves as the default cross-platform agent, while Manx provides reverse shell capabilities. The Builder plugin dynamically compiles GoLang payloads, allowing custom agent modifications without pre-building binaries for every platform. This runtime compilation paired with multiple C2 protocols creates a flexible C2 infrastructure.
The REST API exposes programmatic control over operations, facts, agents, and adversary profiles. This enables integration with external systems or custom automation workflows—treating Caldera as an adversary emulation service rather than just a standalone tool.
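A minimal sketch of calling the API from Python, using only the standard library. It assumes a server on localhost port 8888, the /api/v2/ path prefix, and an API key sent in a KEY header. Verify all three against your own deployment and the official API documentation. The helper names build_request and fetch_json are hypothetical.

```python
import json
import urllib.request


def build_request(base_url, api_key, path, payload=None):
    """Build a request for an assumed /api/v2/<path> endpoint.

    Sends GET when there is no payload, POST with a JSON body otherwise.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/{path.lstrip('/')}",
        data=data,
        method="POST" if data is not None else "GET",
        headers={"KEY": api_key, "Content-Type": "application/json"},
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder key and address; replace with values from your own config.
    req = build_request("http://localhost:8888", "REPLACE_ME", "agents")
    print(req.get_method(), req.full_url)
    # agents = fetch_json(req)  # uncomment against a running server
```

Keeping request construction separate from the network call makes the integration easy to unit test without a live server.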
Gotcha
Caldera’s power comes with complexity. This isn’t a point-and-click tool you’ll master in an afternoon. The plugin ecosystem, while modular, means essential capabilities are scattered across multiple repositories. Understanding which plugins you need, how they interact, and how to configure them requires significant investment. The official documentation helps, but expect to spend time in the Training plugin’s exercises before running meaningful operations.
Resource requirements are substantial. The documentation recommends 8GB+ RAM and 2+ CPUs, and that’s before considering GoLang compilation overhead from Builder or the NodeJS requirements (v16+ recommended) for the VueJS UI. Running Caldera on underpowered infrastructure can lead to performance issues. Docker images exist but the README notes they may be slightly outdated, recommending local builds instead. Data persistence requires careful volume configuration since the default Docker setup is ephemeral—lose your container, lose your operation history.
The platform assumes you already understand adversary emulation concepts and the ATT&CK framework, and that you have a test environment ready. If you’re starting from zero, the learning curve stacks: ATT&CK knowledge, Caldera’s operation planning model, plugin configuration, agent deployment, and operation monitoring. This makes Caldera poorly suited for one-off security tests or teams without dedicated red team capabilities.
Verdict
Use Caldera if you’re building a mature security testing program that needs ATT&CK-mapped adversary emulation, have the infrastructure to properly host it (8GB+ RAM, proper persistence), and can invest time in learning its operation model. It excels for organizations running regular purple team exercises, validating EDR deployments against realistic attack chains, or developing security training programs. The plugin architecture means it grows with your needs from basic technique testing to sophisticated autonomous operations.
Skip it if you need lightweight, one-off security testing tools, lack the hardware to run it properly, want commercial-grade support and stability, or prefer simpler atomic test execution without C2 infrastructure. For quick ATT&CK technique validation, Atomic Red Team’s standalone scripts are faster to deploy.
Caldera sits in a unique position: more sophisticated than script libraries, more accessible than commercial platforms, but demanding enough that only teams serious about adversary emulation will extract its full value.