Metta: Building an Adversarial Simulation Framework That Actually Tests Your Defenses

Hook

Most security teams think they have comprehensive detection coverage until a real attacker finds a blind spot. Metta systematically exposes those gaps before adversaries do.

Context

Traditional security testing often falls into two camps: penetration tests that happen once or twice a year with expensive consultants, or ad-hoc red team exercises that lack reproducibility and systematic coverage. Meanwhile, security instrumentation continues to evolve—new EDR rules, updated SIEM queries, modified alerting thresholds—but teams struggle to validate whether these changes actually improve detection or inadvertently create blind spots.

Uber's security team built Metta to solve this continuous validation problem. They needed a way to repeatedly execute adversarial techniques in a controlled manner, mapping each action to the MITRE ATT&CK framework to ensure comprehensive coverage. The goal wasn't to replace human red teamers but to provide an automated foundation for purple team exercises, allowing defenders to quickly iterate on detection logic and verify that instrumentation actually fires when it should. By open-sourcing Metta, Uber shared a battle-tested approach to adversarial simulation that other organizations could adapt to their own detection engineering workflows.

Technical Insight

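At its core, Metta orchestrates adversarial simulations by parsing YAML scenario files that define sequences of actions organized by MITRE ATT&CK phases. Each action specifies commands to execute, the target operating system, and cleanup procedures. Here's a simplified, illustrative example of what a scenario might look like (the field names are condensed for readability and may not match Metta's actual schema; check the example files in the repository before writing your own):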

phases:
  - name: initial_access
    actions:
      - name: spearphishing_attachment
        os: windows
        command: 'powershell -enc <base64_payload>'
        cleanup: 'del C:\temp\payload.exe'
  - name: execution
    actions:
      - name: command_line_interface
        os: windows
        command: 'cmd.exe /c whoami && net user'
        cleanup: ''
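
To make that structure concrete, here is a minimal sketch of the kind of validation a runner could apply once the YAML has been loaded into plain Python data (for example, by a YAML parser). The field names follow the simplified example above rather than Metta's real schema, and `validate_scenario`, `REQUIRED_ACTION_KEYS`, and `SUPPORTED_OS` are hypothetical names introduced for illustration.

```python
# Hypothetical validator for the simplified scenario layout shown above.
# The dict at the bottom is what a YAML parser would produce for such a file.
REQUIRED_ACTION_KEYS = {"name", "os", "command"}
SUPPORTED_OS = {"windows", "linux"}


def validate_scenario(scenario):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    phases = scenario.get("phases")
    if not isinstance(phases, list) or not phases:
        return ["scenario must contain a non-empty 'phases' list"]
    for phase in phases:
        phase_name = phase.get("name", "<unnamed phase>")
        for action in phase.get("actions", []):
            label = f"{phase_name}/{action.get('name', '<unnamed action>')}"
            missing = REQUIRED_ACTION_KEYS - action.keys()
            if missing:
                problems.append(f"{label}: missing {sorted(missing)}")
            if action.get("os") not in SUPPORTED_OS:
                problems.append(f"{label}: unsupported os {action.get('os')!r}")
    return problems


scenario = {
    "phases": [
        {
            "name": "execution",
            "actions": [
                {
                    "name": "command_line_interface",
                    "os": "windows",
                    "command": "cmd.exe /c whoami && net user",
                    "cleanup": "",
                }
            ],
        }
    ]
}

print(validate_scenario(scenario))  # an empty list: no problems found
```

Failing fast on a malformed scenario matters more than it might seem here: a typo discovered halfway through a multi-stage run can leave a VM in a half-attacked state.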

The architecture separates concerns elegantly: Redis serves as the message broker, Celery workers poll for tasks and dispatch commands, and Vagrant manages the isolated VirtualBox VMs where commands actually execute. This distributed approach allows Metta to scale across multiple target environments simultaneously while maintaining sequential execution within each attack chain—critical because many adversarial techniques depend on artifacts created by previous steps.
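
The "parallel across targets, sequential within a chain" property can be sketched without any of the real infrastructure. In the stand-in below, threads play the role of Celery workers and `run_action` is a hypothetical placeholder that only records what it would run; none of this is Metta's code.

```python
# Stand-in for the Celery/Vagrant layer: threads act as workers, and
# run_action is a hypothetical placeholder that does not touch any VM.
from concurrent.futures import ThreadPoolExecutor


def run_action(vm_name, action):
    # A real worker would execute the command inside the VM here.
    return f"{vm_name}:{action}"


def run_chain(vm_name, actions):
    # Actions in one chain run strictly in order, since later steps
    # may depend on artifacts left behind by earlier ones.
    return [run_action(vm_name, action) for action in actions]


def run_all(chains):
    # Different VMs are independent, so their chains can run in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(chains))) as pool:
        futures = {vm: pool.submit(run_chain, vm, acts) for vm, acts in chains.items()}
        return {vm: future.result() for vm, future in futures.items()}


chains = {
    "win10": ["whoami", "net user"],
    "ubuntu": ["id", "cat /etc/passwd"],
}
results = run_all(chains)
print(results["win10"])  # ['win10:whoami', 'win10:net user']
```

The unit of parallelism is the whole chain, not the individual action. Splitting a chain across workers would break the ordering guarantee that multi-step techniques rely on.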

The execution flow works like this: Metta's CLI parses your scenario YAML, validates the structure, and queues tasks to Celery. Workers assigned to specific OS types (Windows or Linux) pick up tasks and use Vagrant's command execution capabilities to run the adversarial commands inside the appropriate VM. The worker captures output and stores it in a results directory, allowing security teams to correlate attack actions with detection events in their SIEM or EDR platform.
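
Correlating actions with detections works best when every result carries precise timestamps. The sketch below shows one way a worker could append a record per action to a results directory; the file name `results.jsonl`, the field names, and the `record_result` helper are assumptions for illustration, and Metta's actual output format may differ.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_result(results_dir, action_name, vm_name, output, started_at, finished_at):
    """Append one JSON line describing an executed action."""
    record = {
        "action": action_name,
        "vm": vm_name,
        "started_at": started_at.isoformat(),
        "finished_at": finished_at.isoformat(),
        "output": output,
    }
    path = Path(results_dir) / "results.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


# Demo with fixed UTC timestamps and a throwaway directory.
results_dir = tempfile.mkdtemp()
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
record = record_result(results_dir, "command_line_interface", "win10", "demo output", start, end)
print(record["started_at"])  # 2024-01-01T12:00:00+00:00
```

Using UTC timestamps makes it straightforward to search the SIEM for alerts in the window between `started_at` and `finished_at`, regardless of the time zone configured on the VM.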

One particularly clever design decision is how Metta handles VM state. Rather than creating fresh VMs for each test run (which would be slow), Metta uses Vagrant snapshots to quickly reset VMs to a known-good baseline state between scenarios. This dramatically reduces testing cycle time while ensuring each scenario runs in a clean environment. On the execution side, a worker task might look roughly like this:

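# Simplified conceptual example of task execution (a sketch, not Metta's actual source).
# Requires the third-party packages celery and python-vagrant.
import logging

import vagrant
from celery import Celery

app = Celery('metta', broker='redis://localhost:6379')
logger = logging.getLogger(__name__)


def log_result(action_name, output):
    # Illustrative helper: record output for correlation with security telemetry
    logger.info('action=%s output=%s', action_name, output)


@app.task
def execute_action(action_name, command, vm_name, cleanup_command):
    v = vagrant.Vagrant()
    try:
        # Execute the adversarial command; ssh() returns the command output as a string
        output = v.ssh(vm_name=vm_name, command=command)
        log_result(action_name, output)
        return output
    finally:
        # Clean up artifacts even if the command itself failed
        if cleanup_command:
            v.ssh(vm_name=vm_name, command=cleanup_command)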
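
The snapshot-based reset described above comes down to two Vagrant CLI subcommands, `vagrant snapshot save` and `vagrant snapshot restore`. The sketch below only builds the command lines and the order in which they would run; the snapshot name `baseline` and the helper names are assumptions, and this is not taken from Metta's source.

```python
# Builds (but does not run) the Vagrant CLI invocations for a snapshot-based reset.
# The snapshot name "baseline" and the helper names are assumptions for illustration.
def snapshot_save_cmd(vm_name, snapshot="baseline"):
    return ["vagrant", "snapshot", "save", vm_name, snapshot]


def snapshot_restore_cmd(vm_name, snapshot="baseline"):
    return ["vagrant", "snapshot", "restore", vm_name, snapshot]


def plan_run(vm_name, scenario_names):
    """Return the ordered plan: save once, then restore before each scenario."""
    plan = [snapshot_save_cmd(vm_name)]
    for scenario in scenario_names:
        plan.append(snapshot_restore_cmd(vm_name))
        plan.append(["<run scenario>", scenario])  # placeholder, not a real command
    return plan


for step in plan_run("win10", ["credential_access", "persistence"]):
    print(" ".join(step))
```

To actually execute the Vagrant steps, each list could be passed to `subprocess.run(cmd, check=True)` from the directory that holds the Vagrantfile.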

The YAML-driven approach provides several advantages for security teams. Scenarios become version-controlled documentation of adversarial techniques, making it easy to share test cases across the organization or incrementally expand coverage. Teams can fork Uber's example scenarios and adapt them to their specific environment—changing commands to match actual attacker tools they've observed or focusing on techniques most relevant to their threat model.

Metta's integration with the MITRE ATT&CK framework is more than cosmetic labeling. Because actions are organized by attack phase (ATT&CK calls these tactics: initial access, execution, persistence, privilege escalation, and so on), security teams can systematically verify detection coverage across the entire kill chain. You can run scenarios that test a single technique in isolation, or multi-stage attacks that chain techniques together. The latter reveal gaps in detection correlation logic, where each step looks benign as a disconnected event and the attack as a whole goes unnoticed.
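
Once each action's outcome has been matched against your alerts, coverage per phase is a simple tally. The sketch below assumes you have already produced a list of (phase, action, detected) tuples; the sample data is made up for illustration and the function names are hypothetical.

```python
from collections import defaultdict

# Each tuple is (ATT&CK phase, action name, detected?).
# The data is invented purely to demonstrate the tally.
results = [
    ("initial_access", "spearphishing_attachment", True),
    ("execution", "command_line_interface", False),
    ("execution", "powershell", True),
    ("persistence", "registry_run_keys", False),
]


def coverage_by_phase(results):
    """Return {phase: (detected_count, total_count)}."""
    totals = defaultdict(lambda: [0, 0])
    for phase, _action, detected in results:
        totals[phase][1] += 1
        if detected:
            totals[phase][0] += 1
    return {phase: tuple(counts) for phase, counts in totals.items()}


def undetected_actions(results):
    """List the actions that produced no detection, in execution order."""
    return [action for _phase, action, detected in results if not detected]


print(coverage_by_phase(results))
print(undetected_actions(results))  # ['command_line_interface', 'registry_run_keys']
```

The undetected list is usually the more actionable output, since it doubles as a backlog for detection engineering.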

Gotcha

The infrastructure requirements are Metta's biggest practical limitation. You need Redis running, Celery workers configured and monitored, Vagrant installed with VirtualBox (or another provider), and base VM images prepared with your security instrumentation. This isn't a download-and-run tool; expect to invest several days in initial setup and ongoing maintenance overhead. For smaller security teams or those without dedicated infrastructure, this operational burden may outweigh the benefits.
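
A quick preflight check can save some of that setup pain. The sketch below only tests whether the expected executables are on `PATH`; the executable names are assumptions (they vary by platform and provider), and finding a binary says nothing about whether the service is configured or running.

```python
import shutil

# Executable names are assumptions; adjust for your platform and Vagrant provider.
REQUIRED_TOOLS = ["redis-server", "celery", "vagrant", "VBoxManage"]


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


status = check_prerequisites()
for tool, found in status.items():
    print(f"{tool}: {'found' if found else 'MISSING'}")
```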

Command escaping and execution reliability can be frustrating. Because Metta passes commands through multiple layers (Python → Celery → Vagrant → SSH → target shell), getting quoting right for complex commands requires careful testing. Commands that work perfectly when typed directly into a shell may fail in Metta's pipeline because an intermediate layer interprets shell metacharacters. YAML adds its own rules: double-quoted strings require escaping backslashes and double quotes, while single-quoted strings take backslashes literally but need any embedded single quote doubled. This gets tedious for Windows commands, which use both heavily.

Metta also executes commands without interactive feedback, so techniques requiring user interaction or CAPTCHA-style challenges won't work. Network-based attack simulation is limited as well: you can run network tools inside VMs, but testing lateral movement or multi-host attacks requires manual Vagrant networking configuration that Metta doesn't abstract away.
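
The quoting problem is easy to reproduce in isolation. The sketch below wraps a command for one extra POSIX shell layer using the standard library's `shlex.quote`, then checks that the original command survives a round trip. It illustrates the general technique rather than anything Metta does internally, and it applies to POSIX shells only; `cmd.exe` and PowerShell follow different quoting rules.

```python
import shlex

# A command containing quotes and shell metacharacters that must survive
# being passed through one additional shell layer.
inner = "grep -r 'password' /etc | head -n 5 && echo done"

# shlex.quote turns the whole command into a single safely quoted argument,
# so the outer shell hands it to `sh -c` unchanged.
wrapped = "sh -c " + shlex.quote(inner)
print(wrapped)

# Round-trip check: splitting the wrapped string the way a POSIX shell would
# yields exactly three words, the last being the original command.
assert shlex.split(wrapped) == ["sh", "-c", inner]
```

Each additional layer that re-parses the string needs its own round of quoting, which is why testing a command end to end through the full pipeline beats reasoning about it on paper.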

Verdict

Use Metta if you're a mid-to-large security organization with dedicated infrastructure resources and need systematic, repeatable adversarial simulation mapped to MITRE ATT&CK. It excels in purple team workflows where detection engineers iterate on instrumentation and need quick validation that changes improve coverage. The framework shines when you have multiple detection technologies to test (EDR, SIEM, custom agents) and need to correlate their behavior across complex attack chains.

Skip Metta if you lack the infrastructure chops or cycles to maintain Redis/Celery/Vagrant stacks, need immediate out-of-the-box adversary emulation without setup investment, or primarily test network-based detections rather than host-based instrumentation. Teams wanting simpler adversarial testing should look at Atomic Red Team first; you can always graduate to Metta's automation when manual execution becomes a bottleneck. Also skip it if you need autonomous adversary emulation with planning capabilities: Metta executes predefined scripts, not adaptive attacks.
