OpenRange: Teaching AI Hackers to Fight on Procedurally-Generated Battlefields

Hook

Every time a reinforcement learning agent masters a cybersecurity environment, it becomes useless—the agent has memorized exploit paths, not learned to think. OpenRange solves this by generating completely new enterprise networks on every training reset.

Context

Reinforcement learning has conquered games from Chess to StarCraft, but cybersecurity presents a unique challenge: environment stagnation. Train an RL agent on a static network topology and it memorizes the path from DMZ to domain controller without learning generalizable penetration-testing skills. The agent becomes a sophisticated recording, not an intelligent adversary.

Traditional cyber ranges solve the realism problem but not the variation problem. Hand-crafted CTF challenges offer production-grade vulnerabilities but require weeks of expert time to design. Existing RL environments like CyberBattleSim use abstract graph representations that train quickly but don’t translate to real networks. OpenRange bridges this gap by treating network generation as a code generation problem: an LLM reads a high-level manifest describing organizational structure (“mid-size healthcare company with legacy systems”) and outputs complete Kubernetes specifications with realistic services, exploit chains, and validation tests. Every episode reset produces a novel battlefield.

Technical Insight

YAML Manifest (scenario definition)
  → LLM Builder (GPT-5.4): generates infrastructure (zones, services, vulnerabilities, attack chains, resource definitions)
  → Network Specification (exploits + services)
  → Kind Renderer: K8s deployment (pods, NetworkPolicies) onto a Kind cluster with namespaced zones
  → Validator Gate: 12-check suite verifying exploitability
  → Gymnasium Interface: ready episode exposing Red/Blue actions for RL training

System architecture — auto-generated

OpenRange’s architecture centers on four decoupled components that transform abstract scenarios into validated training grounds. The Builder consumes YAML manifests and produces Kubernetes resource definitions through LLM prompting. Here’s a simplified manifest:

scenario:
  organization: "Regional Hospital Network"
  complexity: medium
  zones:
    - name: dmz
      services: ["web", "email"]
      vulnerabilities: ["sql_injection", "weak_credentials"]
    - name: internal
      services: ["ldap", "file_share", "database"]
      data_sensitivity: high
  attack_path_depth: 3-5
  background_traffic: realistic

The Builder sends this to an LLM (currently GPT-5.4) with a specialized system prompt that generates not just service definitions but complete attack narratives. The LLM outputs a structured specification including vulnerable PHP applications with embedded SQLi, LDAP servers with predictable service accounts, and multi-hop pivot chains. Critically, it also generates the expected exploit sequence and success criteria.
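To make the shape of that output concrete, here is a minimal sketch of what such a structured specification might look like. The field names and example values are illustrative assumptions, not OpenRange's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Vulnerability:
    kind: str              # e.g. "sql_injection"
    service: str           # service the flaw lives in
    exploit_steps: list    # ordered commands the validator can replay

@dataclass
class NetworkSpec:
    zones: dict                  # zone name -> list of service images
    vulnerabilities: list        # Vulnerability entries
    expected_attack_path: list   # ordered (zone, service) pivot hops
    success_criteria: dict       # what counts as a completed attack

# Hypothetical output for the hospital manifest above
spec = NetworkSpec(
    zones={"dmz": ["apache-php", "postfix"], "internal": ["openldap", "mysql"]},
    vulnerabilities=[
        Vulnerability("sql_injection", "apache-php",
                      ["curl -s 'http://web/login.php?id=1%20OR%201=1'"]),
    ],
    expected_attack_path=[("dmz", "apache-php"), ("internal", "mysql")],
    success_criteria={"exfiltrate": "patients.db"},
)
```

The key point is that the exploit sequence and success criteria travel with the infrastructure definition, which is what lets the validator later replay the attack mechanically.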

The KindRenderer translates these specifications into running infrastructure. Each zone becomes a Kubernetes namespace with NetworkPolicies enforcing segmentation. Instead of simulating services, OpenRange deploys real containers: actual MySQL databases with seeded credentials, genuine Apache servers running vulnerable PHP, authentic LDAP directories. Here’s a generated NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dmz-isolation
  namespace: range-ep-1337-dmz
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          zone: external
    ports:
    - protocol: TCP
      port: 80
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          zone: internal
    ports:
    - protocol: TCP
      port: 389  # LDAP for auth

The ValidatorGate runs 12 mechanical checks before an episode begins. It’s not enough for the LLM to claim a SQL injection exists—the validator actually executes the exploit via kubectl exec, verifies privilege escalation, confirms lateral movement paths, and validates exfiltration channels. If any check fails, the environment is scrapped and regenerated. This prevents the RL catastrophe of training on impossible scenarios.
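A minimal sketch of one such mechanical check, assuming kubectl access to the episode's namespaces; the command, payload, and marker strings are illustrative, not OpenRange's actual check suite:

```python
import subprocess

def check_sqli(namespace: str, pod: str, payload_url: str, marker: str) -> bool:
    """Replay the claimed SQLi exploit from inside an attacker pod via
    kubectl exec, and confirm the response leaks data (the marker, e.g. a
    seeded credential) that only a successful injection would expose."""
    result = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--",
         "curl", "-s", payload_url],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0 and marker in result.stdout

def validate_or_scrap(checks) -> bool:
    # All checks must pass; any failure scraps the environment for regeneration
    return all(check() for check in checks)
```

In practice each of the 12 checks would be a closure like `lambda: check_sqli("range-ep-1337-dmz", "attacker", url, "seeded_hash")`, and a single False triggers a full rebuild.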

The most innovative component is the Gymnasium interface with coupled rewards. Red and Blue agents train simultaneously in the same environment, but their reward functions are interdependent. Red receives points for accessing sensitive data, but loses points proportional to Blue’s detection confidence. Blue gains rewards for accurate alerts but is penalized for false positives that would fatigue human analysts. This creates an adversarial co-evolution dynamic:

# Simplified reward calculation
def red_reward(data_exfiltrated_value, stealth_multiplier,
               detection_score, blue_confidence):
    # Stealthy exfiltration pays; Blue's detection confidence taxes it
    return (data_exfiltrated_value * stealth_multiplier
            - detection_score * blue_confidence)

def blue_reward(true_positive_rate, alert_precision,
                false_positive_count, analyst_fatigue_weight,
                missed_exfiltration, total_sensitive_data):
    # Precise alerts pay; noisy alerts and missed exfiltration cost
    return (true_positive_rate * alert_precision
            - false_positive_count * analyst_fatigue_weight
            - missed_exfiltration / total_sensitive_data)

As Blue improves detection, Red must discover stealthier techniques. As Red evolves evasion, Blue must refine detection heuristics. Neither can plateau without the other surpassing it.

Background traffic generation deserves attention. OpenRange deploys NPC agents that simulate legitimate enterprise activity: database queries from business intelligence tools, web requests from employee workstations, email traffic, authentication events. These NPCs use templated behaviors but randomized parameters, creating realistic noise that Blue agents must filter. A Blue agent that simply alerts on every database query will drown in false positives; it must learn contextual anomaly detection.
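The "templated behaviors, randomized parameters" idea can be sketched as follows; the template strings, parameter pools, and timing distribution are illustrative assumptions rather than OpenRange's actual NPC implementation:

```python
import random

# Templated NPC behaviors: fixed request shape, randomized fill-ins
TEMPLATES = {
    "bi_query": "SELECT * FROM {table} WHERE created > '{date}' LIMIT {n}",
    "web_get": "GET /{path} HTTP/1.1",
}

def npc_event(rng: random.Random) -> dict:
    """Emit one background-traffic event with randomized parameters and a
    jittered inter-arrival delay, so Blue sees non-periodic, realistic noise."""
    kind = rng.choice(sorted(TEMPLATES))
    if kind == "bi_query":
        body = TEMPLATES[kind].format(
            table=rng.choice(["admissions", "billing", "inventory"]),
            date=f"2025-{rng.randint(1, 12):02d}-01",
            n=rng.choice([50, 100, 500]),
        )
    else:
        body = TEMPLATES[kind].format(
            path=rng.choice(["index.html", "portal/login", "api/status"]),
        )
    # Exponential inter-arrival times approximate bursty human activity
    return {"kind": kind, "body": body, "delay_s": rng.expovariate(1 / 5.0)}

rng = random.Random(42)
events = [npc_event(rng) for _ in range(100)]
```

Because the shapes repeat but the parameters and timing do not, a Blue agent that alerts on raw query patterns drowns in noise, while one that models per-service baselines can separate NPC traffic from Red's probes.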

Gotcha

The experimental badge isn’t decorative—OpenRange sits at the intersection of three unstable technologies: LLM code generation, Kubernetes orchestration, and multi-agent RL. LLM hallucinations occasionally produce exploits that look valid syntactically but fail semantically (LDAP queries with correct structure but impossible bind logic). The validator catches most issues, but at the cost of regeneration cycles that can take 5-10 minutes when the LLM produces duds repeatedly.

Infrastructure requirements are non-trivial. Each training episode spins up 15-30 pods across multiple namespaces with real services consuming actual resources. Expect to provision 8+ CPU cores and 32GB RAM minimum for a single parallel environment. Running the 64 parallel environments needed for efficient RL training demands a legitimate Kubernetes cluster, not a laptop. LLM API costs accumulate quickly—generating complex enterprise scenarios consumes 50-100k tokens per environment, and you’ll regenerate frequently during early research iterations. Budget $500-1000/month in API costs for active development.

The containerization constraint is architectural. OpenRange can’t simulate kernel exploits, firmware attacks, or vulnerabilities requiring bare metal access. It excels at application-layer attacks (SQLi, XXE, deserialization) and network pivoting, but privilege escalation stops at container root. For research into novel exploit development or kernel-level defenses, you’ll need complementary tools.

Verdict

Use OpenRange if you’re researching adversarial reinforcement learning for cybersecurity and need training environments that mutate to prevent agent overfitting. It’s purpose-built for multi-agent red/blue co-evolution experiments where environmental diversity matters more than pixel-perfect realism. The LLM-generated scenarios provide sufficient variety to force generalization while the validator ensures quality. Skip if you need production-ready penetration testing tools (this is research infrastructure with research-grade stability), are building educational CTF platforms (hand-crafted challenges offer better pedagogical control), lack Kubernetes expertise and infrastructure (the operational overhead is substantial), or require deterministic security assessments where reproducibility trumps variation. OpenRange is a bet that the future of autonomous cyber agents requires them to train on ever-changing battlefields, not memorize static maps.
