Hero: When AI Becomes the Penetration Tester

Hook

What happens when you give an AI the ability to write its own exploit code and execute it without human oversight? Hero answers that question—and the result is both fascinating and terrifying.

Context

Traditional penetration testing tools follow deterministic patterns: they scan for known vulnerability signatures, fuzz inputs with predefined payloads, or execute user-configured test sequences. This approach works well for catalog vulnerabilities but struggles with novel attack vectors or context-specific logic flaws that require reasoning about application behavior.

Hero, created by jthack in 2023, represents an experimental leap into autonomous security testing powered by large language models. Instead of matching signatures or running scripted tests, Hero uses GPT to hypothesize about vulnerabilities, generate custom payloads, execute modified HTTP requests, and validate results—all in an iterative feedback loop. It's a proof-of-concept that asks: can AI reason about security flaws the way a human pentester does? The answer reveals both the promise of AI-driven security and the catastrophic risks of building autonomous agents without proper safeguards.

Technical Insight

Hero's architecture implements a five-stage loop that mirrors human penetration testing methodology. First, it ingests a raw HTTP request (typically captured from Burp Suite or similar proxy tools). Second, it prompts the LLM with context about the request and asks it to generate vulnerability hypotheses—SQL injection, XSS, authentication bypass, or whatever the model deems plausible based on the endpoint structure and parameters.
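The first stage can be sketched as a small parser. This is a minimal illustration, not Hero's actual parsing code: the `ParsedRequest` dataclass, the body of `parse_http_request`, and the `example.test` host are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    # Hypothetical container; field names are chosen for this sketch only
    method: str
    path: str
    version: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def parse_http_request(raw_request: str) -> ParsedRequest:
    """Split a raw HTTP/1.x request into request line, headers, and body."""
    # Normalise line endings so CRLF and LF captures both work
    text = raw_request.replace("\r\n", "\n").lstrip()
    # The first blank line separates the head from the body
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return ParsedRequest(method, path, version, headers, body)


raw = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.test\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "user=admin&pass=secret"
)
parsed = parse_http_request(raw)
print(parsed.method, parsed.path, parsed.headers["Host"])
```

A structured form like this is what lets later stages name the individual parameters and headers they want to mutate, instead of editing the raw text blindly.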

The third stage is where things get interesting and dangerous. Hero asks the LLM to generate Python code that modifies the original request to test each hypothesis. This isn't template-based payload insertion; it's dynamic code generation. The LLM might write code to inject SQL characters into a parameter, encode payloads differently, or manipulate headers based on its understanding of the vulnerability class. Here's a simplified version of what happens:

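# Hero's core loop (simplified). parse_http_request, extract_code_from_response,
# and parse_verdict are placeholder helpers that are not shown here.
# openai.ChatCompletion is the legacy interface from openai-python releases before 1.0.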
import openai
import requests

def test_vulnerability(raw_request):
    # Stage 1: Parse the request
    base_request = parse_http_request(raw_request)
    
    # Stage 2: Generate hypotheses
    prompt = f"""Given this HTTP request:
    {raw_request}
    
    Generate 3 potential vulnerability hypotheses and 
    Python code to test each one by modifying the request."""
    
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    
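    # Stage 3 & 4: Execute AI-generated code (THE DANGER ZONE)
    generated_code = extract_code_from_response(response)
    # exec() cannot create local variables inside a function, so pass an
    # explicit namespace and read the results back out of it. This assumes the
    # generated code assigns modified_request and test_response.
    namespace = {"requests": requests, "base_request": base_request}
    exec(generated_code, namespace)  # No sandboxing, no validation
    modified_request = namespace.get("modified_request")
    test_response = namespace.get("test_response")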
    
    # Stage 5: Validate results with another LLM call
    validation_prompt = f"""Did this response indicate a vulnerability?
    Original: {base_request}
    Modified: {modified_request}
    Response: {test_response}"""
    
    verdict = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": validation_prompt}]
    )
    
    return parse_verdict(verdict)

The fourth stage executes these modifications and sends the requests to the target. The fifth stage feeds the responses back to the LLM for validation—did the response indicate a successful exploit? Should we dig deeper? The model analyzes HTTP status codes, response bodies, timing differences, and error messages to determine if the vulnerability exists.
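The signals named above (status codes, response bodies, timing differences, error messages) can also be summarized deterministically before any LLM sees them. The sketch below is an assumption about how that might look, not something taken from Hero: `Observation`, `summarize_signals`, and the `ERROR_MARKERS` list are hypothetical names, and the marker strings are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical record of one response; not a type from Hero
    status: int
    body: str
    elapsed_s: float


# Illustrative substrings only; real error signatures vary by stack
ERROR_MARKERS = ("sql syntax", "traceback", "stack trace", "exception")


def summarize_signals(baseline: Observation, test: Observation) -> dict:
    """Compare the response to a modified request with the baseline response."""
    lowered = test.body.lower()
    return {
        "status_changed": baseline.status != test.status,
        "length_delta": len(test.body) - len(baseline.body),
        "time_delta_s": round(test.elapsed_s - baseline.elapsed_s, 3),
        "error_markers": [m for m in ERROR_MARKERS if m in lowered],
    }


baseline = Observation(200, "Welcome back", 0.12)
probe = Observation(500, "You have an error in your SQL syntax", 0.15)
signals = summarize_signals(baseline, probe)
print(signals)
```

Passing a compact summary like this to the validation prompt, rather than the whole response body, would also reduce the amount of untrusted text the model reads, though that is a design suggestion and not how the article describes Hero.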

This multi-stage prompting strategy is architecturally significant. Rather than asking the LLM to do everything in one shot, Hero chains together specialized prompts: ideation, code generation, and validation. The cycle of hypothesizing, acting, and observing resembles the ReAct (Reasoning + Acting) pattern that became popular in AI agent frameworks like LangChain and AutoGPT. Each stage has a focused responsibility, and the output of one stage becomes the input to the next.
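The chaining idea can be shown without any real model. In the sketch below an LLM is modelled as a plain function from prompt text to completion text, so a stub can stand in for GPT. The function names, the prompt wording, and `fake_llm` are all assumptions for illustration; none of them come from Hero's source.

```python
from typing import Callable

# An LLM is modelled as any function from prompt text to completion text
LLM = Callable[[str], str]


def ideate(llm: LLM, raw_request: str) -> list[str]:
    """Stage: ask for hypotheses, one per line."""
    reply = llm(f"List possible vulnerabilities, one per line:\n{raw_request}")
    return [ln.strip("- ").strip() for ln in reply.splitlines() if ln.strip()]


def generate_test(llm: LLM, raw_request: str, hypothesis: str) -> str:
    """Stage: ask for a test plan for a single hypothesis."""
    return llm(f"Write a test for: {hypothesis}\nRequest:\n{raw_request}")


def validate(llm: LLM, hypothesis: str, observation: str) -> bool:
    """Stage: ask for a yes/no verdict on what was observed."""
    reply = llm(
        f"Hypothesis: {hypothesis}\nObservation: {observation}\nAnswer YES or NO."
    )
    return reply.strip().upper().startswith("YES")


def run_pipeline(llm: LLM, raw_request: str,
                 execute: Callable[[str], str]) -> dict[str, bool]:
    """Feed each stage's output into the next stage."""
    results = {}
    for hypothesis in ideate(llm, raw_request):
        plan = generate_test(llm, raw_request, hypothesis)
        observation = execute(plan)  # caller decides how a plan is run
        results[hypothesis] = validate(llm, hypothesis, observation)
    return results


def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model so the sketch runs offline
    if prompt.startswith("List possible"):
        return "- SQL injection\n- Reflected XSS"
    if prompt.startswith("Write a test"):
        return "plan for " + prompt.splitlines()[0]
    return "YES" if "SQL injection" in prompt else "NO"


results = run_pipeline(fake_llm, "GET /item?id=1 HTTP/1.1",
                       execute=lambda plan: "observed: " + plan)
print(results)
```

Keeping `execute` as an injected callable makes the riskiest step, running whatever the model proposed, a decision of the caller instead of something buried in the pipeline.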

The use of exec() for dynamic code execution is what makes Hero both powerful and catastrophic. Whatever Python the LLM generates runs inside Hero's own process, with the same privileges as the tester running it. If a target application's responses contain prompt-injected instructions, the LLM might generate code that exfiltrates data, establishes reverse shells, or compromises the testing machine. The author acknowledges this explicitly in the README: this is a proof-of-concept demonstrating autonomous AI security testing, not a production tool.
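One partial mitigation is to inspect generated code before running it. The sketch below uses Python's standard `ast` module to flag imports outside an allowlist and direct calls to a few dangerous built-ins. It is an assumption about what a pre-check could look like, not part of Hero, and the names `ALLOWED_IMPORTS`, `BLOCKED_CALLS`, and `find_violations` are hypothetical. A static check like this is easy to bypass and is not a sandbox; it only catches the most obvious cases.

```python
import ast

# Illustrative policy; a real allowlist would need careful review
ALLOWED_IMPORTS = {"requests", "urllib.parse", "json", "base64"}
BLOCKED_CALLS = {"exec", "eval", "compile", "open", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return reasons to reject generated code; an empty list means no flags."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name not in ALLOWED_IMPORTS:
                    problems.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module not in ALLOWED_IMPORTS:
                problems.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only catches direct calls by name, not aliases or getattr tricks
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}()")
    return problems


benign = "import requests\nrequests.get('http://example.test')"
hostile = "import os\nos.system('id')"
print(find_violations(benign))
print(find_violations(hostile))
```

Even with a check like this in front of it, generated code would still need to run in an isolated environment, since the check says nothing about what an allowed library is then used to do.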

What makes Hero educationally valuable is how transparently it exposes the architecture of AI agents. You can see the prompt engineering strategies, the feedback loops, the validation patterns. It's a working implementation of concepts that security researchers were theorizing about in 2023. The code is readable, the approach is straightforward, and the dangers are obvious—making it an excellent case study for understanding both the potential and the perils of autonomous AI systems.

Gotcha

The exec() call isn't a minor oversight; it's a fundamental flaw that makes Hero unsuitable for any real-world use. If you test an application that reflects input in its responses (even in error messages), and a response contains crafted instructions, the LLM might interpret those instructions as part of the testing strategy and generate malicious code. An attacker who knows you're using Hero could embed prompt injections in their application specifically to compromise your testing infrastructure.

Beyond the security flaw, Hero has no scope controls, rate limiting, or safeguards. It will test whatever target you point it at, potentially generating hundreds of requests in rapid succession. There's no mechanism to prevent it from testing production systems, no way to limit which endpoints it probes, and no built-in compliance with responsible disclosure practices. The token costs can also spiral quickly—each iteration requires multiple GPT-4 API calls, and complex applications could rack up significant OpenAI bills before finding anything useful. The tool assumes you have deep knowledge of both penetration testing and AI behavior to use it responsibly, which is a dangerous assumption for a publicly available repository.
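The missing controls described above are not hard to express. The sketch below shows one possible shape for a guard that every outgoing request would pass through: a host allowlist, a hard request budget, and a minimum interval between requests. `ScopeGuard` and its parameters are hypothetical and do not exist in Hero; the `staging.example.test` host is a placeholder.

```python
import time
from urllib.parse import urlsplit


class ScopeGuard:
    """Hypothetical gate to call before every outgoing test request."""

    def __init__(self, allowed_hosts, max_requests, min_interval_s=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.allowed_hosts = set(allowed_hosts)
        self.max_requests = max_requests
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so tests need not wait
        self._sleep = sleep
        self._sent = 0
        self._last = None

    def check(self, url: str) -> None:
        host = urlsplit(url).hostname
        # Scope control: refuse anything not explicitly allowed
        if host not in self.allowed_hosts:
            raise PermissionError(f"{host} is out of scope")
        # Budget: caps request volume, and indirectly API spend
        if self._sent >= self.max_requests:
            raise RuntimeError("request budget exhausted")
        # Rate limit: wait until the minimum interval has passed
        if self._last is not None:
            wait = self.min_interval_s - (self._clock() - self._last)
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        self._sent += 1


guard = ScopeGuard({"staging.example.test"}, max_requests=2, min_interval_s=0)
guard.check("https://staging.example.test/login")
guard.check("https://staging.example.test/search?q=1")
```

A third call to `guard.check` would raise `RuntimeError`, and any URL on a different host would raise `PermissionError` before a single packet is sent.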

Verdict

Use if: You're a security researcher studying AI agent architecture and want to understand how LLM-driven autonomous systems make decisions, chain prompts, and implement feedback loops. Hero is valuable educational reference material for learning about multi-stage AI workflows and the challenges of building agents that take real-world actions. It's also worth examining if you're building your own AI security tools and want to see what NOT to do from a safety perspective.

Skip if: You need actual penetration testing capabilities, lack the expertise to safely contain this tool's dangerous exec() behavior, or expect a production-ready solution. For real security work, stick with Burp Suite, ZAP, or Nuclei. If you want to experiment with AI-assisted security testing, build your own implementation using LangChain with proper Docker sandboxing and input validation rather than running this code directly.
