RedAI: When Your Security Scanner Needs to Actually Prove the Exploit Works
Hook
Most security scanners tell you there might be a vulnerability. RedAI spawns a Chrome instance, exploits the finding, takes screenshots, and hands you a working proof-of-concept script.
Context
Security teams drown in false positives. Run Semgrep against a typical web application and you'll get hundreds of findings—SQL injection here, XSS there, path traversal everywhere. The cruel reality? Maybe 10% are actually exploitable in your specific context. The other 90% require a human to manually verify: spin up the dev environment, authenticate, navigate to the vulnerable endpoint, craft the payload, observe the result, document the evidence. This manual triage is why security findings sit in JIRA for months.
RedAI attacks this problem by splitting the workflow into two distinct AI agents: scanner agents that perform static analysis to identify candidate vulnerabilities, and validator agents that actually exploit those findings in live environments. The validator agents aren't running in some sandboxed analyzer—they have full browser automation via Chrome DevTools Protocol, iOS Simulator control via xcrun simctl, or whatever custom tooling you expose through the plugin system. They write exploit scripts, manipulate UI elements, intercept network traffic, and collect artifacts exactly like a human penetration tester would. The output isn't a report saying 'this might be vulnerable'—it's a timestamped PoC script with screenshots and HTTP logs proving exploitation worked.
Technical Insight
The architecture implements a nine-stage pipeline that checkpoints state to ~/.redai/ between phases. In order, the stages described here are: threat modeling to identify high-risk areas; file prioritization based on that model (which avoids spending LLM context on irrelevant code); splitting prioritized files into analysis units; scanning those units for vulnerabilities; aggregating findings; generating a validation plan for each finding; executing those plans in isolated environments; and producing evidence-backed reports.
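A checkpointed pipeline of this shape can be sketched roughly as follows. This is an illustration rather than RedAI's code: `Stage`, `CheckpointStore`, `MemoryStore`, and `runPipeline` are hypothetical names, the demo stages are placeholders, and an in-memory map stands in for files under ~/.redai/.

```typescript
// Hypothetical sketch: none of these names come from RedAI itself.
type Stage<S> = { name: string; run: (state: S) => S };

interface CheckpointStore<S> {
  load(stage: string): S | undefined;
  save(stage: string, state: S): void;
}

// In-memory stand-in for on-disk checkpoints (an assumption for the demo).
class MemoryStore<S> implements CheckpointStore<S> {
  private data = new Map<string, string>();
  load(stage: string): S | undefined {
    const raw = this.data.get(stage);
    return raw === undefined ? undefined : (JSON.parse(raw) as S);
  }
  save(stage: string, state: S): void {
    this.data.set(stage, JSON.stringify(state));
  }
}

// Runs stages in order. A stage with a saved checkpoint is skipped and its
// saved state reused, so an interrupted run resumes instead of starting over.
function runPipeline<S>(
  stages: Stage<S>[],
  initial: S,
  store: CheckpointStore<S>,
): { state: S; executed: string[] } {
  let state = initial;
  const executed: string[] = [];
  for (const stage of stages) {
    const cached = store.load(stage.name);
    if (cached !== undefined) {
      state = cached;
      continue;
    }
    state = stage.run(state);
    store.save(stage.name, state);
    executed.push(stage.name);
  }
  return { state, executed };
}

type DemoState = { log: string[] };
const demoStages: Stage<DemoState>[] = ["threat-model", "prioritize", "scan"].map((name) => ({
  name,
  run: (s: DemoState) => ({ log: [...s.log, name] }),
}));

const checkpoints = new MemoryStore<DemoState>();
const firstRun = runPipeline(demoStages, { log: [] }, checkpoints);
const resumedRun = runPipeline(demoStages, { log: [] }, checkpoints);
```

In this sketch the second call executes no stages, because every stage already has a checkpoint. That is the property that makes a long, LLM-heavy pipeline tolerable to interrupt.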
What makes this interesting is the validator plugin interface. Validators aren't constrained to safe API calls—they get arbitrary tool access. Here's how the iOS Simulator validator exposes capabilities:
// From validators/ios-simulator/index.ts
export const tools = [
  {
    name: 'run_shell_command',
    description: 'Execute shell commands on the host macOS system',
    parameters: {
      command: { type: 'string', description: 'Shell command to execute' }
    }
  },
  {
    name: 'interact_with_ui',
    description: 'Tap, swipe, or type in the iOS Simulator',
    parameters: {
      action: { type: 'string', enum: ['tap', 'swipe', 'type'] },
      coordinates: { type: 'object', properties: { x: 'number', y: 'number' } },
      text: { type: 'string' }
    }
  },
  {
    name: 'collect_evidence',
    description: 'Capture screenshot or extract app data',
    parameters: {
      type: { type: 'string', enum: ['screenshot', 'app_container', 'logs'] }
    }
  }
]
Validator agents get those tools injected into their LLM context. They reason about how to exploit a finding, then call tools to execute the attack. The agent never sees the TypeScript functions behind a tool: from its perspective, it requests a capability by name and receives a result. This abstraction lets you swap environments without rewriting the validator: the same validator logic that exploits XSS in Chrome can exploit it in the iOS Simulator, because each environment supplies its own implementation of the same tool interface.
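To make the swap concrete, here is a minimal sketch of name-based tool dispatch. It assumes a registry keyed by tool name. `ToolCall`, `ToolResult`, `ToolImpl`, and `makeDispatcher` are hypothetical names, and the two environments are stubs that perform no real I/O. Real tool implementations would almost certainly be asynchronous; this version is synchronous for brevity.

```typescript
// Hypothetical sketch: these types and names are illustrative, not RedAI's API.
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolResult = { ok: boolean; output: string };
type ToolImpl = (args: Record<string, unknown>) => ToolResult;

// Builds a dispatcher for one environment. The agent only ever sees tool
// names; which function runs depends on the registry passed in here.
function makeDispatcher(registry: Map<string, ToolImpl>) {
  return (call: ToolCall): ToolResult => {
    const impl = registry.get(call.name);
    if (impl === undefined) {
      // Unknown tools yield an error result instead of throwing, so the
      // agent can read the failure and choose a different action.
      return { ok: false, output: `unknown tool: ${call.name}` };
    }
    return impl(call.args);
  };
}

// Two stub environments exposing the same tool name.
const chromeTools = new Map<string, ToolImpl>([
  ["collect_evidence", () => ({ ok: true, output: "chrome screenshot" })],
]);
const iosTools = new Map<string, ToolImpl>([
  ["collect_evidence", () => ({ ok: true, output: "simulator screenshot" })],
]);

const sameCall: ToolCall = { name: "collect_evidence", args: { type: "screenshot" } };
const viaChrome = makeDispatcher(chromeTools)(sameCall);
const viaIos = makeDispatcher(iosTools)(sameCall);
const unknownTool = makeDispatcher(iosTools)({ name: "run_sql", args: {} });
```

The identical `sameCall` object produces different results depending only on which registry backs the dispatcher, which is the property the paragraph above describes.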
The environment lifecycle is critical. Before any validators run, the environment plugin's setup() function executes once to establish authenticated state. For web targets, that means launching Chrome via CDP, navigating to the login page, filling credentials, waiting for the dashboard. For iOS, it clones a template simulator (so validators can trash state without corrupting the base image), boots it, launches the app, and navigates to the initial screen. That environment then stays alive across multiple validator executions—this is cheaper and faster than spinning up fresh environments per finding.
// Environment plugin interface
interface EnvironmentPlugin {
  setup(): Promise<void>;                 // One-time authenticated state prep
  teardown(): Promise<void>;              // Cleanup after all validators finish
  tools: ToolDefinition[];                // Capabilities exposed to validators
  collectEvidence(): Promise<Evidence[]>; // Gather artifacts post-exploitation
}
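The lifecycle described above (one setup, many validators, one teardown) might be driven by a loop like the one below. This is a sketch under assumptions, not RedAI's runner: `LifecycleEnv`, `Validator`, `Outcome`, and `runAll` are hypothetical names, and the stub environment only records the order of calls.

```typescript
// Hypothetical sketch of the lifecycle; not RedAI's actual runner.
interface LifecycleEnv {
  setup(): Promise<void>;
  teardown(): Promise<void>;
}

type Validator = { id: string; run: () => Promise<"proved" | "disproved"> };
type Outcome = { id: string; status: "proved" | "disproved" | "error" };

// setup() runs once, every validator shares the resulting environment, and
// teardown() runs even if a validator throws.
async function runAll(env: LifecycleEnv, validators: Validator[]): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  await env.setup();
  try {
    for (const v of validators) {
      try {
        outcomes.push({ id: v.id, status: await v.run() });
      } catch {
        // One crashing validator should not abort the remaining ones.
        outcomes.push({ id: v.id, status: "error" });
      }
    }
  } finally {
    await env.teardown();
  }
  return outcomes;
}

// Stub environment that records call order (no real browser or simulator).
const events: string[] = [];
const stubEnv: LifecycleEnv = {
  setup: async () => { events.push("setup"); },
  teardown: async () => { events.push("teardown"); },
};

const lifecycleDemo = runAll(stubEnv, [
  { id: "F-1", run: async () => { events.push("F-1"); return "proved"; } },
  { id: "F-2", run: async () => { events.push("F-2"); throw new Error("crashed"); } },
]).then((outcomes) => ({ outcomes, events }));
```

Placing `teardown()` in a `finally` block matters more here than in most code, since a leaked Chrome process or booted simulator keeps authenticated state alive on the host.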
The validation plan generation is a separate phase from execution, which is smarter than it sounds. After scanners identify findings, RedAI generates a validation plan for each one—essentially a strategy document explaining how a validator would prove exploitation. You can review these plans before execution, modify them, or skip expensive validations. This matters because validators burn LLM API credits and environment runtime. If the plan says 'I'll attempt SQL injection by sending 500 different payloads,' you can intervene before wasting $20 in API calls.
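A review gate between planning and execution could be as simple as the sketch below. The article does not show RedAI's plan format, so the `ValidationPlan` shape, the `estimatedCostUsd` field, and the `reviewPlans` helper are all assumptions made for illustration.

```typescript
// Hypothetical plan shape: RedAI's real plan format is not shown in this article.
type ValidationPlan = { findingId: string; strategy: string; estimatedCostUsd: number };

// Splits plans into those cheap enough to run unattended and those that a
// human should look at (or drop) before any credits are spent.
function reviewPlans(plans: ValidationPlan[], maxCostUsd: number) {
  const approved: ValidationPlan[] = [];
  const skipped: ValidationPlan[] = [];
  for (const plan of plans) {
    (plan.estimatedCostUsd <= maxCostUsd ? approved : skipped).push(plan);
  }
  return { approved, skipped };
}

const reviewed = reviewPlans(
  [
    { findingId: "F-1", strategy: "single reflected XSS payload", estimatedCostUsd: 0.4 },
    { findingId: "F-2", strategy: "500 SQL injection payloads", estimatedCostUsd: 20 },
  ],
  5, // budget per finding, in USD
);
```

With a 5 USD ceiling, the single-payload plan goes through and the 500-payload plan is held back for a person to trim or reject.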
Evidence collection happens automatically during and after exploitation. Every tool call gets logged with timestamps. Network traffic flows through an interception proxy. Screenshots capture UI state. The iOS validator even extracts the app's container directory to grab SQLite databases and plist files. All of this gets serialized per finding:
interface Finding {
  id: string;
  vulnerability: string;
  location: { file: string; line: number };
  scanner_confidence: number;
  validation_status: 'proved' | 'disproved' | 'error';
  evidence: {
    transcript: Message[];        // Full agent conversation
    artifacts: {
      screenshots: string[];      // Timestamped PNGs
      http_logs: RequestResponse[];
      poc_script: string;         // Executable reproduction script
      container_data?: string;    // iOS app data extraction
    }
  }
}
This evidence structure is what separates RedAI from traditional scanners. You're not getting a severity rating and a CWE number—you're getting a complete exploitation narrative with proof. When the security team presents findings to engineering, they can show the actual screenshot of admin panel access via SQL injection, not just claim it's theoretically possible.
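As a rough illustration of consuming that structure, the sketch below renders a one-line summary per finding for a hand-off to engineering. `FindingLite` is a trimmed copy of the `Finding` shape containing only the fields read here, `summarize` is a hypothetical helper, and the sample values are invented.

```typescript
// Trimmed copy of the Finding shape; only the fields this sketch reads.
type FindingLite = {
  id: string;
  vulnerability: string;
  location: { file: string; line: number };
  validation_status: "proved" | "disproved" | "error";
  evidence: { artifacts: { screenshots: string[]; http_logs: unknown[]; poc_script: string } };
};

// Hypothetical helper: one line per finding, listing proof only when the
// validator actually proved exploitation.
function summarize(f: FindingLite): string {
  const a = f.evidence.artifacts;
  const proof =
    f.validation_status === "proved"
      ? `${a.screenshots.length} screenshot(s), ${a.http_logs.length} HTTP exchange(s), PoC ${a.poc_script ? "attached" : "missing"}`
      : "no proof collected";
  return `[${f.validation_status}] ${f.id} ${f.vulnerability} at ${f.location.file}:${f.location.line} (${proof})`;
}

// Invented sample data, for illustration only.
const exampleSummary = summarize({
  id: "F-7",
  vulnerability: "SQL injection",
  location: { file: "src/routes/login.ts", line: 42 },
  validation_status: "proved",
  evidence: { artifacts: { screenshots: ["admin.png"], http_logs: [{}, {}], poc_script: "curl ..." } },
});
// exampleSummary === "[proved] F-7 SQL injection at src/routes/login.ts:42 (1 screenshot(s), 2 HTTP exchange(s), PoC attached)"
```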
Gotcha
The biggest risk is the complete lack of validator sandboxing. Validators have full shell access and network capabilities of the host system. If an LLM hallucinates a command or gets prompt-injected by malicious code in the repository being scanned, it can exfiltrate your source code, pivot to internal networks, or corrupt the environment. The threat model assumes you're running authorized penetration tests against your own applications—if you point this at untrusted code, you're giving an AI unrestricted access to your workstation.
Validator 'disproved' verdicts are dangerously misleading. When a validator fails to exploit a finding, RedAI marks it as disproved. But that doesn't mean the vulnerability is invalid—it just means this particular AI agent, with this particular prompt, using these particular tools, couldn't figure out exploitation. Maybe the payload needed encoding the agent didn't try. Maybe the race condition requires precise timing. Maybe the vulnerability is real but the validator got unlucky. Consumers of RedAI reports need to understand that 'disproved' means 'automation failed,' not 'false positive confirmed.' You still need human review of disproved findings, which undermines the triage automation value proposition.
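One way to encode that caveat downstream is a triage policy that treats only 'proved' as a conclusion. The sketch below is a suggestion, not part of RedAI: `NextStep` and `nextStep` are hypothetical names, and the status values are the three from the `Finding` interface.

```typescript
type ValidationStatus = "proved" | "disproved" | "error";
type NextStep = "fix" | "human-review" | "rerun";

// Hypothetical triage policy: "disproved" is routed to a person, because it
// means the automation failed to exploit, not that the finding is false.
function nextStep(status: ValidationStatus): NextStep {
  switch (status) {
    case "proved":
      return "fix";
    case "disproved":
      return "human-review";
    case "error":
      return "rerun";
  }
}

const sampleStatuses: ValidationStatus[] = ["proved", "disproved", "disproved", "error"];
const triageQueue = sampleStatuses.map(nextStep);
```

Nothing in this mapping ever closes a finding automatically, which is the point: the automation narrows the queue but does not replace the reviewer.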
Environment setup friction scales badly. Before every scan, you manually prepare authenticated state—log in, navigate to the initial page, ensure the app is in the right state. There's no session recording and replay, no headless authentication flows. If you're testing 50 microservices, you're manually authenticating 50 times. And if your authentication involves 2FA or CAPTCHAs, you're stuck babysitting the setup phase instead of letting the pipeline run unattended.
Verdict
Use if: You're doing authorized red team assessments of web or mobile applications where you control the target environment and need validated exploits with evidence, not just vulnerability reports. The iOS Simulator validator alone justifies adoption for mobile security teams: most tools can't manipulate running binaries or collect runtime artifacts like RedAI does. You have budget for LLM API costs (each validation burns credits driving browser/simulator automation) and accept that validators might hallucinate or fail silently.
Skip if: You need CI/CD integration (too slow, no sandboxing makes it dangerous in automated pipelines), you're doing bug bounty hunting (running exploits without explicit authorization is legally risky), you require audit trails for compliance (agent reasoning is opaque and findings may be hallucinated), or you're scanning untrusted code (validators have unrestricted host access). For those cases, stick with deterministic SAST tools like Semgrep and accept the manual triage burden.