Back to Articles

Red Teaming Your LLM: Inside Agentic Security's Vulnerability Scanner

[ View on GitHub ]

Red Teaming Your LLM: Inside Agentic Security’s Vulnerability Scanner

Hook

Your carefully crafted LLM guardrails might survive basic prompt injection tests, but can they withstand 416 adversarial jailbreaks from the aya-23-8B dataset, multimodal attacks, or reinforcement learning-based probes?

Context

As organizations rush to deploy LLM-powered agents and chatbots, a critical gap has emerged: traditional application security testing doesn’t translate to AI systems. You can’t just run OWASP ZAP against a chatbot and call it secure. LLMs are vulnerable to entirely new attack classes—prompt injections that exfiltrate training data, multi-step jailbreaks that bypass safety filters, and adversarial inputs that cause hallucinations or policy violations.

Agentic Security emerged to fill this void. It’s a specialized vulnerability scanner built for the AI era, treating LLM security testing as a first-class concern rather than an afterthought. With over 1,800 GitHub stars, it’s become a go-to tool for developers who need to answer one critical question before deployment: will my AI system hold up against adversarial users?

Technical Insight

Core Scanner

Attack Datasets

Configure scan

Load datasets

Attack vectors

Custom prompts

Curated prompts

Inject <>

POST with attack vector

Response

Detect violations

Display results

Web UI Dashboard

TOML Config Loader

Dataset Aggregator

Hugging Face Datasets

Local CSV Files

Scan Engine

HTTP Request Builder

LLM API Endpoint

Response Analyzer

Report Generator

System architecture — auto-generated

At its core, Agentic Security uses a deceptively simple architecture: it’s essentially a fuzzer for LLMs that speaks HTTP. You define your LLM endpoint using a plain-text HTTP specification, and the tool bombards it with attack vectors from curated datasets, analyzing responses for safety violations.

The HTTP spec format is refreshingly straightforward. Instead of forcing you into a proprietary SDK or configuration schema, you write literal HTTP requests:

POST https://api.openai.com/v1/chat/completions
Authorization: Bearer sk-xxxxxxxxx
Content-Type: application/json

{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "<<PROMPT>>"}],
     "temperature": 0.7
}

The <<PROMPT>> placeholder gets replaced with attack vectors during scanning. This design choice makes the tool remarkably flexible—it works with OpenAI, Anthropic, local models, custom endpoints, or any API that accepts JSON over HTTP. No adapter code required.

The real power comes from its dataset aggregation system. Agentic Security pulls attack vectors from multiple sources: Hugging Face datasets (like simonycl/aya-23-8B_advbench_jailbreak with 416 jailbreak attempts), local CSV files you provide, and dynamic backend modules. You can mix and match datasets to build comprehensive test suites:

[modules.aya-23-8B_advbench_jailbreak]
dataset_name = "simonycl/aya-23-8B_advbench_jailbreak"

[modules.AgenticBackend]
dataset_name = "AgenticBackend"
[modules.AgenticBackend.opts]
port = 8718
modules = ["encoding"]

The TOML configuration format enables CI/CD integration—you can version control your security policies, set failure thresholds, and gate deployments on scan results. The max_th = 0.3 parameter means if more than 30% of attack vectors succeed, the scan fails. You can tune this based on your risk tolerance and model maturity.

What sets Agentic Security apart from simple prompt injection testing is its support for sophisticated attack patterns. Multi-step jailbreaks simulate conversations where an attacker gradually manipulates context across multiple turns. Multimodal attacks test vulnerabilities in image and audio processing, not just text. The RL-based attack mode uses reinforcement learning to adaptively probe defenses, evolving attack strategies based on model responses.

The scanning workflow appears to run as a Uvicorn server (default port 8718, configurable via —port), providing both a web UI for interactive testing and a CLI for automation. Starting it requires:

pip install agentic_security
agentic_security
# or with custom configuration:
agentic_security --port=PORT --host=HOST

Within seconds, you’re greeted with log output showing loaded datasets and a running server. The UI displays real-time scan progress, success/failure rates, and detailed reports on which attack vectors breached your defenses. For CI pipelines, you initialize a configuration file with agentic_security init, customize thresholds, and run headless scans with agentic_security ci.

The modular backend system deserves special attention. The configuration shows support for specific attack modules (the example includes “encoding” in the modules list), suggesting the tool can test various attack vector types. This modularity means you’re not stuck with a one-size-fits-all test suite—you can focus on attack vectors relevant to your deployment context.

Gotcha

Agentic Security’s documentation reveals some rough edges. Multiple sections in the README are marked “TBD,” including critical workflows like adding custom LLM integration templates. If you need detailed guidance on interpreting scan results or building remediation strategies, you’ll be spelunking through GitHub issues.

The tool’s effectiveness is fundamentally constrained by its attack datasets. While it ships with curated collections like the aya-23-8B jailbreak prompts, these represent known attack patterns. Emerging jailbreak techniques, zero-day prompt injection vectors, or domain-specific vulnerabilities won’t appear until someone adds them to a dataset. You’re essentially testing against yesterday’s threats unless you invest in maintaining custom attack collections.

There’s also no built-in guidance on what constitutes a “successful” attack. The tool reports which prompts triggered responses, but determining whether a response actually violates your safety policies requires manual review or custom scoring logic. A threshold of 30% failures sounds precise, but it’s meaningless if you haven’t defined what failure means for your use case. Is a slightly off-color joke equivalent to leaked PII? The tool won’t tell you.

Verdict

Use Agentic Security if you’re deploying LLM-powered systems into production and need repeatable, automatable security testing that integrates with CI/CD pipelines. It’s particularly valuable for teams building agentic AI systems where multi-step interactions create complex attack surfaces, or when you need to demonstrate due diligence around AI safety to stakeholders. The HTTP-based integration model makes it trivial to test any API-accessible LLM, and the dataset aggregation system lets you build comprehensive test suites from community-contributed attack vectors. Skip it if you require enterprise-grade documentation and support, need testing beyond LLM-specific attacks (it won’t help with traditional SQL injection or XSS), or are working with non-API models where HTTP integration doesn’t apply. Also skip if you expect the tool to tell you what good looks like—it’s a scanner, not a policy engine, so you’ll need mature internal standards for what constitutes acceptable LLM behavior.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/cybersecurity/msoedov-agentic-security.svg)](https://starlog.is/api/badge-click/cybersecurity/msoedov-agentic-security)