Awesome-AI-Hacking-Agents: Mapping the Frontier of Autonomous Security Testing
Hook
In less than four years, AI hacking agents have evolved from deep reinforcement learning experiments to LLM-powered autonomous penetration testers—and someone finally cataloged approximately 64 of them in one place.
Context
The intersection of AI and offensive security is moving faster than most security professionals can track. Since 2021, researchers and security companies have been building autonomous agents that can perform reconnaissance, exploit vulnerabilities, and even compete in CTF challenges. But there’s been no central registry—until now.
Evan Thomas Luke’s Awesome-AI-Hacking-Agents repository tackles a genuine problem: discovery paralysis in an emerging field. When you’re researching AI-powered security testing, you face a fragmented landscape spanning academic papers, GitHub repos with varying maturity levels, and commercial tools that may or may not be open-sourced. This repository consolidates what the author describes as “~64 open source AI hacking agents,” listed chronologically from 2021 to 2025, creating a living document of how AI agents are reshaping offensive security. It’s explicitly a work in progress, but that’s actually a feature: the field is moving too fast for a static reference to remain useful.
Technical Insight
What makes this repository architecturally interesting isn’t code execution—it’s a curated list, not a framework—but rather its integration layer for AI development workflows. Luke has connected each listed tool to two external documentation systems: DeepWiki MCP and Google CodeWiki. This isn’t just about human readers clicking links; it’s about making the catalog programmatically queryable by AI agents themselves.
The DeepWiki MCP (Model Context Protocol) integration is the clever bit. According to the README, a public MCP server at https://mcp.deepwiki.com/mcp exposes structured tools like ask_question, read_wiki_structure, and read_wiki_contents. The README indicates you can wire this into AI IDEs—Cursor, Claude Code, Windsurf, Continue.dev—typically with minimal configuration (the author describes it as “usually just a one-line config addition with no auth needed for public repos”). Once connected, your AI coding assistant can query documentation about any listed security tool while you’re developing.
Here’s how you’d typically configure an MCP server in an MCP-compatible IDE (based on the repository’s guidance, though specific config syntax varies by tool):
{
  "mcpServers": {
    "deepwiki": {
      "url": "https://mcp.deepwiki.com/mcp",
      "transport": "http"
    }
  }
}
Then instruct your agent: “Use DeepWiki MCP to compare the tool architecture of PentestGPT and hackingBuddyGPT.” The agent calls read_wiki_structure on both repos, retrieves documentation, and synthesizes a comparison without you manually digging through codebases. This is meta-tooling for building security agents—using AI agents to research other AI agents.
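Under the hood, an MCP client talks to the server using JSON-RPC 2.0 messages; a tool invocation like the one described above travels as a `tools/call` request. The sketch below builds such a message by hand, purely for illustration. The argument names (`repoName`, `question`) and the repo identifier are assumptions about DeepWiki’s tool schema, not confirmed from its documentation; in practice your IDE’s MCP client constructs and sends this for you.

```python
import json

def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call message as an MCP client would.

    The payload envelope (jsonrpc/id/method/params) follows the MCP
    specification; the tool's argument names here are assumptions.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)

# Hypothetical invocation of DeepWiki's ask_question tool.
msg = build_tool_call(
    "ask_question",
    {
        "repoName": "GreyDGL/PentestGPT",  # illustrative repo identifier
        "question": "What is the high-level architecture of this tool?",
    },
)
print(msg)
```

The point is that once the server is wired into your IDE, these messages are invisible plumbing; you only ever see the natural-language prompt and the synthesized answer.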
The repository structure itself is deceptively simple: a single README with a markdown table. The visible portion shows entries numbered up to #12, though the author notes there are “~64 open source AI hacking agents listed” total. Each row tracks name, GitHub URL, creation date, and notes about papers or benchmarks. The chronological sorting tells a story. AutoPentest-DRL from March 2021 represents the deep reinforcement learning era—agents trained on simulated network environments. Jump to 2023, and you see PentestGPT (noted as appearing in USENIX Security 2024) leveraging large language models for natural language-driven pentesting. By 2024-2025, the table includes projects like D-CIPHER (arXiv:2502.10931) and other tools explicitly designed for CTF competitions.
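Because the catalog is just a markdown table, it is trivially machine-readable. Here is a minimal sketch of extracting rows into dicts; the column layout (name, URL, creation date, notes) follows the article’s description of the table, and the sample rows are illustrative stand-ins rather than verbatim data from the README.

```python
import re

# One capture group per column; lazy matches trim around the pipes.
ROW = (
    r"^\|\s*(?P<name>[^|]+?)\s*"
    r"\|\s*(?P<url>https?://[^|\s]+)\s*"
    r"\|\s*(?P<created>[^|]+?)\s*"
    r"\|\s*(?P<notes>[^|]*?)\s*\|$"
)

def parse_table(markdown: str) -> list[dict]:
    """Collect every table row that matches the assumed column layout."""
    rows = []
    for line in markdown.splitlines():
        m = re.match(ROW, line.strip())
        if m:
            rows.append(m.groupdict())
    return rows

# Illustrative sample rows, not copied from the repository.
sample = """
| AutoPentest-DRL | https://github.com/crond-jaist/AutoPentest-DRL | 2021-03 | Deep RL era |
| PentestGPT | https://github.com/GreyDGL/PentestGPT | 2023-02 | USENIX Security 2024 |
"""
entries = parse_table(sample)
print([e["name"] for e in entries])  # → ['AutoPentest-DRL', 'PentestGPT']
```

A parser like this is how you would bootstrap your own feature matrix or benchmark tracker on top of the catalog.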
What’s missing is equally revealing. Luke notes the repository needs benchmark scores, proper categorization, and completed CodeWiki/DeepWiki links. Many tools lack comparative data—no feature matrix showing which agents support memory persistence, which integrate with Metasploit, or which can operate autonomously versus requiring human-in-the-loop confirmation. The table includes an empty “Notes” column for several entries, and the author admits “some technically aren’t agents.” This incompleteness isn’t necessarily poor documentation; it reflects the chaotic, rapidly evolving state of the field itself.
The repository also establishes a community hub via an AI Hacking Discord (discord.gg/9AJnkNe6RE), positioning itself not just as a reference but as a gathering point for developers working on AI security agents. This community dimension is crucial for a work-in-progress catalog—users can submit missing tools via GitHub issues, turning the repository into a crowdsourced intelligence effort.
From a developer’s perspective, the real value proposition is time savings during the research phase. Instead of hunting through academic search engines, GitHub trending, and security conference proceedings, you get a single chronological index with direct links to both source code and structured documentation. The MCP integration means you can query this knowledge base programmatically while building your own agent, asking questions like “Which tools from 2024 support multi-step reasoning?” and potentially getting synthesized answers from the DeepWiki corpus.
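Even without the MCP layer, simple local filtering over a parsed copy of the table answers the easy half of such questions (the “from 2024” part; the “multi-step reasoning” part still needs the documentation corpus). A toy sketch, using illustrative placeholder entries rather than real repository data:

```python
# Placeholder catalog entries; dates are illustrative, not authoritative.
entries = [
    {"name": "AutoPentest-DRL", "created": "2021-03"},
    {"name": "PentestGPT", "created": "2023-02"},
    {"name": "D-CIPHER", "created": "2025-02"},
]

def tools_from_year(entries: list[dict], year: int) -> list[str]:
    """Return names of entries whose creation date falls in the given year."""
    return [e["name"] for e in entries if e["created"].startswith(str(year))]

print(tools_from_year(entries, 2025))  # → ['D-CIPHER']
```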
Gotcha
The repository’s biggest limitation is right in the title: WORK IN PROGRESS. Luke is transparent about this, but it matters for practical use. Many entries are missing benchmark scores, which means you can’t easily compare effectiveness. The categorization is incomplete—the author notes needing to “re-sort and recategorize”—so you’ll find various types of tools listed together with no clear distinction about their maturity, approach, or intended use case. If you’re trying to select a tool for actual work, this repository won’t provide the comparative analysis you need.
The CodeWiki and DeepWiki links are incomplete. The table shows multiple empty cells with notes like “(empty are pending),” meaning those repositories haven’t been processed by the documentation services yet. You might find a tool that looks promising only to discover its documentation links are dead ends. The repository also includes a disclaimer that some entries “technically aren’t agents,” but doesn’t clearly mark which ones fall into that category. This lack of vetting means you could waste time investigating a tool that doesn’t actually provide the functionality you expect.
Finally, there’s the ethical dimension. The README includes a warning—“THESE REPOS ARE FOR EDUCATIONAL AND AUTHORIZED SECURITY TESTING PURPOSES ONLY”—but it’s a curated list of offensive security tools. The repository doesn’t assess which tools have safeguards, which require explicit authorization checks, or which might be trivially weaponized. If you’re a security leader evaluating whether to allow these tools in your organization, you won’t find guidance about responsible use policies or risk assessments.
Verdict
Use if: You’re researching the AI security agent landscape and need a comprehensive starting point, you’re building your own AI-powered pentesting tool and want to survey existing approaches, or you’re working in an AI IDE that supports MCP and want programmatic access to documentation about security tools. The chronological organization is genuinely useful for understanding how the field evolved from deep RL to LLM-based approaches, and the MCP integration is a notable feature for AI-assisted development workflows.

Skip if: You need detailed tool comparisons or recommendations with analysis—the incomplete benchmarks and categorization make it unsuitable for immediate tool selection. Also skip if you’re expecting a static, polished reference; this is explicitly work-in-progress, and you’ll need to tolerate missing data and incomplete documentation links.

Best suited for researchers, tool builders, and early adopters who value comprehensive discovery over polished curation and who are comfortable working with incomplete information.