
BurpGPT: When AI-Powered Vulnerability Scanning Met Reality

Hook

BurpGPT garnered 2,285 GitHub stars by promising AI-powered vulnerability detection that traditional scanners miss—then its creators abandoned the open-source version entirely. What happened?

Context

Traditional web vulnerability scanners excel at pattern matching. They’ll catch SQL injection, XSS, and CSRF with impressive accuracy because these vulnerabilities follow predictable patterns. But ask them to detect business logic flaws—like a password reset flow that doesn’t validate token ownership, or an authorization bypass hidden in multi-step workflows—and they go blind. These bespoke vulnerabilities require contextual understanding of application behavior, something rule-based engines fundamentally cannot provide.

BurpGPT emerged in this gap, proposing a radical idea: what if we fed HTTP traffic directly to OpenAI’s GPT models and asked them to reason about security implications? Instead of writing thousands of detection rules, security researchers could describe what they’re looking for in natural language prompts. The extension promised to democratize sophisticated vulnerability detection, letting a well-crafted prompt uncover flaws that would otherwise require manual code review. It integrated directly into Burp Suite’s passive scanning workflow, analyzing traffic in real-time and injecting findings back into Burp’s native issue tracker. For a brief moment, it represented the vanguard of AI-augmented security testing.

Technical Insight

System architecture (reconstructed from the auto-generated diagram):

Components: Burp Suite Proxy → BurpGPT Extension → Prompt Builder → OpenAI API → Burp Issue Tracker

Data flow: HTTP Traffic → Extract Components → Replace Placeholders → Constructed Prompt → GPT Analysis Response → Security Findings

BurpGPT’s architecture centers on a placeholder-based prompt construction system that bridges Burp Suite’s Java environment with OpenAI’s API. The extension hooks into Burp’s passive scanning framework using the Montoya API (version 2023.3.2+), which provides callback hooks for HTTP request/response processing. When traffic flows through Burp, BurpGPT extracts components and injects them into user-defined prompt templates.

The placeholder system is straightforward but powerful. The README documents placeholders including {REQUEST} for the scanned request, {URL} for the request URL, and {METHOD} for the HTTP method. Users craft prompts incorporating these placeholders, which BurpGPT replaces with actual traffic data before sending to OpenAI. This templating approach lets security professionals encode their expertise as reusable prompt patterns.
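The core templating idea reduces to string replacement. The sketch below is illustrative, not BurpGPT's actual code; only the placeholder names ({REQUEST}, {URL}, {METHOD}) come from the README.

```java
import java.util.Map;

// Minimal sketch of BurpGPT-style placeholder substitution: a user-defined
// template is filled in with values extracted from intercepted traffic.
public class PromptTemplate {
    public static String render(String template, Map<String, String> values) {
        String prompt = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Replace every occurrence of {KEY} with its traffic value.
            prompt = prompt.replace("{" + e.getKey() + "}", e.getValue());
        }
        return prompt;
    }
}
```

A template like "Analyse this {METHOD} request to {URL}:\n{REQUEST}" becomes a complete prompt once the proxy supplies the real request, which is what lets one well-crafted template apply across an entire scan.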

The extension manages OpenAI token limits through a configurable maximum prompt length parameter. The README notes that GPT models have token caps (around 2048 for GPT-3), creating an immediate tension: include too much context and you hit the token limit; include too little and the model lacks the information to reason effectively. The README emphasizes this constraint: “Enables granular control over the number of GPT tokens used in the analysis by allowing for precise adjustments of the maximum prompt length.”
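The trade-off can be made concrete with a rough budget check. This is a sketch under two assumptions: the common ~4-characters-per-token heuristic (not OpenAI's actual tokenizer), and simple tail truncation (BurpGPT's internal strategy is not documented).

```java
public class TokenBudget {
    // Rough heuristic: about 4 characters per token for English text.
    // An approximation only, not the model's real tokenizer.
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    // Cut the prompt so it stays under maxTokens, mirroring the idea of
    // BurpGPT's configurable maximum prompt length. Truncating the tail
    // means the end of large responses is silently dropped.
    static String fitToBudget(String prompt, int maxTokens) {
        int maxChars = maxTokens * 4;
        return prompt.length() <= maxChars ? prompt : prompt.substring(0, maxChars);
    }
}
```

The comment in fitToBudget is the real gotcha: whatever falls outside the budget never reaches the model, so a vulnerability visible only at the end of a long response body is invisible to the analysis.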

Integration with Burp’s issue tracking happens through the Montoya API’s scanner facilities. When GPT responds with identified issues, BurpGPT parses the natural language output and creates Burp scanner issues at Informational severity. This means findings appear alongside native Burp detections in the Target tab, complete with full request/response context. However, the README explicitly warns: “While the report is automated, it still requires triaging and post-processing by security professionals, as it may contain false positives.”
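Why triage is unavoidable becomes obvious if you sketch the post-processing step. The parser below is hypothetical (BurpGPT's real parsing is internal, and the actual issue creation goes through the Montoya API): every line of the model's free-text response is just a claim, not a verified vulnerability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of turning a GPT response into issue
// candidates: each non-empty line becomes one candidate finding that a
// human must still confirm or reject.
public class FindingParser {
    static List<String> candidates(String gptResponse) {
        List<String> findings = new ArrayList<>();
        for (String line : gptResponse.split("\n")) {
            if (!line.isBlank()) {
                findings.add(line.strip());
            }
        }
        return findings;
    }
}
```

Filing everything at Informational severity, as the extension does, is the honest choice here: the tool cannot distinguish a real finding from a hallucinated one, so severity assignment is deferred to the human reviewer.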

The build system uses Gradle with the Shadow plugin to produce a fat JAR containing all dependencies—critical since Burp extensions must be self-contained. The shadowJar task bundles dependencies into a single artifact:

```shell
./gradlew shadowJar
# Produces lib/build/libs/burpgpt-all.jar
```

Loading into Burp follows the standard extension workflow: Extensions tab → Add → select the JAR. Configuration happens through a settings panel accessible from Burp’s menu bar, requiring users to provide an OpenAI API key, select a model, and define maximum prompt size.

The critical limitation revealed in the architecture is prompt engineering burden. The README states bluntly: “The effectiveness of this extension is heavily reliant on the quality and precision of the prompts created by the user for the selected GPT model.” Unlike traditional scanners with pre-built detection logic, BurpGPT shifts vulnerability detection expertise from tool developers to end users. You’re not buying a scanner; you’re buying an interface to build your own scanner through prompts. This requires deep security knowledge to craft effective queries, understand GPT’s reasoning patterns, and recognize when responses are hallucinated nonsense versus legitimate findings.
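The burden is easiest to see in a concrete template. The prompt below is illustrative, not one shipped with BurpGPT; only the placeholders are the documented ones. Note how much security expertise is already encoded in its wording: which flaw class to look for, and an instruction designed to suppress hallucinated findings.

```
You are a web security expert. Review the following {METHOD} request to
{URL} for business logic flaws, such as a password reset token that is
not validated against the requesting account. Report only issues you can
justify from the traffic below, and quote the relevant lines as evidence.

{REQUEST}
```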

The data privacy model is stark: “Data traffic is sent to OpenAI for analysis.” Every HTTP request and response you analyze leaves your environment, travels to OpenAI’s infrastructure, and gets processed by their models. For penetration testers working with client data under NDA, or security teams analyzing internal applications with sensitive information, this creates significant concerns. The README acknowledges this with a warning directing users to OpenAI’s privacy policy, but offers no local alternative.

Gotcha

The most significant gotcha is prominently displayed at the top of the README: “Please note that the Community edition is no longer maintained or functional.” The open-source version is dead. The creators launched BurpGPT Pro, a commercial offering, and explicitly state “It is no longer useful to log Issues for the Community edition.” This means the GitHub repository exists primarily as a historical artifact and as marketing for the paid product. If you clone and build the current codebase, you’re getting a non-functional extension. Any actual usage requires purchasing BurpGPT Pro, whose pricing is undisclosed in the repository.

Even if the community edition were maintained, the data privacy issue remains fundamental. The architecture requires sending all analyzed traffic to OpenAI’s servers. This appears incompatible with most penetration testing contracts, internal security policies, and likely problematic for regulated environments. The README’s warning about reviewing OpenAI’s privacy policy doesn’t solve the problem—it just acknowledges it exists.

Token costs and limits create operational friction. Analyzing hundreds or thousands of HTTP requests in a typical web application scan could generate substantial API bills. The token limit constraint means you’re constantly tuning maximum prompt length, balancing completeness against cost and model capabilities. There’s no guidance in the documentation about optimal token budgets for different scanning scenarios.
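A back-of-envelope calculation shows why this matters at scan scale. Both numbers below are illustrative placeholders, not OpenAI's actual pricing or BurpGPT's defaults.

```java
public class ScanCostEstimator {
    // Rough API cost for a passive scan: every proxied request triggers
    // one model call of roughly tokensPerRequest tokens. usdPer1kTokens
    // is an illustrative placeholder, not a quote of real pricing.
    static double estimateUsd(int requests, int tokensPerRequest, double usdPer1kTokens) {
        return requests * tokensPerRequest / 1000.0 * usdPer1kTokens;
    }
}
```

Even at a hypothetical fraction of a cent per thousand tokens, a crawl that proxies thousands of requests multiplies that figure quickly, which is why per-scan token budgeting matters and why the documentation's silence on it is a gap.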

False positive rates are another acknowledged weakness. The README warns that findings “may contain false positives” requiring “triaging and post-processing by security professionals.” LLMs are prone to hallucination: confidently stating things that sound plausible but are factually wrong. In a security scanning context, this manifests as reported vulnerabilities that don’t actually exist, or missed vulnerabilities because the model didn’t reason correctly about the traffic. You need expertise to validate every finding, which undermines the automation value proposition.

Verdict

BurpGPT represents an important experiment in AI-augmented security testing, but its abandonment as open-source software and its fundamental architectural constraints limit its practical utility.

Use if: you’re researching AI integration patterns in security tooling, want to understand prompt engineering approaches for vulnerability detection, or are willing to pay for BurpGPT Pro and work exclusively on non-confidential projects where sending data to OpenAI is acceptable. The concept of natural-language vulnerability queries remains compelling for business logic testing.

Skip if: you need actively maintained open-source tools, handle any sensitive data (which covers most security testing), require on-premise or air-gapped scanning, or lack budget for both commercial licensing and ongoing OpenAI API costs.

The community edition’s death makes this primarily of historical interest rather than practical use. For production security work, stick with traditional scanners like Burp Suite Pro’s native capabilities, or invest in local LLM infrastructure if you want AI augmentation without data exfiltration risks.
