Back to Articles

Burpference: Adding LLM Intelligence to Your Security Proxy Workflow

[ View on GitHub ]

Burpference: Adding LLM Intelligence to Your Security Proxy Workflow

Hook

What if every HTTP request passing through your security proxy could be analyzed by GPT-4, Claude, or a local LLM without you lifting a finger? That's exactly what happens when you point Burpference at your Burp Suite traffic.

Context

Web application security testing has always been a game of pattern recognition. You proxy traffic through Burp Suite, manually inspect requests for SQL injection vectors, trace authentication flows, and hunt for authorization bypasses. The bottleneck isn't finding the requests—it's analyzing them fast enough to identify subtle vulnerabilities buried in thousands of HTTP transactions.

Traditional automated scanners help, but they operate on rigid rule sets. They'll catch the OWASP Top 10 low-hanging fruit, but miss business logic flaws, unusual authentication schemes, or context-dependent vulnerabilities. Meanwhile, large language models have proven surprisingly adept at security analysis when given the right prompts. The problem? No one wants to manually copy-paste requests into ChatGPT for hours. Burpference bridges this gap by automatically feeding your proxy history to configurable LLM endpoints and integrating the results back into Burp's native interface. It's not trying to replace Burp Scanner—it's augmenting your manual testing with an AI assistant that never gets tired of reading HTTP headers.

Technical Insight

Burpference operates as a Burp Suite extension written in Python, executed through Burp's Jython interpreter. This architectural decision means it runs inside Burp's JVM process with full access to the Extender API, allowing it to hook into proxy history, manipulate HTTP messages, and inject custom Scanner issues—all without external processes or complex IPC.

The core workflow starts with the processProxyHistory method, which continuously polls Burp's proxy log for new HTTP transactions. Each request/response pair passes through a sophisticated filtering pipeline that checks scope, MIME types, and status codes. This matters because sending binary files or out-of-scope domains to an LLM wastes API credits and dilutes analysis quality. The extension maintains a persistent processed_urls.json file to track what's already been analyzed, preventing redundant API calls:

def processProxyHistory(self):
    proxyHistory = self._callbacks.getProxyHistory()
    for entry in proxyHistory:
        requestInfo = self._helpers.analyzeRequest(entry)
        url = str(requestInfo.getUrl())
        
        # Skip if already processed or out of scope
        if url in self.processed_urls:
            continue
        if not self._callbacks.isInScope(requestInfo.getUrl()):
            continue
            
        # MIME type filtering to avoid binary content
        responseInfo = self._helpers.analyzeResponse(entry.getResponse())
        mimeType = responseInfo.getStatedMimeType()
        if mimeType in ['image', 'video', 'font', 'application/octet-stream']:
            continue
            
        # Extract and format for LLM
        request = self._helpers.bytesToString(entry.getRequest())
        response = self._helpers.bytesToString(entry.getResponse())
        self.sendToLLM(url, request, response)
        self.processed_urls.add(url)

The sendToLLM method handles multi-provider abstraction. Burpference doesn't hardcode API interactions—it reads from a config.json file that specifies which provider to use (OpenAI, Anthropic, Ollama, or custom endpoints). For local Ollama deployments, it constructs HTTP requests directly to localhost:11434, while commercial providers use their respective Python SDKs. The extension includes provider-specific prompt templates optimized for security analysis:

def constructPrompt(self, url, request, response):
    template = self.config.get('prompt_template', 'default')
    if template == 'sqli_focused':
        return f"""Analyze this HTTP transaction for SQL injection vulnerabilities.
        Focus on parameter handling, database error messages, and injection points.
        URL: {url}
        REQUEST:
        {request}
        RESPONSE:
        {response}
        Provide severity (Critical/High/Medium/Low) and exploitation steps."""
    # Other templates for XSS, AuthZ, business logic, etc.

When the LLM returns findings, Burpference parses the response text for severity keywords and vulnerability types, then creates native Burp Scanner issues using the addScanIssue callback. This is where the extension truly shines—findings aren't trapped in a separate log file. They appear in Burp's Issue Activity panel with custom severity levels, color-coded highlights, and clickable remediation details. The integration feels native because it uses the same APIs that Burp Scanner uses internally.

The extension also includes a dedicated Scanner tab that lets you target specific URLs or upload OpenAPI specifications for focused analysis. This bypasses the passive proxy monitoring and instead batch-processes endpoints you explicitly care about. Under the hood, it parallelizes requests using Python's threading module (limited by Jython's GIL constraints), sending chunks of URLs to the LLM with batched context to reduce API round trips.

One clever implementation detail: Burpference maintains separate findings.json and analysis.log files outside Burp's project structure. This means your LLM analysis persists across Burp sessions and can be exported for compliance reporting or fed into other tools. The JSON schema includes timestamps, model identifiers, and raw LLM responses, creating an audit trail of AI-assisted findings.

Gotcha

The Jython dependency is Burpference's biggest technical constraint. Burp Suite's Python extension support relies on Jython 2.7, which means you're stuck with Python 2.x syntax and cannot import modern libraries that depend on native extensions or Python 3 features. Want to use the latest OpenAI SDK with async/await? Tough luck. This forces the extension to vendor older library versions or implement HTTP clients from scratch using Java's networking classes through Jython's Java interop.

API key management is another pain point. Because Burp Suite runs in a JVM sandbox, it cannot access environment variables the way standalone Python scripts can. You must store API keys in plaintext JSON files within the extension directory. While you can restrict file permissions at the OS level, this creates an attack surface if someone gains read access to your Burp configuration folder. The README warns about this, but there's no elegant solution given Burp's architecture.

Performance becomes a real issue under load. If you're testing a JavaScript-heavy single-page application that generates hundreds of API calls, Burpference will queue them all for LLM analysis. With commercial APIs, you'll hit rate limits quickly and burn through credits. With local Ollama models, you'll max out GPU VRAM and create a backlog. The extension includes throttling options, but they're crude—simple delays between requests rather than intelligent batching or priority queues. You'll find yourself constantly tweaking the filtering rules to avoid analyzing OPTIONS requests, health checks, and other noise.

LLM accuracy is the elephant in the room. These models hallucinate. They'll flag legitimate rate limiting as a DoS vulnerability, or miss obvious SQL injection because the response didn't include 'SQL' in an error message. You cannot trust findings blindly—every result requires manual validation. For experienced pentesters, this is fine; you're already doing manual validation anyway. But junior practitioners might waste hours chasing false positives or, worse, skip real vulnerabilities because the LLM didn't flag them.

Verdict

Use if: You're an experienced penetration tester conducting deep web app assessments with hundreds of endpoints, you want LLM assistance surfacing edge cases and business logic issues that rule-based scanners miss, and you have budget for API credits or hardware for local models. Burpference excels when you need to process large amounts of traffic and want AI to highlight patterns you might overlook during manual review. It's particularly valuable for bug bounty hunters analyzing unfamiliar codebases or red teams building context during reconnaissance phases. Skip if: You need deterministic, compliance-ready results for audit reports, you're working with highly sensitive data that cannot touch external APIs (and lack local GPU resources), or you're new to AppSec and don't yet have the expertise to validate LLM findings. Also skip if you demand modern Python tooling—the Jython constraint makes dependency management frustrating. This is a research-grade augmentation tool for practitioners who understand its limitations, not a turnkey security solution.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/llm-engineering/dreadnode-burpference.svg)](https://starlog.is/api/badge-click/llm-engineering/dreadnode-burpference)