Building a Self-Documenting Security Remediation Pipeline: Lessons from AvP

Hook

What if your vulnerability scanner didn't just find bugs, but actually understood changelogs, cross-referenced safety databases, and wrote its own pull requests with evidence-based justifications?

Context

The security automation landscape is crowded with tools that detect vulnerabilities, but the gap between detection and remediation remains painfully manual. Developers receive Snyk alerts, Wiz reports, and Dependabot notifications—then spend hours researching whether updates are safe, reading changelogs for breaking changes, and crafting PR descriptions that justify the fix to reviewers.

AvP (Autonomous Vuln Patcher) emerged from the Nebula Fog hackathon as an experiment in closing this loop. Rather than building yet another vulnerability scanner, it tackles the unsexy middle ground: the analysis, validation, and documentation phases that consume engineering time. The project combines two seemingly unrelated components—a self-referential AI presentation about its own creation, and a Python CLI for vulnerability remediation workflows—to demonstrate how multi-agent systems can handle both creative and analytical automation challenges.

Technical Insight

[Figure: System architecture, auto-generated]

The remediation architecture follows a phased pipeline design that separates concerns cleanly. The fetch command queries multiple scanner APIs (Snyk, Wiz) to collect vulnerability data, the osv-check phase validates fixes against the OSV/GHSA databases for known security issues, changelog performs LLM-powered analysis of package release notes, and dedup-check prevents duplicate PRs before the final render-pr stage generates markdown documentation.
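
The phase ordering above can be sketched as a list of stage functions that each take a finding record and return an enriched copy. This is a minimal illustration, not AvP's actual code: the function names, the dict-based record, and the stubbed stage bodies are assumptions. Only the phase names (fetch, osv-check, changelog, dedup-check, render-pr) come from the project description.

```python
from typing import Callable, Dict, List

Finding = Dict[str, object]
Stage = Callable[[Finding], Finding]


# Hypothetical stage stubs: each adds its own key and leaves the rest untouched.
def osv_check(finding: Finding) -> Finding:
    return {**finding, "osv_check": {"safe": True, "vulnerabilities": []}}


def changelog(finding: Finding) -> Finding:
    return {**finding, "changelog_analysis": {"breaking_changes": []}}


def dedup_check(finding: Finding) -> Finding:
    return {**finding, "duplicate_pr": None}


def render_pr(finding: Finding) -> Finding:
    title = f"Bump {finding['package_name']} to {finding['suggested_version']}"
    return {**finding, "pr_title": title}


PIPELINE: List[Stage] = [osv_check, changelog, dedup_check, render_pr]


def run_pipeline(finding: Finding, stages: List[Stage] = PIPELINE) -> Finding:
    """Pass one fetched finding through every stage in order."""
    for stage in stages:
        finding = stage(finding)
    return finding


# A placeholder record standing in for what the fetch phase might emit.
fetched = {
    "package_name": "example-pkg",
    "current_version": "1.0.0",
    "suggested_version": "1.0.1",
}
result = run_pipeline(fetched)
```

Because every stage returns a new dict instead of mutating its input, the record that reaches the final stage carries the output of each earlier phase, which is what makes the resulting PR traceable.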

What makes this approach interesting is the validation layer. Most automated patching tools blindly bump versions based on scanner recommendations. AvP adds a safety check: it queries the OSV database for the proposed version, essentially asking "is the version we are about to upgrade to itself affected by a known vulnerability?" Here's the core validation flow from the codebase:

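# Simplified from osv_check.py
import httpx


async def check_package_version(package: str, version: str, ecosystem: str) -> dict:
    # OSV query: which known vulnerabilities affect this exact package version?
    query = {
        "package": {"name": package, "ecosystem": ecosystem},
        "version": version
    }
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.osv.dev/v1/query",
            json=query,
            timeout=30.0
        )
    # Raise on HTTP errors so an error body is never mistaken for "no vulnerabilities"
    response.raise_for_status()
    # OSV omits the "vulns" key entirely when nothing matches
    vulns = response.json().get("vulns", [])
    return {
        "safe": len(vulns) == 0,
        "vulnerabilities": vulns,
        "package": package,
        "version": version
    }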

This pattern—validating proposed fixes against external threat intelligence before applying them—should be standard practice but rarely is. The deduplication logic adds another layer of pragmatism. Before creating a PR, the tool queries the GitHub API to check if a similar remediation PR already exists, preventing spam in repositories with multiple security tools running simultaneously.
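
The deduplication check can be sketched as a pure function over a list of open PRs. This is an assumption about how such a check might work, not AvP's implementation: the function names and the title-matching rule are hypothetical, and the `open_prs` list is assumed to have been fetched beforehand (for example with `gh pr list --state open --json number,title`).

```python
import re
from typing import Iterable, Optional, Set


def title_tokens(title: str) -> Set[str]:
    """Split a PR title into lowercase tokens, keeping package and version punctuation."""
    cleaned = re.sub(r"[^a-z0-9._@/-]+", " ", title.lower())
    # Drop a trailing full stop so "to 1.0.1." still yields the token "1.0.1".
    return {token.strip(".") for token in cleaned.split()}


def find_duplicate_pr(
    open_prs: Iterable[dict], package: str, version: str
) -> Optional[dict]:
    """Return the first open PR whose title names both the package and the target version."""
    for pr in open_prs:
        tokens = title_tokens(pr.get("title", ""))
        if package.lower() in tokens and version.lower() in tokens:
            return pr
    return None


# Placeholder PR data for illustration only.
open_prs = [
    {"number": 7, "title": "Bump example-pkg from 1.0.0 to 1.0.1"},
    {"number": 8, "title": "Update docs"},
]
duplicate = find_duplicate_pr(open_prs, "example-pkg", "1.0.1")
```

Matching on whole tokens rather than substrings avoids treating a PR for `example-pkg` as a duplicate of one for `pkg`. Titles that write the version as `v1.0.1` would not match under this rule, so a real check would need to normalize version prefixes as well.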

The changelog analysis phase demonstrates where LLMs add genuine value. Rather than regex parsing or keyword matching, the system feeds raw changelog text to a language model with specific instructions to identify breaking changes, deprecated features, and migration requirements. The prompts are stored as templates:

# From changelog_analyzer.py conceptual structure
BREAKING_CHANGE_PROMPT = """
Analyze the following changelog for {package} upgrading from {old_version} to {new_version}.

Identify:
1. Breaking changes (API removals, signature changes, behavioral changes)
2. Required migration steps
3. Deprecated features that still work but will break in future versions

Changelog:
{changelog_text}

Respond in JSON format with keys: breaking_changes (list), migrations (list), deprecations (list)
"""
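
Since the prompt asks for JSON with three list-valued keys, the reply has to be parsed defensively before later stages can rely on it. The sketch below shows one way to do that; `parse_changelog_analysis` is a hypothetical helper, not a function from the AvP codebase, and the choice to default missing keys to empty lists is an assumption.

```python
import json
from typing import Dict, List

EXPECTED_KEYS = ("breaking_changes", "migrations", "deprecations")


def parse_changelog_analysis(raw: str) -> Dict[str, List[str]]:
    """Extract the JSON object from a model reply and check its shape."""
    # Models often add preamble or code fences, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc

    result: Dict[str, List[str]] = {}
    for key in EXPECTED_KEYS:
        value = data.get(key, [])  # a missing key is treated as "nothing found"
        if not isinstance(value, list):
            raise ValueError(f"{key!r} must be a list")
        result[key] = [str(item) for item in value]
    return result


# Placeholder reply for illustration only.
reply = 'Here is the analysis:\n{"breaking_changes": ["removed foo()"], "migrations": []}\n'
analysis = parse_changelog_analysis(reply)
```

Raising `ValueError` on anything unexpected gives the caller a single failure type to handle, for instance by marking the changelog analysis as unavailable in the PR instead of aborting the run.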

The PR rendering system uses Jinja2 templates to generate consistent documentation. Each PR includes sections for vulnerability details, OSV validation results, changelog highlights, and remediation strategy justification. This creates an audit trail that's invaluable during security reviews—you can trace exactly why the tool recommended a specific version bump and what validation occurred.
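
As a rough illustration of the rendering step, the sketch below fills a small Jinja2 template with the sections named above. The template text, variable names, and placeholder values are all assumptions made for this example; AvP's real templates are not reproduced here.

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical PR template; section names follow the article's description.
PR_TEMPLATE = """\
## Security update: {{ package }} {{ old_version }} -> {{ new_version }}

### Vulnerability details
{% for cve in cve_ids %}- {{ cve }}
{% endfor %}
### OSV validation
{{ "No known vulnerabilities in the target version." if osv_safe else "The target version has known vulnerabilities." }}

### Changelog highlights
{% for change in breaking_changes %}- Breaking: {{ change }}
{% else %}- No breaking changes identified.
{% endfor %}"""

# StrictUndefined makes a missing variable an error instead of an empty string,
# so an incomplete finding cannot silently produce a PR with blank sections.
env = Environment(undefined=StrictUndefined)

pr_body = env.from_string(PR_TEMPLATE).render(
    package="example-pkg",
    old_version="1.0.0",
    new_version="1.0.1",
    cve_ids=["CVE-0000-0000"],  # placeholder identifier
    osv_safe=True,
    breaking_changes=[],
)
```

Keeping the template separate from the data means every PR has the same section order, which is what lets a reviewer find the validation evidence in the same place each time.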

The presentation component, while seemingly separate, actually demonstrates the same architectural philosophy: multi-agent content generation with validation loops. The slide deck uses multiple AI agents (content writer, code generator, design critic, judge) that collaborate through structured prompts. One agent generates slide content, another evaluates it against hackathon criteria, and a third renders it into HTML with Max Headroom-inspired aesthetics. The self-referential nature—building a presentation about building the presentation—mirrors how the remediation tool documents its own decision-making process.

The project uses modern Python tooling throughout: uv for fast package management, Pydantic for data validation, async HTTP with httpx, and Click for CLI argument parsing. The Pydantic models ensure type safety across the pipeline:

from typing import List, Optional

from pydantic import BaseModel, Field


class VulnerabilityFinding(BaseModel):
    # Populated by the fetch phase
    scanner: str = Field(description="Source scanner: snyk, wiz, etc")
    package_name: str
    current_version: str
    suggested_version: str
    severity: str
    cve_ids: List[str] = Field(default_factory=list)
    # Filled in by later pipeline stages; None until that stage has run
    osvCheck: Optional[dict] = None
    changelogAnalysis: Optional[dict] = None

This structured data flows through each pipeline stage, accumulating validation results and analysis that eventually populate the PR template. The architecture prioritizes traceability—every remediation decision includes evidence.

Gotcha

The "autonomous" label is somewhat misleading. AvP automates analysis and PR creation, but it doesn't actually patch vulnerabilities autonomously. There's no automatic merging, no CI/CD integration to validate proposed fixes, and no rollback mechanism if a fix breaks production. The tool generates recommendations and documentation, but humans still make the final decision. This is probably the right call for production systems, but don't expect true hands-off remediation.

The dependency footprint is substantial. You need GitHub CLI authenticated and configured, API tokens for whatever scanners you're using (Snyk, Wiz), network access to OSV APIs, and an LLM provider (likely OpenAI) for changelog analysis. The setup complexity means you're probably not running this locally—it's designed for CI/CD environments or dedicated security automation runners. The hackathon nature shows in the lack of error recovery: if the OSV API times out or the LLM returns malformed JSON, the pipeline stages can fail ungracefully. There's minimal retry logic or degraded functionality modes.
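
To make the missing piece concrete, here is a sketch of the kind of retry wrapper with a degraded-mode fallback that the pipeline currently lacks. It is not part of AvP: `with_retries`, its parameters, and the simulated lookup are hypothetical, and a real version would catch only timeout and transport errors instead of every exception.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    fallback: Optional[T] = None,
) -> Optional[T]:
    """Retry an async call with exponential backoff; return `fallback` if all attempts fail."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:  # sketch only: narrow this to timeout/transport errors
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between attempts.
                await asyncio.sleep(base_delay * 2 ** attempt)
    return fallback


async def _demo() -> Optional[dict]:
    calls = {"count": 0}

    async def flaky_osv_lookup() -> dict:
        # Simulated lookup that times out twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TimeoutError("simulated OSV timeout")
        return {"safe": True, "vulnerabilities": []}

    return await with_retries(flaky_osv_lookup, base_delay=0.01)


result = asyncio.run(_demo())
```

Passing a fallback such as `{"safe": None}` would let the PR state that OSV validation could not be completed, which is more useful to a reviewer than a crashed pipeline stage.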

Verdict

Use if: you're building internal security automation tooling and need reference implementations for scanner integration, OSV validation patterns, or AI-powered changelog analysis. The architectural separation of fetch/validate/analyze/document phases is excellent, and the Pydantic models provide clean contracts between stages. Extract these patterns for your own tools rather than deploying AvP directly. Also use if you're exploring multi-agent AI systems and want to see how validation loops and structured prompts create more reliable outputs than single-shot generation.

Skip if: you need production-ready vulnerability remediation today; go with Dependabot, Renovate, or Mend instead. Also skip if you're averse to LLM dependencies in security tooling or work in environments where external API calls (OSV, OpenAI) aren't permitted. The project's hackathon origins and minimal maintenance make it better suited as a learning resource than a deployment target.
