
The GitHub Exploit Aggregator That Security Teams Don't Talk About Publicly


Hook

Within hours of a critical vulnerability disclosure, weaponized exploit code often appears on GitHub before security vendors update their databases—and most organizations have no systematic way to detect it.

Context

Security teams face an impossible information asymmetry problem. When a new CVE drops, the race begins: defenders need to assess whether public exploits exist to prioritize patching, while attackers scour GitHub for working proof-of-concept code. Traditionally, this meant manually searching GitHub with queries like CVE-2024-1234 or monitoring security researcher Twitter feeds—a process that could take hours or days.

The nomi-sec/PoC-in-GitHub repository emerged to close this discovery gap by automating what was previously tribal knowledge. Instead of relying on manual searches or waiting for curated databases like Exploit-DB to publish entries (which can lag by days or weeks), the project continuously searches GitHub for newly published exploit code. It's essentially a search engine purpose-built for the question every security team asks during incident response: "Is there public exploit code for this vulnerability yet?" With over 7,730 stars, it has become an unofficial standard for threat intelligence gathering, though its existence highlights an uncomfortable truth: exploit code is increasingly treated as open-source software, complete with GitHub repositories, CI/CD pipelines, and community contributions.

Technical Insight

The architecture of PoC-in-GitHub centers on GitHub's search API and repository indexing capabilities. At its core, the system likely implements a scheduled scraper that queries GitHub using vulnerability-related keywords and CVE identifier patterns. A straightforward implementation could use GitHub's REST search API with queries structured like this:

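```python
import requests
from datetime import datetime, timedelta, timezone

GITHUB_API = "https://api.github.com/search/repositories"
HEADERS = {
    "Authorization": "token YOUR_GITHUB_TOKEN",  # placeholder: substitute a real token
    "Accept": "application/vnd.github+json",
}

def deduplicate_results(results):
    # The same repository can match several queries; keep the first hit per full_name.
    seen = set()
    unique = []
    for item in results:
        if item["name"] not in seen:
            seen.add(item["name"])
            unique.append(item)
    return unique

def search_recent_pocs(cve_id, days_back=7):
    # Only consider repositories created within the last `days_back` days.
    date_filter = (datetime.now(timezone.utc) - timedelta(days=days_back)).strftime("%Y-%m-%d")

    queries = [
        f"{cve_id} in:readme,description",
        f"poc {cve_id}",
        f"exploit {cve_id}",
    ]

    results = []
    for query in queries:
        params = {
            "q": f"{query} created:>{date_filter}",
            "sort": "stars",
            "order": "desc",
            "per_page": 100,  # the Search API caps page size at 100
        }

        response = requests.get(GITHUB_API, headers=HEADERS, params=params, timeout=30)
        # Non-200 responses (e.g. rate limiting) are silently skipped in this sketch.
        if response.status_code == 200:
            for repo in response.json().get("items", []):
                results.append({
                    "name": repo["full_name"],
                    "url": repo["html_url"],
                    "description": repo["description"],
                    "stars": repo["stargazers_count"],
                    "created": repo["created_at"],
                })

    return deduplicate_results(results)
```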

The clever part isn't just the search; it's the continuous monitoring and data enrichment. A thorough monitor would track not only repositories that mention a CVE but also commits, issues, and pull requests that might contain exploit code, which means parsing multiple data streams and correlating them against known vulnerability identifiers.
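Correlation can start with something as simple as a pattern match on the CVE ID format: the literal `CVE-`, a four-digit year, and a sequence number of four or more digits. The sketch below is an illustration under that assumption, not PoC-in-GitHub's actual code; `extract_cve_ids` and `correlate` are hypothetical helpers, and `items` stands in for text (commit messages, issue titles, READMEs) collected by a crawler that is not shown.

```python
import re

# CVE IDs: "CVE-", a four-digit year, "-", then a sequence number of 4+ digits.
# (?!\d) ensures the whole sequence number is captured.
CVE_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})(?!\d)", re.IGNORECASE)

def extract_cve_ids(text):
    """Return distinct CVE IDs found in text, upper-cased, in first-seen order."""
    found = []
    for year, seq in CVE_PATTERN.findall(text):
        cve_id = f"CVE-{year}-{seq}"
        if cve_id not in found:
            found.append(cve_id)
    return found

def correlate(items, known_cves):
    """Map each known CVE ID to the sources whose text mentions it.

    `items` is a list of (source_url, text) pairs gathered elsewhere (hypothetical).
    """
    index = {cve: [] for cve in known_cves}
    for source_url, text in items:
        for cve_id in extract_cve_ids(text):
            if cve_id in index:
                index[cve_id].append(source_url)
    return index
```

Using the `(?!\d)` lookahead instead of a trailing `\b` lets the pattern match names like `CVE-2024-1234_poc`, where the underscore would defeat a word boundary.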

What makes this particularly valuable for security teams is the temporal dimension. By tracking when exploit code first appears publicly, you can build a timeline: vulnerability disclosure → PoC publication → exploitation in the wild. This lag period—sometimes just hours—is critical for prioritization. A vulnerability with a 9.8 CVSS score but no public exploit might be less urgent than a 7.5 with working exploit code on GitHub.
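The prioritization argument can be made concrete with a toy scoring rule. This is a hypothetical sketch, not an established scoring framework: `EXPLOIT_BONUS`, `priority_score`, and `hours_to_first_poc` are illustrative names, the bonus value is an arbitrary assumption chosen so that public exploit code outweighs a few CVSS points, and `CVE-A`/`CVE-B` are placeholders.

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical weight: public exploit code counts for more than a few CVSS points.
EXPLOIT_BONUS = 5.0

def hours_to_first_poc(disclosed_at: datetime, poc_first_seen: List[datetime]) -> Optional[float]:
    """Hours between disclosure and the earliest public PoC, or None if none has appeared."""
    if not poc_first_seen:
        return None
    earliest = min(poc_first_seen)
    return (earliest - disclosed_at).total_seconds() / 3600

def priority_score(cvss: float, has_public_poc: bool) -> float:
    """Toy ranking: CVSS base score plus a fixed bonus when exploit code is public."""
    return cvss + (EXPLOIT_BONUS if has_public_poc else 0.0)

# The paragraph's example: a 7.5 with public PoC code outranks a 9.8 without one.
ranked = sorted(
    [("CVE-A", 9.8, False), ("CVE-B", 7.5, True)],
    key=lambda item: priority_score(item[1], item[2]),
    reverse=True,
)
```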

The repository likely maintains a structured index, possibly as markdown files organized by CVE year or vulnerability type, with metadata including GitHub stars (as a rough proxy for exploit reliability), publication date, and repository activity. The data structure might look like:

```yaml
CVE-2024-1234:
  severity: CRITICAL
  repositories:
    - url: https://github.com/researcher/exploit-cve-2024-1234
      stars: 145
      first_seen: 2024-01-15T08:23:00Z
      last_updated: 2024-01-16T14:30:00Z
      language: Python
      status: active
    - url: https://github.com/pentester/poc-1234
      stars: 12
      first_seen: 2024-01-15T11:45:00Z
      status: archived
  exploit_available: true
  weaponization_level: functional
```
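If such an index were loaded into memory, a typed model along these lines would make the derived fields explicit. The class and function names are hypothetical, only the repository fields from the YAML example are modeled, and `weaponization_level` is left out because its allowed values are not specified. Computing `exploit_available` from the repository list, rather than storing it, keeps the two from drifting out of sync.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

@dataclass
class RepoRecord:
    url: str
    stars: int
    first_seen: datetime
    status: str = "active"
    language: Optional[str] = None
    last_updated: Optional[datetime] = None

@dataclass
class CveEntry:
    cve_id: str
    severity: str
    repositories: List[RepoRecord] = field(default_factory=list)

    @property
    def exploit_available(self) -> bool:
        # Derived from the data instead of stored alongside it.
        return len(self.repositories) > 0

    @property
    def first_poc(self) -> Optional[RepoRecord]:
        # Earliest-seen repository, or None when no PoC is known.
        return min(self.repositories, key=lambda r: r.first_seen, default=None)

def entry_from_dict(cve_id: str, raw: dict) -> CveEntry:
    """Build a CveEntry from one parsed YAML/JSON mapping shaped like the example."""
    repos = [
        RepoRecord(
            url=r["url"],
            stars=r["stars"],
            first_seen=parse_ts(r["first_seen"]),
            status=r.get("status", "active"),
            language=r.get("language"),
            last_updated=parse_ts(r["last_updated"]) if "last_updated" in r else None,
        )
        for r in raw.get("repositories", [])
    ]
    return CveEntry(cve_id=cve_id, severity=raw["severity"], repositories=repos)
```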

The project's malware warning isn't paranoia; it reflects a real threat. Adversaries have been known to publish fake PoC repositories that contain information stealers or backdoors, counting on security researchers to clone and execute code without reviewing it. A sophisticated aggregator would implement basic safety checks: scanning for obfuscated code, checking repository age and contributor history, or flagging suspicious network calls. However, PoC-in-GitHub appears to prioritize comprehensiveness over curation, making it a raw intelligence feed rather than a vetted database.
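Basic safety checks like these could start as cheap static heuristics run before anyone opens a PoC. The patterns, thresholds, and function names below are hypothetical assumptions; regexes like these catch only careless malware, and a hit is a reason for closer manual review, not proof of malice. Nearly every fresh PoC will also trip the repository-age flag, so it works better as context than as a filter.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical red-flag patterns; determined attackers will evade simple regexes.
SUSPICIOUS_PATTERNS = {
    "long_base64_blob": re.compile(r"[A-Za-z0-9+/]{200,}={0,2}"),
    "dynamic_eval": re.compile(r"\b(eval|exec)\s*\("),
    "pipe_to_shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(ba)?sh\b"),
    "decode_and_run": re.compile(r"base64\s+(-d|--decode)"),
}

def scan_source(text: str) -> list:
    """Return the names of the patterns that match anywhere in text."""
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)]

def triage(files: dict, repo_created_at: datetime, min_age_days: int = 7) -> list:
    """Collect red flags for one repository.

    files maps paths to file contents; repo_created_at must be timezone-aware.
    """
    flags = []
    for path, text in files.items():
        for name in scan_source(text):
            flags.append(f"{path}: {name}")
    if datetime.now(timezone.utc) - repo_created_at < timedelta(days=min_age_days):
        flags.append(f"repository younger than {min_age_days} days")
    return flags
```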

The real technical challenge is dealing with GitHub's rate limits while keeping data fresh across thousands of CVEs. Authenticated clients get 5,000 REST requests per hour, but the Search API has its own, much tighter limit of 30 requests per minute, and any single search returns at most 1,000 results. Staying within those limits likely requires intelligent caching, conditional requests (ETags) to re-check already-known repositories cheaply, and prioritization algorithms that focus scraping resources on recently disclosed or trending vulnerabilities. The system might implement a tiered approach: real-time monitoring for CVEs from the last 30 days, hourly updates for the last year, and daily scans for historical vulnerabilities.
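The tiered schedule can be sketched as a function from CVE age to polling interval, plus a selector that respects a per-minute search budget. The 30-day and one-year tier boundaries come from the paragraph above. Everything else is an assumption: the five-minute interval standing in for "real-time", the function names, one search request per CVE, and a budget of 30 requests per minute (GitHub's Search API limit for authenticated clients; verify it against current documentation).

```python
from datetime import datetime, timedelta

SEARCH_REQUESTS_PER_MINUTE = 30  # assumed budget; check GitHub's current Search API limit

def refresh_interval(published: datetime, now: datetime) -> timedelta:
    """Tiered polling: frequent for fresh CVEs, progressively slower for older ones."""
    age = now - published
    if age <= timedelta(days=30):
        return timedelta(minutes=5)  # "real-time" approximated as a short polling interval
    if age <= timedelta(days=365):
        return timedelta(hours=1)
    return timedelta(days=1)

def due_cves(cves: dict, last_checked: dict, now: datetime) -> list:
    """Return CVE IDs whose interval has elapsed, newest first, capped at one minute of budget.

    cves maps CVE ID -> publication time; last_checked maps CVE ID -> last search time.
    """
    due = [
        cve_id for cve_id, published in cves.items()
        if cve_id not in last_checked
        or now - last_checked[cve_id] >= refresh_interval(published, now)
    ]
    due.sort(key=lambda cve_id: cves[cve_id], reverse=True)
    return due[:SEARCH_REQUESTS_PER_MINUTE]
```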

Gotcha

The most critical limitation is the zero-trust problem: you cannot assume any collected PoC is what it claims to be. The repository's malware warning isn't just liability protection; there's documented evidence of threat actors publishing malicious code disguised as legitimate exploits. In 2023, researchers identified multiple fake security tool repositories that delivered infostealers to security professionals. Running code found through PoC-in-GitHub without sandboxing is operational security malpractice, yet the tool provides no built-in isolation, verification, or safety mechanisms.
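As one minimal form of the isolation the tool does not provide, the helper below builds (but does not execute) a `docker run` command that cuts off networking, drops Linux capabilities, and mounts the PoC read-only. The flags are standard Docker CLI options; the function name, image, mount point, user ID, and resource limits are placeholder assumptions. Containers share the host kernel, so a disposable VM remains the safer choice for code that may be hostile.

```python
def sandbox_command(poc_dir: str, entrypoint: list, image: str = "python:3.12-slim") -> list:
    """Build a docker run argv that executes a PoC with no network and minimal privileges.

    The image and resource limits are placeholder choices, not recommendations.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no outbound callbacks or exfiltration
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space discarded on exit
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid-style privilege escalation
        "--pids-limit", "128",
        "--memory", "512m",
        "--user", "65534:65534",                # run as "nobody" instead of root
        "-v", f"{poc_dir}:/poc:ro",             # PoC source mounted read-only
        "-w", "/poc",
        image,
        *entrypoint,
    ]
```

`shlex.join(sandbox_command("/tmp/poc-review", ["python", "exploit.py"]))` renders the list as a shell command for review before running it. `--network none` will also break PoCs that legitimately need to reach a target; those belong on an isolated lab network instead.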

Beyond malware, there's a signal-to-noise problem. GitHub search returns repositories based on keyword matching, which means you'll get incomplete PoCs, theoretical attack descriptions without working code, and duplicates of the same exploit across dozens of repositories. The repository provides no quality indicators beyond GitHub stars, which are a poor proxy: malicious repositories can artificially inflate stars, and a 3-star repository from a respected security researcher might be more valuable than a 200-star repository of copied code.

You still need human expertise to evaluate whether a PoC is functional, whether it matches your specific environment and software versions, and whether the claimed CVE mapping is accurate. For organizations without dedicated security researchers, this tool might create misplaced certainty: seeing a PoC listed doesn't necessarily mean you're immediately exploitable or that the exploit actually works.
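Verbatim duplicates are the easiest part of the noise to remove: hash each repository's normalized file contents and group identical fingerprints. This is a hypothetical sketch; fetching the files is assumed to happen elsewhere, and the normalization (stripping leading and trailing whitespace and blank lines, ignoring file names) is deliberately crude. It catches re-uploads but not lightly modified copies, and it would also merge code that differs only in indentation.

```python
import hashlib
from collections import defaultdict

def content_fingerprint(files: dict) -> str:
    """Hash a repository's files (path -> text), ignoring file names and whitespace,
    so verbatim copies re-uploaded under new names collapse to one fingerprint."""
    digests = []
    for text in files.values():
        normalized = "\n".join(line.strip() for line in text.splitlines() if line.strip())
        digests.append(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    # Sorting makes the fingerprint independent of file order.
    combined = "".join(sorted(digests))
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

def group_copies(repos: dict) -> list:
    """Group repositories (name -> files) with identical normalized contents.

    Returns only groups with more than one member, each sorted by name.
    """
    groups = defaultdict(list)
    for name, files in repos.items():
        groups[content_fingerprint(files)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```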

Verdict

Use if: You're a security professional building threat intelligence pipelines and need early detection of exploit publication to prioritize vulnerability remediation. Use if you have proper sandboxing infrastructure (isolated VMs, container environments) and the expertise to analyze potentially malicious code before execution. Use if you're conducting red team assessments and need to survey the current exploit landscape for specific CVE ranges, understanding you'll need to validate each PoC manually.

Skip if: You lack dedicated security expertise and might execute untrusted code on production or development systems. Skip if you need verified, production-ready exploits; this is an index, not a curated database. Skip if you're looking for comprehensive vulnerability coverage beyond GitHub, as this misses exploits published on GitLab or security forums, or disclosed through vendor channels. Skip if your threat model requires attribution or exploit source verification, since GitHub repositories can be ephemeral, forked, or deliberately deceptive.
