diodb: How a Flat JSON File Became the Security Researcher's Phone Book

Hook

What if the best database for tracking thousands of security contact points across the internet wasn’t a database at all, but a single JSON file maintained by volunteers on GitHub?

Context

Before diodb, security researchers faced a fragmented landscape when trying to report vulnerabilities responsibly. Some organizations buried their security contact information in obscure corners of their websites. Others had no public disclosure policy at all, leaving researchers uncertain whether reporting a bug would be met with gratitude or legal threats. Even when companies ran bug bounty programs, they were scattered across competing platforms or hosted on custom infrastructure, making discovery a manual archaeological dig through DNS records, security.txt files, and corporate legal pages.

The disclose.io initiative emerged to standardize vulnerability disclosure with a focus on safe harbor: legal protections for security researchers acting in good faith. But a standard only helps if researchers can see who has adopted it. diodb addresses this coordination problem by providing a single, vendor-neutral registry of who accepts vulnerability reports and under what terms. By choosing GitHub as infrastructure and JSON as the data format, the project trades sophisticated database features for something more valuable: near-zero operational overhead, complete transparency, and low-friction community contribution.

Technical Insight

System architecture (auto-generated diagram): Community Contributors submit Pull Requests to the GitHub Repository, which stores program-list.json. CI runs the Python Validation Scripts, which validate the schema. The file is consumed by the disclose.io/programs search interface and accessed directly by Security Researchers.

The genius of diodb lies in what it doesn’t build. Instead of creating a custom backend with authentication, API rate limiting, and admin panels, the entire system is a 2MB JSON file living in a Git repository. The core data structure in program-list.json is elegantly simple:

{
  "program_name": "Example Corp Security",
  "policy_url": "https://example.com/security",
  "contact_url": "mailto:security@example.com",
  "launch_date": "2020-03-15",
  "bug_bounty": true,
  "safe_harbor": "full",
  "domains": ["example.com", "*.example.com"],
  "program_type": "company"
}

This flatness is a feature, not a limitation. Every entry is self-contained and human-readable. Security researchers can grep through it locally, parse it with any language’s JSON library, or even read it directly on GitHub. The lack of foreign keys and normalized tables means zero impedance mismatch when converting to other formats.
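To illustrate the point about converting to other formats, here is a minimal sketch that turns flat entries into CSV using only the standard library. The sample data and the `entries_to_csv` helper are hypothetical and follow the entry shape shown above; the real file may carry different fields.

```python
import csv
import io
import json

# Hypothetical sample in the shape shown above; not copied from the real file.
SAMPLE = """[
  {"program_name": "Example Corp Security",
   "policy_url": "https://example.com/security",
   "safe_harbor": "full",
   "domains": ["example.com", "*.example.com"]},
  {"program_name": "Other Org",
   "policy_url": "https://other.example/vdp",
   "safe_harbor": "partial"}
]"""


def entries_to_csv(entries, columns):
    """Render flat program entries as CSV text, one row per entry."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        row = {}
        for col in columns:
            value = entry.get(col, "")
            # Lists (such as domains) are joined so each entry stays on one row.
            row[col] = ";".join(value) if isinstance(value, list) else value
        writer.writerow(row)
    return buf.getvalue()


csv_text = entries_to_csv(json.loads(SAMPLE), ["program_name", "safe_harbor", "domains"])
print(csv_text)
```

Because no entry refers to another, the conversion is a single loop with no joins or lookups.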

The Python tooling in the repository provides validation without gatekeeping. A simple schema validator ensures submissions meet minimum quality standards:

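from jsonschema import FormatChecker, ValidationError, validate

# Schema for a single program entry; only the name and policy URL are required.
PROGRAM_SCHEMA = {
    "type": "object",
    "required": ["program_name", "policy_url"],
    "properties": {
        "program_name": {"type": "string", "minLength": 3},
        "policy_url": {"type": "string", "format": "uri"},
        "safe_harbor": {"enum": ["full", "partial", "none"]},
        "domains": {"type": "array", "items": {"type": "string"}}
    }
}

def validate_program_entry(entry):
    """Return True if the entry satisfies the schema; print the reason if not."""
    try:
        # "format" keywords are only enforced when a format_checker is passed.
        # The "uri" check also needs an optional package such as rfc3987;
        # without one it is silently skipped.
        validate(instance=entry, schema=PROGRAM_SCHEMA,
                 format_checker=FormatChecker())
        return True
    except ValidationError as e:
        print(f"Invalid entry: {e.message}")
        return False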

This validation runs in GitHub Actions on every pull request, creating a continuous integration pipeline without deploying any servers. Contributors fork the repo, add their program entry, and submit a PR. The automated checks catch malformed JSON while human reviewers verify the program actually exists and the details are accurate.
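The kind of file-level check a CI job could run on each pull request might look like the sketch below. It is not the repository's actual script: the `check_program_list` and `main` names, the duplicate-name rule, and the choice of required fields are assumptions made for illustration, and it uses only the standard library.

```python
import json
import sys

# Assumed required fields, mirroring the schema sketch above.
REQUIRED_FIELDS = ("program_name", "policy_url")


def check_program_list(text):
    """Return a list of human-readable problems found in a program list."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]

    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"entry {index}: missing {field}")
        # Hypothetical extra rule: flag names that differ only by case or spacing.
        name = str(entry.get("program_name", "")).strip().lower()
        if name and name in seen:
            problems.append(f"entry {index}: duplicate program_name")
        seen.add(name)
    return problems


def main(path):
    """Check one file and return an exit code; non-zero fails the CI job."""
    with open(path, encoding="utf-8") as handle:
        problems = check_program_list(handle.read())
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A workflow step could then end with `sys.exit(main("program-list.json"))`, so any reported problem marks the pull request as failing.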

The real architectural cleverness emerges in how diodb leverages GitHub’s native features as a poor man’s database management system. Issues become feature requests and data quality reports. Pull requests are the write API. Git blame provides audit trails showing exactly who added each program and when. GitHub’s search becomes the query engine. Stars indicate community endorsement. And GitHub Pages or external sites like disclose.io/programs consume the raw JSON to provide user-friendly interfaces.

This architecture creates a virtuous cycle: low operational burden means maintainers focus on curation rather than infrastructure. The open JSON format means anyone can build tools on top of it without asking permission. Security researchers integrate it into their reconnaissance workflows with simple curl commands:

curl -s https://raw.githubusercontent.com/disclose/diodb/master/program-list.json \
  | jq '.[] | select(.safe_harbor == "full") | .program_name'
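For tooling written in Python rather than shell, the same filter could be sketched as follows. The URL is copied from the curl command above and taken on trust; `fetch_programs` and `names_with_safe_harbor` are hypothetical helper names, and the demonstration runs on made-up entries so that it needs no network access.

```python
import json
import urllib.request

# Copied from the curl command above; the branch name is not verified here.
PROGRAM_LIST_URL = (
    "https://raw.githubusercontent.com/disclose/diodb/master/program-list.json"
)


def fetch_programs(url=PROGRAM_LIST_URL, timeout=10):
    """Download and parse the program list (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def names_with_safe_harbor(programs, level="full"):
    """Mirror the jq filter: names of programs whose safe_harbor equals level."""
    return [
        program.get("program_name")
        for program in programs
        if isinstance(program, dict) and program.get("safe_harbor") == level
    ]


# Offline demonstration with hypothetical entries.
sample_programs = [
    {"program_name": "Example Corp Security", "safe_harbor": "full"},
    {"program_name": "Other Org", "safe_harbor": "partial"},
    {"program_name": "No Field Inc"},
]
print(names_with_safe_harbor(sample_programs))
```

Against live data the call would be `names_with_safe_harbor(fetch_programs())`.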

For organizations, submitting to diodb is marketing: it signals to the security community that you’re serious about receiving vulnerability reports. The social proof is visible: your entry sits alongside Fortune 500 companies and respected security-first organizations. The safe_harbor field becomes particularly powerful here. Although it can take several values (full, partial, or none in the example above), researchers use it as a filter when deciding where to invest time hunting for vulnerabilities.

Gotcha

The data freshness problem is real and inherent to the model. Programs shut down, companies get acquired, security team email addresses change, and policies get updated—but diodb only knows what volunteers tell it. There’s no heartbeat check, no automated verification that policy URLs still resolve or that contact emails don’t bounce. A program added in 2019 might still be listed even if the company quietly discontinued it in 2023. Security researchers using diodb as gospel will waste time reporting to dead endpoints or, worse, make incorrect assumptions about safe harbor protections that no longer apply.
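A consumer who wants the missing heartbeat check can approximate it locally. The sketch below is an assumption about how one might do that, not a feature of diodb: `http_status` and `find_stale` are hypothetical helpers, and the demonstration substitutes a fake status lookup so it runs without network access.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if no response was received."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "diodb-freshness-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered with an error, e.g. 404 or 410
    except (OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, or bad URL


def find_stale(programs, status_of=http_status):
    """Return (program_name, policy_url, status) for policy URLs that look dead."""
    stale = []
    for program in programs:
        url = program.get("policy_url")
        if not url:
            continue
        status = status_of(url)
        # Caveat: some servers reject HEAD requests or block scripted clients,
        # so a 403 or 405 is a prompt for manual review, not proof of death.
        if status is None or status >= 400:
            stale.append((program.get("program_name"), url, status))
    return stale


# Offline demonstration: a dict lookup stands in for real HTTP requests.
fake_statuses = {
    "https://example.com/security": 200,
    "https://gone.example/vdp": 404,
}
sample_entries = [
    {"program_name": "Example Corp Security", "policy_url": "https://example.com/security"},
    {"program_name": "Gone Inc", "policy_url": "https://gone.example/vdp"},
    {"program_name": "Unreachable Ltd", "policy_url": "https://unreachable.example/"},
]
stale = find_stale(sample_entries, status_of=fake_statuses.get)
print(stale)
```

A resolving URL only shows that a page exists; it says nothing about whether the safe harbor terms on that page still match the entry.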

The validation scripts catch syntactic errors but can’t verify semantic accuracy. Nothing prevents someone from submitting a program entry with a valid-looking URL that points to a page with completely different terms than what’s represented in the JSON. The human review process helps, but maintainers can’t realistically verify every detail of every submission, especially for less-known organizations. The community curation model scales contribution but doesn’t scale verification, creating an asymmetric trust problem where bad data is easier to add than to detect and remove.

Verdict

Use diodb if you’re a security researcher building reconnaissance tooling, need to quickly identify which organizations have formal vulnerability disclosure programs, or want to filter targets by safe harbor status before investing time in security testing. It’s also valuable for organizations wanting to benchmark their disclosure programs against peers or for researchers studying the adoption of responsible disclosure practices across industries. The open data format and GitHub-native approach make it trivial to integrate into automated workflows.

Skip diodb if you need guaranteed current information for legal or compliance purposes, require detailed program rules beyond basic contact info, or want real-time accuracy. In those cases, verify directly with the organization or use managed platforms like HackerOne where program operators actively maintain their listings. Also skip it if you’re looking for in-scope asset lists or technical program requirements; diodb is deliberately minimal and focused on the coordination problem of ‘who accepts reports and under what legal terms’ rather than ‘what exactly should I test.’
