Inside APIs.guru: The Wikipedia of OpenAPI Specs and Its Hidden Correction Pipeline

Hook

Eighty percent of public API specifications are broken straight from the source. A small open-source project has been quietly fixing them for years, becoming the backbone of dozens of developer tools you probably use.

Context

Before APIs.guru launched in 2016, finding machine-readable API specifications was a scattered mess. Providers hosted their OpenAPI (then Swagger) files on assorted domains, some of which went offline; versioning was inconsistent; and many specs were syntactically invalid or semantically broken. If you were building a tool that had to work across multiple APIs (an SDK generator, a mock server, an API testing suite), you had to hunt down each specification by hand, validate it, fix it, and hope the provider didn't move or break it the following week.

The OpenAPI Specification promised a world where APIs were self-describing and tooling could be built generically across any API. But the promise only works if you can actually find and trust those specifications. APIs.guru emerged as the missing layer: a curated, corrected, continuously updated directory of public API definitions. Think of it as a CDN for API specs, but one that actively fixes your dependencies before serving them. Today it catalogs over 4,000 APIs from providers like AWS, Azure, Google, Stripe, and thousands of smaller services, with automated weekly updates and a correction pipeline that transforms unusable specs into reliable data sources.

Technical Insight

The repository's architecture is deceptively simple but remarkably effective. At its core is a file hierarchy under /APIs/ organized by provider domain and API name, containing both OpenAPI 2.0 (swagger.yaml) and 3.x (openapi.yaml) versions. But the real intelligence lives in the automated pipeline that fetches, validates, converts, and patches these specifications.

Every API definition includes an x-origin extension that tracks its source, fetch date, and any transformations applied. This metadata is critical—it lets you trace a spec back to its canonical source while understanding what corrections were necessary. Here's what a typical origin block looks like:

x-origin:
  - format: openapi
    version: '3.0'
    url: https://api.example.com/openapi.json
    converter:
      url: https://github.com/mermade/oas-kit
      version: 7.0.8
    x-apisguru-direct: true
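
To read this metadata programmatically, a minimal Python sketch might look like the following. It assumes the spec has already been parsed into a dictionary, and that the x-origin block may appear either under info or at the top level, as a single mapping or as a list. The helper names origin_entries and describe_origin are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def origin_entries(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the x-origin entries of a parsed spec as a list of mappings.

    Assumption: the block may sit under `info` or at the top level, and may
    be a single mapping or a list of mappings.
    """
    info = spec.get("info", {})
    raw = info.get("x-origin", spec.get("x-origin", []))
    if isinstance(raw, dict):
        raw = [raw]
    return [entry for entry in raw if isinstance(entry, dict)]


def describe_origin(spec: dict[str, Any]) -> list[str]:
    """Summarise where a spec came from and which converter touched it."""
    lines = []
    for entry in origin_entries(spec):
        text = (
            f"{entry.get('format', '?')} {entry.get('version', '?')} "
            f"from {entry.get('url', '?')}"
        )
        converter = entry.get("converter")
        if isinstance(converter, dict):
            text += (
                f" (converted with {converter.get('url', '?')} "
                f"{converter.get('version', '?')})"
            )
        lines.append(text)
    return lines


# Sample data mirroring the origin block shown above.
example = {
    "info": {
        "x-origin": [
            {
                "format": "openapi",
                "version": "3.0",
                "url": "https://api.example.com/openapi.json",
                "converter": {
                    "url": "https://github.com/mermade/oas-kit",
                    "version": "7.0.8",
                },
            }
        ]
    }
}

for line in describe_origin(example):
    print(line)
```

Returning plain strings keeps the sketch easy to drop into a log line or a report. A real tool would more likely keep the structured entries.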

The correction pipeline is where things get interesting. The project uses a set of automated fixes and manual patches stored in patches/ to address common issues: missing required fields, invalid JSON references, malformed schema definitions, incorrect content types, and broken examples. These patches are version-controlled alongside the specs, creating an auditable trail of what was wrong with each API's original definition.
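
The on-disk format of the patch files is not described here, so the following Python sketch only illustrates the general idea under an assumption: that a patch is a partial document merged recursively over the fetched spec. The apply_patch helper and the sample data are hypothetical, not the project's actual tooling.

```python
from __future__ import annotations

import copy
from typing import Any


def apply_patch(spec: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a patched copy of `spec`.

    Nested mappings are merged key by key; any other value in the patch
    replaces the value in the spec. The input spec is left untouched so
    the upstream original stays available for auditing.
    """
    result = copy.deepcopy(spec)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical upstream spec that lacks the required info.version field.
upstream = {"openapi": "3.0.0", "info": {"title": "Example API"}, "paths": {}}

# Hypothetical patch that supplies the missing field.
patch = {"info": {"version": "1.0.0"}}

patched = apply_patch(upstream, patch)
print(patched["info"])
```

Keeping the patch as data rather than as an edited copy of the spec is what makes the audit trail possible: the upstream file can be re-fetched at any time and the same correction reapplied.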

For developers consuming this data, the recommended approach is through the REST API rather than raw Git access. The API endpoint https://api.apis.guru/v2/list.json returns a complete catalog with metadata about each API:

{
  "stripe.com": {
    "added": "2015-11-10T21:25:10.000Z",
    "preferred": "v1",
    "versions": {
      "v1": {
        "info": {
          "title": "Stripe API",
          "version": "2023-10-16"
        },
        "swaggerUrl": "https://api.apis.guru/v2/specs/stripe.com/v1/openapi.json",
        "swaggerYamlUrl": "https://api.apis.guru/v2/specs/stripe.com/v1/openapi.yaml",
        "openapiVer": "3.0.0",
        "updated": "2024-01-15T12:34:56.000Z"
      }
    }
  }
}
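
Here is a small Python sketch of consuming this catalog, using only the standard library. The helper names are illustrative, and the sample entry for example.com is invented to mirror the shape shown above. Only the list.json URL and the field names come from the response above.

```python
from __future__ import annotations

import json
from typing import Any
from urllib.request import urlopen

LIST_URL = "https://api.apis.guru/v2/list.json"


def fetch_catalog(url: str = LIST_URL, timeout: float = 30.0) -> dict[str, Any]:
    """Download and parse the catalog (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def preferred_spec_url(catalog: dict[str, Any], api_id: str) -> str:
    """Return the JSON spec URL for the preferred version of one API."""
    entry = catalog[api_id]
    version = entry["versions"][entry["preferred"]]
    return version["swaggerUrl"]


# Invented sample entry with the same shape as the response above.
sample_catalog = {
    "example.com": {
        "added": "2020-01-01T00:00:00.000Z",
        "preferred": "v2",
        "versions": {
            "v1": {"swaggerUrl": "https://api.apis.guru/v2/specs/example.com/v1/openapi.json"},
            "v2": {"swaggerUrl": "https://api.apis.guru/v2/specs/example.com/v2/openapi.json"},
        },
    }
}

print(preferred_spec_url(sample_catalog, "example.com"))
```

Against the live service, the same lookup would be preferred_spec_url(fetch_catalog(), "stripe.com"). Resolving the URL through the preferred key on every run, rather than hardcoding a version, is what lets a consumer follow the catalog as it changes.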

This API-first consumption model is intentional. The repository structure can and does change as the project reorganizes or renames APIs, and tooling built directly against repository paths or GitHub raw URLs will break when it does. Instead, the REST API provides stable URLs with semantic versioning and backward compatibility guarantees.

The conversion pipeline is particularly sophisticated when handling OpenAPI 2.0 to 3.x migrations. The project uses automated converters but then applies domain-specific knowledge through patches. For example, AWS API specs often need fixes for their x-amazon-apigateway extensions, while Google APIs frequently have issues with their discovery document to OpenAPI conversions. These provider-specific quirks are documented in patch files, creating a knowledge base of API specification anti-patterns.

For tool builders, the real value is in the validation guarantees. Every spec in the directory passes both syntactic validation (valid YAML/JSON, correct OpenAPI schema) and semantic validation (resolvable references, valid examples, consistent types). This means you can build generators, validators, or mock servers that assume well-formed input, dramatically reducing your error handling surface area. Tools like HTTP Toolkit use APIs.guru as their API catalog, Kiota uses it for SDK generation scenarios, and numerous documentation generators pull from it as a trusted source.
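
To make the semantic side concrete, here is a hedged Python sketch of one such check, resolvable references. It is not the project's validator: it only follows local references that start with #/, ignores external files and percent-encoded fragments, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every string value stored under a "$ref" key, depth first."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolve_pointer(document: Any, ref: str) -> Any:
    """Follow a local reference such as "#/components/schemas/Pet".

    Raises KeyError if any segment of the pointer does not exist.
    """
    node = document
    for token in ref[2:].split("/"):  # drop the leading "#/"
        token = token.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            raise KeyError(ref)
    return node


def unresolved_local_refs(spec: dict[str, Any]) -> list[str]:
    """Return the local references in `spec` that point at nothing."""
    missing = []
    for ref in iter_refs(spec):
        if not ref.startswith("#/"):
            continue  # external references are out of scope for this sketch
        try:
            resolve_pointer(spec, ref)
        except KeyError:
            missing.append(ref)
    return missing


# Made-up spec fragment with one valid and one dangling reference.
spec = {
    "paths": {
        "/pets": {
            "get": {
                "responses": {
                    "200": {"schema": {"$ref": "#/components/schemas/Pet"}},
                    "404": {"$ref": "#/components/responses/NotFound"},
                }
            }
        }
    },
    "components": {"schemas": {"Pet": {"type": "object"}}},
}

print(unresolved_local_refs(spec))
```

A spec that has already passed this kind of check upstream is what lets a generator or mock server skip its own defensive handling of dangling references.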

Gotcha

The project's documentation is explicit about the biggest gotcha: do not use this as a Git submodule or rely on GitHub's raw content URLs. The repository structure is not an API contract. Directories get reorganized, files get renamed, and APIs occasionally move between categories. I've seen developers hardcode paths like raw.githubusercontent.com/APIs-guru/openapi-directory/main/APIs/stripe.com/swagger.yaml in their CI pipelines, only to have builds break weeks later when the directory structure changed. Always use the REST API endpoints.

The weekly update cadence is another limitation that catches people off guard. If an API provider pushes a breaking change to their spec, it can take a week or more before APIs.guru picks it up, validates it, and publishes it. For rapidly evolving APIs or beta endpoints, you're better off fetching directly from the provider. The directory is optimized for stability and correctness, not real-time synchronization. Additionally, the project explicitly excludes certain API categories: private/internal APIs, temporary event-specific APIs, and APIs behind authentication walls that prevent specification discovery. If you're building tooling for enterprise internal APIs or need comprehensive coverage of authenticated-only endpoints, you'll need to supplement this directory with your own sources.
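
If freshness matters, one option is to compare the updated timestamp from the catalog against a threshold of your own before trusting the cached copy. The sketch below assumes the timestamp format shown in the list.json example. The is_stale helper and the seven-day threshold are illustrative choices, not part of APIs.guru.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as "2024-01-15T12:34:56.000Z" as aware UTC."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_stale(updated: str, max_age: timedelta, now: datetime | None = None) -> bool:
    """Return True if `updated` is more than `max_age` before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - parse_timestamp(updated) > max_age


# Illustrative check with a fixed "now" so the result is reproducible.
updated = "2024-01-15T12:34:56.000Z"
week = timedelta(days=7)
print(is_stale(updated, week, now=datetime(2024, 1, 20, tzinfo=timezone.utc)))
print(is_stale(updated, week, now=datetime(2024, 2, 1, tzinfo=timezone.utc)))
```

A stale entry does not mean the directory copy is wrong, only that it is worth checking the provider's own spec before relying on it for a fast-moving API.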

Verdict

Use if you're building developer tools that need to work across multiple APIs (SDK generators, mock servers, testing frameworks, documentation tools), need a large corpus of real-world OpenAPI specs for testing your own validators or converters, or want a reliable, corrected source for public API discovery without hunting down individual specs. The correction pipeline alone is worth it: you get specifications that actually work rather than the broken versions most providers serve.

Skip if you need real-time API updates, are working with private or internal APIs, need GraphQL or gRPC definitions instead of REST, or plan to access specs directly via Git rather than the REST API. For those cases, go directly to the source providers or use a commercial API catalog with enterprise SLAs.
