Octosuite: GitHub OSINT Without the API Boilerplate
Hook
When Bellingcat investigates state-sponsored disinformation campaigns or tracks weapons manufacturers, they're not just scraping Twitter—they're analyzing GitHub commit histories, contributor networks, and repository patterns with a tool most developers have never heard of.
Context
Open Source Intelligence (OSINT) on GitHub has become critical for investigative journalism, security research, and threat intelligence. Investigators need to answer questions like: Who's contributing to this suspicious repository? What's the network of developers around this organization? Has this user changed their activity patterns? Which projects share unusual contributor overlap?
Traditionally, answering these questions meant writing custom scripts against GitHub's REST API, managing pagination, handling rate limits, and building exporters for different data formats. For non-developers or time-pressed investigators, this created a significant barrier. Even experienced developers faced repetitive boilerplate: authenticating, parsing responses, following paginated links, and structuring output. Bellingcat, the investigative journalism organization behind the MH17 and Skripal poisoning investigations, built Octosuite to eliminate this friction. It is a focused toolkit that surfaces GitHub's intelligence value without requiring API expertise.
Technical Insight
Octosuite's architecture revolves around four core data models: User, Repo, Org, and Search. Each wraps specific GitHub API endpoints while providing a consistent interface for data extraction. The design philosophy is minimalist—rather than implementing GitHub's entire API surface, Octosuite focuses on OSINT-relevant endpoints and makes them accessible through three interfaces: a terminal UI built with Rich, a straightforward CLI, and a Python library for programmatic access.
The library interface is remarkably clean. Here's how you'd profile a GitHub user and export their repository data:
from octosuite import User
# Initialize user object
user = User("torvalds")
# Access profile data
print(f"Name: {user.name}")
print(f"Company: {user.company}")
print(f"Followers: {user.followers}")
# Get repositories with automatic pagination
repos = user.repos()
for repo in repos:
    print(f"{repo['name']}: {repo['stargazers_count']} stars")
# Export to multiple formats
user.export_repos(filename="torvalds_repos.json", format="json")
user.export_repos(filename="torvalds_repos.csv", format="csv")
Under the hood, Octosuite uses the requests library to hit GitHub's REST API and handles pagination automatically. GitHub's Link header contains URLs for next, previous, first, and last pages—Octosuite parses these and iterates until all data is collected. This is crucial for OSINT work where you might need all 1,000+ repositories for an organization or every follower of a targeted account.
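To make the Link-header mechanics concrete, here is a minimal sketch of that pagination loop. It is not Octosuite's actual code: `parse_link_header`, `paginate`, and the `fetch` callable are hypothetical names, and `fetch` stands in for a real HTTP call so the logic can be read and tested without touching the network.

```python
import re
from typing import Callable, Iterator

# One Link header entry looks like: <https://api.github.com/...?page=2>; rel="next"
_LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header: str) -> dict[str, str]:
    """Map each rel value (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in _LINK_PATTERN.findall(header or "")}


def paginate(first_url: str, fetch: Callable[[str], tuple[list, str]]) -> Iterator[dict]:
    """Yield every item across all pages, following rel="next" links.

    `fetch` is a hypothetical callable that returns (items, link_header)
    for a URL. With requests it would wrap requests.get(url) and return
    (response.json(), response.headers.get("Link", "")).
    """
    url = first_url
    while url:
        items, link_header = fetch(url)
        yield from items
        # The last page has no rel="next" entry, which ends the loop.
        url = parse_link_header(link_header).get("next")
```

Because `paginate` is a generator, a caller can stop early (say, after the first 200 repositories) without requesting the remaining pages.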
The data models implement lazy loading: API calls only execute when you access properties or methods that require them. This means you can instantiate a User object without spending any of your rate-limit quota, then selectively query only the data you need. For bulk operations, this design prevents unnecessary API consumption.
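The lazy-loading pattern itself is easy to illustrate. The sketch below is an assumption about how such a model could be built, not Octosuite's implementation: `LazyUser` and its `fetch` argument are hypothetical, and the only real detail is the `https://api.github.com/users/{username}` endpoint.

```python
from functools import cached_property
from typing import Callable


class LazyUser:
    """Hypothetical stand-in for a lazily loaded user model (not Octosuite's class)."""

    def __init__(self, username: str, fetch: Callable[[str], dict]):
        # Constructing the object performs no network request.
        self.username = username
        self._fetch = fetch

    @cached_property
    def profile(self) -> dict:
        # Runs on first access only; the result is then cached on the instance.
        return self._fetch(f"https://api.github.com/users/{self.username}")

    @property
    def followers(self) -> int:
        # Reading any profile field triggers (at most) one fetch.
        return self.profile["followers"]
```

Creating a thousand `LazyUser` objects costs nothing against the rate limit; only the ones whose fields you actually read trigger a request.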
The terminal UI, built with Python's Rich library, provides an interactive alternative for investigators who prefer guided workflows. It presents menu-driven navigation through GitHub entities, automatically formats output in readable tables, and provides export options without writing code. The CLI mode offers a middle ground—command-line flags for quick queries:
# Get user information
octosuite user --username torvalds --repos --export json
# Search repositories
octosuite search --query "topic:osint language:python" --export csv
# Analyze organization
octosuite org --name bellingcat --members --repos
One clever architectural decision is the separation of data fetching from formatting. Each model's methods return raw dictionaries from the GitHub API, while separate exporter functions handle JSON, CSV, and HTML output. This means adding new export formats (XML, SQLite, etc.) requires only implementing new formatters without touching the API logic. For investigations involving multiple tools, this raw data access is valuable—you can pipe Octosuite's output directly into pandas, jq, or other analysis tools.
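A rough sketch of what that separation can look like follows. The names here (`to_json`, `to_csv`, `EXPORTERS`, `export`) are illustrative assumptions rather than Octosuite's real functions; the point is only that formatters take plain dictionaries and know nothing about the API.

```python
import csv
import io
import json
from typing import Callable


def to_json(rows: list[dict]) -> str:
    """Serialize raw API dictionaries as pretty-printed JSON."""
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Registry of formatters; supporting a new format means adding one entry here.
EXPORTERS: dict[str, Callable[[list[dict]], str]] = {"json": to_json, "csv": to_csv}


def export(rows: list[dict], fmt: str) -> str:
    """Format already-fetched rows without touching any fetching logic."""
    try:
        formatter = EXPORTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return formatter(rows)
```

Under this layout, adding a TSV or SQLite exporter is a one-line registration, and the fetching code never changes.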
The Search model demonstrates Octosuite's OSINT focus. It supports GitHub's code search, allowing investigators to find specific patterns across repositories—leaked credentials, configuration files, internal hostnames, or API keys. Combined with date filters and user constraints, this enables temporal analysis: "Show me all repositories by users in this organization that mentioned 'production-db' in the last six months."
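For a sense of what such a query looks like on the wire, here is a small sketch that builds a GitHub repository-search URL using the `org:` and `pushed:` qualifiers. The helper name `build_repo_search_url` and the example organization are hypothetical, and whether Octosuite's Search model passes qualifiers through in exactly this form is an assumption.

```python
from datetime import date
from urllib.parse import urlencode


def build_repo_search_url(term: str, org: str, since: date) -> str:
    """Build a repository-search URL limited to one org and a push date.

    Hypothetical helper: the qualifiers follow GitHub's repository search
    syntax (org:NAME, pushed:>=YYYY-MM-DD).
    """
    query = f"{term} org:{org} pushed:>={since.isoformat()}"
    # urlencode percent-encodes the qualifier punctuation and joins words with "+".
    return "https://api.github.com/search/repositories?" + urlencode({"q": query})


# Example: repositories in a placeholder org that mention "production-db"
# and were pushed to on or after 1 January 2024.
url = build_repo_search_url("production-db", "example-org", date(2024, 1, 1))
```

Keeping the cutoff date as an explicit argument, rather than computing "six months ago" inside the function, makes the query reproducible when an investigation is rerun later.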
Octosuite intentionally avoids GitHub's GraphQL API, sticking with REST. While GraphQL offers more efficient queries, REST endpoints are simpler to understand, debug, and work with for non-expert users. The tool prioritizes accessibility over performance—a reasonable trade-off for OSINT workflows where data collection is batch-oriented rather than real-time.
Gotcha
Octosuite's biggest limitation is inherited from GitHub's API: rate limiting. Unauthenticated requests cap at 60 per hour, which you'll exhaust quickly when profiling users with hundreds of repositories or organizations with thousands of members. While GitHub tokens increase this to 5,000 requests per hour, Octosuite's documentation doesn't clearly explain authentication integration—you'll need to read the code or set environment variables manually. For large-scale investigations, this rate limit becomes a bottleneck requiring careful query planning or multiple API tokens.
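If you end up handling this yourself, the pieces are small. The sketch below is not part of Octosuite: `auth_headers` and `seconds_until_reset` are hypothetical helpers, and `GITHUB_TOKEN` is an assumed environment variable name. The `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers are GitHub's own.

```python
import os
import time
from typing import Mapping, Optional


def auth_headers(token: Optional[str] = None) -> dict[str, str]:
    """Build request headers, adding a token if one is available.

    GITHUB_TOKEN is an assumed variable name, not something Octosuite documents.
    """
    token = token or os.environ.get("GITHUB_TOKEN")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(response_headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, or 0 if quota remains.

    Expects GitHub's X-RateLimit-* headers. A requests response exposes them
    through a case-insensitive mapping; a plain dict must use this exact casing.
    """
    remaining = int(response_headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a Unix timestamp (seconds) for when the window resets.
    reset_at = float(response_headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A collection loop can call `time.sleep(seconds_until_reset(response.headers))` after each request, so a long crawl pauses at the limit instead of failing halfway through.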
The tool also lacks temporal awareness. GitHub's API returns current state—you can't use Octosuite to see historical repository membership, deleted repositories, or how a user's profile looked six months ago. For investigations tracking changes over time, you'll need to run Octosuite repeatedly and maintain your own historical database. There's no built-in diff functionality or change detection.
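A do-it-yourself change detector does not need to be elaborate. Assuming you save each run's repository list as JSON, something like the following hypothetical `diff_snapshots` helper compares two runs. It keys on `full_name`, a field GitHub includes in repository objects; everything else is an illustrative choice.

```python
def diff_snapshots(old: list[dict], new: list[dict], key: str = "full_name") -> dict:
    """Compare two saved snapshots of API records.

    Returns the keys that were added, removed, or whose record changed
    between the older and the newer snapshot.
    """
    old_by_key = {item[key]: item for item in old}
    new_by_key = {item[key]: item for item in new}

    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    changed = sorted(
        k for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

A repository that appears under `removed` may have been deleted, renamed, transferred, or made private; the diff only says it is no longer visible, so each case still needs manual follow-up.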
Finally, Octosuite is purely an extraction tool. It won't analyze contributor networks, visualize repository relationships, or detect anomalies. You get raw data that requires external analysis. If you're expecting social graph visualization or statistical insights, you'll be disappointed—Octosuite hands you structured data and gets out of the way. For some investigators, this is a feature; for others hoping for turnkey analytics, it's a significant gap.
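As an example of the kind of external analysis this leaves to you, the sketch below scores contributor overlap between repositories with the Jaccard index (shared contributors divided by total distinct contributors). The function name and the input shape, a mapping from repository name to a set of contributor logins, are assumptions; you would build that mapping yourself from exported data.

```python
from itertools import combinations


def contributor_overlap(contributors: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Rank repository pairs by Jaccard similarity of their contributor sets.

    `contributors` maps a repository name to the set of logins that
    contributed to it. Pairs with no shared contributors are omitted.
    """
    results = []
    for repo_a, repo_b in combinations(sorted(contributors), 2):
        users_a, users_b = contributors[repo_a], contributors[repo_b]
        union = users_a | users_b
        if not union:
            continue  # both repositories have no known contributors
        score = len(users_a & users_b) / len(union)
        if score > 0:
            results.append((repo_a, repo_b, score))
    # Highest overlap first.
    return sorted(results, key=lambda row: row[2], reverse=True)
```

A high score between two otherwise unrelated projects is only a lead, not a finding; popular maintainers and bots inflate overlap and are worth filtering out first.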
Verdict
Use if: You're conducting investigations that require systematic GitHub data collection (profiling developers, mapping organizational relationships, tracking repository activity, or searching for code patterns) and you want minimal setup without writing custom API scripts. It's ideal for journalists, security researchers, threat intelligence analysts, or anyone who needs GitHub OSINT data in exportable formats for further analysis. It also suits data collection pipelines that need reliable, paginated GitHub data extraction as a library component.
Skip if: You need real-time monitoring, historical analysis, or built-in data visualization. Skip if you're already comfortable with PyGithub or GitHub's official CLI and don't need the OSINT-focused abstractions. Skip if your investigation requires private repository access or advanced GitHub features beyond basic profiles, repositories, and organizations. For one-off queries, GitHub's web interface or search is faster than installing a dedicated tool.