Octosuite: Bellingcat’s OSINT Toolkit for Investigating GitHub Without Writing API Code
Hook
When investigative journalists at Bellingcat needed to trace developer networks and repository relationships for open-source investigations, they built a GitHub intelligence toolkit that handles pagination and exports—so you don’t have to write another API wrapper.
Context
GitHub is a goldmine for open-source intelligence work. Developer profiles reveal collaboration patterns, repository histories expose technology stacks, and organizational memberships map out team structures. Security researchers tracking malware repositories, journalists investigating state-sponsored development activity, and recruiters profiling candidate contributions all need systematic ways to extract this data. But GitHub’s REST API, while comprehensive, requires handling pagination, authentication, and response parsing for even basic queries. Most practitioners end up either writing throwaway scripts that break when requirements change, or clicking through GitHub’s web interface copying data manually. Octosuite emerged from Bellingcat’s investigative work to provide repeatable, exportable GitHub data extraction that non-programmers could use interactively while still being scriptable for automated workflows. Rather than yet another specialized analytics dashboard, Octosuite embraces the Unix philosophy: do data extraction well, output in standard formats, and let downstream tools handle analysis.
Technical Insight
Octosuite’s architecture is deceptively simple—four core entity classes (User, Repo, Org, Search) that wrap GitHub API endpoints with consistent interfaces for pagination and export. What makes it elegant is the layered access pattern: the same underlying library powers both an interactive TUI and command-line tools, while remaining importable as a standard Python package.
The library API reveals the core abstraction. Every entity follows an exists() pattern that checks resource availability before attempting expensive operations:
from octosuite import User, Repo

user = User("torvalds")
exists, profile = user.exists()
if exists:
    repos = user.repos(page=1, per_page=100)
    followers = user.followers(page=1, per_page=50)
    events = user.events(page=1, per_page=100)

repo = Repo(name="linux", owner="torvalds")
exists, repo_data = repo.exists()
if exists:
    commits = repo.commits(page=1, per_page=100)
    languages = repo.languages()
    stargazers = repo.stargazers(page=1, per_page=100)
This two-step validation prevents wasted API calls against non-existent resources. Each method returns structured data that’s already deserialized from JSON, ready for export or processing.
The CLI layer demonstrates how to design command interfaces for investigative workflows. Notice the consistent flag patterns across entity types:
# Extract user data with specific detail level
octosuite user torvalds --repos --page 1 --per-page 50
octosuite user torvalds --followers --json --export ./investigation
# Repository forensics
octosuite repo torvalds/linux --commits --per-page 100
octosuite repo torvalds/linux --stargazers --export ./data
# Organization mapping
octosuite org github --members --json
# Cross-entity search
octosuite search "machine learning" --repos --per-page 50
octosuite search "python cli" --users --json
The --export flag writes data to disk in JSON, CSV, or HTML formats—critical for creating audit trails in investigative work where you need to document exactly what data you collected and when. The --json flag outputs to stdout, making Octosuite pipeable with jq, grep, or other text processing tools.
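To make the audit-trail idea concrete, here is a minimal sketch of a manifest that records what was collected and when. It is not part of Octosuite: build_manifest and write_manifest are hypothetical helpers, and the sketch makes no assumption about how --export names or arranges its files; it simply hashes whatever it finds in the export directory.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(export_dir: str) -> dict:
    """Hash every file under export_dir and note when the manifest was built."""
    root = Path(export_dir)
    files = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        files[path.relative_to(root).as_posix()] = digest
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }


def write_manifest(export_dir: str, manifest_path: str) -> None:
    """Write the manifest as JSON; keep it outside export_dir so it is not hashed."""
    manifest = build_manifest(export_dir)
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
```

Calling write_manifest("./investigation", "./investigation-manifest.json") after an export run would leave a timestamped list of SHA-256 digests that can later show the collected files were not altered.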
What’s particularly clever is the data type flag system. Each entity exposes specific endpoints as flags (--repos, --followers, --commits, --stargazers, etc.). This maps GitHub’s API surface area to command-line switches, making API documentation unnecessary for basic usage. Run octosuite user --help and you see exactly what data you can extract.
The TUI mode (octosuite --tui) wraps this in a terminal interface using arrow key navigation. For investigators who aren’t developers, this removes the command-line barrier entirely—they can explore GitHub data interactively, see results immediately, and export when they find something relevant.
Under the hood, Octosuite handles pagination with configurable page and per_page parameters (max 100 per GitHub’s API constraints). User entities support queries for profile, repos, subscriptions, starred, followers, following, orgs, gists, events, and received_events. Repository entities expose endpoints including forks, issue_events, branches, tags, languages, commits, releases, and deployments. This comprehensive coverage means you rarely need to drop down to raw API calls for common investigative tasks.
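The page and per_page parameters lend themselves to a simple collection loop. The following is a minimal sketch rather than anything Octosuite ships: fetch_all is a hypothetical helper, and it assumes each entity method returns a list of records for the requested page and that a page shorter than per_page marks the end of the data.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[..., list], per_page: int = 100,
              max_pages: int = 10) -> List[dict]:
    """Collect records page by page until a short or empty page appears.

    fetch_page is any callable accepting page= and per_page= keyword
    arguments, e.g. user.followers (assumed to return a list per page).
    """
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100 (GitHub's cap)")
    records: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page=page, per_page=per_page)
        if not batch:
            break  # nothing left to collect
        records.extend(batch)
        if len(batch) < per_page:
            break  # a short page means this was the last one
    return records
```

With the library, usage might look like fetch_all(User("torvalds").followers, per_page=100). The max_pages cap bounds the number of requests so that a very large account cannot silently consume the whole API quota.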
Gotcha
Octosuite is fundamentally constrained by GitHub’s public API, so rate limits apply, but the tool’s README doesn’t document them; you’ll need to understand GitHub’s API quotas independently. Based on the available documentation, the tool doesn’t describe rate limit handling, backoff strategies, or how to track remaining quota.
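If you track quota yourself, GitHub's REST API reports it in the x-ratelimit-remaining and x-ratelimit-reset response headers, the latter as a UTC epoch timestamp. The sketch below shows one way to turn those headers into a wait time. seconds_until_reset is a hypothetical helper, and it assumes you obtain the headers from your own request (for example to GitHub's /rate_limit endpoint), since nothing in the README indicates that Octosuite exposes them.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A collection script could call time.sleep(seconds_until_reset(headers)) between batches instead of failing partway through a long extraction.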
Based on the README, the tool also doesn’t appear to include data enrichment or deduplication features. Extract followers for multiple related users and you’ll get redundant records with no built-in way to merge them or identify overlap. For real investigations, you’ll need to build your own database layer or pipe Octosuite output into tools like SQLite or pandas for relationship analysis. The export formats (JSON, CSV, HTML) are raw dumps; the README doesn’t indicate built-in analysis features beyond data extraction.
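As a rough illustration of such a database layer, the sketch below loads follower records into SQLite, drops duplicates, and lists accounts that follow more than one target. load_followers and shared_followers are hypothetical helpers, and the sketch assumes the exported records keep the id and login fields that GitHub's API returns for users.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def load_followers(conn: sqlite3.Connection,
                   followers_by_user: Dict[str, Iterable[dict]]) -> None:
    """Store (target, follower) pairs; the primary key drops duplicate records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS follows ("
        "target TEXT, follower_id INTEGER, follower_login TEXT, "
        "PRIMARY KEY (target, follower_id))"
    )
    rows = [
        (target, rec["id"], rec["login"])
        for target, records in followers_by_user.items()
        for rec in records
    ]
    conn.executemany("INSERT OR IGNORE INTO follows VALUES (?, ?, ?)", rows)


def shared_followers(conn: sqlite3.Connection,
                     min_targets: int = 2) -> List[Tuple[str, int]]:
    """Return (login, count) for accounts following at least min_targets targets."""
    cur = conn.execute(
        "SELECT follower_login, COUNT(*) FROM follows "
        "GROUP BY follower_id, follower_login "
        "HAVING COUNT(*) >= ? "
        "ORDER BY COUNT(*) DESC, follower_login",
        (min_targets,),
    )
    return cur.fetchall()
```

Keying on the numeric id rather than the login is a deliberate choice here, since GitHub usernames can change while the account id stays the same.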
Verdict
Use Octosuite if you’re doing investigative research, security reconnaissance, or developer profiling where you need systematic, exportable GitHub data without maintaining custom API integration code. It’s perfect for journalists tracking developer networks, security teams mapping organization structures, or recruiters building candidate portfolios. The three-interface design (TUI/CLI/library) means both non-technical investigators and automation engineers can use the same tool. Skip it if you need real-time monitoring, historical trend analysis beyond what GitHub’s API provides, or you’re building production systems requiring sophisticated rate limit orchestration and error recovery—for those cases, use PyGithub or Octokit directly and build proper retry logic, or jump to gharchive.org for large-scale historical analysis.