APTnotes: A Curated Archive of 2,000+ Threat Intelligence Reports
Hook
Security researchers face a common problem: the critical APT report you need to reference has disappeared from the vendor’s website. APTnotes exists because threat intelligence has a shelf life shorter than most malware campaigns.
Context
Advanced Persistent Threat (APT) reports are the lifeblood of security research. Vendors like FireEye, CrowdStrike, and Kaspersky publish detailed analyses of sophisticated attack campaigns, documenting tactics, techniques, and indicators that defenders rely on to protect their networks. But there’s a problem: these reports vanish. Companies rebrand, restructure their websites, or pull down old content. Links break. PDFs become 404s. Critical intelligence becomes inaccessible exactly when you need it most.
APTnotes was created to solve this preservation crisis. It started as a GitHub repository with reports stored directly in year-based folders. When the project hit storage limits, the reports moved to Box cloud storage (donated by Box), while the metadata stayed in the GitHub repo as CSV and JSON files. The project collects publicly available papers and blogs related to malicious campaigns and vendor-defined APT groups, creating a searchable, machine-readable archive spanning 2008 to the present. It is infrastructure for the security research community: it enables historical analysis and tool development, and it keeps knowledge about threat actors from disappearing into the digital void.
Technical Insight
The architecture of APTnotes is deliberately pragmatic, prioritizing accessibility and longevity over complexity. Reports are stored on Box, while metadata lives in two structured formats within the GitHub repository: CSV and JSON. This separation of concerns (metadata in version control, large binaries in object storage) solved the original storage problem when the repo ran out of room.
The CSV format provides maximum compatibility:
Filename,Title,Source,Link,SHA-1,Date,Year
Fritz_HOW-CHINA-WILL-USE-CYBER-WARFARE(Oct-01-08),How China Will Use Cyber Warfare,Jason Fritz,https://app.box.com/s/696xnzy1an3jbm3b212y5n8xieirbemd,3e6399a4b608bbd99dd81bd2be4cd49731362b5e,10/1/08,2008
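Each row captures the essential metadata: filename, title, source, Box link, SHA-1 hash, publication date, and year. The source is usually a vendor but can be an individual author, as in the sample row above. The SHA-1 hash is particularly important: it lets downstream consumers check that the file they received matches the archived copy, and it exposes duplicate reports even when filenames differ. SHA-1 is no longer considered collision-resistant, so treat it as a guard against corruption and accidental mix-ups rather than a defense against deliberate tampering.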
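As a rough illustration of how a consumer might use these columns, here is a minimal Python sketch. It relies only on the header and sample row shown above; the helper names (load_rows, parse_date, matches_sha1, duplicate_hashes) are made up for this example, and the month/day/two-digit-year date layout is an assumption inferred from the single sample value "10/1/08".

```python
import csv
import hashlib
import io
from datetime import datetime

# Header and sample row copied from the CSV excerpt above.
SAMPLE_CSV = (
    "Filename,Title,Source,Link,SHA-1,Date,Year\n"
    "Fritz_HOW-CHINA-WILL-USE-CYBER-WARFARE(Oct-01-08),"
    "How China Will Use Cyber Warfare,Jason Fritz,"
    "https://app.box.com/s/696xnzy1an3jbm3b212y5n8xieirbemd,"
    "3e6399a4b608bbd99dd81bd2be4cd49731362b5e,10/1/08,2008\n"
)


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Parse a Date cell, ASSUMING month/day/two-digit-year as in the sample."""
    return datetime.strptime(value, "%m/%d/%y").date()


def matches_sha1(data, expected_hex):
    """Return True if the SHA-1 digest of `data` equals the recorded hash."""
    return hashlib.sha1(data).hexdigest() == expected_hex.strip().lower()


def duplicate_hashes(rows):
    """Map each SHA-1 that appears more than once to the filenames sharing it."""
    by_hash = {}
    for row in rows:
        by_hash.setdefault(row["SHA-1"].lower(), []).append(row["Filename"])
    return {h: names for h, names in by_hash.items() if len(names) > 1}


rows = load_rows(SAMPLE_CSV)
```

Calling matches_sha1 on the bytes of a downloaded report with the row's "SHA-1" value covers the integrity check, and duplicate_hashes covers the case where the same file was catalogued under two names.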
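The JSON format carries the same fields for programmatic access with minimal parsing overhead. Note that in the sample below the hash key is "sha1" rather than "SHA-1", and "Year" is a string, not a number: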
[{
  "sha1": "3e6399a4b608bbd99dd81bd2be4cd49731362b5e",
  "Title": "How China Will Use Cyber Warfare",
  "Filename": "Fritz_HOW-CHINA-WILL-USE-CYBER-WARFARE(Oct-01-08)",
  "Source": "Jason Fritz",
  "Link": "https://app.box.com/s/696xnzy1an3jbm3b212y5n8xieirbemd",
  "Year": "2008",
  "Date": "10/1/08"
}]
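Loading and filtering records in this shape takes only a few lines. The sketch below works on the sample record above; filter_reports is a hypothetical helper written for this article, not part of any APTnotes tooling, and it compares years as strings because the sample stores "Year" that way.

```python
import json

# The sample record from the JSON excerpt above.
SAMPLE_JSON = """
[{
  "sha1": "3e6399a4b608bbd99dd81bd2be4cd49731362b5e",
  "Title": "How China Will Use Cyber Warfare",
  "Filename": "Fritz_HOW-CHINA-WILL-USE-CYBER-WARFARE(Oct-01-08)",
  "Source": "Jason Fritz",
  "Link": "https://app.box.com/s/696xnzy1an3jbm3b212y5n8xieirbemd",
  "Year": "2008",
  "Date": "10/1/08"
}]
"""


def filter_reports(reports, year=None, source=None):
    """Return the reports matching an optional year and source substring."""
    selected = []
    for report in reports:
        # "Year" is a string in the sample, so compare as strings.
        if year is not None and str(report.get("Year")) != str(year):
            continue
        # Case-insensitive substring match on the source name.
        if source is not None and source.lower() not in report.get("Source", "").lower():
            continue
        selected.append(report)
    return selected


reports = json.loads(SAMPLE_JSON)
from_2008 = filter_reports(reports, year=2008)
```

Passing both arguments narrows the result to records that satisfy both conditions; leaving both out returns everything.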
For researchers building automation, the companion tools repository (https://github.com/aptnotes/tools) provides scripts for bulk downloading. This enables workflows like: fetch the JSON metadata, filter by year or source, download matching reports, and run batch analysis. The architecture supports use cases from simple reference lookups to large-scale corpus analysis for machine learning projects.
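The download-and-check step of that workflow might look something like the sketch below. It is not the tools repository's implementation. How a Box share link resolves to file bytes is outside what this article covers, so the fetch step is left as a hypothetical callable that the caller supplies; download_verified and the ".pdf" suffix are likewise assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def download_verified(reports, fetch, out_dir):
    """Save each report whose bytes match its recorded SHA-1.

    `reports` is a list of metadata dicts with "Filename" and "sha1" keys.
    `fetch` is a HYPOTHETICAL caller-supplied function that takes one
    metadata dict and returns the report's bytes.
    Returns (saved_paths, rejected_filenames).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, rejected = [], []
    for report in reports:
        data = fetch(report)
        # Refuse to keep anything that does not match the archive's hash.
        if hashlib.sha1(data).hexdigest() != report["sha1"].lower():
            rejected.append(report["Filename"])
            continue
        # ASSUMPTION: reports are stored as PDFs, so add a .pdf suffix.
        path = out / (report["Filename"] + ".pdf")
        path.write_bytes(data)
        saved.append(path)
    return saved, rejected
```

Keeping the fetcher injectable also makes the function easy to exercise offline: pass a stub that returns canned bytes and confirm that mismatched hashes land in the rejected list.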
The contribution model is equally pragmatic. Rather than complex pull request workflows, APTnotes accepts submissions via Twitter hashtags (#aptnotes), GitHub issues with a structured template, or direct contact. The issue template removes guesswork—contributors provide the report URL, vendor, title, and date. For HTML-only reports, the project recommends converting to PDF using tools like Print Friendly, ensuring consistent format across the archive. This low-friction contribution model has sustained the project for over a decade, with a growing list of contributors maintaining the corpus.
The data is known to power downstream projects like ThreatMiner, demonstrating its value as foundational infrastructure. ThreatMiner consumes the APTnotes metadata to provide searchable access to reports alongside IOCs and malware samples, showing how structured metadata transforms a static archive into a queryable intelligence platform.
Gotcha
APTnotes has architectural limitations that reflect its community-driven origins. The most significant is its dependency on Box for storage. While this solved the GitHub size constraints, it creates a single point of failure. If Box discontinues its sponsorship or changes access policies, the entire corpus could become inaccessible. The project makes no mention of redundant mirrors or alternative distribution, so your access depends on Box's infrastructure and on APTnotes' continued relationship with Box.
The metadata is deliberately minimal. You get filename, title, source, link, date, year, and hash, and nothing more. There's no tagging by threat actor, malware family, industry vertical, or MITRE ATT&CK technique. If you need to find all reports about a specific group like APT28 or malware like TrickBot, you're doing keyword searches through titles or downloading everything and building your own index. The project focuses on preservation, not enrichment. For structured threat intelligence with taxonomies and mappings, you'll need to layer additional tooling on top or turn to alternatives that prioritize structured data over document archival, such as the MITRE ATT&CK knowledge base or feeds distributed as STIX over TAXII.
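The title-search workaround is easy to prototype. The sketch below builds a small inverted index from title words to filenames; build_title_index and search are hypothetical helpers, the first record is the sample from this article, and the second record is invented purely so that the index has more than one entry.

```python
import re
from collections import defaultdict

RECORDS = [
    {
        "Filename": "Fritz_HOW-CHINA-WILL-USE-CYBER-WARFARE(Oct-01-08)",
        "Title": "How China Will Use Cyber Warfare",
    },
    # HYPOTHETICAL record, made up for this example only.
    {"Filename": "example-report", "Title": "An Example Cyber Campaign"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_title_index(reports):
    """Map each title token to the set of filenames whose title contains it."""
    index = defaultdict(set)
    for report in reports:
        for token in tokenize(report["Title"]):
            index[token].add(report["Filename"])
    return index


def search(index, query):
    """Return filenames whose titles contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_title_index(RECORDS)
hits = search(index, "cyber warfare")
```

This only sees what the title says, which is exactly the limitation described above: a report about a group that is not named in its title will not be found without indexing the document text itself.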
Verdict
Use APTnotes if you're building threat intelligence platforms, conducting historical research on APT campaigns, assembling a reference collection for security training, or analyzing trends across vendors and years. It's essential infrastructure for anyone who has ever cursed a dead link in a two-year-old report citation, and it suits you if you value preservation over real-time feeds and can work with basic metadata to build your own analysis layer.
Skip it if you need live threat feeds, structured IOCs ready for SIEM ingestion, or detailed taxonomies mapping reports to MITRE ATT&CK techniques; this is a historical archive with minimal enrichment. Skip it too if you're looking for technical malware analysis or sample repositories, because APTnotes catalogs campaign reports, not binaries. For those use cases, look at Malpedia for malware samples or AlienVault OTX for real-time threat feeds with structured indicators.