APTnotes: The Underground Library Every Security Researcher Should Bookmark

Hook

When China's APT1 was exposed in 2013, it sparked an avalanche of public threat reporting. Today, over a decade of these reports sit scattered across vendor blogs and PDF archives—unless you know where to look.

Context

Advanced Persistent Threat groups don't operate like script kiddies. They're well-funded, patient adversaries—often nation-state actors—who compromise networks for months or years before detection. Understanding their tactics requires studying their history, but threat intelligence has a discoverability problem. Mandiant's APT1 report lives on their blog. Kaspersky's Equation Group research sits in a different PDF repository. CrowdStrike's reports use different naming conventions than FireEye. For researchers investigating whether strange network behavior matches known APT patterns, finding relevant historical reports meant maintaining personal bookmark collections or paying for expensive threat intelligence platforms.

APTnotes emerged to solve this fragmentation. Created by Kiran Bandla, it's a chronologically organized index of publicly disclosed APT campaign reports spanning 2008 to present. Rather than hosting documents directly, it maintains curated links to primary sources (vendor whitepapers, conference presentations, research blogs) organized by year. Think of it as a library catalog for threat intelligence, maintained through community contributions. With 3,600+ stars, it's become the de facto starting point for APT research, used by incident responders investigating breaches, threat analysts building adversary profiles, and security teams assessing whether observed TTPs match known campaign patterns.

Technical Insight

APTnotes' architecture is deliberately simple: a GitHub repository with yearly directories containing markdown files that link to external documents. Each entry follows a consistent format with the threat actor name, report title, author organization, and document URL. This structure prioritizes longevity and community maintainability over technical sophistication. The real value lies in how researchers can programmatically interact with this data.

The companion repository aptnotes/data provides machine-readable formats: JSON and CSV exports of the indexed reports. This enables automated analysis that would otherwise require manual literature review. For example, you can quickly identify reports whose titles mention a specific APT group under any of its vendor names. APT28 is also called Fancy Bear, Sofacy, Pawn Storm, and Sednit depending on the reporting organization. Because the index holds titles and metadata rather than report text, the match is title-only and will miss reports that name the group only in the body. A simple Python script can aggregate these:

import textwrap
from collections import defaultdict

import requests

# Fetch the JSON index from aptnotes/data
url = "https://raw.githubusercontent.com/aptnotes/data/master/APTnotes.json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error page
apt_data = response.json()

# Vendor names for the same group, matched case-insensitively against titles
APT28_ALIASES = ('apt28', 'fancy bear', 'sofacy', 'pawn storm', 'sednit')

# Group matching reports by year and analyze coverage
yearly_coverage = defaultdict(list)
for report in apt_data:
    # Normalize to str so years sort cleanly alongside 'Unknown'
    year = str(report.get('Year', 'Unknown'))
    title = report.get('Title', '')

    if any(alias in title.lower() for alias in APT28_ALIASES):
        yearly_coverage[year].append({
            'title': title,
            'source': report.get('Source', 'Unknown'),
            # The link key name is an assumption; check it against the current index
            'url': report.get('Link') or report.get('URL', '')
        })

# Display chronological activity patterns
for year in sorted(yearly_coverage.keys()):
    print(f"\n{year}: {len(yearly_coverage[year])} reports")
    for report in yearly_coverage[year]:
        # Shorten long titles; the ellipsis is added only when something was cut
        short_title = textwrap.shorten(report['title'], width=60, placeholder='...')
        print(f"  - {short_title} ({report['source']})")

This approach reveals reporting patterns: spike years often correlate with major incidents (2016 saw heavy APT28 coverage following the DNC breach), while gaps might indicate operational dormancy, detection failures, or simply reports whose titles don't name the group. Security teams can use similar queries to research specific malware families, understand historical targeting patterns, or identify which vendor has published most extensively on threats relevant to their sector.
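The vendor comparison is easy to sketch with a Counter. This is a minimal illustration, not part of APTnotes: the function name vendor_coverage is hypothetical, and the records are made-up sample data that reuse the 'Title' and 'Source' keys from the script above.

```python
from collections import Counter


def vendor_coverage(reports, keywords):
    """Count reports per source whose title mentions any keyword (case-insensitive)."""
    keywords = [k.lower() for k in keywords]
    counts = Counter()
    for report in reports:
        title = report.get("Title", "").lower()
        if any(k in title for k in keywords):
            counts[report.get("Source", "Unknown")] += 1
    return counts


# Illustrative records only, not real index entries
sample_reports = [
    {"Title": "Sofacy phishing wave", "Source": "Vendor A", "Year": "2015"},
    {"Title": "Fancy Bear returns", "Source": "Vendor B", "Year": "2016"},
    {"Title": "More on Sofacy tooling", "Source": "Vendor A", "Year": "2016"},
    {"Title": "Unrelated ransomware note", "Source": "Vendor C", "Year": "2016"},
]

# most_common() orders sources from most to fewest matching reports
ranking = vendor_coverage(sample_reports, ["sofacy", "fancy bear"]).most_common()
for source, count in ranking:
    print(f"{source}: {count}")
```

Swapping sample_reports for the parsed index gives a rough ranking of who has written most about a group, subject to the same title-only caveat.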

The repository structure also enables differential analysis. By comparing reports from different years, you can track how APT groups evolve their infrastructure and tactics. A researcher might pull all Lazarus Group reports and extract discussed domains, then identify infrastructure patterns that persist across campaigns—useful for building long-term detection rules rather than reactive IOC blocking.
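As a rough sketch of that differential analysis, the snippet below extracts domain-like strings from report text grouped by year and keeps those seen in more than one year. It assumes you have already converted the reports to plain text; the function names, the simplified regex, and the sample strings are illustrative assumptions.

```python
import re
from collections import defaultdict

# Simplified pattern: it also matches file names such as "report.pdf",
# so results need manual review before they feed a detection rule.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_domains(text):
    """Return the set of lowercase domain-like strings found in text."""
    cleaned = text.replace("[.]", ".")  # undo the common "[.]" defanging
    return {match.lower() for match in DOMAIN_RE.findall(cleaned)}


def persistent_domains(texts_by_year, min_years=2):
    """Map each domain seen in at least min_years distinct years to those years."""
    years_seen = defaultdict(set)
    for year, text in texts_by_year.items():
        for domain in extract_domains(text):
            years_seen[domain].add(year)
    return {d: sorted(y) for d, y in years_seen.items() if len(y) >= min_years}


# Made-up report excerpts, keyed by publication year
sample_texts = {
    "2014": "Beacons were sent to update.example-cdn.com and mail.example.org.",
    "2016": "The implant contacted update.example-cdn[.]com over HTTPS.",
}

persistent = persistent_domains(sample_texts)
print(persistent)
```

Domains that recur across years are candidates for longer-lived detections, while one-off domains are closer to disposable IOCs.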

Another powerful use case: building a local threat intelligence corpus for machine learning models. Natural language processing researchers have used APTnotes as training data for models that automatically extract indicators of compromise from unstructured security reports. The consistent indexing makes it straightforward to download PDFs programmatically and feed them into document processing pipelines:

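import os
import time

import requests

def download_apt_reports(apt_data, year, download_dir='apt_corpus', delay=2):
    """Download all reports indexed for a given year and return the saved paths."""
    os.makedirs(download_dir, exist_ok=True)
    saved = []

    for i, report in enumerate(apt_data):
        if str(report.get('Year')) != str(year):
            continue
        # The link key name is an assumption; check it against the current index
        url = report.get('Link') or report.get('URL', '')
        if not url or 'box.com' not in url:
            continue
        # basename() keeps an unexpected path in the index from escaping download_dir
        name = os.path.basename(report.get('Filename') or f"report_{i}")
        if not name.lower().endswith('.pdf'):
            name += '.pdf'
        local_path = os.path.join(download_dir, name)

        print(f"Downloading: {name}")
        try:
            # Share links may return an HTML landing page rather than the PDF,
            # so check the saved bytes before feeding them to a parser.
            with requests.get(url, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with open(local_path, 'wb') as fh:
                    for chunk in resp.iter_content(chunk_size=8192):
                        fh.write(chunk)
            saved.append(local_path)
        except requests.RequestException as exc:
            print(f"  Skipped {name}: {exc}")
        time.sleep(delay)  # Respect hosting service rate limits

    return saved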

The simplicity of markdown files in version control means the repository itself serves as a historical record. You can use git log to see when specific APT groups first appeared in the index, or track how quickly the community documented emerging threats. Commit dates record when an entry was added rather than when the report was published, so they show the latest point by which a group had been publicly reported. Even so, this temporal metadata can be as useful as the reports themselves for understanding threat landscape evolution.
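One way to run that git log query from Python is sketched below. It assumes a local clone of the repository and uses git's pickaxe option (-S), which lists commits that changed the number of occurrences of a string. The helper names and the sample output are hypothetical, and the search is case-sensitive.

```python
import subprocess


def build_pickaxe_command(term, repo_path="."):
    """Build a git log command listing commits that added or removed term, oldest first."""
    return [
        "git", "-C", repo_path, "log",
        "--reverse",            # oldest matching commit first
        "--date=short",         # dates as YYYY-MM-DD
        "--format=%ad%x09%s",   # author date, a tab, then the commit subject
        f"-S{term}",            # pickaxe search for the literal string
    ]


def parse_first_seen(log_output):
    """Return (date, subject) from the first line of the log output, or None."""
    for line in log_output.splitlines():
        if "\t" in line:
            date, subject = line.split("\t", 1)
            return date, subject
    return None


def first_seen(term, repo_path="."):
    """Run git in repo_path and return the earliest commit that touched term."""
    result = subprocess.run(
        build_pickaxe_command(term, repo_path),
        capture_output=True, text=True, check=True,
    )
    return parse_first_seen(result.stdout)


# Made-up log output in the format requested above, to show the parsing step
sample_output = "2015-03-02\tAdd 2015 reports\n2016-07-01\tUpdate index\n"
print(parse_first_seen(sample_output))
```

Calling first_seen("Lazarus", "path/to/clone") against a real clone returns the earliest commit whose diff added or removed that string, or None when nothing matches.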

Gotcha

APTnotes has significant limitations that become apparent during operational use. The most critical: link rot. Many reports link to box.com or vendor websites that eventually break, reorganize, or require authentication. A 2018 report URL might now return a 404, especially if the security company was acquired or rebranded. The repository maintainers can't control external hosting, so availability degrades over time. Researchers often need to use the Wayback Machine or contact report authors directly to access older documents.
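When a link is dead, the Wayback Machine check can be scripted. The sketch below queries the Internet Archive's availability endpoint for the snapshot closest to a given date. The helper names are hypothetical, and the sample response is a hand-written stand-in for the JSON shape that endpoint returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def closest_snapshot_url(api_response):
    """Return the archived URL from an availability response, or None if absent."""
    closest = api_response.get("archived_snapshots", {}).get("closest", {})
    if closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(report_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot of report_url closest to timestamp."""
    params = {"url": report_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD, e.g. the report's publication date
    with urlopen(f"{WAYBACK_API}?{urlencode(params)}", timeout=30) as resp:
        return closest_snapshot_url(json.load(resp))


# Hand-written stand-in for a successful response, to show the parsing step offline
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20180101000000/http://example.com/report.pdf",
            "timestamp": "20180101000000",
            "status": "200",
        }
    }
}
print(closest_snapshot_url(sample_response))
```

Passing the report's publication date as the timestamp biases the lookup toward a capture from when the link still worked.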

The second major limitation is that APTnotes is a bibliography, not a threat intelligence platform. It provides no structured data extraction, relationship mapping, or IOC databases. If you need to know "which APT groups have targeted healthcare organizations using spearphishing" or "give me all IP addresses associated with APT29," APTnotes won't answer those questions directly. You'd need to manually read through dozens of PDFs. The JSON data file includes titles and metadata, but not report contents or extracted indicators. For operational security teams needing actionable intelligence—specific malware hashes, C2 domains, YARA rules—APTnotes is a starting point that requires substantial manual effort to operationalize.

Coverage bias is another consideration. The repository only includes publicly disclosed reports, which represent only a fraction of actual APT activity. Nation-state victims often don't publicly acknowledge breaches. Classified intelligence never appears. Some security vendors keep detailed research behind paywalls. This means APTnotes reflects what threat intelligence providers chose to publish, not comprehensive threat landscape reality. Certain APT groups may appear less active simply because their campaigns weren't publicly documented, while others receive disproportionate coverage due to geopolitical attention or vendor marketing priorities.

Verdict

Use if: You're conducting threat research requiring historical context on specific APT groups, building adversary profiles for red team exercises, or investigating an incident where you suspect APT involvement and need to compare observed TTPs against documented campaigns. It's invaluable for security researchers studying long-term threat evolution, intelligence analysts needing to cite primary sources, or teams without budgets for commercial threat intelligence platforms. Use it as your first stop when you encounter suspicious activity patterns and want to know "has anyone seen this before?"

Skip if: You need real-time threat feeds with machine-readable IOCs for immediate defensive action, require structured relationship mapping between threat actors and malware families, or want automated integration with SIEM/SOAR platforms. If your use case demands legal-grade attribution evidence or comprehensive coverage including non-public intelligence, you need commercial providers. APTnotes works best as a research library for human analysts, not as operational infrastructure for automated defenses. For detection engineering and incident response requiring current, actionable indicators, combine APTnotes research with platforms like MITRE ATT&CK for technique mapping and AlienVault OTX for community IOC sharing.
