HEARTH: How AI Workflows Transform Threat Intelligence Into Searchable Hunt Hypotheses

Hook

What if contributing to a threat hunting knowledge base required nothing more than pasting a URL into a GitHub issue? HEARTH's AI pipeline does the rest—extracting intelligence, validating MITRE techniques, checking for duplicates, and generating pull requests automatically.

Context

Threat hunting remains one of cybersecurity's most knowledge-intensive disciplines. Hunters need to constantly consume threat intelligence reports, translate technical indicators into hypotheses, and organize findings in ways that teammates can discover and reuse. Traditionally, this means manually reading dozens of vendor reports, extracting relevant TTPs, mapping them to MITRE ATT&CK, and documenting hunt procedures in wikis or shared documents that inevitably become stale or siloed.

The problem compounds across teams. A security analyst at Company A might spend hours crafting a hypothesis around Cobalt Strike's process injection techniques, while an analyst at Company B duplicates that exact work the following week. Knowledge sharing happens through conferences, blog posts, or Twitter threads—valuable but ephemeral and unsearchable. HEARTH reimagines this workflow by treating threat intelligence URLs as the input and structured, categorized hunt hypotheses as the output, with AI handling the translation layer. It's GitHub-native, community-driven, and designed around the PEAK framework's three hunting methodologies: Hypothesis-Driven (Flames), Baselining (Embers), and Model-Assisted (Alchemy).

Technical Insight

HEARTH's architecture reveals a sophisticated marriage of static content, vector search, and GitHub Actions orchestration. At its core, hunt hypotheses live as Markdown files with YAML frontmatter in category-specific directories. Here's what a Flame (hypothesis-driven hunt) looks like:

---
title: "Detect Cobalt Strike Named Pipe Patterns"
mitre_attack:
  - T1055.001
  - T1090.001
confidence: high
cti_source: "https://example.com/cobalt-strike-analysis"
---

## Hypothesis
Adversaries using Cobalt Strike beacons create named pipes with characteristic patterns (\msagent_*, \MSSE-*, \postex_*) for SMB communications.

## Data Requirements
- Windows Security Event ID 4656 (Handle to an Object was Requested)
- Sysmon Event ID 17/18 (Pipe Created/Connected)

## Hunt Procedure
1. Query pipe creation events for regex patterns matching known CS defaults
2. Correlate with parent process anomalies (rundll32, regsvr32)
3. Investigate any matches in non-standard user contexts
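
Step 1 of the hunt procedure can be sketched in a few lines of Python. This is only an illustration: the event dictionaries and their `EventID` and `PipeName` keys are an assumed shape for exported Sysmon records, and the pattern covers just the three prefixes named in the hypothesis.

```python
import re

# Prefixes taken from the hypothesis text; a real hunt would tune this list.
CS_PIPE_PATTERN = re.compile(r"^\\(msagent_|MSSE-|postex_)", re.IGNORECASE)

def find_suspicious_pipes(events):
    """Return pipe events (Sysmon ID 17/18) whose PipeName starts with a known default."""
    hits = []
    for event in events:
        # Only pipe-created (17) and pipe-connected (18) events are relevant.
        if event.get("EventID") not in (17, 18):
            continue
        if CS_PIPE_PATTERN.match(event.get("PipeName", "")):
            hits.append(event)
    return hits
```

Matches from a function like this are leads, not verdicts; steps 2 and 3 of the procedure still apply.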

The magic happens when you submit a CTI URL via GitHub issue. The cti-submission.yml workflow triggers, invoking a Python script that orchestrates multiple AI calls. First, it fetches the URL content and sends it to Claude or OpenAI with a structured prompt:

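import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_cti_content(url, content):
    # Truncate for token limits, outside the prompt so this note is not sent to the model
    excerpt = content[:8000]
    prompt = f"""
    Analyze this threat intelligence report and extract:
    - Adversary TTPs with specific technical details
    - MITRE ATT&CK technique IDs (verify these are valid)
    - Observable behaviors suitable for threat hunting

    Report URL: {url}
    Content: {excerpt}

    Return JSON with: ttps, mitre_techniques, key_behaviors
    """
    # Client interface for openai>=1.0; the older openai.ChatCompletion call was removed
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,  # Lower temp for factual extraction
        response_format={"type": "json_object"},  # request parseable JSON
    )
    return json.loads(response.choices[0].message.content)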
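
Model replies are not guaranteed to be bare JSON; they sometimes arrive wrapped in a Markdown code fence or missing a field. A defensive parser along the following lines could sit between the API call and the rest of the pipeline. The function name `parse_extraction` and the constant `REQUIRED_KEYS` are hypothetical; the key names come from the prompt above.

```python
import json
import re

REQUIRED_KEYS = ("ttps", "mitre_techniques", "key_behaviors")

# Built programmatically so this example does not contain a literal fence.
FENCE = "`" * 3
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)

def parse_extraction(raw_text):
    """Parse a model reply into a dict, tolerating a code fence around the JSON."""
    text = raw_text.strip()
    fenced = FENCED_JSON.match(text)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Failing loudly here keeps a malformed reply from turning into a malformed pull request.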

The extracted MITRE technique IDs then pass through a validation layer that checks against a locally cached version of the ATT&CK framework (691 techniques for Enterprise). This catches hallucinated technique IDs, a common problem when LLMs generate ATT&CK references. The validator assigns each ID a confidence tier: high for an exact match, medium for a well-formed sub-technique ID whose parent technique is known, and low for a fuzzy name match used as a fallback:

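import re

# ATTACK_FRAMEWORK (a collection of valid technique IDs) and
# fuzzy_match_technique_name are assumed to be defined elsewhere in the pipeline.

def validate_mitre_techniques(technique_ids):
    validated = []
    for tid in technique_ids:
        # Exact match against the cached framework
        if tid in ATTACK_FRAMEWORK:
            validated.append({"id": tid, "confidence": "high"})
        # Well-formed sub-technique ID (fullmatch rejects trailing junk)
        elif re.fullmatch(r'T\d{4}\.\d{3}', tid):
            parent = tid.split('.')[0]
            if parent in ATTACK_FRAMEWORK:
                validated.append({"id": tid, "confidence": "medium"})
            # Unknown parent: the ID is dropped as a likely hallucination
        # Fuzzy name match as fallback
        else:
            matches = fuzzy_match_technique_name(tid)
            if matches:
                validated.append({"id": matches[0], "confidence": "low"})
    return validated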
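
One plausible way to populate a lookup like `ATTACK_FRAMEWORK` is from the STIX bundle that MITRE publishes for Enterprise ATT&CK. The sketch below assumes the bundle has already been parsed into a dictionary. How HEARTH actually builds its cache is not shown in this article, and `load_technique_ids` is a hypothetical name.

```python
def load_technique_ids(bundle):
    """Collect technique IDs (e.g. "T1055", "T1055.001") from a parsed STIX bundle."""
    ids = set()
    for obj in bundle.get("objects", []):
        # Techniques and sub-techniques are "attack-pattern" objects.
        if obj.get("type") != "attack-pattern":
            continue
        # Skip entries that have been revoked or deprecated.
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue
        for ref in obj.get("external_references", []):
            if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
                ids.add(ref["external_id"])
    return ids

# Example use, assuming a local copy of the bundle:
#   import json
#   with open("enterprise-attack.json") as fh:
#       ATTACK_FRAMEWORK = load_technique_ids(json.load(fh))
```

A set keeps the exact-match check in the validator at constant time regardless of how many techniques the framework contains.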

Duplicate detection uses SQLite with the sqlite-vec extension for vector similarity search. When a new hypothesis is generated, the system creates an embedding of the title and description, then queries for semantic duplicates:

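import sqlite3
import sqlite_vec

def check_duplicates(title, description, threshold=0.85):
    conn = sqlite3.connect('hearth.db')
    conn.enable_load_extension(True)
    sqlite_vec.load(conn)
    conn.enable_load_extension(False)

    # Generate embedding for new content. get_embedding is assumed to be defined
    # elsewhere and to return a list of floats; sqlite-vec expects a float32 blob.
    new_embedding = sqlite_vec.serialize_float32(get_embedding(f"{title} {description}"))

    # vec_distance_cosine returns a distance (0 = identical), so convert it
    # to a similarity before comparing against the threshold
    results = conn.execute("""
        SELECT id, title, 1 - vec_distance_cosine(embedding, ?) AS similarity
        FROM hunts
        WHERE 1 - vec_distance_cosine(embedding, ?) > ?
        ORDER BY similarity DESC
        LIMIT 5
    """, (new_embedding, new_embedding, threshold)).fetchall()

    conn.close()
    return results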
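
The 0.85 threshold reads as a similarity, while `vec_distance_cosine` reports a distance, so it helps to keep the relationship between the two in view. The pure-Python sketch below shows that relationship on plain lists of floats. It is for illustration only and is not how the extension computes the value internally; `is_probable_duplicate` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance: 0.0 for identical directions, larger means less similar."""
    return 1.0 - cosine_similarity(a, b)

def is_probable_duplicate(a, b, threshold=0.85):
    """Flag a pair whose similarity exceeds the threshold."""
    return cosine_similarity(a, b) > threshold
```

A similarity threshold of 0.85 therefore corresponds to a distance below 0.15.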

This vector approach delivers 30-60x faster similarity checks compared to naive pairwise comparisons, crucial as the knowledge base scales to hundreds of hunts. The SQLite database serves purely as an index—Markdown files remain the source of truth, preserving git's auditability.

Once validation and duplicate checks pass, the workflow generates a pull request with the new hunt file(s), tagged with appropriate PEAK categories. Maintainers review for semantic accuracy (did the AI correctly interpret the CTI?) before merging. The frontend, a static GitHub Pages site, uses vanilla JavaScript to fetch the hunt database as JSON and provide client-side filtering by MITRE technique, confidence level, or PEAK category—no backend required.
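
Because the Markdown files are the source of truth, the JSON the frontend fetches has to be derived from them. The sketch below shows one way that derivation could look, using only the standard library. The frontmatter parser handles just the small YAML subset seen in the example hunt file (scalars and string lists); a full YAML library would be the safer choice in practice. The function names and the file path in the usage example are hypothetical.

```python
import json

def parse_frontmatter(markdown_text):
    """Parse leading '---' frontmatter holding scalars and simple string lists."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of the frontmatter block
            break
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip().strip('"'))
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip().strip('"')
            # A key with no inline value opens a list.
            meta[current_key] = value if value else []
    return meta

def build_index(files):
    """Turn {path: markdown_text} into a JSON string a static page could fetch."""
    records = []
    for path in sorted(files):
        record = parse_frontmatter(files[path])
        record["path"] = path
        records.append(record)
    return json.dumps(records, indent=2)

def filter_hunts(records, technique=None, confidence=None):
    """Mirror the client-side filters: by MITRE technique and confidence level."""
    selected = []
    for record in records:
        if technique and technique not in record.get("mitre_attack", []):
            continue
        if confidence and record.get("confidence") != confidence:
            continue
        selected.append(record)
    return selected
```

For example, `filter_hunts(json.loads(build_index(files)), technique="T1055.001")` would return only the hunts whose frontmatter lists that technique.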

Gotcha

HEARTH's reliance on external AI APIs creates both cost and availability constraints. Processing a single CTI submission through Claude or GPT-4 can cost $0.50-$2 depending on article length, and API rate limits may throttle submissions during high activity. There's no offline mode—the entire contribution pipeline requires internet connectivity and valid API keys. Teams concerned about data sovereignty should note that CTI content passes through third-party LLM providers, though HEARTH doesn't submit proprietary detections or internal telemetry.

The quality control model assumes human review remains the final gate. While AI handles extraction and formatting impressively well, it can misinterpret nuanced threat intelligence or suggest hunts that sound plausible but miss critical context. For example, an AI might extract a MITRE technique correctly but generate a hunt hypothesis that requires data sources most organizations don't collect. The PEAK categorization also depends on AI judgment: a hunt might be tagged as Hypothesis-Driven when it's actually better suited for Baselining. Maintainer review catches these issues, but community scale depends on volunteer bandwidth.

GitHub's language classification for the repository is also misleading. It labels the project as JavaScript, but this is fundamentally a Python automation project with minimal frontend code. Developers expecting a JavaScript-heavy application will be surprised to find the value lies in workflows and content curation, not client-side engineering.

Verdict

Use HEARTH if you're building or scaling a threat hunting program and need a structured library of research-backed hypotheses to jumpstart investigations, or if you regularly consume threat intelligence and want to contribute back to the community without heavy manual documentation work. The AI-powered CTI pipeline genuinely lowers the barrier to knowledge sharing, and the PEAK framework provides useful categorization for different hunting methodologies. It's particularly valuable for teams that lack dedicated threat intelligence analysts but have hunters who consume vendor reports.

Skip HEARTH if you need production-ready detection rules that deploy directly to SIEM or EDR platforms: these are research hypotheses requiring adaptation to your environment and tooling. Also skip if you require offline operation, have strict data handling policies that prohibit using commercial AI APIs, or want to build custom knowledge management workflows from scratch.

HEARTH shines as a collaborative research library and learning resource, not an operational detection platform. Treat it as a threat hunting cookbook: great for inspiration and methodology, but you'll still need to cook your own meals.