LLM OSINT: When AI Becomes a Digital Private Investigator

Hook

Give GPT-4 just a name, and it can build a psychological profile, draft a resume, and identify personal details about someone in minutes—all from publicly available data. This isn't science fiction; it's a 268-star GitHub repo that made The Wall Street Journal nervous.

Context

Open-source intelligence (OSINT) has traditionally been a manual, tedious process. Security researchers, journalists, and investigators spend hours combing through social media profiles, corporate records, news articles, and forum posts to piece together information about a target. Tools like Maltego and theHarvester have automated parts of this workflow, but they still require human reasoning to connect dots, disambiguate results, and synthesize findings into actionable intelligence.

LLM OSINT represents a paradigm shift: what if we gave language models the ability to autonomously navigate this process? The repository demonstrates how GPT-4's reasoning capabilities can orchestrate the entire OSINT pipeline—from formulating search queries to extracting relevant information, cross-referencing sources, and generating comprehensive reports. Created as a proof-of-concept to highlight privacy and security implications, the project caught mainstream attention when The Wall Street Journal featured it as an example of how AI could enable sophisticated phishing and social engineering attacks. The author's explicit warnings about ethical use underscore a critical question: as LLMs gain agency over information gathering, how do we balance their utility against their potential for abuse?

Technical Insight

The architecture of LLM OSINT is simple in structure. Rather than hard-coding specific data sources or scraping patterns, it uses the LLM itself as the orchestration layer. The system operates through iterative prompting: GPT-4 decides what to search for, evaluates the relevance of what it finds, and chooses the next step, in effect running a reasoning loop over web data.

At its core, the system follows a multi-stage pipeline. First, the LLM generates search queries based on the initial target information. For someone named "John Smith," it might start with broad searches but quickly narrow based on context clues—location mentions, company affiliations, or timeline markers found in initial results. The LLM then analyzes search results to identify the most promising sources, effectively performing triage on potentially hundreds of links.

Here's a simplified sketch (an illustration, not code from the repository) of how the prompting chain might work:

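```python
# Simplified sketch of an LLM OSINT-style reasoning loop. The LLM and
# SearchClient protocols are hypothetical stand-ins, not the repo's API.
from typing import Protocol

MAX_ITERATIONS = 5  # illustrative cap on search rounds


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class SearchClient(Protocol):
    def search(self, query: str) -> str: ...


def investigate(target: str, llm: LLM, web: SearchClient) -> str:
    findings: list[str] = []  # extracted facts, accumulated across rounds

    for _ in range(MAX_ITERATIONS):
        # LLM decides what information is still needed
        query = llm.complete(
            f"Target: {target}\n"
            f"Current information: {findings}\n"
            "What specific information should we search for next? "
            "Reply with a single search query."
        ).strip()
        results = web.search(query)

        # LLM extracts relevant information from the results
        findings.append(llm.complete(
            f"Search results: {results}\n"
            f"Target: {target}\n"
            "Extract any relevant personal, professional, or behavioral "
            "information. Ignore irrelevant results."
        ))

        # LLM determines whether enough information has been gathered
        done = llm.complete(
            f"Target: {target}\nCurrent information: {findings}\n"
            "Is this enough for a final report? Answer YES or NO."
        )
        if done.strip().upper().startswith("YES"):
            break

    # Final synthesis
    return llm.complete(
        f"Write a report on {target} using only these findings: {findings}"
    )


# Usage, given concrete LLM and search implementations:
# report = investigate("Jane Doe, software engineer", llm, web)
```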

What makes this approach powerful is the LLM's ability to disambiguate and cross-reference. When it encounters multiple "Jane Doe" profiles, GPT-4 can pick out connecting details: a LinkedIn profile might mention a university that matches a conference speaker bio, or a Twitter handle might appear in a GitHub commit history. The model performs this kind of fuzzy matching and temporal reasoning directly over unstructured text, something a traditional regex-based scraper can only approximate with hand-written rules for each source.
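For contrast, here is roughly what a traditional pipeline has to spell out by hand. This minimal sketch scores two candidate profiles by weighted fuzzy similarity over the fields they share, using Python's standard-library `difflib.SequenceMatcher`. The field names, weights, and sample records are hypothetical, and this is not code from the repository.

```python
from difflib import SequenceMatcher

# Hypothetical per-field weights: rarer attributes count as stronger evidence.
WEIGHTS = {"name": 1.0, "university": 2.0, "employer": 2.0, "handle": 3.0}


def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(p: dict[str, str], q: dict[str, str]) -> float:
    """Weighted average similarity over the fields both profiles share."""
    shared = [f for f in WEIGHTS if f in p and f in q]
    if not shared:
        return 0.0
    total = sum(WEIGHTS[f] * similarity(p[f], q[f]) for f in shared)
    return total / sum(WEIGHTS[f] for f in shared)


# Fictional records for illustration only.
linkedin = {"name": "Jane Doe", "university": "Example State University",
            "employer": "Acme Corp"}
speaker_bio = {"name": "Jane M. Doe", "university": "Example State Univ."}
other = {"name": "Jane Doe", "university": "Another College"}

print(round(match_score(linkedin, speaker_bio), 2))
print(round(match_score(linkedin, other), 2))
```

The speaker bio outranks the record with an identical name because the university field carries more weight. Every such rule (which fields count, how much, and what score means "same person") has to be chosen and maintained by a human, which is the judgment the LLM approach delegates to the model.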

The web scraping component likely uses libraries like BeautifulSoup or Playwright to handle different site structures, but the critical insight is that the LLM adapts the extraction strategy to each site. Instead of writing custom parsers for LinkedIn, Twitter, and personal blogs, the system passes raw HTML or cleaned text to the LLM, which extracts the salient information using natural language understanding. For JavaScript-heavy sites, browser automation becomes necessary, but again, the LLM decides what interactions to perform.
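A minimal sketch of the "cleaned text" step, assuming the orchestrator strips markup before prompting the model. It uses only Python's standard-library `html.parser`, drops `<script>` and `<style>` contents, and joins the remaining text fragments. This is an illustration, not the repository's code; a real pipeline would more likely use BeautifulSoup or the rendered DOM from Playwright.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = """<html><head><style>p {color: red}</style></head>
<body><h1>Speaker bios</h1><script>track()</script>
<p>Jane Doe is a software engineer.</p></body></html>"""
print(html_to_text(page))
```

A stdlib parser keeps the sketch dependency-free, but it only sees the HTML the server returns, which is why JavaScript-rendered pages need browser automation before a step like this.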

The synthesis phase is where the model's broader capabilities show. Given fragments of information from different sources and time periods, GPT-4 can construct narrative timelines, infer psychological traits from communication patterns, and even generate hypothesis-driven search queries ("If this person worked at Google in 2015, they might have attended Google I/O; search for conference photos"). This goes well beyond keyword matching toward something resembling investigative reasoning.
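The timeline part of that synthesis is the easiest to make concrete. A sketch, assuming facts have already been extracted into hypothetical `Fact(year, source, statement)` records: sort them chronologically and group them by year, tagging each statement with its source, so overlaps and gaps become visible. In the LLM approach, the model's contribution is producing those records from unstructured text and reasoning over the result.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    year: int
    source: str
    statement: str


def build_timeline(facts: list[Fact]) -> dict[int, list[str]]:
    """Group statements by year in chronological order, tagged with source."""
    by_year: dict[int, list[str]] = defaultdict(list)
    for fact in sorted(facts, key=lambda f: (f.year, f.source)):
        by_year[fact.year].append(f"{fact.statement} [{fact.source}]")
    return dict(by_year)


# Fictional extracted facts for illustration only.
facts = [
    Fact(2017, "blog", "Wrote about leaving the company"),
    Fact(2015, "profile", "Joined a large tech company"),
    Fact(2015, "conference site", "Listed as a speaker"),
]
for year, events in build_timeline(facts).items():
    print(year, events)
```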

One technical detail that significantly impacts accuracy: the system must handle the LLM's context window constraints. With GPT-4's 8K or 32K token limits, the orchestrator can't simply dump all scraped content into a single prompt. The implementation likely uses summarization passes—the LLM reads full articles, extracts key facts, and stores compressed representations. This introduces a trade-off: compression loses nuance but enables broader information gathering across more sources.
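One common way to structure such summarization passes is map-reduce: split each document into chunks that fit a token budget, summarize each chunk, then summarize the joined partial summaries. The sketch below assumes a rough heuristic of about four characters per token for English text (a real implementation would count with the model's tokenizer, such as OpenAI's `tiktoken`) and takes a hypothetical `summarize` callable standing in for an LLM call. It is not the repository's implementation.

```python
from typing import Callable

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def approx_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks under max_tokens."""
    chunks: list[str] = []
    current: list[str] = []
    for para in text.split("\n\n"):
        candidate = "\n\n".join(current + [para])
        if current and approx_tokens(candidate) > max_tokens:
            chunks.append("\n\n".join(current))  # flush the full chunk
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def map_reduce_summary(text: str, summarize: Callable[[str], str],
                       max_tokens: int = 2000) -> str:
    """Summarize each chunk (map), then the joined summaries (reduce)."""
    partials = [summarize(c) for c in chunk_text(text, max_tokens)]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))


# Usage with a stub "summarizer" that just truncates, in place of an LLM call.
doc = "\n\n".join(f"Paragraph {i}: " + "word " * 100 for i in range(10))
print(len(chunk_text(doc, max_tokens=300)))
print(map_reduce_summary(doc, summarize=lambda s: s[:20], max_tokens=300))
```

Two simplifications to be aware of: a single paragraph longer than the budget still becomes one oversized chunk, and with many chunks the reduce step's input can itself exceed the budget, in which case a real pipeline would reduce recursively.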

Gotcha

The most glaring limitation is accuracy and hallucination risk. LLMs are notorious for confidently presenting plausible-sounding but false information, and this problem compounds in OSINT scenarios. When GPT-4 encounters ambiguous or contradictory data—say, two different job histories for similar names—it might merge them into a single profile or fabricate details to resolve inconsistencies. There's no built-in fact-checking mechanism beyond the LLM's training, and the system lacks ground truth validation. A security researcher relying on this for threat intelligence could end up investigating the wrong person entirely, or worse, making decisions based on AI-generated fiction masquerading as discovered facts.
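The kind of contradiction check the paragraph says is missing can be small. A minimal sketch, assuming extracted facts are stored as hypothetical `(attribute, value, source)` triples: instead of letting the model reconcile disagreements, flag every attribute whose sources report different values (after trivial case and whitespace normalization) so a human can review it.

```python
from collections import defaultdict


def find_conflicts(
    facts: list[tuple[str, str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each attribute with more than one distinct value to {value: [sources]}."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for attribute, value, source in facts:
        seen[attribute][value.strip().lower()].append(source)
    return {attr: dict(vals) for attr, vals in seen.items() if len(vals) > 1}


# Fictional extracted facts for illustration only.
facts = [
    ("employer_2015", "Acme Corp", "profile A"),
    ("employer_2015", "Globex", "profile B"),
    ("city", "Springfield", "profile A"),
    ("city", "springfield", "conference bio"),
]
print(find_conflicts(facts))
```

Here the two city values normalize to the same string and pass, while the two 2015 employers are surfaced with their sources rather than merged into one profile.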

The ethical and legal minefield cannot be overstated. While the data is technically public, automated mass collection may violate the terms of service of platforms like LinkedIn or Facebook. More seriously, many jurisdictions have privacy laws that restrict how personal data can be collected and processed, even when it is publicly available. The repository provides no consent mechanisms, no opt-out capabilities, and no audit trails, features that would be mandatory for any legitimate security or research application. The project also requires GPT-4 API access, which means OpenAI's usage policies apply, and using the API for surveillance or profiling likely violates those terms.

Cost is another practical constraint. Each search round involves several API calls, and each call carries scraped page content, so a thorough OSINT investigation can easily consume tens of thousands of tokens or more across dozens of calls, making this an expensive approach compared to traditional tools.
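To make the cost concrete, here is a back-of-the-envelope estimator. Every input is an assumption: the default prices are GPT-4 (8K) launch rates of $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens, which may be out of date, and the call counts and token sizes in the example are illustrative rather than measured from the repository.

```python
def estimate_cost(iterations: int, calls_per_iteration: int,
                  prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_1k: float = 0.03,
                  completion_price_per_1k: float = 0.06) -> float:
    """Approximate USD cost of an agent run.

    Default prices are GPT-4 (8K) launch rates and may be out of date.
    """
    calls = iterations * calls_per_iteration
    prompt_cost = calls * prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = calls * completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Illustrative run: 10 search rounds, 3 LLM calls each (query, extraction,
# stop check), ~6,000 prompt tokens of scraped text and ~500 output tokens per call.
print(f"${estimate_cost(10, 3, 6000, 500):.2f}")
```

At these assumed rates, that illustrative run comes to about $6.30 for a single target, and the total scales linearly with the number of rounds and with how much scraped text each prompt carries.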

Verdict

Use if: You're a security researcher studying AI-enabled threat vectors, need to demonstrate OSINT risks to privacy-conscious organizations, or are building defensive tools to detect when someone is being profiled this way. This code is educational gold for understanding how LLM reasoning can be chained with external data sources, and it's a legitimate tool for red team exercises where you need to show executives just how much an attacker could learn. It's also valuable for AI safety researchers exploring the boundaries of autonomous agent capabilities.

Skip if: You're looking for a production OSINT tool (use Maltego or SpiderFoot instead), need legally defensible investigation capabilities, or want reliable, auditable results. Skip entirely if you're tempted to profile individuals without explicit consent: the ethical violations aren't worth it, and you're likely breaking laws. Also skip if you need offline capabilities or want to avoid vendor lock-in to OpenAI's APIs. This is a proof-of-concept that raises important questions, not a solution you should deploy.
