LLM OSINT: When GPT-4 Becomes a Digital Detective
Hook
A Python script, fed nothing but a name, correctly inferred someone's Myers-Briggs personality type, compiled a psychological profile with accurate strengths and weaknesses, and generated a resume, all from publicly available internet data. Welcome to AI-powered reconnaissance.
Context
Open-source intelligence gathering has traditionally been a manual, tedious process. Security researchers, journalists, and investigators spend hours piecing together information from scattered sources—social media profiles, public records, forum posts, professional networks. Traditional OSINT tools like Maltego or SpiderFoot automate collection, but humans still synthesize meaning from the data.
LLM OSINT is a proof of concept that combines large language models with internet information gathering, then performs tasks with the collected data. As noted in The Wall Street Journal piece “Generative AI Could Revolutionize Email—for Hackers,” this approach demonstrates what becomes possible when AI systems can both gather and reason about publicly available information. The project shows how LLMs can move beyond simple data collection toward automated understanding and inference.
Technical Insight
LLM OSINT demonstrates this concept through working examples rather than architectural documentation; the README does not detail the internal implementation.
The person_lookup.py example illustrates the system’s capabilities through its command-line interface:
python examples/person_lookup.py "Shrivu Shankar" --ask "Write their top 3 most likely myers-briggs types"
This single command, with no additional information provided, produces a detailed analysis. When asked for Myers-Briggs types, the system identified INTJ with high confidence, noting “passion for coding, research, and problem-solving” and “strategic, innovative, and goal-oriented” characteristics, an assessment the repository author confirms as accurate.
The psychological report example demonstrates even more sophisticated inference. The system identified strengths like “Curiosity” and “Self-Motivated” while inferring potential weaknesses like “Limited Work-Life Balance” and “Potential Overcommitment” based on patterns of achievements rather than explicit statements. This shows the LLM’s capability to make inferences beyond simple information retrieval.
The resume generation example shows the system can merge information from different time periods and sources into structured output formats. The system appears capable of synthesizing data from multiple public sources—LinkedIn, GitHub, Twitter, Instagram, and personal websites—though the specific mechanisms for search, scraping, and synthesis are not documented in the README.
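To make the merging step concrete, the sketch below shows one way dated entries from several public profiles could be deduplicated and ordered for a resume. This is purely illustrative: the repository's actual synthesis mechanism is undocumented and is presumably LLM-driven rather than rule-based, and the field names here are hypothetical.

```python
# Illustrative only: deduplicate and order dated career entries pulled from
# multiple public profiles. The real project likely delegates this merging to
# the LLM itself rather than to hand-written rules like these.
def merge_timeline(sources: list[list[dict]]) -> list[dict]:
    """Merge entries keyed by (start date, title), keeping the first source
    that mentions each entry, and return them newest first."""
    seen: dict[tuple, dict] = {}
    for source in sources:
        for entry in source:
            # The first source listing an entry "wins"; later duplicates are dropped.
            seen.setdefault((entry["start"], entry["title"]), entry)
    return sorted(seen.values(), key=lambda e: e["start"], reverse=True)
```

Given overlapping LinkedIn and GitHub entries for the same role, this keeps a single copy and sorts the timeline in reverse-chronological order, the usual resume convention.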
What makes this implementation notable is its demonstration that LLMs can serve dual roles: gathering information from the internet and reasoning about that information to answer complex questions. The examples show the system handling disambiguation (“For common names, disambiguation can be done like ‘John Smith (the Texas Musician)’”), suggesting some level of query refinement capability.
The technical approach likely involves prompting strategies and web access mechanisms, but the README focuses on demonstrating capabilities through examples rather than explaining implementation. Developers interested in building similar systems would need to examine the /examples directory code for implementation details.
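Absent documentation, one can only guess at the general shape such a system takes: a gather phase that collects snippets from public sources, and a reason phase that feeds the accumulated notes to a model alongside the user's question. The sketch below follows that guess; `web_search` and `run_llm` are stubs standing in for a real search tool and LLM API, and every function name here is an assumption, not the repository's code.

```python
# A minimal sketch (NOT the repository's actual design) of a gather-then-reason
# loop. `run_llm` and `web_search` are stand-in stubs for a real LLM API call
# and a real search/scrape tool.
from textwrap import shorten

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split gathered page text into context-window-sized chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def build_task_prompt(name: str, notes: list[str], ask: str) -> str:
    """Combine accumulated research notes into a single reasoning prompt."""
    joined = "\n- ".join(notes)
    return (
        f"You are researching the person '{name}'.\n"
        f"Notes gathered from public sources:\n- {joined}\n\n"
        f"Task: {ask}"
    )

def run_llm(prompt: str) -> str:    # stub: a real system would call an LLM API
    return shorten(prompt, 80)

def web_search(query: str) -> str:  # stub: a real system would search and scrape
    return f"(placeholder results for: {query})"

def person_lookup(name: str, ask: str) -> str:
    """Gather notes from several public-source angles, then ask the model."""
    notes = [web_search(f'"{name}" {angle}') for angle in ("LinkedIn", "GitHub", "blog")]
    return run_llm(build_task_prompt(name, notes, ask))
```

The key design point this sketch captures is the separation of concerns: collection produces raw notes, and a single final prompt carries both the notes and the user's `--ask` question, which is where the inference happens.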
Gotcha
The most significant limitation is ethical rather than technical. The repository’s own privacy warning acknowledges the tool is “spooky good at gathering information from publicly available sources” and emphasizes that “personal information uncovered through open-source intelligence remains personal and should be treated with respect and protection.” The warning explicitly states users should be “cognizant of each person’s right to privacy” when researching individuals other than themselves.
This creates an inherent tension: the repository demonstrates concerning capabilities while making them publicly accessible to anyone with Python skills and API access. The examples intentionally use only the repository author’s own information, establishing self-research as the ethical baseline.
From a technical perspective, this is a proof-of-concept rather than a production tool. As demonstration code, it may lack robustness features expected in production systems—error handling for blocked websites, rate limiting for API costs, or reliability patterns for handling contradictory information. The README does not document implementation details, so specific technical limitations cannot be verified, but users should expect proof-of-concept code to require hardening for serious use.
The repository has 264 stars, indicating community interest, but the README provides no information about maintenance status, update frequency, or compatibility with current LLM APIs. Users implementing similar functionality might find more robust solutions in modern agent frameworks that have formalized patterns for tool use and web access.
Verdict
Use LLM OSINT if you’re a security researcher studying AI and privacy implications, a developer learning about LLM-based information gathering through working examples, or an educator demonstrating dual-use AI capabilities. The repository’s value is primarily educational—it proves that combining LLMs with internet access enables automated research and inference from public data. Skip if you need production-ready OSINT tools (established platforms like Maltego or SpiderFoot offer better reliability), lack clear ethical justification for automated personal information gathering, or want modern agent framework features. Most critically, the repository’s own guidance is clear: use this tool on yourself, not others, without explicit consent. The technical capability exists, but as the README emphasizes, “personal information uncovered through open-source intelligence remains personal and should be treated with respect and protection.” This repository’s greatest contribution is demonstrating both the power and the responsibility that comes with giving LLMs autonomous access to information.