STRIDE GPT: Why Threat Modeling with LLMs Actually Works
Hook
A security tool with 1,000 GitHub stars that stores zero data and supports multiple LLM providers isn’t trying to be a platform—it’s solving a prompt engineering problem disguised as a threat modeling tool.
Context
Threat modeling is one of those security practices everyone agrees is important but few teams do consistently. The traditional approach—drawing data flow diagrams, manually enumerating STRIDE threats, building attack trees—requires specialized expertise and hours of focused work. OWASP Threat Dragon gives you diagramming, but you’re still identifying threats manually. The rise of large language models promised to change this, but most security teams lack the prompt engineering expertise to get useful output from ChatGPT without extensive trial and error.
STRIDE GPT emerged as a specialized layer over LLMs that codifies threat modeling expertise into prompts. It’s built on a simple premise: if you can translate the STRIDE methodology (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) into structured prompts, you can generate threat models that give experienced security engineers a significant head start rather than starting from a blank page. The tool has evolved beyond basic STRIDE to incorporate OWASP Top 10 for LLMs and the new OWASP ASI framework for agentic AI applications, making it particularly relevant as organizations grapple with securing GenAI systems. With support for multi-modal inputs including architecture diagrams and GitHub repository analysis, it’s positioned as a productivity multiplier for teams that already understand threat modeling but want to spend less time on the mechanical parts.
Technical Insight
STRIDE GPT’s architecture is deceptively straightforward: it’s a Streamlit web application that constructs sophisticated prompts, sends them to your chosen LLM provider, and formats the responses. The interesting part isn’t the web framework—it’s the prompt engineering that maps security methodologies into LLM-consumable formats. The application detects whether you’re modeling a traditional application, a GenAI system, or an agentic AI application, then dynamically adjusts its prompts to include relevant threat categories.
When you select “Agentic AI application,” STRIDE GPT doesn’t just add boilerplate LLM threats. It incorporates architectural pattern detection inspired by the Cloud Security Alliance’s MAESTRO framework. Describe a system with “RAG pipeline using vector database for retrieval,” and the LLM identifies it as a RAG/Retrieval System pattern, applying specific threats like vector store poisoning, embedding manipulation, and cross-tenant leakage. Mention “multiple specialized agents with tool access,” and it detects a Multi-Agent System pattern, flagging agent collusion risks, inter-agent trust boundary violations, and emergent behavior threats. This pattern matching happens within the prompt itself—there’s no separate classification model. The prompt instructs the LLM to analyze the application description for these patterns and apply pattern-specific threats from the OWASP ASI framework (ASI01 through ASI10) alongside traditional STRIDE categories.
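The "pattern detection inside the prompt" technique can be sketched as a single template. This is an illustration of the approach described above, not STRIDE GPT's actual prompt text; the wording, the JSON output instruction, and the `build_agentic_prompt` helper are all assumptions:

```python
def build_agentic_prompt(app_description: str) -> str:
    """Hypothetical prompt template: the LLM itself does the pattern
    classification, so no separate classifier model is needed."""
    return f"""You are a security expert threat modeling an agentic AI application.

First, analyze the application description for architectural patterns:
- RAG/Retrieval System (vector stores, embedding pipelines, retrieval steps)
- Multi-Agent System (specialized agents, inter-agent messaging, tool access)

For each detected pattern, apply pattern-specific threats (e.g. vector store
poisoning, cross-tenant leakage, inter-agent trust boundary violations) drawn
from the OWASP ASI framework (ASI01 through ASI10), alongside the six
traditional STRIDE categories.

Application description:
{app_description}

Respond in JSON with keys: detected_patterns, threats.
"""

# The description steers which pattern-specific threats the model applies
prompt = build_agentic_prompt("RAG pipeline using vector database for retrieval")
```

Because the classification lives in the prompt, adding a new architectural pattern is a text edit rather than a model retraining exercise.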
The multi-modal capability reveals the prompt construction strategy. When you upload an architecture diagram, STRIDE GPT doesn’t parse it programmatically. Instead, it sends the image to vision-capable models with prompts that ask the LLM to describe the architecture, identify data flows, and then perform threat modeling based on what it sees. This works because modern vision models can extract architectural information from diagrams—they recognize load balancers, database symbols, cloud service icons, and trust boundaries. The Markdown output you download is exactly what the LLM generated, with minimal post-processing.
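A minimal sketch of that request, using OpenAI's Chat Completions image-input format with a base64 data URL. The prompt wording, the model name, and the `build_vision_request` helper are assumptions; STRIDE GPT's actual payloads may differ:

```python
import base64

def build_vision_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Build a multimodal chat payload: one text part with the threat
    modeling instructions, one image part carrying the diagram inline."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Describe this architecture diagram: components, "
                          "data flows, and trust boundaries. Then perform "
                          "STRIDE threat modeling on what you see.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# In practice these would be real PNG bytes read from the uploaded file
request = build_vision_request(b"\x89PNG...")
```

Embedding the image as a data URL means the diagram never needs to be hosted anywhere; it travels inside the API request itself.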
The GitHub repository analysis feature demonstrates another prompt engineering pattern. When you provide a GitHub URL, the application doesn’t clone the entire repository or perform static analysis. Instead, it fetches the README file using the GitHub API and optionally analyzes the repository structure (file types, directory organization). Here’s the simplified flow:
```python
# Conceptual example of how GitHub analysis works
def analyze_github_repo(repo_url, github_token=None):
    # Extract owner and repo from URL
    owner, repo = parse_github_url(repo_url)

    # Fetch README using GitHub API
    readme_content = fetch_readme(owner, repo, github_token)

    # Optionally fetch repository structure (file types, directories)
    repo_structure = fetch_repo_structure(owner, repo, github_token)

    # Construct prompt with README and structure
    prompt = f"""
    Based on this GitHub repository README:
    {readme_content}

    Repository structure:
    {repo_structure}

    Generate a concise application description suitable for threat modeling.
    Identify: application type, key technologies, authentication methods,
    data sensitivity, internet exposure, and architectural components.
    """

    # Send to LLM to generate an application description
    app_description = call_llm(prompt)
    return app_description
```
This generated description then becomes the input for the actual threat modeling prompts. It’s a two-stage LLM process: first, summarize the application from its documentation; second, threat model the summary. This reduces token consumption compared to sending entire codebases while still providing context-aware analysis.
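The two-stage chain can be sketched as follows, with `call_llm` standing in for any provider call. The function name and prompt wording are illustrative, not the tool's actual code:

```python
def two_stage_threat_model(readme_content: str, call_llm) -> str:
    """Two LLM calls: summarize first, then threat model the summary."""
    # Stage 1: condense documentation into a short application description
    description = call_llm(
        "Generate a concise application description suitable for "
        f"threat modeling from this README:\n{readme_content}"
    )
    # Stage 2: threat model the description, not the raw docs, keeping
    # token usage roughly constant regardless of repository size
    return call_llm(
        f"Apply STRIDE to this application description:\n{description}"
    )
```

The key property is that stage two's input size is bounded by the summary, so a sprawling monorepo costs about the same as a small project.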
The LLM provider abstraction is implemented through a unified interface that normalizes API calls across OpenAI, Anthropic, Google, Mistral, Groq, Ollama, and LM Studio. Each provider has different API schemas, but STRIDE GPT presents a consistent interface to the Streamlit UI. For organizations concerned about sending application details to third-party APIs, the Ollama and LM Studio support enables fully local deployment—you can run Llama 3 or Mistral models on your own infrastructure and STRIDE GPT will interact with them identically to how it calls OpenAI’s API.
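At the payload level, such an abstraction can look like the sketch below. This is a simplification under stated assumptions: real provider clients carry many more parameters, and the grouping here (OpenAI-compatible schemas for Groq, Ollama, and LM Studio; Anthropic's Messages schema separately) reflects the public API shapes, not STRIDE GPT's internal code:

```python
def build_request(provider: str, model: str, prompt: str) -> dict:
    """Normalize one user prompt into a provider-specific request body."""
    if provider in ("openai", "groq", "ollama", "lmstudio"):
        # OpenAI-style chat schema; Groq, Ollama, and LM Studio expose
        # OpenAI-compatible endpoints, so one shape covers all four
        return {"model": model,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "anthropic":
        # Anthropic's Messages API requires an explicit max_tokens
        return {"model": model, "max_tokens": 4096,
                "messages": [{"role": "user", "content": prompt}]}
    raise ValueError(f"unsupported provider: {provider}")

payload = build_request("anthropic", "some-model", "Threat model this app.")
```

The Streamlit UI only ever sees the normalized interface, which is what lets local Ollama deployments slot in without UI changes.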
The DREAD scoring feature adds quantitative risk assessment by prompting the LLM to rate each identified threat on five dimensions: Damage potential (how bad could this be?), Reproducibility (how easy to recreate?), Exploitability (how much skill/resources needed?), Affected users (how many impacted?), and Discoverability (how easy to find?). Each dimension gets a 1-10 score, and the average becomes the threat’s DREAD score. This isn’t sophisticated risk calculation—it’s the LLM’s interpretation of DREAD applied to the threats it generated—but it provides a starting point for prioritization that teams can refine based on their actual risk appetite.
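The arithmetic behind that score is just the mean of the five ratings. A minimal helper (the function name is assumed, not from the tool):

```python
def dread_score(damage, reproducibility, exploitability,
                affected_users, discoverability) -> float:
    """Average five 1-10 DREAD ratings into a single threat score."""
    ratings = [damage, reproducibility, exploitability,
               affected_users, discoverability]
    if not all(1 <= r <= 10 for r in ratings):
        raise ValueError("each DREAD rating must be between 1 and 10")
    return sum(ratings) / len(ratings)

# e.g. a highly damaging, easily discovered threat
dread_score(9, 7, 6, 8, 10)  # -> 8.0
```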
The Gherkin test case generation is perhaps the most practically useful feature for development teams. After generating threats and mitigations, you can ask STRIDE GPT to create Gherkin-formatted test scenarios that verify your mitigations are implemented correctly. These Given-When-Then scenarios translate security requirements into testable acceptance criteria that QA teams and security testing tools can consume. A threat like “SQL injection via user input” becomes a Gherkin scenario that specifies malicious input, expected behavior (parameterized queries), and verification steps. This bridges the gap between threat modeling output and actual security testing.
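As an illustration (a hypothetical scenario written for this review, not actual tool output), the SQL injection threat above might translate into something like:

```gherkin
Feature: SQL injection mitigation for the search endpoint

  Scenario: Malicious input is neutralized by parameterized queries
    Given the application exposes a search field backed by a SQL database
    When a user submits the input "'; DROP TABLE users; --"
    Then the query is executed as a parameterized statement
    And the input is treated as a literal search term
    And no database schema changes occur
```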
Gotcha
The fundamental limitation is one the README doesn’t hide: “No data storage; application details are not saved.” This sounds like a privacy feature—and it is—but it also means STRIDE GPT has zero memory. Every threat modeling session is independent. You can’t build a threat library specific to your organization’s infrastructure, track how threats evolve as your application changes, or compare threat models across different versions of your architecture. If you’re threat modeling a microservices ecosystem with 20 services, you’re running 20 separate sessions with no relationship between them. There’s no concept of organizational knowledge accumulation beyond manually downloading Markdown files and storing them in Confluence.
The output quality variance is significant and LLM-dependent. With current advanced models (OpenAI’s GPT-5.2 series, Anthropic’s Claude 4.5 series, Google’s Gemini 3), you get comprehensive threat models that often identify non-obvious threats. With smaller models through Groq or local Ollama deployments, the output becomes generic—you’ll get “implement input validation” as a mitigation for every injection threat without specifics about parameterized queries, output encoding, or context-specific controls. The tool provides no validation layer. If the LLM hallucinates a threat that doesn’t apply to your architecture, or misses an entire attack surface, STRIDE GPT won’t catch it. You need a human security expert reviewing the output, which means this is a productivity tool for people who already know threat modeling, not a substitute for that expertise.
The roadmap mentions “customizable and exportable reports (e.g. PDF, Word)”, but this feature doesn’t exist yet. You get Markdown files. For many teams, this is fine—Markdown converts easily to other formats. But if you need to generate a polished threat model report for compliance purposes, with your organization’s branding, risk matrices, and approval workflows, you’ll need to write custom tooling around STRIDE GPT’s output or format everything manually. The enterprise deployment guide talks about forking the repository to inject organizational security controls, which is the right approach technically but means you’re maintaining a fork rather than configuring a tool. Every update to the main repository requires merge consideration.
Verdict
Use STRIDE GPT if you’re a security engineer or architect who already understands threat modeling and wants to significantly reduce initial draft time. It’s particularly valuable if you’re securing GenAI or agentic AI applications where the OWASP LLM/ASI integration provides frameworks that are otherwise hard to apply consistently. The multi-modal support is genuinely useful—being able to upload an architecture diagram and get a threat model is faster than describing everything in text. Organizations with on-premise requirements should appreciate the Ollama/LM Studio support for local deployment without cloud API dependencies. Skip STRIDE GPT if you need a validated, compliance-ready threat modeling platform with audit trails, approval workflows, and guaranteed coverage of your organizational security standards. This is prompt engineering packaged as an application—extremely useful for bootstrapping threat models, but it requires expert review and doesn’t replace the judgment that comes from understanding your specific threat landscape. If your team doesn’t have security expertise to validate LLM output, a more structured tool with built-in threat libraries will force better discipline even if it takes longer.