Gepetto: Teaching IDA Pro to Explain Malware With Large Language Models
Hook
What if your disassembler could explain what obfuscated malware does in seconds, instead of hours? A Python plugin is bringing large language models directly into IDA Pro’s decompiler, and professional security researchers are already using it to analyze threats.
Context
Reverse engineering is a war of attrition. You’re staring at decompiled C code with variable names like v7 and a3, trying to reconstruct what the original developer intended—or in the case of malware analysis, what an adversary is trying to hide. Even with tools like IDA Pro’s HexRays decompiler, understanding a complex function can take hours of mental overhead: tracing data flows, identifying common patterns, inferring semantics from assembly instructions.
Gepetto emerged from this friction. Created by JusticeRage and backed by security firms like Kaspersky and HarfangLab, it’s a Python plugin that integrates large language models directly into IDA Pro’s workflow. Instead of manually annotating every function, you right-click and ask an LLM to explain it, suggest better variable names, or add inline comments. The plugin supports approximately 35+ models across 10 providers—from OpenAI’s GPT-4o and o3 to local Ollama instances—making it one of the most flexible AI-assisted reverse engineering tools available.
Technical Insight
Gepetto’s architecture is deceptively simple: extract decompiled pseudocode from IDA, send it to an LLM API with a specialized prompt, then parse the response back into IDA’s interface. The magic lies in how it integrates with HexRays and supports multiple model providers through a common abstraction.
The plugin hooks into IDA’s context menu. When you right-click a function and select “Explain function,” Gepetto extracts the pseudocode, wraps it in a prompt like “Explain what this function does,” and sends it to whichever model you’ve configured. The response appears in IDA’s output window, and for variable renaming, the plugin parses the LLM’s suggestions and applies them directly to the decompiled view.
Here’s what the configuration looks like for adding multiple providers:
[Gepetto]
LANGUAGE = "en_US"
[OpenAI]
API_KEY = "sk-your-key-here"
MODEL = "gpt-4o"
[Ollama]
BASE_URL = "http://localhost:11434"
MODEL = "llama3.1"
[Gemini]
API_KEY = "your-gemini-key"
MODEL = "gemini-2.5-flash"
The plugin’s model abstraction lives in the gepetto/models directory, where each provider implements a common interface. For instance, the OpenAI integration uses the official openai Python package, while the Ollama integration makes direct HTTP requests to the local server. This modularity means adding a new provider is as simple as creating a new Python file that implements send_query() and list_available_models().
One fascinating feature: Gepetto includes a CLI interface accessible from IDA’s command line. Select “Gepetto” from the input bar dropdown, and you can ask arbitrary questions about the code you’re analyzing: “Is this function vulnerable to buffer overflows?” or “What cryptographic algorithm is this implementing?” The LLM receives the current function’s pseudocode as context, turning IDA into an interactive analysis assistant.
The plugin also implements workflow optimizations based on empirical testing. The README explicitly notes that “asking for better names works better if you ask for an explanation of the function first.” This insight reveals an understanding of how LLMs build context: by first generating an explanation, the model creates a semantic understanding it can reference when suggesting variable names. Instead of renaming v7 to something generic like counter, it might suggest decryption_round_index after understanding the function performs AES decryption.
Hotkey support makes the workflow seamless: Ctrl+Alt+G explains the current function, Ctrl+Alt+K adds comments, and Ctrl+Alt+R renames variables. For malware analysts who spend entire days in IDA, these shortcuts eliminate the friction of switching between the decompiler and external AI tools.
The localization infrastructure is another technical highlight. Gepetto uses GNU gettext for internationalization, with a gepetto/locales directory containing .po files for different languages. You can set LANGUAGE = "fr_FR" in the config to get French-language explanations—critical for non-English-speaking reverse engineers who might struggle with technical English terminology in LLM outputs.
Gotcha
The most significant limitation is the hard dependency on IDA Pro version 7.6 or higher with the HexRays decompiler. This isn’t hobbyist software—IDA Pro requires a commercial license. If you’re using Ghidra, Binary Ninja, or Radare2, Gepetto won’t help you. The plugin is laser-focused on IDA’s ecosystem, with no plans for cross-platform support.
LLM API costs can accumulate surprisingly fast when analyzing binaries with hundreds of functions. OpenAI and other commercial providers charge per token, so deep analysis sessions can incur meaningful costs. Local models via Ollama or LM Studio avoid API costs but may sacrifice quality—smaller models can hallucinate incorrect explanations or suggest nonsensical variable names.
The inherent randomness of LLM outputs is another challenge. Run the same command twice, and you’ll get different explanations with varying levels of accuracy. The plugin includes no validation layer to verify the LLM’s claims. If the model confidently explains that a function performs RSA encryption when it’s actually AES, you need the expertise to catch that mistake. Gepetto is an assistant, not an oracle—blindly trusting its output in a security-critical context is dangerous.
Verdict
Use Gepetto if you’re a professional reverse engineer or malware analyst with an IDA Pro license who regularly encounters unfamiliar or obfuscated code. The plugin excels at accelerating initial comprehension: even imperfect AI explanations provide starting points that save hours of manual analysis, and the variable renaming feature can transform inscrutable pseudocode into readable logic. If your workflow already involves IDA and you’re comfortable critically evaluating LLM outputs, Gepetto is a force multiplier. Skip it if you’re using free tools like Ghidra (no support), working with well-documented or familiar codebases where naming conventions are already clear, or if you’re uncomfortable with AI-generated analysis that occasionally hallucinates. Also skip if the IDA Pro license cost is prohibitive—there’s no free tier or alternative for hobbyists. For professional threat researchers analyzing APTs or ransomware where speed matters and false leads are acceptable, Gepetto is transformative. For everyone else, the cost-benefit math doesn’t work.