Humanify: Using LLMs to Deobfuscate JavaScript Without Hallucination Risks

Hook

Pasting obfuscated code straight into ChatGPT and asking for a readable version is riskier than it looks: the LLM may silently change logic while it makes the code 'readable.' Humanify avoids this by letting the model suggest names only; it never touches the logic.

Context

Anyone who's tried to debug a production JavaScript error knows the pain: your stack trace points to a.b.c() in a minified bundle, and you're left guessing what actually broke. Worse still, some code isn't just minified—it's deliberately obfuscated with techniques like identifier mangling, dead code injection, and control flow flattening to prevent reverse engineering.

Traditional deobfuscation tools use pattern matching and statistical analysis. JSNice pioneered using machine learning trained on open-source code to predict variable names, but it's limited to patterns it's seen before. The recent explosion of LLMs opened new possibilities: these models understand semantic context far better than statistical approaches. But there's a catch—if you paste obfuscated code into ChatGPT and ask it to 'clean this up,' you're trusting the model to perfectly recreate logic it's inferring. One hallucination could silently change your code's behavior. Humanify's creator recognized this risk and built a hybrid architecture that gets the best of both worlds.

Technical Insight

Humanify's architecture is deceptively elegant: it uses Babel to parse JavaScript into an Abstract Syntax Tree, identifies candidates for renaming, asks an LLM for semantic name suggestions, then applies those renames via AST transformations. The critical insight is the separation of concerns—the LLM never touches code structure, only suggests strings.

Here's what happens under the hood when you run Humanify on obfuscated code:

// Input: obfuscated code
function a(b,c){const d=b+c;return d*2}

// Humanify parses to AST, identifies 'a', 'b', 'c', 'd' as rename candidates
// For each identifier, it extracts surrounding context:
// "function a(b,c){const d=b+c;return d*2}"

// Sends to LLM with a prompt along these lines (illustrative wording):
// "This function takes two parameters and returns their sum multiplied by 2.
// Suggest a semantic name for identifier 'a':"

// LLM responds: "doubleSum"
// The same request is repeated for 'b', 'c', and 'd'.

// Babel applies transformation:
function doubleSum(firstNumber, secondNumber) {
  const sum = firstNumber + secondNumber;
  return sum * 2;
}
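
Because the model's reply is treated as a plain string, it can be checked before it ever reaches the AST. The following is a minimal sketch of that idea, not Humanify's actual code: the helper name toSafeIdentifier, the reserved-word list, and the numeric-suffix collision rule are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of Humanify): turn an arbitrary LLM reply
// into a name that is safe to hand to an AST rename.
const RESERVED = new Set([
  "break", "case", "catch", "class", "const", "continue", "default", "delete",
  "do", "else", "export", "extends", "for", "function", "if", "import", "in",
  "new", "return", "super", "switch", "this", "throw", "try", "typeof", "var",
  "void", "while", "with", "yield", "let", "null", "true", "false",
]);

function toSafeIdentifier(suggestion, takenNames, fallback) {
  // Keep only characters that are valid in a simple ASCII identifier.
  let name = suggestion.trim().replace(/[^A-Za-z0-9_$]/g, "");
  // Nothing usable came back: keep the original (obfuscated) name.
  if (name === "") name = fallback;
  // Identifiers cannot start with a digit or be a reserved word.
  if (/^[0-9]/.test(name) || RESERVED.has(name)) name = "_" + name;
  // Avoid clashing with names already used in the same scope.
  let candidate = name;
  for (let suffix = 2; takenNames.has(candidate); suffix++) {
    candidate = name + suffix;
  }
  return candidate;
}

console.log(toSafeIdentifier(" doubleSum ", new Set(), "a")); // prints: doubleSum
console.log(toSafeIdentifier("sum", new Set(["sum"]), "d")); // prints: sum2
```

Whatever the model returns, the worst outcome of a check like this is a poor name; the statements and expressions in the tree are never rewritten from model output.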

The tool's V2 rewrite moved from Python to pure TypeScript, which matters more than it sounds. The original version required Python dependencies and a Node-Python bridge, creating installation friction. The TypeScript rewrite uses Babel's parser and transform pipeline natively, making the tool a simple npx command. This architectural decision also improved maintainability—the codebase now uses the same AST manipulation libraries that power modern JavaScript tooling.

Humanify integrates Webcrack for webpack unbundling before deobfuscation. This is crucial because webpack bundles wrap code in runtime loaders and module systems that obscure the original structure. By unbundling first, Humanify can work with the actual application logic rather than webpack's scaffolding. The pipeline looks like: Webcrack unbundle → Babel parse → LLM rename suggestions → Babel transform → Output.

The LLM interaction is stateless and parallelizable. Each variable rename is an independent API call with local context (the function or scope containing that variable). This means Humanify can batch requests and isn't trying to maintain conversation state or ask the LLM to 'remember' earlier parts of the file. It also means failures are isolated—if the LLM times out on one variable, others still get renamed.
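
The isolation described above can be sketched with Promise.allSettled. This is an illustration of the pattern, not Humanify's implementation: suggestName is a hypothetical stand-in for a per-identifier LLM call, and it deliberately fails for one identifier to simulate a timeout.

```javascript
// Hypothetical stand-in for one LLM request per identifier. A real call would
// send the enclosing function or scope as context; here the answers are canned
// and any identifier without an answer simulates a timeout.
const CANNED = new Map([
  ["a", "doubleSum"],
  ["b", "firstNumber"],
  ["d", "sum"],
]);

async function suggestName(identifier) {
  if (!CANNED.has(identifier)) throw new Error(`timeout for ${identifier}`);
  return CANNED.get(identifier);
}

// Fire one independent request per identifier and keep whatever succeeds.
async function collectRenames(identifiers) {
  const results = await Promise.allSettled(
    identifiers.map((identifier) => suggestName(identifier))
  );
  const renames = new Map();
  results.forEach((result, index) => {
    const original = identifiers[index];
    // A rejected request leaves that identifier with its original name.
    renames.set(original, result.status === "fulfilled" ? result.value : original);
  });
  return renames;
}

collectRenames(["a", "b", "c", "d"]).then((renames) => console.log(renames));
```

In this sketch the failed request for 'c' only means 'c' keeps its obfuscated name; the other three identifiers are still renamed.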

For local mode, Humanify uses llama.cpp through a TypeScript wrapper, with Apple Silicon GPU acceleration via Metal. The prompt engineering is minimal by design—it provides the code context and asks for a name, avoiding complex chain-of-thought prompting that might work inconsistently across different LLM providers. This prompt simplicity is why the tool works with OpenAI, Gemini, and local models without provider-specific code paths.

The cost model is predictable because it's proportional to code size, not complexity. Humanify estimates ~2 tokens per character, so you can calculate costs before running: a 10KB file ≈ 20K tokens ≈ $0.20 with GPT-4o. Compare this to interactive debugging where you'd manually trace through obfuscated code for hours. The tool also provides --dry-run to show exactly what would be sent to the LLM before incurring costs.
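
The arithmetic above is easy to reproduce. The sketch below assumes the article's own figures: roughly 2 tokens per character, and a price of $10 per million tokens, which is simply what $0.20 for 20K tokens works out to and is not taken from an official price list. The function name estimateCostUsd is made up for this example.

```javascript
// Assumed figures, taken from the article rather than a provider's pricing page:
// ~2 tokens per character, and $10 per million tokens ($0.20 / 20K tokens).
function estimateCostUsd(charCount, tokensPerChar = 2, usdPerMillionTokens = 10) {
  const tokens = charCount * tokensPerChar;
  return { tokens, usd: (tokens * usdPerMillionTokens) / 1_000_000 };
}

const estimate = estimateCostUsd(10_000); // a 10KB file of ASCII source
console.log(estimate.tokens, estimate.usd.toFixed(2)); // prints: 20000 0.20
```

Swapping in your provider's current per-token price gives a quick upper bound before any request is sent.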

Gotcha

The biggest limitation is cost at scale. The repository's example shows bootstrap.min.js costing around $0.50 to process—that's one file. If you're trying to deobfuscate an entire application with dozens of bundled files, you could easily hit $20-50 in API costs. There's no incremental mode or caching, so re-running on slightly modified code pays the full cost again. For large-scale reverse engineering projects, the local mode becomes necessary, but it's significantly less accurate than cloud LLMs.

Local mode performance also depends heavily on hardware. On an M1 Max with GPU acceleration, it's reasonably fast. On a CPU-only Linux server, it can be painfully slow: the README doesn't specify model sizes, but llama.cpp models suitable for code understanding are typically 7B+ parameters, meaning 4-8GB downloads and slow inference without GPUs. The accuracy gap between local and cloud models is also substantial; GPT-4 picks up semantic context that smaller models miss, so local runs tend to produce generic names like value or data instead of domain-specific identifiers.

Another edge case: highly polymorphic code or framework-specific patterns can confuse the LLM. If your obfuscated code uses a framework like React with JSX transforms, or Vue with template compilation, the LLM sees the compiled output and suggests names based on that, not the original intent. You might get createElementFunction instead of renderUserProfile because the context the LLM sees is already abstracted away from the domain logic.

Verdict

Use if: You're reverse engineering obfuscated JavaScript where understanding intent matters more than raw speed, you have budget for API costs or hardware for local models, and you need safer deobfuscation than asking ChatGPT directly. Humanify shines for security research, understanding third-party scripts, or debugging production code where source maps are missing. Because the LLM only supplies names and Babel applies them, the output should stay functionally equivalent to the input.

Skip if: You're processing huge codebases where token costs become prohibitive (consider Webcrack alone for structural deobfuscation), you need real-time deobfuscation in a CI pipeline (the LLM latency is too high), or you're working with code that's only minified, not obfuscated (Prettier plus source map reconstruction is faster and free). Also skip if you can't use external APIs for sensitive code and don't have GPU hardware for acceptable local mode performance.
