Mining Secrets from Spring Boot Memory: Inside pyhprof’s HPROF Parser
Hook
Spring Boot’s heapdump actuator endpoint was designed for debugging. Instead, it’s become a goldmine for attackers—exposing every password, API key, and HTTP request your application has touched, sitting unencrypted in memory.
Context
When Spring Boot applications expose the /actuator/heapdump endpoint without proper authentication, they hand over a complete snapshot of the JVM heap—a binary HPROF file containing everything in memory at that moment. While tools like the Eclipse Memory Analyzer (MAT) and VisualVM have long parsed these files for performance debugging, they require manual hunting through object graphs to find sensitive data. Security researchers needed something different: automated extraction of the secrets that leak into heap dumps.
The problem is more common than you’d think. Environment variables containing database passwords, OAuth tokens embedded in configuration objects, Authorization headers from recent HTTP requests—all of this lives in memory during normal application execution. Traditional heap analyzers make you query for specific classes, traverse object references, and manually inspect string values. For penetration testers discovering an exposed actuator endpoint or incident responders analyzing a compromised system, this manual process wastes precious time. pyhprof was built to solve this specific problem: parse the HPROF binary format, locate primitive array dumps containing configuration data, and automatically extract secrets using pattern matching.
Technical Insight
At its core, pyhprof tackles the undocumented variations in how Spring Framework writes PRIMITIVE ARRAY DUMP records within HPROF files. The HPROF format itself is versioned and documented, but the library’s key insight is that Spring’s heap dumps contain two distinct encoding patterns for primitive arrays—variations that don’t correlate cleanly with HPROF version numbers. The ReferenceBuilder class handles this by implementing two parsing modes, with automatic fallback when Type 1 parsing fails.
Here’s how the library approaches extraction. When you instantiate the parser and call parse(), it walks through heap segments looking for specific tags:
```python
from pyhprof import parser

# Initialize parser with file path
hp = parser.HprofParser('/tmp/heapdump.hprof')

# Parse with Type 1 format (default)
hp.parse()

# Access extracted environment variables
for key, value in hp.env_vars.items():
    print(f"{key}: {value}")

# Access configuration properties
for key, value in hp.config_props.items():
    print(f"{key}: {value}")

# Extract HTTP request/response pairs
for req_id, request in hp.requests.items():
    print(f"Request {req_id}:")
    print(f"  Method: {request['method']}")
    print(f"  URL: {request['url']}")
    print(f"  Headers: {request['headers']}")
```
The parsing logic relies on positional heuristics within the heap structure. When the parser finds a PRIMITIVE ARRAY DUMP containing a ‘PATH’ key, it knows to look two blocks ahead for environment variable data. This works because Spring’s PropertySource implementations maintain predictable memory layouts during serialization. The fragility here is intentional—pyhprof optimizes for the common case of Spring Boot 2.x and 3.x heap dumps rather than attempting to handle every possible HPROF variant.
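The "two blocks ahead" idea is easiest to see with a toy model. The sketch below is illustrative only—it treats heap blocks as a flat list of byte strings, whereas the real parser walks binary HPROF records—but it captures the positional logic the text describes:

```python
# Toy model of the positional heuristic. Illustrative sketch only,
# not pyhprof's actual code: blocks are byte strings, not HPROF records.
def find_env_block(blocks):
    """Locate the candidate env-var block: two positions past 'PATH'."""
    for i, block in enumerate(blocks):
        if b'PATH' in block:
            # Spring's PropertySource layout puts the values two blocks ahead
            if i + 2 < len(blocks):
                return blocks[i + 2]
            return None
    return None

# Synthetic heap segment
blocks = [b'JAVA_HOME', b'PATH', b'padding', b'DB_PASSWORD=hunter2', b'tail']
print(find_env_block(blocks))  # b'DB_PASSWORD=hunter2'
```

Note how nothing verifies that the block two positions ahead actually holds environment data—exactly the fragility discussed later in the Gotcha section.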
The spring_heapdumper.py example script demonstrates the security-focused workflow. It wraps the parser with TruffleHog-derived regex patterns to automatically flag sensitive data:
```python
import re
from pyhprof import parser  # used elsewhere in the script to produce heap_data

# Patterns adapted from TruffleHog
SECRET_PATTERNS = {
    'AWS_KEY': re.compile(r'AKIA[0-9A-Z]{16}'),
    'GENERIC_SECRET': re.compile(r'secret.*[=:]\s*[\'"`]?([^\s\'"`]+)',
                                 re.IGNORECASE),
    'AUTHORIZATION': re.compile(r'authorization:\s*bearer\s+([a-zA-Z0-9\-_]+)',
                                re.IGNORECASE),
}

def find_secrets(heap_data):
    """Scan extracted key/value pairs for strings matching secret patterns."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for key, value in heap_data.items():
            matches = pattern.findall(str(value))
            if matches:
                findings.append({
                    'type': name,
                    'key': key,
                    'match': matches[0],
                })
    return findings
```
The Type 2 parsing mode exists because certain Spring Boot versions (particularly when using embedded Tomcat with specific configurations) write primitive arrays with different byte alignment. When you encounter a heap dump that fails with Type 1, passing ptype=2 to the constructor switches the offset calculations. The library doesn’t automatically detect which type to use upfront because that would require a full pre-parse pass—instead, it catches parsing exceptions and prompts you to retry with the alternate mode.
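That retry workflow is easy to wrap so callers don't handle it by hand. A minimal sketch of the pattern—the exception type is an assumption, and a stand-in parse function is used so the example runs without a heap dump or pyhprof itself:

```python
def parse_with_fallback(parse_fn, path):
    """Try Type 1 parsing first; on failure, retry with Type 2 offsets."""
    try:
        return parse_fn(path, ptype=1)
    except ValueError:
        # Assumed failure mode: Type 1 offsets misread a Type 2 dump
        return parse_fn(path, ptype=2)

# Stand-in for the real parser: this dump only parses cleanly as Type 2
def fake_parse(path, ptype):
    if ptype == 1:
        raise ValueError('primitive array offsets misaligned')
    return {'ptype': ptype, 'path': path}

result = parse_with_fallback(fake_parse, '/tmp/heapdump.hprof')
print(result['ptype'])  # 2
```

The cost of this approach is a wasted partial pass on Type 2 dumps, but it avoids the full pre-parse pass that upfront detection would require.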
Memory management is pyhprof’s Achilles heel. The parser loads the entire HPROF file into memory, builds object reference maps, and only then begins extraction. For a 2GB heap dump, expect to allocate 4-6GB of RAM during parsing. The architecture prioritizes simplicity over efficiency: single-pass processing with in-memory state beats the complexity of streaming parsers or on-disk indexes. This design choice makes sense for the target use case—security assessments where you’re analyzing a handful of heap dumps on a capable workstation, not production monitoring of thousands of dumps.
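Given that blow-up, a pre-flight size check before committing to a parse is cheap insurance. A sketch—the 3x multiplier is a rough estimate derived from the figures above, not a constant pyhprof exposes:

```python
import os

def estimated_peak_gib(dump_path, multiplier=3):
    """Rough peak-RAM estimate: HPROF file size times an in-memory blow-up
    factor (assumed ~2-3x based on observed parsing overhead)."""
    return os.path.getsize(dump_path) * multiplier / 2**30

def safe_to_parse(dump_path, available_gib):
    """Refuse to parse when the estimate exceeds an available-RAM budget."""
    return estimated_peak_gib(dump_path) <= available_gib
```

Running the check first lets you bail out to a disk-indexed tool like Eclipse MAT instead of dying mid-parse.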
Gotcha
The memory requirements will bite you faster than you expect. A 4GB heap dump from a production Spring Boot application will crash pyhprof on a machine with 8GB RAM. The parser provides no progress indication—it processes silently until completion or out-of-memory death. You can’t stream results; you wait for full parsing, then access the extracted dictionaries. For large dumps, you’ll need to either run on a beefy machine or fall back to Eclipse MAT with its disk-based indexing.
The positional parsing logic is brilliant for its specific target but fragile everywhere else. The assumption that environment variables appear ‘two blocks past PATH’ works for standard Spring Boot configurations, but custom PropertySource implementations or heavily customized application contexts can break this heuristic. The library doesn’t validate that it’s actually parsing what it thinks it’s parsing—if the memory layout shifts, you get silent failures or partial extraction. There’s no schema validation, no integrity checks. You’re flying blind unless you manually verify the output against known secrets.
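One cheap mitigation when you control the target (a lab box or an authorized assessment): seed a known canary value into the environment and confirm it surfaces in the output, and treat missing universal keys like PATH as a sign the heuristic misfired. A sketch of that check—the key names and helper are examples, not a pyhprof API:

```python
def sanity_check(env_vars, expected_keys=('PATH',), canary=None):
    """Flag probable parse failures: expected keys or a canary value missing."""
    problems = []
    for key in expected_keys:
        if key not in env_vars:
            problems.append(f'expected key missing: {key}')
    if canary is not None and canary not in env_vars.values():
        problems.append('canary value not recovered')
    return problems

# A correctly parsed dump should produce no problems
print(sanity_check({'PATH': '/usr/bin', 'TOKEN': 'abc'}, canary='abc'))  # []
```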
Verdict
Use if: You’re conducting security assessments on Spring Boot applications with exposed actuator endpoints, performing incident response on compromised Java applications where you need rapid secret extraction, or automating heapdump analysis in CTF or bug bounty scenarios. This tool is purpose-built for the security use case and does that job better than general-purpose profilers.
Skip if: You need to analyze heap dumps larger than a few gigabytes without dedicated high-memory infrastructure, you’re working with non-Spring Java applications (the heuristics won’t apply), you need production-grade reliability with comprehensive error handling, or you’re doing performance analysis rather than security forensics. For those scenarios, stick with Eclipse MAT or VisualVM—they’re slower for secret extraction but far more robust.