Parsing .NET's BinaryFormatter in Python: Cross-Platform Forensics for a Dangerous Serialization Format

Hook

Microsoft spent years warning developers not to use BinaryFormatter, then marked it obsolete in .NET 5 and kept tightening the restrictions in later releases. So why would you need a Python library to parse it? Because legacy systems don't disappear overnight, and attackers know exactly where to find them.

Context

For nearly two decades, .NET's BinaryFormatter was the default choice for serializing objects to disk or network streams. It was fast, handled complex object graphs with circular references, and came built into the framework. Developers used it everywhere: session state, remoting, cache systems, inter-process communication. The problem? BinaryFormatter is inherently insecure. The deserialization process instantiates types and invokes methods based on data in the serialized stream, so anyone who controls the stream controls which code runs, which makes remote code execution a realistic outcome. By 2017, security researchers had weaponized these flaws with tools like ysoserial.net, generating malicious payloads that could compromise systems simply by being deserialized.

Microsoft's response was unequivocal: stop using BinaryFormatter. They marked it obsolete in .NET 5, disabled it by default for most project types in .NET 8, removed the implementation in .NET 9, and recommended JSON, protobuf, or other safer formats. But millions of lines of legacy code remain in production, and attackers still target these systems. Security researchers, incident responders, and forensic analysts need to examine captured BinaryFormatter streams to understand what's being transmitted, whether payloads are malicious, and how systems were compromised. The catch is that these analysts often work in Python-based toolchains: think Volatility for memory forensics, Scapy for network analysis, or custom SIEM integrations. They can't easily spin up a .NET runtime just to parse a suspicious byte stream. That's the gap NetBinaryFormatterParser fills: it implements the MS-NRBF (.NET Remoting: Binary Format Data Structure) protocol specification entirely in Python, allowing cross-platform inspection of .NET serialized data without touching the .NET ecosystem.

Technical Insight

NetBinaryFormatterParser works by implementing a state machine that processes the MS-NRBF protocol's record types sequentially. Every BinaryFormatter stream starts with a SerializationHeaderRecord, followed by a series of typed records that describe objects, their members, references between objects, and metadata about the types and assemblies involved. The parser reads these records one by one, maintaining a dictionary of object IDs to handle forward and backward references within the object graph.
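To make the record-by-record idea concrete, here is a minimal sketch of reading the first record of a stream with nothing but the standard library. It is not the library's own code: the function name `read_stream_header` and the `RecordType` enum are illustrative, and only a subset of record types is listed. The field layout (one record-type byte followed by four little-endian 32-bit integers) follows the MS-NRBF description of the stream header.

```python
import struct
from enum import IntEnum


class RecordType(IntEnum):
    """A subset of the MS-NRBF record type values (illustrative, not exhaustive)."""
    SERIALIZED_STREAM_HEADER = 0
    CLASS_WITH_ID = 1
    CLASS_WITH_MEMBERS_AND_TYPES = 5
    BINARY_OBJECT_STRING = 6
    MEMBER_REFERENCE = 9
    MESSAGE_END = 11
    BINARY_LIBRARY = 12


# One unsigned byte (record type) plus four little-endian int32 fields: 17 bytes.
HEADER = struct.Struct("<Biiii")


def read_stream_header(data: bytes) -> dict:
    """Parse the header record and return its fields plus the next read offset."""
    if len(data) < HEADER.size:
        raise ValueError("stream too short for a serialization header")
    record_type, root_id, header_id, major, minor = HEADER.unpack_from(data, 0)
    if record_type != RecordType.SERIALIZED_STREAM_HEADER:
        raise ValueError(f"unexpected first record type: {record_type}")
    if (major, minor) != (1, 0):
        raise ValueError(f"unsupported format version {major}.{minor}")
    return {
        "RootId": root_id,
        "HeaderId": header_id,
        "MajorVersion": major,
        "MinorVersion": minor,
        "NextOffset": HEADER.size,  # where the following record starts
    }


# The byte pattern most BinaryFormatter streams begin with.
sample = bytes.fromhex("0001000000ffffffff0100000000000000")
header = read_stream_header(sample)
```

A full parser would loop from `NextOffset`, read the next record-type byte, and dispatch to a handler for that type until it reaches a MessageEnd record.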

The architecture is straightforward: the main NetBinaryFormatterParser class takes a byte stream and exposes methods to parse individual records based on their type identifier. Each record type (ClassWithId, MemberPrimitiveTyped, BinaryObjectString, ArraySinglePrimitive, and more than a dozen others) has its own parsing logic that extracts the relevant fields according to the specification. Here's how you'd use it to parse a simple serialized object:

from NetBinaryFormatterParser import NetBinaryFormatterParser

# Read a captured BinaryFormatter stream
with open('suspicious_payload.bin', 'rb') as f:
    data = f.read()

parser = NetBinaryFormatterParser(data)
records = parser.parse()

for record in records:
    print(f"Record Type: {record['RecordTypeEnum']}")
    if record['RecordTypeEnum'] == 'ClassWithMembersAndTypes':
        print(f"  Class: {record['ClassInfo']['Name']}")
        # LibraryId is a numeric ID pointing at the BinaryLibrary record that names the assembly
        print(f"  Library ID: {record['LibraryId']}")
        print(f"  Members: {record['MemberTypeInfo']}")
    elif record['RecordTypeEnum'] == 'BinaryObjectString':
        print(f"  String Value: {record['Value']}")

This example illustrates the parser's output structure: each record is a dictionary containing the record type and its associated data. For ClassWithMembersAndTypes records, you get the full type name, the ID of the library record that names the assembly it came from, and metadata about each member field. This is invaluable when analyzing malicious payloads: you can immediately see if dangerous types like System.Windows.Data.ObjectDataProvider or System.Configuration.Install.AssemblyInstaller appear in the stream, which are common gadgets in deserialization attacks.

The parser handles one of BinaryFormatter's most complex features: object references. .NET serialization assigns each object a unique ID, and subsequent references to that object use a MemberReference record containing just the ID rather than serializing the entire object again. This enables circular references and keeps stream sizes manageable. NetBinaryFormatterParser maintains an internal lookup table as it processes records, so when it encounters a MemberReference with ID 7, it can retrieve the previously parsed object. This is critical for reconstructing the full object graph—without it, you'd only see fragmented data.
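The lookup-table idea can be sketched in a few lines. This is an illustration rather than the library's implementation: it assumes each parsed record is a dictionary and that object-defining records carry an `ObjectId` key while reference records carry an `IdRef` key. Those names follow the field names in MS-NRBF, but whether the parser's output uses exactly these keys is an assumption you should check against the source.

```python
def build_object_table(records):
    """Map ObjectId -> record for every record that defines an object."""
    return {r["ObjectId"]: r for r in records if "ObjectId" in r}


def resolve(record, table):
    """Return the object a MemberReference points at, or the record itself."""
    if record.get("RecordTypeEnum") != "MemberReference":
        return record
    target_id = record["IdRef"]
    if target_id not in table:
        # Truncated captures often leave references with no target.
        raise KeyError(f"dangling reference to object id {target_id}")
    return table[target_id]


# Hypothetical parsed records: the reference appears before its target,
# which is why the table is built in a first pass over all records.
records = [
    {"RecordTypeEnum": "MemberReference", "IdRef": 7},
    {"RecordTypeEnum": "BinaryObjectString", "ObjectId": 7, "Value": "hello"},
]
table = build_object_table(records)
target = resolve(records[0], table)
```

Building the table before resolving anything is the simple way to cope with forward references; a streaming parser would instead record unresolved IDs and patch them once the target appears.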

The implementation also handles primitive types, arrays, and enums correctly, respecting the type information embedded in the stream. BinaryFormatter doesn't just serialize data; it serializes complete type information including assembly names, versions, and public key tokens. This metadata is crucial for security analysis because it reveals exactly which libraries and versions the serialized objects depend on, potentially exposing known vulnerabilities in those specific versions.
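As a rough illustration of where that metadata lives, the sketch below decodes a length-prefixed string the way MS-NRBF describes it (a 7-bit variable-length size followed by UTF-8 bytes) and splits a .NET assembly display name into its parts. The helper names are made up for this example and are not part of NetBinaryFormatterParser.

```python
def read_7bit_length(data: bytes, offset: int):
    """Decode the variable-length size prefix; return (length, next_offset)."""
    length = 0
    for shift in range(0, 35, 7):  # at most five bytes
        if offset >= len(data):
            raise ValueError("truncated length prefix")
        byte = data[offset]
        offset += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the prefix
            return length, offset
    raise ValueError("length prefix longer than five bytes")


def read_string(data: bytes, offset: int):
    """Read one length-prefixed UTF-8 string; return (text, next_offset)."""
    length, offset = read_7bit_length(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated string body")
    return data[offset:end].decode("utf-8"), end


def split_assembly_name(full_name: str) -> dict:
    """Split 'Name, Version=..., Culture=..., PublicKeyToken=...' into a dict."""
    name, *attrs = [part.strip() for part in full_name.split(",")]
    info = {"Name": name}
    for attr in attrs:
        key, _, value = attr.partition("=")
        info[key] = value
    return info


display_name = ("System.Data, Version=4.0.0.0, Culture=neutral, "
                "PublicKeyToken=b77a5c561934e089")
encoded = bytes([len(display_name)]) + display_name.encode("utf-8")
text, next_offset = read_string(encoded, 0)
assembly = split_assembly_name(text)
```

With the version and public key token pulled out as separate fields, matching a stream against a list of known-vulnerable library versions becomes a dictionary comparison rather than string parsing.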

One particularly useful feature for security research is the ability to extract strings and type names without fully deserializing objects. Since BinaryFormatter streams are self-describing, you can scan through records looking for suspicious patterns—SQL injection strings, PowerShell commands, file paths, or evidence of known exploit frameworks—without risking execution of malicious code:

from NetBinaryFormatterParser import NetBinaryFormatterParser

def extract_suspicious_strings(data):
    """Return strings and type names in a stream that match known-bad patterns."""
    parser = NetBinaryFormatterParser(data)
    records = parser.parse()

    # Keywords are compared against lowercased text, so they must be lowercase too
    suspicious_keywords = ['powershell', 'cmd.exe', 'system.diagnostics.process']
    findings = []

    for record in records:
        if record.get('RecordTypeEnum') == 'BinaryObjectString':
            value = record.get('Value') or ''
            # Match case-insensitively but report the original string
            if any(keyword in value.lower() for keyword in suspicious_keywords):
                findings.append(value)
        elif 'ClassInfo' in record:
            class_name = record['ClassInfo'].get('Name', '')
            if 'ObjectDataProvider' in class_name:
                findings.append(f"Dangerous type: {class_name}")

    return findings

This pattern-matching approach is exactly how analysts triage captured network traffic or memory dumps. You can process hundreds of serialized streams quickly, flagging those that deserve deeper investigation without the risk and overhead of actually deserializing them in a .NET environment.

Gotcha

The elephant in the room is that BinaryFormatter is deprecated and Microsoft has made it increasingly difficult to use in modern .NET. This means NetBinaryFormatterParser's relevance is inherently time-limited. If you're analyzing systems built in the last few years or working with .NET 5+, you're unlikely to encounter BinaryFormatter in production unless someone explicitly re-enabled it (which itself is a security red flag). The tool's primary value is for legacy system analysis, incident response on older applications, and academic security research.

Documentation is sparse. The repository includes the specification link, but there are minimal usage examples or API documentation. You'll need to read the source code to understand how to work with the parsed output structure, which fields are available for each record type, and how to navigate object references. This isn't a dealbreaker for security researchers who are comfortable reading code, but it raises the barrier to entry for casual users.

Additionally, the parser is read-only: it can't serialize Python objects into BinaryFormatter format. This makes sense given the security implications, but it means you can't use it for legitimate data exchange scenarios where you need bidirectional conversion between Python and .NET.

Error handling could also be more robust; malformed or truncated streams may cause exceptions rather than graceful degradation, which can be problematic when analyzing corrupted data from memory dumps or incomplete network captures.
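One way to live with brittle error handling during bulk triage is to wrap the parse call so a single bad stream doesn't abort the whole run. The sketch below is a generic pattern, not a feature of the library: `triage` is a hypothetical helper, and the parser is passed in as a callable so the example makes no assumptions about which exceptions NetBinaryFormatterParser raises.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def triage(
    streams: Iterable[Tuple[str, bytes]],
    parse: Callable[[bytes], List[dict]],
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Parse each (name, data) pair, recording failures instead of aborting."""
    parsed: Dict[str, List[dict]] = {}
    failed: Dict[str, str] = {}
    for name, data in streams:
        try:
            parsed[name] = parse(data)
        except Exception as exc:  # broad on purpose: bad input can fail anywhere
            failed[name] = f"{type(exc).__name__}: {exc}"
    return parsed, failed


def fake_parse(data: bytes) -> List[dict]:
    """Stand-in parser for the demo: rejects anything shorter than 17 bytes."""
    if len(data) < 17:
        raise ValueError("truncated stream")
    return [{"RecordTypeEnum": "SerializedStreamHeader"}]


parsed, failed = triage(
    [("good.bin", b"\x00" * 17), ("cut.bin", b"\x00" * 4)],
    fake_parse,
)
```

Following the usage shown earlier in this article, the real parser could be supplied as `lambda data: NetBinaryFormatterParser(data).parse()`. Note that this only isolates failures per stream; recovering the records parsed before the error would need support inside the parser itself.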

Verdict

Use if: You're conducting security research on .NET applications, performing forensic analysis on legacy enterprise systems, analyzing network captures from environments still running .NET Framework 4.x or earlier, or building detection rules for SIEM systems that need to identify malicious BinaryFormatter payloads. This tool is essential for any Python-based security toolkit that encounters .NET serialization.

Skip if: You're working with modern .NET 5+ applications that use JSON/protobuf/MessagePack, you need to serialize Python objects for legitimate .NET interop (use pythonnet or switch to a standard format), you require production-grade reliability with comprehensive error handling, or you're looking for a general-purpose .NET analysis tool (dnfile or dnlib are better for assembly inspection).

For the specific niche of parsing legacy .NET serialization streams from Python, particularly in security contexts, NetBinaryFormatterParser is invaluable. Just remember that its continued relevance depends on how slowly the last BinaryFormatter systems die.
