Inside mlattacks: A Red Team’s Guide to Breaking Machine Learning Systems
Hook
Did you know you can backdoor a machine learning model by modifying a pickle file, or that opening a Jupyter notebook could execute arbitrary code? The mlattacks repository demonstrates both—and shows they’re easier to pull off than you might think.
Context
Machine learning systems are deployed everywhere—from content moderation to fraud detection—but most security teams treat them as black boxes. Traditional penetration testing frameworks don’t cover adversarial perturbations or model extraction attacks. When wunderwuzzi23 created this repository around 2020-2022, the gap between ML research and practical red team knowledge was enormous. Academic papers discussed theoretical attacks using mathematical notation, while security practitioners had no hands-on resources to threat model their production ML pipelines.
The mlattacks series bridges this divide by demonstrating real attacks against “Husky AI,” a companion image classification system. Rather than abstract explanations, you get executable Jupyter notebooks showing how to brute-force predictions, craft adversarial examples that fool models with subtle perturbations, steal model weights, inject backdoors into serialized files, and even exploit the development toolchain itself. The series culminates in CVE-2020-16977, a remote code execution vulnerability in VS Code’s Python extension discovered during the attack research—proof that ML security isn’t just about algorithms, but the entire operational pipeline.
Technical Insight
The repository organizes attacks across three layers of the ML stack: inference attacks, model tampering, and supply chain compromise. Each layer demonstrates progressively sophisticated techniques with working code.
Inference attacks start simple. The brute-force approach iterates through images, sending them to Husky AI’s classification endpoint until finding misclassifications. The “smart fuzzing” approach appears to apply transformations like rotation, scaling, and color shifts to systematically explore the model’s decision boundaries. The image scaling attack exploits preprocessing vulnerabilities: you craft an image that looks innocuous at one resolution but triggers different behavior after the server resizes it.
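To make the scaling attack concrete, here is a minimal sketch of the principle using nearest-neighbor downscaling on plain nested lists of grayscale pixels. The function names and parameters are illustrative, not taken from the mlattacks notebooks, and real attacks target the interpolation used by libraries like OpenCV or Pillow.

```python
# Minimal illustration of an image-scaling attack against nearest-neighbor
# downscaling. Images are nested lists of grayscale pixel values.
# All names here are illustrative, not from the mlattacks notebooks.

def downscale_nearest(img, factor):
    """Server-side preprocessing: keep every `factor`-th pixel."""
    return [[row[j] for j in range(0, len(row), factor)]
            for row in img[::factor]]

def make_attack_image(decoy, target, factor):
    """Embed `target` at exactly the pixels nearest-neighbor sampling keeps,
    so the image resembles `decoy` at full size but becomes `target`
    after the server resizes it."""
    attack = [row[:] for row in decoy]              # start from the decoy
    for i, trow in enumerate(target):
        for j, pixel in enumerate(trow):
            attack[i * factor][j * factor] = pixel  # poison sampled pixels
    return attack

# An 8x8 "innocuous" all-white decoy hiding a 2x2 black target at factor 4.
decoy  = [[255] * 8 for _ in range(8)]
target = [[0, 0], [0, 0]]
attack = make_attack_image(decoy, target, factor=4)

assert downscale_nearest(attack, 4) == target   # the model sees the target
```

Only 4 of the 64 pixels differ from the decoy, which is why the full-resolution image still looks innocuous to a human reviewer.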
For more sophisticated attacks, the repository integrates IBM’s Adversarial Robustness Toolbox (ART) and Azure Counterfit. These frameworks generate optimized perturbations using gradient-based methods. Instead of random pixel changes, they calculate which pixels to modify to maximize classification error while remaining imperceptible to humans.
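ART and Counterfit wrap trained framework models behind their own APIs, but the core gradient idea can be sketched by hand. Below is a toy Fast Gradient Sign Method (FGSM) step against a two-feature logistic classifier—an assumed minimal example, not the repository’s code or ART’s actual interface.

```python
import math

# Hand-rolled FGSM on a toy logistic classifier, to show the core idea
# behind gradient-based attacks like those ART automates. This is a
# sketch; ART's real API wraps trained TensorFlow/PyTorch models.

w = [1.0, -1.0]  # toy model weights: score = w . x

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def fgsm(x, y, eps):
    """Step each input feature by eps in the direction that increases
    the loss. For logistic loss with true label y in {0, 1}, the
    gradient w.r.t. x is proportional to (sigmoid(w.x) - y) * w."""
    s = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    grad = [(s - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

x = [0.6, 0.4]                  # classified as 1 (score = 0.2)
x_adv = fgsm(x, y=1, eps=0.3)   # perturb each feature by at most 0.3

assert predict(x) == 1
assert predict(x_adv) == 0      # a small, bounded shift flips the label
```

The key point is that every feature moves by at most eps, yet the perturbation is aimed precisely along the loss gradient—which is why optimized attacks succeed where random pixel noise fails.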
The model tampering attacks target serialized model files. Python’s pickle format allows arbitrary code execution during deserialization, so an attacker with file system access can inject a payload into a saved model file that executes whenever the model is loaded.
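The mechanism fits in a few lines. This sketch uses a deliberately benign payload (a string method call) where a real backdoor would invoke `os.system` or `subprocess` with a shell command; the class name is illustrative.

```python
import pickle

# A benign stand-in for a pickle backdoor: __reduce__ tells pickle to
# call an arbitrary callable during deserialization. A real attacker
# would return os.system or subprocess.run with a shell payload instead.

class Backdoor:
    def __reduce__(self):
        # pickle.loads executes (callable, args) during deserialization
        return (str.upper, ("code ran during unpickling",))

blob = pickle.dumps(Backdoor())  # what gets written into a "model" file
result = pickle.loads(blob)      # "loading the model" runs the payload
assert result == "CODE RAN DURING UNPICKLING"
```

This is why loading pickle-based model files (including many `.pkl` and older PyTorch checkpoints) from untrusted sources is equivalent to running untrusted code.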
The supply chain layer attacks target the development environment itself. CVE-2020-16977 showed that VS Code’s Python extension would execute code embedded in Jupyter notebook metadata when rendering the file. An attacker could commit a poisoned notebook to a repository, and any developer opening it would trigger remote code execution—no user interaction beyond opening the file. This attack vector is particularly insidious for ML teams sharing notebooks via Git.
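The exploit itself has long been patched and its details aren’t reproduced here, but the defensive idea generalizes: `.ipynb` files are plain JSON, so untrusted notebooks can be audited for unexpected metadata before being opened in an editor. The allow-list below is an assumption for illustration—real notebooks carry more legitimate metadata keys than this.

```python
import json

# Defensive sketch (not the CVE exploit): notebooks are plain JSON, so
# an untrusted .ipynb can be inspected for unexpected top-level metadata
# keys before opening it. The allow-list here is illustrative only.

EXPECTED_KEYS = {"kernelspec", "language_info"}

def suspicious_metadata(notebook_json):
    """Return top-level metadata keys outside the expected set."""
    nb = json.loads(notebook_json)
    return sorted(set(nb.get("metadata", {})) - EXPECTED_KEYS)

poisoned = json.dumps({
    "cells": [],
    "metadata": {
        "kernelspec": {"name": "python3"},
        "custom_startup": {"cmd": "calc.exe"},  # attacker-added field
    },
    "nbformat": 4,
    "nbformat_minor": 5,
})

assert suspicious_metadata(poisoned) == ["custom_startup"]
```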
The repository also applies STRIDE threat modeling methodology specifically to ML systems, mapping out attack surfaces from data ingestion through model deployment. It covers repudiation threats—the risk that attackers modify models or data without leaving audit trails—and demonstrates logging defenses to detect unauthorized access. Beyond image classification attacks, the series also explores using Generative Adversarial Networks (GANs) to create synthetic images and includes work on bypassing malware detection models through binary signing.
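A simple tamper-evidence control illustrates the repudiation defense: record a cryptographic digest when a model artifact is saved, and verify it before loading. This is a generic integrity-check sketch, not code from the repository, and the byte strings are stand-ins for real serialized weights.

```python
import hashlib

# Sketch of a tamper-evidence control against repudiation: log a
# SHA-256 digest when a model artifact is saved, and refuse to load
# it if the digest no longer matches. The bytes below are stand-ins.

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

saved_model = b"\x80\x04...serialized weights..."  # stand-in model bytes
recorded = digest(saved_model)                     # logged at save time

# Later, before deserializing, verify the artifact is unmodified.
assert digest(saved_model) == recorded

tampered = saved_model + b"<payload>"
assert digest(tampered) != recorded                # tampering is detected
```

Paired with append-only audit logs of who wrote which artifact, this turns silent model replacement into a detectable event.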
Gotcha
The repository is a time capsule from 2020-2022 with no recent maintenance (75 stars, last major updates appear to be from that period). Dependency versions are likely outdated—expect compatibility issues with current TensorFlow, PyTorch, or scikit-learn releases. The Adversarial Robustness Toolbox has evolved significantly since these notebooks were written, and some API calls may be deprecated. You’ll need to fork and modernize dependencies before running examples.
While the repository covers significant ground—from image attacks to GANs to malware evasion—the primary focus is on image classification via Husky AI. If you’re securing NLP models, tabular data systems, or reinforcement learning agents, you’ll need to translate concepts yourself. The code quality is demonstrative rather than production-grade—notebooks are meant for learning, not as penetration testing tools you’d deploy in real engagements. CVE-2020-16977 has been patched for years, so the VS Code exploit is primarily of historical interest unless you’re targeting unpatched legacy environments.
Verdict
Use mlattacks if you’re building threat models for ML systems and need concrete attack examples beyond theoretical papers, conducting security training for data science teams who don’t understand adversarial risks, or researching ML security with a red team mindset. It’s valuable for understanding the full attack surface—from gradient-based perturbations to supply chain compromise—in one comprehensive series. Skip it if you need current, production-ready adversarial ML tools (go directly to Adversarial Robustness Toolbox or Counterfit instead), want automated security scanning rather than manual attack demonstrations, or need actively maintained code with modern dependencies. The educational value remains high for understanding ML attack vectors, but operational use requires significant modernization work.