Evilarc: The 200-Line Python Script That Breaks Archive Extraction Everywhere

Hook

A ZIP file is just a list of filenames and compressed data, and if one of those filenames says "../../../etc/passwd", an extraction routine that trusts it will write exactly there. Evilarc weaponizes this trust.

Context

Archive files have been a staple of software distribution and data compression since the 1980s, but their security model relies on a dangerous assumption: that the creator of an archive is trustworthy. When you extract a ZIP or TAR file, you expect files to land in a subdirectory relative to your current location. But archives don't enforce this—they simply contain paths as strings, and it's up to the extraction tool to validate those paths.

The vulnerability class known as "Zip Slip", or directory traversal in archive extraction, has affected nearly every major programming ecosystem. In 2018, Snyk researchers publicized it under the Zip Slip name and reported thousands of affected projects across Java, JavaScript, .NET, Go, and Ruby. The attack is simple: craft an archive whose file paths include "../" sequences so that they escape the intended extraction directory. When vulnerable code extracts "../../../../tmp/malicious.sh" with enough "../" components to reach the filesystem root, it writes to /tmp instead of the working directory. Evilarc, created by Patrick Toomey, automates the creation of these malicious archives for security testing, turning a tedious manual process into a single command-line invocation.
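To make the escape concrete, here is a minimal Python sketch of what a naive extractor effectively computes when it joins an entry name onto its destination. The helper names resolve_entry and escapes are hypothetical, the destination is assumed to be an absolute POSIX path, and /home/app/work is an illustrative directory.

```python
import posixpath

def resolve_entry(dest_dir, entry_name):
    """Return the path a naive extractor would write to (POSIX rules)."""
    return posixpath.normpath(posixpath.join(dest_dir, entry_name))

def escapes(dest_dir, entry_name):
    """True if the resolved path falls outside dest_dir."""
    dest = posixpath.normpath(dest_dir)
    target = resolve_entry(dest, entry_name)
    # commonpath compares whole path components, not string prefixes
    return posixpath.commonpath([dest, target]) != dest

print(resolve_entry("/home/app/work", "../../../../tmp/malicious.sh"))  # /tmp/malicious.sh
print(escapes("/home/app/work", "../../../../tmp/malicious.sh"))        # True
print(escapes("/home/app/work", "docs/readme.txt"))                     # False
```

Surplus "../" components are harmless once the path reaches the root, which is why attackers tend to overshoot the depth rather than guess it.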

Technical Insight

[Figure: System architecture (auto-generated). The CLI arguments parser passes depth, target, and OS type to the path manipulation engine, which combines the traversal prefix ("../" × depth) with the target path into a combined evil path. The archive creator takes the input payload file's content and that crafted path string (as the arcname parameter) and writes the malicious archive in ZIP/TAR format.]

Evilarc's architecture is deceptively straightforward: it is a thin wrapper around Python's zipfile and tarfile standard library modules. It does not patch or bypass anything in those modules. It builds an entry name from directory traversal sequences plus a target path, then hands that name to the library as the archive-internal filename, which the library stores without objecting to the "../" components.

Here's how you'd create a malicious archive that attempts to write a file to /tmp:

python evilarc.py -o unix -d 5 -p /tmp/ malicious.txt

This command generates an archive (evil.zip unless "-f" names another output file) in which malicious.txt is stored under the path "../../../../../tmp/malicious.txt". The "-d 5" parameter sets the traversal depth (how many "../" sequences to prepend), and "-p /tmp/" sets the path appended after the traversal. The "-o unix" flag selects forward-slash separators; the Windows mode uses "..\" instead.
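The way those three options combine can be sketched in a few lines of Python. This is an illustration rather than evilarc's own source: build_evil_path is a hypothetical helper, and stripping separators from both ends of the target is a simplifying assumption made here.

```python
def build_evil_path(filename, depth, target="", platform="unix"):
    """Combine traversal depth, target path, and OS separator into one entry name."""
    sep = "\\" if platform == "win" else "/"
    # Assumption: normalize the target to "dir<sep>" form, dropping outer separators
    target = target.strip(sep)
    if target:
        target += sep
    return (".." + sep) * depth + target + filename

print(build_evil_path("malicious.txt", 5, "/tmp/"))
# ../../../../../tmp/malicious.txt
```

Passing platform="win" switches the separator to a backslash, which matters because an extractor on one OS may not treat the other OS's separator as a path boundary.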

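Under the hood, evilarc never edits archive bytes by hand. In a ZIP file, each entry carries a filename field, which is just a string with no inherent path validation, and the library fills it from whatever name the caller supplies. Conceptually, the Python code looks something like this:

import zipfile

def create_evil_zip(filename, payload_path, depth, target_dir, out_path='evil.zip'):
    # "../" repeated `depth` times, e.g. depth=2 -> "../../"
    traversal = '../' * depth
    # Normalize target_dir to "dir/" form so the pieces concatenate cleanly
    target = target_dir.strip('/')
    evil_path = traversal + (target + '/' if target else '') + filename

    with zipfile.ZipFile(out_path, 'w') as zf:
        # payload_path is read from disk; arcname is the name stored in the archive
        zf.write(payload_path, arcname=evil_path)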

The critical insight is that arcname accepts almost any string. When writing, Python's zipfile strips drive letters and leading slashes from the name but leaves "../" components in place; the handling of ".." lives on the extraction side, where extract() and extractall() drop those components before writing. Evilarc exploits this asymmetry: archive creation is permissive, while extraction implementations vary widely in their security posture.
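The write-side permissiveness is easy to confirm without touching the filesystem. The sketch below builds a ZIP in memory with a traversal name and reads the stored name back; make_zip_bytes is a hypothetical helper and the entry name is illustrative.

```python
import io
import zipfile

def make_zip_bytes(entry_name, data):
    """Build a one-entry ZIP in memory, storing entry_name as given."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(entry_name, data)
    return buf.getvalue()

blob = make_zip_bytes("../../tmp/demo.txt", b"hello")
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist())  # ['../../tmp/demo.txt']
```

Listing names this way is also a cheap pre-extraction check: any entry containing ".." or starting with a slash deserves suspicion before anything is written to disk.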

The TAR format behaves the same way. TAR headers contain a 100-byte filename field (longer names rely on format extensions), and there's no specification requirement that filenames be relative or safe. Evilarc can target both formats because the underlying vulnerability is in the extraction logic, not the archive format itself.
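The same experiment works for TAR. This sketch, again with a hypothetical helper name, writes one entry in memory and then inspects both the parsed name and the raw name field at the start of the header block.

```python
import io
import tarfile

def make_tar_bytes(entry_name, data):
    """Build a one-entry TAR in memory with an arbitrary member name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name=entry_name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

blob = make_tar_bytes("../../tmp/demo.txt", b"hello")
with tarfile.open(fileobj=io.BytesIO(blob)) as tf:
    print(tf.getnames())           # ['../../tmp/demo.txt']
# The first 100 bytes of the header hold the NUL-padded name
print(blob[:100].rstrip(b"\0"))    # b'../../tmp/demo.txt'
```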

What makes this particularly insidious is how it interacts with common extraction patterns. Consider this vulnerable PHP code that exists in countless applications:

$zip = new ZipArchive;
if ($zip->open('upload.zip') === TRUE) {
    // No inspection of entry names before extraction
    $zip->extractTo('/var/www/uploads/');
    $zip->close();
}

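If upload.zip was created by evilarc with "-p var/www/html/", a payload named shell.php, and enough depth to reach the filesystem root, a vulnerable extractor writes shell.php into the web root instead of the uploads directory. Whether extractTo itself permits this depends on the PHP and libzip versions in use, so treat the snippet as the pattern to test rather than a guaranteed exploit. The same mistake appears wherever code joins an entry name onto a destination path without checking the result: hand-written loops over java.util.zip entries in Java, older Python zipfile releases, and comparable code in most other languages.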
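For contrast, the check that vulnerable code omits fits in a few lines. The following Python sketch rejects an archive outright if any entry would resolve outside the destination; safe_extract_zip is a hypothetical helper, and refusing the whole archive rather than skipping the bad entry is a design assumption made here.

```python
import os
import zipfile

def safe_extract_zip(archive, dest_dir):
    """Extract a ZIP only if every entry resolves inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Resolve the would-be target, following ".." and symlinks
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"blocked traversal entry: {info.filename!r}")
        # All names validated; let the library do the actual writing
        zf.extractall(dest_root)
```

Comparing resolved paths by component avoids the classic prefix bug in which "/var/www/uploads-evil" passes a startswith("/var/www/uploads") test.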

A related technique for TAR files is the symlink attack. The archive contains a symlink that points outside the extraction directory, followed by a file whose path passes through that symlink, so the write lands at the symlink's target even on systems that validate paths shallowly. The upstream evilarc script does not appear to offer a switch for this: its options cover the output file (-f, with the archive type taken from the file extension), depth (-d), OS (-o), and path (-p), so such archives have to be crafted separately.

For example, a TAR archive with a symlink named "link" pointing to /etc/cron.d, followed by an entry named "link/payload.txt", is a two-stage attack that bypasses basic "does this path contain ../" checks.
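The two-stage layout can be sketched with Python's tarfile module for use in a test fixture. The helper make_symlink_tar is hypothetical, the archive is built in memory only, and the sketch lists the members rather than extracting them.

```python
import io
import tarfile

def make_symlink_tar(link_name, link_target, payload_name, data):
    """Build a TAR whose first member is a symlink and whose second writes through it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        # Stage 1: a symlink entry; symlinks carry no file data
        link = tarfile.TarInfo(name=link_name)
        link.type = tarfile.SYMTYPE
        link.linkname = link_target
        tf.addfile(link)
        # Stage 2: a regular file whose path passes through the symlink
        member = tarfile.TarInfo(name=f"{link_name}/{payload_name}")
        member.size = len(data)
        tf.addfile(member, io.BytesIO(data))
    return buf.getvalue()

blob = make_symlink_tar("link", "/etc/cron.d", "payload.txt", b"probe")
with tarfile.open(fileobj=io.BytesIO(blob)) as tf:
    for m in tf.getmembers():
        print(m.name, "->", m.linkname if m.issym() else "(file)")
```

Neither member name contains "..", so a filter that only inspects names for traversal sequences lets both entries through.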

Gotcha

Evilarc's biggest limitation is that it solves only half the problem. It generates malicious archives, but testing whether your own application is vulnerable requires you to extract those archives and verify where the files land. There is no scanning capability, no automated test harness, and no CI/CD integration. You are left inspecting filesystem state by hand after each extraction, which does not scale for thorough security testing.
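The missing half can be approximated with a small harness. The sketch below is one possible shape, not part of evilarc: it runs an extractor inside a throwaway directory and reports any file that ends up outside the intended destination. All function names are hypothetical, and naive_extract is deliberately unsafe so the harness has something to catch.

```python
import io
import os
import tempfile
import zipfile

def naive_extract(blob, dest):
    """Deliberately unsafe: joins entry names onto dest without checking."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            target = os.path.normpath(os.path.join(dest, info.filename))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(zf.read(info))

def stdlib_extract(blob, dest):
    """Uses zipfile's own extraction, which drops '..' components."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        zf.extractall(dest)

def escaped_files(extractor, blob):
    """Run extractor in a sandbox; return files written outside the destination."""
    with tempfile.TemporaryDirectory() as sandbox:
        dest = os.path.join(sandbox, "a", "b", "dest")
        os.makedirs(dest)
        extractor(blob, dest)
        found = []
        for root, _dirs, files in os.walk(sandbox):
            for name in files:
                path = os.path.join(root, name)
                if os.path.commonpath([dest, path]) != dest:
                    found.append(os.path.relpath(path, sandbox))
        return sorted(found)

def make_blob(entry_name):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(entry_name, b"probe")
    return buf.getvalue()

blob = make_blob("../../escaped.txt")   # depth 2 keeps the write inside the sandbox
print(escaped_files(naive_extract, blob))
print(escaped_files(stdlib_extract, blob))
```

The first call reports one escaped file directly under the sandbox's "a" directory and the second reports none. Keeping the traversal depth shallower than the nesting of the destination keeps a vulnerable extractor's writes inside the temporary directory, which makes the harness safe to run in CI.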

More importantly, the tool's effectiveness has diminished since its creation. Many current libraries sanitize entry paths by default or offer an opt-in safe mode: Python's zipfile drops ".." components on extraction, tarfile gained extraction filters in Python 3.12, many Java libraries were hardened after the 2018 disclosures, and some framework documentation now warns about unsafe extraction patterns. If you're testing a contemporary application with up-to-date dependencies, evilarc's archives will often extract harmlessly inside the destination directory or trigger a security exception. The tool remains valuable for legacy systems, custom extraction implementations, and educational purposes, but it is no longer the universal skeleton key it once was. Some edge cases are also out of scope: Unicode path tricks, extremely long filenames, and format-specific quirks in implementations such as Java's JAR handling require manual archive crafting beyond what evilarc provides.
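As one example of such built-in hardening, Python's tarfile module exposes a "data" extraction filter on versions that include it (3.12 and some patched earlier releases). The sketch below applies the filter to each member without extracting anything and collects the names it rejects; the helper names are hypothetical, and the function returns None on interpreters that lack the filter.

```python
import io
import tarfile
import tempfile

def make_tar(names):
    """Build an in-memory TAR with one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 5
            tf.addfile(info, io.BytesIO(b"probe"))
    return buf.getvalue()

def rejected_by_data_filter(blob, dest):
    """Return member names the 'data' filter refuses, or None if unsupported."""
    if not hasattr(tarfile, "data_filter"):
        return None
    rejected = []
    with tarfile.open(fileobj=io.BytesIO(blob)) as tf:
        for member in tf:
            try:
                tarfile.data_filter(member, dest)
            except tarfile.FilterError:
                rejected.append(member.name)
    return rejected

with tempfile.TemporaryDirectory() as dest:
    print(rejected_by_data_filter(make_tar(["ok.txt", "../../escaped.txt"]), dest))
```

On a supporting interpreter only the traversal entry is rejected, which is the behavior that turns an evilarc-style archive into a handled error instead of a stray file.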

Verdict

Use if: You're conducting penetration tests against web applications that accept file uploads and extract archives server-side, performing security audits on legacy codebases with custom extraction logic, creating CTF challenges or security training materials that demonstrate directory traversal concepts, or researching vulnerability patterns across different language ecosystems. Evilarc excels at quickly generating test cases to validate whether extraction code performs proper path sanitization.

Skip if: You need defensive security tooling to protect your own applications (this is purely offensive), you're working exclusively with modern, patched libraries where zip slip vulnerabilities have been addressed at the framework level, or you require automated vulnerability scanning rather than manual exploit generation. For defensive work, focus on static analysis tools that detect unsafe extraction patterns in your codebase, or use dependency scanners that flag vulnerable library versions.
