Laika BOSS: The Recursive File Scanning Framework That Lockheed Martin Built for Malware Analysis
Hook
Lockheed Martin open-sourced its internal malware scanning framework in 2015, and the framework's recursive extraction model remains one of the most elegant approaches to nested threat detection, even if the Python 2 codebase shows its age.
Context
Before modular scanning frameworks emerged, malware analysis was a manual, time-intensive process. Analysts unpacked archives, deobfuscated scripts, and examined files one layer at a time. When adversaries started nesting malicious payloads inside multiple container formats (say, a ZIP containing a RAR that holds an obfuscated JavaScript file, which in turn downloads a PE executable), traditional signature-based scanning struggled. You needed something that could recursively peel back layers automatically.
Laika BOSS emerged from Lockheed Martin's need to scan massive volumes of files at the network perimeter and in email gateways. They needed a system that could distribute scanning workload across multiple machines, integrate with existing security infrastructure like Sendmail and Suricata, and allow security teams to add new detection modules without restarting services. The result was a broker-based architecture using ZeroMQ for distribution, with a plugin system that treats every file as an object to be recursively dissected until no more children can be extracted.
Technical Insight
At its core, Laika BOSS implements a recursive object extraction model: a dispatcher coordinates the scanning modules, and in standalone mode you drive it through the laika.py command-line script. When you submit a file, the dispatcher doesn't just scan the top-level object. It extracts every child object it can find, building a tree of nested files that each get their own module pass. An email attachment containing a ZIP file with a JavaScript file gets analyzed at three depths: the email, the archive, and the script.
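The recursive model can be sketched in a few lines of standard-library Python. This is a toy illustration, not Laika BOSS code: `Node`, `extract_children`, `scan`, and the `max_depth` guard are all hypothetical names, and only gzip and ZIP containers are handled.

```python
import gzip
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Node:
    """One object in the scan tree (hypothetical stand-in for a scan object)."""
    name: str
    buffer: bytes
    depth: int = 0
    children: list = field(default_factory=list)


def extract_children(buffer):
    """Return (name, bytes) pairs for every child this toy scanner can unpack."""
    if buffer[:2] == b"\x1f\x8b":  # gzip magic bytes
        return [("gunzipped", gzip.decompress(buffer))]
    if buffer[:4] == b"PK\x03\x04":  # ZIP local file header
        with zipfile.ZipFile(io.BytesIO(buffer)) as archive:
            return [(n, archive.read(n)) for n in archive.namelist()]
    return []  # leaf object: nothing left to unpack


def scan(name, buffer, depth=0, max_depth=10):
    """Build the object tree by recursing into each extracted child."""
    node = Node(name, buffer, depth)
    if depth < max_depth:  # assumed safety limit against runaway nesting
        for child_name, child_buffer in extract_children(buffer):
            node.children.append(scan(child_name, child_buffer, depth + 1, max_depth))
    return node


def walk(node):
    """Yield (depth, name) pairs in depth-first order."""
    yield node.depth, node.name
    for child in node.children:
        yield from walk(child)


def build_sample():
    """Create a gzip stream wrapping a ZIP that holds one script."""
    zipped = io.BytesIO()
    with zipfile.ZipFile(zipped, "w") as archive:
        archive.writestr("payload.js", "eval(unescape('%61'))")
    return gzip.compress(zipped.getvalue())


if __name__ == "__main__":
    for depth, name in walk(scan("sample.gz", build_sample())):
        print("  " * depth + name)
```

Running the script prints one line per object, indented by depth, which makes the three-layer tree (gzip stream, ZIP archive, script) visible at a glance.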
The architecture separates into standalone and distributed modes. Standalone execution is straightforward: you call the dispatcher directly with a file. But the distributed mode is where the design gets interesting. It uses ZeroMQ (a lightweight messaging library) to implement a broker pattern with three components: clients (cloudscan), a broker (laikad), and workers. The broker queues scan requests and distributes them to available workers, which enables horizontal scaling.
Whichever mode you run, the actual analysis happens inside modules. A simplified module looks like this:
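    # Simplified example of how a module processes an object
    import pefile
    from laikaboss.si_module import SI_MODULE

    class META_PE(SI_MODULE):
        def __init__(self):
            self.module_name = "META_PE"

        def _run(self, scanObject, result, depth, args):
            moduleResult = []  # child objects to scan next; this module extracts none
            try:
                # Parse the PE file structure
                pe = pefile.PE(data=scanObject.buffer)
            except pefile.PEFormatError:
                scanObject.addFlag('flag:CORRUPT_PE')
                return moduleResult
            # Attach metadata about the executable to the scan object
            scanObject.addMetadata(self.module_name, 'imphash', pe.get_imphash())
            scanObject.addMetadata(self.module_name, 'compile_time',
                                   pe.FILE_HEADER.TimeDateStamp)
            scanObject.addMetadata(self.module_name, 'sections',
                                   pe.FILE_HEADER.NumberOfSections)
            # Flag files that look packed or obfuscated
            if self._is_packed(pe):
                scanObject.addFlag('flag:PACKED')
            return moduleResult

        @staticmethod
        def _is_packed(pe):
            # Illustrative heuristic only: high section entropy often indicates packing
            return any(section.get_entropy() > 7.0 for section in pe.sections)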
Modules communicate through the scanObject, which carries the file buffer, metadata, and flags. Flags are the key decision-making mechanism—modules set flags like 'flag:MALICIOUS' or 'flag:PACKED' that influence disposition. A module detecting a known malware signature might set a flag that causes the email milter integration to quarantine the message.
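To make the flag-to-disposition idea concrete, here is a minimal sketch of a policy lookup. It is not how Laika BOSS itself decides dispositions; the `FLAG_ACTIONS` table, the severity ordering, and the `disposition` function are hypothetical, and the flag names are simply the ones used in this article.

```python
# Hypothetical policy table: flag names and actions are illustrative only.
SEVERITY = {"accept": 0, "quarantine": 1, "reject": 2}
FLAG_ACTIONS = {
    "flag:MALICIOUS": "reject",
    "flag:PACKED": "quarantine",
    "flag:CORRUPT_PE": "quarantine",
}


def disposition(flags, default="accept"):
    """Return the most severe action triggered by any flag raised during a scan."""
    actions = [FLAG_ACTIONS.get(flag, default) for flag in flags]
    return max(actions, key=SEVERITY.__getitem__, default=default)
```

Taking the most severe action across all flags means a single malicious child object is enough to reject the whole message, which matches how a mail gateway would want to treat a nested payload.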
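The recursive extraction happens through a simple but powerful pattern. When a module identifies a child object (like a file inside a ZIP), it wraps the extracted buffer in a ModuleObject, attaches ExternalVars describing where the buffer came from, and returns it in its result list. The dispatcher automatically queues these children for their own scanning passes:
    # Inside a decompression module (imports shown for completeness)
    import gzip
    import io
    import zlib
    from laikaboss.objectmodel import ModuleObject, ExternalVars

    def _run(self, scanObject, result, depth, args):
        moduleResult = []
        if scanObject.buffer[:2] == b'\x1f\x8b':  # gzip magic bytes
            try:
                # GzipFile works on Python 2, where gzip.decompress() does not exist
                stream = gzip.GzipFile(fileobj=io.BytesIO(scanObject.buffer))
                decompressed = stream.read()
            except (IOError, EOFError, zlib.error):
                scanObject.addFlag('flag:CORRUPT_GZIP')
                return moduleResult
            # Return the decompressed content as a child object for the dispatcher
            moduleResult.append(ModuleObject(
                buffer=decompressed,
                externalVars=ExternalVars(filename='gzip_decompression')))
        return moduleResult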
This creates a depth-first traversal of the object tree. A phishing email with a nested payload gets fully unpacked: email → ZIP attachment → JavaScript file → deobfuscated script → extracted URL. Each layer gets its own dispatch pass, running whichever modules match that object's type.
The real-world integration story is where Laika BOSS differentiates itself. The milter implementation connects directly to Sendmail/Postfix mail transfer agents, scanning emails in-transit. When an email arrives, the milter extracts attachments, submits them to Laika BOSS (either locally or via the ZeroMQ broker), and makes accept/reject/quarantine decisions based on returned flags. The Suricata integration (though marked as prototype) uses Redis as a queue—Suricata extracts files from network traffic, pushes them to Redis, and Laika BOSS workers pull and scan them.
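Both the ZeroMQ broker and the Redis prototype follow the same shape: producers push buffers onto a queue and workers pull from it. The sketch below shows that shape with Python's standard `queue` and `threading` modules standing in for ZeroMQ or Redis. Every name in it (`scan_buffer`, `worker`, `run_broker`) and the `flag:PE_HEADER` flag are hypothetical, and the "scan" is a placeholder.

```python
import hashlib
import queue
import threading


def scan_buffer(buffer):
    """Placeholder scan: a real worker would run the module suite here."""
    flags = ["flag:PE_HEADER"] if buffer.startswith(b"MZ") else []
    return {"sha256": hashlib.sha256(buffer).hexdigest(), "flags": flags}


def worker(requests, results):
    """Pull jobs until a None sentinel arrives, pushing one result per job."""
    while True:
        job = requests.get()
        if job is None:
            break
        job_id, buffer = job
        results.put((job_id, scan_buffer(buffer)))


def run_broker(buffers, worker_count=2):
    """Fan buffers out to worker threads and collect results keyed by job id."""
    requests, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job_id, buffer in enumerate(buffers):
        requests.put((job_id, buffer))
    for _ in threads:
        requests.put(None)  # one sentinel per worker so each one exits
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():  # safe here: all producers have finished
        job_id, result = results.get()
        collected[job_id] = result
    return collected
```

Because workers only share the queue, adding capacity means starting more workers. The same property is what lets the real system scale across machines once the in-process queue is replaced by a network transport.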
Module development follows a straightforward pattern: subclass SI_MODULE (from laikaboss.si_module), implement _run(), record findings on the scan object as flags and metadata, and return any extracted children as ModuleObject instances. The system loads modules dynamically, which was crucial for Lockheed's operational needs: when a new malware family emerged, analysts could deploy a new YARA rule or detection module without service interruption. The framework handles the plumbing; you focus on the detection logic.
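The division of labor between framework and module can be sketched as a template-method base class plus a by-name loader. This is an illustration under stated assumptions rather than the Laika BOSS implementation: `Module`, `ScanGzip`, `Broken`, and `load_modules` are hypothetical, and looking names up in a dictionary stands in for importing module files at runtime.

```python
class Module:
    """Hypothetical base class: run() wraps _run() so one bad module cannot kill a scan."""
    module_name = "MODULE"

    def run(self, buffer, flags):
        try:
            return self._run(buffer, flags)
        except Exception:  # deliberate catch-all at the framework boundary
            flags.append("flag:%s_ERROR" % self.module_name)
            return []

    def _run(self, buffer, flags):
        raise NotImplementedError


class ScanGzip(Module):
    module_name = "SCAN_GZIP"

    def _run(self, buffer, flags):
        if buffer[:2] == b"\x1f\x8b":  # gzip magic bytes
            flags.append("flag:GZIP")
        return []


class Broken(Module):
    module_name = "BROKEN"

    def _run(self, buffer, flags):
        raise ValueError("simulated module bug")


def load_modules(names, namespace):
    """Instantiate modules by name at runtime, skipping names that are not Module subclasses."""
    loaded = []
    for name in names:
        candidate = namespace.get(name)
        if isinstance(candidate, type) and issubclass(candidate, Module):
            loaded.append(candidate())
    return loaded
```

A module author only writes `_run()`. Error handling and loading stay in the framework, so a buggy detection module degrades into a flag instead of an outage.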
The metadata philosophy is verbose by design. Rather than simple pass/fail verdicts, modules generate rich contextual data: PE import hashes, JavaScript variable names, fuzzy hashes (ssdeep) for similarity matching, and extracted URLs. This supports threat hunting workflows where analysts need to correlate across scans and build detection patterns from historical data.
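A short sketch shows why verbose records pay off for hunting. The record layout, `scan_record`, and `pivot` below are hypothetical, and the import-hash values in the usage example are placeholders, not real hashes.

```python
import hashlib
from collections import defaultdict


def scan_record(name, buffer, extra=None):
    """Build a verbose metadata record instead of a bare pass/fail verdict."""
    record = {
        "name": name,
        "size": len(buffer),
        "sha256": hashlib.sha256(buffer).hexdigest(),
    }
    record.update(extra or {})  # module-specific fields such as an import hash
    return record


def pivot(records, key):
    """Group record names by a shared metadata value, keeping values seen more than once."""
    groups = defaultdict(list)
    for record in records:
        if key in record:
            groups[record[key]].append(record["name"])
    return {value: names for value, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    history = [
        scan_record("invoice.exe", b"AAAA", {"imphash": "placeholder-1"}),
        scan_record("update.exe", b"BBBB", {"imphash": "placeholder-1"}),
        scan_record("setup.exe", b"CCCC", {"imphash": "placeholder-2"}),
    ]
    print(pivot(history, "imphash"))
```

Two samples with different file hashes but the same import hash land in one group, which is the kind of correlation a simple verdict would never surface.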
Gotcha
The elephant in the room is Python 2. Laika BOSS was built before Python 3 became standard, and the entire codebase reflects that legacy. Python 2 reached end-of-life on January 1, 2020, which means no security updates, growing incompatibility with modern libraries, and increasing difficulty deploying on current Linux distributions. The dependency chain compounds this: the code relies on older versions of pefile, python-magic, and YARA that expect Python 2 behaviors. You'll spend time in dependency hell getting this running on anything newer than Ubuntu 16.04.
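One example of a "Python 2 behavior" that bites file-scanning code is byte handling. The snippet below is not taken from the Laika BOSS source; it illustrates the class of problem with two hypothetical helpers. On Python 3, indexing a bytes object yields an integer, so a character comparison that worked on Python 2 quietly stops matching.

```python
def is_gzip_py2_style(buffer):
    """Python 2 idiom: index the buffer and compare characters.

    On Python 3, buffer[0] is an int, so this returns False for real gzip data.
    """
    return buffer[0] == "\x1f" and buffer[1] == "\x8b"


def is_gzip_portable(buffer):
    """Slice and compare against a bytes literal; behaves the same on Python 2 and 3."""
    return buffer[:2] == b"\x1f\x8b"
```

Failures like this raise no exception; a detection simply stops firing, which is why a port needs tests against known samples rather than just a clean import.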
The deployment story is manual and fragile. No Docker containers, no Kubernetes manifests, no modern orchestration. The documentation walks through installing dependencies one-by-one, compiling some from source, and manually configuring paths. The Suricata integration requires a forked version of Suricata that's years behind upstream, making it a non-starter for production networks. The project's last significant commit was years ago—this isn't actively maintained, and community support is minimal. If you hit issues, you're largely on your own. The 750 GitHub stars suggest interest, but the lack of recent issues or pull requests indicates limited active usage.
Verdict
Use if: You're researching malware analysis architectures and want to understand recursive extraction patterns, you have legacy Python 2 infrastructure and need a scanning framework that integrates with it, or you're building your own tool and want to study a real-world modular scanning system's design decisions. The recursive object model and module pattern are genuinely well-designed and worth learning from.
Skip if: You need production-ready malware scanning (look at Strelka, Assemblyline, or CAPE instead), require Python 3 and modern dependency management, want containerized deployment, need active maintenance and security updates, or expect production-quality network traffic integration.
Laika BOSS is now primarily of historical and educational interest. The architecture influenced newer tools, but you shouldn't deploy this in 2024 unless you have very specific legacy constraints.