PANDA: Recording Nine Billion Instructions in a Few Hundred Megabytes

Hook

A complete FreeBSD boot sequence—nine billion x86 instructions executing across kernel space, userland, and device drivers—fits in a few hundred megabytes. That recording plays back deterministically, instruction-for-instruction, as many times as you need.

Context

Debugging a memory corruption bug is hard. Debugging one that only appears after 20 minutes of runtime, across kernel and userspace boundaries, is nearly impossible with traditional debuggers. Malware analysts face similar challenges: samples that exhibit behavior only after complex environmental checks, or only on specific architectures. You can't practically step backwards through a whole-system execution in gdb. You can't share your exact execution state with a colleague. You can't run the same analysis across ARM, MIPS, and x86 without rewriting your instrumentation.

PANDA (Platform for Architecture-Neutral Dynamic Analysis) emerged from MIT Lincoln Laboratory's research into whole-system analysis. It extends QEMU, the whole-system emulator, with two foundational capabilities: deterministic record and replay, and a plugin architecture for deep runtime analysis. The record/replay system captures non-deterministic inputs (interrupts, DMA, user input) during execution, creating compact logs that reproduce the exact same execution later. The plugin framework lets you write analysis passes that hook into execution at multiple granularities, from individual instructions to system calls. Combined with LLVM translation, analyses written against the IR can in principle run unchanged across the 13 CPU architectures QEMU supports, although PANDA itself builds and tests only a subset of those targets.

Technical Insight

[Figure: system architecture (auto-generated). Guest OS execution feeds CPU instructions to the QEMU emulator core, whose TCG code generator translates them to IR blocks. The record/replay engine logs external events to the non-deterministic event log and injects them during replay. TCG blocks are optionally translated to LLVM IR for architecture-neutral analysis. The plugin framework receives execution events and triggers callback hooks that produce analysis results.]

PANDA's architecture stacks three key layers: QEMU's core emulation, a record/replay engine that intercepts non-determinism, and a plugin framework with optional LLVM IR translation for architecture-neutral analysis.

The record/replay mechanism works by recognizing that emulated execution is deterministic except for external inputs. During recording, PANDA logs all non-deterministic events: interrupts, DMA transfers, keyboard input, timer values, anything that comes from outside the CPU. The log stores these events with instruction counts indicating when they occurred. During replay, PANDA runs in pure emulation mode but injects the logged events at precisely the right instruction counts. The result is bit-for-bit identical execution, including all timing-dependent behavior. This approach produces remarkably compact logs—only the non-deterministic deltas need storage, not the entire execution state.
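
The idea can be illustrated with a toy model. The sketch below is not PANDA code: the "CPU" is a made-up arithmetic loop, a seeded random generator stands in for external hardware, and the function names `run`, `record`, and `replay` are invented for illustration. It shows only the principle that logging (instruction count, value) pairs is enough to reproduce the final state.

```python
import random


def run(steps, get_input):
    """Deterministic toy CPU: its state changes only through get_input(count)."""
    state = 0
    for count in range(steps):
        state = (state * 31 + 7) % 1_000_003   # one deterministic "instruction"
        event = get_input(count)               # None, or a value from outside
        if event is not None:
            state ^= event
    return state


def record(steps, seed):
    """Run live, logging (instruction count, value) for every external event."""
    rng = random.Random(seed)                  # stand-in for real hardware
    log = []

    def live_input(count):
        if rng.random() < 0.01:                # a rare event, e.g. an interrupt
            value = rng.randrange(1 << 16)
            log.append((count, value))
            return value
        return None

    return run(steps, live_input), log


def replay(steps, log):
    """Re-run with no live inputs, injecting logged events at the same counts."""
    events = dict(log)
    return run(steps, events.get)


final_state, log = record(100_000, seed=1234)
assert replay(100_000, log) == final_state
print(f"{len(log)} logged events reproduce 100000 instructions")
```

The log grows with the number of external events, not with the number of instructions executed, which is why recordings of long executions can stay small.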

Plugins hook into PANDA's execution through callbacks at various granularities. Here's a minimal plugin that logs system calls on a Linux x86 guest as they execute:

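#include <stdio.h>
#include "panda/plugin.h"

// PANDA resolves these two entry points by name when loading the plugin.
bool init_plugin(void *self);
void uninit_plugin(void *self);

// Runs before each translated block. A syscall instruction ends its block,
// so test the block's last two bytes. This is a heuristic: the bytes could
// also be the tail of a longer instruction.
static void on_block(CPUState *cpu, TranslationBlock *tb) {
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    uint8_t op[2];

    if (tb->size < 2 ||
        panda_virtual_memory_read(cpu, tb->pc + tb->size - 2, op, 2) != 0) {
        return;  // block too short, or guest memory not readable
    }
    // x86 syscalls use int 0x80 (CD 80) or the syscall instruction (0F 05)
    if ((op[0] == 0xcd && op[1] == 0x80) || (op[0] == 0x0f && op[1] == 0x05)) {
        printf("syscall %lu at pc 0x%lx\n", (unsigned long)env->regs[R_EAX],
               (unsigned long)panda_current_pc(cpu));
    }
}

bool init_plugin(void *self) {
    panda_cb pcb = { .before_block_exec = on_block };
    panda_register_callback(self, PANDA_CB_BEFORE_BLOCK_EXEC, pcb);
    return true;
}

void uninit_plugin(void *self) { }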

For architecture-neutral analysis, PANDA translates QEMU's Tiny Code Generator (TCG) intermediate representation to LLVM IR. This happens at basic block granularity: when QEMU translates a block of guest instructions to TCG ops, PANDA can further translate those ops to LLVM. Analyses written against LLVM IR work identically whether the guest is ARM, MIPS, x86, or any other architecture QEMU supports.
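
A toy illustration of why a shared IR makes analyses portable, assuming two invented mini-assembly syntaxes and a made-up three-field IR (none of this is TCG or LLVM): each guest gets its own small lifter, and the analysis only ever sees IR opcodes.

```python
# Two hypothetical guest encodings of "load, add one, store"; not real ISAs.
TOY_X86 = ["mov eax, [0x10]", "add eax, 1", "mov [0x20], eax"]
TOY_ARM = ["ldr r0, [0x10]", "add r0, r0, #1", "str r0, [0x20]"]


def lift_x86(insn):
    """Lift one toy x86-style instruction to an (opcode, dest, source) tuple."""
    op, rest = insn.split(" ", 1)
    dst, src = [s.strip() for s in rest.split(",")]
    if op == "mov" and src.startswith("["):
        return ("load", dst, src)
    if op == "mov" and dst.startswith("["):
        return ("store", dst, src)
    return ("add", dst, src)


def lift_arm(insn):
    """Lift one toy ARM-style instruction to the same IR shape."""
    op, rest = insn.split(" ", 1)
    args = [s.strip() for s in rest.split(",")]
    if op == "ldr":
        return ("load", args[0], args[1])
    if op == "str":
        return ("store", args[1], args[0])
    return ("add", args[0], args[-1])


def count_memory_ops(ir_block):
    """Architecture-neutral analysis: it inspects IR opcodes only."""
    return sum(1 for op, *_ in ir_block if op in ("load", "store"))


x86_ir = [lift_x86(i) for i in TOY_X86]
arm_ir = [lift_arm(i) for i in TOY_ARM]
assert count_memory_ops(x86_ir) == count_memory_ops(arm_ir) == 2
```

All architecture-specific knowledge lives in the lifters, so supporting another guest means writing one more lifter rather than rewriting every analysis.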

The dynamic taint analysis plugin demonstrates this power. It operates entirely on LLVM IR, tracking how data flows through registers, memory, and LLVM temporaries without any architecture-specific code. When you taint network input on an ARM guest, the same analysis logic that tracks data flow through x86 registers works without modification:

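from pandare import Panda

panda = Panda(generic="arm")

# Assumption: value of NET_TRANSFER_IOB_TO_RAM (I/O buffer to guest RAM) in
# PANDA's Net_transfer_type enum; verify against rr_log.h in your checkout.
NET_TRANSFER_IOB_TO_RAM = 1

@panda.queue_blocking
def record():
    # Run the command in the guest and record it
    panda.record_cmd("wget http://example.com/malware",
                     recording_name="network_capture")
    panda.end_analysis()  # lets panda.run() return once recording is done

# Assumption: the arguments mirror PANDA's C prototype for this callback
# (transfer type, source address, destination address, byte count).
@panda.cb_replay_net_transfer
def on_network(cpu, kind, src_addr, dst_addr, num_bytes):
    if kind == NET_TRANSFER_IOB_TO_RAM:
        # Taint all network input, labeling each byte with its offset
        for i in range(num_bytes):
            panda.taint_label_ram(dst_addr + i, i)

panda.run()                          # first run: take the recording
panda.run_replay("network_capture")  # second run: replay it with taint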
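
Independently of PANDA's API, the propagation rule behind this kind of taint tracking can be sketched in a few lines. The IR below is a hypothetical three-address form, and `propagate` is an illustrative name, not the taint plugin's implementation: each destination receives the union of the labels on its sources.

```python
def propagate(ir_block, initial_taint):
    """Track label sets through a toy IR of (dest, opcode, sources) tuples."""
    taint = {loc: set(labels) for loc, labels in initial_taint.items()}
    for dest, _opcode, sources in ir_block:
        labels = set()
        for src in sources:
            labels |= taint.get(src, set())
        taint[dest] = labels          # overwriting also clears stale taint
    return {loc: labels for loc, labels in taint.items() if labels}


block = [
    ("t0", "load", ["mem:0x1000"]),       # read a tainted network byte
    ("t1", "load", ["mem:0x1001"]),
    ("t2", "add", ["t0", "t1"]),          # result depends on both bytes
    ("t0", "const", []),                  # t0 is overwritten by a constant
    ("mem:0x2000", "store", ["t2"]),
]
result = propagate(block, {"mem:0x1000": {0}, "mem:0x1001": {1}})
assert result["mem:0x2000"] == {0, 1}
assert "t0" not in result
```

Because the rule is defined per IR operation, nothing in it refers to a particular guest's registers or instruction encodings.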

PyPANDA, the Python interface, wraps PANDA's C API with Pythonic ergonomics. It handles PANDA installation, guest image management, and provides decorators for callback registration. The @panda.queue_blocking decorator is particularly clever: it runs the function in a separate thread, allowing you to script sequential interactions (boot VM, run command, take snapshot) while PANDA's execution loop runs in the main thread. This inversion of control makes complex analysis workflows readable.
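
The threading pattern can be sketched with the standard library alone. `ToyEmulator` and its `do` method are hypothetical stand-ins, not PyPANDA internals: the script runs in a worker thread and blocks on each step, while the main thread's loop executes the steps in order.

```python
import queue
import threading


class ToyEmulator:
    """Hypothetical stand-in for an emulator whose loop owns the main thread."""

    def __init__(self):
        self.commands = queue.Queue()
        self.trace = []
        self.script = None

    def queue_blocking(self, func):
        # Wrap the script in a worker thread so it may block between steps.
        self.script = threading.Thread(target=func)
        return func

    def do(self, name):
        # Called from the script thread: submit a step, wait until it has run.
        done = threading.Event()
        self.commands.put((name, done))
        done.wait()

    def run(self):
        # Main-thread loop: execute queued steps until the script finishes.
        self.script.start()
        while self.script.is_alive() or not self.commands.empty():
            try:
                name, done = self.commands.get(timeout=0.05)
            except queue.Empty:
                continue
            self.trace.append(name)
            done.set()
        self.script.join()


emu = ToyEmulator()


@emu.queue_blocking
def script():
    emu.do("revert to snapshot")
    emu.do("run command")
    emu.do("stop recording")


emu.run()
assert emu.trace == ["revert to snapshot", "run command", "stop recording"]
```

The script reads top to bottom like a shell session even though every step is actually carried out by the loop in the main thread.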

Plugins can communicate through PPP (the PANDA plugin-to-plugin interface), which supports two styles of interaction. A plugin can export API functions, declared in a header, that other plugins call directly, and it can define named callbacks that other plugins register for, pub/sub style. The OS-introspection (OSI) plugins use this extensively: osi exposes functions like get_current_process(), and osi_linux supplies the implementation by parsing Linux kernel structures, so other plugins consume process information without reimplementing that parsing. This creates an ecosystem of composable analysis building blocks.
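
A minimal sketch of the two interaction styles, direct API calls and registered callbacks, in plain Python. `ToyOSI`, `on_process_change`, and `switch_to` are hypothetical names for illustration; only `get_current_process` echoes a name from the text, and this is not how PPP is implemented in C.

```python
class ToyOSI:
    """Hypothetical provider plugin exposing one API call and one event."""

    def __init__(self):
        self._subscribers = []
        self._current = None

    def get_current_process(self):
        # API style: consumers call the provider directly.
        return self._current

    def on_process_change(self, callback):
        # Pub/sub style: consumers register to be notified.
        self._subscribers.append(callback)

    def switch_to(self, name):
        # The provider updates its state and notifies every subscriber.
        self._current = name
        for callback in self._subscribers:
            callback(name)


osi = ToyOSI()
seen = []
osi.on_process_change(seen.append)
osi.switch_to("init")
osi.switch_to("wget")
assert seen == ["init", "wget"]
assert osi.get_current_process() == "wget"
```

A consumer written this way depends only on the provider's interface, so the provider can change how it finds the current process without breaking it.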

Gotcha

PANDA's replay recordings are not portable between 32-bit and 64-bit builds. The replay log format contains raw pointers from the PANDA process itself (not guest pointers), meaning a recording made with 64-bit PANDA cannot replay on 32-bit PANDA and vice versa. The documentation strongly recommends always using 64-bit builds, but this still bites when sharing recordings with collaborators who might have different build configurations. There's no technical reason this couldn't be fixed with a pointer-width-agnostic log format, but it remains a practical limitation.
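
To see why pointer width matters, consider a toy log entry made of a host pointer followed by a 64-bit instruction count. This layout is invented for illustration and is not PANDA's actual log format; `write_entry` and `read_entry` are hypothetical helpers.

```python
import struct


def write_entry(pointer, count, pointer_format):
    """Pack one toy log entry: a host pointer, then a 64-bit count."""
    return struct.pack("<" + pointer_format + "Q", pointer, count)


def read_entry(data, pointer_format):
    """Unpack one entry, assuming the given pointer width."""
    fmt = "<" + pointer_format + "Q"
    return struct.unpack(fmt, data[:struct.calcsize(fmt)])


# Written by a 64-bit build: the pointer field is 8 bytes ("Q").
entry64 = write_entry(0x7F00DEADBEEF, 9_000_000_000, "Q")
assert len(entry64) == 16
assert read_entry(entry64, "Q") == (0x7F00DEADBEEF, 9_000_000_000)

# A reader that assumes 4-byte pointers ("I") misaligns every later field.
pointer, count = read_entry(entry64, "I")
assert pointer == 0xDEADBEEF
assert count != 9_000_000_000
```

Once one field is read at the wrong width, every field after it is decoded from the wrong offset, so the rest of the log cannot be interpreted.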

The LLVM dependency locks you to LLVM 14. While this provides a stable target for writing analyses, it means you cannot easily upgrade to newer LLVM versions with improved optimizations or features. Projects with existing LLVM infrastructure at different versions face integration friction. The S2E-derived LLVM translation layer is complex and updating it across major LLVM versions requires deep expertise in both QEMU's TCG and LLVM's IR evolution.

PANDA lags behind upstream QEMU, typically by several major versions. This means missing out on newer CPU features, performance improvements, and security fixes that land in mainline QEMU. The team rebases periodically, but the extensive modifications make this a substantial engineering effort. If you need to emulate cutting-edge hardware or depend on recent QEMU features, you'll find PANDA's base is a few years behind.

Verdict

Use PANDA if you need repeatable, shareable whole-system analysis across multiple architectures: malware reverse engineering, vulnerability research, or any scenario where deterministic replay is valuable. The compact recording format makes it excellent for collaborative research: capture once, analyze many times, share recordings instead of trying to reproduce environmental conditions. The Python interface and plugin ecosystem provide quick wins for common analyses like system call tracing or taint tracking.

Skip PANDA if you need cutting-edge QEMU features or hardware support (the lag behind upstream will frustrate you), if your analysis targets only userspace on a single architecture (rr or Intel Pin offer lower overhead), or if you're building production monitoring systems (the emulation overhead makes this impractical for live systems).

PANDA excels at deep, iterative forensic analysis where reproducibility trumps raw performance.
