Harpoon: Building a Runtime Function Hooking Library for macOS from First Principles

Hook

In Harpoon's own benchmarks, installing a hook on x86_64 macOS takes ten times longer than on i386. The cause isn't CPU speed; it's how relative jumps work when the destination lies outside the ±2GB that a 32-bit displacement can reach.

Context

Runtime function hooking—the ability to intercept and redirect function calls in a running process—is fundamental to reverse engineering, debugging, and dynamic analysis on macOS. While high-level frameworks like Frida offer comprehensive instrumentation capabilities, they come with significant overhead and complexity. On the other end of the spectrum, Apple's own dyld symbol rebinding only works for dynamically linked functions, leaving direct function calls untouched.

Harpoon occupies the middle ground: a minimal C library that does one thing well—patching function prologues at runtime to redirect execution. Created as an educational project by jndok, it demonstrates the low-level mechanics of function interception without the abstractions that often obscure how hooking actually works. Unlike symbol rebinding approaches that modify linkage tables, Harpoon directly modifies executable code in memory, making it capable of hooking any function regardless of how it's called. This direct manipulation approach reveals both the power and the peril of runtime code modification.

Technical Insight

Harpoon's architecture elegantly handles the fundamental challenge of x86_64 hooking: how do you redirect execution when your replacement function might be anywhere in a 64-bit address space, but x86_64 relative jumps can only reach ±2GB? The answer is a two-stage trampoline system using "jump zones."

On i386, hooking is straightforward. A standard JMP instruction with a 32-bit displacement can reach any address in the 4GB space, so Harpoon simply overwrites the first 5 bytes of the target function with a direct jump to the replacement:

// i386 hooking - simple direct jump
void throw_hook_i386(void *target, void *replacement, void **original) {
    uint8_t jmp[5];
    jmp[0] = 0xE9;  // JMP opcode
    int32_t offset = (int32_t)((uintptr_t)replacement - ((uintptr_t)target + 5));
    memcpy(&jmp[1], &offset, 4);
    
    // Save original bytes for later restoration
    *original = allocate_trampoline(target);
    
    // Patch the function
    vm_protect(...);  // Make memory writable
    memcpy(target, jmp, 5);
    vm_protect(...);  // Restore protection
}
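
The displacement arithmetic in that patch can be checked in isolation. The following is a minimal sketch rather than Harpoon's code: `encode_jmp_rel32` and `decode_jmp_rel32` are hypothetical helper names, and addresses are passed as plain 32-bit integers so that nothing is actually patched.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5

/* Encode `E9 rel32` for a jump located at address `from` that lands on `to`.
 * The displacement is measured from the end of the 5-byte instruction and
 * wraps modulo 2^32, which is why any i386 address is reachable. */
static void encode_jmp_rel32(uint8_t out[JMP_REL32_SIZE], uint32_t from, uint32_t to)
{
    uint32_t disp = to - (from + JMP_REL32_SIZE);
    out[0] = 0xE9;
    /* x86 stores the displacement in little-endian order. */
    out[1] = (uint8_t)(disp);
    out[2] = (uint8_t)(disp >> 8);
    out[3] = (uint8_t)(disp >> 16);
    out[4] = (uint8_t)(disp >> 24);
}

/* Recover the landing address from an encoded jump located at `from`. */
static uint32_t decode_jmp_rel32(const uint8_t in[JMP_REL32_SIZE], uint32_t from)
{
    assert(in[0] == 0xE9);
    uint32_t disp = (uint32_t)in[1]
                  | ((uint32_t)in[2] << 8)
                  | ((uint32_t)in[3] << 16)
                  | ((uint32_t)in[4] << 24);
    return from + JMP_REL32_SIZE + disp;
}
```

Encoding a jump from 0x1000 to 0x2000 yields the bytes E9 FB 0F 00 00, since 0x2000 - 0x1005 = 0xFFB. A jump from 0xF0000000 to 0x1000 also round-trips, because the 32-bit arithmetic wraps around.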

On x86_64, this breaks down. A relative JMP can only reach ±2GB, but your replacement function is likely billions of bytes away in address space. Harpoon's solution uses a "jump zone"—a small region of executable memory allocated near the target function (within 2GB). The hook patches the target function with a short relative jump to this nearby zone, which then contains a full 64-bit absolute jump to the actual replacement:

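// x64 hooking - two-stage jump via jump zone (simplified sketch)
void throw_hook(void *target, void *replacement, void **original) {
    // Allocate jump zone within ±2GB of target
    void *jump_zone = allocate_jump_zone_near(target);
    if (jump_zone == NULL) return;  // no reachable zone: the hook fails

    // Stage 1: Write absolute jump in jump zone
    uint8_t abs_jmp[14] = { 0xFF, 0x25, 0, 0, 0, 0 };  // JMP [RIP+0]
    uint64_t dest = (uint64_t)(uintptr_t)replacement;
    memcpy(&abs_jmp[6], &dest, 8);  // 8-byte address the JMP reads
    memcpy(jump_zone, abs_jmp, 14);

    // Stage 2: Patch target with relative jump to zone
    uint8_t rel_jmp[5];
    rel_jmp[0] = 0xE9;
    int32_t offset = (int32_t)((uintptr_t)jump_zone - ((uintptr_t)target + 5));
    memcpy(&rel_jmp[1], &offset, 4);

    // Preserve the original prologue before overwriting it (as in the i386 version)
    *original = allocate_trampoline(target);

    // vm_protect takes the task first and does not report the previous
    // protection, so read+execute is restored explicitly (assumed for code pages)
    vm_protect(mach_task_self(), (vm_address_t)target, 5, FALSE,
               VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY);
    memcpy(target, rel_jmp, 5);
    vm_protect(mach_task_self(), (vm_address_t)target, 5, FALSE,
               VM_PROT_READ | VM_PROT_EXECUTE);
}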
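
Whether the second stage is needed at all comes down to one range check on the displacement. The sketch below is an illustration under stated assumptions, not Harpoon's implementation: `rel32_reachable` and `encode_jmp_abs64` are hypothetical names, and addresses are handled as integers so the code runs anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_SIZE 5
#define JMP_ABS64_SIZE 14

/* True when a 5-byte `E9 rel32` placed at `from` can land on `to`. */
static bool rel32_reachable(uint64_t from, uint64_t to)
{
    uint64_t next = from + JMP_REL32_SIZE;  /* rel32 is measured from here */
    if (to >= next)
        return to - next <= (uint64_t)INT32_MAX;      /* forward: up to 2^31 - 1 */
    return next - to <= (uint64_t)INT32_MAX + 1;      /* backward: up to 2^31 */
}

/* Encode `FF 25 00 00 00 00` (JMP [RIP+0]) followed by the 8-byte
 * destination that the instruction loads from the bytes right after it. */
static void encode_jmp_abs64(uint8_t out[JMP_ABS64_SIZE], uint64_t dest)
{
    assert(out != NULL);
    out[0] = 0xFF;
    out[1] = 0x25;
    memset(&out[2], 0, 4);                      /* RIP-relative offset of 0 */
    for (int i = 0; i < 8; i++)
        out[6 + i] = (uint8_t)(dest >> (8 * i)); /* little-endian address */
}
```

A hooking routine could call `rel32_reachable(target, replacement)` first and patch a direct 5-byte jump when it returns true, falling back to the jump zone only when it returns false. Whether Harpoon takes that shortcut is not stated here.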

This jump zone allocation is why x64 hooking is 10x slower than i386 in Harpoon's benchmarks (0.050s vs 0.005s). The library must search for available memory regions near the target, a process that involves querying the virtual memory map and finding suitable gaps—expensive operations that i386 avoids entirely.
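
To make the search concrete, here is a rough sketch of a first-fit scan for a free page near the target. It runs over a simulated, sorted region list rather than the real virtual memory map, which on macOS would be walked with a Mach call such as `mach_vm_region`. The names `region_t`, `find_jump_zone`, `ZONE_SIZE` and `REL32_SPAN` are all hypothetical, and Harpoon's actual search strategy may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_SIZE  0x1000ULL      /* one 4 KiB page; an assumption */
#define REL32_SPAN 0x7FFFF000ULL  /* conservative reach, just under 2 GiB */

/* An occupied range [start, end). The map is assumed sorted by start,
 * non-overlapping, and page-aligned. */
typedef struct { uint64_t start, end; } region_t;

/* Return the lowest free page-sized gap within REL32_SPAN of `target`,
 * or 0 when every candidate in that window is occupied. */
static uint64_t find_jump_zone(const region_t *map, size_t n, uint64_t target)
{
    assert(map != NULL || n == 0);
    assert(target < (1ULL << 47));  /* user-space address; avoids overflow */

    uint64_t lo = target > REL32_SPAN ? target - REL32_SPAN : ZONE_SIZE;
    uint64_t hi = target + REL32_SPAN;
    lo = (lo + ZONE_SIZE - 1) & ~(ZONE_SIZE - 1);  /* round up to a page */

    uint64_t cursor = lo;  /* lowest address still considered free */
    for (size_t i = 0; i < n; i++) {
        if (map[i].end <= cursor)
            continue;                    /* region lies entirely below */
        if (map[i].start >= cursor + ZONE_SIZE)
            break;                       /* the gap before this region fits */
        cursor = map[i].end;             /* occupied: skip past the region */
        if (cursor > hi)
            return 0;
    }
    return (cursor + ZONE_SIZE <= hi) ? cursor : 0;
}
```

Each step of the loop stands in for one query against the memory map, which is the per-hook cost that the i386 path never pays. A production search would more likely start at the target and work outward so that the nearest gap wins.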

The preservation of original function behavior is handled through a trampoline technique. Before patching, Harpoon copies the first several bytes of the target function (enough to cover the 5-byte patch plus any partially overwritten instructions) to a new executable page, followed by a jump back to the rest of the original function. The original pointer is set to this trampoline, allowing replacement functions to invoke the original logic:

// Filled in by the hook call; points at the trampoline
static void (*original_function)(int);

void my_replacement_function(int arg) {
    printf("Hook called with: %d\n", arg);

    // Call original via preserved trampoline
    original_function(arg);

    printf("Hook returning\n");
}

This approach is cleaner than unhook-call-rehook cycles and avoids their race window, in which another thread could execute the temporarily unhooked function. The trampoline stays valid for the lifetime of the hook, which suits the multithreaded scenarios common in macOS applications. Installing the patch itself is a separate matter: Harpoon makes no thread safety guarantees there.
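
The trampoline layout described above can be sketched as plain byte manipulation. This is an illustration only: `build_trampoline` is a hypothetical name, and the absolute jump used for the return path is an assumption, since the encoding Harpoon uses for the jump back is not specified here. The caller must pick a `stolen` count that ends on an instruction boundary and covers only position-independent instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_SIZE 5
#define JMP_ABS64_SIZE 14
#define MAX_STOLEN     16

/* Fill `tramp` with the first `stolen` bytes of the target, followed by
 * `JMP [RIP+0]` and the address where the original function resumes.
 * `code` holds a copy of the target's bytes; `code_addr` is the address
 * those bytes live at. Returns the number of bytes written. */
static size_t build_trampoline(uint8_t *tramp, const uint8_t *code,
                               uint64_t code_addr, size_t stolen)
{
    assert(tramp != NULL && code != NULL);
    assert(stolen >= JMP_REL32_SIZE && stolen <= MAX_STOLEN);

    memcpy(tramp, code, stolen);               /* relocated prologue bytes */

    uint8_t *jmp = tramp + stolen;
    uint64_t resume = code_addr + stolen;      /* first untouched instruction */
    jmp[0] = 0xFF;
    jmp[1] = 0x25;
    memset(&jmp[2], 0, 4);                     /* RIP-relative offset of 0 */
    for (int i = 0; i < 8; i++)
        jmp[6 + i] = (uint8_t)(resume >> (8 * i));

    return stolen + JMP_ABS64_SIZE;
}
```

For a typical prologue of `push rbp; mov rbp, rsp; sub rsp, 0x10` (bytes 55 48 89 E5 48 83 EC 10), the first instruction boundary at or beyond 5 bytes is at 8, so `stolen` would be 8 and the trampoline would occupy 22 bytes. In a real hook the buffer would have to be executable memory.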

Gotcha

Harpoon's minimalism cuts both ways. The library performs minimal validation—it doesn't verify that the target address actually contains executable code, check for sufficient instruction alignment, or handle position-independent code edge cases. If you point it at the middle of a function or at data masquerading as code, you'll corrupt memory and crash. There's no sophisticated instruction disassembly to ensure the 5-byte patch doesn't split multi-byte instructions, which can happen with prefixed or SSE instructions at function entry.

The x64 jump zone allocation can fail in address spaces with unusual memory layouts, particularly in processes with extensive dynamic library loading or custom memory management. When vm_allocate can't find a suitable region within ±2GB of the target, Harpoon has no fallback strategy—the hook simply fails. More robust libraries handle this with instruction relocation or longer instruction sequences, but these approaches require a full disassembler. The author's explicit warning that this is a "just for fun" project isn't false modesty; it's a genuine caveat about production reliability. You'll also find no thread safety guarantees, no hook chaining if multiple libraries try to hook the same function, and no mechanism to handle functions shorter than 5 bytes (rare but possible in optimized code).

Verdict

Use if: You're learning reverse engineering on macOS and want to understand how function hooking works at the instruction level without framework abstractions obscuring the details, you need a lightweight solution for personal research projects or CTF challenges where simplicity trumps robustness, or you're prototyping hooking strategies before implementing production-grade solutions.

Skip if: You need production-ready instrumentation with error handling and edge case coverage (use Frida instead), you're working on security-critical applications where hook stability and reliability matter, you need to hook extensively in complex applications with unpredictable memory layouts, or you want active maintenance and community support.

Harpoon is a teaching tool first and a utility second: embrace it for learning, but don't bet your production code on it.
