
Building a Cosmic Ray Detector from malloc(): When Your RAM Becomes a Physics Instrument


Hook

Right now, as you read this, cosmic rays from supernovae thousands of light-years away are silently flipping bits in computer memory. With fewer than 100 lines of C code, you can try to watch it happen in real time.

Context

The gap between software abstraction and physical reality is rarely more dramatic than in memory management. We write int x = 0 and trust that x will remain zero until we explicitly change it. But that trust is not absolute. High-energy particles from space, known as cosmic rays, constantly bombard Earth, and the particle showers they produce can deposit enough charge in silicon to flip a bit from 0 to 1 or vice versa. This isn't just theoretical physics; bit flips have been put forward as an explanation for real-world anomalies, from an unexplained vote count in a 2003 Belgian election to questions raised in Toyota's unintended acceleration investigations.

For decades, this problem was relegated to mission-critical systems: spacecraft computers, financial servers, and scientific instruments. These systems use Error-Correcting Code (ECC) RAM that can detect and fix single-bit errors automatically. But consumer hardware—laptops, desktops, gaming rigs—typically lacks this protection. Smerity's bitflipped project makes an elegant observation: if cosmic rays can flip bits in your RAM, and you can allocate a large chunk of memory, then your computer is literally a cosmic ray detector. The project transforms a reliability concern into a physics experiment you can run on any machine.

Technical Insight

[Figure: system architecture flowchart, auto-generated. Program start -> allocate 1GB buffer with calloc (zeroed memory) -> if allocation fails, exit with error; otherwise start the monitoring loop: sleep 60 seconds -> scan all bytes -> for each byte != 0, report the bit flip (offset and value) and reset the byte to 0 -> when no bytes remain, sleep again.]

The implementation is deceptively simple, which is precisely what makes it brilliant. At its core, bitflipped allocates a large buffer of zeroed memory and periodically checks if any bits have spontaneously changed. Here's the essential logic:

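#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFFER_SIZE ((size_t)1024 * 1024 * 1024)  // 1GB
#define CHECK_INTERVAL 60  // seconds

int main(void) {
    // volatile forces real loads from memory; without it the compiler may
    // assume calloc'd memory is still zero and remove the scan entirely.
    volatile unsigned char *buffer = calloc(BUFFER_SIZE, 1);
    if (!buffer) {
        fprintf(stderr, "Failed to allocate memory\n");
        return 1;
    }

    printf("Allocated %zu MB. Monitoring for bit flips...\n",
           BUFFER_SIZE / (1024 * 1024));
    fflush(stdout);  // Make output visible even when redirected to a file

    while (1) {
        sleep(CHECK_INTERVAL);

        for (size_t i = 0; i < BUFFER_SIZE; i++) {
            unsigned char value = buffer[i];  // Read each byte exactly once
            if (value != 0) {
                printf("Bit flip detected at offset %zu: 0x%02x\n", i, value);
                fflush(stdout);
                buffer[i] = 0;  // Reset for continued monitoring
            }
        }
    }

    return 0;  // Not reached: the loop runs until the process is killed
}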

The brilliance lies in what this code doesn't do. There's no complex instrumentation, no kernel modules, no hardware probing. It relies on a single assumption: memory initialized to zero should stay zero unless something external acts on it. The calloc() call is crucial here—it guarantees zero-initialized memory, unlike malloc() which returns uninitialized data. This gives us a known baseline state.

The detection mechanism is a straightforward linear scan. Every 60 seconds, the program walks through all 1,073,741,824 bytes (roughly a billion) checking for non-zero values. When it finds one, it reports the offset and the corrupted byte value. The changed value itself can be informative: because the baseline is 0x00, a value with exactly one bit set (0x01, 0x02, 0x04, and so on) is a single-bit error, which suggests a different failure mode than a value with several bits set.
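To make the single-bit versus multi-bit distinction concrete, here is a minimal sketch that counts the set bits in a reported byte. The helper names count_flipped_bits and classify_flip are hypothetical and are not taken from the bitflipped source; the sketch assumes the baseline is all zeros, so every set bit is a flipped bit.

```c
#include <assert.h>

// Count the bits set in a byte. With a 0x00 baseline, every set bit is a
// bit that changed since the buffer was zeroed.
int count_flipped_bits(unsigned char value) {
    int count = 0;
    while (value != 0) {
        value &= (unsigned char)(value - 1);  // Clear the lowest set bit
        count++;
    }
    return count;
}

// Hypothetical helper: turn a reported byte into a label for the log line.
const char *classify_flip(unsigned char value) {
    int bits = count_flipped_bits(value);
    assert(bits >= 0 && bits <= 8);
    if (bits == 0) return "clean";
    if (bits == 1) return "single-bit";
    return "multi-bit";
}
```

A report line could then include classify_flip(value) next to the offset, so single-bit and multi-bit events are easy to separate when reading the log later.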

The choice of buffer size involves interesting trade-offs. Larger buffers increase the probability of catching a cosmic ray event: more target area means more chances for a particle strike. A 1GB buffer presents about 8.6 billion bits as targets. However, some modern operating systems employ memory compression to optimize RAM usage. When you allocate a gigabyte of zeros, the OS may compress those pages down to nearly nothing, effectively neutering your detector. The memory still appears allocated from the program's perspective, but it is not physically resident in the RAM chips where a particle could strike it.
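The target-area arithmetic is simple enough to check directly. This small sketch converts a buffer size in bytes to a bit count; buffer_bits is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

// Number of individual bits exposed by a buffer of the given size in bytes.
unsigned long long buffer_bits(size_t bytes) {
    assert(bytes <= ULLONG_MAX / 8);  // Guard against overflow in the multiply
    return (unsigned long long)bytes * 8ULL;
}
```

For the 1GB buffer used here (1,073,741,824 bytes), this returns 8,589,934,592 bits.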

This creates a cat-and-mouse game with the OS. Some implementations touch each page periodically to prevent compression or swapping:

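// Touch one byte per page to encourage the pages to stay resident
volatile unsigned char *p = buffer;  // volatile keeps the compiler from deleting the no-op
for (size_t i = 0; i < BUFFER_SIZE; i += 4096) {
    p[i] = p[i];  // Read and write-back
}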

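This pattern walks through memory at page boundaries (typically 4KB) and performs a read followed by a write-back of the same value, which marks each page as recently used and dirty. The accesses must go through a volatile pointer, otherwise the compiler is free to remove the self-assignment. This is not true memory pinning: the OS can still compress or swap the pages between touches. POSIX does provide mlock() for real pinning, but it is subject to resource limits and may need elevated privileges for a buffer this large, so touching pages is the unprivileged approximation.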
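The hard-coded 4096 is only the typical page size. As a hedged sketch, the same touch loop can ask the system for the real page size through the POSIX call sysconf(_SC_PAGESIZE). The function name touch_pages is an assumption for illustration, not part of bitflipped.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

// Touch one byte in every page of buf and return how many pages were touched.
// The volatile qualifier forces a real load and store for each access.
size_t touch_pages(volatile unsigned char *buf, size_t len) {
    long page = sysconf(_SC_PAGESIZE);  // Page size reported by the OS
    assert(page > 0);
    size_t step = (size_t)page;
    size_t touched = 0;
    for (size_t i = 0; i < len; i += step) {
        buf[i] = buf[i];  // Read, then write back the same value
        touched++;
    }
    return touched;
}
```

Calling touch_pages(buffer, BUFFER_SIZE) before each scan would replace the fixed-stride loop. Because it writes back whatever it reads, a flipped bit in a touched byte is preserved for the scanner to find.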

The timing interval also matters. Cosmic ray bit flips are rare events. Some estimates suggest one flip per 8GB per hour at sea level for consumer RAM, which would work out to about one event every eight hours in a 1GB buffer; estimates vary widely, though, and the rate you actually observe can be far lower. Checking too frequently wastes CPU cycles scanning billions of bytes. Checking too infrequently increases the chance that multiple bits flip between scans, making it harder to tell when each event occurred. Sixty seconds represents a practical middle ground: frequent enough to catch events with reasonable latency, infrequent enough to keep CPU overhead minimal.
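To see what a quoted rate implies for a given buffer and scan interval, the arithmetic can be sketched as below. The function names are hypothetical, and the rate passed in is an input assumption, not a measured value.

```c
#include <assert.h>

// Expected number of flips landing in the buffer during one scan interval,
// for a rate given in flips per gigabyte per hour.
double expected_flips_per_scan(double flips_per_gb_hour, double buffer_gb,
                               double interval_seconds) {
    assert(flips_per_gb_hour >= 0.0 && buffer_gb >= 0.0 && interval_seconds >= 0.0);
    return flips_per_gb_hour * buffer_gb * (interval_seconds / 3600.0);
}

// Mean waiting time, in hours, between events anywhere in the buffer.
double mean_hours_between_flips(double flips_per_gb_hour, double buffer_gb) {
    assert(flips_per_gb_hour > 0.0 && buffer_gb > 0.0);
    return 1.0 / (flips_per_gb_hour * buffer_gb);
}
```

Plugging in the figure of one flip per 8GB per hour (0.125 flips per GB per hour) with a 1GB buffer gives a mean wait of 8 hours and roughly 0.002 expected flips per 60-second scan, so two flips landing in the same interval would be very unlikely at that rate.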

What's particularly clever about this project is how it reframes computing. We usually think of software as pure logic, isolated from physical concerns by layers of abstraction. Bitflipped pierces that veil, making the physical substrate visible. Your computer isn't just executing instructions; it's a piece of matter sitting in a universe where high-energy particles constantly pass through it. Most of the time, nothing happens. Occasionally, one strikes silicon in just the right place with just the right energy, depositing enough charge to flip a stored bit. Your computer becomes a witness to events that began in stellar explosions long ago.

Gotcha

The most significant limitation is that bitflipped cannot determine what caused a bit flip. Cosmic rays are romantic and make for great headlines, but they're not the only culprit. Alpha particles from trace radioactive isotopes in RAM packaging materials cause bit flips. Manufacturing defects create weak memory cells prone to errors. Electrical noise, voltage fluctuations, and temperature extremes all contribute. A detected flip could be cosmic in origin, or it could be a failing DIMM that needs replacement. The tool provides no diagnostic information to distinguish between these causes.
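One rough heuristic, offered here as an assumption rather than anything bitflipped does, is to remember which offsets have flipped before. Repeated flips at the same offset within a single run hint at a weak cell rather than independent particle strikes. This only holds if the virtual-to-physical mapping stays put during the run, which the OS does not guarantee. The struct flip_log type, the record_flip function, and the MAX_TRACKED capacity are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64  // Arbitrary illustrative capacity

struct flip_log {
    size_t offsets[MAX_TRACKED];
    unsigned counts[MAX_TRACKED];
    size_t used;
};

// Record a flip at the given offset. Returns how many times that offset has
// now been seen, or 0 if the log is full and the offset is new.
unsigned record_flip(struct flip_log *log, size_t offset) {
    assert(log != NULL && log->used <= MAX_TRACKED);
    for (size_t i = 0; i < log->used; i++) {
        if (log->offsets[i] == offset) {
            return ++log->counts[i];
        }
    }
    if (log->used == MAX_TRACKED) {
        return 0;
    }
    log->offsets[log->used] = offset;
    log->counts[log->used] = 1;
    log->used++;
    return 1;
}
```

A return value of 2 or more is a reason to suspect the hardware and reach for a proper memory tester; it is still not proof of any particular cause.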

Modern system behaviors actively work against the detector's effectiveness. Memory compression, as mentioned, can collapse your 1GB detection buffer into a few kilobytes of physical RAM. Swap systems may page out unused memory to disk, where a strike on RAM has no effect on it. Virtual memory means your "allocated" memory might not correspond to actual RAM at all. On systems with ECC RAM, common in servers and workstations, single-bit errors are corrected in hardware before the program ever reads the affected byte. The platform may log corrected errors elsewhere, but the buffer itself always reads back as zero. You could have dozens of cosmic ray strikes that ECC handles transparently, and bitflipped would never know. The detector cannot see any event that these protections absorb first.

There's also the waiting problem. Bit flips are genuinely rare on consumer hardware. You might run bitflipped for weeks or months without seeing a single event. This doesn't mean it's not working—it means you're observing a low-probability phenomenon. The lack of events makes it hard to validate that the detector is functioning correctly versus being defeated by OS memory management tricks.
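One partial answer to the validation problem is a self-test that plants a fake flip and confirms the scanner reports it. This is a sketch under stated assumptions: scan_and_reset and self_test are hypothetical names, and the buffer is assumed to be clean before the test. It only validates the scanning logic; it says nothing about whether the pages are physically resident in RAM.

```c
#include <assert.h>
#include <stddef.h>

// Scan buf, reset every nonzero byte to zero, and return how many were found.
size_t scan_and_reset(volatile unsigned char *buf, size_t len) {
    size_t found = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            buf[i] = 0;
            found++;
        }
    }
    return found;
}

// Plant a single-bit "flip" at offset, then check that the scanner finds
// exactly one corrupted byte and restores the zero baseline.
// Returns 1 on success, 0 on failure.
int self_test(volatile unsigned char *buf, size_t len, size_t offset) {
    assert(offset < len);
    buf[offset] = 0x01;  // Simulated flip of the lowest bit
    size_t found = scan_and_reset(buf, len);
    return found == 1 && buf[offset] == 0;
}
```

Running self_test once at startup, before the first sleep, gives some confidence that weeks of silence are not caused by a scan loop that never sees anything.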

Verdict

Use if: you're teaching low-level systems concepts and want a compelling demonstration of hardware-software interaction; you're genuinely curious about observing rare hardware events and have the patience for long-running experiments; you want to start conversations about reliability engineering and why ECC RAM matters in production systems; you're building intuition about what "bare metal" programming really means.

Skip if: you need actual memory diagnostics (memtest86+ will find failing RAM in minutes, not months); you're running on ECC-equipped systems where the interesting events are hidden from user space; you want guaranteed results rather than probabilistic detection; you need to distinguish between cosmic rays and ordinary hardware failures.

This is fundamentally a teaching tool and thought experiment that happens to produce real data. Treat it as a window into the physical reality of computing rather than a production monitoring solution.
