go-audit: Why Slack Rewrote Linux's Audit Daemon in Go

Hook

Linux's auditd has been the security audit gold standard for two decades, yet Slack threw it out entirely. Their replacement is built to keep draining kernel events under heavy load without blocking. Here's the engineering that made it possible.

Context

Most regulated Linux systems run auditd, the venerable daemon that's been capturing kernel security events since 2005. It works, and nobody disputes that, but it wasn't designed for the cloud era. Auditd outputs cryptic key-value logs that require arcane ausearch queries to parse. Its plugin architecture (audisp) adds processing latency. Most critically, it can block on I/O, creating backpressure that kernel developers warn against. For a modern infrastructure company shipping logs to centralized systems, these aren't minor annoyances; they're architectural mismatches.

Slack's engineering team faced this reality managing thousands of hosts. They needed audit events in JSON for their log pipeline, guaranteed non-blocking behavior under load, and the ability to route events to multiple destinations without plugin complexity. Rather than wrapping auditd in translation layers, they built go-audit: a daemon that speaks directly to the kernel's audit subsystem via netlink sockets, processes events asynchronously in Go, and emits JSON natively. It's not trying to replicate every auditd feature—it's rethinking what a modern audit daemon should be.

Technical Insight

The core architectural decision in go-audit is bypassing the usual C library layer entirely. While auditd uses libaudit to communicate with the kernel, go-audit implements the netlink protocol directly using Go's syscall package. This means creating a NETLINK_AUDIT socket and speaking the kernel's netlink protocol without intermediaries.

Here's how go-audit establishes that critical kernel connection:

// Create a netlink socket for audit communication
fd, err := syscall.Socket(
    syscall.AF_NETLINK,
    syscall.SOCK_RAW,
    syscall.NETLINK_AUDIT,
)
if err != nil {
    return nil, err
}

// Bind as a unicast socket: Groups is 0, so no multicast group is joined
addr := &syscall.SockaddrNetlink{
    Family: syscall.AF_NETLINK,
    Groups: 0,
    Pid:    0, // Let kernel assign PID
}
if err := syscall.Bind(fd, addr); err != nil {
    return nil, err
}

This low-level approach grants precise control over socket buffer sizes—critical when the kernel is generating events faster than userspace can consume them. The socket_buffer.receive configuration allows tuning the kernel buffer to prevent the dreaded "no buffer space available" errors that indicate event loss.

Once events arrive, go-audit's processing pipeline is designed around non-blocking principles. Events are read from the netlink socket and immediately pushed into buffered Go channels. Worker goroutines consume from these channels, parse the binary audit format into structured data, and convert to JSON. This decoupling is essential: even if downstream outputs (like network-based syslog) stall momentarily, the kernel socket continues draining without backpressure.

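The event parsing itself reveals another architectural choice: structured data over free-form text. Each netlink message carries a binary header (length, type, sequence number) followed by a payload of key=value text. go-audit decodes the header itself rather than delegating to libaudit, and records that share a sequence number end up in one JSON event. The types below are a simplified illustration rather than go-audit's actual definitions; they show how a syscall record's fields can be lifted into a typed struct: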

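import (
    "fmt"
    "time"
)

const auditSyscall = 1300 // AUDIT_SYSCALL record type from the kernel headers

type AuditMessage struct {
    Type      uint16
    Seq       uint32
    Timestamp time.Time
    Data      map[string]string
}

// SyscallEvent holds the fields of a syscall record as typed values
type SyscallEvent struct {
    Syscall string
    Arch    string
    Success bool
    Exit    string
}

// ParseSyscall converts a syscall record into a structured event
func (m *AuditMessage) ParseSyscall() (*SyscallEvent, error) {
    if m.Type != auditSyscall {
        return nil, fmt.Errorf("not a syscall record: type %d", m.Type)
    }
    return &SyscallEvent{
        Syscall: m.Data["syscall"],
        Arch:    m.Data["arch"],
        Success: m.Data["success"] == "yes",
        Exit:    m.Data["exit"],
    }, nil
}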
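
A Data map like the one above has to come from somewhere. As a rough sketch, assuming the payload is a run of space-separated key=value tokens, a helper such as the hypothetical parseFields below could populate it. It deliberately ignores quoted values that contain spaces, which a production parser would need to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFields (hypothetical helper) splits an audit-style payload of
// space-separated key=value tokens into a map. Surrounding double quotes
// are stripped from values, and tokens without "=" are skipped.
// Limitation: quoted values containing spaces are not kept together.
func parseFields(payload string) map[string]string {
	fields := make(map[string]string)
	for _, token := range strings.Fields(payload) {
		key, value, ok := strings.Cut(token, "=")
		if !ok || key == "" {
			continue
		}
		fields[key] = strings.Trim(value, `"`)
	}
	return fields
}

func main() {
	f := parseFields(`arch=c000003e syscall=2 success=yes exit=3 comm="cat"`)
	fmt.Println(f["syscall"], f["success"], f["comm"]) // prints: 2 yes cat
}
```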

The output pipeline is where go-audit's flexibility shines. Unlike auditd's audisp plugin system (which requires separate processes and IPC), go-audit uses a simple interface that any Go type can implement:

type Output interface {
    Write(event []byte) error
}

Want to send audit events to Graylog in GELF format? Implement the interface. Need custom filtering before writing to syslog? Wrap another output implementation. The configuration file can chain multiple outputs, and each runs in its own goroutine with independent error handling. If syslog is unreachable, file output continues unaffected.
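
The wrapping idea can be sketched in a few lines. The Output interface below repeats the one shown above; memoryOutput and filterOutput are hypothetical types invented for this example and do not come from go-audit. The filter is a plain substring match, chosen only to keep the sketch short.

```go
package main

import (
	"bytes"
	"fmt"
)

// Output repeats the single-method interface from the article.
type Output interface {
	Write(event []byte) error
}

// memoryOutput (hypothetical) collects events in memory, standing in for
// a real destination such as a file or syslog.
type memoryOutput struct {
	events [][]byte
}

func (m *memoryOutput) Write(event []byte) error {
	// Copy the slice so later reuse of the caller's buffer cannot corrupt it.
	m.events = append(m.events, append([]byte(nil), event...))
	return nil
}

// filterOutput (hypothetical) wraps another Output and forwards only the
// events that contain needle.
type filterOutput struct {
	next   Output
	needle []byte
}

func (f *filterOutput) Write(event []byte) error {
	if !bytes.Contains(event, f.needle) {
		return nil // filtered out: not an error, just not forwarded
	}
	return f.next.Write(event)
}

func main() {
	sink := &memoryOutput{}
	var out Output = &filterOutput{next: sink, needle: []byte(`"syscall":"59"`)}

	_ = out.Write([]byte(`{"syscall":"59","exe":"/bin/sh"}`))
	_ = out.Write([]byte(`{"syscall":"2","exe":"/bin/cat"}`))

	fmt.Println("forwarded events:", len(sink.events)) // prints: forwarded events: 1
}
```

Because the wrapper satisfies the same interface as the thing it wraps, filters can be stacked in any order without the destination knowing they exist.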

This design eliminates a class of problems that plague auditd deployments: plugin crashes don't bring down the daemon, slow outputs don't block fast ones, and adding new destinations doesn't require recompiling C code or managing separate process lifecycles. It's the kind of architecture that only makes sense in a language with goroutines and channels as first-class features.

Gotcha

go-audit's biggest limitation is one it inherited from the kernel itself: filename resolution is unreliable. When the kernel logs a file access, it typically records the inode number, not the path. Auditd has the same problem, but years of tooling have evolved to work around it. The go-audit README is refreshingly honest about this: "The kernel only reports inodes, and we can't always map those back to filenames." If you need complete file path audit trails for compliance, you'll need to implement your own inode-to-path mapping using debugfs or accept gaps in your logs.

Event loss under extreme load is another sharp edge. While go-audit is architected to minimize blocking, the kernel-to-userspace socket buffer can still overflow if the system generates audit events faster than go-audit processes them. The solution is tuning socket_buffer.receive to massive values (the kernel maximum is typically 33554432 bytes), but this consumes significant memory per audit daemon instance. On systems with thousands of processes generating syscall audits, you may need to choose between memory consumption and guaranteed event capture.

Additionally, if systemd-journald is configured to read audit events (common on modern distros), you'll see duplicate log entries unless you explicitly mask journald's audit socket unit. This isn't a bug: it's two daemons both reading from the same kernel subsystem. Still, it's a deployment complexity that auditd doesn't have since journald knows to defer to it.

Verdict

Use if: You're shipping audit logs to centralized logging infrastructure and need JSON output without transformation layers, you're operating at scale where auditd's blocking behavior or plugin architecture creates operational burden, or you're already invested in Go tooling and want audit infrastructure you can modify and extend in-house. go-audit excels in cloud-native environments where log aggregation is mandatory and you can tolerate occasional filename resolution gaps.

Skip if: You're in a highly regulated environment where audit completeness is legally mandated and any event loss is unacceptable, you depend on specific auditd audisp plugins that would be difficult to reimplement, or you need guaranteed filename resolution for compliance reporting. Traditional auditd remains the conservative choice for financial services, healthcare, or government systems where audit requirements are stringent and the risk of gaps outweighs operational convenience.
