witr: Tracing the Ghost Processes in Your Stack


Hook

You run ps aux on your production server and see a mysterious process consuming CPU. It’s been running for three days. Your team has no idea what started it. This isn’t a horror story—it’s Tuesday.

Context

Traditional process monitoring tools answer the question ‘what is running?’ with brutal efficiency. Fire up ps, htop, or top, and you’ll see every process, its resource consumption, and its immediate parent. But they all share a critical blind spot: they can’t tell you why something is running. Was it started by systemd at boot? Launched from a cron job at 3 AM? Spawned by a Docker container that’s since been renamed? Triggered by a long-forgotten SSH session?

This causality gap becomes painful during incident response, security audits, and capacity planning. You end up manually correlating systemctl status, docker ps, crontab -l, and half a dozen other tools, piecing together a narrative from fragmented data sources. Each system—systemd, cron, Docker, interactive shells—maintains its own process genealogy in different formats, accessed through different commands. witr exists to collapse this multi-tool archaeology into a single query: it builds complete causality chains from any running process back to its ultimate origin, detecting launch mechanisms across systemd services, container runtimes, schedulers, and interactive sessions. It’s the tool you reach for when ‘who is your parent?’ isn’t enough—you need to know ‘who is your great-great-grandparent and how did they start this whole family tree?’

Technical Insight

System architecture (auto-generated diagram, summarized): a target process PID enters the process tree walker, which queries an OS-specific API layer: the /proc filesystem (stat + cmdline) on Linux, sysctl kinfo_proc on BSD, libproc calls on macOS, and WMI queries on Windows. A data normalizer turns each response into a process node (PID, PPID, cmd, metadata); pattern detectors (detectSystemd via cgroup analysis, detectDocker via containerd-shim, detectCron via parent + timing) enrich the nodes before the tree builder emits the hierarchical output.

witr’s architecture tackles a deceptively complex problem: different operating systems expose process information through wildly incompatible APIs. Linux offers /proc, BSD variants provide sysctl with kinfo_proc structures, macOS wraps libproc, and Windows requires WMI queries. Rather than maintaining platform-specific codebases, witr builds a unified abstraction layer that normalizes these disparate data sources into a common process model.

The core data structure is elegant: each process becomes a node with fields for PID, parent PID, command line, start time, and detected metadata about its launch context. The tool walks backward from a target process, querying the OS-specific API at each hop to retrieve parent information. On Linux, this means parsing /proc/[pid]/stat for parent PIDs and /proc/[pid]/cmdline for arguments. The real insight comes in the pattern matching layer that sits atop this raw data—witr examines process names, command-line arguments, environment variables (when available), and parent relationships to infer launch mechanisms.
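The backward walk can be sketched as a small Go loop. Everything below (the trimmed ProcessInfo field subset, the walkAncestry name, the pluggable lookup) is illustrative, not witr’s actual API; the point is that the walking logic never touches the OS directly:

```go
package main

import "fmt"

// ProcessInfo is a hypothetical, trimmed-down mirror of witr's normalized node.
type ProcessInfo struct {
	PID, PPID int
	Comm      string
}

// walkAncestry climbs from pid toward the root. The lookup function is
// pluggable, so the same loop can sit on top of /proc, sysctl, WMI, or a
// test fixture.
func walkAncestry(pid int, lookup func(int) (*ProcessInfo, bool)) []*ProcessInfo {
	var chain []*ProcessInfo
	seen := make(map[int]bool)
	for pid > 0 && !seen[pid] {
		seen[pid] = true // guard against cycles caused by PID reuse races
		p, ok := lookup(pid)
		if !ok {
			break // ancestor vanished mid-walk; return the partial chain
		}
		chain = append(chain, p)
		pid = p.PPID
	}
	return chain
}

func main() {
	// A map fixture standing in for live OS queries.
	procs := map[int]*ProcessInfo{
		1:    {PID: 1, PPID: 0, Comm: "systemd"},
		800:  {PID: 800, PPID: 1, Comm: "gunicorn"},
		1234: {PID: 1234, PPID: 800, Comm: "python"},
	}
	lookup := func(pid int) (*ProcessInfo, bool) { p, ok := procs[pid]; return p, ok }
	for _, p := range walkAncestry(1234, lookup) {
		fmt.Printf("%d %s\n", p.PID, p.Comm) // 1234 python, 800 gunicorn, 1 systemd
	}
}
```

The cycle guard matters in practice: between two lookups, a PID can be recycled by the kernel, and a naive loop could walk forever.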

Here’s where it gets interesting. Consider a Python web application running under gunicorn, started by a systemd service. Running witr <pid> might produce:

├─ systemd (PID 1)
│  └─ systemd service: webapp.service
│     └─ /usr/bin/gunicorn --config gunicorn.conf.py
│        └─ gunicorn: worker [app:application]
│           └─ python /app/worker.py  # <-- your target process

To generate this output, witr walks the process tree and applies heuristics: if a process’s parent is PID 1 and its cgroup contains systemd:/system.slice/webapp.service, it’s a systemd service. If the process hierarchy includes /usr/bin/containerd-shim or docker-proxy, it’s inside a container. If the parent chain includes /usr/sbin/cron and the start time aligns with cron schedule intervals, it’s likely a cron job. These detection rules live in separate handlers—detectSystemd(), detectDocker(), detectCron()—that return enriched metadata when patterns match.
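Two of those heuristics can be sketched as simple predicates over the normalized data. The names detectSystemdService and looksContainerized are hypothetical stand-ins for witr’s internal handlers:

```go
package main

import (
	"fmt"
	"strings"
)

// detectSystemdService: a sketch of the cgroup heuristic. Any path segment
// of a /proc/<pid>/cgroup line that ends in ".service" is taken as the
// owning systemd unit.
func detectSystemdService(cgroupLine string) (string, bool) {
	for _, seg := range strings.Split(cgroupLine, "/") {
		if strings.HasSuffix(seg, ".service") {
			return seg, true
		}
	}
	return "", false
}

// looksContainerized: a sketch of the container heuristic. True if any
// ancestor command in the chain is a containerd shim or docker-proxy.
func looksContainerized(ancestorCmds []string) bool {
	for _, cmd := range ancestorCmds {
		if strings.Contains(cmd, "containerd-shim") || strings.Contains(cmd, "docker-proxy") {
			return true
		}
	}
	return false
}

func main() {
	unit, _ := detectSystemdService("0::/system.slice/webapp.service")
	fmt.Println("unit:", unit) // prints "unit: webapp.service"
	fmt.Println("in container:",
		looksContainerized([]string{"containerd-shim-runc-v2", "bash"})) // prints "in container: true"
}
```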

The cross-platform challenge manifests most clearly in the process information gathering layer. Here’s a simplified view of how witr abstracts this on Linux versus Windows:

// Simplified Linux implementation
func getProcessInfo(pid int) (*ProcessInfo, error) {
    statPath := fmt.Sprintf("/proc/%d/stat", pid)
    data, err := os.ReadFile(statPath)
    if err != nil {
        return nil, err
    }
    
    fields := parseStatFile(data)
    return &ProcessInfo{
        PID:       pid,
        PPID:      atoi(fields[3]),  // Parent PID is field 4
        Comm:      fields[1],         // Command name
        StartTime: atoi(fields[21]),  // Start time in clock ticks
    }, nil
}

// Simplified Windows implementation
func getProcessInfo(pid int) (*ProcessInfo, error) {
    query := fmt.Sprintf("SELECT ParentProcessId, Name, CreationDate FROM Win32_Process WHERE ProcessId = %d", pid)
    result, err := wmiQuery(query)
    if err != nil {
        return nil, err
    }
    
    return &ProcessInfo{
        PID:       pid,
        PPID:      result.ParentProcessId,
        Comm:      result.Name,
        StartTime: parseWMIDate(result.CreationDate),
    }, nil
}

Both functions return the same ProcessInfo struct, but the Linux version reads procfs files while Windows performs WMI queries. The calling code doesn’t care—it receives normalized data and continues building the causality chain. This abstraction allows witr to ship as a single binary per platform with identical user interfaces.
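The platform split can be sketched with a runtime dispatch. In a shipped binary witr would more plausibly use Go build constraints (//go:build linux, //go:build windows) so that each platform compiles only its own backend; pickBackend and the stub below are purely illustrative of the shared interface:

```go
package main

import (
	"fmt"
	"runtime"
)

// ProcessInfo mirrors the normalized node every backend must return.
type ProcessInfo struct {
	PID, PPID int
	Comm      string
}

// infoFunc is the common signature each platform backend satisfies.
type infoFunc func(pid int) (*ProcessInfo, error)

// pickBackend selects an implementation by OS at runtime. With build tags,
// this switch disappears: only one backend exists per compiled binary.
func pickBackend() (string, infoFunc) {
	stub := func(pid int) (*ProcessInfo, error) {
		return nil, fmt.Errorf("backend not implemented in this sketch")
	}
	switch runtime.GOOS {
	case "linux":
		return "procfs", stub // would parse /proc/<pid>/stat and cmdline
	case "windows":
		return "wmi", stub // would issue a Win32_Process WMI query
	case "darwin":
		return "libproc", stub
	default:
		return "sysctl", stub // BSD-style kinfo_proc lookup
	}
}

func main() {
	name, _ := pickBackend()
	fmt.Println("selected backend:", name)
}
```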

The formatting layer deserves mention for its clarity. Rather than dumping raw process trees, witr annotates each level with semantic meaning. When it detects a Docker container, it queries Docker’s socket (if available and permissions allow) to retrieve container names and image information. For systemd services, it parses cgroup paths to extract service names. For SSH sessions, it identifies sshd parents and includes client connection details when readable. This transforms opaque PIDs into narratives: ‘This process exists because you started a Docker container named production-api from image myapp:v2.1.3, which launched a systemd-managed service inside the container, which spawned this worker process.’

One architectural choice worth highlighting: witr is intentionally stateless. It doesn’t maintain a database of historical process starts or daemonize to monitor changes. Each invocation is a fresh snapshot query. This makes deployment trivial—drop the binary anywhere and run it—but means it can’t show you what started a process that’s since terminated. For causality chains to work, every ancestor in the chain must still be running.

Gotcha

witr’s effectiveness is bounded by two hard constraints: API availability and process lifecycle. On locked-down systems with restricted procfs access (common in hardened containers or security-conscious environments), witr may only see partial process information or fail entirely. If you’re running in a container without SYS_PTRACE capabilities or with a filtered procfs mount, expect incomplete causality chains. Similarly, if you’ve disabled WMI on Windows or restricted sysctl access on BSD systems, the tool loses its data source.

The more subtle limitation is the ‘orphaned process’ problem. When an intermediate ancestor terminates, the orphaned process gets reparented to PID 1 (init or systemd). At this point, the historical causality chain is lost—witr can only report that systemd is now the parent, not what originally started the process. This happens frequently with daemonizing services that fork and exit their parent, or with shell sessions that disconnect. You’ll see the process attached to init, but won’t know it was started from an SSH session two weeks ago by an engineer debugging something. The causality only exists as long as the full ancestor chain remains alive.

Container orchestrators present another edge case. witr will correctly identify that a process is running inside a Docker container, but if that container was launched by Kubernetes, the causality chain stops at the Docker daemon. It won’t traverse up to show which Kubernetes deployment, pod, or node triggered the container launch. The same applies to Nomad, Docker Swarm, or ECS tasks. You get ‘started by containerd,’ not ‘started by the web-backend deployment in the production namespace.’ For orchestrator-level visibility, you still need kubectl, nomad status, or equivalent tools.

Verdict

Use witr if you frequently debug unexpected processes on servers, troubleshoot resource usage from mysterious sources, conduct security audits requiring process ancestry documentation, or manage heterogeneous infrastructure where process origins span systemd, containers, and cron. It’s particularly valuable during incident response when you need to quickly understand how a process got there without manually correlating five different system tools. The single-binary, zero-dependency deployment makes it perfect for infrastructure teams managing hundreds of servers across different platforms.

Skip witr if you need active process management capabilities (stopping, restarting, or modifying processes), real-time monitoring dashboards with historical metrics, or deep visibility into container orchestration systems above the runtime layer. It’s a diagnostic snapshot tool, not a replacement for systemctl, htop, or kubectl—it answers ‘why’ and ‘how,’ not ‘what should I do about it.’
