Memex: Building a Personal Process Manager for Digital Archival

Hook

What if systemd's complexity is overkill for your weekend project that downloads your tweets, and Docker Compose feels like bringing a tank to a knife fight?

Context

Digital hoarding is a peculiar affliction of our time. We generate massive amounts of personal data across platforms—tweets, photos, bookmarks, fitness data—but we rarely control where it lives or how long it persists. When a platform inevitably changes its API, gets acquired, or simply disappears, that data vanishes with it. The solution is self-archival: running personal services that continuously scrape and store your digital footprint.

But here's the problem: managing these archival scripts is annoying. You could throw them in cron jobs, but then you lose visibility into failures. You could use systemd, but writing unit files for personal projects feels bureaucratic. You could use Docker Compose, but that's overkill when you just want to run a handful of Python or Node scripts. Memex emerged from this frustration—a tool for one developer to manage personal data archival services without the ceremony of production-grade orchestration.

Technical Insight

Memex's architecture is refreshingly straightforward: it's a single Go binary that watches a directory for YAML configuration files and spawns child processes accordingly. The elegance lies in its declarative service definitions. Each service lives in its own directory with a memexservice.yml file:

command: python3
args:
  - archiver.py
  - --interval=3600
env:
  TWITTER_API_KEY: "your-key-here"
  OUTPUT_DIR: ./archive
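
Here is a minimal Go sketch of how these fields could map onto a child process. It assumes the YAML has already been decoded into a struct by a YAML library (not shown). `ServiceConfig` and `buildCommand` are hypothetical names, not Memex's actual code.

```go
package main

import (
	"os"
	"os/exec"
	"path/filepath"
	"sort"
)

// ServiceConfig mirrors the memexservice.yml fields above (hypothetical type).
// Decoding the YAML into it would require a YAML library, omitted here.
type ServiceConfig struct {
	Command string            `yaml:"command"`
	Args    []string          `yaml:"args"`
	Env     map[string]string `yaml:"env"`
}

// buildCommand turns a decoded config into an *exec.Cmd that runs from the
// directory containing the config file.
func buildCommand(configPath string, cfg ServiceConfig) *exec.Cmd {
	cmd := exec.Command(cfg.Command, cfg.Args...)
	cmd.Dir = filepath.Dir(configPath)

	// Inherit the supervisor's environment, then append per-service variables
	// in sorted order. os/exec uses the last value for duplicate keys, so the
	// service's own settings override inherited ones.
	env := os.Environ()
	keys := make([]string, 0, len(cfg.Env))
	for k := range cfg.Env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		env = append(env, k+"="+cfg.Env[k])
	}
	cmd.Env = env
	return cmd
}
```

Relative values such as `OUTPUT_DIR: ./archive` are passed through unchanged, so the child resolves them against its own working directory.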

When you run memex start, it walks the directory tree named by the MEMEX_DIR environment variable, discovers every service config, and spawns each service as a child process. Each service runs with its config file's directory as its working directory, which gives you natural separation: your Twitter archiver lives in ~/memex/twitter/ with its own virtual environment, while your GitHub star scraper lives in ~/memex/github/ with its own dependencies.

The hot-reload mechanism is where things get interesting. Memex watches the filesystem for changes to any memexservice.yml file. When it detects a modification, it doesn't restart the entire supervisor; it restarts only the affected service. This is implemented with the fsnotify library for efficient file watching, combined with a process registry that maps config file paths to running PIDs. Update a service's configuration, save the file, and within seconds the new version is running.

Here's the clever part about logging: all stdout and stderr from child processes are captured and prefixed with the service name, then streamed to the parent process's output. This means you can run memex start in a tmux pane and see interleaved logs from all your services:

[twitter] 2024-01-15 10:23:45 Downloaded 47 new tweets
[github] 2024-01-15 10:23:46 Scraped 12 starred repos
[twitter] 2024-01-15 10:23:47 Rate limit: 142 requests remaining
[fitness] 2024-01-15 10:23:48 Synced 3 workouts from Strava
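
One way to produce that interleaved output in Go is sketched below. It assumes each child's stdout and stderr are read through pipes (for example via exec.Cmd's StdoutPipe and StderrPipe), with each pipe consumed in its own goroutine. The function and type names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// syncWriter serializes writes so lines from concurrent goroutines
// (one per child pipe) never interleave mid-line.
type syncWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func (s *syncWriter) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.w.Write(p)
}

// prefixLines copies r to w line by line, tagging each line with the service
// name. fmt.Fprintf issues one Write per line, so a locked writer keeps lines whole.
func prefixLines(name string, r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r) // default max line length is 64 KiB
	for sc.Scan() {
		if _, err := fmt.Fprintf(w, "[%s] %s\n", name, sc.Text()); err != nil {
			return err
		}
	}
	return sc.Err()
}

// prefixString is a small helper for trying prefixLines on a string.
func prefixString(name, s string) string {
	var b strings.Builder
	_ = prefixLines(name, strings.NewReader(s), &syncWriter{w: &b})
	return b.String()
}
```

In a supervisor, a single syncWriter wrapping os.Stdout would be shared by every service's goroutines. Lines longer than 64 KiB make the scanner stop with bufio.ErrTooLong unless you raise the limit with Scanner.Buffer.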

The supervisor pattern implementation includes automatic restart logic. When a service exits with a non-zero status code, Memex waits a configurable backoff period (default: 5 seconds) before respawning it. This prevents tight crash loops while still ensuring that transient failures don't kill your archival pipeline. The restart counter resets after a service runs successfully for a threshold period, so a service that crashes once a week won't eventually hit a restart limit.

From a deployment perspective, Memex embraces the single-binary philosophy. There's no daemon installation, no global configuration files in /etc/, no systemd integration required (though you could wrap it in a systemd unit if desired). You compile it with go build, drop the binary somewhere in your $PATH, set MEMEX_DIR to point at your services directory, and run memex start. For persistent operation, you'd typically run it under your distribution's init system or simply in a detached tmux session—unglamorous but effective for personal use.

The codebase itself is compact, under 1,000 lines of Go. This isn't a framework with plugins and extension points; it's a focused tool that does one thing. The minimalism is intentional—when you're building infrastructure for an audience of one, every abstraction and configuration option is technical debt.

Gotcha

Memex's personal-use design means it lacks features you'd expect from production process managers. There's no built-in log rotation, so if your archival services are chatty and you redirect Memex's output to a file, that file will eventually fill your disk (pipe the output through rotatelogs or a similar tool if you run it long-term). Resource limits aren't enforced, so a runaway service can consume all available CPU or memory. There's no health checking beyond 'is the process still running,' so a service that hangs without crashing won't trigger a restart.

The hot-reload capability, while convenient, has sharp edges. Memex doesn't validate a configuration before attempting the reload, so a YAML syntax error simply leaves that service unable to restart. You'll discover the error when the service disappears from your log stream. Similarly, there's no dependency management or inter-service communication: if your analytics service depends on your archival service having already downloaded data, each service is an island, and you write that coordination logic yourself (files, databases, or HTTP endpoints).

Verdict

Use if: You're managing a handful of personal automation scripts or archival tasks on a single machine and want something more structured than cron but lighter than Docker. You value simplicity over features and don't mind writing glue code for edge cases. You're comfortable with Go tooling and prefer declarative configuration over imperative scripting.

Skip if: You need production-grade reliability features like health checks, resource limits, or zero-downtime deployments. You're managing services across multiple machines (Memex has no clustering). You want a mature ecosystem with existing integrations and community support. In those cases, reach for systemd for system services, Docker Compose for containerized workflows, or Supervisord if you need something proven but still relatively lightweight.

// RELATED

Open Interpreter: Running GPT-4 with Root Access to Your Machine

Accomplish: Why Wrapping OpenCode Instead of Building an Agent Runtime Was the Right Bet

NVIDIA Cosmos: A Case Study in Strategic Repository Deprecation

How Ripgrep Makes Searching 10x Faster Than Grep: A Deep Dive Into Rust-Powered Text Search
