jc: Turning 50 Years of Unix Text Into Modern JSON Pipelines

Hook

Every time you write awk '{print $2}' | grep -v '^$' | sort -u to parse command output, you're one locale change or version update away from a production incident. There's a better way.

Context

Unix commands were designed in an era when human readability trumped machine parsability. Tools like ps, netstat, and df output beautifully formatted tables perfect for terminal viewing but nightmarish for automation. For decades, DevOps engineers have written fragile parsing code—regex gymnastics in bash, brittle awk scripts, and sed chains that break when a command adds an extra space.

The rise of infrastructure-as-code and GitOps workflows intensified this pain. Modern automation tools expect JSON or YAML, but legacy commands speak only in whitespace-delimited tables and human-friendly text. You could rewrite everything in Python to call system APIs directly, but that abandons decades of battle-tested CLI utilities. What the ecosystem needed was a translator: something that understands the quirks of 100+ command output formats and speaks fluent JSON. That's the problem Kelly Brazil solved with jc.

Technical Insight

At its core, jc is a plugin-based parsing framework with over 100 specialized parsers, each tuned to handle the idiosyncrasies of a specific command or file format. The architecture separates concerns cleanly: a CLI interface handles input/output and orchestration, while parser modules contain the transformation logic. Each parser exposes a standard interface (parse() function) that accepts text and returns structured data.
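The "one parse() per parser, one dispatcher in front" shape described above can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not jc's actual code: the PARSERS registry, the register decorator, and the uname_like parser are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical registry: parser name -> parse function (not jc's internals).
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(name: str):
    """Decorator that files a parse function under a parser name."""
    def decorator(func):
        PARSERS[name] = func
        return func
    return decorator


@register("uname_like")
def parse_uname_like(text: str) -> List[dict]:
    """Toy parser: 'Linux host 6.1.0' becomes a single dictionary."""
    kernel, node, release = text.split()[:3]
    return [{"kernel_name": kernel, "node_name": node, "kernel_release": release}]


def parse(name: str, text: str) -> List[dict]:
    """Uniform entry point: look up the named parser and hand it the text."""
    if name not in PARSERS:
        raise ValueError(f"unknown parser: {name}")
    return PARSERS[name](text)
```

Because callers only ever touch parse(name, text), adding a new format means adding one function and leaving the orchestration code alone.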

What makes jc particularly convenient is its dual-mode operation. As a CLI tool, you can pipe command output directly: ifconfig | jc --ifconfig | jq '.[] | select(any(.state[]?; . == "UP"))'. (The ifconfig parser reports state as a list of flags, so the filter tests membership rather than equality.) The "magic" syntax is shorter still: jc ifconfig executes the command and parses its output in one shot. jc runs the command as a subprocess and captures its output, so no explicit pipe is needed when you want everything parsed immediately.
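The run-then-parse idea behind the magic syntax can be approximated as follows. This is a minimal sketch under stated assumptions, not jc's implementation: run_and_parse is a hypothetical helper, and the demo uses the Python interpreter as a stand-in command with a throwaway lambda as the parser so that it runs anywhere.

```python
import subprocess
import sys
from typing import Callable, List


def run_and_parse(argv: List[str], parser: Callable[[str], object]) -> object:
    """Run argv directly (no shell), capture stdout, and pass the text to parser."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parser(completed.stdout)


# Portable demo: the "command" is a Python one-liner that prints two rows.
result = run_and_parse(
    [sys.executable, "-c", "print('eth0 UP'); print('lo DOWN')"],
    lambda text: [dict(zip(("name", "state"), line.split())) for line in text.splitlines()],
)
```

With jc installed, the parser argument could be lambda text: jc.parse('ifconfig', text). Note that the command is started without a shell, which is why shell-only features are out of reach for this style of invocation.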

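import subprocess

import jc

# Capture raw command output; jc.parse() takes a parser name and the text
ifconfig_output = subprocess.run(['ifconfig'], capture_output=True, text=True, check=True).stdout
ps_output = subprocess.run(['ps', 'aux'], capture_output=True, text=True, check=True).stdout

# As a library, jc integrates into ordinary Python scripts
result = jc.parse('ifconfig', ifconfig_output)
for interface in result:
    # 'state' is a list of flags such as ['UP', 'BROADCAST', 'RUNNING']
    if 'UP' in (interface.get('state') or []):
        print(f"{interface['name']}: {interface.get('ipv4_addr')}")

# Raw mode preserves original string values without type coercion
raw_result = jc.parse('ps', ps_output, raw=True)
for proc in raw_result:
    # 'pid' is a string, not an integer
    if proc['pid'].startswith('1'):
        print(proc['command'])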

The parsing strategies vary by command complexity. Simple commands like uname use straightforward string splitting and dictionary construction. Complex parsers like iptables or netstat employ multi-stage parsing with state machines that track context as they traverse output lines. For example, the iptables parser must handle chain definitions, rule lines with varying formats, and packet/byte counters—all with different indentation and structure.
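To make the state-machine idea concrete, here is a minimal sketch that parses a simplified iptables -L style listing. It is not jc's iptables parser: the sample text, the parse_chains name, and the five-column rule layout are assumptions chosen to keep the example short, and packet/byte counters are left out.

```python
from typing import Dict, List, Optional

SAMPLE = """\
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere

Chain FORWARD (policy DROP)
target     prot opt source               destination
"""

RULE_KEYS = ("target", "prot", "opt", "source", "destination")


def parse_chains(text: str) -> List[Dict]:
    """Group rule lines under the most recent 'Chain' header."""
    chains: List[Dict] = []
    current: Optional[Dict] = None  # state: the chain we are currently inside
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("Chain ") and len(parts) >= 2:
            # State transition: a new chain begins.
            current = {"chain": parts[1], "rules": []}
            chains.append(current)
        elif not parts or parts[0] == "target":
            continue  # blank separator or column header
        elif current is not None:
            fields = line.strip().split(None, 4)
            if len(fields) == 5:
                current["rules"].append(dict(zip(RULE_KEYS, fields)))
    return chains


chains = parse_chains(SAMPLE)
```

The only state carried between lines is current, the chain being filled. A real parser tracks more context, but the structure is the same: each line is interpreted according to what came before it.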

Streaming parsers represent jc's most sophisticated feature for handling large outputs. Commands like ping or vmstat, and long-running logs, can produce output without a fixed end and may add up to gigabytes of data. Standard parsers load everything into memory before processing, but streaming parsers (named with an _s suffix, such as ping_s) accept an iterable of lines and yield results one at a time:

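import subprocess

import jc

# Streaming parsers end in _s and accept any iterable of lines, so a
# subprocess pipe can be consumed while the command is still running
proc = subprocess.Popen(['ping', '-c', '20', 'example.com'], stdout=subprocess.PIPE, text=True)
for record in jc.parse('ping_s', proc.stdout):
    if record.get('type') == 'reply' and (record.get('time_ms') or 0) > 100:
        print(f"Slow reply from {record.get('response_ip')}: {record['time_ms']} ms")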
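The mechanism that makes line-by-line processing memory-efficient is an ordinary Python generator. The sketch below shows the pattern in isolation; stream_parse, endless_lines, and the key=value line format are hypothetical and unrelated to any real jc parser.

```python
import itertools
from typing import Dict, Iterable, Iterator


def stream_parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per input line without buffering the whole input."""
    for line in lines:
        fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        if fields:
            yield fields


def endless_lines() -> Iterator[str]:
    """Simulate a command that never stops producing output."""
    n = 0
    while True:
        n += 1
        yield f"seq={n} time={n * 1.5}"


# Only three lines are ever produced or parsed, even though the source is infinite.
first_three = list(itertools.islice(stream_parse(endless_lines()), 3))
```

Because each record is produced on demand, the consumer decides when to stop, and memory use stays flat regardless of how much output the command emits.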

The parser development workflow deserves attention. Each parser includes a test suite with real command outputs captured from various platforms and versions. This test-driven approach helps catch regressions when commands evolve. Contributors add a new parser as a Python module that defines an info metadata class and a parse() function, and they provide test fixtures alongside it. The framework handles input normalization, output formatting (JSON, YAML, or Python dictionaries), and error handling.
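A fixture-driven test in this style might look like the sketch below. Everything in it is illustrative: parse_wc is a toy parser, and the FIXTURES entries are invented samples standing in for output captured on two platforms, not files from jc's test suite.

```python
import unittest


def parse_wc(text: str) -> list:
    """Toy parser for wc-style lines: lines, words, bytes, filename."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        lines, words, size, name = fields
        rows.append({"filename": name, "lines": int(lines),
                     "words": int(words), "bytes": int(size)})
    return rows


EXPECTED = [{"filename": "notes.txt", "lines": 3, "words": 10, "bytes": 52}]

# Hypothetical fixtures: the same data with the spacing two platforms might use.
FIXTURES = {
    "narrow_columns": ("  3  10  52 notes.txt\n", EXPECTED),
    "wide_columns": ("       3      10      52 notes.txt\n", EXPECTED),
}


class ParserFixtureTest(unittest.TestCase):
    def test_all_fixtures(self):
        for label, (raw, expected) in FIXTURES.items():
            with self.subTest(fixture=label):
                self.assertEqual(parse_wc(raw), expected)


def run_fixture_tests() -> bool:
    """Run the fixture suite programmatically and report success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParserFixtureTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

When a new command version changes its layout, capturing its output as one more fixture turns the regression into a failing subtest instead of a production surprise.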

Integration with Ansible showcases jc's practical value in automation. The tool ships as a filter plugin in Ansible's community.general collection, allowing playbooks to parse command output inline:

- name: Get listening ports
  shell: netstat -tulpn
  register: netstat_output

- name: Parse and filter for port 8080
  set_fact:
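    # local_port is a string in jc's netstat output; local_port_num is the integer form
    app_listening: "{{ netstat_output.stdout | community.general.jc('netstat') | selectattr('local_port_num', 'defined') | selectattr('local_port_num', 'equalto', 8080) | list | length > 0 }}"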

- name: Alert if app is down
  debug:
    msg: "Application not listening on port 8080"
  when: not app_listening

This eliminates the need for complex shell parsing within Ansible tasks, making playbooks more readable and maintainable. The same pattern applies to Salt, Nornir, and custom Python automation frameworks.

Gotcha

The fundamental limitation of jc is that it works against the grain of how Unix tools were designed. Parsers rely on regex patterns and assumptions about output formats, and those assumptions can break when commands change: a new version of ps that adds a column or changes spacing can cause parsing failures. Locale settings are particularly insidious. Date formats, decimal separators, and even column headers can vary with the LANG and LC_* environment variables. jc's documentation recommends running commands under the C or en_US.UTF-8 locale (for example, by setting LC_ALL=C), but you still need defensive coding in production scripts.
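One defensive habit is to pin the locale for every command whose output will be parsed. The sketch below shows one way to do that from Python; c_locale_env and run_in_c_locale are hypothetical helper names, and the approach relies only on the standard behavior that LC_ALL overrides the other locale variables.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def c_locale_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the locale forced to C."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # takes precedence over LANG and the other LC_* variables
    env["LANG"] = "C"
    return env


def run_in_c_locale(argv: List[str]) -> str:
    """Run a command with a pinned locale and return its stdout as text."""
    completed = subprocess.run(
        argv, capture_output=True, text=True, check=True, env=c_locale_env()
    )
    return completed.stdout


# Demo: the child process sees LC_ALL=C regardless of the parent's settings.
seen = run_in_c_locale([sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]).strip()
```

The text returned by run_in_c_locale can then be passed to a parser with a much smaller risk of translated headers or locale-specific number formats.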

The magic syntax has sharp edges. It doesn't work with shell aliases, builtins, or complex pipelines. If your workflow relies on aliased commands or shell-specific features, you'll need to use explicit piping instead. This isn't a bug—it's an architectural consequence of how jc executes subprocesses—but it catches users by surprise. Additionally, some parsers lag behind command evolution. With 100+ parsers to maintain, edge cases and newer command versions may not be fully supported. Before depending on jc in production, audit the specific parsers you need against your actual command versions and output formats. The test suite is comprehensive but can't cover every platform variant and version combination.
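A script can guard against the magic-syntax pitfalls by checking up front whether a command line is something a plain subprocess could run. The heuristic below is an assumption-laden sketch, not part of jc: can_use_magic is a hypothetical helper, and it cannot detect a builtin or alias that shares its name with a real executable on PATH.

```python
import shlex
import shutil

# Characters that imply shell features (pipes, redirects, substitution).
SHELL_METACHARACTERS = set("|;&<>`$(")


def can_use_magic(command: str) -> bool:
    """Heuristic: True if command looks like a single executable plus arguments."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # pipelines and other shell syntax need explicit piping
    try:
        words = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes
    # Aliases and shell-only builtins do not resolve to a file on PATH.
    return bool(words) and shutil.which(words[0]) is not None
```

When can_use_magic returns False, the safe fallback is the explicit form: run the command in the shell and pipe its output to jc with the matching parser flag.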

Verdict

Use if: You're building automation that interacts with legacy Unix commands, especially in Ansible/Salt/Terraform workflows where structured data is essential. You're replacing fragile awk/sed/grep chains in bash scripts and want more maintainable code. You need to feed CLI output into jq, REST APIs, or monitoring systems. You value developer velocity over parsing microseconds and can tolerate the dependency.

Skip if: Your commands already output JSON natively (many modern tools support --json flags). You're in severely constrained environments where Python dependencies are problematic. You need guaranteed parsing accuracy across wildly divergent command versions or locales; in those cases, calling system APIs directly via ctypes or the os module gives more reliable results. You're parsing formats that change frequently without community parser updates.
