fastgron: Why Flattening JSON is 50x Faster Than Querying It

Hook

When processing a 100MB JSON file, jq takes 18 seconds to extract a single field. fastgron does it in one second. The secret? Stop thinking of JSON as a tree and start thinking of it as a list of assignment statements.

Context

JSON has become the lingua franca of APIs and configuration files, but working with large JSON datasets in shell pipelines remains painful. Tools like jq are powerful but struggle with performance at scale, and their custom query languages create cognitive overhead when you just want to grep for a value. The original gron tool, created by Tom Hudson in 2016, proposed an elegant solution: flatten JSON into discrete assignment statements where each line represents one path-to-value mapping. Instead of navigating a tree structure, you get greppable text where json.users[0].email = "user@example.com" becomes just another line you can pipe through familiar Unix tools.

The problem was performance. The original gron, written in Go, processes JSON at roughly 8MB/s: fine for API responses, painfully slow for log files or large datasets. When your JSON file is 500MB, waiting over a minute for gron to complete breaks the interactive shell experience that makes Unix pipelines powerful. Adam Ritter's fastgron solves this by rebuilding the concept in C++20 with simdjson, achieving 400MB/s input throughput and showing that the gron format itself was never the bottleneck.
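The wait times implied by those throughput figures are simple division. A minimal sketch, using only the numbers quoted above (the helper name seconds_at is made up for illustration):

```shell
# Seconds needed to process a file of $1 megabytes at $2 megabytes per second.
seconds_at() {
  awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", size / rate }'
}

seconds_at 500 8     # original gron at the quoted ~8MB/s
seconds_at 500 400   # fastgron at the quoted 400MB/s
```

For a 500MB file this works out to about 62.5 seconds versus 1.25 seconds, which is the gap between leaving the terminal and staying in the flow.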

Technical Insight

The architecture of fastgron centers on simdjson's on-demand parsing API, which uses SIMD instructions to parse JSON at gigabytes per second. Unlike traditional JSON parsers that build a complete DOM tree in memory, the on-demand API processes JSON as a forward-only stream, reducing memory allocations and cache misses. fastgron walks this stream recursively, building path strings as it descends into objects and arrays, and emits each container and leaf value as a discrete gron statement.

Here's what the transformation looks like in practice:

# Input JSON
echo '{"user":{"name":"Alice","scores":[98,87,92]}}' | fastgron

# Output (gron format)
json = {};
json.user = {};
json.user.name = "Alice";
json.user.scores = [];
json.user.scores[0] = 98;
json.user.scores[1] = 87;
json.user.scores[2] = 92;

Now every JSON value has a complete path, making grep operations trivial:

# Find all email addresses in a 200MB JSON file
fastgron large-dataset.json | grep 'email' | head -5

# Extract just the values (strip the path prefix and the trailing semicolon)
fastgron large-dataset.json | grep '\.price = ' | sed -e 's/^[^=]* = //' -e 's/;$//'
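Value extraction has one subtlety worth seeing on a small input: a value may itself contain an = character, so the split has to happen at the first " = " rather than at every =. The sketch below runs on hand-written lines in the gron format (not real tool output), and gron_values is a hypothetical helper name. It assumes no key in the path contains = itself.

```shell
# Hypothetical helper: print only the value part of gron-style lines.
# Splits at the first " = " and drops a trailing semicolon if present.
gron_values() {
  sed -e 's/^[^=]* = //' -e 's/;$//'
}

# Hand-written sample in the gron format shown earlier.
sample='json.items[0].note = "a=b";
json.items[0].price = 9.5;
json.items[1].price = 12;'

# Prints the two prices, one per line.
printf '%s\n' "$sample" | grep '\.price = ' | gron_values
```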

The real power emerges when you combine gron format with ungron (the reverse operation). Fastgron can convert gron statements back into valid JSON, which means you can use grep to filter JSON structure:

# Keep only the statements that mention 'admin', then rebuild JSON from them
fastgron users.json | grep 'admin' | fastgron --ungron

# Remove all fields containing 'internal' from a JSON file
fastgron data.json | grep -v 'internal' | fastgron -u
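Because grep works line by line, a plain grep 'admin' keeps only the statements that mention the word, not the other fields of the object they belong to. One possible way to keep whole entries is a two-pass filter: first collect the array-entry prefixes of matching lines, then keep every line that carries one of those prefixes. This is a sketch under stated assumptions: a top-level users array, hand-written sample lines, and a hypothetical helper name keep_whole_users.

```shell
# Hypothetical helper: keep every line of each users[N] entry that has at
# least one line matching the pattern in $1. Reads gron lines on stdin.
keep_whole_users() {
  input=$(cat)
  # Pass 1: reduce matching lines to their "json.users[N]" prefix.
  prefixes=$(printf '%s\n' "$input" | grep -- "$1" \
    | sed -n -E 's/^(json\.users\[[0-9]+\]).*/\1/p' | sort -u)
  # No matching entry: print nothing rather than hand grep an empty pattern.
  [ -n "$prefixes" ] || return 0
  # Pass 2: keep all lines containing any collected prefix (one per line).
  printf '%s\n' "$input" | grep -F -- "$prefixes"
}

# Hand-written sample in the gron format.
sample='json.users[0].name = "alice";
json.users[0].role = "admin";
json.users[1].name = "bob";
json.users[1].role = "viewer";'

# Prints both lines of users[0] and nothing from users[1].
printf '%s\n' "$sample" | keep_whole_users 'admin'
```

The filtered lines can then be piped into fastgron -u in the same way as the examples above.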

fastgron's built-in filtering goes further: matching happens inside the C++ process, so no separate grep has to be spawned. The -F flag filters output by fixed string:

# Extract just the .users[].email paths
fastgron -F '.users' -F '.email' api-response.json

# This is 18x faster than: jq '.users[].email' api-response.json

The performance advantage comes from simdjson parsing only the portions of JSON needed for matched paths, while jq must parse and evaluate the entire document. For a 100MB JSON file with deeply nested arrays, jq might parse 100MB and allocate gigabytes of intermediate structures, while fastgron with filtering parses only matched subtrees and streams output directly.

Fastgron also includes libcurl integration (on Unix systems) for direct HTTP fetching, turning it into a complete API exploration tool:

# Explore GitHub API structure
fastgron https://api.github.com/repos/adamritter/fastgron | grep 'stargazers'

# Output:
# json.stargazers_count = 674;
# json.stargazers_url = "https://api.github.com/repos/adamritter/fastgron/stargazers";

The bidirectional nature of gron format enables transformations that would be complex in jq. Want to rename all occurrences of a field across a deeply nested structure? Use sed on the gron output:

# Rename 'userId' to 'user_id' throughout the entire JSON structure
# (the key may be followed by ' = ', a nested '.key', or an '[index]')
fastgron data.json | sed -E 's/\.userId([ .[])/.user_id\1/g' | fastgron -u > transformed.json

This approach trades query expressiveness for composability and performance. You lose jq's sophisticated filtering and mapping operators, but gain the ability to use the entire Unix toolkit—grep, sed, awk, sort, uniq—on JSON structure with predictable, linear performance characteristics.
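As an illustration of that composability, sed and sort alone are enough to summarize the shape of an unfamiliar document: drop the values, collapse array indices, and deduplicate. The sketch below uses hand-written gron lines and a hypothetical helper name gron_shape; it assumes no key contains the sequence " = ".

```shell
# Hypothetical helper: list the distinct paths in gron output,
# with array indices collapsed to [] and values removed.
gron_shape() {
  sed -E -e 's/ = .*$//' -e 's/\[[0-9]+\]/[]/g' | LC_ALL=C sort -u
}

# Hand-written sample in the gron format.
sample='json = {};
json.users = [];
json.users[0] = {};
json.users[0].email = "a@example.com";
json.users[1] = {};
json.users[1].email = "b@example.com";'

# Prints four distinct paths: json, json.users, json.users[], json.users[].email
printf '%s\n' "$sample" | gron_shape
```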

Gotcha

Fastgron's limitations stem from its fundamental design choice: flattening JSON into discrete statements works brilliantly for extraction and filtering, but poorly for complex transformations. If you need to compute aggregates ("sum all prices"), reshape data structures ("group users by country"), or perform conditional logic ("if premium user, apply discount"), you'll find yourself fighting against the gron format. These operations require understanding relationships between values, which is easy in jq's tree representation but requires parsing and correlating multiple gron lines. For complex queries, jq's 18x performance penalty is often worth paying for its expressiveness.
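Simple aggregates are still reachable from gron lines, but the parsing moves into your own script. A minimal sketch of "sum all prices" with awk, on hand-written sample lines and with a hypothetical helper name sum_prices:

```shell
# Hypothetical helper: add up every numeric value assigned to a .price path.
sum_prices() {
  awk -F' = ' '/\.price = / { sub(/;$/, "", $2); total += $2 } END { print total + 0 }'
}

# Hand-written sample in the gron format.
sample='json.items[0].name = "widget";
json.items[0].price = 9.5;
json.items[1].name = "gadget";
json.items[1].price = 12;'

printf '%s\n' "$sample" | sum_prices   # prints 21.5
```

This works because each price stands alone on its line. Grouping or conditional logic would need to correlate several lines per object, which is where the format starts to work against you.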

The path query syntax is explicitly minimal. Wildcards, recursive descent, and array slicing (features you'd expect from JSONPath or jq) are either missing or only partially implemented. The documentation describes them as 'planned features,' so production pipelines should not depend on them yet. Windows users face a more concrete limitation: no libcurl support means you can't fetch URLs directly, so you need curl as a separate process and lose the integrated workflow Unix users enjoy. Finally, while fastgron handles large files far better than the original gron, extremely large filtered outputs (gigabytes) can still cause memory pressure during ungron operations, though this is more a function of the JSON structure than a limitation of the tool.

Verdict

Use if: You regularly work with JSON files larger than 50MB where jq feels sluggish, need to apply Unix text processing tools (grep/sed/awk) to JSON structure, want to explore unfamiliar API responses interactively, or build pipelines that extract simple fields from large datasets. It's ideal for log analysis, API exploration, and any workflow where "find all X in this JSON" is more common than "transform this JSON into that JSON."

Skip if: You're working with small JSON files (under 10MB) where jq's startup overhead is negligible, need complex transformations beyond simple extraction and filtering, require Windows with HTTP support, or depend on advanced JSONPath features like wildcards and recursive descent. For those cases, accept jq's performance cost for its mature ecosystem and expressive query language.
