fastgron: When JSON is Too Large to Query With jq
Hook
The original gron tool crashes outright on an 840MB file that fastgron processes in seconds, and a 190MB parse that takes gron over half a minute finishes in under half a second. The difference? SIMD-accelerated parsing and a willingness to rethink how we interact with JSON at scale.
Context
JSON has won the API wars, but exploring large JSON responses remains painful. Tools like jq are powerful but slow on multi-hundred-megabyte payloads. The original gron tool, created by Tom Hudson, introduced an elegant solution: flatten JSON into discrete assignment statements where each line shows the full path to a value. This makes JSON “greppable”—you can use standard Unix tools to find what you need. The problem? Performance. When you’re working with API responses measured in hundreds of megabytes, or when you’re processing JSON logs at scale, waiting 30+ seconds for output becomes untenable.
Adam Ritter’s fastgron reimplements the gron concept in C++20, leveraging the simdjson library to achieve processing speeds that fundamentally change what’s possible. At 400MB/s input throughput and 1.8GB/s output generation on an M1 MacBook Pro, fastgron transforms “wait and see” workflows into interactive exploration. More importantly, it handles files that crash the original implementation entirely, opening up use cases that simply weren’t viable before.
Technical Insight
fastgron’s architecture centers on simdjson, a parser that uses SIMD (Single Instruction, Multiple Data) instructions to process JSON at speeds approaching RAM bandwidth. While most JSON parsers validate and parse byte-by-byte, simdjson processes chunks of data in parallel, achieving throughput that traditional parsers can’t match. This isn’t just theoretical—on the 190MB citylots.json benchmark file, fastgron completes in 0.447 seconds while gron takes 36.7 seconds.
The GRON format itself is brilliantly simple. Instead of nested structures, each value becomes a standalone assignment showing its complete path:
$ cat testdata/two.json
{
  "name": "Tom",
  "github": "https://github.com/tomnomnom/",
  "likes": ["code", "cheese", "meat"],
  "contact": {
    "email": "mail@tomnomnom.com",
    "twitter": "@TomNomNom"
  }
}
$ fastgron testdata/two.json
json = {}
json.name = "Tom"
json.github = "https://github.com/tomnomnom/"
json.likes = []
json.likes[0] = "code"
json.likes[1] = "cheese"
json.likes[2] = "meat"
json.contact = {}
json.contact.email = "mail@tomnomnom.com"
json.contact.twitter = "@TomNomNom"
Every value appears on its own line with the complete path from root to leaf. This makes grep, awk, and sed immediately useful. Need all email addresses? fastgron data.json | grep email. Want to find which objects contain a specific user ID? fastgron data.json | grep user_id.
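As a concrete illustration of that workflow, here is a minimal grep-based pipeline; the printf stands in for real fastgron output so the sketch is self-contained, but with the tool installed you would pipe fastgron's output directly:

```shell
# GRON output is just lines of text, so standard Unix filters apply.
# printf simulates `fastgron two.json`; swap it for the real command.
printf '%s\n' \
  'json.contact.email = "mail@tomnomnom.com"' \
  'json.contact.twitter = "@TomNomNom"' \
  'json.likes[0] = "code"' \
| grep email
# → json.contact.email = "mail@tomnomnom.com"
```

The same pattern works with awk or sed: because every line carries its full path, no filter needs any awareness of JSON nesting.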
The bidirectional transformation capability is where fastgron’s design shines. The --ungron flag reverses the process, converting filtered GRON output back into valid JSON. This enables powerful workflows:
$ fastgron "https://api.github.com/repos/adamritter/fastgron/commits?per_page=1" | fgrep commit.author | fastgron --ungron
[
  {
    "commit": {
      "author": {
        "date": "2023-05-30T18:11:03Z",
        "email": "58403584+adamritter@users.noreply.github.com",
        "name": "adamritter"
      }
    }
  }
]
You can grep for the fields you care about, then reconstruct just those portions as valid JSON for further processing. This is especially valuable when working with bloated API responses where you only need a subset of fields.
fastgron also includes built-in filtering with the -F flag for fixed-string search, eliminating the need to pipe to grep in many cases. Path expressions allow targeted extraction: .#.3.population or cities.#.population to drill into specific array indices or object hierarchies. The tool supports reading from files, URLs (via libcurl), or stdin, and includes a stream mode (-s) for processing JSON Lines format where each line is a separate JSON document.
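A quick sketch of these features in use, based on the flags described above (assuming fastgron is installed; the file names are placeholders, not from the project):

```shell
# Built-in fixed-string search (-F), replacing a pipe to grep:
fastgron -F population data.json

# Path expression to drill into a specific hierarchy:
fastgron cities.json cities.#.population

# Stream mode (-s) for JSON Lines, one document per input line:
fastgron -s events.jsonl

# Reading straight from stdin:
cat data.json | fastgron
```

Keeping the filter inside fastgron avoids re-serializing the full GRON stream through a pipe, which matters at the throughput figures quoted above.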
For path extraction specifically, the performance advantage over jq is even more pronounced—18x faster according to benchmarks. When you need to extract user.profile.address.city from a million records, that difference matters. The ungron operation also demonstrates remarkable performance: converting an 840MB GRON file back to JSON takes 6.1 seconds, while the original gron crashes after over a minute of attempting the same task.
Gotcha
Windows users face a significant limitation: the released binaries lack libcurl support, meaning you can’t directly fetch JSON from HTTP/HTTPS URLs. You’ll need to download files separately or pipe curl output to fastgron, which eliminates some of the convenience.
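The workaround is straightforward, if less convenient; a sketch assuming curl and fastgron are both on PATH (the URL is a placeholder):

```shell
# On builds without libcurl, fetch separately and pipe via stdin:
curl -s "https://api.example.com/data.json" | fastgron

# Or download first, then process the local file:
curl -s -o data.json "https://api.example.com/data.json"
fastgron data.json
```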
The path query feature, while promising, is still evolving. The README notes support for complex expressions like .{id,users[1:-3:2].{name,address}} and path renaming with accessors, but describes it as “a minimal, limited implementation right now.” If you’ve grown accustomed to jq’s comprehensive path manipulation and transformation capabilities, fastgron will feel incomplete. It excels at extraction and filtering, but complex transformations—combining multiple fields, computing values, conditional logic—remain jq’s domain.

As a relatively young project from 2023, the ecosystem integration and community resources are limited compared to established tools. You won’t find the wealth of Stack Overflow answers, blog posts, and examples that exist for jq. When fastgron’s built-in features don’t cover your use case, you’re more likely to be pioneering solutions than finding existing ones.
Verdict
Use fastgron if you’re working with JSON files over 100MB where exploration speed matters, or when you need to extract specific paths from large datasets faster than jq can deliver. It’s ideal for API debugging, log analysis, and data pipeline work where you’re searching for needles in JSON haystacks. The ability to grep GRON output and reconstruct filtered JSON makes it excellent for iterative data exploration. Skip it if you’re working with small JSON files where the performance difference is negligible and jq’s mature feature set provides more value. Avoid it for complex transformations that require computation, conditional logic, or sophisticated field manipulation—jq remains the better choice there. And if you’re on Windows and need URL fetching, you’ll be frustrated until libcurl support arrives in the official builds.