
jsonlines: Why Python's Standard Library Fails at Line-Delimited JSON

Hook

Python's standard library can parse JSON quickly and stream files line by line, yet it offers no built-in convenience for the single most common format for log files and machine learning datasets: JSON Lines.

Context

JSON Lines (also known as newline-delimited JSON or NDJSON) is everywhere in modern data infrastructure. Every line is a complete, valid JSON object, making it perfect for streaming logs, append-only datasets, and ETL pipelines. Unlike standard JSON arrays, you can process JSON Lines files line-by-line without loading the entire structure into memory, and appending new records is as simple as writing a new line. Despite this format’s ubiquity—used by Elasticsearch bulk APIs, AWS CloudWatch Logs, Hugging Face datasets, and countless data processing tools—Python’s standard library offers no convenience methods for it. You’re left writing the same boilerplate: open a file, iterate lines, call json.loads() on each one, wrap it in try-except blocks for malformed lines, and handle encoding edge cases. The jsonlines library emerged to eliminate this repetitive code, providing a clean, Pythonic API that respects the principle of doing one thing well.
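That boilerplate looks roughly like the sketch below, using only the standard library (the helper name `read_jsonl` is illustrative, not part of any API):

```python
import json

def read_jsonl(path):
    """Read a JSON Lines file, tolerating blank and malformed lines."""
    records = []
    with open(path, 'r', encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                print(f'skipping malformed line {lineno}: {exc}')
    return records
```

Every project that touches JSON Lines ends up with some variant of this function; jsonlines exists so you stop rewriting it.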

Technical Insight

Architecture sketch: user code calls jsonlines.open, which dispatches on the mode (mode='r' or mode='w') to a Reader or Writer class, both thin layers over json.loads/json.dumps and file I/O. The Reader reads each raw line and parses it into a Python object; the Writer serializes each Python object to a JSON string and writes it with a trailing newline.

At its core, jsonlines wraps Python’s built-in json module with reader and writer classes that handle the mechanical details of line-by-line processing. The library’s elegance comes from its commitment to Python idioms: context managers for automatic resource cleanup, iterator protocols for memory-efficient streaming, and sensible defaults.

Here’s the canonical example of reading a JSON Lines file. Without jsonlines, you’d write:

import json

with open('data.jsonl', 'r') as f:
    for line in f:
        obj = json.loads(line)
        print(obj['name'])

With jsonlines, the code becomes more declarative:

import jsonlines

with jsonlines.open('data.jsonl') as reader:
    for obj in reader:
        print(obj['name'])

The difference seems subtle, but the library handles critical details: malformed input raises an InvalidLineError that identifies the offending line, and the context manager ensures the file is closed even when an exception occurs mid-stream.

Writing JSON Lines is equally clean. The library provides a writer interface that serializes Python objects line-by-line:

import jsonlines

data = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'}
]

with jsonlines.open('output.jsonl', mode='w') as writer:
    for obj in data:
        writer.write(obj)

You can write objects one at a time using writer.write(obj), which is useful when processing streaming data or generating records on-the-fly. The writer handles newline characters automatically.

The library is deliberately minimal; its only dependency beyond the standard library is the small attrs package, so it installs quickly and won't create dependency conflicts in your environment. The reader and writer classes are thin wrappers that delegate actual JSON parsing to the stdlib, so you get the same battle-tested serialization behavior you'd expect from json.loads() and json.dumps().

Gotcha

The jsonlines library excels at its narrow scope but won’t solve every JSON Lines problem. Performance is limited by the standard library’s json module—if you’re processing gigabytes of data per second, you’ll want to combine faster parsers like orjson or ujson with manual iteration instead. The library also assumes file-like objects, so while you can pass objects with read() and write() methods, there’s no built-in support for streaming from network sockets, HTTP responses, or message queues without wrapping them first.
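A minimal sketch of that manual-iteration pattern, written here against stdlib json; swapping in orjson.loads (which parses bytes directly) is a one-line change if orjson is installed:

```python
import json

loads = json.loads  # swap for orjson.loads to go faster

def iter_jsonl(path):
    """Yield one parsed object per non-blank line."""
    # open in binary mode: orjson.loads takes bytes, and json.loads
    # has accepted bytes since Python 3.6, so no decode pass is needed
    with open(path, 'rb') as f:
        for line in f:
            if line.strip():
                yield loads(line)
```

The generator keeps memory flat regardless of file size, which is the property that matters once files stop fitting in RAM.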

Compression is notably absent. If your JSON Lines files are gzipped (common for archived logs), you need to handle decompression separately using gzip.open() before passing the file handle to jsonlines. There’s no parallel processing support either—reading a 10GB file will process one line at a time in a single thread. For CPU-bound workloads or multi-core machines, you’ll need to implement your own chunking and multiprocessing logic. Finally, the library doesn’t provide any schema validation or data transformation utilities. It’s purely about I/O, so you’ll need separate tools like jsonschema or pydantic if you want to validate record structure.
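On the compression point above, the workaround is short: gzip.open in text mode yields lines like any other file, so the resulting handle can be iterated directly, or passed to jsonlines. A stdlib-only round trip (the filename is illustrative):

```python
import gzip
import json

# write a gzipped JSON Lines file, as an archiver might
with gzip.open('events.jsonl.gz', 'wt', encoding='utf-8') as f:
    for i in range(3):
        f.write(json.dumps({'event': i}) + '\n')

# stream it back: decompression happens transparently, line by line
with gzip.open('events.jsonl.gz', 'rt', encoding='utf-8') as f:
    records = [json.loads(line) for line in f]
```

The same decompressed text handle could be handed to jsonlines instead of the list comprehension; the point is that the decompression layer stays your responsibility.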

Verdict

Use jsonlines if you’re working with JSON Lines files in Python and want clean, maintainable code without reinventing basic I/O patterns. It’s the right choice for data science notebooks, ETL scripts, log processing, and anywhere you’d otherwise write boilerplate json.loads() loops. The library’s minimal footprint makes it safe to add to any project without dependency concerns. Skip it if you need maximum performance (use orjson with manual iteration), built-in compression support (handle with gzip first), or you’re already using pandas for data analysis (read_json with lines=True is sufficient). Also skip if you need streaming from non-file sources like network sockets—you’ll need custom I/O logic that jsonlines doesn’t provide.
