Elasticdump: The Swiss Army Knife for Elasticsearch and OpenSearch Migrations
Hook
Over 7,900 GitHub users have starred a JavaScript CLI tool for moving Elasticsearch data, yet its v6.1.0 upgrade introduced parallel batch processing that weakened document-ordering guarantees, a tradeoff many migration scripts may not anticipate.
Context
When you need to move data between Elasticsearch clusters, you face an uncomfortable choice: use Elasticsearch’s native snapshot/restore API (fast, reliable, but locked to compatible versions and shared filesystems) or cobble together scripts with the scroll and bulk APIs. Neither handles the common scenarios DevOps teams face daily—migrating between cloud providers, downgrading cluster versions for testing, exporting subsets of data based on queries, or maintaining separate disaster recovery copies in S3.
Elasticdump emerged as a widely-used solution because it treats data movement as a pipeline problem. Instead of assuming you’re moving from Elasticsearch to Elasticsearch within the same version family, it abstracts the transport layer entirely. Your input can be a production cluster, a JSON file, S3, or even stdin. Your output follows the same pattern. This architectural decision—treating clusters as just another I/O stream—makes elasticdump the tool you reach for when native solutions don’t fit. With OpenSearch support added in v6.76.0, it’s now compatible with both ecosystems.
Technical Insight
Elasticdump’s core is a Node.js streaming pipeline that wraps Elasticsearch’s scroll API for reading and bulk API for writing. When you run a dump command, it doesn’t load your entire index into memory. Instead, it scrolls through documents in batches (configurable via the --limit flag, which defaults to 100 documents per operation), transforms them if needed, and streams them to the output destination.
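When scroll round-trips dominate transfer time, raising the batch size is the first knob to try. A minimal sketch using the documented --limit flag; the hostnames and output path are placeholders:

```shell
# Pull 1,000 documents per scroll request instead of the default 100.
# production.es.com and /data/my_index.json are placeholder values.
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --limit=1000
```

Larger batches mean fewer scroll round-trips but more memory per batch on both ends, so the right value depends on document size.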
Here’s a complete index migration that preserves metadata:
# Step 1: Copy analyzers (must come first)
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer

# Step 2: Copy mappings (depends on analyzers)
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping

# Step 3: Copy data
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data
The --type flag determines what elasticdump extracts. Beyond data, you can dump mapping, analyzer, alias, and template objects. This granularity means you can reconstruct an index’s structure and content, even across version boundaries.
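For instance, aliases and templates can be captured with the same pattern. A sketch reusing the endpoints from the examples above, with assumed output paths:

```shell
# Dump alias definitions for the index to a file.
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_aliases.json \
  --type=alias

# Dump index templates the same way.
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_templates.json \
  --type=template
```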
The tool shines when you combine transports. Need to backup an index with compression? Pipe through gzip:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz
The --output=$ flag sends output to stdout, letting you chain with any Unix tool. For S3 direct integration, elasticdump supports native S3 URLs with credentials:
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input="s3://${bucket_name}/${file_name}.json" \
  --output=http://production.es.com:9200/my_index
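The reverse direction works the same way: a gzip backup can be decompressed and replayed using a file as the input. A minimal sketch, assuming the archive created above and a scratch path of your choosing:

```shell
# Decompress the backup to a scratch file (/tmp/my_index.json is assumed),
# then replay it into the target cluster with the file as input.
gunzip -c /data/my_index.json.gz > /tmp/my_index.json

elasticdump \
  --input=/tmp/my_index.json \
  --output=http://staging.es.com:9200/my_index \
  --type=data
```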
The architectural shift in v6.1.0 introduced overlapping promise processing for parallel operations. Previously, elasticdump waited for each batch to complete before fetching the next. Now it processes multiple batches concurrently, improving throughput on high-latency connections. The tradeoff: documents no longer arrive in scroll order. For most use cases (rebuilding indices, backups), order doesn’t matter. But if you’re migrating time-series data where insertion order affects internal document IDs or routing, this changes behavior.
Query-based exports showcase elasticdump’s flexibility beyond full dumps:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody='{"query":{"term":{"username":"admin"}}}'
You can also load the query from a file using --searchBody=@/data/searchbody.json, essential for complex queries that don’t fit in a command line.
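A sketch of the file-based form, with a hypothetical range query standing in for something too unwieldy for a command line:

```shell
# Write the query to the path referenced with the @ prefix.
# The range query is illustrative; any valid search body works here.
cat > /data/searchbody.json <<'EOF'
{"query":{"range":{"@timestamp":{"gte":"now-7d/d"}}}}
EOF

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=last_week.json \
  --searchBody=@/data/searchbody.json
```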
Gotcha
The version warnings in the README aren’t just formalities—they’re breaking changes. Files created with elasticdump v0.x won’t load with v1.0+. The format changed completely, and there’s no migration tool. If you have old backup files, you need to keep the old elasticdump version around or manually transform the JSON structure.
Version 6.1.0’s parallel processing improvement comes with a documented tradeoff: the ordering is no longer guaranteed. If your application logic depends on document order (common in time-series data, event sourcing, or when document IDs are auto-generated based on insertion sequence), this changes behavior. The documents arrive, but not in the order you expect.
The multi-step process for complete index migrations (analyzer → mapping → data) isn’t atomic. If your analyzer dump succeeds but the mapping dump fails halfway through, you’re left with a partially configured index. Elasticdump doesn’t track state or offer rollback. You need to script cleanup logic yourself or accept that failed migrations leave debris.
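One way to approximate rollback is a wrapper script that deletes the half-built target index when any step fails. A minimal sketch, assuming the staging endpoint from the examples above and curl for the cleanup call:

```shell
#!/usr/bin/env bash
# Run the analyzer -> mapping -> data sequence, aborting on the first failure
# and deleting the partially configured target index. Endpoints are placeholders.
set -euo pipefail

SRC="http://production.es.com:9200/my_index"
DST="http://staging.es.com:9200/my_index"

cleanup() {
  echo "migration failed; deleting partial index" >&2
  curl -s -XDELETE "$DST" > /dev/null || true
}
trap cleanup ERR

for type in analyzer mapping data; do
  elasticdump --input="$SRC" --output="$DST" --type="$type"
done

trap - ERR
echo "migration complete"
```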
Version 5.0.0 removed the s3Bucket and s3RecordKey parameters in favor of s3urls, breaking existing scripts. Version 2.0.0 removed bulk options entirely. Each major version tends to introduce breaking changes that require script updates.
Verdict
Use elasticdump if you’re migrating between different Elasticsearch versions, moving clusters across cloud providers, need query-filtered exports, or want a backup solution that works with both Elasticsearch and OpenSearch. Its 7,900+ stars reflect widespread adoption by ops teams dealing with real-world migration complexity, and the multi-transport support (cluster, file, S3, stdin/stdout) makes it highly flexible. Skip it if you need guaranteed document ordering in versions 6.1.0+ (the parallel processing changes this), require single-command atomic migrations (you’ll need to script the multi-step analyzer→mapping→data flow yourself), or you’re just doing same-version backups within a cluster (native snapshot/restore may be simpler). For very large datasets, test performance against your specific requirements, as the JavaScript runtime and JSON serialization may add overhead compared to native tools.