ElastAlert: The Archived Python Framework That Taught Elasticsearch To Send Alerts

Hook

Before Elasticsearch had native alerting, Yelp’s engineering team needed a way to detect when things went wrong in their growing data streams—so they built a Python framework that could detect spikes, anomalies, and flatlines by asking the right questions at the right time.

Context

In the early 2010s, the ELK stack (Elasticsearch, Logstash, Kibana) became the de facto standard for log aggregation and search. Teams could visualize their data beautifully in Kibana dashboards, but there was a critical gap: no built-in way to get notified when patterns emerged in that data. You could see a traffic spike in Kibana after it happened, but you couldn’t get paged when it started.

Yelp faced this exact problem. As stated in their README, they were managing an ‘ever increasing amount of data and logs’ in Elasticsearch and quickly realized Kibana ‘needed a companion tool for alerting on inconsistencies.’ Commercial solutions weren’t flexible enough for their diverse use cases. So they built ElastAlert: a Python framework that periodically queries Elasticsearch, applies pattern-matching rules, and dispatches alerts to whatever systems teams already used. The principle was simple: ‘If you can see it in Kibana, ElastAlert can alert on it.’ It became popular enough to gather 8,000+ GitHub stars before Yelp archived it in favor of a community-maintained fork. While you shouldn’t use the original today, understanding its architecture reveals elegant solutions to time-series pattern detection that remain relevant.

Technical Insight

System architecture (auto-generated diagram): a rule scheduler loop loads YAML rule configs, executes queries against Elasticsearch indices, and tracks the last query time; a rule type engine (frequency/spike/flatline) runs the pattern matcher over raw events and persists to a state store; when a match is found, the alert dispatcher fans out to the configured alert outputs (email, Slack, PagerDuty).

ElastAlert’s core insight is that most alerting needs fit into a handful of temporal patterns. Instead of writing custom queries for each alert, you define rules using pre-built types that encapsulate common monitoring paradigms. A rule configuration is just YAML:

name: High Error Rate
type: frequency
index: logstash-*
num_events: 50
timeframe:
  minutes: 5
filter:
- term:
    level: "ERROR"
alert:
- email
email:
- ops-team@company.com

This rule checks Elasticsearch periodically, queries a time window of data, and alerts if 50+ ERROR-level events occurred. The frequency rule type handles the time-window logic, state tracking, and duplicate suppression automatically.
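To make the time-window logic concrete, here is a minimal sketch of the kind of sliding-window counting a frequency rule performs. The class and method names are illustrative, not ElastAlert's actual internals:

```python
from collections import deque

class FrequencyWindow:
    """Illustrative sliding-window counter in the spirit of a frequency
    rule: report a match when num_events occur within timeframe seconds."""

    def __init__(self, num_events, timeframe_seconds):
        self.num_events = num_events
        self.timeframe = timeframe_seconds
        self.timestamps = deque()  # event times, oldest first

    def add_event(self, ts):
        """Record one matching event; return True once the threshold is hit."""
        self.timestamps.append(ts)
        # Evict events that have fallen out of the time window.
        while self.timestamps and ts - self.timestamps[0] > self.timeframe:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.num_events

# Mirrors the rule above: 50 events within a 5-minute window.
window = FrequencyWindow(num_events=50, timeframe_seconds=300)
```

The deque makes eviction O(1) per event, which matters when a rule polls busy indices.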

Under the hood, ElastAlert runs a continuous polling loop. For each rule, it appears to maintain a cursor tracking the last queried timestamp, executes the Elasticsearch query for the time window, then passes results to the rule type’s matching logic. The spike rule type is particularly clever—it compares event counts between a reference window and the current window to detect sudden increases:

name: Traffic Spike Detection
type: spike
index: nginx-logs-*
timeframe:
  minutes: 15
threshold_ref: 2
threshold_cur: 3
spike_height: 3
spike_type: up
filter:
- term:
    status: 200

This detects when current request volume is 3x the baseline from 15 minutes ago. The architecture separates ‘what to query’ (the filter), ‘what pattern to detect’ (rule type), and ‘where to send alerts’ (alert handlers). Want to send the same spike alert to both Slack and PagerDuty? Just list both in the alert array—both are supported according to the README’s list of built-in alert types.
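A hedged sketch of the comparison a spike rule makes, using the parameter names from the config above. The minimum-count thresholds (threshold_ref, threshold_cur) exist so that a jump from, say, 0 to 1 event does not fire a noisy alert; the function itself is illustrative, not ElastAlert's code:

```python
def spike_matches(ref_count, cur_count, spike_height=3,
                  threshold_ref=2, threshold_cur=3, spike_type="up"):
    """Compare the current window's event count against the reference
    window; both must clear their minimum counts before the ratio test."""
    if ref_count < threshold_ref or cur_count < threshold_cur:
        return False
    if spike_type in ("up", "both") and cur_count >= ref_count * spike_height:
        return True
    if spike_type in ("down", "both") and cur_count * spike_height <= ref_count:
        return True
    return False
```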

The new_term rule type demonstrates sophisticated state management. It tracks unique values for a field and alerts when a previously unseen value appears—perfect for detecting new error messages or unauthorized access patterns:

name: New Error Types
type: new_term
index: app-logs-*
fields:
- error.type
alert:
- slack
slack_webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK"

ElastAlert appears to maintain internal state to track which terms it’s seen, when rules last ran, and any errors. This persistence enables it to survive restarts without losing context about historical patterns.
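The seen-terms bookkeeping might look something like the following sketch. The JSON-file persistence and class are hypothetical stand-ins (ElastAlert actually stores its state back into Elasticsearch), but the restart-survival property is the same:

```python
import json
import os

class NewTermTracker:
    """Illustrative seen-terms store: persists known values so the rule
    survives restarts without re-alerting on previously seen terms.
    The file format here is an assumption, not ElastAlert's schema."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.seen = set(json.load(f))
        else:
            self.seen = set()

    def check(self, term):
        """Return True (and persist) only the first time a term appears."""
        if term in self.seen:
            return False
        self.seen.add(term)
        with open(self.path, "w") as f:
            json.dump(sorted(self.seen), f)
        return True
```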

Extensibility is built in through Python classes. Custom rule types and alert handlers can be ‘easily imported or written’ according to the README. The framework handles all the querying, scheduling, and error handling—you just write the pattern-matching or notification logic.
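A custom rule type might look roughly like this. The base class below is a minimal stand-in for illustration; ElastAlert's real base class lives in the framework and differs in detail, and SlowQueryRule and its config keys are invented for this example:

```python
class RuleType:
    """Minimal stand-in for the framework's rule-type base class
    (an assumption for illustration, not ElastAlert's actual API)."""

    def __init__(self, rule_config):
        self.rules = rule_config   # parsed YAML rule config
        self.matches = []

    def add_match(self, event):
        self.matches.append(event)

class SlowQueryRule(RuleType):
    """Hypothetical custom rule: match any event whose duration_ms
    field exceeds a configurable limit."""

    def add_data(self, data):
        limit = self.rules.get("max_duration_ms", 1000)
        for event in data:
            if event.get("duration_ms", 0) > limit:
                self.add_match(event)
```

The appeal of this shape is that the custom class only sees pre-queried events; scheduling, querying, and alert dispatch stay in the framework.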

The cardinality rule type showcases another pattern: alerting on the number of unique values rather than event counts. This detects scenarios like ‘more than 100 unique users experienced errors’ or ‘fewer than 5 servers are reporting metrics’ (possible infrastructure failure). ElastAlert translates these high-level patterns into Elasticsearch aggregation queries, sparing you from writing complex DSL by hand.
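For a sense of what gets generated, here is the shape of an Elasticsearch query body a cardinality-style rule might produce, using the standard cardinality aggregation. The field names and the helper function are assumptions for illustration, not ElastAlert's actual output:

```python
# Count unique user.id values among ERROR events in the last 5 minutes.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"level": "ERROR"}},
                {"range": {"@timestamp": {"gte": "now-5m"}}},
            ]
        }
    },
    "aggs": {
        "unique_users": {"cardinality": {"field": "user.id"}}
    },
    "size": 0,  # we only need the aggregation, not the hits
}

def cardinality_matches(response, max_cardinality=100):
    """Alert when the unique count exceeds the configured ceiling."""
    value = response["aggregations"]["unique_users"]["value"]
    return value > max_cardinality
```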

Gotcha

The elephant in the room: this repository is officially archived and unmaintained. Yelp explicitly directs users to ElastAlert2, a community fork that continues development. The README’s first line states in bold: ‘ElastAlert is no longer maintained. Please use ElastAlert2 instead.’ Using the original version means no security patches, no compatibility updates for newer Elasticsearch versions, and no bug fixes.

Even in its active days, ElastAlert had architectural constraints. The polling model introduces inherent latency—you might not detect an issue immediately after it starts. This is acceptable for log analysis but potentially problematic for high-frequency metrics or SLA-sensitive systems. The file-based configuration also becomes unwieldy at scale. Managing hundreds of YAML files across teams, tracking changes, and handling secrets requires building tooling around ElastAlert. Third-party tools emerged to address this—the README mentions a Kibana plugin available at the ‘ElastAlert Kibana plugin repository’—but these are external dependencies with their own maintenance considerations. Finally, the single-process nature of the tool means that if the process fails, alerts stop, requiring external orchestration for high-availability setups.

Verdict

Skip this tool entirely for new projects: it is archived and deprecated. The only defensible use is a legacy deployment with complex custom rule types that cannot be migrated immediately, and even then you should plan the move to ElastAlert2 right away. The README explicitly states ‘Please use ElastAlert2 instead.’ ElastAlert2 maintains compatibility while supporting modern Elasticsearch versions and active development. For greenfield projects, also evaluate Elasticsearch’s native alerting capabilities if available in your license tier, or other modern observability platforms. ElastAlert’s architectural patterns—separating query definition from pattern matching from notification—remain instructive for anyone building event-driven systems, but the original implementation is a museum piece, not a production tool.
