Back to Articles

Hyperscan: How Intel Matches 50,000 Regex Patterns at 10+ Gigabits Per Second

[ View on GitHub ]

Hyperscan: How Intel Matches 50,000 Regex Patterns at 10+ Gigabits Per Second

Hook

While most regex engines struggle to process more than a few hundred megabits per second, Hyperscan can scan network traffic at 10+ Gbps while simultaneously matching 50,000 patterns. The secret isn't just clever algorithms—it's hardware-aware compilation.

Context

Network security has always been an arms race between throughput and inspection depth. Traditional intrusion detection systems face a brutal tradeoff: either inspect traffic against comprehensive signature databases and become a bottleneck, or limit pattern matching to maintain line-rate speeds and miss threats. A typical Snort deployment might maintain 30,000+ rules for detecting malware, exploits, and policy violations, but conventional regex engines like PCRE evaluate patterns sequentially—checking each one individually against the data stream. At 10 Gbps, you have roughly 1.2 nanoseconds per byte. There's simply no time for sequential evaluation.

Intel developed Hyperscan specifically for this use case: deep packet inspection at wire speed. Rather than treating regex matching as a software problem, Hyperscan approaches it as a hardware acceleration challenge. The library compiles multiple patterns into hybrid automata structures optimized for Intel's SIMD instruction sets, enabling parallel evaluation of thousands of patterns simultaneously. It's used in production by major security vendors, software-defined networking platforms, and anywhere else that needs to ask "does this byte stream match any of these 10,000 patterns?" without becoming the performance bottleneck.

Technical Insight

Hyperscan's architecture rests on a deceptively simple insight: different regex patterns have different performance characteristics, and a one-size-fits-all approach leaves performance on the table. The library employs a compile-analyze-optimize pipeline that examines your pattern set and selects from multiple matching engines based on pattern complexity.

At compilation time, Hyperscan decomposes patterns into a graph of matching components. Simple literal strings might use a modified Aho-Corasick algorithm accelerated with SIMD instructions. Patterns with bounded repetition get compiled into DFA structures that can process multiple bytes per state transition. Complex patterns fall back to NFA evaluation, but even this path is optimized with SIMD parallel state evaluation. The genius is that a single scan can execute multiple engines simultaneously—literals processed with vector instructions while NFAs track more complex states.

Here's what a basic Hyperscan integration looks like:

#include <hs/hs.h>

// Compile multiple patterns into a database
const char *patterns[] = {
    "(GET|POST) /admin",
    "<script[^>]*>.*?</script>",
    "(?i)password\\s*=\\s*['\"]\\w+['\"]" 
};
unsigned int ids[] = {1, 2, 3};
unsigned int flags[] = {
    HS_FLAG_CASELESS,
    HS_FLAG_DOTALL | HS_FLAG_MULTILINE,
    0
};

hs_database_t *database;
hs_compile_error_t *compile_err;

if (hs_compile_multi(patterns, flags, ids, 3, 
                     HS_MODE_STREAM, NULL, 
                     &database, &compile_err) != HS_SUCCESS) {
    fprintf(stderr, "Compilation failed: %s\n", compile_err->message);
    hs_free_compile_error(compile_err);
    return -1;
}

// Create a scratch space for matching (reusable across scans)
hs_scratch_t *scratch = NULL;
hs_alloc_scratch(database, &scratch);

// Matching callback invoked for each match
int match_handler(unsigned int id, unsigned long long from,
                  unsigned long long to, unsigned int flags, void *ctx) {
    printf("Pattern %u matched at [%llu, %llu]\n", id, from, to);
    return 0; // Continue matching
}

// Open a stream for scanning continuous data
hs_stream_t *stream;
hs_open_stream(database, 0, &stream);

// Scan data chunks as they arrive
char *chunk1 = "GET /admin HTTP/1.1\r\n";
hs_scan_stream(stream, chunk1, strlen(chunk1), 0, scratch, match_handler, NULL);

char *chunk2 = "Host: vulnerable.site\r\n";
hs_scan_stream(stream, chunk2, strlen(chunk2), 0, scratch, match_handler, NULL);

// Close stream and cleanup
hs_close_stream(stream, scratch, match_handler, NULL);
hs_free_scratch(scratch);
hs_free_database(database);

The streaming API is crucial for network applications. Unlike traditional regex engines that require complete input strings, Hyperscan maintains state across scan calls, allowing you to feed it packet-sized chunks as they arrive. The library handles all the complexity of tracking partial matches across chunk boundaries.

The SIMD acceleration becomes visible when you examine pattern matching performance. For literal string patterns, Hyperscan uses vector instructions to compare 16-32 bytes simultaneously against multiple patterns. Instead of checking if byte[0] matches pattern A, then pattern B, then pattern C sequentially, it loads patterns into vector registers and performs parallel comparisons. With AVX-512, that's 64 bytes checked against multiple patterns in a single instruction.

The compilation phase is where Hyperscan makes its fundamental tradeoff. A database compiled from 10,000 patterns might consume 50-100MB of memory and take 30 seconds to compile, but once compiled, that database enables microsecond-latency scans across gigabytes of data. The pattern database is read-only and thread-safe, so typical architectures compile once at startup (or cache the compiled database to disk) and share it across worker threads.

Hyperscan also supports advanced features like approximate matching (matching with edit distance), logical combinations (pattern A AND pattern B must both match), and start-of-match reporting for tracking where matches began, not just where they ended. The HS_FLAG_SOM_LEFTMOST flag enables this, though it adds performance overhead—another example of Hyperscan's philosophy of making you explicitly opt into features with performance costs.

Gotcha

The x86-64 dependency isn't just a minor portability issue—it's architectural. Hyperscan's performance relies fundamentally on Intel SIMD instructions, and while a community fork called Vectorscan has added ARM NEON support, you're moving away from Intel's maintained codebase. If your infrastructure runs on ARM servers (increasingly common in cloud environments), you'll need to carefully evaluate whether Vectorscan meets your needs or accept significantly degraded performance.

Pattern compilation can become a serious operational challenge with large signature sets. I've seen production systems where compiling 40,000 patterns requires 8GB of RAM and 2-3 minutes of CPU time. This makes dynamic pattern updates problematic—you can't just reload patterns on the fly without careful architecture planning. The typical solution is blue-green pattern databases: compile the new database in a background process, then atomically swap it in. But this doubles your memory footprint during the transition. Budget for 3-4x your compiled database size in available RAM during updates. Also be aware that Hyperscan doesn't support all PCRE constructs. Backreferences are the most notable omission—patterns like (\w+)\s+\1 (matching a repeated word) won't compile. The library will reject these patterns at compilation time with clear error messages, but if you're migrating from a PCRE-based system, you'll need to audit your pattern set for unsupported features.

Verdict

Use if: You're building network security infrastructure, DPI systems, log analysis platforms, or any application that needs to match dozens to thousands of regex patterns simultaneously against high-throughput data on x86-64 hardware. The compilation overhead is acceptable when patterns change infrequently (hours to days), and you need the absolute best runtime performance. It's particularly compelling when you're currently bottlenecked on pattern matching with traditional engines.

Skip if: You're matching single patterns or just a handful of rules where PCRE2 or RE2 would suffice. You need full PCRE compatibility including backreferences. You're deploying on ARM architecture without accepting the complexity of Vectorscan. Your patterns change frequently (multiple times per minute) and you can't absorb the compilation latency. You're doing application-level text processing where the pattern set is small and throughput requirements are measured in megabytes, not gigabits. For these cases, you're paying Hyperscan's complexity tax without receiving the performance benefits that justify it.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/developer-tools/intel-hyperscan.svg)](https://starlog.is/api/badge-click/developer-tools/intel-hyperscan)