
Inside Cucumber's Polyglot Architecture: How Six Libraries Power BDD Across Languages


Hook

Most developers think of Cucumber as a single testing framework, but it’s actually six independent libraries that communicate through a JSON protocol—and that architectural choice changes everything about how BDD scales across languages.

Context

Behavior-Driven Development promised to bridge the gap between business stakeholders and developers by writing tests in plain English. Cucumber made that promise real with Gherkin, a human-readable language for test scenarios. But as development teams adopted different programming languages, they faced a dilemma: either maintain separate BDD tools for each language or find a way to share the core logic.

Cucumber chose the harder path: build a modular, polyglot architecture where parsing, pattern matching, and messaging components work identically across languages. The cucumber/common repository serves as the organizational hub for this distributed system. It’s not a code repository in the traditional sense—it contains no executable files. Instead, it’s a coordination point for issues that span the six core libraries that make up Cucumber’s architecture, each maintained in its own polyglot repository with implementations across multiple languages.

Technical Insight

[Figure: System architecture (auto-generated diagram). Components: .feature Files (Gherkin Syntax), Gherkin Parser, Cucumber Expressions (Step Matching), Test Runner (Any Language), Messages Protocol (JSON), Query API (Search Results), Reporters & Tools (Any Language), Gherkin Utils, Tag Expressions. Edge labels: parse, AST, matched steps, emits, stream, consume, manipulate, filter.]

Cucumber’s architecture separates concerns into six specialized libraries, each solving a distinct problem in the BDD workflow. The gherkin library parses .feature files—Gherkin’s human-readable test scenario format—into structured data that other tools can consume. The parser appears to be implemented across multiple languages, designed so a .feature file works unchanged across different language environments.
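To make "structured data" concrete, here is a toy, line-based sketch in Python. It is not the gherkin library and does not reproduce its AST; the function name parse_feature and the dictionary layout are assumptions made for illustration. Only the Feature:, Scenario:, and step keywords come from Gherkin itself.

```python
# Toy illustration only: the real gherkin library builds a much richer AST.
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")

def parse_feature(text):
    """Parse a very small subset of Gherkin into nested dictionaries."""
    document = {"feature": None, "scenarios": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("Feature:"):
            document["feature"] = line[len("Feature:"):].strip()
        elif line.startswith("Scenario:"):
            name = line[len("Scenario:"):].strip()
            document["scenarios"].append({"name": name, "steps": []})
        else:
            keyword, _, rest = line.partition(" ")
            # Steps are only attached once a scenario has been opened.
            if keyword in STEP_KEYWORDS and document["scenarios"]:
                document["scenarios"][-1]["steps"].append(
                    {"keyword": keyword, "text": rest}
                )
    return document

sample = """Feature: Shopping cart
  # a comment line
  Scenario: Add an item
    Given an empty cart
    When I add 1 apple
    Then the cart contains 1 item
"""

document = parse_feature(sample)
```

The point of the sketch is the shape of the output: once a .feature file is reduced to plain data like this, any downstream tool can consume it without caring which language produced it.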

The cucumber-expressions library handles pattern matching for connecting Gherkin steps to code. Rather than relying solely on regex patterns, it provides an expression syntax for defining step definitions with typed parameters, making step definitions less coupled to string parsing logic.
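A rough sketch of the idea, assuming an expression such as "I have {int} cucumbers" is translated into a regular expression whose captured groups are converted to typed values. This is a simplified stand-in, not the cucumber-expressions implementation; the PARAMETER_TYPES table and both function names are hypothetical.

```python
import re

# Hypothetical, simplified parameter types: name -> (regex, converter).
PARAMETER_TYPES = {
    "int": (r"-?\d+", int),
    "float": (r"-?\d*\.\d+", float),
    "word": (r"[^\s]+", str),
}

def compile_expression(expression):
    """Turn 'I have {int} cucumbers' into a regex plus a list of converters."""
    converters = []
    pattern_parts = []
    position = 0
    for placeholder in re.finditer(r"\{(\w+)\}", expression):
        regex, converter = PARAMETER_TYPES[placeholder.group(1)]
        # Literal text between placeholders is escaped so it matches verbatim.
        pattern_parts.append(re.escape(expression[position:placeholder.start()]))
        pattern_parts.append(f"({regex})")
        converters.append(converter)
        position = placeholder.end()
    pattern_parts.append(re.escape(expression[position:]))
    return re.compile("^" + "".join(pattern_parts) + "$"), converters

def match_step(expression, step_text):
    """Return the typed arguments if the step text matches, else None."""
    pattern, converters = compile_expression(expression)
    found = pattern.match(step_text)
    if found is None:
        return None
    return [convert(raw) for convert, raw in zip(converters, found.groups())]

arguments = match_step("I have {int} cucumbers", "I have 42 cucumbers")
```

Here the step definition receives an integer rather than the string "42", which is the decoupling from string parsing that the paragraph above describes.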

The breakthrough architectural decision is the messages library—a JSON-based protocol that enables language-agnostic communication. When Cucumber runs tests, it appears to emit standardized JSON messages describing test execution: which scenarios ran, which steps passed or failed, timing data, and error details. This allows test runners in one language to generate messages that reporting tools in another language can understand.
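As a minimal sketch of such a stream, assume each message is written as one JSON object per line, so a producer in one language and a consumer in another only need to agree on the JSON shapes. The envelope and field names below are simplified assumptions for illustration; consult the messages schema for the real definitions.

```python
import io
import json

def write_messages(stream, envelopes):
    """Serialize each envelope as one JSON object per line."""
    for envelope in envelopes:
        stream.write(json.dumps(envelope) + "\n")

def read_messages(stream):
    """Yield envelopes back from a line-delimited JSON stream."""
    for line in stream:
        if line.strip():  # tolerate blank lines
            yield json.loads(line)

# Illustrative envelopes; the shapes are simplified assumptions.
envelopes = [
    {"testRunStarted": {"timestamp": {"seconds": 0, "nanos": 0}}},
    {"testStepFinished": {"testStepId": "step-1",
                          "testStepResult": {"status": "PASSED"}}},
    {"testStepFinished": {"testStepId": "step-2",
                          "testStepResult": {"status": "FAILED",
                                             "message": "expected 3, got 2"}}},
    {"testRunFinished": {"success": False}},
]

# An in-memory buffer stands in for a file or pipe between two processes.
buffer = io.StringIO()
write_messages(buffer, envelopes)
buffer.seek(0)
received = list(read_messages(buffer))
```

Because the reader depends only on the JSON text, the writer could just as well be a test runner implemented in a different language.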

The query library provides an API for searching through message streams, letting tools extract information from test results without knowing which language generated them. The gherkin-utils library offers utilities for manipulating parsed Gherkin documents programmatically, useful for tools that generate or modify feature files.
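The kind of question a query layer answers can be sketched as follows. This is not the query library's API; the helper names and the message shapes (a testStepFinished envelope carrying a status) are assumptions chosen for illustration.

```python
from collections import Counter

def step_statuses(envelopes):
    """Yield (step id, status) for every finished-step message."""
    for envelope in envelopes:
        finished = envelope.get("testStepFinished")
        if finished is not None:
            yield finished["testStepId"], finished["testStepResult"]["status"]

def summarize(envelopes):
    """Count how many steps ended in each status."""
    return Counter(status for _, status in step_statuses(envelopes))

def failed_step_ids(envelopes):
    """List the ids of steps whose status is FAILED."""
    return [step_id for step_id, status in step_statuses(envelopes)
            if status == "FAILED"]

# Hypothetical messages, already decoded from a message stream.
sample = [
    {"testRunStarted": {}},
    {"testStepFinished": {"testStepId": "a", "testStepResult": {"status": "PASSED"}}},
    {"testStepFinished": {"testStepId": "b", "testStepResult": {"status": "FAILED"}}},
    {"testStepFinished": {"testStepId": "c", "testStepResult": {"status": "PASSED"}}},
    {"testRunFinished": {}},
]

summary = summarize(sample)
failures = failed_step_ids(sample)
```

Nothing in these helpers knows which runner produced the messages, which is the property the paragraph above attributes to the query library.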

Finally, the tag-expressions library parses filter expressions such as “@smoke and not @wip” to determine which scenarios to run. It implements expression parsing with defined precedence rules, so that filters behave consistently across implementations.
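To show why precedence matters, here is a small recursive-descent evaluator. It is a sketch, not the tag-expressions library: the evaluate function is hypothetical, and it assumes the conventional boolean ordering in which "not" binds tightest, then "and", then "or", with parentheses for grouping.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")
KEYWORDS = {"and", "or", "not", ")"}

def evaluate(expression, tags):
    """Evaluate a tag expression against a set of tags (assumed precedence)."""
    tokens = TOKEN.findall(expression)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        position += 1
        return tokens[position - 1]

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()  # always parse the right side
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():  # highest precedence
        if peek() == "not":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() is None or peek() in KEYWORDS:
            raise ValueError(f"unexpected token: {peek()!r}")
        token = advance()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        return token in tags

    result = parse_or()
    if position != len(tokens):
        raise ValueError(f"unexpected token: {peek()!r}")
    return result

runs = evaluate("@smoke and not @wip", {"@smoke"})
```

Under these assumed rules, "@a or @b and @c" groups as "@a or (@b and @c)", so two implementations that disagreed on precedence would select different scenarios from the same filter.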

What makes this polyglot approach work is consistency across implementations. The cucumber/common repository exists because some issues affect every implementation and need coordinated discussion across all the component libraries.

Gotcha

The cucumber/common repository itself contains no executable code. If you clone it expecting to find Cucumber’s source code, you’ll be disappointed—it’s purely organizational infrastructure for cross-cutting issues. This can be confusing for new contributors who need to navigate to the six different component repositories to find the actual code they want to modify. The README provides links, but there’s no master repository where all the implementation pieces come together.

The polyglot architecture, while elegant in theory, likely creates a maintenance burden in practice. When a new feature is added to cucumber-expressions, it has to be implemented in each supported language, and feature parity across implementations appears to be an ongoing challenge. If you contribute to Cucumber, expect that even a seemingly isolated change may touch several language codebases. The modular design also means there’s no single “Cucumber repository” to star, watch, or contribute to; engagement is spread across the six component repositories, which may make it harder to build a unified community.

Verdict

Use if: You’re reporting an issue that affects multiple Cucumber components and you’re unsure which specific library is responsible, or you want to understand Cucumber’s overall architecture before diving into implementation details. This repository is the right starting point for cross-cutting concerns.

Skip if: You need actual code, implementation examples, language-specific documentation, or you’re ready to contribute code changes. Head directly to the component-specific repositories (cucumber-expressions, gherkin, messages, tag-expressions, query, or gherkin-utils) where the actual implementation work happens. Also skip if you’re building a simple BDD setup for a single-language project: you don’t need to understand this organizational structure to use Cucumber effectively.
