
Cortex: A Compositional Framework for Recurrent Memory Stacks in PyTorch



Hook

What if building a mixture-of-experts LSTM stack was as simple as specifying a config, and swapping in mLSTM or attention-based memory required changing just one line?

Context

Modern AI systems—particularly agents and long-context applications—require sophisticated memory mechanisms. But PyTorch’s built-in recurrent modules conflate temporal computation with architectural patterns like residual connections, projections, and normalization. Want to add gating around your LSTM? You’re rewriting boilerplate. Want to stack multiple different recurrent mechanisms as routed experts? You’re managing state dictionaries by hand and wiring dimensions manually.

Cortex tackles this by separating concerns at the architecture level. It distinguishes between cores (the primitive stateful computation like LSTM or mLSTM) and scaffolds (the projection/gating/residual wrapper around cores). This separation enables compositional memory stacks where you can mix and match temporal mechanisms, add MoE-style routing between them, and nest them in deep hierarchies—all while maintaining a uniform stateful interface. The library also introduces ‘fabric,’ a biologically-inspired spatial recurrent substrate where cells communicate on a 2D/3D lattice rather than in sequential layers, offering an alternative paradigm for agent memory systems.

Technical Insight

Cortex’s architecture revolves around five key abstractions that compose cleanly:

- Cores: runtime stateful units like LSTMCore, mLSTMCore, or XLCore that handle recurrence or attention-style memory.
- Scaffolds: wrappers that add projection logic, gating, residuals, and normalization around a core (examples: PreUpScaffold, PostUpScaffold, AdapterScaffold).
- Cells: the public-facing combination of a scaffold and a core, expressed as a configuration object (like AxonCellConfig() or mLSTMCellConfig()).
- Columns: routed mixtures of expert scaffolds for MoE-style parallel execution.
- Stacks: layers of columns forming deep recurrent backbones.

The critical design insight is the uniform interface. Every abstraction—core, scaffold, column, stack—shares the same signature:

def forward(
    x: Tensor,
    state: TensorDict | None,
    *,
    resets: ResetMask | None = None,
) -> tuple[Tensor, TensorDict | None]:
    ...

Inputs are [B, T, H] for sequence mode or [B, H] for step mode. State is always a TensorDict, with nesting determined by abstraction level. Cores store flat state like {"h": ..., "c": ...}, scaffolds wrap core state under the core’s class name ({"LSTMCore": {"h": ..., "c": ...}}), and columns/stacks nest further. The optional resets mask handles episode boundaries, automatically propagating through the stack to each core. This uniformity means you can swap a single core for an entire multi-layer mixture-of-experts stack without changing the calling code.
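The nesting convention can be sketched in plain Python, using dicts in place of TensorDict and numbers in place of tensors. The class names echo the ones above, but the internals are purely illustrative, not Cortex’s implementation:

```python
# Illustrative sketch of the state-nesting convention: a core keeps flat
# state, and a scaffold nests that state under the core's class name.
# Plain dicts and floats stand in for TensorDict and tensors.

class LSTMCoreSketch:
    """A 'core' keeps flat state: {"h": ..., "c": ...}."""

    def forward(self, x, state=None, *, resets=None):
        if state is None or resets:      # fresh start or episode boundary
            state = {"h": 0.0, "c": 0.0}
        h = state["h"] + x               # placeholder recurrence, not a real LSTM
        c = state["c"] + x
        return h, {"h": h, "c": c}


class ScaffoldSketch:
    """A 'scaffold' wraps a core and exposes the same forward signature,
    nesting the core's state under the core's class name."""

    def __init__(self, core):
        self.core = core
        self.key = type(core).__name__

    def forward(self, x, state=None, *, resets=None):
        inner = (state or {}).get(self.key)
        y, inner = self.core.forward(x, inner, resets=resets)
        return y, {self.key: inner}


scaffold = ScaffoldSketch(LSTMCoreSketch())
y, state = scaffold.forward(1.0, None)
print(state)   # {'LSTMCoreSketch': {'h': 1.0, 'c': 1.0}}
```

Because the scaffold forwards the same (x, state, resets) signature it receives, a caller written against this interface cannot tell whether it is talking to a bare core or a wrapped one, which is exactly what makes the swap possible.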

Scaffolds also implement automatic hidden dimension inference. Instead of manually specifying every intermediate dimension, you provide d_hidden and the scaffold calculates projections, gates, and output sizes. This reduces configuration errors and accelerates prototyping. The library supports multiple backend implementations (Triton, CUDA, PyTorch) per core type, selectable via set_backend(), enabling performance optimization without architectural changes.
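As a rough illustration of what dimension inference buys you, here is a hypothetical helper that derives intermediate sizes from d_hidden alone. The expansion factor and field names are assumptions for the sketch, not Cortex’s actual defaults:

```python
# Hypothetical sketch of hidden-dimension inference: given only d_hidden,
# derive the intermediate sizes an up-projection scaffold would need.
# The expansion factor of 2 and the key names are illustrative assumptions.

def infer_dims(d_hidden: int, expansion: int = 2) -> dict:
    d_inner = d_hidden * expansion      # up-projected width fed to the core
    return {
        "d_in_proj": d_inner,           # input projection: d_hidden -> d_inner
        "d_gate": d_inner,              # gate operates at the inner width
        "d_out_proj": d_hidden,         # output projection back to d_hidden
    }

print(infer_dims(256))
# {'d_in_proj': 512, 'd_gate': 512, 'd_out_proj': 256}
```

In Cortex you supply only d_hidden and the scaffold performs the equivalent bookkeeping internally, so intermediate widths can never drift out of sync with each other.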

Beyond layered stacks, Cortex includes fabric, a radically different substrate. A fabric is a spatial recurrent medium: cells arranged on a 2D or 3D lattice with sparse local connectivity. Instead of information flowing layer by layer, cells communicate with nearby neighbors over multiple timesteps. The fabric anatomy defines the lattice shape, local radius, and boundary ports; cells are assigned from families like sLSTM or AxonCell. The visible state state["cells"], with shape [B, N, H_cell], persists across timesteps for message passing. The design is inspired by cortical patches: boundary cells act as sensory/readout interfaces, and information propagates spatially through the substrate. The fabric supports two execution modes. ‘Stream’ mode processes one timestep at a time and defines the canonical, exact semantics; ‘diffusion’ mode is a faster batched alternative whose results can diverge from exact streaming when propagation across the fabric over time matters, so choosing between them is a correctness-performance tradeoff.
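The local-connectivity idea is easy to make concrete. The sketch below is not part of Cortex’s API; it simply enumerates, for a 2D lattice of a given height, width, and local radius, which cell indices each cell can exchange messages with:

```python
# Sketch of the fabric's sparse local connectivity on a 2D lattice.
# Cells are numbered row-major; each cell hears only from neighbors
# within `radius` in both dimensions. Illustrative, not Cortex's API.

def lattice_neighbors(height: int, width: int, radius: int = 1) -> dict:
    def idx(r: int, c: int) -> int:
        return r * width + c            # row-major cell index

    neighbors = {}
    for r in range(height):
        for c in range(width):
            local = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    in_bounds = 0 <= rr < height and 0 <= cc < width
                    if (dr, dc) != (0, 0) and in_bounds:
                        local.append(idx(rr, cc))
            neighbors[idx(r, c)] = local
    return neighbors

nbrs = lattice_neighbors(3, 3, radius=1)
print(len(nbrs[4]))  # center cell of a 3x3 grid has 8 neighbors
```

Boundary cells (like index 0 here, with only 3 neighbors) are exactly the cells a fabric can expose as sensory/readout ports, since they sit on the edge of the substrate.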

The library’s modularity shines in routed mixtures. A Column wraps multiple expert scaffolds, each potentially using different cores (one expert might be mLSTM, another XL-style attention). A router selects and mixes their outputs. Stacks layer these columns, creating deep MoE memory systems. The state management is fully automatic—each expert’s state is tracked in the column’s TensorDict, indexed by scaffold type and position. This architectural flexibility lets researchers rapidly experiment with hybrid memory designs that would require substantial manual state-wrangling in vanilla PyTorch.
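A toy version of the routing pattern, with scalar functions standing in for scaffold-wrapped expert cores and hypothetical names throughout:

```python
import math

# Toy sketch of a routed column: a softmax router mixes the outputs of
# two "experts". Real Cortex experts are scaffold-wrapped cores carrying
# their own state; this only shows the output-mixing pattern.

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def column_forward(x, experts, router_scores):
    weights = softmax(router_scores)
    outputs = [expert(x) for expert in experts]
    # In Cortex, each expert's state would also be tracked under its own
    # key in the column's TensorDict; here we only mix the outputs.
    return sum(w * y for w, y in zip(weights, outputs))

experts = [lambda x: x + 1.0,               # stand-in for, say, an mLSTM expert
           lambda x: 2.0 * x]               # stand-in for an XL-style expert
y = column_forward(3.0, experts, router_scores=[0.0, 0.0])
print(y)  # equal weights: 0.5 * 4.0 + 0.5 * 6.0 = 5.0
```

The point of the pattern is that the router and the experts are independent: adding a third expert with a different temporal mechanism changes the expert list, not the calling code.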

Gotcha

Cortex’s biggest limitation is its early-stage maturity. With few adoption signals (6 stars on GitHub), you are adopting a project with a small user base to draw on for community support and shared experience. The README is the primary documentation, so advanced use cases beyond what it covers may require reading the source.

There’s a potential point of friction in the packaging: the library is published as cortexcore but imported as cortex. The README documents this explicitly (pip install cortexcore / import cortex), but it’s worth noting during initial setup. The fabric’s execution modes also deserve care: because ‘diffusion’ mode is not semantically identical to exact streaming when fabric propagation over time matters, applications that require deterministic, reproducible results should stick with ‘stream’ mode or verify that the difference is negligible for their workload. Finally, the library targets researchers prototyping novel memory architectures. If you just need a standard LSTM or GRU, PyTorch’s built-in modules come with far more extensive community resources and established usage patterns.

Verdict

Use Cortex if you’re a researcher or advanced practitioner building custom recurrent memory systems and need compositional modularity. It excels at rapid prototyping when you want to experiment with different temporal mechanisms (LSTM vs. mLSTM vs. attention-based cores), stack them in MoE configurations, or explore spatial recurrent architectures via fabrics. The clean separation of cores from scaffolds and the uniform stateful interface make it easy to swap components and build deep memory stacks without manual state-wrangling. It’s particularly valuable if you’re exploring hybrid agent memory designs that combine multiple recurrent mechanisms.

Skip Cortex if you need widely adopted components with extensive community resources and established usage patterns; its limited adoption (6 stars on GitHub) marks it as an early-stage project. Also skip it for standard LSTM/GRU applications where PyTorch’s built-in modules suffice: Cortex’s abstractions add complexity that is only justified when you need the compositional flexibility.
