System 2 Research: Mapping the Bridge Between Cognitive Science and LLM Reasoning

[ View on GitHub ]

Hook

While the AI community obsesses over scaling laws and parameter counts, a curated collection is quietly reminding us that the hardest problems in AI reasoning were identified decades ago—and the solutions might already exist in cognitive science literature we’ve forgotten.

Context

The rapid advancement of large language models has created a paradox: systems that can write poetry and pass bar exams still struggle with basic multi-step reasoning tasks that humans handle effortlessly. This gap exists because LLMs excel at pattern matching (what psychologist Daniel Kahneman popularized as “System 1” thinking—fast, intuitive, automatic) but fail at deliberative reasoning (“System 2”—slow, effortful, logical). The open-thought/system-2-research repository emerged as a response to this disconnect, serving as a bridge between classical cognitive architecture research and today’s cutting-edge LLM agent papers.

Unlike typical awesome-lists that sprawl across broad domains, this repository laser-focuses on one critical question: how do we build AI systems capable of genuine deliberative reasoning? It combines foundational cognitive architectures like SOAR and ACT-R with recent papers on self-improving agents and multi-agent architectures from 2025. The repository’s value lies not in code or tutorials, but in providing researchers and practitioners a structured entry point into decades of work that directly addresses current LLM limitations. For anyone building agent systems, understanding this lineage isn’t academic trivia—it’s essential context for avoiding reinventing wheels and recognizing which “novel” LLM techniques may be rediscoveries of cognitive science principles.

Technical Insight

[System architecture diagram, auto-generated: Classical Foundations (foundational concepts; SOAR, ACT-R, SPAUN; classical cognitive architectures; working memory & goal hierarchies) and Modern Research (ReAct, planning; self-rewarding, CoT; LLM-based agents; reasoning improvements; agent papers collection) converge on a System 2 reasoning framework, positioning the repository as a knowledge base for researchers.]

The repository’s architecture reveals a deliberate conceptual framework that maps the landscape of reasoning research. At the foundation sits the “Cognitive Architectures” section, featuring systems like SOAR (State, Operator, And Result) by John Laird and Allen Newell, ACT-R (Adaptive Control of Thought-Rational) by John Anderson, and SPAUN (Semantic Pointer Architecture Unified Network) by Chris Eliasmith. These aren’t mere historical curiosities—they represent working theories of how human cognition handles working memory, goal hierarchies, and procedural knowledge.
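To make the cognitive-architecture vocabulary concrete, here is a toy production-rule cycle in the spirit of SOAR and ACT-R: working memory is a set of facts, productions fire when their conditions are all present, and chained rules act as an implicit goal decomposition. All names and the rule format are illustrative sketches, not the real APIs of either system.

```python
def production_cycle(productions, working_memory, goal, max_steps=20):
    """Fire matching productions until the goal fact appears or no rule fires."""
    wm = set(working_memory)
    for _ in range(max_steps):
        if goal in wm:
            return wm
        fired = False
        for conditions, additions in productions:
            # A production is eligible if its conditions hold and it would
            # still add something new to working memory.
            if conditions <= wm and not additions <= wm:
                wm |= additions
                fired = True
                break
        if not fired:          # impasse: no production can fire
            break
    return wm

# A tiny two-level goal hierarchy: boil water, then steep tea.
productions = [
    ({"have-kettle", "have-water"}, {"hot-water"}),
    ({"hot-water", "have-teabag"}, {"tea"}),
]
wm = production_cycle(productions,
                      {"have-kettle", "have-water", "have-teabag"}, "tea")
print("tea" in wm)  # True
```

Real SOAR adds much more (operator proposal/selection, subgoaling on impasses, chunking), but the match-fire-repeat loop above is the core decision cycle the literature builds on.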

The transition to modern systems becomes visible in the “Agent Papers” section, which tracks the LLM-based agent explosion. Papers like “ReAct: Synergizing Reasoning and Acting in Language Models” explore how prompting strategies can enhance reasoning, while “Multi-agent Architecture Search via Agentic Supernet” (Feb 2025) shows the field evolving toward meta-level optimization. The repository captures a notable pattern: LLM agents appear to be converging on architectural patterns that cognitive architectures formalized decades ago. For example, “A Prefrontal Cortex-inspired Architecture for Planning in Large Language Models” explicitly draws from neuroscience to implement executive function in LLMs—effectively reimplementing concepts from the EPIC (Executive Process-Interactive Control) architecture.
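The ReAct pattern reduces to a small control loop: the model emits interleaved thoughts and actions, actions are executed against tools, and observations are fed back into the transcript. The sketch below shows that control flow with a hypothetical `llm` callable and a scripted stand-in for the model; the real paper uses free-form Thought/Action/Observation text rather than these prefixes.

```python
def react_loop(question, llm, tools, max_turns=5):
    """Minimal ReAct-style loop: act, observe, repeat until Finish."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(transcript)                  # model emits its next step
        transcript += step + "\n"
        if step.startswith("Finish:"):
            return step[len("Finish:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools[name](arg)      # execute the chosen tool
            transcript += f"Observation: {observation}\n"
    return None

# Scripted stand-in for the model, to show the shape of the exchange.
script = iter(["Action: lookup Fuji", "Finish: 3776 m"])
answer = react_loop("Height of Mt. Fuji?", lambda t: next(script),
                    {"lookup": lambda q: "Mt. Fuji is 3776 m tall."})
print(answer)  # 3776 m
```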

What makes this collection particularly valuable is its coverage of self-improvement mechanisms. Papers like “Self-Rewarding Language Models” and “Large Language Models Can Self-Improve At Web Agent Tasks” address the critical bottleneck of how agents evolve beyond their initial training. The repository also includes meta-level frameworks like “KIX: A Metacognitive Generalization Framework,” which attempts to formalize when and how agents should reflect on their own reasoning processes. While the repository doesn’t provide implementation guidance, it points to concrete resources—for instance, the ACE (Autonomous Cognitive Entity) framework links to daveshap/ACE_Framework on GitHub, which provides actual code for a six-layer cognitive architecture.
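The self-rewarding idea can be sketched as a data-collection loop: the same model (here stand-in `generate` and `judge` callables) produces candidate responses, scores them as its own judge, and keeps best-versus-worst pairs for later preference training. The function names, the pairing scheme, and the stubs below are assumptions for illustration, not the paper’s actual pipeline.

```python
def collect_preference_pairs(prompts, generate, judge, n_candidates=4):
    """Self-judge candidates and keep (prompt, chosen, rejected) triples."""
    pairs = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        scored = sorted(candidates, key=lambda c: judge(prompt, c))
        # Only keep a pair if the judge actually distinguishes the extremes.
        if judge(prompt, scored[-1]) > judge(prompt, scored[0]):
            pairs.append((prompt, scored[-1], scored[0]))
    return pairs

# Deterministic stubs: generation yields canned candidates, the "judge"
# scores by length, standing in for an LLM-as-judge rubric.
cands = iter(["ok", "a much better answer", "meh", "fine"])
pairs = collect_preference_pairs(["p"], lambda p: next(cands),
                                 lambda p, c: len(c))
print(pairs)  # [('p', 'a much better answer', 'ok')]
```

The resulting pairs would feed a preference-optimization step (DPO or similar), closing the loop in which the model improves on data it scored itself.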

The organizational scheme itself teaches a lesson about reasoning system design: cognitive architectures provide the structural blueprint (working memory, goal management, learning mechanisms), while agent papers demonstrate how to instantiate these patterns with LLMs. A developer building an agent for software engineering tasks could trace a path from foundational architectures through “Cognitive Architectures for Language Agents” to “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering”—seeing how established architectural concepts inform 2024 state-of-the-art systems. The repository doesn’t spell out these connections explicitly, requiring users to synthesize across papers, but the categorization makes these lineages discoverable.

Recent additions reveal active curation aligned with frontier research directions. Papers like “Trace is the New AutoDiff” and “TextGrad: Automatic ‘Differentiation’ via Text” represent a trend toward treating LLM agent workflows as differentiable computational graphs, enabling gradient-based optimization of reasoning chains. The “LLM Reasoning Improvements / Training on Synthetic Data” subsection includes papers like START (Self-taught Reasoner with Tools) and LADDER (Self-Improving LLMs Through Recursive Problem Decomposition), showing the collection’s coverage of both architectural and training-focused approaches.
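Stripped of the autodiff metaphor, the textual-gradient loop is: run the workflow, ask a critic model for feedback on the output (the “gradient”), and apply that feedback to revise the prompt (the “descent” step). The sketch below shows that loop with hypothetical `llm`, `critic`, and `apply_feedback` callables; TextGrad’s actual API differs.

```python
def textgrad_step(prompt, llm, critic, apply_feedback):
    """One optimization step: forward pass, textual feedback, revision."""
    output = llm(prompt)                       # forward pass
    feedback = critic(prompt, output)          # textual "gradient"
    return apply_feedback(prompt, feedback)    # "descent": revise the prompt

def optimize(prompt, llm, critic, apply_feedback, steps=3):
    for _ in range(steps):
        prompt = textgrad_step(prompt, llm, critic, apply_feedback)
    return prompt

# Stubs: a fixed critique, applied by naive concatenation.
result = optimize("Summarize X.",
                  llm=lambda p: "a long rambling output",
                  critic=lambda p, o: "be concise",
                  apply_feedback=lambda p, f: p + " " + f,
                  steps=2)
print(result)  # Summarize X. be concise be concise
```

In the real systems, `apply_feedback` is itself an LLM call that rewrites the prompt, and the feedback propagates through multi-step workflows rather than a single prompt.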

Gotcha

The repository’s strength—being a pure link collection—is also its primary limitation. Users must invest significant time reading full papers to assess relevance, as most entries lack summaries or critical analysis beyond titles and author information. The cognitive architectures section notes it is “looking for additional links & articles and summaries,” indicating acknowledged incompleteness in providing context. If you need to understand whether a particular architecture would benefit your specific use case, you’ll need to parse dense academic prose without guidance on how it compares to alternatives.

More critically, this is exclusively a reference collection with zero executable code, implementation guides, or tutorials. Papers like “The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery” sound immediately applicable, but the repository won’t help you translate concepts into working systems. Some entries link to GitHub repositories (ACE Framework links to daveshap/ACE_Framework, AI-Scientist links to SakanaAI/AI-Scientist), but many papers have no associated implementation references. The cognitive architectures section provides links to resources like Wikipedia pages and official sites, which offer varying levels of technical depth. For practitioners needing to ship agent systems on tight deadlines, this collection offers intellectual breadth at the cost of practical depth.

Verdict

Use if you’re conducting literature reviews for agent/reasoning research, designing novel agent architectures and need to understand what’s been tried before, or building a mental model of how classical AI and modern LLMs connect. This repository excels as a structured starting point that helps prevent reinventing established concepts—it shows you what’s been explored. It’s particularly valuable for research teams who can afford the time investment to trace conceptual lineages and synthesize across decades of work. Skip if you need working implementations, tutorials, or practical guides for building agents—this won’t help you debug a ReAct loop or optimize agent-computer interfaces. Also skip if you need comparative analysis or opinionated filtering; the repository presents links without indicating which papers represent dead ends versus foundational breakthroughs. For hands-on development, pair this with implementation-focused resources like LangChain/LlamaIndex documentation for patterns, or Papers with Code for papers with associated codebases.
