Building Chain-of-Thought Reasoning for Any LLM in Two Files

Hook

What if you could make a local 9B parameter model solve reasoning puzzles it normally fails—using nothing but clever API orchestration and minimal code?

Context

When OpenAI released its o1 models with extended reasoning capabilities, it showcased how chain-of-thought (CoT) prompting could dramatically improve LLM performance on complex tasks. Models that would confidently miscount the letter ‘r’ in ‘strawberry’ could suddenly solve the problem when given space to ‘think’ through intermediate steps. But this capability seemed locked behind proprietary APIs and expensive tokens.

ReflectionAnyLLM emerged from a Reddit thread where developer antibitcoin shared their approach after receiving a threatening email from OpenAI about API usage patterns. The project demonstrates a counterintuitive insight: you don’t need special model training or proprietary reasoning modes to implement chain-of-thought—you just need to orchestrate multiple API calls with the right prompt structure. This two-file application (one HTML frontend, one PHP backend) proves that any LLM supporting OpenAI’s API format can be retrofitted with iterative reasoning, whether it’s running locally via LM Studio, remotely through Groq, or on any other compatible endpoint.

Technical Insight

[Figure: system architecture — auto-generated. User Browser → HTML Frontend (index.html, 30-message chat history) → POST request → PHP Backend (chat.php) → OpenAI-compatible LLM API. Step 1 sends the initial query and yields Thought 1; Step 2 adds the previous thought and yields Thought 2; Steps 3–10 continue the iterative reasoning. The final answer plus reasoning chain returns to the frontend, which displays a summary with a collapsible thought process.]

The architecture is deceptively simple. The frontend sends a user query to chat.php, which orchestrates up to 10 sequential reasoning steps. Each API call builds on the previous thought, and the LLM itself decides when it has reasoned enough to provide a final answer.
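The orchestration loop can be sketched as follows. This is a hypothetical Python rendering of the pattern (the project itself implements it in chat.php); the endpoint URL, model name, system prompt, and the `FINAL:` stop marker are all assumptions, not the project's actual wording:

```python
import json
import urllib.request

API_URL = "http://localhost:1234/v1/chat/completions"  # any OpenAI-compatible endpoint (assumption)
API_KEY = "not-needed"          # local servers often ignore the key
MAX_STEPS = 10                  # the same hard cap the README describes

def call_llm(messages):
    """One synchronous request to an OpenAI-compatible chat endpoint."""
    body = json.dumps({"model": "local-model", "messages": messages}).encode()
    req = urllib.request.Request(
        API_URL, data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def reason(query, llm=call_llm, max_steps=MAX_STEPS):
    """Iterate up to max_steps thoughts; the model signals when it is done."""
    thoughts = []
    for _ in range(max_steps):
        messages = [
            {"role": "system",
             "content": "Reason step by step. Begin your reply with "
                        "FINAL: once you are confident in the answer."},
            {"role": "user", "content": query},
        ] + [{"role": "assistant", "content": t} for t in thoughts]
        thought = llm(messages)
        thoughts.append(thought)
        if thought.strip().startswith("FINAL:"):  # model decided it has reasoned enough
            break
    return thoughts
```

Because each step re-sends the accumulated thoughts as assistant messages, the model always reasons over its own frozen prior output, which is the whole trick.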

The README describes the reasoning process working through dynamic steps determined by the LLM itself, with a configurable limit of 10 steps that can be increased by editing chat.php. The iterative approach allows the model to reference its own previous reasoning steps. When solving “How many R’s are in strawberry?”—a problem the README specifically mentions—models larger than 8 billion parameters can break down their thinking across multiple steps rather than rushing to an incorrect answer.

The frontend manages chat history with a 30-message sliding window to maintain concise history and optimize performance, as documented in the README. Each reasoning chain is collapsible in the UI—the full thought process is hidden by default, with only a summary displayed for quick reference. This keeps the interface clean while preserving transparency.
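The 30-message sliding window amounts to truncating on every append. A minimal sketch (the real project does this in the frontend's JavaScript; the helper name and message shape here are hypothetical):

```python
MAX_HISTORY = 30  # window size documented in the README

def append_message(history, role, content, limit=MAX_HISTORY):
    """Append a message, then drop the oldest entries beyond the limit."""
    history.append({"role": role, "content": content})
    return history[-limit:]
```

Keeping the window bounded also caps the token cost of each reasoning request, since the history is re-sent with every call.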

What makes this particularly elegant is the provider-agnostic design. The chat.php file requires configuration of the API endpoint URL and API key. Switching from a local LM Studio instance to Groq’s cloud API or OpenRouter requires changing these settings. The OpenAI-compatible API standard has created an unexpected benefit: reasoning systems built for one provider work universally.
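Provider switching then reduces to two values. The base URLs below are the providers' documented OpenAI-compatible endpoints; the table structure and helper are illustrative, not the project's actual config format:

```python
# Swapping providers changes only these two settings; the request
# payload stays identical because all three speak the OpenAI API shape.
PROVIDERS = {
    "lm_studio":  {"base_url": "http://localhost:1234/v1",       "api_key": "not-needed"},
    "groq":       {"base_url": "https://api.groq.com/openai/v1", "api_key": "gsk-..."},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1",   "api_key": "sk-or-..."},
}

def chat_endpoint(provider):
    """Full chat-completions URL for a configured provider."""
    return PROVIDERS[provider]["base_url"] + "/chat/completions"
```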

The project demonstrates prompt engineering patterns that maximize reasoning quality through forced iteration. Rather than asking the LLM to “think step by step” in a single prompt, the sequential API calls create actual checkpoints where each thought must be coherent enough to inform the next step. This external scaffolding compensates for models that haven’t been explicitly trained for chain-of-thought reasoning.
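The difference from single-shot "think step by step" is that each checkpoint is a separate completion whose output is frozen and fed back as context. A hypothetical sketch of per-step prompt assembly (the project's actual prompt wording lives in chat.php and is not quoted here):

```python
def step_prompt(query, previous_thoughts):
    """Build the prompt for one reasoning checkpoint.

    Each prior thought is immutable text the model must build on,
    rather than tokens it can silently revise mid-generation.
    """
    lines = [f"Problem: {query}"]
    for i, thought in enumerate(previous_thoughts, 1):
        lines.append(f"Thought {i}: {thought}")
    lines.append("Continue with the next thought, or give the final answer.")
    return "\n".join(lines)
```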

One implementation detail: the README indicates the project requires PHP with the curl extension enabled for handling API requests. The architecture appears to use synchronous processing, which simplifies implementation but means users wait for all reasoning steps to complete before seeing output.

Gotcha

The README is admirably honest about this being a prototype, and the limitations are significant. The backend code is explicitly described as “not secured” and “not suitable for production environments.” The developer warns it was created quickly to demonstrate basic functionality and lacks robust security practices. Exposing chat.php directly to the internet would be risky.

The 10-step reasoning cap is hard-coded: raising it means editing chat.php rather than changing a configuration value. For genuinely complex problems that need deeper thought chains (mathematical proofs, multi-step code debugging), you will hit this ceiling.

The architecture appears to lack streaming support based on the simple two-file structure, meaning users likely get no feedback during potentially long waits for reasoning to complete. On a slow local model or rate-limited API, this creates a black-box experience. The README’s online demo warning notes it may “occasionally stop working if usage limits are exceeded,” suggesting the system doesn’t handle rate limiting gracefully.

The project keeps track of the last 30 messages according to the README, but the storage mechanism isn’t specified. The simple structure (just index.html and chat.php with no database requirement mentioned) suggests sessions may not persist across browser restarts, though the README does mention a downloadable chat history feature in JSON format.
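The downloadable JSON export needs no database; serializing the in-memory window is enough. A sketch with hypothetical field names (the README confirms only that the format is JSON):

```python
import json

def export_history(history):
    """Serialize the in-memory message window for download as JSON."""
    return json.dumps({"messages": history}, indent=2)
```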

The README explicitly notes this was “developed as a quick demo” and is a “prototype” that is “by no means polished or optimized,” setting appropriate expectations for code quality and feature completeness.

Verdict

Use if you’re exploring how chain-of-thought reasoning works under the hood, want a minimal starting point for building your own reasoning wrapper around local LLMs, or need a quick proof-of-concept that demonstrates iterative prompting techniques without framework overhead. This is ideal for educational purposes, understanding the mechanics of reasoning systems, or rapid prototyping where you control the entire environment. The project successfully demonstrates that models larger than 8B parameters can solve problems like the “strawberry count” using this approach.

Skip if you need production-grade security (the developer explicitly warns against production use), sophisticated error handling, or reasoning chains deeper than 10 steps without code modification. Also skip if you’re looking for a complete application rather than a learning tool—frameworks like LangChain or LlamaIndex provide far more robust implementations of similar concepts with proper abstractions, though at the cost of much greater complexity.

The project serves its stated purpose as a lightweight proof-of-concept but requires significant hardening for real-world deployment.
