
ClauDEX: Building a Deterministic Control Plane for Probabilistic AI Agents

Hook

Your AI coding assistant will eventually ignore your instructions—not out of malice, but because prompts degrade under context pressure. What if instead of asking nicely, you enforced constraints mechanically?

Context

The AI coding assistant landscape has exploded with tools like GitHub Copilot, Cursor, and Claude Code promising to accelerate development. But there's a fundamental problem: these systems are probabilistic at their core. You can prompt Claude to "never modify database schemas" or "always write tests," but as context windows fill and task complexity increases, prompt-based constraints erode. The model forgets, hallucinates, or reinterprets instructions in creative ways you didn't anticipate.

This isn't a failure of the models—it's their nature. LLMs are prediction engines, not rule executors. The industry's response has largely been better prompts, larger context windows, and fine-tuning. But juanandresgs/claude-ctrl takes a radically different approach: treat AI agents as untrusted probabilistic systems that require a deterministic enforcement layer. ClauDEX (the project's internal name) wraps Claude Code with a typed Python runtime, SQLite state management, and git hooks that mechanically enforce policies before code ever hits your repository. It's not about making Claude smarter—it's about making the system Claude operates within predictably safe.

Technical Insight

ClauDEX's architecture revolves around three core primitives: hooks as event adapters, a policy engine as the decision boundary, and SQLite as the single source of truth. When Claude Code attempts an operation (running a bash command, executing a git operation, modifying a file), a hook intercepts the event and normalizes it into a standard format. The normalized event is passed to the cc-policy CLI, which consults the policy engine and queries SQLite state to return an allow, deny, or warn decision. This isn't post-hoc analysis: enforcement happens in real time, before the operation proceeds.
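To make the normalization step concrete, here is a minimal sketch of an event adapter. It assumes the hook receives a JSON payload with `tool_name`, `tool_input`, and `cwd` keys; `NormalizedEvent` and `normalize_hook_payload` are hypothetical names chosen for illustration, not ClauDEX's actual API.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NormalizedEvent:
    """Tool-agnostic event shape handed to the policy layer (hypothetical)."""
    event_type: str
    agent_role: str
    target: str
    cwd: str


def normalize_hook_payload(raw: str, agent_role: str) -> NormalizedEvent:
    """Map a raw JSON hook payload onto the normalized event shape."""
    payload: dict[str, Any] = json.loads(raw)
    tool = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})
    target = tool_input.get("command", "")

    if tool == "Bash" and target.lstrip().startswith("git "):
        event_type = "git_operation"
    elif tool == "Bash":
        event_type = "bash_command"
    elif tool in {"Write", "Edit"}:
        event_type = "file_write"
        target = tool_input.get("file_path", "")
    else:
        # Anything the adapter does not recognize is passed on as "unknown"
        # so the policy layer can decide how strict to be.
        event_type = "unknown"

    return NormalizedEvent(event_type, agent_role, target, payload.get("cwd", ""))


raw = json.dumps({
    "tool_name": "Bash",
    "tool_input": {"command": "git commit -m 'wip'"},
    "cwd": "/tmp/feature-123",
})
event = normalize_hook_payload(raw, agent_role="implementer")
print(event)
```

Keeping the adapter this thin means the policy layer never sees tool-specific payloads, which is the property that would make it possible to swap in a different hook source later.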

Here's what a simplified policy check looks like in practice:

# Hook intercepts Claude attempting to run a command
# cc-policy evaluates against current system state

import sys

from claudex.policy import PolicyEngine, EventType
from claudex.state import StateManager

engine = PolicyEngine(db_path=".claudex/state.db")
state = StateManager()

# Normalized event from hook
event = {
    "type": EventType.GIT_COMMIT,
    "agent_role": "implementer",
    "target_branch": "main",
    "worktree": "/tmp/feature-123"
}

# Policy evaluation with first-deny-wins semantics
decision = engine.evaluate(event, state.get_context())

if decision.action == "deny":
    print(f"BLOCKED: {decision.reason}")
    sys.exit(1)
elif decision.action == "warn":
    print(f"WARNING: {decision.reason}")
    # Log to audit trail but allow
    state.record_warning(event, decision)

The policy engine uses first-deny-wins evaluation: a single violated constraint blocks the entire operation, regardless of what other rules would allow. Rather than maintaining one flat allow-list or deny-list, ClauDEX composes rules as explicit capability grants per role. The Implementer role can write to isolated worktrees but cannot commit directly to main. The Reviewer role can approve changes but cannot modify code. The Codex role is entirely read-only: a CLI critic that reviews code for convergence without the ability to change anything.
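First-deny-wins semantics can be sketched in a few lines. The rule functions, the `Decision` type, and the 500-line warning threshold below are all assumptions made for illustration; they mirror the role constraints described above but are not taken from the ClauDEX codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    action: str  # "allow", "warn", or "deny"
    reason: str = ""


Rule = Callable[[dict], Optional[Decision]]


def warn_on_large_diff(event: dict) -> Optional[Decision]:
    if event.get("lines_changed", 0) > 500:
        return Decision("warn", "diff exceeds 500 lines")
    return None


def no_direct_commit_to_main(event: dict) -> Optional[Decision]:
    if (
        event.get("type") == "git_commit"
        and event.get("target_branch") == "main"
        and event.get("agent_role") != "guardian"
    ):
        return Decision("deny", "only the guardian may land changes on main")
    return None


def codex_is_read_only(event: dict) -> Optional[Decision]:
    if event.get("agent_role") == "codex" and event.get("type") in {"git_commit", "file_write"}:
        return Decision("deny", "codex role is read-only")
    return None


RULES: list[Rule] = [warn_on_large_diff, no_direct_commit_to_main, codex_is_read_only]


def evaluate(event: dict, rules: list[Rule]) -> Decision:
    """Return the first deny; otherwise the first warning; otherwise allow."""
    warning: Optional[Decision] = None
    for rule in rules:
        decision = rule(event)
        if decision is None:
            continue
        if decision.action == "deny":
            return decision  # first deny wins: stop evaluating immediately
        if decision.action == "warn" and warning is None:
            warning = decision
    return warning or Decision("allow")


event = {
    "type": "git_commit",
    "agent_role": "implementer",
    "target_branch": "main",
    "lines_changed": 900,
}
print(evaluate(event, RULES))
```

Note that the warning rule runs first here and still loses to the later deny: a warning is only returned when no rule in the list denies.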

This role-based isolation creates a multi-agent workflow loop that's both auditable and mechanically enforced. The Planner defines scope and creates structured task definitions stored in SQLite. The Guardian provisions isolated git worktrees—separate filesystem trees attached to the same repository, preventing cross-contamination between concurrent tasks. The Implementer writes code within its worktree prison. The Codex agent reviews for convergence by checking structured completion records: "Does this implementation match the plan? Are tests passing? Is the diff reasonable?" The Reviewer makes the final human-or-AI validation decision. Finally, the Guardian lands approved changes by merging worktrees back to the main branch.
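One way to picture this loop is as a small state machine in which each transition belongs to exactly one role. The stage names and the `advance` helper are hypothetical, and the sketch models only the happy path; a real workflow would presumably also need transitions for rework when a review fails.

```python
from enum import Enum


class Stage(Enum):
    PLANNED = "planned"          # Planner has defined the task
    PROVISIONED = "provisioned"  # Guardian has created the worktree
    IMPLEMENTED = "implemented"  # Implementer has written the code
    CONVERGED = "converged"      # Codex has checked it against the plan
    APPROVED = "approved"        # Reviewer has signed off
    LANDED = "landed"            # Guardian has merged the worktree


# Each stage maps to the single role allowed to move it forward.
TRANSITIONS: dict[Stage, tuple[str, Stage]] = {
    Stage.PLANNED: ("guardian", Stage.PROVISIONED),
    Stage.PROVISIONED: ("implementer", Stage.IMPLEMENTED),
    Stage.IMPLEMENTED: ("codex", Stage.CONVERGED),
    Stage.CONVERGED: ("reviewer", Stage.APPROVED),
    Stage.APPROVED: ("guardian", Stage.LANDED),
}


def advance(stage: Stage, role: str) -> Stage:
    """Move a task to its next stage, refusing roles that lack the capability."""
    if stage not in TRANSITIONS:
        raise ValueError(f"{stage.value} is a terminal stage")
    required_role, next_stage = TRANSITIONS[stage]
    if role != required_role:
        raise PermissionError(
            f"{role} cannot advance a task from {stage.value}; requires {required_role}"
        )
    return next_stage


stage = Stage.PLANNED
for role in ["guardian", "implementer", "codex", "reviewer", "guardian"]:
    stage = advance(stage, role)
print(stage)
```

Encoding the role in the transition table, rather than in each agent's prompt, is what makes the boundary mechanical: an Implementer asking to land its own work fails with an error instead of relying on the model to remember the rule.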

Each state transition—task created, worktree provisioned, code committed, review completed—gets recorded in SQLite with timestamps, agent identifiers, and event metadata. This creates a complete audit trail of what every agent attempted, what was allowed, what was blocked, and why. You can query this history to understand system behavior over time:

-- Find all denied operations in the last 24 hours
SELECT agent_role, event_type, reason, timestamp 
FROM policy_decisions 
WHERE action = 'deny' 
  AND timestamp > datetime('now', '-1 day')
ORDER BY timestamp DESC;

-- Analyze which agents trigger the most warnings
SELECT agent_role, COUNT(*) as warning_count
FROM policy_decisions
WHERE action = 'warn'
GROUP BY agent_role
ORDER BY warning_count DESC;
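The queries above imply a `policy_decisions` table with at least `agent_role`, `event_type`, `action`, `reason`, and `timestamp` columns. The sketch below builds such a table with Python's standard `sqlite3` module; the exact schema, the `CHECK` constraint, and the `record_decision` helper are assumptions inferred from those queries, not the project's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for demonstration only
conn.execute(
    """
    CREATE TABLE policy_decisions (
        id         INTEGER PRIMARY KEY,
        agent_role TEXT NOT NULL,
        event_type TEXT NOT NULL,
        action     TEXT NOT NULL CHECK (action IN ('allow', 'warn', 'deny')),
        reason     TEXT,
        timestamp  TEXT NOT NULL DEFAULT (datetime('now'))
    )
    """
)


def record_decision(conn: sqlite3.Connection, agent_role: str,
                    event_type: str, action: str, reason: str = "") -> None:
    """Append one decision to the audit trail inside a transaction."""
    with conn:
        conn.execute(
            "INSERT INTO policy_decisions (agent_role, event_type, action, reason) "
            "VALUES (?, ?, ?, ?)",
            (agent_role, event_type, action, reason),
        )


record_decision(conn, "implementer", "git_commit", "deny", "direct commit to main")
record_decision(conn, "implementer", "file_write", "warn", "large diff")
record_decision(conn, "reviewer", "review", "allow")

denied = conn.execute(
    "SELECT agent_role, event_type, reason FROM policy_decisions "
    "WHERE action = 'deny' AND timestamp > datetime('now', '-1 day') "
    "ORDER BY timestamp DESC"
).fetchall()
print(denied)
```

Storing timestamps in SQLite's default `datetime('now')` text format keeps them comparable as strings, which is why the `timestamp > datetime('now', '-1 day')` filter works without any conversion.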

The system implements what the documentation calls "cybernetic governance"—Self-Evaluating Self-Adaptive Programs (SESAP). This isn't just marketing jargon; it's a specific architectural pattern where the control plane observes system behavior, compares it against expected patterns, and adapts enforcement policies based on actual outcomes. If the Implementer repeatedly triggers convergence failures on a specific task type, the Planner can adjust how it decomposes similar tasks in the future. If certain git operations consistently require Guardian intervention, policy rules can be tightened proactively.
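As a rough illustration of that feedback loop, the sketch below flags task types whose convergence failures reach a threshold, which is the kind of signal a planner could act on. The function name, the `(task_type, converged)` record shape, and the threshold of three are all hypothetical; the source does not describe how ClauDEX actually implements adaptation.

```python
from collections import Counter
from typing import Iterable


def task_types_needing_adjustment(
    outcomes: Iterable[tuple[str, bool]], max_failures: int = 3
) -> list[str]:
    """Return task types whose convergence failures reach the threshold.

    Each outcome is a (task_type, converged) pair, for example as read
    back from an audit trail of completed reviews.
    """
    failures = Counter(task_type for task_type, converged in outcomes if not converged)
    return sorted(t for t, count in failures.items() if count >= max_failures)


history = [
    ("schema_migration", False),
    ("schema_migration", False),
    ("schema_migration", False),
    ("refactor", False),
    ("refactor", True),
    ("bugfix", True),
]
print(task_types_needing_adjustment(history))
```

The point of the sketch is only the shape of the loop: observed outcomes feed a deterministic check, and the result changes how future work is planned.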

The v5.0 release introduced abstraction layers to reduce coupling to Claude Code's specific hook system, but the fundamental model remains: hooks as the enforcement surface, typed Python runtime for policy evaluation, SQLite for state persistence, and multi-agent orchestration with explicit capability boundaries. The project is self-hosting, meaning ClauDEX's own development uses ClauDEX as the control plane—a form of dogfooding that validates whether the governance model actually works for complex software projects.

Gotcha

The biggest limitation is tight coupling to Claude Code's ecosystem. Despite v5.0's abstraction efforts, porting ClauDEX to work with Cursor, Aider, or GPT-Engineer would require reimplementing substantial portions of the event adapter layer. Each AI coding tool has different hook points, execution models, and state management approaches. The git hook system is flexible, but not every tool even uses git as its primary coordination mechanism.

Operational overhead is significant. You're running a Python runtime, maintaining a SQLite database, managing multiple git worktrees, and orchestrating multi-agent workflows for what might be simple code changes. A one-line bug fix goes through Planner → Guardian → Implementer → Codex → Reviewer → Guardian instead of just making the edit. For small projects or rapid prototyping, this is overkill.

The learning curve is steep: the documentation assumes familiarity with systems thinking, cybernetic control theory, and git worktree mechanics. Terms like "capability isolation," "convergence critics," and "self-evaluating programs" create conceptual barriers for developers who just want safer AI assistance. The philosophical framing, while intellectually interesting, can obscure practical usage patterns for newcomers who need concrete examples before abstract theory.

Verdict

Use ClauDEX if: you're building production systems where AI coding agents need deterministic safety guarantees, you're working on regulated codebases where audit trails matter more than velocity, you're researching governance frameworks for autonomous AI systems, or you're a systems thinker who values mechanical correctness over prompt engineering folklore. Skip if: you need simple AI pair programming without architectural overhead, you're using non-Claude coding assistants and don't want to build custom adapters, your projects prioritize iteration speed over enforcement rigor, or you prefer lightweight prompt strategies to complex multi-agent orchestration. This is a power tool for teams treating AI agents as untrusted components in safety-critical systems—not a drop-in replacement for Copilot.
