// LATEST

Automation

Web-Shepherd: Teaching AI Agents to Navigate the Web With Interpretable Checklists

By Rob Ragan ★ 53 Python Feb 16, 2026
Cybersecurity

FuzzForge AI: When Your AI Assistant Becomes a Security Engineer

By Rob Ragan ★ 698 Python Feb 16, 2026
LLM Engineering

TOON: The Data Format That Makes LLMs 40% Cheaper to Feed

By Rob Ragan ★ 22.6k TypeScript Feb 16, 2026
Cybersecurity

BountyBench: Testing Whether AI Can Actually Hunt Vulnerabilities

By Rob Ragan ★ 64 Jupyter Notebook Feb 16, 2026
Cybersecurity

Dynamic Risk Assessment: Teaching AI Hackers When to Stop Before They Cause Real Damage

By Rob Ragan ★ 8 Python Feb 16, 2026
Cybersecurity

WASP: The First Security Benchmark That Proves Your AI Agent Can Be Hijacked

By Rob Ragan ★ 68 Python Feb 16, 2026
Data & Knowledge

ReasonRAG: Why Process Rewards Beat Outcome Rewards in Agentic Retrieval

By Rob Ragan ★ 9 Python Feb 16, 2026
AI Agents

GUARDIAN: Detecting When Your AI Agents Start Gaslighting Each Other

By Rob Ragan ★ 3 Python Feb 16, 2026
AI Agents

MCP Context Forge: Building an Enterprise Gateway for the Model Context Protocol

By Rob Ragan ★ 3.3k Python Feb 16, 2026
AI Agents

AgentAuditor: The AI Safety Tool Hiding Behind Academic Peer Review

By Rob Ragan ★ 4 Unknown Feb 16, 2026
AI Dev Tools

RAPTOR: Turning Claude Code Into a Security Research Agent With .claude.md Files

By Rob Ragan ★ 1.1k Python Feb 16, 2026
AI Agents

Happy: Turn Claude Code Into a Mobile-Controlled AI Agent With E2E Encryption

By Rob Ragan ★ 12.1k TypeScript Feb 16, 2026
AI Agents

TheAgentCompany: The First Benchmark That Makes AI Agents Get a Real Job

By Rob Ragan ★ 640 Python Feb 16, 2026
AI Agents

TTI: Training Web Agents That Get Smarter By Learning From Their Own Mistakes

By Rob Ragan ★ 72 Python Feb 16, 2026
AI Agents

AGI SDK: Building a Benchmark Where AI Agents Actually Shop on Amazon (Sort Of)

By Rob Ragan ★ 431 Python Feb 16, 2026
AI Agents

SPORT-Agents: Teaching Multimodal AI to Learn from Its Own Mistakes

By Rob Ragan ★ 19 Python Feb 16, 2026
AI Agents

RF-Agent: Teaching Language Models to Design Reward Functions Through Tree Search

By Rob Ragan ★ 3 Jupyter Notebook Feb 16, 2026
Cybersecurity

SEC-bench: Automated Benchmarking for LLM Security Agents in Real-World Vulnerability Scenarios

By Rob Ragan ★ 55 Python Feb 16, 2026
AI Agents

Superpowers: Teaching AI Agents to Stop Cowboy Coding

By Rob Ragan ★ 52.3k Shell Feb 16, 2026
Automation

Stagehand: The Browser Automation Framework That Writes Its Own Selectors

By Rob Ragan ★ 21.1k TypeScript Feb 16, 2026
Infrastructure

Steel Browser: The Missing Infrastructure Layer for AI Agent Automation

By Rob Ragan ★ 6.4k TypeScript Feb 16, 2026
Cybersecurity

HackingBuddyGPT: Teaching LLMs to Escalate Privileges in 50 Lines of Code

By Rob Ragan ★ 951 Python Feb 16, 2026
Cybersecurity

ARTEMIS: When AI Agents Hunt for Zero-Days While You Sleep

By Rob Ragan ★ 378 Rust Feb 16, 2026
AI Agents

LatentMAS: How Skipping Token Generation Makes Multi-Agent Systems 7× Faster

By Rob Ragan ★ 764 Python Feb 16, 2026