// LATEST

Developer Tools

LangGround: Teaching Multi-Agent RL Systems to Communicate Like Humans Using GPT-4

By Rob Ragan ★ 17 Python Apr 3, 2026
Developer Tools

Inside oxp-python: A Stainless-Generated SDK That Proves Auto-Generation Has Grown Up

By Rob Ragan ★ 17 Python Apr 3, 2026
Developer Tools

SPORT: Teaching Multimodal Agents to Self-Improve Without Human Feedback

By Rob Ragan ★ 20 Python Apr 3, 2026
Developer Tools

Memoria: Why AI Agents Need More Than RAG to Remember

By Rob Ragan ★ 20 Jupyter Notebook Apr 3, 2026
Developer Tools

ELT-Bench: The First Realistic Benchmark for Evaluating AI Agents on Data Pipeline Automation

By Rob Ragan ★ 24 Python Apr 3, 2026
Developer Tools

IBProtector: Defending LLMs Against Jailbreaks Using Information Bottleneck Theory

By Rob Ragan ★ 27 Python Apr 3, 2026
Developer Tools

Building a Multi-Agent Pentesting System with AutoGen: When LLMs Orchestrate Security Workflows

By Rob Ragan ★ 28 Python Apr 3, 2026
Developer Tools

Teaching Web Agents to Think Before They Click: World Models for LLM-Based Navigation

By Rob Ragan ★ 29 Python Apr 3, 2026
Developer Tools

Building Production Claude Apps in Minutes: Inside Anthropic's TypeScript Quickstarts

By Rob Ragan ★ 31 TypeScript Apr 3, 2026
Developer Tools

CoGames: Building and Benchmarking Cooperative AI Agents for the Alignment League

By Rob Ragan ★ 31 Python Apr 3, 2026
Developer Tools

URET: IBM's Graph-Based Framework for Adversarial Testing Beyond Images

By Rob Ragan ★ 32 Jupyter Notebook Apr 3, 2026
Developer Tools

XSS-AGENT: When AI Takes the Wheel in Browser Exploitation

By Rob Ragan ★ 33 PHP Apr 3, 2026
Developer Tools

SRMT: Teaching Robot Swarms to Navigate Using Shared Attention Instead of Radios

By Rob Ragan ★ 34 Python Apr 3, 2026
Developer Tools

AIVSS: Adapting CVSS Vulnerability Scoring for the Age of AI Systems

By Rob Ragan ★ 38 Python Apr 3, 2026
Developer Tools

LLM-Check: Detecting Hallucinations by Analyzing What Language Models Think, Not Just What They Say

By Rob Ragan ★ 39 Jupyter Notebook Apr 3, 2026
Developer Tools

AgentSonar: Detecting Shadow AI Through Network Traffic Heuristics

By Rob Ragan ★ 39 Go Apr 3, 2026
Developer Tools

Building RAG Chat with Exa's Web Search API and OpenAI o3-mini

By Rob Ragan ★ 43 TypeScript Apr 3, 2026
Developer Tools

Vibe Security Radar: Tracking CVEs Caused by AI-Generated Code

By Rob Ragan ★ 44 Python Apr 3, 2026
Developer Tools

Shift: Teaching AI to Manipulate HTTP Traffic Like a Penetration Tester

By Rob Ragan ★ 46 TypeScript Apr 3, 2026
Developer Tools

Web-Shepherd: Training Process Reward Models to Guide Web Agents Through Long-Horizon Tasks

By Rob Ragan ★ 53 Python Apr 3, 2026
Developer Tools

PenGym: Training Reinforcement Learning Agents on Real Penetration Testing Infrastructure

By Rob Ragan ★ 56 Python Apr 3, 2026
Developer Tools

SEC-bench: Automated Benchmarking for LLM Agents on Real-World Security Vulnerabilities

By Rob Ragan ★ 63 Python Apr 3, 2026
Developer Tools

HackBench: The Security Benchmark Where LLMs Learn to Exploit Real Vulnerabilities

By Rob Ragan ★ 69 Rich Text Format Apr 3, 2026
Developer Tools

InterCode-CTF: How Simple Prompts Cracked 95% of Security Challenges (And What That Means for LLM Benchmarking)

By Rob Ragan ★ 70 Python Apr 3, 2026