// LATEST
Developer Tools
LangGround: Teaching Multi-Agent RL Systems to Communicate Like Humans Using GPT-4
Developer Tools
Inside oxp-python: A Stainless-Generated SDK That Proves Auto-Generation Has Grown Up
Developer Tools
SPORT: Teaching Multimodal Agents to Self-Improve Without Human Feedback
Developer Tools
Memoria: Why AI Agents Need More Than RAG to Remember
Developer Tools
ELT-Bench: The First Realistic Benchmark for Evaluating AI Agents on Data Pipeline Automation
Developer Tools
IBProtector: Defending LLMs Against Jailbreaks Using Information Bottleneck Theory
Developer Tools
Building a Multi-Agent Pentesting System with AutoGen: When LLMs Orchestrate Security Workflows
Developer Tools
Teaching Web Agents to Think Before They Click: World Models for LLM-Based Navigation
Developer Tools
Building Production Claude Apps in Minutes: Inside Anthropic's TypeScript Quickstarts
Developer Tools
CoGames: Building and Benchmarking Cooperative AI Agents for the Alignment League
Developer Tools
URET: IBM's Graph-Based Framework for Adversarial Testing Beyond Images
Developer Tools
XSS-AGENT: When AI Takes the Wheel in Browser Exploitation
Developer Tools
SRMT: Teaching Robot Swarms to Navigate Using Shared Attention Instead of Radios
Developer Tools
AIVSS: Adapting CVSS Vulnerability Scoring for the Age of AI Systems
Developer Tools
LLM-Check: Detecting Hallucinations by Analyzing What Language Models Think, Not Just What They Say
Developer Tools
AgentSonar: Detecting Shadow AI Through Network Traffic Heuristics
Developer Tools
Building RAG Chat with Exa's Web Search API and OpenAI o3-mini
Developer Tools
Vibe Security Radar: Tracking CVEs Caused by AI-Generated Code
Developer Tools
Shift: Teaching AI to Manipulate HTTP Traffic Like a Penetration Tester
Developer Tools
Web-Shepherd: Training Process Reward Models to Guide Web Agents Through Long-Horizon Tasks
Developer Tools
PenGym: Training Reinforcement Learning Agents on Real Penetration Testing Infrastructure
Developer Tools
SEC-bench: Automated Benchmarking for LLM Agents on Real-World Security Vulnerabilities
Developer Tools
HackBench: The Security Benchmark Where LLMs Learn to Exploit Real Vulnerabilities