Developer Tools
388 articles
Inside the Risk Bubble: How Princeton's Framework Measures AI Agent Capabilities in Offensive Security
RF-Agent: Using Monte Carlo Tree Search to Generate Reward Functions for Reinforcement Learning
ReasonRAG: Teaching RAG Systems to Think Step-by-Step with Process Rewards
NegMerge: Solving Machine Unlearning's Hyperparameter Problem Through Sign-Consensual Weight Merging
LangGround: Teaching Multi-Agent RL Systems to Communicate Like Humans Using GPT-4
Inside oxp-python: A Stainless-Generated SDK That Proves Auto-Generation Has Grown Up
SPORT: Teaching Multimodal Agents to Self-Improve Without Human Feedback
Memoria: Why AI Agents Need More Than RAG to Remember
ELT-Bench: The First Realistic Benchmark for Evaluating AI Agents on Data Pipeline Automation
IBProtector: Defending LLMs Against Jailbreaks Using Information Bottleneck Theory
Building a Multi-Agent Pentesting System with AutoGen: When LLMs Orchestrate Security Workflows
Teaching Web Agents to Think Before They Click: World Models for LLM-Based Navigation
Building Production Claude Apps in Minutes: Inside Anthropic's TypeScript Quickstarts
CoGames: Building and Benchmarking Cooperative AI Agents for the Alignment League
URET: IBM's Graph-Based Framework for Adversarial Testing Beyond Images
XSS-AGENT: When AI Takes the Wheel in Browser Exploitation
SRMT: Teaching Robot Swarms to Navigate Using Shared Attention Instead of Radios
AIVSS: Adapting CVSS Vulnerability Scoring for the Age of AI Systems
LLM-Check: Detecting Hallucinations by Analyzing What Language Models Think, Not Just What They Say
AgentSonar: Detecting Shadow AI Through Network Traffic Heuristics
Building RAG Chat with Exa's Web Search API and OpenAI o3-mini
Vibe Security Radar: Tracking CVEs Caused by AI-Generated Code
Shift: Teaching AI to Manipulate HTTP Traffic Like a Penetration Tester