Starlog — Page 54
// LATEST
LLM Engineering
BitNet: Running 100B Parameter Models on Your Laptop at Human Reading Speed
Developer Tools
Gitingest: Turn Any GitHub Repository Into LLM-Ready Text With a URL Trick
AI Dev Tools
ARTKIT: Why Enterprise Gen AI Testing Requires Adversarial Multi-Turn Conversations
LLM Engineering
LLM-Check: Detecting Hallucinations by Reading Your Model's Mind
LLM Engineering
Mapping LLM Safety as a Landscape: How Weight Perturbations Reveal the Fragility of Alignment
Developer Tools
Extracting Neural Network Weights Through Black-Box Queries: A Cryptanalytic Attack Framework
Developer Tools
Privacy Backdoors: When Pre-Trained Models Betray Your Training Data
LLM Engineering
VERL: The Hybrid-Controller Framework Reshaping How We Train LLMs with Reinforcement Learning
AI Agents
AgentBoard: Why LLM Agent Benchmarks Need Multi-Turn Analysis, Not Just Success Rates
Data & Knowledge
WrenAI: The Semantic Context Layer That Keeps LLMs From Wrecking Your Data Governance
Developer Tools
Marker: How a Multi-Stage CV Pipeline Achieves 25 Pages/Second PDF Parsing
AI Dev Tools
STRIDE GPT: How AI-Powered Threat Modeling Adapts to Agentic Systems
LLM Engineering
IB4LLMs: Using Information Bottleneck Theory to Build Jailbreak-Resistant Language Models
Developer Tools
URET: Adversarial Testing for ML Models Beyond Images
AI Agents
RedCode: The First Real Safety Benchmark for Autonomous Code Agents
AI Agents
LangGround: Teaching AI Agents to Coordinate Like Humans, Not Vectors
Developer Tools
Coderoller: Flattening Repositories Into LLM-Ready Markdown
LLM Engineering
SRMT: Teaching Robots to Share Their Thoughts Through Memory
Developer Tools
Sponge Poisoning: The Stealth Attack That Makes Neural Networks Energy Vampires
AI Agents
Teaching LLMs to Predict the Future: World Models for Web Agents
Developer Tools
NegMerge: Fixing Machine Unlearning's Hyperparameter Lottery Problem
Developer Tools
Best-of-N Jailbreaking: How Sampling Beats Sophistication in LLM Attacks
LLM Engineering
Building Resumable LLM Evaluations: A Template for Rate-Limited API Testing
AI Agents