GibberLink: When AI Agents Decide to Stop Speaking English
Hook
What happens when two AI agents realize they're wasting time speaking English to each other? They switch to squealing data tones, and you can hear the exact moment it happens.
Context
Most multi-agent systems communicate through APIs, message queues, or WebSockets—invisible plumbing that developers configure explicitly. Agents don't choose how they talk to each other; we do. But as conversational AI agents become more autonomous, an interesting question emerges: if two agents are designed to speak naturally to humans, what happens when they encounter each other?
GibberLink explores this question with a deliberately theatrical approach. Created by PennyroyalTea and winner of the ElevenLabs x a16z hackathon, this TypeScript project connects two ElevenLabs conversational AI agents and prompts them to detect when they're speaking with another AI. Once both agents confirm they're non-human, they negotiate a protocol switch from natural English to ggwave, a data-over-sound transmission protocol that encodes information as audible tones. The result is both technically clever and viscerally strange: you hear two agents chatting normally, then suddenly switching to R2-D2-style beeps and chirps. It's a proof-of-concept that went viral precisely because it makes the invisible visible, and audible.
Technical Insight
GibberLink's architecture hinges on three components: ElevenLabs conversational agents, prompt engineering for agent detection, and ggwave for the protocol switch. The system doesn't use explicit state machines or hardcoded triggers. Instead, it relies on the agents' ability to recognize conversational patterns and act on implicit instructions embedded in their system prompts.
The agents are initialized with prompts that establish their behavior: engage in natural conversation, but if you determine you're speaking with another AI agent, suggest switching to a more efficient protocol. This creates an emergent behavior where agents must both recognize the situation and coordinate the switch. The TypeScript implementation manages the audio streams, bridges the ElevenLabs API with the ggwave library, and handles the protocol transition.
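To make the prompt-driven setup concrete, here is a minimal sketch of how such a system prompt might be assembled. The wording, the `AgentConfig` shape, and the `buildAgentConfig` helper are all hypothetical; the project's actual prompts may read quite differently.

```typescript
// Hypothetical prompt text and config shape, for illustration only.
interface AgentConfig {
  name: string;
  systemPrompt: string;
}

function buildAgentConfig(name: string, role: string): AgentConfig {
  const systemPrompt = [
    `You are ${name}, ${role}.`,
    "Converse naturally in English with whoever you are speaking to.",
    "If you become confident that the other party is also an AI agent,",
    "say so and propose switching to a more efficient protocol.",
    "Only switch after the other party explicitly agrees.",
  ].join(" ");
  return { name, systemPrompt };
}

// Two agents with different roles but the same switching instruction.
const caller = buildAgentConfig("Caller", "an assistant booking a hotel room");
const receptionist = buildAgentConfig("Receptionist", "a hotel front-desk assistant");
```

Both agents carry the same switching instruction, so either side can be the one to notice and propose the change.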
Here's a simplified conceptual flow of how the protocol switching works:
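// Hypothetical interfaces standing in for the ElevenLabs agent and the ggwave
// bindings; these are illustrative shapes, not the real SDK types.
interface AgentResponse {
  transcript: string;
  audio: Buffer;
}

interface ConversationalAgent {
  respond(audio: Buffer): Promise<AgentResponse>;
}

interface GGWaveEncoder {
  encode(data: string): Buffer;
  decode(audio: Buffer): Promise<string>;
}

class AgentCommunication {
  private mode: 'natural' | 'ggwave' = 'natural';

  constructor(
    private readonly ggwaveEncoder: GGWaveEncoder,
    private readonly elevenLabsAgent: ConversationalAgent,
  ) {}

  async processResponse(audioInput: Buffer): Promise<Buffer> {
    if (this.mode === 'natural') {
      // Agent receives audio, processes through ElevenLabs
      const response = await this.elevenLabsAgent.respond(audioInput);
      // Check if agent has indicated readiness to switch
      if (this.detectSwitchSignal(response.transcript)) {
        console.log('Agent detected AI counterpart, switching to ggwave');
        this.mode = 'ggwave';
        return this.initiateGGWaveHandshake();
      }
      return response.audio;
    } else {
      // Decode ggwave audio to data
      const data = await this.ggwaveEncoder.decode(audioInput);
      const processedData = this.processAIMessage(data);
      // Encode response back to ggwave audio
      return this.ggwaveEncoder.encode(processedData);
    }
  }

  private detectSwitchSignal(transcript: string): boolean {
    // Pattern matching for agent recognition phrases
    const aiDetectionPhrases = [
      'you are also an ai',
      'we are both agents',
      'switch to efficient protocol'
    ];
    const normalized = transcript.toLowerCase();
    return aiDetectionPhrases.some(phrase => normalized.includes(phrase));
  }

  // Placeholder: announce the switch as the first ggwave-encoded message
  private initiateGGWaveHandshake(): Buffer {
    return this.ggwaveEncoder.encode('SWITCH_ACK');
  }

  // Placeholder: a real agent would generate a reply here; this one echoes
  private processAIMessage(data: string): string {
    return data;
  }
}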
The ggwave integration is where things get interesting. Originally designed for transmitting small amounts of data through speakers and microphones (think AirDrop but audible), ggwave uses a multi-frequency frequency-shift keying (FSK) scheme, and its documentation quotes throughput of roughly 8 to 16 bytes per second depending on protocol parameters. In GibberLink's context, the agents' messages are passed through a ggwave encoder, which converts them into audible data tones. The receiving agent decodes these tones back into data.
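The idea behind frequency-shift keying can be sketched in a few lines: each chunk of data selects a tone, and the tones are played in sequence. The toy encoder below is not ggwave's real protocol. The sample rate, base frequency, tone spacing, and symbol duration are assumed values chosen for readability, and it sends one tone at a time with no framing or error correction.

```typescript
// Toy FSK encoder. NOT ggwave's actual frequencies, framing, or protocol.
const SAMPLE_RATE = 48000;   // samples per second (assumed)
const BASE_FREQ_HZ = 1000;   // tone used for the 4-bit value 0 (assumed)
const FREQ_STEP_HZ = 100;    // spacing between adjacent tones (assumed)
const SYMBOL_SECONDS = 0.05; // duration of each tone (assumed)

// Split each byte into two 4-bit values and map each one to a tone frequency.
function bytesToFrequencies(data: Uint8Array): number[] {
  const freqs: number[] = [];
  for (let i = 0; i < data.length; i++) {
    const high = data[i] >> 4;
    const low = data[i] & 0x0f;
    freqs.push(BASE_FREQ_HZ + high * FREQ_STEP_HZ);
    freqs.push(BASE_FREQ_HZ + low * FREQ_STEP_HZ);
  }
  return freqs;
}

// Render the tone sequence as PCM samples in the range [-1, 1].
function synthesize(freqs: number[]): Float32Array {
  const samplesPerSymbol = Math.round(SAMPLE_RATE * SYMBOL_SECONDS);
  const out = new Float32Array(freqs.length * samplesPerSymbol);
  freqs.forEach((freq, i) => {
    for (let n = 0; n < samplesPerSymbol; n++) {
      out[i * samplesPerSymbol + n] = Math.sin((2 * Math.PI * freq * n) / SAMPLE_RATE);
    }
  });
  return out;
}

// The byte 0x41 ("A") becomes the values 4 and 1, i.e. tones at 1400 Hz and 1100 Hz.
const tones = bytesToFrequencies(new Uint8Array([0x41]));
const pcm = synthesize(tones);
```

With these assumed parameters each byte takes 0.1 seconds to play, which gives a feel for why data-over-sound throughput is measured in bytes rather than kilobytes per second.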
What makes this architecturally noteworthy isn't the sound transmission itself—ggwave is mature and well-documented. It's the autonomous negotiation. The agents aren't following a hardcoded "if AI detected, then switch protocol" rule. They're interpreting conversational context, confirming mutual understanding, and coordinating a behavioral change. This demonstrates prompt engineering as protocol negotiation, where the communication layer emerges from agent behavior rather than explicit routing logic.
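The mutual-confirmation pattern described here can be written out explicitly, purely to make the steps concrete. GibberLink as described leaves this coordination to the prompts, so the `ProtocolNegotiator` class and its signal names below are hypothetical and not taken from the project.

```typescript
type Mode = "natural" | "ggwave";
type Signal = "propose" | "accept" | "none";

// Hypothetical helper: tracks whether both sides have agreed to switch.
class ProtocolNegotiator {
  mode: Mode = "natural";
  private proposedByMe = false;

  // Called when this agent's own reply proposes a switch.
  propose(): void {
    if (this.mode === "natural") this.proposedByMe = true;
  }

  // Called with the signal detected in the peer's last turn.
  // Returns the signal this side should send back.
  onPeerSignal(signal: Signal): Signal {
    if (this.mode === "ggwave") return "none";
    if (signal === "propose") {
      // The peer asked first: agree and switch.
      this.mode = "ggwave";
      return "accept";
    }
    if (signal === "accept" && this.proposedByMe) {
      // The peer agreed to our earlier proposal: switch.
      this.mode = "ggwave";
    }
    return "none";
  }
}

// Agent A proposes, agent B accepts, and both end up in ggwave mode.
const agentA = new ProtocolNegotiator();
const agentB = new ProtocolNegotiator();
agentA.propose();
const reply = agentB.onPeerSignal("propose");
agentA.onPeerSignal(reply);
```

Note that an unsolicited "accept" does nothing: a side switches only if it proposed first or was asked, which is the property that keeps one confused agent from changing protocol on its own.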
The system's transparency is another architectural choice worth highlighting. Unlike traditional agent communication that happens through encrypted APIs or binary message formats, GibberLink's protocol switch is deliberately audible. You can record the conversation, run it through a ggwave decoder, and see exactly what the agents are transmitting. This makes the system valuable as an educational tool—you're not trusting logs or debug output; you're hearing the communication channel itself.
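As a rough illustration of what decoding audible tones involves, the sketch below uses the Goertzel algorithm to measure how much energy a block of recorded samples contains at each candidate frequency, then picks the strongest. This is not ggwave's decoder; the function names and the candidate frequencies in the usage note are assumptions for illustration.

```typescript
// Goertzel algorithm: signal power at one target frequency within a block of samples.
function goertzelPower(samples: Float32Array, targetHz: number, sampleRate: number): number {
  const omega = (2 * Math.PI * targetHz) / sampleRate;
  const coeff = 2 * Math.cos(omega);
  let s1 = 0;
  let s2 = 0;
  for (let i = 0; i < samples.length; i++) {
    const s0 = samples[i] + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Return the candidate frequency with the most energy in the block.
function detectTone(samples: Float32Array, candidatesHz: number[], sampleRate: number): number {
  if (candidatesHz.length === 0) throw new Error("no candidate frequencies given");
  let best = candidatesHz[0];
  let bestPower = -Infinity;
  for (let i = 0; i < candidatesHz.length; i++) {
    const power = goertzelPower(samples, candidatesHz[i], sampleRate);
    if (power > bestPower) {
      bestPower = power;
      best = candidatesHz[i];
    }
  }
  return best;
}
```

Given a block of samples and a list of candidates such as 1000 Hz to 2500 Hz in 100 Hz steps, `detectTone` returns the frequency that dominates the block. A real decoder would also need synchronization, framing, and error correction on top of this.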
The limitation of this architecture is also its feature: bandwidth. At the 8 to 16 bytes per second that ggwave's documentation quotes, a 100-byte message takes roughly 6 to 13 seconds to send. That's fine for demonstrating protocol switching or sending small structured messages, but it's nowhere near practical for production agent communication. The choice to use sound as the medium is conceptual, not performant: it trades efficiency for demonstration value.
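The bandwidth arithmetic is simple enough to check directly. The sketch below computes how long a message takes to send at a given throughput; the 300-byte message size and the list of rates are illustrative inputs rather than measurements, and the calculation ignores framing and error-correction overhead.

```typescript
// Seconds needed to send a message of a given size at a given throughput.
function transmitSeconds(messageBytes: number, bytesPerSecond: number): number {
  if (bytesPerSecond <= 0) throw new RangeError("bytesPerSecond must be positive");
  return messageBytes / bytesPerSecond;
}

// Convert a raw bit rate to bytes per second (8 bits per byte, no overhead).
function bitsToBytesPerSecond(bitsPerSecond: number): number {
  return bitsPerSecond / 8;
}

const messageBytes = 300;   // e.g. a small JSON payload (illustrative size)
const rates = [8, 16, 150]; // bytes per second (illustrative inputs)
const durations = rates.map((rate) => transmitSeconds(messageBytes, rate));
// durations is [37.5, 18.75, 2]
```

Even at the fastest rate in this list, anything beyond a few hundred bytes means seconds of audible tones, which is acceptable for a demo and impractical for routine agent traffic.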
Gotcha
GibberLink is a hackathon project that became a viral demo, and it shows. The repository is light on documentation, implementation details are sparse, and there's no clear path to extending or deploying it in any meaningful context. This isn't criticism—it achieves exactly what it set out to do—but developers expecting a framework or library will be disappointed.
The dependency on ElevenLabs conversational AI means you're working with a proprietary service that has API costs and rate limits. You can't run this locally without active API credentials, and the agents' ability to detect each other and negotiate the switch depends entirely on the quality and consistency of ElevenLabs' models. Prompt engineering is notoriously fragile; what works today might fail tomorrow if the underlying model changes.
Additionally, the sound-based protocol introduces real-world constraints: background noise, audio quality, and speaker/microphone limitations all affect reliability in ways that traditional network protocols rarely encounter, and acoustic transmission adds latency of its own. If you're imagining production agent swarms coordinating over ggwave, you're thinking about a fundamentally different (and far more complex) problem than what GibberLink demonstrates.
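One way to see why a noisy acoustic channel needs extra care is to look at the simplest possible integrity check. ggwave applies its own error-correction codes; the sketch below is a separate, much cruder application-level checksum with hypothetical `frame` and `unframe` helpers, shown only to illustrate how a receiver might notice that a message was corrupted in transit.

```typescript
// Sum of all bytes modulo 256.
function checksum(bytes: Uint8Array): number {
  let sum = 0;
  for (let i = 0; i < bytes.length; i++) sum = (sum + bytes[i]) & 0xff;
  return sum;
}

// Append a one-byte checksum to the payload before transmission.
function frame(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(payload.length + 1);
  out.set(payload, 0);
  out[payload.length] = checksum(payload);
  return out;
}

// Return the payload if the checksum matches, or null if the data is corrupted.
function unframe(framed: Uint8Array): Uint8Array | null {
  if (framed.length === 0) return null;
  const payload = framed.subarray(0, framed.length - 1);
  return checksum(payload) === framed[framed.length - 1] ? payload : null;
}

const sent = frame(new Uint8Array([72, 105])); // payload "Hi" plus checksum byte
const received = unframe(sent);                // intact: payload is returned
```

A checksum like this only detects some errors and cannot repair any of them, so a receiver that gets `null` would have to ask for a retransmission, which is costly on a channel this slow.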
Verdict
Use if: you're exploring emergent AI behavior, building demonstrations of agent-to-agent protocol negotiation, researching autonomous communication optimization, or creating educational content about AI agent coordination. GibberLink is a conversation starter that makes abstract concepts tangible. It's perfect for hackathons, conference talks, or prototyping novel interaction patterns.
Skip if: you need production-grade multi-agent communication, require reliable high-bandwidth agent coordination, want well-documented extensible frameworks, or are building systems where sound-based transmission is impractical. For actual multi-agent systems, use AutoGen, LangGraph, or conventional API-based communication. GibberLink's value is conceptual and demonstrative: it shows what's possible when agents negotiate their own protocols, even if the specific implementation isn't ready for prime time.