01: Building the Open-Source Answer to Rabbit R1’s Voice-Controlled Computer
Hook
While Rabbit shipped the R1 as a proprietary voice-controlled AI device, a team of open-source developers set out to build something more ambitious: a voice interface that actually executes code on your computer and runs on affordable ESP32 hardware.
Context
Voice assistants have plateaued. Siri, Alexa, and Google Assistant excel at setting timers and playing music, but they can’t actually do anything meaningful on your computer—they can’t refactor code, manipulate spreadsheets, or automate complex workflows. Proprietary devices like the Rabbit R1 attempted to break this ceiling with voice-controlled computer interaction, but remained closed-source and vendor-locked.
The 01 project emerges from this gap, described as an open-source platform for intelligent devices inspired by the Rabbit R1 and Star Trek computer. Powered by Open Interpreter, it provides a natural language voice interface for computers with an ambitious goal: what if you could talk to your computer like the Star Trek crew talks to their ship’s computer, and have it actually execute commands, write code, and control software? More radically, what if this capability could run on ESP32 microcontrollers you can build yourself?
Technical Insight
The 01 architecture addresses a fundamental challenge: how do you provide sophisticated voice-controlled computing across hardware ranging from battery-powered ESP32 chips to full desktop machines? The answer is a tiered server architecture with multiple client implementations.
At the foundation sits Open Interpreter, which handles code execution and command interpretation. On top of this, 01 provides two server options. The Light Server is optimized for low-power devices like ESP32 microcontrollers. The LiveKit Server is designed for devices with higher processing power and supports OpenAI’s Realtime API for multimodal interactions.
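The split between the two tiers can be pictured as a simple capability check. The sketch below is illustrative only: the `ServerTier` enum, the `Device` type, and the RAM threshold are invented here, not part of 01's codebase.

```python
from dataclasses import dataclass
from enum import Enum

class ServerTier(Enum):
    LIGHT = "light"      # minimal protocol for ESP32-class clients
    LIVEKIT = "livekit"  # full multimodal pipeline for capable devices

@dataclass
class Device:
    name: str
    ram_mb: int
    has_display: bool

def pick_tier(device: Device) -> ServerTier:
    # Hypothetical heuristic: constrained hardware gets the Light server,
    # everything else gets the LiveKit server with Realtime API support.
    if device.ram_mb < 64:
        return ServerTier.LIGHT
    return ServerTier.LIVEKIT

print(pick_tier(Device("esp32-devkit", ram_mb=4, has_display=False)).value)  # light
print(pick_tier(Device("macbook", ram_mb=16384, has_display=True)).value)    # livekit
```

The point of the tiering is that the same Open Interpreter backend sits behind both tiers; only the transport and feature set change with the hardware.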
Here’s how you’d launch the full-featured LiveKit server with multimodal support:
```shell
poetry run 01 --server livekit --qr --expose --multimodal
```
This command spins up a LiveKit server instance, generates a QR code for mobile client connection (--qr), exposes the server to external networks (--expose), and enables OpenAI’s Realtime API for multimodal interactions (--multimodal). The QR code approach is elegant—mobile clients can instantly connect without manual IP configuration.
The profile-based configuration system deserves attention. Rather than hardcoding behavior, 01 uses editable profiles in software/source/server/profiles that define the language model, system messages, and other behavioral parameters. This means you can create distinct profiles for different use cases, each with its own capability level and behavioral constraints.
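To make the idea concrete, here is a self-contained stand-in for what a profile encodes. The real 01 profiles are Python files that configure the interpreter directly and their exact API may differ; the `Profile` type, field names, and model string below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """Illustrative stand-in for a 01 profile; the real files configure
    the interpreter directly, so treat these fields as assumptions."""
    model: str
    system_message: str
    allowed_actions: tuple = ("execute_code", "browse_web", "manage_files")

PROFILES = {
    "default": Profile(
        model="gpt-4o",  # assumed model name, not taken from the 01 docs
        system_message="You are a helpful computer-control assistant.",
    ),
    "restricted": Profile(
        model="gpt-4o",
        system_message="Only answer questions; never modify files.",
        allowed_actions=("browse_web",),
    ),
}

def load_profile(name: str) -> Profile:
    # Fall back to the default profile for unknown names.
    return PROFILES.get(name, PROFILES["default"])

print(load_profile("restricted").allowed_actions)
```

The design choice to keep profiles as editable files, rather than baked-in settings, is what lets one deployment run a locked-down demo profile while another runs a fully capable one.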
The client diversity is where 01 truly differentiates itself. The ESP32 implementation runs on microcontrollers, handling voice capture and network communication. The mobile apps (Android and iOS) provide interfaces for on-the-go computer control. The desktop client allows voice control of the machine you’re physically using. This isn’t just multi-platform—it’s multi-tier computing where the same backend serves radically different hardware capabilities.
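One way to picture the multi-tier design is as a shared message envelope that every client speaks, whatever its hardware. The JSON shape below is invented for illustration and is not 01's actual wire format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ClientMessage:
    client: str   # e.g. "esp32", "mobile", or "desktop" (hypothetical labels)
    kind: str     # e.g. "audio_chunk" or "text"
    payload: str

def encode(msg: ClientMessage) -> str:
    # Every client serializes to the same JSON envelope, so the server
    # does not care whether the sender is a microcontroller or a laptop.
    return json.dumps(asdict(msg))

def decode(raw: str) -> ClientMessage:
    return ClientMessage(**json.loads(raw))

round_trip = decode(encode(ClientMessage("esp32", "audio_chunk", "base64 audio")))
print(round_trip.client)  # esp32
```

Under this framing, an ESP32's job reduces to capturing audio and shipping envelopes over the network, while the heavy lifting stays server-side.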
According to the README, 01’s capabilities include executing code, browsing the web, managing files, and controlling third-party software. The system appears to route voice input through language models that can then trigger these actions, though the exact execution pipeline isn’t detailed in the documentation.
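If the pipeline does work this way, the final hop can be imagined as a dispatcher that maps an LLM-chosen action to a handler. Every name here is hypothetical, since the README does not document the execution pipeline; the real routing lives inside Open Interpreter.

```python
def execute_code(arg: str) -> str:
    return f"ran: {arg}"

def browse_web(arg: str) -> str:
    return f"fetched: {arg}"

def manage_files(arg: str) -> str:
    return f"listed: {arg}"

# Hypothetical action table; 01's actual capabilities are mediated
# by Open Interpreter, not a lookup like this.
ACTIONS = {
    "execute_code": execute_code,
    "browse_web": browse_web,
    "manage_files": manage_files,
}

def dispatch(action: str, arg: str) -> str:
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action}")
    return handler(arg)

print(dispatch("execute_code", "print('hi')"))  # ran: print('hi')
```

A table like this also hints at where safeguards could live: an allow-list of actions per profile would be a natural enforcement point.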
Gotcha
The README’s warning isn’t subtle: “This experimental project is under rapid development and lacks basic safeguards. Until a stable 1.0 release, only run this on devices without sensitive information or access to paid services.” This is refreshingly honest and critically important.
The core architectural decision, allowing voice commands to execute code, is both 01's superpower and its main risk. The README mentions a profile system for customization but doesn't detail specific security boundaries or execution constraints; the safety documentation it links to discusses risks and safety measures, signaling an area that requires careful consideration before deployment.
The dependency on OpenAI’s Realtime API for the multimodal features creates potential cost and availability constraints. While the documentation mentions customizing language models through profiles, the cutting-edge multimodal capabilities demonstrated in the example command require API access. For hobbyists building ESP32 devices, API costs could become a consideration, though the README doesn’t specify what functionality is available with local models versus cloud APIs.
The project is explicitly described as experimental and under rapid development, meaning APIs, configurations, and features may change before the stable 1.0 release.
Verdict
Use 01 if you're building custom voice-controlled hardware projects, want an open-source alternative to proprietary voice-controlled devices, or need a flexible platform for maker projects that put natural language interfaces on constrained hardware like the ESP32. It's ideal for developers comfortable with experimental software who want DIY voice assistants without vendor lock-in, and for researchers exploring the intersection of LLMs and computer control.
Skip it if you need production-ready security for any environment with sensitive data, require stable APIs and guaranteed backward compatibility, or must run on devices with access to paid services or sensitive information. The project maintainers explicitly recommend waiting for the 1.0 release if you need safety guardrails. This is bleeding-edge open-source innovation, with all the power and risk that implies, and the README is transparent about its experimental nature and current limitations.