VoiceInk: Building Privacy-First Voice Transcription with Local Whisper Models on macOS
Hook
Most voice dictation tools send your audio to the cloud. VoiceInk transcribes everything locally on your Mac, then applies pre-configured settings based on which app you’re using, all without a single API call for transcription.
Context
Voice dictation on macOS has always been a compromise. Apple’s built-in dictation can send audio to Apple’s servers, depending on hardware and language, and it struggles with technical terminology. Nuance’s Dragon cost hundreds of dollars, and its Mac version was discontinued in 2018. Third-party services like Otter.ai require subscriptions and cloud processing, raising privacy concerns for anyone handling sensitive information.
VoiceInk takes a different approach: 100% local transcription, using Whisper models through whisper.cpp or, alternatively, the Parakeet model through FluidAudio. Developer Pax spent five months building a native Swift application that transcribes speech with an advertised 99% accuracy while keeping every byte of audio data on your Mac. The project recently went open source under GPL v3: developers can build from source, while a paid license funds continued development and provides automatic updates.
Technical Insight
VoiceInk’s architecture revolves around three core systems: global hotkey management, local ML inference, and context detection. The app uses sindresorhus’s KeyboardShortcuts library for configurable push-to-talk functionality, which triggers recording sessions. During recording, MediaRemoteAdapter automatically pauses any playing media—a subtle touch that prevents background audio from contaminating transcriptions.
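The push-to-talk flow described above can be pictured as a small state machine. The sketch below is only an illustration of that idea in Python; VoiceInk itself is written in Swift, and every name here (PushToTalkSession, pause_media, resume_media, transcribe) is hypothetical. Resuming media after the recording ends is also an assumption, since the source only mentions pausing.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


class PushToTalkSession:
    """Hypothetical controller; none of these names come from VoiceInk."""

    def __init__(self, pause_media, resume_media, transcribe):
        self.state = State.IDLE
        self._pause_media = pause_media
        self._resume_media = resume_media  # assumption: media resumes afterwards
        self._transcribe = transcribe
        self._chunks = []

    def key_down(self):
        """Hotkey pressed: pause media and start collecting audio."""
        if self.state is not State.IDLE:
            return  # ignore key repeat or presses while busy
        self._pause_media()
        self._chunks = []
        self.state = State.RECORDING

    def feed_audio(self, chunk: bytes):
        """Audio is only kept while the hotkey is held."""
        if self.state is State.RECORDING:
            self._chunks.append(chunk)

    def key_up(self):
        """Hotkey released: transcribe what was recorded, then reset."""
        if self.state is not State.RECORDING:
            return None
        self.state = State.TRANSCRIBING
        try:
            return self._transcribe(b"".join(self._chunks))
        finally:
            self._resume_media()
            self.state = State.IDLE
```

Keeping the state explicit makes it easy to ignore key repeats and to guarantee that media is resumed even if transcription raises an error.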
The transcription engine is built on whisper.cpp, Georgi Gerganov’s high-performance C/C++ implementation of OpenAI’s Whisper model. VoiceInk also integrates the Parakeet model through FluidAudio as an alternative transcription backend, letting users trade accuracy against speed.
The standout feature is Power Mode, which uses macOS capabilities to detect the active application and current URL context, then applies your pre-configured transcription settings. If you’ve configured settings for Slack, it applies those when you’re in that app. Switch to your IDE, and it applies your technical writing configuration. This context awareness extends to the personal dictionary feature, where you can train specific vocabularies with custom words, industry terms, and smart text replacements.
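To make the idea concrete, here is a minimal sketch of context-based profile selection followed by dictionary replacements. It is written in Python for illustration only and is not VoiceInk’s implementation: the Profile type, the lookup tables, the example entries, and the rule that a URL match takes precedence over an app match are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Profile:
    """Hypothetical settings bundle selected by context."""
    name: str
    language: str = "en"
    replacements: dict = field(default_factory=dict)


DEFAULT = Profile("default")

# Illustrative entries only: app bundle identifier -> profile.
APP_PROFILES = {
    "com.tinyspeck.slackmacgap": Profile("chat"),
    "com.apple.dt.Xcode": Profile(
        "technical", replacements={"swift ui": "SwiftUI", "jason": "JSON"}
    ),
}

# Illustrative entries only: URL host -> profile.
URL_PROFILES = {
    "github.com": Profile("code-review", replacements={"pull request": "PR"}),
}


def resolve_profile(bundle_id: str, url: Optional[str] = None) -> Profile:
    """Pick a profile; URL rules win over app rules (an assumed precedence)."""
    if url:
        host = urlparse(url).hostname or ""
        if host in URL_PROFILES:
            return URL_PROFILES[host]
    return APP_PROFILES.get(bundle_id, DEFAULT)


def apply_replacements(text: str, replacements: dict) -> str:
    """Replace whole-word, case-insensitive matches of each dictionary key."""
    for spoken, written in replacements.items():
        pattern = r"\b" + re.escape(spoken) + r"\b"
        text = re.sub(pattern, lambda _m: written, text, flags=re.IGNORECASE)
    return text


profile = resolve_profile("com.apple.dt.Xcode")
print(apply_replacements("Parse the jason in Swift UI", profile.replacements))
# prints: Parse the JSON in SwiftUI
```

Whole-word matching avoids rewriting fragments inside longer words, which matters once a dictionary grows to dozens of entries.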
The app uses SelectedTextKit to work with selected text on macOS. Smart Modes add another layer by providing AI-powered writing style transformations. The system is described as understanding screen content and adapting to context, though the documentation doesn’t explain how this is implemented.
The privacy architecture is deliberately simple: transcription makes no network calls. Audio flows from the microphone to the local model, and the resulting text goes to your application without touching any external service. The only network calls are for the optional AI Assistant mode (described as a ChatGPT-like conversational assistant) and update checks via Sparkle. For organizations with strict data policies, this local-first approach makes voice dictation far easier to approve for use with confidential information.
Building from source requires following the detailed BUILDING.md instructions. The project uses Swift Package Manager for dependency resolution, pulling in libraries like Swift Atomics for thread-safe concurrent programming—essential when managing audio recording, ML inference, and UI updates simultaneously without blocking the main thread.
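The concurrency concern can be illustrated with a small producer and consumer sketch. Python’s threading and queue modules stand in here for Swift’s concurrency tools and Swift Atomics, so this shows the general pattern only, not VoiceInk’s code. The function name start_pipeline and its parameters are hypothetical, and in a real UI app the on_result callback would be dispatched back to the main thread.

```python
import queue
import threading


def start_pipeline(chunk_source, transcribe, on_result):
    """Start recording and inference on worker threads and return immediately.

    chunk_source: iterable of audio chunks (bytes), standing in for a microphone.
    transcribe:   function taking the full audio bytes and returning text.
    on_result:    callback invoked with the text from the inference thread.
    """
    audio_q = queue.Queue()  # thread-safe hand-off between the two workers
    end = object()           # sentinel marking the end of the recording

    def recorder():
        for chunk in chunk_source:
            audio_q.put(chunk)
        audio_q.put(end)

    def inference():
        buffer = []
        while (item := audio_q.get()) is not end:
            buffer.append(item)
        on_result(transcribe(b"".join(buffer)))

    workers = [
        threading.Thread(target=recorder, daemon=True),
        threading.Thread(target=inference, daemon=True),
    ]
    for worker in workers:
        worker.start()
    return workers  # the caller is never blocked while audio is processed
```

The caller gets control back at once, so a user interface stays responsive while the slow step (inference) runs elsewhere.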
Gotcha
VoiceInk requires macOS 14.4 or later, which immediately excludes Macs that can’t run that release and users who haven’t upgraded. The README doesn’t say whether Intel Macs are supported or how performance varies across hardware configurations.
The ‘not accepting pull requests’ policy is unusual for an open-source project. You can fork and modify VoiceInk under GPL v3, but you can’t contribute improvements back upstream. Bug reports and feature suggestions are welcome; code contributions aren’t. The project is still open source in the licensing sense, but it is developed by its maintainer rather than by a community. Developers hoping to fix issues or add features will end up maintaining a fork instead of collaborating.
The 99% accuracy claim lacks independent verification or methodology details. Whisper’s accuracy varies widely with accent, background noise, speaking speed, and domain-specific vocabulary. The personal dictionary helps with specialized terms, but expect an adjustment period while you build it up and learn which speaking patterns work best. Technical dictation (variable names, framework-specific terminology, code snippets) will likely require substantial dictionary customization before real-world accuracy is acceptable.
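One way to check an accuracy claim yourself is to measure word error rate (WER) on your own dictation: the number of substituted, deleted, and inserted words divided by the number of words in a reference transcript. The sketch below is a standard word-level edit distance in Python. The source does not say how the 99% figure was measured, so treating accuracy as 1 minus WER is an assumption, and the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "set the user id variable to null"
hypothesis = "set the user idea variable to null"
print(round(word_error_rate(reference, hypothesis), 3))
# prints: 0.143
```

A single wrong word in a seven-word sentence already gives a WER of about 14%, which shows how demanding a 1% error rate is for short technical phrases.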
Verdict
Use VoiceInk if you’re on macOS 14.4+ and need fast, privacy-focused dictation that keeps sensitive voice data completely offline. The Power Mode context awareness and personal dictionary make it particularly valuable for professionals who alternate between different applications and need domain-specific vocabulary support. The paid license is reasonable given the development investment, and the open-source availability lets you audit privacy claims or build custom versions. Skip it if you’re on older macOS versions, need cross-platform support for Windows or Linux workflows, want to contribute code improvements to the project, or require guaranteed accuracy for mission-critical transcription where errors have serious consequences.