
Clone voices from a few seconds of audio, generate speech in 23 languages across 7 TTS engines, dictate into any text field with a global hotkey, and give any MCP-aware AI agent a voice of your choosing. The two cloud incumbents sit on opposite halves of the voice I/O loop: ElevenLabs on output, WisprFlow on input. Voicebox does both, bridges them with a bundled local LLM for refinement and per-profile personas, and runs the whole thing on your machine. Because everything runs locally, you get full privacy and avoid the token limits that many cloud services impose. There are some good YouTube videos showing how to get started with Voicebox and what it can do. It is more than a simple text-to-speech processor. See https://github.com/jamiepine/voicebox
