TJvox: macOS-Style Dictation for Linux
I type a lot. Emails, code, chat messages, angry comments on the internet. My wrists were starting to file complaints. macOS has this delightful built-in dictation feature where you double-tap a key, speak, and it types for you. Linux... did not. At least not in a way that didn't involve shouting at a browser tab or paying some cloud API by the syllable. So I built my own.
What It Does
TJvox is a local, offline voice dictation app for Linux (Wayland-first, because that’s what I run). Hit a hotkey, speak, and it transcribes your rambling into text using OpenAI’s Whisper, running entirely on your own machine. No data leaves the house. Your terrible poetry stays between you and your GPU.
Sure, there are shinier transcription models out there. But Whisper is small, simple, and has proper documentation on how to actually use it. I might swap it out for something fancier down the road, but for now it does the job without making me read a 200-page research paper first.
The Stack of Overkill
| Layer | Technology | Why |
|---|---|---|
| Audio | PipeWire | Native Linux audio, because PulseAudio and I have history. |
| Transcription | whisper.cpp via whisper-rs | Local, offline, surprisingly good at understanding my accent. |
| UI | GTK4 + Cairo | A recording overlay and a system tray icon. Fancy. |
| Output | wl-clipboard, ydotool, wtype | Wayland text injection is… an adventure. |
| History | SQLite | Because sometimes you dictate something brilliant and immediately lose it. |
| Post-processing | Optional LLM | Feed your transcript to a local model for grammar cleanup. |
The Fun Parts
Text Replacements
Whisper is great, but it doesn’t know you want a literal period when you say “period.” TJvox has a replacement engine that turns spoken punctuation into actual punctuation. Say “new paragraph” and get a real line break. It’s the small things.
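For the curious, here is roughly how a replacement engine like this could work. Treat it as a minimal Python sketch under my own assumptions rather than TJvox's actual code: `apply_replacements` and `REPLACEMENTS` are hypothetical names, and matching whole phrases case-insensitively is just one sensible guess at the behaviour.

```python
import re

# Hypothetical table; the entries mirror the spoken-punctuation idea above.
REPLACEMENTS = {
    "period": ".",
    "comma": ",",
    "new paragraph": "\n\n",
}


def apply_replacements(text: str, table: dict) -> str:
    """Replace spoken punctuation phrases with their literal symbols."""
    # Longer phrases go first so multi-word keys win over shorter ones.
    for phrase in sorted(table, key=len, reverse=True):
        symbol = table[phrase]
        # Always swallow the space before the phrase ("world period" -> "world.").
        # Swallow the space after it only when the symbol is itself whitespace,
        # so a line break is not followed by a stray leading space.
        trailing = r"[ \t]*" if symbol[-1:].isspace() else ""
        pattern = re.compile(
            r"[ \t]*\b" + re.escape(phrase) + r"\b" + trailing,
            re.IGNORECASE,
        )
        # A function replacement avoids any backslash handling in the symbol.
        text = pattern.sub(lambda _match, s=symbol: s, text)
    return text
```

With that table, "hello comma world period" comes out as "hello, world.", while a word like "periodic" is left alone because only whole words match. The table itself is the kind of thing you would keep in config: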
```toml
[replacements]
"period" = "."
"comma" = ","
"new paragraph" = "\n\n"
```
Hot vs Cold Whisper
Loading a Whisper model takes time and RAM, so TJvox supports two modes. Hot keeps the model loaded in memory for faster starts at the cost of RAM. Cold loads on demand, which is slower but leaves your memory alone. Pick your poison.
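The trade-off is easy to see in code. The following Python sketch is illustrative only: `ModelHolder`, `load_model`, and `run` are hypothetical names standing in for whatever actually loads and runs Whisper, and nothing here reflects TJvox's real internals.

```python
from typing import Callable, Optional


class ModelHolder:
    """Keeps a speech model loaded (hot) or loads it per request (cold)."""

    def __init__(self, load_model: Callable[[], object], hot: bool) -> None:
        self._load_model = load_model
        self._hot = hot
        # Hot mode pays the load cost once, up front, and keeps the RAM.
        self._model: Optional[object] = load_model() if hot else None

    def transcribe(self, audio: bytes, run: Callable[[object, bytes], str]) -> str:
        if self._hot:
            # The model is already in memory, so this starts immediately.
            return run(self._model, audio)
        # Cold mode loads a fresh model for this call only; it can be
        # freed once the call returns, so memory is given back.
        return run(self._load_model(), audio)
```

In hot mode the loader runs exactly once no matter how many times you dictate; in cold mode it runs once per dictation.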
Push-to-Talk
For the gamers among us, there’s an optional push-to-talk mode. Hold a key, speak, release. No toggling. Just like Ventrilo in 2005, but for writing documentation.
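The difference from toggle mode comes down to which key events change the recording state. A small Python sketch of that logic, with hypothetical names (`Mode`, `next_recording_state`, and the "press"/"release" event strings are all made up for illustration):

```python
from enum import Enum


class Mode(Enum):
    TOGGLE = "toggle"
    PUSH_TO_TALK = "push_to_talk"


def next_recording_state(mode: Mode, recording: bool, event: str) -> bool:
    """Return whether recording is active after a hotkey event.

    `event` is either "press" or "release".
    """
    if mode is Mode.PUSH_TO_TALK:
        # Recording follows the key: on while held, off once released.
        return event == "press"
    # Toggle mode flips the state on each press and ignores releases.
    if event == "press":
        return not recording
    return recording
```

One nice side effect of the push-to-talk branch is that repeated press events from key auto-repeat are harmless, since holding the key just keeps recording on.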
The Wayland Problem
Building this for Wayland was… character-building. X11 has xdotool. Wayland has a fragmented ecosystem of wl-clipboard, ydotool, wtype, and prayers. TJvox tries them in order, detects your compositor where possible, and falls back to clipboard pasting when all else fails. It’s not perfect, but it works on my machine™ — specifically KDE Plasma.
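To give a feel for the fallback chain, here is a simplified Python sketch. It is not TJvox's implementation: the preference order, the `BACKENDS` table, and the `emit_text` helper are my assumptions for illustration, and it skips compositor detection entirely. It only relies on the basic invocations `wtype TEXT`, `ydotool type TEXT`, and `wl-copy TEXT`.

```python
import shutil
import subprocess

# Assumed preference order: tools that type directly first, clipboard last.
BACKENDS = (
    ("wtype", ["wtype"]),              # types via the virtual-keyboard protocol
    ("ydotool", ["ydotool", "type"]),  # types via uinput; needs ydotoold running
    ("wl-copy", ["wl-copy"]),          # clipboard fallback from wl-clipboard
)


def emit_text(text, which=shutil.which, run=subprocess.run) -> bool:
    """Try each installed backend in order until one exits successfully.

    `which` and `run` are injectable so the logic can be tested without
    the real tools. Text starting with "-" may be parsed as an option by
    these tools and would need extra care in real use.
    """
    for binary, argv in BACKENDS:
        if which(binary) is None:
            continue  # not installed, move on to the next candidate
        result = run([*argv, text], check=False)
        if result.returncode == 0:
            return True
    return False
```

Note that in this sketch the last resort only puts the text on the clipboard; actually pasting it into the focused window is a separate step that is left out here.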
Usage
Bind a global shortcut to the tjvox toggle command. Speak. Toggle again. Text appears. That’s it. There’s also a daemon mode, a GUI mode with a tray icon, and a history command for when you forget what you said.
```sh
tjvox          # GUI + tray (default)
tjvox run      # Single session
tjvox daemon   # Background daemon
tjvox toggle   # Toggle recording from anywhere
```
Is It Done?
It works. I use it now and then — mostly when my wrists stage a revolt or I’m feeling too lazy to type a long email. It occasionally mistranscribes my Norwegian into what I can only assume is an ancient dialect of Elvish. There’s always room for improvement — better Wayland support, more output methods, maybe a proper settings GUI someday.
But for now? I can dictate this sentence without touching my keyboard. And my wrists are grateful.