voic.sh: Voice typing for Wayland Linux
Offline, privacy-first voice typing. Speak into your mic, text appears in your focused app. Or pipe a WAV file and get text on stdout.
Build: Rust, C compiler, cmake, pkg-config, libclang, ALSA headers — sudo apt install build-essential cmake pkg-config libclang-dev libasound2-dev
Run (mic mode): sudo apt install wl-clipboard wtype ydotool — GNOME 45+ / KDE 6.1+ work without wtype/ydotool; pipe mode has no runtime deps
Early stage — free-time side project. Primary target: Ubuntu + GNOME + Wayland. Other distros and compositors are welcome, but I can only test on this setup. If something doesn't work, open an issue. See CONTRIBUTING.md for how to help.
cargo install voicsh
# First run — detects your desktop and picks the best model:
voicsh init
# Mic mode:
voicsh # continuous mic → text into focused app
voicsh --once # single utterance → exit
# Pipe mode (no mic/runtime deps needed):
cat file.wav | voicsh
voicsh help # all commands and options
Mic/WAV → VAD → Chunker → Whisper → Post-processor → Text injection
                                                      ↓
                                          portal / wtype / ydotool
- Audio captured via cpal (mic) or hound (WAV file)
- Voice activity detection splits speech into chunks
- whisper-rs transcribes each chunk locally
- Text injected via xdg-desktop-portal (GNOME/KDE), wtype, or ydotool
Pipe mode (cat file.wav | voicsh) skips injection and writes to stdout.
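To make the VAD and chunking step concrete, here is a minimal energy-based sketch. It is not voicsh's actual detector: the function name `split_speech`, the RMS threshold, and the frame sizes are all assumptions chosen for illustration. It only shows the general idea of cutting a sample stream into chunks at pauses.

```rust
/// Hypothetical helper (not part of voicsh): splits mono samples into speech
/// chunks using a simple RMS energy threshold per frame.
/// A chunk is closed once `max_silent_frames` quiet frames occur in a row.
fn split_speech(
    samples: &[f32],
    frame_len: usize,
    threshold: f32,
    max_silent_frames: usize,
) -> Vec<Vec<f32>> {
    assert!(frame_len > 0, "frame_len must be non-zero");
    let mut chunks = Vec::new();
    let mut current: Vec<f32> = Vec::new();
    let mut silent_run = 0usize;

    for frame in samples.chunks(frame_len) {
        // Root-mean-square energy of this frame.
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
        if rms >= threshold {
            current.extend_from_slice(frame);
            silent_run = 0;
        } else if !current.is_empty() {
            silent_run += 1;
            if silent_run >= max_silent_frames {
                // Pause is long enough: emit the chunk collected so far.
                chunks.push(std::mem::take(&mut current));
                silent_run = 0;
            } else {
                // Short pause: keep it inside the current chunk.
                current.extend_from_slice(frame);
            }
        }
        // Silence before any speech is dropped.
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    // Synthetic input: one loud frame, two silent frames, one loud frame.
    let mut samples = vec![0.5_f32; 4];
    samples.extend(vec![0.0; 8]);
    samples.extend(vec![0.5; 4]);
    let chunks = split_speech(&samples, 4, 0.1, 2);
    println!("{} chunks", chunks.len());
}
```

In a real pipeline each returned chunk would be handed to the transcriber. A production VAD would typically use a trained model or spectral features rather than raw energy.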
# Ubuntu/Debian:
sudo apt install build-essential cmake pkg-config libclang-dev libasound2-dev
cargo install voicsh
voicsh init # benchmark hardware, pick model, download
voicsh install-gnome-extension # GNOME Shell panel indicator (optional)
Other distros, GPU acceleration, and pipe-only builds: see INSTALL.md.
voicsh injects transcribed text into your focused app. voicsh init auto-detects the best backend:
| Desktop | Backend | Notes |
|---|---|---|
| GNOME 45+ | Portal (RemoteDesktop) | No extra tools needed |
| KDE 6.1+ | Portal or wtype | |
| Sway / Hyprland | wtype | sudo apt install wtype |
| Fallback | ydotool | Needs ydotoold daemon |
Override at runtime: voicsh --injection-backend wtype
Override via env: VOICSH_BACKEND=portal voicsh
Override in config: [injection] section — run voicsh config dump to see all options.
wl-clipboard (wl-copy) is required for clipboard-based injection.
Note: wtype/ydotool inject via clipboard paste — this overwrites your clipboard. Portal types directly without touching the clipboard.
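The three override mechanisms above suggest a layered lookup. The sketch below shows one plausible way to resolve them. The precedence order (flag, then environment, then config, then the detected default) and the rule that unknown names are skipped are assumptions, not documented voicsh behavior. The `Backend` enum and both function names are hypothetical.

```rust
/// Hypothetical representation of the three backends named in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Portal,
    Wtype,
    Ydotool,
}

/// Parses a backend name; returns None for anything unrecognized.
fn parse_backend(name: &str) -> Option<Backend> {
    match name.trim().to_ascii_lowercase().as_str() {
        "portal" => Some(Backend::Portal),
        "wtype" => Some(Backend::Wtype),
        "ydotool" => Some(Backend::Ydotool),
        _ => None,
    }
}

/// Assumed precedence: CLI flag, then env var, then config file,
/// then the auto-detected default. Unknown names fall through to the next layer.
fn resolve_backend(
    cli: Option<&str>,
    env: Option<&str>,
    config: Option<&str>,
    detected: Backend,
) -> Backend {
    [cli, env, config]
        .iter()
        .flatten()
        .find_map(|name| parse_backend(name))
        .unwrap_or(detected)
}

fn main() {
    // Only the environment layer is read here; flag and config are left unset.
    let env = std::env::var("VOICSH_BACKEND").ok();
    let backend = resolve_backend(None, env.as_deref(), None, Backend::Ydotool);
    println!("selected backend: {:?}", backend);
}
```

Run `voicsh config dump` to check how the real tool names and orders these settings.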
Voice commands trigger only when spoken as standalone utterances — pause, say the command, pause. Text that merely contains a command word passes through unchanged:
[pause] "period" [pause] → .
[pause] "new line" [pause] → (line break)
"the period of history" → "the period of history"
"press enter to continue" → "press enter to continue"
[pause] "all caps" [pause] "wow" [pause] "end caps" → "WOW"
Built-in commands are available for English, German, Spanish, French, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Chinese, and Korean — see post_processor.rs for the full list. Discover all commands for a language:
voicsh config list --language=en # English voice commands
voicsh config list --language=ko # Korean voice commands
voicsh config list --language=en,de # multiple languages
Add custom commands in [voice_commands.commands] in config; they take precedence over built-ins. To disable voice commands entirely: voice_commands.enabled = false.
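The standalone-utterance rule and the custom-over-built-in precedence can be sketched as a whole-string lookup. This is an illustration, not the code in post_processor.rs. The helper names `commands` and `apply_command` are hypothetical, and the trim-and-lowercase normalization is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds a command table from (spoken phrase, replacement) pairs.
fn commands(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(phrase, text)| (phrase.to_string(), text.to_string()))
        .collect()
}

/// Replaces an utterance only when the whole utterance is a known command.
/// Custom commands are checked before built-ins; anything else passes through unchanged.
fn apply_command(
    utterance: &str,
    custom: &HashMap<String, String>,
    builtin: &HashMap<String, String>,
) -> String {
    // Assumed normalization: ignore surrounding whitespace and letter case.
    let key = utterance.trim().to_lowercase();
    custom
        .get(&key)
        .or_else(|| builtin.get(&key))
        .cloned()
        .unwrap_or_else(|| utterance.to_string())
}

fn main() {
    let builtin = commands(&[("period", "."), ("new line", "\n")]);
    let custom = commands(&[("new line", "\r\n")]);

    for utterance in ["period", "the period of history", "new line"].iter() {
        println!("{:?} -> {:?}", utterance, apply_command(utterance, &custom, &builtin));
    }
}
```

Because the lookup key is the entire utterance, "the period of history" never matches "period", which mirrors the pass-through examples above.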
voicsh config dump # commented template with all options and defaults
voicsh config list # current active configuration
voicsh config list stt # just the [stt] section
voicsh config get stt.model # single value
voicsh config set stt.model small.en
Config file: ~/.config/voicsh/config.toml. Environment overrides: VOICSH_MODEL, VOICSH_LANGUAGE, VOICSH_BACKEND.
GNOME extension — panel indicator with recording state, model info, and Super+Alt+V toggle:
voicsh install-gnome-extension
Shell completions: voicsh completions bash|zsh|fish; run voicsh completions --help for install paths.
License: MIT
- whisper.cpp - Whisper inference engine
- whisper-rs - Rust bindings
- cpal - Cross-platform audio
- Inspired by nerd-dictation, voxd, and BlahST