GitHub - burka/voicsh: voic.sh voice to keystroke for linux

voic.sh — Voice typing for Wayland Linux

Offline, privacy-first voice typing. Speak into your mic, text appears in your focused app. Or pipe a WAV file and get text on stdout.

Build: Rust, C compiler, cmake, pkg-config, libclang, ALSA headers — sudo apt install build-essential cmake pkg-config libclang-dev libasound2-dev

Run (mic mode): sudo apt install wl-clipboard wtype ydotool — GNOME 45+ / KDE 6.1+ work without wtype/ydotool; pipe mode has no runtime deps

Early stage — free-time side project. Primary target: Ubuntu + GNOME + Wayland. Other distros and compositors are welcome, but I can only test on this setup. If something doesn't work, open an issue. See CONTRIBUTING.md for how to help.

Quick start

cargo install voicsh

# First run — detects your desktop and picks the best model:
voicsh init

# Mic mode:
voicsh                          # continuous mic → text into focused app
voicsh --once                   # single utterance → exit

# Pipe mode (no mic/runtime deps needed):
cat file.wav | voicsh

voicsh help                     # all commands and options

How it works

Mic/WAV → VAD → Chunker → Whisper → Post-processor → Text injection
                                                        ↓
                                                portal / wtype / ydotool

Audio captured via cpal (mic) or hound (WAV file)
Voice activity detection splits speech into chunks
whisper-rs transcribes each chunk locally
Text injected via xdg-desktop-portal (GNOME/KDE), wtype, or ydotool

Pipe mode (cat file.wav | voicsh) skips injection and writes to stdout.
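
To make the VAD and chunker stages above concrete, here is a minimal energy-threshold sketch in Rust. It is not voicsh's actual detector: the names `rms` and `split_speech`, the RMS threshold, and the frame and hangover parameters are all assumptions chosen for illustration.

```rust
/// Root-mean-square energy of one frame of mono samples in [-1.0, 1.0].
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Split `samples` into speech chunks, returned as (start, end) sample ranges.
/// A chunk opens on the first frame whose RMS reaches `threshold` and closes
/// once `hangover_frames` consecutive quiet frames follow it.
/// All parameters are illustrative, not voicsh defaults.
fn split_speech(
    samples: &[f32],
    frame_len: usize,
    threshold: f32,
    hangover_frames: usize,
) -> Vec<(usize, usize)> {
    assert!(frame_len > 0, "frame_len must be non-zero");
    let mut chunks = Vec::new();
    let mut start: Option<usize> = None; // start of the open chunk, if any
    let mut last_voiced_end = 0; // end of the most recent loud frame
    let mut quiet = 0; // consecutive quiet frames seen

    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let begin = i * frame_len;
        let end = begin + frame.len();
        if rms(frame) >= threshold {
            if start.is_none() {
                start = Some(begin);
            }
            last_voiced_end = end;
            quiet = 0;
        } else if let Some(s) = start {
            quiet += 1;
            if quiet >= hangover_frames {
                chunks.push((s, last_voiced_end));
                start = None;
                quiet = 0;
            }
        }
    }
    // Flush a chunk that is still open when the audio ends.
    if let Some(s) = start {
        chunks.push((s, last_voiced_end));
    }
    chunks
}
```

Each returned range would then be handed to the transcriber as one chunk. The hangover count keeps short pauses inside a sentence from splitting it into several chunks.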

Install

# Ubuntu/Debian:
sudo apt install build-essential cmake pkg-config libclang-dev libasound2-dev
cargo install voicsh
voicsh init                    # benchmark hardware, pick model, download
voicsh install-gnome-extension # GNOME Shell panel indicator (optional)

Other distros, GPU acceleration, and pipe-only builds: see INSTALL.md.

Text injection

voicsh injects transcribed text into your focused app. voicsh init auto-detects the best backend:

Desktop	Backend	Notes
GNOME 45+	Portal (RemoteDesktop)	No extra tools needed
KDE 6.1+	Portal or wtype
Sway / Hyprland	wtype	`sudo apt install wtype`
Fallback	ydotool	Needs `ydotoold` daemon

Override at runtime: voicsh --injection-backend wtype
Override via env: VOICSH_BACKEND=portal voicsh
Override in config: [injection] section; run voicsh config dump to see all options.

wl-clipboard (wl-copy) is required for clipboard-based injection.

Note: wtype/ydotool inject via clipboard paste — this overwrites your clipboard. Portal types directly without touching the clipboard.
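
The backend table above can be read as a simple lookup. The sketch below encodes it in Rust. The `Backend` enum, the `pick_backend` function, the desktop strings, and the version tuple are hypothetical: this README does not show how voicsh init actually detects the desktop.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Portal,
    Wtype,
    Ydotool,
}

/// Map a desktop name and (major, minor) version to a backend, following the
/// table in this section. Inputs are hypothetical; real detection may differ.
fn pick_backend(desktop: &str, version: (u32, u32)) -> Backend {
    match desktop.to_ascii_lowercase().as_str() {
        "gnome" if version >= (45, 0) => Backend::Portal,
        // The table lists "Portal or wtype" for KDE 6.1+; this sketch picks Portal.
        "kde" if version >= (6, 1) => Backend::Portal,
        "sway" | "hyprland" => Backend::Wtype,
        // Anything else, including older GNOME/KDE, falls back to ydotool.
        _ => Backend::Ydotool,
    }
}
```

For example, `pick_backend("GNOME", (46, 0))` yields `Backend::Portal`, while an unrecognised compositor falls through to `Backend::Ydotool`.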

Voice commands

Voice commands trigger only when spoken as standalone utterances — pause, say the command, pause. Text that merely contains a command word passes through unchanged:

[pause] "period" [pause]          → .
[pause] "new line" [pause]        → (line break)
"the period of history"           → "the period of history"
"press enter to continue"         → "press enter to continue"
[pause] "all caps" [pause] "wow" [pause] "end caps" → "WOW"

Built-in commands are available for English, German, Spanish, French, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Chinese, and Korean — see post_processor.rs for the full list. Discover all commands for a language:

voicsh config list --language=en     # English voice commands
voicsh config list --language=ko     # Korean voice commands
voicsh config list --language=en,de  # multiple languages

Add custom commands in [voice_commands.commands] in config — they take precedence over built-ins. To disable voice commands entirely: voice_commands.enabled = false.
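
The standalone-utterance rule and the precedence of custom commands can be sketched as a whole-string lookup. This is an illustration only, not the code in post_processor.rs: the function name `apply_command`, the two-map layout, and the trimming of trailing punctuation are assumptions.

```rust
use std::collections::HashMap;

/// Return the replacement for `utterance` if the whole utterance is a command,
/// otherwise return the text unchanged. Custom commands are checked first.
fn apply_command(
    utterance: &str,
    custom: &HashMap<String, String>,
    builtin: &HashMap<String, String>,
) -> String {
    // Normalise: trim whitespace and trailing punctuation, then lowercase.
    // (Assumption: a transcriber may emit "Period." for a spoken "period".)
    let key = utterance
        .trim()
        .trim_end_matches(|c: char| c == '.' || c == '!' || c == '?' || c == ',')
        .to_lowercase();
    custom
        .get(&key)
        .or_else(|| builtin.get(&key))
        .cloned()
        .unwrap_or_else(|| utterance.to_string())
}
```

Because the lookup key is the entire utterance, "the period of history" never matches the "period" entry and passes through untouched. Stateful commands such as "all caps" and "end caps" would need extra state that this sketch omits.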

Configuration

voicsh config dump              # commented template with all options and defaults
voicsh config list              # current active configuration
voicsh config list stt          # just the [stt] section
voicsh config get stt.model     # single value
voicsh config set stt.model small.en

Config file: ~/.config/voicsh/config.toml. Environment overrides: VOICSH_MODEL, VOICSH_LANGUAGE, VOICSH_BACKEND.
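
A layered lookup is one way to picture how a value such as the model name is resolved. In the sketch below, the `resolve` function and the order (command-line flag, then environment variable, then config file, then default) are assumptions; the README states only that flags and environment variables override the config.

```rust
/// Resolve one setting from layered sources: the first `Some` wins.
/// The order CLI > environment > config file > default is an assumption.
fn resolve<'a>(
    cli: Option<&'a str>,
    env: Option<&'a str>,
    file: Option<&'a str>,
    default: &'a str,
) -> &'a str {
    cli.or(env).or(file).unwrap_or(default)
}
```

With the environment value read via `std::env::var("VOICSH_MODEL").ok()`, a call like `resolve(None, Some("small.en"), Some("base.en"), "tiny.en")` returns `"small.en"`. The model names other than small.en are placeholders, not documented voicsh defaults.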

Shell integration

GNOME extension — panel indicator with recording state, model info, and Super+Alt+V toggle:

voicsh install-gnome-extension

Shell completions: voicsh completions bash|zsh|fish — run voicsh completions --help for install paths.

License

MIT

Acknowledgments

whisper.cpp - Whisper inference engine
whisper-rs - Rust bindings
cpal - Cross-platform audio
Inspired by nerd-dictation, voxd, and BlahST

