Have a natural, spoken conversation with an AI!
This project lets you chat with a Large Language Model (LLM) using just your voice, receiving spoken responses in near real-time. Think of it as your own digital conversation partner.
Demo video: `FastVoiceTalk_compressed_step3_h264.mp4`
(early preview - first reasonably stable version)
⚠️ Project Status: Community-Driven
This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.
I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!
A sophisticated client-server system built for low-latency interaction:
- 🎙️ Capture: Your voice is captured by your browser.
- ➡️ Stream: Audio chunks are whisked away via WebSockets to a Python backend.
- ✍️ Transcribe: `RealtimeSTT` rapidly converts your speech to text.
- 🤔 Think: The text is sent to an LLM (like Ollama or OpenAI) for processing.
- 🗣️ Synthesize: The AI's text response is turned back into speech using `RealtimeTTS`.
- ⬅️ Return: The generated audio is streamed back to your browser for playback.
- 🔄 Interrupt: Jump in anytime! The system handles interruptions gracefully.
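To make that flow concrete, here is a minimal, hypothetical FastAPI WebSocket sketch of the same loop. It is not the project's actual `server.py`; the `transcribe_chunk`, `generate_reply`, and `synthesize_speech` helpers are illustrative stand-ins for the RealtimeSTT, LLM, and RealtimeTTS stages.

```python
# Hypothetical sketch of the capture -> transcribe -> think -> synthesize loop.
# The real pipeline lives in code/server.py and is considerably more involved.
from typing import Optional

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


def transcribe_chunk(audio_chunk: bytes) -> Optional[str]:
    """Stand-in for the RealtimeSTT stage: returns a transcript once a turn ends."""
    return None  # placeholder: keep buffering until silence detection fires


def generate_reply(user_text: str) -> str:
    """Stand-in for the LLM stage (Ollama or OpenAI behind llm_module.py)."""
    return f"You said: {user_text}"  # placeholder response


def synthesize_speech(text: str) -> bytes:
    """Stand-in for the RealtimeTTS stage: returns playable audio bytes."""
    return text.encode("utf-8")  # placeholder: real code returns audio data


@app.websocket("/ws")
async def voice_chat(websocket: WebSocket) -> None:
    await websocket.accept()
    try:
        while True:
            chunk = await websocket.receive_bytes()      # audio streamed from the browser
            user_text = transcribe_chunk(chunk)          # None while the user is still speaking
            if user_text is None:
                continue
            reply = generate_reply(user_text)            # ask the LLM for a response
            await websocket.send_bytes(synthesize_speech(reply))  # stream audio back
    except WebSocketDisconnect:
        pass  # client hung up or the conversation was reset
```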
- Fluid Conversation: Speak and listen, just like a real chat.
- Real-Time Feedback: See partial transcriptions and AI responses as they happen.
- Low Latency Focus: Optimized architecture using audio chunk streaming.
- Smart Turn-Taking: Dynamic silence detection (`turndetect.py`) adapts to the conversation pace.
- Flexible AI Brains: Pluggable LLM backends (Ollama default, OpenAI support via `llm_module.py`).
- Customizable Voices: Choose from different Text-to-Speech engines (Kokoro, Coqui, Orpheus via `audio_module.py`).
- Web Interface: Clean and simple UI using Vanilla JS and the Web Audio API.
- Dockerized Deployment: Recommended setup using Docker Compose for easier dependency management.
- Backend: Python < 3.13, FastAPI
- Frontend: HTML, CSS, JavaScript (Vanilla JS, Web Audio API, AudioWorklets)
- Communication: WebSockets
- Containerization: Docker, Docker Compose
- Core AI/ML Libraries:
  - `RealtimeSTT` (Speech-to-Text)
  - `RealtimeTTS` (Text-to-Speech)
  - `transformers` (Turn detection, Tokenization)
  - `torch` / `torchaudio` (ML Framework)
  - `ollama` / `openai` (LLM Clients)
- Audio Processing: `numpy`, `scipy`
This project leverages powerful AI models, which have some requirements:
- Operating System:
  - Docker: Linux is recommended for the best GPU integration with Docker.
  - Windows: The provided `install.bat` script automates setup.
  - macOS: Native support with optimized installation scripts and configuration.
  - Manual: Manual steps are possible on all platforms but may require more troubleshooting (especially for DeepSpeed).
- 🐍 Python: 3.9 or higher (if setting up manually).
- 🚀 GPU: A powerful CUDA-enabled NVIDIA GPU is highly recommended, especially for faster STT (Whisper) and TTS (Coqui). Performance on CPU-only or weaker GPUs will be significantly slower.
  - The setup assumes CUDA 12.1. Adjust the PyTorch installation if you have a different CUDA version.
  - Docker (Linux): Requires the NVIDIA Container Toolkit.
- 🐳 Docker (Optional but Recommended): Docker Engine and Docker Compose v2+ for the containerized setup.
- 🧠 Ollama (Optional): If using the Ollama backend without Docker, install it separately and pull your desired models. The Docker setup includes an Ollama service.
- 🔑 OpenAI API Key (Optional): If using the OpenAI backend, set the `OPENAI_API_KEY` environment variable (e.g., in a `.env` file or passed to Docker).
Clone the repository first:
```bash
git clone https://github.com/KoljaB/RealtimeVoiceChat.git
cd RealtimeVoiceChat
```

Now, choose your adventure:
🚀 Option A: Docker Installation (Recommended for Linux/GPU)
This is the most straightforward method, bundling the application, dependencies, and even Ollama into manageable containers.
1. Build the Docker images: (This takes time! It downloads base images, installs Python/ML dependencies, and pre-downloads the default STT model.)

   ```bash
   docker compose build
   ```

   *(If you want to customize models/settings in `code/*.py`, do it before this step!)*

2. Start the services (App & Ollama): (Runs containers in the background. GPU access is configured in `docker-compose.yml`.)

   ```bash
   docker compose up -d
   ```

   Give them a minute to initialize.

3. (Crucial!) Pull your desired Ollama Model: (This is done after startup to keep the main app image smaller and allow model changes without rebuilding. Execute this command to pull the default model into the running Ollama container.)

   ```bash
   # Pull the default model (adjust if you configured a different one in server.py)
   docker compose exec ollama ollama pull hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M

   # (Optional) Verify the model is available
   docker compose exec ollama ollama list
   ```

4. Stopping the Services:

   ```bash
   docker compose down
   ```

5. Restarting:

   ```bash
   docker compose up -d
   ```

6. Viewing Logs / Debugging:

   - Follow app logs: `docker compose logs -f app`
   - Follow Ollama logs: `docker compose logs -f ollama`
   - Save logs to file: `docker compose logs app > app_logs.txt`
🛠️ Option B: Manual Installation (Windows Script / venv)
This method requires managing the Python environment yourself. It offers more direct control but can be trickier, especially regarding ML dependencies.
B1) Using the Windows Install Script:
- Ensure you meet the prerequisites (Python, potentially CUDA drivers).
- Run the script. It attempts to create a venv, install PyTorch for CUDA 12.1, a compatible DeepSpeed wheel, and other requirements.

  ```batch
  install.bat
  ```

  (This opens a new command prompt within the activated virtual environment.) Proceed to the "Running the Application" section.
B2) Manual Steps (Linux/macOS/Windows):
1. Create & Activate Virtual Environment:

   ```bash
   python -m venv venv

   # Linux/macOS:
   source venv/bin/activate

   # Windows:
   .\venv\Scripts\activate
   ```

2. Upgrade Pip:

   ```bash
   python -m pip install --upgrade pip
   ```

3. Navigate to Code Directory:

   ```bash
   cd code
   ```

4. Install PyTorch (Crucial Step - Match Your Hardware!):

   - With NVIDIA GPU (CUDA 12.1 Example):

     ```bash
     # Verify your CUDA version! Adjust 'cu121' and the URL if needed.
     pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 torchvision --index-url https://download.pytorch.org/whl/cu121
     ```

   - CPU Only (Expect Slow Performance):

     ```bash
     # pip install torch torchaudio torchvision
     ```

   - Find other PyTorch versions: https://pytorch.org/get-started/previous-versions/

5. Install Other Requirements:

   ```bash
   pip install -r requirements.txt
   ```

   - Note on DeepSpeed: The `requirements.txt` may include DeepSpeed. Installation can be complex, especially on Windows. The `install.bat` script tries a precompiled wheel. If manual installation fails, you might need to build it from source or consult resources like deepspeedpatcher (use at your own risk). Coqui TTS performance benefits most from DeepSpeed.
🍎 Option C: macOS Installation (Native Support)
This project has native macOS support with optimized configuration and easy setup scripts.
C1) Quick One-Command Setup:
```bash
curl -sSL https://raw.githubusercontent.com/andy-aimer/RealtimeVoiceChat/main/deployment/macos/quick_install_macos.sh | bash
```

C2) Manual macOS Setup:
1. Install Homebrew (if not already installed):

   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
   ```

2. Install dependencies:

   ```bash
   brew install python@3.11 portaudio ffmpeg git
   ```

3. Create virtual environment:

   ```bash
   python3.11 -m venv venv_macos
   source venv_macos/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

4. Generate SSL certificates:

   ```bash
   mkdir -p ~/ssl
   openssl req -x509 -newkey rsa:2048 -keyout ~/ssl/server.key -out ~/ssl/server.crt -days 365 -nodes \
     -subj "/C=US/ST=State/L=City/O=Organization/CN=localhost"
   ```
Convenient aliases (added automatically with quick setup):

```bash
alias rtvc="cd ~/RealtimeVoiceChat && source venv_macos/bin/activate"
alias start-rtvc="cd ~/RealtimeVoiceChat && ./start_macos.sh"
alias start-rtvc-ssl="cd ~/RealtimeVoiceChat && ./start_macos_ssl.sh"
```

Access Points:

- HTTP: `http://localhost:8000`
- HTTPS: `https://localhost:8443`
- Monitoring: `http://localhost:8001`
For complete macOS installation guide, see: deployment/macos_installation.md
If using Docker:
Your application is already running via `docker compose up -d`! Check logs using `docker compose logs -f app`.
If using Manual/Script Installation:
1. Activate your virtual environment (if not already active):

   ```bash
   # Linux/macOS:
   source ../venv/bin/activate

   # Windows:
   ..\venv\Scripts\activate
   ```

2. Navigate to the `code` directory (if not already there):

   ```bash
   cd code
   ```

3. Start the FastAPI server:

   ```bash
   python server.py
   ```
Accessing the Client (Both Methods):
- Open your web browser to `http://localhost:8000` (or your server's IP if running remotely/in Docker on another machine).
- Grant microphone permissions when prompted.
- Click "Start" to begin chatting! Use "Stop" to end and "Reset" to clear the conversation.
Want to tweak the AI's voice, brain, or how it listens? Modify the Python files in the `code/` directory. Docker users: rerun `docker compose build` after making changes to ensure they are included in the image.
- TTS Engine & Voice (`server.py`, `audio_module.py`):
  - Change `START_ENGINE` in `server.py` to `"coqui"`, `"kokoro"`, or `"orpheus"`.
  - Adjust engine-specific settings (e.g., voice model path for Coqui, speaker ID for Orpheus, speed) within `AudioProcessor.__init__` in `audio_module.py`.
- LLM Backend & Model (`server.py`, `llm_module.py`):
  - Set `LLM_START_PROVIDER` (`"ollama"` or `"openai"`) and `LLM_START_MODEL` (e.g., `"hf.co/..."` for Ollama, model name for OpenAI) in `server.py`. Remember to pull the Ollama model if using Docker (see Installation Step A3).
  - Customize the AI's personality by editing `system_prompt.txt`.
- STT Settings (`transcribe.py`):
  - Modify `DEFAULT_RECORDER_CONFIG` to change the Whisper model (`model`), language (`language`), silence thresholds (`silence_limit_seconds`), etc. The default `base.en` model is pre-downloaded during the Docker build.
- Turn Detection Sensitivity (`turndetect.py`):
  - Adjust pause duration constants within the `TurnDetector.update_settings` method.
- SSL/HTTPS (`server.py`):
  - Set `USE_SSL = True` and provide paths to your certificate (`SSL_CERT_PATH`) and key (`SSL_KEY_PATH`) files.
  - Docker Users: You'll need to adjust `docker-compose.yml` to map the SSL port (e.g., 443) and potentially mount your certificate files as volumes.
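Taken together, the knobs above amount to a handful of constants in `code/server.py`. The excerpt below is only a hypothetical sketch of their shape; the values shown (and the SSL paths in particular) are illustrative rather than the shipped defaults, so check the file itself before editing.

```python
# Hypothetical excerpt -- check code/server.py for the actual values and defaults.
START_ENGINE = "kokoro"         # TTS engine: "coqui", "kokoro", or "orpheus"
LLM_START_PROVIDER = "ollama"   # or "openai" (requires OPENAI_API_KEY)
LLM_START_MODEL = "hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M"

USE_SSL = False                 # set True and point the paths below at your certificates
SSL_CERT_PATH = "localhost+3.pem"       # e.g. a cert generated with mkcert (see below)
SSL_KEY_PATH = "localhost+3-key.pem"
```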
Generating Local SSL Certificates (Windows Example w/ mkcert)
- Install Chocolatey package manager if you haven't already.
- Install mkcert: `choco install mkcert`
- Run Command Prompt as Administrator.
- Install a local Certificate Authority: `mkcert -install`
- Generate certs (replace `your.local.ip`): `mkcert localhost 127.0.0.1 ::1 your.local.ip`
  - This creates `.pem` files (e.g., `localhost+3.pem` and `localhost+3-key.pem`) in the current directory. Update `SSL_CERT_PATH` and `SSL_KEY_PATH` in `server.py` accordingly. Remember to potentially mount these into your Docker container.
Automatic hardware protection for long-running deployments on Raspberry Pi 5.
The system now includes built-in thermal monitoring to protect your hardware from overheating during extended use. When CPU temperature exceeds safe thresholds, the system automatically reduces workload to prevent thermal throttling and potential hardware damage.
How It Works:
- Monitors CPU temperature continuously (every 5 seconds by default)
- Triggers protection at 85°C - pauses LLM inference to reduce CPU load
- Resumes normal operation at 80°C (5°C hysteresis prevents oscillation)
- Displays visual warning in UI when protection is active
- Logs structured events for monitoring and debugging
Environment Variables:
Configure thermal thresholds via environment variables (add to your `.env` file or Docker environment):

```bash
# Thermal Protection Configuration
THERMAL_TRIGGER_THRESHOLD=85.0   # Temperature (°C) to trigger protection (default: 85.0)
THERMAL_RESUME_THRESHOLD=80.0    # Temperature (°C) to resume normal operation (default: 80.0)
THERMAL_CHECK_INTERVAL=5.0       # Seconds between temperature checks (default: 5.0)
```

Platform Support:
- Raspberry Pi 5: Full thermal monitoring via `/sys/class/thermal/thermal_zone0/temp`
- Other platforms (macOS/Windows/Linux): Gracefully disabled (returns -1, no errors)
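For orientation, the documented behavior boils down to a simple hysteresis loop. The sketch below is not the project's actual monitor; the `pause_inference` / `resume_inference` callbacks are hypothetical placeholders, but the thresholds, check interval, sysfs path, and the -1 fallback follow the description above.

```python
# Illustrative sketch of the documented thermal-protection behavior.
import os
import time

TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"   # Raspberry Pi 5 sysfs sensor

TRIGGER = float(os.environ.get("THERMAL_TRIGGER_THRESHOLD", "85.0"))
RESUME = float(os.environ.get("THERMAL_RESUME_THRESHOLD", "80.0"))
INTERVAL = float(os.environ.get("THERMAL_CHECK_INTERVAL", "5.0"))


def read_cpu_temp() -> float:
    """Return the CPU temperature in °C, or -1 on platforms without the sensor."""
    try:
        with open(TEMP_PATH) as f:
            return int(f.read().strip()) / 1000.0      # sysfs reports millidegrees
    except (OSError, ValueError):
        return -1.0


def monitor(pause_inference, resume_inference) -> None:
    """Pause LLM inference above TRIGGER and resume below RESUME (hysteresis)."""
    paused = False
    while True:
        temp = read_cpu_temp()
        if not paused and temp >= TRIGGER:
            pause_inference()                          # e.g. stop queueing LLM requests
            paused = True
        elif paused and 0 <= temp <= RESUME:
            resume_inference()                         # temperature back in the safe range
            paused = False
        time.sleep(INTERVAL)
```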
Docker Configuration:
Add thermal environment variables to `docker-compose.yml`:

```yaml
services:
  app:
    environment:
      - THERMAL_TRIGGER_THRESHOLD=85.0
      - THERMAL_RESUME_THRESHOLD=80.0
      - THERMAL_CHECK_INTERVAL=5.0
```

Monitoring Thermal State:
Check thermal status via the health endpoint:
```bash
curl http://localhost:8000/health
```

Response includes thermal state:
```json
{
  "status": "healthy",
  "thermal": {
    "temperature": 75.2,
    "protection_active": false,
    "inference_paused": false,
    "trigger_threshold": 85.0,
    "resume_threshold": 80.0
  }
}
```
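If you prefer watching the thermal state from a script rather than the UI banner, a small stdlib-only poller against the fields shown above could look like this (illustrative; adjust host, port, and interval to your setup):

```python
# Hypothetical watcher for the documented /health thermal fields.
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:8000/health"

while True:
    with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
        thermal = json.load(resp).get("thermal", {})
    if thermal.get("protection_active"):
        print(f"Thermal protection active at {thermal.get('temperature')} °C")
    time.sleep(30)
```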
Testing on Pi 5:

Simulate thermal load using stress-ng:

```bash
# Install stress-ng
sudo apt-get install stress-ng

# Generate CPU load to trigger thermal protection
stress-ng --cpu 4 --timeout 60s --metrics-brief
```

Monitor thermal state in the logs or the UI banner during stress testing.
Got ideas or found a bug? Contributions are welcome! Feel free to open issues or submit pull requests.
The core codebase of this project is released under the MIT License (see the LICENSE file for details).
This project relies on specific external TTS engines (like Coqui XTTSv2) and LLM providers which have their own licensing terms. Please ensure you comply with the licenses of all components you use.