Speech Features
Configure local wake word detection and speech-to-text for voice control.
🎙️ Speech Features (Voice Control)
Transform your Home Assistant MCP server into a fully voice-controlled smart assistant. The speech integration provides local, privacy-focused voice processing without relying on cloud services for transcription.
Overview
The speech stack consists of three main components:
- Wake Word Detection: Powered by wyoming-openwakeword (default: “Hey Jarvis”).
- Speech-to-Text (STT): Fast, local transcription using faster-whisper.
- Audio Integration: Direct PulseAudio integration for seamless microphone access.
Prerequisites
- A working microphone connected to your host machine.
- PulseAudio or PipeWire (with PulseAudio emulation) running on the host.
- Docker and Docker Compose installed.
Quick Start
To launch the MCP server with the complete speech stack, use the dedicated Docker Compose file:
docker-compose -f docker-compose.speech.yml up -d
This will start three containers:
- homeassistant-mcp: The main MCP server.
- fast-whisper: The local speech-to-text engine.
- wake-word: The wake word detection service.
Configuration
Speech features are configured via environment variables in your .env file.
Feature Flags
ENABLE_SPEECH_FEATURES=true
ENABLE_WAKE_WORD=true
ENABLE_SPEECH_TO_TEXT=true
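The sketch below shows one way these flags might be read, for example by a helper script that checks your .env before you start the stack. It rests on two assumptions that this page does not document. First, it treats "true", "1", "yes" and "on" (case-insensitive) as enabled. Second, it assumes ENABLE_WAKE_WORD and ENABLE_SPEECH_TO_TEXT only take effect when ENABLE_SPEECH_FEATURES is on. `env_flag`, `SpeechFlags` and `load_speech_flags` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a boolean flag such as ENABLE_WAKE_WORD=true (accepted spellings are assumed)."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SpeechFlags:
    speech: bool
    wake_word: bool
    speech_to_text: bool


def load_speech_flags(env: Mapping[str, str] = os.environ) -> SpeechFlags:
    speech = env_flag(env, "ENABLE_SPEECH_FEATURES")
    # Assumption: the sub-features are ignored unless the master switch is on.
    return SpeechFlags(
        speech=speech,
        wake_word=speech and env_flag(env, "ENABLE_WAKE_WORD"),
        speech_to_text=speech and env_flag(env, "ENABLE_SPEECH_TO_TEXT"),
    )


if __name__ == "__main__":
    print(load_speech_flags())
```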
Audio Settings
Configure these to match your microphone and environment:
NOISE_THRESHOLD=0.05
MIN_SPEECH_DURATION=1.0
SILENCE_DURATION=0.5
SAMPLE_RATE=16000
CHANNELS=1
CHUNK_SIZE=1024
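The variable names suggest a simple energy-based speech gate, but this page does not document the server's actual algorithm. The sketch below is one plausible reading and depends on several assumptions:

- NOISE_THRESHOLD is an RMS level on samples normalized to [-1.0, 1.0].
- MIN_SPEECH_DURATION and SILENCE_DURATION are measured in seconds.
- CHUNK_SIZE counts audio frames per read.

With the defaults, each chunk covers 1024 / 16000 = 0.064 s. That means 0.5 s of silence is about 8 chunks and 1.0 s of speech is about 16 chunks. `AudioSettings` and `segment_utterances` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class AudioSettings:
    noise_threshold: float = 0.05     # assumed: RMS on samples normalized to [-1.0, 1.0]
    min_speech_duration: float = 1.0  # assumed: seconds
    silence_duration: float = 0.5     # assumed: seconds
    sample_rate: int = 16000          # samples per second
    channels: int = 1                 # this sketch only handles mono audio
    chunk_size: int = 1024            # assumed: frames per chunk

    @property
    def chunk_seconds(self) -> float:
        return self.chunk_size / self.sample_rate


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square level of one chunk of normalized mono samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def segment_utterances(chunks: Iterable[Sequence[float]], cfg: AudioSettings) -> List[float]:
    """Return the speech durations (seconds) of utterances that pass both thresholds.

    A chunk counts as speech when its RMS exceeds noise_threshold. An utterance
    ends after silence_duration of consecutive quiet chunks and is kept only if
    its speech chunks add up to at least min_speech_duration.
    """
    silence_limit = math.ceil(cfg.silence_duration / cfg.chunk_seconds)
    kept: List[float] = []
    speech_chunks = quiet_chunks = 0
    for chunk in chunks:
        if rms(chunk) > cfg.noise_threshold:
            speech_chunks += 1
            quiet_chunks = 0
        elif speech_chunks:
            quiet_chunks += 1
            if quiet_chunks >= silence_limit:
                duration = speech_chunks * cfg.chunk_seconds
                if duration >= cfg.min_speech_duration:
                    kept.append(duration)
                speech_chunks = quiet_chunks = 0
    # Flush an utterance that is still open when the stream ends.
    if speech_chunks and speech_chunks * cfg.chunk_seconds >= cfg.min_speech_duration:
        kept.append(speech_chunks * cfg.chunk_seconds)
    return kept
```

In this sketch, raising NOISE_THRESHOLD makes the gate ignore more background noise, at the cost of missing quiet speech. Lowering SILENCE_DURATION closes utterances sooner but can cut them off at natural pauses.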
PulseAudio Integration
The containers need access to your host’s PulseAudio server. The default socket path assumes your host user ID is 1000 (check yours with id -u):
PULSE_SERVER=unix:/run/user/1000/pulse/native
AUDIO_GID=29 # The 'audio' group ID on your host machine
Note: You may need to adjust AUDIO_GID to match the audio group ID on your host system (find it by running getent group audio).
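If you'd rather not look these values up by hand, a small helper run on the host can print both lines for you. This is a sketch. It assumes the PulseAudio (or PipeWire-pulse) socket sits at the default per-user location shown above, and `pulse_server_for_uid` and `audio_gid` are hypothetical helper names. It uses Python's standard `grp` module, which queries the same group database as `getent`.

```python
import grp
import os
from typing import Optional


def pulse_server_for_uid(uid: int) -> str:
    """PulseAudio native socket under the user's runtime dir (/run/user/<uid>)."""
    return f"unix:/run/user/{uid}/pulse/native"


def audio_gid(group_name: str = "audio") -> Optional[int]:
    """Mirror `getent group audio`: return the group's GID, or None if it is missing."""
    try:
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        return None


if __name__ == "__main__":
    # Run on the host, not inside a container, to print .env lines for the current user.
    print(f"PULSE_SERVER={pulse_server_for_uid(os.getuid())}")
    gid = audio_gid()
    print(f"AUDIO_GID={gid}" if gid is not None else "# no 'audio' group found on this host")
```

Run it as the desktop user that owns the PulseAudio session rather than via sudo. Under sudo, os.getuid() returns 0.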
Whisper Configuration
Configure the local speech-to-text engine:
ASR_MODEL=base
ASR_ENGINE=faster_whisper
WHISPER_BEAM_SIZE=5
COMPUTE_TYPE=float32
LANGUAGE=en
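For orientation, the sketch below shows how these variables could map onto the faster-whisper Python API (`WhisperModel` and its `transcribe` method). The mapping is an assumption about what the fast-whisper container does internally; this page does not document it. `whisper_kwargs` is a hypothetical helper, and "clip.wav" is a placeholder file name.

```python
import os
from typing import Any, Dict, Mapping, Tuple


def whisper_kwargs(env: Mapping[str, str] = os.environ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the Whisper env vars into constructor and transcribe() keyword args.

    Assumed mapping (not confirmed by this page):
      ASR_MODEL         -> WhisperModel(model_size_or_path=...)
      COMPUTE_TYPE      -> WhisperModel(compute_type=...)
      WHISPER_BEAM_SIZE -> transcribe(beam_size=...)
      LANGUAGE          -> transcribe(language=...)
    """
    engine = env.get("ASR_ENGINE", "faster_whisper")
    if engine != "faster_whisper":
        raise ValueError(f"this sketch only handles faster_whisper, got {engine!r}")
    model_kwargs = {
        "model_size_or_path": env.get("ASR_MODEL", "base"),
        "compute_type": env.get("COMPUTE_TYPE", "float32"),
    }
    transcribe_kwargs = {
        "beam_size": int(env.get("WHISPER_BEAM_SIZE", "5")),
        "language": env.get("LANGUAGE", "en"),
    }
    return model_kwargs, transcribe_kwargs


if __name__ == "__main__":
    model_kwargs, transcribe_kwargs = whisper_kwargs()
    print(model_kwargs, transcribe_kwargs)
    # With faster-whisper installed, these would be used as:
    #   from faster_whisper import WhisperModel
    #   model = WhisperModel(**model_kwargs)
    #   segments, info = model.transcribe("clip.wav", **transcribe_kwargs)
```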
Troubleshooting
Microphone Not Detected
Ensure AUDIO_GID matches the audio group on your host and that the PulseAudio socket path is correct. To test microphone access from inside the container, list the capture devices it can see:
docker exec -it wake-word arecord -l
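`arecord -l` lists ALSA capture devices. It can also help to confirm that the PulseAudio socket itself is visible inside the container. The minimal sketch below assumes the container image includes a Python interpreter (this page doesn't say whether it does) and that PULSE_SERVER uses the unix: form shown above. `pulse_socket_path` and `check_pulse_socket` are hypothetical names.

```python
import os
import stat
from typing import Optional

PREFIX = "unix:"


def pulse_socket_path(pulse_server: str) -> Optional[str]:
    """Extract the path from a value like unix:/run/user/1000/pulse/native."""
    if pulse_server.startswith(PREFIX):
        return pulse_server[len(PREFIX):]
    return None  # TCP and other server strings are not handled in this sketch


def check_pulse_socket(pulse_server: str) -> str:
    path = pulse_socket_path(pulse_server)
    if path is None:
        return f"not a unix socket address: {pulse_server!r}"
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return f"missing: {path} (is the socket mounted into the container?)"
    except PermissionError:
        return f"permission denied: {path} (check the container's user and group IDs)"
    if not stat.S_ISSOCK(mode):
        return f"exists but is not a socket: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    print(check_pulse_socket(os.environ.get("PULSE_SERVER", "")))
```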
High CPU Usage
The fast-whisper container can be CPU-intensive during transcription. You can limit its resources in the docker-compose.speech.yml file or switch to a smaller model (e.g., tiny instead of base).