Speech Features

Configure local wake word detection and speech-to-text for voice control.

🎙️ Speech Features (Voice Control)

Transform your Home Assistant MCP server into a fully voice-controlled smart assistant. The speech integration provides local, privacy-focused voice processing without relying on cloud services for transcription.

Overview

The speech stack consists of three main components:

Wake Word Detection: Powered by wyoming-openwakeword (default: “Hey Jarvis”).
Speech-to-Text (STT): Fast, local transcription using faster-whisper.
Audio Integration: Direct PulseAudio integration for seamless microphone access.

Prerequisites

A working microphone connected to your host machine.
PulseAudio or PipeWire (with PulseAudio emulation) running on the host.
Docker and Docker Compose installed.
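
The prerequisites above can be checked from a terminal before starting the stack. The following is a minimal sketch, not part of the project: the function names have_cmd and check_speech_prereqs are hypothetical, and it assumes that pactl (the PulseAudio command-line client, which also works against PipeWire's PulseAudio emulation) is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: reports which prerequisites are present.
# Returns 0 when everything was found, 1 otherwise.
check_speech_prereqs() {
    missing=0
    for cmd in docker docker-compose pactl; do
        if have_cmd "$cmd"; then
            echo "ok: $cmd found"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    # 'pactl info' only succeeds when a PulseAudio-compatible server answers.
    if have_cmd pactl && ! pactl info >/dev/null 2>&1; then
        echo "missing: no PulseAudio server reachable"
        missing=1
    fi
    return "$missing"
}
```

Run check_speech_prereqs after sourcing the snippet. It does not verify that a microphone is attached, only that the tooling is in place.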

Quick Start

To launch the MCP server with the complete speech stack, use the dedicated Docker Compose file:

docker-compose -f docker-compose.speech.yml up -d

This will start three containers:

homeassistant-mcp: The main MCP server.
fast-whisper: The local speech-to-text engine.
wake-word: The wake word detection service.
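
To confirm that all three containers came up, the list of running containers can be compared against the names above. This is a sketch under the assumption that the running container names match the names listed here exactly; the function name missing_containers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads running container names on stdin (one per line)
# and prints each expected speech-stack container that is absent.
missing_containers() {
    running=$(cat)
    for name in homeassistant-mcp fast-whisper wake-word; do
        # -x matches the whole line, -F treats the name as a literal string.
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}
```

A typical invocation would be docker ps --format '{{.Names}}' | missing_containers, which prints nothing when every expected container is running.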

Configuration

Speech features are configured via environment variables in your .env file.

Feature Flags

ENABLE_SPEECH_FEATURES=true
ENABLE_WAKE_WORD=true
ENABLE_SPEECH_TO_TEXT=true

Audio Settings

Configure these to match your microphone and environment:

NOISE_THRESHOLD=0.05
MIN_SPEECH_DURATION=1.0
SILENCE_DURATION=0.5
SAMPLE_RATE=16000
CHANNELS=1
CHUNK_SIZE=1024
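
It can help to see how these values relate to each other in time. The sketch below assumes that CHUNK_SIZE is measured in samples per channel and that MIN_SPEECH_DURATION and SILENCE_DURATION are in seconds; the documentation above does not state the units, so treat both as assumptions. The function names are hypothetical.

```shell
#!/bin/sh
# Duration of one audio chunk in milliseconds.
# $1: chunk size in samples (assumed), $2: sample rate in Hz
chunk_ms() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n * 1000 / r }'
}

# Number of whole chunks needed to cover a duration, rounded up.
# $1: duration in seconds (assumed), $2: chunk size, $3: sample rate in Hz
chunks_for_seconds() {
    awk -v d="$1" -v n="$2" -v r="$3" 'BEGIN {
        c = d * r / n
        i = int(c)
        if (i < c) i++
        print i
    }'
}

chunk_ms 1024 16000                 # prints 64.0
chunks_for_seconds 1.0 1024 16000   # prints 16
chunks_for_seconds 0.5 1024 16000   # prints 8
```

Under these assumptions, the defaults mean each chunk covers 64 ms of audio, speech must span about 16 chunks to count, and roughly 8 quiet chunks end an utterance.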

PulseAudio Integration

The containers need access to your host’s PulseAudio server. The default configuration assumes the host user has user ID 1000 (check yours with id -u) and points at that user’s PulseAudio socket:

PULSE_SERVER=unix:/run/user/1000/pulse/native
AUDIO_GID=29 # The 'audio' group ID on your host machine

Note: You may need to adjust AUDIO_GID to match the audio group ID on your host system (find it by running getent group audio).
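
Both values can be derived from the host instead of typed by hand. The following is a minimal sketch that assumes the PulseAudio socket lives at /run/user/<uid>/pulse/native, as in the default above; the function names are hypothetical.

```shell
#!/bin/sh
# Build the PULSE_SERVER value for a numeric user ID.
pulse_server_for_uid() {
    printf 'unix:/run/user/%s/pulse/native\n' "$1"
}

# Extract the numeric group ID from one line in group(5) format,
# e.g. "audio:x:29:alice".
gid_from_group_entry() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Print .env lines for the current user and the host's 'audio' group.
print_pulse_env() {
    uid=$(id -u)
    entry=$(getent group audio 2>/dev/null || true)
    echo "PULSE_SERVER=$(pulse_server_for_uid "$uid")"
    if [ -n "$entry" ]; then
        echo "AUDIO_GID=$(gid_from_group_entry "$entry")"
    else
        echo "no 'audio' group found on this host" >&2
    fi
}
```

Calling print_pulse_env prints lines that can be pasted into .env after checking that the socket path actually exists on the host.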

Whisper Configuration

Configure the local speech-to-text engine:

ASR_MODEL=base
ASR_ENGINE=faster_whisper
WHISPER_BEAM_SIZE=5
COMPUTE_TYPE=float32
LANGUAGE=en

Troubleshooting

Microphone Not Detected

Ensure that AUDIO_GID matches the audio group ID on your host and that the PulseAudio socket path in PULSE_SERVER is correct. To list the capture devices visible inside the container, run:

docker exec -it wake-word arecord -l

High CPU Usage

The fast-whisper container can be CPU-intensive during transcription. To reduce the load, limit its resources in the docker-compose.speech.yml file or switch to a smaller model (e.g., set ASR_MODEL to tiny instead of base).
