Self-Hosted Voice Assistant in 2026: Building Your Private Jarvis with Home Assistant + Local LLM
A complete guide to building a private voice assistant using Home Assistant and local LLMs like Ollama, featuring offline voice control with maximum privacy.
Table of Contents
- Why Go Private?
- The Architecture
- Hardware Requirements
- Component 1: Home Assistant Voice Pipeline
- Install Wyoming Add-ons
- Create Your Voice Pipeline
- Component 2: Local LLM with Ollama
- Installation
- Choose Your Model
- Custom System Prompt
- Component 3: Whisper for Speech Recognition
- Installation Options
- Performance Tips
- Component 4: Piper TTS
- Voice Options
- Wyoming Integration
- Putting It All Together
- Step 1: Verify Components
- Step 2: Create Custom Sentences
- Step 3: Advanced Intents with LLM
- Voice Satellite Deployment
- Real-World Performance
- Common Failures & Fixes
- Privacy Checklist
- The Future
Alexa and Google Assistant are convenient—until you realize they’re listening to everything. In 2026, you can build your own private voice assistant that rivals commercial offerings while keeping your data local. Let’s build a self-hosted voice assistant that actually works.
Why Go Private?
Commercial voice assistants have a fundamental problem: you’re renting trust. Your voice data is processed on someone else’s servers, often stored indefinitely, and used to train models you’ll never see.
The privacy trade-off isn’t worth it anymore. With modern open-source tools, you can achieve 95% of Alexa’s functionality while keeping everything on your network.
| Feature | Alexa/Google | Self-Hosted |
|---|---|---|
| Voice Processing | Cloud | Local Whisper |
| Response Generation | Cloud LLM | Local Ollama |
| Text-to-Speech | Cloud | Piper TTS |
| Your Data | Theirs | Yours |
| Offline Use | ❌ | ✅ |
| Custom Commands | Limited | Unlimited |
The Architecture
Your self-hosted voice assistant has four components:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Microphone │────▶│ Whisper │────▶│ Ollama │
│ (Audio) │ │ (STT) │ │ (LLM) │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
│ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Wake Word │────▶│ Piper TTS │◀─────────│ Response │
│(openWakeWord)│          │  (TTS)       │          │  (Intent)    │
└──────────────┘ └──────────────┘ └──────────────┘
The Wyoming protocol (which powers Home Assistant's voice pipeline) chains these together seamlessly:
- Wake Word → Activates the pipeline (e.g., “Hey Jarvis”)
- Voice-to-Text → Whisper converts speech to text
- Intent Recognition → Ollama processes the request
- Text-to-Speech → Piper speaks the response
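Under the hood, the Wyoming protocol that chains these stages is deliberately simple: each event is a single JSON header line, optionally followed by a binary payload such as audio. A minimal sketch (field names follow the protocol's published framing, but verify against the spec before building on it):

```python
import json

def encode_event(event_type: str, data: dict, payload: bytes = b"") -> bytes:
    """Frame a Wyoming-style event: one JSON header line, then an
    optional binary payload (e.g., raw audio samples)."""
    header: dict = {"type": event_type, "data": data}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload

# A satellite streaming microphone audio might emit frames like this:
frame = encode_event(
    "audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00\x01"
)
print(json.loads(frame.split(b"\n", 1)[0])["type"])  # audio-chunk
```

Because every component (wake word, STT, TTS) speaks this same framing over TCP, you can mix and match servers freely, which is exactly what the add-ons below do.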
Hardware Requirements
Bare Minimum:
- Raspberry Pi 4 (4GB) or equivalent
- USB microphone ($15)
- Speaker or HDMI audio
- Network access to your Home Assistant instance
Recommended:
- Raspberry Pi 5 (8GB) or Intel N100 mini-PC
- Coral TPU (for wake word acceleration)
- ReSpeaker HAT (better microphone array)
- A local GPU for real-time Whisper transcription
Performance Notes:
- Whisper tiny: 0.5GB RAM, ~150ms latency
- Whisper base: 0.8GB RAM, ~300ms latency
- Whisper small: 1.2GB RAM, ~800ms latency (best accuracy)
Component 1: Home Assistant Voice Pipeline
Home Assistant 2025+ has native voice support via Assist. Here’s how to set it up:
Install Wyoming Add-ons
# In Home Assistant Add-on Store, add:
# 1. Whisper
# 2. Piper
# 3. openWakeWord
# Configure Whisper (Settings > Add-ons > Whisper):
model: tiny-int8
language: en
# Configure Piper (Settings > Add-ons > Piper):
voice: en_US-amy-medium
# Configure openWakeWord (Settings > Add-ons > openWakeWord):
enable: true
Create Your Voice Pipeline
# Settings > Voice Assistants > Assist
wake_word: "Hey Jarvis" # or "Okay Nabu"
conversation_agent: Ollama # We'll set this up next
stt_engine: Whisper
tts_engine: Piper
Component 2: Local LLM with Ollama
The brain of your assistant is a local LLM. Ollama makes this easy:
Installation
# On your Home Assistant host (or separate server)
curl -fsSL https://ollama.com/install.sh | sh
# Or Docker (recommended for Homelab):
docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--gpus all \
--name ollama \
ollama/ollama
Choose Your Model
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| llama3.2 | 3GB | Fast | Good | General commands |
| phi3 | 3GB | Very Fast | Very Good | Short responses |
| mistral | 4GB | Fast | Excellent | Complex queries |
| qwen2.5 | 3GB | Fast | Good | Multi-turn conversations |
# Pull your model
ollama pull llama3.2
# Make it available to Home Assistant
# Settings > Devices & Services > Add Integration > Ollama
# Server: http://host.docker.internal:11434
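Once the integration is connected, it helps to know what Ollama actually returns: by default, `/api/generate` streams one JSON object per line, each carrying a chunk of the reply in its `response` field. A small helper to reassemble such a stream (the sample lines below are canned for illustration, not real server output):

```python
import json

def join_ollama_stream(jsonl: str) -> str:
    """Concatenate the partial "response" fields from Ollama's
    line-delimited streaming output into the full reply text."""
    return "".join(
        json.loads(line).get("response", "")
        for line in jsonl.strip().splitlines()
        if line.strip()
    )

sample = "\n".join([
    '{"model":"llama3.2","response":"Movie ","done":false}',
    '{"model":"llama3.2","response":"mode on.","done":false}',
    '{"model":"llama3.2","response":"","done":true}',
])
print(join_ollama_stream(sample))  # Movie mode on.
```

For voice use you generally want the full reply before speaking it, which is why setting `"stream": false` in requests (shown later in the testing section) is often simpler.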
Custom System Prompt
Create a personality that matches your needs:
You are Jarvis, a helpful voice assistant for a smart home.
Keep responses brief and conversational.
When asked about home devices, only return the specific command needed.
Never explain your thinking, just execute commands.
Component 3: Whisper for Speech Recognition
Whisper, OpenAI's open-source speech-recognition model, runs locally with surprising accuracy:
Installation Options
Option A: Home Assistant Add-on (Easiest)
- Settings > Add-ons > Whisper
- Choose model based on your hardware
Option B: Standalone Deployment (Best Performance)
# Using wyoming-faster-whisper
pip install wyoming-faster-whisper
python -m wyoming_faster_whisper \
  --uri 'tcp://0.0.0.0:10300' \
  --model tiny-int8 \
  --language en \
  --beam-size 5 \
  --data-dir /data  # where models are downloaded/cached
Performance Tips
- Quantized models (`tiny-int8`, `base-int8`) use less than half the RAM with minimal accuracy loss
- CoreML on Apple Silicon is remarkably fast
- CUDA speeds things up 5-10x if you have a GPU
- CPU-only is fine for single-user setups with the `tiny` model
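A useful way to reason about these tips is the real-time factor (RTF): processing time divided by audio duration. RTF below 1.0 means the model keeps up with live speech. A tiny illustration (the timings are assumptions, loosely based on the latency figures quoted in the hardware section):

```python
# RTF = processing time / audio duration; < 1.0 keeps up with live speech.
def rtf(processing_s: float, audio_s: float) -> float:
    return processing_s / audio_s

# Illustrative per-utterance timings for a 3-second command:
for model, proc in [("tiny", 0.15), ("base", 0.30), ("small", 0.80)]:
    print(f"{model}: RTF {rtf(proc, 3.0):.2f}")
```

A 5-10x CUDA speedup drops even `small` comfortably below real time, which is why a GPU matters most for the larger, more accurate models.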
Component 4: Piper TTS
Piper generates surprisingly natural speech locally:
Voice Options
# English voices available:
# - Lessac (medium quality, fast)
# - Ryan (high quality)
# - Amy (high quality, conversational)
# - LibriTTS RMS (clean, neutral)
# Download and install (each voice needs both the .onnx model and its .json config):
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json
Wyoming Integration
The Piper add-on automatically exposes itself via Wyoming protocol—no manual configuration needed.
Putting It All Together
Step 1: Verify Components
# Test Whisper (STT) through Home Assistant's STT API
# (the entity name depends on your setup; check Settings > Devices & Services)
curl -X POST http://ha:8123/api/stt/stt.faster_whisper \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "X-Speech-Content: format=wav; codec=pcm; sample_rate=16000; bit_rate=16; channel=1; language=en-US" \
  --data-binary @test.wav
# Test Ollama ("stream": false returns one JSON object instead of a stream)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What time is it?",
  "stream": false
}'
# Test Piper (TTS). Note: port 10200 speaks the Wyoming protocol over raw TCP,
# not HTTP, so curl can't talk to it directly. The simplest end-to-end check is
# Developer Tools > Actions > tts.speak in Home Assistant; alternatively, run
# Piper's standalone HTTP server and fetch audio from it:
curl -G 'http://localhost:5000/' \
  --data-urlencode 'text=Hello from your local voice assistant' \
  --output response.wav
Step 2: Create Custom Sentences
Teach your assistant domain-specific commands:
# configuration.yaml
intent_script:
MovieTime:
speech:
text: "Starting movie mode. Dimming lights and closing blinds."
action:
- service: scene.turn_on
target:
entity_id: scene.movie_mode
SecurityCheck:
speech:
text: "Checking perimeter. Front door is {{ states('binary_sensor.front_door') }}. Garage is {{ states('cover.garage_door') }}."
Step 3: Advanced Intents with LLM
Let the LLM handle ambiguous requests:
# prompts/assist.j2
{% if "too cold" in user_prompt %}
{% set temp = state_attr('climate.thermostat', 'temperature') | float %}
Turn up the heat to {{ temp + 1 }} degrees
{% elif "bright" in user_prompt %}
Turn off the lights
{% else %}
{{ user_prompt }}
{% endif %}
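Routing logic like this is easier to iterate on outside of Jinja. Here is the same idea in plain Python (a sketch mirroring the template; the phrase matching and temperature handling are illustrative):

```python
def route_intent(user_prompt: str, current_temp: float) -> str:
    """Mirror the template's fallback logic: handle a few known phrases
    directly, otherwise pass the prompt through to the LLM unchanged."""
    text = user_prompt.lower()
    if "too cold" in text:
        return f"Turn up the heat to {current_temp + 1:g} degrees"
    if "bright" in text:
        return "Turn off the lights"
    return user_prompt

print(route_intent("it's too cold in here", 20))
# Turn up the heat to 21 degrees
```

The design point: cheap substring checks catch the common cases instantly, and only genuinely ambiguous requests pay the LLM's latency cost.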
Voice Satellite Deployment
For rooms without direct HA access, deploy voice satellites:
# Satellite configuration (docker-compose)
version: '3'
services:
  wake-word:
    image: rhasspy/wyoming-openwakeword
    command: --uri 'tcp://0.0.0.0:10400' --preload-model 'hey_jarvis'
  satellite:
    # wyoming-satellite is typically run from source on a Pi; an image
    # name like this is illustrative (check the project's README)
    image: rhasspy/wyoming-satellite
    devices:
      - /dev/snd:/dev/snd
    command: >
      --name 'kitchen-satellite'
      --uri 'tcp://0.0.0.0:10700'
      --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw'
      --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'
      --wake-uri 'tcp://wake-word:10400'
      --wake-word-name 'hey_jarvis'
    depends_on:
      - wake-word
# Home Assistant discovers the satellite through the Wyoming integration
# (Settings > Devices & Services > Add Integration > Wyoming, port 10700)
Hardware: ESP32-S3 + INMP441 microphone (~$10 total)
Real-World Performance
After 6 months running this setup:
- Wake word accuracy: 95% (openWakeWord is solid)
- STT accuracy: 92% (Whisper base, very good)
- LLM response time: <2 seconds (Phi-3 on N100)
- TTS quality: 8/10 (Piper Amy medium)
- Total monthly cost: $0 (minus electricity)
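Those measurements roughly add up end to end. A back-of-envelope latency budget (the per-stage numbers are assumptions consistent with the figures above, not new measurements):

```python
# Rough per-stage latency for a "Hey Jarvis, ..." round trip (assumed
# values, chosen to be consistent with the measurements listed above).
stages_ms = {
    "wake_word": 50,          # openWakeWord trigger, near-instant
    "stt_whisper_base": 300,  # ~300 ms quoted for the base model
    "llm_response": 1500,     # within the "<2 seconds" measured on the N100
    "tts_piper": 200,         # short spoken confirmations
}
total_ms = sum(stages_ms.values())
print(f"~{total_ms / 1000:.1f} s perceived latency")
```

The LLM dominates the budget, which is why model choice (phi3 over mistral, say) matters far more for responsiveness than tuning Whisper or Piper.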
Comparison vs Alexa:
- Speed: Slight edge to Alexa (network latency vs local processing)
- Accuracy: Very similar for standard commands
- Customization: Self-hosted wins by miles
- Privacy: Not even close—local-only is the winner
Common Failures & Fixes
“It doesn’t respond to the wake word”
- Check openWakeWord is running: `docker logs openwakeword` (or the add-on log)
- Verify the microphone is detected: `arecord -l`
- Increase wake word sensitivity if needed
“Whisper is too slow”
- Switch to the `tiny-int8` model: 8x faster, minimal accuracy loss
- Ensure you're using the GPU if one is available
- Consider a dedicated Whisper container on a beefier machine
“Ollama responses are nonsense”
- Your prompts are too ambiguous. Add more context to your system prompt.
- Consider a more capable model (e.g., llama3.2 → mistral)
- Add examples to the prompt using few-shot learning
“Audio quality is poor”
- USB microphones are hit-or-miss. ReSpeaker or dedicated audio HAT recommended.
- Check ALSA mixer levels: `alsamixer`
- Verify the right microphone is selected in HA: Settings > System > Audio
“It won’t run offline”
- Ensure all components are local: Whisper, Piper, Ollama
- Remember that add-ons require Home Assistant OS or Supervised; on a plain container install, run Whisper, Piper, and openWakeWord as separate local containers
- Verify no external API calls in your automations
Privacy Checklist
- All processing on local network
- No cloud STT/TTS services
- LLM running locally (Ollama, not OpenAI API)
- Wake word processing local (not streaming audio to cloud)
- Disable Home Assistant Cloud (if not needed for external access)
- Firewall rules preventing external voice service calls
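To back the checklist with a quick automated audit, a small script can confirm that every component endpoint resolves to a loopback or RFC 1918 private address (a sketch; the hostnames you check are your own):

```python
import ipaddress
import socket

def resolves_locally(host: str) -> bool:
    """Return True if the host resolves to a loopback or private
    (RFC 1918) address, i.e., traffic to it stays on your network."""
    addr = ipaddress.ip_address(socket.gethostbyname(host))
    return addr.is_loopback or addr.is_private

# Substitute your own endpoints, e.g. "homeassistant.local":
for host in ["localhost"]:
    print(host, resolves_locally(host))  # localhost True
```

This catches the most common leak: a config entry quietly pointing at a cloud STT or TTS endpoint instead of your local services.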
The Future
The gap between local and cloud voice assistants is closing fast. As local LLMs improve and edge compute gets cheaper, self-hosted becomes the obvious choice for privacy-conscious users.
What’s coming:
- Smaller, faster models (Phi-4, Llama 4)
- Better wake word detection (always listening, minimal battery)
- Multilingual support (Whisper handles 99 languages already)
- More natural TTS (Piper is just the beginning)
Your voice, your data, your rules. Build it yourself.