Self-Hosted Voice Assistant in 2026: Building Your Private Jarvis with Home Assistant + Local LLM

A complete guide to building a private voice assistant using Home Assistant and local LLMs like Ollama, featuring offline voice control with maximum privacy.

• 7 min read
home-assistant · voice · ollama · private-assistant · whisper · piper

Alexa and Google Assistant are convenient—until you realize they’re listening to everything. In 2026, you can build your own private voice assistant that rivals commercial offerings while keeping your data local. Let’s build a self-hosted voice assistant that actually works.

Why Go Private?

Commercial voice assistants have a fundamental problem: you’re renting trust. Your voice data is processed on someone else’s servers, often stored indefinitely, and used to train models you’ll never see.

The privacy trade-off isn’t worth it anymore. With modern open-source tools, you can achieve 95% of Alexa’s functionality while keeping everything on your network.

| Feature | Alexa/Google | Self-Hosted |
|---|---|---|
| Voice Processing | Cloud | Local Whisper |
| Response Generation | Cloud LLM | Local Ollama |
| Text-to-Speech | Cloud | Piper TTS |
| Your Data | Theirs | Yours |
| Offline Use | ✗ | ✓ |
| Custom Commands | Limited | Unlimited |

The Architecture

Your self-hosted voice assistant has four components:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Microphone  │────▶│   Whisper    │────▶│    Ollama    │
│   (Audio)    │     │    (STT)     │     │    (LLM)     │
└──────────────┘     └──────────────┘     └──────────────┘
       │                                         │
       │                                         ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Wake Word   │────▶│  Piper TTS   │◀────│   Response   │
│(openWakeWord)│     │    (TTS)     │     │   (Intent)   │
└──────────────┘     └──────────────┘     └──────────────┘

Wyoming Protocol (Home Assistant’s voice pipeline) chains these together seamlessly:

  1. Wake Word → Activates the pipeline (e.g., “Hey Jarvis”)
  2. Voice-to-Text → Whisper converts speech to text
  3. Intent Recognition → Ollama processes the request
  4. Text-to-Speech → Piper speaks the response

Hardware Requirements

Bare Minimum:

  • Raspberry Pi 4 (4GB) or equivalent
  • USB microphone ($15)
  • Speaker or HDMI audio
  • Network access to your Home Assistant instance

Recommended:

  • Raspberry Pi 5 (8GB) or Intel N100 mini-PC
  • Coral TPU (for wake word acceleration)
  • ReSpeaker HAT (better microphone array)
  • Local GPU (for real-time Whisper with larger models)

Performance Notes:

  • Whisper tiny: 0.5GB RAM, ~150ms latency
  • Whisper base: 0.8GB RAM, ~300ms latency
  • Whisper small: 1.2GB RAM, ~800ms latency (best accuracy)

Component 1: Home Assistant Voice Pipeline

Home Assistant 2025+ has native voice support via Assist. Here’s how to set it up:

Install Wyoming Add-ons

# In Home Assistant Add-on Store, add:
# 1. Whisper
# 2. Piper
# 3. openWakeWord

# Configure Whisper (Settings > Add-ons > Whisper):
model: tiny-int8
language: en

# Configure Piper (Settings > Add-ons > Piper):
voice: en_US-amy-medium

# Configure openWakeWord (Settings > Add-ons > openWakeWord):
enable: true

Create Your Voice Pipeline

# Settings > Voice Assistants > Assist
wake_word: "Hey Jarvis"  # or "Okay Nabu"
conversation_agent: Ollama  # We'll set this up next
stt_engine: Whisper
tts_engine: Piper

Component 2: Local LLM with Ollama

The brain of your assistant is a local LLM. Ollama makes this easy:

Installation

# On your Home Assistant host (or separate server)
curl -fsSL https://ollama.com/install.sh | sh

# Or Docker (recommended for Homelab):
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --gpus all \
  --name ollama \
  ollama/ollama

Choose Your Model

| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| llama3.2 | 3GB | Fast | Good | General commands |
| phi3 | 3GB | Very Fast | Very Good | Short responses |
| mistral | 4GB | Fast | Excellent | Complex queries |
| qwen2.5 | 3GB | Fast | Good | Multi-turn conversations |

# Pull your model
ollama pull llama3.2

# Make it available to Home Assistant
# Settings > Devices & Services > Add Integration > Ollama
# Server: http://host.docker.internal:11434

Custom System Prompt

Create a personality that matches your needs:

You are Jarvis, a helpful voice assistant for a smart home.
Keep responses brief and conversational.
When asked about home devices, only return the specific command needed.
Never explain your thinking, just execute commands.
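
One way to wire this in is per-request via Ollama's chat API, which takes a system message alongside the user message. A sketch of the request payload (assumes Ollama at `localhost:11434`; send it with any HTTP client):

```python
import json

SYSTEM_PROMPT = (
    "You are Jarvis, a helpful voice assistant for a smart home. "
    "Keep responses brief and conversational."
)

def build_chat_request(user_text: str) -> dict:
    """Payload for POST http://localhost:11434/api/chat."""
    return {
        "model": "llama3.2",
        "stream": False,  # one complete reply is simpler to hand to TTS
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    }

payload = build_chat_request("Turn off the kitchen lights")
print(json.dumps(payload, indent=2))
```

The Home Assistant Ollama integration also lets you set this prompt in the integration options, so you don't have to send it yourself on every request.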

Component 3: Whisper for Speech Recognition

Whisper (OpenAI’s model, open-sourced) runs locally with surprising accuracy:

Installation Options

Option A: Home Assistant Add-on (Easiest)

  • Settings > Add-ons > Whisper
  • Choose model based on your hardware

Option B: Standalone Deployment (Best Performance)

# Using wyoming-faster-whisper
pip install wyoming-faster-whisper

python -m wyoming_faster_whisper \
  --uri 'tcp://0.0.0.0:10300' \
  --model tiny-int8 \
  --language en \
  --beam-size 5 \
  --data-dir ./whisper-data
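
Wyoming services speak a TCP protocol, so a quick "is it up?" check is just a socket connect against whatever host and port you passed to `--uri`:

```python
import socket

def wyoming_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if something is listening on the given Wyoming service port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. wyoming_reachable("localhost", 10300)  -> Whisper
#      wyoming_reachable("localhost", 10200)  -> Piper
```

This only proves the port is open, not that the model loaded correctly; check the service logs for that.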

Performance Tips

  1. Quantized models (tiny-int8, base-int8) use <50% RAM with minimal accuracy loss
  2. CoreML on Apple Silicon is remarkably fast
  3. CUDA (if you have a GPU) speeds transcription up 5-10x
  4. CPU-only is fine for single-user setups with the tiny model

Component 4: Piper TTS

Piper generates surprisingly natural speech locally:

Voice Options

# English voices available:
# - Lessac (medium quality, fast)
# - Ryan (high quality)
# - Amy (high quality, conversational)
# - LibriTTS RMS (clean, neutral)

# Download and install:
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json

Wyoming Integration

The Piper add-on automatically exposes itself via Wyoming protocol—no manual configuration needed.

Putting It All Together

Step 1: Verify Components

# Test Whisper (STT): the add-on has no plain HTTP endpoint.
# Use the pipeline debugger instead: Settings > Voice Assistants >
# your pipeline > three-dot menu > Debug, then speak a test phrase.

# Test Ollama
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What time is it?"
}'

# Test Piper (TTS): port 10200 speaks the Wyoming TCP protocol, not HTTP.
# Test the voice directly with the piper CLI instead:
echo 'Hello from your local voice assistant' | \
  piper --model en_US-amy-medium.onnx --output_file response.wav
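
A note on the Ollama test: without `"stream": false` in the request body, `/api/generate` returns newline-delimited JSON chunks rather than one object. Stitching them back into a single reply is a few lines (the sample stream below is illustrative, shaped like Ollama's output):

```python
import json

def join_ollama_stream(ndjson: str) -> str:
    """Concatenate the 'response' field of each streamed JSON chunk."""
    parts = []
    for line in ndjson.strip().splitlines():
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative stream, shaped like Ollama's /api/generate output:
stream = (
    '{"response": "It is ", "done": false}\n'
    '{"response": "3 PM.", "done": true}\n'
)
print(join_ollama_stream(stream))  # It is 3 PM.
```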

Step 2: Create Custom Sentences

Teach your assistant domain-specific commands:

# configuration.yaml
intent_script:
  MovieTime:
    speech:
      text: "Starting movie mode. Dimming lights and closing blinds."
    action:
      - service: scene.turn_on
        target:
          entity_id: scene.movie_mode
      
  SecurityCheck:
    speech:
      text: "Checking perimeter. Front door is {{ states('binary_sensor.front_door') }}. Garage is {{ states('cover.garage_door') }}."
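
You can exercise intents like these without speaking by posting text straight to Assist's conversation API (`/api/conversation/process` is Home Assistant's real endpoint; the host and token below are placeholders). A sketch that just builds the request:

```python
import json

HA_URL = "http://homeassistant.local:8123"  # placeholder host

def build_conversation_call(text: str, token: str) -> tuple[str, dict, str]:
    """URL, headers, and body for POST /api/conversation/process."""
    url = f"{HA_URL}/api/conversation/process"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"text": text, "language": "en"})
    return url, headers, body

url, headers, body = build_conversation_call("movie time", "YOUR_TOKEN")
print(url)
```

Posting "movie time" this way runs the same pipeline a spoken command would, minus STT/TTS, which makes it handy for debugging intent matching in isolation.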

Step 3: Advanced Intents with LLM

Let the LLM handle ambiguous requests:

# prompts/assist.j2
{% if "too cold" in user_prompt %}
  {% set temp = state_attr('climate.thermostat', 'temperature') %}
  Turn up the heat to {{ temp + 1 }} degrees
{% elif "bright" in user_prompt %}
  Turn off the lights
{% else %}
  {{ user_prompt }}
{% endif %}
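
Templates like this are easy to unit-test outside Home Assistant with the `jinja2` package, stubbing HA's `state_attr()` helper (the stubbed value of 20 is arbitrary):

```python
from jinja2 import Environment

TEMPLATE = """
{%- if "too cold" in user_prompt -%}
Turn up the heat to {{ state_attr('climate.thermostat', 'temperature') + 1 }} degrees
{%- elif "bright" in user_prompt -%}
Turn off the lights
{%- else -%}
{{ user_prompt }}
{%- endif -%}
"""

def render(user_prompt: str) -> str:
    env = Environment()
    # Stub of Home Assistant's state_attr() for offline testing
    env.globals["state_attr"] = lambda entity, attr: 20
    return env.from_string(TEMPLATE).render(user_prompt=user_prompt)

print(render("it's too cold in here"))  # Turn up the heat to 21 degrees
```

Catching a template typo this way beats discovering it mid-conversation with your assistant.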

Voice Satellite Deployment

For rooms without direct HA access, deploy voice satellites:

# Satellite configuration (docker-compose)
services:
  wake-word:
    image: rhasspy/wyoming-openwakeword
    command: --uri 'tcp://0.0.0.0:10400'

  satellite:
    # wyoming-satellite captures audio locally; Home Assistant connects
    # to it via the Wyoming integration (no HA URL or token needed on
    # the satellite side). Build from the wyoming-satellite repo, or
    # run it bare-metal per its README.
    build: https://github.com/rhasspy/wyoming-satellite.git
    devices:
      - /dev/snd:/dev/snd
    command: >
      --name 'kitchen-satellite'
      --uri 'tcp://0.0.0.0:10700'
      --wake-uri 'tcp://wake-word:10400'
      --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw'
      --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'
    depends_on:
      - wake-word

Hardware: ESP32-S3 + INMP441 microphone (~$10 total)

Real-World Performance

After 6 months running this setup:

  • Wake word accuracy: 95% (openWakeWord is solid)
  • STT accuracy: 92% (Whisper base, very good)
  • LLM response time: <2 seconds (Phi-3 on N100)
  • TTS quality: 8/10 (Piper Amy medium)
  • Total monthly cost: $0 (excluding electricity)

Comparison vs Alexa:

  • Speed: Slight edge to Alexa (network latency vs local processing)
  • Accuracy: Very similar for standard commands
  • Customization: Self-hosted wins by miles
  • Privacy: Not even close—local-only is the winner

Common Failures & Fixes

“It doesn’t respond to the wake word”

  • Check openWakeWord is running: docker logs openwakeword
  • Verify microphone permissions: arecord -l
  • Increase wake word sensitivity if needed

“Whisper is too slow”

  • Switch to tiny-int8 model: 8x faster, minimal accuracy loss
  • Ensure you’re using GPU if available
  • Consider a dedicated whisper container on a beefier machine

“Ollama responses are nonsense”

  • Your prompts are too ambiguous. Add more context to your system prompt.
  • Consider a more capable model (e.g., llama3.2 → mistral)
  • Add examples to the prompt using few-shot learning

“Audio quality is poor”

  • USB microphones are hit-or-miss. ReSpeaker or dedicated audio HAT recommended.
  • Check ALSA configuration: alsamixer
  • Verify microphone is selected in HA: Settings > System > Audio

“It won’t run offline”

  • Ensure all components are local: Whisper, Piper, Ollama
  • Confirm the Whisper, Piper, and openWakeWord add-ons (or their standalone Wyoming containers) are all running on your network
  • Verify no external API calls in your automations

Privacy Checklist

  • All processing on local network
  • No cloud STT/TTS services
  • LLM running locally (Ollama, not OpenAI API)
  • Wake word processing local (not streaming audio to cloud)
  • Disable Home Assistant Cloud (if not needed for external access)
  • Firewall rules preventing external voice service calls

The Future

The gap between local and cloud voice assistants is closing fast. As local LLMs improve and edge compute gets cheaper, self-hosted becomes the obvious choice for privacy-conscious users.

What’s coming:

  • Smaller, faster models (Phi-4, Llama 4)
  • Better wake word detection (always listening, minimal battery)
  • Multilingual support (Whisper handles 99 languages already)
  • More natural TTS (Piper is just the beginning)

Your voice, your data, your rules. Build it yourself.


Anthony Lattanzio

Tech Enthusiast & Builder

I'm a tech enthusiast who loves building things with hardware and software. By night, I run a homelab that's grown way beyond what any reasonable person needs. Check out about me for more.
