Self-Host Langfuse: Complete LLM Observability for Your Homelab

Running LLMs at home means tracking every token, debugging agent loops, and understanding exactly what your models are doing. Langfuse brings enterprise-grade LLM observability to your homelab without sending data to third parties.

Why LLM Observability Matters

Every time you call an LLM—whether it’s a simple chat completion or a multi-step agent workflow—you’re dealing with:

Cost: API calls cost money; local models consume electricity
Latency: Complex chains can take seconds or minutes
Quality: Are outputs meeting expectations?
Debugging: When agents fail, why did they fail?

Without observability, you’re flying blind. Langfuse gives you visibility into all of this.

What is Langfuse?

Langfuse is an open-source LLM engineering platform that provides:

Tracing: End-to-end visualization of LLM application flows
Prompt Management: Version control for prompts with testing and deployment
Evaluation: Human and automated scoring for output quality
Token Tracking: Monitor usage across all your LLM applications
Agent Observability: Visualize multi-step agent executions

The best part? It’s completely self-hostable. Run it in your homelab and keep your data private.

[Figure: Langfuse dashboard showing traces and metrics. Langfuse gives you complete visibility into every LLM call.]

Architecture: How Langfuse Works

Langfuse v3 uses a modern microservices architecture designed for production scale:

[Figure: Langfuse architecture diagram showing components. Langfuse v3 architecture with dedicated containers for scalability.]

Core Components

Component	Purpose
Langfuse Web	UI and API server
Langfuse Worker	Async event processing
Postgres	Transactional data (users, projects)
ClickHouse	Observability data (traces, scores)
Redis/Valkey	Caching and queue operations
S3/Blob Storage	Raw events and multi-modal data

This architecture means:

High ingestion throughput without timeouts
Fast analytical queries via ClickHouse
Recoverable events (stored in S3 first)
Background migrations for smooth upgrades
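
To make the "stored in S3 first" point concrete, here is a toy model of that ingestion order. It is only an illustrative sketch, not Langfuse's actual code: the class and its in-memory stand-ins for blob storage, the queue, and the analytics store are all hypothetical.

```python
from collections import deque


class IngestionPipeline:
    """Toy model: persist the raw event first, then queue it for async processing."""

    def __init__(self):
        self.blob_store = {}   # stands in for S3/blob storage
        self.queue = deque()   # stands in for the Redis/Valkey queue
        self.analytics = []    # stands in for the analytical store

    def ingest(self, event_id, payload):
        self.blob_store[event_id] = payload  # durable copy is written first
        self.queue.append(event_id)          # the worker only receives a reference

    def work(self):
        """Process every queued event, as a background worker would."""
        while self.queue:
            event_id = self.queue.popleft()
            self.analytics.append(self.blob_store[event_id])

    def replay(self):
        """Rebuild the analytics store from the raw events."""
        self.analytics = list(self.blob_store.values())
```

Because the raw payload is saved before any processing happens, a failed or interrupted worker run loses nothing: the events can be replayed from storage.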

Homelab Deployment with Docker Compose

Let’s get Langfuse running in your homelab. The Docker Compose setup is perfect for homelab scale—single VM, full functionality.

System Requirements

Scale	CPU	RAM	Disk
Testing	2 cores	8 GiB	50 GiB
Production	4+ cores	16 GiB	100 GiB

Observability data grows fast. Plan your storage accordingly.

Step 1: Clone and Configure

# Clone the Langfuse repository
git clone https://github.com/langfuse/langfuse.git
cd langfuse

# Open docker-compose.yml and update all secrets marked with # CHANGEME
# Use strong, random passwords for production
nano docker-compose.yml

The key secrets to change:

SALT - Salt used when hashing API keys
ENCRYPTION_KEY - 32-byte key (64 hex characters) for encrypting sensitive data
POSTGRES_PASSWORD - Database password
NEXTAUTH_SECRET, CLICKHOUSE_PASSWORD, REDIS_AUTH, and the MinIO credentials
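
One way to produce strong random values is Python's standard `secrets` module. This is a minimal sketch rather than part of the Langfuse tooling: the helper name is hypothetical, and the password length is an assumption. Only the 32-byte size of ENCRYPTION_KEY comes from the list above.

```python
import secrets


def generate_secrets(password_names=("POSTGRES_PASSWORD",)):
    """Return random values suitable for pasting into docker-compose.yml."""
    values = {
        # 32 random bytes rendered as 64 hexadecimal characters
        "ENCRYPTION_KEY": secrets.token_hex(32),
    }
    for name in password_names:
        # URL-safe text avoids characters that need escaping in connection URLs
        values[name] = secrets.token_urlsafe(32)
    return values


for key, value in generate_secrets().items():
    print(f"{key}={value}")
```

Pass additional variable names in `password_names` to generate a value for each remaining secret in your compose file.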

Step 2: Start Langfuse

# Start all services
docker compose up -d

# Watch the logs
docker compose logs -f langfuse-web

Wait about 2-3 minutes. When you see “Ready” in the logs, you’re good to go.
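
If you would rather script the wait than watch logs, you can poll the public health endpoint (`/api/public/health`) until it answers with HTTP 200. The helper below is a sketch using only the standard library; the function names, the retry count, and the delay are assumptions to adjust for your hardware.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_healthy(url, attempts=36, delay=5.0, fetch=http_status, sleep=time.sleep):
    """Poll url until it answers 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (assumed address; use your VM's IP for a remote server):
#   wait_until_healthy("http://localhost:3000/api/public/health")
```

The `fetch` and `sleep` parameters exist so the loop can be tested without a running server.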

Step 3: Access the UI

Open http://localhost:3000 (or http://your-vm-ip:3000 for a remote server).

Create your first organization and project. You’ll get API keys for your applications.

[Figure: Langfuse setup screen. Create your first project to start collecting traces.]

Integrating with Your LLM Applications

Langfuse provides SDKs for Python and JavaScript/TypeScript. Here’s how to integrate with a local Ollama setup.

Python SDK Setup

from langfuse import Langfuse

# Initialize with your project keys
# Note: langfuse.trace() is the v2 Python SDK API; v3 of the SDK replaced it
langfuse = Langfuse(
    public_key="pk-xxx",
    secret_key="sk-xxx",
    host="http://your-langfuse-server:3000"
)

# Trace a generation
trace = langfuse.trace(
    name="chat-completion",
    metadata={"model": "llama3.2", "source": "ollama"}
)

generation = trace.generation(
    name="llm-call",
    model="llama3.2",
    input=[{"role": "user", "content": "Hello!"}],
    output={"role": "assistant", "content": "Hi there!"}
)

# Flush to ensure data is sent
langfuse.flush()
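
Hardcoding keys is fine for a first test, but reading them from the environment keeps them out of your source. The SDK can pick up `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST` by itself; the sketch below makes that explicit and fails early when a key is missing. The helper name and the fallback host are assumptions.

```python
import os


def load_langfuse_settings(env=None):
    """Read Langfuse connection settings from environment variables."""
    env = os.environ if env is None else env
    required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        # Falls back to the local Docker Compose address used in this guide
        "host": env.get("LANGFUSE_HOST", "http://localhost:3000"),
    }
```

The returned dictionary matches the keyword arguments used above, so it can be passed as `Langfuse(**load_langfuse_settings())`.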

With OpenAI-Compatible APIs

If you’re using Ollama’s OpenAI-compatible endpoint or another compatible service:

# Drop-in replacement for the OpenAI client that traces calls automatically.
# Langfuse credentials are read from LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY,
# and LANGFUSE_HOST.
from langfuse.openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama
    api_key="ollama"  # Required but not used
)

# Traces are automatically sent to Langfuse
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

JavaScript/TypeScript SDK

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: "pk-xxx",
  secretKey: "sk-xxx",
  baseUrl: "http://your-langfuse-server:3000"
});

const trace = langfuse.trace({
  name: "agent-workflow",
  metadata: { agent: "my-agent" }
});

Key Features for Homelab Use

Token Tracking

Monitor exactly how many tokens your applications consume:

Input/output tokens per request
Daily/weekly/monthly aggregation
Cost estimation (configure your token costs)
Breakdown by model and application

[Figure: Token tracking dashboard. Track token usage across all your LLM applications.]
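
Cost estimation is simple arithmetic once you have token counts and a price per token. The following sketch shows the calculation; the prices and the `hosted-example` model name are hypothetical, and a local model is costed at zero here even though it still uses electricity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per one million tokens, in whatever currency you track."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens, output_tokens, price):
    """Return the estimated cost of one request."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# Hypothetical prices: a hosted API model and a local model costed at zero
PRICES = {
    "hosted-example": ModelPrice(input_per_million=2.0, output_per_million=8.0),
    "llama3.2": ModelPrice(input_per_million=0.0, output_per_million=0.0),
}

cost = estimate_cost(1_500, 500, PRICES["hosted-example"])
print(f"{cost:.4f}")  # prints 0.0070
```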

Prompt Versioning

Managing prompts is like managing code:

# Get the current version of a prompt
prompt = langfuse.get_prompt("my-prompt-name")

# Use it in your application
formatted = prompt.compile(variable="value")

# Prompts are cached client-side for performance

Features:

Version history with diff views
A/B testing different prompts
Deploy to production without code changes
Rollback if something breaks
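
Langfuse prompt templates mark variables with `{{name}}` placeholders, and compiling a prompt fills them in. The snippet below sketches that substitution with the standard library to show the idea. It is not the SDK's implementation, and leaving unknown placeholders untouched is a choice made for this example.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template, **variables):
    """Replace {{name}} placeholders; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


text = compile_template(
    "Summarize {{topic}} in {{count}} sentences.", topic="Langfuse", count=3
)
print(text)  # prints: Summarize Langfuse in 3 sentences.
```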

Session-Based Tracing

Group traces by user session to understand journeys:

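trace = langfuse.trace(
    name="user-session",
    user_id="user-123",
    # Reuse the same session_id for every trace in the conversation;
    # let Langfuse assign a unique trace id so traces are not merged
    session_id="session-abc"
)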

Perfect for debugging multi-turn conversations or agent workflows.
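
Conceptually, a session is just the set of traces that share a `session_id`, shown in time order. This sketch reproduces that grouping on plain dictionaries; the record fields (`id`, `session_id`, `timestamp`) are simplified assumptions, not the exact shape of Langfuse's data.

```python
from collections import defaultdict


def group_by_session(traces):
    """Map each session_id to its traces, ordered by timestamp."""
    sessions = defaultdict(list)
    for trace in traces:
        sessions[trace["session_id"]].append(trace)
    for items in sessions.values():
        items.sort(key=lambda t: t["timestamp"])
    return dict(sessions)


# Hypothetical records for two conversations
traces = [
    {"id": "t2", "session_id": "session-abc", "timestamp": 2},
    {"id": "t1", "session_id": "session-abc", "timestamp": 1},
    {"id": "t3", "session_id": "session-xyz", "timestamp": 1},
]
grouped = group_by_session(traces)
print([t["id"] for t in grouped["session-abc"]])  # prints ['t1', 't2']
```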

Score-based Evaluation

Add quality metrics to your traces:

# Human evaluation
trace.score(name="helpfulness", value=5)

# Automated evaluation
trace.score(name="hallucination-check", value=0.2, data_type="NUMERIC")

Create dashboards to track quality over time.
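
Tracking quality over time usually means averaging a score per day and watching the trend. Here is a minimal sketch of that aggregation, assuming you have exported scores as simple records; the field names `name`, `date`, and `value` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_average(scores, name):
    """Average the values of one score name per calendar day."""
    by_day = defaultdict(list)
    for score in scores:
        if score["name"] == name:
            by_day[score["date"]].append(score["value"])
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical exported scores
scores = [
    {"name": "hallucination-check", "date": "2025-01-02", "value": 0.5},
    {"name": "hallucination-check", "date": "2025-01-01", "value": 0.25},
    {"name": "hallucination-check", "date": "2025-01-01", "value": 0.75},
    {"name": "helpfulness", "date": "2025-01-01", "value": 5},
]
trend = daily_average(scores, "hallucination-check")
```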

Dashboard Views

Langfuse provides several built-in views:

Sessions View

Group traces by session or user
See the full conversation flow
Filter by date, model, metadata

Traces View

Individual trace inspection
Nested spans for complex operations
Input/output inspection
Timing breakdown

Generations View

Filter by model
See prompt used
Compare inputs/outputs
Token counts and latency

Scores View

Evaluation results
Trend analysis
Filter by score name

Production Considerations

Security

Authentication: Email/password login is enabled by default, and SSO can be added alongside it:

# In docker-compose.yml environment
AUTH_DISABLE_USERNAME_PASSWORD: "false"  # Default; set to "true" to allow SSO logins only

For SSO with Google:

AUTH_GOOGLE_CLIENT_ID: "your-client-id"
AUTH_GOOGLE_CLIENT_SECRET: "your-secret"

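Encryption: Sensitive values that Langfuse stores, such as LLM API keys, are encrypted with the ENCRYPTION_KEY you configured. Trace contents are not covered by this key, so add disk-level encryption if you need everything encrypted at rest.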

Network Isolation: Run Langfuse in an air-gapped environment with no internet access. Perfect for sensitive homelab data.

Data Retention

Observability data grows. Plan for retention:

ClickHouse stores all traces/scores
S3/MinIO stores raw events
Configure retention policies based on your needs
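
A back-of-the-envelope estimate helps when choosing a retention window. The sketch below multiplies trace volume by average size and retention days. The input numbers are hypothetical, and the result ignores compression and index overhead, so treat it as a rough guide only.

```python
def estimate_storage_gib(traces_per_day, avg_trace_kib, retention_days):
    """Rough size of retained trace data, ignoring compression and indexes."""
    total_kib = traces_per_day * avg_trace_kib * retention_days
    return total_kib / (1024 ** 2)  # KiB -> GiB


def max_retention_days(disk_gib, traces_per_day, avg_trace_kib):
    """Whole days of data that fit into disk_gib at the given ingest rate."""
    daily_gib = estimate_storage_gib(traces_per_day, avg_trace_kib, 1)
    return int(disk_gib // daily_gib)


# Hypothetical workload: 5,000 traces per day at 20 KiB each, kept for 90 days
print(f"{estimate_storage_gib(5_000, 20, 90):.1f} GiB")  # prints 8.6 GiB
```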

Backups

Docker Compose doesn’t include automated backups. For production:

Database backups: Regular Postgres dumps
ClickHouse backups: Use ClickHouse's BACKUP command or the clickhouse-backup tool
S3/MinIO: Configure object versioning
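
For the Postgres dumps, a small script run from cron is often enough at homelab scale. The sketch below only builds the command and a timestamped file name. The service, user, and database names (`postgres` for all three) are assumptions, so check them against your own docker-compose.yml.

```python
from datetime import datetime, timezone


def postgres_backup_command(service="postgres", user="postgres", database="postgres"):
    """Build a docker compose command that streams a SQL dump to stdout."""
    # -T disables the pseudo-TTY so the dump can be redirected to a file
    return ["docker", "compose", "exec", "-T", service, "pg_dump", "-U", user, database]


def backup_filename(now=None):
    """Return a timestamped name such as langfuse-postgres-20250101T020000Z.sql."""
    now = now or datetime.now(timezone.utc)
    return f"langfuse-postgres-{now:%Y%m%dT%H%M%SZ}.sql"


# Example use, run from the directory that holds docker-compose.yml:
#   import subprocess
#   with open(backup_filename(), "wb") as out:
#       subprocess.run(postgres_backup_command(), stdout=out, check=True)
```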

Scaling

The Docker Compose setup scales vertically. For horizontal scaling (high availability), migrate to Kubernetes.

Integration with Homelab Stack

Langfuse fits naturally into a monitoring stack:

Tool	Purpose
Langfuse	LLM observability
Prometheus/Grafana	Infrastructure metrics
Uptime Kuma	Uptime monitoring
Loki	Log aggregation

All self-hosted, all under your control.

Langfuse vs. Alternatives

Feature	Langfuse	LangSmith	Helicone	W&B
Self-Hosted	✅ Full features	❌	⚠️ Limited	❌
Open Source	✅ MIT	❌	⚠️ Partial	❌
Prompt Management	✅	✅	❌	✅
Agent Tracing	✅	✅	✅	✅
Evaluations	✅	✅	❌	✅
Cost	Free	Paid	Freemium	Freemium

For homelab use, Langfuse is the clear winner: fully open-source, fully self-hosted, no usage limits.

Best Practices for Homelab Scale

1. Start Small, Grow as Needed

Docker Compose on a single VM is perfect for exploration. Scale to Kubernetes only when you need high availability.

2. Configure Retention Early

Observability data accumulates fast. Set retention policies before disk fills up.

3. Use Prompt Versioning

Treat prompts like code. Version them, test changes, and deploy confidently.

4. Add Custom Metadata

trace = langfuse.trace(
    name="my-app",
    metadata={
        "environment": "homelab",
        "server": "proxmox-vm-3",
        "model_version": "llama3.2-3b"
    }
)

Filter traces by metadata for powerful debugging.

5. Set Up Alerts for Cost Spikes

If someone accidentally creates an infinite loop in an agent, you want to know before your API bill explodes.
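
A simple way to catch a runaway agent is to compare current usage with a recent baseline. This is a sketch of such a check, which an external script could run on hourly token counts however you collect them. The function name, the factor of 3, and the minimum sample count are assumptions to tune.

```python
from statistics import mean


def is_cost_spike(history, current, factor=3.0, min_samples=5):
    """Flag current usage when it exceeds factor times the recent average."""
    if len(history) < min_samples:
        return False  # not enough data to judge
    baseline = mean(history)
    return baseline > 0 and current > factor * baseline


# Hypothetical hourly token counts
recent_hours = [1000, 1200, 900, 1100, 1000]
print(is_cost_spike(recent_hours, 5000))  # prints True
```

Hook the result into whatever notification channel your homelab already uses.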

When to Choose Cloud Over Self-Hosted

Consider Langfuse Cloud if:

You don’t want to manage infrastructure
Your team needs zero-setup collaboration
You need managed backups and updates
You want support contracts

Self-hosted is better when:

Data must stay on-premises
You have homelab infrastructure ready
You want zero per-seat/usage costs
Complete control is required

Conclusion

Langfuse brings enterprise LLM observability to your homelab without the enterprise price tag. With Docker Compose deployment, you can be up and running in minutes, gaining visibility into every token, prompt, and agent execution.

For homelab enthusiasts running LLMs, whether through Ollama, LocalAI, or cloud APIs, the ability to trace, evaluate, and optimize is invaluable. Self-hosted Langfuse means complete control, zero usage fees, and observability data that never leaves your network.

Next Steps: Check out the Langfuse documentation for SDK integration guides, evaluation strategies, and advanced configuration options.

Have questions about self-hosting Langfuse? Join the Langfuse Discord or check the GitHub discussions.