Fix: Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup
Quick Answer
How to fix Langfuse errors — Python/JS SDK init, trace/span/generation hierarchy, LangChain CallbackHandler, OpenAI wrapper, missing usage/cost data, prompt management, and self-hosted Postgres setup.
The Error
You initialize Langfuse but no traces appear in the dashboard:
from langfuse import Langfuse
langfuse = Langfuse()
# Make some LLM calls...
# Dashboard stays empty.
Or the LangChain integration doesn’t attach traces:
from langfuse.callback import CallbackHandler
handler = CallbackHandler()
result = llm.invoke("Hello", config={"callbacks": [handler]})
# No trace shows in Langfuse.
Or generations have no token usage or cost:
langfuse.generation(
    name="gpt-4o-call",
    model="gpt-4o",
    input=...,
    output=...,
    # usage missing!
)
Or self-hosted Langfuse can’t connect to Postgres:
Error: Connection refused at db:5432
Why This Happens
Langfuse is an open-source LLM observability platform. It ingests traces (one per request or session), spans (units of work within a trace), and generations (LLM calls) via its SDKs. Most issues map to one of these:
- SDK initialization order. The SDKs send data asynchronously in batches. If your process exits before traces flush, data is lost. Always call flush() before exit.
- Env vars vs constructor args. The SDKs read LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST. A wrong host (the default is Langfuse Cloud) or wrong keys mean the data never reaches your project, often without a visible error.
- LangChain integration is a callback. You must pass the CallbackHandler as a callback when invoking chains/LLMs. Without it, there is no instrumentation.
- Token usage requires explicit data (or use the OpenAI wrapper). Manual generations need usage filled in; Langfuse won’t auto-compute it.
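Before debugging anything else, it can help to confirm the three environment variables are present and look plausible. The following is a minimal sketch using only the standard library; the helper name check_langfuse_env is hypothetical, and the pk-lf- / sk-lf- prefixes are an assumption taken from the example keys in this article.

```python
import os

REQUIRED = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
# Assumption: keys follow the prefixes used in this article's examples.
EXPECTED_PREFIX = {"LANGFUSE_PUBLIC_KEY": "pk-lf-", "LANGFUSE_SECRET_KEY": "sk-lf-"}


def check_langfuse_env(env=None):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(EXPECTED_PREFIX[name]):
            problems.append(f"{name} does not start with {EXPECTED_PREFIX[name]!r}")
    host = env.get("LANGFUSE_HOST", "")
    if not host:
        problems.append("LANGFUSE_HOST is not set; the SDK will fall back to its default cloud host")
    elif not host.startswith(("http://", "https://")):
        problems.append("LANGFUSE_HOST should include the scheme, e.g. https://")
    return problems


if __name__ == "__main__":
    for problem in check_langfuse_env():
        print("WARNING:", problem)
```

A swapped public/secret key pair is a common cause of silent failures, and the prefix check is there to catch it early.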
Fix 1: Initialize the SDK Correctly
Python:
from langfuse import Langfuse
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or "https://us.cloud.langfuse.com" or your self-hosted URL
)
Or via env vars (cleaner):
# .env
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
from langfuse import Langfuse
langfuse = Langfuse()  # Reads env vars
JS:
import { Langfuse } from "langfuse";
const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
  baseUrl: process.env.LANGFUSE_HOST,
});
Critical: flush before exit. Traces are batched and sent asynchronously. Without a flush, short scripts lose data:
# At the end of your script:
langfuse.flush()
// JS:
await langfuse.shutdownAsync();
For long-running services (web servers), Langfuse flushes periodically, so no explicit flush is needed. For batch scripts, lambdas, and CLIs, always flush.
Pro Tip: Add a process exit handler:
import atexit
atexit.register(langfuse.flush)
This catches the common “forgot to flush” mistake.
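For handlers where atexit may not run reliably (for example, a serverless function whose process is frozen rather than exited), one option is to scope the flush to the unit of work. The sketch below assumes only that the client object has a flush() method, as shown above; the flushed helper and FakeClient stand-in are hypothetical names so the example runs without the SDK installed.

```python
from contextlib import contextmanager


@contextmanager
def flushed(client):
    """Yield the client and call client.flush() on the way out, even if the body raises."""
    try:
        yield client
    finally:
        client.flush()


class FakeClient:
    """Stand-in for a Langfuse client so the sketch runs without the SDK."""

    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


client = FakeClient()
with flushed(client) as lf:
    pass  # run the batch job or handler body here, using `lf` to create traces
```

With a real client, the same pattern would read `with flushed(Langfuse()) as lf: ...`.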
Fix 2: Trace Hierarchy — Trace → Span → Generation
The three primary primitives:
- Trace — a request/session, holds the overall context.
- Span — a unit of work within a trace (e.g. “retrieve documents,” “format prompt”).
- Generation — an LLM call (special span type with model, input, output, usage).
# Create a trace:
trace = langfuse.trace(name="answer-question", user_id="user-42")
# Add a span:
span = trace.span(name="retrieve-docs", input={"query": "..."})
# ... do work ...
span.end(output=[{"doc": "..."}])
# Add a generation:
generation = trace.generation(
    name="generate-answer",
    model="gpt-4o",
    input=messages,
    output={"role": "assistant", "content": "..."},
    usage={
        "prompt_tokens": 250,
        "completion_tokens": 50,
        "total_tokens": 300,
    },
    model_parameters={"temperature": 0.7},
)
For nested spans (an LLM call inside a span inside a trace):
trace = langfuse.trace(name="rag-query")
span = trace.span(name="answer-step")
# ... work inside the span ...
# Create the generation from the span (not the trace) so it nests under it:
generation = span.generation(
    name="answer",
    model="gpt-4o",
    input=...,
    output=...,
)
span.end()
For deeper nesting, span children:
parent_span = trace.span(name="parent")
child_span = parent_span.span(name="child")
child_span.end()
parent_span.end()
Common Mistake: Forgetting .end() on spans. Without end, spans show “in progress” forever and duration is wrong.
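One way to make a forgotten .end() harder to write is to wrap the span in a context manager. This is a sketch, not part of the Langfuse SDK: the ended helper is a hypothetical name, it assumes only that the object has an end() method, and FakeSpan stands in for a real span so the example runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def ended(observation):
    """Yield a span-like object and always call .end() on it, even if the body raises."""
    try:
        yield observation
    finally:
        observation.end()


class FakeSpan:
    """Stand-in for a span; a real one would come from trace.span(...)."""

    def __init__(self, name):
        self.name = name
        self.is_ended = False

    def end(self, **kwargs):
        self.is_ended = True


span = FakeSpan("retrieve-docs")
with ended(span):
    pass  # retrieval logic goes here
```

Note that this sketch calls end() without arguments, so an output value has to be recorded some other way if you need it on the span.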
For convenient decorators (this is the Python SDK v2 import path; in SDK v3, observe is imported from langfuse directly):
from langfuse.decorators import observe
@observe()
def my_function(query: str):
    docs = retrieve(query)
    return generate(query, docs)
# Every call to my_function creates a trace automatically.
Fix 3: LangChain Integration
Pass the CallbackHandler to LangChain operations. The import path below is the Python SDK v2 one; SDK v3 exposes the handler as langfuse.langchain.CallbackHandler, so match it to your installed version:
from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI  # langchain.chat_models.ChatOpenAI is deprecated
langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)
llm = ChatOpenAI(model="gpt-4o")
result = llm.invoke("Hello", config={"callbacks": [langfuse_handler]})
The handler captures the LLM call as a generation, including prompts, response, and token usage (from OpenAI’s response metadata).
For chains:
from langchain_core.prompts import ChatPromptTemplate
chain = ChatPromptTemplate.from_template("Answer: {q}") | llm
result = chain.invoke({"q": "What is Python?"}, config={"callbacks": [langfuse_handler]})
The chain’s full structure (each runnable, retrieval steps, LLM calls) traces as a hierarchy.
For LangGraph:
from langfuse.callback import CallbackHandler
config = {"callbacks": [CallbackHandler()]}
result = graph.invoke({"input": "..."}, config=config)
LangGraph’s nodes show as nested spans under the graph’s trace.
Common Mistake: Not passing config={"callbacks": [...]}. Without it, LangChain uses the default callback manager (which doesn’t include Langfuse). Pass the handler on every call, or bind it once with chain.with_config({"callbacks": [langfuse_handler]}) and invoke the bound runnable.
Fix 4: OpenAI / Anthropic Wrappers
The simplest auto-instrumentation: wrap your OpenAI client.
Python:
from langfuse.openai import openai  # Drop-in replacement
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# Logged to Langfuse automatically, no extra code.
The langfuse.openai module exports an openai that proxies the real OpenAI SDK and logs every call.
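For Anthropic, the snippet below assumes your installed SDK ships a langfuse.anthropic module. Verify that before relying on it; if the import fails, wrap the call in an @observe() function or log it manually with generation():
from langfuse.anthropic import Anthropic
client = Anthropic(api_key="...")
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
JS: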
import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
const openai = observeOpenAI(new OpenAI());
const response = await openai.chat.completions.create({...});
These wrappers automatically extract prompt, response, model, and token usage. No manual generation() calls needed.
Pro Tip: Use the wrapper for the bulk of your LLM calls. Reserve manual generation() for cases where the wrapper doesn’t fit (custom protocols, batched calls).
Fix 5: Cost Calculation
Langfuse computes cost from usage + model name. For built-in OpenAI/Anthropic models, costs are pre-defined. For custom models, register them in the Langfuse dashboard:
Dashboard → Settings → Models → Add model
Match pattern: my-custom-model.*
Input price: $0.001 per 1K tokens
Output price: $0.002 per 1K tokens
Or via API for self-hosted:
curl -X POST "$LANGFUSE_HOST/api/public/models" \
  -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  -H "content-type: application/json" \
  -d '{
    "modelName": "my-custom-model",
    "matchPattern": "my-custom-model.*",
    "inputPrice": 0.000001,
    "outputPrice": 0.000002,
    "tokenizerModel": "gpt-4",
    "unit": "TOKENS"
  }'
Common Mistake: Logging usage in the wrong field. Langfuse accepts OpenAI-style keys (prompt_tokens, completion_tokens, total_tokens) or its own generic keys (input, output, total). Anthropic’s input_tokens and output_tokens match neither, so map them explicitly:
langfuse.generation(
    name="claude-call",
    model="claude-3-5-sonnet",
    input=...,
    output=...,
    usage={
        "input": response.usage.input_tokens,  # Langfuse's generic key
        "output": response.usage.output_tokens,
        "total": response.usage.input_tokens + response.usage.output_tokens,
    },
)
Either key style works; Langfuse maps the OpenAI-style names to input/output internally.
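The mapping and the cost arithmetic can be sketched in a few lines. This is an illustration under stated assumptions, not Langfuse's internal logic: to_langfuse_usage and estimate_cost are hypothetical helper names, usage is treated as a plain dict, and prices are per token (matching the inputPrice / outputPrice values in the API example above).

```python
def to_langfuse_usage(usage):
    """Map OpenAI- or Anthropic-style token counts to input/output/total keys."""
    if "input_tokens" in usage:  # Anthropic-style
        inp, out = usage["input_tokens"], usage["output_tokens"]
    elif "prompt_tokens" in usage:  # OpenAI-style
        inp, out = usage["prompt_tokens"], usage["completion_tokens"]
    else:
        raise ValueError(f"unrecognized usage keys: {sorted(usage)}")
    return {"input": inp, "output": out, "total": inp + out}


def estimate_cost(usage, input_price, output_price):
    """input tokens * input price + output tokens * output price (prices per token)."""
    return usage["input"] * input_price + usage["output"] * output_price


usage = to_langfuse_usage({"input_tokens": 250, "output_tokens": 50})
cost = estimate_cost(usage, input_price=0.000001, output_price=0.000002)
print(usage, cost)
```

With 250 input and 50 output tokens at those prices, the estimate is 250 × 0.000001 + 50 × 0.000002, or about $0.00035.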
Fix 6: Sampling for Cost Control
In production, sample to reduce ingestion volume:
import random
# Sample 10%:
if random.random() < 0.1:
    trace = langfuse.trace(...)
    # ... full instrumentation
else:
    # Run without Langfuse
    pass
Or use Langfuse’s built-in sampling, configured on the client rather than on individual trace() calls:
langfuse = Langfuse(
    sample_rate=0.1,  # 10% of traces ingested
)
For LangChain:
handler = CallbackHandler(sample_rate=0.1)
Sampling is decided per trace: once a trace is included, all its spans and generations are too. No partial traces.
Pro Tip: Sample at request boundaries (entry points), not at every span. Including a full trace or none keeps the data consistent.
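If you sample manually, a random draw per call can give different answers for the same request when several services or retries are involved. A deterministic alternative is to hash a stable request identifier. The sketch below is one way to do that with the standard library; should_sample and the request_id parameter are hypothetical names, not part of the Langfuse SDK.

```python
import hashlib


def should_sample(request_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of requests, keyed by request_id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


# The same request id always gets the same decision:
decision = should_sample("req-12345", 0.1)
assert decision == should_sample("req-12345", 0.1)
```

Make the decision once at the entry point and pass it down, so every component agrees on whether the request is traced.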
Fix 7: Prompt Management
Manage prompts centrally:
# Fetch the latest version of a prompt:
prompt = langfuse.get_prompt("answer-prompt")
formatted = prompt.compile(question="What is Python?")
# Use `formatted` as the LLM input.
Prompts are versioned in Langfuse Dashboard → Prompts. Updates to prompts don’t require code deploys: fetch the latest at runtime.
For caching to avoid hitting Langfuse on every request:
prompt = langfuse.get_prompt("answer-prompt", cache_ttl_seconds=300)
# Cached for 5 minutes.
For version pinning:
prompt = langfuse.get_prompt("answer-prompt", version=3)
To link traces to prompts:
trace = langfuse.trace(name="...")
generation = trace.generation(
    name="...",
    prompt=prompt,  # Links this generation to the prompt version
    model="gpt-4o",
    input=...,
    output=...,
)
Langfuse shows aggregate metrics per prompt version (latency, cost, error rate), which is useful for A/B testing prompts.
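To make the effect of a cache TTL concrete, here is a small stand-alone sketch of a time-based cache around a fetch function. It illustrates the general technique only and is not Langfuse's internal implementation; TTLPromptCache and the fetch callable are hypothetical names, and the injectable clock exists just to make the behavior easy to test.

```python
import time


class TTLPromptCache:
    """Cache fetch(name) results and refetch once an entry is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # name -> (expires_at, value)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no network call
        value = self._fetch(name)
        self._entries[name] = (now + self._ttl, value)
        return value


calls = []
cache = TTLPromptCache(lambda name: calls.append(name) or f"prompt:{name}", ttl_seconds=300)
first = cache.get("answer-prompt")
second = cache.get("answer-prompt")  # served from the cache
```

The trade-off is the usual one: a longer TTL means fewer fetches but a longer delay before a newly published prompt version is picked up.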
Fix 8: Self-Hosted Setup
Langfuse is open source and can be self-hosted. Docker Compose:
# docker-compose.yml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:  # lets langfuse wait until Postgres accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 10
  langfuse:
    image: langfuse/langfuse:2  # Postgres-only; v3 images also need ClickHouse, Redis, and object storage
    depends_on:
      db:
        condition: service_healthy  # avoids "Connection refused at db:5432" during startup
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/postgres
      NEXTAUTH_SECRET: "your-secret-here"
      NEXTAUTH_URL: "http://localhost:3000"
      SALT: "your-salt-here"
      ENCRYPTION_KEY: "0000000000000000000000000000000000000000000000000000000000000000"
      TELEMETRY_ENABLED: "false"
volumes:
  db_data:
docker compose up -d
# Open http://localhost:3000
# Sign up for the first admin user.
Generate the encryption key:
openssl rand -hex 32
For production deployment with HA:
- External Postgres (RDS, Cloud SQL, etc.).
- ClickHouse for analytics (newer Langfuse versions).
- Object storage (S3/R2) for large traces.
Point your SDK at the self-hosted URL:
langfuse = Langfuse(host="https://langfuse.example.com")
Common Mistake: Forgetting to generate unique NEXTAUTH_SECRET, SALT, ENCRYPTION_KEY. The defaults in docs are placeholders. Generate per-deployment.
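If openssl is not at hand, the three values can also be generated with Python's standard library. This is a sketch under assumptions: generate_langfuse_secrets is a hypothetical helper, ENCRYPTION_KEY is produced as 64 hex characters to match the shape of `openssl rand -hex 32`, and the other two are treated as arbitrary random strings.

```python
import secrets


def generate_langfuse_secrets():
    """Return fresh random values for the three per-deployment secrets."""
    return {
        "NEXTAUTH_SECRET": secrets.token_urlsafe(32),
        "SALT": secrets.token_urlsafe(32),
        # 32 random bytes as 64 hex characters, same shape as `openssl rand -hex 32`
        "ENCRYPTION_KEY": secrets.token_hex(32),
    }


if __name__ == "__main__":
    for key, value in generate_langfuse_secrets().items():
        print(f"{key}={value}")
```

Run it once per deployment and store the output in your secret manager or .env file rather than in the compose file itself.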
Still Not Working?
A few less-obvious failures:
- Traces appear but with wrong project. Multiple API key sets exist. Verify which project your LANGFUSE_PUBLIC_KEY belongs to.
- flush() hangs. Network issue or wrong host. Check that LANGFUSE_HOST is reachable.
- High latency on every request. Events should be sent in the background in batches (roughly every second by default), not inline with each request. Make sure batching is enabled and that you are not calling flush() on every request.
- Some LangChain calls untracked. Specific LangChain components may not call back. Use the @observe() decorator to wrap them manually.
- Token counts wrong. Some models (Anthropic, Bedrock) report usage differently. Map the response’s usage to Langfuse’s expected format.
- Streaming generations lose output. Stream events fire incrementally; aggregate the final output before logging. The OpenAI wrapper handles this; manual code must accumulate.
- Self-hosted Langfuse out of disk. Traces accumulate. Set retention policies in Dashboard → Settings, or drop old data from Postgres directly.
- CallbackHandler not capturing async ops. For async LangChain, older versions may need the async handler form (from langfuse.callback import AsyncCallbackHandler); recent versions support both sync and async through the plain CallbackHandler.
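For the streaming case, manual code needs to collect the chunks while still forwarding them to the caller, then log the joined text once the stream ends. A minimal sketch follows; collect_stream and on_chunk are hypothetical names, and the chunks are assumed to be plain strings (or None for empty deltas) already extracted from the provider's stream events.

```python
def collect_stream(chunks, on_chunk=None):
    """Forward each non-empty text chunk to on_chunk and return the full output."""
    parts = []
    for chunk in chunks:
        if chunk:  # skip None and empty deltas
            parts.append(chunk)
            if on_chunk is not None:
                on_chunk(chunk)
    return "".join(parts)


seen = []
full_output = collect_stream(["Hel", None, "lo", "", " world"], on_chunk=seen.append)
# Log `full_output` as the generation's output once the stream has finished.
```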
For related LLM observability and tracing issues, see LangChain Python not working, LiteLLM not working, OpenAI API not working, and Sentry not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.