Fix: Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup

FixDevs ·

Quick Answer

How to fix Langfuse errors — Python/JS SDK init, trace/span/generation hierarchy, LangChain CallbackHandler, OpenAI wrapper, missing usage/cost data, prompt management, and self-hosted Postgres setup.

The Error

You initialize Langfuse but no traces appear in the dashboard:

from langfuse import Langfuse
langfuse = Langfuse()

# Make some LLM calls...
# Dashboard stays empty.

Or the LangChain integration doesn’t attach traces:

from langfuse.callback import CallbackHandler
handler = CallbackHandler()

result = llm.invoke("Hello", config={"callbacks": [handler]})
# No trace shows in Langfuse.

Or generations have no token usage or cost:

langfuse.generation(
    name="gpt-4o-call",
    model="gpt-4o",
    input=...,
    output=...,
    # usage missing!
)

Or self-hosted Langfuse can’t connect to Postgres:

Error: Connection refused at db:5432

Why This Happens

Langfuse is an open-source LLM observability platform. It ingests traces (a call), spans (units of work in a trace), and generations (LLM calls) via SDKs. Most issues map to:

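  • Unflushed events. The SDKs queue events and send them in the background. If your process exits before the queue is flushed, data is lost. Always call flush() before exit.
  • Env vars vs constructor args. SDKs read LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST. If the host is wrong (it defaults to Langfuse Cloud) or the keys belong to a different instance or project, ingestion fails in the background without raising an error in your code.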
  • LangChain integration is a callback. You must pass the CallbackHandler as a callback when invoking chains/LLMs. Without it, no instrumentation.
  • Token usage requires explicit data (or use the OpenAI wrapper). Manual generations need usage filled in — Langfuse won’t auto-compute.
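The env-var failure mode in the list above is cheap to rule out before debugging anything else. A minimal sketch, assuming the pk-lf- / sk-lf- key prefixes shown in this article; check_langfuse_env is a hypothetical helper, not part of the Langfuse SDK:

```python
import os

# Expected prefix for each key, taken from the examples in this article (an assumption).
REQUIRED = {
    "LANGFUSE_PUBLIC_KEY": "pk-lf-",
    "LANGFUSE_SECRET_KEY": "sk-lf-",
}

def check_langfuse_env(env=None):
    """Return a list of human-readable problems; an empty list means the basics look right."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in REQUIRED.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            # Catches the common mix-up of pasting the secret key into the public slot.
            problems.append(f"{name} does not start with {prefix!r}")
    host = env.get("LANGFUSE_HOST", "")
    if not host:
        problems.append("LANGFUSE_HOST is not set; the SDK will default to Langfuse Cloud")
    elif not host.startswith(("http://", "https://")):
        problems.append("LANGFUSE_HOST should include the scheme, e.g. https://")
    return problems
```

Calling check_langfuse_env() at startup and logging the returned problems makes a swapped key or a missing host visible immediately instead of showing up as an empty dashboard.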

Fix 1: Initialize the SDK Correctly

Python:

from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or "https://us.cloud.langfuse.com" or your self-hosted URL
)

Or via env vars (cleaner):

# .env
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com

from langfuse import Langfuse
langfuse = Langfuse()  # Reads env vars

JS:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
  baseUrl: process.env.LANGFUSE_HOST,
});

Critical: flush before exit. Traces are batched and sent async. Without a flush, short scripts lose data:

# Python, at the end of your script:
langfuse.flush()

// JS:
await langfuse.shutdownAsync();

For long-running services (web servers), Langfuse flushes periodically — no explicit flush needed. For batch scripts, lambdas, CLIs — always flush.

Pro Tip: Add a process exit handler:

import atexit
atexit.register(langfuse.flush)

Catches the common “forgot to flush” mistake.
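An exit handler only runs when the process actually exits. In Lambda-style runtimes the process is frozen between invocations, so flushing per call is the safer pattern. A minimal sketch of that idea: flush_after is a hypothetical decorator, and FakeClient stands in for the real Langfuse client so the example runs without the SDK.

```python
import functools

def flush_after(client):
    """Decorator factory: call client.flush() after the wrapped function, even if it raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on both success and failure, so errored requests are traced too.
                client.flush()
        return wrapper
    return decorator


class FakeClient:
    """Stand-in for a Langfuse client; it only counts flush() calls."""
    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


client = FakeClient()

@flush_after(client)
def handler(event):
    # In a real function, traces and generations would be created here.
    return {"status": "ok", "echo": event}

result = handler({"id": 1})
```

With the real SDK, the same decorator would be applied as @flush_after(langfuse). The trade-off is that each invocation waits for the flush to complete.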

Fix 2: Trace Hierarchy — Trace → Span → Generation

The three primary primitives:

  • Trace — a request/session, holds the overall context.
  • Span — a unit of work within a trace (e.g. “retrieve documents,” “format prompt”).
  • Generation — an LLM call (special span type with model, input, output, usage).

# Create a trace:
trace = langfuse.trace(name="answer-question", user_id="user-42")

# Add a span:
span = trace.span(name="retrieve-docs", input={"query": "..."})
# ... do work ...
span.end(output=[{"doc": "..."}])

# Add a generation:
generation = trace.generation(
    name="generate-answer",
    model="gpt-4o",
    input=messages,
    output={"role": "assistant", "content": "..."},
    usage={
        "prompt_tokens": 250,
        "completion_tokens": 50,
        "total_tokens": 300,
    },
    model_parameters={"temperature": 0.7},
)

For nested observations (an LLM call inside a span inside a trace), create the generation from the span rather than from the trace:

trace = langfuse.trace(name="rag-query")

span = trace.span(name="answer-pipeline")
# ... retrieval logic ...

# Created from the span, so it nests under it:
generation = span.generation(
    name="answer",
    model="gpt-4o",
    input=...,
    output=...,
)
span.end()

For deeper nesting, span children:

parent_span = trace.span(name="parent")
child_span = parent_span.span(name="child")
child_span.end()
parent_span.end()

Common Mistake: Forgetting .end() on spans. Without end, spans show “in progress” forever and duration is wrong.
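One way to make a missing .end() impossible is to tie it to a with block. A minimal sketch: ended is a hypothetical helper that relies only on the object having an .end() method, and FakeSpan stands in for a real span so the example runs without the SDK.

```python
from contextlib import contextmanager

@contextmanager
def ended(observation):
    """Yield a span-like object and guarantee .end() is called when the block exits."""
    try:
        yield observation
    finally:
        # Runs whether the block finished normally or raised.
        observation.end()


class FakeSpan:
    """Stand-in for a span; it only records whether end() was called."""
    def __init__(self, name):
        self.name = name
        self.ended = False

    def end(self):
        self.ended = True


span = FakeSpan("retrieve-docs")
try:
    with ended(span):
        raise RuntimeError("retrieval failed")
except RuntimeError:
    pass  # span.ended is True even though the block raised
```

With the real SDK this would read: with ended(trace.span(name="retrieve-docs")) as span: ... Note that this version calls end() without an output, so set the output separately if you need it on the span.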

For convenient decorators:

from langfuse.decorators import observe

@observe()
def my_function(query: str):
    docs = retrieve(query)
    return generate(query, docs)

# Every call to my_function creates a trace automatically.

Fix 3: LangChain Integration

Pass the CallbackHandler to LangChain operations:

from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI  # pip install langchain-openai

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

llm = ChatOpenAI(model="gpt-4o")
result = llm.invoke("Hello", config={"callbacks": [langfuse_handler]})

The handler captures the LLM call as a generation, including prompts, response, and token usage (from OpenAI’s response metadata).

For chains:

from langchain_core.prompts import ChatPromptTemplate

chain = ChatPromptTemplate.from_template("Answer: {q}") | llm

result = chain.invoke({"q": "What is Python?"}, config={"callbacks": [langfuse_handler]})

The chain’s full structure (each runnable, retrieval steps, LLM calls) traces as a hierarchy.

For LangGraph:

from langfuse.callback import CallbackHandler

config = {"callbacks": [CallbackHandler()]}
result = graph.invoke({"input": "..."}, config=config)

LangGraph’s nodes show as nested spans under the graph’s trace.

Common Mistake: Not passing config={"callbacks": [...]}. Without it, LangChain uses the default callback manager, which doesn’t include Langfuse. Pass the handler on every call, or bind it once to the runnable with chain.with_config({"callbacks": [langfuse_handler]}).

Fix 4: OpenAI / Anthropic Wrappers

The simplest auto-instrumentation: wrap your OpenAI client.

Python:

from langfuse.openai import openai  # Drop-in replacement

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# Logged to Langfuse automatically — no extra code.

The langfuse.openai module exports an openai that proxies the real OpenAI SDK and logs every call.

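For Anthropic, the Python SDK does not appear to ship an equivalent drop-in module. One option is to wrap the call with the @observe() decorator and report the model and usage yourself:

import anthropic
from langfuse.decorators import observe, langfuse_context

client = anthropic.Anthropic(api_key="...")

@observe(as_type="generation")
def call_claude(messages):
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=messages,
    )
    langfuse_context.update_current_observation(
        model=response.model,
        usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
    )
    return response.content[0].text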

JS:

import OpenAI from "openai";
import { observeOpenAI } from "langfuse";

const openai = observeOpenAI(new OpenAI());
const response = await openai.chat.completions.create({...});

These wrappers automatically extract prompt, response, model, and token usage. No manual generation() calls needed.

Pro Tip: Use the wrapper for the bulk of your LLM calls. Reserve manual generation() for cases where the wrapper doesn’t fit (custom protocols, batched calls).

Fix 5: Cost Calculation

Langfuse computes cost from usage + model name. For built-in OpenAI/Anthropic models, costs are pre-defined. For custom models, register them in the Langfuse dashboard:

Dashboard → Settings → Models → Add model
  Match pattern: my-custom-model.*
  Input price: $0.001 per 1K tokens
  Output price: $0.002 per 1K tokens

Or via API for self-hosted:

curl -X POST "$LANGFUSE_HOST/api/public/models" \
  -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  -H "content-type: application/json" \
  -d '{
    "modelName": "my-custom-model",
    "matchPattern": "my-custom-model.*",
    "inputPrice": 0.000001,
    "outputPrice": 0.000002,
    "tokenizerId": "openai",
    "tokenizerConfig": {"tokenizerModel": "gpt-4"},
    "unit": "TOKENS"
  }'
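The dashboard form and the API body above express the same prices in different units: $0.001 per 1K tokens is 0.000001 per token. A small worked example using the 250 / 50 token counts from Fix 2. This is plain arithmetic for sanity-checking dashboard numbers, not a description of how Langfuse computes cost internally; per_token and estimate_cost are hypothetical helper names.

```python
def per_token(price_per_1k):
    """Convert a price quoted per 1K tokens into a per-token price."""
    return price_per_1k / 1000

def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD, given token counts and per-token prices."""
    return input_tokens * input_price + output_tokens * output_price

input_price = per_token(0.001)   # per-token equivalent of $0.001 per 1K
output_price = per_token(0.002)  # per-token equivalent of $0.002 per 1K

# 250 input tokens + 50 output tokens:
# 250 * 0.000001 + 50 * 0.000002 = 0.00025 + 0.0001 = 0.00035 USD
cost = estimate_cost(250, 50, input_price, output_price)
```

If the cost shown for a generation is off by a factor of 1,000, the usual cause is entering a per-1K price where a per-token price was expected, or the reverse.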

Common Mistake: Logging usage under keys Langfuse doesn’t recognize. It accepts OpenAI-style keys (prompt_tokens, completion_tokens, total_tokens, as in Fix 2) or its own generic keys (input, output, total). Anthropic’s input_tokens / output_tokens match neither, so map them:

langfuse.generation(
    name="claude-call",
    model="claude-3-5-sonnet",
    input=...,
    output=...,
    usage={
        "input": response.usage.input_tokens,    # Langfuse's generic keys
        "output": response.usage.output_tokens,
        "total": response.usage.input_tokens + response.usage.output_tokens,
    },
)

Both key styles end up in the same usage fields, so pick one and use it consistently.
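If you log generations from more than one provider, a single mapping function keeps the usage keys consistent. A minimal sketch, assuming usage arrives as a plain dict with either OpenAI-style or Anthropic-style keys; to_langfuse_usage is a hypothetical helper, not an SDK function.

```python
def to_langfuse_usage(usage):
    """Map OpenAI- or Anthropic-style token counts onto input/output/total keys."""
    if "prompt_tokens" in usage:        # OpenAI-style
        inp, out = usage["prompt_tokens"], usage["completion_tokens"]
    elif "input_tokens" in usage:       # Anthropic-style
        inp, out = usage["input_tokens"], usage["output_tokens"]
    else:
        # Fail loudly rather than log a generation with no usage.
        raise ValueError(f"unrecognized usage keys: {sorted(usage)}")
    # Prefer the provider's own total when present; otherwise sum the two counts.
    return {"input": inp, "output": out, "total": usage.get("total_tokens", inp + out)}
```

Provider SDKs usually return usage as an object rather than a dict, so build the dict from its attributes first, for example {"input_tokens": response.usage.input_tokens, "output_tokens": response.usage.output_tokens}.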

Fix 6: Sampling for Cost Control

In production, sample to reduce ingestion volume:

import random

# Sample 10%:
if random.random() < 0.1:
    trace = langfuse.trace(...)
    # ... full instrumentation
else:
    # Run without Langfuse
    pass

Or use the SDK’s built-in sampling, set once on the client (or through the LANGFUSE_SAMPLE_RATE env var):

langfuse = Langfuse(
    sample_rate=0.1,  # 10% of traces ingested
)

For LangChain:

handler = CallbackHandler(sample_rate=0.1)

Sampling is per-trace — once a trace is included, all its spans/generations are. No partial traces.

Pro Tip: Sample at request boundaries (entry points), not at every span. Including a full trace or none keeps the data consistent.
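The random.random() check above makes an independent decision each time it runs, so two services handling the same request can disagree. Hashing a request id gives every participant the same answer. A minimal sketch of that alternative: should_sample is a hypothetical helper, and it assumes you already have a stable id per request.

```python
import hashlib

def should_sample(request_id, rate):
    """Deterministically decide whether a request is traced, based on a hash of its id."""
    if rate <= 0:
        return False
    if rate >= 1:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into the unit interval.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Call should_sample(request_id, 0.1) once at the entry point and pass the decision down, so a request is either fully traced or not traced at all.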

Fix 7: Prompt Management

Manage prompts centrally:

# Fetch the latest version of a prompt:
prompt = langfuse.get_prompt("answer-prompt")
formatted = prompt.compile(question="What is Python?")
# Use `formatted` as the LLM input.

Prompts are versioned in Langfuse Dashboard → Prompts. Updates to prompts don’t require code deploys — fetch the latest at runtime.

For caching to avoid hitting Langfuse on every request:

prompt = langfuse.get_prompt("answer-prompt", cache_ttl_seconds=300)
# Cached for 5 minutes.
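The practical effect of a TTL is that a prompt edited in the dashboard can take up to that long to reach a running process. A minimal sketch of time-based caching to make that trade-off concrete: TTLCache is a hypothetical class for illustration and is not how the SDK implements its cache.

```python
import time

class TTLCache:
    """Cache the result of fetch(key) for ttl_seconds before fetching again."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behavior can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no network call
        value = self._fetch(key)  # expired or missing: fetch and store
        self._entries[key] = (now + self._ttl, value)
        return value
```

A longer TTL means fewer fetches but slower rollout of prompt changes. A shorter TTL does the opposite.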

For version pinning:

prompt = langfuse.get_prompt("answer-prompt", version=3)

To link traces to prompts:

trace = langfuse.trace(name="...")
generation = trace.generation(
    name="...",
    prompt=prompt,  # Links this generation to the prompt version
    model="gpt-4o",
    input=...,
    output=...,
)

Langfuse shows aggregate metrics per prompt version (latency, cost, error rate) — useful for A/B testing prompts.

Fix 8: Self-Hosted Setup

Langfuse is open source and can be self-hosted. Docker Compose:

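# docker-compose.yml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:
      # Without this, langfuse can start before Postgres accepts connections
      # and fail with "Connection refused at db:5432".
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 10

  langfuse:
    # The v2 image needs only Postgres. Newer major versions also require
    # ClickHouse, Redis, and object storage, so "latest" will not start with this file.
    image: langfuse/langfuse:2
    depends_on:
      db:
        condition: service_healthy
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/postgres
      NEXTAUTH_SECRET: "your-secret-here"
      NEXTAUTH_URL: "http://localhost:3000"
      SALT: "your-salt-here"
      ENCRYPTION_KEY: "0000000000000000000000000000000000000000000000000000000000000000"
      TELEMETRY_ENABLED: "false"

volumes:
  db_data:

docker compose up -d
# Open http://localhost:3000
# Sign up for the first admin user.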

Generate the encryption key:

openssl rand -hex 32

For production deployment with HA:

  • External Postgres (RDS, Cloud SQL, etc.).
  • ClickHouse for analytics, plus Redis for queuing (both required from Langfuse v3 onward).
  • Object storage (S3/R2) for large traces.

Point your SDK at the self-hosted URL:

langfuse = Langfuse(host="https://langfuse.example.com")

Common Mistake: Forgetting to generate unique NEXTAUTH_SECRET, SALT, ENCRYPTION_KEY. The defaults in docs are placeholders. Generate per-deployment.
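All three values can be generated in one step. A minimal sketch using Python's standard library: the 64-hex-character ENCRYPTION_KEY matches the openssl rand -hex 32 command above, while the base64 form for NEXTAUTH_SECRET and SALT is an assumption (this article does not state a required format for them). generate_langfuse_secrets is a hypothetical helper.

```python
import base64
import secrets

def generate_langfuse_secrets():
    """Return fresh random values for the three secrets used in the compose file."""
    return {
        # 32 random bytes, base64-encoded (same shape as `openssl rand -base64 32`).
        "NEXTAUTH_SECRET": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
        "SALT": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
        # 64 hex characters = 256 bits (same shape as `openssl rand -hex 32`).
        "ENCRYPTION_KEY": secrets.token_hex(32),
    }

values = generate_langfuse_secrets()
for name, value in values.items():
    print(f"{name}={value}")
```

Paste the printed lines into an env file that stays out of version control, and generate a separate set for each deployment.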

Still Not Working?

A few less-obvious failures:

  • Traces appear but with wrong project. Multiple API key sets exist. Verify which project your LANGFUSE_PUBLIC_KEY belongs to.
  • flush() hangs. Network issue or wrong host. Check LANGFUSE_HOST is reachable.
  • High latency on every request. The SDK is meant to queue events and send them in batches in the background, so tracing should add very little to request time. If it adds a lot, check that you are not calling flush() inside the request path.
  • Some LangChain calls untracked. Specific LangChain components may not call back. Use the @observe() decorator to wrap them manually.
  • Token counts wrong. Some models (Anthropic, Bedrock) report differently. Map the response’s usage to Langfuse’s expected format.
  • Streaming generations lose output. Stream events fire incrementally; aggregate the final output before logging. The OpenAI wrapper handles this; manual code must accumulate.
  • Self-hosted Langfuse out of disk. Traces accumulate. Set retention policies in Dashboard → Settings, or drop old data from Postgres directly.
  • CallbackHandler not capturing async ops. Recent SDK versions handle sync and async calls with the same CallbackHandler, so pass it in config to ainvoke / astream exactly as you would to invoke. If async calls are still missing, upgrade the SDK.
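Several items in the list above (flush() hanging, traces going nowhere) come down to whether the configured host answers at all. A minimal reachability check using only the standard library: it assumes the instance exposes a health endpoint at /api/public/health, which you should confirm against your Langfuse version. health_url and is_reachable are hypothetical helper names.

```python
import urllib.request

def health_url(host):
    """Build the health-check URL for a Langfuse host, tolerating a trailing slash."""
    return host.rstrip("/") + "/api/public/health"

def is_reachable(host, timeout=5.0):
    """Return True if the host answers the health endpoint with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(health_url(host), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers DNS failures, refused connections, timeouts, and HTTP errors;
        # ValueError covers a host string without a valid scheme.
        return False
```

Run is_reachable(os.environ["LANGFUSE_HOST"]) from the same machine or container as your application, since a host that resolves on your laptop may not resolve inside a Docker network.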

For related LLM observability and tracing issues, see LangChain Python not working, LiteLLM not working, OpenAI API not working, and Sentry not working.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
