Claude Code for Langfuse LLM Analytics — Guide
The Setup
You are monitoring your LLM application with Langfuse, an open-source observability platform for LLM apps. Langfuse traces every LLM call, tracks costs, measures latency, captures user feedback, and provides analytics dashboards. Claude Code can instrument LLM applications, but it generates basic console.log debugging instead of structured observability with Langfuse.
What Claude Code Gets Wrong By Default
- Logs LLM calls to console. Claude adds console.log(response) for debugging. Langfuse provides structured tracing with spans, generations, and metadata — console logs are unstructured and unsearchable.
- Calculates costs manually. Claude writes token counting and cost calculation code. Langfuse automatically tracks token usage and costs per model — it knows pricing for major providers and calculates costs from usage data.
- Ignores trace hierarchy. Claude treats each LLM call independently. Langfuse uses traces (a complete user interaction), spans (logical steps), and generations (individual LLM calls) — this hierarchy shows how components interact.
- Does not capture user feedback. Claude has no feedback mechanism. Langfuse provides a scores API for capturing thumbs up/down, ratings, or automated evaluation scores linked to specific traces.
The CLAUDE.md Configuration
# Langfuse LLM Observability
## Monitoring
- Platform: Langfuse (open-source LLM observability)
- Tracing: traces, spans, generations hierarchy
- Costs: automatic token and cost tracking
- Feedback: scores API for user ratings
## Langfuse Rules
- SDK: langfuse Python/JS SDK or OpenAI wrapper
- Trace: one trace per user interaction
- Span: logical steps within a trace
- Generation: individual LLM API calls
- Scores: attach feedback to traces
- OpenAI wrapper: from langfuse.openai import openai
- Flush: langfuse.flush() before process exit
## Conventions
- Initialize Langfuse with LANGFUSE_PUBLIC_KEY and SECRET_KEY
- Use @observe() decorator for automatic tracing (Python)
- Use trace.generation() for LLM calls within a trace
- Add metadata: user_id, session_id, tags
- Capture input/output for each generation
- Score traces with user feedback
- Self-hosted: Docker Compose for Langfuse server
Workflow Example
You want to add Langfuse tracing to a RAG chatbot. Prompt Claude Code:
“Add Langfuse observability to our RAG chatbot. Trace each user message as a trace, the retrieval step as a span, and the LLM generation as a generation. Include the retrieved documents as metadata and capture the user’s feedback score. Use the Python SDK.”
Claude Code should initialize the Langfuse client, create a trace with user_id and session_id, add a span for document retrieval with retrieved docs as metadata, create a generation for the LLM call with model name and token usage, and expose a feedback endpoint that calls langfuse.score() linked to the trace ID.
Common Pitfalls
- Not flushing before process exit. Claude does not call langfuse.flush() at the end. Langfuse batches events and sends them asynchronously — if the process exits before flush, traces are lost. Always flush in serverless functions and before shutdown.
- Missing session grouping. Claude creates individual traces without session context. Multi-turn conversations should share a session_id so Langfuse groups them together for analysis.
- Tracing in production without sampling. Claude traces every single request. In high-traffic production, trace a percentage of requests with sample_rate=0.1 to reduce cost and data volume while maintaining statistical significance.