
CompletionKit vs Langfuse

Langfuse is the open-source observability platform for LLM and agent apps: deep tracing, evals, dashboards, self-host at scale. CompletionKit is the focused prompt-improvement loop: it scores how far the judge can be trusted, drafts the next revision, and serves the winning prompt to your app.


Last updated June 18, 2026.

The short version

Langfuse is a genuinely open-source LLM engineering platform, MIT-licensed at the core and self-hostable for free. Its lead capability is observability: nested traces and spans for multi-step chains and agents, token and latency capture, sessions, and dashboards, all built on OpenTelemetry with a wide range of SDK and framework integrations. It also does prompt management, datasets, experiments, and managed LLM-as-a-judge evals, and it natively computes human-vs-judge agreement stats. In January 2026 it was acquired by ClickHouse, a database and analytics company, alongside a $400M Series D; both sides say the MIT core and Langfuse Cloud carry on unchanged.

CompletionKit does the improvement loop and not much else on purpose: an LLM judge with a computed trust score, suggested next revisions drafted from the judge's feedback, and prompt versioning and serving. It is source-available under BSL 1.1, self-hostable in production on any plan, and independent. It is not an observability platform and is not trying to be, so if deep production tracing is what you need, Langfuse is the better fit.

Side by side

Langfuse CompletionKit
Primary focus Observability + tracing Prompt-improvement loop
Multi-provider (OpenAI, Anthropic, local)
LLM-as-a-judge scoring
Judge calibration / trust score
AI-suggested prompt revisions External
Versioned prompts served to your app
Tracing / observability depth
MCP server
Hosted runs + history
Free + self-hostable Self-host
License MIT (open source) BSL 1.1 (source-available)

Both compute human-vs-judge agreement natively: Langfuse added Pearson correlation, Cohen's kappa, and F1 between any two score sources in November 2025; CompletionKit shows a Wilson interval and a quadratic-weighted kappa. The difference is the loop around it. Langfuse has no in-product prompt optimizer; its 2026 workflow drives revisions through an external Claude Agent Skill over the CLI, API, or MCP, hence "External" in the table. CompletionKit drafts the next revision in-product from the judge's per-row feedback. On hosting: Langfuse's MIT core is free to self-host at scale, while CompletionKit's source-available engine is self-hostable in production on any plan but is not open source.
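For readers unfamiliar with the two CompletionKit statistics named above, here is a minimal Python sketch of their textbook definitions. It is not CompletionKit's or Langfuse's implementation: the function names, the integer label encoding (0 to num_labels - 1), and the default 95% z-value are all assumptions made for illustration.

```python
import math
from collections import Counter


def wilson_interval(agreements, n, z=1.96):
    """Wilson score interval for an agreement rate (hypothetical helper).

    agreements: rows where judge and human gave the same label
    n:          total rows compared
    z:          normal quantile; 1.96 is the usual 95% choice (assumption)
    """
    if n <= 0:
        raise ValueError("need at least one observation")
    p = agreements / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def quadratic_weighted_kappa(human, judge, num_labels):
    """Cohen's kappa with quadratic disagreement weights (hypothetical helper).

    human, judge: equal-length lists of integer ratings in 0..num_labels-1
    """
    if len(human) != len(judge) or not human:
        raise ValueError("rating lists must be non-empty and equal length")
    if num_labels < 2:
        raise ValueError("need at least two labels")
    n = len(human)
    observed = Counter(zip(human, judge))  # joint counts of (human, judge)
    human_counts = Counter(human)          # marginal counts per rater
    judge_counts = Counter(judge)
    max_dist = (num_labels - 1) ** 2
    observed_cost = 0.0
    expected_cost = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / max_dist
            observed_cost += weight * observed[(i, j)] / n
            expected_cost += weight * human_counts[i] * judge_counts[j] / (n * n)
    if expected_cost == 0:
        # Both raters used one identical label throughout: kappa is undefined.
        return math.nan
    return 1 - observed_cost / expected_cost


# Illustrative data only: ten rows rated 0-2 by a human and by the judge.
human_ratings = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
judge_ratings = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0]
matches = sum(h == j for h, j in zip(human_ratings, judge_ratings))
print(wilson_interval(matches, len(human_ratings)))
print(quadratic_weighted_kappa(human_ratings, judge_ratings, num_labels=3))
```

The Wilson interval stays inside [0, 1] and behaves sensibly on small samples, and quadratic weighting penalizes a judge that is two rating steps off four times as heavily as one that is a single step off.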

Which one fits

Reach for Langfuse if you need deep, production-grade observability of complex multi-step chains and agents: nested traces, sessions, token and latency capture across a wide range of frameworks, team dashboards, and the option to self-host the MIT core at high volume on your own infrastructure. It is mature, widely adopted, and now backed by ClickHouse's data infrastructure. If tracing and observability are first-class needs, or you must keep all data in-house at scale, it is the stronger choice.

Reach for CompletionKit if you want the improvement loop without the platform around it: a judge you can trust because the agreement is scored, the next revision drafted in-product from the judge's feedback rather than handed to an external CLI, and prompts served to your app so non-coders can edit them safely. It is source-available and self-hostable, and it stays focused on evaluate, improve, ship. Less surface to learn.

They are not either-or. Plenty of teams run Langfuse for production observability and tracing, and use CompletionKit for the tight prompt-improvement loop and in-product revision drafting.

Questions

Is Langfuse really open source, and is CompletionKit?
Langfuse is genuinely open source: its core, including tracing, evals, prompt management, experiments, annotation, and the playground, is MIT-licensed with no usage limits, and you can self-host it for free. Only a few enterprise security features (SCIM, audit logging, data retention policies) need a commercial license key, and even those ship as readable source. CompletionKit is not open source. Its engine is source-available under BSL 1.1, which means you can read it and self-host it in production, but the license is more restrictive than MIT. We say so plainly rather than borrow the label.
What does CompletionKit do that Langfuse doesn't?
Mainly two things. It drafts the next prompt revision for you in-product, reading the judge's per-row feedback and proposing a rewrite; Langfuse has no native prompt optimizer, and its current workflow drives revisions through an external Claude Agent Skill over the CLI, API, or MCP, with an in-product version still an open feature request. And the whole product is the evaluate, improve, ship loop rather than a feature beside a large observability platform, so there is less surface to wire up before you are improving a prompt. Note that both tools natively compute human-vs-judge agreement, so that is not a difference between them.
Does CompletionKit have tracing and observability like Langfuse?
No, and that is a real Langfuse strength. Langfuse gives you deep, LLM-native tracing: nested traces and spans for multi-step chains and agents, token and latency capture, sessions, and shareable team dashboards, built on OpenTelemetry with broad framework coverage. CompletionKit is focused on the improvement loop and does not do production observability. If you need tracing, use Langfuse, or pair the two.
Langfuse was acquired by ClickHouse. Is it still independent?
Langfuse is no longer an independent company: ClickHouse acquired it in January 2026 alongside a $400M Series D. It is worth noting that ClickHouse is a database and analytics company, not a model lab, so this is not the case of an evaluation tool being owned by a model vendor it tests against. Both sides have said the MIT core stays open source, Langfuse Cloud continues as a standalone service, and nothing changes for existing deployments. Those are stated commitments rather than an enforceable guarantee, and stewardship now sits with a single acquiring company. CompletionKit is independent and source-available, and you can self-host it so your eval data and prompts stay on your own infrastructure. If that matters to you it is a point in our favor, but we are not going to overstate it given who the acquirer is.
Can I use both?
Yes. They are not mutually exclusive. Run Langfuse for production observability and tracing, and use CompletionKit for the focused improvement loop, the trust-scored judge, and in-product revision drafting. Langfuse maintains a broad set of integrations, so the two slot together without much friction.
