API reference
A REST JSON API and an MCP server. Authenticate by sending a bearer token in the Authorization header; create a token under Settings → API after you have created an organization. Replace your-org-slug-goes-here and YOUR_TOKEN in the examples below with your own values.
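As a rough illustration of the authentication scheme, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The helper name build_request is hypothetical, and the slug and token are the same placeholders used throughout this page.

```python
import urllib.request

# Placeholders from this page; replace with your own values.
ORG_SLUG = "your-org-slug-goes-here"
TOKEN = "YOUR_TOKEN"
BASE_URL = f"https://completionkit.com/orgs/{ORG_SLUG}/api/v1"


def build_request(path, method="GET"):
    """Return an unsent Request that carries the bearer token (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {TOKEN}"},
    )


req = build_request("/prompts")
# To actually send it: urllib.request.urlopen(req).read()
```

Keeping the base URL and header in one helper means the token only has to be swapped in one place.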
MCP Server
Connect Claude Code, Cursor, or any MCP client to manage prompts, runs, datasets, and metrics conversationally. 49 tools over streamable HTTP.
claude mcp add --transport http completion-kit \
https://completionkit.com/orgs/your-org-slug-goes-here/mcp \
--header "Authorization: Bearer YOUR_TOKEN"
{
  "mcpServers": {
    "completion-kit": {
      "url": "https://completionkit.com/orgs/your-org-slug-goes-here/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}
Available tools
prompts 7
prompts_list
List all prompts
prompts_get
Get a prompt by ID
prompts_create
Create a prompt
prompts_update
Update a prompt. If the prompt already has runs, this creates a new DRAFT version (current=false) rather than editing in place or publishing — promote it with prompts_publish — so an agent's edits don't go live without a gate. If it has no runs, it is updated in place.
prompts_delete
Delete a prompt
prompts_publish
Publish a prompt version, making it the current version
prompts_suggest_improvement
Suggest an improved version of a prompt, grounded in a run's test results and judge feedback. Analyzes the run's responses, scores, and reviews, then returns reasoning plus a rewritten template (preserving {{variables}}) and persists it as a Suggestion. Requires a run that has a prompt (not a judge-only run).
runs 6
runs_list
List all runs
runs_get
Get a run by ID
runs_create
Create a run. Omit prompt_id and provide output_column for a judge-only run that grades a pre-existing dataset column instead of generating new outputs.
runs_update
Update a run
runs_delete
Delete a run
runs_generate
Generate responses for a run using its prompt and dataset
responses 2
responses_list
List responses for a run
responses_get
Get a specific response
datasets 6
datasets_list
List all datasets
datasets_get
Get a dataset by ID
datasets_create
Create a dataset with CSV data
datasets_update
Update a dataset
datasets_delete
Delete a dataset
datasets_create_from_url
Create a dataset by downloading CSV from a URL instead of inlining it. Use this for large datasets: pass a public http(s) URL and the server fetches the CSV directly, so the data never has to pass through the tool-call arguments. The URL is SSRF-checked and the download is capped at 10MB.
metrics 6
metrics_list
List all metrics
metrics_get
Get a metric by ID
metrics_create
Create a metric with evaluation criteria
metrics_update
Update a metric
metrics_delete
Delete a metric
metrics_suggest_variants
Ask the model to rewrite the metric's judge instruction in N variants targeted at the recent disagreements. Each variant is saved as a draft MetricVersion with source="suggestion". Returns the persisted drafts. Stripe-metering hooks fire via ActiveSupport::Notifications under completion_kit.judge_suggestion.generated.
metric_groups and metric_versions 8
metric_groups_list
List all metric groups
metric_groups_get
Get a metric group by ID
metric_groups_create
Create a metric group
metric_groups_update
Update a metric group
metric_groups_delete
Delete a metric group
metric_versions_list
List every MetricVersion (drafts + published) for a metric, newest first. Each row carries version_number, state, source, current flag, and timestamps.
metric_versions_publish
Publish a MetricVersion as the live version of its metric. Works for both 'draft → published' and 'revert to an older published version → current'. Transactionally flips current, demotes peers, and writes the version's instruction + rubric_bands back onto the metric so the judge grades against it.
metric_versions_dismiss
Destroy a draft MetricVersion (use for either source: 'edit' or source: 'suggestion'). Published versions are refused — to demote a published version, publish a different one as current instead.
provider_credentials 5
provider_credentials_list
List all provider credentials (API keys are not exposed)
provider_credentials_get
Get a provider credential by ID (API key is not exposed)
provider_credentials_create
Create a provider credential
provider_credentials_update
Update a provider credential
provider_credentials_delete
Delete a provider credential
tags 5
tags_list
List all tags
tags_get
Get a tag by ID
tags_create
Create a tag. Color is auto-assigned.
tags_update
Rename a tag.
tags_delete
Delete a tag. Removes the tag from every linked metric, prompt, run, and dataset.
agreements 2
agreements_list
List agreements. Filter by run_id, response_id, metric_id, or created_by.
agreements_create
Upsert an agreement for (run, response, metric, created_by). Verdict is one of agree, disagree, borderline. corrected_score (1..5) is required when verdict is 'disagree'.
judges 2
judges_replay
Run the current judge against a dataset (judge-only run). Wraps runs_create with prompt_id omitted and output_column supplied. Re-judges existing dataset outputs so you can compare against human verdicts.
judges_compare
Compare two metric versions' agreement stats side by side. Pass either two metric_version_ids or one metric_id with metric_version_a_id / metric_version_b_id.
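An MCP client normally builds tool invocations for you, but it can help to see what one looks like on the wire. The sketch below assembles a JSON-RPC tools/call message for datasets_create_from_url. The argument names name and url are assumptions (this page does not list tool parameters), and the scheme check only mirrors the "public http(s) URL" requirement locally; the server performs its own SSRF check.

```python
import json
from urllib.parse import urlparse


def tool_call(tool_name, arguments, request_id=1):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 'tools/call' message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def dataset_from_url_call(name, url):
    """Build a datasets_create_from_url call.

    The argument names 'name' and 'url' are assumed, not documented here.
    """
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("datasets_create_from_url expects an http(s) URL")
    return tool_call("datasets_create_from_url", {"name": name, "url": url})


call = dataset_from_url_call("tickets", "https://example.com/tickets.csv")
message = json.dumps(call)  # the JSON text an MCP client would send
```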
Prompts
Create, version, and manage LLM prompt templates.
GET /api/v1/prompts
curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/prompts \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/prompts
Required: name, template, llm_model
Optional: description
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/prompts \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "summarizer", "template": "Summarize: {{text}}", "llm_model": "gpt-4.1"}'
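Templates carry their inputs as {{variable}} placeholders, as in the {{text}} example above. When editing a template by hand, a quick local check that the placeholder set is unchanged can catch accidental renames before a run fails. This is only a sketch: it assumes placeholders are word characters inside double braces and tolerates surrounding whitespace, which may not match the exact grammar the service accepts.

```python
import re

# Assumed placeholder grammar: {{name}}, optionally with inner whitespace.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_variables(template):
    """Return the set of placeholder names found in a template."""
    return set(VARIABLE.findall(template))


def preserves_variables(original, rewritten):
    """True when both templates use exactly the same placeholder names."""
    return template_variables(original) == template_variables(rewritten)


original = "Summarize: {{text}}"
rewritten = "Summarize in two sentences: {{ text }}"
same = preserves_variables(original, rewritten)
```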
GET /api/v1/prompts/:id
PATCH /api/v1/prompts/:id
DELETE /api/v1/prompts/:id
POST /api/v1/prompts/:id/publish
Runs
Create runs, generate LLM responses, and judge them with metrics.
GET /api/v1/runs
Optional filters: status (pending, running, completed, failed), prompt_id, dataset_id, tag[]
POST /api/v1/runs
Optional: name, prompt_id, dataset_id, metric_ids, judge_model, output_column. For a judge-only run, omit prompt_id and set output_column to the dataset column to grade (defaults to actual_output).
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt_id": 1, "dataset_id": 1, "metric_ids": [1, 2]}'
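The difference between a generating run and a judge-only run comes down to which fields the request body carries. The sketch below builds both bodies as plain dictionaries; the function names are hypothetical, and the field names are the ones listed for POST /api/v1/runs.

```python
import json


def generating_run(prompt_id, dataset_id, metric_ids):
    """Body for a run that generates new outputs from a prompt."""
    return {
        "prompt_id": prompt_id,
        "dataset_id": dataset_id,
        "metric_ids": list(metric_ids),
    }


def judge_only_run(dataset_id, metric_ids, output_column="actual_output"):
    """Body for a judge-only run: no prompt_id, grade an existing column."""
    return {
        "dataset_id": dataset_id,
        "metric_ids": list(metric_ids),
        "output_column": output_column,
    }


generating_body = json.dumps(generating_run(1, 1, [1, 2]))
judge_only_body = json.dumps(judge_only_run(1, [1, 2]))
```

Either string can be passed as the -d payload of the curl call above.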
GET /api/v1/runs/:id
curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1 \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/runs/:id/generate
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/generate \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/runs/:id/retry_failures
POST /api/v1/runs/:id/rerun
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/rerun \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/runs/:id/regrade
GET /api/v1/runs/:id/compare?with=:other_id
curl "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/compare?with=2" \
-H "Authorization: Bearer YOUR_TOKEN"
PATCH /api/v1/runs/:id
DELETE /api/v1/runs/:id
Responses
Read-only access to generated responses and their review scores. Nested under runs.
GET /api/v1/runs/:run_id/responses
Optional filters: status (pending, succeeded, failed), plus limit and offset
curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/responses \
-H "Authorization: Bearer YOUR_TOKEN"
GET /api/v1/runs/:run_id/responses/:id
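For runs with many responses, the status, limit, and offset parameters can be combined to page through results. The sketch below only constructs the URLs. The page size of 50 is an arbitrary choice, and since the response body shape is not described here, deciding when to stop paging is left to the caller.

```python
from urllib.parse import urlencode

BASE_URL = "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1"
STATUSES = {"pending", "succeeded", "failed"}


def responses_url(run_id, status=None, limit=50, offset=0):
    """Build the list URL for a run's responses with optional filtering."""
    if status is not None and status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    params = {}
    if status is not None:
        params["status"] = status
    params["limit"] = limit
    params["offset"] = offset
    return f"{BASE_URL}/runs/{run_id}/responses?{urlencode(params)}"


# URLs for the first three pages of failed responses on run 1.
pages = [responses_url(1, status="failed", offset=n * 50) for n in range(3)]
```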
Datasets
Data used as input for runs.
GET /api/v1/datasets
POST /api/v1/datasets
Required: name, and either csv_data (inline CSV) or a multipart file (CSV upload, preferred for large datasets)
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/datasets \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "tickets", "csv_data": "text,expected_output\nHello,Hi"}'
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/datasets \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "name=tickets" \
-F "file=@tickets.csv"
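Inline csv_data has to survive two layers of quoting: CSV quoting for fields and JSON escaping for line breaks. Letting libraries handle both avoids hand-escaping mistakes. The sketch below builds the JSON body for the inline variant; dataset_payload is a hypothetical helper, and the column names are taken from the example above.

```python
import csv
import io
import json


def dataset_payload(name, header, rows):
    """Serialize rows to CSV, then embed the CSV text in a JSON body."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return json.dumps({"name": name, "csv_data": buffer.getvalue()})


body = dataset_payload(
    "tickets",
    ["text", "expected_output"],
    [["Hello", "Hi"], ["Refund, please", "Sure"]],  # a comma forces CSV quoting
)
```

In the resulting JSON text each line break appears as the two-character escape \n, and the field containing a comma is wrapped in quotes by the csv module.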
GET PATCH DELETE /api/v1/datasets/:id
Metrics
Scoring dimensions used by the judge model.
GET /api/v1/metrics
POST /api/v1/metrics
Required: name
Optional: instruction, rubric_bands (array of {stars, description})
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "relevance", "instruction": "Is the response relevant?"}'
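The curl example above omits rubric_bands. As a sketch of what a fuller body might look like, the helper below turns a stars-to-description mapping into the array of {stars, description} objects. The 1 to 5 star range is an assumption based on the 1..5 score range used for agreements, and the band wording is invented for illustration.

```python
import json


def metric_payload(name, instruction=None, bands=None):
    """Build a metric body; bands maps stars (assumed 1..5) to a description."""
    payload = {"name": name}
    if instruction is not None:
        payload["instruction"] = instruction
    if bands:
        for stars in bands:
            if stars not in range(1, 6):  # assumed valid range
                raise ValueError(f"stars out of range: {stars}")
        payload["rubric_bands"] = [
            {"stars": stars, "description": bands[stars]}
            for stars in sorted(bands)
        ]
    return payload


body = json.dumps(
    metric_payload(
        "relevance",
        instruction="Is the response relevant?",
        bands={5: "Directly answers the question", 1: "Off topic"},
    )
)
```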
GET PATCH DELETE /api/v1/metrics/:id
Agreement loop
POST /api/v1/metrics/:id/suggest_variants
Optional: count, model
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics/1/suggest_variants \
-H "Authorization: Bearer YOUR_TOKEN"
Metric versions
GET /api/v1/metrics/:metric_id/metric_versions
GET /api/v1/metrics/:metric_id/metric_versions/:id
POST /api/v1/metrics/:metric_id/metric_versions/:id/publish
DELETE /api/v1/metrics/:metric_id/metric_versions/:id
Metric Groups
Named groups of metrics you can apply to a run as a set.
GET /api/v1/metric_groups
POST /api/v1/metric_groups
Required: name
Optional: description, metric_ids (array)
GET PATCH DELETE /api/v1/metric_groups/:id
Agreements
Per-verdict feedback events on a response/metric pair: agree, disagree (with a corrected score and note), or borderline. Agreements capture the metric version that was current when the verdict was cast, which is what drives the trust signal and the "stale" indicators across the rest of the API.
GET /api/v1/agreements
Optional filters: run_id, response_id, metric_id, metric_version_id, created_by, verdict (agree, disagree, or borderline)
curl "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/agreements?metric_id=1&verdict=disagree" \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/runs/:run_id/responses/:response_id/metrics/:metric_id/agreements
Required: verdict, created_by
Optional: corrected_score (1..5, required when verdict is disagree), note
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/responses/42/metrics/3/agreements \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"verdict": "disagree", "corrected_score": 3, "note": "too generous", "created_by": "alice"}'
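The verdict rules are easy to check before sending a request. The sketch below validates an agreement body locally under the rules stated on this page: verdict is one of agree, disagree, or borderline, and corrected_score (1..5) is required for disagree. The helper name is hypothetical, and the server may enforce further rules not listed here.

```python
VERDICTS = {"agree", "disagree", "borderline"}


def agreement_payload(verdict, created_by, corrected_score=None, note=None):
    """Validate and build an agreement body (local check only)."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    if verdict == "disagree" and corrected_score is None:
        raise ValueError("corrected_score is required when verdict is 'disagree'")
    if corrected_score is not None and corrected_score not in range(1, 6):
        raise ValueError("corrected_score must be an integer from 1 to 5")
    payload = {"verdict": verdict, "created_by": created_by}
    if corrected_score is not None:
        payload["corrected_score"] = corrected_score
    if note is not None:
        payload["note"] = note
    return payload


payload = agreement_payload("disagree", "alice", corrected_score=3, note="too generous")
```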
DELETE /api/v1/agreements/:id
Tags
Domain labels you can attach to metrics, prompts, runs, and datasets. Tags are auto-assigned a color from a 10-color palette. Each index page can be filtered by one or more tags using ?tag[]=name query params (OR semantics).
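Because tag[] repeats once per tag and contains brackets and possibly spaces, it is worth letting a URL library encode the query string. The sketch below builds a filtered index URL; the helper name and the second tag, legal, are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1"


def tagged_index_url(resource, tags):
    """Build an index URL filtered by any of the given tags (OR semantics)."""
    query = urlencode([("tag[]", tag) for tag in tags])
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


url = tagged_index_url("runs", ["real estate", "legal"])
```

The brackets are percent-encoded as %5B%5D and the space as +, which servers decode back to tag[] and "real estate".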
GET /api/v1/tags
curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/tags \
-H "Authorization: Bearer YOUR_TOKEN"
POST /api/v1/tags
Required: name
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/tags \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "real estate"}'
GET PATCH DELETE /api/v1/tags/:id
Tagging resources
curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Accuracy", "tag_names": ["real estate"]}'
MCP tools
tags_list
List all tags
tags_get
Get a tag by ID
tags_create
Create a tag (name required)
tags_update
Update a tag's name
tags_delete
Delete a tag and remove all its taggings
Provider Credentials
LLM provider API keys. The api_key field is write-only and never returned in responses.
GET /api/v1/provider_credentials
POST /api/v1/provider_credentials
Required: provider (openai, anthropic, ollama, openrouter), api_key
Optional: api_endpoint
GET PATCH DELETE /api/v1/provider_credentials/:id
