API reference

REST JSON API and MCP server. Authenticate with a bearer key in the Authorization header: create one under Settings → API once you've made an organization. Swap your-org-slug-goes-here and YOUR_TOKEN below for your own.
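
For scripted access, the same header can be set from any HTTP client. The following is a minimal Python sketch using only the standard library. `api_request` is a hypothetical helper name, not part of CompletionKit, and the URL layout is copied from the curl examples in this reference. It builds the request without sending it.

```python
import json
import urllib.request

BASE = "https://completionkit.com"

def api_request(org_slug, token, path, method="GET", body=None):
    """Build (but do not send) an authenticated request for the REST API."""
    url = f"{BASE}/orgs/{org_slug}/api/v1/{path.lstrip('/')}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies need an explicit Content-Type, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

# Placeholder values; replace them before sending with urllib.request.urlopen(req).
req = api_request("your-org-slug-goes-here", "YOUR_TOKEN", "prompts")
```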

MCP Server

Connect Claude Code, Cursor, or any MCP client to manage prompts, runs, datasets, and metrics conversationally. 49 tools over streamable HTTP.

Claude Code
claude mcp add --transport http completion-kit \
  https://completionkit.com/orgs/your-org-slug-goes-here/mcp \
  --header "Authorization: Bearer YOUR_TOKEN"
Cursor / Generic
{
  "mcpServers": {
    "completion-kit": {
      "url": "https://completionkit.com/orgs/your-org-slug-goes-here/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}

Available tools

prompts 7

prompts_list List all prompts
prompts_get Get a prompt by ID
prompts_create Create a prompt
prompts_update Update a prompt. If the prompt already has runs, this creates a new DRAFT version (current=false) rather than editing in place or publishing — promote it with prompts_publish — so an agent's edits don't go live without a gate. If it has no runs, it is updated in place.
prompts_delete Delete a prompt
prompts_publish Publish a prompt version, making it the current version
prompts_suggest_improvement Suggest an improved version of a prompt, grounded in a run's test results and judge feedback. Analyzes the run's responses, scores, and reviews, then returns reasoning plus a rewritten template (preserving {{variables}}) and persists it as a Suggestion. Requires a run that has a prompt (not a judge-only run).

runs 6

runs_list List all runs
runs_get Get a run by ID
runs_create Create a run. Omit prompt_id and provide output_column for a judge-only run that grades a pre-existing dataset column instead of generating new outputs.
runs_update Update a run
runs_delete Delete a run
runs_generate Generate responses for a run using its prompt and dataset

responses 2

responses_list List responses for a run
responses_get Get a specific response

datasets 6

datasets_list List all datasets
datasets_get Get a dataset by ID
datasets_create Create a dataset with CSV data
datasets_update Update a dataset
datasets_delete Delete a dataset
datasets_create_from_url Create a dataset by downloading CSV from a URL instead of inlining it. Use this for large datasets: pass a public http(s) URL and the server fetches the CSV directly, so the data never has to pass through the tool-call arguments. The URL is SSRF-checked and the download is capped at 10MB.

metrics 6

metrics_list List all metrics
metrics_get Get a metric by ID
metrics_create Create a metric with evaluation criteria
metrics_update Update a metric
metrics_delete Delete a metric
metrics_suggest_variants Ask the model to rewrite the metric's judge instruction in N variants targeted at the recent disagreements. Each variant is saved as a draft MetricVersion with source="suggestion". Returns the persisted drafts. Stripe-metering hooks fire via ActiveSupport::Notifications under completion_kit.judge_suggestion.generated.

metric 8

metric_groups_list List all metric groups
metric_groups_get Get a metric group by ID
metric_groups_create Create a metric group
metric_groups_update Update a metric group
metric_groups_delete Delete a metric group
metric_versions_list List every MetricVersion (drafts + published) for a metric, newest first. Each row carries version_number, state, source, current flag, and timestamps.
metric_versions_publish Publish a MetricVersion as the live version of its metric. Works for both 'draft → published' and 'revert to an older published version → current'. Transactionally flips current, demotes peers, and writes the version's instruction + rubric_bands back onto the metric so the judge grades against it.
metric_versions_dismiss Destroy a draft MetricVersion (use for either source: 'edit' or source: 'suggestion'). Published versions are refused — to demote a published version, publish a different one as current instead.

provider 5

provider_credentials_list List all provider credentials (API keys are not exposed)
provider_credentials_get Get a provider credential by ID (API key is not exposed)
provider_credentials_create Create a provider credential
provider_credentials_update Update a provider credential
provider_credentials_delete Delete a provider credential

tags 5

tags_list List all tags
tags_get Get a tag by ID
tags_create Create a tag. Color is auto-assigned.
tags_update Rename a tag.
tags_delete Delete a tag. Removes the tag from every linked metric, prompt, run, and dataset.

agreements 2

agreements_list List agreements. Filter by run_id, response_id, metric_id, or created_by.
agreements_create Upsert an agreement for (run, response, metric, created_by). Verdict is one of agree, disagree, borderline. corrected_score (1..5) is required when verdict is 'disagree'.

judges 2

judges_replay Run the current judge against a dataset (judge-only run). Wraps runs_create with prompt_id omitted and output_column supplied. Re-judges existing dataset outputs so you can compare against human verdicts.
judges_compare Compare two metric versions' agreement stats side by side. Pass either two metric_version_ids or one metric_id with metric_version_a_id / metric_version_b_id.

Prompts

Create, version, and manage LLM prompt templates.

GET /api/v1/prompts

List all prompts, ordered by most recent.

curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/prompts \
  -H "Authorization: Bearer YOUR_TOKEN"

POST /api/v1/prompts

Create a new prompt.

Required: name, template, llm_model
Optional: description

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/prompts \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "summarizer", "template": "Summarize: {{text}}", "llm_model": "gpt-4.1"}'

GET /api/v1/prompts/:id

Get a single prompt by ID.

PATCH /api/v1/prompts/:id

Update a prompt. Accepts the same params as create.

DELETE /api/v1/prompts/:id

Delete a prompt. Returns 204 No Content.

POST /api/v1/prompts/:id/publish

Publish a prompt version, making it the current version in its family.

Runs

Create runs, generate LLM responses, and judge them with metrics.

GET /api/v1/runs

List runs with response counts and average scores. Supports pagination (limit, offset) and the following filters.

Optional filters: status (pending, running, completed, failed), prompt_id, dataset_id, tag[]
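
Filters and pagination combine into one query string. A small sketch of how that string might be assembled in Python; `runs_query` is a hypothetical helper, and repeating `tag[]` once per tag follows the `?tag[]=name` convention described in the Tags section.

```python
from urllib.parse import urlencode

def runs_query(status=None, prompt_id=None, dataset_id=None,
               tags=(), limit=None, offset=None):
    """Return a query string for GET /api/v1/runs; unset filters are omitted."""
    pairs = []
    for key, value in (("status", status), ("prompt_id", prompt_id),
                       ("dataset_id", dataset_id),
                       ("limit", limit), ("offset", offset)):
        if value is not None:
            pairs.append((key, value))
    # Each tag is sent as its own tag[] parameter.
    pairs.extend(("tag[]", tag) for tag in tags)
    return urlencode(pairs)

print(runs_query(status="failed", tags=["real estate"], limit=50))
# status=failed&limit=50&tag%5B%5D=real+estate
```

Append the result to the list URL after a `?`. The brackets and the space are percent-encoded by `urlencode`, which avoids shell and URL quoting problems.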

POST /api/v1/runs

Create a new run.

Optional: name, prompt_id, dataset_id, metric_ids, judge_model, output_column (judge-only runs: omit prompt_id to grade an existing dataset column instead; defaults to actual_output)

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt_id": 1, "dataset_id": 1, "metric_ids": [1, 2]}'
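
The difference between a generating run and a judge-only run comes down to which keys the body carries. A sketch of a body builder, assuming a hypothetical helper named `run_payload`; rejecting `prompt_id` together with `output_column` is a client-side convention of this sketch, not documented server behavior.

```python
def run_payload(dataset_id, metric_ids, prompt_id=None, output_column=None, name=None):
    """Build a POST /api/v1/runs body for a generating or judge-only run."""
    if prompt_id is not None and output_column is not None:
        # Sketch-level guard: output_column is described for judge-only runs.
        raise ValueError("output_column is for judge-only runs; omit prompt_id")
    body = {"dataset_id": dataset_id, "metric_ids": list(metric_ids)}
    if name is not None:
        body["name"] = name
    if prompt_id is not None:
        body["prompt_id"] = prompt_id          # generate new outputs
    elif output_column is not None:
        body["output_column"] = output_column  # grade an existing column
    return body

generating = run_payload(1, [1, 2], prompt_id=1)
judge_only = run_payload(1, [1, 2], output_column="actual_output")
```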

GET /api/v1/runs/:id

Get a run with status, progress, response count, and average score.

curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1 \
  -H "Authorization: Bearer YOUR_TOKEN"

POST /api/v1/runs/:id/generate

Start generating responses. Returns 202 Accepted. Poll the run to check progress.

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/generate \
  -H "Authorization: Bearer YOUR_TOKEN"
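
Because generation is asynchronous, a client typically polls GET /api/v1/runs/:id until the run settles. A hedged sketch of such a loop: `fetch_run` is a hypothetical callable that performs the GET and returns the decoded JSON, and the `status` key is assumed from the status values listed for the runs filter (pending, running, completed, failed).

```python
import time

TERMINAL = {"completed", "failed"}

def wait_for_run(fetch_run, run_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the run reaches a terminal status, then return the run dict."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without waiting; in real use the default `time.sleep` applies.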

POST /api/v1/runs/:id/retry_failures

Re-queue any responses that failed during generation. Returns 202 Accepted.

POST /api/v1/runs/:id/rerun

Clone the run and start generating responses on the copy against the current prompt and metric versions. Returns the new run with 201 Created. Useful for capturing a fresh baseline after metric edits.

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/rerun \
  -H "Authorization: Bearer YOUR_TOKEN"

POST /api/v1/runs/:id/regrade

Re-judge the existing successful responses against the current metric versions without regenerating model output. Returns 202 Accepted, or 422 if no responses are eligible.

GET /api/v1/runs/:id/compare?with=:other_id

Side-by-side comparison against another run. Returns {rows: [...], metric_ids: [...]} with one row per input case, per-metric scores on both sides, and the delta. Cases that exist on only one side are still returned with the missing side nulled out.

curl "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/compare?with=2" \
  -H "Authorization: Bearer YOUR_TOKEN"

PATCH /api/v1/runs/:id

Update a run. Accepts the same params as create.

DELETE /api/v1/runs/:id

Delete a run and all its responses. Returns 204 No Content.

Responses

Read-only access to generated responses and their review scores. Nested under runs.

GET /api/v1/runs/:run_id/responses

List responses for a run, including nested review scores.

Optional filters: status (pending, succeeded, failed), plus limit and offset

curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/responses \
  -H "Authorization: Bearer YOUR_TOKEN"
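
For runs with many responses, limit and offset can be walked page by page. A sketch under two assumptions that this reference does not confirm: the endpoint returns a JSON array, and a page shorter than `limit` means the end of the list. `fetch_page` is a hypothetical callable that issues the GET with the given `limit` and `offset`.

```python
def iter_responses(fetch_page, limit=100):
    """Yield responses page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```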

GET /api/v1/runs/:run_id/responses/:id

Get a single response with its review scores and feedback.

Datasets

Data used as input for runs.

GET /api/v1/datasets

List all datasets.

POST /api/v1/datasets

Create a dataset from inline CSV or an uploaded CSV file.

Required: name, and either csv_data (inline CSV) or a multipart file (CSV upload, preferred for large datasets)

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/datasets \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "tickets", "csv_data": "text,expected_output\nHello,Hi"}'

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/datasets \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "name=tickets" \
  -F "file=@tickets.csv"
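
Inline CSV has to survive JSON escaping: rows are separated by a newline, which JSON writes as `\n`, and fields that contain commas need CSV quoting. Letting a library do both avoids hand-escaping. A sketch with a hypothetical helper named `dataset_body`:

```python
import csv
import io
import json

def dataset_body(name, header, rows):
    """Serialize rows to CSV text and wrap it in a POST /api/v1/datasets body."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return json.dumps({"name": name, "csv_data": buf.getvalue()})

body = dataset_body("tickets", ["text", "expected_output"], [["Hello, world", "Hi"]])
```

The resulting string can be sent as the request body with `Content-Type: application/json`. For large files, the multipart upload shown above remains the simpler route.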

GET / PATCH / DELETE /api/v1/datasets/:id

Get, update, or delete a dataset.

Metrics

Scoring dimensions used by the judge model.

GET /api/v1/metrics

List all metrics.

POST /api/v1/metrics

Create a metric.

Required: name
Optional: instruction, rubric_bands (array of {stars, description})

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "relevance", "instruction": "Is the response relevant?"}'
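
The rubric_bands parameter is an array of {stars, description} objects. A sketch of building such a body from a simple mapping; `metric_body` is a hypothetical helper, and the band descriptions in the example are illustrative rather than taken from the product.

```python
def metric_body(name, instruction=None, bands=None):
    """Build a POST /api/v1/metrics body; bands maps star count to description."""
    body = {"name": name}
    if instruction is not None:
        body["instruction"] = instruction
    if bands:
        # Emit one {stars, description} object per band, lowest stars first.
        body["rubric_bands"] = [
            {"stars": stars, "description": text}
            for stars, text in sorted(bands.items())
        ]
    return body

example = metric_body(
    "relevance",
    instruction="Is the response relevant?",
    bands={5: "Fully on topic", 1: "Off topic"},
)
```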

GET / PATCH / DELETE /api/v1/metrics/:id

Get, update, or delete a metric.

Agreement loop

Drive metric improvement from disagree-flagged agreements: ask the model to rewrite the instruction and rubric into a new draft version.

POST /api/v1/metrics/:id/suggest_variants

Generate draft metric versions from the current disagreements. Returns 201 Created with the new draft versions, or 422 if no disagreements exist or the model produced nothing usable.

Optional: count, model

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics/1/suggest_variants \
  -H "Authorization: Bearer YOUR_TOKEN"

Metric versions

Every metric carries a history of versions (the current published one, prior published ones, and any draft suggestions). Reviews and agreements record the version they ran against, so the API can surface stale state and let you revert.

GET /api/v1/metrics/:metric_id/metric_versions

List every version for the metric, newest version_number first.

GET /api/v1/metrics/:metric_id/metric_versions/:id

Get a single version with its instruction, rubric bands, state, and source.

POST /api/v1/metrics/:metric_id/metric_versions/:id/publish

Publish the version as current. Works for a draft (promote) or a superseded published version (revert). Copies the version's instruction and rubric back onto the metric.

DELETE /api/v1/metrics/:metric_id/metric_versions/:id

Dismiss a draft version. Returns 204 No Content, or 409 Conflict if the version is published (published versions are immutable history).

Metric Groups

Named groups of metrics you can apply to a run as a set.

GET /api/v1/metric_groups

List all metric groups with their metric IDs.

POST /api/v1/metric_groups

Create a metric group.

Required: name
Optional: description, metric_ids (array)

GET / PATCH / DELETE /api/v1/metric_groups/:id

Get, update, or delete a metric group. PATCH with metric_ids replaces all metric associations.

Agreements

Per-verdict feedback events on a response/metric pair: agree, disagree (with a corrected score and note), or borderline. Agreements capture the metric version that was current when the verdict was cast, which is what drives the trust signal and the "stale" indicators across the rest of the API.

GET /api/v1/agreements

List agreements across all runs. Supports filtering by any combination of the query params below.

Optional filters: run_id, response_id, metric_id, metric_version_id, created_by, verdict (agree, disagree, or borderline)

curl "https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/agreements?metric_id=1&verdict=disagree" \
  -H "Authorization: Bearer YOUR_TOKEN"

POST /api/v1/runs/:run_id/responses/:response_id/metrics/:metric_id/agreements

Cast an agreement on a specific response/metric pair. The metric version on the record is set automatically from the run's review.

Required: verdict, created_by
Optional: corrected_score, note

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/runs/1/responses/42/metrics/3/agreements \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"verdict": "disagree", "corrected_score": 3, "note": "too generous", "created_by": "alice"}'
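
Validating the body before sending can catch mistakes early. The agreements_create tool description states that corrected_score (1..5) is required when the verdict is disagree; this sketch assumes the REST endpoint applies the same rule. `agreement_body` is a hypothetical helper.

```python
VERDICTS = {"agree", "disagree", "borderline"}

def agreement_body(verdict, created_by, corrected_score=None, note=None):
    """Build and validate the body for casting an agreement."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    if verdict == "disagree" and corrected_score is None:
        raise ValueError("corrected_score is required when verdict is 'disagree'")
    if corrected_score is not None and not 1 <= corrected_score <= 5:
        raise ValueError("corrected_score must be between 1 and 5")
    body = {"verdict": verdict, "created_by": created_by}
    if corrected_score is not None:
        body["corrected_score"] = corrected_score
    if note is not None:
        body["note"] = note
    return body
```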

DELETE /api/v1/agreements/:id

Delete an agreement. Returns 204 No Content.

Tags

Domain labels you can attach to metrics, prompts, runs, and datasets. Tags are auto-assigned a color from a 10-color palette. Each index page can be filtered by one or more tags using ?tag[]=name query params (OR semantics).

GET /api/v1/tags

List all tags with name and color.

curl https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/tags \
  -H "Authorization: Bearer YOUR_TOKEN"

POST /api/v1/tags

Create a tag.

Required: name

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/tags \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "real estate"}'

GET / PATCH / DELETE /api/v1/tags/:id

Get, update, or delete a tag. PATCH accepts name. DELETE returns 204 No Content and removes all taggings for this tag.

Tagging resources

Metrics, prompts, runs, and datasets accept a tag_names array on their create and update endpoints. Passing a name that does not yet exist silently creates the tag. On PATCH, the list replaces all existing tags for that record (omit the field to leave tags unchanged).

curl -X POST https://completionkit.com/orgs/your-org-slug-goes-here/api/v1/metrics \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Accuracy", "tag_names": ["real estate"]}'
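
Since tag_names on PATCH replaces the whole list, adding or removing a single tag means sending the full desired set. A sketch of computing that list from the record's current tags; `tag_names_for_patch` is a hypothetical helper, and names are compared exactly because this reference does not say whether tag names are case-sensitive.

```python
def tag_names_for_patch(existing, add=(), remove=()):
    """Return the complete tag_names list to send, in order, without duplicates."""
    removed = set(remove)
    merged = []
    for name in [*existing, *add]:
        if name not in removed and name not in merged:
            merged.append(name)
    return merged

# Keeps "real estate" and appends "legal" for the PATCH body.
names = tag_names_for_patch(["real estate"], add=["legal"])
```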

MCP tools

tags_list List all tags
tags_get Get a tag by ID
tags_create Create a tag (name required)
tags_update Update a tag's name
tags_delete Delete a tag and remove all its taggings

The existing metrics_create, metrics_update, prompts_create, prompts_update, runs_create, runs_update, datasets_create, and datasets_update tools all accept a tag_names parameter with the same auto-create and replace semantics as the REST API.

Provider Credentials

LLM provider API keys. The api_key field is write-only and never returned in responses.

GET /api/v1/provider_credentials

List all provider credentials (api_key excluded).

POST /api/v1/provider_credentials

Create a provider credential.

Required: provider (openai, anthropic, ollama, openrouter), api_key
Optional: api_endpoint

GET / PATCH / DELETE /api/v1/provider_credentials/:id

Get, update, or delete a provider credential.