
Benchmark your GPU

The benchmark is a small tool you run once on the same machine as Ollama. It measures real coding models against your hardware — speed, and whether each one fits fully in your GPU — then recommends one and gives you a setup code that fills in the model when you create your agent.

How it works

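It's a container that talks to your host Ollama over HTTP, the same way an agent does. For each model in the catalog it measures speed (tokens/sec), checks whether the model fits fully in your GPU, and records an indicative quality score.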

It then recommends the best blend of quality and speed that fits your GPU, and emits a kk1- setup code. Your GPU's name and VRAM are detected automatically via nvidia-smi when you run it with GPU access (below) — you don't type them in.
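As a rough illustration of the two measurements described above, here is a minimal sketch in POSIX shell. It is not the benchmark's own code. It assumes the fields Ollama's HTTP API reports (eval_count and eval_duration from /api/generate, size and size_vram from /api/ps); the helper names tokens_per_sec and fits_fully are hypothetical.

```shell
# Hypothetical helpers for illustration; not part of the benchmark tool.

# tokens_per_sec EVAL_COUNT EVAL_DURATION_NS
# Ollama reports eval_duration in nanoseconds, so scale by 1e9.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" \
    'BEGIN { if (ns > 0) printf "%.1f\n", n * 1e9 / ns; else print "n/a" }'
}

# fits_fully SIZE_BYTES SIZE_VRAM_BYTES
# Treat a model as fully on the GPU when its whole loaded size is in VRAM.
fits_fully() {
  if [ "$2" -ge "$1" ]; then echo yes; else echo no; fi
}

# Where the numbers would come from (needs Ollama running; <model> is a placeholder):
#   curl -s http://localhost:11434/api/generate \
#     -d '{"model": "<model>", "prompt": "Write hello world in Go", "stream": false}'
#   -> read eval_count and eval_duration from the JSON response
#   curl -s http://localhost:11434/api/ps
#   -> read size and size_vram for the loaded model
```

Calling tokens_per_sec with the two values from a generate response gives a single tokens/sec figure, and fits_fully answers the "fits" question for one loaded model. How the tool combines these with the quality score into a recommendation is not shown here.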

Figure: the benchmark UI after a run, showing the model sweep results table, the recommended model, and the kk1- setup code to copy.

Prerequisites are the same as running an agent — Docker and Ollama installed, with a model or two available. If you haven't set those up, do the Windows or Linux guide first.
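If you want to confirm the prerequisites before starting, a quick preflight along these lines may help. It is a sketch that assumes a POSIX shell (Linux, or WSL on Windows) and Ollama on its default port 11434; the have_cmd helper is a hypothetical name.

```shell
# have_cmd NAME: succeed if NAME is on PATH (hypothetical helper).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether each required tool is installed.
for tool in docker ollama; do
  if have_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done

# With Ollama running, list the models you already have:
#   ollama list
# or over HTTP, the way a container reaches it:
#   curl -s http://localhost:11434/api/tags
```

On Windows, if Ollama is installed natively and not inside WSL, the ollama check may report "missing" even though the HTTP endpoint is reachable.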

Run it

Run it in the foreground so you can see it start, then open the URL it prints. It serves on http://localhost:9190 and groups under the keikaku project in Docker Desktop alongside your agents.

Windows (Docker Desktop, Command Prompt; in PowerShell replace each trailing ^ with a backtick)

docker run --rm -p 9190:9190 --gpus all ^
  --label com.docker.compose.project=keikaku ^
  -e OLLAMA_URL=http://host.docker.internal:11434 ^
  ghcr.io/keikaku-ai/benchmark:latest

Linux (Docker Engine)

docker run --rm -p 9190:9190 --gpus all \
  --label com.docker.compose.project=keikaku \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  ghcr.io/keikaku-ai/benchmark:latest

Prefer a Compose file? The self-hosted bundle ships deploy/selfhosted/benchmark.compose.yml, so you can run:

docker compose -f deploy/selfhosted/benchmark.compose.yml up

About --gpus all: it's only so the tool can read your GPU's name and VRAM. On Linux it needs the NVIDIA Container Toolkit; on Docker Desktop (WSL2) it works with the NVIDIA driver. No GPU access? Drop the --gpus all flag — the benchmark still runs and still tells you which models fit; it just won't label the exact GPU.
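To see what GPU detection has to work with, you can query nvidia-smi yourself. The sketch below assumes the standard nvidia-smi CSV query output (one "name, N MiB" line per GPU); parse_gpu_line is a hypothetical helper, and the tool's own parsing may differ.

```shell
# parse_gpu_line LINE: split "<GPU name>, <N> MiB" into "<GPU name>|<N>".
# Hypothetical helper for illustration only.
parse_gpu_line() {
  printf '%s\n' "$1" | awk -F', ' '{ sub(/ MiB$/, "", $2); print $1 "|" $2 }'
}

# On a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader |
#     while IFS= read -r line; do parse_gpu_line "$line"; done
#
# To check that containers can see the GPU at all:
#   docker run --rm --gpus all ubuntu nvidia-smi
```

If the last command fails on Linux, the NVIDIA Container Toolkit is the usual missing piece.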

Get your recommendation

  1. Open http://localhost:9190 and click Run.
  2. Watch the table fill in as it measures each model: tokens/sec, whether it fits in your GPU, and an indicative quality score.
  3. When it finishes you get a recommended model and a setup code starting with kk1-. Copy it.

First run can download a lot. By default the benchmark pulls catalog models you don't already have so it can measure them — tens of GB for the larger ones. To measure only models already installed in Ollama, add -e BENCHMARK_PULL=0.

Use the setup code

In app.keikaku.ai, go to Agents → New agent and paste the kk1- code into the Setup code field. It prefills the recommended model, so the docker run command you get back already targets the model that fits your GPU. Then follow the self-hosted agent guide to run it.
