Benchmark your GPU
The benchmark is a small tool you run once on the same machine as Ollama. It measures real coding models against your hardware — speed, and whether each one fits fully in your GPU — then recommends one and gives you a setup code that fills in the model when you create your agent.
How it works
It's a container that talks to your host Ollama over HTTP — the same way an agent does. For each model in the catalog it:
- Measures throughput — runs a fixed tiny coding prompt and records real tokens/sec and latency.
- Checks the fit — asks Ollama whether the model loaded entirely into VRAM. If any of it spilled to the CPU, that's flagged as a poor fit (it'll be slow).
- Smoke-tests the output — a light check that the model actually produced working code, for an indicative quality score.
It then recommends the best blend of quality and speed that fits your GPU, and emits a kk1- setup code. Your GPU's name and VRAM are detected automatically via nvidia-smi when you run it with GPU access (below), so you don't type them in.
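As a rough illustration of the throughput and fit measurements, the sketch below derives both from fields that Ollama's HTTP API reports: eval_count and eval_duration (in nanoseconds) from a non-streaming /api/generate response, and size and size_vram from /api/ps. It is an assumption about one way to compute these numbers, not the benchmark's actual code, and the function names are hypothetical.

```shell
# Hypothetical helpers; not the benchmark's actual implementation.

# tokens_per_sec EVAL_COUNT EVAL_DURATION_NS
# Ollama reports eval_duration in nanoseconds, so scale by 1e9.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) exit 1          # avoid dividing by zero
    printf "%.1f\n", n / ns * 1e9
  }'
}

# fit_verdict SIZE_BYTES SIZE_VRAM_BYTES
# The model is fully on the GPU when all of its loaded size is in VRAM.
fit_verdict() {
  if [ "$2" -ge "$1" ]; then
    echo "fits"
  else
    echo "spills to CPU"
  fi
}
```

For example, tokens_per_sec 100 2000000000 prints 50.0 (100 tokens generated in 2 seconds), and fit_verdict 1000 600 prints "spills to CPU".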
Prerequisites are the same as running an agent — Docker and Ollama installed, with a model or two available. If you haven't set those up, do the Windows or Linux guide first.
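Before starting, it can help to confirm that Ollama is answering over HTTP. The sketch below is a minimal check against Ollama's /api/version endpoint; the function name is hypothetical, and it tests the address from the host, whereas the container reaches Ollama through the OLLAMA_URL value shown in the run commands.

```shell
# ollama_reachable BASE_URL
# Exits 0 if Ollama answers on its version endpoint, non-zero otherwise.
ollama_reachable() {
  curl -sf --max-time 3 "$1/api/version" >/dev/null
}

# Example: check the default local address.
#   ollama_reachable http://localhost:11434 && echo "Ollama is up"
```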
Run it
Run it in the foreground so you can see it start, then open the URL it prints. It serves on http://localhost:9190 and groups under the keikaku project in Docker Desktop alongside your agents.
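Windows (Docker Desktop)
The ^ line continuations below are for Command Prompt; in PowerShell, use a backtick (`) at the end of each line instead.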
docker run --rm -p 9190:9190 --gpus all ^
--label com.docker.compose.project=keikaku ^
-e OLLAMA_URL=http://host.docker.internal:11434 ^
ghcr.io/keikaku-ai/benchmark:latest
Linux (Docker Engine)
docker run --rm -p 9190:9190 --gpus all \
--label com.docker.compose.project=keikaku \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_URL=http://host.docker.internal:11434 \
ghcr.io/keikaku-ai/benchmark:latest
Prefer a file? The self-hosted bundle ships deploy/selfhosted/benchmark.compose.yml, so you can run:
docker compose -f deploy/selfhosted/benchmark.compose.yml up
About --gpus all: it's only so the tool can read your GPU's name and VRAM. On Linux it needs the NVIDIA Container Toolkit; on Docker Desktop (WSL2) it works with the NVIDIA driver. No GPU access? Drop the --gpus all flag. The benchmark still runs and still tells you which models fit; it just won't label the exact GPU.
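To see roughly what the tool can read, you can query nvidia-smi for the same two fields. The sketch below uses nvidia-smi's --query-gpu and --format options and formats each line; the describe_gpu helper is hypothetical, and this is not necessarily how the benchmark itself parses the output.

```shell
# describe_gpu: read "name, memory.total" CSV lines on stdin
# (e.g. "NVIDIA GeForce RTX 3090, 24576 MiB") and print one summary per GPU.
describe_gpu() {
  awk -F', ' '{ sub(/ MiB$/, "", $2); printf "%s (%.0f GB VRAM)\n", $1, $2 / 1024 }'
}

# Query only when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | describe_gpu
else
  echo "nvidia-smi not found; GPU name and VRAM cannot be read here"
fi
```

To check that containers can see the GPU as well, a common test is docker run --rm --gpus all ubuntu nvidia-smi, which should print the usual nvidia-smi table if GPU access is set up.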
Get your recommendation
- Open http://localhost:9190 and click Run.
- Watch the table fill in as it measures each model: tokens/sec, whether it fits, and an indicative quality score.
- When it finishes you get a recommended model and a setup code starting with kk1-. Copy it.
First run can download a lot. By default the benchmark pulls catalog models you don't already have so it can measure them, which can mean tens of GB for the larger ones. To measure only models already installed in Ollama, add -e BENCHMARK_PULL=0.
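As a sketch, here is the Linux run command from "Run it" with that flag added, wrapped in a hypothetical function so it can be pasted into a shell and called when ready. On Windows, add the same -e BENCHMARK_PULL=0 line to the Windows command instead.

```shell
# Hypothetical wrapper: the Linux run command with model pulling disabled.
run_benchmark_installed_only() {
  docker run --rm -p 9190:9190 --gpus all \
    --label com.docker.compose.project=keikaku \
    --add-host=host.docker.internal:host-gateway \
    -e OLLAMA_URL=http://host.docker.internal:11434 \
    -e BENCHMARK_PULL=0 \
    ghcr.io/keikaku-ai/benchmark:latest
}

# See which models Ollama already has, then start the benchmark:
#   ollama list
#   run_benchmark_installed_only
```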
Use the setup code
In app.keikaku.ai → Agents → New agent, paste the kk1- code into the Setup code field. It prefills the recommended model, so the docker run command you get back already targets the model that fits your GPU. Then follow the self-hosted agent guide to run it.