Run an agent on Linux

Ollama runs natively on the host (it uses your NVIDIA GPU directly — no nvidia-container-toolkit needed), and the agent runs as a Docker container that talks to it. The one Linux-specific step is mapping host.docker.internal.

Before you start

Minimum requirements (check these first; the rest of the guide assumes them):

A current 64-bit distro with systemd — e.g. Ubuntu 22.04+, Debian 12+, Fedora. You'll need sudo access.
A supported GPU — an NVIDIA GeForce GTX 900 series (2014) or newer with driver 525+ (anything your distro currently packages is fine), or a recent AMD Radeon (RX 6000 / 7000 series) via ROCm. Step 1 covers NVIDIA; AMD, Intel and CPU-only are in the notes below.
Docker Engine 20.10 or newer — required for the host-gateway flag used later; any version from the last few years qualifies.
~20 GB free disk space — models are large (a 7B model is ~5 GB; bigger ones are tens of GB).
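
The non-GPU requirements above can be checked from a shell. This is a minimal sketch, assuming a Linux host with POSIX df and awk. The helper names free_gb and has_systemd are illustrative, and checking / is only an assumption; point it at whichever filesystem will hold your models.

```shell
#!/usr/bin/env bash
# Illustrative preflight helpers; these names are not part of any tool in this guide.

# Print the free space, in whole GiB, on the filesystem that holds path $1.
free_gb() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1048576) }'
}

# Succeed when systemd is the running init system (same test as sd_booted(3)).
has_systemd() {
  [ -d /run/systemd/system ]
}

echo "Free space on /: $(free_gb /) GiB (this guide suggests about 20)"
if has_systemd; then
  echo "systemd: yes"
else
  echo "systemd: not detected"
fi
```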

Not on an NVIDIA card? AMD Radeon mostly works, Intel doesn't — see AMD, Intel and CPU-only below before you start.

Prerequisites

1. GPU driver

On an AMD Radeon? You can skip this step — the Ollama install script in step 2 pulls the ROCm components and the amdgpu kernel driver ships with any modern kernel. See AMD, Intel and CPU-only. The rest of this step is for NVIDIA.

You may already have a driver — the question is whether it's new enough. Check your version with this command — it prints just the number:

nvidia-smi --query-gpu=driver_version --format=csv,noheader

You'll see a single version like 550.127.05. If it's 525 or higher, you're done — go to step 2. You do not need the very latest driver.
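
If you would rather script the comparison, the check can be sketched as follows. The helper name driver_ok is hypothetical; the nvidia-smi query is the one shown above, and 525 is the minimum stated in this guide.

```shell
#!/usr/bin/env bash
# driver_ok VERSION: succeed when the major version number is 525 or higher.
# Illustrative helper, not part of the NVIDIA tooling.
driver_ok() {
  local major="${1%%.*}"            # "550.127.05" -> "550"
  [ "$major" -ge 525 ] 2>/dev/null  # non-numeric or empty input counts as failure
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # With several GPUs the query prints one line per GPU; the first is enough here.
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if driver_ok "$version"; then
    echo "Driver $version is new enough"
  else
    echo "Driver $version is older than 525; update it before continuing"
  fi
else
  echo "nvidia-smi not found; no NVIDIA driver appears to be installed"
fi
```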

Got a "command not found" error? No driver is installed. On Ubuntu, install the recommended one and reboot:
```
sudo ubuntu-drivers install
sudo reboot
```
On other distros, install your distro's proprietary NVIDIA driver package (e.g. akmod-nvidia on Fedora), or see nvidia.com/drivers.

(If you ran plain nvidia-smi and got a screenful of numbers — that's the full GPU status table. The only part that matters here is Driver Version: in its top line; the command above prints just that.)

Because Ollama runs on the host (not in a container), you do not need the NVIDIA Container Toolkit, and no CUDA toolkit either — Ollama bundles what it uses.

2. Ollama

Install the latest with the official script (any recent version works), then confirm it's installed:

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

That's all you need here. You don't need to download a model yet, and you don't have to pick one by hand: run the benchmark to get one recommended for your GPU, or see choosing a model for the sizing guide. The agent pulls whatever model you chose the first time it runs. The model lives in host Ollama, shared by every agent and loaded into VRAM once; it's deliberately not baked into the small agent image.

By default Ollama listens only on 127.0.0.1:11434, which a container cannot reach. To let the agent container connect, bind Ollama to all interfaces by setting OLLAMA_HOST=0.0.0.0 for the service:

sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | \
  sudo tee /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl daemon-reload && sudo systemctl restart ollama
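
To confirm the override took effect, you can look at which address is listening on port 11434. A small sketch, assuming ss from iproute2 is installed; bind_scope is an illustrative helper name.

```shell
#!/usr/bin/env bash
# bind_scope ADDR: classify a local listen address as printed in the
# fourth column of `ss -ltnH`. Illustrative helper, not part of Ollama.
bind_scope() {
  case "$1" in
    127.*:*|"[::1]":*)        echo "loopback only" ;;
    0.0.0.0:*|"*":*|"[::]":*) echo "all interfaces" ;;
    *)                        echo "specific address" ;;
  esac
}

# Report every TCP listener on Ollama's default port, if ss is available.
if command -v ss >/dev/null 2>&1; then
  ss -ltnH 2>/dev/null | awk '$4 ~ /:11434$/ { print $4 }' | while read -r addr; do
    echo "$addr -> $(bind_scope "$addr")"
  done
fi
```

After the restart you would expect "all interfaces"; "loopback only" suggests the override was not picked up.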

See choosing a model for what fits your VRAM.

3. Docker Engine

Install from docs.docker.com/engine/install. Verify:

docker --version

You should see something like Docker version 27.x — anything 20.10 or newer is fine.
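
The 20.10 threshold can also be checked in a script. A sketch, assuming GNU sort (for its -V version ordering); version_ge is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is at least version B,
# using the version ordering of GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
  # Server.Version is the Engine version; the call needs access to the Docker daemon.
  engine="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
  if [ -n "$engine" ] && version_ge "$engine" 20.10; then
    echo "Docker Engine $engine supports host-gateway"
  else
    echo "Docker Engine is older than 20.10, or the daemon is unreachable"
  fi
fi
```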

Create your agent

You create the agent in the portal first — that's where the connection token comes from. You never invent or copy a token by hand; the portal generates the whole command for you.

Open app.keikaku.ai → Agents → New agent.
Give it a name, choose Self-hosted and a model (or paste a benchmark code), then click Create.
The next screen shows a ready-to-run docker run command with your token already filled in, and a Copy button. Copy it.
Paste it into your terminal and run it. The agent dials home on its own.

For reference, the command looks like this (the portal fills in the real AGENT_TOKEN). On Linux it includes the --add-host line so the container can reach host Ollama. There's no model in the command: the agent fetches the model you chose from the portal on connect and downloads it then, reporting progress back to the portal.

docker run -d --pull=always --name keikaku-agent --label com.docker.compose.project=keikaku --restart unless-stopped \
  --add-host=host.docker.internal:host-gateway \
  -e API_BASE_URL=https://api.keikaku.ai \
  -e AGENT_TOKEN=<from the app> \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  -p 9170:9170 \
  ghcr.io/keikaku-ai/agent:latest

What these values are — the portal sets all of them; here's what they mean:

AGENT_TOKEN — the per-agent key from the create screen. It identifies this agent to Keikaku; keep it secret (you can rotate it later).
API_BASE_URL — the Keikaku server it connects to (https://api.keikaku.ai for Cloud).
OLLAMA_URL — where the agent finds the model runtime: the Ollama you installed in step 2, listening on port 11434 of your host. From inside the container the host is host.docker.internal (the --add-host line in the command above makes that resolve on Linux), so this stays http://host.docker.internal:11434. To verify Ollama is up, run curl http://localhost:11434 on the host; it should return "Ollama is running".
Model — not in the command. You pick it when you create the agent; the agent receives it on connect and pulls it into Ollama, reporting download progress to the portal. (-e MODEL=… still works as a manual override.)
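
The host-side check mentioned under OLLAMA_URL can be wrapped in a small script. A sketch only; ollama_up is an illustrative helper name, and the 5-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# ollama_up URL: succeed when the root endpoint answers with Ollama's banner.
ollama_up() {
  curl -fsS --max-time 5 "$1" 2>/dev/null | grep -q "Ollama is running"
}

if ollama_up http://localhost:11434; then
  echo "Host Ollama is up"
else
  echo "No answer on localhost:11434; check: systemctl status ollama"
fi
```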

You don't pass any GPU flag to the agent — Ollama owns the GPU; the agent just talks to it over OLLAMA_URL.

Linux difference: unlike Docker Desktop, plain Docker Engine doesn't provide host.docker.internal automatically — the --add-host=host.docker.internal:host-gateway flag (already in the app's Linux snippet) is what makes it resolve to the host.
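
To test the mapping without starting the agent, you can run a throwaway container with the same flag and fetch the Ollama root page through host.docker.internal. A sketch, assuming the public busybox image is acceptable for the probe; container_reaches_ollama is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# container_reaches_ollama: start a short-lived busybox container with the same
# --add-host mapping as the agent and request Ollama's root page from inside it.
container_reaches_ollama() {
  docker run --rm \
    --add-host=host.docker.internal:host-gateway \
    busybox wget -q -T 5 -O - http://host.docker.internal:11434 2>/dev/null \
    | grep -q "Ollama is running"
}

if command -v docker >/dev/null 2>&1; then
  if container_reaches_ollama; then
    echo "Containers can reach host Ollama"
  else
    echo "Not reachable: check OLLAMA_HOST=0.0.0.0 and any host firewall"
  fi
fi
```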

Verify it connected

The agent shows as online in the app under Agents; the local dashboard is at http://localhost:9170. Logs:

docker logs -f keikaku-agent
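
For a quick non-interactive check (for example from a script), the container state can be read with docker inspect. A sketch; agent_running is an illustrative helper name, and keikaku-agent is the container name from the run command above.

```shell
#!/usr/bin/env bash
# agent_running NAME: succeed when the named container exists and is running.
agent_running() {
  [ "$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

if command -v docker >/dev/null 2>&1; then
  if agent_running keikaku-agent; then
    echo "keikaku-agent is running"
    docker logs --tail 20 keikaku-agent   # show the last 20 log lines
  else
    echo "keikaku-agent is not running; inspect it with: docker ps -a"
  fi
fi
```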

Not on NVIDIA? AMD, Intel and CPU-only

This guide is written and tested against NVIDIA — that's the path we support best. Here's how the picture changes on other hardware:

AMD Radeon — works on recent cards. Ollama supports roughly the RX 6000 / 7000 series and several Radeon PRO cards via ROCm. The install script in step 2 detects the card and downloads the ROCm components for you, and the amdgpu kernel driver those cards need ships with any modern kernel, so the only change is skipping step 1 entirely. To confirm the GPU is actually being used, wait until a model is loaded (for example while the agent is handling work), then run ollama ps in a terminal; the PROCESSOR column should say 100% GPU. The exact supported-card list is in Ollama's GPU docs; unsupported Radeon cards silently fall back to CPU.
Intel (Arc or integrated graphics) — not supported. Ollama has no Intel GPU backend today, so models run on the CPU instead.
No dedicated GPU — works, slowly. Everything in this guide still functions on CPU with a small model (7B class). Fine for trying Keikaku out; expect responses to be many times slower than on a GPU.
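
On any of these setups, the ollama ps check described above can be scripted. A minimal sketch; on_gpu is an illustrative helper name, and it only looks for the "100% GPU" text in the PROCESSOR column.

```shell
#!/usr/bin/env bash
# on_gpu: read `ollama ps` output on stdin and succeed when some loaded
# model reports "100% GPU".
on_gpu() {
  grep -q "100% GPU"
}

if command -v ollama >/dev/null 2>&1; then
  if ollama ps | on_gpu; then
    echo "A loaded model is running fully on the GPU"
  else
    echo "No model is fully on the GPU (or none is loaded right now)"
  fi
fi
```

Ollama unloads idle models after a while, so run this while the agent is working or shortly afterwards.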

Run multiple agents

One shared Ollama, one model in VRAM, N workers. Run this from the directory that holds the app's Compose bundle:

docker compose up -d --scale agent=3

Update / stop

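docker rm -f keikaku-agent     # update, step 1: remove the old container (docker restart would keep the old image)
# update, step 2: re-run your original docker run command; --pull=always fetches the new image
docker stop keikaku-agent      # stop
docker rm -f keikaku-agent     # remove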

Headless server, no host Ollama? The app's Compose bundle has an optional gpu-ollama profile that runs Ollama in a container with GPU passthrough — for that you do need the NVIDIA Container Toolkit. Host Ollama is the simpler default.

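What the agent does: it executes work generated by your models, writing files and running build/test commands inside its own container and workspace. Its connection to Keikaku is outbound HTTPS only, so no inbound ports need to be opened to the internet; the published port 9170 serves the local dashboard.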