Run a self-hosted agent
Overview
A Keikaku agent is a small worker you run on your own machine. It connects out to Keikaku Cloud, claims tasks, and runs them against a local model on your GPU. Your code and models never leave your hardware.
Architecture
Three pieces: your server (your own machine, with the GPU), the Docker agents you run on it, and Keikaku Cloud. Ollama runs natively on your server and is shared by every agent — the model loads into your GPU once. The agents make outbound HTTPS calls to Keikaku Cloud to claim tasks and report results; nothing reaches in, and your code and models never leave your server.
[Diagram: Docker containers on your server (scale up to add workers) connect out to Keikaku Cloud over HTTPS only.]
Most setups run a single agent against one large model — that's the sweet spot. Add more agents only if your GPU has headroom for the extra concurrent load; they all share the one model in VRAM.
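The pull model described in this section (the agent claims work and reports results, and the cloud never calls in) can be illustrated with a small simulation. This is only a sketch: `FakeCloud`, `run_locally`, and `agent_loop` are hypothetical names, the "cloud" is an in-memory queue rather than a real HTTPS endpoint, and nothing here reflects the actual Keikaku protocol or agent code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """In-memory stand-in for the cloud side (hypothetical, no network)."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def claim(self):
        # The agent asks for work; the cloud never initiates a connection.
        return self.pending.popleft() if self.pending else None

    def report(self, task_id, output):
        # The agent pushes the result back on its own outbound call.
        self.results[task_id] = output


def run_locally(prompt):
    # Placeholder for the call to the local model; here it only echoes.
    return f"done: {prompt}"


def agent_loop(cloud, max_idle_polls=1):
    """Claim, run, and report until the queue stays empty."""
    idle = 0
    while idle < max_idle_polls:
        task = cloud.claim()
        if task is None:
            idle += 1
            continue
        idle = 0
        task_id, prompt = task
        cloud.report(task_id, run_locally(prompt))


cloud = FakeCloud(pending=deque([(1, "lint"), (2, "test")]))
agent_loop(cloud)
print(cloud.results)  # {1: 'done: lint', 2: 'done: test'}
```

Because every exchange starts from the agent, a setup like this needs no inbound port on the server, which matches the "nothing reaches in" property described above.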
Prerequisites
Three things on the machine that will run the agent:
- Docker — Docker Desktop on Windows, or Docker Engine on Linux.
- Ollama — installed natively on the host (not in a container). It owns the GPU and is shared by every agent.
- An NVIDIA GPU + recent driver — for real throughput. No GPU? Ollama falls back to CPU for small models — fine for trying it out, much slower.
macOS isn't supported yet.
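A quick way to check the prerequisites above is to see whether the relevant command-line tools are on your `PATH`. The sketch below assumes the tools are installed under their usual names (`docker`, `ollama`, and `nvidia-smi`, which ships with the NVIDIA driver); the `preflight` function is a hypothetical helper, not part of Keikaku. Finding a binary does not prove the Docker daemon or Ollama is actually running.

```python
import shutil

REQUIRED = ("docker", "ollama")   # needed to run an agent at all
OPTIONAL = ("nvidia-smi",)        # absent means no NVIDIA driver tools were found


def preflight(which=shutil.which):
    """Return (missing_required, missing_optional) tool names."""
    missing_required = [tool for tool in REQUIRED if which(tool) is None]
    missing_optional = [tool for tool in OPTIONAL if which(tool) is None]
    return missing_required, missing_optional


missing_required, missing_optional = preflight()
if missing_required:
    print("Missing:", ", ".join(missing_required))
if missing_optional:
    print("No NVIDIA driver tools found; expect the slower CPU fallback.")
if not missing_required and not missing_optional:
    print("All prerequisite tools found on PATH.")
```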
Model sizing
Pick the largest model that still fits entirely in your GPU's VRAM — once Ollama has to offload part of a model to the CPU, throughput drops sharply. Two ways to choose:
- Measure it (recommended) — run the benchmark. It sweeps the catalog on your actual GPU, tells you which models fit and how fast each runs, and hands you a setup code that prefills the recommended model when you create your agent.
- Eyeball it — see choosing a model for a VRAM-to-model guide and the CPU fallback.
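The "largest model that still fits" rule can be written down as a few lines of arithmetic. This is a rough sketch, not the benchmark's method: the model names and sizes are made-up placeholders, and the `headroom_gb` reserve (for context and other GPU use) is an assumed value you would tune for your own setup.

```python
def pick_model(catalog, vram_gb, headroom_gb=1.5):
    """Return the largest model whose footprint fits in VRAM, or None.

    catalog maps model name -> approximate memory footprint in GB.
    headroom_gb is an assumed reserve, not a measured figure.
    """
    budget = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in catalog.items() if size <= budget]
    return max(fitting)[1] if fitting else None


# Hypothetical catalog; the names and sizes are illustrative only.
catalog = {"small-example": 5.0, "medium-example": 10.0, "large-example": 20.0}

print(pick_model(catalog, vram_gb=12))  # medium-example
print(pick_model(catalog, vram_gb=4))   # None
```

A `None` result corresponds to the case where nothing fits entirely in VRAM, which is where the CPU fallback or a smaller model comes in.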
Download options
Pick your platform, then size your model. Each guide covers install, run, and verify.
- Windows: Docker Desktop + Ollama on an NVIDIA GPU.
- Linux: Docker Engine + Ollama, with the host-gateway note.
- Choosing a model: what fits your VRAM, and the CPU fallback.
- Benchmark your GPU: measure real models, get a recommendation + setup code.
Setup steps
- Install the prerequisites — Docker + Ollama + GPU drivers (see your OS guide above).
- Pick a model — run the benchmark to measure your GPU and get a recommended model + a setup code (optional, but it takes the guesswork out).
- Create an agent in the app — app.keikaku.ai → Agents → New agent. Paste the setup code, and you get a ready-to-run command with your connection token and model baked in.
- Run it — paste the command; the agent connects and shows up as online.
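As an optional sanity check alongside the steps above, you can confirm on the host that Ollama is serving and see which models it has pulled. The sketch below uses Ollama's HTTP API (`GET /api/tags`) and assumes Ollama is listening on its default address, `http://localhost:11434`; adjust `OLLAMA_URL` if your install differs. The helper names are hypothetical, and this check is not part of the Keikaku agent itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default; change if yours differs


def model_names(payload):
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=3):
    """Ask the local Ollama server which models are available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


try:
    print("Ollama models:", list_local_models())
except (OSError, ValueError) as exc:
    # Connection refused, timeout, or an unexpected response body.
    print("Ollama not reachable:", exc)
```

An empty list means Ollama is running but no model has been pulled yet.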