Run a self-hosted agent

Overview

A Keikaku agent is a small worker you run on your own machine. It connects out to Keikaku Cloud, claims tasks, and runs them against a local model on your GPU, so your code and models never leave your hardware.

Architecture

Three pieces: your server (your own machine, with the GPU), the Docker agents you run on it, and Keikaku Cloud. Ollama runs natively on your server and is shared by every agent — the model loads into your GPU once. The agents make outbound HTTPS calls to Keikaku Cloud to claim tasks and report results; nothing reaches in, and your code and models never leave your server.

[Diagram: Your server hosts Ollama on your GPU (one model, loaded once, shared) and the agents, which run as Docker containers and query the local model; scale up the containers to add workers. The agents connect to Keikaku Cloud (api.keikaku.ai: projects, tasks, review) over outbound HTTPS only.]

Most setups run a single agent against one large model — that's the sweet spot. Add more agents only if your GPU has headroom for the extra concurrent load; they all share the one model in VRAM.
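The claim, run, report cycle described above can be sketched as a single loop step. This is only an illustration of the outbound-only pattern, not the agent's actual implementation: the `claim`, `run_model`, and `report` callables and the task shape (`id`, `prompt`) are hypothetical stand-ins for the cloud calls and the local model call.

```python
from typing import Callable, Optional

# Hypothetical task shape for illustration: {"id": "...", "prompt": "..."}
Task = dict


def run_agent_once(
    claim: Callable[[], Optional[Task]],
    run_model: Callable[[str], str],
    report: Callable[[str, str], None],
) -> bool:
    """Claim one task, run it on the local model, and report the result.

    Returns True if a task was processed, False if none was available.
    All three callables are assumptions standing in for real I/O.
    """
    task = claim()                      # outbound call: agent asks the cloud for work
    if task is None:
        return False                    # nothing to do right now
    output = run_model(task["prompt"])  # local call: the model on your own GPU
    report(task["id"], output)          # outbound call: agent sends the result back
    return True


# Usage with in-memory stand-ins (no network involved):
pending = [{"id": "t1", "prompt": "hello"}]
results = {}

processed = run_agent_once(
    claim=lambda: pending.pop(0) if pending else None,
    run_model=lambda prompt: prompt.upper(),
    report=lambda task_id, output: results.update({task_id: output}),
)
```

Every step starts from the agent, which is why nothing needs to reach into your server.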

Prerequisites

Three things on the machine that will run the agent:

Docker — Docker Desktop on Windows, or Docker Engine on Linux.
Ollama — installed natively on the host (not in a container). It owns the GPU and is shared by every agent.
A GPU + driver — for real throughput. Our guides document NVIDIA end to end (GTX 900 series or newer); recent AMD Radeon cards also work via Ollama — each OS guide has a section on exactly what changes. No GPU? Ollama falls back to CPU for small models — fine for trying it out, much slower.

macOS isn't supported yet.
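A quick way to confirm the prerequisites are installed is to look for their command-line tools on your PATH. The sketch below assumes the executables are named `docker`, `ollama`, and `nvidia-smi` (the tool that ships with the NVIDIA driver); the `missing_tools` helper is a hypothetical name, and finding a tool on PATH does not prove it is configured correctly.

```python
import shutil


def missing_tools(required=("docker", "ollama"), which=shutil.which):
    """Return the names in `required` that are not found on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Docker and Ollama are on PATH.")

    # The GPU driver is optional: without it, Ollama falls back to CPU.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found: no NVIDIA driver detected on PATH.")
```

The platform guides remain the authoritative check, since they also cover minimum versions and AMD cards.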

Model sizing

Pick the largest model that still fits entirely in your GPU's VRAM — once Ollama has to offload part of a model to the CPU, throughput drops sharply. Two ways to choose:

Measure it (recommended) — run the benchmark. It sweeps the catalog on your actual GPU, tells you which models fit and how fast each runs, and hands you a setup code that prefills the recommended model when you create your agent.
Eyeball it — see choosing a model for a VRAM-to-model guide and the CPU fallback.
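The "largest model that still fits" rule can be written down as a small selection function. This is a rough sketch for the eyeball approach, not a substitute for the benchmark: the model names and memory footprints below are made up for illustration, and the 1 GB headroom is an assumption to leave room for context and other GPU users.

```python
from typing import Optional


def largest_fitting_model(
    catalog: dict[str, float],
    vram_gb: float,
    headroom_gb: float = 1.0,
) -> Optional[str]:
    """Return the largest model whose footprint fits in VRAM, or None.

    `catalog` maps a model name to its approximate footprint in GB.
    `headroom_gb` is an assumed safety margin, not a measured value.
    """
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in catalog.items() if size <= budget}
    if not fitting:
        return None  # nothing fits entirely in VRAM
    return max(fitting, key=fitting.get)


# Hypothetical catalog: names and sizes are illustrative only.
example_catalog = {"small-model": 5.0, "medium-model": 9.0, "large-model": 20.0}

choice_12gb = largest_fitting_model(example_catalog, vram_gb=12.0)
choice_4gb = largest_fitting_model(example_catalog, vram_gb=4.0)
```

A `None` result means no model fits entirely on the GPU, which is the case where the CPU fallback applies.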

Setup guides

The detailed install steps live in the platform guides. Each one is written for someone starting from a blank machine and covers minimum versions, how to check and update your GPU driver, how to install and verify each piece, and what to do if you're not on an NVIDIA card. They take you all the way to an agent showing online in the app.

Create an agent →