OpenClaw with Local LLMs: Qwen + Ollama, Zero API Bills
The Setup That's Going Viral This Week
A setup tweet is making the rounds: OpenClaw + Qwen 2.5 via Ollama — local AI agents at Claude-level quality, no API bill, no cloud, no data leaving your machine.
166 retweets in a few hours. The reason? The question so many people are sitting on: *"Do I really have to pay Anthropic or OpenAI for every single agent call?"*
The answer: No.
This post shows exactly how it works — and what you need to watch out for in a multi-agent setup.
---
Why Local Models at All?
Three reasons that apply to most setups:
1. Cost. With a 6-agent team running heartbeats every 30 minutes, daily cron jobs, and active usage, API costs add up fast — easily €200–500/month. A local model costs: electricity.
2. Privacy. If your agents have access to emails, business data, and internal documents — you might not want that data flowing through a cloud provider. For sensitive setups, this isn't a nice-to-have, it's a requirement.
3. Latency. For simple tasks (read a file, check task status, write a short reply), a local 7B model is faster than a cloud API call with network latency.
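The arithmetic behind point 1 can be sketched quickly. The per-call cost and call counts below are illustrative assumptions, not measured figures:

```python
# Rough cost model for a 6-agent team on a cloud API (illustrative numbers).
AGENTS = 6
HEARTBEATS_PER_DAY = 48      # one heartbeat every 30 minutes
CRON_JOBS_PER_DAY = 1
COST_PER_CALL_EUR = 0.03     # assumed average cost per agent call

calls_per_day = AGENTS * (HEARTBEATS_PER_DAY + CRON_JOBS_PER_DAY)
monthly_cost = calls_per_day * COST_PER_CALL_EUR * 30

print(f"{calls_per_day} calls/day -> ~{monthly_cost:.0f} EUR/month")
```

Even with conservative numbers and no active usage at all, the heartbeats alone land in the €200–500/month range once per-call costs creep past a few cents.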
---
What You Need

- A machine with enough free RAM for the model (the 7B weights are a ~4.7 GB download; the 32B Coder variant needs at least 64 GB)
- Ollama (installed in Step 1 below)
- A working OpenClaw installation
- Optional: Docker, if your agents run in containers
---
Step 1: Install Ollama and Pull a Model
```bash
# Install Ollama (macOS)
brew install ollama
# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull Qwen 2.5 7B (~4.7 GB download)
ollama pull qwen2.5:7b
# For more demanding tasks: 14B
ollama pull qwen2.5:14b
# Test: talk to the model directly
ollama run qwen2.5:7b "Hello, what can you do?"
```
If you get a response: Ollama is running. The model is ready.
Ollama exposes a local API at `http://localhost:11434`. That's the endpoint OpenClaw will use.
---
Step 2: Configure OpenClaw to Use the Local Model
OpenClaw supports multiple model providers — including the OpenAI-compatible API that Ollama exposes.
Check your current configuration:
```bash
openclaw config show
```
To switch the model to Ollama/Qwen:
```bash
# Set provider to Ollama-compatible API
openclaw config set model.provider openai-compatible
openclaw config set model.baseUrl http://localhost:11434/v1
openclaw config set model.name qwen2.5:7b
openclaw config set model.apiKey ollama
```
Important: Ollama doesn't need a real API key, but the field can't be empty. Any placeholder like `ollama` works.
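Under the hood, this configuration means OpenClaw sends standard OpenAI-style chat-completion requests to Ollama. A minimal sketch of what such a request looks like (the payload shape is the OpenAI-compatible standard; the helper function itself is just for illustration):

```python
import json

def chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-compatible chat completion request for Ollama."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload).encode()

url, body = chat_request("http://localhost:11434/v1", "qwen2.5:7b", "Hello")
print(url)  # http://localhost:11434/v1/chat/completions
```

Any OpenAI-compatible client can send this request; the placeholder API key just ends up in an `Authorization` header that Ollama ignores.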
Then restart the gateway:
```bash
openclaw gateway restart
```
---
Step 3: Verify the Agent Is Using the Local Model
```bash
# Check if agent is responding
openclaw sessions list
# Direct test session
openclaw sessions test
```
Alternatively: send a message to your configured channel (e.g. Telegram) and see if the agent responds. If yes — everything is running locally.
To confirm, you can check Ollama's logs:
```bash
# Ollama logs (Linux)
journalctl -u ollama -f
# macOS: Ollama runs in background, logs at
tail -f ~/.ollama/logs/server.log
```
If you see log entries when the agent responds: confirmed, everything's local.
---
Which Model for Which Task?
Not every model is equally suited for every job. Here's what we've learned from our 6-agent setup:
Qwen 2.5 7B — good for: short emails and replies, file reads, status checks, and other simple, clearly scoped routine tasks.
Weak at: long multi-step reasoning chains; writing complex code; ambiguous instructions.
Qwen 2.5 14B — good for: longer texts such as blog posts and marketing copy, research summaries, and moderately complex reasoning.
Weak at: very long context windows (>32k tokens) and subtle reasoning tasks that genuinely need GPT-4 or Claude.
Qwen 2.5 Coder 32B — for power users: complex code generation, refactoring, and in-depth code review.
Requires at least 64 GB of RAM, though. Overkill for most setups.
---
Multi-Agent Setup: Different Models Per Agent
This is the killer feature of local models in a multi-agent system: each agent can use a different model.
In our setup:
| Agent | Task | Model |
|-------|------|-------|
| Sam (team lead) | Delegation, coordination | Claude Sonnet (cloud) |
| Peter (coding) | Code review, debugging | Qwen 2.5 Coder 7B (local) |
| Maya (marketing) | Blog posts, copy | Qwen 2.5 14B (local) |
| Alex (everyday tasks) | Emails, calendar | Qwen 2.5 7B (local) |
| Iris (research) | Web search, summaries | Qwen 2.5 14B (local) |
| Atlas (CEO assistant) | Direct assistance | Claude Sonnet (cloud) |
Result: cloud costs reduced to 2 agents that genuinely need complex reasoning. Everything else runs locally.
How to configure different models per agent:
Each agent has its own workspace. In that workspace's OpenClaw configuration, you can override the model:
```bash
# Inside a specific agent's workspace
openclaw config set model.name qwen2.5:7b
openclaw config set model.baseUrl http://localhost:11434/v1
```
Alternatively: set the model via environment variables per container (if you're using Docker).
---
Practical Limitations and How We Handle Them
Context Window
Ollama models have a smaller default context window than cloud APIs. With long conversations or large files, this can become a problem.
Solution: explicitly increase the context window. Newer Ollama versions let you set a server-wide default via an environment variable:
```bash
# Start the server with a larger default context window
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```
Or bake it into a custom model via a Modelfile:
```
FROM qwen2.5:7b
PARAMETER num_ctx 32768
```
Then build it with `ollama create qwen2.5-32k -f Modelfile` (the name is up to you) and point OpenClaw's `model.name` at the new model.
Tool Calling
Not all Ollama models support reliable tool calling. Qwen 2.5 is better here than most, but worse than Claude or GPT-4.
Practical rule: If a cron job needs to call multiple tools in parallel (e.g. email + calendar + ClickUp simultaneously), use a stronger model. For sequential single-tool calls, Qwen works fine.
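That rule of thumb can be written down as a tiny routing helper. This is a sketch of the decision logic only; the thresholds and model names are our assumptions, not part of any OpenClaw API:

```python
def pick_model(parallel_tools: int, needs_deep_reasoning: bool = False) -> str:
    """Route a job to a local or cloud model based on tool-call complexity."""
    if needs_deep_reasoning or parallel_tools > 1:
        return "claude-sonnet"   # cloud: parallel tool calls, hard reasoning
    return "qwen2.5:7b"          # local: sequential single-tool jobs

print(pick_model(parallel_tools=3))  # claude-sonnet
print(pick_model(parallel_tools=1))  # qwen2.5:7b
```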
Cold Start Latency
The first request after system startup loads the model into RAM — can take 10–30 seconds. After that: fast.
Solution: pre-warm Ollama at startup:
```bash
# Pre-load the model at startup (an empty prompt loads the weights, then exits)
ollama run qwen2.5:7b "" &
```
By default the model stays in RAM for five minutes after the last request; set `OLLAMA_KEEP_ALIVE=-1` on the server to keep it loaded indefinitely.
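On Linux you can make the pre-warm permanent with a small systemd unit. This is an illustrative sketch; the unit name and binary path are assumptions you should adapt to your install:

```ini
# /etc/systemd/system/ollama-prewarm.service (illustrative)
[Unit]
Description=Pre-load Qwen 2.5 into Ollama after boot
After=ollama.service
Requires=ollama.service

[Service]
Type=oneshot
ExecStart=/usr/bin/ollama run qwen2.5:7b ""

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable ollama-prewarm`.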
---
Docker + Ollama: The Production Setup
If you're running multiple agents in Docker containers (like we do), Ollama ideally runs on the host — not inside each container.
```yaml
# docker-compose.yml (excerpt)
services:
agent-maya:
image: openclaw/agent:latest
environment:
- OPENCLAW_MODEL_PROVIDER=openai-compatible
- OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
- OPENCLAW_MODEL_NAME=qwen2.5:14b
- OPENCLAW_MODEL_API_KEY=ollama
volumes:
- ./workspaces/maya:/workspace
agent-alex:
image: openclaw/agent:latest
environment:
- OPENCLAW_MODEL_PROVIDER=openai-compatible
- OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
- OPENCLAW_MODEL_NAME=qwen2.5:7b
- OPENCLAW_MODEL_API_KEY=ollama
volumes:
- ./workspaces/alex:/workspace
```
`host.docker.internal` is the hostname containers use to reach the host machine on macOS and Windows. On Linux it isn't defined by default; add it with Docker's `host-gateway` alias (the `extra_hosts` key in Compose), or use the host's bridge IP from `docker network inspect bridge`.
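For Linux hosts, the `extra_hosts` mapping looks like this per service (`host-gateway` is a standard Docker Compose alias for the host machine):

```yaml
# docker-compose.yml (Linux addition per service)
services:
  agent-maya:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```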
---
When Cloud APIs Are Still Worth It
Honestly: local models aren't the right choice in every situation.
Stick with cloud when: the agent needs long multi-step reasoning chains, has to call several tools in parallel, or works with very long contexts (>32k tokens).
Switch to local when: the work is routine and well scoped, the data is too sensitive to leave your machine, or call volume would make API costs pile up.
The pragmatic approach: hybrid. Cloud for the thinking work, local for the routine work. That's exactly what our 6-agent team does.
---
The Full Setup
The complete picture — Docker configuration, multi-model setup, Tailscale security, and the exact system prompts for each agent — is documented in the OpenClaw Setup Playbook.
18 chapters, based on real production experience. Not a theoretical framework — the actual thing we run.
Want to learn more? The playbook's 18 detailed chapters are fully available in English and German. 🇩🇪
Get the Playbook