2026-03-03 · 8 min

OpenClaw with Local LLMs: Qwen + Ollama, Zero API Bills

Ollama · Local LLM · Qwen · No API Bills · Self-Hosted

The Setup That's Going Viral This Week

A setup tweet is making the rounds: OpenClaw + Qwen 2.5 via Ollama — local AI agents at Claude-level quality, no API bill, no cloud, no data leaving your machine.

166 retweets in a few hours. The reason? The question so many people are sitting on: *"Do I really have to pay Anthropic or OpenAI for every single agent call?"*

The answer: No.

This post shows exactly how it works — and what you need to watch out for in a multi-agent setup.

---

Why Local Models at All?

Three reasons that apply to most setups:

1. Cost. With a 6-agent team running heartbeats every 30 minutes, daily cron jobs, and active usage, API costs add up fast — easily €200–500/month. A local model costs: electricity.

2. Privacy. If your agents have access to emails, business data, and internal documents — you might not want that data flowing through a cloud provider. For sensitive setups, this isn't a nice-to-have, it's a requirement.

3. Latency. For simple tasks (read a file, check task status, write a short reply), a local 7B model is faster than a cloud API call with network latency.

---

What You Need

  • Hardware: Minimum 16 GB RAM. For Qwen 2.5 7B: workable. For the 14B model: 32 GB recommended. GPU optional but helpful (Apple Silicon M-chips or NVIDIA).
  • Ollama: Free, open source, runs on macOS / Linux / Windows.
  • OpenClaw: Already installed (if not: `npm install -g openclaw`).
  • A Qwen model: We recommend `qwen2.5:7b` to start, or `qwen2.5:14b` for more demanding agent tasks.
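If you're not sure which size to start with, the RAM guidance above can be turned into a quick rule-of-thumb script. The thresholds below are our recommendation, not hard limits:

```shell
#!/usr/bin/env bash
# Rule-of-thumb model picker based on available RAM in GB
# (thresholds match the hardware guidance above; adjust to taste).
suggest_model() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 64 ]; then echo "qwen2.5-coder:32b"
  elif [ "$ram_gb" -ge 32 ]; then echo "qwen2.5:14b"
  elif [ "$ram_gb" -ge 16 ]; then echo "qwen2.5:7b"
  else echo "below 16 GB: local Qwen is not recommended"
  fi
}

suggest_model 16   # qwen2.5:7b
```

On Linux you could feed it `free -g | awk '/Mem:/{print $2}'`; on macOS, `sysctl -n hw.memsize` divided by 2^30.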
---

Step 1: Install Ollama and Pull a Model

```bash
# Install Ollama (macOS)
brew install ollama

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen 2.5 7B (~4.7 GB download)
ollama pull qwen2.5:7b

# For more demanding tasks: 14B
ollama pull qwen2.5:14b

# Test: talk to the model directly
ollama run qwen2.5:7b "Hello, what can you do?"
```

If you get a response, Ollama is running and the model is ready.

Ollama exposes a local API at `http://localhost:11434`. That's the endpoint OpenClaw will use.
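You can sanity-check that endpoint before touching any OpenClaw config. `/v1/models` is part of the OpenAI-compatible surface Ollama exposes; the fallback echo is just there for a readable error:

```shell
# List installed models via the OpenAI-compatible endpoint;
# prints a hint instead of silence if Ollama isn't running.
curl -s http://localhost:11434/v1/models || echo "Ollama is not reachable on :11434"
```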

---

Step 2: Configure OpenClaw to Use the Local Model

OpenClaw supports multiple model providers — including the OpenAI-compatible API that Ollama exposes.

Check your current configuration:

```bash
openclaw config show
```

To switch the model to Ollama/Qwen:

```bash
# Set provider to Ollama-compatible API
openclaw config set model.provider openai-compatible
openclaw config set model.baseUrl http://localhost:11434/v1
openclaw config set model.name qwen2.5:7b
openclaw config set model.apiKey ollama
```

Important: Ollama doesn't need a real API key, but the field can't be empty. Any placeholder like `ollama` works.

Then restart the gateway:

```bash
openclaw gateway restart
```

---

Step 3: Verify the Agent Is Using the Local Model

```bash
# Check if the agent is responding
openclaw sessions list

# Direct test session
openclaw sessions test
```

Alternatively: send a message to your configured channel (e.g. Telegram) and see if the agent responds. If yes — everything is running locally.

To confirm, you can check Ollama's logs:

```bash
# Ollama logs (Linux)
journalctl -u ollama -f

# macOS: Ollama runs in the background; logs live at
tail -f ~/.ollama/logs/server.log
```

If you see log entries when the agent responds: confirmed, everything's local.
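A second signal, independent of the logs: recent Ollama versions ship `ollama ps`, which lists the models currently loaded in memory.

```shell
# Models currently loaded into RAM/VRAM, with time until unload.
# If your agent's model appears here right after it replies, inference is local.
ollama ps
```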

---

Which Model for Which Task?

Not every model is equally suited for every job. Here's what we've learned from our 6-agent setup:

Qwen 2.5 7B — good for:

  • Simple routing tasks (which agent should do what?)
  • Short responses and status messages
  • Heartbeat checks (read emails, check task status)
  • Text formatting and summarization

Weak at: long multi-step reasoning chains; writing complex code; ambiguous instructions.

Qwen 2.5 14B — good for:

  • Code review and simple implementations
  • Writing longer blog posts
  • More complex multi-step tasks
  • Tool calling with multiple parallel actions

Weak at: very long context windows (>32k tokens), subtle reasoning tasks that need GPT-4 or Claude.

Qwen 2.5 Coder 32B — for power users:

  • Full codebase analysis
  • PR reviews
  • Debugging complex bugs

Requires at least 64 GB RAM, though. Overkill for most setups.

---

Multi-Agent Setup: Different Models Per Agent

This is the killer feature of local models in a multi-agent system: each agent can use a different model.

In our setup:

| Agent | Task | Model |
|-------|------|-------|
| Sam (team lead) | Delegation, coordination | Claude Sonnet (cloud) |
| Peter (coding) | Code review, debugging | Qwen 2.5 Coder 7B (local) |
| Maya (marketing) | Blog posts, copy | Qwen 2.5 14B (local) |
| Alex (everyday tasks) | Emails, calendar | Qwen 2.5 7B (local) |
| Iris (research) | Web search, summaries | Qwen 2.5 14B (local) |
| Atlas (CEO assistant) | Direct assistance | Claude Sonnet (cloud) |

Result: cloud costs reduced to the 2 agents that genuinely need complex reasoning. Everything else runs locally.

How to configure different models per agent:

Each agent has its own workspace. In that workspace's OpenClaw configuration, you can override the model:

```bash
# Inside a specific agent's workspace
openclaw config set model.name qwen2.5:7b
openclaw config set model.baseUrl http://localhost:11434/v1
```

Alternatively: set the model via environment variables per container (if you're using Docker).
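If you manage containers by hand rather than through compose, the same override works with plain `docker run`. A sketch, using the image and variable names from our compose setup:

```shell
# Single agent container pointing at Ollama on the host
docker run -d \
  --name agent-alex \
  -e OPENCLAW_MODEL_PROVIDER=openai-compatible \
  -e OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1 \
  -e OPENCLAW_MODEL_NAME=qwen2.5:7b \
  -e OPENCLAW_MODEL_API_KEY=ollama \
  -v "$PWD/workspaces/alex:/workspace" \
  openclaw/agent:latest
```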

---

Practical Limitations and How We Handle Them

Context Window

Ollama models have a smaller default context window than cloud APIs. With long conversations or large files, this can become a problem.

Solution: explicitly increase the context window in Ollama:

```bash
# Set the default context length for all served models
# (env var supported in recent Ollama versions)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```

Or define it in a Modelfile:

```
FROM qwen2.5:7b
PARAMETER num_ctx 32768
```
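A Modelfile on its own does nothing; you build a named variant from it and point OpenClaw at that name. `qwen2.5-32k` below is just a label we picked:

```shell
# Build a model variant with the 32k context baked in
ollama create qwen2.5-32k -f Modelfile

# Point the agent at the new variant
openclaw config set model.name qwen2.5-32k
```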

Tool Calling

Not all Ollama models support reliable tool calling. Qwen 2.5 is better here than most, but worse than Claude or GPT-4.

Practical rule: If a cron job needs to call multiple tools in parallel (e.g. email + calendar + ClickUp simultaneously), use a stronger model. For sequential single-tool calls, Qwen works fine.

Cold Start Latency

The first request after system startup loads the model into RAM — can take 10–30 seconds. After that: fast.

Solution: pre-warm Ollama at startup:

```bash
# Pre-load the model at startup (runs once)
ollama run qwen2.5:7b "" &
```
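On Linux, the pre-warm can survive reboots as a small systemd oneshot unit. The unit name and binary path are assumptions; check yours with `which ollama`:

```ini
# /etc/systemd/system/ollama-prewarm.service
[Unit]
Description=Pre-load Qwen into Ollama after boot
After=ollama.service
Requires=ollama.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ollama run qwen2.5:7b ""

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable ollama-prewarm.service`.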

---

Docker + Ollama: The Production Setup

If you're running multiple agents in Docker containers (like we do), Ollama ideally runs on the host — not inside each container.

```yaml
# docker-compose.yml (excerpt)
services:
  agent-maya:
    image: openclaw/agent:latest
    environment:
      - OPENCLAW_MODEL_PROVIDER=openai-compatible
      - OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
      - OPENCLAW_MODEL_NAME=qwen2.5:14b
      - OPENCLAW_MODEL_API_KEY=ollama
    volumes:
      - ./workspaces/maya:/workspace

  agent-alex:
    image: openclaw/agent:latest
    environment:
      - OPENCLAW_MODEL_PROVIDER=openai-compatible
      - OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
      - OPENCLAW_MODEL_NAME=qwen2.5:7b
      - OPENCLAW_MODEL_API_KEY=ollama
    volumes:
      - ./workspaces/alex:/workspace
```

`host.docker.internal` is the hostname Docker containers use to reach the host machine. On Linux this can differ — check with `docker network inspect bridge`.
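On Linux (Docker 20.10+) you can skip the guesswork entirely and map the name to the host gateway per service with `extra_hosts`:

```yaml
# docker-compose.yml (Linux hosts)
services:
  agent-maya:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```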

---

When Cloud APIs Are Still Worth It

Honestly: local models aren't the right choice in every situation.

Stick with cloud when:

  • You have an agent that needs complex multi-step reasoning (Sam as team lead, direct CEO assistance)
  • You don't have at least 16 GB free RAM
  • The agent frequently handles ambiguous, nuanced instructions
  • Latency is absolutely critical and you can't tolerate warmup time

Switch to local when:

  • Tasks are clearly defined and repetitive (email check, task updates, simple copy)
  • Privacy matters
  • You want to reduce costs without quality trade-offs on simple tasks

The pragmatic approach: hybrid. Cloud for the thinking work, local for the routine work. That's exactly what our 6-agent team does.

---

The Full Setup

The complete picture — Docker configuration, multi-model setup, Tailscale security, and the exact system prompts for each agent — is documented in the OpenClaw Setup Playbook.

18 chapters, based on real production experience. Not a theoretical framework — the actual thing we run.

Fully available in German too. 🇩🇪
