2026-03-03 · 8 min

OpenClaw with Local LLMs: Qwen + Ollama, Zero API Bills

Ollama · Local LLM · Qwen · No API Bills · Self-Hosted

The Setup That's Going Viral This Week

A setup tweet is making the rounds: OpenClaw + Qwen 2.5 via Ollama — local AI agents at Claude-level quality, no API bill, no cloud, no data leaving your machine.

166 retweets in a few hours. The reason? The question so many people are sitting on: *"Do I really have to pay Anthropic or OpenAI for every single agent call?"*

The answer: No.

This post shows exactly how it works — and what you need to watch out for in a multi-agent setup.

---

Why Local Models at All?

Three reasons that apply to most setups:

1. Cost. With a 6-agent team running heartbeats every 30 minutes, daily cron jobs, and active usage, API costs add up fast — easily €200–500/month. A local model costs: electricity.

2. Privacy. If your agents have access to emails, business data, and internal documents — you might not want that data flowing through a cloud provider. For sensitive setups, this isn't a nice-to-have, it's a requirement.

3. Latency. For simple tasks (read a file, check task status, write a short reply), a local 7B model is faster than a cloud API call with network latency.

---

What You Need

  • Hardware: Minimum 16 GB RAM. For Qwen 2.5 7B: workable. For the 14B model: 32 GB recommended. GPU optional but helpful (Apple Silicon M-chips or NVIDIA).
  • Ollama: Free, open source, runs on macOS / Linux / Windows.
  • OpenClaw: Already installed (if not: `npm install -g openclaw`).
  • A Qwen model: We recommend `qwen2.5:7b` to start, or `qwen2.5:14b` for more demanding agent tasks.
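If you're not sure which size to start with, the RAM guidance above can be turned into a quick rule-of-thumb script. The thresholds below are our recommendation, not hard limits:

```shell
#!/usr/bin/env bash
# Rule-of-thumb model picker based on available RAM in GB
# (thresholds match the hardware guidance above; adjust to taste).
suggest_model() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 64 ]; then echo "qwen2.5-coder:32b"
  elif [ "$ram_gb" -ge 32 ]; then echo "qwen2.5:14b"
  elif [ "$ram_gb" -ge 16 ]; then echo "qwen2.5:7b"
  else echo "below 16 GB: local Qwen is not recommended"
  fi
}

suggest_model 16   # qwen2.5:7b
```

On Linux you could feed it `free -g | awk '/Mem:/{print $2}'`; on macOS, `sysctl -n hw.memsize` divided by 2^30.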
---

Step 1: Install Ollama and Pull a Model

```bash
# Install Ollama (macOS)
brew install ollama

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen 2.5 7B (~4.7 GB download)
ollama pull qwen2.5:7b

# For more demanding tasks: 14B
ollama pull qwen2.5:14b

# Test: talk to the model directly
ollama run qwen2.5:7b "Hello, what can you do?"
```

If you get a response, Ollama is running and the model is ready.

Ollama exposes a local API at `http://localhost:11434`. That's the endpoint OpenClaw will use.
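You can sanity-check that endpoint before touching any OpenClaw config. `/v1/models` is part of the OpenAI-compatible surface Ollama exposes; the fallback echo is just there for a readable error:

```shell
# List installed models via the OpenAI-compatible endpoint;
# prints a hint instead of silence if Ollama isn't running.
curl -s http://localhost:11434/v1/models || echo "Ollama is not reachable on :11434"
```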

---

Step 2: Configure OpenClaw to Use the Local Model

OpenClaw supports multiple model providers — including the OpenAI-compatible API that Ollama exposes.

Check your current configuration:

```bash
openclaw config show
```

To switch the model to Ollama/Qwen:

```bash
# Set provider to Ollama-compatible API
openclaw config set model.provider openai-compatible
openclaw config set model.baseUrl http://localhost:11434/v1
openclaw config set model.name qwen2.5:7b
openclaw config set model.apiKey ollama
```

Important: Ollama doesn't need a real API key, but the field can't be empty. Any placeholder like `ollama` works.

Then restart the gateway:

```bash
openclaw gateway restart
```

---

Step 3: Verify the Agent Is Using the Local Model

```bash
# Check if the agent is responding
openclaw sessions list

# Direct test session
openclaw sessions test
```

Alternatively: send a message to your configured channel (e.g. Telegram) and see if the agent responds. If yes — everything is running locally.

To confirm, you can check Ollama's logs:

```bash
# Ollama logs (Linux)
journalctl -u ollama -f

# macOS: Ollama runs in the background; logs live at
tail -f ~/.ollama/logs/server.log
```

If you see log entries when the agent responds: confirmed, everything's local.
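A second signal, independent of the logs: recent Ollama versions ship `ollama ps`, which lists the models currently loaded in memory.

```shell
# Models currently loaded into RAM/VRAM, with time until unload.
# If your agent's model appears here right after it replies, inference is local.
ollama ps
```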

---

Which Model for Which Task?

Not every model is equally suited for every job. Here's what we've learned from our 6-agent setup:

Qwen 2.5 7B — good for:

  • Simple routing tasks (which agent should do what?)
  • Short responses and status messages
  • Heartbeat checks (read emails, check task status)
  • Text formatting and summarization

Weak at: long multi-step reasoning chains; writing complex code; ambiguous instructions.

Qwen 2.5 14B — good for:

  • Code review and simple implementations
  • Writing longer blog posts
  • More complex multi-step tasks
  • Tool calling with multiple parallel actions

Weak at: very long context windows (>32k tokens), subtle reasoning tasks that need GPT-4 or Claude.

Qwen 2.5 Coder 32B — for power users:

  • Full codebase analysis
  • PR reviews
  • Debugging complex bugs

Requires at least 64 GB RAM, though. Overkill for most setups.

---

Multi-Agent Setup: Different Models Per Agent

This is the killer feature of local models in a multi-agent system: each agent can use a different model.

In our setup:

| Agent | Task | Model |
|-------|------|-------|
| Sam (team lead) | Delegation, coordination | Claude Sonnet (cloud) |
| Peter (coding) | Code review, debugging | Qwen 2.5 Coder 7B (local) |
| Maya (marketing) | Blog posts, copy | Qwen 2.5 14B (local) |
| Alex (everyday tasks) | Emails, calendar | Qwen 2.5 7B (local) |
| Iris (research) | Web search, summaries | Qwen 2.5 14B (local) |
| Atlas (CEO assistant) | Direct assistance | Claude Sonnet (cloud) |

Result: cloud costs reduced to the 2 agents that genuinely need complex reasoning. Everything else runs locally.

How to configure different models per agent:

Each agent has its own workspace. In that workspace's OpenClaw configuration, you can override the model:

```bash
# Inside a specific agent's workspace
openclaw config set model.name qwen2.5:7b
openclaw config set model.baseUrl http://localhost:11434/v1
```

Alternatively: set the model via environment variables per container (if you're using Docker).
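If you manage containers by hand rather than through compose, the same override works with plain `docker run`. A sketch, using the image and variable names from our compose setup:

```shell
# Single agent container pointing at Ollama on the host
docker run -d \
  --name agent-alex \
  -e OPENCLAW_MODEL_PROVIDER=openai-compatible \
  -e OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1 \
  -e OPENCLAW_MODEL_NAME=qwen2.5:7b \
  -e OPENCLAW_MODEL_API_KEY=ollama \
  -v "$PWD/workspaces/alex:/workspace" \
  openclaw/agent:latest
```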

---

Practical Limitations and How We Handle Them

Context Window

Ollama models have a smaller default context window than cloud APIs. With long conversations or large files, this can become a problem.

Solution: explicitly increase the context window in Ollama:

```bash
# Set the default context length for all served models
# (env var supported in recent Ollama versions)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```

Or define it in a Modelfile:

```
FROM qwen2.5:7b
PARAMETER num_ctx 32768
```
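A Modelfile on its own does nothing; you build a named variant from it and point OpenClaw at that name. `qwen2.5-32k` below is just a label we picked:

```shell
# Build a model variant with the 32k context baked in
ollama create qwen2.5-32k -f Modelfile

# Point the agent at the new variant
openclaw config set model.name qwen2.5-32k
```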

Tool Calling

Not all Ollama models support reliable tool calling. Qwen 2.5 is better here than most, but worse than Claude or GPT-4.

Practical rule: If a cron job needs to call multiple tools in parallel (e.g. email + calendar + ClickUp simultaneously), use a stronger model. For sequential single-tool calls, Qwen works fine.

Cold Start Latency

The first request after system startup loads the model into RAM — can take 10–30 seconds. After that: fast.

Solution: pre-warm Ollama at startup:

```bash
# Pre-load the model at startup (runs once)
ollama run qwen2.5:7b "" &
```
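On Linux, the pre-warm can survive reboots as a small systemd oneshot unit. The unit name and binary path are assumptions; check yours with `which ollama`:

```ini
# /etc/systemd/system/ollama-prewarm.service
[Unit]
Description=Pre-load Qwen into Ollama after boot
After=ollama.service
Requires=ollama.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ollama run qwen2.5:7b ""

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable ollama-prewarm.service`.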

---

Docker + Ollama: The Production Setup

If you're running multiple agents in Docker containers (like we do), Ollama ideally runs on the host — not inside each container.

```yaml
# docker-compose.yml (excerpt)
services:
  agent-maya:
    image: openclaw/agent:latest
    environment:
      - OPENCLAW_MODEL_PROVIDER=openai-compatible
      - OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
      - OPENCLAW_MODEL_NAME=qwen2.5:14b
      - OPENCLAW_MODEL_API_KEY=ollama
    volumes:
      - ./workspaces/maya:/workspace

  agent-alex:
    image: openclaw/agent:latest
    environment:
      - OPENCLAW_MODEL_PROVIDER=openai-compatible
      - OPENCLAW_MODEL_BASE_URL=http://host.docker.internal:11434/v1
      - OPENCLAW_MODEL_NAME=qwen2.5:7b
      - OPENCLAW_MODEL_API_KEY=ollama
    volumes:
      - ./workspaces/alex:/workspace
```

`host.docker.internal` is the hostname Docker containers use to reach the host machine. On Linux this can differ — check with `docker network inspect bridge`.
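On Linux (Docker 20.10+) you can skip the guesswork entirely and map the name to the host gateway per service with `extra_hosts`:

```yaml
# docker-compose.yml (Linux hosts)
services:
  agent-maya:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```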

---

When Cloud APIs Are Still Worth It

Honestly: local models aren't the right choice in every situation.

Stick with cloud when:

  • You have an agent that needs complex multi-step reasoning (Sam as team lead, direct CEO assistance)
  • You don't have at least 16 GB free RAM
  • The agent frequently handles ambiguous, nuanced instructions
  • Latency is absolutely critical and you can't tolerate warmup time

Switch to local when:

  • Tasks are clearly defined and repetitive (email check, task updates, simple copy)
  • Privacy matters
  • You want to reduce costs without quality trade-offs on simple tasks

The pragmatic approach: hybrid. Cloud for the thinking work, local for the routine work. That's exactly what our 6-agent team does.

---

The Full Setup

The complete picture — Docker configuration, multi-model setup, Tailscale security, and the exact system prompts for each agent — is documented in the OpenClaw Setup Playbook.

18 chapters, based on real production experience. Not a theoretical framework — the actual thing we run.

Fully available in German too. 🇩🇪
