2026-03-16 · 8 min

Which AI Model for Which Agent? How to Choose Smartly

LLM · Cost Optimization · Configuration · Multi-Agent · OpenClaw

The Problem With "Best Model for Everything"

When we built our first multi-agent setup, the configuration was simple: all agents on Claude Opus. The strongest, the most expensive — better safe than sorry.

Then the API invoice came after week one. For six agents running around the clock, the number was... significantly higher than expected.

The realization: Not every task needs the most powerful model. When Alex checks my calendar and tells me whether I have a meeting tomorrow — that doesn't need 200-token-per-second intelligence. But Peter, our coding agent, reviewing complex TypeScript architectures — he genuinely needs the best available.

The result of our overhaul: 60% fewer API costs with equal or better quality.

---

The Core Idea: Classify Your Tasks

Before assigning models, you need to understand what each agent actually does. We split our agents into three categories:

Category 1: Reasoning-Intensive Tasks

These tasks require deep thinking, multi-step inference, code analysis, or creative quality work.

Examples:

  • Code reviews with architectural context
  • Complex research with synthesis
  • Strategic content creation (blog posts, proposals)
  • Error diagnosis in complex logs

Recommended models: Claude Opus 4.5+, GPT-4o, Gemini Ultra

Category 2: Structured, Rule-Based Tasks

These tasks follow clear patterns. Input is structured, output is predictable, error rates are low.

Examples:

  • Calendar queries and meeting reminders
  • Email classification (important / not important / spam)
  • Simple webhook triggers and API calls
  • Daily status summaries from structured data

Recommended models: Claude Sonnet 4.5, GPT-4o Mini, Gemini Flash

Category 3: Simple Execution

Tasks where the model mostly acts as an interface: receive command, call tool, return result.

Examples:

  • Read and summarize files
  • Forward simple database queries
  • Heartbeat checks (is the server reachable?)
  • Route notifications

Recommended models: Claude Haiku, GPT-4o Mini, Gemini Flash 8B
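The three tiers can be captured as a simple lookup. A minimal sketch, not OpenClaw code: the category labels ("reasoning", "structured", "execution") and the mid-tier fallback are our assumptions; only the model IDs come from the lists above.

```python
# Hypothetical tier lookup. Category labels and the fallback rule are
# illustrative assumptions; the model IDs mirror the article's lists.
MODEL_TIERS = {
    "reasoning": "anthropic/claude-opus-4-5",     # Category 1
    "structured": "anthropic/claude-sonnet-4-5",  # Category 2
    "execution": "anthropic/claude-haiku-3-5",    # Category 3
}

def pick_model(category: str) -> str:
    """Return the default model ID for a task category."""
    # Unknown categories fall back to the mid tier rather than the
    # most expensive model.
    return MODEL_TIERS.get(category, MODEL_TIERS["structured"])

print(pick_model("execution"))  # anthropic/claude-haiku-3-5
```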

---

How to Configure Models in OpenClaw

OpenClaw lets you set the model per agent in `~/.openclaw/openclaw.json` (or `openclaw.json5`). The configuration looks like this:

```json
{
  "agents": {
    "sam": {
      "model": "anthropic/claude-opus-4-5",
      "workspace": "/home/sam/.openclaw/workspace"
    },
    "peter": {
      "model": "anthropic/claude-opus-4-5",
      "workspace": "/home/peter/.openclaw/workspace"
    },
    "maya": {
      "model": "anthropic/claude-sonnet-4-5",
      "workspace": "/home/maya/.openclaw/workspace"
    },
    "alex": {
      "model": "anthropic/claude-haiku-3-5",
      "workspace": "/home/alex/.openclaw/workspace"
    },
    "iris": {
      "model": "anthropic/claude-sonnet-4-5",
      "workspace": "/home/iris/.openclaw/workspace"
    },
    "atlas": {
      "model": "anthropic/claude-opus-4-5",
      "workspace": "/home/atlas/.openclaw/workspace"
    }
  }
}
```

Alternatively, you can set the model per agent via an environment variable in the Docker Compose file:

```yaml
services:
  alex:
    image: openclaw/agent:latest
    environment:
      - OPENCLAW_MODEL=anthropic/claude-haiku-3-5
      - OPENCLAW_AGENT_NAME=alex
    volumes:
      - /home/alex/.openclaw/workspace:/workspace
```

Both methods work. We prefer the JSON configuration because it documents all agents centrally.
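To sanity-check such a config, a few lines of Python can list each agent's assigned model. A sketch that assumes only the `agents` → `model` shape shown above; it is not an OpenClaw tool, and in practice you would read the file from disk instead of an inline string.

```python
import json

# Inline stand-in for ~/.openclaw/openclaw.json, using the shape
# from the example above (abbreviated to two agents).
CONFIG = """
{
  "agents": {
    "sam":  {"model": "anthropic/claude-opus-4-5"},
    "alex": {"model": "anthropic/claude-haiku-3-5"}
  }
}
"""

config = json.loads(CONFIG)
# Print one "agent: model" line per configured agent.
for name, agent in sorted(config["agents"].items()):
    print(f"{name}: {agent['model']}")
```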

---

Our Real Setup: The 6-Agent Model Matrix

Here's exactly what we run — no theory, the actual setup:

| Agent | Role | Model | Reason |
|-------|------|-------|--------|
| Sam (Team Lead) | Delegation, planning, blog | Claude Opus 4.5 | Complex coordination, creative writing |
| Peter (Coding) | PR reviews, tests, bugs | Claude Opus 4.5 | Architecture understanding, reasoning |
| Maya (Marketing) | Copy, campaigns, SEO | Claude Sonnet 4.5 | Good quality, 3× cheaper than Opus |
| Alex (Admin) | Calendar, reminders, tasks | Claude Haiku 3.5 | Structured tasks, no depth needed |
| Iris (Research) | Research, synthesis, reports | Claude Sonnet 4.5 | Reasoning + cost balance |
| Atlas (CEO Support) | Strategy, reports, letters | Claude Opus 4.5 | CEO output must be flawless |

Cost comparison (estimated, moderate usage):

  • Before the change (all Opus): ~€380/month
  • After the change (mixed models): ~€145/month
  • Savings: ~62%
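The savings figure follows directly from the two monthly totals:

```python
# Monthly cost before (all Opus) and after (mixed models), in euros.
before, after = 380, 145
savings_pct = (before - after) / before * 100
print(f"{savings_pct:.0f}%")  # 62%
```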
---

How to Tell When an Agent Has the Wrong Model

Signs of "model too weak":

  • Responses get shorter and flatter than expected
  • The agent asks for clarification more often on tasks that should be clear
  • Tool calls are invoked incorrectly or missed entirely
  • Code reviews miss obvious problems

Signs of "model over-provisioned":

  • The agent responds with multi-page analyses to simple yes/no questions
  • Simple calendar queries take 10+ seconds
  • The invoice grows without any perceptible improvement in quality

The fix: take a short look each week at the ratio of token consumption to output quality. With Alex, we noticed after two weeks that Haiku handles 95% of his tasks just fine.
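That weekly check can be as simple as a tokens-per-task and rework-rate comparison. A hedged sketch: the numbers and record shape below are invented for illustration, and OpenClaw's actual usage metrics may look different.

```python
# Invented weekly records: (agent, tokens used, tasks completed,
# tasks that needed rework). Not a real OpenClaw metrics format.
weekly = [
    ("alex",  120_000, 310, 8),
    ("peter", 900_000,  45, 2),
]

for agent, tokens, done, rework in weekly:
    quality = 1 - rework / done   # fraction of tasks accepted as-is
    tokens_per_task = tokens / done
    print(f"{agent}: {tokens_per_task:.0f} tok/task, quality {quality:.0%}")
```

A cheap agent with high quality is correctly provisioned; an expensive agent with the same quality on simple tasks is a downgrade candidate.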

---

Dynamic Model Switching: The Next Level

Advanced setups can switch models within an agent based on context. OpenClaw supports this via the session status override:

```
# In chat with the agent:
/model anthropic/claude-opus-4-5

# Or programmatically in a cron job instruction:
"For this task use /model anthropic/claude-opus-4-5 — complex analysis needed."
```

We use this rarely — mainly when Iris gets a particularly complex research assignment and needs to briefly switch to Opus. The default config stays Sonnet.

---

Mixing Providers: OpenAI + Anthropic + Gemini

OpenClaw supports multiple providers simultaneously. That means: you don't have to commit to one vendor.

```json
{
  "agents": {
    "maya": {
      "model": "openai/gpt-4o-mini"
    },
    "alex": {
      "model": "google/gemini-flash-1.5"
    }
  }
}
```

Important: each provider needs its own API key in `.env`:

```bash
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
```

We experimented with mixed providers but ultimately stayed with Anthropic for all agents — consistent quality, and a single invoice is operationally simpler.

---

Practical Guide: The Three-Question Model Decision

If you're unsure which model is right for an agent, ask these three questions:

1. Does the agent need to weigh contradictory information?
   → Yes → At least Sonnet, preferably Opus

2. Are the agent's tasks mostly predictable and structured?
   → Yes → Haiku or Flash is enough

3. Does a human see the output directly (CEO, customer, public)?
   → Yes → No compromise: Opus
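The three questions condense into a small decision function. An illustrative sketch, not part of OpenClaw: the check order encodes that the human-facing question overrides the others, and the default falls back to the mid tier.

```python
# Hypothetical decision function for the three questions.
# Model IDs follow the article; everything else is illustrative.
def choose_model(weighs_contradictions: bool,
                 mostly_structured: bool,
                 human_facing: bool) -> str:
    if human_facing:                       # Q3: no compromise
        return "anthropic/claude-opus-4-5"
    if weighs_contradictions:              # Q1: at least Sonnet
        return "anthropic/claude-sonnet-4-5"
    if mostly_structured:                  # Q2: Haiku is enough
        return "anthropic/claude-haiku-3-5"
    return "anthropic/claude-sonnet-4-5"   # default: mid tier

print(choose_model(False, True, False))  # anthropic/claude-haiku-3-5
```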

These three questions helped us move our setup from "all Opus" to a thoughtful mix.

---

The Complete Setup

The exact configuration — openclaw.json, Docker Compose with model variables, and the criteria by which we selected each model — is documented in the OpenClaw Setup Playbook.

It also covers the monitoring setup we use to track token consumption per agent and detect when a model is over- or under-performing.

18 chapters, based on real production experience.

Fully available in German too. 🇩🇪

Want to learn more?

Our playbook contains 18 detailed chapters — available in English and German.

Get the Playbook