2026-04-01 · 10 min

OpenClaw Orchestrator Pattern: How to Save 80% of Your Tokens Using Opus + Sonnet Sub-Agents

OpenClaw, Sub-Agents, Token Optimization, Claude, Orchestration, Cost

The Tweet That Started This

Someone posted this earlier today and it hit 98 impressions in under 10 minutes — which for an OpenClaw thread is basically viral:

> *"The biggest token saver nobody talks about: offload heavy tasks to sub-agents instead of keeping everything in one conversation. I run OpenClaw with Opus as orchestrator and Sonnet sub-agents for coding — main context stays tiny while sub-agents burn through tokens in isolated [sessions]"*

This is exactly right, and it's something the OpenClaw docs under-explain. Let's fix that.

---

The Problem: Context Rot Is Expensive

If you've been running OpenClaw for a while, you've probably hit this pattern:

1. You ask your agent to do something complex

2. The agent works through it in your main chat session

3. The conversation gets longer and longer

4. Responses get slower and more expensive

5. Eventually the agent starts forgetting things from 30 messages ago

This is context rot — and it costs you money on every single request because every message in the thread gets re-sent to the model with every new request.

A 50-message conversation with code snippets can easily reach 100k input tokens *per request*, because the whole history is sent again each time. At Opus pricing, that adds up fast.
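To see why re-sending the history hurts, here is a minimal sketch of the billing arithmetic. It assumes a hypothetical conversation of 50 messages at 2,000 tokens each and no prompt caching; the function name and the message sizes are illustrative, not measurements from a real OpenClaw session.

```python
def cumulative_input_tokens(message_tokens: list[int]) -> int:
    """Total input tokens billed when the full history is re-sent on every turn."""
    total = 0
    history = 0
    for tokens in message_tokens:
        history += tokens  # the new message joins the history
        total += history   # the whole history is sent as input for this request
    return total


# Hypothetical conversation: 50 messages of 2,000 tokens each.
messages = [2_000] * 50
print(sum(messages))                      # size of the final context: 100000
print(cumulative_input_tokens(messages))  # input billed across all 50 requests: 2550000
```

The final context is 100k tokens, but the total input billed over the conversation is more than 25 times that, because the cost grows roughly with the square of the conversation length.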

---

The Solution: Orchestrator + Sub-Agent Architecture

The fix is conceptually simple:

Orchestrator (Opus or your main agent): Holds the high-level plan, user preferences, decisions. Context stays small — just instructions and results.

Sub-agents (Sonnet or cheaper models): Do the actual heavy work in isolated sessions. They burn through tokens in a fresh context, then return a summary.

Your main context never accumulates the messy intermediate steps. It only sees the clean output.

Here's what this looks like in practice:

```

Main session (Opus):

"Hey, refactor the auth module to use JWT"

→ spawns Sonnet sub-agent with full codebase context

→ sub-agent works in isolation (5000 tokens of back-and-forth)

→ returns: "Done. Changed 3 files, here's a summary."

→ main session adds 200 tokens, not 5000

```
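The same flow can be modelled as a toy simulation. This is only a sketch: `Session` and `run_sub_agent` are hypothetical stand-ins that count tokens, not OpenClaw APIs, and the token figures are the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A toy session that tracks how many tokens sit in its context."""
    name: str
    context_tokens: int = 0
    log: list[str] = field(default_factory=list)

    def add(self, text: str, tokens: int) -> None:
        self.log.append(text)
        self.context_tokens += tokens


def run_sub_agent(task: str) -> tuple[str, Session]:
    """Hypothetical stand-in for a real sub-agent run in its own session."""
    sub = Session("sub-agent")
    sub.add(task, 100)
    sub.add("(file reads, edits, test runs)", 4_900)  # stays in the sub-agent
    return "Done. Changed 3 files, here's a summary.", sub


main = Session("main")
summary, sub = run_sub_agent("Refactor the auth module to use JWT")
main.add(summary, 200)  # only the summary reaches the main session

print(sub.context_tokens)   # 5000
print(main.context_tokens)  # 200
```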

---

Setting It Up in OpenClaw

OpenClaw has first-class support for this via `sessions_spawn`. Here's the basic pattern in your SOUL.md or agent instructions:

```

When a task is complex or will require many steps:

1. Summarize the goal clearly

2. Use sessions_spawn to create an isolated sub-agent

3. Pass the goal + necessary context as the task

4. Wait for the result, then summarize it back to the user

```

The key insight: you control what context the sub-agent gets. You don't dump your entire conversation history into it. You write a clean, focused brief.

Example: Coding Sub-Agent

In your main agent's AGENTS.md or instructions:

```

For coding tasks that require more than 3 file edits:

Spawn a sub-agent with runtime="acp" and the coding agent ID

Pass: repo path, task description, acceptance criteria

Do NOT pass: the entire conversation history

Receive: summary of changes made + any blockers

```
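If you build briefs programmatically, a small structure helps enforce the "pass this, not that" rule. The sketch below is an assumption about how you might do it: `TaskBrief` and its fields are hypothetical names, and the rendered text is just a string you would hand to whatever spawn mechanism you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBrief:
    """Everything the sub-agent gets; note there is no field for chat history."""
    repo_path: str
    task: str
    acceptance_criteria: tuple[str, ...]

    def render(self) -> str:
        criteria = "\n".join(f"- {c}" for c in self.acceptance_criteria)
        return (
            f"Repository: {self.repo_path}\n"
            f"Task: {self.task}\n"
            f"Acceptance criteria:\n{criteria}\n"
            "Report back: a summary of changes made and any blockers."
        )


brief = TaskBrief(
    repo_path="/path/to/repo",  # placeholder path
    task="Refactor the auth module to use JWT",
    acceptance_criteria=("Existing auth tests pass", "No session-cookie code remains"),
)
print(brief.render())
```

Because the dataclass has no field for prior messages, the conversation history cannot leak into the brief by accident.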

This is exactly what the tweet described — Opus stays as the brain, Sonnet does the hands-on work.

---

Model Selection by Task Type

Not every task needs the same model. OpenClaw lets you specify the model per sub-agent spawn. Here's a practical breakdown:

| Task | Recommended Model | Why |
|------|-----------------|-----|
| Planning / decisions | Opus 4.x | Needs deep reasoning |
| Code writing | Sonnet 4.x | Fast, cheap, capable |
| Simple lookups | Haiku / mini | Near-free, fast |
| Long document analysis | Sonnet 4.x | Good context handling |
| Creative writing | Sonnet 4.x | Solid quality, good cost |
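The table translates naturally into a lookup. This is a sketch under assumptions: the task-type keys and the tier labels ("opus", "sonnet", "haiku") are hypothetical placeholders, not real model IDs or OpenClaw configuration values.

```python
# Hypothetical tier labels; replace with the model identifiers your setup uses.
MODEL_BY_TASK = {
    "planning": "opus",
    "coding": "sonnet",
    "lookup": "haiku",
    "long_document": "sonnet",
    "creative_writing": "sonnet",
}
DEFAULT_MODEL = "sonnet"  # cheaper tier as the fallback for unknown task types


def pick_model(task_type: str) -> str:
    """Return the model tier for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)


print(pick_model("planning"))  # opus
print(pick_model("lookup"))    # haiku
print(pick_model("unknown"))   # sonnet
```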

The tweet mentioned a $50/month setup: Codex mini as the main brain, MiniMax for daily execution, and Opus for feature planning. That's the orchestrator pattern applied to cost optimization.

---

Keeping Track: What Your Orchestrator Stores

The main session should only store decisions and outcomes, not process:

Store in main context (or MEMORY.md):

Final outcomes ("auth module now uses JWT, deployed to staging")

Decisions made ("we chose RS256 over HS256 because of multi-service setup")

Blockers and next steps

Do NOT store in main context:

Step-by-step execution logs

Intermediate code drafts

Debug output from sub-agents

This discipline is what keeps your orchestrator context lean over time.
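One way to make that discipline mechanical is to filter what a sub-agent returns before it is written to memory. The sketch below assumes a hypothetical `SubAgentResult` shape; a real sub-agent returns free text, so you would need to ask for these fields explicitly in the brief.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgentResult:
    """Hypothetical structured result from a sub-agent run."""
    outcome: str
    decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)
    execution_log: list[str] = field(default_factory=list)  # never stored
    debug_output: str = ""                                  # never stored


def memory_entry(result: SubAgentResult) -> str:
    """Keep outcome, decisions, and blockers; drop logs and debug output."""
    lines = [f"Outcome: {result.outcome}"]
    lines += [f"Decision: {d}" for d in result.decisions]
    lines += [f"Blocker: {b}" for b in result.blockers]
    return "\n".join(lines)


result = SubAgentResult(
    outcome="auth module now uses JWT, deployed to staging",
    decisions=["chose RS256 over HS256 because of multi-service setup"],
    execution_log=["read auth.py", "ran tests", "fixed import"],
    debug_output="Traceback (most recent call last): ...",
)
print(memory_entry(result))
```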

---

The Token Math

Let's make this concrete. Say you have a coding task that involves:

Reading 10 files (~8,000 tokens)

8 rounds of back-and-forth debugging (~12,000 tokens)

Final summary (~500 tokens)

Without sub-agents:

All 20,500 tokens accumulate in your main session. Every future message costs those 20,500 tokens plus whatever comes next.

With sub-agents:

The 20,000 tokens of work happen in isolation. Your main session only gains the 500-token summary.

Over 10 such tasks, the difference is:

Without sub-agents: **200,000+ tokens** in main context

With sub-agents: **5,000 tokens** in main context

At Opus pricing (~$15/million input tokens), the 200,000-token gap works out to roughly $3.00 of extra input cost per request after 10 tasks ($3.075 vs $0.075). If your agent handles 50 requests/day, that's about $154/day vs $3.75/day.

The orchestrator pattern doesn't just feel cleaner — it is dramatically cheaper.
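The arithmetic is easy to check yourself. This sketch uses only the inputs stated in this section (the per-task token counts, roughly $15 per million input tokens, 10 tasks, 50 requests per day) and counts only the accumulated task tokens, ignoring system prompts, output tokens, and caching.

```python
OPUS_INPUT_USD_PER_MTOK = 15.0  # approximate figure used in this section
READ_TOKENS, DEBUG_TOKENS, SUMMARY_TOKENS = 8_000, 12_000, 500
TASKS = 10
REQUESTS_PER_DAY = 50


def input_cost(tokens: int) -> float:
    """Input cost in USD for sending this many tokens."""
    return tokens * OPUS_INPUT_USD_PER_MTOK / 1_000_000


# Main-context size after 10 tasks, with and without delegation.
without_sub = TASKS * (READ_TOKENS + DEBUG_TOKENS + SUMMARY_TOKENS)
with_sub = TASKS * SUMMARY_TOKENS

print(without_sub, with_sub)                                  # 205000 5000
print(f"{input_cost(without_sub):.3f}")                       # 3.075
print(f"{input_cost(with_sub):.3f}")                          # 0.075
print(f"{input_cost(without_sub * REQUESTS_PER_DAY):.2f}")    # 153.75
print(f"{input_cost(with_sub * REQUESTS_PER_DAY):.2f}")       # 3.75
```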

---

Practical SOUL.md Additions

Add these guidelines to your SOUL.md or agent instructions to make this automatic:

```

Task Delegation Rules

Any coding task > 3 files → spawn sub-agent (runtime="acp")

Any research task > 5 web pages → spawn sub-agent

Any task taking > 2 minutes → spawn sub-agent + send progress updates

Always pass a CLEAN task brief to sub-agents, not conversation history

Always store only the OUTCOME in main memory, not the process

```
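If you prefer to enforce these thresholds in code rather than rely on the model following instructions, the rules reduce to a small predicate. This is a sketch: the `Task` fields are hypothetical, and estimating file counts or duration up front is left to you.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str                      # "coding", "research", or anything else
    files_touched: int = 0
    web_pages: int = 0
    estimated_minutes: float = 0.0


def should_delegate(task: Task) -> bool:
    """Apply the delegation thresholds from the rules above."""
    if task.kind == "coding" and task.files_touched > 3:
        return True
    if task.kind == "research" and task.web_pages > 5:
        return True
    return task.estimated_minutes > 2


print(should_delegate(Task("coding", files_touched=5)))       # True
print(should_delegate(Task("coding", files_touched=2)))       # False
print(should_delegate(Task("chat", estimated_minutes=10.0)))  # True
```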

---

Common Mistakes

Mistake 1: Passing your entire conversation to the sub-agent

You negate all the benefits. Write a fresh brief. The sub-agent doesn't need to know about your earlier chat about something else.

Mistake 2: Using Opus for sub-agents

Sonnet handles the vast majority of coding and execution tasks perfectly well. Reserve Opus for planning, complex reasoning, and decisions that need deep thinking.

Mistake 3: Not summarizing sub-agent output

If a sub-agent returns 3,000 tokens of output and you dump it all into main context, you've only half-solved the problem. Ask the sub-agent to summarize, or summarize it yourself before storing.

Mistake 4: Spawning sub-agents for trivial tasks

Spawning has overhead — session creation, context loading, etc. For a quick "what's 2+2" style task, just answer in main session. Sub-agents are for heavy lifting.

---

The Cost Setup That Works

Going back to the original tweet, here's the $50/month setup it describes:

Orchestrator: Codex 5.4/mini at $20/month for the main always-on brain

Daily execution: MiniMax M2.7 at $10/month for routine tasks

Heavy lifting: Opus 4.6 at $20/month, spawned on-demand for complex planning

The key: Opus is not your always-on model. It's spawned only when genuinely needed, runs in isolation, and returns a clean result. Your $20 Opus budget goes much further when it's not burning tokens on every heartbeat.

---

Quick Checklist

1. ✅ Identify your top 3 most expensive recurring tasks

2. ✅ Write a sub-agent brief template for each

3. ✅ Add delegation rules to your SOUL.md or agent instructions

4. ✅ Set Sonnet (or cheaper) as default for sub-agent spawns

5. ✅ Reserve Opus for orchestration and complex decisions only

6. ✅ After each sub-agent run, store only the outcome in main memory

Your main context should stay under 20,000 tokens for normal daily use. If it's consistently hitting 80,000+, you need sub-agents.
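That rule of thumb can be turned into a simple check over recent context sizes. The sketch assumes you can already read the main session's token count from somewhere; `needs_sub_agents` is a hypothetical helper, and "consistently" is interpreted here as every recent reading being at or above the limit.

```python
def needs_sub_agents(recent_context_sizes: list[int], limit: int = 80_000) -> bool:
    """True when every recent reading is at or above the limit."""
    return bool(recent_context_sizes) and all(n >= limit for n in recent_context_sizes)


print(needs_sub_agents([12_000, 18_000, 15_000]))  # False
print(needs_sub_agents([85_000, 92_000, 81_000]))  # True
print(needs_sub_agents([85_000, 19_000, 90_000]))  # False
```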

Everything covered here works with a standard OpenClaw setup — no plugins, no extra dependencies.

The full setup is documented in the OpenClaw Setup Playbook.
