All posts
2026-03-29 · 10 min read

Running 3 OpenClaw Agents on a Mac Mini: No Cloud, No API Bills, Always On

Multi-Agent · Local · Mac Mini · Cost · Self-Hosting · OpenClaw

Why a Mac Mini Is the Perfect OpenClaw Server

A tweet from this morning captures the current energy in the OpenClaw community perfectly:

> *"running 3 autonomous agents on a mac mini right now with openclaw. no cloud, no api costs. this tool is the real deal"*

If you've been paying $200-400/month in cloud API bills while running AI agents, this post is for you. The Mac Mini M4 (base model, $599) is the most cost-efficient always-on AI agent host available right now. Here's the math that made me switch:

  • Mac Mini M4: $599 one-time, ~6W idle, ~15W under load
  • Power cost: ~$1.50/month at average US electricity rates
  • Total first-year cost: ~$617
  • Cloud equivalent: a comparable VPS with enough RAM for local models runs $40-80/month → $480-960/year

And that's before you factor in API costs. If your agents are running Claude or GPT-4 on 48 heartbeats a day with long context windows, you can easily hit $50-150/month in model costs alone. Running local models eliminates that entirely.
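The comparison above works out as follows (a quick sketch using the post's own estimates):

```python
# First-year cost sketch, using the figures from the list above.
MAC_MINI = 599             # one-time hardware cost ($)
POWER_PER_MONTH = 1.50     # ~6-15W at average US electricity rates ($)
VPS_PER_MONTH = (40, 80)   # comparable VPS range ($/month)
API_PER_MONTH = (50, 150)  # cloud model costs for busy agents ($/month)

mac_year_one = MAC_MINI + 12 * POWER_PER_MONTH
vps_year = tuple(12 * v for v in VPS_PER_MONTH)
cloud_year = tuple(12 * (v + a) for v, a in zip(VPS_PER_MONTH, API_PER_MONTH))

print(f"Mac Mini, year one: ${mac_year_one:.0f}")               # $617
print(f"VPS alone, year:    ${vps_year[0]}-{vps_year[1]}")      # $480-960
print(f"VPS + API, year:    ${cloud_year[0]}-{cloud_year[1]}")  # $1080-2760
```

After year one the Mac Mini's recurring cost is just the ~$18/year in electricity, so the gap only widens.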

    ---

    The Architecture: 3 Agents, One Machine

    Here's how our Mac Mini setup is structured. Three agents, each with a distinct role, all sharing the same hardware:

    Agent 1: Main Agent (Sam)

    The primary assistant — handles direct chat on Telegram and WhatsApp, manages tasks, reads emails, coordinates with the other agents.

    Config:

  • Model: Claude Sonnet 4.5 via API (for quality on direct chat)
  • Session: persistent main session
  • Channels: Telegram + WhatsApp

Agent 2: Research Agent (Iris)

    Handles web research, summarization, and knowledge gathering. Runs scheduled research tasks via cron. Posts results to a private Discord channel.

    Config:

  • Model: Qwen2.5-72B via Ollama (local — no API cost)
  • Session: isolated sessions triggered by cron
  • Memory: writes summaries to shared `~/research/` directory

Agent 3: Coding Agent (Peter)

    Handles code reviews, PR analysis, and automated refactoring tasks. Spawned on-demand by the main agent when needed.

    Config:

  • Model: Claude Sonnet 4.5 via API (for code quality)
  • Session: spawned as sub-agent, terminated after task
  • Tools: exec, edit, write, web_fetch

This is the setup most people on X are running when they say "no API costs" — they use local models for the background/autonomous agents and reserve API calls for the high-quality direct-interaction agent only.

    ---

    Step 1: Installing OpenClaw on Mac Mini

    Fresh macOS install. Start with the basics:

```bash
# Install Node.js (required)
brew install node

# Install OpenClaw globally
npm install -g openclaw

# Verify
openclaw --version
```

    Then configure your gateway:

```bash
openclaw gateway start
```

    This starts the persistent gateway daemon. On Mac, you'll want it to auto-start on login — add it to Login Items or create a launchd plist.

    Auto-start via launchd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.openclaw.gateway</string>
    <key>ProgramArguments</key>
    <array>
        <!-- If `which openclaw` shows a different path
             (e.g. /opt/homebrew/bin on Apple Silicon), use that here -->
        <string>/usr/local/bin/openclaw</string>
        <string>gateway</string>
        <string>start</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

    Save to `~/Library/LaunchAgents/com.openclaw.gateway.plist` and load:

```bash
launchctl load ~/Library/LaunchAgents/com.openclaw.gateway.plist
```

    ---

    Step 2: Installing Ollama for Local Models

    This is what eliminates the API bills for your background agents.

```bash
# Install Ollama
brew install ollama

# Start the service
brew services start ollama

# Pull your models
ollama pull qwen2.5:72b        # for research/analysis tasks
ollama pull qwen2.5-coder:32b  # for coding tasks (lighter, faster)
ollama pull llama3.3:70b       # general purpose
```

    The Mac Mini M4 with 16GB unified memory handles 7B-14B parameter models comfortably. For 32B+ models, you'll want the 24GB or 32GB variant. Qwen2.5-72B in 4-bit quantization runs on 32GB unified memory — this is the "no cloud" setup serious users are running.

    Memory recommendations:

  • 16GB: 7B-14B models — good for research summaries, light coding
  • 24GB: 32B models — solid for most agent tasks
  • 32GB: 70B+ models — near-API quality locally
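A rough way to sanity-check these tiers (a back-of-the-envelope sketch, not Ollama's exact accounting): weight memory is roughly parameter count × bits per weight ÷ 8, plus overhead for the KV cache and runtime.

```python
def approx_model_gib(params_b: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate: quantized weights plus ~20% for KV cache
    and runtime overhead. Real usage varies with context length."""
    weight_bytes = params_b * 1e9 * quant_bits / 8
    return weight_bytes * overhead / 2**30

for name, params, bits in [("qwen2.5:72b (4-bit)", 72, 4),
                           ("qwen2.5-coder:32b (4-bit)", 32, 4),
                           ("llama3.3:70b (4-bit)", 70, 4)]:
    print(f"{name}: ~{approx_model_gib(params, bits):.0f} GiB")
```

By this estimate a 4-bit 72B model lands around 40 GiB, which is why it needs the 32GB machine plus swap headroom rather than fitting a 16GB one.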

---

    Step 3: Configuring OpenClaw for Multiple Agents

    Each agent gets its own directory and config. The recommended structure:

```
~/.openclaw/
  workspace/            # Main agent (Sam)
    SOUL.md
    USER.md
    AGENTS.md
    MEMORY.md
    memory/
    .env
  agents/
    iris/               # Research agent
      SOUL.md
      AGENTS.md
      memory/
    peter/              # Coding agent
      SOUL.md
      AGENTS.md
      memory/
```

    In OpenClaw's config, register each agent with its model override:

```json
{
  "agents": [
    {
      "id": "main",
      "name": "Sam",
      "workspace": "~/.openclaw/workspace",
      "model": "anthropic/claude-sonnet-4-5"
    },
    {
      "id": "research",
      "name": "Iris",
      "workspace": "~/.openclaw/agents/iris",
      "model": "ollama/qwen2.5:72b"
    },
    {
      "id": "coding",
      "name": "Peter",
      "workspace": "~/.openclaw/agents/peter",
      "model": "ollama/qwen2.5-coder:32b"
    }
  ]
}
```

    The `model` field on individual agents is what OpenClaw 3.24's per-agent model selection feature enables — each agent picks its own model independently of the global default.
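Assuming OpenClaw resolves the override exactly as the config reads, the lookup amounts to this (a sketch: `model_for` is a hypothetical helper, not an OpenClaw function, and the config is inlined for self-containment):

```python
# Inlined copy of the agents config above (workspace paths omitted).
CONFIG = {
    "agents": [
        {"id": "main", "name": "Sam", "model": "anthropic/claude-sonnet-4-5"},
        {"id": "research", "name": "Iris", "model": "ollama/qwen2.5:72b"},
        {"id": "coding", "name": "Peter", "model": "ollama/qwen2.5-coder:32b"},
    ]
}

def model_for(agent_id: str, default: str = "anthropic/claude-sonnet-4-5") -> str:
    """Resolve an agent's model override, falling back to the global default."""
    for agent in CONFIG["agents"]:
        if agent["id"] == agent_id:
            return agent.get("model", default)
    raise KeyError(f"unknown agent: {agent_id}")

print(model_for("research"))  # ollama/qwen2.5:72b
```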

    ---

    Step 4: The Cost Control Layer

    Even with local models, your main agent still makes API calls. Control this with the built-in API cost management:

    In your main agent's config or AGENTS.md, set explicit limits on when to use API vs. local:

```
Model Selection Rules

- Direct user conversation → Claude Sonnet (API, quality matters)
- Background research tasks → Iris via sessions_send (local Qwen, free)
- Code review requests → Peter as sub-agent (local Qwen-Coder for drafts, Claude for final review)
- Heartbeat tasks → use local model if no user is present
- Cron jobs → always isolated sessions with a local model unless output quality is critical
```
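The rules read as a plain routing function (a sketch: model IDs come from the agent config earlier in the post, and the task names here are illustrative, not an OpenClaw API):

```python
API_MODEL = "anthropic/claude-sonnet-4-5"
LOCAL_MODEL = "ollama/qwen2.5:72b"
LOCAL_CODER = "ollama/qwen2.5-coder:32b"

def pick_model(task: str, user_present: bool = False) -> str:
    """Route a task to the API or a local model per the rules above."""
    if task == "chat":                # direct user conversation: quality matters
        return API_MODEL
    if task == "research":            # background research via Iris: free
        return LOCAL_MODEL
    if task == "code_review_draft":   # Peter drafts locally...
        return LOCAL_CODER
    if task == "code_review_final":   # ...Claude does the final pass
        return API_MODEL
    if task == "heartbeat":           # local unless a user is around
        return API_MODEL if user_present else LOCAL_MODEL
    if task == "cron":                # scheduled jobs stay local by default
        return LOCAL_MODEL
    return LOCAL_MODEL                # when in doubt, pick the free one

print(pick_model("heartbeat"))  # ollama/qwen2.5:72b
```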

    This hybrid approach — API for user-facing, local for background — is what gets you to ~$10-30/month in API costs vs. $150-400.

    ---

    Step 5: Keeping It Running 24/7

    The Mac Mini's biggest advantage over a laptop: it's designed to run continuously. But you still need to handle:

    Sleep prevention:

    ```bash

    # Prevent sleep while on power (set in System Settings → Energy Saver)

    # Or via CLI:

    sudo pmset -c sleep 0

    sudo pmset -c disksleep 0

    ```

    Remote access via Tailscale (never expose ports publicly):

```bash
brew install tailscale
# Connect to your tailnet, then access via the Tailscale IP
# Use `tailscale serve` for web UIs (tailnet-only, safe)
```

    Monitor your agents:

    OpenClaw's healthcheck skill gives you a full status dashboard. Run it via cron or ask your main agent: *"run a health check"* — it checks agent uptime, disk space, model availability, and API connectivity.

    ---

    Real Usage Numbers: A Week of Data

    Here's what our Mac Mini setup consumed in one week of real usage:

| Agent | Model | Sessions | Avg Tokens/Session | Weekly Cost |
|-------|-------|----------|--------------------|-------------|
| Sam (main) | Claude Sonnet 4.5 | 340 | 4,200 | ~$18 |
| Iris (research) | Qwen2.5-72B (local) | 85 | 12,000 | $0 |
| Peter (coding) | Qwen2.5-Coder-32B (local) | 42 | 8,500 | $0 |

    Total weekly API cost: ~$18 (~$72/month)

    Compare that to running all three agents on Claude Sonnet 4.5: ~$340/month. The local model stack cut the bill by 79%.
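The 79% figure falls straight out of the table:

```python
hybrid_monthly = 18 * 4   # ~$72/month: only Sam (main) hits the API
all_api_monthly = 340     # all three agents on Claude Sonnet 4.5
savings = 1 - hybrid_monthly / all_api_monthly

print(f"savings: {savings:.0%}")  # savings: 79%
```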

    The tradeoff is real: Qwen2.5-72B is excellent but not identical to Claude. For research summaries and background tasks, it's completely sufficient. For direct conversation and high-stakes decisions, Claude remains the right choice.

    ---

    Troubleshooting: The Most Common Issues

    Ollama models are slow on first load

    The model gets loaded into unified memory on first call and stays warm. Subsequent calls are fast. If you're hitting slowness, keep Ollama running (`brew services start ollama`) so models stay loaded.

    Agent loses track of which model it's using

    Check the per-agent model selection in config. If the agent reports using the wrong model, verify the `model` field in the agent config is respected — some older OpenClaw versions require a restart after config changes.

    Memory fills up after weeks of use

    Mac Mini M4 base has 256GB storage. Agent logs, memory files, and model weights (Qwen2.5-72B in 4-bit ≈ 45GB) add up. Set up a monthly cleanup cron:

```bash
# Clean agent memory files older than 30 days
# (`trash` comes from `brew install trash`; substitute `rm` if you prefer)
find ~/.openclaw/agents/*/memory -name "*.md" -mtime +30 -exec trash {} \;

# Prune unused Ollama models if needed
ollama rm <unused-model>
```
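To run the cleanup monthly, one option is a crontab entry (the script path `~/bin/openclaw-cleanup.sh` is a placeholder for wherever you save the commands above):

```shell
# crontab -e: run cleanup at 03:00 on the 1st of every month
0 3 1 * * $HOME/bin/openclaw-cleanup.sh >> $HOME/.openclaw/cleanup.log 2>&1
```

Since the gateway already runs under launchd, a LaunchAgent with `StartCalendarInterval` would be an equally idiomatic macOS choice.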

    The agent crashes when running 3 simultaneous tasks

    This is usually a memory pressure issue. Check `Activity Monitor → Memory Pressure`. If it's consistently red, either reduce concurrent sessions or upgrade to 24GB. OpenClaw's session concurrency can be limited in config.

    ---

    Summary: The Case for Local-First Multi-Agent

    The Mac Mini running three OpenClaw agents isn't an experiment — it's a production setup that people are running right now, with real cost savings and real reliability. The key principles:

    1. Local models for background agents — Qwen2.5 family is genuinely capable for non-user-facing work

    2. API for user-facing quality — don't compromise on the conversations that matter

    3. Hybrid cost control — route tasks to the right model, not the most expensive one

    4. 24/7 reliability — Mac Mini is designed for this; configure it properly and it just runs

    5. No exposed ports — Tailscale for remote access, never public-facing

    The complete setup guide — config files, SOUL.md templates for each agent role, cron job examples, and the full Tailscale remote access setup — is in the OpenClaw Setup Playbook.

    Also fully available in German. 🇩🇪

    Want to learn more?

    Our playbook contains 18 detailed chapters — available in English and German.

    Get the Playbook