All posts
2026-03-25 · 9 min

Fixing Context Rot in OpenClaw Agents: Keeping Your Agent's Memory Sharp

Memory · Context · ContextEngine · Performance · OpenClaw

What Context Rot Is and Why It Hits Everyone

A tweet from last night sums it up well:

> *"The reason it forgets tasks and hallucinates isn't actually a model intelligence issue, it's an architecture problem. By default, OpenClaw (like most agents right now) just appends everything into a giant context window or a flat vector database. Eventually, it hits 'context rot'"*

That's precise. And it happens slowly enough that most people only notice it after weeks.

The symptoms:

  • The agent gives increasingly vague answers to questions it used to answer precisely
  • It "forgets" preferences or conventions that are written in MEMORY.md
  • Tool calls are more frequently wrong or get skipped entirely
  • Response times increase (the context window gets heavier to process)
  • Hallucinations increase — the agent invents facts instead of admitting it doesn't know

The cause: when every session interaction gets appended to a growing history, the model eventually loses the thread. LLMs have an "attention budget." The longer the context, the worse the precision on older entries; this is the well-documented "Lost in the Middle" effect.

    OpenClaw's version 2026.3.7-beta.1 introduced a direct solution: the ContextEngine plugin interface.

    ---

    Why the Default Setup Leads to Context Rot

    In a fresh OpenClaw setup the architecture is simple: every message gets appended to the session history. That works fine for the first few weeks.

    The problem is accumulation:

```
Session 1:    500-token history
Session 10:   5,000-token history
Session 100:  50,000-token history
```

    Add to that MEMORY.md (loaded at every main session), daily notes, SOUL.md, USER.md — all of that gets pumped into the context window.

    With Claude Opus and its 200K token context window, that sounds fine. But:

    1. Cost: 50,000 token input × 48 heartbeats daily = 2.4 million tokens — in one day, just for context

    2. Quality: LLMs perform worse with very long contexts. Information at the beginning gets effectively "overlooked"

    3. Conflicts: MEMORY.md from today can contradict MEMORY.md from three months ago — the agent doesn't know which version is current
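The arithmetic in point 1 is worth making explicit, because it compounds silently. A two-line sketch using the numbers from the text (heartbeat frequency is an assumption about the example setup):

```python
# Cost arithmetic from point 1: a fixed per-call context size,
# multiplied across every heartbeat, adds up to millions of tokens.
context_tokens = 50_000    # tokens sent as input on every heartbeat
heartbeats_per_day = 48    # e.g. one heartbeat every 30 minutes
daily_input = context_tokens * heartbeats_per_day
print(daily_input)  # 2400000 input tokens per day, just for context
```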

    In short: a bloated context window is not a sign of memory. It's a sign of chaos.

    ---

    The Solution: Three-Tier Memory

    The cleanest solution comes from computer architecture, applied to agents. Just as an operating system separates RAM, cache, and disk, we separate agent memory into three tiers:

    Tier 1: Core Memory (always in context)

    Small, dense information available on every turn:

  • SOUL.md — personality and core rules (200-400 words)
  • USER.md — who the agent helps (under 500 words)
  • Current task / active projects (from HEARTBEAT.md, first 10 items only)

This tier must never exceed ~2,000 tokens. It's the agent's "RAM."
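A budget like this is easy to enforce mechanically. A minimal sketch: the ~4-characters-per-token ratio is a rough heuristic (not an exact tokenizer), and the file contents are illustrative:

```python
# Rough Tier 1 budget check: estimate tokens for the always-loaded
# core files and flag when they exceed the ~2,000-token budget.
CORE_BUDGET_TOKENS = 2000

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def check_core_budget(core_files: dict[str, str]) -> tuple[int, bool]:
    """core_files maps filename -> content. Returns (total, within_budget)."""
    total = sum(estimate_tokens(content) for content in core_files.values())
    return total, total <= CORE_BUDGET_TOKENS

total, ok = check_core_budget({
    "SOUL.md": "Personality and core rules. " * 60,
    "USER.md": "Who the agent helps. " * 80,
})
print(total, ok)
```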

    Tier 2: Recall Memory (searchable, not blindly injected)

    All session history, daily notes, past cron results — but not automatically in context. Only when relevant:

```
# Instead of: loading all daily files into context
# Better: only on explicit need
memory_search("what did we decide about deployment last week?")
# → Returns only the relevant snippets, not everything
```

    OpenClaw's `memory_search` tool does exactly this: semantic search over all memory files, returns top matches. That's ~500-1,000 tokens instead of 50,000.
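The internals of `memory_search` aren't shown here, but the shape of the operation is simple to sketch. In this toy version, keyword overlap stands in for the semantic scoring a real implementation would do with embeddings:

```python
# Toy stand-in for a memory_search-style tool: score stored snippets
# against a query and return only the top-k matches, not the archive.
def score(query: str, snippet: str) -> float:
    """Fraction of query words that appear in the snippet (toy scorer)."""
    q = set(query.lower().split())
    s = set(snippet.lower().split())
    return len(q & s) / len(q) if q else 0.0

def memory_search(query: str, snippets: list[str], k: int = 3) -> list[str]:
    """Return the k best-matching snippets with any overlap at all."""
    ranked = sorted(snippets, key=lambda s: score(query, s), reverse=True)
    return [s for s in ranked[:k] if score(query, s) > 0]

notes = [
    "deployment: we decide releases go out via blue-green",
    "lunch order preferences updated",
    "deployment rollback procedure documented",
]
print(memory_search("what did we decide about deployment", notes, k=2))
```

The point is the return shape: a handful of relevant snippets instead of the full history.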

    Tier 3: Archival Memory (for deep search)

    Older information that's rarely needed: decisions from 6 months ago, completed projects, expired cron configurations. The agent can access this when explicitly searching for it — but it's never auto-loaded.

    In practice: anything older than 30 days gets moved from daily notes to a compressed archive file.
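The 30-day rule can be automated. A sketch under assumptions: the paths and the per-month `YYYY-MM.md` archive naming are illustrative, and the "compression" step (summarizing before archiving) is left out:

```python
# Sketch of the 30-day archival rule: move stale daily notes out of
# memory/ and append them to a per-month archive file.
import re
from datetime import date, timedelta
from pathlib import Path

def archive_old_notes(memory_dir: Path, max_age_days: int = 30) -> list[str]:
    """Append daily notes older than the cutoff to archive/YYYY-MM.md."""
    cutoff = date.today() - timedelta(days=max_age_days)
    archive_dir = memory_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    moved = []
    for note in sorted(memory_dir.glob("*.md")):
        m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})\.md", note.name)
        if not m:
            continue  # skip MEMORY.md and other non-daily files
        note_date = date(int(m[1]), int(m[2]), int(m[3]))
        if note_date < cutoff:
            target = archive_dir / f"{m[1]}-{m[2]}.md"
            with target.open("a") as f:
                f.write(f"\n## {note.name}\n{note.read_text()}")
            note.unlink()
            moved.append(note.name)
    return moved
```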

    ---

    Concrete Implementation: What We Changed

    Here's exactly what we did in our setup to fix context rot:

    Step 1: Compress and Regularly Clean MEMORY.md

    The biggest quick win: MEMORY.md had grown over months and contained stale information.

```bash
# How big is MEMORY.md right now?
wc -w ~/.openclaw/workspace/MEMORY.md
# → 8,432 words — way too much

# Goal: under 1,000 words
# Process: what's still current? What can go?
```

    We introduced a monthly cron job:

```
Schedule: 0 9 1 * * (first day of month, 9 AM)

Prompt:
Run a MEMORY.md cleanup:
1. Read MEMORY.md completely
2. Check each entry: is it still relevant? Still current?
3. Remove entries that:
   - Are older than 3 months and aren't permanent context
   - Are about completed projects
   - Have been superseded by newer entries
4. Condense related entries into compact summaries
5. Write a new, compressed MEMORY.md (target: under 800 words)
6. Archive the old version as memory/archive/YYYY-MM.md
```

    After the first run: from 8,432 to 743 words. No relevant information lost — but 90% less token load.

    Step 2: Load Daily Notes Selectively

    The default behavior in AGENTS.md loads the last 2 days of notes:

```markdown
3. Read memory/YYYY-MM-DD.md (today + yesterday) for recent context
```

    That's reasonable. But for agents with many daily entries (Sam: ~2,000 words per day), that quickly becomes 4,000 tokens just for the last two days.

    Our solution: more structured daily notes with clear sections, so the agent only loads the relevant parts:

```markdown
# memory/2026-03-25.md

## Active Tasks (ALWAYS READ)
- PR #247 waiting for review
- Dimitrios needs monthly report by Friday

## Decisions Today (ONLY WHEN RELEVANT)
- sam/fix-auth-flow merged (14:23)
- ClickUp task TC-89 set to "In Review"

## Detailed Logs (ONLY ON REQUEST)
[full details...]
```

    The agent only loads the "Active Tasks" section automatically — the rest only when explicitly searching for it.
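Extracting one section from a structured note is a few lines of code. A sketch, assuming the sections are marked with `##` headings as in the example daily note:

```python
# Selective loading: pull only one "## Heading" section out of a daily
# note instead of injecting the whole file into context.
def load_section(note_text: str, heading: str) -> str:
    """Return the body of the first '## heading…' section, or '' if absent."""
    out, capturing = [], False
    for line in note_text.splitlines():
        if line.startswith("## "):
            capturing = line[3:].strip().startswith(heading)
            continue
        if capturing:
            out.append(line)
    return "\n".join(out).strip()

note = """# memory/2026-03-25.md

## Active Tasks (ALWAYS READ)
- PR #247 waiting for review

## Detailed Logs (ONLY ON REQUEST)
[full details...]
"""
print(load_section(note, "Active Tasks"))
```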

    Step 3: Respect Session Boundaries

    The underestimated problem: very long individual sessions accumulate more and more context. Every message becomes history that gets sent with the next turn.

    For cron jobs: always use `sessionTarget: "isolated"`. Isolated sessions start without history overhead.

    For main sessions (direct chats): explicitly start new sessions on large context switches:

```
# Instead of one eternal session
# When switching from "project planning" to "code review":
/restart   # OpenClaw starts a new session with fresh context
# (the important info is in MEMORY.md — that gets reloaded fresh)
```

    This sounds counter-intuitive (losing context?), but the gain outweighs the cost: the new session reads MEMORY.md fresh and has full "attention focus."

    Step 4: Configure ContextEngine (OpenClaw ≥ 2026.3.7)

    With the ContextEngine update comes an explicit interface for these configurations. In `openclaw.json` or the agent-specific config:

```json
{
  "contextEngine": {
    "strategy": "tiered",
    "coreMemoryMaxTokens": 2000,
    "recallMemorySearchK": 5,
    "archiveAfterDays": 30,
    "compactSessionHistoryAfterTurns": 20,
    "hooks": {
      "onIngest": "scoped",
      "onAssemble": "progressive",
      "onCompact": "summarize"
    }
  }
}
```

    What the hooks mean:

  • `onIngest: "scoped"`: New information is written only to relevant memory tiers, not blindly to everything
  • `onAssemble: "progressive"`: Context is assembled progressively — core memory first, then recall only if needed
  • `onCompact: "summarize"`: When session history gets too long, it's summarized rather than truncated

Important: the ContextEngine interface is available from version 2026.3.7-beta.1. Check your version with `openclaw --version`.
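To make the `progressive` idea concrete, here is a sketch of what such an assembly step amounts to: core memory goes in unconditionally, recall snippets only while a token budget remains. The budget numbers and the 4-chars-per-token heuristic are illustrative, not ContextEngine internals:

```python
# Progressive context assembly: core memory first and always, then
# recall snippets (assumed pre-sorted by relevance) until the budget
# is exhausted.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def assemble_context(core: list[str], recall: list[str], budget: int) -> list[str]:
    """Return core unconditionally, plus as much recall as fits."""
    context = list(core)
    used = sum(estimate_tokens(t) for t in core)
    for snippet in recall:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break  # stop adding recall once the budget is spent
        context.append(snippet)
        used += cost
    return context

ctx = assemble_context(
    core=["SOUL.md rules", "USER.md profile"],
    recall=["relevant decision from last week", "older, less relevant log"],
    budget=15,
)
print(ctx)
```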

    ---

    What Changed After the Switch

    We rolled out these changes across our 6-agent setup two weeks ago. Measurable results:

    Token consumption: -58% (from ~180,000 tokens daily to ~76,000 — at the same task load)

    Response quality: Significantly more precise on questions about information more than 3 days old. Before: vague or wrong. After: correct and citing the memory files.

    Hallucination rate: Dropped sharply. Mainly because the agent now rarely "guesses" — instead it actively searches memory files first.

Costs: -58% on input tokens translates, with our model mix (mainly Sonnet), to a monthly saving of ~€85. Directly measurable in the Anthropic Console.

    ---

    The "HyperStack" Community Solution

    In the Reddit community r/ClaudeCode, another approach has been going viral: "HyperStack" — a community project specifically for OpenClaw.

    Instead of dumping the full conversation history, HyperStack stores knowledge in structured "cards" — similar to index cards:

```json
{
  "cards": [
    {
      "id": "arch-001",
      "topic": "GitHub Workflow",
      "content": "Never push directly to main. Branch from dev, PRs target dev.",
      "lastUpdated": "2026-03-10",
      "relevanceScore": 0.94
    },
    {
      "id": "pref-003",
      "topic": "Dimitrios Preferences",
      "content": "Prefers short updates. No long explanations unless asked.",
      "lastUpdated": "2026-02-28",
      "relevanceScore": 0.87
    }
  ]
}
```

    When assembling context, only the top-N cards by relevance are included — hybrid search (semantic + keyword).
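A toy version of that selection step, with loud caveats: the 50/50 weighting and the keyword scorer are illustrative assumptions, and a real hybrid search would use embeddings for the semantic half:

```python
# HyperStack-style card selection sketch: combine a keyword-match score
# with the stored relevanceScore, keep only the top-N cards.
def keyword_score(query: str, card: dict) -> float:
    q = set(query.lower().split())
    text = set((card["topic"] + " " + card["content"]).lower().split())
    return len(q & text) / len(q) if q else 0.0

def select_cards(query: str, cards: list[dict], n: int = 2) -> list[dict]:
    """Rank by 0.5 * keyword match + 0.5 * stored relevanceScore."""
    def hybrid(card: dict) -> float:
        return 0.5 * keyword_score(query, card) + 0.5 * card["relevanceScore"]
    return sorted(cards, key=hybrid, reverse=True)[:n]

cards = [
    {"id": "arch-001", "topic": "GitHub Workflow",
     "content": "Never push directly to main.", "relevanceScore": 0.94},
    {"id": "pref-003", "topic": "Dimitrios Preferences",
     "content": "Prefers short updates.", "relevanceScore": 0.87},
]
print(select_cards("how should i push to github", cards, n=1)[0]["id"])
```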

This is a valid approach, but more involved to set up than the native OpenClaw route of `memory_search` plus a tiered structure. For teams accumulating very large amounts of knowledge (>10,000 facts), HyperStack may be the better choice.

    ---

    Quick Check: How Severe Is Your Context Rot?

    Here's a simple test to understand how bad the problem is for you:

```bash
# 1. How large is your MEMORY.md?
wc -w ~/.openclaw/workspace/MEMORY.md
# > 2,000 words: clean it urgently
# 500-2,000 words: acceptable range
# < 500 words: good

# 2. How many daily note files exist?
ls memory/20*.md | wc -l
# > 60 files without archiving: too many
# Everything over 30 days should be archived

# 3. How large are the daily notes on average?
wc -w memory/2026-03-*.md | tail -1
# > 1,000 words/day: too much detail in notes
# 300-600 words/day: reasonable

# 4. Test: ask the agent about something it did 2 weeks ago
# Does it give a precise answer with concrete details?
# → Yes: no acute problem
# → Vague or wrong: context rot is active
```

    ---

    The Principle Behind Everything: Scoped Memory Injection

    The core principle connecting all these measures:

    Don't blindly inject everything into every prompt. Inject only what's relevant for the current turn.

    This sounds simple. It is simple. But it requires actively overriding the default configuration — because "always load everything" is the safe default (you never lose anything), but not the good one.

    The three-tier structure — core always, recall on-demand, archive on-demand — is the practical implementation of this principle.

    If you don't have this in your setup yet: start with MEMORY.md. Reduce it to under 800 words. The difference will be immediately noticeable.

    ---

    The complete setup — ContextEngine configuration, MEMORY.md cleanup as a cron job, the three-tier memory architecture for all 6 agents, and the exact prompts for monthly cleanup — is documented in the OpenClaw Setup Playbook.

    18 chapters, based on real production experience.

    Fully available in German too. 🇩🇪

    Want to learn more?

    Our playbook contains 18 detailed chapters — available in English and German.

    Get the Playbook