OpenClaw Context Optimization

The context window is the hidden battleground of every AI conversation.

You ask a question. The model responds. Simple, right? But behind that exchange lies a complex dance of token budgets, memory persistence, and architectural trade-offs. Every message you send, every tool the assistant calls, every file it reads—all of it consumes a finite resource: the context window.

Let’s explore how OpenClaw tackles this challenge, from basic token hygiene to sophisticated memory architectures that give your assistant true recall.

What Counts Toward Context?

Everything. Literally everything sent to the model:

System prompt — Rules, tool definitions, skills list, time/runtime info
Conversation history — Every user message and assistant reply
Tool calls and results — Command output, file contents, API responses
Attachments — Images, documents, audio files
Compaction summaries — Condensed history entries
Provider metadata — Hidden wrappers you never see but still pay for

Modern models like Claude offer 200K token windows. Sounds generous until you realize:

A single file read of a 500-line codebase can consume 5,000+ tokens
Long-running sessions accumulate hundreds of thousands of tokens in history alone
Each image attachment might cost 1,000+ tokens

The challenge isn’t running out of space—it’s managing what enters that space intelligently.

OpenClaw’s Context Architecture

OpenClaw builds a custom system prompt for every agent run. Understanding what’s inside helps you optimize.

System Prompt Sections

Section	What It Contains	Token Impact
Tooling	Available tools + descriptions	High (varies by policy)
Safety	Guardrail reminders	Low
Skills	Compact list with paths	Medium
Workspace	Working directory location	Low
Workspace Files	Injected bootstrap files	Variable
Current Date/Time	User-local timezone	Low
Runtime	Host, OS, model info	Low

Prompt Modes

OpenClaw uses different prompt sizes for different scenarios:

┌─────────────────────────────────────────────────┐
│  Mode         │  Use Case        │  Sections    │
├─────────────────────────────────────────────────┤
│  full         │  Main agent      │  All         │
│  minimal      │  Sub-agents      │  Core only   │
│  none         │  Special cases   │  Identity    │
└─────────────────────────────────────────────────┘

This is lazy loading in action. Sub-agents get a minimal prompt because they don’t need heartbeat management, memory recall, or self-update capabilities. The main agent shoulders that overhead so spawned workers can stay focused.

:::tip[Key Insight] Skills are listed in the prompt but their instructions aren’t loaded. When a task matches a skill, the model reads the SKILL.md file on-demand. This saves ~400-500 tokens per skill versus including everything upfront. :::

Techniques for Token Reduction

1. Bootstrap File Truncation

OpenClaw injects workspace files every turn:

AGENTS.md   → Behavior rules
SOUL.md     → Agent persona
TOOLS.md    → Tool notes
USER.md     → Human context
MEMORY.md   → Long-term memories

These files have caps:

// Default configuration
agents.defaults.bootstrapMaxChars: 20000       // Per-file
agents.defaults.bootstrapTotalMaxChars: 150000 // Total

Large files get truncated with visible markers:

[truncated: output exceeded context limit]

Best practice: Keep bootstrap files concise. MEMORY.md is injected every turn—it should be curated wisdom, not raw dumps.

2. Compaction: Summarize and Persist

When history grows too large, OpenClaw summarizes older turns into a compact entry:

[compacted: Previous discussion about React component architecture. 
Key decisions: Use Server Components, implement error boundaries 
at route level, defer data fetching to Suspense boundaries.]

Crucially, compaction persists. The summary is written to the session’s JSONL history file. Future sessions can still reference it.

:::warning[Compaction ≠ Pruning]

Compaction rewrites history files with summaries
Pruning temporarily trims tool results per request (doesn’t persist) :::

3. Session Pruning

Pruning targets old tool results without touching conversation history:

agents.defaults.contextPruning: {
  mode: "cache-ttl",
  ttl: "5m",
  keepLastAssistants: 3,
  softTrimRatio: 0.3,    // Trim 30% when soft threshold hit
  hardClearRatio: 0.5,   // Clear 50% when hard threshold hit
  minPrunableToolChars: 50000
}

What gets trimmed:

Only toolResult messages with large outputs
User and assistant messages are never modified
Tool results with images are preserved

4. Prompt Caching

Caching reduces costs when prompts repeat:

agents.defaults.models: {
  "anthropic/claude-opus-4-6": {
    params: {
      cacheRetention: "short"  // 5 minutes
    }
  }
}

Retention	TTL	Use Case
`none`	Disabled	Bursty/notification agents
`short`	5 minutes	Cost-first baseline
`long`	1 hour	Always-on sessions

Pair caching with heartbeat intervals that keep the cache warm:

agents.defaults.heartbeat: {
  every: "55m"  // Trigger before cache expires
}

The True-Recall Memory System

This is where it gets interesting.

Traditional AI memory is brittle—everything lives in the context window, and once it scrolls off, it’s gone. True-Recall solves this with a three-tier architecture:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Redis Buffer  │────▶│  LLM Curator    │────▶│ Qdrant Vectors  │
│  (24hr TTL)     │     │  (Gems Extract) │     │ (Long-term)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘

True-Recall Architecture

How It Works

Tier 1: Redis Buffer

Every conversation turn is captured to Redis with a 24-hour TTL:

Key: mem:{user_id}
Value: [{ role: "user", content: "..." }, ...]
TTL: 86400 seconds

This gives you a rolling window of recent context—immediately available, automatically expiring.

Tier 2: LLM Curation

A daily job (3:30 AM default) processes the Redis buffer through an LLM. But instead of storing everything, it extracts “memory gems”—the actually memorable stuff:

# What counts as a gem?
- Decisions made: "Chose React over Vue for the dashboard"
- Preferences discovered: "Prefers concise summaries over detailed explanations"
- Facts worth remembering: "Birthday is March 15th"
- Projects context: "Currently building a memory system combining Qdrant + Redis"

Only gems with 0.6+ confidence score get persisted. A hundred turns of casual chat might yield 2-3 gems.

Tier 3: Qdrant Vector Storage

Gems are embedded (using mxbai-embed-large, 1024 dimensions) and stored in Qdrant:

collectionName: "true_recall"
embeddingModel: "mxbai-embed-large"
vectorSize: 1024

On next session start, relevant memories are retrieved via semantic search and injected into context.

The Configuration

plugins: {
  entries: {
    memory-qdrant: {
      config: {
        autoRecall: true,       // Auto-inject relevant memories
        collectionName: "true_recall",
        embeddingModel: "mxbai-embed-large",
        maxRecallResults: 2,    // Limit injection size
        minRecallScore: 0.7     // Similarity threshold
      }
    }
  }
}

:::tip[Why This Matters] The LLM curator is the secret sauce. Most memory systems store everything verbatim—which means they store noise. True-Recall filters for signal. The memories that persist are the ones worth persisting. :::

Tiered Memory: The Full Picture

OpenClaw’s memory architecture has four tiers, each with different properties:

Session Context  ←────────────────────────────────────┐
(Volatile, in-window)                                  │
                                                       │
         ┌─────────────────────────────────────────────┘
         ▼
Daily Files (memory/YYYY-MM-DD.md)
(On-demand, not injected)
         │
         ▼
MEMORY.md
(Long-term, injected every turn)
         │
         ▼
Qdrant/Redis
(Semantic search, on-demand recall)

Tier	Persistence	Token Cost	Access Pattern
Session	Lost on close	High (in window)	Immediate
Daily Files	Persisted	None (on-demand)	Explicit read
MEMORY.md	Persisted	Medium (injected)	Always visible
Vector DB	Persisted	Low (queried)	Semantic search

Best Practices for Users

1. Write to Disk, Not to RAM

When someone says “remember this”—write it to a file. Mental notes don’t survive session restarts.

Decisions → MEMORY.md
Daily notes → memory/YYYY-MM-DD.md
Lessons learned → AGENTS.md

2. Keep MEMORY.md Curated

MEMORY.md is injected every turn. Every character costs tokens. Treat it like a well-edited journal, not a raw dump:

# Good: Curated and actionable
## Decisions
- 2026-02-22: Chose mxbai-embed-large for embeddings

## Preferences
- Prefers concise responses
- Uses Telegram for primary communication

# Bad: Raw conversation dump
[5000 words of chat logs...]

3. Use Heartbeats for Maintenance

Heartbeats run periodically without user interaction. Use them for cleanup:

# HEARTBEAT.md
- Check for stale memory files (> 7 days)
- Review recent daily notes for distillation
- Clean up old session files

Configure interval based on usage:

agents.defaults.heartbeat: {
  every: "30m"  // 2-4 times per day
}

4. Audit Regularly

# Check context size
/context list

# Deep breakdown
/context detail

# Manual cleanup
/compact Focus on decisions and open questions

5. Respect Group Privacy

In group contexts, MEMORY.md is never injected. Your assistant remembers what’s shared in that group—nothing more. This is by design.

Practical Checklist

Daily

Check /context list if the session feels slow
Write important decisions to MEMORY.md
Log activity to memory/YYYY-MM-DD.md

Weekly

Audit MEMORY.md size (keep under 10KB)
Clean up old daily files
Remove unused skills
Review tool schema sizes with /context detail

Configuration Tuning

Set cacheRetention based on usage pattern
Enable contextPruning for long sessions
Configure heartbeat intervals for cache warmth
Tune memory injection thresholds

The Bottom Line

Context optimization isn’t about cramming more into the window—it’s about being intentional about what enters.

OpenClaw’s approach combines:

Lazy loading — Skills load on-demand, not upfront
Tiered memory — Different persistence levels for different needs
Smart compaction — Summarize and persist, don’t just truncate
Curation — LLM-powered filtering separates signal from noise

The result: An assistant that remembers what matters, forgets what doesn’t, and stays sharp across months of conversation.

:::tip[Final Thought] Context is the scarcest resource in AI interactions. Every token you save is a token available for what actually matters. Optimize accordingly. :::

OpenClaw Context Optimization

What Counts Toward Context?

OpenClaw’s Context Architecture

System Prompt Sections

Prompt Modes

Techniques for Token Reduction

1. Bootstrap File Truncation

2. Compaction: Summarize and Persist

3. Session Pruning

4. Prompt Caching

The True-Recall Memory System

How It Works

The Configuration

Tiered Memory: The Full Picture

Best Practices for Users

1. Write to Disk, Not to RAM

2. Keep MEMORY.md Curated

3. Use Heartbeats for Maintenance

4. Audit Regularly

5. Respect Group Privacy

Practical Checklist

Daily

Weekly

Configuration Tuning

The Bottom Line

Anthony Lattanzio

Comments

What Counts Toward Context?

OpenClaw’s Context Architecture

System Prompt Sections

Prompt Modes

Techniques for Token Reduction

1. Bootstrap File Truncation

2. Compaction: Summarize and Persist

3. Session Pruning

4. Prompt Caching

The True-Recall Memory System

How It Works

The Configuration

Tiered Memory: The Full Picture

Best Practices for Users

1. Write to Disk, Not to RAM

2. Keep MEMORY.md Curated

3. Use Heartbeats for Maintenance

4. Audit Regularly

5. Respect Group Privacy

Practical Checklist

Daily

Weekly

Configuration Tuning

The Bottom Line

Get Early Access

Anthony Lattanzio

Comments