Why Your OpenClaw Agent Is Spending More Than You Think (And How to Fix It)
OpenClaw costs can run $0 to $200+/month depending on how you configure it. Here's what's burning your tokens — and the fixes that actually work.
Run OpenClaw on Ollama with local models: $0. Run it on Claude Sonnet with autonomous task loops and no rate limits: $150–200+ is realistic.
The #1 thing that makes costs spike isn't how much you use your agent — it's agentic loops: tasks where your agent calls a tool, gets a result, calls another tool based on that result, and repeats. Each step burns tokens on both input (accumulated context) and output. A single 10-step loop with Claude Sonnet can cost more than 50 simple single-turn queries.
Most OpenClaw users don't realize this is happening until they check their Anthropic or OpenRouter dashboard and see a number they didn't expect.
The Real Cost Breakdown
Let's put actual numbers on this. OpenClaw costs come from three places: your AI provider, your infrastructure, and how you've configured your agent to behave.
Ollama (Local Models) — $0/month
If you're running models locally via Ollama, your API cost is zero. Models like Llama 3.1, Mistral, and Phi-3 run entirely on your hardware. The cost is your electricity and compute — negligible for most users. The tradeoff is capability: local models are slower and less powerful than Claude or GPT-4 for complex reasoning tasks.
Best for: lightweight tasks, file reads, simple lookups, anything that doesn't require heavy reasoning.
Cloud API Providers — $10–$200+/month
The per-token cost varies enormously by model:
- Claude Haiku 3.5: ~$0.80 per million input tokens / $4 per million output tokens
- Claude Sonnet 3.7: ~$3 per million input tokens / $15 per million output tokens
- GPT-4o mini: ~$0.15 per million input tokens / $0.60 per million output tokens
- GPT-4o: ~$2.50 per million input tokens / $10 per million output tokens
A typical single-turn conversation (one prompt, one response) costs fractions of a cent. That's not the problem. The problem is what agentic workflows do to that math.
What Agentic Loops Actually Cost
When OpenClaw runs an autonomous task — say, "research this topic and write a summary" — it doesn't make one API call. It makes many. Each step adds the previous conversation to the context window. By step 8 of a research loop, your input tokens might be 10,000+ just from accumulated context. Then it outputs. Then repeats.
Example: A 12-step research-and-write task using Claude Sonnet, starting with a 2,000-token context that grows by ~1,500 tokens per step:
- Total input tokens: ~123,000 (growing context across 12 calls)
- Total output tokens: ~18,000
- Cost: ~$0.64 — for one task that took 3 minutes
That's not catastrophic. But run 5 tasks like that per day for a month and you're at $95+ just from those tasks. Add memory writes, tool calls, and background cron jobs — and $150+ is easy.
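The arithmetic follows directly from the example's stated parameters and is easy to check in a few lines. Prices are Sonnet's rates from the table above; the loop shape (linear context growth, fixed output per step) is the example's assumption, not measured OpenClaw behavior:

```python
# Estimate the cost of an agentic loop whose context accumulates each step.
SONNET_INPUT = 3.00 / 1_000_000    # $ per input token (Claude Sonnet)
SONNET_OUTPUT = 15.00 / 1_000_000  # $ per output token

def loop_cost(steps, start_ctx, growth_per_step, output_per_step):
    # Call k sees the starting context plus everything accumulated so far,
    # so input tokens grow linearly across the loop.
    input_tokens = sum(start_ctx + growth_per_step * k for k in range(steps))
    output_tokens = output_per_step * steps
    dollars = input_tokens * SONNET_INPUT + output_tokens * SONNET_OUTPUT
    return input_tokens, output_tokens, dollars

inp, out, cost = loop_cost(steps=12, start_ctx=2_000,
                           growth_per_step=1_500, output_per_step=1_500)
print(inp, out, round(cost, 2))  # 123000 18000 0.64
```

Changing `steps` is the fastest way to see why step ceilings matter: cost grows roughly quadratically with loop length, because every extra step also re-sends everything before it.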
VPS + API Combined Scenarios
If you're running OpenClaw on a VPS (DigitalOcean, Hetzner, etc.), add that cost too:
- Minimal VPS (2GB RAM): ~$6–12/month
- With Ollama + decent performance (8GB RAM): ~$20–40/month
- Full agent stack, heavy usage (16GB RAM): ~$48–80/month
A mid-range setup — decent VPS + moderate Sonnet usage — lands you at $70–120/month without any red flags. Heavy usage pushes past $200.
The 4 Things That Spike Your Bill Without Warning
These are the patterns we see most often in OpenClaw setups that generate unexpectedly high bills. None of them are obvious from the default configuration.
1. Autonomous Task Loops With No Exit Condition
OpenClaw is designed to run autonomously. That's the point. But "autonomous" without a ceiling means an agent can loop indefinitely if it hits an ambiguous state — or if it's overly thorough. A task that should take 5 steps can stretch to 20 if the model keeps deciding there's more to do.
The fix: Set explicit max-step limits in your agent configs. A task shouldn't be able to call tools more than 10–15 times without a human checkpoint.
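As a sketch, a step ceiling is just a bounded loop around the model's act-observe cycle. Everything here (function names, action shapes, the limit value) is illustrative, not OpenClaw's actual API:

```python
MAX_STEPS = 15  # hard ceiling per task; tune to your workloads

def run_task(task, call_model, call_tool):
    """Run an agentic loop, but refuse to exceed MAX_STEPS tool calls."""
    context = [task]
    for step in range(MAX_STEPS):
        action = call_model(context)       # model decides the next action
        if action["type"] == "finish":
            return action["result"]
        context.append(call_tool(action))  # tool result grows the context
    # Out of budget: surface to a human instead of looping forever.
    raise RuntimeError(f"Task exceeded {MAX_STEPS} steps; needs a human checkpoint")
```

The important property is that the failure mode is loud: a task that hits the ceiling raises instead of silently burning tokens.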
2. Memory Writes That Compound Over Time
Every time OpenClaw writes to memory (persisting facts about you, your preferences, project context), it often reads memory first to avoid duplication. That means memory writes involve both read tokens (loading existing memory) and write tokens (the new content). If your agent is memory-heavy and runs frequently, you're paying for memory operations on every session.
The fix: Audit what your agent is actually persisting. Memory should store high-value, frequently-reused context — not ephemeral task notes that get read once and never used again.
3. Tool Calls That Cascade Into More Tool Calls
Web search is the classic example. Your agent searches for something. The results mention something interesting. The agent decides to search for that too. Now you have 3 searches instead of 1 — and each search result gets added to the context window before the next call.
This isn't a bug. The agent is being thorough. But without constraints, "thorough" becomes expensive fast.
The fix: Set tool call budgets per task. Web search should have a max invocation count per session. Same for any tool that generates large outputs.
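A per-tool budget can be as simple as a counter consulted before each invocation. This is an illustrative sketch, not part of OpenClaw:

```python
class ToolBudget:
    """Cap how many times each tool may run in one session."""
    def __init__(self, limits):
        self.limits = dict(limits)  # e.g. {"web_search": 3}
        self.used = {}

    def allow(self, tool):
        # Count the attempt, then check it against the tool's limit.
        self.used[tool] = self.used.get(tool, 0) + 1
        return self.used[tool] <= self.limits.get(tool, float("inf"))

budget = ToolBudget({"web_search": 3})
print([budget.allow("web_search") for _ in range(5)])  # [True, True, True, False, False]
```

When `allow` returns False, the agent should be told the budget is exhausted and asked to finish with what it has, rather than having the call fail opaquely.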
4. Poorly Scoped Context Windows
The context window is everything the model sees when it generates a response. If you're loading large files, long conversation histories, or extensive memory into context on every call — even when that information isn't needed — you're paying for tokens that don't contribute to the output.
Context window bloat is silent. You won't see it unless you're actively inspecting token counts.
The fix: Use summarization for long histories. Load files selectively, not in bulk. Set a context cap that forces pruning before every call.
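A minimal sketch of summarize-on-overflow pruning, assuming token counts can be approximated by word counts and that you have some summarizer to hand (e.g. a cheap-model call):

```python
def prune_context(messages, summarize, max_tokens=20_000,
                  estimate=lambda m: len(m.split())):
    """Collapse older messages into one summary once the context exceeds a cap.

    `summarize` is whatever summarizer you use; token counts are crudely
    approximated by word counts here.
    """
    if sum(estimate(m) for m in messages) <= max_tokens or len(messages) <= 4:
        return messages
    recent = messages[-4:]                       # keep the newest turns verbatim
    summary = summarize(" ".join(messages[:-4])) # compress everything older
    return [summary] + recent
```

The number of verbatim turns to keep (4 here) is a tuning knob: more preserves fidelity, fewer saves tokens.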
How to Audit Your Current Spend
Before you change anything, you need to know where your money is actually going. Here's how to find out.
Step 1: Check Your Provider Dashboard
- Anthropic: console.anthropic.com → Usage. Shows daily token usage by model. Look for spikes — specific days where usage jumped.
- OpenRouter: openrouter.ai/activity. Per-request logs with token counts and costs. You can filter by model and date range.
- OpenAI: platform.openai.com → Usage. Same pattern — daily breakdown by model.
What you're looking for: consistent high usage (probably fine — that's normal load) vs. irregular spikes (indicates runaway loops or unexpected autonomous tasks).
Step 2: Inspect OpenClaw Logs
OpenClaw logs each session with model and token information. Check recent logs:
# View recent session logs
ls -lt ~/.openclaw/logs/ | head -20
# Search for token usage across recent logs
grep -r "tokens" ~/.openclaw/logs/ | tail -100
# Find the most expensive recent sessions
grep -r "total_tokens" ~/.openclaw/logs/ | sort -t: -k3 -rn | head -20
Sessions with high total_tokens that you don't remember initiating are the ones to investigate. Check the corresponding log file for what the agent actually did.
Step 3: Set Spend Limits Before You Change Anything Else
Before tuning your config, put a ceiling in place so you're protected while you experiment:
- Anthropic Console: Settings → Limits → Set monthly spend cap ($25–50 is a good starting point)
- OpenRouter: API Keys → Edit key → Set credit limit per key
A spend cap won't fix the underlying issue, but it means a misconfiguration can't spiral into a $300 bill while you figure things out.
Configuration Fixes That Actually Move the Needle
These are the specific changes that make the biggest difference. In order of impact:
1. Route Lightweight Tasks to Haiku (or Local)
Most OpenClaw tasks don't need your most powerful model. Calendar lookups, file reads, simple web searches, status checks — Haiku handles all of these well at a fraction of the cost.
In your OpenClaw agent config:
# ~/.openclaw/workspace/AGENTS.md or agent config
# Specify model per task type
# For lightweight ops (search, read, status)
model: anthropic/claude-haiku-3-5
# For reasoning-heavy tasks (analysis, writing, planning)
model: anthropic/claude-sonnet-3-7
If you're using OpenRouter, you can set per-model fallbacks and route based on task complexity. The switching overhead is minimal — the cost savings are substantial.
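Routing can be expressed as a simple lookup from task type to model ID. The task categories below are assumptions for illustration; the model IDs mirror the config above:

```python
# Route each task type to the cheapest model that handles it well.
ROUTES = {
    "file_read":  "anthropic/claude-haiku-3-5",
    "web_search": "anthropic/claude-haiku-3-5",
    "status":     "anthropic/claude-haiku-3-5",
    "analysis":   "anthropic/claude-sonnet-3-7",
    "writing":    "anthropic/claude-sonnet-3-7",
}
DEFAULT = "anthropic/claude-haiku-3-5"  # cheap by default; escalate explicitly

def pick_model(task_type):
    return ROUTES.get(task_type, DEFAULT)

print(pick_model("web_search"))  # anthropic/claude-haiku-3-5
```

Defaulting to the cheap model is deliberate: an unrecognized task type should cost you pennies, not Sonnet rates.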
2. Set Rate Limits and Session Caps
Add hard limits to prevent runaway sessions:
# In your openclaw config or session settings:
# Max tool calls per task (prevents cascading loops)
max_tool_calls_per_session: 15
# Max tokens per session (forces context pruning)
max_context_tokens: 32000
# Session timeout (kills stalled sessions)
session_timeout_minutes: 30
These limits feel restrictive until you realize that most tasks finish well within them — and the ones that don't are usually loops that shouldn't be running unchecked anyway.
3. Prune Context Windows Aggressively
Long conversation histories are expensive. If your agent maintains a running context across sessions, it's loading that history into every call — even when it's not relevant.
# Enable context summarization to compress old history
context_strategy: summarize_on_overflow
# Set max context before summarization triggers
context_window_target: 20000
# Don't auto-load all memory on every session
memory_load_mode: selective  # vs. eager
Summarization compresses old context into a short summary instead of loading full transcripts. The model loses some detail, but for most tasks that's acceptable — and it can cut your input token count by 60–70%.
4. Audit and Trim Your Agent's System Prompt
Every API call includes your system prompt in the input tokens. A 3,000-token system prompt gets charged on every single call. Across 200 calls per month, that's 600,000 extra input tokens — just from setup instructions, before your agent has done any work.
Review your SOUL.md, AGENTS.md, and any injected context files. Trim aggressively. Remove anything the model doesn't actively use. 800–1,200 tokens is the target for a lean system prompt.
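To gauge whether you're near that target, a rough word-count heuristic is usually enough. The ~0.75 words-per-token rule of thumb is an approximation, and the paths below are the prompt files mentioned above:

```python
from pathlib import Path

def approx_tokens(text):
    # Rule of thumb: one token is roughly 0.75 English words.
    return round(len(text.split()) / 0.75)

for name in ["SOUL.md", "AGENTS.md"]:  # prompt files mentioned above
    path = Path.home() / ".openclaw" / "workspace" / name
    if path.exists():
        print(name, approx_tokens(path.read_text()), "tokens (approx)")
```

For a precise count, run the file through your provider's tokenizer instead; the heuristic is only there to tell you whether you're at 1,000 tokens or 5,000.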
5. Disable Cron Jobs You Don't Need
OpenClaw supports background cron tasks — scheduled agents that run automatically. These are powerful and also invisible cost generators. Check what's running:
# List active cron configurations
grep -A 20 "cron" ~/.openclaw/config.yaml
# Check recent cron session logs
ls -lt ~/.openclaw/logs/ | grep "cron" | head -20
Disable any cron jobs you didn't intentionally set up, or that aren't delivering value proportional to their cost. A cron job running every hour with Claude Sonnet adds up fast.
Getting all of this right takes time — finding the config options, testing the limits, calibrating what's acceptable to trade off. If you'd rather skip the tuning and get a configuration that reflects current best practices for cost-conscious OpenClaw usage, that's exactly what ClawMentor's update packages deliver.
Common questions
What's the cheapest way to run OpenClaw?
Run local models via Ollama. It's free — no API key, no per-token billing, no surprise charges. Models like Llama 3, Mistral, and Phi-3 run entirely on your hardware. The tradeoff: local models are slower and less capable than Claude or GPT-4. For lightweight tasks (calendar checks, simple searches, file reads), local models are excellent. For anything requiring reasoning or long-form output, you'll want a cloud provider — but you can still route selectively to keep costs down.
How do I check what I'm spending on API calls?
Anthropic users: go to console.anthropic.com → Usage. OpenRouter users: openrouter.ai/activity shows per-model spend in real time. For OpenClaw specifically, check ~/.openclaw/logs/ — each session log includes model name and token counts. You can also run: grep -r "tokens_used" ~/.openclaw/logs/ | tail -50 to see recent usage. If you're using multiple providers, OpenRouter's unified dashboard is the easiest way to see everything in one place.
Which models are cheapest for everyday tasks?
Claude Haiku 3.5 is the most cost-effective capable model — roughly 4× cheaper than Claude Sonnet per token. For quick tasks (reading files, short replies, simple lookups), Haiku handles them well at a fraction of the cost. GPT-4o mini is similarly positioned. If you're running Ollama locally, Phi-3 Mini is fast and surprisingly capable for structured tasks with zero per-call cost. The key is routing: use the smallest model that can do the job. Most agentic tasks don't need your most powerful (and expensive) model.
Can I set a spending limit so I don't get a surprise bill?
Yes — both Anthropic and OpenRouter support spend limits. In Anthropic Console, go to Settings → Usage Limits and set a monthly cap. OpenRouter lets you set per-API-key credit limits, which is ideal if you want to isolate your OpenClaw spending from other tools. In your OpenClaw config, you can also set per-session caps (max_context_tokens, max_tool_calls_per_session) to prevent runaway loops from draining your credits in a single task. Setting a $25–50 monthly cap is a good starting point while you calibrate your usage.
Does OpenClaw have any built-in cost controls?
OpenClaw has basic controls: you can set which model to use, configure session timeouts, and limit concurrent sessions. But it doesn't automatically choose the cheapest model for each task type, or warn you when a loop is escalating costs. That configuration work is on you — unless you're using a pre-configured package. The defaults are optimized for capability, not cost.
What does ClawMentor do about costs?
ClawMentor's update packages include pre-tuned cost configurations as part of each delivery. That means model routing rules (Haiku for lightweight tasks, Sonnet for reasoning), context window caps to prevent bloat, rate limiting settings, and session timeout defaults — all calibrated for real-world usage patterns. Instead of spending hours tuning these yourself, you get a configuration that reflects what actually works. Starter plan is $29/month with a 3-day free trial.
Stop flying blind on agent costs
ClawMentor's update packages include pre-tuned cost controls — so your agent runs smart, not just fast. 3-day free trial, cancel anytime.
Get Ember's Package — $29/mo
Cancel anytime · 30-second install