## Where does the money go?
| Category | Typical cost | Culprit |
|---|---|---|
| Chat (primary model) | $5-15/mo | Sonnet/GPT for daily conversations |
| Cron jobs | $2-8/mo | Scheduled tasks running expensive models |
| Heartbeat | $0.50-3/mo | Running every 15-30 min on an expensive model |
| Sub-agents | $1-5/mo | Spawned workers using the primary model instead of a cheaper one |
| TTS/STT | $5-20/mo | ElevenLabs for voice (if chatty) |
| Context overflow | $0-10/mo | Long conversations with no compaction or session reset |
> **Rule of thumb:** Use `/status` regularly. It shows the model, tokens used, and cost for the current session. If a single session runs over $1, something is off.
## Model tiering strategy
The biggest cost lever: use expensive models only when needed.
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5",
        "fallbacks": ["openai/gpt-4.1-mini"]
      },
      "heartbeat": {
        "model": "google/gemini-2.5-flash-lite"
      }
    }
  }
}
```
| Task | Best model | Why |
|---|---|---|
| Daily chat | Sonnet 4.5 | Best quality for interactive use |
| Heartbeat checks | Flash-Lite / Nano | Simple "check calendar, check email" tasks → cheap |
| Morning briefing cron | GPT-4.1-mini / Flash | Structured summary → doesn't need a top-tier model |
| Code review | Sonnet 4.5 | Quality matters for code |
| Dependency audit cron | Flash-Lite | Parsing npm outdated output โ trivial task |
| Sub-agent research | Flash / Mini | Good enough for web search + summary |
## Cheap heartbeat model
Heartbeat runs every 15-30 minutes. Using Sonnet here wastes money on "check if there's a new email" tasks:
```json
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "activeHours": {
          "start": "08:00",
          "end": "22:00",
          "timezone": "Europe/Bucharest"
        }
      }
    }
  }
}
```
- Flash-Lite for heartbeat: ~$0.50/mo vs ~$3/mo with Sonnet
- Active hours: No heartbeats while you sleep = 33% savings
- Increase interval: set `"every": "1h"` if 30 min is too frequent
## Cron job optimization
Cron jobs are the second biggest cost driver. Key strategies:
### Use cheaper models per cron job
```bash
openclaw cron add \
  --name "Dependency audit" \
  --cron "0 9 * * 1" \
  --model "google/gemini-2.5-flash-lite" \
  --message "Run npm audit and npm outdated"
```
### Use `--session isolated`
Isolated sessions prevent cron jobs from inflating your main session's context (and cost):
```bash
openclaw cron add \
  --name "Morning briefing" \
  --cron "0 7 * * *" \
  --session isolated \
  --message "Send my morning briefing"
```
### Set session retention
```json
{
  "cron": {
    "sessionRetention": "24h",
    "runLog": {
      "maxBytes": "2mb",
      "keepLines": 2000
    }
  }
}
```
Sessions from completed cron runs are pruned after 24h, preventing bloat.
## Context & token management
- Daily session reset: prevents conversations from growing indefinitely. Config example:
  ```json
  {
    "session": {
      "reset": {
        "mode": "daily",
        "atHour": 4,
        "idleMinutes": 120
      }
    }
  }
  ```
- Auto-compaction: OpenClaw automatically compacts context when it overflows, but it's better to reset before that happens
- Idle timeout: `idleMinutes: 120` resets the session after 2 hours of inactivity
- Shorter SOUL.md: every word in SOUL.md is sent with every message. Trim unnecessary instructions.
## Skills optimization
Every enabled skill gets injected into the system prompt on every API call, even when irrelevant. 20 custom skills add roughly 32,000 extra tokens per message, which can cost $30-100/month.
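To see how per-message overhead compounds into that monthly figure, here is a quick sketch; the message volume and the per-million-token price are assumptions for illustration:

```python
# How per-message prompt overhead compounds over a month.
# messages_per_day and price_per_mtok below are illustrative assumptions.

def skill_overhead_cost(extra_tokens, messages_per_day, price_per_mtok, days=30):
    """Monthly cost of extra prompt tokens injected on every API call."""
    return extra_tokens * messages_per_day * days / 1_000_000 * price_per_mtok

# 20 skills at ~1,600 tokens each = ~32,000 extra input tokens per message
cost = skill_overhead_cost(32_000, 30, 3.00)  # 30 messages/day, Sonnet-class input price
print(f"~${cost:.0f}/mo just for skill definitions")
```

At a modest 30 messages a day, the skill definitions alone land in the upper end of the $30-100/month range; heavier usage scales linearly from there.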
### Quick wins
- Trim `skills.allowBundled`: the default loads all 53 bundled skills. Keep only the 3-5 essentials you use in every session.
- Archive custom skills: move specialist skills to `~/.openclaw/skills-archive/` and load them on demand via a Skill-Router or IDENTITY.md Relay.
```json
{
  "skills": {
    "allowBundled": ["github", "healthcheck", "session-logs"]
  }
}
```
Result: production setups have cut skill token costs by up to 89% (e.g. ~125k → ~19k tokens per call). For the full pattern (Skill-Router vs IDENTITY.md Relay, implementation steps, and real-world benchmarks) see How to Load Skills On Demand.
## Local models (zero cost)
Run models locally with Ollama for zero API cost:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5",
        "fallbacks": ["ollama/qwen2.5:7b"]
      },
      "heartbeat": {
        "model": "ollama/qwen2.5:7b"
      }
    }
  }
}
```
Local fallback serves two purposes: saves money on simple tasks, and keeps working when cloud APIs go down.
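The fallback behavior can be sketched as a simple try-next-model loop. The function and the two stand-in models below are hypothetical illustrations, not OpenClaw internals:

```python
# Illustrative sketch of cloud-primary / local-fallback routing.
# `cloud` and `local` are hypothetical stand-ins, not OpenClaw APIs.

def complete_with_fallback(prompt, primary, fallbacks):
    """Try the primary model first; fall back down the list on failure."""
    for model in [primary, *fallbacks]:
        try:
            return model(prompt)
        except (TimeoutError, ConnectionError) as err:
            last_err = err  # remember why this model failed, try the next one
    raise RuntimeError("all models failed") from last_err

# Usage: the cloud model raises on timeout, the local Ollama model answers instead
def cloud(prompt):
    raise TimeoutError("cloud API timed out")

def local(prompt):
    return f"[ollama/qwen2.5:7b] {prompt}"

print(complete_with_fallback("summarize my inbox", cloud, [local]))
```

The key design point is that the caller never sees the cloud outage: the fallback chain absorbs it, which is exactly the resilience the config above buys you.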
> From the community: "When cloud APIs time out (happens often), agents automatically fall back to local Ollama. Zero human intervention required." Run both for resilience and savings.
## Budget limits & alerts
### OpenRouter spending limits
If using OpenRouter as your provider, set a monthly cap on your API key:
```text
# In the OpenRouter dashboard:
# Settings → API Keys → Set monthly limit: $30
```
This is critical for Telegram bots that are always online: a runaway conversation loop can burn through credits fast.
### Per-model rate limiting
```json
{
  "agents": {
    "defaults": {
      "model": {
        "rateLimit": {
          "maxRequestsPerMinute": 10,
          "maxTokensPerDay": 500000
        }
      }
    }
  }
}
```
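A limiter like the one configured above can be modeled as a sliding one-minute request window plus a daily token budget. This class is a minimal sketch of the idea, not OpenClaw's actual implementation:

```python
import time

# Minimal client-side rate limiter: sliding one-minute request window
# plus a daily token budget. Illustrative only -- not OpenClaw internals.

class RateLimiter:
    def __init__(self, max_requests_per_minute, max_tokens_per_day):
        self.max_rpm = max_requests_per_minute
        self.max_tpd = max_tokens_per_day
        self.request_times = []   # timestamps of requests in the last minute
        self.tokens_today = 0

    def allow(self, tokens, now=None):
        """Return True if a request costing `tokens` may proceed."""
        now = time.monotonic() if now is None else now
        # drop requests older than 60 seconds from the window
        self.request_times = [t for t in self.request_times if now - t < 60]
        if len(self.request_times) >= self.max_rpm:
            return False  # over the per-minute request cap
        if self.tokens_today + tokens > self.max_tpd:
            return False  # over the daily token budget
        self.request_times.append(now)
        self.tokens_today += tokens
        return True

limiter = RateLimiter(max_requests_per_minute=10, max_tokens_per_day=500_000)
print(limiter.allow(2_000))  # under both limits, so the request proceeds
```

Denied requests simply wait and retry later; the daily token counter would be reset once per day in a real deployment.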
## Cost profiles
| Profile | Monthly cost | Setup |
|---|---|---|
| Budget | $5-10 | Flash-Lite primary, Ollama heartbeat, 2 cron jobs, no TTS |
| Standard | $15-25 | Sonnet primary, Flash-Lite heartbeat/cron, 5 cron jobs, basic TTS |
| Power | $30-50 | Sonnet primary, Flash cron, sub-agents, full TTS, 10+ cron jobs |
| Local-first | $0-5 | Ollama primary, cloud Sonnet fallback for complex tasks only |
## Usage tracking
```bash
# Check current session cost
/status

# Check model usage across all sessions
openclaw status --deep

# View cron job costs
openclaw cron runs --id <job-id>

# Check OpenRouter spend
# Visit openrouter.ai/activity
```
> **Weekly habit:** run `/status` at the end of the week to see total token usage. If it's higher than expected, check cron job frequency and the heartbeat model.