OpenClaw in Production: 10 Lessons From 3 Weeks of Ops

We run an OpenClaw agent in production. Not as a toy, not as a weekend experiment — as a working member of our operations team. It generates internal reports, monitors infrastructure health, processes documentation, and runs on a 1-hour heartbeat cycle, 17 hours a day.

After three weeks of tuning, debugging, and occasionally watching it go off the rails, we have strong opinions about what works, what breaks, and what nobody tells you in the setup guides.

This is the guide we wish we had on day one.

What We’re Running

Our agent (“Atlas”) runs on a small VM (2 vCPU, 2 GB RAM) with the OpenClaw gateway v2026.3.8. It has one job: produce internal infrastructure reports and documentation. It gathers metrics, cross-references system state, generates summaries, and pushes them to our internal wiki via REST API.

The stack:

Component	What We Use
Gateway	OpenClaw v2026.3.8, systemd user service
Primary model	GPT-5.4 (main session)
Worker/cron model	GLM-5 (60x cheaper than the primary model)
Heartbeat	Every 1 hour, 6 AM–11 PM Pacific
Output	Internal wiki REST API + CLI tools via SSH
Diagrams	OpenAI gpt-image-1.5 (architecture diagrams)

Here is everything we learned getting this to work reliably.

1. Your Agent Will Modify Its Own Instructions

This was our biggest surprise. We gave Atlas a SKILL.md file (its reporting instructions) and a HEARTBEAT.md file (its mission brief). Both were clearly marked “READ-ONLY — only Tony modifies.”

Atlas modified both anyway.

The changes were subtle. In SKILL.md, Atlas changed formatting rules for report output. It claimed that styling was handled by a template system, but that system did not exist yet. If we had not caught this, every future report would have had broken formatting.

In HEARTBEAT.md, Atlas made mostly good changes — tightening diversity rules, adding pipeline rotation. But the point is: it changed files it was told not to change.

The Fix: Baselines + File Permissions

We now keep read-only baseline copies of every instruction file and lock them with Unix permissions (see also our OpenClaw security lockdown guide):

# Lock instruction files: root owns, agent can only read
chown root:ubuntu HEARTBEAT.md SKILL.md MEMORY.md
chmod 440 HEARTBEAT.md SKILL.md MEMORY.md

# Keep baselines for drift detection
cp HEARTBEAT.md docs/baselines/HEARTBEAT.md.baseline-2026-03-12
chmod 440 docs/baselines/HEARTBEAT.md.baseline-2026-03-12

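With root:ubuntu ownership and mode 440, the agent (running as ubuntu) can read the files but cannot write to them; only we can modify them, over SSH as root. One caveat: a user who can write to the directory that holds a file can still delete or replace that file, so check the directory's permissions as well. During reviews, we diff the live file against the baseline to catch any drift.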
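The review-time diff can be scripted. The following is a minimal sketch using Python's standard difflib; the function name detect_drift is our own illustration, and the paths you pass in are whatever your workspace uses.

```python
import difflib
from pathlib import Path
from typing import List


def detect_drift(live_path: str, baseline_path: str) -> List[str]:
    """Return unified-diff lines between the baseline and the live file.

    An empty list means the live file still matches its baseline.
    """
    baseline = Path(baseline_path).read_text(encoding="utf-8").splitlines()
    live = Path(live_path).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            baseline,
            live,
            fromfile=baseline_path,
            tofile=live_path,
            lineterm="",  # lines were split without newlines, so emit none
        )
    )
```

Calling detect_drift("HEARTBEAT.md", "docs/baselines/HEARTBEAT.md.baseline-2026-03-12") returns an empty list when nothing has changed; any other result is worth reading line by line before the next heartbeat.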

2. Memory Is a Lie (Trust the API, Not the Cache)

OpenClaw agents maintain memory in workspace files — MEMORY.md for long-term context, daily log files, execution logs. The problem: this memory goes stale, and the agent does not know it.

Our agent’s MEMORY.md said “27 pending reports need review.” In reality, all 27 had been consolidated and delivered days earlier. The agent kept trying to regenerate reports that had already been sent.

When we accidentally ran a /new command, the agent re-read a week-old MEMORY.md. That file still listed a GitLab PAT blocker that was no longer relevant and a development project that had since been paused. Atlas greeted us as if freshly onboarded, completely unaware of its current mission.

The Fix: Mandatory State Sync

We added a “Step 0” to the agent’s heartbeat workflow that runs before anything else:

### Step 0: State Sync (MANDATORY — run FIRST every cycle)
Do NOT trust memory, prior chats, or execution-log for current state. Query live:
1. Query primary API for completed item count — check response totals
2. Query for pending/in-progress items
3. Log in execution-log: "State sync: X completed, Y pending"
4. If pending items exist, review BEFORE starting new work

The principle: if there is an API that returns ground truth, query it. Every time. Do not trust cached state.
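In our setup Step 0 is a prompt the agent follows, not code. If you would rather hand the agent a helper tool, a sketch could look like the one below. Everything here is an assumption for illustration: fetch_counts stands in for whatever client your API needs, and the "completed" and "pending" keys are a made-up response shape.

```python
from typing import Callable, Dict


def state_sync(fetch_counts: Callable[[], Dict[str, int]], cached_pending: int) -> str:
    """Build the execution-log line from live API state, not from memory.

    fetch_counts is a hypothetical callable that queries the live API and
    returns {"completed": int, "pending": int}.
    """
    live = fetch_counts()
    completed, pending = live["completed"], live["pending"]
    line = f"State sync: {completed} completed, {pending} pending"
    if pending != cached_pending:
        # Make the disagreement visible so a stale MEMORY.md gets noticed.
        line += f" (memory said {cached_pending} pending; using live value)"
    return line
```

With our stale MEMORY.md, the cached value would have been 27 while the live API reported 0 pending, and the log line would have said so on the first cycle.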

3. lightContext Is Your Best Friend for Heartbeats

By default, OpenClaw injects all “bootstrap files” into every session — SOUL.md, IDENTITY.md, TOOLS.md, MEMORY.md, AGENTS.md, and HEARTBEAT.md. This is fine for interactive sessions where you want the agent to have full context.

For heartbeats, it is wasteful and dangerous. Wasteful because you are paying for tokens to inject files the heartbeat does not need. Dangerous because stale bootstrap files (like our outdated MEMORY.md) can confuse the agent about its current state.

"heartbeat": {
  "every": "1h",
  "lightContext": true,
  "target": "none",
  "prompt": "Read HEARTBEAT.md. Execute Step 0 (State Sync) FIRST..."
}

With lightContext: true, heartbeat runs only get HEARTBEAT.md injected. The agent reads what it needs from the filesystem when it needs it, rather than getting a stale dump of everything at session start.

4. Active Hours Save Real Money

A 1-hour heartbeat running 24/7 means 24 heartbeat runs per day, each one billed at model rates. At frontier model pricing, that adds up fast, especially when 7 of those hours (11 PM to 6 AM) produce nothing useful.

Config	Heartbeats/Day	Savings vs. 24/7
No active hours (24/7)	24	0%
6 AM – 11 PM (17 hrs)	17	~29%
9 AM – 6 PM (business hrs)	9	~63%

"heartbeat": {
  "activeHours": {
    "start": "06:00",
    "end": "23:00",
    "timezone": "America/Vancouver"
  }
}
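The savings figures in the table are simple arithmetic: the share of the day the heartbeat is switched off. A small sketch, assuming one run per interval and that every run costs about the same (the function names are ours):

```python
def heartbeats_per_day(active_hours: int, interval_hours: int = 1) -> int:
    """Number of heartbeat runs that fit in the active window."""
    return active_hours // interval_hours


def savings_vs_always_on(active_hours: int) -> float:
    """Fraction of runs avoided compared with a 24/7 schedule."""
    return 1 - active_hours / 24


for label, hours in [("24/7", 24), ("6 AM-11 PM", 17), ("9 AM-6 PM", 9)]:
    runs = heartbeats_per_day(hours)
    saved = savings_vs_always_on(hours)
    print(f"{label}: {runs} runs/day, {saved:.1%} fewer than 24/7")
```

For the 17-hour window this prints 29.2%, and for business hours 62.5%. Real savings depend on how many tokens each run consumes, so treat these as an upper-level estimate rather than a bill.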

5. Compaction Will Eat Your Agent’s Brain

When conversation context fills up, OpenClaw compacts it — summarizing older messages to free space. This is necessary but destructive. If the agent had important state only in conversation context (not in files), that state is gone after compaction.

The fix is memoryFlush, a pre-compaction hook that gives the agent one last chance to save important state to disk:

"compaction": {
  "mode": "safeguard",
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 5000,
    "prompt": "Save current operational state and task position to memory/YYYY-MM-DD.md. Include completed count, pending count, last task done, and next planned action. Reply with NO_REPLY if nothing to store."
  }
}

Without this, we found our agent “forgetting” what it had worked on earlier in the same session — restarting research it had already completed, or worse, creating duplicate drafts of topics it had already written.

6. OAuth Token Stores Are Split (And Nobody Tells You)

If you use OpenAI Codex as a model provider via OAuth, you will eventually hit this: OAuth token refresh failed: refresh_token_reused. OpenAI refresh tokens are single-use. Once consumed, the old token is permanently invalid.

The real problem is that codex auth login saves tokens to ~/.codex/auth.json, but OpenClaw reads from a completely different path: ~/.openclaw/agents/main/agent/auth-profiles.json. Re-running codex auth login does not fix OpenClaw because it never reads from where codex writes.

The Fix: Manual Token Sync

1. Kill the gateway
2. Run codex auth login and complete the OAuth flow
3. Copy tokens from ~/.codex/auth.json to ~/.openclaw/agents/main/agent/auth-profiles.json
4. Restart the gateway

These four steps are the short version. We documented the full 8-step procedure in our ops spec. If you use OAuth providers, do the same; you will need it at 2 AM when the gateway stops working.

7. Session Hygiene Matters

After three weeks of continuous operation, we had 35 sessions and 57 orphan transcript files consuming disk and potentially confusing session lookup. OpenClaw’s openclaw doctor command flagged this immediately.

Four settings keep sessions clean:

"session": {
  "reset": {
    "mode": "daily",
    "atHour": 4
  },
  "maintenance": {
    "mode": "enforce",
    "pruneAfter": "7d",
    "maxEntries": 50,
    "rotateBytes": "10mb"
  }
}

Setting	What It Does
`reset.mode: "daily"`	Fresh session every day at 4 AM — prevents context rot
`maintenance.pruneAfter: "7d"`	Auto-delete sessions older than 7 days
`maintenance.maxEntries: 50`	Cap total session count
`maintenance.rotateBytes: "10mb"`	Rotate transcript files at 10 MB

Run openclaw doctor --non-interactive weekly. It catches problems you will not notice until they cascade.

8. The Duplicate Work Problem

Left to its own devices, our agent would perform variations of the same task repeatedly. “Generate infrastructure report for service X” appeared with five different values of X in a single week. The task names were different, but the intent was identical.

Intent-based deduplication requires explicit gates in the agent’s workflow:

1. Before every new task, search completed work via your API
2. Compare intent, not just labels: two tasks with different names can have the same goal
3. Log the dedup decision: what you searched, what you found, why you proceeded or skipped
4. Check in-progress items too, which prevents duplicate work-in-progress

We also enforce format and topic diversity: no more than 2 consecutive tasks in the same category. This forces the agent out of repetitive patterns and produces more balanced operational coverage.
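Both gates can be expressed in a few lines. The sketch below is illustrative only: word overlap (Jaccard similarity) is a crude stand-in for real intent comparison, and the function names, stopword list, and 0.6 threshold are assumptions of ours, not anything OpenClaw provides.

```python
import re
from typing import Iterable, List, Set

# Small, hypothetical stopword list; tune for your own task titles.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "on", "in"}


def intent_tokens(title: str) -> Set[str]:
    """Lowercase a task title and keep only its meaningful words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def is_duplicate(candidate: str, completed: Iterable[str], threshold: float = 0.6) -> bool:
    """True if the candidate shares most of its words with any finished task."""
    cand = intent_tokens(candidate)
    for done in completed:
        other = intent_tokens(done)
        union = cand | other
        if union and len(cand & other) / len(union) >= threshold:
            return True
    return False


def breaks_diversity(recent_categories: List[str], category: str, max_run: int = 2) -> bool:
    """True if adding `category` would exceed `max_run` consecutive tasks in it."""
    tail = recent_categories[-max_run:]
    return len(tail) == max_run and all(c == category for c in tail)
```

A task passes only if is_duplicate returns False against both completed and in-progress titles and breaks_diversity returns False for its category. Whatever the outcome, log the inputs and the decision so the reasoning can be reviewed later.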

9. Systemd Configuration That Actually Works

If you run the gateway as a systemd service (and you should — we use the same pattern in our Ollama production setup guide), there are two gotchas:

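First: openclaw gateway start does not work from a root SSH session. You will get “systemctl is-enabled unavailable,” most likely because that session has no connection to the gateway user's systemd instance. Use runuser to run systemctl as the gateway user, with the runtime directory and bus address set explicitly (1000 is the ubuntu user's UID on our VM):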

runuser -u ubuntu -- bash -c \
  "export XDG_RUNTIME_DIR=/run/user/1000; \
   export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus; \
   systemctl --user restart openclaw-gateway.service"

Second: Set RestartSec=5 (not the default 15). OpenClaw doctor recommends 5 seconds, and faster restarts mean less downtime when the gateway crashes. Add a watchdog cron too:

# /etc/cron.d/openclaw-watchdog — every 5 minutes
*/5 * * * * root /usr/local/bin/openclaw-watchdog.sh
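The watchdog script itself is not reproduced here. As a rough sketch of the kind of check a watchdog like this could perform, written in Python rather than shell: ask systemd whether the unit is active and restart it if not. The function names are hypothetical, and the sketch assumes it runs as the gateway user with XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS set as in the runuser command above.

```python
import subprocess
from typing import Callable, Sequence

UNIT = "openclaw-gateway.service"


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_gateway_running(runner: Callable[[Sequence[str]], int] = run) -> bool:
    """Restart the unit if it is not active.

    Returns True if a restart was attempted, False if the unit was healthy.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if runner(["systemctl", "--user", "is-active", "--quiet", UNIT]) == 0:
        return False
    runner(["systemctl", "--user", "restart", UNIT])
    return True
```

The runner parameter exists so the logic can be exercised without touching systemd. Since Restart= in the unit file already covers ordinary crashes, a watchdog like this mainly helps when the unit has stopped or given up restarting.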

10. The Complete Production Config

Here is our full openclaw.json heartbeat, compaction, and session configuration after three weeks of tuning:

{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "1h",
        "lightContext": true,
        "target": "none",
        "activeHours": {
          "start": "06:00",
          "end": "23:00",
          "timezone": "America/Vancouver"
        },
        "prompt": "Read HEARTBEAT.md. Execute Step 0 (State Sync) FIRST. Query your primary API for current task state before any other work. Do not infer operational state from memory or prior chats; always query live."
      },
      "compaction": {
        "mode": "safeguard",
        "memoryFlush": {
          "enabled": true,
          "softThresholdTokens": 5000,
          "prompt": "Save current state to memory/YYYY-MM-DD.md. Reply with NO_REPLY if nothing to store."
        }
      }
    }
  },
  "session": {
    "reset": { "mode": "daily", "atHour": 4 },
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "7d",
      "maxEntries": 50,
      "rotateBytes": "10mb"
    }
  }
}

What We Would Do Differently

If we were starting over:

Start with lightContext from day one. We wasted a week debugging stale-context issues that lightContext solves immediately.
Build the State Sync step before the first heartbeat. Do not let the agent rely on memory files for state that an API can provide.
Set up baselines and file permissions immediately. The moment you write an instruction file, lock it with root:ubuntu 440 and create a read-only baseline. You will need it within 48 hours.
Enable memoryFlush before compaction hits. You will not know compaction ate important context until the agent starts making mistakes.
Run openclaw doctor weekly. It catches orphan files, session bloat, config drift, and auth token expiry before they become outages.

The Bottom Line

Running an OpenClaw agent in production is not “set it and forget it.” It is closer to managing a junior team member who is fast, capable, and occasionally creative in ways you did not authorize. The infrastructure around the agent — baselines, state sync, session hygiene, compaction safety nets — matters as much as the agent’s instructions.

The good news: once you get the guardrails right, the output is genuinely useful. Our agent handles dozens of operational tasks per week autonomously — reports, health checks, documentation updates, and scheduled maintenance. That is real value — as long as you are willing to invest in the operational discipline to make it reliable.

If you are running infrastructure and considering an autonomous AI agent, our dedicated servers and Cloud VPS provide the compute you need. OpenClaw runs well on a small VM — 2 vCPU and 2 GB RAM is plenty for a single-agent gateway.
