PROJ-035 Live Status v4 - implementation plan created by planner subagent.
Discovery findings documented in discoveries/README.md covering:
- JSONL transcript format (confirmed v3 schema)
- Session keying patterns (subagent spawnedBy linking)
- Hook events available (gateway:startup confirmed)
- Mattermost API (no edit time limit)
- Current v1 failure modes
Audit: 32/32 PASS, Simulation: READY
Implementation Plan: Live Status v4
Generated: 2026-03-07 | Agent: planner:proj035 | Status: DRAFT
1. Goal
Replace the broken agent-cooperative live-status system with a transparent infrastructure-level daemon that tails OpenClaw's JSONL transcript files in real-time and updates a Mattermost status box automatically — zero agent cooperation required. Sub-agents become visible. Spam is eliminated. Sessions never lose state. Works from gateway startup without any AGENTS.md instruction injection.
2. Architecture
OpenClaw Gateway
├── Agent Sessions (main, coder-agent, sub-agents, hooks...)
│ └── writes {uuid}.jsonl as it works
│
└── status-watcher daemon (per active session)
├── Polls/watches {uuid}.jsonl (new line = new event)
├── Parses tool calls, results, assistant text
├── Maps tool names → human-readable labels
├── Debounces Mattermost updates (500ms)
├── Auto-creates status box in correct channel/thread
├── Detects sub-agent spawns → nests sub-agent status
└── Auto-completes when agent stops writing (idle timeout)
sessions.json (runtime registry)
├── session key → {sessionId, sessionFile, spawnedBy, spawnDepth, channel, ...}
└── used to: resolve JSONL file path, determine channel, link parent→child
OpenClaw Hook (gateway:startup + command:new)
└── Spawns status-watcher for the right session
Mattermost API (slack.solio.tech)
├── POST /api/v4/posts → create status box
├── PUT /api/v4/posts/{id} → update in-place (no edit time limit)
└── Multiple bot tokens per agent
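The two Mattermost calls in the diagram could be wrapped roughly as follows. This is a minimal sketch using only Node's built-in https module; the helper names (buildCreatePost, buildUpdatePost, toRequest, send) are hypothetical and not taken from live-status.js, and the bearer-token header assumes each bot authenticates with a personal access token.

```javascript
const https = require('node:https');

// Host taken from the architecture notes above.
const MM_HOST = 'slack.solio.tech';

// Build the request for creating the status post.
// rootPostId is optional and places the post inside a thread.
function buildCreatePost(token, channelId, message, rootPostId) {
  const body = { channel_id: channelId, message };
  if (rootPostId) body.root_id = rootPostId;
  return toRequest('POST', '/api/v4/posts', token, body);
}

// Build the request for editing the status post in place.
function buildUpdatePost(token, postId, message) {
  return toRequest('PUT', `/api/v4/posts/${postId}`, token, { id: postId, message });
}

// Shared helper: serialise the body and attach auth + JSON headers.
function toRequest(method, path, token, body) {
  const payload = JSON.stringify(body);
  return {
    options: {
      hostname: MM_HOST,
      method,
      path,
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload),
      },
    },
    payload,
  };
}

// Send a built request; resolves with the HTTP status and parsed JSON body.
function send({ options, payload }) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        try {
          resolve({ status: res.statusCode, body: JSON.parse(data) });
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', reject);
    req.end(payload);
  });
}
```

Keeping request construction separate from sending makes the builders testable without network access; rate limiting and 429 handling would sit around send.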
Key Design Decisions (from discovery)
- Watch sessions.json, not just transcript files. sessions.json is the authoritative registry that maps session keys (including sub-agents) to JSONL files. Monitor it to detect new sessions.
- No new hook events needed. We cannot use session:start/session:end hooks (they don't exist). Instead: use gateway:startup to begin watching all active sessions, and poll sessions.json for new sessions.
- Sub-agent detection via spawnedBy field. When sessions.json gets a new entry with spawnedBy, we know it's a sub-agent of the given parent session. Nest its status under the parent status box.
- JSONL format is stable. Version 3 format confirmed. Key events:
  - message with role=assistant + content toolCall → tool being called
  - message with role=toolResult → tool completed
  - message with role=assistant + content text → agent thinking/responding
  - custom with customType: openclaw.cache-ttl → turn boundary (good idle signal)
- Mattermost post edit is unlimited. PostEditTimeLimit = -1. We can update the status post indefinitely. No workaround needed.
- Keep live-status.js as thin orchestration layer. Agents can still call it manually for special cases, but it's no longer the primary mechanism.
3. Tech Stack
| Layer | Technology | Version | Reason |
|---|---|---|---|
| Watcher daemon | Node.js | 22.x (existing) | Already installed, fs.watch/setInterval available |
| File watching | fs.watch + fallback polling | built-in | fs.watch is unreliable on Linux; polling fallback needed |
| Mattermost API | https (built-in) | - | Already used in live-status.js |
| Session registry | JSON file watch | - | sessions.json updated on every message |
| IPC (parent↔watcher) | PID file + signals | - | Simple, no deps |
| Hook integration | OpenClaw hooks system | existing | gateway:startup hook for auto-start |
4. Project Structure
MATTERMOST_OPENCLAW_LIVESTATUS/
├── src/
│ ├── status-watcher.js CREATE Core transcript tail + parse + debounce
│ ├── session-monitor.js CREATE Watch sessions.json for new/ended sessions
│ ├── mattermost-client.js CREATE Mattermost HTTP API wrapper (rate-limited)
│ ├── tool-labels.json CREATE Tool name → human-readable label map
│ ├── status-formatter.js CREATE Format status box message (text + sub-agents)
│ ├── watcher-manager.js CREATE Start/stop watchers per session, PID tracking
│ ├── live-status.js MODIFY Add start-watcher/stop-watcher commands; keep create/update/complete
│ └── agent-accounts.json KEEP Agent ID → bot account mapping
│
├── hooks/
│ └── status-watcher-hook/
│ ├── HOOK.md CREATE Hook metadata (events: gateway:startup, command:new)
│ └── handler.js CREATE Spawns watcher-manager on gateway start (plain JS, per task 4.2)
│
├── skill/
│ └── SKILL.md REWRITE Remove verbose manual protocol; just note status is automatic
│
├── deploy-to-agents.sh REWRITE Installs hook instead of AGENTS.md injection
├── install.sh REWRITE New install flow: npm install + hook enable
├── README.md REWRITE Full v4 documentation
├── package.json MODIFY Add start/stop/status npm scripts
└── Makefile MODIFY Add check/test/lint/fmt targets
5. Dependencies
| Package | Version | Purpose | New/Existing |
|---|---|---|---|
| node.js | 22.x | Runtime | Existing (system) |
| (none) | - | All built-in: https, fs, path, child_process | - |
No new npm dependencies. Everything uses Node.js built-ins to keep install footprint at zero.
6. Data Model
sessions.json entry (relevant fields)
{
"agent:main:subagent:uuid": {
"sessionId": "50dc13ad-...",
"sessionFile": "50dc13ad-....jsonl",
"spawnedBy": "agent:main:main",
"spawnDepth": 1,
"label": "proj035-planner",
"channel": "mattermost",
"groupChannel": "#channelId__botUserId"
}
}
JSONL event schema (parsed by watcher)
type=session → session UUID, cwd (first line only)
type=message → role=user|assistant|toolResult; content[]=text|toolCall|toolResult
type=custom → customType=openclaw.cache-ttl (turn boundary marker)
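The schema above could be turned into a small line classifier. This is a sketch under assumptions: the plan confirms the type, role and customType values, but the exact field locations (event.message.role, content[].type, content[].name) are guesses, and the classifyLine name and the kind labels it returns are hypothetical.

```javascript
// Classify one transcript line into a coarse event kind.
// Field locations are assumed, not confirmed by the discovery notes.
function classifyLine(line) {
  let event;
  try {
    event = JSON.parse(line);
  } catch {
    return { kind: 'invalid' }; // partial or corrupt line; caller may retry later
  }
  if (!event || typeof event !== 'object') return { kind: 'invalid' };

  if (event.type === 'session') return { kind: 'session-header' };
  if (event.type === 'custom') {
    return event.customType === 'openclaw.cache-ttl'
      ? { kind: 'turn-boundary' }
      : { kind: 'ignored' };
  }
  if (event.type !== 'message') return { kind: 'ignored' };

  const msg = event.message || event; // tolerate a nested or a flat layout
  const parts = Array.isArray(msg.content) ? msg.content : [];
  if (msg.role === 'toolResult') return { kind: 'tool-done' };
  if (msg.role === 'assistant') {
    const call = parts.find((p) => p.type === 'toolCall');
    if (call) return { kind: 'tool-start', tool: call.name || 'unknown' };
    if (parts.some((p) => p.type === 'text')) return { kind: 'assistant-text' };
  }
  return { kind: 'ignored' };
}
```

Unknown event types fall through to ignored rather than throwing, which keeps the watcher alive if a future transcript version adds new records.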
Watcher state per session
{
"sessionKey": "agent:main:subagent:uuid",
"sessionFile": "/path/to/uuid.jsonl",
"bytesRead": 1024,
"statusPostId": "abc123def456...",
"channelId": "yy8agcha...",
"rootPostId": null,
"lastActivity": 1772897576000,
"subAgentWatchers": ["child-session-key"],
"statusLines": ["[15:21] Reading file... done", ...],
"parentStatusPostId": null
}
Status box format
Agent: main — PROJ-035 Plan
[15:21:22] Reading transcript format...
[15:21:25] exec: ls /agents/sessions done (0.8s)
[15:21:28] Writing implementation plan...
Sub-agent: proj035-planner
[15:21:42] Reading protocol...
[15:21:55] Analyzing JSONL format...
[15:22:10] Complete (28s)
[15:22:15] Plan ready. Awaiting approval.
Runtime: 53s
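A pure formatter for this layout might look like the sketch below. The function name formatStatusBox, the input shape, the two-space indent for nested lines and the use of UTC timestamps are all assumptions made for illustration. For simplicity, sub-agent blocks are appended after the parent's lines rather than interleaved chronologically as in the sample above.

```javascript
// Format epoch milliseconds as HH:MM:SS. UTC keeps the output deterministic;
// the real status box may prefer local time.
function clock(ts) {
  return new Date(ts).toISOString().slice(11, 19);
}

// entries: [{ ts, text }]; subAgents: [{ label, entries }]
function formatStatusBox({ title, entries, subAgents = [], startedAt, now }) {
  const out = [title];
  for (const e of entries) {
    out.push(`[${clock(e.ts)}] ${e.text}`);
  }
  for (const sub of subAgents) {
    out.push(`Sub-agent: ${sub.label}`);
    // Two-space indent for nested lines is an assumption.
    for (const e of sub.entries) {
      out.push(`  [${clock(e.ts)}] ${e.text}`);
    }
  }
  out.push(`Runtime: ${Math.round((now - startedAt) / 1000)}s`);
  return out.join('\n');
}
```

Because the function takes all of its inputs as arguments, including the current time, it can be unit tested without a clock or a Mattermost connection.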
7. Task Checklist
Phase 0: Repo Sync + Setup ⏱️ 10min
Parallelizable: no | Dependencies: none
- 0.1: Sync workspace live-status.js to remote repo (git push) → remote matches workspace
- 0.2: Verify Makefile has check/test/lint/fmt targets (or add them) → make check passes
- 0.3: Create src/tool-labels.json with initial tool→label mapping → file exists
- 0.4: Create src/agent-accounts.json (already exists, verify) → agent→account mapping
Phase 1: Core Watcher ⏱️ 2-3h
Parallelizable: no | Dependencies: Phase 0
- 1.1: Create src/mattermost-client.js: HTTP wrapper with rate limiting (max 2 req/s), retry on 429, create/update/delete post methods → tested with curl
- 1.2: Create src/status-formatter.js: formats status box lines from events, sub-agent nesting, timestamps → unit testable pure function
- 1.3: Create src/status-watcher.js, the core daemon:
  - Accepts: sessionKey, sessionFile, channelId, rootPostId (optional), statusPostId (optional)
  - Reads JSONL file from current byte offset
  - On new lines: parse event type, extract human-readable status
  - Debounce 500ms before Mattermost update
  - Idle timeout: 30s after last new line → mark complete
  - Emits events: status-update, session-complete
  - Returns: statusPostId (created on first event)
- 1.4: Add src/tool-labels.json with all known tools → exec, read, write, edit, web_search, web_fetch, message, subagents, nodes, browser, image, camofox_, claude_code_
- 1.5: Manual test: start watcher against a real session file, verify Mattermost post appears → post created and updated
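The "reads JSONL file from current byte offset" step in task 1.3 could be sketched as below. readNewLines is a hypothetical helper name. It works in bytes rather than characters so the saved offset stays valid for multi-byte UTF-8 content, and it leaves a trailing partial line unread until its newline arrives. A file that shrinks (compaction) is not handled here.

```javascript
const fs = require('node:fs');

// Read complete lines appended since `offset` (in bytes).
// Returns the new lines and the offset to resume from on the next call.
function readNewLines(file, offset) {
  const size = fs.statSync(file).size;
  if (size <= offset) return { lines: [], offset };

  const buf = Buffer.alloc(size - offset);
  const fd = fs.openSync(file, 'r');
  let bytes;
  try {
    bytes = fs.readSync(fd, buf, 0, buf.length, offset);
  } finally {
    fs.closeSync(fd);
  }

  // Only consume up to the last newline; the writer may be mid-line.
  const chunk = buf.subarray(0, bytes);
  const lastNewline = chunk.lastIndexOf(0x0a);
  if (lastNewline === -1) return { lines: [], offset };

  const text = chunk.subarray(0, lastNewline).toString('utf8');
  const lines = text.split('\n').filter((l) => l.trim() !== '');
  return { lines, offset: offset + lastNewline + 1 };
}
```

The caller would store the returned offset as bytesRead in the watcher state and call this again on each fs.watch event or poll tick.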
Phase 2: Session Monitor ⏱️ 1-2h
Parallelizable: no | Dependencies: Phase 1
- 2.1: Create src/session-monitor.js, which watches sessions.json for changes:
  - Polls every 2s (fs.watch unreliable on Linux for JSON files)
  - Diffs previous vs current sessions.json
  - On new session: emit session-added with session details
  - On removed session: emit session-removed
  - Resolves channel/thread from session key format
- 2.2: Create src/watcher-manager.js, which coordinates monitor + watchers:
  - On session-added: resolve channel (from session key), start status-watcher
  - Tracks active watchers in memory (Map: sessionKey → watcher)
  - On session-removed or watcher-complete: clean up
  - Handles sub-agents: on spawnedBy session added, nest under parent watcher
  - PID file at /tmp/openclaw-status-watcher.pid for single-instance enforcement
- 2.3: Entry point src/watcher-manager.js CLI: node watcher-manager.js start|stop|status → process management
- 2.4: End-to-end test: run manager in foreground, trigger agent session, verify status box appears → automated smoke test
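The diff step in task 2.1 might be implemented along these lines. diffSessions and parseSnapshot are hypothetical names; the only fields relied on are the session key and spawnedBy from the data model. parseSnapshot returns null when sessions.json is caught mid-write, so the caller can keep its previous snapshot and retry on the next poll.

```javascript
// Compare two snapshots of sessions.json (objects keyed by session key).
function diffSessions(prev, curr) {
  const added = [];
  const removed = [];
  for (const [key, entry] of Object.entries(curr)) {
    if (!Object.hasOwn(prev, key)) {
      // A spawnedBy value marks the entry as a sub-agent of that parent.
      added.push({ key, ...entry, isSubAgent: Boolean(entry.spawnedBy) });
    }
  }
  for (const key of Object.keys(prev)) {
    if (!Object.hasOwn(curr, key)) removed.push(key);
  }
  return { added, removed };
}

// Parse sessions.json text; null means "unreadable right now, try again".
function parseSnapshot(text) {
  try {
    const value = JSON.parse(text);
    const isPlainObject = value && typeof value === 'object' && !Array.isArray(value);
    return isPlainObject ? value : null;
  } catch {
    return null;
  }
}
```

The monitor would emit session-added for each entry in added and session-removed for each key in removed.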
Phase 3: Channel Resolution ⏱️ 1h
Parallelizable: no | Dependencies: Phase 2
- 3.1: Implement channel resolver: given a session key like agent:main:mattermost:channel:abc123, extract the Mattermost channel ID → function with unit test
- 3.2: Handle thread sessions: agent:main:mattermost:channel:abc123:thread:def456 → channel=abc123, rootPost=def456
- 3.3: Fallback for non-Mattermost sessions (hook sessions, cron sessions): use configured default channel → configurable in openclaw.json or env var
- 3.4: Sub-agent channel resolution: inherit parent session's channel + use parent status box as rootPostId → sub-agent status appears under parent
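Tasks 3.1 to 3.4 could be covered by a resolver like the sketch below. It handles only the two key shapes listed above; any other shape is treated as a non-Mattermost session. The names resolveChannel and inheritFromParent are hypothetical, as is the choice to signal "skip" by returning null.

```javascript
// Matches agent:<id>:mattermost:channel:<channelId>[:thread:<rootPostId>].
const MM_KEY = /^agent:[^:]+:mattermost:channel:([^:]+)(?::thread:([^:]+))?$/;

// Resolve where a session's status box should be posted.
// Returns null when there is no Mattermost target and no default is configured.
function resolveChannel(sessionKey, defaultChannelId = null) {
  const match = MM_KEY.exec(sessionKey);
  if (match) {
    return { channelId: match[1], rootPostId: match[2] || null };
  }
  // Non-Mattermost session (hook, cron, ...): use the default or skip.
  return defaultChannelId ? { channelId: defaultChannelId, rootPostId: null } : null;
}

// Sub-agents inherit the parent's channel and thread under its status post.
function inheritFromParent(parentTarget, parentStatusPostId) {
  return { channelId: parentTarget.channelId, rootPostId: parentStatusPostId };
}
```

For example, resolveChannel('agent:main:mattermost:channel:abc123:thread:def456') yields channelId abc123 and rootPostId def456.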
Phase 4: Hook Integration ⏱️ 1h
Parallelizable: no | Dependencies: Phase 2, Phase 3
- 4.1: Create hooks/status-watcher-hook/HOOK.md with events: ["gateway:startup"] → discovered by OpenClaw hooks system
- 4.2: Create hooks/status-watcher-hook/handler.js (plain JS): on gateway:startup, spawn watcher-manager.js start as background child_process → watcher manager auto-starts with gateway. Note: OpenClaw hooks system discovers handler.ts first, then handler.js; both are supported natively via dynamic import. Plain .js is confirmed to work.
- 4.3: Add hooks/status-watcher-hook/ to workspace hooks dir (/home/node/.openclaw/workspace/hooks/) via deploy-to-agents.sh → hook auto-discovered
- 4.4: Test: restart gateway → watcher-manager starts → verify PID file exists
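Task 4.4 relies on the PID file that task 2.2 uses for single-instance enforcement. One possible shape for that check is sketched below; the function names are hypothetical. This version refuses to start when a live manager already holds the file, and killing the old process first would be a variation on the same check. The read-then-write is not atomic, so two managers started at the same instant could still race.

```javascript
const fs = require('node:fs');

const PID_FILE = '/tmp/openclaw-status-watcher.pid'; // path from task 2.2

// True if a process with this PID exists. Signal 0 performs the existence
// and permission check without delivering a signal.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // exists, but owned by another user
  }
}

// PID of a running manager, or null if the file is missing, garbled or stale.
function runningManagerPid(pidFile = PID_FILE) {
  let raw;
  try {
    raw = fs.readFileSync(pidFile, 'utf8');
  } catch {
    return null;
  }
  const pid = Number.parseInt(raw.trim(), 10);
  return Number.isInteger(pid) && pid > 0 && isAlive(pid) ? pid : null;
}

// Claim the PID file for this process unless another live manager holds it.
function claimPidFile(pidFile = PID_FILE) {
  const existing = runningManagerPid(pidFile);
  if (existing !== null && existing !== process.pid) return false;
  fs.writeFileSync(pidFile, String(process.pid));
  return true;
}
```

The manager's stop command would read the same file, signal the recorded PID and remove the file.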
Phase 5: Polish + Cleanup ⏱️ 1h
Parallelizable: no | Dependencies: Phase 4
- 5.1: Rewrite skill/SKILL.md: remove manual protocol; say "live status is automatic, no action needed" → 10-line skill file
- 5.2: Rewrite deploy-to-agents.sh: remove AGENTS.md injection; install hook into workspace hooks dir; restart gateway → one-command deploy
- 5.3: Update install.sh: npm install, deploy hook, optionally restart gateway
- 5.4: Update src/live-status.js: add start-watcher and stop-watcher commands for manual control; mark create/update/complete as deprecated but keep working
- 5.5: Handle session compaction: detect if JSONL file gets smaller (compaction rewrites) → reset byte offset and re-read from start
- 5.6: Write README.md: full v4 documentation with architecture diagram, install steps, config reference
- 5.7: Run make check to verify lint/format passes → clean CI
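The compaction handling in task 5.5 comes down to two small pieces, sketched here under assumptions. resumeOffset and createDeduper are hypothetical names, and the id field used for de-duplication is a guess at where the transcript keeps its message identifier. A compacted file that ends up larger than the saved offset would not be detected by the size check alone.

```javascript
// Decide where to resume reading. If the file is now smaller than the saved
// offset, assume it was rewritten and start again from the beginning.
function resumeOffset(fileSize, bytesRead) {
  return fileSize < bytesRead ? 0 : bytesRead;
}

// Filter out events that were already processed before a re-read.
// Events without an id cannot be tracked and are always passed through.
function createDeduper() {
  const seen = new Set();
  return function isNew(event) {
    if (event.id == null) return true;
    if (seen.has(event.id)) return false;
    seen.add(event.id);
    return true;
  };
}
```

After a reset to offset 0, every re-read line would pass through isNew so that events already shown in the status box are not posted again.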
Phase 6: Remove v1 Injection from AGENTS.md ⏱️ 30min
Parallelizable: no | Dependencies: Phase 5 (after watcher confirmed working)
- 6.1: Remove "📡 Live Status Protocol (MANDATORY)" section from main agent's AGENTS.md
- 6.2: Remove from all other agent AGENTS.md files (coder-agent, xen, global-calendar, etc.)
- 6.3: Before starting 6.1 and 6.2, confirm the watcher is running (safety check) → watcher PID file exists
8. Testing Strategy
| What | Type | How | Success Criteria |
|---|---|---|---|
| Mattermost client | Unit | Direct API call with test channel | Post created and updated |
| Status formatter | Unit | Input JSONL events → verify output strings | Correct labels, timestamps |
| Channel resolver | Unit | Test session key strings → verify channel/thread extracted | All formats parsed |
| JSONL parser | Unit | Sample events from real transcripts | All types handled |
| Session monitor | Integration | Write to sessions.json, verify events emitted | New session detected in <2s |
| Status watcher | Integration | Append to JSONL file, verify Mattermost post updates | Update within 1s of new line |
| Sub-agent nesting | Integration | Spawn real sub-agent, verify nested status | Sub-agent visible in parent box |
| Idle timeout | Integration | Stop writing to JSONL, verify complete after 30s | Status box marked done |
| Compaction | Integration | Truncate JSONL file, verify watcher recovers | No duplicate events, no crash |
| E2E | Manual smoke test | Real agent task in Mattermost, verify status box | Real-time updates visible |
9. Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| fs.watch unreliable on Linux | High | Fall back to polling (setInterval 2s). fs.watch as optimization |
| Sessions.json write race condition | Medium | Use atomic read (retry on parse error), debounce diff |
| Mattermost rate limit (10 req/s) | Medium | Debounce updates to 500ms; queue + batch; exponential backoff on 429 |
| Session compaction truncates JSONL | Medium | Compare file size on each poll; if smaller, reset offset |
| Multiple gateway restarts create duplicate watchers | Medium | PID file check + kill old process before spawning new |
| Sub-agent session key not stable across restarts | Low | Use sessionId (UUID) as key, not session key string |
| Watcher dies silently | Low | Cron health check or gateway boot-md restart |
| Non-Mattermost sessions (xen, hook) get status boxes | Low | Channel resolver returns null for non-MM sessions; skip gracefully |
| JSONL format change in future OpenClaw version | Medium | Abstract parser behind interface; version check on session record |
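
The rate-limit mitigation (debounce plus exponential backoff on 429) could be sketched as follows. The 500ms debounce window comes from the plan; the backoff base and 30s cap are assumed values, and both function names are hypothetical. The timer functions are injectable so the debounce can be tested without waiting.

```javascript
// Delay before retry number `attempt` (0-based) after an HTTP 429:
// 500ms, 1s, 2s, ... capped at 30s. Base and cap are assumed values.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Trailing-edge debounce: calls arriving within `waitMs` of each other
// collapse into a single call carrying the latest argument.
function debounce(fn, waitMs = 500, timers = { set: setTimeout, clear: clearTimeout }) {
  let handle = null;
  let latest;
  return function schedule(arg) {
    latest = arg;
    if (handle !== null) timers.clear(handle);
    handle = timers.set(() => {
      handle = null;
      fn(latest);
    }, waitMs);
  };
}
```

One caveat with a pure trailing-edge debounce: if events keep arriving faster than the window, the update is postponed indefinitely, so a maximum wait may be worth adding for very chatty sessions.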
10. Effort Estimate
| Phase | Time | Can Parallelize? | Depends On |
|---|---|---|---|
| Phase 0: Repo Setup | 10min | No | — |
| Phase 1: Core Watcher | 2-3h | No | Phase 0 |
| Phase 2: Session Monitor | 1-2h | No | Phase 1 |
| Phase 3: Channel Resolution | 1h | No | Phase 2 |
| Phase 4: Hook Integration | 1h | No | Phase 2+3 |
| Phase 5: Polish + Cleanup | 1h | No | Phase 4 |
| Phase 6: Remove v1 Injection | 30min | No | Phase 5 (verified) |
| Total | 7-9h | | |
11. Open Questions
- Q1: Idle timeout threshold. 30s is aggressive; exec commands can run for minutes. Should we use a smarter heuristic? E.g., detect stopReason: "toolUse" (agent is waiting for tool) vs stopReason: "stop" (agent is done).
  Default if unanswered: Use stopReason: "stop" in the most recent assistant message as the idle signal, combined with 10s of no new lines. If stopReason is "toolUse", reset the idle timer on every toolResult line. This avoids false completions during long tool runs.
- Q2: Default channel for non-MM sessions. Hook-triggered sessions (agent:main:hook...) don't have a Mattermost channel. Should we (a) skip them, (b) post to a default monitoring channel, or (c) allow config per-session-type?
  Default if unanswered: (a) Skip non-MM sessions. Hook and cron sessions are largely invisible today and not causing user pain. The priority is Mattermost interactive sessions. Non-MM support can be Phase 7.
- Q3: Status box per-session or per-request? A single agent session may handle multiple sequential requests. Should each new user message create a new status box, or does one session = one status box?
  Default if unanswered: One status box per user message (per-request). Each incoming user message starts a new status cycle. When the agent sends its final response (stopReason=stop + no tool calls), mark the current box complete. On the next user message, create a new box. This matches expected UX: one progress indicator per task.
- Q4: Compaction behavior. When OpenClaw compacts a transcript (rewrites the JSONL), does it preserve the original file or create a new one?
  Default if unanswered: Assume in-place truncation (most likely, based on the compactionCount field in sessions.json). Detect by checking if fileSize < bytesRead on each poll. If truncated, reset bytesRead to 0 and re-read from start (with deduplication via message IDs to avoid re-posting old events).
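The Q1 default could be expressed as a small state update plus a predicate, sketched below. Where stopReason lives in the transcript is an assumption (the sketch looks on the message object), and the names trackEvent, isSessionIdle and the state fields are hypothetical.

```javascript
const QUIET_MS = 10_000; // "10s of no new lines" from the Q1 default

// Update idle-tracking state for one parsed transcript event.
// Every line refreshes lastLineAt; assistant messages may update stopReason.
function trackEvent(state, event, now) {
  const msg = event.message || event; // tolerate a nested or a flat layout
  const next = { ...state, lastLineAt: now };
  if (msg.role === 'assistant' && msg.stopReason) {
    next.lastStopReason = msg.stopReason;
  }
  return next;
}

// Idle only when the agent's last word was "stop" and nothing new has
// been written for at least quietMs.
function isSessionIdle(state, now, quietMs = QUIET_MS) {
  if (state.lastStopReason !== 'stop') return false; // e.g. waiting on a tool
  return now - state.lastLineAt >= quietMs;
}
```

With this rule alone, a session that dies while a tool is running would never be marked complete, so a much longer hard cap is probably still needed as a backstop.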