Root cause: TRANSCRIPT_DIR was /home/node/.openclaw/agents but the actual
sessions live at /root/.openclaw/agents. The daemon started, watched an empty
directory, and never detected any sessions.
Changes:
- config.js: default TRANSCRIPT_DIR -> /root/.openclaw/agents
- start-daemon.sh: same fix for fallback default
- .env.daemon (local): TRANSCRIPT_DIR fixed, PLUGIN_ENABLED=false
The custom_livestatus post type requires the Mattermost plugin webapp React
bundle to render, so plugin mode is now disabled by default. The daemon uses
plain REST API posts with markdown formatting instead, which render reliably
everywhere (desktop, mobile, web).
Previous version preserved as git tag v4.1.
When the lock file is deleted (turn complete) and triggerIdle fires,
the transcript file continues receiving writes (the agent's own reply
being appended). The ghost watch was firing session-reactivate on these
trailing writes, causing an immediate complete→reactivate→complete loop
within the same turn.
Fix: only emit session-reactivate from ghost watch if the lock file
currently exists. A JSONL write without a lock file is a trailing write
from the completed turn, not a new user message.
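Sketch of the guard, for illustration only. onGhostFileChange and the injected
lockExists callback are hypothetical names, and the lock path is assumed to be
the transcript path plus '.lock' (matching the .jsonl.lock naming in this log).

```javascript
// Hypothetical sketch: decide whether a write to a completed session's
// transcript should reactivate it. `lockExists` is injected so the rule can
// be exercised without touching the filesystem (fs.existsSync would fit).
function onGhostFileChange(transcriptPath, lockExists, emit) {
  const lockPath = `${transcriptPath}.lock`; // assumed naming
  if (!lockExists(lockPath)) {
    // No lock file: trailing write from the turn that just completed.
    return false;
  }
  emit('session-reactivate', transcriptPath);
  return true;
}

const emitted = [];
const emit = (name, file) => emitted.push([name, file]);

// Trailing write after the lock was deleted: ignored.
const trailing = onGhostFileChange('/tmp/a.jsonl', () => false, emit);
// Write while the gateway holds the lock: new turn, reactivate.
const newTurn = onGhostFileChange(
  '/tmp/a.jsonl',
  (p) => p === '/tmp/a.jsonl.lock',
  emit
);
```

Only the second call emits, so a completed turn can no longer bounce back to
active on its own trailing writes.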
Root cause of double status boxes: lock file event + ghost watch both fire
at the same time on reactivation. Both call clearCompleted+pollNow, both
session-added events reach the handler before activeBoxes.has() returns true
for either, so two status boxes are created.
Fix: sessionAddInProgress Set gates the handler. First caller proceeds,
second caller sees the key in-progress and returns immediately. Cleared
on success (after activeBoxes.set) and on error (before return).
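Minimal sketch of the gate, assuming an async box-creation call.
createStatusBox, the session key and the post ID format are made up for the
example; only activeBoxes and sessionAddInProgress are names from the daemon.

```javascript
const activeBoxes = new Map();
const sessionAddInProgress = new Set();
const created = [];

// Stand-in for the real (async) Mattermost post creation.
async function createStatusBox(sessionKey) {
  created.push(sessionKey);      // record the attempt before yielding
  await Promise.resolve();       // simulate network latency
  return { postId: `post-${created.length}` };
}

async function onSessionAdded(sessionKey) {
  if (activeBoxes.has(sessionKey) || sessionAddInProgress.has(sessionKey)) {
    return false;                // second caller bails out immediately
  }
  sessionAddInProgress.add(sessionKey);
  try {
    const box = await createStatusBox(sessionKey);
    activeBoxes.set(sessionKey, box);
    return true;
  } finally {
    // Cleared after activeBoxes.set on success, and on error as well.
    sessionAddInProgress.delete(sessionKey);
  }
}

// Lock-file event and ghost watch fire back to back for the same session.
const first = onSessionAdded('session-a');
const second = onSessionAdded('session-a');
```

The check and the Set.add happen before the first await, so the second call
sees the key even though activeBoxes is still empty at that point.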
When a bare channel session (no 🧵 suffix) and a thread-specific session
both resolve to the same MM channel/DM, two status boxes appeared simultaneously.
Fix: in session-added handler, before creating a box, check if any existing
active session already owns that channelId. Thread sessions displace bare channel
sessions (and take priority). Bare channel sessions are skipped if a thread
session already exists. First-in wins for same-type duplicates.
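The priority rule can be sketched as a small decision function. This is an
illustration, not the handler itself: resolveChannelOwner is a hypothetical
name and isThread is assumed to be derived from the session key elsewhere.

```javascript
// existing: the session that currently owns the channelId (or undefined).
// incoming: the session that just arrived for the same channelId.
// Returns 'create', 'displace' or 'skip'.
function resolveChannelOwner(existing, incoming) {
  if (!existing) return 'create';               // channel is free
  if (incoming.isThread && !existing.isThread) {
    return 'displace';                          // thread beats bare channel
  }
  // Bare session behind a thread owner, or a same-type duplicate:
  // the session that got there first keeps the box.
  return 'skip';
}

const bare = { isThread: false };
const thread = { isThread: true };

const results = {
  free: resolveChannelOwner(undefined, bare),
  threadOverBare: resolveChannelOwner(bare, thread),
  bareOverThread: resolveChannelOwner(thread, bare),
  threadOverThread: resolveChannelOwner(thread, thread),
  bareOverBare: resolveChannelOwner(bare, bare),
};
```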
Heartbeat sessions (key pattern agent:<agent>:main) have no real Mattermost
conversation context. The daemon was resolving them to the DM fallback channel
and creating a new status box on every heartbeat cycle (nominally every ~30 min,
but firing rapidly during active work). Each one appeared as a separate live
status post.
Fix: in session-added handler, skip any session key matching /^agent:[^:]+:main$/
or /^agent:[^:]+:cli$/ before creating a status box.
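The filter amounts to two regex tests. The patterns are the ones quoted above;
shouldSkipStatusBox and the sample channel key are made up for the example.

```javascript
const HEARTBEAT_KEY = /^agent:[^:]+:main$/;
const CLI_KEY = /^agent:[^:]+:cli$/;

// True for sessions that have no Mattermost conversation to attach to.
function shouldSkipStatusBox(sessionKey) {
  return HEARTBEAT_KEY.test(sessionKey) || CLI_KEY.test(sessionKey);
}

const skipHeartbeat = shouldSkipStatusBox('agent:main:main');
const skipCli = shouldSkipStatusBox('agent:dev:cli');
// Illustrative key shape only; real channel keys may look different.
const skipChannel = shouldSkipStatusBox('agent:main:mattermost:channel:abc123');
```

Both patterns are anchored, so a longer key that merely contains ":main" or
":cli" somewhere in the middle is not skipped.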
Without removing the key from _knownSessions, _onSessionAdded never fires on
reactivation because isKnown=true short-circuits the new-session detection branch.
clearCompleted is called on lock file creation / ghost watch fire.
It clears _completedSessions cooldown but the session key stayed in
_knownSessions (isKnown=true), so poll() treated it as already tracked
and silently updated the entry instead of firing _onSessionAdded.
Fix: also delete from _knownSessions in clearCompleted() so next poll
sees the session as unknown and fires _onSessionAdded -> creates new
plugin status box.
Also: findExistingPost skipped in plugin mode on session-added to
prevent stale post reuse from REST search results.
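A stripped-down model of the fix. MiniSessionMonitor and markCompleted are
hypothetical; only _knownSessions, _completedSessions, clearCompleted and
_onSessionAdded mirror names from the real module.

```javascript
class MiniSessionMonitor {
  constructor(onSessionAdded) {
    this._knownSessions = new Map();
    this._completedSessions = new Map();
    this._onSessionAdded = onSessionAdded;
  }

  markCompleted(key) {
    this._completedSessions.set(key, Date.now());
  }

  clearCompleted(key) {
    this._completedSessions.delete(key);
    this._knownSessions.delete(key); // the fix: forget the key as well
  }

  poll(currentSessions) {
    for (const [key, entry] of currentSessions) {
      if (this._completedSessions.has(key)) continue; // still in cooldown
      const isKnown = this._knownSessions.has(key);
      this._knownSessions.set(key, entry);
      if (!isKnown) this._onSessionAdded(key, entry);
    }
  }
}

const added = [];
const monitor = new MiniSessionMonitor((key) => added.push(key));
const sessions = new Map([['session-a', { updatedAt: 1 }]]);

monitor.poll(sessions);              // first detection: fires
monitor.markCompleted('session-a');
monitor.poll(sessions);              // cooldown: nothing
monitor.clearCompleted('session-a');
monitor.poll(sessions);              // reactivation: fires again
```

Dropping the _knownSessions.delete line makes the third poll a silent update,
which is the behaviour this commit removes.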
When the gateway deletes .jsonl.lock it means the final reply was sent.
Use this as an immediate 'turn complete' trigger instead of waiting for
cache-ttl (3s grace) or idle timeout (60s).
- status-watcher.js: _onFileChange checks if .lock file exists on event.
If deleted -> emits 'session-lock-released'. Added triggerIdle() which
cancels all timers and emits session-idle immediately.
- watcher-manager.js: wires session-lock-released/session-lock-released-path
to watcher.triggerIdle() for instant completion.
Combined with lock-created trigger (previous commit), the full lifecycle is:
User sends message
-> .jsonl.lock created -> status box appears immediately
-> JSONL writes -> status box updates in real time
-> Gateway sends reply -> .jsonl.lock deleted -> status box marks done instantly
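One way to picture the event naming across the lifecycle. The four event names
appear in this log; lockEventName is a hypothetical helper, and using the
-path variants for files not yet mapped to a session is inferred from the
'session-lock' / 'session-lock-path' split.

```javascript
// lockExists: whether the .jsonl.lock file is present when the event fires.
// sessionKey: the session mapped to the transcript, or undefined if unknown.
function lockEventName(lockExists, sessionKey) {
  const known = sessionKey !== undefined;
  if (lockExists) {
    return known ? 'session-lock' : 'session-lock-path';
  }
  return known ? 'session-lock-released' : 'session-lock-released-path';
}

const turnStart = lockEventName(true, 'session-a');
const turnStartUnknown = lockEventName(true, undefined);
const turnEnd = lockEventName(false, 'session-a');
const turnEndUnknown = lockEventName(false, undefined);
```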
The gateway writes a .jsonl.lock file the instant it starts processing a
user message — before any JSONL content is written. This is the earliest
possible infrastructure signal that a session became active.
Previously the status box only appeared after the first JSONL write (first
assistant response token), meaning turns with no tool calls showed nothing.
Changes:
- status-watcher.js: _onFileChange handles .jsonl.lock events, emits
'session-lock' (known session) or 'session-lock-path' (unknown session)
- watcher-manager.js: wires session-lock/session-lock-path to clearCompleted
+ pollNow for immediate reactivation from lock file event
- session-monitor.js: findSessionByFile() looks up session key by transcript
path for lock events on sessions not yet in fileToSession map;
_getAgentIds() helper for directory enumeration
Result: status box appears the moment the gateway receives the user message,
not when the first reply token is written.
src/status-watcher.js:
- When lastOffset > fileSize (stale offset from compaction or previous session),
reset offset to current file end rather than 0.
Resetting to 0 caused re-parsing gigabytes of old content; resetting to fileSize
means we only read new bytes from this point forward. This was the root cause of
the status box receiving no updates — the offset was past EOF so every read
returned 0 bytes silently.
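The offset rule in isolation, as a sketch. resolveReadOffset is a hypothetical
name; the real logic sits inside the watcher's read path.

```javascript
// Pick the byte offset for the next read of a transcript file.
function resolveReadOffset(lastOffset, fileSize) {
  if (lastOffset > fileSize) {
    // Stale offset (compaction or previous session): skip to the current
    // end of file instead of re-parsing everything from byte 0.
    return fileSize;
  }
  return lastOffset;
}

const stale = resolveReadOffset(5000, 1200);
const normal = resolveReadOffset(800, 1200);
const atEnd = resolveReadOffset(1200, 1200);
```

With the stale case mapped to fileSize, the next append is read from that
point on instead of being lost behind an offset past EOF.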
src/session-monitor.js:
- _knownSessions is now maintained incrementally instead of being replaced wholesale
at the end of every poll cycle.
- Previously: _knownSessions = currentSessions at end of _poll() meant forgetSession()
had no effect — the next poll immediately re-added the key as 'known' without firing
_onSessionAdded, silently swallowing the reactivation.
- Now: entries are added/updated individually, removals delete from the map directly.
forgetSession() + clearCompleted() + pollNow() now correctly triggers reactivation.
Verified: 3 consecutive 5s tests all show plugin KV updating with lines and timestamps.
plugin/server/store.go:
- CleanStaleSessions now handles last_update_ms=0 (pre-cleanup era orphans)
- Zero-timestamp sessions: mark active ones interrupted, delete non-active ones
- Previously these were silently skipped with 'continue', accumulating forever
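The real cleanup is Go code in plugin/server/store.go; the decision rule itself
is language-neutral and is sketched here in JavaScript. cleanupAction and the
'done' status value are hypothetical, and the 30 min / 1 hr thresholds are the
ones from the original cleanup goroutine described later in this log.

```javascript
const STALE_ACTIVE_MS = 30 * 60 * 1000; // active, no update for >30 min
const EXPIRED_MS = 60 * 60 * 1000;      // non-active, older than 1 hr

// Returns 'keep', 'interrupt' or 'delete' for one stored session.
function cleanupAction(session, nowMs) {
  const isActive = session.status === 'active';
  if (session.last_update_ms === 0) {
    // Orphan written before timestamps existed: handle it, never skip it.
    return isActive ? 'interrupt' : 'delete';
  }
  const ageMs = nowMs - session.last_update_ms;
  if (isActive) return ageMs > STALE_ACTIVE_MS ? 'interrupt' : 'keep';
  return ageMs > EXPIRED_MS ? 'delete' : 'keep';
}

const now = 1000000000000;
const minutes = (n) => n * 60 * 1000;
const actions = [
  cleanupAction({ status: 'active', last_update_ms: 0 }, now),
  cleanupAction({ status: 'done', last_update_ms: 0 }, now),
  cleanupAction({ status: 'active', last_update_ms: now - minutes(31) }, now),
  cleanupAction({ status: 'active', last_update_ms: now - minutes(5) }, now),
  cleanupAction({ status: 'done', last_update_ms: now - minutes(120) }, now),
  cleanupAction({ status: 'done', last_update_ms: now - minutes(10) }, now),
];
```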
src/status-watcher.js:
- removeSession() keeps fileToSession mapping as ghost entry ('\x00ghost:key')
- When ghost file changes, emits 'session-reactivate' immediately instead of
waiting up to 2s for the session-monitor poll cycle
- Ghost removed after first trigger to avoid repeated events
src/session-monitor.js:
- Added pollNow() for immediate poll without waiting for interval tick
- Reactivation check now uses sessions.json updatedAt vs completedAt timestamp
(pure infrastructure: two on-disk timestamps, no AI involvement)
src/watcher-manager.js:
- Wires session-reactivate event: clearCompleted() + pollNow() for instant re-detection
- New sessions now show up within ~100ms of first file change instead of 2s
Net result: status box appears reliably on every turn, clears 3s after reply,
zero orphan sessions accumulating in the KV store.
Problem: after a session completes, removeSession() deleted the file→session
mapping. When the next user message caused the JSONL to be written, fs.watch
fired but the fileToSession lookup returned undefined, so the event was silently
dropped. Reactivation
only happened on the next session-monitor poll (up to 2s later), and by then
the watcher had missed the first lines of the new turn.
Fix:
- removeSession() keeps the file in fileToSession as a ghost marker
- fs.watch fires → ghost detected → emit 'session-reactivate'
- watcher-manager clears completedSessions cooldown + calls pollNow()
- session-monitor re-detects immediately with no poll lag
- Ghost removed after first fire (one-shot)
Also adds SessionMonitor.pollNow() for forced immediate polling.
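A sketch of the one-shot ghost marker using a plain Map. The '\x00ghost:'
prefix comes from the status-watcher change; the 'session-update' event name
and the function bodies are illustrative guesses.

```javascript
const GHOST_PREFIX = '\x00ghost:';
const fileToSession = new Map([['a.jsonl', 'session-a']]);

function removeSession(file) {
  const key = fileToSession.get(file);
  if (key !== undefined && !key.startsWith(GHOST_PREFIX)) {
    fileToSession.set(file, GHOST_PREFIX + key); // keep a ghost, don't delete
  }
}

function onFileChange(file, emit) {
  const value = fileToSession.get(file);
  if (value === undefined) return;               // unknown file: ignore
  if (value.startsWith(GHOST_PREFIX)) {
    fileToSession.delete(file);                  // one-shot: drop the ghost
    emit('session-reactivate', value.slice(GHOST_PREFIX.length));
    return;
  }
  emit('session-update', value);                 // hypothetical normal path
}

const events = [];
const emit = (name, key) => events.push([name, key]);

removeSession('a.jsonl');       // session completed
onFileChange('a.jsonl', emit);  // next turn writes: reactivate
onFileChange('a.jsonl', emit);  // ghost already consumed: no event
```

The NUL byte keeps the ghost value from colliding with a real session key.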
Bug 1: session-monitor suppressed reactivation for 5min (file staleness check)
Fix: compare sessions.json updatedAt against completedAt timestamp instead.
If updatedAt > completedAt, a new gateway turn started — reactivate immediately.
Bug 2: watcher-manager passed stale saved offset on reactivation
The saved offset pointed near end-of-file from the prior session, so only
1-2 lines were read (usually just cache-ttl), triggering immediate fast-idle
with no content shown.
Fix: reactivated sessions always start from current file position (new content
only), same as brand-new sessions.
Result: after completing a turn, the next message correctly reactivates
the status box and streams tool calls/content in real time.
Previously completed sessions were suppressed for 5 minutes based on
JSONL file staleness. With fast-idle (cache-ttl detection), sessions
complete in ~3s — but the gateway immediately appends the next user
message, keeping the file 'fresh'. This blocked reactivation entirely.
Fix: compare sessions.json updatedAt against the completion timestamp.
If the gateway updated the session AFTER we marked it complete, a new
turn has started — reactivate immediately.
Pure infrastructure: timestamp comparison between two on-disk files.
No AI model state or memory involved.
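The comparison is small enough to show whole. This sketch assumes both
timestamps are epoch milliseconds; shouldReactivate and the completedAt map
are hypothetical names.

```javascript
// sessionKey -> time (ms) at which the daemon marked the session complete.
const completedAt = new Map([['session-a', 1700000003000]]);

// updatedAt: the session's updatedAt value read from sessions.json.
function shouldReactivate(sessionKey, updatedAt) {
  const doneAt = completedAt.get(sessionKey);
  if (doneAt === undefined) return false; // never completed: nothing to do
  return updatedAt > doneAt;              // gateway touched it afterwards
}

const sameTurn = shouldReactivate('session-a', 1700000002000);
const newTurn = shouldReactivate('session-a', 1700000009000);
const unknown = shouldReactivate('session-b', 1700000009000);
```

Unlike the file-staleness check, this does not depend on how recently the
JSONL file was written, so a fresh file no longer blocks reactivation.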
Previously the daemon waited IDLE_TIMEOUT_S (60s) after the last file
change before marking a session complete. But the JSONL file is kept
open by the gateway indefinitely, so file inactivity was never reliable.
Fix: detect the 'openclaw.cache-ttl' custom record which the gateway
emits after every completed assistant turn. When pendingToolCalls == 0,
start a 3-second grace timer instead of the full 60s idle timeout.
Result: live status clears within ~3 seconds of the agent's final reply
instead of lingering for 60+ seconds (or indefinitely on active sessions).
Fixes: session stays 'active' long after work is done
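Sketch of the timer choice. The record shape is an assumption: a custom record
is taken to look like {type: 'custom', customType: 'openclaw.cache-ttl'}, which
this log does not confirm. idleDelayMs is a hypothetical helper.

```javascript
const GRACE_MS = 3 * 1000;         // fast-idle grace period
const IDLE_TIMEOUT_MS = 60 * 1000; // IDLE_TIMEOUT_S fallback

// Decide how long to wait before marking the session complete.
function idleDelayMs(record, pendingToolCalls) {
  const isCacheTtl =
    record.type === 'custom' &&               // assumed field names
    record.customType === 'openclaw.cache-ttl';
  if (isCacheTtl && pendingToolCalls === 0) return GRACE_MS;
  return IDLE_TIMEOUT_MS;
}

const ttl = { type: 'custom', customType: 'openclaw.cache-ttl' };
const finished = idleDelayMs(ttl, 0);
const toolsRunning = idleDelayMs(ttl, 2);
const ordinary = idleDelayMs({ type: 'message' }, 0);
```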
- README: document all 5 fixes from Issue #5 (floating widget, RHS panel
refresh bug, browser auth fix, session cleanup goroutine, KV scan optimization)
- README: add full Mattermost Plugin section with build/deploy instructions,
manual deploy path for servers with plugin uploads disabled, auth model docs
- plugin/Makefile: build/package/deploy/health targets for production deployment
on any new OpenClaw+Mattermost server
Closes the documentation gap so any developer can deploy this from scratch.
Phase 1: Fix RHS panel to fetch existing sessions on mount
- Add initial API fetch in useAllStatusUpdates() hook
- Allow GET /sessions endpoint without shared secret auth
- RHS panel now shows sessions after page refresh
Phase 2: Floating widget component (registerRootComponent)
- New floating_widget.tsx with auto-show/hide behavior
- Draggable, collapsible to pulsing dot with session count
- Shows last 5 lines of most recent active session
- Position persisted to localStorage
- CSS styles using Mattermost theme variables
Phase 3: Session cleanup and KV optimization
- Add LastUpdateMs field to SessionData for staleness tracking
- Set LastUpdateMs on session create and update
- Add periodic cleanup goroutine (every 5 min)
- Stale active sessions (>30 min no update) marked interrupted
- Expired non-active sessions (>1 hr) deleted from KV
- Add ListAllSessions and keep ListActiveSessions as helper
- Add debug logging to daemon file polling
Closes #5
- Switched from registerAppBarComponent (not in MM 11.4 build) to
registerChannelHeaderButtonAction + registerRightHandSidebarComponent
- Added public/icon.svg for channel header button
- Fixed store dispatch for RHS toggle action
- Plugin deployment permissions fix (uid 2000)
Added a Right-Hand Sidebar (RHS) panel to the Mattermost plugin that
shows live agent activity in a dedicated, always-visible panel.
- New RHSPanel component with SessionCard views per active session
- registerAppBarComponent adds 'Agent Status' icon to toolbar
- Subscribes to WebSocket updates via global listener
- Shows active sessions with live elapsed time, tool calls, token count
- Shows recent completed sessions below active ones
- Responsive CSS matching Mattermost design system
The RHS panel solves the scroll-out-of-view problem: the status
dashboard stays visible regardless of chat scroll position.
Two bugs fixed:
1. Session monitor stale session bug: Sessions that were stale on first
poll got added to _knownSessions but never re-checked, even after
their transcript became active. Now stale sessions are tracked
separately in _staleSessions and re-checked on every poll cycle.
2. CLI live-status tool: create/update commands were creating plain text
posts without the custom_livestatus post type or plugin props. The
Mattermost webapp plugin only renders posts with type=custom_livestatus.
Now all CLI commands set the correct post type and livestatus props.
- Added completedBoxes map to track idle sessions and their post IDs
- On session reactivation, reuse existing post instead of creating new one
- Fixed variable scoping bug (saved -> savedState) in session-added handler
- Root cause: idle -> forgetSession -> re-detect -> new post -> repeat
This was creating 10+ duplicate status boxes per session per hour.
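Roughly how the reuse works, as a self-contained sketch. createPost stands in
for the real post-creation call, and the handler bodies are simplified
guesses; only completedBoxes is a name taken from this change.

```javascript
const activeBoxes = new Map();    // sessionKey -> postId
const completedBoxes = new Map(); // sessionKey -> postId of the idle box
let postCount = 0;
const createPost = () => `post-${++postCount}`; // stand-in for the API call

function onSessionIdle(key) {
  const postId = activeBoxes.get(key);
  if (postId === undefined) return;
  activeBoxes.delete(key);
  completedBoxes.set(key, postId); // remember the post for later reuse
}

function onSessionAdded(key) {
  const reused = completedBoxes.get(key);
  const postId = reused !== undefined ? reused : createPost();
  completedBoxes.delete(key);
  activeBoxes.set(key, postId);
  return postId;
}

const firstPost = onSessionAdded('session-a'); // creates a post
onSessionIdle('session-a');
const secondPost = onSessionAdded('session-a'); // reuses the same post
```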
The main agent session uses key 'agent:main:main' which doesn't
contain a channel ID. The session monitor now falls back to reading
deliveryContext/lastTo from sessions.json and resolves 'user:XXXX'
format via the Mattermost direct channel API.
Fixes: status watcher not tracking the main agent's active transcript
- Add EnsureBotUser on plugin activate (fixes 'Unable to find user' error)
- Accept bot_user_id in create session request
- Await plugin health check before starting session monitor
(prevents a race where sessions are detected before the plugin flag is set)
- Plugin now creates custom_livestatus posts with proper bot user
- Fix lint errors in plugin-client.js (unused var, empty block)
- Update README with plugin architecture and env vars
- Update STATE.json to v4.1 IMPLEMENTATION_COMPLETE
- All 96 tests passing, 0 lint errors
Plugin (Go server + React webapp):
- Custom post type 'custom_livestatus' with terminal-style rendering
- WebSocket broadcasts for real-time updates (no PUT, no '(edited)')
- KV store for session persistence across reconnects
- Shared secret auth for daemon-to-plugin communication
- Auto-scroll terminal with user scroll override
- Collapsible sub-agent sections
- Theme-compatible CSS (light/dark)
Daemon integration:
- PluginClient for structured data push to plugin
- Auto-detection: GET /health on startup + periodic re-check
- Graceful fallback: if plugin unavailable, uses REST API (PUT)
- Per-session mode tracking: sessions created via plugin stay on plugin
- Mid-session fallback: if plugin update fails, auto-switch to REST
Plugin deployed and active on Mattermost v11.4.0.
Code blocks collapse after ~4 lines in Mattermost, requiring click
to expand. Blockquotes (> prefix) never collapse and show all content
inline with a distinct left border.
- Tool calls: inline code formatting (backtick tool name)
- Thinking text: box drawing prefix for visual distinction
- Header: bold status + code agent name
- All lines visible without clicking to expand
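An illustrative formatter for this layout. The exact line shapes and the box
drawing character are guesses; formatStatus and the line objects are
hypothetical.

```javascript
// status: e.g. 'Working'; agent: agent name; lines: recent activity.
function formatStatus(status, agent, lines) {
  const body = lines.map((line) =>
    line.kind === 'tool'
      ? `> \`${line.name}\` ${line.text}` // tool call: inline code name
      : `> │ ${line.text}`                // thinking: box drawing prefix
  );
  // Header: bold status plus the agent name as inline code.
  return [`> **${status}** \`${agent}\``, ...body].join('\n');
}

const post = formatStatus('Working', 'main', [
  { kind: 'thinking', text: 'Checking the config' },
  { kind: 'tool', name: 'read', text: 'config.js' },
]);
```

Every output line starts with "> ", so the whole status renders as a single
blockquote.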
- Removes flicker caused by delete+recreate pattern
- PUT updates modify post content in-place (smooth)
- Trade-off: Mattermost shows (edited) label, and PUT clears pin status
- Pin+PUT are incompatible in Mattermost API — every PUT clears is_pinned
- Fix pin API calls to use {} body instead of null
- Remove post-replaced event handler (no longer needed)
Added forgetSession() to SessionMonitor. When watcher marks a session
idle/done, it now clears the key from the monitor's known sessions map.
Next poll cycle re-detects the session if the transcript is still active,
creating a fresh status post.
- New sessions start from current file offset, not 0. Shows live
thinking from the moment of detection, not a backlog dump.
- Session poll reduced from 2s to 500ms for faster pickup.
- Auto-pin with null body (MM pin API quirk).
- Auto-pin status posts on creation, unpin on session completion
- Skip stale sessions (>5min since last transcript write)
- Parse OpenClaw JSONL format (type:message with nested role/content)
- Handle timestamp-prefixed transcript filenames
1. session-monitor: handle timestamp-prefixed transcript filenames
OpenClaw uses {ISO}_{sessionId}.jsonl — glob for *_{sessionId}.jsonl
when direct path doesn't exist.
2. session-monitor: skip stale sessions (>5min since last transcript write)
Prevents creating status boxes for every old session in sessions.json.
3. status-watcher: parse actual OpenClaw JSONL transcript format
Records are {type:'message', message:{role,content:[{type,name,...}]}}
not {type:'tool_call', name}. Now shows live tool calls with arguments
and assistant thinking text.
4. handler.js: fix module.exports for OpenClaw hook loader
Expects default export (function), not {handle: function}.
5. HOOK.md: add YAML frontmatter metadata for hook discovery.
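Item 1 can be sketched over a plain directory listing. resolveTranscriptFile
is a hypothetical name, the sample filenames are made up, and picking the
lexicographically last match when several exist is this example's choice, not
confirmed behaviour.

```javascript
// files: names in the agent's sessions directory (e.g. from fs.readdirSync).
function resolveTranscriptFile(files, sessionId) {
  const direct = `${sessionId}.jsonl`;
  if (files.includes(direct)) return direct;   // plain name wins
  const suffix = `_${sessionId}.jsonl`;        // {ISO}_{sessionId}.jsonl
  const matches = files.filter((f) => f.endsWith(suffix)).sort();
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

const listing = [
  '2026-01-01T00-00-00Z_abc.jsonl',
  '2026-01-02T00-00-00Z_abc.jsonl',
  'def.jsonl',
];

const prefixed = resolveTranscriptFile(listing, 'abc');
const plain = resolveTranscriptFile(listing, 'def');
const missing = resolveTranscriptFile(listing, 'zzz');
```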
- docs/v1-removal-checklist.md: exact sections to remove from 6 AGENTS.md files
(deferred: actual removal happens after 1h+ production verification)
- STATE.json: updated to IMPLEMENTATION_COMPLETE, phase 6, all test results,
v1RemovalStatus: DOCUMENTED_PENDING_PRODUCTION_VERIFICATION
- make check: clean
Phase 3 (Sub-Agent Support):
- session-monitor.js: sub-agents always passed through (inherit parent channel)
- watcher-manager.js enhancements:
- Pending sub-agent queue: child sessions that arrive before parent are queued
and processed when parent is registered (no dropped sub-agents)
- linkSubAgent(): extracted helper for clean parent-child linking
- Cascade completion: parent stays active until all children complete
- Sub-agents embedded in parent status post (no separate top-level post)
- status-formatter.js: recursive nested rendering at configurable depth
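The pending sub-agent queue can be sketched as follows. This is a simplified
model, not the watcher-manager code: register, registered and pendingChildren
are hypothetical names, and spawnedBy is passed in directly.

```javascript
const registered = new Map();      // sessionKey -> { children: [] }
const pendingChildren = new Map(); // parentKey -> [childKey, ...]

// Register a session, link it to its parent, then flush anything that was
// waiting for it (recursively, so queued grandchildren are picked up too).
function register(key, parentKey) {
  registered.set(key, { children: [] });
  if (parentKey !== undefined) registered.get(parentKey).children.push(key);
  const waiting = pendingChildren.get(key) || [];
  pendingChildren.delete(key);
  for (const child of waiting) register(child, key);
}

function onSessionAdded(key, spawnedBy) {
  if (spawnedBy !== undefined && !registered.has(spawnedBy)) {
    // Parent not seen yet: queue the child instead of dropping it.
    const queue = pendingChildren.get(spawnedBy) || [];
    queue.push(key);
    pendingChildren.set(spawnedBy, queue);
    return;
  }
  register(key, spawnedBy);
}

onSessionAdded('child-1', 'parent'); // arrives first: queued
onSessionAdded('parent');            // parent registers: queue flushes
onSessionAdded('child-2', 'parent'); // parent known: linked directly
```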
Integration tests - test/integration/sub-agent.test.js (9 tests):
3.1 Sub-agent detection via spawnedBy (monitor level)
3.2 Nested status rendering (depth indentation, multiple children, deep nesting)
3.3 Cascade completion (pending tool call tracking across sessions)
3.4 Sub-agent JSONL parsing (usage events, error tool results)
All 95 tests pass (59 unit + 36 integration). make check clean.
PROJ-035 Live Status v4 - implementation plan created by planner subagent.
Discovery findings documented in discoveries/README.md covering:
- JSONL transcript format (confirmed v3 schema)
- Session keying patterns (subagent spawnedBy linking)
- Hook events available (gateway:startup confirmed)
- Mattermost API (no edit time limit)
- Current v1 failure modes
Audit: 32/32 PASS, Simulation: READY