Live Status: fix false reactivation thrash while preserving fresh bottom-of-chat boxes and correct Mattermost routing #11
## Problem
Live Status is close, but it still has a correctness bug in session lifecycle handling that makes the user-visible behavior unstable in Mattermost.
The current failure mode is not primarily a process-startup problem, and it is not a full daemon crash. The live-status daemon process often remains alive, but the UI behavior becomes incorrect because completed sessions are being reactivated by stale transcript file events.
This issue needs to be handled as a routing + lifecycle/state-machine correctness issue, not as a small debounce bug.
## What we want
We want Live Status to satisfy all of these constraints at the same time:
- **Correct Mattermost routing**
- **Fresh box at the bottom of chat on real reactivation**
- **No false reactivation after completion:** `fs.watch` events must not resurrect the session and create a new bottom box.
- **Single startup owner:** the `live-status-daemon` is owned by the plugin service, not multiple startup mechanisms competing or drifting.
- **Reliable startup and runtime behavior**
## Symptoms observed
Observed behavior from the user side:
Observed behavior from logs:
Representative log pattern:
```
Lock file deleted — turn complete, marking session done immediately
Session complete via plugin
fs.watch: file change on completed session — triggering reactivation
Ghost watch triggered reactivation
Deleted old buried status box on reactivation
Created status box via plugin
```

This means the process is often alive but logically wrong.
## Important product requirement clarified during diagnosis
A fresh status box at the bottom of the thread/channel is required.
That means:
Any proposed fix that preserves a single box forever is the wrong UX for this project.
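The required behavior can be sketched as follows, assuming a hypothetical Mattermost client wrapper (the `deletePost` / `createPost` helper names and the `session` shape are illustrative, not the actual plugin API):

```javascript
// Sketch only: `client` is a hypothetical wrapper around the Mattermost
// plugin API; method names here are illustrative, not the real API surface.
async function reactivateStatusBox(client, session) {
  // Delete the old, buried status box so the channel shows exactly one box.
  if (session.statusPostId) {
    await client.deletePost(session.statusPostId);
  }
  // Post a fresh box at the bottom of the thread/channel, using the
  // session's stored routing (channelId + rootPostId) so it lands in the
  // right place in Mattermost.
  const post = await client.createPost({
    channel_id: session.channelId,
    root_id: session.rootPostId || '',
    message: renderStatus(session),
  });
  session.statusPostId = post.id;
  return post;
}

function renderStatus(session) {
  return `Session ${session.id}: ${session.state}`;
}
```

The key point is the order of operations: delete the stale box first, then create the new one, so the user never sees two live boxes.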
## Current code/design understanding
From the repo, the current design already includes routing-aware logic:
- `session-monitor.js` derives `channelId` / `rootPostId` from session keys of the form `thread:<rootPostId>` and `direct:<userId>`, and resolves sessions via the Mattermost API
- `watcher-manager.js` keys watchers by a `sessionKey` that carries `channelId` and `rootPostId`
- `status-watcher.js`

So the current system already understands routing and the fresh-bottom-box UX goal, but the session lifecycle/reactivation boundary is still not strong enough.
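The session-key shapes named above (`thread:<rootPostId>`, `direct:<userId>`) can be sketched as a pair of helpers; the function names are illustrative, not taken from the repo:

```javascript
// Sketch of routing-aware session keys, matching the key shapes the
// current design already uses: `thread:<rootPostId>` and `direct:<userId>`.
// Function names are illustrative, not taken from the repo.
function sessionKeyFor(route) {
  if (route.rootPostId) return `thread:${route.rootPostId}`;
  if (route.userId) return `direct:${route.userId}`;
  throw new Error('route must carry rootPostId or userId');
}

function parseSessionKey(key) {
  const [kind, id] = key.split(':');
  if (kind === 'thread') return { rootPostId: id };
  if (kind === 'direct') return { userId: id };
  throw new Error(`unrecognized session key: ${key}`);
}
```

Keeping derivation and parsing in one place makes the routing invariant easy to enforce: every watcher and every status post must trace back to exactly one such key.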
## Root problem as currently understood
The likely root problem is:
session lifecycle truth is split too loosely between lock-file events and transcript file changes, allowing completed sessions to be resurrected by stale/ghost file activity.
In practice:
- `fs.watch` events still arrive after the session is marked complete

That creates complete → reactivate → recreate thrash.
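A minimal sketch of the missing guard, assuming a hypothetical `session.state` field (names are illustrative): transcript `fs.watch` events on a completed session should be dropped, and on a live session they should only update content, never drive the lifecycle.

```javascript
// Sketch of the guard: transcript fs.watch events must never move a
// completed session back to active. `session.state` and the returned
// action objects are hypothetical, for illustration only.
function onTranscriptChange(session) {
  if (session.state === 'completed') {
    // Stale/ghost file activity on a finished session: ignore it
    // instead of reactivating and recreating the status box.
    return { action: 'ignore' };
  }
  // For live sessions, a transcript change is only a content update.
  return { action: 'update-content' };
}
```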
## Proposed solution direction found during diagnosis
### Core principle
Keep the fresh-bottom-box UX, but tighten what qualifies as a real reactivation.
### Proposed architecture direction
1. **Use a single lifecycle owner for the process.** The `live-status-daemon` under the plugin service should be the one authoritative startup/runtime owner.
2. **Use an explicit session state machine:** `inactive` → `active` → `completing` → `completed_guarded`. Exiting `completed_guarded` should require a real activation signal.
3. **Preserve fresh-bottom-box behavior only for real reactivation.**
4. **Tighten reactivation signals.**
5. **Downgrade transcript file changes to content updates, not lifecycle authority.**
6. **Keep routing correctness as a hard invariant.**
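The state machine and signal rules above can be sketched as a transition table. The state names come from this issue; the signal names (`lock-created`, `lock-deleted`, `completion-confirmed`, `transcript-change`) and the code shape are assumptions for illustration:

```javascript
// Sketch of the proposed session state machine. The states come from this
// issue; signal names and the table shape are illustrative assumptions.
const STATES = ['inactive', 'active', 'completing', 'completed_guarded'];

// Allowed transitions, keyed by current state and signal kind.
const TRANSITIONS = {
  inactive:          { 'lock-created': 'active' },
  active:            { 'lock-deleted': 'completing' },
  completing:        { 'completion-confirmed': 'completed_guarded' },
  // Only an explicit activation signal reopens a guarded session;
  // 'transcript-change' is deliberately absent from every row.
  completed_guarded: { 'lock-created': 'active' },
};

function nextState(state, signal) {
  const next = (TRANSITIONS[state] || {})[signal];
  // Unknown signals (including transcript changes) leave the state
  // unchanged: content updates never carry lifecycle authority.
  return next || state;
}
```

Because `transcript-change` appears in no row, a ghost `fs.watch` event cannot move a session out of `completed_guarded`, while a genuine `lock-created` signal still produces the fresh-bottom-box reactivation.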
## What the manager should do
Please review this issue as a manager/research task, not as an implementation task yet.
Requested output
Provide an implementation plan that:
## Specific questions for the manager to answer
- Should `fs.watch` ever be allowed to trigger reactivation on a completed session?
- Is the `completed` cooldown / ghost-watch design fundamentally flawed and worth replacing?

## Non-goal for this issue
Do not implement yet.
This issue is for:
## Acceptance criteria for the future implementation
The eventual implementation should satisfy all of these:
## Repo areas likely involved
Based on current reading, likely areas include:
- `src/watcher-manager.js`
- `src/status-watcher.js`
- `src/session-monitor.js`

## Why this should be planned, not patched ad hoc
This bug touches:
So a one-off patch is likely to create regressions unless the lifecycle model is made explicit first.