4 Commits

Author SHA1 Message Date
ccf08cfb67 Merge pull request 'docs: update poller to dispatcher architecture (closes #4)' (#5) from fix/update-poller-docs into main
All checks were successful
check / check (push) Successful in 9s
2026-02-28 16:31:43 +01:00
clawbot
0284ea63c0 docs: update poller to dispatcher architecture (closes #4)
All checks were successful
check / check (push) Successful in 11s
Replace flag-file + heartbeat approach with the production dispatcher
pattern: poller triages notifications and spawns isolated agents
directly via openclaw cron. Adds assignment scan for self-created
issues. Response time ~15-60s instead of ~30 min.
2026-02-28 06:29:32 -08:00
f3e48c6cd4 Merge pull request 'Expand sensitive output routing and make inbox references conditional' (#3) from fix/pii-and-conditional-email into main
All checks were successful
check / check (push) Successful in 9s
Reviewed-on: #3
2026-02-28 15:22:36 +01:00
clawbot
c0d345e767 expand PII routing to cover secrets, credentials, and operational info; make email/inbox references conditional
All checks were successful
check / check (push) Successful in 12s
- Rename 'PII Output Routing' → 'Sensitive Output Routing' throughout
- Expand scope to include secrets, credentials, API keys, flight numbers,
  locations, travel plans, medical info
- Replace hardcoded 'Emails' heartbeat check with conditional language
  ('Notifications — whatever inbox sources you've integrated')
- Remove 'email' from heartbeat-state.json example
- Update cross-references in SETUP_CHECKLIST.md
2026-02-28 03:40:13 -08:00
2 changed files with 221 additions and 97 deletions

View File

@@ -189,50 +189,68 @@ arrive instantly.
 Setup: add a webhook on each Gitea repo (or use an organization-level webhook)
 pointing to `https://your-openclaw-host/hooks/gitea`. OpenClaw handles the rest.

-#### Option B: Notification Poller (Local Machine Behind NAT)
+#### Option B: Notification Poller + Dispatcher (Local Machine Behind NAT)

 If your OpenClaw runs on a dedicated local machine behind NAT (like a home Mac
 or Linux workstation), Gitea can't reach it directly. This is our setup —
 OpenClaw runs on a Mac Studio on a home LAN.

-The solution: a lightweight Python script that polls the Gitea notifications API
-every few seconds. When new notifications appear, it writes a flag file that the
-agent checks during heartbeats.
+The solution: a Python script that both polls and dispatches. It polls the Gitea
+notifications API every 15 seconds, triages each notification (checking
+assignment and @-mentions), marks them as read, and spawns one isolated agent
+session per actionable item via `openclaw cron add --session isolated`.
+
+The poller also runs a secondary **assignment scan** every 2 minutes, checking
+all watched repos for open issues/PRs assigned to the bot that were recently
+updated and still need a response. This catches cases where notifications aren't
+generated (e.g. self-assignment, API-created issues).

 Key design decisions:

-- **The poller never marks notifications as read.** The agent does that after
-  processing. This prevents lost notifications if the agent fails to process.
-- **It tracks notification IDs, not counts.** Only fires on genuinely new
-  notifications, not re-reads of existing ones.
-- **Flag file instead of wake events.** We initially used OpenClaw's
-  `/hooks/wake` endpoint, but wake events target the main (DM) session — any
-  model response during processing leaked to DM as a notification. The flag file
-  approach is processed during heartbeats, where output routing is controlled.
+- **The poller IS the dispatcher.** No flag files, no heartbeat dependency. The
+  poller triages notifications and spawns agents directly.
+- **Marks notifications as read immediately.** Each notification is marked read
+  as it's processed, preventing re-dispatch on the next poll.
+- **One agent per issue.** Each spawned agent gets a `SCOPE` instruction
+  limiting it to one specific issue/PR. Agents post results as Gitea comments,
+  not DMs.
+- **Dedup tracking.** An in-memory `dispatched_issues` set prevents spawning
+  multiple agents for the same issue within one poller lifetime.
+- **`--no-deliver` instead of `--announce`.** Agents report via Gitea API
+  directly. The `--announce` flag on isolated sessions had delivery failures.
+- **Assignment scan filters by recency.** Only issues updated in the last 5
+  minutes are considered, preventing re-dispatch for stale assigned issues.
 - **Zero dependencies.** Just Python stdlib. Runs anywhere.

-Tradeoff: notifications are processed at heartbeat cadence (~30 min) instead of
-realtime. For code review and issue triage, this is fine.
+Response time: ~15-60s from notification to agent comment (vs ~30 min with the
+old heartbeat approach).
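The 5-minute recency filter is listed among the design decisions, but the script itself only carries a placeholder comment for it. A minimal sketch of such a check, assuming each issue object returned by the API carries an ISO 8601 `updated_at` field. The helper name `is_recently_updated` is taken from that placeholder; its signature, the `max_age_seconds` default, and the `now` parameter are illustrative assumptions, not the production code:

```python
from datetime import datetime, timezone


def is_recently_updated(item, max_age_seconds=300, now=None):
    """Return True if item["updated_at"] is at most max_age_seconds old."""
    raw = item.get("updated_at")
    if not raw:
        return False
    try:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so normalise it to an explicit UTC offset first.
        updated = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return False
    if updated.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        updated = updated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).total_seconds() <= max_age_seconds
```

In the assignment scan this would sit in front of the `needs_bot_response()` call, skipping any `item` for which it returns `False`.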
 ```python
 #!/usr/bin/env python3
 """
-Gitea notification poller (flag-file approach).
+Gitea notification poller + dispatcher.

-Polls for unread notifications and writes a flag file when new ones
-appear. The agent checks this flag during heartbeats and processes
-notifications via the Gitea API directly.
+Two polling loops:
+1. Notification-based: detects new notifications (mentions, assignments by
+   other users) and dispatches agents for actionable ones.
+2. Assignment-based: periodically checks for open issues/PRs assigned to
+   the bot that have no recent bot comment. Catches cases where
+   notifications aren't generated (e.g. self-assignment, API-created issues).

 Required env vars:
     GITEA_URL - Gitea instance URL
     GITEA_TOKEN - Gitea API token

 Optional env vars:
-    FLAG_PATH - Path to flag file (default: workspace/memory/gitea-notify-flag)
-    POLL_DELAY - Delay between polls in seconds (default: 5)
+    POLL_DELAY - Delay between polls in seconds (default: 15)
+    COOLDOWN - Minimum seconds between dispatches (default: 30)
+    ASSIGNMENT_INTERVAL - Seconds between assignment scans (default: 120)
+    OPENCLAW_BIN - Path to openclaw binary
 """

 import json
 import os
+import subprocess
 import sys
 import time
 import urllib.request
@@ -240,62 +258,158 @@ import urllib.error
 GITEA_URL = os.environ.get("GITEA_URL", "").rstrip("/")
 GITEA_TOKEN = os.environ.get("GITEA_TOKEN", "")
-POLL_DELAY = int(os.environ.get("POLL_DELAY", "5"))
-FLAG_PATH = os.environ.get(
-    "FLAG_PATH",
-    os.path.join(
-        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
-        "memory",
-        "gitea-notify-flag",
-    ),
-)
+POLL_DELAY = int(os.environ.get("POLL_DELAY", "15"))
+COOLDOWN = int(os.environ.get("COOLDOWN", "30"))
+ASSIGNMENT_INTERVAL = int(os.environ.get("ASSIGNMENT_INTERVAL", "120"))
+OPENCLAW_BIN = os.environ.get("OPENCLAW_BIN", "/opt/homebrew/bin/openclaw")
+BOT_USER = "clawbot"  # Change to your bot's Gitea username
+
+# Repos to scan for assigned issues
+WATCHED_REPOS = [
+    # "org/repo1",
+    # "org/repo2",
+]
+
+# Track dispatched issues to prevent duplicates
+dispatched_issues = set()


-def check_config():
-    if not GITEA_URL or not GITEA_TOKEN:
-        print("ERROR: GITEA_URL and GITEA_TOKEN required", file=sys.stderr)
-        sys.exit(1)
+def gitea_api(method, path, data=None):
+    url = f"{GITEA_URL}/api/v1{path}"
+    body = json.dumps(data).encode() if data else None
+    headers = {"Authorization": f"token {GITEA_TOKEN}"}
+    if body:
+        headers["Content-Type"] = "application/json"
+    req = urllib.request.Request(url, headers=headers, method=method, data=body)
+    try:
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            raw = resp.read()
+            return json.loads(raw) if raw else None
+    except Exception as e:
+        print(f"WARN: {method} {path}: {e}", file=sys.stderr, flush=True)
+        return None


-def gitea_unread_ids():
-    req = urllib.request.Request(
-        f"{GITEA_URL}/api/v1/notifications?status-types=unread",
-        headers={"Authorization": f"token {GITEA_TOKEN}"},
+def needs_bot_response(repo_full, issue_number):
+    """True if the bot is NOT the author of the most recent comment."""
+    comments = gitea_api("GET", f"/repos/{repo_full}/issues/{issue_number}/comments")
+    if comments and len(comments) > 0:
+        if comments[-1].get("user", {}).get("login") == BOT_USER:
+            return False
+    return True
+
+
+def is_actionable(notif):
+    """Returns (actionable, reason, issue_number)."""
+    subject = notif.get("subject", {})
+    repo = notif.get("repository", {})
+    repo_full = repo.get("full_name", "")
+    url = subject.get("url", "")
+    number = url.rstrip("/").split("/")[-1] if url else ""
+    if not number or not number.isdigit():
+        return False, "no issue number", None
+
+    issue = gitea_api("GET", f"/repos/{repo_full}/issues/{number}")
+    if not issue:
+        return False, "couldn't fetch issue", number
+
+    assignees = [a.get("login") for a in (issue.get("assignees") or [])]
+    if BOT_USER in assignees:
+        if needs_bot_response(repo_full, number):
+            return True, f"assigned to {BOT_USER}", number
+        return False, "assigned but already responded", number
+
+    issue_body = issue.get("body", "") or ""
+    if f"@{BOT_USER}" in issue_body and issue.get("user", {}).get("login") != BOT_USER:
+        if needs_bot_response(repo_full, number):
+            return True, f"@-mentioned in body", number
+
+    comments = gitea_api("GET", f"/repos/{repo_full}/issues/{number}/comments")
+    if comments:
+        last = comments[-1]
+        if last.get("user", {}).get("login") == BOT_USER:
+            return False, "own comment is latest", number
+        if f"@{BOT_USER}" in (last.get("body") or ""):
+            return True, f"@-mentioned in comment", number
+
+    return False, "not mentioned or assigned", number
+
+
+def spawn_agent(repo_full, issue_number, title, subject_type, reason):
+    dispatch_key = f"{repo_full}#{issue_number}"
+    if dispatch_key in dispatched_issues:
+        return
+    dispatched_issues.add(dispatch_key)
+
+    repo_short = repo_full.split("/")[-1]
+    job_name = f"gitea-{repo_short}-{issue_number}-{int(time.time())}"
+    msg = (
+        f"Gitea: {reason} on {subject_type} #{issue_number} "
+        f"'{title}' in {repo_full}.\n"
+        f"API: {GITEA_URL}/api/v1 | Token: {GITEA_TOKEN}\n"
+        f"SCOPE: Only {subject_type} #{issue_number} in {repo_full}.\n"
+        f"Read all comments, do the work, post results as Gitea comments."
     )
     try:
-        with urllib.request.urlopen(req, timeout=10) as resp:
-            return {n["id"] for n in json.loads(resp.read())}
+        subprocess.run(
+            [OPENCLAW_BIN, "cron", "add",
+             "--name", job_name, "--at", "1s",
+             "--message", msg, "--delete-after-run",
+             "--session", "isolated", "--no-deliver",
+             "--thinking", "low", "--timeout-seconds", "300"],
+            capture_output=True, text=True, timeout=15,
+        )
     except Exception as e:
-        print(f"WARN: Gitea API failed: {e}", file=sys.stderr, flush=True)
-        return set()
-
-
-def write_flag(count):
-    os.makedirs(os.path.dirname(FLAG_PATH), exist_ok=True)
-    with open(FLAG_PATH, "w") as f:
-        f.write(json.dumps({
-            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
-            "count": count,
-        }))
+        print(f"Spawn error: {e}", file=sys.stderr, flush=True)
+        dispatched_issues.discard(dispatch_key)


 def main():
-    check_config()
-    print(f"Gitea poller started (delay={POLL_DELAY}s, flag={FLAG_PATH})", flush=True)
-    last_seen_ids = gitea_unread_ids()
-    print(f"Initial unread: {len(last_seen_ids)}", flush=True)
+    print(f"Poller started (poll={POLL_DELAY}s, cooldown={COOLDOWN}s)", flush=True)
+    seen_ids = set(n["id"] for n in (gitea_api("GET", "/notifications?status-types=unread") or []))
+    last_dispatch = 0
+    last_assign_scan = 0

     while True:
         time.sleep(POLL_DELAY)
-        current_ids = gitea_unread_ids()
-        new_ids = current_ids - last_seen_ids
-        if not new_ids:
-            last_seen_ids = current_ids
-            continue
-        ts = time.strftime("%H:%M:%S")
-        print(f"[{ts}] {len(new_ids)} new ({len(current_ids)} total), flag written", flush=True)
-        write_flag(len(new_ids))
-        last_seen_ids = current_ids
+        now = time.time()
+
+        # Notification polling
+        notifs = gitea_api("GET", "/notifications?status-types=unread") or []
+        current_ids = {n["id"] for n in notifs}
+        new_ids = current_ids - seen_ids
+        if new_ids and now - last_dispatch >= COOLDOWN:
+            for n in [n for n in notifs if n["id"] in new_ids]:
+                nid = n.get("id")
+                if nid:
+                    gitea_api("PATCH", f"/notifications/threads/{nid}")
+                is_act, reason, num = is_actionable(n)
+                if is_act:
+                    repo = n["repository"]["full_name"]
+                    title = n["subject"]["title"][:60]
+                    stype = n["subject"].get("type", "").lower()
+                    spawn_agent(repo, num, title, stype, reason)
+            last_dispatch = now
+        seen_ids = current_ids
+
+        # Assignment scan (less frequent)
+        if now - last_assign_scan >= ASSIGNMENT_INTERVAL:
+            for repo in WATCHED_REPOS:
+                for itype in ["issues", "pulls"]:
+                    items = gitea_api("GET",
+                        f"/repos/{repo}/issues?state=open&type={itype}"
+                        f"&assignee={BOT_USER}&sort=updated&limit=10") or []
+                    for item in items:
+                        num = str(item["number"])
+                        if f"{repo}#{num}" in dispatched_issues:
+                            continue
+                        # Only recently updated items (5 min)
+                        # ... add is_recently_updated() check here
+                        if needs_bot_response(repo, num):
+                            spawn_agent(repo, num, item["title"][:60],
+                                "pull" if itype == "pulls" else "issue",
+                                f"assigned to {BOT_USER}")
+            last_assign_scan = now


 if __name__ == "__main__":
@@ -341,13 +455,15 @@ This applies to everything: project rules ("no mocks in tests"), workflow
 preferences ("fewer PRs, don't over-split"), corrections, new policies.
 Immediate write to the daily file, and to MEMORY.md if it's a standing rule.

-### PII-Aware Output Routing
+### Sensitive Output Routing

 A lesson learned the hard way: **the audience determines what you can say, not
 who asked.** If the human asks for a medication status report in a group
 channel, the agent can't just dump it there — other people can read it. The
-rule: if the output would contain PII and the channel isn't private, redirect to
-DM and reply in-channel with "sent privately."
+rule: if the output would contain sensitive information (PII, secrets,
+credentials, API keys, operational details like flight numbers, locations,
+travel plans, medical info, etc.) and the channel isn't private, redirect to DM
+and reply in-channel with "sent privately."

 This is enforced at multiple levels:
@@ -378,7 +494,7 @@ The heartbeat handles:
 - Periodic memory maintenance

 State tracking in `memory/heartbeat-state.json` prevents redundant checks (e.g.,
-don't re-check email if you checked 10 minutes ago).
+don't re-check notifications if you checked 10 minutes ago).
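A minimal sketch of how a heartbeat might consult that state before running a check. It assumes the `lastChecks` layout from the `heartbeat-state.json` example in this document (check name mapped to a Unix timestamp, or `null` if never run); the function names `is_check_due` and `record_check` and the interval values are hypothetical:

```python
import time


def is_check_due(state, name, min_interval_seconds, now=None):
    """Return True if check `name` has never run or last ran long enough ago."""
    last = state.get("lastChecks", {}).get(name)
    if last is None:
        # Missing key or JSON null: the check has never been run.
        return True
    if now is None:
        now = time.time()
    return now - last >= min_interval_seconds


def record_check(state, name, now=None):
    """Store the current time (whole seconds) as the last run of `name`."""
    if now is None:
        now = time.time()
    state.setdefault("lastChecks", {})[name] = int(now)
    return state
```

The state dict would be read from and written back to `memory/heartbeat-state.json` with `json.load` and `json.dump` around these calls.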

 The key output rule: heartbeats should either be `HEARTBEAT_OK` (nothing to do)
 or a direct alert. Work narration goes to a designated status channel, never to
@@ -1390,7 +1506,8 @@ stay quiet.
 ## Inbox Check (PRIORITY)

-(check notifications, issues, emails — whatever applies)
+(check whatever notification sources apply to your setup — e.g. Gitea
+notifications, emails, issue trackers)

 ## Flight Prep Blocks (daily)
@@ -1424,10 +1541,9 @@ Never send internal thinking or status narration to user's DM. Output should be:
 ```json
 {
   "lastChecks": {
-    "email": 1703275200,
+    "gitea": 1703280000,
     "calendar": 1703260800,
-    "weather": null,
-    "gitea": 1703280000
+    "weather": null
   },
   "lastWeeklyDocsReview": "2026-02-24"
 }
@@ -1596,21 +1712,24 @@ Never lose a rule or preference your human states:
 ---

-## PII Output Routing — Audience-Aware Responses
+## Sensitive Output Routing — Audience-Aware Responses

 A critical security pattern: **the audience determines what you can say, not who
-asked.** If your human asks for a sitrep (or any PII-containing info) in a group
+asked.** If your human asks for a sitrep (or any sensitive info) in a group
 channel, you can't just dump it there — other people can read it.

 ### AGENTS.md / checklist prompt:

 ```markdown
-## PII Output Routing (CRITICAL)
+## Sensitive Output Routing (CRITICAL)

-- NEVER output PII in any non-private channel, even if your human asks for it
-- If a request would produce PII (medication status, travel details, financial
-  info, etc.) in a shared channel: send the response via DM instead, and reply
-  in-channel with "sent privately"
+- NEVER output sensitive information in any non-private channel, even if your
+  human asks for it
+- This includes: PII, secrets, credentials, API keys, and sensitive operational
+  information (flight numbers/times/dates, locations, travel plans, medical
+  info, financial details, etc.)
+- If a request would produce any of the above in a shared channel: send the
+  response via DM instead, and reply in-channel with "sent privately"
 - The rule is: the audience determines what you can say, not who asked
 - This applies to: group chats, public issue trackers, shared Mattermost
   channels, Discord servers — anywhere that isn't a 1:1 DM
@@ -1619,10 +1738,10 @@ channel, you can't just dump it there — other people can read it.
 ### Why this matters:

 This is a real failure mode. If someone asks "sitrep" in a group channel and you
-respond with medication names, partner details, travel dates, and hotel names
-you just leaked all of that to everyone in the channel. The human asking is
-authorized to see it; the channel audience is not. Always check WHERE you're
-responding, not just WHO asked.
+respond with medication names, partner details, travel dates, hotel names, or
+API credentials — you just leaked all of that to everyone in the channel. The
+human asking is authorized to see it; the channel audience is not. Always check
+WHERE you're responding, not just WHO asked.
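The routing rule (check WHERE, not WHO) can be sketched as a small decision function. This is only an illustration under stated assumptions: the regex patterns, the `looks_sensitive` and `route_output` names, and the `"dm"` destination label are all hypothetical, and a pattern list like this is far too narrow to be relied on by itself:

```python
import re

# Illustrative patterns only; real detection needs much broader coverage
# plus the agent's own judgment about what counts as sensitive.
SENSITIVE_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]"),
    re.compile(r"(?i)\b(medication|prescription|flight\s+[A-Z]{2}\d{2,4})\b"),
]


def looks_sensitive(text):
    """Return True if any illustrative pattern matches the text."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)


def route_output(text, channel, is_private):
    """Return (destination, message) pairs; the audience decides, not the asker."""
    if is_private or not looks_sensitive(text):
        return [(channel, text)]
    # Shared channel and sensitive content: deliver privately, acknowledge publicly.
    return [("dm", text), (channel, "sent privately")]
```

Who asked never appears as an input: the decision depends only on the content and on whether the destination is private.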

 ---

View File

@@ -273,7 +273,8 @@ poll. Structure it like this:
 ## Checks (rotate through these, 2-4 times per day)

-- Emails — any urgent unread messages?
+- Notifications — any unread items? (Gitea notifications, emails, or
+  whatever inbox sources you've integrated)
 - Calendar — upcoming events in next 24-48h?
 - Open issues/PRs — anything assigned to me?
 - Workspace sync — any uncommitted changes to push?
@@ -336,32 +337,36 @@ Then add a reference to this checklist in the MEMORY.md checklist index.
 Reference:
 https://git.eeqj.de/sneak/clawpub/raw/branch/main/OPENCLAW_TRICKS.md
-(see "PII Output Routing" and "Checklists Over Prose")
+(see "Sensitive Output Routing" and "Checklists Over Prose")
 ```

-### 5.2 PII output routing
+### 5.2 Sensitive output routing

-Prevents leaking private info in shared channels. Paste this to your agent:
+Prevents leaking private info, secrets, and operational details in shared
+channels. Paste this to your agent:

 ```
 Add the following warning banner near the TOP of AGENTS.md (before the
 session startup section):

-**⚠️ NEVER output PII in non-private channels.** If asked for
-PII-containing info (medical, financial, personal) in a shared channel,
-send via DM to your human instead.
+**⚠️ NEVER output sensitive information in non-private channels.** This
+includes PII, secrets, credentials, API keys, and sensitive operational
+information (flight numbers/times/dates, locations, travel plans,
+medical info, etc.). If asked for any of this in a shared channel, send
+via DM to your human instead.

-Also add a PII section to memory/checklist-messaging.md:
+Also add a sensitive-info section to memory/checklist-messaging.md:

-## PII Check (before every message in shared channels)
-1. Contains names, addresses, medical info, financial info? → DM only
-2. Contains login credentials or tokens? → NEVER send, period
-3. When in doubt → send via DM
+## Sensitive Info Check (before every message in shared channels)
+1. Contains PII (names, addresses, medical info, financial info)? → DM only
+2. Contains secrets, credentials, API keys, or tokens? → NEVER send, period
+3. Contains operational details (flight numbers, travel plans, locations)? → DM only
+4. When in doubt → send via DM

 Reference:
 https://git.eeqj.de/sneak/clawpub/raw/branch/main/OPENCLAW_TRICKS.md
-(see "PII-Aware Output Routing")
+(see "Sensitive Output Routing")
 ```

 ### 5.3 Additional checklists