From f3296d136a78bc3c2fa1101e36778a05ae9eae21 Mon Sep 17 00:00:00 2001
From: Caret
Date: Mon, 6 Apr 2026 12:43:18 +0000
Subject: [PATCH] =?UTF-8?q?research:=20Phase=200=20report=201=20=E2=80=94?=
 =?UTF-8?q?=20gitea-webhooks=20deep=20read?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../RESEARCH-01-gitea-webhooks-deep-read.md | 127 ++++++++++++++++++
 1 file changed, 127 insertions(+)
 create mode 100644 research/RESEARCH-01-gitea-webhooks-deep-read.md

diff --git a/research/RESEARCH-01-gitea-webhooks-deep-read.md b/research/RESEARCH-01-gitea-webhooks-deep-read.md
new file mode 100644
index 0000000..752f247
--- /dev/null
+++ b/research/RESEARCH-01-gitea-webhooks-deep-read.md
@@ -0,0 +1,127 @@
+# Research 01 — gitea-webhooks, workspace-ops, agent-reliability deep read
+
+**Subagent:** `afa92905872a43a9b` (Explore)
+**Completed:** 2026-04-06 12:40 UTC
+**Scope:** Architecture of the current openclaw gitea-webhooks pipeline and the two smaller reference repos.
+
+## Key architectural findings
+
+### Ingress path (the surprise)
+
+```
+Gitea → HTTPS POST https://slack.solio.tech/hooks/gitea
+  → nginx (TLS term, strips path, injects Bearer header)
+  → http://127.0.0.1:18789/hooks/agent (openclaw gateway)
+  → gitea-transform.js
+  → event router
+```
+
+**HMAC is NOT verified by the pipeline.** Gitea sends `X-Gitea-Signature` but the raw request body is not available to the transform layer, so real HMAC-SHA256 verification never happens. Authentication is layered differently:
+
+1. **TLS** via Let's Encrypt on nginx
+2. **Nginx localhost ACL** on port 18789 (gateway only accessible from the host)
+3. **Bearer token** injected by nginx (`Authorization: Bearer {OPENCLAW_HOOKS_TOKEN}`); gateway rejects 401 on mismatch
+4. **Dedup cache** of `X-Gitea-Delivery` headers, 24h window, persisted at `/root/.openclaw/hooks/logs/dedup-cache.json`
+
+There IS an HMAC in the system, but it's for a completely different purpose: **spawn signatures**. When the `sol` account creates an `[IMPLEMENT]` issue, it embeds `` in the body. The transform re-computes HMAC-SHA256 over `repo|title|timestamp` with secret at `/root/.openclaw/hooks/spawn-secret`, verifies against the embedded value, and rejects stale or invalid signatures (2h TTL). This lets sol trigger SPAWN_MANAGER directly (bypassing gitea-worker), while keeping unauthorized creators locked out.
+
+**Implication for Caret migration:** I inherit the same security model (bearer + ACL) OR I add real HMAC to my own listener. If I run a separate listener and want to share the Gitea webhook config, I need to cooperate with nginx or stand up a second endpoint. If I terminate events myself, I can do proper HMAC-SHA256 of the raw body using the `X-Gitea-Signature` header.
+
+### Transform phases (gitea-transform.js v13.0)
+
+1. **Validation gates** — event type filter, dedup, sender validation, `clawbot` loop prevention, agent echo suppression (`` HTML comment), rate limiting (5 concurrent).
+2. **Trust level detection** — Rooh (ID 29) → owner/main, collaborators → contributor/gitea-worker, unknown → readonly.
+3. **Session lock check** — file-based locks at `/root/.openclaw/hooks/locks/{owner}-{repo}-{issue}`. 2h TTL. Closed issues transition to IS_DONE with 5min grace.
+4. **Event routing** (the heart of it):
+   - `repository` / `create` → `execSync post-repo-audit.sh` — pure script, zero tokens
+   - `push` → skip main/master, route to main or gitea-worker with ACTION: RUN_CI
+   - `issues.opened` with `[IMPLEMENT]` title → verify spawn sig (if sol) → `precomputeSpawnParams` → `asyncDispatchToSpawner` → returns SPAWN_MANAGER directive
+   - `issue_comment` with approval words from Rooh → acquire lock → precompute → dispatch → EXECUTE_PLAN
+5. **Logging & audit** — `logs/audit.jsonl`, `logs/webhook-events-YYYY-MM.jsonl`, `logs/incidents.jsonl`.
+
+### Tools fan-out (scripts invoked)
+
+| Script | Trigger | Role |
+|------------------------------|----------------------------------|-----------------------------------------------------------------------------|
+| `post-repo-audit.sh` | repository/create (execSync) | Add Rooh as admin collaborator, ensure webhook exists. Pure script, ~seconds. |
+| `audit-webhooks.sh` | heartbeat (15min) | Verify all webhooks exist and are healthy. `--fix` recreates missing hooks. |
+| `audit-repo-policies.sh` | heartbeat (6h) + manual | Enforce required files (Makefile, .editorconfig, etc.) from sol/repo-policies. |
+| `secret-scan.sh` | CI (`make check`) | Find private keys and high-entropy tokens. Allowlist at `.secret-scan-allowlist`. |
+| `create-implement-issue.sh` | manual / agent | Create signed `[IMPLEMENT]` issue with HMAC spawn signature. |
+| `check-implement-orphans.sh` | heartbeat (15min) | Detect stale pending spawn files, inactive Managers, orphaned Workers. |
+| `spawn-manager.sh` | agent-called | Generate Manager spawn JSON from issue body. Creates project workspace. |
+
+### Openclaw hard couplings I must replace
+
+| Coupling | What I'll need | Difficulty |
+|-----------------------------------------------------------------|--------------------------------------------------------|------------|
+| `sessions_spawn` (openclaw's subagent spawn primitive) | Channels plugin or HTTP-triggered session creation | **Hard** |
+| `wakeMode: now / next-heartbeat` | My own dispatch queue | Medium |
+| Agent IDs (`main`, `gitea-worker`, `spawner`) | My own routing scheme (just `caret` + role tags) | Easy |
+| Bearer token auth (`OPENCLAW_HOOKS_TOKEN`) | My own bearer token, shared with nginx or own listener | Easy |
+| Workspace path (`/root/.openclaw/workspace/projects/PROJ-XXX-*`) | My own workspace path; update all scripts | Easy |
+| Session lock dir (`/root/.openclaw/hooks/locks/`) | My own lock dir | Easy |
+| Queue inbox (`/root/.openclaw/hooks/queue-inbox/`) | My own queue + daemon | Easy |
+| Mattermost incident posting | Swap to tg-stream or keep calling Mattermost | Easy |
+| Gitea API wrapper | Keep as-is (stateless curl) | Trivial |
+
+### Surprising behaviors / gotchas
+
+1. Session locks have a **2h TTL**, different from archive timeout. Long-running Managers can lose their lock and trigger a race.
+2. Closed-while-locked issues transition to **IS_DONE with 5min grace**, not immediate release.
+3. Spawn signatures expire after **2h** regardless of issue state — approving a stale issue rejects the spawn.
+4. Dedup cache is persisted to disk; crash recovery repopulates within 24h but replays may occur during that window.
+5. Gitea HMAC not verified — bearer + nginx ACL are the only layers.
+6. The `sol` account is treated as a contributor UNLESS its issue has a valid spawn signature. Approval words from sol are ignored.
+7. Rate limiting is **per agent ID** (5 concurrent per agent), tracked in `active-sessions.json`.
+8. `asyncDispatchToSpawner` is fire-and-forget HTTP POST — no error handling if the spawner is down.
+9. `precomputeSpawnParams` calls `spawn-manager.sh` via `execSync` with a 30s timeout in the transform's hot path. Timeout = text-directive fallback.
+10. Manager death before `STATE.json` checkpoint can cause duplicate respawns (flagged in agent-reliability).
+11. Lock expiry cleanup is lazy — expired files accumulate until someone calls `check()`.
+12. Queue daemon re-verifies spawn signatures using the stored timestamp, moving the TTL check off the hot path.
+
+### HMAC recipe I can steal (for a proper replacement)
+
+Gitea sends raw body + `X-Gitea-Signature: ` computed with the webhook secret over the raw JSON body. Node.js verification:
+
+```javascript
+const crypto = require('node:crypto');
+function verifyGiteaSignature(rawBody, signatureHeader, secret) {
+  if (!signatureHeader) return false;
+  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
+  // timing-safe compare
+  const sigBuf = Buffer.from(signatureHeader, 'hex');
+  const expBuf = Buffer.from(expected, 'hex');
+  if (sigBuf.length !== expBuf.length) return false;
+  return crypto.timingSafeEqual(sigBuf, expBuf);
+}
+```
+
+The catch: the listener must read the **raw body before JSON parsing**. Middleware that parses JSON first strips the raw bytes and makes verification impossible. This is why openclaw's gateway never verified HMAC — it handed already-parsed JSON to the transform. My replacement listener must keep the raw body around (store it on the request object) until after verification.
+
+### Test infrastructure I can reuse
+
+- `tests/test-transform.js` — full unit tests for routing, trust levels, locks, rate limiting (Node.js, 50+ cases, mocks Gitea API). Directly portable.
+- `tests/test-lifecycle.js` — Manager/Worker lifecycle including spawn signatures.
+- `tests/test-spawn-manager.sh` — project ID generation, workspace creation (isolated, no API).
+- `tools/test-transform.sh` — manual smoke test against a live gateway.
+- Test fixtures in `tests/fixtures/` — mock Gitea payloads for events.
+
+All are portable. I can copy them into my migration repo with their license intact and adapt paths.
+
+## Answers to the questions that defined this research
+
+**Q: Can I replace the deterministic side cleanly?** Yes. Every pure script is a standalone bash/node script with minimal coupling to openclaw. Main effort is rewriting the event router (`gitea-transform.js` equivalent) and the ingress path.
+
+**Q: Can I replace the agent-spawn side cleanly?** Hard. `sessions_spawn` is a core openclaw primitive with session state, model selection, tool allowlists, wakeMode semantics, and process lifecycle management. The replacement needs a Channels plugin or a long-running worker pool; neither is a weekend project.
+
+**Q: What's the minimum viable Caret pipeline?** An HTTP listener with proper HMAC verification, a router with the same event → script fan-out as gitea-transform.js, a file-based session lock manager, a structured log, and ONE script per event type. That's ~600-800 lines in one bun file, doable in Phase 2.
+
+**Q: What should I NOT rebuild?** The tools themselves — `post-repo-audit.sh`, `audit-repo-policies.sh`, `spawn-manager.sh`, etc. Copy them, strip the openclaw path prefixes, ship them in `/host/root/.caret/tools/`. Don't reinvent the workspace/PROJ-XXX scheme unless Rooh explicitly asks — it's a working system with its own conventions I'd be recreating poorly.
+
+## Next reads pending
+
+- **Subagent `ae5ca38f70b1e9626`**: openclaw gateway internals — how does `sessions_spawn` actually fire a new Claude session? Where is the tool allowlist enforced? What does the cron/heartbeat scheduler look like?
+- **Subagent `abf0cb0928d823a0b`**: live state audit — already partial results visible: four webhooks registered across sol/* repos, all pointing to `https://slack.solio.tech/hooks/gitea`, all with `Secret: NOT SET`. That's the smoking gun for "HMAC is not in the path right now."
+
+When both finish, synthesize into `ARCHITECTURE.md` in the migration repo and move to Phase 1 design.
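
The raw-body requirement from the HMAC recipe in this report can be sketched as a small Node.js listener that buffers the request bytes itself, verifies the signature, and only then parses JSON. This is a minimal sketch under stated assumptions, not the openclaw gateway or a finished Caret listener: `handleDelivery`, `createListener`, the `onEvent` callback, and the 401/400/202 status choices are hypothetical names and conventions introduced for illustration, and the event name is assumed to arrive in an `X-Gitea-Event` header.

```javascript
const crypto = require('node:crypto');
const http = require('node:http');

// Constant-time check of X-Gitea-Signature, assumed to be the hex
// HMAC-SHA256 of the raw request body keyed with the webhook secret.
function verifyGiteaSignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string' || signatureHeader.length === 0) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  // Non-hex input decodes to a shorter buffer, so the length check rejects it.
  const received = Buffer.from(signatureHeader, 'hex');
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}

// Hypothetical pure decision step: verify first, parse second.
// Returns an HTTP status plus the parsed payload on success.
function handleDelivery(rawBody, headers, secret) {
  if (!verifyGiteaSignature(rawBody, headers['x-gitea-signature'], secret)) {
    return { status: 401, payload: null };
  }
  try {
    return { status: 202, payload: JSON.parse(rawBody.toString('utf8')) };
  } catch {
    return { status: 400, payload: null };
  }
}

// Hypothetical factory: builds (but does not start) a server that collects
// the raw bytes before any JSON parsing happens.
function createListener(secret, onEvent) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const result = handleDelivery(Buffer.concat(chunks), req.headers, secret);
      res.writeHead(result.status).end();
      if (result.payload) onEvent(req.headers['x-gitea-event'], result.payload);
    });
  });
}
```

Usage would look something like `createListener(process.env.GITEA_WEBHOOK_SECRET, route).listen(8787, '127.0.0.1')`, where the environment variable name, the `route` function, and the port are placeholders. The sketch deliberately omits a body-size cap and delivery dedup; a real listener would likely want both before dispatching anything.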