# Research 01 — gitea-webhooks, workspace-ops, agent-reliability deep read

**Subagent:** `afa92905872a43a9b` (Explore)
**Completed:** 2026-04-06 12:40 UTC
**Scope:** Architecture of the current openclaw gitea-webhooks pipeline and the two smaller reference repos.

## Key architectural findings

### Ingress path (the surprise)

```
Gitea
  → HTTPS POST https://slack.solio.tech/hooks/gitea
  → nginx (TLS termination, strips path, injects Bearer header)
  → http://127.0.0.1:18789/hooks/agent (openclaw gateway)
  → gitea-transform.js
  → event router
```

**HMAC is NOT verified by the pipeline.** Gitea sends `X-Gitea-Signature`, but the raw request body is not available to the transform layer, so real HMAC-SHA256 verification never happens. Authentication is layered differently:

1. **TLS** via Let's Encrypt on nginx
2. **Nginx localhost ACL** on port 18789 (gateway only accessible from the host)
3. **Bearer token** injected by nginx (`Authorization: Bearer {OPENCLAW_HOOKS_TOKEN}`); the gateway rejects with 401 on mismatch
4. **Dedup cache** of `X-Gitea-Delivery` headers, 24h window, persisted at `/root/.openclaw/hooks/logs/dedup-cache.json`

There IS an HMAC in the system, but it serves a completely different purpose: **spawn signatures**. When the `sol` account creates an `[IMPLEMENT]` issue, it embeds `` in the body. The transform recomputes HMAC-SHA256 over `repo|title|timestamp` with the secret at `/root/.openclaw/hooks/spawn-secret`, verifies it against the embedded value, and rejects invalid or stale signatures (2h TTL). This lets sol trigger SPAWN_MANAGER directly (bypassing gitea-worker) while keeping unauthorized creators locked out.

**Implication for Caret migration:** I either inherit the same security model (bearer + ACL) or add real HMAC to my own listener. If I run a separate listener and want to share the Gitea webhook config, I need to cooperate with nginx or stand up a second endpoint.
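Whichever way I go on webhook auth, the spawn-signature scheme itself is worth porting. Below is a minimal Node.js sketch of signing and verifying over `repo|title|timestamp` with a 2h TTL. It assumes the embedded value is a hex-encoded HMAC-SHA256 digest and that the timestamp is Unix seconds; the real embedded format is elided in these notes, and `signSpawn` / `verifySpawn` are hypothetical names, not openclaw's actual functions.

```javascript
const crypto = require('node:crypto');

// Spawn signatures are rejected once older than 2h.
const SPAWN_TTL_SECONDS = 2 * 60 * 60;

// Hypothetical signer: HMAC-SHA256 over `repo|title|timestamp`, hex-encoded.
// Assumption: `timestamp` is Unix seconds.
function signSpawn(secret, repo, title, timestamp) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${repo}|${title}|${timestamp}`)
    .digest('hex');
}

// Hypothetical verifier: recompute, compare in constant time, then enforce the TTL.
// `nowSeconds` is a parameter so the TTL logic can be tested without mocking the clock.
function verifySpawn(secret, repo, title, timestamp, signatureHex,
                     nowSeconds = Math.floor(Date.now() / 1000)) {
  if (typeof signatureHex !== 'string') return { ok: false, reason: 'missing' };
  const expected = Buffer.from(signSpawn(secret, repo, title, timestamp), 'hex');
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on unequal lengths, so check length first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { ok: false, reason: 'invalid' };
  }
  const age = nowSeconds - Number(timestamp);
  if (!Number.isFinite(age) || age < 0) return { ok: false, reason: 'future' };
  if (age > SPAWN_TTL_SECONDS) return { ok: false, reason: 'stale' };
  return { ok: true };
}
```

Checking the signature before the TTL means a forged issue is reported as `invalid` rather than `stale`, which keeps the incident log accurate about what actually happened.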
If I terminate events myself, I can do proper HMAC-SHA256 verification of the raw body against the `X-Gitea-Signature` header.

### Transform phases (gitea-transform.js v13.0)

1. **Validation gates** — event type filter, dedup, sender validation, `clawbot` loop prevention, agent echo suppression (`` HTML comment), rate limiting (5 concurrent).
2. **Trust level detection** — Rooh (ID 29) → owner/main, collaborators → contributor/gitea-worker, unknown → readonly.
3. **Session lock check** — file-based locks at `/root/.openclaw/hooks/locks/{owner}-{repo}-{issue}`, 2h TTL. Closed issues transition to IS_DONE with a 5min grace period.
4. **Event routing** (the heart of it):
   - `repository` / `create` → `execSync post-repo-audit.sh`, a pure script, zero tokens
   - `push` → skip main/master, route to main or gitea-worker with ACTION: RUN_CI
   - `issues.opened` with `[IMPLEMENT]` title → verify spawn sig (if sol) → `precomputeSpawnParams` → `asyncDispatchToSpawner` → returns SPAWN_MANAGER directive
   - `issue_comment` with approval words from Rooh → acquire lock → precompute → dispatch → EXECUTE_PLAN
5. **Logging & audit** — `logs/audit.jsonl`, `logs/webhook-events-YYYY-MM.jsonl`, `logs/incidents.jsonl`.

### Tools fan-out (scripts invoked)

| Script | Trigger | Role |
|------------------------------|----------------------------------|-----------------------------------------------------------------------------|
| `post-repo-audit.sh` | repository/create (execSync) | Add Rooh as admin collaborator, ensure webhook exists. Pure script, ~seconds. |
| `audit-webhooks.sh` | heartbeat (15min) | Verify all webhooks exist and are healthy. `--fix` recreates missing hooks. |
| `audit-repo-policies.sh` | heartbeat (6h) + manual | Enforce required files (Makefile, .editorconfig, etc.) from sol/repo-policies. |
| `secret-scan.sh` | CI (`make check`) | Find private keys and high-entropy tokens. Allowlist at `.secret-scan-allowlist`. |
| `create-implement-issue.sh` | manual / agent | Create signed `[IMPLEMENT]` issue with HMAC spawn signature. |
| `check-implement-orphans.sh` | heartbeat (15min) | Detect stale pending spawn files, inactive Managers, orphaned Workers. |
| `spawn-manager.sh` | agent-called | Generate Manager spawn JSON from issue body. Creates project workspace. |

### Openclaw hard couplings I must replace

| Coupling | What I'll need | Difficulty |
|-----------------------------------------------------------------|--------------------------------------------------------|------------|
| `sessions_spawn` (openclaw's subagent spawn primitive) | Channels plugin or HTTP-triggered session creation | **Hard** |
| `wakeMode: now / next-heartbeat` | My own dispatch queue | Medium |
| Agent IDs (`main`, `gitea-worker`, `spawner`) | My own routing scheme (just `caret` + role tags) | Easy |
| Bearer token auth (`OPENCLAW_HOOKS_TOKEN`) | My own bearer token, shared with nginx or own listener | Easy |
| Workspace path (`/root/.openclaw/workspace/projects/PROJ-XXX-*`) | My own workspace path; update all scripts | Easy |
| Session lock dir (`/root/.openclaw/hooks/locks/`) | My own lock dir | Easy |
| Queue inbox (`/root/.openclaw/hooks/queue-inbox/`) | My own queue + daemon | Easy |
| Mattermost incident posting | Swap to tg-stream or keep calling Mattermost | Easy |
| Gitea API wrapper | Keep as-is (stateless curl) | Trivial |

### Surprising behaviors / gotchas

1. Session locks have a **2h TTL**, separate from the archive timeout. Long-running Managers can lose their lock and trigger a race.
2. Closed-while-locked issues transition to **IS_DONE with a 5min grace period**, not immediate release.
3. Spawn signatures expire after **2h** regardless of issue state, so approving a stale issue rejects the spawn.
4. The dedup cache is persisted to disk; crash recovery repopulates it within 24h, but replays may occur during that window.
5. Gitea HMAC is not verified; bearer + nginx ACL are the only layers.
6. The `sol` account is treated as a contributor UNLESS its issue carries a valid spawn signature. Approval words from sol are ignored.
7. Rate limiting is **per agent ID** (5 concurrent per agent), tracked in `active-sessions.json`.
8. `asyncDispatchToSpawner` is a fire-and-forget HTTP POST, with no error handling if the spawner is down.
9. `precomputeSpawnParams` calls `spawn-manager.sh` via `execSync` with a 30s timeout in the transform's hot path. On timeout, it falls back to a text directive.
10. A Manager dying before its `STATE.json` checkpoint can cause duplicate respawns (flagged in agent-reliability).
11. Lock expiry cleanup is lazy: expired files accumulate until something calls `check()`.
12. The queue daemon re-verifies spawn signatures using the stored timestamp, moving the TTL check off the hot path.

### HMAC recipe I can steal (for a proper replacement)

Gitea sends the raw body plus `X-Gitea-Signature: `, computed with the webhook secret over the raw JSON body. Node.js verification:

```javascript
const crypto = require('node:crypto');

function verifyGiteaSignature(rawBody, signatureHeader, secret) {
  if (!signatureHeader) return false;
  // HMAC-SHA256 over the exact raw bytes Gitea sent, hex-encoded like the header.
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  // Timing-safe compare; timingSafeEqual throws on unequal lengths, so check first.
  const sigBuf = Buffer.from(signatureHeader, 'hex');
  const expBuf = Buffer.from(expected, 'hex');
  if (sigBuf.length !== expBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expBuf);
}
```

The catch: the listener must read the **raw body before JSON parsing**. Middleware that parses JSON first discards the raw bytes and makes verification impossible. This is why openclaw's gateway never verified HMAC: it handed already-parsed JSON to the transform. My replacement listener must keep the raw body around (e.g. stored on the request object) until after verification.

### Test infrastructure I can reuse

- `tests/test-transform.js` — full unit tests for routing, trust levels, locks, rate limiting (Node.js, 50+ cases, mocks Gitea API). Directly portable.
- `tests/test-lifecycle.js` — Manager/Worker lifecycle including spawn signatures.
- `tests/test-spawn-manager.sh` — project ID generation, workspace creation (isolated, no API).
- `tools/test-transform.sh` — manual smoke test against a live gateway.
- Test fixtures in `tests/fixtures/` — mock Gitea payloads for events.

All are portable. I can copy them into my migration repo with their license intact and adapt the paths.

## Answers to the questions that defined this research

**Q: Can I replace the deterministic side cleanly?**
Yes. Every pure script is a standalone bash/node script with minimal coupling to openclaw. The main effort is rewriting the event router (the `gitea-transform.js` equivalent) and the ingress path.

**Q: Can I replace the agent-spawn side cleanly?**
Hard. `sessions_spawn` is a core openclaw primitive with session state, model selection, tool allowlists, wakeMode semantics, and process lifecycle management. The replacement needs a Channels plugin or a long-running worker pool; neither is a weekend project.

**Q: What's the minimum viable Caret pipeline?**
An HTTP listener with proper HMAC verification, a router with the same event → script fan-out as gitea-transform.js, a file-based session lock manager, a structured log, and ONE script per event type. That's ~600-800 lines in one bun file, doable in Phase 2.

**Q: What should I NOT rebuild?**
The tools themselves: `post-repo-audit.sh`, `audit-repo-policies.sh`, `spawn-manager.sh`, etc. Copy them, strip the openclaw path prefixes, and ship them in `/host/root/.caret/tools/`. Don't reinvent the workspace/PROJ-XXX scheme unless Rooh explicitly asks; it's a working system with its own conventions, and I'd be recreating it poorly.

## Next reads pending

- **Subagent `ae5ca38f70b1e9626`**: openclaw gateway internals. How does `sessions_spawn` actually fire a new Claude session? Where is the tool allowlist enforced? What does the cron/heartbeat scheduler look like?
- **Subagent `abf0cb0928d823a0b`**: live state audit. Partial results are already visible: four webhooks registered across `sol/*` repos, all pointing to `https://slack.solio.tech/hooks/gitea`, all with `Secret: NOT SET`. That's the smoking gun for "HMAC is not in the path right now."

When both finish, synthesize the results into `ARCHITECTURE.md` in the migration repo and move to Phase 1 design.