openclaw-to-caret-migration/research/RESEARCH-01-gitea-webhooks-deep-read.md

Research 01 — gitea-webhooks, workspace-ops, agent-reliability deep read

Subagent: afa92905872a43a9b (Explore)
Completed: 2026-04-06 12:40 UTC
Scope: Architecture of the current openclaw gitea-webhooks pipeline and the two smaller reference repos.

Key architectural findings

Ingress path (the surprise)

Gitea → HTTPS POST https://slack.solio.tech/hooks/gitea
      → nginx (TLS term, strips path, injects Bearer header)
      → http://127.0.0.1:18789/hooks/agent (openclaw gateway)
      → gitea-transform.js
      → event router

HMAC is NOT verified by the pipeline. Gitea sends X-Gitea-Signature but the raw request body is not available to the transform layer, so real HMAC-SHA256 verification never happens. Authentication is layered differently:

  1. TLS via Let's Encrypt on nginx
  2. Nginx localhost ACL on port 18789 (gateway only accessible from the host)
  3. Bearer token injected by nginx (Authorization: Bearer {OPENCLAW_HOOKS_TOKEN}); the gateway responds with 401 on mismatch
  4. Dedup cache of X-Gitea-Delivery headers, 24h window, persisted at /root/.openclaw/hooks/logs/dedup-cache.json
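
The fourth layer can be sketched as a small persisted cache. This is a minimal illustration, not the openclaw implementation: the class name `DedupCache`, the injected clock, and the on-disk shape (a JSON object mapping delivery ID to first-seen time) are all assumptions. Only the 24h window and the idea of persisting to a JSON file come from the notes.

```javascript
const fs = require('node:fs');

const DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000; // 24h window, as in the notes

// Hypothetical helper: delivery IDs persisted as { [deliveryId]: firstSeenMs }.
class DedupCache {
  constructor(filePath, now = Date.now) {
    this.filePath = filePath;
    this.now = now; // injectable clock, for testing
    this.seen = new Map();
    try {
      const raw = JSON.parse(fs.readFileSync(filePath, 'utf8'));
      for (const [id, ts] of Object.entries(raw)) this.seen.set(id, ts);
    } catch {
      // Missing or corrupt file: start empty, so replays become possible.
    }
  }

  // Returns true for a new delivery (and records it), false for a replay.
  checkAndRecord(deliveryId) {
    const t = this.now();
    // Evict entries that have aged out of the window.
    for (const [id, ts] of this.seen) {
      if (t - ts > DEDUP_WINDOW_MS) this.seen.delete(id);
    }
    if (this.seen.has(deliveryId)) return false;
    this.seen.set(deliveryId, t);
    fs.writeFileSync(this.filePath, JSON.stringify(Object.fromEntries(this.seen)));
    return true;
  }
}
```

A listener would call `checkAndRecord(req.headers['x-gitea-delivery'])` and drop the event when it returns false. Writing the whole file on every delivery is simple but not atomic; a crash mid-write would leave a corrupt file, which this sketch treats as an empty cache.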

There IS an HMAC in the system, but it's for a completely different purpose: spawn signatures. When the sol account creates an [IMPLEMENT] issue, it embeds <!-- xen-spawn-sig:HMAC:TIMESTAMP --> in the body. The transform re-computes HMAC-SHA256 over repo|title|timestamp with secret at /root/.openclaw/hooks/spawn-secret, verifies against the embedded value, and rejects stale or invalid signatures (2h TTL). This lets sol trigger SPAWN_MANAGER directly (bypassing gitea-worker), while keeping unauthorized creators locked out.
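
A rough sketch of that spawn-signature check, under stated assumptions: the signature is hex-encoded, the timestamp is Unix seconds, and the marker matches the regex below. The function names `computeSpawnSig` and `verifySpawnSig` are illustrative and not taken from gitea-transform.js. Only the signed fields (repo|title|timestamp), the HTML-comment marker, and the 2h TTL come from the notes.

```javascript
const crypto = require('node:crypto');

const SPAWN_SIG_TTL_MS = 2 * 60 * 60 * 1000; // 2h TTL, as in the notes
// ASSUMPTION: hex signature and integer timestamp inside the HTML comment.
const SPAWN_SIG_RE = /<!--\s*xen-spawn-sig:([0-9a-f]{64}):(\d+)\s*-->/;

// HMAC-SHA256 over "repo|title|timestamp", hex-encoded.
function computeSpawnSig(secret, repo, title, timestamp) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${repo}|${title}|${timestamp}`)
    .digest('hex');
}

// Returns { ok: true } or { ok: false, reason }.
function verifySpawnSig(issueBody, repo, title, secret, nowMs = Date.now()) {
  const match = SPAWN_SIG_RE.exec(issueBody);
  if (!match) return { ok: false, reason: 'missing' };
  const [, sig, tsRaw] = match;

  // ASSUMPTION: the embedded timestamp is Unix seconds.
  const ageMs = nowMs - Number(tsRaw) * 1000;
  if (ageMs > SPAWN_SIG_TTL_MS) return { ok: false, reason: 'stale' };

  const given = Buffer.from(sig, 'hex');
  const expected = Buffer.from(computeSpawnSig(secret, repo, title, tsRaw), 'hex');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { ok: false, reason: 'invalid' };
  }
  return { ok: true };
}
```

Because the title is part of the signed string, editing the issue title after creation would invalidate the signature in this sketch. Whether the real transform behaves that way is not confirmed here.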

Implication for Caret migration: I inherit the same security model (bearer + ACL) OR I add real HMAC to my own listener. If I run a separate listener and want to share the Gitea webhook config, I need to cooperate with nginx or stand up a second endpoint. If I terminate events myself, I can do proper HMAC-SHA256 of the raw body using the X-Gitea-Signature header.

Transform phases (gitea-transform.js v13.0)

  1. Validation gates — event type filter, dedup, sender validation, clawbot loop prevention, agent echo suppression (<!-- openclaw-agent --> HTML comment), rate limiting (5 concurrent).
  2. Trust level detection — Rooh (ID 29) → owner/main, collaborators → contributor/gitea-worker, unknown → readonly.
  3. Session lock check — file-based locks at /root/.openclaw/hooks/locks/{owner}-{repo}-{issue}. 2h TTL. Closed issues transition to IS_DONE with 5min grace.
  4. Event routing (the heart of it):
    • repository / create → execSync post-repo-audit.sh (pure script, zero tokens)
    • push → skip pushes to the main/master branches; route everything else to the main or gitea-worker agent with ACTION: RUN_CI
    • issues.opened with [IMPLEMENT] title → verify spawn sig (if sol) → precomputeSpawnParams → asyncDispatchToSpawner → returns SPAWN_MANAGER directive
    • issue_comment with approval words from Rooh → acquire lock → precompute → dispatch → EXECUTE_PLAN
  5. Logging & audit: logs/audit.jsonl, logs/webhook-events-YYYY-MM.jsonl, logs/incidents.jsonl.
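
Phases 2 and 4 can be sketched as one pure function that returns a routing decision and leaves side effects to the caller. Several details are assumptions, not facts from gitea-transform.js: the payload field names (`sender.id`, `ref`, `issue.title`, `comment.body`, `action`), the approval vocabulary, the `startsWith` match on the title, the `spawner` target, and the placement of the readonly check ahead of every route.

```javascript
const OWNER_ID = 29; // Rooh's Gitea user ID, per the notes

// Phase 2: map a sender to a trust level and a target agent.
function trustLevel(senderId, collaboratorIds) {
  if (senderId === OWNER_ID) return { trust: 'owner', agent: 'main' };
  if (collaboratorIds.has(senderId)) return { trust: 'contributor', agent: 'gitea-worker' };
  return { trust: 'readonly', agent: null };
}

// ASSUMPTION: the real approval words are not listed in the notes.
const APPROVAL_RE = /\b(approved|lgtm|go ahead)\b/i;

// Phase 4: decide what to do; the caller performs the side effects.
function route(eventType, payload, collaboratorIds) {
  const { trust, agent } = trustLevel(payload.sender.id, collaboratorIds);
  if (trust === 'readonly') return { kind: 'skip', reason: 'readonly sender' };

  if (eventType === 'repository' && payload.action === 'created') {
    return { kind: 'script', script: 'post-repo-audit.sh' };
  }
  if (eventType === 'push') {
    const branch = String(payload.ref).replace(/^refs\/heads\//, '');
    if (branch === 'main' || branch === 'master') {
      return { kind: 'skip', reason: 'default branch' };
    }
    return { kind: 'agent', agent, action: 'RUN_CI' };
  }
  if (eventType === 'issues' && payload.action === 'opened'
      && payload.issue.title.startsWith('[IMPLEMENT]')) {
    // The spawn-signature check for the sol account would run here.
    return { kind: 'agent', agent: 'spawner', action: 'SPAWN_MANAGER' };
  }
  if (eventType === 'issue_comment' && trust === 'owner'
      && APPROVAL_RE.test(payload.comment.body)) {
    // Lock acquisition would happen before this directive is returned.
    return { kind: 'agent', agent: 'spawner', action: 'EXECUTE_PLAN' };
  }
  return { kind: 'skip', reason: 'unrouted event' };
}
```

Keeping the router free of I/O means the existing fixture payloads can be replayed against it without mocking the Gitea API.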

Tools fan-out (scripts invoked)

| Script | Trigger | Role |
| --- | --- | --- |
| post-repo-audit.sh | repository/create (execSync) | Add Rooh as admin collaborator, ensure webhook exists. Pure script, ~seconds. |
| audit-webhooks.sh | heartbeat (15min) | Verify all webhooks exist and are healthy. --fix recreates missing hooks. |
| audit-repo-policies.sh | heartbeat (6h) + manual | Enforce required files (Makefile, .editorconfig, etc.) from sol/repo-policies. |
| secret-scan.sh | CI (make check) | Find private keys and high-entropy tokens. Allowlist at .secret-scan-allowlist. |
| create-implement-issue.sh | manual / agent | Create signed [IMPLEMENT] issue with HMAC spawn signature. |
| check-implement-orphans.sh | heartbeat (15min) | Detect stale pending spawn files, inactive Managers, orphaned Workers. |
| spawn-manager.sh | agent-called | Generate Manager spawn JSON from issue body. Creates project workspace. |

Openclaw hard couplings I must replace

| Coupling | What I'll need | Difficulty |
| --- | --- | --- |
| sessions_spawn (openclaw's subagent spawn primitive) | Channels plugin or HTTP-triggered session creation | Hard |
| wakeMode: now / next-heartbeat | My own dispatch queue | Medium |
| Agent IDs (main, gitea-worker, spawner) | My own routing scheme (just caret + role tags) | Easy |
| Bearer token auth (OPENCLAW_HOOKS_TOKEN) | My own bearer token, shared with nginx or own listener | Easy |
| Workspace path (/root/.openclaw/workspace/projects/PROJ-XXX-*) | My own workspace path; update all scripts | Easy |
| Session lock dir (/root/.openclaw/hooks/locks/) | My own lock dir | Easy |
| Queue inbox (/root/.openclaw/hooks/queue-inbox/) | My own queue + daemon | Easy |
| Mattermost incident posting | Swap to tg-stream or keep calling Mattermost | Easy |
| Gitea API wrapper | Keep as-is (stateless curl) | Trivial |

Surprising behaviors / gotchas

  1. Session locks have a 2h TTL, different from archive timeout. Long-running Managers can lose their lock and trigger a race.
  2. Closed-while-locked issues transition to IS_DONE with 5min grace, not immediate release.
  3. Spawn signatures expire after 2h regardless of issue state — approving a stale issue rejects the spawn.
  4. Dedup cache is persisted to disk; crash recovery repopulates within 24h but replays may occur during that window.
  5. Gitea HMAC not verified — bearer + nginx ACL are the only layers.
  6. The sol account is treated as a contributor UNLESS its issue has a valid spawn signature. Approval words from sol are ignored.
  7. Rate limiting is per agent ID (5 concurrent per agent), tracked in active-sessions.json.
  8. asyncDispatchToSpawner is a fire-and-forget HTTP POST with no error handling if the spawner is down.
  9. precomputeSpawnParams calls spawn-manager.sh via execSync with a 30s timeout in the transform's hot path. On timeout, the transform falls back to a text directive.
  10. Manager death before STATE.json checkpoint can cause duplicate respawns (flagged in agent-reliability).
  11. Lock expiry cleanup is lazy — expired files accumulate until someone calls check().
  12. Queue daemon re-verifies spawn signatures using the stored timestamp, moving the TTL check off the hot path.
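
Gotchas 1 and 11 suggest what a replacement lock manager should add: a way for long-running Managers to renew their lock, and an explicit sweep instead of purely lazy cleanup. The sketch below is one possible shape, not the openclaw code. The function names, the `touchedAt` field, and the JSON file contents are assumptions; the `{owner}-{repo}-{issue}` file naming and the 2h TTL come from the notes. It assumes a single process: two processes racing on an expired lock could both believe they won.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const LOCK_TTL_MS = 2 * 60 * 60 * 1000; // 2h TTL, as in the notes

const lockFile = (dir, owner, repo, issue) =>
  path.join(dir, `${owner}-${repo}-${issue}`);

// ASSUMPTION: each lock file holds { touchedAt: <ms since epoch> }.
function isExpired(file, now) {
  const { touchedAt } = JSON.parse(fs.readFileSync(file, 'utf8'));
  return now - touchedAt > LOCK_TTL_MS;
}

// Returns true if the lock was taken, false if a live lock already exists.
function acquire(dir, owner, repo, issue, now = Date.now()) {
  const file = lockFile(dir, owner, repo, issue);
  try {
    if (!isExpired(file, now)) return false;
    fs.unlinkSync(file); // expired: clear it before retaking
  } catch (err) {
    if (err.code !== 'ENOENT') throw err; // no lock file yet is fine
  }
  try {
    // 'wx' fails with EEXIST if the file appeared in the meantime.
    fs.writeFileSync(file, JSON.stringify({ touchedAt: now }), { flag: 'wx' });
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// Heartbeat for long-running holders, so the TTL does not lapse mid-run.
function renew(dir, owner, repo, issue, now = Date.now()) {
  const file = lockFile(dir, owner, repo, issue);
  if (!fs.existsSync(file)) return false;
  fs.writeFileSync(file, JSON.stringify({ touchedAt: now }));
  return true;
}

// Eager cleanup: remove every expired lock and report how many were removed.
function sweep(dir, now = Date.now()) {
  let removed = 0;
  for (const name of fs.readdirSync(dir)) {
    const file = path.join(dir, name);
    if (isExpired(file, now)) {
      fs.unlinkSync(file);
      removed += 1;
    }
  }
  return removed;
}
```

A Manager could call `renew` from its own heartbeat, and `sweep` could run on the same 15min heartbeat that already drives check-implement-orphans.sh.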

HMAC recipe I can steal (for a proper replacement)

Gitea sends raw body + X-Gitea-Signature: <hex HMAC-SHA256> computed with the webhook secret over the raw JSON body. Node.js verification:

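const crypto = require('node:crypto');

// rawBody: the exact bytes Gitea sent (Buffer or string), before any JSON parsing.
// signatureHeader: the X-Gitea-Signature value (hex-encoded HMAC-SHA256).
function verifyGiteaSignature(rawBody, signatureHeader, secret) {
  if (!signatureHeader) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  // Decode both to bytes; malformed hex decodes short and fails the length check.
  const sigBuf = Buffer.from(signatureHeader, 'hex');
  const expBuf = Buffer.from(expected, 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (sigBuf.length !== expBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expBuf);
}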

The catch: the listener must read the raw body before JSON parsing. Middleware that parses JSON first strips the raw bytes and makes verification impossible. This is why openclaw's gateway never verified HMAC — it handed already-parsed JSON to the transform. My replacement listener must keep the raw body around (store it on the request object) until after verification.
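
One way to keep the raw bytes around is to skip body-parsing middleware entirely and collect the request stream by hand. The sketch below uses only node:http and node:crypto. `handleDelivery` and `createListener` are illustrative names, the status codes are a design choice rather than anything Gitea requires, and the verification function is repeated so the block stands alone. It has no body-size limit, which a real listener would need.

```javascript
const crypto = require('node:crypto');
const http = require('node:http');

function verifyGiteaSignature(rawBody, signatureHeader, secret) {
  if (!signatureHeader) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const given = Buffer.from(signatureHeader, 'hex');
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

// Pure step: verify against the raw bytes first, parse JSON only afterwards.
function handleDelivery(rawBody, headers, secret) {
  if (!verifyGiteaSignature(rawBody, headers['x-gitea-signature'], secret)) {
    return { status: 401 };
  }
  try {
    return { status: 200, payload: JSON.parse(rawBody.toString('utf8')) };
  } catch {
    return { status: 400 }; // signed, but not valid JSON
  }
}

// Hypothetical wiring: onEvent(eventType, payload) is supplied by the caller.
function createListener(secret, onEvent) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const result = handleDelivery(Buffer.concat(chunks), req.headers, secret);
      if (result.status === 200) onEvent(req.headers['x-gitea-event'], result.payload);
      res.statusCode = result.status;
      res.end();
    });
  });
}
```

Usage would look like `createListener(secret, route).listen(port, '127.0.0.1')`, with the secret and port taken from wherever the new pipeline keeps its configuration. Note that re-serializing the parsed JSON is not a substitute for the raw body: even a whitespace difference changes the HMAC.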

Test infrastructure I can reuse

  • tests/test-transform.js — full unit tests for routing, trust levels, locks, rate limiting (Node.js, 50+ cases, mocks Gitea API). Directly portable.
  • tests/test-lifecycle.js — Manager/Worker lifecycle including spawn signatures.
  • tests/test-spawn-manager.sh — project ID generation, workspace creation (isolated, no API).
  • tools/test-transform.sh — manual smoke test against a live gateway.
  • Test fixtures in tests/fixtures/ — mock Gitea payloads for events.

All are portable. I can copy them into my migration repo with their license intact and adapt paths.

Answers to the questions that defined this research

Q: Can I replace the deterministic side cleanly? Yes. Every pure script is a standalone bash/node script with minimal coupling to openclaw. Main effort is rewriting the event router (gitea-transform.js equivalent) and the ingress path.

Q: Can I replace the agent-spawn side cleanly? Hard. sessions_spawn is a core openclaw primitive with session state, model selection, tool allowlists, wakeMode semantics, and process lifecycle management. The replacement needs a Channels plugin or a long-running worker pool; neither is a weekend project.

Q: What's the minimum viable Caret pipeline? An HTTP listener with proper HMAC verification, a router with the same event → script fan-out as gitea-transform.js, a file-based session lock manager, a structured log, and ONE script per event type. That's ~600-800 lines in one Bun file, doable in Phase 2.

Q: What should I NOT rebuild? The tools themselves — post-repo-audit.sh, audit-repo-policies.sh, spawn-manager.sh, etc. Copy them, strip the openclaw path prefixes, ship them in /host/root/.caret/tools/. Don't reinvent the workspace/PROJ-XXX scheme unless Rooh explicitly asks — it's a working system with its own conventions I'd be recreating poorly.

Next reads pending

  • Subagent ae5ca38f70b1e9626: openclaw gateway internals — how does sessions_spawn actually fire a new Claude session? Where is the tool allowlist enforced? What does the cron/heartbeat scheduler look like?
  • Subagent abf0cb0928d823a0b: live state audit. Partial results are already visible: four webhooks registered across sol/* repos, all pointing to https://slack.solio.tech/hooks/gitea, all with Secret: NOT SET. That's the smoking gun for "HMAC is not in the path right now."

When both finish, synthesize into ARCHITECTURE.md in the migration repo and move to Phase 1 design.