Compare commits
3 Commits
feat/archi
...
main
| Author | SHA1 | Date |
|---|---|---|
|  | 7590df83de |  |
|  | 7489d2159c |  |
|  | bab33fe1cf |  |
5
.dockerignore
Normal file
@@ -0,0 +1,5 @@
.git
node_modules
coverage
*.md
!README.md
15
.editorconfig
Normal file
@@ -0,0 +1,15 @@
root = true

[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true

[*.md]
trim_trailing_whitespace = false

[Makefile]
indent_style = tab
10
.gitignore
vendored
Normal file
@@ -0,0 +1,10 @@
node_modules/
coverage/
dist/
.env
.env.*
!.env.example
*.log
.DS_Store
*.swp
*~
4
.prettierignore
Normal file
@@ -0,0 +1,4 @@
node_modules
*.min.js
coverage
dist
6
.prettierrc
Normal file
@@ -0,0 +1,6 @@
{
    "singleQuote": true,
    "trailingComma": "all",
    "tabWidth": 4,
    "proseWrap": "always"
}
222
DEPENDENCIES.md
Normal file
@@ -0,0 +1,222 @@
# Openclaw dependency matrix

Every openclaw-provided capability that the current gitea-webhooks pipeline uses, and what Caret's replacement does with it. Source citations: R01 = RESEARCH-01, R02 = RESEARCH-02, R03 = RESEARCH-03, PLAN = PLAN.md.

Categories:
- **REMOVE** — Caret's replacement doesn't need this at all
- **REPLACE** — Caret builds a standalone equivalent
- **KEEP** — Caret continues depending on openclaw for this (and why that's safe)
- **BLOCKING** — Caret cannot move forward without resolving this first (Rooh action required)

## By capability

### 1. HTTP webhook ingress (port 18789, `/hooks/agent`)
**Category:** REPLACE
**What Caret does:** Stand up a Bun HTTP listener on a Caret-owned port (e.g. 18790) or Unix socket, with its own path. Mirrors openclaw's `/hooks/agent` shape so existing scripts can be reused. [R01 §Ingress path, R02 §HTTP endpoints]
**Risk if kept:** Hard-couples Caret's lifecycle to the openclaw gateway container. Defeats the entire migration.
**Effort:** 1 day. Stateless request/response, well-understood. [R02 difficulty matrix: Easy]

### 2. HMAC verification (currently NOT done by openclaw)
**Category:** REPLACE (net-new — openclaw never had it)
**What Caret does:** Real HMAC-SHA256 of the raw body, checked against the `X-Gitea-Signature` header with a timing-safe compare; the raw body is preserved before JSON parsing. [R01 §HMAC recipe, R03 confirms all 4 known webhooks have `Secret: NOT SET`]
**Risk if kept:** No HMAC at all today; protection is bearer + nginx ACL only. Caret should do better.
**Effort:** 0.5 day. ~30 lines.
**Depends on BLOCKING #1 (webhook secret provisioning).**

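A minimal TypeScript sketch of the verification step, assuming Bun's `node:crypto` compatibility and that Gitea sends `X-Gitea-Signature` as a bare hex HMAC-SHA256 digest of the body. The helper name `verifyGiteaSignature` is hypothetical, not an existing API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks X-Gitea-Signature against the raw request bytes.
// rawBody must be the bytes exactly as received, captured before any JSON.parse.
function verifyGiteaSignature(
  rawBody: Uint8Array,
  signatureHeader: string | null,
  secret: string,
): boolean {
  // A missing header means the webhook has no secret set on the Gitea side.
  if (signatureHeader === null) return false;
  // Expect a 64-char hex SHA-256 digest; anything else is rejected up front,
  // which also guarantees equal lengths for timingSafeEqual below.
  if (!/^[0-9a-f]{64}$/i.test(signatureHeader)) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  return timingSafeEqual(expected, received);
}
```

In a Bun handler the raw bytes would come from `new Uint8Array(await req.arrayBuffer())` and the header from `req.headers.get("x-gitea-signature")`, both standard Fetch API calls. Parsing JSON afterwards from the same bytes keeps the verified payload and the parsed payload identical.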
### 3. Bearer token authentication (`OPENCLAW_HOOKS_TOKEN`)
**Category:** REPLACE
**What Caret does:** Own bearer token (`CARET_HOOKS_TOKEN`) stored in Caret's credentials dir, validated on each request as a second factor alongside HMAC. [R01 §Ingress path layer 3]
**Risk if kept:** Sharing the openclaw token cross-couples secret rotation.
**Effort:** Trivial.

### 4. nginx TLS termination and path rewriting
**Category:** KEEP (with new location block)
**What Caret does:** Adds a new nginx `location /hooks/caret` block that proxies to the Caret listener, same TLS cert. Reuses existing Let's Encrypt setup. [R01 §Ingress path]
**Risk if kept:** None — nginx is host-level infra, not openclaw-specific.
**Effort:** 1 hour, but requires host root access.
**See BLOCKING #2.**

### 5. Event dedup cache (`X-Gitea-Delivery` 24h)
**Category:** REPLACE
**What Caret does:** Own JSON-on-disk dedup cache at `/host/root/.caret/state/dedup-cache.json`, identical 24h TTL semantics. [R01 §Ingress path layer 4, R01 gotcha 4]
**Risk if kept:** Sharing openclaw's cache file is fragile and couples crash recovery.
**Effort:** 0.5 day.

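A sketch of the dedup cache, assuming a flat JSON object that maps each delivery ID to its first-seen time in epoch milliseconds (the real file schema is not specified here). The class name and the injectable clock are illustrative choices; the temp-file-plus-rename write is what keeps a crash from leaving half-written JSON behind.

```typescript
import { existsSync, readFileSync, renameSync, writeFileSync } from "node:fs";

const DEDUP_TTL_MS = 24 * 60 * 60 * 1000; // 24h, matching openclaw's window

// Hypothetical JSON-on-disk dedup cache keyed by X-Gitea-Delivery.
class DedupCache {
  private seen = new Map<string, number>(); // deliveryId -> first-seen epoch ms

  constructor(private path: string, private now: () => number = Date.now) {
    if (existsSync(path)) {
      const data = JSON.parse(readFileSync(path, "utf8")) as Record<string, number>;
      for (const [id, ts] of Object.entries(data)) this.seen.set(id, ts);
    }
  }

  // Returns true if this delivery was already processed within the TTL;
  // otherwise records it, persists the cache, and returns false.
  checkAndRecord(deliveryId: string): boolean {
    this.prune();
    if (this.seen.has(deliveryId)) return true;
    this.seen.set(deliveryId, this.now());
    this.save();
    return false;
  }

  // Drop entries older than the TTL (deleting during Map iteration is safe).
  private prune(): void {
    const cutoff = this.now() - DEDUP_TTL_MS;
    for (const [id, ts] of this.seen) if (ts < cutoff) this.seen.delete(id);
  }

  // Write a temp file, then rename over the real one (atomic on POSIX).
  private save(): void {
    const tmp = `${this.path}.tmp`;
    writeFileSync(tmp, JSON.stringify(Object.fromEntries(this.seen)));
    renameSync(tmp, this.path);
  }
}
```

The listener would call `checkAndRecord` only after HMAC verification succeeds, and skip the request when it returns `true`. Rewriting the whole file per delivery is O(n) in cache size, which should be acceptable at webhook volumes.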
### 6. Rate limiting (5 concurrent per agent)
**Category:** REPLACE
**What Caret does:** Per-route concurrency limiter in the listener. Caret's routing is simpler (one "agent") so this is mostly a global semaphore. [R01 gotcha 7]
**Risk if kept:** N/A.
**Effort:** Trivial.

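Because Caret has a single route owner, the limiter reduces to a global counting semaphore. A minimal TypeScript sketch, with the limit of 5 carried over from openclaw's per-agent figure and the class name hypothetical:

```typescript
// Hypothetical global concurrency limiter: at most `limit` tasks run at once;
// later callers wait in FIFO order until a slot frees up.
class Semaphore {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait for a finishing task to hand over its slot.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // pass the slot directly to the oldest waiter
      else this.active--;
    }
  }
}
```

Each webhook handler would wrap its fan-out in `sem.run(...)`. Handing a released slot straight to the oldest waiter keeps `active` constant during the handoff, so the cap is never exceeded and waiting requests are served in arrival order.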
### 7. Session lock manager (`hooks/locks/`)
**Category:** REPLACE
**What Caret does:** Own lock dir at `/host/root/.caret/state/locks/`, same 2h TTL, same file naming `{owner}-{repo}-{issue}`, same IS_DONE 5min grace. [R01 §Session lock]
**Risk if kept:** Two writers to one lock dir = race.
**Effort:** 0.5 day.

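A sketch of the lock-file naming and expiry check. The on-disk lock contents are not described here, so the `createdAt`/`doneAt` fields are assumptions, as is this reading of the IS_DONE grace: a lock stays held for 5 minutes after the session reports done, and never longer than the 2h TTL.

```typescript
// Hypothetical lock record; field names are assumptions, not openclaw's format.
interface SessionLock {
  createdAt: number; // epoch ms when the session took the lock
  doneAt?: number; // epoch ms when the session reported IS_DONE, if it has
}

const LOCK_TTL_MS = 2 * 60 * 60 * 1000; // 2h hard expiry
const DONE_GRACE_MS = 5 * 60 * 1000; // 5min grace after IS_DONE

// File name follows the {owner}-{repo}-{issue} convention.
function lockFileName(owner: string, repo: string, issue: number): string {
  return `${owner}-${repo}-${issue}`;
}

// Assumed semantics: held until the 2h TTL elapses, or until 5min after
// IS_DONE, whichever comes first.
function isLockHeld(lock: SessionLock, now: number): boolean {
  if (now - lock.createdAt >= LOCK_TTL_MS) return false;
  if (lock.doneAt !== undefined && now - lock.doneAt >= DONE_GRACE_MS) return false;
  return true;
}
```

One caveat of the hyphen-joined convention: `lockFileName("sol-a", "b", 1)` and `lockFileName("sol", "a-b", 1)` both produce `sol-a-b-1`, so two different issues can share a lock whenever owner or repo names contain hyphens.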
### 8. Queue daemon (`queue-inbox/` + `queue-daemon.js`)
**Category:** REPLACE
**What Caret does:** In-process queue inside the listener (or a sidecar Bun script reading from `/host/root/.caret/state/queue/`). Re-verifies spawn signatures off the hot path. [R01 gotcha 12, R03 §queue-daemon.js]
**Risk if kept:** Couples Caret's spawn pipeline to openclaw container restart.
**Effort:** 1 day.

### 9. `sessions_spawn` agent orchestration primitive
**Category:** REPLACE (with Claude Code Channels plugin)
**What Caret does:** Build a Channels plugin at `/host/root/.caret/channels/gitea-judgment/` that receives an HTTP POST and starts a Claude Code session with the payload as initial prompt. [R02 §Caret's conclusion, PLAN §Phase 3 J3.1]
**Risk if kept:** Openclaw's `sessions_spawn` is the deepest coupling — see R02 difficulty matrix "Hard" rating. Keeping it means Caret can never fully cut over.
**Effort:** 2-3 days for the minimum viable plugin. Full parity (wakeMode, thinking, model selection) is more like 1 week.

### 10. Tool allowlist enforcement per agent
**Category:** KEEP (judgment path runs inside Claude Code, which has its own tool config)
**What Caret does:** Caret's deterministic path runs scripts directly with no tools concept. The judgment path is a Claude Code session whose tool allowlist is configured in Claude Code's settings.json (not openclaw's). [R02 §Tool policy resolution]
**Risk if kept:** None — Caret never touches openclaw's tool policy resolver.
**Effort:** 0 (already separated).

### 11. Plugin SDK (`src/plugin-sdk/`)
**Category:** REMOVE
**What Caret does:** Doesn't load openclaw plugins. Caret's listener is a plain Bun script with no plugin loader. The Channels plugin uses Claude Code's plugin mechanism, not openclaw's. [R02 §Don't reinvent #2]
**Risk if kept:** N/A.
**Effort:** 0.

### 12. Delivery system (Telegram, Mattermost, Discord)
**Category:** REPLACE (use tg-stream + Mattermost direct calls)
**What Caret does:** Calls the tg-stream HTTP API for Telegram alerts and the Mattermost webhook URL directly for incident posts. No multi-channel abstraction layer. [R02 §Don't reinvent #4 warns it's tightly coupled — we don't replicate, we use simpler primitives]
**Risk if kept:** Hard coupling to openclaw outbound system.
**Effort:** 0.5 day (just curl calls).

### 13. Session storage (JSONL transcripts at `~/.openclaw/sessions/`)
**Category:** KEEP (judgment path only — Claude Code's own session store)
**What Caret does:** Deterministic path has no sessions. Judgment path uses Claude Code's native session JSONL under Claude Code's config dir, NOT openclaw's. [R02 §Don't reinvent #1]
**Risk if kept:** None — Caret never reads/writes openclaw's session files.
**Effort:** 0.

### 14. Heartbeat scheduler (`heartbeat-runner.ts`)
**Category:** REPLACE (use Claude Code `schedule` skill / cron)
**What Caret does:** Use a systemd timer in the Caret container OR Claude Code's schedule mechanism for the 6h policy sweep and 15min webhook audit. No 28-item checklist — Caret's heartbeat is much smaller. [R02 §Heartbeat loop, R03 §Heartbeat checklists]
**Risk if kept:** Couples to openclaw gateway uptime, which R03 shows is currently degraded.
**Effort:** 0.5 day for the timer setup.

### 15. Cron service (`CronService` in gateway)
**Category:** REPLACE (same as #14 — single mechanism)
**What Caret does:** Same as heartbeat — systemd timers or Claude Code schedule. [R02 §Cron service, R03 §Active cron jobs shows 8 of 12 currently failing]
**Risk if kept:** R03 shows openclaw cron is currently degraded; depending on it inherits the breakage.
**Effort:** Combined with #14.

### 16. Gitea API integration (token + curl wrappers)
**Category:** KEEP
**What Caret does:** Continue using the same `GITEA_TOKEN` and stateless curl calls. Wrapper scripts ported into `/host/root/.caret/tools/`. [R01 §Tools fan-out, PLAN B2.3]
**Risk if kept:** None — Gitea API is a third-party service, openclaw is not in the path.
**Effort:** Path-strip and copy, 1 hour.
**Depends on BLOCKING #3 (admin scope).**

### 17. Workspace directory (`/root/.openclaw/workspace/`)
**Category:** KEEP (155 projects stay owned by openclaw/Xen)
**What Caret does:** Caret's migration scope is the Gitea-facing slice ONLY. The 155-project workspace, the PROJ-XXX-* hierarchy, and Xen's project ownership stay as-is. Caret has its own `/host/root/.caret/workspace/` for migration artifacts only. [R03 §What the migration actually has to replace, PLAN §Goal]
**Risk if kept:** None for the in-scope migration.
**Effort:** 0.

### 18. Project registry (`projects/registry.json`)
**Category:** KEEP
**What Caret does:** Read-only access if needed (e.g., to look up which manager owns a project for a given issue). No writes. [R03 §Shared state]
**Risk if kept:** Stale reads possible but acceptable.
**Effort:** 0.

### 19. Memory system (`memory/`)
**Category:** REMOVE (from migration scope)
**What Caret does:** Caret's own memory is in `/root/.claude/projects/-app/memory/`. Doesn't touch openclaw's memory dir. [R03 §Shared state]
**Risk if kept:** N/A.
**Effort:** 0.

### 20. Credentials store (`credentials/`)
**Category:** KEEP (read-only for shared tokens like GITEA_TOKEN)
**What Caret does:** Read GITEA_TOKEN from openclaw credentials OR copy to Caret credentials store. Caret-owned secrets (HMAC webhook secret, CARET_HOOKS_TOKEN) live in `/host/root/.caret/credentials/`. [R03 §Shared state credentials]
**Risk if kept:** Cross-coupling on token rotation. Acceptable short-term.
**Effort:** Trivial.
**See BLOCKING #4 (secret rotation story).**

### 21. `post-repo-audit.sh`
**Category:** REPLACE (port the script verbatim)
**What Caret does:** Copy to `/host/root/.caret/tools/post-repo-audit.sh`, strip openclaw path prefixes, ship as-is. Pure script, zero tokens. [R01 §Tools fan-out, PLAN B2.3]
**Risk if kept:** Couples to openclaw tools dir lifecycle.
**Effort:** 1 hour.

### 22. `audit-repo-policies.sh`
**Category:** REPLACE (port verbatim)
**What Caret does:** Copy to Caret tools dir, point at sol/repo-policies template repo, run with `--fix` from Caret's heartbeat. [R01 §Tools fan-out]
**Risk if kept:** Same as #21.
**Effort:** 1 hour.

### 23. `spawn-manager.sh`
**Category:** REPLACE (port verbatim, but only after #9 is solved)
**What Caret does:** Copy script. Generates Manager spawn JSON from issue body, creates project workspace. Caret's listener calls it via execSync with same 30s timeout pattern. [R01 §Tools fan-out, R01 gotcha 9]
**Risk if kept:** Tightly coupled to openclaw workspace path conventions — depends on whether Caret keeps openclaw's PROJ-XXX scheme (R01 advises yes).
**Effort:** 0.5 day.

### 24. `create-implement-issue.sh`
**Category:** REPLACE (port verbatim)
**What Caret does:** Copy script. Creates signed `[IMPLEMENT]` issue with HMAC spawn signature using `/host/root/.caret/credentials/spawn-secret`. [R01 §Tools fan-out]
**Risk if kept:** Spawn secret coupling.
**Effort:** 1 hour.
**Depends on BLOCKING #5 (spawn secret ownership).**

### 25. `secret-scan.sh`
**Category:** REPLACE (port verbatim)
**What Caret does:** Copy script, used by both CI (`make check`) inside repos and Caret's push handler. [R01 §Tools fan-out]
**Risk if kept:** N/A.
**Effort:** 1 hour.

### 26. `check-implement-orphans.sh`
**Category:** REPLACE (port verbatim)
**What Caret does:** Copy script, run from Caret's 15min heartbeat to detect stale pending spawns and orphaned managers. [R01 §Tools fan-out]
**Risk if kept:** N/A.
**Effort:** 1 hour.

### 27. `auditLog → logs/audit.jsonl`
**Category:** REPLACE
**What Caret does:** Caret's listener writes to `/host/root/.caret/log/audit.jsonl`, same JSONL line schema. Rotation by line count. [R01 §Logging, PLAN B2.5]
**Risk if kept:** Two writers to one file = corruption.
**Effort:** 0.5 day.

### 28. Incident flagging (`logs/incidents.jsonl`)
**Category:** REPLACE
**What Caret does:** Caret writes to `/host/root/.caret/log/incidents.jsonl`. Critical incidents also fan out to Telegram via tg-stream. [R01 §Logging, T2.23]
**Risk if kept:** Same as #27.
**Effort:** 0.5 day (combined with #27).

## Summary by category

| Category | Count |
|---|---|
| REMOVE | 2 (#11, #19) |
| REPLACE | 19 (#1, #2, #3, #5, #6, #7, #8, #9, #12, #14, #15, #21, #22, #23, #24, #25, #26, #27, #28) |
| KEEP | 7 (#4, #10, #13, #16, #17, #18, #20; #20 only partially) |
| BLOCKING | 5 (see below) |

## BLOCKING items — Rooh action required

These five items must be resolved by Rooh before Caret can write the production code. They are not engineering decisions; they are policy / access decisions only Rooh can make.

### BLOCKING #1 — Webhook secret provisioning and storage location
**Question:** Where is the new HMAC webhook secret stored? `/host/root/.caret/credentials/webhook-secret`? A vault entry? Reused from an existing openclaw secret?
**Why it blocks:** Cannot ship HMAC verification (T1.06–T1.11) without the secret being authoritative somewhere. Cannot register webhooks on sol/* repos without knowing what secret to set on the Gitea side.
**Reference:** R01 §HMAC recipe, R03 confirms all current webhooks have `Secret: NOT SET`.

### BLOCKING #2 — nginx config write access for `/hooks/caret` location
**Question:** Does Caret have permission to edit `/host/etc/nginx/...` and reload nginx, or does Rooh do that one-time setup?
**Why it blocks:** Without an nginx route, Gitea cannot reach the Caret listener via HTTPS. Direct port exposure is the only alternative and that needs firewall changes.
**Reference:** R01 §Ingress path nginx termination, dependency #4.

### BLOCKING #3 — Gitea token admin scope OR manual system-webhook registration
**Question:** Will Rooh elevate the sol token to include `read:admin` + `write:admin`, or will Rooh manually register the system-level webhook one time?
**Why it blocks:** Without admin scope Caret cannot list system-level webhooks (R03 confirmed) and cannot create them. The sol/* per-repo webhooks can be registered with the existing token, but for new repo bootstrap to register Caret's webhook automatically (T1.02), the admin scope or manual setup is required.
**Reference:** R03 §Registered Gitea webhooks, PLAN §Dependencies token scope.

### BLOCKING #4 — Secret rotation story across openclaw + Caret
**Question:** When `GITEA_TOKEN` or the webhook secret rotates, what's the rollout? Does Caret read from openclaw's credentials dir live, or get its own copy that needs separate rotation? Who is responsible for rotation drills?
**Why it blocks:** PLAN §Risks #1 explicitly calls this out as a first-class deliverable. Without a documented rotation procedure, T3.13 cannot pass and the migration leaves a security debt.
**Reference:** PLAN §Risks #1, dependency #20.

### BLOCKING #5 — Spawn signature secret ownership (`/root/.openclaw/hooks/spawn-secret`)
**Question:** Does Caret get its own spawn secret (and `create-implement-issue.sh` is updated to use it), or does Caret read openclaw's existing secret? If the latter, what happens when openclaw rotates it?
**Why it blocks:** T1.15–T1.17 (spawn signature verification) require Caret and `create-implement-issue.sh` to share a secret. Cross-system sharing is the cleaner short-term answer but locks Caret to openclaw's lifecycle until cut-over completes.
**Reference:** R01 §Ingress path "spawn signatures", dependency #24.
352
DESIGN.md
Normal file
@@ -0,0 +1,352 @@
# DESIGN.md — Caret repo-enforcer (openclaw Gitea-slice replacement)

**Status:** draft for sign-off
**Author:** Caret
**Approver:** Rooh
**Phase:** 1 (architecture lock)
**Scope reminder:** replaces only the Gitea-facing slice of openclaw — webhook ingress, event router, script fan-out, repo policy enforcement, and an opt-in judgment wake-up. Everything else (155 projects, workspace memory, sub-agent orchestration, delivery system) stays owned by openclaw.

## Purpose

A single-file Bun HTTP listener that receives Gitea webhooks, verifies them with HMAC, runs deterministic policy scripts, and optionally wakes a Claude session for judgment cases — replacing openclaw's `gitea-transform.js` pipeline with no LLM in the hot path.

## Target architecture

```
Gitea (sol/*)
    │  POST + X-Gitea-Signature (HMAC-SHA256 of raw body)
    │  X-Gitea-Event, X-Gitea-Delivery
    ▼
nginx (slack.solio.tech)
    │  TLS termination
    │  path /hooks/caret/gitea → forwards raw body, no JSON parsing
    ▼
caret-repo-enforcer (Bun, single file, host process under systemd-style supervisor)
listens 127.0.0.1:18790
┌────────────────────────────────────────────────────────────────────┐
│  1. read raw body bytes                                            │
│  2. HMAC verify (timing-safe) → 403 on mismatch                    │
│  3. dedup by X-Gitea-Delivery (24h LRU on disk)                    │
│  4. parse JSON                                                     │
│  5. event router ──► route table (event → script[])                │
│  6. fan-out: spawn scripts in /host/root/.caret/tools/ as children │
│  7. capture stdout/stderr/exit, attach to log entry                │
│  8. on script error or "judgment" exit code → enqueue wake-up      │
│  9. structured JSON log line → /host/root/.caret/log/              │
│ 10. respond 200 with {ok, runId, scripts:[...]}                    │
└────────────────────────────────────────────────────────────────────┘
    │                            │                        │
    ▼                            ▼                        ▼
scripts (bash/node)          judgment wake-up         tg-stream alert
/host/root/.caret/tools/     via Channels plugin      on error paths
- post-repo-audit.sh         POST to local            (existing)
- audit-repo-policies.sh     channel endpoint         ─────────────► Rooh's Telegram
- secret-scan.sh             spawns Claude session
- audit-webhooks.sh          with payload as prompt
    │
    ▼
Gitea API (curl, sol token) — apply fixes, add collaborators, commit policy files
    │
    ▼
Result visible to Rooh:
  - log line greppable across tg-stream + repo-enforcer logs
  - tg-stream Telegram message on error or judgment cases
  - the actual repo state in Gitea
```

Every hop is synchronous from request to 200 response except the wake-up (fire-and-forget) and tg-stream (fire-and-forget). The 200 carries the run ID so any future replay can be correlated.

## Components

### 1. HTTP listener — `caret-repo-enforcer`

**What:** Single Bun file. HTTP server on `127.0.0.1:18790`. Receives Gitea webhooks via nginx, runs the entire pipeline, returns 200 with a run ID.

**Where on disk:** `/host/root/.caret/repo-enforcer/server.ts` (single file, ~600-800 lines, same shape as `tg-stream`).

**Runtime:** Bun (latest stable). No Node, no npm install — Bun's stdlib covers HTTP, crypto, fs, child_process. No package.json needed beyond a stub for type hints.

**Process model:** host process supervised by a tiny shell wrapper (`run.sh`) under `tmux` for parity with `tg-stream`. **Rationale:** a Docker container would add image-build overhead, volume mounting for `/host/root/.caret/`, and a network hop for nothing. `tg-stream` already proves the host-process pattern works for a long-lived listener; reuse the same shape for operational simplicity. If Rooh prefers containerization for isolation, the single-file shape ports trivially — we just add a Dockerfile later.

**Dependencies:** Bun stdlib only. No openclaw imports. No plugin SDK. The Gitea API is called by the scripts via `curl`, not from the listener.

**Responsibilities:**
- Bind 127.0.0.1:18790 (nginx is the only thing that can reach it)
- Read raw request body before any parsing
- HMAC-SHA256 verify against `GITEA_WEBHOOK_SECRET`
- Bearer token check (defense in depth, in case nginx is misconfigured)
- Dedup by `X-Gitea-Delivery` header against on-disk LRU
- Route the event to scripts via a static table
- Spawn scripts as child processes with bounded timeouts
- Emit one structured JSON log line per request, regardless of outcome
- Emit a tg-stream alert on error paths
- Expose `/healthz` for liveness

### 2. Event router (inside the listener)

**What:** A static `routeTable` map: `(event, action) → [scriptPath, ...]`. Replaces gitea-transform.js's switch statement.

**Where:** Inline in `server.ts`. Not a separate file because routes are dense and rarely change; one file beats two.

**Routes (initial):**

| Gitea event | Action filter | Scripts (in order) | Mode |
|---|---|---|---|
| `repository` | `created` | `post-repo-audit.sh`, `audit-repo-policies.sh --fix` | sequential |
| `push` | ref = `refs/heads/main` or `refs/heads/master` | `secret-scan.sh`, `audit-repo-policies.sh --check` | parallel |
| `push` | other refs | (skip, log "ignored: non-main") | n/a |
| `issues` | `opened` with `[IMPLEMENT]` in title | `enqueue-judgment.sh implement` | sequential |
| `issue_comment` | `created` by Rooh, body matches approval words | `enqueue-judgment.sh approval` | sequential |
| (other) | any | (skip, log "ignored: not routed") | n/a |

**Rationale for keeping routing in-listener vs a config file:** the routes are part of the security perimeter and they need to be reviewed in code, not in JSON. Editing `server.ts` triggers the listener restart that re-applies the lock — same as openclaw's transform versioning. Config-file routing invites the "I changed the JSON and broke prod silently" failure mode.

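A TypeScript sketch of how the static table might be encoded, assuming Gitea's payload field names (`action`, `ref`, `issue.title`, `comment.user.login`, `comment.body`). Rooh's login and the approval-word list are placeholders, since the design does not pin them down; `lookupRoute` is a hypothetical name.

```typescript
// Hypothetical encoding of the route table. Keys are `${event}:${action}`;
// push payloads carry no action, so they use "*".
interface Route {
  scripts: string[];
  mode: "sequential" | "parallel";
  // Extra filter for rows that depend on more than event/action.
  when?: (payload: any) => boolean;
}

type RouteResult = Route | { ignored: string };

const MAIN_REFS = new Set(["refs/heads/main", "refs/heads/master"]);
const ROOH_LOGIN = "rooh"; // placeholder: Rooh's real Gitea login is not given here
const APPROVAL_RE = /\b(approved?|lgtm)\b/i; // placeholder approval-word list

const isRoohApproval = (p: any): boolean =>
  p.comment?.user?.login === ROOH_LOGIN && APPROVAL_RE.test(String(p.comment?.body ?? ""));

const routeTable: Record<string, Route> = {
  "repository:created": {
    scripts: ["post-repo-audit.sh", "audit-repo-policies.sh --fix"],
    mode: "sequential",
  },
  "push:*": {
    scripts: ["secret-scan.sh", "audit-repo-policies.sh --check"],
    mode: "parallel",
    when: (p) => MAIN_REFS.has(p.ref),
  },
  "issues:opened": {
    scripts: ["enqueue-judgment.sh implement"],
    mode: "sequential",
    when: (p) => String(p.issue?.title ?? "").includes("[IMPLEMENT]"),
  },
  "issue_comment:created": {
    scripts: ["enqueue-judgment.sh approval"],
    mode: "sequential",
    when: isRoohApproval,
  },
};

function lookupRoute(event: string, payload: any): RouteResult {
  const route = routeTable[`${event}:${payload.action ?? "*"}`];
  if (route && (!route.when || route.when(payload))) return route;
  // Every miss is reported so the caller can write a "debug" log line.
  return { ignored: event === "push" ? "non-main" : "not routed" };
}
```

Keeping each row's extra condition as a `when` predicate beside its scripts keeps the whole perimeter reviewable in one place, which is the point of the in-listener rationale.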
### 3. Script fan-out — `/host/root/.caret/tools/`

**What:** Ported scripts from `sol/gitea-webhooks/tools/`, with openclaw-specific paths stripped. Each script is standalone bash or node, takes the JSON payload on stdin or via env vars, exits 0 (success), 1 (deterministic failure → tg-stream alert), or 42 (judgment escalation → wake-up).

**Where on disk:** `/host/root/.caret/tools/`. One file per script. Each file has a header comment naming its trigger and exit-code contract. License preserved from openclaw.

**Initial set (Phase 2 ports):**
- `post-repo-audit.sh` — add Rooh as admin collaborator, ensure HMAC webhook exists with correct secret. Idempotent.
- `audit-repo-policies.sh` — ensures Makefile, .editorconfig, .gitignore, README baseline. `--fix` commits missing files; `--check` only reports.
- `secret-scan.sh` — finds private keys and high-entropy strings in the pushed diff. Honors `.secret-scan-allowlist`.
- `audit-webhooks.sh` — verifies all sol/* repos have a Caret webhook with the right secret. Called by cron, not by webhook events.
- `enqueue-judgment.sh` — writes a judgment request file to `/host/root/.caret/judgment-inbox/` and POSTs to the Channels plugin endpoint.

**Path conventions:** every script reads its config from env vars set by the listener. No hard-coded `/root/.openclaw/` paths. Workspace root is `/host/root/.caret/`. All scripts must be re-runnable on the same payload with no side effects beyond the first run (idempotency contract is in the header comment of each script).

**Timeouts:** the listener wraps each script in a 30-second timeout (matches openclaw's `precomputeSpawnParams`). Hitting the timeout is a deterministic failure (exit 124) that triggers a tg-stream alert and a log entry but does NOT escalate to judgment — repeated timeouts are an infrastructure bug, not a judgment call.

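A sketch of the per-script runner and exit-code mapping, assuming scripts receive the payload on stdin and their settings via env. It uses asynchronous `spawn` with a timer instead of openclaw's `execSync` pattern, so a slow script does not block the listener's event loop; that substitution is ours, not the design's. `runScript` and `classifyExit` are hypothetical names.

```typescript
import { spawn } from "node:child_process";

type Outcome = "ok" | "failure" | "judgment" | "timeout";

interface ScriptResult {
  name: string;
  exit: number;
  durMs: number;
  stdout: string;
  stderr: string;
  outcome: Outcome;
}

const SCRIPT_TIMEOUT_MS = 30_000;
const EXIT_JUDGMENT = 42;
const EXIT_TIMEOUT = 124; // same code coreutils `timeout` uses

// Map a raw exit code onto the exit-code contract.
function classifyExit(code: number): Outcome {
  if (code === 0) return "ok";
  if (code === EXIT_JUDGMENT) return "judgment";
  if (code === EXIT_TIMEOUT) return "timeout";
  return "failure";
}

// Hypothetical runner: payload on stdin, extra env merged in, SIGKILL on timeout.
function runScript(
  cmd: string,
  args: string[],
  payload: string,
  env: Record<string, string> = {},
  timeoutMs = SCRIPT_TIMEOUT_MS,
): Promise<ScriptResult> {
  const start = Date.now();
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { env: { ...process.env, ...env } });
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    let settled = false;
    child.stdout.setEncoding("utf8").on("data", (s: string) => (stdout += s));
    child.stderr.setEncoding("utf8").on("data", (s: string) => (stderr += s));
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);
    const finish = (exit: number) => {
      if (settled) return; // "error" and "close" can both fire
      settled = true;
      clearTimeout(timer);
      resolve({ name: cmd, exit, durMs: Date.now() - start, stdout, stderr, outcome: classifyExit(exit) });
    };
    child.on("error", () => finish(1)); // e.g. the script does not exist
    child.on("close", (code) => finish(timedOut ? EXIT_TIMEOUT : code ?? 1));
    child.stdin.on("error", () => {}); // ignore EPIPE if the script never reads stdin
    child.stdin.end(payload);
  });
}
```

A production runner would likely also need to kill the script's whole process group on timeout: a script that forks (for example `curl` inside bash) can leave a grandchild holding the pipes open after the parent is killed, which delays the `close` event.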
### 4. Structured log

**What:** Append-only JSONL at `/host/root/.caret/log/repo-enforcer.log`. One line per request. Format intentionally identical to `tg-stream`'s log so a single `grep` covers both pipelines.

**Format:**

```json
{"ts":"2026-04-05T12:34:56.789Z","svc":"repo-enforcer","runId":"r_abc123","level":"info",
 "event":"push","action":null,"repo":"sol/foo","delivery":"d_xyz","hmac":"ok",
 "scripts":[{"name":"secret-scan.sh","exit":0,"durMs":421,"stderr":""},
            {"name":"audit-repo-policies.sh","exit":0,"durMs":1102}],
 "outcome":"ok","msg":"push processed"}
```

Error lines use `level:"error"` and include `errPath` and `errMsg`. Every dropped event (dedup hit, route miss, ignored ref) gets a line at `level:"debug"` so there are no silent drops — addresses the openclaw gap in PLAN.md risk #3.

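A minimal sketch of the one-line-per-request rule, assuming the envelope fields shown in the example (`ts`, `svc`, `runId`, `level`). `logLine` and `newRunId` are hypothetical helpers; the property relied on is that `JSON.stringify` escapes embedded newlines, so a captured stderr can never split a record across lines.

```typescript
import { randomBytes } from "node:crypto";

type Level = "debug" | "info" | "error";

// Hypothetical run ID in the r_<hex> style of the example above.
function newRunId(): string {
  return `r_${randomBytes(6).toString("hex")}`;
}

// Build exactly one JSONL line. Envelope fields come first so the key order
// matches the example; callers add event-specific fields (and should not
// pass envelope keys, which would override them).
function logLine(
  runId: string,
  level: Level,
  fields: Record<string, unknown>,
  now: Date = new Date(),
): string {
  const envelope = { ts: now.toISOString(), svc: "repo-enforcer", runId, level };
  return JSON.stringify({ ...envelope, ...fields }) + "\n";
}
```

A dedup hit would then be written as something like `logLine(runId, "debug", { event, delivery, outcome: "dedup" })`, so the no-silent-drops guarantee goes through the same code path as every other outcome.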
**Rotation:** line-count rotation at 50,000 lines, keep 10 generations, gzip the rest. Same policy as tg-stream. Implemented inline in the listener (no logrotate dependency) so the listener owns its own log lifecycle.

**Failure mode:** if log write fails (disk full), the listener still returns 200 to Gitea (Gitea is not the right place to retry log failures) but emits a tg-stream alert via the alternate path and increments a `logWriteFailures` counter on `/healthz`.

### 5. Wake-up channel — judgment escalation

**What:** A native Claude Code Channels plugin at `/host/root/.caret/channels/gitea-judgment/`. Receives a POST with a payload, starts a Claude session with that payload as the initial prompt, lets the session run and report back via tg-stream when done.

**Why Channels and not CronCreate:** the trigger is event-driven, not time-driven, so cron is the wrong primitive. Channels plugins are explicitly designed for "external event → Claude session" — exactly this use case. They're a native primitive, no openclaw dependency, and the plugin is a few-dozen-line manifest plus a handler script. **Considered alternative:** call openclaw's `/hooks/{hookPath}/agent` endpoint. Rejected because (a) it reintroduces the openclaw dependency we're explicitly removing, (b) openclaw's gateway is currently degraded (Research 03 §critical), and (c) the channels plugin gives us a clean ownership boundary.

**Trigger conditions (from PLAN.md J3.2):**
1. A deterministic script exits 42 ("judgment requested").
2. A script exits non-zero AND its header marks it `escalate-on-error: true` (e.g. policy enforcement failed mid-run on a repo we don't recognize).
3. An event matches an explicit opt-in marker — `[IMPLEMENT]` issue title, Rooh approval-word comment.

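The three trigger conditions reduce to a small predicate. This sketch assumes the `escalate-on-error` marker is written as a `# escalate-on-error: true` line in the script's leading comment block, and it excludes exit 124, following the timeout rule in the script fan-out section. Both function names are hypothetical.

```typescript
// Hypothetical per-script result as seen by the escalation check.
interface ScriptRun {
  exit: number;
  escalateOnError: boolean;
}

const EXIT_JUDGMENT = 42;
const EXIT_TIMEOUT = 124;

// Scans the script's leading comment block for an assumed
// "# escalate-on-error: true" header line.
function parseEscalateOnError(source: string): boolean {
  for (const line of source.split("\n")) {
    if (!line.startsWith("#")) break; // header ends at the first non-comment line
    if (/^#\s*escalate-on-error:\s*true\s*$/i.test(line)) return true;
  }
  return false;
}

// optIn covers [IMPLEMENT] titles and Rooh approval comments.
function shouldEscalate(optIn: boolean, runs: ScriptRun[]): boolean {
  if (optIn) return true; // condition 3
  return runs.some(
    (r) =>
      r.exit === EXIT_JUDGMENT || // condition 1
      (r.exit !== 0 && r.exit !== EXIT_TIMEOUT && r.escalateOnError), // condition 2; timeouts never escalate
  );
}
```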
**Cost hygiene:** judgment is never fired on a normal push, normal repo create, normal policy check. The default deterministic path applies to every event. Token spend on judgment is bounded by (a) only firing on errors or explicit opt-ins, (b) the channels plugin enforcing a per-hour rate limit (5 sessions/hour) configurable in the manifest, (c) tg-stream alerting Rooh every time a judgment session fires so spend is visible.

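A sliding-window sketch of the 5-sessions-per-hour cap. The design puts this limit in the Channels plugin manifest; the TypeScript class below only illustrates the counting logic and is not the plugin's mechanism. The class name is hypothetical and the injectable clock exists so the window can be tested.

```typescript
// Hypothetical sliding-window limiter: at most `max` session starts in any
// rolling `windowMs` period (default 5 per hour). Refused attempts are not counted.
class SlidingWindowLimiter {
  private starts: number[] = [];

  constructor(
    private readonly max = 5,
    private readonly windowMs = 60 * 60 * 1000,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns true and records the start if a session may fire now.
  tryAcquire(): boolean {
    const t = this.now();
    this.starts = this.starts.filter((s) => s > t - this.windowMs);
    if (this.starts.length >= this.max) return false;
    this.starts.push(t);
    return true;
  }
}
```

The design does not say what happens to a refused request (dropped, queued, or only alerted on), so the sketch only reports the decision.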
### 6. Secrets storage

**What lives where:**
- `GITEA_WEBHOOK_SECRET` — HMAC secret, 32 random bytes, hex-encoded. Stored at `/host/root/.caret/secrets/gitea-webhook-secret` (mode 0600, owned by the listener user). Loaded into the listener at boot. The same secret is registered on every sol/* repo's webhook config (set by `post-repo-audit.sh` on repo create, and by the periodic `audit-webhooks.sh` cron).
- `GITEA_API_TOKEN` — sol account token used by scripts to call the Gitea API. At `/host/root/.caret/secrets/gitea-api-token` (mode 0600). Loaded into env when scripts spawn.
- `CARET_BEARER_TOKEN` — bearer token nginx injects on every forwarded request. Defense-in-depth in case the localhost ACL is bypassed. At `/host/root/.caret/secrets/bearer-token` (mode 0600).
- `TG_STREAM_TOKEN` — already exists; reused, not duplicated.

**Rotation procedure:**
1. Generate new secret: `openssl rand -hex 32 > /host/root/.caret/secrets/gitea-webhook-secret.new`
2. Run `tools/audit-webhooks.sh --rotate-to /host/root/.caret/secrets/gitea-webhook-secret.new` — this updates every sol/* webhook in Gitea to the new secret AND keeps the old secret as a fallback in the listener for 60 seconds (dual-accept window).
3. Atomic rename: `mv .new /host/root/.caret/secrets/gitea-webhook-secret`.
4. Send the listener SIGHUP — it reloads secrets and drops the old one after the dual-accept window.
5. Audit script logs the rotation to repo-enforcer.log with a special `rotation` event.

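The dual-accept window can be sketched as a small secret holder, assuming the listener calls `rotate` from a `process.on("SIGHUP", ...)` handler after re-reading the secret file. `WebhookSecrets` is a hypothetical name; verification checks every live candidate without an early exit, so timing does not reveal which secret matched.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const DUAL_ACCEPT_MS = 60_000; // 60s window from the rotation procedure

// Hypothetical holder: after rotate(), the new secret is current and the
// previous one is still accepted until the window closes.
class WebhookSecrets {
  private previous?: { secret: string; expiresAt: number };

  constructor(private current: string, private readonly now: () => number = Date.now) {}

  rotate(next: string): void {
    this.previous = { secret: this.current, expiresAt: this.now() + DUAL_ACCEPT_MS };
    this.current = next;
  }

  verify(rawBody: Uint8Array, signatureHex: string): boolean {
    if (!/^[0-9a-f]{64}$/i.test(signatureHex)) return false;
    const received = Buffer.from(signatureHex, "hex");
    const candidates = [this.current];
    if (this.previous && this.now() < this.previous.expiresAt) candidates.push(this.previous.secret);
    let ok = false;
    for (const secret of candidates) {
      const expected = createHmac("sha256", secret).update(rawBody).digest();
      ok = timingSafeEqual(expected, received) || ok; // always evaluates the compare
    }
    return ok;
  }
}
```

The sketch only helps while the listener holds both secrets. Deliveries that Gitea signs with the new secret before the listener has loaded it would still be rejected, so the timing of the reload relative to step 2 matters.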
**Blast radius of leak:** a leaked webhook secret lets an attacker forge events. Mitigation: leaked secrets get rotated by the short scripted procedure above, and the listener's dedup cache prevents replay of *real* events as forgeries.

### 7. Observability
|
||||
|
||||
**Health check:** `GET /healthz` on the listener returns:
|
||||
```json
|
||||
{"ok":true,"uptimeSec":12345,"lastEventTs":"2026-04-05T...","dedupCacheSize":423,
|
||||
"logWriteFailures":0,"hmacFailures24h":0,"scriptFailures24h":2,"version":"0.1.0"}
|
||||
```
|
||||
Used by an external watchdog (a 2-line cron from Rooh's existing setup or tg-stream) that pings every 5 minutes and alerts on non-200 or `ok:false`.
|
||||
|
||||
**Metrics:** lightweight, no Prometheus. Counters in memory, exposed on `/healthz`. Persisted to `/host/root/.caret/repo-enforcer/state.json` on shutdown so they survive restart.
|
||||
|
||||
**Heartbeat:** every 60 seconds the listener writes `{"ts":...,"svc":"repo-enforcer","beat":true}` to the log. Greppable proof of life. If 5 minutes pass without a heartbeat line, the watchdog fires a tg-stream alert.
|
||||
|
||||
**Error alerts via tg-stream:** four alert classes:
|
||||
1. HMAC failure burst (>3 in 1 minute) — possible attack
|
||||
2. Script timeout — possible infra problem
|
||||
3. Script exit non-zero (no escalate-on-error) — possible policy bug
|
||||
4. Listener crash / restart — caught by the supervisor wrapper, posts to tg-stream before re-exec
|
||||
|
||||
All alerts are rate-limited at 1/minute per class to avoid Telegram spam.
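The per-class limit amounts to remembering the last send time for each class. The sketch below is minimal, and the class identifiers are hypothetical labels for the four alert classes listed above.

```typescript
// Hypothetical identifiers for the four alert classes.
type AlertClass = "hmac_burst" | "script_timeout" | "script_error" | "listener_restart";

// At most one alert per class per window (60s by default).
class AlertRateLimiter {
  private readonly windowMs: number;
  private readonly lastSent = new Map<AlertClass, number>();

  constructor(windowMs = 60_000) {
    this.windowMs = windowMs;
  }

  // True if an alert of this class may be sent at nowMs; records the send.
  tryAcquire(cls: AlertClass, nowMs: number): boolean {
    const last = this.lastSent.get(cls);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastSent.set(cls, nowMs);
    return true;
  }
}
```

When `tryAcquire` returns false, the caller skips the tg-stream post. Because classes are tracked independently, a flood of script errors cannot suppress an HMAC-burst alert.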
|
||||
|
||||
### 8. Gitea API access
|
||||
|
||||
Scripts use `curl` with the sol token. The listener never calls the Gitea API directly — that's a script concern, keeping the listener's surface area small. **Token scope blocker (PLAN.md):** sol token lacks `read:admin`. `audit-webhooks.sh` can list/manage per-repo webhooks (which is enough for Phase 2/3) but cannot manage system-level webhooks. Rooh needs to either elevate the token or accept that we use per-repo registration, not system-level. **Recommendation:** stick with per-repo registration — it's what openclaw already uses, avoids token elevation, and the audit script will catch any sol/* repo missing a hook within 6 hours.
|
||||
|
||||
## Data flows
|
||||
|
||||
### Flow 1 — Rooh creates a new repo
|
||||
|
||||
1. Rooh runs `gitea repo create sol/foo` (or clicks the UI).
|
||||
2. Gitea fires `repository` event with `action: "created"` to all matching webhooks. **In Phase 4:** both openclaw's endpoint and Caret's endpoint receive the event. **In Phase 5+:** only Caret's.
|
||||
3. Caret's nginx forwards POST to `127.0.0.1:18790/hooks/gitea` with raw body intact and bearer header injected.
|
||||
4. Listener reads raw body, computes HMAC-SHA256, compares with `X-Gitea-Signature` (timing-safe). Pass.
|
||||
5. Listener checks bearer header. Pass.
|
||||
6. Listener checks `X-Gitea-Delivery` against dedup cache. Miss → record.
|
||||
7. Listener parses JSON, looks up route: `repository`/`created` → `[post-repo-audit.sh, audit-repo-policies.sh --fix]`.
|
||||
8. Spawns `post-repo-audit.sh` with payload on stdin and env (`CARET_REPO=sol/foo`, `GITEA_API_TOKEN=...`). Script: adds Rooh as admin via Gitea API (idempotent — checks current state first), ensures the Caret webhook exists with the current HMAC secret. Exits 0. Duration ~800ms.
|
||||
9. Spawns `audit-repo-policies.sh --fix`. Script clones repo to a tmp dir, checks for required files, commits any missing ones with the sol bot user, pushes. Exits 0. Duration ~3-4s.
|
||||
10. Listener writes one log line: `{"event":"repository","action":"created","repo":"sol/foo","scripts":[{post-repo-audit:0,807ms},{audit-repo-policies:0,3421ms}],"outcome":"ok"}`.
|
||||
11. Listener returns 200 with run ID. Total time: ~4-5 seconds.
|
||||
12. Rooh sees the new repo with policies applied within ~5 seconds. No tg-stream alert (success path is silent by design — alerts are for failures only).
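The route lookup in step 7 can be sketched as a small function. The script lists come from Flows 1 to 3. Restricting push routing to the default branch follows the feature-parity list (T1.13/T1.14). The payload field names (`action`, `ref`, `repository.default_branch`) follow Gitea's webhook JSON but should be checked against live deliveries. `routeEvent` and the `"main"` fallback are assumptions.

```typescript
// Subset of the Gitea payload this sketch reads (assumed field names).
interface GiteaPayload {
  action?: string;
  ref?: string;
  repository?: { full_name?: string; default_branch?: string };
}

// Hypothetical route table: event type (from X-Gitea-Event) + payload -> scripts.
function routeEvent(event: string, payload: GiteaPayload): string[] {
  if (event === "repository" && payload.action === "created") {
    return ["post-repo-audit.sh", "audit-repo-policies.sh --fix"];
  }
  if (event === "push") {
    const branch = payload.repository?.default_branch ?? "main";
    // Pushes to other branches are logged but run no scripts.
    if (payload.ref !== `refs/heads/${branch}`) return [];
    return ["secret-scan.sh", "audit-repo-policies.sh --check"];
  }
  return []; // unrouted events are only logged
}
```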
|
||||
|
||||
### Flow 2 — Policy enforcement fails mid-run (Gitea API 500)
|
||||
|
||||
1. Push event arrives. Listener verifies HMAC, dedups, routes to `[secret-scan.sh, audit-repo-policies.sh --check]`.
|
||||
2. `secret-scan.sh` runs in parallel, exits 0.
|
||||
3. `audit-repo-policies.sh --check` calls Gitea API to fetch the policy template. Gitea returns 500.
|
||||
4. Script retries 3 times with backoff (1s, 2s, 4s). All fail.
|
||||
5. Script exits 1 with stderr = `"gitea api 500 fetching policy template after 3 retries"`.
|
||||
6. Listener sees exit 1, looks up the script's `escalate-on-error` flag. For `audit-repo-policies.sh` this is **false** (Gitea 500 is infra, not judgment). So: log line at `level:"error"`, fire tg-stream alert, return 200 to Gitea.
|
||||
7. Log line: `{"level":"error","event":"push","repo":"sol/foo","scripts":[{secret-scan:0},{audit-repo-policies:1,"stderr":"gitea api 500..."}],"outcome":"script_error","errPath":"audit-repo-policies.sh"}`.
|
||||
8. tg-stream sends Rooh: `[repo-enforcer] sol/foo policy check failed: gitea api 500 fetching policy template after 3 retries (runId r_abc123)`.
|
||||
9. Rooh sees the alert immediately. Greps log by runId for full context. Fixes Gitea or waits it out. The next push will retry; idempotency means no double-application.
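The retry policy in step 4 makes one attempt, then retries up to three times after 1s, 2s and 4s. The real scripts are bash, so this TypeScript sketch only illustrates the schedule. `withBackoff` is a hypothetical helper, and `sleep` is injectable so the schedule can be tested without waiting.

```typescript
// One initial attempt, then one retry after each delay; rethrows the last error.
async function withBackoff<T>(
  attempt: () => Promise<T>,
  delaysMs: readonly number[] = [1000, 2000, 4000],
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown = undefined;
  for (let i = 0; i <= delaysMs.length; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      const delay = delaysMs[i];
      if (delay !== undefined) await sleep(delay); // no sleep after the final failure
    }
  }
  throw lastError;
}
```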
|
||||
|
||||
### Flow 3 — Push to main triggers secret-scan, finds a leaked key
|
||||
|
||||
1. Push event arrives. Verify, dedup, route → `[secret-scan.sh, audit-repo-policies.sh --check]`.
|
||||
2. `secret-scan.sh` clones the diff range, scans for private keys (PEM headers, AWS-pattern keys, high-entropy base64 strings), checks `.secret-scan-allowlist`. **Finds a real, non-allowlisted private key.**
|
||||
3. Script exits **42** (judgment escalation) with stdout containing the finding details (path, line number, key fingerprint — never the actual key bytes).
|
||||
4. Listener sees exit 42, calls `enqueue-judgment.sh secret-scan` with the script output as input.
|
||||
5. `enqueue-judgment.sh` writes the case to `/host/root/.caret/judgment-inbox/r_xyz.json` and POSTs to the Channels plugin endpoint at `http://127.0.0.1:18791/channels/gitea-judgment/wake`.
|
||||
6. Channels plugin receives POST, starts a Claude session with prompt: "A secret-scan found this in sol/foo: <details>. Decide: revoke and rotate, or false positive needing allowlist update. Report via tg-stream."
|
||||
7. Listener writes log line at `level:"warn"`: `{"outcome":"escalated","escalation":"secret-scan","judgmentRunId":"r_xyz"}` and returns 200.
|
||||
8. tg-stream alerts Rooh **immediately**: `[repo-enforcer] WARNING — secret found in sol/foo by secret-scan, judgment session r_xyz started`.
|
||||
9. Claude session investigates, takes action (revoke key via Gitea API, post in the issue, ping Rooh on Telegram for confirmation). Reports completion via tg-stream.
|
||||
10. The deterministic path is the safety net: even if the judgment session fails to start, the alert in step 8 already woke Rooh.
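Flows 2 and 3, together with the failure-mode table, imply a small exit-code contract: 0 is success, 42 requests judgment, and 124 marks a timeout. Any other non-zero exit escalates only when the script's `escalate-on-error` flag is set. A minimal sketch follows. The function name is hypothetical, and treating a timeout as a timeout even for opted-in scripts is an assumption.

```typescript
type ScriptOutcome = "ok" | "escalated" | "timeout" | "script_error";

const EXIT_JUDGMENT = 42; // script asks for a judgment session
const EXIT_TIMEOUT = 124; // recorded when the listener kills a script after 30s

// Hypothetical classifier for one script run.
function classifyExit(exitCode: number, escalateOnError: boolean): ScriptOutcome {
  if (exitCode === 0) return "ok";
  if (exitCode === EXIT_JUDGMENT) return "escalated";
  if (exitCode === EXIT_TIMEOUT) return "timeout";
  // Any other non-zero exit escalates only if the script opted in.
  return escalateOnError ? "escalated" : "script_error";
}
```

An `"escalated"` outcome leads to `enqueue-judgment.sh`. A `"timeout"` or `"script_error"` outcome leads to an error-level log line and a tg-stream alert.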
|
||||
|
||||
## Storage
|
||||
|
||||
| Store | Path | Format | Purpose | Rotation/cleanup | Corruption mode |
|
||||
|---|---|---|---|---|---|
|
||||
| Structured log | `/host/root/.caret/log/repo-enforcer.log` | JSONL | Audit trail of every event | 50k lines, keep 10 gz'd | New file started; old file flagged via tg-stream |
|
||||
| Dedup cache | `/host/root/.caret/repo-enforcer/dedup.json` | JSON `{deliveryId:ts}` | Replay protection | Entries >24h pruned on read | File deleted and recreated empty (worst case: 24h replay window) |
|
||||
| Secrets | `/host/root/.caret/secrets/*` | raw text, mode 0600 | HMAC, API, bearer tokens | Manual rotation only | Listener refuses to start without all secrets present |
|
||||
| State counters | `/host/root/.caret/repo-enforcer/state.json` | JSON | Counters surviving restart | Truncated on rotation | Reset to zero; not load-bearing |
|
||||
| Judgment inbox | `/host/root/.caret/judgment-inbox/` | JSON files | Pending judgment cases | Removed by channels plugin after ack | Files re-tried on plugin restart |
|
||||
| Lock dir | `/host/root/.caret/repo-enforcer/locks/` | empty files | Per-repo lock for concurrent events | Locks >5min auto-released | Stale locks deleted on next event |
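The dedup cache row can be sketched as a map that prunes entries older than 24h on each lookup and round-trips through `dedup.json`. The sketch assumes the stored timestamps are epoch milliseconds. `DedupCache` and its methods are hypothetical.

```typescript
const DEDUP_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical in-memory view of dedup.json ({deliveryId: epochMs}).
class DedupCache {
  private readonly seen = new Map<string, number>();

  // A corrupt file yields an empty cache, matching the corruption mode above.
  constructor(serialized = "{}") {
    try {
      const parsed: unknown = JSON.parse(serialized);
      if (typeof parsed === "object" && parsed !== null) {
        for (const [id, ts] of Object.entries(parsed)) {
          if (typeof ts === "number") this.seen.set(id, ts);
        }
      }
    } catch {
      // leave the cache empty
    }
  }

  // True for a duplicate delivery (dedup_hit); otherwise records it.
  // Call only after HMAC passes, so forged requests never enter the cache.
  checkAndRecord(deliveryId: string, nowMs: number): boolean {
    this.prune(nowMs);
    if (this.seen.has(deliveryId)) return true;
    this.seen.set(deliveryId, nowMs);
    return false;
  }

  prune(nowMs: number): void {
    for (const [id, ts] of this.seen) {
      if (nowMs - ts > DEDUP_TTL_MS) this.seen.delete(id);
    }
  }

  serialize(): string {
    return JSON.stringify(Object.fromEntries(this.seen));
  }
}
```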
|
||||
|
||||
**Per-repo locks** are file-based, named `{owner}-{repo}.lock`. Acquired before script fan-out, released after. TTL 5 minutes (much shorter than openclaw's 2h because we're not holding for an LLM session, only for scripts). This prevents two near-simultaneous pushes to the same repo from racing in `audit-repo-policies.sh --fix` and creating conflicting commits.
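A file lock with a TTL can be sketched using exclusive create (`"wx"` fails with `EEXIST` when the file already exists) and the file's mtime as the acquisition time. The sketch assumes `owner` and `repo` contain no path separators. It also ignores the narrow race in which two processes reclaim the same stale lock at once. `acquireRepoLock` is a hypothetical name.

```typescript
import { closeSync, mkdirSync, openSync, statSync, unlinkSync } from "node:fs";
import { join } from "node:path";

const LOCK_TTL_MS = 5 * 60 * 1000;

// Try to take the per-repo lock. Returns a release function, or null if a
// fresh lock is held. A lock older than the TTL is deleted and retaken once.
function acquireRepoLock(lockDir: string, owner: string, repo: string, nowMs = Date.now()): (() => void) | null {
  mkdirSync(lockDir, { recursive: true });
  const path = join(lockDir, `${owner}-${repo}.lock`);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      closeSync(openSync(path, "wx")); // exclusive create: only one winner
      return () => unlinkSync(path);
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      if (nowMs - statSync(path).mtimeMs <= LOCK_TTL_MS) return null;
      unlinkSync(path); // stale lock: remove it and retry the create
    }
  }
  return null;
}
```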
|
||||
|
||||
## Observability
|
||||
|
||||
Already covered in Components §7. Summary of what proves health:
|
||||
- `/healthz` returns 200 and `ok:true` — listener up
|
||||
- Heartbeat log line every 60s — main loop alive
|
||||
- `lastEventTs` recent (quiet periods with no events are fine as long as the heartbeat is fresh)
|
||||
- `hmacFailures24h` low — no attack noise
|
||||
- `scriptFailures24h` low and Rooh has acknowledged any spikes
|
||||
- tg-stream silent (no news = good news on the deterministic path)
|
||||
|
||||
## Security contract
|
||||
|
||||
**HMAC verification:** every request must include `X-Gitea-Signature` equal to the hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook secret. Implementation reads raw bytes via Bun's `req.arrayBuffer()` BEFORE any JSON parsing. Comparison is `crypto.timingSafeEqual` over equal-length buffers; a length mismatch is rejected before the compare, because `timingSafeEqual` throws on unequal lengths. **Failure:** return 403, no body, log at `level:"warn"` with `hmac:"fail"` and the source IP. Three failures from the same IP within 60s trigger a tg-stream alert.
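The verification contract can be sketched as one function that runs on the raw bytes before any parsing. The function name is hypothetical. The algorithm (hex HMAC-SHA256 of the raw body, length check, then `timingSafeEqual`) is the one this section describes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify X-Gitea-Signature against the raw request bytes. Must run before JSON.parse.
function verifyGiteaSignature(rawBody: Uint8Array, header: string | null, secret: string): boolean {
  if (!header) return false; // missing signature -> 403
  const given = Buffer.from(header.trim(), "hex");
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  // timingSafeEqual throws if lengths differ, so reject mismatched lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

In a Bun handler this would be called as `verifyGiteaSignature(new Uint8Array(await req.arrayBuffer()), req.headers.get("x-gitea-signature"), secret)`, and a false result returns `new Response(null, { status: 403 })`.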
|
||||
|
||||
**Bearer token:** nginx injects `Authorization: Bearer ${CARET_BEARER_TOKEN}`. The listener checks it after HMAC. Failure → 401. This is a defense-in-depth layer; HMAC alone is sufficient cryptographically.
|
||||
|
||||
**ACL:** listener binds 127.0.0.1 only, so nginx on the same host is the only thing that can reach it; connections to port 18790 from other hosts are refused.
|
||||
|
||||
**Rotation:** see Components §6.
|
||||
|
||||
**Failed-HMAC POST:** returns 403, logs the incident, dedup cache is NOT updated (so a real retry of the same delivery still works), and an alert fires after burst threshold. The body is discarded — never written to disk in any form.
|
||||
|
||||
## Failure modes and recovery
|
||||
|
||||
| Failure | Detection | Recovery |
|
||||
|---|---|---|
|
||||
| Listener crash | Watchdog misses heartbeat for 5min | Supervisor wrapper auto-restarts; tg-stream alert |
|
||||
| Disk full | Log write fails | Alert via tg-stream alt path; listener stays up; `/healthz` shows `logWriteFailures>0` |
|
||||
| Gitea unreachable | Script exits with retry-exhausted error | tg-stream alert; next webhook event retries naturally |
|
||||
| Telegram unreachable (tg-stream down) | tg-stream POST fails | Listener logs the failed alert and continues; alert is dropped (no alert queue — Rooh greps the log) |
|
||||
| Policy script failure (deterministic) | Exit 1 from script | Log + tg-stream alert; no escalation |
|
||||
| Policy script timeout (>30s) | SIGTERM from listener | Recorded as exit 124 (the `timeout(1)` convention); logged as timeout; alert fires |
|
||||
| HMAC mismatch | Signature compare fails | 403 returned; alert on burst |
|
||||
| Replay attack | Same delivery ID seen twice | Dedup cache hit; second one logged as `dedup_hit` and skipped |
|
||||
| Duplicate event (Gitea retry) | Same as replay | Same — dedup cache covers both |
|
||||
| Concurrent events same repo | Lock contention | Second event waits up to 5s to acquire the lock; if the lock is still held after 5s, the event is queued (file in `locks/queue/`) and runs after release |
|
||||
| Channels plugin down | `enqueue-judgment.sh` POST fails | Judgment file stays in `judgment-inbox/`, channels plugin replays on restart; alert fires immediately so Rooh has the info either way |
|
||||
|
||||
## Parallel-run coexistence with openclaw (Phase 4)
|
||||
|
||||
During Phase 4, every sol/* repo has TWO webhooks: openclaw's existing one (no HMAC) and Caret's new one (with HMAC). Both fire on every event.
|
||||
|
||||
**Idempotency guarantees that prevent double work:**
|
||||
1. **`post-repo-audit.sh`** is idempotent. Adding a collaborator that already exists is a no-op. Ensuring a webhook that already exists is a no-op. So if openclaw added Rooh first, Caret's run is a silent no-op. And vice versa.
|
||||
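2. **`audit-repo-policies.sh --fix`** checks file existence before committing. If openclaw already committed the Makefile, Caret sees it and skips. If both pipelines race, they write identical file contents; their commits still get different SHAs, so the second push is rejected as non-fast-forward, and after rebasing onto the winner it has nothing left to change.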
|
||||
3. **`secret-scan.sh`** is read-only — scanning twice produces the same result.
|
||||
4. **The dedup cache is per-pipeline.** Caret's cache only protects Caret. We don't try to coordinate with openclaw's dedup.
|
||||
5. **Locks are per-pipeline.** Caret's lock dir is its own; openclaw can't see it. Race between pipelines is resolved by git-level conflict (one wins, the other rebases). In practice this won't happen because the policy fix is fast and idempotent.
|
||||
|
||||
**Reconciliation rules:** the Phase 4 acceptance criterion is "72 hours of dual-run with zero unreconciled divergences." A divergence is any case where Caret's log shows a different outcome than openclaw's. Reconciliation = Caret reads both pipelines' logs nightly, diffs them, alerts on disagreements.
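The nightly diff can be sketched as a join of two outcome maps. Each pipeline has its own webhook, so the two pipelines may not share delivery IDs. For that reason the join key here is a hypothetical event key that the caller derives, such as event type + repo + commit SHA. How openclaw's log entries map to outcome strings is also an assumption.

```typescript
interface Divergence {
  key: string;
  caret: string | undefined;
  openclaw: string | undefined;
}

// Hypothetical reconciliation: report every event key where the outcomes differ,
// including events that only one pipeline logged.
function reconcile(caret: Map<string, string>, openclaw: Map<string, string>): Divergence[] {
  const keys = new Set(Array.from(caret.keys()).concat(Array.from(openclaw.keys())));
  const out: Divergence[] = [];
  for (const key of Array.from(keys).sort()) {
    const a = caret.get(key);
    const b = openclaw.get(key);
    if (a !== b) out.push({ key, caret: a, openclaw: b });
  }
  return out;
}
```

An empty result for every night of the 72-hour window corresponds to "zero unreconciled divergences".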
|
||||
|
||||
**One thing the dual-run does NOT cover:** judgment escalations. In Phase 4, judgment escalations from Caret will fire even though openclaw might also be handling the case its own way. Acceptable — judgment cases are rare (errors and explicit opt-ins) and Rooh will see both notifications and can manually deconflict. After Phase 5 cut-over, only Caret's judgment fires.
|
||||
|
||||
## Rollback procedure
|
||||
|
||||
**Single command:** `bash /host/root/.caret/repo-enforcer/rollback.sh`
|
||||
|
||||
What it does:
|
||||
1. Stops the Caret listener (`tmux kill-session -t caret-repo-enforcer`)
|
||||
2. Removes Caret's webhooks from every sol/* repo (loops via Gitea API using sol token)
|
||||
3. Verifies openclaw's gateway container is up (`docker ps | grep openclaw-openclaw-gateway-1`)
|
||||
4. If openclaw container is down, starts it (`docker start openclaw-openclaw-gateway-1`)
|
||||
5. Sends tg-stream confirmation: `[rollback] Caret repo-enforcer stopped, openclaw restored, N webhooks removed`
|
||||
|
||||
**Time to rollback:** ~30 seconds (mostly the loop over sol/* repos to deregister webhooks).
|
||||
|
||||
**State lost:** none. Caret's logs stay on disk for forensics. The dedup cache is irrelevant after rollback. Openclaw's pipeline is unchanged because we never touched it. The only thing that's "gone" is the Caret webhooks on Gitea, and `audit-webhooks.sh` can re-add them if we resume.
|
||||
|
||||
**Tested before cut-over:** Phase 4 includes a rollback drill. We run rollback once on the test repos before widening to all of sol/*.
|
||||
|
||||
## Dependencies on openclaw
|
||||
|
||||
**Hard dependencies (the migration cannot remove these in Phase 1-5 scope):**
|
||||
1. **None for the deterministic path.** Every script in `/host/root/.caret/tools/` runs against Gitea directly with the sol token. No openclaw imports, no plugin SDK, no session storage.
|
||||
2. **None for the listener.** Bun + stdlib + nginx. Nothing openclaw.
|
||||
3. **None for the judgment path** if Channels plugin is the chosen mechanism. Channels is a Claude Code native primitive, not an openclaw thing.
|
||||
|
||||
**Soft dependencies (things openclaw still owns that we don't replicate):**
|
||||
- The 155 projects in `/root/.openclaw/workspace/projects/`. Untouched.
|
||||
- The workspace memory system. Untouched.
|
||||
- Sub-agent orchestration (Manager/Worker spawn). Untouched.
|
||||
- The delivery system to Mattermost/Slack/etc. We use tg-stream instead.
|
||||
- The 28-item heartbeat checklist on the main agent. Untouched.
|
||||
|
||||
**Dependencies needing Rooh's confirmation:**
|
||||
1. Gitea token scope. Sol token currently lacks `read:admin`. **Recommendation:** stay per-repo. **Alternative:** Rooh elevates the token, we get system-level webhooks. (Phase 2 blocker only if Rooh wants system-level.)
|
||||
2. nginx routing. We need a new path `/hooks/caret/gitea` on `slack.solio.tech` forwarding to `127.0.0.1:18790` with raw body intact. **Recommendation:** Rooh adds the nginx config in Phase 2. **Alternative:** Caret listener takes its own port directly via a separate hostname.
|
||||
3. Channels plugin discovery path. Need to confirm `/host/root/.caret/channels/` is where Claude Code looks for plugins, or whether we need a different path.
|
||||
|
||||
## Deliberate non-goals
|
||||
|
||||
This design does NOT:
|
||||
- Replace openclaw's gateway, plugin SDK, session storage, delivery system, tool policy resolution, RBAC, or control UI.
|
||||
- Replace openclaw's heartbeat scheduler or cron service. Our cron needs (`audit-webhooks.sh` periodic) are handled by a single 6h trigger registered separately.
|
||||
- Replicate openclaw's Manager/Worker/Spawner hierarchy.
|
||||
- Manage 155 projects' workspaces.
|
||||
- Provide a web UI. `/healthz` returns JSON; that's the UI.
|
||||
- Replace `gitea-worker`, `coder-agent`, `god-agent`, or any other named openclaw agent.
|
||||
- Implement RBAC scopes. The bearer token + HMAC are flat — anyone holding the secret can fire any event. This is acceptable because the secret is held by Gitea only.
|
||||
- Try to be model-agnostic, multi-tenant, or generally reusable. It's a Caret-owned, sol-namespace, single-purpose tool.
|
||||
|
||||
If Rooh wants any of these, that's a scope expansion that should bump the design back to draft.
|
||||
126
FEATURE-PARITY-TESTS.md
Normal file
@@ -0,0 +1,126 @@
|
||||
# Feature parity test list
|
||||
|
||||
Every test must pass on the Caret replacement before cut-over (Phase 5). Tests are ordered by criticality — tier 1 is "must pass", tier 2 is "should pass", tier 3 is "nice to have". Sources cited inline: R01 = RESEARCH-01-gitea-webhooks-deep-read.md, R02 = RESEARCH-02-gateway-internals.md, R03 = RESEARCH-03-live-state-audit.md.
|
||||
|
||||
## Tier 1 — mandatory
|
||||
|
||||
### Repo creation / bootstrap
|
||||
|
||||
- [ ] T1.01 new sol/* repo created → Rooh (user id 29) added as admin collaborator within 10s [R01 §Tools fan-out, post-repo-audit.sh]
|
||||
- [ ] T1.02 new sol/* repo created → Caret webhook registered on the repo within 10s, URL points to Caret listener, content-type application/json [R01 §Tools fan-out, R03 §Registered Gitea webhooks]
|
||||
- [ ] T1.03 new sol/* repo created → required policy files (Makefile, .editorconfig, .prettierrc, .prettierignore, .dockerignore, .gitignore, tools/secret-scan.sh) present on default branch within 30s [R01 audit-repo-policies.sh]
|
||||
- [ ] T1.04 repo creation handler is idempotent — re-firing the same `repository.create` event does NOT create duplicate collaborator entries or duplicate webhooks [PLAN §Risks idempotency]
|
||||
- [ ] T1.05 repo creation handler runs as a pure script with zero LLM tokens consumed (asserted by zero entries in any token-spend log for that delivery id) [R01 post-repo-audit.sh "zero tokens"]
|
||||
|
||||
### Authentication / ingress
|
||||
|
||||
- [ ] T1.06 HTTPS POST to listener with valid HMAC-SHA256 over the raw body using the configured webhook secret → 200, processed [R01 §HMAC recipe]
|
||||
- [ ] T1.07 POST with wrong HMAC signature → rejected 403 within 50ms, body parsing skipped, "hmac_failed" line in log [R01 §HMAC recipe]
|
||||
- [ ] T1.08 POST missing `X-Gitea-Signature` header → rejected 403 with "missing_signature" log line [R01 §HMAC recipe]
|
||||
- [ ] T1.09 POST with valid HMAC but non-Gitea content-type → rejected 415 [R03 webhook content-type]
|
||||
- [ ] T1.10 raw body must be available to verifier — listener does NOT JSON-parse before HMAC verification (asserted by sending malformed JSON with valid HMAC and observing 200 + "unparseable_body" log) [R01 §HMAC recipe gotcha]
|
||||
- [ ] T1.11 timing-safe HMAC compare: across 1000 requests with signatures that are off by one byte, response time does not depend on which byte is wrong (spread < 5ms) [R01 §HMAC recipe `timingSafeEqual`]
|
||||
- [ ] T1.12 listener bound only to localhost OR fronted by nginx ACL — direct connection from non-allowlisted host refused [R01 §Ingress path]
|
||||
|
||||
### Event routing
|
||||
|
||||
- [ ] T1.13 `push` event to non-default branch → recorded in log, no enforcement scripts fired [R01 §Transform phases push]
|
||||
- [ ] T1.14 `push` to main/master → triggers secret-scan and policy re-check [R01 §Transform phases]
|
||||
- [ ] T1.15 `issues.opened` with title prefix `[IMPLEMENT]` from sol with valid `<!-- xen-spawn-sig:HMAC:TIMESTAMP -->` → spawn signature verified, dispatch enqueued [R01 §Transform phases issues.opened]
|
||||
- [ ] T1.16 `issues.opened` `[IMPLEMENT]` from sol with stale (>2h) signature → rejected with "spawn_sig_expired" [R01 gotcha 3]
|
||||
- [ ] T1.17 `issues.opened` `[IMPLEMENT]` from sol with invalid HMAC → rejected with "spawn_sig_invalid", incident written to incidents.jsonl [R01 §Transform phases]
|
||||
- [ ] T1.18 `issue_comment` containing approval word from Rooh (id 29) → lock acquired, EXECUTE_PLAN dispatched [R01 §Transform phases issue_comment]
|
||||
- [ ] T1.19 `issue_comment` approval word from non-Rooh → ignored, log line "approval_ignored_non_owner" [R01 §Trust level]
|
||||
- [ ] T1.20 `issue_comment` approval word from sol account → ignored (sol is contributor only) [R01 gotcha 6]
|
||||
- [ ] T1.21 events from `clawbot` sender → silently dropped (loop prevention) [R01 §Validation gates]
|
||||
- [ ] T1.22 issue body containing `<!-- openclaw-agent -->` (or Caret equivalent) → silently dropped (echo suppression) [R01 §Validation gates]
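The sender and echo gates in T1.19 to T1.22 can be sketched as two small functions. The owner id (29), the `clawbot` login, the `<!-- openclaw-agent -->` marker and the `approval_ignored_non_owner` reason come from the list above. Caret's own echo marker is not named there, so it is left out. Deciding whether a comment contains an approval word is also left to the caller, because the words are not specified. The function names are hypothetical.

```typescript
const OWNER_ID = 29;
const LOOP_SENDERS = new Set(["clawbot"]);
const ECHO_MARKERS = ["<!-- openclaw-agent -->"]; // Caret equivalent not yet defined

type GateResult = "drop_loop" | "drop_echo" | "process";

// Silent drops for loop prevention (T1.21) and echo suppression (T1.22).
function gateEvent(senderLogin: string, body: string): GateResult {
  if (LOOP_SENDERS.has(senderLogin)) return "drop_loop";
  if (ECHO_MARKERS.some((marker) => body.includes(marker))) return "drop_echo";
  return "process";
}

// An approval comment counts only from the owner; sol and everyone else are ignored.
function approvalDecision(senderId: number): "approved" | "approval_ignored_non_owner" {
  return senderId === OWNER_ID ? "approved" : "approval_ignored_non_owner";
}
```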
|
||||
|
||||
### Idempotency / dedup
|
||||
|
||||
- [ ] T1.23 duplicate delivery (same `X-Gitea-Delivery` id within 24h) → returns 200 but skips processing, log line "dedup_hit" [R01 §Ingress dedup, R01 gotcha 4]
|
||||
- [ ] T1.24 dedup cache persisted to disk and survives listener restart [R01 §Ingress dedup]
|
||||
- [ ] T1.25 dedup cache trims entries older than 24h on each write [R01 §Ingress dedup]
|
||||
|
||||
### Locking / concurrency
|
||||
|
||||
- [ ] T1.26 concurrent `issue_comment` events on the same `(owner, repo, issue)` → only one acquires the lock; second sees "lock_held" and is queued or dropped per policy [R01 §Session lock]
|
||||
- [ ] T1.27 lock file TTL is 2h; expired lock is reclaimable [R01 §Session lock]
|
||||
- [ ] T1.28 issue closed while locked → transitions to IS_DONE with 5min grace before lock release [R01 §Transform phases, R01 gotcha 2]
|
||||
- [ ] T1.29 rate limit: 6th concurrent agent dispatch for the same agent id → rejected with "rate_limited" [R01 gotcha 7]
|
||||
|
||||
### Observability / audit
|
||||
|
||||
- [ ] T1.30 every accepted event produces exactly one line in `audit.jsonl` containing `{ts, delivery_id, event, repo, sender, decision}` [R01 §Logging]
|
||||
- [ ] T1.31 every rejected event also produces an audit line with `decision: rejected` and reason [R01 §Logging, PLAN §Risks silent drops]
|
||||
- [ ] T1.32 incidents (HMAC fail, spawn sig fail, script error) appended to `incidents.jsonl` [R01 §Logging]
|
||||
|
||||
### Rollback
|
||||
|
||||
- [ ] T1.33 disabling Caret listener and re-enabling openclaw gateway restores end-to-end pipeline within 60s, verified by canary repo create [PLAN §Phase 5 C5.3]
|
||||
- [ ] T1.34 Caret listener can be stopped with one command and pending in-flight requests drained or 503'd cleanly [PLAN §Phase 5]
|
||||
|
||||
## Tier 2 — should pass
|
||||
|
||||
### Push / scan / policy
|
||||
|
||||
- [ ] T2.01 push to main triggers `secret-scan.sh`; a planted private-key blob in the diff creates a Gitea issue labeled `security` within 30s [R01 secret-scan.sh]
|
||||
- [ ] T2.02 push to main with no findings → exit 0, audit line "scan_clean", no issue created [R01 secret-scan.sh]
|
||||
- [ ] T2.03 secret-scan respects `.secret-scan-allowlist` — allowlisted hash is not flagged [R01 secret-scan.sh]
|
||||
- [ ] T2.04 `audit-repo-policies.sh --fix` re-applies missing files on a repo where one was deleted, within next 6h heartbeat [R01 audit-repo-policies.sh]
|
||||
- [ ] T2.05 `audit-webhooks.sh --fix` recreates a deleted webhook within next 15min check [R01 audit-webhooks.sh]
|
||||
- [ ] T2.06 push that touches `tools/secret-scan.sh` itself → policy re-check still passes (file is part of policy template) [R01 audit-repo-policies.sh]
|
||||
|
||||
### Spawn pipeline
|
||||
|
||||
- [ ] T2.07 `[IMPLEMENT]` issue with valid sig → `spawn-manager.sh` invoked, project workspace `PROJ-XXX-*` created [R01 spawn-manager.sh]
|
||||
- [ ] T2.08 `precomputeSpawnParams` execSync timeout (30s) → falls back to text-directive without crashing the listener [R01 gotcha 9]
|
||||
- [ ] T2.09 `asyncDispatchToSpawner` failure (spawner down) → dispatch failure recorded in incidents.jsonl, NOT a 500 to Gitea [R01 gotcha 8]
|
||||
- [ ] T2.10 `check-implement-orphans.sh` equivalent runs every 15min and detects stale pending spawn files older than 2h [R01 check-implement-orphans.sh]
|
||||
|
||||
### Trust levels
|
||||
|
||||
- [ ] T2.11 sender id 29 → trust=owner; collaborator → trust=contributor; unknown → trust=readonly [R01 §Trust level detection]
|
||||
- [ ] T2.12 readonly sender events → no script fan-out, only audit log [R01 §Trust level detection]
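The trust rules in T2.11 and T2.12 reduce to one lookup. The sketch assumes the caller already holds the repo's collaborator ids, for example fetched from Gitea's collaborator API. Both function names are hypothetical.

```typescript
type Trust = "owner" | "contributor" | "readonly";

// T2.11: id 29 is the owner, known collaborators are contributors, everyone else is readonly.
function trustLevel(senderId: number, collaboratorIds: ReadonlySet<number>): Trust {
  if (senderId === 29) return "owner";
  if (collaboratorIds.has(senderId)) return "contributor";
  return "readonly";
}

// T2.12: readonly senders get an audit line only, never script fan-out.
function mayFanOut(trust: Trust): boolean {
  return trust !== "readonly";
}
```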
|
||||
|
||||
### Heartbeat / cron
|
||||
|
||||
- [ ] T2.13 6h policy sweep job runs on schedule and exits 0 [PLAN §Phase 4, R03 webhook-verify]
|
||||
- [ ] T2.14 15min webhook-audit job runs on schedule and exits 0 [R01 audit-webhooks.sh]
|
||||
- [ ] T2.15 cron job failure increments a consecutive-failure counter; ≥3 consecutive failures posts a Telegram alert [R03 §Critical finding cron failures]
|
||||
- [ ] T2.16 webhook-verify E2E canary (synthetic event roundtrip) succeeds every 6h [R03 webhook-verify]
|
||||
|
||||
### Gitea API hygiene
|
||||
|
||||
- [ ] T2.17 Gitea API call failure (5xx) → retried with exponential backoff up to 3 attempts before recording incident [PLAN §Risks]
|
||||
- [ ] T2.18 Gitea API rate-limit response (429) → backs off per Retry-After header, no incident on first occurrence [R02 §Replacement difficulty]
|
||||
- [ ] T2.19 token rotation: changing the Gitea token in config and SIGHUP'ing the listener takes effect without restart [PLAN §Risks HMAC secret management]
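T2.18 needs the wait time derived from `Retry-After`. HTTP allows that header to hold either delay-seconds or an HTTP-date. The sketch below handles both, and the fallback delay for a missing or unparseable header is an assumption. `retryAfterMs` is a hypothetical name.

```typescript
// Convert a Retry-After header into a wait in milliseconds.
// Accepts delay-seconds ("120") or an HTTP-date; past dates yield 0.
function retryAfterMs(header: string | null, nowMs: number, fallbackMs = 1000): number {
  if (!header) return fallbackMs;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const at = Date.parse(value);
  if (Number.isNaN(at)) return fallbackMs;
  return Math.max(0, at - nowMs);
}
```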
|
||||
|
||||
### Concurrency edges
|
||||
|
||||
- [ ] T2.20 two `repository.create` events for the same repo arriving within 100ms → exactly one bootstrap run, second deduped [T1.04 + dedup]
|
||||
- [ ] T2.21 listener under 50 req/s sustained for 60s → no dropped events, p99 latency < 500ms [R02 §Hook API]
|
||||
- [ ] T2.22 lock contention when policy is drop (not queue): no event waits >5min behind a lock that may be held for up to 2h [R01 §Session lock]
|
||||
|
||||
### Delivery / alerting
|
||||
|
||||
- [ ] T2.23 critical incident (HMAC failure storm: >10 in 1min) → Telegram alert posted via tg-stream within 30s [PLAN §Phase 3]
|
||||
- [ ] T2.24 alert delivery failure (Telegram down) → fallback to Mattermost, then to local incidents.jsonl [R02 §Delivery system]
|
||||
- [ ] T2.25 listener exposes a metrics counter for events_total, events_rejected_total, hmac_failures_total [R02 §Hook API observability]
|
||||
|
||||
## Tier 3 — nice to have
|
||||
|
||||
- [ ] T3.01 listener exposes `/health` returning `{"ok":true,"uptime_s":N,"last_event_ms":N}` [R02 §HTTP endpoints health]
|
||||
- [ ] T3.02 listener exposes `/ready` returning 503 until dedup cache loaded and Gitea token validated [R02 §HTTP endpoints]
|
||||
- [ ] T3.03 audit.jsonl rotated by line count (configurable, default 100k lines) — rotation happens without losing in-flight writes [PLAN B2.5]
|
||||
- [ ] T3.04 audit.jsonl old segments gzipped after rotation [PLAN B2.5]
|
||||
- [ ] T3.05 listener restart replays no events (dedup cache prevents) and reports start time in /health [R01 §Ingress dedup persisted]
|
||||
- [ ] T3.06 structured log lines parseable as single-line JSON, no multi-line stack traces [PLAN B2.5]
|
||||
- [ ] T3.07 listener handles SIGTERM gracefully — finishes in-flight, refuses new, exits within 10s [PLAN §Phase 5]
|
||||
- [ ] T3.08 dry-run mode (`CARET_DRYRUN=1`) logs the script that *would* run without executing [R02 §Tools invoke dryRun]
|
||||
- [ ] T3.09 admin API endpoint to list registered Gitea webhooks across all sol/* repos in one call (requires admin token) [R03 §Registered Gitea webhooks token scope]
|
||||
- [ ] T3.10 manual replay endpoint: POST `/replay/{delivery_id}` re-processes a stored event bypassing dedup [PLAN §Risks idempotency]
|
||||
- [ ] T3.11 listener config hot-reload on SIGHUP without restart [R02 §Hot-reload config]
|
||||
- [ ] T3.12 PR (`pull_request`) events trigger same policy checks as push to main [R01 §Transform phases]
|
||||
- [ ] T3.13 webhook secret rotation runbook: rotate, deploy, verify, rollback documented and tested [PLAN §Risks HMAC secret management]
|
||||
- [ ] T3.14 listener supports both `Authorization: Bearer` AND `X-Gitea-Signature` simultaneously for migration window [R01 §Ingress path layered auth]
|
||||
- [ ] T3.15 cost report: weekly summary of LLM tokens spent by judgment-path wakeups vs deterministic-path zero-cost runs [PLAN §Phase 3 cost hygiene]
|
||||
39
Makefile
Normal file
@@ -0,0 +1,39 @@
|
||||
BUMP ?= patch
|
||||
|
||||
.PHONY: check test lint fmt fmt-check secret-scan hooks release
|
||||
|
||||
check: lint fmt-check secret-scan test
|
||||
|
||||
test:
|
||||
@echo "No tests configured for docs repo"
|
||||
|
||||
lint:
|
||||
@echo "No linter configured for docs repo"
|
||||
|
||||
fmt:
|
||||
@echo "No formatter configured for docs repo"
|
||||
|
||||
fmt-check:
|
||||
@echo "No format check configured for docs repo"
|
||||
|
||||
secret-scan:
|
||||
bash tools/secret-scan.sh .
|
||||
|
||||
hooks:
|
||||
mkdir -p .git/hooks && printf '#!/bin/sh\nmake check' > .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
|
||||
|
||||
release:
|
||||
@current=$$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0"); \
|
||||
major=$$(echo $$current | sed 's/^v//' | cut -d. -f1); \
|
||||
minor=$$(echo $$current | sed 's/^v//' | cut -d. -f2); \
|
||||
patch=$$(echo $$current | sed 's/^v//' | cut -d. -f3); \
|
||||
case "$(BUMP)" in \
|
||||
major) major=$$((major+1)); minor=0; patch=0 ;; \
|
||||
minor) minor=$$((minor+1)); patch=0 ;; \
|
||||
patch) patch=$$((patch+1)) ;; \
|
||||
*) echo "BUMP must be patch, minor, or major"; exit 1 ;; \
|
||||
esac; \
|
||||
next="v$$major.$$minor.$$patch"; \
|
||||
echo "Tagging $$next (was $$current)"; \
|
||||
git tag -a "$$next" -m "Release $$next"; \
|
||||
git push origin "$$next"
|
||||
69
tools/secret-scan.sh
Executable file
@@ -0,0 +1,69 @@
|
||||
#!/usr/bin/env bash
|
||||
# secret-scan.sh — Scans for private keys and high-entropy secrets
|
||||
# Usage: bash tools/secret-scan.sh [directory]
|
||||
# Uses .secret-scan-allowlist for false positives (one file path per line)
|
||||
|
||||
set -e
|
||||
|
||||
SCAN_DIR="${1:-.}"
|
||||
ALLOWLIST=".secret-scan-allowlist"
|
||||
FINDINGS=0
|
||||
|
||||
# Build find exclusions
|
||||
EXCLUDES=(-not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/coverage/*" -not -path "*/dist/*")
|
||||
|
||||
# Load allowlist
|
||||
ALLOWLIST_PATHS=()
|
||||
if [ -f "$ALLOWLIST" ]; then
|
||||
while IFS= read -r line || [ -n "$line" ]; do
|
||||
[[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
|
||||
ALLOWLIST_PATHS+=("$line")
|
||||
done < "$ALLOWLIST"
|
||||
fi
|
||||
|
||||
is_allowed() {
|
||||
local file="$1"
|
||||
for allowed in "${ALLOWLIST_PATHS[@]}"; do
|
||||
if [[ "$file" == *"$allowed"* ]]; then
|
||||
return 0
|
||||
fi
|
||||
done
|
||||
return 1
|
||||
}
|
||||
|
||||
echo "Scanning $SCAN_DIR for secrets..."
|
||||
|
||||
# Scan for private keys
|
||||
while IFS= read -r file; do
|
||||
[ -f "$file" ] || continue
|
||||
is_allowed "$file" && continue
|
||||
if grep -qE -e '-----BEGIN (RSA |EC |OPENSSH |DSA )?PRIVATE KEY-----' "$file" 2>/dev/null; then # -e: the pattern starts with dashes, which grep would otherwise parse as an option
|
||||
echo "FINDING [private-key]: $file"
|
||||
FINDINGS=$((FINDINGS + 1))
|
||||
fi
|
||||
done < <(find "$SCAN_DIR" "${EXCLUDES[@]}" -type f)
|
||||
|
||||
# Scan for high-entropy hex strings (40+ chars)
|
||||
while IFS= read -r file; do
|
||||
[ -f "$file" ] || continue
|
||||
is_allowed "$file" && continue
|
||||
if grep -qE '[0-9a-f]{40,}' "$file" 2>/dev/null; then
|
||||
# Filter out common false positives (git SHAs in lock files, etc.)
|
||||
BASENAME=$(basename "$file")
|
||||
if [[ "$BASENAME" != "package-lock.json" && "$BASENAME" != *.lock ]]; then # unquoted *.lock so [[ ]] treats it as a glob
|
||||
MATCHES=$(grep -oE '[0-9a-f]{40,}' "$file" 2>/dev/null || true)
|
||||
if [ -n "$MATCHES" ]; then
|
||||
echo "FINDING [high-entropy-hex]: $file"
|
||||
FINDINGS=$((FINDINGS + 1))
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
done < <(find "$SCAN_DIR" "${EXCLUDES[@]}" -type f -not -name "package-lock.json" -not -name "*.lock")
|
||||
|
||||
if [ "$FINDINGS" -gt 0 ]; then
|
||||
echo "secret-scan: $FINDINGS finding(s) — FAIL"
|
||||
exit 1
|
||||
else
|
||||
echo "secret-scan: clean — PASS"
|
||||
exit 0
|
||||
fi
|
||||