Feature parity test list

Tier 1 tests must pass on the Caret replacement before cut-over (Phase 5). Tests are ordered by criticality: tier 1 is "must pass", tier 2 is "should pass", tier 3 is "nice to have". Sources cited inline: R01 = RESEARCH-01-gitea-webhooks-deep-read.md, R02 = RESEARCH-02-gateway-internals.md, R03 = RESEARCH-03-live-state-audit.md.

Tier 1 — mandatory

Repo creation / bootstrap

  • T1.01 new sol/* repo created → Rooh (user id 29) added as admin collaborator within 10s [R01 §Tools fan-out, post-repo-audit.sh]
  • T1.02 new sol/* repo created → Caret webhook registered on the repo within 10s, URL points to Caret listener, content-type application/json [R01 §Tools fan-out, R03 §Registered Gitea webhooks]
  • T1.03 new sol/* repo created → required policy files (Makefile, .editorconfig, .prettierrc, .prettierignore, .dockerignore, .gitignore, tools/secret-scan.sh) present on default branch within 30s [R01 audit-repo-policies.sh]
  • T1.04 repo creation handler is idempotent — re-firing the same repository.create event does NOT create duplicate collaborator entries or duplicate webhooks [PLAN §Risks idempotency]
  • T1.05 repo creation handler runs as a pure script with zero LLM tokens consumed (asserted by zero entries in any token-spend log for that delivery id) [R01 post-repo-audit.sh "zero tokens"]

Authentication / ingress

  • T1.06 HTTPS POST to listener with valid HMAC-SHA256 over the raw body using the configured webhook secret → 200, processed [R01 §HMAC recipe]
  • T1.07 POST with wrong HMAC signature → rejected 403 within 50ms, body parsing skipped, "hmac_failed" line in log [R01 §HMAC recipe]
  • T1.08 POST missing X-Gitea-Signature header → rejected 403 with "missing_signature" log line [R01 §HMAC recipe]
  • T1.09 POST with valid HMAC but non-Gitea content-type → rejected 415 [R03 webhook content-type]
  • T1.10 raw body must be available to verifier — listener does NOT JSON-parse before HMAC verification (asserted by sending malformed JSON with valid HMAC and observing 200 + "unparseable_body" log) [R01 §HMAC recipe gotcha]
  • T1.11 timing-safe HMAC compare — sending one-byte-off signatures over 1000 requests shows constant-time response (variance < 5ms) [R01 §HMAC recipe timingSafeEqual]
  • T1.12 listener bound only to localhost OR fronted by nginx ACL — direct connection from non-allowlisted host refused [R01 §Ingress path]
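
T1.06 through T1.11 together imply a verify-then-parse order. The following is a minimal Python sketch of that order, assuming the X-Gitea-Signature header carries the hex-encoded HMAC-SHA256 of the raw body. The function name handle_delivery and its (status, reason) return shape are hypothetical and are not taken from R01.

```python
import hashlib
import hmac
import json
from typing import Optional, Tuple


def handle_delivery(raw_body: bytes, signature: Optional[str], secret: bytes) -> Tuple[int, str]:
    """Return (http_status, log_reason). Hypothetical helper, not the Caret API."""
    if not signature:
        return 403, "missing_signature"
    # Sign the bytes exactly as received; re-serialised JSON would not match.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest does not short-circuit on the first differing byte (T1.11).
    if not hmac.compare_digest(expected.encode(), signature.strip().lower().encode()):
        return 403, "hmac_failed"
    # The body is parsed only after the signature check, so malformed JSON
    # with a valid HMAC is still acknowledged and logged (T1.10).
    try:
        json.loads(raw_body)
    except ValueError:
        return 200, "unparseable_body"
    return 200, "processed"
```

The content-type check (T1.09) and the network ACL (T1.12) are left out; they would sit in front of this function in whatever HTTP layer the listener uses.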

Event routing

  • T1.13 push event to non-default branch → recorded in log, no enforcement scripts fired [R01 §Transform phases push]
  • T1.14 push to main/master → triggers secret-scan and policy re-check [R01 §Transform phases]
  • T1.15 issues.opened with title prefix [IMPLEMENT] from sol with valid <!-- xen-spawn-sig:HMAC:TIMESTAMP --> → spawn signature verified, dispatch enqueued [R01 §Transform phases issues.opened]
  • T1.16 issues.opened [IMPLEMENT] from sol with stale (>2h) signature → rejected with "spawn_sig_expired" [R01 gotcha 3]
  • T1.17 issues.opened [IMPLEMENT] from sol with invalid HMAC → rejected with "spawn_sig_invalid", incident written to incidents.jsonl [R01 §Transform phases]
  • T1.18 issue_comment containing approval word from Rooh (id 29) → lock acquired, EXECUTE_PLAN dispatched [R01 §Transform phases issue_comment]
  • T1.19 issue_comment approval word from non-Rooh → ignored, log line "approval_ignored_non_owner" [R01 §Trust level]
  • T1.20 issue_comment approval word from sol account → ignored (sol is contributor only) [R01 gotcha 6]
  • T1.21 events from clawbot sender → silently dropped (loop prevention) [R01 §Validation gates]
  • T1.22 issue body containing <!-- openclaw-agent --> (or Caret equivalent) → silently dropped (echo suppression) [R01 §Validation gates]
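
The spawn-signature cases (T1.15 to T1.17) can be sketched as below. This is an illustration under loud assumptions: the source gives only the marker format, so the signed message ("title:timestamp"), the hex encoding, and the Unix-seconds timestamp are all guesses, and check_spawn_sig is a hypothetical name. Only the 2h staleness limit and the two rejection reasons come from the tests above.

```python
import hashlib
import hmac
import re
import time
from typing import Optional

SPAWN_SIG_RE = re.compile(r"<!--\s*xen-spawn-sig:([0-9a-fA-F]+):(\d+)\s*-->")
MAX_AGE_S = 2 * 60 * 60  # signatures older than 2h are stale (T1.16)


def check_spawn_sig(title: str, body: str, secret: bytes, now: Optional[float] = None) -> str:
    """Return "ok", "spawn_sig_invalid" or "spawn_sig_expired"."""
    now = time.time() if now is None else now
    match = SPAWN_SIG_RE.search(body)
    if match is None:
        return "spawn_sig_invalid"
    sig, ts = match.group(1).lower(), int(match.group(2))
    # ASSUMPTION: the HMAC covers "<title>:<timestamp>"; the real recipe is in R01.
    expected = hmac.new(secret, f"{title}:{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return "spawn_sig_invalid"
    # Age is checked after the HMAC so an unauthenticated timestamp is never trusted.
    if now - ts > MAX_AGE_S:
        return "spawn_sig_expired"
    return "ok"
```

Verifying the HMAC before the age means a forged marker with an old timestamp is reported as invalid rather than expired; whether the existing gateway orders the checks the same way is not stated in this list.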

Idempotency / dedup

  • T1.23 duplicate delivery (same X-Gitea-Delivery id within 24h) → returns 200 but skips processing, log line "dedup_hit" [R01 §Ingress dedup, R01 gotcha 4]
  • T1.24 dedup cache persisted to disk and survives listener restart [R01 §Ingress dedup]
  • T1.25 dedup cache trims entries older than 24h on each write [R01 §Ingress dedup]
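
One way to satisfy T1.23 to T1.25 is a small on-disk map from delivery id to first-seen time. The sketch below assumes a single JSON file and a single listener process; the class name DedupCache and the file layout are hypothetical.

```python
import json
import os
import time
from typing import Dict, Optional

TTL_S = 24 * 60 * 60  # deliveries are remembered for 24h


class DedupCache:
    def __init__(self, path: str) -> None:
        self.path = path
        self.seen: Dict[str, float] = {}
        if os.path.exists(path):
            # Reload prior state so a restart does not replay events (T1.24).
            with open(path, encoding="utf-8") as fh:
                self.seen = json.load(fh)

    def check_and_record(self, delivery_id: str, now: Optional[float] = None) -> bool:
        """Return True on a duplicate ("dedup_hit"), False on first sight."""
        now = time.time() if now is None else now
        # Drop entries older than the TTL before deciding (T1.25).
        self.seen = {k: ts for k, ts in self.seen.items() if now - ts <= TTL_S}
        if delivery_id in self.seen:
            return True
        self.seen[delivery_id] = now
        self._save()
        return False

    def _save(self) -> None:
        # Write to a temp file and rename so a crash never leaves a torn file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.seen, fh)
        os.replace(tmp, self.path)
```

A caller would return 200 and skip processing whenever check_and_record returns True.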

Locking / concurrency

  • T1.26 concurrent issue_comment events on the same (owner, repo, issue) → only one acquires the lock; second sees "lock_held" and is queued or dropped per policy [R01 §Session lock]
  • T1.27 lock file TTL is 2h; expired lock is reclaimable [R01 §Session lock]
  • T1.28 issue closed while locked → transitions to IS_DONE with 5min grace before lock release [R01 §Transform phases, R01 gotcha 2]
  • T1.29 rate limit: 6th concurrent agent dispatch for the same agent id → rejected with "rate_limited" [R01 gotcha 7]
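
The lock behaviour in T1.26 and T1.27 can be sketched with an exclusively created file whose modification time serves as the acquisition time. The path layout and the function names are hypothetical; only the (owner, repo, issue) key and the 2h TTL come from the tests above.

```python
import os
import time
from typing import Optional

LOCK_TTL_S = 2 * 60 * 60  # a lock older than 2h is reclaimable


def lock_path(lock_dir: str, owner: str, repo: str, issue: int) -> str:
    return os.path.join(lock_dir, f"{owner}__{repo}__{issue}.lock")


def try_acquire(path: str, now: Optional[float] = None) -> bool:
    """Return True if the lock was taken, False if it is held ("lock_held")."""
    now = time.time() if now is None else now
    for _ in range(2):  # second pass runs only after reclaiming an expired lock
        try:
            # O_EXCL makes creation atomic: exactly one caller succeeds.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = now - os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # holder released in between; try again
            if age <= LOCK_TTL_S:
                return False
            try:
                os.remove(path)  # expired: reclaim
            except FileNotFoundError:
                pass
            continue
        os.close(fd)
        return True
    return False
```

Reclaiming is not fully race-free in this sketch: two processes that both see an expired lock can each remove the file, so a production version would need a stricter reclaim step.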

Observability / audit

  • T1.30 every accepted event produces exactly one line in audit.jsonl containing {ts, delivery_id, event, repo, sender, decision} [R01 §Logging]
  • T1.31 every rejected event also produces an audit line with decision: rejected and reason [R01 §Logging, PLAN §Risks silent drops]
  • T1.32 incidents (HMAC fail, spawn sig fail, script error) appended to incidents.jsonl [R01 §Logging]
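
A sketch of the audit record required by T1.30 and T1.31 follows. The field names come from T1.30; the ISO-8601 UTC timestamp format, the "unspecified" fallback reason, and the function names are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional

AUDIT_FIELDS = ("ts", "delivery_id", "event", "repo", "sender", "decision")


def audit_line(delivery_id: str, event: str, repo: str, sender: str,
               decision: str, reason: Optional[str] = None) -> str:
    """Build one single-line JSON audit record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "delivery_id": delivery_id,
        "event": event,
        "repo": repo,
        "sender": sender,
        "decision": decision,
    }
    if decision == "rejected":
        # Rejections always carry a reason so drops are never silent (T1.31).
        record["reason"] = reason or "unspecified"
    # json.dumps escapes newlines, so the record stays on one line.
    return json.dumps(record, separators=(",", ":"))


def append_audit(path: str, line: str) -> None:
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Calling append_audit exactly once per delivery, on both the accept and the reject path, is what makes the "exactly one line" assertion in T1.30 checkable by counting lines per delivery_id.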

Rollback

  • T1.33 disabling Caret listener and re-enabling openclaw gateway restores end-to-end pipeline within 60s, verified by canary repo create [PLAN §Phase 5 C5.3]
  • T1.34 Caret listener can be stopped with one command and pending in-flight requests drained or 503'd cleanly [PLAN §Phase 5]

Tier 2 — should pass

Push / scan / policy

  • T2.01 push to main triggers secret-scan.sh; a planted private-key blob in the diff creates a Gitea issue labeled security within 30s [R01 secret-scan.sh]
  • T2.02 push to main with no findings → exit 0, audit line "scan_clean", no issue created [R01 secret-scan.sh]
  • T2.03 secret-scan respects .secret-scan-allowlist — allowlisted hash is not flagged [R01 secret-scan.sh]
  • T2.04 audit-repo-policies.sh --fix re-applies missing files on a repo where one was deleted, within next 6h heartbeat [R01 audit-repo-policies.sh]
  • T2.05 audit-webhooks.sh --fix recreates a deleted webhook within next 15min check [R01 audit-webhooks.sh]
  • T2.06 push that touches tools/secret-scan.sh itself → policy re-check still passes (file is part of policy template) [R01 audit-repo-policies.sh]

Spawn pipeline

  • T2.07 [IMPLEMENT] issue with valid sig → spawn-manager.sh invoked, project workspace PROJ-XXX-* created [R01 spawn-manager.sh]
  • T2.08 precomputeSpawnParams execSync timeout (30s) → falls back to text-directive without crashing the listener [R01 gotcha 9]
  • T2.09 asyncDispatchToSpawner failure (spawner down) → dispatch failure recorded in incidents.jsonl, NOT a 500 to Gitea [R01 gotcha 8]
  • T2.10 check-implement-orphans.sh equivalent runs every 15min and detects stale pending spawn files older than 2h [R01 check-implement-orphans.sh]

Trust levels

  • T2.11 sender id 29 → trust=owner; collaborator → trust=contributor; unknown → trust=readonly [R01 §Trust level detection]
  • T2.12 readonly sender events → no script fan-out, only audit log [R01 §Trust level detection]
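
The mapping in T2.11 and T2.12 is small enough to state as code. In this sketch the collaborator set is passed in by the caller; how it is obtained from Gitea is out of scope, and both function names are hypothetical.

```python
from typing import AbstractSet

OWNER_ID = 29  # Rooh


def trust_level(sender_id: int, collaborator_ids: AbstractSet[int]) -> str:
    """Map a sender to owner / contributor / readonly (T2.11)."""
    if sender_id == OWNER_ID:
        return "owner"
    if sender_id in collaborator_ids:
        return "contributor"
    return "readonly"


def should_fan_out(trust: str) -> bool:
    """Readonly senders are audited only; no scripts run for them (T2.12)."""
    return trust != "readonly"
```

For example, trust_level(29, set()) yields "owner" even when the collaborator set is empty, because the owner check runs first.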

Heartbeat / cron

  • T2.13 6h policy sweep job runs on schedule and exits 0 [PLAN §Phase 4, R03 webhook-verify]
  • T2.14 15min webhook-audit job runs on schedule and exits 0 [R01 audit-webhooks.sh]
  • T2.15 cron job failure increments a consecutive-failure counter; ≥3 consecutive failures posts a Telegram alert [R03 §Critical finding cron failures]
  • T2.16 webhook-verify E2E canary (synthetic event roundtrip) succeeds every 6h [R03 webhook-verify]
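
The consecutive-failure rule in T2.15 can be sketched as a per-job streak counter. The class name is hypothetical, and the choice to signal an alert on every failing run at or beyond the threshold (rather than only on the third) is an assumption; the test text does not say which is intended.

```python
from collections import defaultdict
from typing import Dict

ALERT_THRESHOLD = 3


class FailureTracker:
    def __init__(self, threshold: int = ALERT_THRESHOLD) -> None:
        self.threshold = threshold
        self.streak: Dict[str, int] = defaultdict(int)

    def record(self, job: str, exit_code: int) -> bool:
        """Record one run; return True when an alert should be posted."""
        if exit_code == 0:
            self.streak[job] = 0  # any success resets the streak
            return False
        self.streak[job] += 1
        return self.streak[job] >= self.threshold
```

Streaks are tracked per job name, so a failing policy sweep does not push the webhook-audit job toward an alert.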

Gitea API hygiene

  • T2.17 Gitea API call failure (5xx) → retried with exponential backoff up to 3 attempts before recording incident [PLAN §Risks]
  • T2.18 Gitea API rate-limit response (429) → backs off per Retry-After header, no incident on first occurrence [R02 §Replacement difficulty]
  • T2.19 token rotation: changing the Gitea token in config and SIGHUP'ing the listener takes effect without restart [PLAN §Risks HMAC secret management]
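
T2.17 and T2.18 describe two backoff rules that can share one retry loop. The sketch below assumes the caller supplies a request callable returning (status, headers, body) and that Retry-After is given in seconds; the function names, the 1s base delay, and the incident shape are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], Any]


def _retry_after_s(headers: Mapping[str, str], default: float) -> float:
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default  # e.g. an HTTP-date value; fall back to the base delay


def call_with_retry(request: Callable[[], Response], max_attempts: int = 3,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep
                    ) -> Tuple[int, Any, Optional[Dict[str, Any]]]:
    """Return (status, body, incident); incident is None unless retries ran out."""
    status, body = 0, None
    for attempt in range(1, max_attempts + 1):
        status, headers, body = request()
        if status == 429:
            delay = _retry_after_s(headers, base_delay_s)  # T2.18
        elif 500 <= status < 600:
            delay = base_delay_s * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... (T2.17)
        else:
            return status, body, None
        if attempt < max_attempts:
            sleep(delay)
    incident = {"kind": "gitea_api_failure", "status": status, "attempts": max_attempts}
    return status, body, incident
```

Passing sleep as a parameter keeps the function testable without real delays; the caller is expected to append any returned incident to incidents.jsonl.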

Concurrency edges

  • T2.20 two repository.create events for the same repo arriving within 100ms → exactly one bootstrap run, second deduped [T1.04 + dedup]
  • T2.21 listener under 50 req/s sustained for 60s → no dropped events, p99 latency < 500ms [R02 §Hook API]
  • T2.22 lock acquisition under contention has bounded wait: when policy is drop (not queue), no event waits more than 5min behind a single 2h lock [R01 §Session lock]

Delivery / alerting

  • T2.23 critical incident (HMAC failure storm: >10 in 1min) → Telegram alert posted via tg-stream within 30s [PLAN §Phase 3]
  • T2.24 alert delivery failure (Telegram down) → fallback to Mattermost, then to local incidents.jsonl [R02 §Delivery system]
  • T2.25 listener exposes a metrics counter for events_total, events_rejected_total, hmac_failures_total [R02 §Hook API observability]
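
The storm threshold in T2.23 and the fallback order in T2.24 can be sketched as a sliding-window counter plus an ordered channel list. Both names below are hypothetical, and the send callables stand in for whatever tg-stream and Mattermost clients the listener actually uses.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Tuple


class StormDetector:
    """Flags a storm when more than `limit` events fall inside `window_s`."""

    def __init__(self, limit: int = 10, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self.times: Deque[float] = deque()

    def record(self, now: float) -> bool:
        self.times.append(now)
        # Discard events that have slid out of the window.
        while now - self.times[0] >= self.window_s:
            self.times.popleft()
        return len(self.times) > self.limit


def deliver(message: str,
            channels: Iterable[Tuple[str, Callable[[str], None]]]) -> Optional[str]:
    """Try channels in order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception:
            continue  # fall through to the next channel
    return None
```

Listing the local incidents.jsonl writer last gives the alert a final destination that does not depend on the network.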

Tier 3 — nice to have

  • T3.01 listener exposes /health returning {"ok":true,"uptime_s":N,"last_event_ms":N} [R02 §HTTP endpoints health]
  • T3.02 listener exposes /ready returning 503 until dedup cache loaded and Gitea token validated [R02 §HTTP endpoints]
  • T3.03 audit.jsonl rotated by line count (configurable, default 100k lines) — rotation happens without losing in-flight writes [PLAN B2.5]
  • T3.04 audit.jsonl old segments gzipped after rotation [PLAN B2.5]
  • T3.05 listener restart replays no events (dedup cache prevents) and reports start time in /health [R01 §Ingress dedup persisted]
  • T3.06 structured log lines parseable as single-line JSON, no multi-line stack traces [PLAN B2.5]
  • T3.07 listener handles SIGTERM gracefully — finishes in-flight, refuses new, exits within 10s [PLAN §Phase 5]
  • T3.08 dry-run mode (CARET_DRYRUN=1) logs the script that would run without executing [R02 §Tools invoke dryRun]
  • T3.09 admin API endpoint to list registered Gitea webhooks across all sol/* repos in one call (requires admin token) [R03 §Registered Gitea webhooks token scope]
  • T3.10 manual replay endpoint: POST /replay/{delivery_id} re-processes a stored event bypassing dedup [PLAN §Risks idempotency]
  • T3.11 listener config hot-reload on SIGHUP without restart [R02 §Hot-reload config]
  • T3.12 PR (pull_request) events trigger same policy checks as push to main [R01 §Transform phases]
  • T3.13 webhook secret rotation runbook: rotate, deploy, verify, rollback documented and tested [PLAN §Risks HMAC secret management]
  • T3.14 listener supports both Authorization: Bearer AND X-Gitea-Signature simultaneously for migration window [R01 §Ingress path layered auth]
  • T3.15 cost report: weekly summary of LLM tokens spent by judgment-path wakeups vs deterministic-path zero-cost runs [PLAN §Phase 3 cost hygiene]
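
As an illustration of T3.08, a dry-run guard can wrap the point where scripts are launched. Only the CARET_DRYRUN=1 switch comes from the test; the run_script name and the logged record shape are assumptions.

```python
import json
import os
import subprocess
from typing import Callable, List, Mapping, Optional


def run_script(argv: List[str], env: Optional[Mapping[str, str]] = None,
               log: Callable[[str], None] = print) -> Optional[int]:
    """Run argv and return its exit code, or log and return None in dry-run mode."""
    env = os.environ if env is None else env
    if env.get("CARET_DRYRUN") == "1":
        # Record what would have run, then stop before any side effect.
        log(json.dumps({"decision": "dry_run", "argv": argv}))
        return None
    return subprocess.run(argv, check=False).returncode
```

Routing every script launch through one such function keeps the dry-run check in a single place, which makes the test a matter of asserting that no child process was started.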