# ARCHITECTURE.md: current openclaw Gitea slice and migration boundary

**Repo:** `sol/openclaw-to-caret-migration`

**Date:** 2026-04-06

**Source reports:**

- `research/RESEARCH-01-gitea-webhooks-deep-read.md`
- `research/RESEARCH-02-gateway-internals.md`
- `research/RESEARCH-03-live-state-audit.md`

## Executive summary

The migration target is much smaller than a full OpenClaw replacement. OpenClaw today owns a large orchestration platform: gateway auth, session storage, plugin loading, subagent spawning, cron, heartbeat, tool policy enforcement, multi-channel delivery, and the long-lived workspace for 155+ projects. Replacing all of that would be a 4-8 week systems project.

But the **Gitea-facing slice** that this migration actually needs is narrower:

1. **Webhook ingress**
2. **Event validation / routing**
3. **Deterministic script fan-out**
4. **Issue workflow gates / lock logic**
5. **Optional judgment wake-up when automation is not enough**

That slice can be rebuilt as a small standalone listener plus a handful of copied/adapted scripts. The practical shape is a **600-800 line Bun listener** with raw-body signature verification, dedup, file locks, script dispatch, and structured logs.

The live audit also changed the urgency: this is not a clean migration away from a stable system. The current OpenClaw installation is already **degraded**, with 8 of 12 cron jobs failing due to a bad model reference (`claudehack/claude-sonnet-4-6`). That does not directly prove the Gitea webhook path is broken, but it does mean the surrounding automation is already brittle and parts of the verification pipeline are failing.

## Scope boundary

### In scope for this migration

- Gitea webhook receiver for repo / issue / comment style events
- Authentication of incoming webhook traffic
- Deduplication and idempotency checks
- Event router
- Deterministic script execution for policy enforcement and repo hygiene
- File-based issue lock management
- Minimal queue / retry behavior where needed
- Structured audit logging
- Optional handoff into a Claude-native judgment path

### Explicitly out of scope

These stay owned by OpenClaw unless Phase 1 expands scope intentionally:

- Full gateway RPC / WebSocket protocol
- Session transcript storage system
- General subagent orchestration framework
- Global cron and heartbeat scheduler
- Plugin SDK and plugin runtime
- Delivery abstraction for Mattermost / Telegram / Discord / WhatsApp
- Full tool allowlist inheritance engine
- Existing 155-project workspace and project registry
- Global memory / archive / compaction machinery

## Current system: end-to-end picture

### Deterministic path today

```text
Gitea
  -> HTTPS POST https://slack.solio.tech/hooks/gitea
  -> nginx
       - TLS termination
       - local forwarding
       - injects Authorization: Bearer
  -> OpenClaw gateway /hooks/agent
  -> gitea-transform.js
  -> event router
  -> pure scripts (post-repo-audit, policy audit, security checks, etc.)
  -> logs / queue state / lock files
```

### Judgment / agent path today

```text
Gitea event
  -> transform validation and trust checks
  -> route decision
  -> if issue workflow requires agent action:
       precompute spawn params
       async dispatch to spawner / manager path
       OpenClaw creates isolated session
       agent writes back to Gitea / chat surfaces
```

### Platform services supporting both

```text
OpenClaw gateway
  - auth / bearer validation
  - hook ingestion
  - session spawn
  - tool allowlist resolution
  - cron service
  - heartbeat runner
  - plugin loading
  - outbound delivery
  - workspace/session state persistence
```

## Security model: what exists now

## 1) Incoming Gitea webhooks are not protected by Gitea HMAC today

This was the most important architecture surprise.

Although Gitea supports `X-Gitea-Signature`, the current OpenClaw transform layer does not have access to the raw request body, so it does **not** perform real body-level HMAC verification. The live repo audit also showed the visible repo webhooks have **no secret set**.

Current protection is instead layered as:

1. HTTPS via nginx
2. nginx forwarding only to local gateway
3. injected bearer token (`Authorization: Bearer ...`)
4. gateway token validation
5. delivery dedup by `X-Gitea-Delivery`

This is workable, but weaker and more indirect than true webhook HMAC.

## 2) Spawn signatures are a separate HMAC system

There *is* HMAC in the system, but it protects a different boundary.

When `sol` creates an `[IMPLEMENT]` issue, the issue body includes a spawn signature comment:

```html
```

The transform recomputes HMAC-SHA256 over `repo|title|timestamp`, validates it with a local secret, and rejects invalid or stale signatures.

This is not webhook authentication. It is an authorization gate for a privileged workflow.

## 3) Trust routing is identity-aware

The transform classifies senders into trust levels such as owner, collaborator/contributor, and readonly.
That trust level affects:

- which agent receives the event
- whether approval words are honored
- whether a manager spawn may occur
- whether an event is ignored as untrusted or looped

## 4) Issue lock files are a core safety mechanism

Issue workflows are protected with file locks under a hooks lock directory. Locks have TTL-based behavior, and closed issues move into a short grace state before release.

This matters because concurrent comments or duplicate deliveries can otherwise spawn duplicate work.

## Live-state findings that affect architecture

## Current health status: DEGRADED

The current OpenClaw deployment is degraded now, not hypothetically later.

### Confirmed problems

- `openclaw.json` references a non-existent model alias: `claudehack/claude-sonnet-4-6`
- 8 of 12 cron jobs are failing repeatedly
- `ws-sync` is failing, so cached repo state is stale
- `webhook-verify` is failing, so the pipeline's own end-to-end verification job is unhealthy
- failover chains are slow and noisy under API pressure

### Why this matters for migration design

- The migration should reduce dependency on fragile global cron/heartbeat behavior
- The replacement should make ingress validation and deterministic enforcement stand on their own
- The replacement should log every event locally, even when downstream agent work fails
- The replacement should avoid hidden couplings to provider/model config where possible

## Current components and responsibilities

### 1) nginx edge

Responsibilities today:

- TLS termination
- forwarding inbound webhook traffic
- injecting the gateway bearer token
- relying on network locality and host-level topology as part of trust

**Migration implication:**
The new Caret listener can either:

- keep using nginx as the front door and share the bearer-token pattern, or
- terminate webhook traffic directly and verify raw-body HMAC itself

The second option is better if Rooh wants the replacement to improve security rather than merely preserve behavior.

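If the listener verifies raw-body HMAC itself, the check could look roughly like the sketch below. It assumes Gitea sends the hex-encoded HMAC-SHA256 of the raw request body in `X-Gitea-Signature`; the function name `verifyGiteaSignature` and the way the secret is passed in are illustrative and do not come from the existing codebase.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: true only when signatureHex equals
// HMAC-SHA256(secret, rawBody), compared in constant time.
export function verifyGiteaSignature(
  rawBody: Uint8Array,
  signatureHex: string | null,
  secret: string,
): boolean {
  // A missing header is always a rejection.
  if (!signatureHex) return false;

  // Digest the exact bytes received, before any JSON parsing.
  const expected = createHmac("sha256", secret).update(rawBody).digest();

  // Malformed hex decodes to a shorter buffer, which fails the length check.
  const received = Buffer.from(signatureHex.trim(), "hex");

  // timingSafeEqual throws on length mismatch, so guard first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

In a Bun or other Fetch-style handler, the raw bytes would be read with `new Uint8Array(await req.arrayBuffer())` before parsing, so the digest covers exactly what was signed. Parsing first and re-serializing is not equivalent, because key order and whitespace can change.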
### 2) OpenClaw gateway

Responsibilities today:

- receive hook traffic
- authenticate requests
- dispatch transform logic
- spawn agent sessions
- run heartbeats and cron jobs
- host plugins and outbound delivery
- enforce tool policies

**Migration implication:**
We should not replace the whole gateway. We only need a listener for the Gitea slice.

### 3) `gitea-transform.js`

This is the current Gitea event router. It performs:

- event-type filtering
- dedup checks
- trust classification
- loop prevention
- rate limiting
- lock checks
- route decisions
- script execution for deterministic cases
- manager/spawner dispatch for workflow cases
- audit logging

**Migration implication:**
This is the closest thing to the spec for the new listener. The replacement should preserve its behavior selectively, not copy the whole gateway.

### 4) Deterministic script layer

Examples found in research:

- `post-repo-audit.sh`
- `audit-webhooks.sh`
- `audit-repo-policies.sh`
- `secret-scan.sh`
- `check-implement-orphans.sh`
- `spawn-manager.sh`

These are mostly stateless bash/node tools with path/config coupling.

**Migration implication:**
Do not rewrite these from scratch unless necessary. Copy/adapt the working ones, strip OpenClaw-specific paths, and make config explicit.

### 5) Session / workflow orchestration

OpenClaw provides:

- isolated session spawn
- role/tool policy resolution
- session transcript storage
- channel delivery
- wake mechanisms

**Migration implication:**
This is the expensive part to rebuild. Avoid it. Use Claude-native primitives only for the narrow judgment path.

## The minimal replacement architecture

The smallest viable Caret-owned architecture is:

```text
Gitea
  -> Caret listener (Bun)
       - raw body capture
       - HMAC verify
       - delivery dedup
       - trust + routing
       - file locks
       - structured logs
       - script fan-out
       - optional judgment trigger
  -> deterministic tools/
  -> optional Claude-native wake-up path
```

### Listener responsibilities

The listener should own exactly these jobs:

1. Read raw request body before parsing
2. Verify `X-Gitea-Signature` with timing-safe HMAC compare
3. Parse event metadata and delivery ID
4. Deduplicate by delivery ID
5. Apply event-type filters
6. Classify sender / trust level
7. Enforce loop prevention for agent-authored comments
8. Acquire/check per-issue lock where needed
9. Dispatch deterministic scripts by event type
10. Emit structured JSON logs for all outcomes
11. Optionally trigger a judgment wake-up when deterministic automation cannot decide

### Deterministic script fan-out

The likely event map after design review:

| Event | Action |
|---|---|
| `repository.create` | collaborator add + webhook ensure + repo policy baseline |
| `push` to protected branch | secret scan + policy re-check |
| `issues.opened` on automation-tagged issues | route to gated workflow logic |
| `issue_comment` on active workflow issue | approval parsing, lock check, optional wake-up |
| unsupported / irrelevant event | log and ignore |

This keeps the zero-token path zero-token.

### Judgment path

Only use judgment for cases that deterministic automation cannot safely resolve, such as:

- ambiguous repo type
- policy enforcement failure requiring explanation
- explicit request for AI review
- human-authored workflow step that needs synthesis rather than a script

This should not require recreating OpenClaw's full spawn/orchestration model. The design target should be a small Claude-native wake-up primitive, not a manager framework clone.

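The event map in this section can be expressed as a pure routing function, which keeps the deterministic path testable without a running server. The sketch below is one possible shape under stated assumptions: the `Route` and `EventContext` types and the `routeEvent` name are hypothetical, the event labels are the ones used in the table rather than exact Gitea header values, and the pairing of script files to events is a guess that `DESIGN.md` would need to confirm.

```typescript
// Hypothetical types: only the event labels and script file names
// appear elsewhere in this document.
type Route =
  | { kind: "scripts"; scripts: string[] }
  | { kind: "workflow"; step: "gate" | "comment" }
  | { kind: "ignore"; reason: string };

interface EventContext {
  event: string;              // normalized label, e.g. "repository.create"
  protectedBranch?: boolean;  // push targets a protected branch
  automationTagged?: boolean; // issue carries an automation tag
  activeWorkflow?: boolean;   // issue currently has a workflow in progress
}

export function routeEvent(ctx: EventContext): Route {
  switch (ctx.event) {
    case "repository.create":
      // Assumed script pairing for the repo baseline step.
      return {
        kind: "scripts",
        scripts: ["post-repo-audit.sh", "audit-webhooks.sh", "audit-repo-policies.sh"],
      };
    case "push":
      return ctx.protectedBranch
        ? { kind: "scripts", scripts: ["secret-scan.sh", "audit-repo-policies.sh"] }
        : { kind: "ignore", reason: "push to unprotected branch" };
    case "issues.opened":
      return ctx.automationTagged
        ? { kind: "workflow", step: "gate" }
        : { kind: "ignore", reason: "issue not automation-tagged" };
    case "issue_comment":
      return ctx.activeWorkflow
        ? { kind: "workflow", step: "comment" }
        : { kind: "ignore", reason: "no active workflow" };
    default:
      // Unsupported events are logged by the caller and dropped.
      return { kind: "ignore", reason: `unsupported event: ${ctx.event}` };
  }
}
```

Nothing in this function wakes a model. A judgment trigger would hang off the `workflow` branch only, after lock and approval checks, so every other route stays on the zero-token path.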
## Hard dependencies vs removable dependencies

### Dependencies the new Gitea slice can remove

- OpenClaw hook ingestion for Gitea webhooks
- OpenClaw transform execution for Gitea routing
- reliance on nginx bearer injection as the only authenticity check
- OpenClaw-specific queue inbox / lock path layout
- OpenClaw-specific script path assumptions

### Dependencies the new slice should keep, at least initially

- Gitea itself
- existing policy scripts and repo hygiene logic
- existing human workflow semantics where already working
- OpenClaw-owned broader workspace/project system
- OpenClaw-owned non-Gitea cron/heartbeat ecosystem
- Claude-native or OpenClaw-native judgment wake-up until a better primitive is chosen

## Data / state the replacement must own

The replacement does not need a database. File-backed state is enough.

### Required local state

- `logs/events.jsonl` or similar structured event log
- `state/dedup.json` for recent delivery IDs
- `state/locks/-.lock` for per-issue workflow control
- `state/runs/` or similar optional execution receipts
- config files for webhook secret, Gitea endpoint, token, allowed repos/users

### Nice-to-have state

- replay queue for transient failures
- dead-letter folder for malformed events
- event latency counters / health summaries

## Architectural differences between current and target state

| Concern | Current OpenClaw state | Target Caret state |
|---|---|---|
| Webhook auth | bearer token + nginx locality | raw-body Gitea HMAC preferred |
| Router | transform inside gateway | standalone Bun listener |
| Deterministic actions | scripts invoked by transform | same scripts invoked by listener |
| Locks | OpenClaw hooks lock dir | Caret-owned lock dir |
| Dedup | OpenClaw cache file | Caret-owned dedup state |
| Judgment wake-up | OpenClaw session spawn | Claude-native minimal wake-up |
| Cron/heartbeat | OpenClaw global scheduler | only if truly needed for this slice |
| Workspace ownership | OpenClaw workspace | unchanged unless explicitly expanded |

## Main migration conclusions

### Conclusion 1: do not rebuild OpenClaw

That would be a category error. The gateway, plugin runtime, delivery layer, cron/heartbeat engine, and session/orchestration stack are a separate platform project.

### Conclusion 2: rebuild the Gitea ingress/router slice only

This is the actual migration target and is small enough to complete quickly.

### Conclusion 3: improve security while migrating

The replacement should implement actual raw-body Gitea HMAC verification. The current webhook path does not.

### Conclusion 4: keep deterministic work pure-script

The current split is correct. Repo policy and enforcement work should remain fast, cheap, and idempotent.

### Conclusion 5: judgment must be narrow and explicit

Do not wake Claude on every webhook. Use it only for ambiguity, escalation, or clearly user-requested reasoning.

### Conclusion 6: design should assume the current system is fragile

Because surrounding cron/verification infrastructure is already degraded, the replacement should be independently observable and easy to test without depending on OpenClaw's unhealthy scheduler chain.

## Open questions for Phase 1 design

These questions should be answered in `DESIGN.md`.

1. **Ingress topology:** keep nginx in front, or let the Caret listener terminate the webhook directly?
2. **Auth model:** bearer only for parity, or proper Gitea HMAC as the new standard?
3. **Judgment primitive:** Channels plugin, direct Claude Code primitive, or temporary dependency on OpenClaw for wake-up?
4. **Script packaging:** copy the existing scripts wholesale first, or split them into library + thin wrappers?
5. **Repo registration:** per-repo hooks only, or system-level hook once token/admin constraints are solved?
6. **Retry model:** synchronous fire-and-log only, or file-backed retry queue for transient failures?
7. **Observability:** plain JSONL logs only, or add a health endpoint plus counters and replay tooling?
8. **Workflow semantics:** which current issue/comment workflows are worth preserving exactly, and which can be simplified?

## Recommended next step

Move to **Phase 1: Architecture design** with the following framing:

- Treat this document as the baseline map of the current system
- Design only the **Gitea-facing slice**, not a gateway replacement
- Preserve the deterministic/judgment split
- Improve webhook authentication with real HMAC
- Make observability first-class because the current environment is already degraded

That keeps the project in the "days" category instead of letting it sprawl back into a multi-week platform rewrite.