ARCHITECTURE.md — current openclaw Gitea slice and migration boundary
Repo: sol/openclaw-to-caret-migration
Date: 2026-04-06
Source reports:
- research/RESEARCH-01-gitea-webhooks-deep-read.md
- research/RESEARCH-02-gateway-internals.md
- research/RESEARCH-03-live-state-audit.md
Executive summary
The migration target is much smaller than a full OpenClaw replacement.
OpenClaw today owns a large orchestration platform: gateway auth, session storage, plugin loading, subagent spawning, cron, heartbeat, tool policy enforcement, multi-channel delivery, and the long-lived workspace for 155+ projects. Replacing all of that would be a 4-8 week systems project.
But the Gitea-facing slice that this migration actually needs is narrower:
- Webhook ingress
- Event validation / routing
- Deterministic script fan-out
- Issue workflow gates / lock logic
- Optional judgment wake-up when automation is not enough
That slice can be rebuilt as a small standalone listener plus a handful of copied/adapted scripts. The practical shape is a 600-800 line Bun listener with raw-body signature verification, dedup, file locks, script dispatch, and structured logs.
The live audit also changed the urgency: this is not a clean migration away from a stable system. The current OpenClaw installation is already degraded, with 8 of 12 cron jobs failing due to a bad model reference (claudehack/claude-sonnet-4-6). That does not directly prove the Gitea webhook path is broken, but it does mean the surrounding automation is already brittle and parts of the verification pipeline are failing.
Scope boundary
In scope for this migration
- Gitea webhook receiver for repo / issue / comment style events
- Authentication of incoming webhook traffic
- Deduplication and idempotency checks
- Event router
- Deterministic script execution for policy enforcement and repo hygiene
- File-based issue lock management
- Minimal queue / retry behavior where needed
- Structured audit logging
- Optional handoff into a Claude-native judgment path
Explicitly out of scope
These stay owned by OpenClaw unless Phase 1 expands scope intentionally:
- Full gateway RPC / WebSocket protocol
- Session transcript storage system
- General subagent orchestration framework
- Global cron and heartbeat scheduler
- Plugin SDK and plugin runtime
- Delivery abstraction for Mattermost / Telegram / Discord / WhatsApp
- Full tool allowlist inheritance engine
- Existing 155-project workspace and project registry
- Global memory / archive / compaction machinery
Current system: end-to-end picture
Deterministic path today
Gitea
-> HTTPS POST https://slack.solio.tech/hooks/gitea
-> nginx
- TLS termination
- local forwarding
- injects Authorization: Bearer <OPENCLAW_HOOKS_TOKEN>
-> OpenClaw gateway /hooks/agent
-> gitea-transform.js
-> event router
-> pure scripts (post-repo-audit, policy audit, security checks, etc.)
-> logs / queue state / lock files
Judgment / agent path today
Gitea event
-> transform validation and trust checks
-> route decision
-> if issue workflow requires agent action:
precompute spawn params
async dispatch to spawner / manager path
OpenClaw creates isolated session
agent writes back to Gitea / chat surfaces
Platform services supporting both
OpenClaw gateway
- auth / bearer validation
- hook ingestion
- session spawn
- tool allowlist resolution
- cron service
- heartbeat runner
- plugin loading
- outbound delivery
- workspace/session state persistence
Security model: what exists now
1) Incoming Gitea webhooks are not protected by Gitea HMAC today
This was the most important architecture surprise.
Although Gitea supports X-Gitea-Signature, the current OpenClaw transform layer does not have access to the raw request body, so it does not perform real body-level HMAC verification. The live repo audit also showed the visible repo webhooks have no secret set.
Current protection is instead layered as:
- HTTPS via nginx
- nginx forwarding only to local gateway
- injected bearer token (Authorization: Bearer ...)
- gateway token validation
- delivery dedup by X-Gitea-Delivery
This is workable, but weaker and more indirect than true webhook HMAC.
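For contrast, real webhook authentication checks X-Gitea-Signature, which Gitea sets to the hex-encoded HMAC-SHA256 of the raw request body keyed with the webhook secret. The following is a minimal sketch, assuming the listener can read the body as a Buffer before any JSON parsing; verifyGiteaSignature is a hypothetical name, and node:crypto is also implemented by Bun.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: true only if the X-Gitea-Signature header equals
// HMAC-SHA256(secret, rawBody) encoded as hex.
function verifyGiteaSignature(
  rawBody: Buffer,
  signatureHeader: string | null | undefined,
  secret: string,
): boolean {
  // A webhook with no secret (the current live state) cannot be authenticated.
  if (!signatureHeader || !secret) return false;
  // A SHA-256 digest is 32 bytes, so a valid header is exactly 64 hex characters.
  if (!/^[0-9a-fA-F]{64}$/.test(signatureHeader)) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // Constant-time comparison; lengths already match, so timingSafeEqual will not throw.
  return timingSafeEqual(received, expected);
}
```

Because the HMAC covers the exact bytes Gitea sent, the check has to run before parsing: re-serialized JSON can differ in whitespace or key order, which is why a transform that only sees parsed payloads cannot perform it. It also requires a secret to be configured on each repo webhook.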
2) Spawn signatures are a separate HMAC system
There is HMAC in the system, but it protects a different boundary.
When sol creates an [IMPLEMENT] issue, the issue body includes a spawn signature comment:
<!-- xen-spawn-sig:HMAC:TIMESTAMP -->
The transform recomputes HMAC-SHA256 over repo|title|timestamp, validates it with a local secret, and rejects invalid or stale signatures. This is not webhook authentication. It is an authorization gate for a privileged workflow.
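That gate can be sketched as follows. The sketch assumes the HMAC is hex-encoded, the timestamp is Unix seconds, and stale means older than one hour. None of these details are confirmed by the research reports, and verifySpawnSig is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Marker format from the issue body: <!-- xen-spawn-sig:HMAC:TIMESTAMP -->
const SPAWN_SIG_RE = /<!--\s*xen-spawn-sig:([0-9a-fA-F]+):(\d+)\s*-->/;
const MAX_AGE_SECONDS = 3600; // assumption: freshness window
const MAX_FUTURE_SKEW_SECONDS = 60; // assumption: tolerated clock skew

type SpawnSigResult =
  | { ok: true }
  | { ok: false; reason: "missing" | "stale" | "future" | "invalid" };

function verifySpawnSig(
  repo: string,
  title: string,
  issueBody: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): SpawnSigResult {
  const match = SPAWN_SIG_RE.exec(issueBody);
  if (!match) return { ok: false, reason: "missing" };
  const sigHex = match[1];
  const tsText = match[2];
  const timestamp = Number(tsText);
  // Reject old signatures (replay) and ones dated too far in the future.
  if (nowSeconds - timestamp > MAX_AGE_SECONDS) return { ok: false, reason: "stale" };
  if (timestamp - nowSeconds > MAX_FUTURE_SKEW_SECONDS) return { ok: false, reason: "future" };
  // Recompute HMAC-SHA256 over repo|title|timestamp, using the timestamp text as written.
  const expected = createHmac("sha256", secret).update(`${repo}|${title}|${tsText}`).digest();
  if (sigHex.length !== expected.length * 2) return { ok: false, reason: "invalid" };
  const received = Buffer.from(sigHex, "hex");
  return timingSafeEqual(received, expected) ? { ok: true } : { ok: false, reason: "invalid" };
}
```

This authorizes the workflow, not the delivery, so in the new listener it would run only after webhook authentication has already succeeded.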
3) Trust routing is identity-aware
The transform classifies senders into trust levels such as owner, collaborator/contributor, and readonly. That trust level affects:
- which agent receives the event
- whether approval words are honored
- whether a manager spawn may occur
- whether an event is ignored as untrusted or looped
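The classification can be expressed as a pure function from sender login to a decision. The level names follow the text above. The configuration sets, the decision fields, and the choice that collaborators cannot approve or spawn are hypothetical placeholders for whatever gitea-transform.js actually encodes. Selection of the receiving agent is left out.

```typescript
type TrustLevel = "owner" | "collaborator" | "readonly";

interface TrustConfig {
  owners: Set<string>;
  collaborators: Set<string>;
  agentAccounts: Set<string>; // bot users whose own comments must never re-trigger work
}

interface TrustDecision {
  level: TrustLevel;
  honorApprovals: boolean; // may approval words advance a workflow?
  mayTriggerSpawn: boolean; // may this event lead to a manager or judgment wake-up?
  ignoreReason: "loop" | "untrusted" | null;
}

function classifySender(login: string, cfg: TrustConfig): TrustDecision {
  // Loop prevention first: agent-authored events are logged, never acted on.
  if (cfg.agentAccounts.has(login)) {
    return { level: "readonly", honorApprovals: false, mayTriggerSpawn: false, ignoreReason: "loop" };
  }
  if (cfg.owners.has(login)) {
    return { level: "owner", honorApprovals: true, mayTriggerSpawn: true, ignoreReason: null };
  }
  if (cfg.collaborators.has(login)) {
    // Assumption: collaborators are routed but cannot approve or spawn on their own.
    return { level: "collaborator", honorApprovals: false, mayTriggerSpawn: false, ignoreReason: null };
  }
  // Unknown senders are ignored as untrusted.
  return { level: "readonly", honorApprovals: false, mayTriggerSpawn: false, ignoreReason: "untrusted" };
}
```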
4) Issue lock files are a core safety mechanism
Issue workflows are protected with file locks under a hooks lock directory. Locks have TTL-based behavior, and closed issues move into a short grace state before release. This matters because concurrent comments or duplicate deliveries can otherwise spawn duplicate work.
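This is a minimal file-lock sketch. It assumes one lock file per repo/issue pair that stores its owner and expiry time as JSON. It also assumes a single listener process, so the read-then-break step is not contended. The function names, TTL values, and grace handling are hypothetical. Creation uses Node's exclusive "wx" open flag, so a second delivery cannot acquire a lock that is already held.

```typescript
import { closeSync, mkdirSync, openSync, readFileSync, unlinkSync, writeFileSync, writeSync } from "node:fs";
import { join } from "node:path";

interface LockRecord {
  owner: string;
  expiresAt: number; // epoch milliseconds
}

function lockPath(dir: string, repo: string, issue: number): string {
  // "owner/repo" becomes "owner__repo" so each lock is one flat file.
  return join(dir, `${repo.replace(/\//g, "__")}-${issue}.lock`);
}

function readLock(path: string): LockRecord | null {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as LockRecord;
  } catch {
    return null;
  }
}

// Returns true if the caller now holds the lock.
function tryAcquireLock(
  dir: string, repo: string, issue: number, owner: string, ttlMs: number, now: number = Date.now(),
): boolean {
  mkdirSync(dir, { recursive: true });
  const path = lockPath(dir, repo, issue);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = openSync(path, "wx"); // exclusive create: fails with EEXIST if held
      writeSync(fd, JSON.stringify({ owner, expiresAt: now + ttlMs }));
      closeSync(fd);
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      const current = readLock(path);
      // Unexpired locks are held; unreadable ones are conservatively treated as held too.
      if (current === null || current.expiresAt > now) return false;
      unlinkSync(path); // TTL passed: break the stale lock and retry once
    }
  }
  return false;
}

// On issue close, shorten the lock to a grace period instead of deleting it,
// so late comments or redeliveries still see the issue as locked.
function enterGrace(dir: string, repo: string, issue: number, graceMs: number, now: number = Date.now()): void {
  const path = lockPath(dir, repo, issue);
  const current = readLock(path);
  if (current === null) return;
  writeFileSync(path, JSON.stringify({ ...current, expiresAt: now + graceMs }));
}
```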
Live-state findings that affect architecture
Current health status: DEGRADED
The current OpenClaw deployment is degraded now, not hypothetically later.
Confirmed problems
- openclaw.json references a non-existent model alias: claudehack/claude-sonnet-4-6
- 8 of 12 cron jobs are failing repeatedly
- ws-sync is failing, so cached repo state is stale
- webhook-verify is failing, so the pipeline's own end-to-end verification job is unhealthy
- failover chains are slow and noisy under API pressure
Why this matters for migration design
- The migration should reduce dependency on fragile global cron/heartbeat behavior
- The replacement should make ingress validation and deterministic enforcement stand on their own
- The replacement should log every event locally, even when downstream agent work fails
- The replacement should avoid hidden couplings to provider/model config where possible
Current components and responsibilities
1) nginx edge
Responsibilities today:
- TLS termination
- forwarding inbound webhook traffic
- injecting the gateway bearer token
- relying on network locality and host-level topology as part of trust
Migration implication: The new Caret listener can either:
- keep using nginx as the front door and share the bearer-token pattern, or
- terminate webhook traffic directly and verify raw-body HMAC itself
The second option is better if Rooh wants the replacement to improve security rather than merely preserve behavior.
2) OpenClaw gateway
Responsibilities today:
- receive hook traffic
- authenticate requests
- dispatch transform logic
- spawn agent sessions
- run heartbeats and cron jobs
- host plugins and outbound delivery
- enforce tool policies
Migration implication: We should not replace the whole gateway. We only need a listener for the Gitea slice.
3) gitea-transform.js
This is the current Gitea event router. It performs:
- event-type filtering
- dedup checks
- trust classification
- loop prevention
- rate limiting
- lock checks
- route decisions
- script execution for deterministic cases
- manager/spawner dispatch for workflow cases
- audit logging
Migration implication: This is the closest thing to the spec for the new listener. The replacement should preserve its behavior selectively, not copy the whole gateway.
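The research does not say how gitea-transform.js rate-limits, so the following is only a hypothetical stand-in. It is an in-memory sliding-window limiter keyed by sender or repo, which the new listener could apply before touching locks or scripts. Its state resets whenever the process restarts.

```typescript
// Hypothetical sliding-window limiter: at most `limit` accepted events per key
// within any `windowMs` span.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now); // rejected events do not extend the window
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Events rejected by the limiter should still be written to the event log, so bursts remain visible rather than silently dropped.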
4) Deterministic script layer
Examples found in research:
- post-repo-audit.sh
- audit-webhooks.sh
- audit-repo-policies.sh
- secret-scan.sh
- check-implement-orphans.sh
- spawn-manager.sh
These are mostly stateless bash/node tools with path/config coupling.
Migration implication: Do not rewrite these from scratch unless necessary. Copy/adapt the working ones, strip OpenClaw-specific paths, and make config explicit.
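One way to make config explicit is to have the listener run each copied script with a minimal environment. The environment would contain only PATH plus named configuration values, so any remaining OpenClaw path assumption fails loudly. Each run's exit code and duration are recorded. In this sketch, runTool and the CARET_* variable names are hypothetical. spawnSync is Node's standard child_process API, which should also work under Bun's Node compatibility layer.

```typescript
import { spawnSync } from "node:child_process";

interface ToolResult {
  command: string;
  exitCode: number | null; // null when the process was killed, e.g. by the timeout
  timedOut: boolean;
  durationMs: number;
  stdout: string;
  stderr: string;
}

// Runs one deterministic tool with an explicit environment instead of
// inheriting the listener's, so hidden path/config coupling surfaces early.
function runTool(
  command: string,
  args: string[],
  config: Record<string, string>,
  timeoutMs: number = 60_000,
): ToolResult {
  const started = Date.now();
  const res = spawnSync(command, args, {
    env: { PATH: process.env.PATH ?? "/usr/bin:/bin", ...config },
    encoding: "utf8",
    timeout: timeoutMs,
  });
  const errCode = (res.error as { code?: string } | undefined)?.code;
  return {
    command,
    exitCode: res.status,
    timedOut: errCode === "ETIMEDOUT",
    durationMs: Date.now() - started,
    stdout: res.stdout ?? "",
    stderr: res.stderr ?? "",
  };
}
```

A script invocation would then look like runTool("tools/secret-scan.sh", [repo], { CARET_GITEA_URL: url }). Scripts that still need HOME or other variables must declare them explicitly.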
5) Session / workflow orchestration
OpenClaw provides:
- isolated session spawn
- role/tool policy resolution
- session transcript storage
- channel delivery
- wake mechanisms
Migration implication: This is the expensive part to rebuild. Avoid it. Use Claude-native primitives only for the narrow judgment path.
The minimal replacement architecture
The smallest viable Caret-owned architecture is:
Gitea
-> Caret listener (Bun)
- raw body capture
- HMAC verify
- delivery dedup
- trust + routing
- file locks
- structured logs
- script fan-out
- optional judgment trigger
-> deterministic tools/
-> optional Claude-native wake-up path
Listener responsibilities
The listener should own exactly these jobs:
- Read raw request body before parsing
- Verify X-Gitea-Signature with a timing-safe HMAC compare
- Parse event metadata and delivery ID
- Deduplicate by delivery ID
- Apply event-type filters
- Classify sender / trust level
- Enforce loop prevention for agent-authored comments
- Acquire/check per-issue lock where needed
- Dispatch deterministic scripts by event type
- Emit structured JSON logs for all outcomes
- Optionally trigger a judgment wake-up when deterministic automation cannot decide
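Deduplication only needs a bounded set of recent X-Gitea-Delivery IDs. A sketch follows. It assumes a cap of a few thousand IDs and a JSON file such as state/dedup.json, rewritten through a temp file and rename so a crash cannot leave it half-written. DeliveryDedup and the cap are hypothetical.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

class DeliveryDedup {
  private readonly path: string;
  private readonly max: number;
  private ids: string[]; // insertion order, oldest first
  private seen: Set<string>;

  constructor(path: string, max: number = 5000) {
    this.path = path;
    this.max = max;
    this.ids = existsSync(path) ? (JSON.parse(readFileSync(path, "utf8")) as string[]) : [];
    this.seen = new Set(this.ids);
  }

  // True the first time a delivery ID is seen, false for a redelivery.
  checkAndRecord(deliveryId: string): boolean {
    if (this.seen.has(deliveryId)) return false;
    this.ids.push(deliveryId);
    this.seen.add(deliveryId);
    // Evict the oldest IDs once the cap is exceeded.
    while (this.ids.length > this.max) this.seen.delete(this.ids.shift()!);
    this.persist();
    return true;
  }

  private persist(): void {
    mkdirSync(dirname(this.path), { recursive: true });
    const tmp = `${this.path}.tmp`;
    writeFileSync(tmp, JSON.stringify(this.ids));
    renameSync(tmp, this.path); // atomic replace on the same filesystem
  }
}
```

Ordering matters. Dedup should run after signature verification, so unauthenticated traffic cannot fill the set. Recording the ID before dispatch blocks concurrent duplicates, but it means a failed run can only be retried through a separate retry path.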
Deterministic script fan-out
The likely event map after design review:
| Event | Action |
|---|---|
| repository.create | collaborator add + webhook ensure + repo policy baseline |
| push to protected branch | secret scan + policy re-check |
| issues.opened on automation-tagged issues | route to gated workflow logic |
| issue_comment on active workflow issue | approval parsing, lock check, optional wake-up |
| unsupported / irrelevant event | log and ignore |
This keeps the zero-token path zero-token: every row except the optional wake-up is handled by scripts without any model call.
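The event map can be encoded as a small pure routing function over the X-Gitea-Event header and the payload. This is a hypothetical sketch. The protected-branch set, the "automation" label, the activeWorkflow flag, and the action identifiers are placeholders. The payload action values used here ("created", "opened") should be checked against the Gitea webhook documentation before relying on them.

```typescript
type Action =
  | "collaborator-add" | "webhook-ensure" | "policy-baseline"
  | "secret-scan" | "policy-recheck"
  | "gated-workflow"
  | "approval-parse" | "lock-check" | "maybe-wake"
  | "log-ignore";

interface Payload {
  action?: string;
  ref?: string;
  issue?: { labels?: { name: string }[] };
}

const PROTECTED_BRANCHES = new Set(["refs/heads/main", "refs/heads/master"]); // assumption
const AUTOMATION_LABEL = "automation"; // assumption: how issues are "automation-tagged"

// `event` is the X-Gitea-Event header; `activeWorkflow` comes from lock/workflow state.
function route(event: string, p: Payload, activeWorkflow: boolean): Action[] {
  switch (event) {
    case "repository":
      return p.action === "created" ? ["collaborator-add", "webhook-ensure", "policy-baseline"] : ["log-ignore"];
    case "push":
      return p.ref && PROTECTED_BRANCHES.has(p.ref) ? ["secret-scan", "policy-recheck"] : ["log-ignore"];
    case "issues": {
      const tagged = (p.issue?.labels ?? []).some((l) => l.name === AUTOMATION_LABEL);
      return p.action === "opened" && tagged ? ["gated-workflow"] : ["log-ignore"];
    }
    case "issue_comment":
      return p.action === "created" && activeWorkflow ? ["approval-parse", "lock-check", "maybe-wake"] : ["log-ignore"];
    default:
      return ["log-ignore"]; // unsupported or irrelevant event
  }
}
```

Keeping routing pure, with no I/O, makes the table directly testable with recorded payloads, independent of the degraded OpenClaw verification jobs.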
Judgment path
Only use judgment for cases that deterministic automation cannot safely resolve, such as:
- ambiguous repo type
- policy enforcement failure requiring explanation
- explicit request for AI review
- human-authored workflow step that needs synthesis rather than a script
This should not require recreating OpenClaw's full spawn/orchestration model. The design target should be a small Claude-native wake-up primitive, not a manager framework clone.
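One way to keep the trigger narrow and explicit is a single predicate evaluated after the deterministic step, with each escalation reason named. The outcome fields and the /ai-review comment command are hypothetical stand-ins for the four cases above. An empty result means the event stays on the zero-token path.

```typescript
interface DeterministicOutcome {
  repoTypeResolved: boolean; // false when scripts could not determine the repo type
  policyFailed: boolean; // a policy script failed in a way that needs explanation
  commentBody?: string; // present for issue_comment events
  humanWorkflowStep: boolean; // a human-authored step that needs synthesis, not a script
}

type WakeReason = "ambiguous-repo-type" | "policy-failure" | "review-requested" | "workflow-synthesis";

function judgmentReasons(o: DeterministicOutcome): WakeReason[] {
  const reasons: WakeReason[] = [];
  if (!o.repoTypeResolved) reasons.push("ambiguous-repo-type");
  if (o.policyFailed) reasons.push("policy-failure");
  // Hypothetical explicit request: a line starting with /ai-review.
  if (o.commentBody && /^\/ai-review\b/m.test(o.commentBody)) reasons.push("review-requested");
  if (o.humanWorkflowStep) reasons.push("workflow-synthesis");
  return reasons;
}
```

Logging the returned reasons alongside each wake-up makes every use of judgment auditable.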
Hard dependencies vs removable dependencies
Dependencies the new Gitea slice can remove
- OpenClaw hook ingestion for Gitea webhooks
- OpenClaw transform execution for Gitea routing
- reliance on nginx bearer injection as the only authenticity check
- OpenClaw-specific queue inbox / lock path layout
- OpenClaw-specific script path assumptions
Dependencies the new slice should keep, at least initially
- Gitea itself
- existing policy scripts and repo hygiene logic
- existing human workflow semantics where already working
- OpenClaw-owned broader workspace/project system
- OpenClaw-owned non-Gitea cron/heartbeat ecosystem
- Claude-native or OpenClaw-native judgment wake-up until a better primitive is chosen
Data / state the replacement must own
The replacement does not need a database. File-backed state is enough.
Required local state
- logs/events.jsonl or similar structured event log
- state/dedup.json for recent delivery IDs
- state/locks/<repo>-<issue>.lock for per-issue workflow control
- state/runs/ or similar for optional execution receipts
- config files for webhook secret, Gitea endpoint, token, allowed repos/users
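Structured logging can be as simple as appending one JSON object per line to an events.jsonl file for every outcome, including rejections and ignores. This keeps a local record even when downstream agent work fails. The field names and outcome values below are a hypothetical schema, not one taken from the existing transform.

```typescript
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { dirname } from "node:path";

type Outcome = "rejected-signature" | "duplicate" | "ignored" | "dispatched" | "failed";

interface EventLogEntry {
  ts: string; // ISO-8601 timestamp
  deliveryId: string | null; // null when the request never got far enough to parse
  event: string | null;
  repo: string | null;
  sender: string | null;
  outcome: Outcome;
  detail?: Record<string, unknown>;
}

// Appends one entry as a single line and returns the stored record.
function logEvent(path: string, entry: Omit<EventLogEntry, "ts">, now: Date = new Date()): EventLogEntry {
  const full: EventLogEntry = { ts: now.toISOString(), ...entry };
  mkdirSync(dirname(path), { recursive: true });
  appendFileSync(path, JSON.stringify(full) + "\n");
  return full;
}

// Reads the log back, e.g. for health summaries or replay tooling.
function readEvents(path: string): EventLogEntry[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as EventLogEntry);
}
```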
Nice-to-have state
- replay queue for transient failures
- dead-letter folder for malformed events
- event latency counters / health summaries
Architectural differences between current and target state
| Concern | Current OpenClaw state | Target Caret state |
|---|---|---|
| Webhook auth | bearer token + nginx locality | raw-body Gitea HMAC preferred |
| Router | transform inside gateway | standalone Bun listener |
| Deterministic actions | scripts invoked by transform | same scripts invoked by listener |
| Locks | OpenClaw hooks lock dir | Caret-owned lock dir |
| Dedup | OpenClaw cache file | Caret-owned dedup state |
| Judgment wake-up | OpenClaw session spawn | Claude-native minimal wake-up |
| Cron/heartbeat | OpenClaw global scheduler | only if truly needed for this slice |
| Workspace ownership | OpenClaw workspace | unchanged unless explicitly expanded |
Main migration conclusions
Conclusion 1: do not rebuild OpenClaw
That would be a category error. The gateway, plugin runtime, delivery layer, cron/heartbeat engine, and session/orchestration stack are a separate platform project.
Conclusion 2: rebuild the Gitea ingress/router slice only
This is the actual migration target and is small enough to complete quickly.
Conclusion 3: improve security while migrating
The replacement should implement actual raw-body Gitea HMAC verification. The current webhook path does not.
Conclusion 4: keep deterministic work pure-script
The current split is correct. Repo policy and enforcement work should remain fast, cheap, and idempotent.
Conclusion 5: judgment must be narrow and explicit
Do not wake Claude on every webhook. Use it only for ambiguity, escalation, or clearly user-requested reasoning.
Conclusion 6: design should assume the current system is fragile
Because surrounding cron/verification infrastructure is already degraded, the replacement should be independently observable and easy to test without depending on OpenClaw's unhealthy scheduler chain.
Open questions for Phase 1 design
These questions should be answered in DESIGN.md.
- Ingress topology: keep nginx in front, or let the Caret listener terminate the webhook directly?
- Auth model: bearer only for parity, or proper Gitea HMAC as the new standard?
- Judgment primitive: Channels plugin, direct Claude Code primitive, or temporary dependency on OpenClaw for wake-up?
- Script packaging: copy the existing scripts wholesale first, or split them into library + thin wrappers?
- Repo registration: per-repo hooks only, or system-level hook once token/admin constraints are solved?
- Retry model: synchronous fire-and-log only, or file-backed retry queue for transient failures?
- Observability: plain JSONL logs only, or add a health endpoint plus counters and replay tooling?
- Workflow semantics: which current issue/comment workflows are worth preserving exactly, and which can be simplified?
Recommended next step
Move to Phase 1 — Architecture design with the following framing:
- Treat this document as the baseline map of the current system
- Design only the Gitea-facing slice, not a gateway replacement
- Preserve the deterministic/judgment split
- Improve webhook authentication with real HMAC
- Make observability first-class because the current environment is already degraded
That keeps the project in the "days" category instead of letting it sprawl back into a multi-week platform rewrite.