From 40c0ca33002e40cfd6d0ccd53fc4cb6c8b3cdef9 Mon Sep 17 00:00:00 2001
From: Caret
Date: Mon, 6 Apr 2026 12:37:12 +0000
Subject: [PATCH] plan: initial migration plan and README

---
 PLAN.md   | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md |   8 +++-
 2 files changed, 115 insertions(+), 2 deletions(-)
 create mode 100644 PLAN.md

diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..449d820
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,109 @@
+# openclaw → Caret migration plan
+
+**Status:** in progress, 2026-04-06
+**Owner:** Caret
+**Approver:** Rooh
+**Tracking:** issues in this repo
+
+## Goal
+
+Take over the agent infrastructure that openclaw currently runs through Xen — webhooks, policy enforcement, heartbeat checks, session spawning, scheduled work — and stand up a Caret-owned replacement that implements 100% of the features correctly, so openclaw/Xen can be disabled later with zero regression.
+
+The migration has to preserve two categories of behavior cleanly:
+
+1. **Deterministic work** — pure-script operations that don't need an LLM. These must stay cheap (zero token cost), fast (sub-second), and reliable. Examples: adding collaborators on repo creation, ensuring the HMAC webhook exists on new repos, running the policy template baseline, HMAC verification.
+2. **Judgment work** — operations that benefit from opus-level reasoning. These should *wake me up* via a native Claude Code primitive (channels plugin or similar), not a permanent process. Examples: drafting a README from commit history, deciding which template fits an unusual repo, reviewing PR policy violations conversationally, explaining anomalies in plain language.
+
+The split matters because openclaw's current gitea-webhooks pipeline is explicitly tagged `Zero tokens — pure script enforcement` in its `post-repo-audit.sh` header comment. Keeping the same split avoids ballooning token spend.
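The HMAC verification listed under deterministic work could look roughly like the sketch below. It assumes the standard Gitea behavior of sending a hex-encoded HMAC-SHA256 of the raw request body in the `X-Gitea-Signature` header. Whether openclaw's existing check matches this exactly is not confirmed by the plan and would need to be verified in Phase 0. The names `verifySignature` and `statusFor` are hypothetical, chosen for illustration only.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not taken from openclaw): returns true when
// `signatureHex` is the hex HMAC-SHA256 of `rawBody` under `secret`.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  // Reject anything that is not exactly 64 hex characters before decoding,
  // because Buffer.from(..., "hex") silently truncates malformed input.
  if (!/^[0-9a-f]{64}$/i.test(signatureHex)) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");

  // Constant-time comparison; both buffers are 32 bytes at this point.
  return timingSafeEqual(expected, received);
}

// Hypothetical status mapping: a missing or wrong signature yields 403,
// mirroring the "wrong secret is rejected with 403" assertion in the plan.
function statusFor(rawBody: string, signatureHex: string | null, secret: string): number {
  return signatureHex !== null && verifySignature(rawBody, signatureHex, secret) ? 200 : 403;
}
```

The signature has to be computed over the raw body bytes, before any JSON parsing, since re-serializing the payload can change whitespace and key order and would invalidate the digest.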
+
+## Phases
+
+### Phase 0 — Research (in progress)
+
+Read the three reference repos and the live system state to understand what I'm replacing. Three parallel Explore subagents are doing this now.
+
+- **R0.1** Read `sol/gitea-webhooks` deeply — data flow, HMAC, transform logic, tool fan-out, repo-type detection, openclaw couplings. (subagent afa92905872a43a9b)
+- **R0.2** Read `sol/workspace-ops` and `sol/agent-reliability` — scope, entry points, openclaw couplings, overlaps. (same subagent)
+- **R0.3** Map the openclaw gateway internals — session spawn API, cron/heartbeat mechanism, tool allowlist enforcement, plugin wiring, replacement difficulty matrix. (subagent ae5ca38f70b1e9626)
+- **R0.4** Audit the currently-running state — every registered Gitea webhook, live cron/timer infra, HEARTBEAT.md checklists, active agent processes, shared state stores. (subagent abf0cb0928d823a0b)
+- **R0.5** Synthesize the three reports into `ARCHITECTURE.md` in this repo — a single readable picture of what openclaw does today and what I'll need to rebuild.
+
+Exit criteria: I can answer any architectural question about the current openclaw gitea pipeline without grepping the source again.
+
+### Phase 1 — Architecture design
+
+Before writing any code, lock down the target design with Rooh's sign-off.
+
+- **A1.1** Write `DESIGN.md` in this repo describing the target Caret-owned stack: components, data flows, ownership boundaries, storage, observability.
+- **A1.2** List every openclaw dependency the new stack removes, and every openclaw feature it depends on staying around (if any).
+- **A1.3** DMG-style brief: surface every design choice as a multiple-choice question with a recommendation and reasoning, let Rooh approve or override. No code written until this is signed off.
+- **A1.4** Define "100% feature parity" concretely — a test list that must pass after migration. Every test is a one-line assertion about behavior ("new sol/* repo gets Makefile within 10s", "HMAC-signed POST with wrong secret is rejected with 403", etc.).
+
+Exit criteria: `DESIGN.md` in the repo, signed off by Rooh, with a test list that defines done.
+
+### Phase 2 — Deterministic path build
+
+Build the pure-script side first. No LLM in the loop.
+
+- **B2.1** Scaffold the webhook listener (bun HTTP server) in its own docker container under `/host/root/.caret/`. Minimal footprint, one file where possible.
+- **B2.2** Implement HMAC verification exactly matching the openclaw pattern so existing Gitea webhooks work without re-registration.
+- **B2.3** Port the core scripts from `sol/gitea-webhooks/tools/` into `/host/root/.caret/tools/` with the openclaw-specific bits (hardcoded workspace paths, Mattermost channel IDs, etc.) stripped out.
+- **B2.4** Wire the listener's event router to call the right script for each event type. `repository.create` → `post-repo-audit.sh` + `audit-repo-policies.sh --fix`. `push` to main → `secret-scan.sh` + policy re-check. Etc.
+- **B2.5** Structured JSON logging to `/host/root/.caret/log/repo-enforcer.log` with line-count rotation (same pattern as tg-stream).
+- **B2.6** Unit tests: mock webhook payloads via curl against a locally-running listener. Cover verification success/failure, event routing, idempotency.
+
+Exit criteria: a canary test — create a throwaway `sol/caret-test-` repo against the live system, watch policies apply within 10 seconds, verify the log captures the event and the commit hash.
+
+### Phase 3 — Judgment path build
+
+Build the wake-me-up side. This is the piece openclaw doesn't currently do — it exclusively uses pure scripts.
+
+- **J3.1** Evaluate the Channels plugin mechanism (`claude code channels`) as the native primitive for "external event → Claude session". Build a minimal plugin at `/host/root/.caret/channels/gitea-judgment/` that receives an HTTP POST and starts a session with the payload as the initial prompt.
+- **J3.2** Define the trigger conditions — when does the deterministic path hand off to judgment? e.g. "policy enforcer errored", "repo-type detection was ambiguous", "a flag in the issue body requests AI review".
+- **J3.3** Ensure cost hygiene: judgment is always opt-in or error-triggered, never fired on every event. Document the budget in `DESIGN.md`.
+
+Exit criteria: a second canary — a repo with a deliberately weird structure fires the deterministic path, that path detects it can't auto-fix, and a Claude session wakes up to handle it, reporting back via `tg-stream`.
+
+### Phase 4 — Parallel run with Xen
+
+Openclaw stays on. My replacement runs alongside it, both hitting the same Gitea events.
+
+- **P4.1** Register my webhook endpoint on a small set of test repos first (not all of sol/*). Verify both pipelines fire and neither breaks.
+- **P4.2** After 24 hours of clean dual-run on the test repos, widen to all sol/* repos.
+- **P4.3** Monitor the log for any divergence — cases where my pipeline disagreed with openclaw's. Investigate and reconcile every one.
+
+Exit criteria: 72 hours of dual-run with zero unreconciled divergences.
+
+### Phase 5 — Cut-over
+
+Only run this with Rooh's explicit go-ahead.
+
+- **C5.1** Disable openclaw's gitea-transform by stopping the openclaw-openclaw-gateway-1 container OR by removing its hooks from its settings (reversible).
+- **C5.2** Watch my pipeline handle all incoming Gitea events solo for 24 hours.
+- **C5.3** If anything breaks: immediate rollback by restarting openclaw. Rollback must be one command, tested before cut-over.
+- **C5.4** After 7 days of clean solo run, mark the migration complete. Delete the staging files, archive the research repos, move this project from "in progress" to "done" in the repo description.
+
+Exit criteria: 7 days of Caret-only operation with no regressions.
+
+## Dependencies and coordination
+
+- **Gitea token scope:** the sol token I have doesn't have `read:admin` scope, so I can't list or create system-level webhooks. I'll need Rooh to either elevate the token or register the system webhook manually one time.
+- **Xen's review:** even though this migration replaces Xen's territory, I should loop Xen in at Phase 1 (design review) and Phase 4 (parallel run starts). Not approval — just visibility so Xen doesn't delete something I'm depending on.
+- **Openclaw upgrades:** if openclaw ships an upgrade while we're mid-migration, it could overwrite files I'm reading. I should work against a snapshot commit of the research repos.
+
+## Risks
+
+1. **HMAC secret management** — if I don't get the secret storage and rotation story right, a leaked secret compromises every webhook. Must be documented and rotatable.
+2. **Idempotency** — if my pipeline runs twice on the same event (retry, dual-register, replay), it must not double-apply fixes or double-add collaborators. Every operation must be idempotent.
+3. **Silent drops** — openclaw currently has gaps (e.g. the bug I found earlier where concurrent pollers race). I must not introduce new silent drops. Every incoming event must produce a log line, even if the line says "ignored because X".
+4. **Rollback speed** — if the cut-over goes wrong, Rooh must see working policy enforcement again within minutes. The rollback procedure is a first-class deliverable, not an afterthought.
+5. **Token scope escalation** — I cannot register system-wide webhooks without admin scope. Either we elevate the token or we accept manual one-time setup.
+6. **Coupling I don't yet see** — the research phase will probably surface dependencies I haven't imagined. The plan should update as findings land.
+
+## Tracking
+
+Issues in this repo track each phase task. Each task becomes an issue, each issue gets closed when its exit criterion is met. `ARCHITECTURE.md`, `DESIGN.md`, and this `PLAN.md` are the three canonical documents the work references.
+
+## Changelog
+
+- **2026-04-06** Plan drafted. Three research subagents spawned. Repo created, README + plan committed.
diff --git a/README.md b/README.md
index 9718b2f..5035460 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
-# openclaw-to-caret-migration
+# openclaw → Caret migration
 
-Migration project: openclaw agent infrastructure (Xen-owned) to a Claude-native stack owned by Caret. Tracks architecture decisions, phase plans, and implementation progress via issues.
\ No newline at end of file
+Taking over the openclaw gitea integration (currently owned by Xen) and rebuilding it as a Caret-owned stack that can fully replace openclaw's pipeline when Xen is disabled.
+
+See `PLAN.md` for the phased plan. Progress is tracked via issues in this repo.
+
+Status: **Phase 0 — research in progress** (2026-04-06)