plan: initial migration plan and README
109
PLAN.md
Normal file
@@ -0,0 +1,109 @@
# openclaw → Caret migration plan

**Status:** in progress, 2026-04-06
**Owner:** Caret
**Approver:** Rooh
**Tracking:** issues in this repo

## Goal

Take over the agent infrastructure that openclaw currently runs through Xen (webhooks, policy enforcement, heartbeat checks, session spawning, scheduled work) and stand up a Caret-owned replacement that reproduces every feature correctly, so openclaw/Xen can be disabled later with zero regression.

The migration has to preserve two categories of behavior cleanly:

1. **Deterministic work**: pure-script operations that don't need an LLM. These must stay cheap (zero token cost), fast (sub-second), and reliable. Examples: adding collaborators on repo creation, ensuring the HMAC webhook exists on new repos, running the policy template baseline, HMAC verification.
2. **Judgment work**: operations that benefit from opus-level reasoning. These should *wake me up* via a native Claude Code primitive (channels plugin or similar) rather than run as a permanent process. Examples: drafting a README from commit history, deciding which template fits an unusual repo, reviewing PR policy violations conversationally, explaining anomalies in plain language.

The split matters because openclaw's current gitea-webhooks pipeline is explicitly tagged `Zero tokens — pure script enforcement` in its `post-repo-audit.sh` header comment. Keeping the same split avoids ballooning token spend.

## Phases

### Phase 0: Research (in progress)

Read the three reference repos and the live system state to understand what I'm replacing. Three parallel Explore subagents are doing this now.

- **R0.1** Read `sol/gitea-webhooks` deeply: data flow, HMAC, transform logic, tool fan-out, repo-type detection, openclaw couplings. (subagent afa92905872a43a9b)
- **R0.2** Read `sol/workspace-ops` and `sol/agent-reliability`: scope, entry points, openclaw couplings, overlaps. (same subagent)
- **R0.3** Map the openclaw gateway internals: session spawn API, cron/heartbeat mechanism, tool allowlist enforcement, plugin wiring, replacement difficulty matrix. (subagent ae5ca38f70b1e9626)
- **R0.4** Audit the currently-running state: every registered Gitea webhook, live cron/timer infra, HEARTBEAT.md checklists, active agent processes, shared state stores. (subagent abf0cb0928d823a0b)
- **R0.5** Synthesize the three reports into `ARCHITECTURE.md` in this repo: a single readable picture of what openclaw does today and what I'll need to rebuild.

Exit criteria: I can answer any architectural question about the current openclaw gitea pipeline without grepping the source again.

### Phase 1: Architecture design

Before writing any code, lock down the target design with Rooh's sign-off.

- **A1.1** Write `DESIGN.md` in this repo describing the target Caret-owned stack: components, data flows, ownership boundaries, storage, observability.
- **A1.2** List every openclaw dependency the new stack removes, and every openclaw feature it depends on staying around (if any).
- **A1.3** DMG-style brief: surface every design choice as a multiple-choice question with a recommendation and reasoning, let Rooh approve or override. No code written until this is signed off.
- **A1.4** Define "100% feature parity" concretely as a test list that must pass after migration. Every test is a one-line assertion about behavior ("new sol/* repo gets Makefile within 10s", "HMAC-signed POST with wrong secret is rejected with 403", etc.).

Exit criteria: `DESIGN.md` in the repo, signed off by Rooh, with a test list that defines done.
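
One way to make the A1.4 test list concrete is to store each assertion as a named one-line check and run the whole list after migration. The sketch below is only an illustration in Python: `ParityCheck`, `run_parity_checks`, and the two probe functions are hypothetical names, and the probes return canned values where real ones would query Gitea and the listener.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ParityCheck:
    """One line of the feature-parity list: a description plus a yes/no probe."""
    description: str
    probe: Callable[[], bool]


def run_parity_checks(checks: List[ParityCheck]) -> Tuple[List[str], List[str]]:
    """Run every check and return (passed, failed) descriptions.

    A probe that raises counts as failed, so one broken check
    cannot hide the results of the others.
    """
    passed: List[str] = []
    failed: List[str] = []
    for check in checks:
        try:
            ok = bool(check.probe())
        except Exception:
            ok = False
        (passed if ok else failed).append(check.description)
    return passed, failed


# Hypothetical stand-ins for real probes (assumptions for illustration only).
def makefile_delay_seconds() -> float:
    return 4.2  # canned value; a real probe would poll the new repo


def status_for_wrong_secret() -> int:
    return 403  # canned value; a real probe would POST to the listener


CHECKS = [
    ParityCheck("new sol/* repo gets Makefile within 10s",
                lambda: makefile_delay_seconds() <= 10),
    ParityCheck("HMAC-signed POST with wrong secret is rejected with 403",
                lambda: status_for_wrong_secret() == 403),
]

if __name__ == "__main__":
    passed, failed = run_parity_checks(CHECKS)
    print(f"{len(passed)} passed, {len(failed)} failed")
    for description in failed:
        print("FAILED:", description)
```

Keeping the description string identical to the line in `DESIGN.md` would let the sign-off list and the executable list be compared directly.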

### Phase 2: Deterministic path build

Build the pure-script side first. No LLM in the loop.

- **B2.1** Scaffold the webhook listener (bun HTTP server) in its own docker container under `/host/root/.caret/`. Minimal footprint, one file where possible.
- **B2.2** Implement HMAC verification exactly matching the openclaw pattern so existing Gitea webhooks work without re-registration.
- **B2.3** Port the core scripts from `sol/gitea-webhooks/tools/` into `/host/root/.caret/tools/` with the openclaw-specific bits (hardcoded workspace paths, Mattermost channel IDs, etc.) stripped out.
- **B2.4** Wire the listener's event router to call the right script for each event type. `repository.create` → `post-repo-audit.sh` + `audit-repo-policies.sh --fix`. `push` to main → `secret-scan.sh` + policy re-check. Etc.
- **B2.5** Structured JSON logging to `/host/root/.caret/log/repo-enforcer.log` with line-count rotation (same pattern as tg-stream).
- **B2.6** Unit tests: mock webhook payloads via curl against a locally-running listener. Cover verification success/failure, event routing, idempotency.

Exit criteria: a canary test. Create a throwaway `sol/caret-test-<ts>` repo against the live system, watch policies apply within 10 seconds, and verify the log captures the event and the commit hash.
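
The verification step in B2.2 can be sketched as follows. This assumes Gitea's stock scheme, where the `X-Gitea-Signature` header carries the hex-encoded HMAC-SHA256 of the raw request body; whether openclaw's pattern matches that exactly is something R0.1 still has to confirm. The sketch is in Python for brevity even though the listener is planned as a bun server, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True only if the header matches the signature of this exact body.

    compare_digest runs in constant time, so the check does not reveal how
    many leading characters of a forged signature were correct.
    """
    if not signature_header:
        return False
    expected = sign_body(secret, body).encode("ascii")
    received = signature_header.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)


def status_for_request(secret: bytes, body: bytes, signature_header: str) -> int:
    """Map the check to an HTTP status: 200 when valid, 403 otherwise."""
    return 200 if verify_signature(secret, body, signature_header) else 403
```

The signature has to be computed over the bytes as received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and break the match.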

### Phase 3: Judgment path build

Build the wake-me-up side. This is the piece openclaw doesn't currently do; it relies exclusively on pure scripts.

- **J3.1** Evaluate the Channels plugin mechanism (`claude code channels`) as the native primitive for "external event → Claude session". Build a minimal plugin at `/host/root/.caret/channels/gitea-judgment/` that receives an HTTP POST and starts a session with the payload as the initial prompt.
- **J3.2** Define the trigger conditions: when does the deterministic path hand off to judgment? For example, "policy enforcer errored", "repo-type detection was ambiguous", "a flag in the issue body requests AI review".
- **J3.3** Ensure cost hygiene: judgment is always opt-in or error-triggered, never fired on every event. Document the budget in `DESIGN.md`.

Exit criteria: a second canary. A repo with a deliberately weird structure fires the deterministic path, that path detects it can't auto-fix, and a Claude session wakes up to handle it, reporting back via `tg-stream`.
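
The hand-off rule from J3.2 and J3.3 can be written as a single pure function, which keeps the decision itself on the zero-token side. This is a sketch under assumptions: the `DeterministicResult` fields are a hypothetical shape for what the script path reports, and the three reasons are the examples listed in J3.2, not a final trigger list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeterministicResult:
    """Outcome of the pure-script path for one event (hypothetical shape)."""
    enforcer_exit_code: int      # 0 means the policy scripts succeeded
    repo_type_candidates: int    # how many repo types matched detection
    ai_review_requested: bool    # opt-in flag found in the issue body


def escalation_reason(result: DeterministicResult) -> Optional[str]:
    """Return why judgment should be woken, or None to stay script-only."""
    if result.enforcer_exit_code != 0:
        return "policy enforcer errored"
    # Zero matches and several matches are both treated as ambiguous.
    if result.repo_type_candidates != 1:
        return "repo-type detection was ambiguous"
    if result.ai_review_requested:
        return "AI review requested"
    return None
```

Because the default is `None`, an event only reaches a Claude session when one of the listed conditions holds, which matches the opt-in or error-triggered rule in J3.3.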

### Phase 4: Parallel run with Xen

Openclaw stays on. My replacement runs alongside it, both hitting the same Gitea events.

- **P4.1** Register my webhook endpoint on a small set of test repos first (not all of sol/*). Verify both pipelines fire and neither breaks.
- **P4.2** After 24 hours of clean dual-run on the test repos, widen to all sol/* repos.
- **P4.3** Monitor the log for any divergence, meaning cases where my pipeline disagreed with openclaw's. Investigate and reconcile every one.

Exit criteria: 72 hours of dual-run with zero unreconciled divergences.
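
The divergence check in P4.3 amounts to comparing two per-event outcome maps. A minimal sketch, assuming both pipelines' logs can be reduced to a mapping from event id to an outcome summary (that log shape and the function name are assumptions, not something either pipeline provides today):

```python
from typing import Dict, List, Tuple

Outcomes = Dict[str, str]  # event id -> outcome summary (hypothetical log shape)


def find_divergences(openclaw: Outcomes, caret: Outcomes) -> List[Tuple[str, str, str]]:
    """List (event_id, openclaw_outcome, caret_outcome) wherever the two differ.

    An event seen by only one pipeline is reported with "<missing>" on the
    other side, since a dropped event is also a divergence.
    """
    divergences: List[Tuple[str, str, str]] = []
    for event_id in sorted(set(openclaw) | set(caret)):
        theirs = openclaw.get(event_id, "<missing>")
        mine = caret.get(event_id, "<missing>")
        if theirs != mine:
            divergences.append((event_id, theirs, mine))
    return divergences
```

An empty result over the full 72-hour window would be the mechanical form of "zero unreconciled divergences"; anything else becomes a list of cases to investigate.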

### Phase 5: Cut-over

Only run this with Rooh's explicit go-ahead.

- **C5.1** Disable openclaw's gitea-transform by stopping the `openclaw-openclaw-gateway-1` container or by removing its hooks from its settings (reversible).
- **C5.2** Watch my pipeline handle all incoming Gitea events solo for 24 hours.
- **C5.3** If anything breaks: immediate rollback by restarting openclaw. Rollback must be one command, tested before cut-over.
- **C5.4** After 7 days of clean solo run, mark the migration complete. Delete the staging files, archive the research repos, move this project from "in progress" to "done" in the repo description.

Exit criteria: 7 days of Caret-only operation with no regressions.

## Dependencies and coordination

- **Gitea token scope:** the sol token I have doesn't have `read:admin` scope, so I can't list or create system-level webhooks. I'll need Rooh to either elevate the token or register the system webhook manually one time.
- **Xen's review:** even though this migration replaces Xen's territory, I should loop Xen in at Phase 1 (design review) and Phase 4 (parallel run starts). This is for visibility, not approval, so Xen doesn't delete something I'm depending on.
- **Openclaw upgrades:** if openclaw ships an upgrade while we're mid-migration, it could overwrite files I'm reading. I should work against a snapshot commit of the research repos.

## Risks

1. **HMAC secret management**: if I don't get the secret storage and rotation story right, a leaked secret compromises every webhook. Must be documented and rotatable.
2. **Idempotency**: if my pipeline runs twice on the same event (retry, dual-register, replay), it must not double-apply fixes or double-add collaborators. Every operation must be idempotent.
3. **Silent drops**: openclaw currently has gaps (e.g. the bug I found earlier where concurrent pollers race). I must not introduce new silent drops. Every incoming event must produce a log line, even if the line says "ignored because X".
4. **Rollback speed**: if the cut-over goes wrong, Rooh must see working policy enforcement again within minutes. The rollback procedure is a first-class deliverable, not an afterthought.
5. **Token scope escalation**: I cannot register system-wide webhooks without admin scope. Either we elevate the token or we accept manual one-time setup.
6. **Coupling I don't yet see**: the research phase will probably surface dependencies I haven't imagined. The plan should update as findings land.
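
Risks 2 and 3 can be addressed together at the router: deduplicate on a delivery id and emit exactly one log line per delivery, whatever the outcome. The sketch below is illustrative only. The `EventHandler` class and the `ROUTES` table are hypothetical, the script names are taken from B2.4, and the delivery id is assumed to come from Gitea's `X-Gitea-Delivery` header. Deduplication by delivery id only covers redelivery of the same request; a dual-registered hook produces a different id, so the scripts themselves still have to be idempotent.

```python
import json
from typing import Callable, Dict, List, Set

# Hypothetical routing table: event type -> scripts to run (names from B2.4).
ROUTES: Dict[str, List[str]] = {
    "repository.create": ["post-repo-audit.sh", "audit-repo-policies.sh --fix"],
    "push": ["secret-scan.sh"],
}


class EventHandler:
    """Runs the scripts for a delivery at most once and logs every delivery."""

    def __init__(self, run_script: Callable[[str], None], log: Callable[[str], None]):
        self._run_script = run_script
        self._log = log
        self._seen: Set[str] = set()  # in-memory only; a real store must persist

    def handle(self, delivery_id: str, event_type: str) -> str:
        if delivery_id in self._seen:
            outcome = "ignored: duplicate delivery"
        elif event_type not in ROUTES:
            outcome = "ignored: no route for event type"
        else:
            try:
                for script in ROUTES[event_type]:
                    self._run_script(script)
                outcome = "handled"
            except Exception as exc:
                # Still logged below, and left unmarked so a retry can run again.
                outcome = f"error: {exc}"
        if not outcome.startswith("error"):
            self._seen.add(delivery_id)
        self._log(json.dumps(
            {"delivery": delivery_id, "event": event_type, "outcome": outcome}))
        return outcome
```

Leaving failed deliveries out of the seen-set is a design choice: it allows a retry after a transient failure, at the cost of re-running scripts that may have partly completed.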

## Tracking

Issues in this repo track each phase task. Each task becomes an issue, and each issue is closed when its exit criterion is met. `ARCHITECTURE.md`, `DESIGN.md`, and this `PLAN.md` are the three canonical documents the work references.

## Changelog

- **2026-04-06** Plan drafted. Three research subagents spawned. Repo created, README + plan committed.

@@ -1,3 +1,7 @@
-# openclaw-to-caret-migration
+# openclaw → Caret migration

-Migration project: openclaw agent infrastructure (Xen-owned) to a Claude-native stack owned by Caret. Tracks architecture decisions, phase plans, and implementation progress via issues.
+Taking over the openclaw gitea integration (currently owned by Xen) and rebuilding it as a Caret-owned stack that can fully replace openclaw's pipeline when Xen is disabled.
+
+See `PLAN.md` for the phased plan. Progress is tracked via issues in this repo.
+
+Status: **Phase 0: research in progress** (2026-04-06)