sol/openclaw-to-caret-migration

Caret d36fa6538a research: Phase 0 reports 2 and 3 — gateway internals + live state audit

2026-04-06 12:50:22 +00:00

Research 02 — openclaw gateway internals

Subagent: ae5ca38f70b1e9626 (Explore)
Completed: 2026-04-06 12:50 UTC

Gateway API surface

WebSocket-first RPC at ws://localhost:18789/, with HTTP fallback routes.

HTTP endpoints

Method	Path	Purpose
POST	`/hooks/{hookPath}/wake`	Trigger heartbeat or immediate agent wake. Body: `{text, mode}`.
POST	`/hooks/{hookPath}/agent`	Spawn isolated agent session. Body: `{agentId, sessionKey, message, channel, to, deliver, model, thinking, timeoutSeconds}`. Returns `{ok, runId}`. Idempotency: 60s dedup by `Authorization + X-Idempotency-Key`.
POST	`/tools/invoke`	Call a tool directly. Body: `{tool, action, args, sessionKey, dryRun}`.
GET	`/health` / `/healthz` / `/ready`	Liveness / readiness probes.
GET	`/` and `/app/*`	Built-in web control UI (the SPA we saw when probing earlier).
(plugin-defined)	Plugin-registered routes	Custom plugin HTTP endpoints; auth is enforced according to each plugin's `requiresAuth` setting.
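
As an illustration of the `/hooks/{hookPath}/agent` row, here is a minimal sketch that builds (but does not send) such a request. The field names come from the table; which fields are optional, their types, the `Content-Type` header, and the assumption that the HTTP routes share the WebSocket host and port are guesses, not confirmed by the gateway source. `buildAgentHookRequest` and the example values are hypothetical.

```typescript
// Body fields are taken from the endpoint table.
// Assumption: only agentId and message are required; the types are guesses.
interface AgentHookBody {
  agentId: string;
  message: string;
  sessionKey?: string;
  channel?: string;
  to?: string;
  deliver?: boolean;
  model?: string;
  thinking?: string;
  timeoutSeconds?: number;
}

interface HookRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: HTTP routes are served on the same host/port as the WebSocket endpoint.
const GATEWAY_HTTP_BASE = "http://localhost:18789";

function buildAgentHookRequest(
  hookPath: string,
  token: string,
  idempotencyKey: string,
  body: AgentHookBody,
): HookRequest {
  return {
    url: `${GATEWAY_HTTP_BASE}/hooks/${encodeURIComponent(hookPath)}/agent`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
      // Per the table, a repeat of the same Authorization + key within 60s is deduplicated.
      "X-Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify(body),
  };
}

// Hypothetical example values.
const agentHookRequest = buildAgentHookRequest("gitea", "example-token", "push-42", {
  agentId: "analyzer",
  message: "Review the latest push",
});
```

The resulting object can be handed to `fetch(agentHookRequest.url, agentHookRequest)`; per the table, the response should carry `{ok, runId}`.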

Authentication

Header: Authorization: Bearer <token> or X-OpenClaw-Token: <token>
Token sources: gateway.auth.token in config, the OPENCLAW_GATEWAY_TOKEN env var, or the device token at ~/.openclaw/credentials/device-token
WebSocket auth: token passed in the URL query (?token=...) or in the connect frame
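
A small sketch of how a client might pick a token and present it. The precedence order below simply follows the order in which the sources are listed; the report does not state the real precedence, so treat it as an assumption. All function names are hypothetical.

```typescript
interface TokenSources {
  configToken?: string; // gateway.auth.token
  envToken?: string;    // OPENCLAW_GATEWAY_TOKEN
  deviceToken?: string; // contents of ~/.openclaw/credentials/device-token
}

// Assumption: first non-empty source wins, in the order listed in the report.
function resolveToken(sources: TokenSources): string | undefined {
  const candidates = [sources.configToken, sources.envToken, sources.deviceToken];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}

// Either header form is accepted for HTTP requests.
function authHeaders(
  token: string,
  style: "bearer" | "custom" = "bearer",
): Record<string, string> {
  return style === "bearer"
    ? { Authorization: `Bearer ${token}` }
    : { "X-OpenClaw-Token": token };
}

// WebSocket variant: token carried in the URL query.
function webSocketUrl(token: string): string {
  return `ws://localhost:18789/?token=${encodeURIComponent(token)}`;
}
```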

RPC method RBAC scopes

READ: health, channels.status, sessions.list, cron.list, node.list, ...
WRITE: send, agent, agent.wait, wake, node.invoke, ...
ADMIN: config.set, agents.create, cron.add, sessions.reset, ...
APPROVALS, PAIRING: narrower scoped methods.
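
The scope check can be pictured as a lookup table plus a comparison. This sketch covers only the methods named above, and it assumes the three main scopes are hierarchical (ADMIN includes WRITE, which includes READ); the report does not confirm that, and the narrower APPROVALS and PAIRING scopes are left out.

```typescript
type Scope = "READ" | "WRITE" | "ADMIN";

// Only the methods named in the list above; the real tables are longer.
const METHOD_SCOPE: Record<string, Scope> = {
  "health": "READ",
  "channels.status": "READ",
  "sessions.list": "READ",
  "cron.list": "READ",
  "node.list": "READ",
  "send": "WRITE",
  "agent": "WRITE",
  "agent.wait": "WRITE",
  "wake": "WRITE",
  "node.invoke": "WRITE",
  "config.set": "ADMIN",
  "agents.create": "ADMIN",
  "cron.add": "ADMIN",
  "sessions.reset": "ADMIN",
};

// Assumption: scopes are ordered, so a higher scope implies the lower ones.
const SCOPE_RANK: Record<Scope, number> = { READ: 0, WRITE: 1, ADMIN: 2 };

function isMethodAllowed(granted: Scope, method: string): boolean {
  const required = METHOD_SCOPE[method];
  if (required === undefined) return false; // unknown methods are denied
  return SCOPE_RANK[granted] >= SCOPE_RANK[required];
}
```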

Session spawn recipe

The primary spawn path

Client RPC request → gateway dispatch → agentHandlers.agent() → agentCommandFromIngress() → in-process task

Not a child process. Sessions run as in-process tasks under the gateway. Each session's message history lives in ~/.openclaw/sessions/*.jsonl.

Agent identity & tool allowlist resolution at spawn

Resolve agent ID from params.agentId or agents.defaults.id.
Resolve tool allowlist: first match wins among agents[id].tools.allow/deny → agents[id].toolProfile → agents.defaults.tools.* → subagent role restrictions.
Hard-deny list always wins (exec.approval.*, node_invoke_system_run, etc.).
Runtime context: runtime="subagent" (sandboxed) or "acp" (host access).
Workspace and session store selected from agent's config.
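
The allowlist resolution described above (ordered layers, first match wins, hard-deny on top) can be sketched roughly as follows. This is not the code in src/agents/tool-policy*.ts. The layer shape, the deny-before-allow order inside a layer, the default-deny fallback, and every tool name except the two hard-deny entries are assumptions made for illustration.

```typescript
interface PolicyLayer {
  name: string;      // e.g. "agents[id].tools", "agents.defaults.tools"
  allow?: string[];  // glob patterns
  deny?: string[];   // glob patterns
}

// Translate a simple glob ("*" only) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function matchesAny(tool: string, patterns: string[] = []): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(tool));
}

// The two entries named in the report; the real list is longer.
const HARD_DENY = ["exec.approval.*", "node_invoke_system_run"];

function isToolAllowed(tool: string, layers: PolicyLayer[]): boolean {
  if (matchesAny(tool, HARD_DENY)) return false; // hard-deny always wins
  for (const layer of layers) {
    // Assumption: within a layer, deny is checked before allow.
    if (matchesAny(tool, layer.deny)) return false;
    if (matchesAny(tool, layer.allow)) return true;
  }
  return false; // assumption: nothing matched means denied
}

// Hypothetical layers, most specific first.
const exampleLayers: PolicyLayer[] = [
  { name: "agents[id].tools", allow: ["fs.read"], deny: ["fs.write"] },
  { name: "agents.defaults.tools", allow: ["fs.*", "exec.approval.request"] },
];
```

With these layers, `fs.write` is rejected by the agent-level deny even though the defaults allow `fs.*`, and `exec.approval.request` is rejected by the hard-deny list despite being listed in an allow.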

Subagent / ACP spawn (for nesting)

const result = await spawn({
  task: "Analyze the attached image",
  mode: "run", // either "run" or "session"
  thread: true,
  agentId: "analyzer",
});
// Returns { status, childSessionKey: "subagent:uuid", runId }

Sessions prefixed subagent:* run in a sandbox (gVisor or a Docker container); acp:* sessions run on the host under the parent's cwd. The parent sees subagent output but cannot reach into the subagent's filesystem.

Cron / heartbeat mechanism

It's not a crontab. It's an in-process scheduler built into the gateway.

Heartbeat loop

At gateway boot, startHeartbeatRunner() in src/infra/heartbeat-runner.ts starts.
For each agent where agents[id].heartbeat.enabled == true:
- Parse heartbeat.every interval
- Calculate next-due time
- Set a timer (internally a setInterval that checks wall clock every ~10s)
When timer fires:
- Read memory/heartbeat-state.json (used for dedup, to avoid double-fires)
- Read pending events from memory/system-events/ (queued by cron jobs, exec completions, etc.)
- Build a prompt from heartbeat config + pending events
- Spawn agent with extraSystemPrompt = heartbeat prompt
- Agent responds (may be empty)
- Update heartbeat state file
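
One iteration of the heartbeat check might look like the sketch below. It is a reimplementation idea, not the code in src/infra/heartbeat-runner.ts. The `"30m"`-style interval format, the `lastRunMs` field in the state file, and the prompt layout are all assumptions.

```typescript
// Assumption: heartbeat.every looks like "<number><unit>" with units s, m or h.
function parseEvery(every: string): number {
  const match = /^(\d+)\s*(s|m|h)$/.exec(every.trim());
  if (!match) throw new Error(`unsupported interval: ${every}`);
  const unitMs = { s: 1000, m: 60000, h: 3600000 }[match[2] as "s" | "m" | "h"];
  return Number(match[1]) * unitMs;
}

// Assumption: the state file records when the heartbeat last fired.
interface HeartbeatState {
  lastRunMs: number;
}

interface TickResult {
  fired: boolean;
  prompt: string | null; // would be passed as extraSystemPrompt when fired
  state: HeartbeatState; // would be written back to the state file
}

// Called from the periodic (~10s) wall-clock check.
function heartbeatTick(
  state: HeartbeatState,
  intervalMs: number,
  nowMs: number,
  basePrompt: string,
  pendingEvents: string[],
): TickResult {
  if (nowMs < state.lastRunMs + intervalMs) {
    return { fired: false, prompt: null, state }; // not due yet
  }
  const prompt = [basePrompt, ...pendingEvents.map((e) => `- ${e}`)].join("\n");
  // Recording nowMs is what prevents a double-fire on the next check.
  return { fired: true, prompt, state: { lastRunMs: nowMs } };
}
```

Keeping the tick a pure function of state, time, and pending events makes the dedup behaviour easy to test without timers.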

Cron service (parallel to heartbeat)

Class: CronService in src/cron/service.ts
Config: cron.jobs[].schedule (cron expression)
State: ~/.openclaw/memory/cron/store.json with {id, schedule, agentId, prompt, lastRunMs, nextDueMs}
Run logs: ~/.openclaw/memory/cron/runs/
Can enqueue system-events/*.json that heartbeat picks up next cycle.
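
For a reimplementation with the same semantics, the due-job selection over the store could be sketched like this. The record fields are the ones listed for store.json; treating the store as a flat array and injecting the schedule calculation (`computeNext`, hypothetical) instead of parsing cron expressions are simplifications.

```typescript
// Fields as listed for ~/.openclaw/memory/cron/store.json.
interface CronJobRecord {
  id: string;
  schedule: string; // cron expression
  agentId: string;
  prompt: string;
  lastRunMs: number;
  nextDueMs: number;
}

// Jobs whose due time has passed, oldest first. Does not mutate the store.
function dueJobs(store: CronJobRecord[], nowMs: number): CronJobRecord[] {
  return store
    .filter((job) => job.nextDueMs <= nowMs)
    .sort((a, b) => a.nextDueMs - b.nextDueMs);
}

// Hypothetical hook for a real cron-expression library.
type ComputeNext = (schedule: string, afterMs: number) => number;

// Returns the updated record to persist after a run.
function markRun(job: CronJobRecord, nowMs: number, computeNext: ComputeNext): CronJobRecord {
  return { ...job, lastRunMs: nowMs, nextDueMs: computeNext(job.schedule, nowMs) };
}
```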

Ad hoc triggers

openclaw wake --now fires heartbeat immediately
openclaw cron run <id> --force fires a cron job immediately
openclaw system-event "text" queues an event for next heartbeat

Plugin discovery and wiring

Loader

src/plugins/loader.ts → loadOpenClawPlugins():

Scan ~/.openclaw/plugins/ directory
Read each plugin's manifest (plugin.yaml or package.json exports)
Dynamic-import plugin module via jiti
Initialize PluginRuntime with sandbox context, gateway request handler, scoped filesystem access
Register plugin's hooks (lifecycle events) and gateway methods (HTTP/RPC)
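
The last step, registering lifecycle hooks, is the part most worth keeping in a slimmed-down setup. The sketch below shows the general hook-runner pattern only; `HookRegistry`, `PluginModule`, and `loadPlugins` are hypothetical names and do not reflect the actual Plugin SDK or `loadOpenClawPlugins()` signatures. Manifest reading, jiti imports, and sandboxing are omitted.

```typescript
type HookHandler = (payload: unknown) => void;

class HookRegistry {
  private handlers = new Map<string, HookHandler[]>();

  register(event: string, handler: HookHandler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // Runs every handler for the event and returns how many were called.
  run(event: string, payload: unknown): number {
    const list = this.handlers.get(event) ?? [];
    for (const handler of list) handler(payload);
    return list.length;
  }
}

// Hypothetical shape of an already-imported plugin module.
interface PluginModule {
  name: string;
  register(hooks: HookRegistry): void;
}

// Lets each plugin register its hooks; returns the names that were loaded.
function loadPlugins(modules: PluginModule[], hooks: HookRegistry): string[] {
  const loaded: string[] = [];
  for (const mod of modules) {
    mod.register(hooks);
    loaded.push(mod.name);
  }
  return loaded;
}
```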

Example: Telegram plugin

Starts a polling loop calling Telegram Bot API getUpdates()
For each incoming message, calls dispatchGatewayMethod("agent", {...}) to spawn a Claude session
Claude's response is routed back via the plugin's send handler

Replacement difficulty matrix

Component	Difficulty	Notes
Session storage (JSONL messages)	Easy	Simple file format, adopt as-is
Heartbeat scheduler	Medium	Timer logic easy; state/dedup is the work
Cron service	Medium	Schedule parsing + state persistence
Hook API (POST /hooks)	Easy	Stateless request/response
RPC / WebSocket protocol	Hard	Custom protocol with dedup, framing, RBAC
Tool policy and allowlist resolution	Medium	Glob pattern + inheritance hierarchy
Plugin system	Hard	Dynamic loading, sandboxed runtime contexts
Subagent / ACP spawn	Hard	Nesting, thread binding, runtime isolation
Delivery system (Telegram, Slack, etc.)	Hard	Multi-channel abstraction; tightly coupled
Control UI	Medium	React SPA; can be replaced if protocol stays compatible
Authentication and RBAC	Medium	Token validation + scope checks

Don't reinvent this

Session transcript storage (src/config/sessions/) — JSONL with dedup, compression, archival. Adopt.
Plugin SDK (src/plugin-sdk/) — type-safe hook runners, tool registration. Many plugins depend on it.
Tool policy resolution (src/agents/tool-policy*.ts) — battle-tested glob + inheritance. 2-3 weeks to replace.
Delivery system (src/infra/outbound/) — routes to Telegram/Slack/Discord/WhatsApp with retries and dedup. Very tightly coupled.
Exec approvals (src/infra/exec-approvals-*) — human-in-the-loop for sensitive ops. Keep if you plan approvals.
Hot-reload config (src/gateway/config-reload.ts) — atomic updates with broadcasts.

Migration path summary

To replace openclaw's orchestration while keeping agents and tools:

Adopt existing session storage (or thin DB adapter)
Keep plugin system — at minimum the hook-runner pattern for startup/shutdown
Reimplement heartbeat scheduler as a background job
Reimplement cron service with same semantics
Build your own HTTP/RPC gateway, keeping /tools/invoke signature for compatibility
Map hook API to your agent spawn endpoint
Reimplement tool policy resolution using your config schema
Adopt delivery system or build equivalent (biggest lift)

Estimated effort: 4-8 weeks for a competent team, assuming the Claude SDK agent harness stays mostly intact and the session/tool abstractions are reused.

Caret's conclusion

Full orchestration replacement is a 4-8 week project. That's NOT what I want.

What I DO want is much smaller: the specific slice that handles Gitea webhook events → policy enforcement → optional agent wake-up. That's a ~600-800 line bun listener, not a whole orchestrator. For everything else (session storage, plugin SDK, delivery system, tool policy) I either keep depending on openclaw or reuse Claude Code's native primitives (Channels plugins, CronCreate, hooks).

The research confirms the right shape: build a minimal webhook listener + event router + script fan-out that can run standalone, and wire it into Claude Code's native Channels mechanism for the judgment wake-ups. Don't try to replicate the whole orchestrator.
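
To make that shape concrete, here is a rough sketch of the event-router core: a rule table that maps a Gitea event to scripts to fan out to, plus a flag for whether an agent should be woken. The HTTP listener, signature verification, and the actual Channels wiring are left out. The rule format, script names, and repo name are hypothetical; the event name is assumed to come from Gitea's X-Gitea-Event header.

```typescript
interface GiteaEvent {
  event: string; // value of the X-Gitea-Event header, e.g. "push"
  repo: string;  // repository full name, e.g. "owner/name"
}

interface Rule {
  event: string;
  repo?: string;     // omit to match any repository
  scripts: string[]; // deterministic scripts to run
  wake: boolean;     // whether this event needs agent judgment
}

interface Decision {
  scripts: string[];
  wake: boolean;
}

// Collects the scripts of every matching rule (deduplicated, in rule order)
// and wakes an agent if any matching rule asks for it.
function routeEvent(evt: GiteaEvent, rules: Rule[]): Decision {
  const scripts: string[] = [];
  let wake = false;
  for (const rule of rules) {
    if (rule.event !== evt.event) continue;
    if (rule.repo !== undefined && rule.repo !== evt.repo) continue;
    for (const script of rule.scripts) {
      if (!scripts.includes(script)) scripts.push(script);
    }
    wake = wake || rule.wake;
  }
  return { scripts, wake };
}

// Hypothetical policy table.
const exampleRules: Rule[] = [
  { event: "push", scripts: ["lint.sh"], wake: false },
  { event: "push", repo: "sol/infra", scripts: ["lint.sh", "plan.sh"], wake: true },
  { event: "pull_request", scripts: ["label.sh"], wake: true },
];
```

Events that match no rule produce an empty decision, so the listener can acknowledge them without running anything or waking an agent.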
