Research 02 — openclaw gateway internals

Subagent: ae5ca38f70b1e9626 (Explore)
Completed: 2026-04-06 12:50 UTC

Gateway API surface

WebSocket-first RPC at ws://localhost:18789/, with HTTP fallback routes.

HTTP endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | /hooks/{hookPath}/wake | Trigger heartbeat or immediate agent wake. Body: {text, mode}. |
| POST | /hooks/{hookPath}/agent | Spawn isolated agent session. Body: {agentId, sessionKey, message, channel, to, deliver, model, thinking, timeoutSeconds}. Returns {ok, runId}. Idempotency: 60s dedup by Authorization + X-Idempotency-Key. |
| POST | /tools/invoke | Call a tool directly. Body: {tool, action, args, sessionKey, dryRun}. |
| GET | /health, /healthz, /ready | Liveness / readiness probes. |
| GET | / and /app/* | Built-in web control UI (the SPA we saw when probing earlier). |
|  | Plugin-registered routes | Custom plugin HTTP endpoints; auth enforced per plugin's requiresAuth. |
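
The 60s dedup on /hooks/{hookPath}/agent could be reproduced with a small in-memory window like the one below. This is a minimal sketch, not openclaw's code: the class name IdempotencyCache is hypothetical, and returning the original runId for a repeated request is an assumption (the gateway might instead reject duplicates).

```typescript
// Sketch of a 60 s idempotency window. IdempotencyCache is a hypothetical name,
// and reusing the first runId for a repeat request is an assumption.
type CachedRun = { runId: string; expiresAtMs: number };

class IdempotencyCache {
  private readonly entries = new Map<string, CachedRun>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60000) {
    this.ttlMs = ttlMs;
  }

  // Returns the runId to report: the cached one for a repeat, otherwise newRunId.
  resolve(authorization: string, idempotencyKey: string, newRunId: string, nowMs: number): string {
    // Key on both values so two callers can use the same idempotency key.
    const key = JSON.stringify([authorization, idempotencyKey]);
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAtMs > nowMs) {
      return hit.runId;
    }
    this.entries.set(key, { runId: newRunId, expiresAtMs: nowMs + this.ttlMs });
    return newRunId;
  }
}
```

Passing nowMs in explicitly keeps the window testable. A long-running listener would also need to purge expired entries, which this sketch leaves out.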

Authentication

  • Authorization: Bearer <token> OR X-OpenClaw-Token: <token> header
  • Token sources: gateway.auth.token in config, OPENCLAW_GATEWAY_TOKEN env var, device token at ~/.openclaw/credentials/device-token
  • WebSocket auth: token passed in the URL query (?token=...) or in the connect frame
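
A replacement listener that accepts the same two headers might extract the token as follows. This is a sketch under assumptions: extractToken is a hypothetical helper, and preferring Authorization when both headers are present is a guess, since the notes do not say which one wins.

```typescript
// Hypothetical helper: pull the gateway token from either supported header.
// Assumption: Authorization takes precedence when both headers are present.
function extractToken(headers: Record<string, string>): string | null {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const lower = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    lower.set(name.toLowerCase(), value);
  }

  const auth = lower.get("authorization");
  if (auth !== undefined) {
    const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
    if (match !== null) {
      return match[1] ?? null;
    }
  }

  const custom = lower.get("x-openclaw-token");
  if (custom !== undefined && custom.trim() !== "") {
    return custom.trim();
  }
  return null;
}
```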

RPC method RBAC scopes

  • READ: health, channels.status, sessions.list, cron.list, node.list, ...
  • WRITE: send, agent, agent.wait, wake, node.invoke, ...
  • ADMIN: config.set, agents.create, cron.add, sessions.reset, ...
  • APPROVALS, PAIRING: narrower scoped methods.
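
The scope check itself can be pictured as a lookup from method name to required scope. The sketch below covers only the methods named above (the elided "..." entries stay elided). It assumes scopes are flat, so ADMIN does not imply WRITE or READ, and that unknown methods are denied. Neither assumption is confirmed by the source, and isAllowed is a hypothetical name.

```typescript
type Scope = "READ" | "WRITE" | "ADMIN";

// Only the methods listed in these notes; the real table is longer.
const REQUIRED_SCOPE = new Map<string, Scope>([
  ["health", "READ"],
  ["channels.status", "READ"],
  ["sessions.list", "READ"],
  ["cron.list", "READ"],
  ["node.list", "READ"],
  ["send", "WRITE"],
  ["agent", "WRITE"],
  ["agent.wait", "WRITE"],
  ["wake", "WRITE"],
  ["node.invoke", "WRITE"],
  ["config.set", "ADMIN"],
  ["agents.create", "ADMIN"],
  ["cron.add", "ADMIN"],
  ["sessions.reset", "ADMIN"],
]);

// Assumption: scopes are flat, and methods missing from the table are denied.
function isAllowed(method: string, granted: ReadonlySet<Scope>): boolean {
  const required = REQUIRED_SCOPE.get(method);
  if (required === undefined) {
    return false;
  }
  return granted.has(required);
}
```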

Session spawn recipe

The primary spawn path

Client RPC request → gateway dispatch → agentHandlers.agent() → agentCommandFromIngress() → in-process task

Not a child process. Sessions run as in-process tasks under the gateway. Each session's message history lives in ~/.openclaw/sessions/*.jsonl.

Agent identity & tool allowlist resolution at spawn

  1. Resolve agent ID from params.agentId or agents.defaults.id.
  2. Resolve tool allowlist: first match wins among agents[id].tools.allow/deny → agents[id].toolProfile → agents.defaults.tools.* → subagent role restrictions.
  3. Hard-deny list always wins (exec.approval.*, node_invoke_system_run, etc.).
  4. Runtime context: runtime="subagent" (sandboxed) or "acp" (host access).
  5. Workspace and session store selected from agent's config.
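
Steps 2 and 3 can be sketched as a layered lookup with a hard-deny check in front. This is an illustration under stated assumptions, not the logic in src/agents/tool-policy*.ts. "First match wins" is read as "the first configured layer decides", each layer (including a tool profile) is assumed to be already resolved to allow/deny globs, and only `*` wildcards are supported. ToolPolicy, HARD_DENY and isToolAllowed are hypothetical names.

```typescript
type ToolPolicy = { allow?: string[]; deny?: string[] };

// Convert a simple glob such as "exec.approval.*" into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function matchesAny(tool: string, globs: string[]): boolean {
  return globs.some((g) => globToRegExp(g).test(tool));
}

// The two hard-deny entries named in the notes; the real list is longer ("etc.").
const HARD_DENY = ["exec.approval.*", "node_invoke_system_run"];

// layers are ordered most specific first; undefined means "not configured".
function isToolAllowed(tool: string, layers: Array<ToolPolicy | undefined>): boolean {
  if (matchesAny(tool, HARD_DENY)) {
    return false; // hard deny always wins
  }
  const policy = layers.find((layer) => layer !== undefined);
  if (policy === undefined) {
    return false; // nothing configured: deny by default (assumption)
  }
  if (matchesAny(tool, policy.deny ?? [])) {
    return false;
  }
  return matchesAny(tool, policy.allow ?? []);
}
```

With layers `[undefined, { allow: ["fs.*"], deny: ["fs.delete"] }, { allow: ["*"] }]`, the second layer decides: `fs.read` is allowed, while `fs.delete` and `web.fetch` are not, because the defaults layer is never consulted.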

Subagent / ACP spawn (for nesting)

const result = await spawn({
  task: "Analyze the attached image",
  mode: "run",          // or "session"
  thread: true,
  agentId: "analyzer",
});
// Returns { status, childSessionKey: "subagent:uuid", runId }

Sessions prefixed subagent:* run in a sandbox (gVisor or Docker container). Sessions prefixed acp:* run on the host under the parent's cwd. The parent sees subagent output but can't reach into the subagent's filesystem.
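
Because the runtime is encoded in the session key prefix, a router can classify a child session without extra state. This is a small sketch; runtimeForSessionKey is a hypothetical helper, and returning null for any other key shape is an assumption.

```typescript
type Runtime = "subagent" | "acp";

// Classify a child session by its key prefix, as described above.
function runtimeForSessionKey(sessionKey: string): Runtime | null {
  if (sessionKey.startsWith("subagent:")) {
    return "subagent"; // sandboxed
  }
  if (sessionKey.startsWith("acp:")) {
    return "acp"; // host access under the parent's cwd
  }
  return null; // top-level session or unknown key shape (assumption)
}
```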

Cron / heartbeat mechanism

It's not a crontab. It's an in-process scheduler built into the gateway.

Heartbeat loop

  1. At gateway boot, startHeartbeatRunner() in src/infra/heartbeat-runner.ts starts.
  2. For each agent where agents[id].heartbeat.enabled == true:
    • Parse heartbeat.every interval
    • Calculate next-due time
    • Set a timer (internally a setInterval that checks wall clock every ~10s)
  3. When timer fires:
    • Read memory/heartbeat-state.json (used for dedup, to avoid double-fires)
    • Read pending events from memory/system-events/ (queued by cron jobs, exec completions, etc.)
    • Build a prompt from heartbeat config + pending events
    • Spawn agent with extraSystemPrompt = heartbeat prompt
    • Agent responds (may be empty)
    • Update heartbeat state file
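
The timer logic above reduces to "parse the interval, then compare wall-clock time on each tick". The sketch below shows that core under loud assumptions: the `<number><unit>` format for heartbeat.every and the lastRunMs field in the state are guesses, and parseEvery and heartbeatTick are hypothetical names rather than functions from src/infra/heartbeat-runner.ts.

```typescript
// Assumed format for heartbeat.every: "<number><unit>" with unit s, m, or h.
function parseEvery(every: string): number {
  const match = /^(\d+)([smh])$/.exec(every.trim());
  if (match === null) {
    throw new Error(`unsupported interval: ${every}`);
  }
  const [, count = "", unit = ""] = match;
  const unitMs = unit === "s" ? 1000 : unit === "m" ? 60000 : 3600000;
  return Number(count) * unitMs;
}

// Assumed shape of the persisted heartbeat state.
type HeartbeatState = { lastRunMs: number };

// Called on every wall-clock check (roughly every 10 s in the notes).
// Fires at most once per interval and returns the state to persist.
function heartbeatTick(
  state: HeartbeatState,
  everyMs: number,
  nowMs: number,
  fire: () => void,
): HeartbeatState {
  if (nowMs - state.lastRunMs < everyMs) {
    return state; // not due yet
  }
  fire();
  return { lastRunMs: nowMs }; // recording the run is what prevents double-fires
}
```

In a real runner, heartbeatTick would be driven by setInterval, with the returned state written back to the state file after each fire.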

Cron service (parallel to heartbeat)

  • Class: CronService in src/cron/service.ts
  • Config: cron.jobs[].schedule (cron expression)
  • State: ~/.openclaw/memory/cron/store.json with {id, schedule, agentId, prompt, lastRunMs, nextDueMs}
  • Run logs: ~/.openclaw/memory/cron/runs/
  • Can enqueue system-events/*.json that heartbeat picks up next cycle.
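
Given the store fields listed above, the due-check pass might look like the sketch below. It is an assumption-laden illustration, not CronService itself: cron-expression parsing is deliberately left to an injected nextAfter function (any cron library could fill that role), and runDueJobs is a hypothetical name.

```typescript
// Record shape taken from the store.json fields listed above.
type CronJob = {
  id: string;
  schedule: string;
  agentId: string;
  prompt: string;
  lastRunMs: number;
  nextDueMs: number;
};

// Computes the next fire time for a cron expression; supplied by the caller.
type NextAfter = (schedule: string, afterMs: number) => number;

// Runs every job whose nextDueMs has passed and returns the updated store contents.
function runDueJobs(
  jobs: CronJob[],
  nowMs: number,
  nextAfter: NextAfter,
  run: (job: CronJob) => void,
): CronJob[] {
  return jobs.map((job) => {
    if (job.nextDueMs > nowMs) {
      return job; // not due yet, leave untouched
    }
    run(job);
    return { ...job, lastRunMs: nowMs, nextDueMs: nextAfter(job.schedule, nowMs) };
  });
}
```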

Ad hoc triggers

  • openclaw wake --now fires heartbeat immediately
  • openclaw cron run <id> --force fires a cron job immediately
  • openclaw system-event "text" queues an event for next heartbeat

Plugin discovery and wiring

Loader

src/plugins/loader.ts → loadOpenClawPlugins():

  1. Scan ~/.openclaw/plugins/ directory
  2. Read each plugin's manifest (plugin.yaml or package.json exports)
  3. Dynamic-import plugin module via jiti
  4. Initialize PluginRuntime with sandbox context, gateway request handler, scoped filesystem access
  5. Register plugin's hooks (lifecycle events) and gateway methods (HTTP/RPC)
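
Step 5 is the part most worth keeping in a replacement: a registry that holds lifecycle hooks and named gateway methods. The sketch below illustrates that pattern only. PluginRegistry and its method names are hypothetical and do not reflect the real PluginRuntime or plugin SDK surface.

```typescript
type Handler = (payload: unknown) => unknown;

// Hypothetical registry: many handlers per lifecycle hook, one handler per method.
class PluginRegistry {
  private readonly hooks = new Map<string, Handler[]>();
  private readonly methods = new Map<string, Handler>();

  registerHook(event: string, handler: Handler): void {
    const list = this.hooks.get(event) ?? [];
    list.push(handler);
    this.hooks.set(event, list);
  }

  registerMethod(name: string, handler: Handler): void {
    if (this.methods.has(name)) {
      throw new Error(`method already registered: ${name}`);
    }
    this.methods.set(name, handler);
  }

  // Runs every handler for an event in registration order.
  runHook(event: string, payload: unknown): unknown[] {
    return (this.hooks.get(event) ?? []).map((handler) => handler(payload));
  }

  dispatch(name: string, payload: unknown): unknown {
    const handler = this.methods.get(name);
    if (handler === undefined) {
      throw new Error(`unknown method: ${name}`);
    }
    return handler(payload);
  }
}
```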

Example: Telegram plugin

  • Starts a polling loop calling Telegram Bot API getUpdates()
  • For each incoming message, calls dispatchGatewayMethod("agent", {...}) to spawn a Claude session
  • Claude's response routed back via plugin's send handler

Replacement difficulty matrix

| Component | Difficulty | Notes |
|---|---|---|
| Session storage (JSONL messages) | Easy | Simple file format, adopt as-is |
| Heartbeat scheduler | Medium | Timer logic easy; state/dedup is the work |
| Cron service | Medium | Schedule parsing + state persistence |
| Hook API (POST /hooks) | Easy | Stateless request/response |
| RPC / WebSocket protocol | Hard | Custom protocol with dedup, framing, RBAC |
| Tool policy and allowlist resolution | Medium | Glob pattern + inheritance hierarchy |
| Plugin system | Hard | Dynamic loading, sandboxed runtime contexts |
| Subagent / ACP spawn | Hard | Nesting, thread binding, runtime isolation |
| Delivery system (Telegram, Slack, etc.) | Hard | Multi-channel abstraction; tightly coupled |
| Control UI | Medium | React SPA; can be replaced if protocol stays compatible |
| Authentication and RBAC | Medium | Token validation + scope checks |

Don't reinvent this

  1. Session transcript storage (src/config/sessions/) — JSONL with dedup, compression, archival. Adopt.
  2. Plugin SDK (src/plugin-sdk/) — type-safe hook runners, tool registration. Many plugins depend on it.
  3. Tool policy resolution (src/agents/tool-policy*.ts) — battle-tested glob + inheritance. 2-3 weeks to replace.
  4. Delivery system (src/infra/outbound/) — routes to Telegram/Slack/Discord/WhatsApp with retries and dedup. Very tightly coupled.
  5. Exec approvals (src/infra/exec-approvals-*) — human-in-the-loop for sensitive ops. Keep if you plan approvals.
  6. Hot-reload config (src/gateway/config-reload.ts) — atomic updates with broadcasts.

Migration path summary

To replace openclaw's orchestration while keeping agents and tools:

  1. Adopt existing session storage (or thin DB adapter)
  2. Keep plugin system — at minimum the hook-runner pattern for startup/shutdown
  3. Reimplement heartbeat scheduler as a background job
  4. Reimplement cron service with same semantics
  5. Build your own HTTP/RPC gateway, keeping /tools/invoke signature for compatibility
  6. Map hook API to your agent spawn endpoint
  7. Reimplement tool policy resolution using your config schema
  8. Adopt delivery system or build equivalent (biggest lift)

Estimated effort: 4-8 weeks for a competent team, assuming the Claude SDK agent harness stays mostly intact and the session/tool abstractions are reused.

Caret's conclusion

Full orchestration replacement is a 4-8 week project. That's NOT what I want.

What I DO want is much smaller: the specific slice that handles Gitea webhook events → policy enforcement → optional agent wake-up. That's a ~600-800 line bun listener, not a whole orchestrator. For everything else (session storage, plugin SDK, delivery system, tool policy) I either keep depending on openclaw or reuse Claude Code's native primitives (Channels plugins, CronCreate, hooks).

The research confirms the right shape: build a minimal webhook listener + event router + script fan-out that can run standalone, and wire it into Claude Code's native Channels mechanism for the judgment wake-ups. Don't try to replicate the whole orchestrator.
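
The event-router piece of that shape could start as a pure function from (event type, payload) to a list of actions, with the HTTP listener and script runner wrapped around it. This is only a sketch of the idea: Action, Rule and routeEvent are hypothetical names, the event type is assumed to come from Gitea's X-Gitea-Event header, and the rule contents are placeholders.

```typescript
// What the router can ask for: run a policy script, or wake an agent for judgment.
type Action =
  | { kind: "script"; name: string }
  | { kind: "wake"; reason: string };

type Rule = {
  event: string; // value of the X-Gitea-Event header, e.g. "push"
  decide: (payload: Record<string, unknown>) => Action[];
};

// Collects the actions of every rule registered for this event type.
function routeEvent(
  event: string,
  payload: Record<string, unknown>,
  rules: Rule[],
): Action[] {
  const actions: Action[] = [];
  for (const rule of rules) {
    if (rule.event === event) {
      actions.push(...rule.decide(payload));
    }
  }
  return actions; // an empty list means the event is ignored
}
```

Keeping the router free of I/O means policy rules can be unit-tested without a running Gitea or agent. Only "wake" actions need to reach the Channels mechanism.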