Research 02 — openclaw gateway internals
Subagent: ae5ca38f70b1e9626 (Explore)
Completed: 2026-04-06 12:50 UTC
Gateway API surface
WebSocket-first RPC at ws://localhost:18789/, with HTTP fallback routes.
HTTP endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /hooks/{hookPath}/wake | Trigger heartbeat or immediate agent wake. Body: {text, mode}. |
| POST | /hooks/{hookPath}/agent | Spawn isolated agent session. Body: {agentId, sessionKey, message, channel, to, deliver, model, thinking, timeoutSeconds}. Returns {ok, runId}. Idempotency: 60s dedup by Authorization + X-Idempotency-Key. |
| POST | /tools/invoke | Call a tool directly. Body: {tool, action, args, sessionKey, dryRun}. |
| GET | /health, /healthz, /ready | Liveness / readiness probes. |
| GET | / and /app/* | Built-in web control UI (the SPA we saw when probing earlier). |
| (plugin-defined) | Plugin-registered routes | Custom plugin HTTP endpoints; auth enforced per plugin's requiresAuth. |
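As a rough illustration of the /hooks/{hookPath}/agent row, the sketch below builds (but does not send) such a request. The path, body fields, and X-Idempotency-Key header come from the table; the helper name buildAgentHookRequest, the hook path "gitea", the Content-Type header, the choice of which body fields are required, and the assumption that HTTP is served on the same port as the WebSocket are all illustrative guesses.

```typescript
// Sketch only: buildAgentHookRequest is a hypothetical helper, not part of openclaw.
// Which body fields are required is not stated in the notes; only `message` is
// treated as required here.
interface AgentHookBody {
  agentId?: string;
  sessionKey?: string;
  message: string;
  timeoutSeconds?: number;
}

interface HookRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAgentHookRequest(
  baseUrl: string,
  hookPath: string,
  token: string,
  idempotencyKey: string,
  body: AgentHookBody,
): HookRequest {
  return {
    url: `${baseUrl}/hooks/${encodeURIComponent(hookPath)}/agent`,
    method: "POST",
    headers: {
      "Content-Type": "application/json", // assumed; the notes only say the body is an object
      Authorization: `Bearer ${token}`,
      // Reusing the same key within 60s should be deduplicated by the gateway.
      "X-Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify(body),
  };
}

const exampleRequest = buildAgentHookRequest(
  "http://localhost:18789", // assumed HTTP base URL
  "gitea",
  "secret-token",
  "evt-123",
  { message: "Review PR 42" },
);
console.log(exampleRequest.url);
```

The result can be passed to fetch as fetch(r.url, { method: r.method, headers: r.headers, body: r.body }); deriving the idempotency key from the upstream event ID would make retries safe within the dedup window.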
Authentication
- Authorization: Bearer <token> OR X-OpenClaw-Token: <token> header
- Token sources: gateway.auth.token in config, OPENCLAW_GATEWAY_TOKEN env var, device token at ~/.openclaw/credentials/device-token
- WebSocket auth: passed in URL query ?token=... or connect frame
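For a replacement listener, accepting either header could look roughly like the sketch below. The two header names are from the list above; the helper name extractToken and the order in which the headers are checked are assumptions, since the notes do not say which one takes precedence.

```typescript
// Sketch only: extractToken is a hypothetical helper.
// ASSUMPTION: Authorization is checked before X-OpenClaw-Token.
function extractToken(headers: Record<string, string | undefined>): string | null {
  // HTTP header names are case-insensitive, so normalize them first.
  const lower: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value !== undefined) lower[name.toLowerCase()] = value;
  }

  const auth = lower["authorization"];
  if (auth !== undefined) {
    const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
    if (match) return match[1] ?? null;
  }

  const alt = lower["x-openclaw-token"];
  if (alt !== undefined && alt.trim() !== "") return alt.trim();
  return null;
}

console.log(extractToken({ Authorization: "Bearer abc123" }));
```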
RPC method RBAC scopes
- READ: health, channels.status, sessions.list, cron.list, node.list, ...
- WRITE: send, agent, agent.wait, wake, node.invoke, ...
- ADMIN: config.set, agents.create, cron.add, sessions.reset, ...
- APPROVALS, PAIRING: narrower scoped methods.
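A scope check over the methods listed above might be sketched as follows. The method-to-scope pairs are a subset of the list; the function name isAllowed is hypothetical, and two behaviors are assumptions: scopes are matched exactly (the notes do not say whether ADMIN implies WRITE or READ), and unknown methods are denied.

```typescript
type Scope = "READ" | "WRITE" | "ADMIN";

// Subset of the mapping from the notes; the real table is longer.
const METHOD_SCOPE = new Map<string, Scope>([
  ["health", "READ"],
  ["channels.status", "READ"],
  ["sessions.list", "READ"],
  ["send", "WRITE"],
  ["agent", "WRITE"],
  ["wake", "WRITE"],
  ["config.set", "ADMIN"],
  ["cron.add", "ADMIN"],
]);

// ASSUMPTION: exact scope match, and deny-by-default for unlisted methods.
function isAllowed(method: string, granted: ReadonlySet<Scope>): boolean {
  const required = METHOD_SCOPE.get(method);
  if (required === undefined) return false;
  return granted.has(required);
}

console.log(isAllowed("agent", new Set<Scope>(["READ", "WRITE"])));
```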
Session spawn recipe
The primary spawn path
Client RPC request → gateway dispatch → agentHandlers.agent() → agentCommandFromIngress() → in-process task
Not a child process. Sessions run as in-process tasks under the gateway. Each session's message history lives in ~/.openclaw/sessions/*.jsonl.
Agent identity & tool allowlist resolution at spawn
- Resolve agent ID from params.agentId or agents.defaults.id.
- Resolve tool allowlist: first match wins among agents[id].tools.allow/deny → agents[id].toolProfile → agents.defaults.tools.* → subagent role restrictions.
- Hard-deny list always wins (exec.approval.*, node_invoke_system_run, etc.).
- Runtime context: runtime="subagent" (sandboxed) or "acp" (host access).
- Workspace and session store selected from agent's config.
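The "first match wins, hard-deny always wins" rule above can be sketched as a walk over ordered policy layers. This is not the code in src/agents/tool-policy*.ts; the PolicyLayer shape, the tool names in the usage example, the deny-before-allow order within a layer, and the default-deny fallback are all assumptions made for illustration.

```typescript
interface PolicyLayer {
  name: string; // e.g. "agent", "toolProfile", "defaults", "subagent-role"
  allow?: string[];
  deny?: string[];
}

// Translate a simple glob (only "*" is supported here) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function matchesAny(tool: string, patterns: string[] = []): boolean {
  return patterns.some((p) => globToRegExp(p).test(tool));
}

// The two entries named in the notes; the real list has more ("etc.").
const HARD_DENY = ["exec.approval.*", "node_invoke_system_run"];

function isToolAllowed(tool: string, layers: PolicyLayer[]): boolean {
  if (matchesAny(tool, HARD_DENY)) return false; // hard-deny always wins
  for (const layer of layers) {
    // ASSUMPTION: within one layer, deny is checked before allow.
    if (matchesAny(tool, layer.deny)) return false;
    if (matchesAny(tool, layer.allow)) return true;
  }
  return false; // ASSUMPTION: nothing matched means denied
}

const layers: PolicyLayer[] = [
  { name: "agent", allow: ["fs.*"], deny: ["fs.delete"] },
  { name: "defaults", allow: ["web.search"] },
];
console.log(isToolAllowed("fs.read", layers), isToolAllowed("fs.delete", layers));
```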
Subagent / ACP spawn (for nesting)
const result = await spawn({
task: "Analyze the attached image",
mode: "run", // or "session"
thread: true,
agentId: "analyzer"
});
// Returns { status, childSessionKey: "subagent:uuid", runId }
Sessions prefixed subagent:* run in a sandbox (gVisor or Docker container). acp:* runs on host under parent's cwd. Parent sees subagent output but can't reach into its filesystem.
Cron / heartbeat mechanism
It's not a crontab. It's an in-process scheduler built into the gateway.
Heartbeat loop
- At gateway boot, startHeartbeatRunner() in src/infra/heartbeat-runner.ts starts.
- For each agent where agents[id].heartbeat.enabled == true:
  - Parse heartbeat.every interval
  - Calculate next-due time
  - Set a timer (internally a setInterval that checks wall clock every ~10s)
- When timer fires:
  - Read memory/heartbeat-state.json (for dedup / avoid double-fires)
  - Read pending memory/system-events/ (queued by cron jobs, exec completions, etc.)
  - Build a prompt from heartbeat config + pending events
  - Spawn agent with extraSystemPrompt = heartbeat prompt
  - Agent responds (may be empty)
  - Update heartbeat state file
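The timer-plus-state part of the loop above can be reduced to a small pure core, which is probably the piece worth reimplementing first. This sketch is not taken from heartbeat-runner.ts: the interval syntax ("30s", "15m", "2h"), the HeartbeatState shape, and the function names are assumptions, since the notes do not document the heartbeat.every format or the contents of heartbeat-state.json.

```typescript
// ASSUMPTION: heartbeat.every looks like "30s", "15m", or "2h".
function parseEvery(every: string): number {
  const match = /^(\d+)\s*(s|m|h)$/.exec(every.trim());
  if (!match) throw new Error(`Unrecognized interval: ${every}`);
  const unitMs = { s: 1000, m: 60000, h: 3600000 }[match[2] as "s" | "m" | "h"];
  return Number(match[1]) * unitMs;
}

// Hypothetical stand-in for what the state file would need to record.
interface HeartbeatState {
  lastRunMs: number;
}

// Wall-clock comparison, meant to be evaluated on every ~10s tick.
function isDue(state: HeartbeatState, everyMs: number, nowMs: number): boolean {
  return nowMs - state.lastRunMs >= everyMs;
}

// One tick: fire at most once per interval and return the state to persist.
// Recording lastRunMs is what keeps later ticks from double-firing.
function tick(
  state: HeartbeatState,
  everyMs: number,
  nowMs: number,
  fire: () => void,
): HeartbeatState {
  if (!isDue(state, everyMs, nowMs)) return state;
  fire();
  return { lastRunMs: nowMs };
}

console.log(parseEvery("15m"));
```

In a real runner, tick would be called from setInterval with Date.now() and a callback that builds the prompt and spawns the agent, and the returned state would be written back to disk before the next tick.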
Cron service (parallel to heartbeat)
- Class: CronService in src/cron/service.ts
- Config: cron.jobs[].schedule (cron expression)
- State: ~/.openclaw/memory/cron/store.json with {id, schedule, agentId, prompt, lastRunMs, nextDueMs}
- Run logs: ~/.openclaw/memory/cron/runs/
- Can enqueue system-events/*.json that heartbeat picks up next cycle.
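Given the store record described above, selecting and updating due jobs could be sketched as below. The field names come from the notes; the field types, the function names, and the injected computeNext callback (standing in for cron-expression parsing, which this sketch does not attempt) are assumptions.

```typescript
// Field names from the store.json record in the notes; types are assumed.
interface CronJobState {
  id: string;
  schedule: string;
  agentId: string;
  prompt: string;
  lastRunMs: number;
  nextDueMs: number;
}

// Jobs whose nextDueMs has passed, ordered so the most overdue runs first.
function dueJobs(jobs: CronJobState[], nowMs: number): CronJobState[] {
  return jobs
    .filter((job) => job.nextDueMs <= nowMs)
    .sort((a, b) => a.nextDueMs - b.nextDueMs);
}

// Record a run without mutating the input. computeNext is hypothetical and
// would wrap whatever cron-expression library is chosen.
function markRun(
  job: CronJobState,
  nowMs: number,
  computeNext: (schedule: string, fromMs: number) => number,
): CronJobState {
  return { ...job, lastRunMs: nowMs, nextDueMs: computeNext(job.schedule, nowMs) };
}

const sampleJobs: CronJobState[] = [
  { id: "a", schedule: "0 * * * *", agentId: "main", prompt: "hourly", lastRunMs: 0, nextDueMs: 500 },
  { id: "b", schedule: "0 0 * * *", agentId: "main", prompt: "daily", lastRunMs: 0, nextDueMs: 5000 },
];
console.log(dueJobs(sampleJobs, 1000).map((job) => job.id));
```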
Ad hoc triggers
- openclaw wake --now fires heartbeat immediately
- openclaw cron run <id> --force fires a cron job immediately
- openclaw system-event "text" queues an event for next heartbeat
Plugin discovery and wiring
Loader
src/plugins/loader.ts → loadOpenClawPlugins():
- Scan ~/.openclaw/plugins/ directory
- Read each plugin's manifest (plugin.yaml or package.json exports)
- Dynamic-import plugin module via jiti
- Initialize PluginRuntime with sandbox context, gateway request handler, scoped filesystem access
- Register plugin's hooks (lifecycle events) and gateway methods (HTTP/RPC)
Example: Telegram plugin
- Starts a polling loop calling Telegram Bot API getUpdates()
- For each incoming message, calls dispatchGatewayMethod("agent", {...}) to spawn a Claude session
- Claude's response routed back via plugin's send handler
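One pass of the polling flow above might look like the sketch below, with the network call and the gateway dispatch injected so the logic can be exercised in isolation. The update_id field and the "offset = last update_id + 1" convention are part of the Telegram Bot API; the pollOnce name, the injected function types, and the parameter names passed to the "agent" method (borrowed from the hook body fields) are assumptions rather than the plugin's actual code.

```typescript
// Minimal slice of a Telegram update; the real object has many more fields.
interface TelegramUpdate {
  update_id: number;
  message?: { chat: { id: number }; text?: string };
}

// Hypothetical injection points: one wraps getUpdates, the other wraps
// dispatchGatewayMethod from the notes.
type GetUpdates = (offset: number) => Promise<TelegramUpdate[]>;
type Dispatch = (method: string, params: Record<string, unknown>) => Promise<void>;

// One polling pass; returns the offset to use for the next getUpdates call.
async function pollOnce(
  offset: number,
  getUpdates: GetUpdates,
  dispatch: Dispatch,
): Promise<number> {
  let next = offset;
  for (const update of await getUpdates(offset)) {
    // Acknowledge every update, even ones we skip, so it is not redelivered.
    next = Math.max(next, update.update_id + 1);
    const msg = update.message;
    if (msg === undefined || msg.text === undefined) continue;
    // ASSUMPTION: the RPC accepts the same field names as the hook body.
    await dispatch("agent", {
      message: msg.text,
      channel: "telegram",
      to: String(msg.chat.id),
    });
  }
  return next;
}
```

A long-running plugin would call pollOnce in a loop, feeding each returned offset into the next call.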
Replacement difficulty matrix
| Component | Difficulty | Notes |
|---|---|---|
| Session storage (JSONL messages) | Easy | Simple file format, adopt as-is |
| Heartbeat scheduler | Medium | Timer logic easy; state/dedup is the work |
| Cron service | Medium | Schedule parsing + state persistence |
| Hook API (POST /hooks) | Easy | Stateless request/response |
| RPC / WebSocket protocol | Hard | Custom protocol with dedup, framing, RBAC |
| Tool policy and allowlist resolution | Medium | Glob pattern + inheritance hierarchy |
| Plugin system | Hard | Dynamic loading, sandboxed runtime contexts |
| Subagent / ACP spawn | Hard | Nesting, thread binding, runtime isolation |
| Delivery system (Telegram, Slack, etc.) | Hard | Multi-channel abstraction; tightly coupled |
| Control UI | Medium | React SPA; can be replaced if protocol stays compatible |
| Authentication and RBAC | Medium | Token validation + scope checks |
Don't reinvent this
- Session transcript storage (src/config/sessions/): JSONL with dedup, compression, archival. Adopt.
- Plugin SDK (src/plugin-sdk/): type-safe hook runners, tool registration. Many plugins depend on it.
- Tool policy resolution (src/agents/tool-policy*.ts): battle-tested glob + inheritance. 2-3 weeks to replace.
- Delivery system (src/infra/outbound/): routes to Telegram/Slack/Discord/WhatsApp with retries and dedup. Very tightly coupled.
- Exec approvals (src/infra/exec-approvals-*): human-in-the-loop for sensitive ops. Keep if you plan approvals.
- Hot-reload config (src/gateway/config-reload.ts): atomic updates with broadcasts.
Migration path summary
To replace openclaw's orchestration while keeping agents and tools:
- Adopt existing session storage (or thin DB adapter)
- Keep plugin system — at minimum the hook-runner pattern for startup/shutdown
- Reimplement heartbeat scheduler as a background job
- Reimplement cron service with same semantics
- Build your own HTTP/RPC gateway, keeping /tools/invoke signature for compatibility
- Map hook API to your agent spawn endpoint
- Reimplement tool policy resolution using your config schema
- Adopt delivery system or build equivalent (biggest lift)
Estimated effort: 4-8 weeks for a competent team, assuming the Claude SDK agent harness stays mostly intact and the session/tool abstractions are reused.
Caret's conclusion
Full orchestration replacement is a 4-8 week project. That's NOT what I want.
What I DO want is much smaller: the specific slice that handles Gitea webhook events → policy enforcement → optional agent wake-up. That's a ~600-800 line bun listener, not a whole orchestrator. Everything else (session storage, plugin SDK, delivery system, tool policy) I keep depending on openclaw for, or reuse Claude Code's native primitives (Channels plugins, CronCreate, hooks).
The research confirms the right shape: build a minimal webhook listener + event router + script fan-out that can run standalone, and wire it into Claude Code's native Channels mechanism for the judgment wake-ups. Don't try to replicate the whole orchestrator.
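To make the "event router" part of that shape concrete, here is a minimal sketch of the routing core as a pure function. X-Gitea-Event is the header Gitea sets on webhook deliveries, and action / repository.full_name are fields of its pull request payload; everything else (the Action and Rule types, the route function, the rule set, and the script name "run-checks") is hypothetical and only illustrates first-match routing into ignore, script fan-out, or agent wake.

```typescript
type Action =
  | { kind: "ignore" }
  | { kind: "script"; name: string }
  | { kind: "wake"; text: string };

interface GiteaEvent {
  event: string; // value of the X-Gitea-Event header
  payload: { action?: string; repository?: { full_name?: string } };
}

interface Rule {
  event: string;
  action?: string; // when set, must equal payload.action
  decide: (e: GiteaEvent) => Action;
}

// First matching rule wins; events with no matching rule are ignored.
function route(e: GiteaEvent, rules: Rule[]): Action {
  for (const rule of rules) {
    if (rule.event !== e.event) continue;
    if (rule.action !== undefined && rule.action !== e.payload.action) continue;
    return rule.decide(e);
  }
  return { kind: "ignore" };
}

// Illustrative policy: pushes run a script, new PRs wake an agent.
const exampleRules: Rule[] = [
  { event: "push", decide: () => ({ kind: "script", name: "run-checks" }) },
  {
    event: "pull_request",
    action: "opened",
    decide: (e) => ({
      kind: "wake",
      text: `New PR in ${e.payload.repository?.full_name ?? "unknown repo"}`,
    }),
  },
];

console.log(route({ event: "push", payload: {} }, exampleRules));
```

Keeping the router free of I/O means the HTTP listener, script runner, and wake call can each stay thin wrappers around it.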