12 KiB
Agent telemetry & live execution stream — project spec
This document captures concrete product and engineering additions discussed for Vibn: moving from poll-based session updates and in-memory jobs to a durable, ordered, push-friendly execution timeline—the web equivalent of a terminal agent’s clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
1. Why this exists
Current behavior (baseline)
| Surface | How progress reaches the user | Limits |
|---|---|---|
Agent sessions (agent_sessions) |
Runner PATCHes output, status, changed_files to Next; UI polls GET …/agent/sessions/[id]. |
Latency, reconnect story, no single ordered stream; rich semantics encoded only in text. |
Jobs (/api/agent/run, /api/jobs/:id) |
In-memory job-store (progress, toolCalls[]); UI polls job endpoint. |
Lost on restart; not shared across runner replicas; not unified with session UI. |
| Orchestrator / Atlas chat | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
Product intent
- Trust during long runs: users see what happened, when, and whether something was blocked—not only a final status.
- Differentiation: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
- Foundation for multi-agent: handoffs, child work, and safety events need a common event pipe, not ad-hoc strings.
2. Goals
- Append-only execution events with monotonic ordering (per session or per job), suitable for replay after refresh.
- Server-push to the client (recommend SSE first; WebSocket if you need bi-directional on the same channel).
- Persistence so reconnect, refresh, and horizontal scaling do not lose history.
- Single conceptual model (
AgentEvent) usable by:- Build → Agent tab (sessions),
- Job flows (create/analyze-style),
- optionally orchestrator long runs later.
- Backward compatibility during rollout: existing
PATCH+outputcan remain as a fallback or be fed from the same emitter.
Non-goals (for v1)
- Full OpenTelemetry export (optional later).
- Real-time collaborative multi-user cursors on the same session.
- Merging claude-code-fork—this spec is API + UI + persistence only.
3. Concept: AgentEvent
Core shape (suggested)
type AgentEvent = {
seq: number; // monotonic per stream (session_id or job_id)
ts: string; // ISO-8601
runId: string; // session UUID or job id — ties events to a run
runKind: 'session' | 'job';
phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';
type: AgentEventType;
payload: Record<string, unknown>; // type-specific
};
type AgentEventType =
| 'run.started'
| 'run.phase' // e.g. planning, executing, committing
| 'llm.turn.start'
| 'llm.turn.end'
| 'tool.start'
| 'tool.end'
| 'tool.output' // chunked stdout/stderr if needed
| 'safety.block' // policy / protected path / command denied
| 'file.changed' // maps to today’s changed_files semantics
| 'git.commit'
| 'deploy.triggered'
| 'deploy.status'
| 'error'
| 'run.completed'
| 'handoff' // v2: parent → child agent
| 'child_job.started' // v2: linked run id
;
Mapping from today’s session outputLine
Today (outputLine.type) |
Suggested event(s) |
|---|---|
step / info |
run.phase or llm.turn.* with summary in payload.message |
stdout / stderr |
tool.output or dedicated stream events |
error |
error + optional safety.block if policy-driven |
done |
run.completed |
Keep human-readable message on events for UI defaults; add structured fields (tool, argsSummary, durationMs) for timeline rendering and filters.
4. Architecture (high level)
flowchart LR
subgraph runner [vibn-agent-runner]
RA[runSessionAgent / runAgent]
EMIT[emitAgentEvent]
end
subgraph api [vibn-frontend Next.js]
ING[POST internal ingest or PATCH extend]
DB[(Postgres agent_events)]
SSE[SSE GET /api/.../stream]
end
subgraph browser [Browser]
UI[Timeline + live log]
end
RA --> EMIT
EMIT -->|HTTPS + secret or mTLS| ING
ING --> DB
UI -->|EventSource| SSE
SSE --> DB
Principles
- Runner remains stateless regarding “truth”: it emits events; Next + DB are the source of truth for the UI (matches today’s session model).
- Alternatively, runner could expose SSE directly—usually worse for auth, CORS, and one domain for the product. Prefer Next as SSE endpoint reading from DB.
5. Backend: vibn-agent-runner
5.1 Emit from execution paths
| Location | Action |
|---|---|
agent-session-runner.ts |
Replace or supplement patchSession output-only updates with emitAgentEvent each turn / tool / error. |
runAgent / tool loop (executeTool) |
Same emitter for job runs. |
server.ts /agent/execute |
Emit run.started after 202; run.completed / error on exit. |
Security / blocked tools (security.ts or equivalent) |
Emit safety.block with reason code (no secrets in payload). |
5.2 Transport runner → Next
Option A (recommended): extend existing PATCH or add POST /api/internal/agent-events (or per-session batch append):
- Headers:
x-agent-runner-secret(same as today’s PATCH). - Body: single event or small batch
{ events: AgentEvent[] }with server-assignedseqto avoid races.
Option B: Runner writes to Redis/Postgres directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.
5.3 Jobs store
- Short term: continue in-memory for job metadata; persist events to Postgres keyed by
jobId. - Medium term: optional Redis for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
6. Backend: vibn-frontend (Next.js)
6.1 Persistence
New table (example): agent_run_events
| Column | Notes |
|---|---|
id |
UUID |
run_id |
Session id or job id (text) |
run_kind |
'session' | 'job' |
seq |
BIGSERIAL or per-run sequence enforced with unique constraint (run_id, seq) |
project_id |
Nullable for jobs if not scoped |
event |
JSONB — full AgentEvent or { type, ts, payload } |
created_at |
default now() |
Index: (run_id, seq) for range queries (WHERE run_id = $1 AND seq > $lastSeen).
Optional: migrate legacy agent_sessions.output to be derived (last N lines for email export) or dual-write during transition.
6.2 SSE route (example contract)
GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream- Auth: session cookie / same as GET session (user must own project).
- Query:
?afterSeq=123for replay. - Response:
text/event-stream; each message:data: {JSON}\n\n. - Heartbeat comments every ~15–30s to keep proxies alive.
For jobs (if not project-scoped): GET /api/jobs/[jobId]/events/stream with appropriate auth.
6.3 Ingest route (runner-only)
POST /api/internal/agent-events(or nested under project/session as you prefer).- Validates
x-agent-runner-secret. - Inserts rows with server-generated
seq(transaction per run or advisory lock perrun_id).
7. Frontend (product UI)
7.1 Agent tab — timeline
- EventSource (SSE) subscription when session is
running; on load, fetch historical events (GET …/events?afterSeq=0or SSE from 0). - Timeline components:
- Group by
llm.turn/tool.start–tool.end. - Expandable tool args (sanitized).
- Distinct styling for
safety.blockanderror.
- Group by
- Reconnect: on
EventSourceerror, reopen withlastSeqfrom last received event.
7.2 Jobs / analyze flows
- Same timeline component keyed by
jobIdif you surface those runs in UI. - Unifies mental model: “every run has a stream.”
7.3 Deprecate slow polling
- Reduce
GET …/agent/sessions/[id]poll interval when SSE connected; keep single poll forstatus/changed_filesif those stay on session row only, or also emitfile.changedevents and drive UI from stream + one final consistency read.
8. Security & privacy
- Never put tokens, env values, or full file contents in events by default; use truncation and hashes where needed.
safety.block: log reason code + user-safe message; align withsecurity.tsbehavior.- Rate limits on ingest endpoint (per
run_id/ per IP) to avoid abuse if misconfigured.
9. Environment variables
| Variable | Where | Purpose |
|---|---|---|
AGENT_RUNNER_SECRET |
Runner + Next | Ingest / extended PATCH auth |
VIBN_API_URL |
Runner | Base URL for callbacks |
AGENT_RUNNER_URL |
Next | Start runs (unchanged) |
Add if needed:
| Variable | Purpose |
|---|---|
AGENT_EVENTS_INGEST_PATH |
Optional override for ingest URL |
SSE_MAX_BUFFER |
Cap replay batch size |
10. Phased roadmap (suggested)
Phase 1 — Foundation
- Define
AgentEventTypeScript types in a shared package or duplicated minimal types in runner + frontend. - Create
agent_run_events(or equivalent) + migration. - Implement ingest endpoint; wire runner session path to emit core events:
run.started,tool.start/tool.end,error,run.completed,file.changed. - Dual-write: keep existing
PATCHoutputLineso nothing breaks.
Phase 2 — Push
- SSE route + EventSource in Agent tab.
- Backfill UI from DB on mount; then live tail.
- Lower or gate polling on
GETsession.
Phase 3 — Jobs + durability
- Emit same events from job execution path; persist by
jobId. - Optional: replace in-memory job list with DB for multi-instance runner (later).
Phase 4 — Rich semantics
safety.blockfrom policy layer.deploy.*events if Coolify integration is user-visible.- Multi-agent:
handoff,child_job.*with links in payload.
11. Success metrics
- Time-to-first-visible-step after Run < 1s p95 (SSE).
- After hard refresh mid-run, user sees consistent history (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
12. Related code (repo anchors)
Use these when implementing:
- Runner session loop + PATCH bridge:
vibn-agent-runner/src/agent-session-runner.ts - Runner HTTP:
vibn-agent-runner/src/server.ts(/agent/execute,/agent/stop,/agent/approve,/api/agent/run,/api/jobs/:id) - In-memory jobs:
vibn-agent-runner/src/job-store.ts - Next session API + runner callback:
vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts - Session create + fire-and-forget execute:
vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts
13. Open decisions
- Single table for sessions + jobs vs two tables (simpler queries vs flexibility).
- Seq generation: DB sequence per
run_idvs global monotonic with(run_id, seq)composite only in app logic. - Idempotency: runner retries may duplicate events—use
event_idUUID from runner for dedupe on ingest. - Orchestrator chat: treat as v2 unless you need a COO run timeline immediately.
Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.