# Agent telemetry & live execution stream — project spec This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agent’s clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals). --- ## 1. Why this exists ### Current behavior (baseline) | Surface | How progress reaches the user | Limits | |--------|------------------------------|--------| | **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. | | **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. | | **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. | ### Product intent - **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status. - **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs. - **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings. --- ## 2. Goals 1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh. 2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel). 3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history. 4. **Single conceptual model** (`AgentEvent`) usable by: - Build → **Agent** tab (sessions), - **Job** flows (create/analyze-style), - optionally **orchestrator** long runs later. 5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter. ### Non-goals (for v1) - Full **OpenTelemetry** export (optional later). - **Real-time collaborative** multi-user cursors on the same session. - Merging **claude-code-fork**—this spec is **API + UI + persistence** only. --- ## 3. Concept: `AgentEvent` ### Core shape (suggested) ```ts type AgentEvent = { seq: number; // monotonic per stream (session_id or job_id) ts: string; // ISO-8601 runId: string; // session UUID or job id — ties events to a run runKind: 'session' | 'job'; phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped'; type: AgentEventType; payload: Record; // type-specific }; type AgentEventType = | 'run.started' | 'run.phase' // e.g. planning, executing, committing | 'llm.turn.start' | 'llm.turn.end' | 'tool.start' | 'tool.end' | 'tool.output' // chunked stdout/stderr if needed | 'safety.block' // policy / protected path / command denied | 'file.changed' // maps to today’s changed_files semantics | 'git.commit' | 'deploy.triggered' | 'deploy.status' | 'error' | 'run.completed' | 'handoff' // v2: parent → child agent | 'child_job.started' // v2: linked run id ; ``` ### Mapping from today’s session `outputLine` | Today (`outputLine.type`) | Suggested event(s) | |---------------------------|--------------------| | `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` | | `stdout` / `stderr` | `tool.output` or dedicated stream events | | `error` | `error` + optional `safety.block` if policy-driven | | `done` | `run.completed` | Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters. --- ## 4. Architecture (high level) ```mermaid flowchart LR subgraph runner [vibn-agent-runner] RA[runSessionAgent / runAgent] EMIT[emitAgentEvent] end subgraph api [vibn-frontend Next.js] ING[POST internal ingest or PATCH extend] DB[(Postgres agent_events)] SSE[SSE GET /api/.../stream] end subgraph browser [Browser] UI[Timeline + live log] end RA --> EMIT EMIT -->|HTTPS + secret or mTLS| ING ING --> DB UI -->|EventSource| SSE SSE --> DB ``` **Principles** - **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today’s session model). - Alternatively, runner could expose **SSE directly**—usually worse for **auth**, **CORS**, and **one domain** for the product. Prefer **Next as SSE endpoint** reading from DB. --- ## 5. Backend: `vibn-agent-runner` ### 5.1 Emit from execution paths | Location | Action | |----------|--------| | `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. | | `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. | | `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. | | Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). | ### 5.2 Transport runner → Next **Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append): - Headers: `x-agent-runner-secret` (same as today’s PATCH). - Body: single event or small batch `{ events: AgentEvent[] }` with server-assigned `seq` to avoid races. **Option B:** Runner writes to **Redis/Postgres** directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL. ### 5.3 Jobs store - **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`. - **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck). --- ## 6. Backend: `vibn-frontend` (Next.js) ### 6.1 Persistence **New table (example): `agent_run_events`** | Column | Notes | |--------|--------| | `id` | UUID | | `run_id` | Session id or job id (text) | | `run_kind` | `'session' \| 'job'` | | `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` | | `project_id` | Nullable for jobs if not scoped | | `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` | | `created_at` | default now() | Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`). **Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition. ### 6.2 SSE route (example contract) - **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`** - Auth: session cookie / same as GET session (user must own project). - Query: `?afterSeq=123` for replay. - Response: `text/event-stream`; each message: `data: {JSON}\n\n`. - Heartbeat comments every ~15–30s to keep proxies alive. For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth. ### 6.3 Ingest route (runner-only) - **`POST /api/internal/agent-events`** (or nested under project/session as you prefer). - Validates `x-agent-runner-secret`. - Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`). --- ## 7. Frontend (product UI) ### 7.1 Agent tab — timeline - **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0). - **Timeline components**: - Group by `llm.turn` / `tool.start`–`tool.end`. - Expandable tool args (sanitized). - Distinct styling for `safety.block` and `error`. - **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event. ### 7.2 Jobs / analyze flows - Same timeline component keyed by `jobId` if you surface those runs in UI. - Unifies mental model: “every run has a stream.” ### 7.3 Deprecate slow polling - Reduce `GET …/agent/sessions/[id]` poll interval when SSE connected; keep **single poll** for `status` / `changed_files` if those stay on session row only, or **also** emit `file.changed` events and drive UI from stream + one final consistency read. --- ## 8. Security & privacy - **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed. - **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior. - **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured. --- ## 9. Environment variables | Variable | Where | Purpose | |----------|--------|---------| | `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth | | `VIBN_API_URL` | Runner | Base URL for callbacks | | `AGENT_RUNNER_URL` | Next | Start runs (unchanged) | Add if needed: | Variable | Purpose | |----------|---------| | `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL | | `SSE_MAX_BUFFER` | Cap replay batch size | --- ## 10. Phased roadmap (suggested) ### Phase 1 — Foundation - [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend. - [ ] Create `agent_run_events` (or equivalent) + migration. - [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`. - [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks. ### Phase 2 — Push - [ ] SSE route + **EventSource** in Agent tab. - [ ] Backfill UI from DB on mount; then live tail. - [ ] Lower or gate polling on `GET` session. ### Phase 3 — Jobs + durability - [ ] Emit same events from **job** execution path; persist by `jobId`. - [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later). ### Phase 4 — Rich semantics - [ ] `safety.block` from policy layer. - [ ] `deploy.*` events if Coolify integration is user-visible. - [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload. --- ## 11. Success metrics - Time-to-first-visible-step after **Run** < **1s** p95 (SSE). - After hard refresh mid-run, user sees **consistent history** (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later). - Support tickets / confusion drops on “what is the agent doing?” (qualitative). --- ## 12. Related code (repo anchors) Use these when implementing: - Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts` - Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`) - In-memory jobs: `vibn-agent-runner/src/job-store.ts` - Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts` - Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts` --- ## 13. Open decisions 1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility). 2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic. 3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest. 4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately. --- *Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*