Agent telemetry & live execution stream — project spec

This document captures concrete product and engineering additions discussed for Vibn: moving from poll-based session updates and in-memory jobs to a durable, ordered, push-friendly execution timeline. The aim is the web equivalent of a terminal agent's clarity: step-by-step visibility, tool boundaries, failures, and later multi-agent signals.


1. Why this exists

Current behavior (baseline)

| Surface | How progress reaches the user | Limits |
| --- | --- | --- |
| Agent sessions (agent_sessions) | Runner PATCHes output, status, changed_files to Next; UI polls GET …/agent/sessions/[id]. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in text. |
| Jobs (/api/agent/run, /api/jobs/:id) | In-memory job-store (progress, toolCalls[]); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| Orchestrator / Atlas chat | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |

Product intent

  • Trust during long runs: users see what happened, when, and whether something was blocked—not only a final status.
  • Differentiation: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
  • Foundation for multi-agent: handoffs, child work, and safety events need a common event pipe, not ad-hoc strings.

2. Goals

  1. Append-only execution events with monotonic ordering (per session or per job), suitable for replay after refresh.
  2. Server-push to the client (recommend SSE first; WebSocket if you need bi-directional on the same channel).
  3. Persistence so reconnect, refresh, and horizontal scaling do not lose history.
  4. Single conceptual model (AgentEvent) usable by:
    • Build → Agent tab (sessions),
    • Job flows (create/analyze-style),
    • optionally orchestrator long runs later.
  5. Backward compatibility during rollout: existing PATCH + output can remain as a fallback or be fed from the same emitter.

Non-goals (for v1)

  • Full OpenTelemetry export (optional later).
  • Real-time collaborative multi-user cursors on the same session.
  • Merging claude-code-fork—this spec is API + UI + persistence only.

3. Concept: AgentEvent

Core shape (suggested)

type AgentEvent = {
  seq: number;           // monotonic per stream (session_id or job_id)
  ts: string;            // ISO-8601
  runId: string;         // session UUID or job id — ties events to a run
  runKind: 'session' | 'job';
  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';

  type: AgentEventType;
  payload: Record<string, unknown>;  // type-specific
};

type AgentEventType =
  | 'run.started'
  | 'run.phase'              // e.g. planning, executing, committing
  | 'llm.turn.start'
  | 'llm.turn.end'
  | 'tool.start'
  | 'tool.end'
  | 'tool.output'            // chunked stdout/stderr if needed
  | 'safety.block'           // policy / protected path / command denied
  | 'file.changed'           // maps to today's changed_files semantics
  | 'git.commit'
  | 'deploy.triggered'
  | 'deploy.status'
  | 'error'
  | 'run.completed'
  | 'handoff'                // v2: parent → child agent
  | 'child_job.started'      // v2: linked run id
  ;
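
Because events cross a process boundary (runner to Next), a runtime check on the shape above may be useful at the ingest edge. The following is a minimal sketch, not part of the spec: the name `isAgentEvent` and the exact validation rules (integer `seq`, parseable `ts`) are assumptions.

```typescript
// Hypothetical runtime guard for the AgentEvent shape; the spec only defines static types.
const AGENT_EVENT_TYPES = [
  'run.started', 'run.phase', 'llm.turn.start', 'llm.turn.end',
  'tool.start', 'tool.end', 'tool.output', 'safety.block',
  'file.changed', 'git.commit', 'deploy.triggered', 'deploy.status',
  'error', 'run.completed', 'handoff', 'child_job.started',
] as const;
type AgentEventType = (typeof AGENT_EVENT_TYPES)[number];

const PHASES = ['queued', 'running', 'completed', 'failed', 'stopped'] as const;

type AgentEvent = {
  seq: number;
  ts: string;
  runId: string;
  runKind: 'session' | 'job';
  phase: (typeof PHASES)[number];
  type: AgentEventType;
  payload: Record<string, unknown>;
};

function isAgentEvent(value: unknown): value is AgentEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    // Assumed rule: seq is a non-negative integer.
    typeof v.seq === 'number' && Number.isInteger(v.seq) && v.seq >= 0 &&
    // Assumed rule: ts must be parseable as a date.
    typeof v.ts === 'string' && !Number.isNaN(Date.parse(v.ts)) &&
    typeof v.runId === 'string' && v.runId.length > 0 &&
    (v.runKind === 'session' || v.runKind === 'job') &&
    (PHASES as readonly unknown[]).indexOf(v.phase) !== -1 &&
    (AGENT_EVENT_TYPES as readonly unknown[]).indexOf(v.type) !== -1 &&
    typeof v.payload === 'object' && v.payload !== null && !Array.isArray(v.payload)
  );
}
```

Deriving `AgentEventType` from a constant array keeps the static type and the runtime allow-list in one place, so adding a v2 event type is a one-line change.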

Mapping from today's session outputLine

| Today (outputLine.type) | Suggested event(s) |
| --- | --- |
| step / info | run.phase or llm.turn.* with summary in payload.message |
| stdout / stderr | tool.output or dedicated stream events |
| error | error + optional safety.block if policy-driven |
| done | run.completed |

Keep human-readable message on events for UI defaults; add structured fields (tool, argsSummary, durationMs) for timeline rendering and filters.
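
The mapping above can be sketched as a small pure function, which is convenient during the dual-write period. This is illustrative only: the `text` field on `OutputLine` and the `stream` payload field are assumptions, and `step` / `info` are mapped to `run.phase` here for simplicity even though `llm.turn.*` is an equally valid target.

```typescript
// Assumed shape of today's output line; only `type` is named in the spec.
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string; // hypothetical field name for the line content
};

// Draft event without seq/ts/runId, which the emitter and server would add.
type EventDraft = {
  type: 'run.phase' | 'tool.output' | 'error' | 'run.completed';
  payload: { message: string; stream?: 'stdout' | 'stderr' };
};

function outputLineToEvent(line: OutputLine): EventDraft {
  switch (line.type) {
    case 'step':
    case 'info':
      return { type: 'run.phase', payload: { message: line.text } };
    case 'stdout':
    case 'stderr':
      // Keep the stream name so the UI can style stderr differently.
      return { type: 'tool.output', payload: { message: line.text, stream: line.type } };
    case 'error':
      return { type: 'error', payload: { message: line.text } };
    case 'done':
      return { type: 'run.completed', payload: { message: line.text } };
  }
}
```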


4. Architecture (high level)

flowchart LR
  subgraph runner [vibn-agent-runner]
    RA[runSessionAgent / runAgent]
    EMIT[emitAgentEvent]
  end
  subgraph api [vibn-frontend Next.js]
    ING[POST internal ingest or PATCH extend]
    DB[(Postgres agent_events)]
    SSE[SSE GET /api/.../stream]
  end
  subgraph browser [Browser]
    UI[Timeline + live log]
  end
  RA --> EMIT
  EMIT -->|HTTPS + secret or mTLS| ING
  ING --> DB
  UI -->|EventSource| SSE
  SSE --> DB

Principles

  • Runner remains stateless regarding “truth”: it emits events; Next + DB are the source of truth for the UI (matches today's session model).
  • Alternatively, the runner could expose SSE directly, but that is usually worse for auth, CORS, and keeping the product on one domain. Prefer Next as the SSE endpoint, reading from the DB.

5. Backend: vibn-agent-runner

5.1 Emit from execution paths

| Location | Action |
| --- | --- |
| agent-session-runner.ts | Replace or supplement patchSession output-only updates with emitAgentEvent each turn / tool / error. |
| runAgent / tool loop (executeTool) | Same emitter for job runs. |
| server.ts /agent/execute | Emit run.started after 202; run.completed / error on exit. |
| Security / blocked tools (security.ts or equivalent) | Emit safety.block with reason code (no secrets in payload). |
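
One way to get consistent `tool.start` / `tool.end` pairs from every execution path is to wrap tool calls in a single helper. The sketch below assumes an injected `emit` callback standing in for `emitAgentEvent`; the helper name `withToolEvents`, the `ok` flag, and the injectable clock are hypothetical and not taken from the runner code.

```typescript
type EmittedEvent = {
  type: 'tool.start' | 'tool.end' | 'error';
  payload: Record<string, unknown>;
};
type Emit = (event: EmittedEvent) => void;

// Wraps one tool invocation so start/end events are always emitted as a pair.
async function withToolEvents<T>(
  emit: Emit,
  tool: string,
  argsSummary: string, // pre-sanitized summary, never raw arguments
  run: () => Promise<T>,
  now: () => number = Date.now, // injectable clock for tests
): Promise<T> {
  const startedAt = now();
  emit({ type: 'tool.start', payload: { tool, argsSummary } });
  try {
    const result = await run();
    emit({ type: 'tool.end', payload: { tool, ok: true, durationMs: now() - startedAt } });
    return result;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Close the step first so the timeline never shows a dangling tool.start.
    emit({ type: 'tool.end', payload: { tool, ok: false, durationMs: now() - startedAt } });
    emit({ type: 'error', payload: { message } });
    throw err;
  }
}
```

Emitting `tool.end` on failure as well as success means the UI can pair events without special-casing errors.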

5.2 Transport runner → Next

Option A (recommended): extend existing PATCH or add POST /api/internal/agent-events (or per-session batch append):

  • Headers: x-agent-runner-secret (same as today's PATCH).
  • Body: a single event or a small batch { events: AgentEvent[] }. The runner omits seq; the server assigns it on insert to avoid races.
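
Option A might look like the following on the runner side. This is a sketch under stated assumptions: the function name `buildIngestRequests`, the `content-type` header, and the batch cap of 50 are illustrative, and the path is the `POST /api/internal/agent-events` candidate named above.

```typescript
// Event as the runner sends it: no seq, since the server assigns it.
type OutgoingEvent = {
  ts: string;
  runId: string;
  runKind: 'session' | 'job';
  type: string;
  payload: Record<string, unknown>;
};

type IngestRequest = {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
};

// Splits pending events into ordered batches and builds one request per batch.
function buildIngestRequests(
  baseUrl: string,
  secret: string,
  events: OutgoingEvent[],
  maxBatch = 50, // assumed cap, not specified in this document
): IngestRequest[] {
  const size = Math.max(1, Math.floor(maxBatch));
  const url = baseUrl.replace(/\/+$/, '') + '/api/internal/agent-events';
  const requests: IngestRequest[] = [];
  for (let i = 0; i < events.length; i += size) {
    requests.push({
      url,
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-agent-runner-secret': secret,
      },
      body: JSON.stringify({ events: events.slice(i, i + size) }),
    });
  }
  return requests;
}
```

Each returned object can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`; sending the batches sequentially preserves the order the server sees.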

Option B: Runner writes to Redis/Postgres directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.

5.3 Jobs store

  • Short term: continue in-memory for job metadata; persist events to Postgres keyed by jobId.
  • Medium term: optional Redis for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).

6. Backend: vibn-frontend (Next.js)

6.1 Persistence

New table (example): agent_run_events

| Column | Notes |
| --- | --- |
| id | UUID |
| run_id | Session id or job id (text) |
| run_kind | 'session' or 'job' |
| seq | BIGSERIAL or per-run sequence enforced with unique constraint (run_id, seq) |
| project_id | Nullable for jobs if not scoped |
| event | JSONB: full AgentEvent or { type, ts, payload } |
| created_at | default now() |

Index: (run_id, seq) for range queries (WHERE run_id = $1 AND seq > $lastSeen).
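
As a starting point for the migration, the table and replay query could be expressed as below. This is a sketch, not the final schema: it picks the per-run `seq` variant (plain BIGINT plus a unique constraint), assumes `project_id` is text, and relies on `gen_random_uuid()`, which is built into PostgreSQL 13+ and otherwise needs the pgcrypto extension. The constant and function names are hypothetical.

```typescript
// Assumed DDL for the example table; column types beyond those listed above are guesses.
const CREATE_AGENT_RUN_EVENTS_SQL = `
CREATE TABLE IF NOT EXISTS agent_run_events (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  run_id      TEXT NOT NULL,
  run_kind    TEXT NOT NULL CHECK (run_kind IN ('session', 'job')),
  seq         BIGINT NOT NULL,
  project_id  TEXT,
  event       JSONB NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (run_id, seq)
);`;

// Parameterized range query for replay after a refresh or reconnect.
function replayQuery(runId: string, afterSeq: number, limit: number) {
  return {
    text:
      'SELECT seq, event FROM agent_run_events ' +
      'WHERE run_id = $1 AND seq > $2 ORDER BY seq ASC LIMIT $3',
    values: [runId, afterSeq, limit] as [string, number, number],
  };
}
```

The unique constraint on `(run_id, seq)` is backed by an index in Postgres, so it also serves the range query without a separate index.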

Optional: migrate legacy agent_sessions.output to be derived (last N lines for email export) or dual-write during transition.

6.2 SSE route (example contract)

  • GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream
    • Auth: session cookie / same as GET session (user must own project).
    • Query: ?afterSeq=123 for replay.
    • Response: text/event-stream; each message: data: {JSON}\n\n.
    • Heartbeat comments every ~15–30 s to keep proxies alive.

For jobs (if not project-scoped): GET /api/jobs/[jobId]/events/stream with appropriate auth.
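
The wire format in the contract above is simple enough to sketch as pure helpers. The function names are hypothetical. Adding an `id:` line is an optional extra not required by the contract: browsers echo it back in the `Last-Event-ID` header when `EventSource` reconnects on its own, which complements the explicit `?afterSeq=` parameter.

```typescript
type StoredEvent = { seq: number; type: string; payload: Record<string, unknown> };

// One SSE message per event. JSON.stringify escapes newlines, so a single data line is safe.
function formatSseEvent(event: StoredEvent): string {
  return `id: ${event.seq}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Lines starting with ':' are SSE comments; clients ignore them, proxies see traffic.
const SSE_HEARTBEAT = ': heartbeat\n\n';

// Parses ?afterSeq=...; anything missing or invalid falls back to 0 (full replay).
function parseAfterSeq(raw: string | null): number {
  if (raw === null) return 0;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : 0;
}
```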

6.3 Ingest route (runner-only)

  • POST /api/internal/agent-events (or nested under project/session as you prefer).
  • Validates x-agent-runner-secret.
  • Inserts rows with server-generated seq (transaction per run or advisory lock per run_id).
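
The ingest behavior above (secret check, then server-generated `seq` per run) can be modeled in memory before wiring it to Postgres. This is a minimal sketch: `InMemoryEventStore` and `handleIngest` are hypothetical names, the request body shape `{ runId, events }` is an assumption, and the plain string comparison stands in for a constant-time comparison that a real handler would likely use.

```typescript
type IncomingEvent = { type: string; ts: string; payload: Record<string, unknown> };
type StoredEvent = IncomingEvent & { runId: string; seq: number };

// Stand-in for the DB: one counter per run, so seq is monotonic per run_id.
class InMemoryEventStore {
  private lastSeq = new Map<string, number>();
  readonly rows: StoredEvent[] = [];

  append(runId: string, events: IncomingEvent[]): StoredEvent[] {
    let seq = this.lastSeq.get(runId) ?? 0;
    const stored = events.map((e) => ({ ...e, runId, seq: ++seq }));
    this.lastSeq.set(runId, seq);
    this.rows.push(...stored);
    return stored;
  }
}

function handleIngest(
  store: InMemoryEventStore,
  expectedSecret: string,
  headerSecret: string | undefined, // value of x-agent-runner-secret
  body: { runId: string; events: IncomingEvent[] },
): { status: 200 | 400 | 401; stored: StoredEvent[] } {
  // Assumption: simple equality here; production code should compare in constant time.
  if (!headerSecret || headerSecret !== expectedSecret) return { status: 401, stored: [] };
  if (!body.runId || !Array.isArray(body.events)) return { status: 400, stored: [] };
  return { status: 200, stored: store.append(body.runId, body.events) };
}
```

In the database version, the counter update and the inserts would sit in one transaction (or behind an advisory lock per `run_id`) so that concurrent batches cannot receive the same `seq`.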

7. Frontend (product UI)

7.1 Agent tab — timeline

  • EventSource (SSE) subscription when session is running; on load, fetch historical events (GET …/events?afterSeq=0 or SSE from 0).
  • Timeline components:
    • Group by llm.turn / tool.start → tool.end.
    • Expandable tool args (sanitized).
    • Distinct styling for safety.block and error.
  • Reconnect: on EventSource error, reopen with lastSeq from last received event.
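
The grouping and reconnect rules above fit naturally into a reducer that the timeline component feeds with each incoming message. The sketch below is framework-agnostic and illustrative: the state shape, `applyEvent`, and `streamUrl` are hypothetical, and pairing by tool name assumes a tool's end event carries the same `payload.tool` as its start.

```typescript
type StreamEvent = {
  seq: number;
  type: string;
  payload: { tool?: string; message?: string; durationMs?: number };
};
type ToolStep = { tool: string; startSeq: number; endSeq?: number; durationMs?: number };
type TimelineState = { lastSeq: number; steps: ToolStep[]; notices: StreamEvent[] };

const emptyTimeline: TimelineState = { lastSeq: 0, steps: [], notices: [] };

function applyEvent(state: TimelineState, event: StreamEvent): TimelineState {
  // Replayed events after a reconnect are dropped, so the UI never double-renders.
  if (event.seq <= state.lastSeq) return state;
  const next: TimelineState = {
    lastSeq: event.seq,
    steps: state.steps.slice(),
    notices: state.notices.slice(),
  };
  const tool = event.payload.tool ?? 'unknown';
  if (event.type === 'tool.start') {
    next.steps.push({ tool, startSeq: event.seq });
  } else if (event.type === 'tool.end') {
    // Close the most recent open step for the same tool.
    for (let i = next.steps.length - 1; i >= 0; i--) {
      const step = next.steps[i];
      if (step.tool === tool && step.endSeq === undefined) {
        next.steps[i] = { ...step, endSeq: event.seq, durationMs: event.payload.durationMs };
        break;
      }
    }
  } else if (event.type === 'safety.block' || event.type === 'error') {
    next.notices.push(event); // rendered with distinct styling
  }
  return next;
}

// URL to reopen the stream from the last event the client has seen.
function streamUrl(base: string, lastSeq: number): string {
  return `${base}?afterSeq=${lastSeq}`;
}
```

On an `EventSource` error, the client would close the old connection and open `new EventSource(streamUrl(base, state.lastSeq))`.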

7.2 Jobs / analyze flows

  • Same timeline component keyed by jobId if you surface those runs in UI.
  • Unifies mental model: “every run has a stream.”

7.3 Deprecate slow polling

  • Lower the polling frequency of GET …/agent/sessions/[id] while SSE is connected. Either keep a single poll for status / changed_files if those stay on the session row only, or also emit file.changed events and drive the UI from the stream plus one final consistency read.

8. Security & privacy

  • Never put tokens, env values, or full file contents in events by default; use truncation and hashes where needed.
  • safety.block: log reason code + user-safe message; align with security.ts behavior.
  • Rate limits on ingest endpoint (per run_id / per IP) to avoid abuse if misconfigured.
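
The truncation rule above could be enforced by sanitizing every payload before it leaves the runner. This is a minimal sketch: the key-name pattern, the 500-character limit, the array cap, and the depth limit are all assumptions, and hashing is omitted to keep the example dependency-free.

```typescript
// Assumed list of key names that should never appear in event payloads.
const SENSITIVE_KEY = /(token|secret|password|authorization|api[_-]?key|cookie)/i;
const MAX_STRING = 500; // assumed cap on any single string value
const MAX_ITEMS = 50;   // assumed cap on array length
const MAX_DEPTH = 5;    // assumed cap on nesting

function sanitizePayload(value: unknown, depth = 0): unknown {
  if (depth > MAX_DEPTH) return '[truncated: depth]';
  if (typeof value === 'string') {
    return value.length > MAX_STRING
      ? value.slice(0, MAX_STRING) + `... [${value.length - MAX_STRING} more chars]`
      : value;
  }
  if (Array.isArray(value)) {
    return value.slice(0, MAX_ITEMS).map((item) => sanitizePayload(item, depth + 1));
  }
  if (typeof value === 'object' && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      // Redact by key name, regardless of what the value looks like.
      out[key] = SENSITIVE_KEY.test(key) ? '[redacted]' : sanitizePayload(source[key], depth + 1);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Redacting by key name is a coarse filter; it does not catch secrets embedded inside free-form strings such as command output, so tool output would still need its own review.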

9. Environment variables

| Variable | Where | Purpose |
| --- | --- | --- |
| AGENT_RUNNER_SECRET | Runner + Next | Ingest / extended PATCH auth |
| VIBN_API_URL | Runner | Base URL for callbacks |
| AGENT_RUNNER_URL | Next | Start runs (unchanged) |

Add if needed:

| Variable | Purpose |
| --- | --- |
| AGENT_EVENTS_INGEST_PATH | Optional override for ingest URL |
| SSE_MAX_BUFFER | Cap replay batch size |
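
On the Next side, these variables could be read once and validated at startup. The loader below is a sketch: the function name, the default ingest path, and the default buffer of 500 are assumptions rather than values fixed by this spec. It takes the environment as a plain object so it can be tested without touching the real process environment.

```typescript
type TelemetryConfig = {
  runnerSecret: string;
  ingestPath: string;
  sseMaxBuffer: number;
};

function loadTelemetryConfig(env: Record<string, string | undefined>): TelemetryConfig {
  const runnerSecret = env.AGENT_RUNNER_SECRET;
  if (!runnerSecret) throw new Error('AGENT_RUNNER_SECRET is required');

  // Assumed default: the ingest route proposed in section 6.3.
  const ingestPath = env.AGENT_EVENTS_INGEST_PATH || '/api/internal/agent-events';

  // Assumed default of 500 events per replay batch.
  const rawBuffer = env.SSE_MAX_BUFFER;
  const sseMaxBuffer = rawBuffer === undefined ? 500 : Number(rawBuffer);
  if (!Number.isInteger(sseMaxBuffer) || sseMaxBuffer <= 0) {
    throw new Error('SSE_MAX_BUFFER must be a positive integer');
  }
  return { runnerSecret, ingestPath, sseMaxBuffer };
}
```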

10. Phased roadmap (suggested)

Phase 1 — Foundation

  • Define AgentEvent TypeScript types in a shared package or duplicated minimal types in runner + frontend.
  • Create agent_run_events (or equivalent) + migration.
  • Implement ingest endpoint; wire runner session path to emit core events: run.started, tool.start / tool.end, error, run.completed, file.changed.
  • Dual-write: keep existing PATCH outputLine so nothing breaks.

Phase 2 — Push

  • SSE route + EventSource in Agent tab.
  • Backfill UI from DB on mount; then live tail.
  • Lower or gate polling on GET session.

Phase 3 — Jobs + durability

  • Emit same events from job execution path; persist by jobId.
  • Optional: replace in-memory job list with DB for multi-instance runner (later).

Phase 4 — Rich semantics

  • safety.block from policy layer.
  • deploy.* events if Coolify integration is user-visible.
  • Multi-agent: handoff, child_job.* with links in payload.

11. Success metrics

  • Time-to-first-visible-step after Run < 1s p95 (SSE).
  • After hard refresh mid-run, user sees consistent history (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
  • Support tickets / confusion drops on “what is the agent doing?” (qualitative).
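
The "no duplicate seq, no gaps" criterion can be checked mechanically in an integration test after a mid-run refresh. The helper below is a sketch and only applies if `seq` is generated per run, starting at 1 and increasing by 1; with a global BIGSERIAL, per-run gaps are expected and this check would not be meaningful. The function name is hypothetical.

```typescript
// Reports duplicate and missing sequence numbers in a replayed history.
// Assumes per-run sequences that start at 1 and increase by 1.
function checkHistory(seqs: number[]): { duplicates: number[]; gaps: number[] } {
  const seen = new Set<number>();
  const duplicates: number[] = [];
  let max = 0;
  for (const seq of seqs) {
    if (seen.has(seq)) {
      duplicates.push(seq);
    } else {
      seen.add(seq);
    }
    if (seq > max) max = seq;
  }
  const gaps: number[] = [];
  for (let expected = 1; expected <= max; expected++) {
    if (!seen.has(expected)) gaps.push(expected);
  }
  return { duplicates, gaps };
}
```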

12. Code references

Use these when implementing:

  • Runner session loop + PATCH bridge: vibn-agent-runner/src/agent-session-runner.ts
  • Runner HTTP: vibn-agent-runner/src/server.ts (/agent/execute, /agent/stop, /agent/approve, /api/agent/run, /api/jobs/:id)
  • In-memory jobs: vibn-agent-runner/src/job-store.ts
  • Next session API + runner callback: vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts
  • Session create + fire-and-forget execute: vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts

13. Open decisions

  1. Single table for sessions + jobs vs two tables (simpler queries vs flexibility).
  2. Seq generation: DB sequence per run_id vs global monotonic with (run_id, seq) composite only in app logic.
  3. Idempotency: runner retries may duplicate events—use event_id UUID from runner for dedupe on ingest.
  4. Orchestrator chat: treat as v2 unless you need a COO run timeline immediately.
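
For decision 3, dedupe on a runner-supplied event id could behave as in the following in-memory model. It is a sketch of the semantics only: the class name and return shape are hypothetical. In Postgres, the same effect might come from a unique constraint on (run_id, event_id) combined with INSERT ... ON CONFLICT DO NOTHING.

```typescript
// Models idempotent append: a retried event keeps its original seq instead of getting a new one.
class DedupingAppender {
  private seqByEvent = new Map<string, number>(); // key: runId + NUL + eventId
  private lastSeq = new Map<string, number>();

  append(runId: string, eventId: string): { seq: number; duplicate: boolean } {
    const key = runId + '\u0000' + eventId;
    const existing = this.seqByEvent.get(key);
    if (existing !== undefined) {
      // Runner retry: acknowledge without inserting a second row.
      return { seq: existing, duplicate: true };
    }
    const seq = (this.lastSeq.get(runId) ?? 0) + 1;
    this.lastSeq.set(runId, seq);
    this.seqByEvent.set(key, seq);
    return { seq, duplicate: false };
  }
}
```

Returning the original `seq` for a duplicate lets the runner treat a retried request as a success, which is what makes at-least-once delivery safe.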

Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.