mawkone 3563b98de1 chore: clean up root directory, move docs to /docs and legacy plans to /docs_archive

2026-05-07 15:05:34 -07:00

Agent telemetry & live execution stream — project spec

This document captures concrete product and engineering additions discussed for Vibn: moving from poll-based session updates and in-memory jobs to a durable, ordered, push-friendly execution timeline. The target is the web equivalent of a terminal agent’s clarity: step-by-step visibility, tool boundaries, failures, and later multi-agent signals.

1. Why this exists

Current behavior (baseline)

| Surface | How progress reaches the user | Limits |
| --- | --- | --- |
| Agent sessions (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI polls `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
| Jobs (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| Orchestrator / Atlas chat | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |

Product intent

Trust during long runs: users see what happened, when, and whether something was blocked—not only a final status.
Differentiation: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
Foundation for multi-agent: handoffs, child work, and safety events need a common event pipe, not ad-hoc strings.

2. Goals

Append-only execution events with monotonic ordering (per session or per job), suitable for replay after refresh.
Server-push to the client (recommend SSE first; WebSocket if you need bi-directional on the same channel).
Persistence so reconnect, refresh, and horizontal scaling do not lose history.
Single conceptual model (AgentEvent) usable by:
- Build → Agent tab (sessions),
- Job flows (create/analyze-style),
- optionally orchestrator long runs later.
Backward compatibility during rollout: existing PATCH + output can remain as a fallback or be fed from the same emitter.

Non-goals (for v1)

Full OpenTelemetry export (optional later).
Real-time collaborative multi-user cursors on the same session.
Merging claude-code-fork—this spec is API + UI + persistence only.

3. Concept: `AgentEvent`

Core shape (suggested)

```typescript
type AgentEvent = {
  seq: number;           // monotonic per stream (session_id or job_id)
  ts: string;            // ISO-8601
  runId: string;         // session UUID or job id; ties events to a run
  runKind: 'session' | 'job';
  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';

  type: AgentEventType;
  payload: Record<string, unknown>;  // type-specific
};

type AgentEventType =
  | 'run.started'
  | 'run.phase'              // e.g. planning, executing, committing
  | 'llm.turn.start'
  | 'llm.turn.end'
  | 'tool.start'
  | 'tool.end'
  | 'tool.output'            // chunked stdout/stderr if needed
  | 'safety.block'           // policy / protected path / command denied
  | 'file.changed'           // maps to today’s changed_files semantics
  | 'git.commit'
  | 'deploy.triggered'
  | 'deploy.status'
  | 'error'
  | 'run.completed'
  | 'handoff'                // v2: parent → child agent
  | 'child_job.started';     // v2: linked run id
```
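Compile-time types do not protect the HTTP boundary between runner and Next, so a runtime check on ingest may be useful. The following is a minimal sketch, not the project's implementation: `parseAgentEvent`, `isOneOf`, and `isPlainObject` are hypothetical helper names, and the constant arrays simply mirror the unions in the suggested shape above.

```typescript
// Runtime mirrors of the unions in the suggested AgentEvent shape.
const AGENT_EVENT_TYPES = [
  'run.started', 'run.phase', 'llm.turn.start', 'llm.turn.end',
  'tool.start', 'tool.end', 'tool.output', 'safety.block',
  'file.changed', 'git.commit', 'deploy.triggered', 'deploy.status',
  'error', 'run.completed', 'handoff', 'child_job.started',
] as const;
const RUN_KINDS = ['session', 'job'] as const;
const PHASES = ['queued', 'running', 'completed', 'failed', 'stopped'] as const;

type AgentEvent = {
  seq: number;
  ts: string;
  runId: string;
  runKind: (typeof RUN_KINDS)[number];
  phase: (typeof PHASES)[number];
  type: (typeof AGENT_EVENT_TYPES)[number];
  payload: Record<string, unknown>;
};

// Hypothetical helper: true when `value` is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === 'string' && (allowed as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: true for non-null, non-array objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical validator: returns the event when well-formed, otherwise null.
function parseAgentEvent(input: unknown): AgentEvent | null {
  if (!isPlainObject(input)) return null;
  const { seq, ts, runId, runKind, phase, type, payload } = input;
  if (typeof seq !== 'number' || !isFinite(seq) || Math.floor(seq) !== seq || seq < 0) return null;
  if (typeof ts !== 'string' || isNaN(Date.parse(ts))) return null;
  if (typeof runId !== 'string' || runId.length === 0) return null;
  if (!isOneOf(RUN_KINDS, runKind)) return null;
  if (!isOneOf(PHASES, phase)) return null;
  if (!isOneOf(AGENT_EVENT_TYPES, type)) return null;
  if (!isPlainObject(payload)) return null;
  return { seq, ts, runId, runKind, phase, type, payload };
}
```

Keeping the allowed values in `as const` arrays lets the same list drive both the static type and the runtime check, so the two cannot drift apart silently.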

Mapping from today’s session `outputLine`

| Today (`outputLine.type`) | Suggested event(s) |
| --- | --- |
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
| `error` | `error` + optional `safety.block` if policy-driven |
| `done` | `run.completed` |

Keep human-readable message on events for UI defaults; add structured fields (tool, argsSummary, durationMs) for timeline rendering and filters.
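The mapping above could be expressed as a small pure function that the runner calls during dual-write. This is a sketch under assumptions: the `OutputLine` shape (`type` plus `text`) is inferred from the baseline description, `mapOutputLine` and `MappedEvent` are hypothetical names, and it picks the simplest target for each row (`run.phase` rather than `llm.turn.*`, and no `safety.block` detection).

```typescript
// Assumed shape of today's session output line.
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string;
};

// Hypothetical result: the event type plus a payload with a human-readable message.
type MappedEvent = {
  type: 'run.phase' | 'tool.output' | 'error' | 'run.completed';
  payload: { message: string; stream?: 'stdout' | 'stderr' };
};

// Translate one legacy output line into the suggested event type and payload.
function mapOutputLine(line: OutputLine): MappedEvent {
  switch (line.type) {
    case 'step':
    case 'info':
      return { type: 'run.phase', payload: { message: line.text } };
    case 'stdout':
    case 'stderr':
      // Keep the originating stream so the UI can style stderr differently.
      return { type: 'tool.output', payload: { message: line.text, stream: line.type } };
    case 'error':
      return { type: 'error', payload: { message: line.text } };
    case 'done':
      return { type: 'run.completed', payload: { message: line.text } };
  }
}
```

Structured fields such as `tool`, `argsSummary`, and `durationMs` would be added at the call sites that actually know them; a line-by-line translation like this one can only carry the message.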

4. Architecture (high level)

```mermaid
flowchart LR
  subgraph runner [vibn-agent-runner]
    RA[runSessionAgent / runAgent]
    EMIT[emitAgentEvent]
  end
  subgraph api [vibn-frontend Next.js]
    ING[POST internal ingest or PATCH extend]
    DB[(Postgres agent_events)]
    SSE[SSE GET /api/.../stream]
  end
  subgraph browser [Browser]
    UI[Timeline + live log]
  end
  RA --> EMIT
  EMIT -->|HTTPS + secret or mTLS| ING
  ING --> DB
  UI -->|EventSource| SSE
  SSE --> DB
```

Principles

Runner remains stateless regarding “truth”: it emits events; Next + DB are the source of truth for the UI (matches today’s session model).
Alternatively, the runner could expose SSE directly, but that is usually worse for auth and CORS, and it breaks the single-domain product surface. Prefer Next as the SSE endpoint, reading from the DB.

5. Backend: `vibn-agent-runner`

5.1 Emit from execution paths

| Location | Action |
| --- | --- |
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with `emitAgentEvent` each turn / tool / error. |
| `runAgent` / tool loop (`executeTool`) | Same emitter for job runs. |
| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |

5.2 Transport runner → Next

Option A (recommended): extend existing PATCH or add POST /api/internal/agent-events (or per-session batch append):

Headers: x-agent-runner-secret (same as today’s PATCH).
Body: single event or small batch { events: AgentEvent[] } with server-assigned seq to avoid races.

Option B: Runner writes to Redis/Postgres directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.
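For Option A, the runner side could look roughly like the sketch below, which only builds the request descriptions so it stays independent of any HTTP client. Everything here is an assumption for illustration: `buildIngestRequests`, `RunnerEvent`, `IngestRequest`, and the batch cap of 50 are hypothetical, and the path is the example `/api/internal/agent-events` route. Events carry no `seq` because the server assigns it.

```typescript
// Assumed runner-side event: no seq, since the server assigns it on ingest.
type RunnerEvent = {
  ts: string;
  runId: string;
  runKind: 'session' | 'job';
  type: string;
  payload: Record<string, unknown>;
};

// Plain description of an HTTP request, so any client can send it.
type IngestRequest = {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
};

// Assumption: an arbitrary cap to keep each batch small.
const MAX_BATCH = 50;

// Split events into small batches and describe one POST per batch.
function buildIngestRequests(
  apiBaseUrl: string,
  secret: string,
  events: RunnerEvent[],
): IngestRequest[] {
  const base = apiBaseUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  const requests: IngestRequest[] = [];
  for (let i = 0; i < events.length; i += MAX_BATCH) {
    requests.push({
      url: `${base}/api/internal/agent-events`,
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-agent-runner-secret': secret,
      },
      body: JSON.stringify({ events: events.slice(i, i + MAX_BATCH) }),
    });
  }
  return requests;
}
```

Each returned object can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with `apiBaseUrl` taken from `VIBN_API_URL` and `secret` from `AGENT_RUNNER_SECRET`.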

5.3 Jobs store

Short term: continue in-memory for job metadata; persist events to Postgres keyed by jobId.
Medium term: optional Redis for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).

6. Backend: `vibn-frontend` (Next.js)

6.1 Persistence

New table (example): agent_run_events

| Column | Notes |
| --- | --- |
| `id` | UUID |
| `run_id` | Session id or job id (text) |
| `run_kind` | `'session' \| 'job'` |
| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
| `project_id` | Nullable for jobs if not scoped |
| `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` |
| `created_at` | default now() |

Index: (run_id, seq) for range queries (WHERE run_id = $1 AND seq > $lastSeen).
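As a rough illustration of the table and the replay query, the sketch below keeps the SQL in TypeScript constants. Column types beyond those named in the table (for example `TEXT` for `project_id` and `TIMESTAMPTZ` for `created_at`) are assumptions, as are the names `CREATE_AGENT_RUN_EVENTS`, `SELECT_EVENTS_AFTER`, `replayQuery`, and the default limit of 500. It takes the per-run `seq` variant, where the unique constraint on `(run_id, seq)` also provides the index used by the range query.

```typescript
// Assumed DDL for the example table; types are guesses where the spec is silent.
const CREATE_AGENT_RUN_EVENTS = `
CREATE TABLE IF NOT EXISTS agent_run_events (
  id          UUID PRIMARY KEY,
  run_id      TEXT NOT NULL,
  run_kind    TEXT NOT NULL CHECK (run_kind IN ('session', 'job')),
  seq         BIGINT NOT NULL,
  project_id  TEXT,
  event       JSONB NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (run_id, seq)
);`;

// Replay everything after the last seq the client has seen, in order.
const SELECT_EVENTS_AFTER = `
SELECT seq, event
FROM agent_run_events
WHERE run_id = $1 AND seq > $2
ORDER BY seq ASC
LIMIT $3;`;

// Assumption: a cap in the spirit of SSE_MAX_BUFFER.
const DEFAULT_REPLAY_LIMIT = 500;

// Hypothetical helper: builds a parameterized query with clamped inputs.
function replayQuery(
  runId: string,
  lastSeen: number,
  limit: number = DEFAULT_REPLAY_LIMIT,
): { text: string; values: [string, number, number] } {
  const safeLast = isFinite(lastSeen) && lastSeen > 0 ? Math.floor(lastSeen) : 0;
  const safeLimit = isFinite(limit)
    ? Math.max(1, Math.min(Math.floor(limit), DEFAULT_REPLAY_LIMIT))
    : DEFAULT_REPLAY_LIMIT;
  return { text: SELECT_EVENTS_AFTER, values: [runId, safeLast, safeLimit] };
}
```

The `{ text, values }` shape is the parameterized form that clients such as node-postgres accept; values are never interpolated into the SQL string.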

Optional: migrate legacy agent_sessions.output to be derived (last N lines for email export) or dual-write during transition.

6.2 SSE route (example contract)

GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream
- Auth: session cookie / same as GET session (user must own project).
- Query: ?afterSeq=123 for replay.
- Response: text/event-stream; each message: data: {JSON}\n\n.
- Heartbeat comments every ~15–30s to keep proxies alive.

For jobs (if not project-scoped): GET /api/jobs/[jobId]/events/stream with appropriate auth.
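The wire format in the contract above is simple enough to sketch as pure string helpers. The function names are hypothetical. The `id:` line is an addition not required by the contract: it is a standard SSE field that browsers echo back in the `Last-Event-ID` header on automatic reconnect, and it can be dropped if `?afterSeq` alone is preferred.

```typescript
// Format one event as an SSE message. JSON.stringify escapes newlines,
// so the payload always fits on a single data: line.
function formatSseMessage<T extends { seq: number }>(event: T): string {
  return `id: ${event.seq}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Lines starting with ':' are SSE comments; clients ignore them,
// but they keep idle connections open through proxies.
function formatSseHeartbeat(): string {
  return ': heartbeat\n\n';
}

// Parse the ?afterSeq query value, falling back to 0 (replay from the start).
function parseAfterSeq(raw: string | null): number {
  if (raw === null || raw === '') return 0;
  const n = Number(raw);
  return isFinite(n) && n >= 0 ? Math.floor(n) : 0;
}
```

A route handler would write `formatSseMessage` output for each replayed or live event and emit `formatSseHeartbeat()` on a timer in the 15 to 30 second range mentioned above.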

6.3 Ingest route (runner-only)

POST /api/internal/agent-events (or nested under project/session as you prefer).
Validates x-agent-runner-secret.
Inserts rows with server-generated seq (transaction per run or advisory lock per run_id).
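To make the server-generated `seq` behavior concrete, here is an in-memory stand-in for the table. It is a sketch only: `InMemoryEventLog`, `append`, and `after` are hypothetical names, and the single-threaded `append` merely imitates what a transaction or advisory lock per `run_id` would guarantee in Postgres. Sequences start at 1 so that `afterSeq=0` replays a whole run.

```typescript
type IncomingEvent = { type: string; ts: string; payload: Record<string, unknown> };
type StoredEvent = IncomingEvent & { runId: string; seq: number };

// In-memory stand-in for agent_run_events with per-run sequence numbers.
class InMemoryEventLog {
  // Next seq to hand out, per run id (prototype-free object used as a map).
  private nextSeq: Record<string, number> = Object.create(null);
  private rows: StoredEvent[] = [];

  // Assign consecutive seq values to a batch; the caller never chooses seq.
  append(runId: string, events: IncomingEvent[]): StoredEvent[] {
    const current = this.nextSeq[runId];
    let seq = current === undefined ? 1 : current;
    const stored: StoredEvent[] = [];
    for (const e of events) {
      stored.push({ type: e.type, ts: e.ts, payload: e.payload, runId, seq });
      seq += 1;
    }
    this.nextSeq[runId] = seq;
    this.rows = this.rows.concat(stored);
    return stored;
  }

  // Equivalent of: WHERE run_id = $1 AND seq > $lastSeen ORDER BY seq.
  after(runId: string, lastSeen: number): StoredEvent[] {
    return this.rows
      .filter((r) => r.runId === runId && r.seq > lastSeen)
      .sort((a, b) => a.seq - b.seq);
  }
}
```

Because each run has its own counter, two runs ingesting concurrently never contend for the same sequence, which is the property the per-run lock is meant to preserve.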

7. Frontend (product UI)

7.1 Agent tab — timeline

EventSource (SSE) subscription when session is running; on load, fetch historical events (GET …/events?afterSeq=0 or SSE from 0).
Timeline components:
- Group by llm.turn / tool.start–tool.end.
- Expandable tool args (sanitized).
- Distinct styling for safety.block and error.
Reconnect: on EventSource error, reopen with lastSeq from last received event.
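The reconnect rule can be kept out of the component by tracking `lastSeq` in a small reducer. This is a minimal sketch with hypothetical names (`TimelineState`, `applyStreamEvent`, `streamUrl`); it drops events that were already seen, which makes replay after a reconnect harmless, and it does not attempt to detect gaps.

```typescript
type StreamEvent = { seq: number; type: string; payload: Record<string, unknown> };
type TimelineState = { lastSeq: number; events: StreamEvent[] };

const initialTimeline: TimelineState = { lastSeq: 0, events: [] };

// Append an event unless it was already applied (e.g. replayed after reconnect).
function applyStreamEvent(state: TimelineState, event: StreamEvent): TimelineState {
  if (event.seq <= state.lastSeq) return state;
  return { lastSeq: event.seq, events: state.events.concat([event]) };
}

// URL to (re)open the stream from the last event the UI has applied.
function streamUrl(basePath: string, state: TimelineState): string {
  return `${basePath}?afterSeq=${state.lastSeq}`;
}
```

In the browser, the message handler would `JSON.parse` each message's `data` and pass it through `applyStreamEvent`; the error handler would close the `EventSource` and open a new one at `streamUrl(basePath, state)`.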

7.2 Jobs / analyze flows

Same timeline component keyed by jobId if you surface those runs in UI.
Unifies mental model: “every run has a stream.”

7.3 Deprecate slow polling

Poll GET …/agent/sessions/[id] less often while SSE is connected. Keep a single poll for status / changed_files if those stay on the session row only; alternatively, also emit file.changed events and drive the UI from the stream plus one final consistency read.

8. Security & privacy

Never put tokens, env values, or full file contents in events by default; use truncation and hashes where needed.
safety.block: log reason code + user-safe message; align with security.ts behavior.
Rate limits on ingest endpoint (per run_id / per IP) to avoid abuse if misconfigured.
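One way to apply the "no secrets, truncate by default" rule is a sanitizer run on every payload before it is emitted. The sketch below is illustrative only: the key pattern, the 500-character cap, and the names `sanitizePayload` and `sanitizeValue` are assumptions, and it uses truncation rather than hashing.

```typescript
// Assumption: key names that suggest a secret; tune to the real payloads.
const SENSITIVE_KEY = /token|secret|password|authorization|api[_-]?key/i;
// Assumption: maximum string length kept in an event payload.
const MAX_STRING = 500;

// Redact values under sensitive keys, truncate long strings, recurse into containers.
function sanitizeValue(key: string, value: unknown): unknown {
  if (SENSITIVE_KEY.test(key)) return '[redacted]';
  if (typeof value === 'string') {
    if (value.length <= MAX_STRING) return value;
    const dropped = value.length - MAX_STRING;
    return value.slice(0, MAX_STRING) + '... [truncated ' + dropped + ' chars]';
  }
  if (Array.isArray(value)) return value.map((v) => sanitizeValue(key, v));
  if (typeof value === 'object' && value !== null) {
    return sanitizePayload(value as Record<string, unknown>);
  }
  return value;
}

// Return a sanitized copy; the input payload is not modified.
function sanitizePayload(payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    out[key] = sanitizeValue(key, payload[key]);
  }
  return out;
}
```

Key-based redaction is a coarse filter: it will not catch a secret embedded in command output, so truncation and avoiding raw file contents remain the primary defenses.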

9. Environment variables

| Variable | Where | Purpose |
| --- | --- | --- |
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
| `VIBN_API_URL` | Runner | Base URL for callbacks |
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |

Add if needed:

| Variable | Purpose |
| --- | --- |
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
| `SSE_MAX_BUFFER` | Cap replay batch size |

10. Phased roadmap (suggested)

Phase 1 — Foundation

Define AgentEvent TypeScript types in a shared package or duplicated minimal types in runner + frontend.
Create agent_run_events (or equivalent) + migration.
Implement ingest endpoint; wire runner session path to emit core events: run.started, tool.start / tool.end, error, run.completed, file.changed.
Dual-write: keep existing PATCH outputLine so nothing breaks.

Phase 2 — Push

SSE route + EventSource in Agent tab.
Backfill UI from DB on mount; then live tail.
Lower or gate polling on GET session.

Phase 3 — Jobs + durability

Emit same events from job execution path; persist by jobId.
Optional: replace in-memory job list with DB for multi-instance runner (later).

Phase 4 — Rich semantics

safety.block from policy layer.
deploy.* events if Coolify integration is user-visible.
Multi-agent: handoff, child_job.* with links in payload.

11. Success metrics

Time-to-first-visible-step after Run < 1s p95 (SSE).
After hard refresh mid-run, user sees consistent history (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
Support tickets / confusion drops on “what is the agent doing?” (qualitative).

12. Implementation references

Use these when implementing:

Runner session loop + PATCH bridge: vibn-agent-runner/src/agent-session-runner.ts
Runner HTTP: vibn-agent-runner/src/server.ts (/agent/execute, /agent/stop, /agent/approve, /api/agent/run, /api/jobs/:id)
In-memory jobs: vibn-agent-runner/src/job-store.ts
Next session API + runner callback: vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts
Session create + fire-and-forget execute: vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts

13. Open decisions

Single table for sessions + jobs vs two tables (simpler queries vs flexibility).
Seq generation: DB sequence per run_id vs global monotonic with (run_id, seq) composite only in app logic.
Idempotency: runner retries may duplicate events—use event_id UUID from runner for dedupe on ingest.
Orchestrator chat: treat as v2 unless you need a COO run timeline immediately.
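For the idempotency decision, dedupe on a runner-supplied `event_id` could behave like the in-memory sketch below. `DedupingIngest` and the `eventId` field are hypothetical. In Postgres the same effect is commonly achieved with a unique constraint on the event id and `INSERT ... ON CONFLICT DO NOTHING`, which is offered here as one option rather than a settled design.

```typescript
// Assumed runner event carrying a client-generated UUID for dedupe.
type RunnerEvent = { eventId: string; type: string; payload: Record<string, unknown> };

// In-memory model of an ingest endpoint that ignores replays of the same event.
class DedupingIngest {
  private seen: Record<string, boolean> = Object.create(null);
  readonly accepted: RunnerEvent[] = [];

  // Returns how many events were stored and how many were recognized as retries.
  ingest(batch: RunnerEvent[]): { inserted: number; duplicates: number } {
    let inserted = 0;
    let duplicates = 0;
    for (const event of batch) {
      if (this.seen[event.eventId]) {
        duplicates += 1;
        continue;
      }
      this.seen[event.eventId] = true;
      this.accepted.push(event);
      inserted += 1;
    }
    return { inserted, duplicates };
  }
}
```

With this in place a runner can retry a whole batch after a timeout without creating duplicate timeline entries, and duplicates never consume a `seq`.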

Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.
