Files
vibn-frontend/AGENT_TELEMETRY_STREAMING_PROJECT.md
mawkone 99deb546c8 Rip out Theia, bump submodules, retire platform/ scaffold, snapshot docs + design assets
Theia rip-out (parent):
- Remove theia submodule entry (the local fork, Gitea repo, Coolify app,
  Cloud Run services, and Artifact Registry image are all gone)
- Drop README.md + INFRASTRUCTURE.md (obsolete "Project OS" snapshots
  that also leaked API tokens) and setup.sh (Theia clone bootstrap)
- Delete UI-DESIGN-GUIDE.md, BACKEND_AGENTS_PLAN.md, VIBN_BUILD_PLAN.md,
  VISUAL_EDITOR_PLAN.md, core-packages.md, ai-packages.md, tools-list.md
  (all 100% Theia-specific or superseded)
- Surgical scrubs of remaining Theia mentions in
  AGENT_EXECUTION_ARCHITECTURE.md and TURBOREPO_MIGRATION_PLAN.md

Submodule bumps:
- vibn-agent-runner: Theia rip-out + MCP refactor (api/wrapper/server
  pattern across shell/file/git/memory/prd/search/agent/gitea/coolify)
- vibn-frontend: Theia rip-out + P5.1 attach E2E + Justine UI WIP

Retire platform/ scaffold:
- Remove platform/backend/ (control-plane, executors, mcp-adapter),
  platform/client-ide/ (gcp-productos extension), platform/contracts/,
  platform/infra/terraform/, platform/scripts/templates/turborepo/
  (replaced by vibn-agent-runner + vibn-frontend + Coolify direct)
- Drop architecture.md, technical_spec.md, vision-ext.md,
  "1.Generate Control Plane API scaffold.md" (same era)

Docs / planning snapshots (new):
- AI_CAPABILITIES.md, AI_CAPABILITIES_ROADMAP.md
- AGENT_TELEMETRY_STREAMING_PROJECT.md
- VIBN_PRD.md, product-idea-a.md

Design assets (new):
- branding/{coolify,gitea,ux-testing}/ static brand collateral
- justine/ HTML mockups for the new onboarding/build flows
- preview-assist-ui/ Vite scratch app
- master-ai.code-workspace

Infra helpers (new):
- setup-coolify-montreal.sh provisioner
- gitea-docker-compose.yml
- vibn-coolify-schema.sql for the Coolify Postgres extensions
- prd-agent-prompt.pdf, prompt, root.txt, remixed-9edec9e9.tsx scratch
- flatten.sh helper

.gitignore: ignore **/node_modules, **/.next, **/.turbo, **/coverage

Made-with: Cursor
2026-04-22 18:06:37 -07:00

293 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Agent telemetry & live execution stream — project spec
This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agents clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
---
## 1. Why this exists
### Current behavior (baseline)
| Surface | How progress reaches the user | Limits |
|--------|------------------------------|--------|
| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
### Product intent
- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status.
- **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.
---
## 2. Goals
1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel).
3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
4. **Single conceptual model** (`AgentEvent`) usable by:
- Build → **Agent** tab (sessions),
- **Job** flows (create/analyze-style),
- optionally **orchestrator** long runs later.
5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.
### Non-goals (for v1)
- Full **OpenTelemetry** export (optional later).
- **Real-time collaborative** multi-user cursors on the same session.
- Merging **claude-code-fork**—this spec is **API + UI + persistence** only.
---
## 3. Concept: `AgentEvent`
### Core shape (suggested)
```ts
type AgentEvent = {
seq: number; // monotonic per stream (session_id or job_id)
ts: string; // ISO-8601
runId: string; // session UUID or job id — ties events to a run
runKind: 'session' | 'job';
phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';
type: AgentEventType;
payload: Record<string, unknown>; // type-specific
};
type AgentEventType =
| 'run.started'
| 'run.phase' // e.g. planning, executing, committing
| 'llm.turn.start'
| 'llm.turn.end'
| 'tool.start'
| 'tool.end'
| 'tool.output' // chunked stdout/stderr if needed
| 'safety.block' // policy / protected path / command denied
| 'file.changed' // maps to todays changed_files semantics
| 'git.commit'
| 'deploy.triggered'
| 'deploy.status'
| 'error'
| 'run.completed'
| 'handoff' // v2: parent → child agent
| 'child_job.started' // v2: linked run id
;
```
### Mapping from todays session `outputLine`
| Today (`outputLine.type`) | Suggested event(s) |
|---------------------------|--------------------|
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
| `error` | `error` + optional `safety.block` if policy-driven |
| `done` | `run.completed` |
Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.
---
## 4. Architecture (high level)
```mermaid
flowchart LR
subgraph runner [vibn-agent-runner]
RA[runSessionAgent / runAgent]
EMIT[emitAgentEvent]
end
subgraph api [vibn-frontend Next.js]
ING[POST internal ingest or PATCH extend]
DB[(Postgres agent_events)]
SSE[SSE GET /api/.../stream]
end
subgraph browser [Browser]
UI[Timeline + live log]
end
RA --> EMIT
EMIT -->|HTTPS + secret or mTLS| ING
ING --> DB
UI -->|EventSource| SSE
SSE --> DB
```
**Principles**
- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches todays session model).
- Alternatively, runner could expose **SSE directly**—usually worse for **auth**, **CORS**, and **one domain** for the product. Prefer **Next as SSE endpoint** reading from DB.
---
## 5. Backend: `vibn-agent-runner`
### 5.1 Emit from execution paths
| Location | Action |
|----------|--------|
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. |
| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |
### 5.2 Transport runner → Next
**Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append):
- Headers: `x-agent-runner-secret` (same as todays PATCH).
- Body: single event or small batch `{ events: AgentEvent[] }` with server-assigned `seq` to avoid races.
**Option B:** Runner writes to **Redis/Postgres** directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.
### 5.3 Jobs store
- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
---
## 6. Backend: `vibn-frontend` (Next.js)
### 6.1 Persistence
**New table (example): `agent_run_events`**
| Column | Notes |
|--------|--------|
| `id` | UUID |
| `run_id` | Session id or job id (text) |
| `run_kind` | `'session' \| 'job'` |
| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
| `project_id` | Nullable for jobs if not scoped |
| `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` |
| `created_at` | default now() |
Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`).
**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition.
### 6.2 SSE route (example contract)
- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
- Auth: session cookie / same as GET session (user must own project).
- Query: `?afterSeq=123` for replay.
- Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
- Heartbeat comments every ~1530s to keep proxies alive.
For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.
### 6.3 Ingest route (runner-only)
- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
- Validates `x-agent-runner-secret`.
- Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`).
---
## 7. Frontend (product UI)
### 7.1 Agent tab — timeline
- **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
- **Timeline components**:
- Group by `llm.turn` / `tool.start``tool.end`.
- Expandable tool args (sanitized).
- Distinct styling for `safety.block` and `error`.
- **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event.
### 7.2 Jobs / analyze flows
- Same timeline component keyed by `jobId` if you surface those runs in UI.
- Unifies mental model: “every run has a stream.”
### 7.3 Deprecate slow polling
- Reduce `GET …/agent/sessions/[id]` poll interval when SSE connected; keep **single poll** for `status` / `changed_files` if those stay on session row only, or **also** emit `file.changed` events and drive UI from stream + one final consistency read.
---
## 8. Security & privacy
- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
- **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior.
- **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.
---
## 9. Environment variables
| Variable | Where | Purpose |
|----------|--------|---------|
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
| `VIBN_API_URL` | Runner | Base URL for callbacks |
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |
Add if needed:
| Variable | Purpose |
|----------|---------|
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
| `SSE_MAX_BUFFER` | Cap replay batch size |
---
## 10. Phased roadmap (suggested)
### Phase 1 — Foundation
- [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend.
- [ ] Create `agent_run_events` (or equivalent) + migration.
- [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
- [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks.
### Phase 2 — Push
- [ ] SSE route + **EventSource** in Agent tab.
- [ ] Backfill UI from DB on mount; then live tail.
- [ ] Lower or gate polling on `GET` session.
### Phase 3 — Jobs + durability
- [ ] Emit same events from **job** execution path; persist by `jobId`.
- [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later).
### Phase 4 — Rich semantics
- [ ] `safety.block` from policy layer.
- [ ] `deploy.*` events if Coolify integration is user-visible.
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.
---
## 11. Success metrics
- Time-to-first-visible-step after **Run** &lt; **1s** p95 (SSE).
- After hard refresh mid-run, user sees **consistent history** (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
---
## 12. Related code (repo anchors)
Use these when implementing:
- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`
---
## 13. Open decisions
1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic.
3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest.
4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.
---
*Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*