Theia rip-out (parent):
- Remove theia submodule entry (the local fork, Gitea repo, Coolify app,
Cloud Run services, and Artifact Registry image are all gone)
- Drop README.md + INFRASTRUCTURE.md (obsolete "Project OS" snapshots
that also leaked API tokens) and setup.sh (Theia clone bootstrap)
- Delete UI-DESIGN-GUIDE.md, BACKEND_AGENTS_PLAN.md, VIBN_BUILD_PLAN.md,
VISUAL_EDITOR_PLAN.md, core-packages.md, ai-packages.md, tools-list.md
(all 100% Theia-specific or superseded)
- Surgical scrubs of remaining Theia mentions in
AGENT_EXECUTION_ARCHITECTURE.md and TURBOREPO_MIGRATION_PLAN.md
Submodule bumps:
- vibn-agent-runner: Theia rip-out + MCP refactor (api/wrapper/server
pattern across shell/file/git/memory/prd/search/agent/gitea/coolify)
- vibn-frontend: Theia rip-out + P5.1 attach E2E + Justine UI WIP
Retire platform/ scaffold:
- Remove platform/backend/ (control-plane, executors, mcp-adapter),
platform/client-ide/ (gcp-productos extension), platform/contracts/,
platform/infra/terraform/, platform/scripts/templates/turborepo/
(replaced by vibn-agent-runner + vibn-frontend + Coolify direct)
- Drop architecture.md, technical_spec.md, vision-ext.md,
"1.Generate Control Plane API scaffold.md" (same era)
Docs / planning snapshots (new):
- AI_CAPABILITIES.md, AI_CAPABILITIES_ROADMAP.md
- AGENT_TELEMETRY_STREAMING_PROJECT.md
- VIBN_PRD.md, product-idea-a.md
Design assets (new):
- branding/{coolify,gitea,ux-testing}/ static brand collateral
- justine/ HTML mockups for the new onboarding/build flows
- preview-assist-ui/ Vite scratch app
- master-ai.code-workspace
Infra helpers (new):
- setup-coolify-montreal.sh provisioner
- gitea-docker-compose.yml
- vibn-coolify-schema.sql for the Coolify Postgres extensions
- prd-agent-prompt.pdf, prompt, root.txt, remixed-9edec9e9.tsx scratch
- flatten.sh helper
.gitignore: ignore **/node_modules, **/.next, **/.turbo, **/coverage
Made-with: Cursor
293 lines
12 KiB
Markdown
# Agent telemetry & live execution stream: project spec

This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**. The aim is the web equivalent of a terminal agent’s clarity: step-by-step visibility, tool boundaries, failures, and later multi-agent signals.

---

## 1. Why this exists

### Current behavior (baseline)

| Surface | How progress reaches the user | Limits |
|--------|------------------------------|--------|
| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, weak reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be a remote URL. | No in-product execution timeline for a “long COO run” unless the same event layer is added. |

### Product intent

- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*, not only a final status.
- **Differentiation**: “Ink-like” clarity in the browser, with structured steps instead of a blob of logs.
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.

---

## 2. Goals

1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional traffic on the same channel).
3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
4. **Single conceptual model** (`AgentEvent`) usable by:
   - Build → **Agent** tab (sessions),
   - **Job** flows (create/analyze-style),
   - optionally **orchestrator** long runs later.
5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.

### Non-goals (for v1)

- Full **OpenTelemetry** export (optional later).
- **Real-time collaborative** multi-user cursors on the same session.
- Merging **claude-code-fork**: this spec covers **API + UI + persistence** only.

---

## 3. Concept: `AgentEvent`

### Core shape (suggested)

```ts
type AgentEvent = {
  seq: number;          // monotonic per stream (session_id or job_id)
  ts: string;           // ISO-8601
  runId: string;        // session UUID or job id; ties events to a run
  runKind: 'session' | 'job';
  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';

  type: AgentEventType;
  payload: Record<string, unknown>; // type-specific
};

type AgentEventType =
  | 'run.started'
  | 'run.phase'          // e.g. planning, executing, committing
  | 'llm.turn.start'
  | 'llm.turn.end'
  | 'tool.start'
  | 'tool.end'
  | 'tool.output'        // chunked stdout/stderr if needed
  | 'safety.block'       // policy / protected path / command denied
  | 'file.changed'       // maps to today’s changed_files semantics
  | 'git.commit'
  | 'deploy.triggered'
  | 'deploy.status'
  | 'error'
  | 'run.completed'
  | 'handoff'            // v2: parent → child agent
  | 'child_job.started'  // v2: linked run id
  ;
```

### Mapping from today’s session `outputLine`

| Today (`outputLine.type`) | Suggested event(s) |
|---------------------------|--------------------|
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
| `error` | `error` + optional `safety.block` if policy-driven |
| `done` | `run.completed` |

Keep a **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.

---

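
The `outputLine` mapping table in section 3 could be sketched as a small pure function. This is only an illustration under stated assumptions: the `OutputLine` shape (`type`, `text`, `ts`) and the `DraftEvent` type are hypothetical stand-ins, `seq` and `phase` are left out for brevity, and `step` / `info` are mapped to `run.phase` only, although the table also allows `llm.turn.*`.

```typescript
// Hypothetical legacy line shape; the real `outputLine` type lives in the runner.
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string;
  ts: string; // ISO-8601
};

// Draft event without `seq`: ordering is assumed to be assigned at ingest.
type DraftEvent = {
  ts: string;
  runId: string;
  runKind: 'session' | 'job';
  type: 'run.phase' | 'tool.output' | 'error' | 'run.completed';
  payload: Record<string, unknown>;
};

function mapOutputLine(runId: string, line: OutputLine): DraftEvent {
  const base = { ts: line.ts, runId, runKind: 'session' as const };
  switch (line.type) {
    case 'step':
    case 'info':
      return { ...base, type: 'run.phase', payload: { message: line.text } };
    case 'stdout':
    case 'stderr':
      // Keep the stream name as a structured field so the UI can filter on it.
      return { ...base, type: 'tool.output', payload: { message: line.text, stream: line.type } };
    case 'error':
      return { ...base, type: 'error', payload: { message: line.text } };
    case 'done':
      return { ...base, type: 'run.completed', payload: { message: line.text } };
  }
}
```

A mapper like this would let the dual-write phase feed both the legacy `output` column and the new event stream from the same source lines.
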
## 4. Architecture (high level)

```mermaid
flowchart LR
  subgraph runner [vibn-agent-runner]
    RA[runSessionAgent / runAgent]
    EMIT[emitAgentEvent]
  end
  subgraph api [vibn-frontend Next.js]
    ING[POST internal ingest or PATCH extend]
    DB[(Postgres agent_run_events)]
    SSE[SSE GET /api/.../stream]
  end
  subgraph browser [Browser]
    UI[Timeline + live log]
  end
  RA --> EMIT
  EMIT -->|HTTPS + secret or mTLS| ING
  ING --> DB
  UI -->|EventSource| SSE
  SSE --> DB
```

**Principles**

- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today’s session model).
- The runner could instead expose **SSE directly**, but that is usually worse for **auth**, **CORS**, and keeping the product on **one domain**. Prefer **Next as the SSE endpoint**, reading from the DB.

---

## 5. Backend: `vibn-agent-runner`

### 5.1 Emit from execution paths

| Location | Action |
|----------|--------|
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** on each turn / tool / error. |
| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
| `server.ts` `/agent/execute` | Emit `run.started` after responding 202; `run.completed` / `error` on exit. |
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |

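
One way the tool loop could emit paired `tool.start` / `tool.end` events is a thin wrapper around each tool call. The sketch below is an assumption, not the runner's actual code: `withToolEvents`, `toolEndEvent`, the `Emit` callback, and the `ok` field are hypothetical names, while `tool`, `argsSummary`, and `durationMs` follow the structured fields suggested in section 3.

```typescript
type ToolEvent = {
  type: 'tool.start' | 'tool.end';
  ts: string; // ISO-8601
  payload: { tool: string; argsSummary?: string; durationMs?: number; ok?: boolean; message?: string };
};

// Hypothetical emitter callback; in the runner this would forward to emitAgentEvent.
type Emit = (event: ToolEvent) => void;

// Pure builder so the timing logic can be tested without running a tool.
function toolEndEvent(tool: string, startedMs: number, endedMs: number, errorMessage?: string): ToolEvent {
  const payload: ToolEvent['payload'] = {
    tool,
    durationMs: endedMs - startedMs,
    ok: errorMessage === undefined,
  };
  if (errorMessage !== undefined) payload.message = errorMessage;
  return { type: 'tool.end', ts: new Date(endedMs).toISOString(), payload };
}

async function withToolEvents<T>(
  tool: string,
  argsSummary: string, // already sanitized by the caller
  emit: Emit,
  run: () => Promise<T>,
  now: () => number = Date.now,
): Promise<T> {
  const startedMs = now();
  emit({ type: 'tool.start', ts: new Date(startedMs).toISOString(), payload: { tool, argsSummary } });
  try {
    const result = await run();
    emit(toolEndEvent(tool, startedMs, now()));
    return result;
  } catch (err) {
    emit(toolEndEvent(tool, startedMs, now(), err instanceof Error ? err.message : String(err)));
    throw err; // the caller's existing error handling stays in charge
  }
}
```

Because the wrapper always emits `tool.end`, even on failure, the timeline never shows a tool step that stays open after the tool has returned.
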
### 5.2 Transport runner → Next

**Option A (recommended):** extend the existing **PATCH** or add **`POST /api/internal/agent-events`** (or a per-session batch append):

- Headers: `x-agent-runner-secret` (same as today’s PATCH).
- Body: a single event or a small batch `{ events: AgentEvent[] }`. The runner leaves `seq` unset; the server assigns it on insert to avoid races.

**Option B:** the runner writes to **Redis/Postgres** directly. This couples the runner to DB credentials; only do it if the runner already runs inside the same trust zone and has the DB URL.

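
A minimal sketch of Option A on the runner side follows, assuming a small in-memory buffer that is flushed in batches. The names `OutgoingEvent`, `EventBuffer`, `buildIngestRequest`, `flushOnce`, and the injected `post` function are hypothetical; `eventId` reflects the dedupe idea listed under open decisions, and the batch size of 20 is an arbitrary placeholder.

```typescript
// Event as the runner would send it: no `seq`, since the server assigns ordering.
type OutgoingEvent = {
  eventId: string; // runner-generated id, assumed to be used for dedupe on ingest
  ts: string;
  runId: string;
  runKind: 'session' | 'job';
  type: string;
  payload: Record<string, unknown>;
};

type IngestRequest = { url: string; method: 'POST'; headers: Record<string, string>; body: string };

// Injected transport so the sketch does not depend on a particular HTTP client.
type PostFn = (req: IngestRequest) => Promise<{ ok: boolean }>;

function buildIngestRequest(baseUrl: string, secret: string, events: OutgoingEvent[]): IngestRequest {
  return {
    url: baseUrl.replace(/\/+$/, '') + '/api/internal/agent-events',
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-agent-runner-secret': secret },
    body: JSON.stringify({ events }),
  };
}

class EventBuffer {
  private pending: OutgoingEvent[] = [];
  private readonly maxBatch: number;

  constructor(maxBatch = 20) {
    this.maxBatch = maxBatch;
  }

  push(event: OutgoingEvent): void {
    this.pending.push(event);
  }

  // Removes and returns up to maxBatch events, oldest first.
  take(): OutgoingEvent[] {
    return this.pending.splice(0, this.maxBatch);
  }

  // Puts a failed batch back at the front so emit order survives a retry.
  requeue(batch: OutgoingEvent[]): void {
    this.pending.unshift(...batch);
  }

  get size(): number {
    return this.pending.length;
  }
}

// Sends one batch; returns false (and requeues) if the ingest call did not succeed.
async function flushOnce(buffer: EventBuffer, baseUrl: string, secret: string, post: PostFn): Promise<boolean> {
  const batch = buffer.take();
  if (batch.length === 0) return true;
  try {
    const res = await post(buildIngestRequest(baseUrl, secret, batch));
    if (res.ok) return true;
  } catch {
    // Network failure: fall through and requeue.
  }
  buffer.requeue(batch);
  return false;
}
```

Requeueing a failed batch means the same events may be sent twice, which is why a retry-safe ingest (dedupe by `eventId`) matters on the server.
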
### 5.3 Jobs store

- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).

---

## 6. Backend: `vibn-frontend` (Next.js)

### 6.1 Persistence

**New table (example): `agent_run_events`**

| Column | Notes |
|--------|--------|
| `id` | UUID |
| `run_id` | Session id or job id (text) |
| `run_kind` | `'session' \| 'job'` |
| `seq` | `BIGSERIAL`, or a per-run sequence, enforced with a unique constraint on `(run_id, seq)` |
| `project_id` | Nullable for jobs if not scoped |
| `event` | JSONB: full `AgentEvent` or `{ type, ts, payload }` |
| `created_at` | default `now()` |

Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $2`, where `$2` is the last `seq` the client has seen).

**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during the transition.

### 6.2 SSE route (example contract)

- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
  - Auth: session cookie, same as GET session (user must own the project).
  - Query: `?afterSeq=123` for replay.
  - Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
  - Heartbeat comments every ~15 to 30 s to keep proxies alive.

For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.

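
The wire format in the contract above can be sketched with a few pure helpers. The helper names (`formatSseEvent`, `formatHeartbeat`, `parseAfterSeq`, `replayFrames`) are hypothetical, and the choice to treat an invalid `afterSeq` as 0 (full replay) is an assumption, not something the spec decides.

```typescript
type StreamedEvent = { seq: number; type: string; payload: Record<string, unknown> };

// One SSE message: a single `data:` line of JSON, terminated by a blank line.
// JSON.stringify without indentation never emits raw newlines, so one line suffices.
function formatSseEvent(event: StreamedEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// SSE lines starting with ":" are comments: clients ignore them, proxies see traffic.
function formatHeartbeat(): string {
  return ': heartbeat\n\n';
}

// Parses `?afterSeq=` defensively; missing or invalid values replay from the start.
function parseAfterSeq(raw: string | null): number {
  if (raw === null) return 0;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : 0;
}

// Frames for the replay part of a connection: everything newer than afterSeq, in order.
function replayFrames(events: StreamedEvent[], afterSeq: number): string[] {
  return events
    .filter((e) => e.seq > afterSeq)
    .sort((a, b) => a.seq - b.seq)
    .map(formatSseEvent);
}
```

A route handler would write the replay frames first, then continue with live events and periodic heartbeats on the same response.
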
### 6.3 Ingest route (runner-only)

- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
- Validates `x-agent-runner-secret`.
- Inserts rows with a **server-generated `seq`** (transaction per run or advisory lock per `run_id`).

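
To make the ordering rule concrete, here is an in-memory model of what the ingest route is meant to guarantee: a per-run `seq` assigned by the server, plus dedupe on a runner-supplied `eventId` (the idempotency idea from the open decisions). It is a sketch only. `InMemoryEventLog` is a hypothetical name, and a real implementation would rely on Postgres with the `(run_id, seq)` unique constraint rather than a `Map`.

```typescript
type IncomingEvent = { eventId: string; runId: string; type: string; payload: Record<string, unknown> };
type StoredEvent = IncomingEvent & { seq: number };

class InMemoryEventLog {
  private readonly byRun = new Map<string, StoredEvent[]>();
  private readonly seenIds = new Set<string>();

  // Appends in arrival order and returns the rows actually stored.
  // Events whose eventId was already ingested (runner retries) are skipped.
  append(events: IncomingEvent[]): StoredEvent[] {
    const stored: StoredEvent[] = [];
    for (const event of events) {
      if (this.seenIds.has(event.eventId)) continue;
      this.seenIds.add(event.eventId);
      const rows = this.byRun.get(event.runId) ?? [];
      // seq starts at 1 per run and increases by one per stored event.
      const row: StoredEvent = { ...event, seq: rows.length + 1 };
      rows.push(row);
      this.byRun.set(event.runId, rows);
      stored.push(row);
    }
    return stored;
  }

  // Mirrors the range query `WHERE run_id = $1 AND seq > $2 ORDER BY seq`.
  after(runId: string, afterSeq: number): StoredEvent[] {
    return (this.byRun.get(runId) ?? []).filter((row) => row.seq > afterSeq);
  }
}
```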
---

## 7. Frontend (product UI)

### 7.1 Agent tab: timeline

- **EventSource** (SSE) subscription when the session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
- **Timeline components**:
  - Group by `llm.turn` and by `tool.start` / `tool.end` pairs.
  - Expandable tool args (sanitized).
  - Distinct styling for `safety.block` and `error`.
- **Reconnect**: on `EventSource` error, reopen with `afterSeq` set to the `seq` of the last received event.

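
The grouping and reconnect rules above might reduce to a small state reducer on the client. Everything here is a hypothetical sketch: `TimelineState`, `applyEvent`, and `streamUrl` are illustrative names, and pairing a `tool.end` with the most recent open step for the same tool is an assumption about how runs interleave.

```typescript
type TimelineEvent = {
  seq: number;
  type: string;
  payload: { tool?: string; durationMs?: number; message?: string };
};

type ToolStep = { tool: string; startSeq: number; endSeq: number | null; durationMs: number | null };

type TimelineState = { lastSeq: number; steps: ToolStep[]; notices: TimelineEvent[] };

const emptyTimeline: TimelineState = { lastSeq: 0, steps: [], notices: [] };

function applyEvent(state: TimelineState, event: TimelineEvent): TimelineState {
  // Replay after reconnect can overlap with events already rendered; drop those.
  if (event.seq <= state.lastSeq) return state;
  const next: TimelineState = { lastSeq: event.seq, steps: [...state.steps], notices: [...state.notices] };
  const tool = event.payload.tool ?? 'unknown';
  if (event.type === 'tool.start') {
    next.steps.push({ tool, startSeq: event.seq, endSeq: null, durationMs: null });
  } else if (event.type === 'tool.end') {
    // Close the most recent still-open step for the same tool.
    for (let i = next.steps.length - 1; i >= 0; i--) {
      const step = next.steps[i];
      if (step !== undefined && step.endSeq === null && step.tool === tool) {
        next.steps[i] = { ...step, endSeq: event.seq, durationMs: event.payload.durationMs ?? null };
        break;
      }
    }
  } else if (event.type === 'safety.block' || event.type === 'error') {
    next.notices.push(event); // rendered with distinct styling
  }
  return next;
}

// URL to (re)open the stream from the last event the client has applied.
function streamUrl(projectId: string, sessionId: string, lastSeq: number): string {
  return `/api/projects/${encodeURIComponent(projectId)}/agent/sessions/${encodeURIComponent(sessionId)}/events/stream?afterSeq=${lastSeq}`;
}
```

In a component, each `EventSource` message would be parsed and passed through `applyEvent`; on an error the client would close the source and open a new one at `streamUrl(projectId, sessionId, state.lastSeq)`.
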
### 7.2 Jobs / analyze flows

- Same timeline component keyed by `jobId` if you surface those runs in the UI.
- Unifies the mental model: “every run has a stream.”

### 7.3 Deprecate slow polling

- Once SSE is connected, poll `GET …/agent/sessions/[id]` less often (lengthen the interval). Keep a **single poll** for `status` / `changed_files` if those stay on the session row only; alternatively, **also** emit `file.changed` events and drive the UI from the stream plus one final consistency read.

---

## 8. Security & privacy

- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
- **`safety.block`**: log the reason **code** + a user-safe message; align with `security.ts` behavior.
- **Rate limits** on the ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.

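
As one possible reading of the truncation rule, payloads could pass through a sanitizer before they are emitted. This sketch is an assumption throughout: the key pattern, the 200-character cap, and the names `sanitizePayload` / `sanitizeValue` are placeholders, and hashing is left out to keep the example dependency-free.

```typescript
// Assumed heuristics; a real list would be aligned with security.ts.
const SENSITIVE_KEY = /token|secret|password|authorization|api[_-]?key/i;
const MAX_STRING = 200;

function sanitizeValue(key: string, value: unknown): unknown {
  // Redact by key name first, whatever the value type is.
  if (SENSITIVE_KEY.test(key)) return '[redacted]';
  if (typeof value === 'string') {
    return value.length > MAX_STRING
      ? `${value.slice(0, MAX_STRING)}... (${value.length} chars)`
      : value;
  }
  if (Array.isArray(value)) return value.map((item) => sanitizeValue(key, item));
  if (value !== null && typeof value === 'object') {
    return sanitizePayload(value as Record<string, unknown>);
  }
  return value; // numbers, booleans, null, undefined pass through
}

// Returns a sanitized copy; the input object is not modified.
function sanitizePayload(payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    out[key] = sanitizeValue(key, payload[key]);
  }
  return out;
}
```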
---

## 9. Environment variables

| Variable | Where | Purpose |
|----------|--------|---------|
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
| `VIBN_API_URL` | Runner | Base URL for callbacks |
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |

Add if needed:

| Variable | Purpose |
|----------|---------|
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
| `SSE_MAX_BUFFER` | Cap replay batch size |

---

## 10. Phased roadmap (suggested)

### Phase 1: Foundation

- [ ] Define `AgentEvent` TypeScript types in a **shared package**, or duplicate minimal types in runner + frontend.
- [ ] Create `agent_run_events` (or equivalent) + migration.
- [ ] Implement the **ingest** endpoint; wire the **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
- [ ] **Dual-write**: keep the existing `PATCH` `outputLine` so nothing breaks.

### Phase 2: Push

- [ ] SSE route + **EventSource** in Agent tab.
- [ ] Backfill UI from DB on mount; then live tail.
- [ ] Lower or gate polling on `GET` session.

### Phase 3: Jobs + durability

- [ ] Emit the same events from the **job** execution path; persist by `jobId`.
- [ ] Optional: replace the in-memory job list with DB for a **multi-instance** runner (later).

### Phase 4: Rich semantics

- [ ] `safety.block` from policy layer.
- [ ] `deploy.*` events if Coolify integration is user-visible.
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.

---

## 11. Success metrics

- Time-to-first-visible-step after **Run** under **1 s** at p95 (SSE).
- After a hard refresh mid-run, the user sees **consistent history**: no duplicate `seq`, and no gaps provided ingest is at-least-once with idempotency keys (a later addition).
- Fewer support tickets and less confusion about “what is the agent doing?” (qualitative).

---

## 12. Related code (repo anchors)

Use these when implementing:

- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`

---

## 13. Open decisions

1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
2. **Seq generation**: a DB sequence per `run_id` vs a global monotonic sequence, with the `(run_id, seq)` composite handled only in app logic.
3. **Idempotency**: runner retries may duplicate events; use an **`event_id` UUID** from the runner for dedupe on ingest.
4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.

---

*Document version: 1.0. Aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*