docs: heavily compress and simplify remaining reference files to represent current state
@@ -1,292 +1,5 @@
|
||||
# Agent telemetry & live execution stream — project spec
|
||||
# Agent Telemetry Streaming (Historical)
|
||||
|
||||
This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agent’s clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
|
||||
> **Note:** This historical spec covered the implementation of real-time streaming for the AI agent loop (Server-Sent Events) and timeline rendering.
|
||||
|
||||
---
|
||||
|
||||
## 1. Why this exists
|
||||
|
||||
### Current behavior (baseline)
|
||||
|
||||
| Surface | How progress reaches the user | Limits |
|--------|------------------------------|--------|
| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
|
||||
|
||||
### Product intent
|
||||
|
||||
- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status.
|
||||
- **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
|
||||
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.
|
||||
|
||||
---
|
||||
|
||||
## 2. Goals
|
||||
|
||||
1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
|
||||
2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel).
|
||||
3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
|
||||
4. **Single conceptual model** (`AgentEvent`) usable by:
|
||||
- Build → **Agent** tab (sessions),
|
||||
- **Job** flows (create/analyze-style),
|
||||
- optionally **orchestrator** long runs later.
|
||||
5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.
|
||||
|
||||
### Non-goals (for v1)
|
||||
|
||||
- Full **OpenTelemetry** export (optional later).
|
||||
- **Real-time collaborative** multi-user cursors on the same session.
|
||||
- Merging **claude-code-fork**—this spec is **API + UI + persistence** only.
|
||||
|
||||
---
|
||||
|
||||
## 3. Concept: `AgentEvent`
|
||||
|
||||
### Core shape (suggested)
|
||||
|
||||
```ts
type AgentEvent = {
  seq: number; // monotonic per stream (session_id or job_id)
  ts: string; // ISO-8601
  runId: string; // session UUID or job id; ties events to a run
  runKind: 'session' | 'job';
  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';

  type: AgentEventType;
  payload: Record<string, unknown>; // type-specific
};

type AgentEventType =
  | 'run.started'
  | 'run.phase' // e.g. planning, executing, committing
  | 'llm.turn.start'
  | 'llm.turn.end'
  | 'tool.start'
  | 'tool.end'
  | 'tool.output' // chunked stdout/stderr if needed
  | 'safety.block' // policy / protected path / command denied
  | 'file.changed' // maps to today’s changed_files semantics
  | 'git.commit'
  | 'deploy.triggered'
  | 'deploy.status'
  | 'error'
  | 'run.completed'
  | 'handoff' // v2: parent → child agent
  | 'child_job.started' // v2: linked run id
  ;
```
|
||||
|
||||
### Mapping from today’s session `outputLine`
|
||||
|
||||
| Today (`outputLine.type`) | Suggested event(s) |
|---------------------------|--------------------|
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
| `error` | `error` + optional `safety.block` if policy-driven |
| `done` | `run.completed` |
|
||||
|
||||
Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.
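The mapping table above can be sketched as a small pure function. This is a minimal illustration, not the shipped implementation: the `OutputLine` shape (a `type` plus a `text` field) and the `stream` payload field are assumptions made for the example, and `step` / `info` are mapped to `run.phase` only, although the table also allows `llm.turn.*`.

```typescript
// Hypothetical legacy line shape; the real `outputLine` type may differ.
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string;
};

// Subset of AgentEvent fields that the mapping decides.
type MappedEvent = {
  type: 'run.phase' | 'tool.output' | 'error' | 'run.completed';
  payload: { message: string; stream?: 'stdout' | 'stderr' };
};

function mapOutputLine(line: OutputLine): MappedEvent {
  switch (line.type) {
    case 'step':
    case 'info':
      // Summary text goes to payload.message, as the table suggests.
      return { type: 'run.phase', payload: { message: line.text } };
    case 'stdout':
    case 'stderr':
      // `stream` is an assumed structured field so the UI can tell the two apart.
      return { type: 'tool.output', payload: { message: line.text, stream: line.type } };
    case 'error':
      return { type: 'error', payload: { message: line.text } };
    case 'done':
      return { type: 'run.completed', payload: { message: line.text } };
  }
}
```

Keeping the human-readable text in `payload.message` means a UI that knows nothing about structured fields can still render every event.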
|
||||
|
||||
---
|
||||
|
||||
## 4. Architecture (high level)
|
||||
|
||||
```mermaid
flowchart LR
  subgraph runner [vibn-agent-runner]
    RA[runSessionAgent / runAgent]
    EMIT[emitAgentEvent]
  end
  subgraph api [vibn-frontend Next.js]
    ING[POST internal ingest or PATCH extend]
    DB[(Postgres agent_events)]
    SSE[SSE GET /api/.../stream]
  end
  subgraph browser [Browser]
    UI[Timeline + live log]
  end
  RA --> EMIT
  EMIT -->|HTTPS + secret or mTLS| ING
  ING --> DB
  UI -->|EventSource| SSE
  SSE --> DB
```
|
||||
|
||||
**Principles**
|
||||
|
||||
- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today’s session model).
|
||||
- Alternatively, the runner could expose **SSE directly**, but that is usually worse for **auth**, **CORS**, and keeping the product on **one domain**. Prefer **Next as the SSE endpoint**, reading from the DB.
|
||||
|
||||
---
|
||||
|
||||
## 5. Backend: `vibn-agent-runner`
|
||||
|
||||
### 5.1 Emit from execution paths
|
||||
|
||||
| Location | Action |
|----------|--------|
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. |
| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |
|
||||
|
||||
### 5.2 Transport runner → Next
|
||||
|
||||
**Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append):
|
||||
|
||||
- Headers: `x-agent-runner-secret` (same as today’s PATCH).
|
||||
- Body: single event or small batch `{ events: AgentEvent[] }`; the runner leaves `seq` unset and the server assigns it on insert to avoid races.
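Option A can be sketched from the runner's side as follows. This is a hedged illustration under stated assumptions: `PendingEvent`, `buildIngestRequest`, `emitAgentEvents`, and the injected `send` function are hypothetical names, and the path `/api/internal/agent-events` is the example path from this spec rather than a confirmed route.

```typescript
// An event as the runner would send it: no `seq`, because the server assigns it.
type PendingEvent = {
  runId: string;
  runKind: 'session' | 'job';
  ts: string; // ISO-8601
  type: string;
  payload: Record<string, unknown>;
};

type IngestInit = {
  method: 'POST';
  headers: { 'content-type': string; 'x-agent-runner-secret': string };
  body: string;
};

// Transport is injected so the sketch has no runtime dependencies.
type Send = (url: string, init: IngestInit) => PromiseLike<{ ok: boolean; status: number }>;

function buildIngestRequest(baseUrl: string, secret: string, events: PendingEvent[]) {
  const init: IngestInit = {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-agent-runner-secret': secret },
    body: JSON.stringify({ events }),
  };
  // Trim trailing slashes so the joined URL has exactly one separator.
  return { url: `${baseUrl.replace(/\/+$/, '')}/api/internal/agent-events`, init };
}

function emitAgentEvents(send: Send, baseUrl: string, secret: string, events: PendingEvent[]) {
  const { url, init } = buildIngestRequest(baseUrl, secret, events);
  return send(url, init).then((res) => {
    if (!res.ok) throw new Error(`agent-events ingest failed with HTTP ${res.status}`);
  });
}
```

In the runner, `send` would typically wrap the global `fetch`, with `baseUrl` taken from `VIBN_API_URL` and `secret` from `AGENT_RUNNER_SECRET`.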
|
||||
|
||||
**Option B:** Runner writes to **Redis/Postgres** directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.
|
||||
|
||||
### 5.3 Jobs store
|
||||
|
||||
- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
|
||||
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
|
||||
|
||||
---
|
||||
|
||||
## 6. Backend: `vibn-frontend` (Next.js)
|
||||
|
||||
### 6.1 Persistence
|
||||
|
||||
**New table (example): `agent_run_events`**
|
||||
|
||||
| Column | Notes |
|--------|--------|
| `id` | UUID |
| `run_id` | Session id or job id (text) |
| `run_kind` | `'session' \| 'job'` |
| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
| `project_id` | Nullable for jobs if not scoped |
| `event` | JSONB: full `AgentEvent` or `{ type, ts, payload }` |
| `created_at` | default now() |
|
||||
|
||||
Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`).
|
||||
|
||||
**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition.
|
||||
|
||||
### 6.2 SSE route (example contract)
|
||||
|
||||
- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
|
||||
- Auth: session cookie / same as GET session (user must own project).
|
||||
- Query: `?afterSeq=123` for replay.
|
||||
- Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
|
||||
- Heartbeat comments every ~15–30s to keep proxies alive.
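The wire format in the contract above can be sketched with two small helpers. The function names are hypothetical. The `id:` line is an addition beyond the `data: {JSON}\n\n` contract stated here, included on the assumption that exposing `seq` as the SSE event id is useful for browser-driven resume.

```typescript
// Serialize one event as a Server-Sent Events message.
// JSON.stringify without indentation never emits raw newlines, so a single
// `data:` line is enough.
function formatSseEvent(event: { seq: number } & Record<string, unknown>): string {
  return `id: ${event.seq}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Lines starting with ':' are SSE comments: clients ignore them, but they keep
// idle connections from being closed by proxies.
function formatSseHeartbeat(): string {
  return ': heartbeat\n\n';
}
```

A route handler would write `formatSseEvent(...)` for each row with `seq > afterSeq`, then interleave `formatSseHeartbeat()` every 15 to 30 seconds while the run is idle.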
|
||||
|
||||
For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.
|
||||
|
||||
### 6.3 Ingest route (runner-only)
|
||||
|
||||
- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
|
||||
- Validates `x-agent-runner-secret`.
|
||||
- Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`).
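The server-assigned `seq` behaviour can be illustrated with an in-memory stand-in for the events table. This is only a sketch of the ordering semantics: `InMemoryEventLog` is a hypothetical name, and a real ingest route would rely on a Postgres transaction or an advisory lock per `run_id`, as noted above, rather than a process-local counter.

```typescript
type StoredEvent = { runId: string; seq: number; event: unknown };

class InMemoryEventLog {
  private nextSeq: { [runId: string]: number } = {};
  private rows: StoredEvent[] = [];

  // Append a batch for one run; seq is assigned here, never by the caller.
  append(runId: string, events: unknown[]): StoredEvent[] {
    let seq = this.nextSeq[runId] ?? 1;
    const inserted = events.map((event) => ({ runId, seq: seq++, event }));
    this.nextSeq[runId] = seq;
    this.rows.push(...inserted);
    return inserted;
  }

  // Equivalent of `WHERE run_id = $1 AND seq > $lastSeen`.
  after(runId: string, afterSeq: number): StoredEvent[] {
    return this.rows.filter((row) => row.runId === runId && row.seq > afterSeq);
  }
}
```

Because each run has its own counter, two runs ingesting concurrently never interleave their sequence numbers, which is what makes `afterSeq` replay unambiguous.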
|
||||
|
||||
---
|
||||
|
||||
## 7. Frontend (product UI)
|
||||
|
||||
### 7.1 Agent tab — timeline
|
||||
|
||||
- **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
|
||||
- **Timeline components**:
|
||||
- Group by `llm.turn` / `tool.start`–`tool.end`.
|
||||
- Expandable tool args (sanitized).
|
||||
- Distinct styling for `safety.block` and `error`.
|
||||
- **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event.
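The reconnect rule above can be sketched as two pure helpers on the client. The names `applyEvent` and `streamUrl` are hypothetical; the URL follows the example SSE contract in this spec.

```typescript
type SeqEvent = { seq: number };
type TimelineState<T extends SeqEvent> = { lastSeq: number; events: T[] };

// Append an event unless it was already seen (replays after reconnect may
// resend events at or below lastSeq).
function applyEvent<T extends SeqEvent>(state: TimelineState<T>, event: T): TimelineState<T> {
  if (event.seq <= state.lastSeq) return state;
  return { lastSeq: event.seq, events: state.events.concat([event]) };
}

// URL to (re)open the stream from the last event the client has rendered.
function streamUrl(projectId: string, sessionId: string, lastSeq: number): string {
  const project = encodeURIComponent(projectId);
  const session = encodeURIComponent(sessionId);
  return `/api/projects/${project}/agent/sessions/${session}/events/stream?afterSeq=${lastSeq}`;
}
```

On an `EventSource` error, the component would close the old source and open a new one with `streamUrl(projectId, sessionId, state.lastSeq)`; the `applyEvent` guard keeps the timeline free of duplicates even if the server replays an overlapping range.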
|
||||
|
||||
### 7.2 Jobs / analyze flows
|
||||
|
||||
- Same timeline component keyed by `jobId` if you surface those runs in UI.
|
||||
- Unifies mental model: “every run has a stream.”
|
||||
|
||||
### 7.3 Deprecate slow polling
|
||||
|
||||
- Lengthen the `GET …/agent/sessions/[id]` poll interval (poll less often) while SSE is connected; keep a **single poll** for `status` / `changed_files` if those stay on the session row only, or **also** emit `file.changed` events and drive the UI from the stream plus one final consistency read.
|
||||
|
||||
---
|
||||
|
||||
## 8. Security & privacy
|
||||
|
||||
- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
|
||||
- **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior.
|
||||
- **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.
|
||||
|
||||
---
|
||||
|
||||
## 9. Environment variables
|
||||
|
||||
| Variable | Where | Purpose |
|----------|--------|---------|
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
| `VIBN_API_URL` | Runner | Base URL for callbacks |
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |
|
||||
|
||||
Add if needed:
|
||||
|
||||
| Variable | Purpose |
|----------|---------|
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
| `SSE_MAX_BUFFER` | Cap replay batch size |
|
||||
|
||||
---
|
||||
|
||||
## 10. Phased roadmap (suggested)
|
||||
|
||||
### Phase 1 — Foundation
|
||||
|
||||
- [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend.
|
||||
- [ ] Create `agent_run_events` (or equivalent) + migration.
|
||||
- [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
|
||||
- [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks.
|
||||
|
||||
### Phase 2 — Push
|
||||
|
||||
- [ ] SSE route + **EventSource** in Agent tab.
|
||||
- [ ] Backfill UI from DB on mount; then live tail.
|
||||
- [ ] Lower or gate polling on `GET` session.
|
||||
|
||||
### Phase 3 — Jobs + durability
|
||||
|
||||
- [ ] Emit same events from **job** execution path; persist by `jobId`.
|
||||
- [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later).
|
||||
|
||||
### Phase 4 — Rich semantics
|
||||
|
||||
- [ ] `safety.block` from policy layer.
|
||||
- [ ] `deploy.*` events if Coolify integration is user-visible.
|
||||
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.
|
||||
|
||||
---
|
||||
|
||||
## 11. Success metrics
|
||||
|
||||
- Time-to-first-visible-step after **Run** < **1s** p95 (SSE).
|
||||
- After hard refresh mid-run, user sees **consistent history** (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
|
||||
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
|
||||
|
||||
---
|
||||
|
||||
## 12. Related code (repo anchors)
|
||||
|
||||
Use these when implementing:
|
||||
|
||||
- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
|
||||
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
|
||||
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
|
||||
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
|
||||
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`
|
||||
|
||||
---
|
||||
|
||||
## 13. Open decisions
|
||||
|
||||
1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
|
||||
2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic.
|
||||
3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest.
|
||||
4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.
|
||||
|
||||
---
|
||||
|
||||
*Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*
|
||||
The streaming system is fully implemented in `app/api/chat/route.ts` and rendered in the frontend via `Timeline`, `ThinkingBubble`, and `TimelineToolGroup` components inside `chat-panel.tsx`.
|
||||
|
||||
@@ -1,673 +1,5 @@
|
||||
# Vibn AI Capability Roadmap
|
||||
# AI Capabilities Roadmap (Historical)
|
||||
|
||||
> **⚠ See also:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
|
||||
> — proposed pivot to a Claude-Code-style persistent dev container per
|
||||
> project. Once approved, that doc supersedes any "code authoring" item
|
||||
> in this roadmap; this file remains the source of truth for
|
||||
> infrastructure primitives (P5.x, P6.x, P7.x).
|
||||
>
|
||||
> The ordered plan for closing the gap between what the Vibn agent can do
|
||||
> today and what it needs to do for a real customer to ship, operate, and
|
||||
> scale a SaaS through it.
|
||||
>
|
||||
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
|
||||
>
|
||||
> **Prioritization framing:**
|
||||
> 1. Does it unblock *shipping a real product* (not a demo)?
|
||||
> 2. Does it unblock *surviving past the first paying customer*?
|
||||
> 3. Does it only matter once usage scales?
|
||||
>
|
||||
> Tier 1 = (1). Tier 2 = (2). Tier 3 = (3). Tier 4 = revisit when demanded.
|
||||
>
|
||||
> **Sequencing rule:** complete Tier 1 before any Tier 2 item. The trap
|
||||
> is polishing safety rails (audit, scopes, quotas) before the product is
|
||||
> actually shippable.
|
||||
> **Note:** This is a historical roadmap document. Most of the core Path B capabilities (persistent dev containers, Gitea mirroring, Traefik wildcard proxies) have been successfully shipped.
|
||||
|
||||
---
|
||||
|
||||
## 0. Substrate & constraints
|
||||
|
||||
Vibn runs on a two-cloud substrate, constrained to Canadian data residency:
|
||||
|
||||
| Layer | Provider | Region | Purpose |
|---|---|---|---|
| **App hosting** | Coolify (self-managed) | Montreal VPS | All app / database / auth containers. Current state. |
| **Managed services** | **Google Cloud** | `northamerica-northeast1` (Montreal) | Object storage, cron, queues, logs, backups, monitoring, secrets. |
| **Domain registration** | OpenSRS (Tucows) | Toronto | Wholesale domain API. Canadian company, pre-funded float account. |
| **Authoritative DNS** | Cloud DNS (default) / CIRA D-Zone (strict) | Global anycast / Canadian | Managed DNS for workspace-owned domains. |
| **Transactional email** | Amazon SES | `ca-central-1` (Montreal) | No GCP equivalent; AWS's Canadian region keeps data in-country. |
|
||||
|
||||
**Absolute rule: no customer data leaves Canada.** Every workspace-owned
|
||||
resource (storage bucket, database, log bucket, task queue, scheduler
|
||||
job, email message body) must be pinned to a Canadian region.
|
||||
|
||||
### Why mix clouds?
|
||||
- **Coolify stays** because we already built the workspace-scoped
|
||||
provisioning around it (Phase 4). Migrating apps to Cloud Run is a
|
||||
rewrite we don't need.
|
||||
- **GCP-CA** fills every managed-service gap Coolify has. Cheaper and
|
||||
more reliable than self-hosting MinIO/Loki/scheduler.
|
||||
- **AWS SES for email** because GCP has no first-party transactional
|
||||
email service and SES `ca-central-1` is the only credible
|
||||
Canadian-resident managed option.
|
||||
- **OpenSRS for domains** because it's the wholesale API behind most
|
||||
Canadian registrars, and we already have the deposit.
|
||||
|
||||
### Compliance upgrade path (Tier 4 territory)
|
||||
For regulated customers (healthcare, financial, public sector):
|
||||
- **Assured Workloads for Canada** on GCP — enforces Canadian personnel
|
||||
access + data residency contractually.
|
||||
- **CIRA D-Zone** instead of Cloud DNS — first-party Canadian managed DNS.
|
||||
- Keep the SES and OpenSRS pieces as-is (already Canadian-resident).
|
||||
|
||||
Document the caveat on a public trust page. Build the Assured-Workloads
|
||||
variant when a real customer asks.
|
||||
|
||||
---
|
||||
|
||||
## Current state (Phase 4 + P5.1 verified, Apr 2026)
|
||||
|
||||
- Workspace tenancy: Gitea org + Coolify project + SSH deploy key per
|
||||
workspace.
|
||||
- Agent can: create repos, create apps, provision 8 database flavors,
|
||||
deploy 8 vetted auth providers, manage env vars, deploy + poll,
|
||||
update, delete (with `?confirm=<name>`), set domains under
|
||||
`*.{slug}.vibnai.com`.
|
||||
- Control-plane MCP: 24 tools + full REST surface at `/api/mcp`.
|
||||
API-key scoped per workspace.
|
||||
- **P5.1 custom apex domains** — OpenSRS + Cloud DNS + Coolify
|
||||
lifecycle (search / register / attach / inspect) shipped and
|
||||
verified end-to-end against PROD GCP + OpenSRS sandbox + PROD
|
||||
Coolify on `v4.0.0-beta.473` (2026-04-22). All 5 sub-systems green
|
||||
in `smoke-attach-e2e.ts`: register → zone → A records → registrar
|
||||
NS update → Coolify `fqdn` patch → cleanup. Required a server-side
|
||||
config fix on `coolify-server-mtl` (proxy.type=TRAEFIK,
|
||||
is_build_server=false) so `Server::isProxyShouldRun()` returns
|
||||
true and the controller maps `domains` → `fqdn` — see
|
||||
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) § 3.6 for the gory details.
|
||||
- **Agent-runner stdio MCP bridge** — `vibn-agent-runner` now exposes
|
||||
its full in-house toolkit (28 tools) outward over 5 stdio MCP
|
||||
servers so external clients (Cursor, Claude Desktop, Goose) can
|
||||
drive the same Coolify / Gitea / workspace / memory / search /
|
||||
sub-agent surface as the internal Coder/PM/Marketing agents, with
|
||||
shared protected-repo + protected-app guardrails. Every tool now
|
||||
has a pure `*-api.ts` module, a registry wrapper for the in-process
|
||||
loop, and an MCP server wrapper — single source of truth, verified
|
||||
by `scripts/smoke-mcp.js`.
|
||||
- Enforced: tenant isolation, domain policy, delete confirms,
|
||||
secrets-at-rest encryption, protected-repo / protected-app guards.
|
||||
|
||||
See [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (§ 3.6 for P5.1,
|
||||
§ 3.7 for the stdio MCP bridge) for the complete current surface.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1 — Blocks shipping a real product
|
||||
|
||||
Without these, anything the agent builds is *demo-shaped*. Ship these
|
||||
next, in the recommended sequence below.
|
||||
|
||||
### P5.1 · Custom apex domains via OpenSRS
|
||||
|
||||
**Goal:** agent buys `mysaas.com` on the user's behalf and attaches it
|
||||
to a Coolify app with automatic TLS.
|
||||
|
||||
**Why now:** you already opened an OpenSRS reseller account with a $100
|
||||
float. Unlocks real branding, DKIM for email (P5.2 depends on this),
|
||||
and gives you a revenue line (markup on domains).
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool / endpoint | Purpose |
|---|---|
| `domains.search` | Live availability + suggestions via OpenSRS `lookup`. |
| `domains.check_price` | Per-TLD price from OpenSRS + markup. |
| `domains.register` | Debits workspace float, registers via OpenSRS. |
| `domains.list` | Workspace's owned domains. |
| `domains.renew` / `domains.transfer` | Lifecycle. |
| `domains.{name}.attach` | Attach to a Coolify app: DNS records + Coolify `fqdn` + Let's Encrypt. |
| `domains.{name}.detach` | Free a domain from an app, keep registration. |
| `domains.{name}.attach_status` | Polls DNS propagation + cert issuance (async). |
|
||||
|
||||
**Infra:**
|
||||
- **OpenSRS client** (their XML/SOAP or REST API).
|
||||
- **Cloud DNS** for zone management (default). CIRA D-Zone available as a
|
||||
workspace-level preference for strict-residency customers.
|
||||
- **Workspace float ledger** (`vibn_workspace_billing_float`) — a
|
||||
prepaid balance in CAD, debited on register/renew. Reconciled nightly
|
||||
against the OpenSRS master deposit.
|
||||
- `VIBN_OPENSRS_DEPOSIT_ACCOUNT` as the master float handle.
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `preferred_dns_provider TEXT DEFAULT 'cloud_dns'`
|
||||
- `cloud_dns_zone_name TEXT` ← GCP managed zone for this workspace.
|
||||
|
||||
**Risks:**
|
||||
- DNS propagation is human-scale (minutes–hours). Agents need the
|
||||
async `attach_status` polling loop, not a sync call.
|
||||
- Cert issuance via Let's Encrypt is rate-limited (50 certificates per registered domain per week).
  Prevent abuse with per-workspace rate caps.
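Because propagation takes minutes to hours, the `attach_status` polling loop benefits from a capped exponential backoff rather than a fixed tight interval. The helper below is a minimal sketch of such a schedule; the function name and the specific delays are illustrative assumptions, not values from this roadmap.

```typescript
// Delays (in ms) between successive attach_status polls: doubles each time,
// is capped at maxMs, and stops once the total wait would exceed budgetMs.
function pollSchedule(initialMs: number, maxMs: number, budgetMs: number): number[] {
  if (initialMs <= 0 || maxMs < initialMs) {
    throw new Error('initialMs must be positive and no larger than maxMs');
  }
  const delays: number[] = [];
  let delay = initialMs;
  let elapsed = 0;
  while (elapsed + delay <= budgetMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, maxMs);
  }
  return delays;
}
```

An agent would sleep for each delay in turn, call `domains.{name}.attach_status`, and stop early once both DNS and certificate checks report success; exhausting the schedule is a signal to surface the pending state to the user instead of retrying forever.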
|
||||
|
||||
**Estimate:** **2 weeks.**
|
||||
|
||||
---
|
||||
|
||||
### P5.2 · Transactional email (AWS SES `ca-central-1`)
|
||||
|
||||
**Goal:** auth providers can send password-reset emails; agents can
|
||||
`email.send` from `noreply@mysaas.com`.
|
||||
|
||||
**Why now:** every auth provider on the allowlist is broken without
|
||||
SMTP. Also pairs with P5.1 — per-workspace sender domains need DKIM on
|
||||
domains you own.
|
||||
|
||||
**Why SES ca-central-1 specifically:** GCP has no first-party
|
||||
transactional email service. All mainstream providers (Postmark,
|
||||
Resend, Mailgun, SendGrid) are US-primary. SES's Montreal region is the
|
||||
only credible managed option that keeps message bodies in Canada.
|
||||
|
||||
**Two-phase rollout:**
|
||||
|
||||
**Phase A — shared-sender MVP (1 week):**
|
||||
- One SES-verified sender domain `mail.vibnai.com`.
|
||||
- Every workspace can send from `noreply@mail.vibnai.com` out of the box.
|
||||
- `email.send` tool + injected `SMTP_*` env vars.
|
||||
- Bounce / complaint webhooks routed via SNS → a Cloud Run service
|
||||
that writes per-workspace notifications.
|
||||
|
||||
**Phase B — per-workspace sender domains (1 week, depends on P5.1):**
|
||||
- `email.verify_sender_domain` creates the SPF/DKIM/DMARC records via
|
||||
the Cloud DNS / CIRA D-Zone client on a workspace-owned domain.
|
||||
- Polls SES verification; flips `verified=true` when done.
|
||||
- Workspace can now `email.send from: founder@mysaas.com`.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|---|---|
| `email.send` | Single message; returns SES `message_id`. |
| `email.send_batch` | Up to 100 at a time. |
| `email.list_messages` | Recent sent mail + delivery state (from SES + our log). |
| `email.verify_sender_domain` | Kick off DKIM for a workspace-owned domain. |
| `email.sender_status` | Poll verification state. |
| `email.webhooks.list` | Recent bounces/complaints. |
|
||||
|
||||
**Infra:**
|
||||
- SES identity per workspace-owned sender domain.
|
||||
- SNS topic → Cloud Run webhook receiver (in `northamerica-northeast1`)
|
||||
for bounce/complaint ingestion.
|
||||
- Rate limits: start in SES sandbox (200/day), request production limits
|
||||
after first real customer.
|
||||
|
||||
**Estimate:** **2 weeks total** (1 week Phase A + 1 week Phase B).
|
||||
|
||||
---
|
||||
|
||||
### P5.3 · Object storage (Google Cloud Storage, `northamerica-northeast1`)
|
||||
|
||||
**Goal:** any SaaS the agent builds can take user uploads — avatars,
|
||||
attachments, exports, images — without the user pasting in third-party
|
||||
credentials.
|
||||
|
||||
**Why now:** "can users upload a file?" is the #1 post-demo question.
|
||||
Blocks ~half of realistic SaaS ideas.
|
||||
|
||||
**GCP collapses this item.** No MinIO container to babysit; GCS provides
|
||||
managed bucket + signed URLs + lifecycle policies + encryption out of
|
||||
the box.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|---|---|
| `storage.buckets.list` | Buckets in this workspace (filtered by `workspace={slug}` label). |
| `storage.buckets.create` | New bucket. Optional `public_read`. Enforced region: `northamerica-northeast1`. |
| `storage.buckets.delete` | Destroy bucket. `confirm` gate. |
| `storage.presign_upload` | PUT URL, TTL, content-type constraint. |
| `storage.presign_download` | GET URL, TTL. |
| `storage.list_objects` | Pagination + prefix filter. |
| `storage.delete_object` | Single object. |
| `storage.set_lifecycle` | TTL delete, multipart cleanup, archive tiering. |
|
||||
|
||||
**Provisioning additions:**
|
||||
- Default bucket `vibn-ws-{slug}` created on workspace provision.
|
||||
- Uniform bucket-level access enabled by default.
|
||||
- Per-workspace GCP service account `vibn-ws-{slug}@...`, scoped to its
|
||||
own bucket via `roles/storage.objectAdmin`.
|
||||
- Keyfile stored encrypted (AES-256-GCM, same `VIBN_SECRETS_KEY`) in
|
||||
`vibn_workspaces.gcp_service_account_key_encrypted`.
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `gcs_bucket_name TEXT`
|
||||
- `gcp_service_account_email TEXT`
|
||||
- `gcp_service_account_key_encrypted BYTEA`
|
||||
|
||||
**Env injection:**
|
||||
- `STORAGE_ENDPOINT=https://storage.googleapis.com`
|
||||
- `STORAGE_BUCKET={workspace-bucket-name}`
|
||||
- `STORAGE_ACCESS_KEY`, `STORAGE_SECRET_KEY` (S3-compatible via GCS HMAC keys)
|
||||
— auto-injected on app creation so agent code uses standard S3 SDKs.
|
||||
|
||||
**Estimate:** **3 days.**
|
||||
|
||||
---
|
||||
|
||||
### P5.4 · Workers, cron, and queues (Cloud Tasks + Cloud Scheduler + Cloud Run Jobs)
|
||||
|
||||
**Goal:** agents can declare async workers, scheduled jobs, and queued
|
||||
tasks. Anything that isn't a single `ports: 3000` web container.
|
||||
|
||||
**Why now:** webhooks, retries, nightly cleanup, image processing,
|
||||
email sending — every real SaaS needs a non-web process. Current
|
||||
workaround (second Coolify app) is brittle and manual.
|
||||
|
||||
**Hybrid approach — Coolify for compute, GCP for orchestration:**
|
||||
|
||||
Option evaluated and chosen:
|
||||
- **Cloud Scheduler** (`northamerica-northeast1`) for cron: fires
|
||||
HTTP webhooks into the app at the scheduled time.
|
||||
- **Cloud Tasks** (`northamerica-northeast1`) for queue: agent code
|
||||
calls `enqueue(task)`, Cloud Tasks dispatches to the app's worker
|
||||
endpoint with retries, backoff, and at-least-once semantics.
|
||||
- **Worker process** stays on Coolify as a second app-per-repo with a
|
||||
different start command, exposed on an internal URL.
|
||||
|
||||
Rejected alternative: migrate everything to Cloud Run Jobs. More managed
|
||||
but splits the "Live" view across two deploy targets and changes the
|
||||
agent's mental model. Not worth it for MVP.
|
||||
|
||||
**Shape — extend `apps.create`:**
|
||||
|
||||
```json
{
  "repo": "my-site",
  "services": {
    "web": { "command": "npm start", "ports": "3000" },
    "worker": { "command": "npm run worker", "replicas": 2 }
  },
  "cron": [
    { "name": "nightly-backup", "schedule": "0 3 * * *", "path": "/tasks/backup" },
    { "name": "sync", "schedule": "*/10 * * * *", "path": "/tasks/sync" }
  ],
  "queues": [
    { "name": "emails" },
    { "name": "image-processing" }
  ]
}
```
|
||||
|
||||
Internally creates: two Coolify apps (web + worker), N Cloud Scheduler
|
||||
jobs labeled `workspace={slug}`, N Cloud Tasks queues.
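The fan-out described above (one Coolify app per service, one Cloud Scheduler job per cron entry, one Cloud Tasks queue per queue entry) can be sketched as a planning step that runs before any provider call. All type names, the `planResources` function, and the resource naming scheme are assumptions made for illustration.

```typescript
type AppSpec = {
  repo: string;
  services: { [name: string]: { command: string; ports?: string; replicas?: number } };
  cron?: { name: string; schedule: string; path: string }[];
  queues?: { name: string }[];
};

type PlannedResource =
  | { kind: 'coolify_app'; name: string; command: string }
  | { kind: 'scheduler_job'; name: string; schedule: string; path: string }
  | { kind: 'tasks_queue'; name: string };

// Expand an extended `apps.create` spec into the resources to provision.
// Naming (`{repo}-{service}`, `{slug}-{name}`) is a hypothetical convention.
function planResources(slug: string, spec: AppSpec): PlannedResource[] {
  const plan: PlannedResource[] = [];
  for (const service of Object.keys(spec.services)) {
    plan.push({
      kind: 'coolify_app',
      name: `${spec.repo}-${service}`,
      command: spec.services[service].command,
    });
  }
  for (const job of spec.cron ?? []) {
    plan.push({ kind: 'scheduler_job', name: `${slug}-${job.name}`, schedule: job.schedule, path: job.path });
  }
  for (const queue of spec.queues ?? []) {
    plan.push({ kind: 'tasks_queue', name: `${slug}-${queue.name}` });
  }
  return plan;
}
```

Computing the plan as data first makes it possible to show the agent (or the user) exactly what will be created, and to roll back cleanly if one provider call fails partway through.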
|
||||
|
||||
**Surface additions:**
|
||||
|
||||
| Tool | Purpose |
|---|---|
| `apps.services.list` | All processes in an app. |
| `apps.services.update` | Scale replicas, change command. |
| `apps.services.logs` | Per-process logs. |
| `cron.list` | Scheduler jobs in this workspace. |
| `cron.create` / `cron.update` / `cron.delete` | Manage scheduled jobs. |
| `cron.run_now` | Fire a scheduled job immediately (useful for agent testing). |
| `queues.list` | Cloud Tasks queues in this workspace. |
| `queues.create` / `queues.delete` | Manage queues. |
| `queues.enqueue` | (Normally called from app code, but exposed for agent-driven testing.) |
| `queues.pause` / `queues.resume` | Emergency ops. |
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1'`
|
||||
- `cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1'`
|
||||
|
||||
**Auth to GCP:** per-workspace service account (provisioned in P5.3) is
|
||||
extended with `roles/cloudscheduler.admin` and `roles/cloudtasks.admin`
|
||||
*scoped to resources labeled `workspace={slug}`* via IAM conditions.
|
||||
Agents can only act on their own workspace's jobs/queues.
|
||||
|
||||
**Estimate:** **1 week.**
|
||||
|
||||
---
|
||||
|
||||
### Tier 1 total: ~5.5 weeks of focused work
|
||||
|
||||
After Tier 1 lands, an agent can:
|
||||
- Buy `mysaas.com`, point it at a Next.js app.
|
||||
- Deploy Authentik with working password-reset emails from `noreply@mysaas.com`.
|
||||
- Offer user uploads (avatars, attachments).
|
||||
- Run `0 3 * * *` nightly cleanup cron.
|
||||
- Process Stripe webhooks idempotently via a retry queue.
|
||||
|
||||
That's a shippable SaaS. Everything after this is about *keeping* it
|
||||
shipped.
|
||||
|
||||
---
|
||||
|
||||
## Tier 2 — Blocks surviving past the first real customer
|
||||
|
||||
Once users exist, these prevent silent failures.
|
||||
|
||||
### P6.1 · Database backups + restore (GCS + wal-g)
|
||||
|
||||
**Goal:** nightly backups, on-demand backups, one-call restore. No
|
||||
"agent ran `DROP TABLE` in a migration" permanent data loss.
|
||||
|
||||
**Why:** scariest item on this list. Failure mode is irrecoverable.
|
||||
|
||||
**Shape:**
|
||||
- `databases.{uuid}.backup` — on-demand `pg_dump` / `mongodump` to the
|
||||
workspace's GCS bucket (depends on P5.3).
|
||||
- `databases.{uuid}.backups.list` — lists backups with timestamp + size.
|
||||
- `databases.{uuid}.backups.restore` — `confirm`-gated restore from a
|
||||
specific backup uuid.
|
||||
- Per-database backup policy: daily / hourly / off, retention days.
|
||||
- Default: every AI-created database gets daily backups + 7-day
|
||||
retention on.
|
||||
|
||||
**Infra:**
|
||||
- Cron jobs run via P5.4's Cloud Scheduler primitive.
|
||||
- Stored at `gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz`.
|
||||
- Lifecycle rules auto-delete backups older than retention.
|
||||
- Object-level retention lock available for "immutable backups" on
|
||||
request (Tier 3 feature).
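
The storage layout and retention rule above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `BackupPolicy` type and the helper names `backupObjectPath` and `isExpired` are assumptions introduced here. The doc delegates expiry to GCS lifecycle rules, so `isExpired` only mirrors that rule for reasoning and tests.

```typescript
// Hypothetical types and helpers; names are illustrative, not an existing Vibn API.
type BackupPolicy = {
  schedule: "daily" | "hourly" | "off";
  retentionDays: number;
};

// Default described above: daily backups with 7-day retention.
const DEFAULT_POLICY: BackupPolicy = { schedule: "daily", retentionDays: 7 };

// Builds gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz
function backupObjectPath(slug: string, dbUuid: string, takenAt: Date): string {
  return `gs://vibn-ws-${slug}/backups/${dbUuid}/${takenAt.toISOString()}.sql.gz`;
}

// Mirrors the lifecycle rule: a backup expires once it is older than the retention window.
function isExpired(takenAt: Date, now: Date, policy: BackupPolicy): boolean {
  const ageMs = now.getTime() - takenAt.getTime();
  return ageMs > policy.retentionDays * 24 * 60 * 60 * 1000;
}
```

Keeping the ISO timestamp in the object name means `backups.list` can sort by name without reading object metadata.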
|
||||
|
||||
**Upgrade path:**
|
||||
- **Postgres point-in-time recovery** via `wal-g` shipping WAL segments
|
||||
to the same GCS bucket. Adds RPO < 5 min.
|
||||
- **ClickHouse**: `clickhouse-backup` to GCS.
|
||||
- **MongoDB**: `mongodump` incremental.
|
||||
|
||||
**Estimate:** **3 days** for MVP (pg_dump + schedule + restore).
|
||||
**+1 week** for wal-g PITR if/when a customer asks.
|
||||
|
||||
---
|
||||
|
||||
### P6.2 · Runtime log streaming (Cloud Logging)
|
||||
|
||||
**Goal:** agent can see "is the app erroring at 10 req/s right now?",
|
||||
not just "did the build succeed."
|
||||
|
||||
**Why:** today deploy logs are surfaced but container stdout/stderr is
|
||||
not. An agent that "fixed a bug" can't verify the fix without a human
|
||||
SSH-ing into Coolify.
|
||||
|
||||
**GCP collapses this item** — ship container logs to Cloud Logging with
|
||||
a workspace label, query via the logs API.
|
||||
|
||||
**Shape:**
|
||||
- Fluent-bit sidecar (or Coolify label) ships container stdout/stderr
|
||||
to Cloud Logging in `northamerica-northeast1` with labels
|
||||
`workspace={slug}`, `app={app-uuid}`, `service={web|worker|...}`.
|
||||
- Per-workspace log bucket for retention isolation.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|---|---|
| `apps.logs` | Last N lines across replicas. Filter by timestamp, severity. |
| `apps.logs.tail` | SSE stream of new log lines. |
| `apps.logs.search` | Thin wrapper on Cloud Logging's query API — grep, severity filter, time window. |
| `apps.services.logs` | Same, scoped to a single service. |
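
As a rough sketch of what the "thin wrapper" in `apps.logs.search` might do, the helper below turns search parameters into a Cloud Logging filter string. The `LogSearch` shape and the function names are assumptions made for illustration, and the sketch assumes the `workspace`, `app`, and `service` labels end up in each log entry's `labels` map.

```typescript
// Hypothetical helper: translates apps.logs.search parameters into a
// Cloud Logging filter string. Parameter names are assumptions.
type LogSearch = {
  workspace: string;
  app: string;
  service?: string;
  minSeverity?: "DEBUG" | "INFO" | "WARNING" | "ERROR" | "CRITICAL";
  since?: Date;
  contains?: string;
};

// Quote a value for the filter, escaping backslashes and double quotes.
function quote(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildLogFilter(q: LogSearch): string {
  // The workspace label is always present so one tenant can never query another's logs.
  const clauses = [
    `labels.workspace=${quote(q.workspace)}`,
    `labels.app=${quote(q.app)}`,
  ];
  if (q.service) clauses.push(`labels.service=${quote(q.service)}`);
  if (q.minSeverity) clauses.push(`severity>=${q.minSeverity}`);
  if (q.since) clauses.push(`timestamp>=${quote(q.since.toISOString())}`);
  if (q.contains) clauses.push(`textPayload:${quote(q.contains)}`);
  return clauses.join(" AND ");
}
```

Forcing the workspace clause inside the builder, rather than trusting the caller to add it, keeps the tenancy boundary in one place.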
|
||||
|
||||
**Retention:** default 30 days in the workspace log bucket; exportable
|
||||
to the workspace's GCS bucket on request for long-term storage.
|
||||
|
||||
**Estimate:** **3 days** (fluent-bit config + thin API wrapper).
|
||||
|
||||
---
|
||||
|
||||
### P6.3 · Scoped API keys
|
||||
|
||||
**Goal:** invite a CI bot or teammate without giving root on the
|
||||
workspace.
|
||||
|
||||
**Why:** solo-builder flow survives without it. Breaks the moment a
|
||||
second principal enters.
|
||||
|
||||
**Shape:**
|
||||
- Keys gain `scopes: string[]` and optional `expires_at`.
|
||||
- Scope tokens: `apps:read`, `apps:write`, `apps:delete`,
|
||||
`databases:*`, `auth:*`, `domains:read`, `domains:write`,
|
||||
`storage:*`, `email:send`, `cron:*`, `queues:*`, `deploy:*`.
|
||||
- Per-scope rate limits optional (Tier 3; API shape supports it from
|
||||
day one).
|
||||
|
||||
**Surface changes:**
|
||||
|
||||
| Tool | Change |
|---|---|
| `keys.create` | Accepts `scopes`, `expires_at`. |
| `keys.list` | Returns scopes per key. |
| `keys.rotate` | Mints new token, preserves scope set. |
|
||||
|
||||
Every MCP/REST handler gets a scope requirement checked in the
|
||||
principal resolver.
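
A minimal sketch of the scope check, assuming scope tokens always have the `resource:action` form listed above and that `*` is only allowed in the action position. The `ApiKey` type and both function names are hypothetical; the real principal resolver is not shown in this doc.

```typescript
// Hypothetical scope matcher; names are illustrative.
// A granted scope "databases:*" covers any action on "databases".
function scopeAllows(granted: string, required: string): boolean {
  const [gResource, gAction] = granted.split(":");
  const [rResource, rAction] = required.split(":");
  if (gResource !== rResource) return false;
  return gAction === "*" || gAction === rAction;
}

type ApiKey = { scopes: string[]; expiresAt?: Date };

// Returns true only if the key is unexpired and some granted scope covers the requirement.
function keyAllows(key: ApiKey, required: string, now: Date = new Date()): boolean {
  if (key.expiresAt && key.expiresAt.getTime() <= now.getTime()) return false;
  return key.scopes.some((granted) => scopeAllows(granted, required));
}
```

For example, a CI bot key with `scopes: ["apps:read", "deploy:*"]` could trigger deploys but would be rejected on `apps:delete`.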
|
||||
|
||||
**Estimate:** **1 week.**
|
||||
|
||||
---
|
||||
|
||||
### Tier 2 total: ~2 weeks
|
||||
|
||||
After Tier 2 lands, a SaaS shipped on Vibn can survive without you
|
||||
dropping into a psql REPL at 3am.
|
||||
|
||||
---
|
||||
|
||||
## Tier 3 — Matters once usage scales
|
||||
|
||||
Don't build these until at least one real customer is hitting them.
|
||||
Building them pre-market is the classic infra-overinvestment trap.
|
||||
|
||||
### P7.1 · Per-workspace quotas + cost caps
|
||||
Max apps, max dbs, max GCS GB, max egress, max SES messages/month, max
|
||||
OpenSRS spend/month. Per-plan configurable. Hallucinating agents can't
|
||||
OOM the cluster or burn your SES reputation.
|
||||
|
||||
### P7.2 · Audit log
|
||||
Append-only per-workspace log of (principal, action, params, timestamp,
|
||||
result). Cloud Logging with a dedicated `audit-logs` log-bucket, 400-day
|
||||
retention. Read API for the settings panel. Needed for any
|
||||
SOC-2-adjacent buyer.
|
||||
|
||||
### P7.3 · Preview-per-PR environments
|
||||
Open a PR → `pr-42.mark.vibnai.com` deploys automatically with a
|
||||
throw-away database. Teardown on PR close/merge. Unblocks multi-agent
|
||||
flows.
|
||||
|
||||
### P7.4 · Atomic multi-resource operations (`stacks`)
|
||||
`POST /stacks` takes a full app + db + auth + domain + cron spec;
|
||||
creates atomically, rolls back on failure. Agent ergonomics win once
|
||||
demo flow is routine.
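
One way to approximate "creates atomically, rolls back on failure" is compensation: run the steps in order and undo the completed ones in reverse when a step fails. The sketch below assumes a hypothetical `Step` shape and `createStack` function; it is not a true transaction, since a rollback can itself fail.

```typescript
// Hypothetical step shape: each resource knows how to create and undo itself.
type Step = {
  name: string;
  create: () => Promise<void>;
  rollback: () => Promise<void>;
};

// Runs steps in order; on the first failure, rolls back completed steps in reverse order.
async function createStack(steps: Step[]): Promise<{ ok: boolean; failedAt?: string }> {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      await step.create();
      done.push(step);
    } catch (_err) {
      for (const completed of done.reverse()) {
        // Best-effort: a failing rollback should not stop the remaining rollbacks.
        await completed.rollback().catch(() => undefined);
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}
```

Reporting `failedAt` gives the agent a concrete resource to inspect instead of a generic "stack failed" message.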
|
||||
|
||||
### P7.5 · Billing integration
|
||||
Stripe subscriptions for Vibn itself (workspace billing), plus
|
||||
per-workspace float top-ups, plus reconciliation to the OpenSRS master
|
||||
deposit and GCP / SES cost allocation. Only needed when you charge
|
||||
real dollars.
|
||||
|
||||
### P7.6 · Assured Workloads for Canada
|
||||
GCP policy-enforced Canadian residency + Canadian personnel access.
|
||||
For regulated customers (healthcare, financial, public sector). Priced
|
||||
accordingly; ship only when a real customer needs it.
|
||||
|
||||
### P7.7 · CIRA D-Zone as a workspace DNS option
|
||||
Swap Cloud DNS → CIRA D-Zone for a workspace with strict residency
|
||||
requirements. API-compatible wrapper so nothing agent-facing changes.
|
||||
|
||||
---
|
||||
|
||||
## Tier 4 — Revisit when demanded
|
||||
|
||||
Items to explicitly *not* build until a concrete customer asks.
|
||||
|
||||
- **Multi-region** — single-region Canada is fine for B2B SaaS makers
|
||||
(our early market).
|
||||
- **Cloud Run migration** — would rewrite most of Coolify-based
|
||||
capabilities. Revisit if/when Coolify becomes a bottleneck.
|
||||
- **Managed search / vector DB as first-class types** — agents can
|
||||
deploy Meilisearch / Typesense / pgvector-Postgres as regular services.
|
||||
- **mTLS / custom CAs / BYO-cert upload** — enterprise creep.
|
||||
- **MCP protocol polish** (streaming, resources, prompts, per-tool
|
||||
schemas) — current JSON-over-HTTP works. Revisit on real friction.
|
||||
- **Per-app basic auth, IP allowlists, WAF** — Traefik middleware
|
||||
manually until someone asks.
|
||||
|
||||
---
|
||||
|
||||
## Roadmap at a glance
|
||||
|
||||
| Phase | Items | Est. | Unblocks |
|---|---|---|---|
| **P5 — Real SaaS primitives** | Domains, email, storage, workers/cron/queues | ~5 wk | Shipping a real product |
| **P6 — Keep-it-running** | Backups, runtime logs, scoped keys | ~2 wk | First real customer survives |
| **P7 — Scale** | Quotas, audit, previews, stacks, billing, Assured Workloads, D-Zone | demand-driven | Platform grows past 1st cohort |
| **P8+** | Tier 4 items | never, unless pulled by customer | — |
|
||||
|
||||
**Total to "agent ships a SaaS a founder would pay $29/mo for":**
|
||||
P5 + P6 = **~7 weeks** (was ~11 before GCP-CA; ~40% compression from
|
||||
managed-service leverage).
|
||||
|
||||
---
|
||||
|
||||
## Dependency graph
|
||||
|
||||
```
P5.1 Domains ──┬──→ P5.2 Email Phase B (per-domain DKIM)
               ├──→ P7.7 CIRA D-Zone swap
               └──→ (future: customer-owned sub-domain routing)

P5.3 Storage ──┬──→ P6.1 Database backups (backups need a bucket)
               └──→ P7.2 Audit log export

P5.4 Workers/cron/queues ──┬──→ P6.1 Database backups (run via scheduler)
                           └──→ most real SaaS patterns

P6.2 Runtime logs — independent, can land anytime
P6.3 Scoped keys  — independent, can land anytime
P7.6 Assured Workloads — wraps everything; build once demanded
```
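
The graph above can also be written down as data and used to sanity-check a proposed build order. This is only a sketch: the item keys (such as `P5.2B` for Email Phase B) and the `firstViolation` helper are labels invented here, and the "most real SaaS patterns" and "future" edges are left out because they do not name a roadmap item.

```typescript
// Dependencies transcribed from the graph above (item -> prerequisites).
const deps: Record<string, string[]> = {
  "P5.1": [],
  "P5.3": [],
  "P5.4": [],
  "P5.2B": ["P5.1"],
  "P6.1": ["P5.3", "P5.4"],
  "P7.2-export": ["P5.3"],
  "P7.7": ["P5.1"],
  "P6.2": [],
  "P6.3": [],
};

// Returns the first item scheduled before one of its prerequisites, or null if the order is valid.
function firstViolation(order: string[]): string | null {
  const seen = new Set<string>();
  for (const item of order) {
    for (const pre of deps[item] ?? []) {
      if (!seen.has(pre)) return item;
    }
    seen.add(item);
  }
  return null;
}
```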
|
||||
|
||||
**Parallelizable (three people):**
|
||||
- Track A: P5.1 → P5.2
|
||||
- Track B: P5.3 → P6.1
|
||||
- Track C: P5.4 → P6.2
|
||||
|
||||
Track C finishes earliest; use that slack to land P6.3.
|
||||
|
||||
---
|
||||
|
||||
## Per-workspace GCP provisioning (shared across P5.3, P5.4, P6.1, P6.2)
|
||||
|
||||
`ensureWorkspaceProvisioned()` gains a GCP-CA block that runs once per
|
||||
workspace, idempotently. All resources are created in
|
||||
`northamerica-northeast1`.
|
||||
|
||||
| Resource | Name pattern | Notes |
|---|---|---|
| GCS bucket | `vibn-ws-{slug}` | Uniform bucket-level access. Lifecycle policies off by default. |
| Cloud DNS managed zone | `vibn-ws-{slug}-zone` | Created per workspace-owned domain in P5.1, not on workspace provision. |
| Cloud Logging log bucket | `vibn-ws-{slug}-logs` | 30-day retention default. |
| Cloud Tasks location | `northamerica-northeast1` | Queues created per-app in P5.4, not here. |
| GCP service account | `vibn-ws-{slug}@{project}.iam` | Single SA per workspace, narrow roles. |
| Service account key | stored encrypted in `vibn_workspaces` | AES-256-GCM, same `VIBN_SECRETS_KEY`. |
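
To make the last row concrete, here is a sketch of AES-256-GCM encryption for the service account key using Node's built-in `crypto` module. It assumes `VIBN_SECRETS_KEY` decodes to a 32-byte key; the function names and the `iv | tag | ciphertext` blob layout are assumptions for illustration, not the documented storage format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch only: the blob layout (12-byte iv | 16-byte tag | ciphertext) is an assumption.
function encryptSecret(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, ciphertext]);
}

function decryptSecret(blob: Buffer, key: Buffer): string {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match, i.e. the blob was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

A single binary blob fits a `BYTEA` column directly, and a fresh random nonce per encryption avoids nonce reuse under the shared key.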
|
||||
|
||||
**New columns** on `vibn_workspaces` (cumulative across P5.1-P6.2):
|
||||
|
||||
```sql
-- P5.1
preferred_dns_provider             TEXT DEFAULT 'cloud_dns',
cloud_dns_zone_name                TEXT,

-- P5.3
gcs_bucket_name                    TEXT,
gcp_service_account_email          TEXT,
gcp_service_account_key_encrypted  BYTEA,

-- P5.4
cloud_scheduler_location           TEXT DEFAULT 'northamerica-northeast1',
cloud_tasks_location               TEXT DEFAULT 'northamerica-northeast1',

-- P6.2
cloud_logging_bucket_name          TEXT
```
|
||||
|
||||
Four migration steps, one per roadmap item above (P5.1, P5.3, P5.4,
P6.2). All guarded by the existing admin-gated
`POST /api/admin/migrate` endpoint.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals (stated explicitly so they don't creep in)
|
||||
|
||||
- **A general-purpose PaaS.** Vibn is an agent-driven SaaS builder, not
|
||||
a Heroku / Fly clone. Every capability must answer "what does an agent
|
||||
need to build a SaaS?" — not "what does a dev need to deploy a
|
||||
container?"
|
||||
- **Support for non-allowlisted auth providers, databases, services.**
|
||||
The curated surface is the feature. "Any Coolify service" would blow
|
||||
up the tenant-safety model and dilute agent decision-making.
|
||||
- **A consumer-facing OpenSRS UI.** OpenSRS is plumbing for the agent.
|
||||
Humans should never see an OpenSRS checkout screen — only
|
||||
`domains.register { name: "mysaas.com" }` from the agent.
|
||||
- **Multi-cloud abstraction layer.** One Coolify cluster + GCP-CA +
|
||||
SES-CA + OpenSRS is the contract. If customers want to bring their
|
||||
own, that's Tier 4.
|
||||
- **Anything that moves customer data out of Canada.** Even for
|
||||
performance. If a managed service only has US regions, we self-host
|
||||
in Canada or we don't offer it.
|
||||
|
||||
---
|
||||
|
||||
## Recommended execution order (opinionated)
|
||||
|
||||
Given dependencies and quick-wins-first philosophy:
|
||||
|
||||
**Week 1:**
|
||||
- P5.3 Storage (GCS wrap, 3 days) → proves the GCP-CA provisioning pattern.
|
||||
- P5.4 Workers/cron/queues (starts in parallel; depends on P5.3 only for
|
||||
the service account).
|
||||
|
||||
**Week 2:**
|
||||
- P5.4 completes.
|
||||
- P5.1 Domains starts (OpenSRS client + Cloud DNS wrapper).
|
||||
|
||||
**Week 3:**
|
||||
- P5.1 completes.
|
||||
- P5.2 Email Phase A (shared-sender MVP) starts.
|
||||
|
||||
**Week 4:**
|
||||
- P5.2 Phase A completes.
|
||||
- P5.2 Phase B (per-domain DKIM) starts, now that P5.1 is available.
|
||||
|
||||
**Week 5:**
|
||||
- P5.2 Phase B completes. **P5 / Tier 1 done.**
|
||||
- P6.1 Database backups starts (3 days).
|
||||
- P6.2 Runtime logs starts in parallel (3 days).
|
||||
|
||||
**Week 6:**
|
||||
- P6.3 Scoped keys (1 week).
|
||||
|
||||
**Week 7:**
|
||||
- Slack week — hardening, docs (`AI_CAPABILITIES.md` refresh), first
|
||||
real customer onboarding.
|
||||
|
||||
**End state at week 7:** agent can take a founder from "I have an idea"
|
||||
to "I have `mysaas.com` live, with auth, with user uploads, with email,
|
||||
with backups, with visible error logs, and a CI bot can deploy it
|
||||
without root access."
|
||||
|
||||
That's the Vibn product.
|
||||
|
||||
---
|
||||
|
||||
## How to use this doc
|
||||
|
||||
- When someone proposes a feature, find its tier. If it's Tier 3 or 4
|
||||
and we're still shipping Tier 1, say no.
|
||||
- Before starting a Tier 1 item, re-read its section and make sure
|
||||
prerequisites shipped. Email-per-domain before domains is wasted code.
|
||||
- [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) is the canonical
|
||||
reference of *what exists today*. This doc is the canonical reference
|
||||
of *what comes next*. When an item ships, move it from here to that
|
||||
doc and delete its section here.
|
||||
- When a user request implies Canadian residency (they say "PIPEDA",
|
||||
"healthcare", "public sector", or "our data can't leave Canada"), pin
|
||||
the answer to this doc's §0 Substrate & constraints. Don't improvise.
|
||||
Current pending capabilities/roadmap items are tracked in `BETA_LAUNCH_PLAN.md`.
|
||||
|
||||
@@ -1,227 +1,8 @@
|
||||
# AI Harness Gaps — Proposal
|
||||
# AI Harness Stability & Middleware (Shipped)
|
||||
|
||||
> Four gaps in the Vibn AI experience that are **structural, not promptable**.
|
||||
> Each one is responsible for a specific failure pattern visible in real
|
||||
> production chat transcripts. None of them are scoped in
|
||||
> [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md),
|
||||
> [`BETA_LAUNCH_PLAN.md`](./BETA_LAUNCH_PLAN.md),
|
||||
> [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md), or the
|
||||
> agent-execution / telemetry-streaming designs.
|
||||
>
|
||||
> **Drafted:** 2026-04-30 (after a transcript review of the Dr Dave + Twenty CRM threads).
|
||||
>
|
||||
> **Why these four:** they share a common shape — the model is doing what
|
||||
> the prompt told it to, and still producing a bad outcome. The fix lives
|
||||
> in the *harness around the model*, not in instructions to the model.
|
||||
> **Note:** These middleware stability mechanisms have been shipped.
|
||||
|
||||
---
|
||||
|
||||
## TL;DR
|
||||
|
||||
| # | Gap | Failure pattern in prod | Fix size |
|---|---|---|---|
| 1 | Tool-error recovery middleware | Orphan twenty-* services (4 shipped). Model keeps delete-and-recreating despite explicit prompt rule against it. | ~2 hr |
| 2 | Browser-driver tool for the AI | "Should be live in 10s" — AI ships URLs without ever loading them; user discovers the 502. | ~4 hr |
| 3 | Live UI state attached to chat messages | "this isn't working" / "fix the URL" with no signal of which "this". AI guesses, often wrong. | ~3 hr |
| 4 | Diff preview / accept-changes gate | `fs_edit` writes straight to the dev container with no review surface. Fine for sub-second iteration; bad for prod-bound edits. | ~6 hr |
|
||||
|
||||
Total: ~15 hr of work. None require new infra.
|
||||
|
||||
---
|
||||
|
||||
## Gap 1 — Tool-error recovery middleware (highest ROI)
|
||||
|
||||
**Failure observed:** in thread `d698ef40-…` ("Hey there, what can you see about this project?"), the AI hit
|
||||
`Conflict. The container name "/postgres-…" is already in use` **three separate times**.
|
||||
On each attempt it responded by *creating a new service with a new name*,
|
||||
not by calling `apps_unstick`. The prompt explicitly tells it not to do
|
||||
this and tells it the recovery sequence. The model still did it.
|
||||
|
||||
**Why prompt rules fail here:** the rule is one line of soft guidance
buried in a 30k-token system prompt; the tool result is concrete and
200ms-fresh. When tool reality contradicts a prompt rule, tool reality
wins.
|
||||
|
||||
**Proposed fix:** middleware in `executeMcpTool` that pattern-matches
|
||||
known-recoverable errors and **injects a synthetic system message** into
|
||||
the conversation before the next round. The model can't ignore an
|
||||
injected instruction the way it can ignore a static prompt rule.
|
||||
|
||||
```ts
// In app/api/chat/route.ts, around the executeMcpTool call:
const errorRecovery = detectKnownError(result);
if (errorRecovery) {
  messages.push({
    role: "system",
    content: `[RECOVERY] ${errorRecovery.diagnosis}. Required next action: ${errorRecovery.fix}. Do NOT ${errorRecovery.antipattern}.`,
  });
}
```
|
||||
|
||||
**Initial recovery rules** (high-confidence, low-false-positive):
|
||||
|
||||
| Error signature | Diagnosis | Fix | Antipattern |
|---|---|---|---|
| `Conflict. The container name … is already in use` | Orphan container blocking new boot | `apps_unstick { uuid }` then `apps_deploy { uuid }` | Delete and recreate with a new name |
| `pull access denied` / `manifest unknown` | Image not on the host yet | `apps_repair { uuid }` | Retry deploy without addressing the cause |
| `port … is already allocated` | Another container holds the port | List containers, identify holder, decide | Pick a random different port |
|
||||
|
||||
**Effort:** ~2 hr. New file `lib/ai/error-recovery.ts` with a registry of
|
||||
patterns + the injection in the chat route. Each rule is ~10 lines.
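
The registry described above could look roughly like this. The doc names `lib/ai/error-recovery.ts` and `detectKnownError` but does not show their contents, so the `Recovery` and `Rule` types, the regular expressions, and the assumption that the tool result arrives as a string are all illustrative.

```typescript
// Hypothetical registry contents, transcribed from the initial recovery rules.
type Recovery = { diagnosis: string; fix: string; antipattern: string };
type Rule = Recovery & { pattern: RegExp };

const RULES: Rule[] = [
  {
    pattern: /Conflict\. The container name .+ is already in use/,
    diagnosis: "Orphan container blocking new boot",
    fix: "apps_unstick { uuid } then apps_deploy { uuid }",
    antipattern: "delete and recreate with a new name",
  },
  {
    pattern: /pull access denied|manifest unknown/,
    diagnosis: "Image not on the host yet",
    fix: "apps_repair { uuid }",
    antipattern: "retry deploy without addressing the cause",
  },
  {
    pattern: /port .+ is already allocated/,
    diagnosis: "Another container holds the port",
    fix: "list containers, identify the holder, then decide",
    antipattern: "pick a random different port",
  },
];

// Returns the first matching recovery hint, or null for unknown errors.
function detectKnownError(toolResult: string): Recovery | null {
  for (const { pattern, ...recovery } of RULES) {
    if (pattern.test(toolResult)) return recovery;
  }
  return null;
}
```

Returning `null` for anything unrecognized keeps the false-positive rate low: the middleware stays silent unless a high-confidence signature matches.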
|
||||
|
||||
**Slot into:** `BETA_LAUNCH_PLAN.md` Phase 2 (Stability & visibility) — fits next to 2.4 (deployment-failed webhook).
|
||||
|
||||
---
|
||||
|
||||
## Gap 2 — Browser-driver tool for the AI
|
||||
|
||||
**Failure observed:** in the same Twenty thread, the AI said *"It's
|
||||
fully deployed, healthy, and I've verified it's returning a 200 OK
|
||||
status"* — but the user saw "Unable to Reach Back-end" on the actual
|
||||
page. The AI checked Coolify's status reporting, not the rendered app.
|
||||
Also visible in the Dr Dave thread: *"Note: it might take 10-15 seconds
|
||||
on the very first load for the DNS to propagate"* — the AI hedged
|
||||
because it couldn't load the URL itself.
|
||||
|
||||
**Why this matters for beta:** every "I deployed it" claim is unverified
|
||||
unless the AI can open the URL. Sentry (planned in P2.3) catches
|
||||
errors *after a user hits them*. A browser tool catches errors
|
||||
*before any user hits them*.
|
||||
|
||||
**Proposed fix:** add a `browser.*` MCP tool surface backed by a
|
||||
headless Chromium running on the Coolify host (or in the vibn-dev
|
||||
container). Initial tools:
|
||||
|
||||
| Tool | Purpose |
|---|---|
| `browser.navigate { url, timeoutMs? }` | Load the URL, return final URL + status code + page title |
| `browser.screenshot { url }` | Visual confirmation. Return base64 PNG (or store in GCS) |
| `browser.console_logs { url }` | Capture client-side JS errors (the `TypeError: reading 'z'/'j'/'aa'` from BETA P2.2 would be findable this way) |
| `browser.fetch { url, headers? }` | HTTP-level smoke test. Subset of `http_fetch` but always from inside Vibn's network |
|
||||
|
||||
**Implementation:** Playwright already has an MCP server (Microsoft's
`@playwright/mcp`). Wire it as a Coolify service, expose via the same
per-workspace MCP token Vibn already issues.
|
||||
|
||||
**Effort:** ~4 hr. ~2 hr to deploy Playwright as a service, ~1 hr to
|
||||
add tool definitions, ~1 hr to wire prompt instructions ("after any
|
||||
deploy or `dev_server.start`, call `browser.navigate` to confirm").
|
||||
|
||||
**Slot into:** Phase 2 (Stability & visibility) — pairs with the
|
||||
runtime error chase (2.1, 2.2) and the Sentry wiring (2.3).
|
||||
|
||||
---
|
||||
|
||||
## Gap 3 — Live UI state attached to chat messages
|
||||
|
||||
**Failure observed:** in the Dr Dave thread, user typed *"are you able
|
||||
to give me a preview url?"* The AI didn't know which port the
|
||||
Next.js dev server would bind to, what was already running, or
|
||||
whether the user was looking at the chat or another tab. It
|
||||
guessed and re-discovered everything from scratch.
|
||||
|
||||
In the Twenty thread, the user asked *"can you see the different
sections?"* and meant the Plan tab sections
(Vision/Tasks/Decisions/Ideas). The AI listed metadata instead; it had
no way to know which sections were meant.
|
||||
|
||||
**Why prompt rules can't fix this:** the AI literally lacks the
|
||||
information.
|
||||
|
||||
**Proposed fix:** the chat panel sends a small `uiContext` object
|
||||
alongside every user message. Inject into the system prompt as a
|
||||
dynamic block (same shape as `activeBlock`):
|
||||
|
||||
```ts
{
  currentRoute: "/mark-account/project/abc/hosting",
  currentTab: "hosting",
  visibleResources: [
    { kind: "app", uuid: "y4cs…", name: "vibn-frontend" },
    { kind: "service", uuid: "igcp…", name: "vibn-dev-twenty-crm" },
  ],
  lastUserActions: [
    { at: "2m ago", action: "opened twenty-crm logs" },
    { at: "5m ago", action: "switched to Hosting tab" },
  ],
}
```
|
||||
|
||||
System-prompt block becomes:
|
||||
|
||||
> The user is currently looking at the **Hosting tab** (route: `…/hosting`).
|
||||
> Visible resources: `vibn-frontend`, `vibn-dev-twenty-crm`.
|
||||
> Recent actions: opened twenty-crm logs (2m ago), switched to Hosting (5m ago).
|
||||
> When the user says "this" / "it" / "the URL" — assume they mean
|
||||
> something visible in the current viewport unless they name something else.
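
A sketch of how the `uiContext` object could be rendered into that prompt block. The `UiContext` type mirrors the example object above; the `renderUiContextBlock` name and the "none" fallback for empty lists are assumptions added for illustration.

```typescript
type UiContext = {
  currentRoute: string;
  currentTab: string;
  visibleResources: { kind: string; uuid: string; name: string }[];
  lastUserActions: { at: string; action: string }[];
};

// Hypothetical renderer: turns the structured context into plain prompt text.
function renderUiContextBlock(ctx: UiContext): string {
  const resources = ctx.visibleResources.map((r) => `\`${r.name}\``).join(", ") || "none";
  const actions = ctx.lastUserActions.map((a) => `${a.action} (${a.at})`).join(", ") || "none";
  return [
    `The user is currently looking at the ${ctx.currentTab} tab (route: ${ctx.currentRoute}).`,
    `Visible resources: ${resources}.`,
    `Recent actions: ${actions}.`,
    `When the user says "this" / "it" / "the URL", assume they mean something visible in the current viewport unless they name something else.`,
  ].join("\n");
}
```

Rendering on the server from a typed object, rather than letting the client send free text, limits what a modified client could inject into the system prompt.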
|
||||
|
||||
**Effort:** ~3 hr. ~1 hr to wire the chat panel's
|
||||
`uiContext` collection (existing route + tab state, last 5 actions
|
||||
from a small ring buffer in the panel), ~1 hr to plumb through the
|
||||
chat API, ~1 hr to add the prompt block.
|
||||
|
||||
**Slot into:** Phase 3 (UX surfaces) — pairs with 3.2 (structured
|
||||
errors in chat) and 3.3 (empty-state nudges).
|
||||
|
||||
---
|
||||
|
||||
## Gap 4 — Diff preview / accept-changes gate
|
||||
|
||||
**Failure observed:** none yet, but the surface is exposed today —
|
||||
`fs_edit` writes directly to `/workspace` in the dev container. For
|
||||
ephemeral exploration this is correct (sub-second iteration is the
|
||||
whole Path B point). For changes destined to ship, the user has no
|
||||
review surface; they only see what changed after the AI summarizes.
|
||||
|
||||
**Why this matters for beta:** the moment a paying user wants to
|
||||
"see what the AI changed before it goes live," there's nothing to
|
||||
show them. Cursor's whole UX is built on diffs the user accepts.
|
||||
|
||||
**Proposed fix:** two-mode `fs_edit` / `fs_write`:
|
||||
|
||||
1. **Direct mode (default for dev container):** write immediately. Current
|
||||
behavior. Fine for "make the button blue" iteration.
|
||||
2. **Staged mode (default when `ship` is the next likely action):**
|
||||
write to a shadow path, surface a diff in the chat UI, gate the
|
||||
real write on a one-click "Accept" button.
|
||||
|
||||
The model decides which mode based on context — or simpler: stage when
|
||||
the file is in a "protected" set (e.g. `prisma/schema.prisma`,
|
||||
`Dockerfile`, `package.json`, anything in `prod/` or `migrations/`),
|
||||
direct otherwise.
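
The simpler "protected set" rule can be sketched in a few lines. The set below contains only the examples named above; the constant names and the `writeModeFor` function are hypothetical, and a real list would presumably be configurable per project.

```typescript
// Illustrative protected set taken from the examples above.
const PROTECTED_FILES = new Set(["prisma/schema.prisma", "Dockerfile", "package.json"]);
const PROTECTED_DIRS = ["prod", "migrations"];

type WriteMode = "direct" | "staged";

// `path` is relative to /workspace and uses forward slashes.
function writeModeFor(path: string): WriteMode {
  const normalized = path.replace(/^\.?\//, "");
  if (PROTECTED_FILES.has(normalized)) return "staged";
  // Any directory segment named prod/ or migrations/ triggers staging.
  const dirs = normalized.split("/").slice(0, -1);
  return dirs.some((d) => PROTECTED_DIRS.includes(d)) ? "staged" : "direct";
}
```

A deterministic rule like this is easier to audit than letting the model choose the mode, at the cost of occasionally staging a harmless edit.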
|
||||
|
||||
**Effort:** ~6 hr. ~2 hr backend (shadow write + apply endpoint),
|
||||
~3 hr UI (diff renderer in the chat panel, accept/reject buttons),
|
||||
~1 hr prompt + tool changes.
|
||||
|
||||
**Slot into:** Phase 4 (Onboarding & safety) — pairs with 4.5 (auth
|
||||
hardening) and 4.6 (compute quotas) as part of "what a stranger
|
||||
needs day 1."
|
||||
|
||||
---
|
||||
|
||||
## Suggested sequencing
|
||||
|
||||
If we ship in priority order:
|
||||
|
||||
1. **Gap 1 first** — kills the worst pattern in prod for ~2 hr of work. Should be ahead of any new feature in Phase 2.
|
||||
2. **Gap 2 second** — closes the verify-deploy loop. Multiplies the value of every subsequent AI-shipped change because it's no longer blind.
|
||||
3. **Gap 3 third** — tighter conversational UX. Once 1 and 2 work, the remaining UX cliff is "AI doesn't know what I'm looking at."
|
||||
4. **Gap 4 last** — only matters once we have paying users editing prod-bound code. Pre-beta optional.
|
||||
|
||||
Total effort to ship 1+2+3 (the meaningful UX wins): **~9 hours.**
|
||||
|
||||
---
|
||||
|
||||
## How this changes BETA_LAUNCH_PLAN.md
|
||||
|
||||
Two new tasks slot in:
|
||||
|
||||
- **P2.8** Tool-error recovery middleware (Gap 1) — block on nothing, ship before P2.4.
|
||||
- **P2.9** Browser-driver MCP tool (Gap 2) — block on nothing.
|
||||
|
||||
One new task in P3:
|
||||
|
||||
- **P3.7** UI-state injection into chat (Gap 3) — block on nothing.
|
||||
|
||||
Gap 4 stays out of beta scope unless eval reveals real damage from
|
||||
unstaged edits.
|
||||
- The chat loop (`app/api/chat/route.ts`) acts as a robust harness that intercepts tool errors and automatically suggests recovery paths (e.g., port conflicts, container collisions).
|
||||
- The maximum tool execution loop is capped (`MAX_TOOL_ROUNDS=30`) to prevent runaway AI loops.
|
||||
- `fs_edit` uses line-number replacements alongside strict `oldString` matching to avoid Aider-style search-and-replace failures.
|
||||
- Sentry and Coolify deployment webhooks automatically pipe deployment/build failures back to the user/AI.
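
The round cap mentioned above can be illustrated with a small loop. Only the `MAX_TOOL_ROUNDS=30` value comes from this doc; `runToolLoop`, `callModel`, `runTool`, and the `ModelTurn` shape are hypothetical stand-ins for the chat route's internals.

```typescript
const MAX_TOOL_ROUNDS = 30;

type ModelTurn = { toolCalls: string[]; text: string };

// `callModel` and `runTool` are hypothetical stand-ins for the real chat-route internals.
async function runToolLoop(
  callModel: (round: number) => Promise<ModelTurn>,
  runTool: (call: string) => Promise<void>,
): Promise<{ rounds: number; text: string; capped: boolean }> {
  for (let round = 1; round <= MAX_TOOL_ROUNDS; round++) {
    const turn = await callModel(round);
    // No tool calls means the model produced its final answer.
    if (turn.toolCalls.length === 0) {
      return { rounds: round, text: turn.text, capped: false };
    }
    for (const call of turn.toolCalls) await runTool(call);
  }
  // The model never stopped asking for tools; bail out instead of looping forever.
  return { rounds: MAX_TOOL_ROUNDS, text: "", capped: true };
}
```

Surfacing `capped: true` lets the UI tell the user the run was cut off rather than presenting an empty answer as success.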
|
||||
|
||||
@@ -1,288 +1,12 @@
|
||||
# Path B Execution Plan — Persistent Dev Container Architecture
|
||||
# AI Path B (Shipped)
|
||||
|
||||
> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
|
||||
> surface with a Claude-Code-style architecture: one persistent dev
|
||||
> container per Vibn project, ~10 composable tools, sub-15-second
|
||||
> iteration, and Coolify only touched at "ship it" time.
|
||||
>
|
||||
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
|
||||
> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
|
||||
> (everything else).
|
||||
>
|
||||
> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
|
||||
>
|
||||
> **Why this exists:** today's AI loop is *3–7 min to first preview, 2–4
|
||||
> min per iteration*, because every change goes through a Coolify nixpacks
|
||||
> build. That UX cannot host the marketplace / SaaS / iterative-build
|
||||
> stories Vibn is selling. Path B fixes the floor.
|
||||
> **Note:** This document outlines the architecture for "Path B", which shifted the AI's execution context from Cloud Run to persistent per-project Docker containers hosted on the Coolify server. This architecture fully shipped in May 2026.
|
||||
|
||||
---
|
||||
## Architecture
|
||||
- Every project has a persistent Gitea repository.
|
||||
- Every project gets a single `vibn-dev` container provisioned as a Coolify service (`ensureDevContainer`).
|
||||
- The AI runs its tools (like `shell_exec` and `fs_*`) *inside* this container using `docker exec` via the Coolify API.
|
||||
- Dev servers (like `npm run dev`) bind to `0.0.0.0:3000` and are exposed to the internet via Traefik wildcard subdomains (`*.preview.vibnai.com`).
|
||||
- When the user is ready, the code is committed to Gitea and deployed to production via `apps_deploy`.
|
||||
|
||||
## 1. The user experience this unlocks
|
||||
|
||||
Reference scenario: a non-technical founder chats *"build me a
|
||||
two-sided marketplace for handmade ceramics."*
|
||||
|
||||
| Phase | Path A (today) | Path B (target) |
|---|---|---|
| Discovery & OSS pick | OK | OK |
| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
|
||||
|
||||
The asymmetry is structural, not optimisable inside Path A.
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture overview
|
||||
|
||||
```
┌──────────────────────────┐     ┌────────────────────────────────┐
│ vibnai.com chat (user)   │ ←→  │ /api/mcp                       │
└──────────────────────────┘     │  ├ shell.exec                  │
                                 │  ├ fs.read / fs.edit / fs.glob │
                                 │  ├ dev_server.start            │
                                 │  ├ ship                        │
                                 │  └ apps.* / databases.* / ...  │
                                 └────────────┬───────────────────┘
                                              │
                                              ▼ (workspace-scoped)
                           ┌────────────────────────────────────┐
                           │ Per-Vibn-project Coolify project   │
                           │  ├ vibn-dev   ← dev container      │
                           │  ├ web        ← prod app           │
                           │  ├ db                              │
                           │  └ ...                             │
                           └────────────────────────────────────┘
```
|
||||
|
||||
### Per-project dev container — the only new piece
|
||||
|
||||
For every active Vibn project, we run **one long-lived Coolify
|
||||
service named `vibn-dev`** inside that project's dedicated Coolify
|
||||
project (Stage 2/3 of per-project isolation already shipped).
|
||||
|
||||
| Property | Value |
|---|---|
| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
| **Base** | Ubuntu 24.04 |
| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
| **Resource floor** | 512 MB / 0.25 CPU when idle |
| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
| **Idle suspend** | After 30 min no `shell.exec` activity |
| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
| **Ports** | 3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
|
||||
|
||||
### Why this shape (and not e2b / Cloud Run / VM-per-task)
|
||||
|
||||
- We already have Coolify, per-project Coolify projects, and Coolify
|
||||
exec primitives. Adding one service per project is zero new infra.
|
||||
- Persistence (workspace state, package cache, git working tree)
|
||||
matters more than per-task isolation for our user. Founders return
|
||||
to projects across sessions.
|
||||
- Tenant safety is already solved at the Coolify-project layer.
|
||||
- Cost stays bounded: one container per *active* project, idle-suspended.
|
||||
- Upgrade path to e2b / Firecracker exists later if needed (replace the
|
||||
executor, keep the tool surface).
|
||||
|
||||
---
|
||||
|
||||
## 3. Tool surface
|
||||
|
||||
### New tools (the AI's primary working set)
|
||||
|
||||
| Tool | Signature | Purpose |
|---|---|---|
| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
| `fs.write` | `{ path, content }` | Create/overwrite a file. |
| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
| `fs.delete` | `{ path }` | Delete a file or directory. |
| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
| `dev_server.stop` | `{ id }` | Kill a dev server. |
| `dev_server.list` | (none) | What's running, on what URL. |
| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
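
The `fs.edit` contract in the table (fail when `oldString` is missing or not unique) can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the code in `lib/ai/vibn-tools.ts`: the `applyEdit` name, the `EditResult` shape, and the 400 for an empty `oldString` are hypothetical; the 404 and 409 statuses follow the week 2 notes in this plan.

```typescript
// Hypothetical result shape; only the 404/409 semantics come from the plan.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; status: 400 | 404 | 409; error: string };

function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): EditResult {
  if (oldString.length === 0) {
    // Assumption: an empty search string is rejected as a bad request.
    return { ok: false, status: 400, error: "oldString must not be empty" };
  }
  // Count non-overlapping occurrences of the literal search string.
  const parts = content.split(oldString);
  const count = parts.length - 1;
  if (count === 0) {
    return { ok: false, status: 404, error: "oldString not found" };
  }
  if (count > 1 && !replaceAll) {
    return { ok: false, status: 409, error: `oldString matches ${count} times` };
  }
  // Joining the split parts replaces every occurrence literally, so "$&"
  // in newString is never interpreted as a replacement pattern.
  return { ok: true, content: parts.join(newString), replacements: count };
}
```

Returning the replacement count lets the caller confirm that a `replaceAll` edit touched as many sites as expected before moving on.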
|
||||
|
||||
### Kept (orchestration — these are correctly modeled as APIs)
|
||||
|
||||
- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
|
||||
- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
|
||||
- `projects_get`, `projects_list`, `workspace_describe` — context.
|
||||
- `github_search`, `github_file`, `http_fetch` — external lookup.
|
||||
|
||||
### Deprecated (kept for back-compat, banner in docs)
|
||||
|
||||
- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
|
||||
`gitea_branches_list`, `gitea_branch_create`,
|
||||
`gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
|
||||
AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
|
||||
- `apps.exec` — kept (it's still useful for prod-container debugging),
|
||||
but deprecated for *dev-time* code work.
|
||||
|
||||
**Net change:** 53 tools → ~30 tools, but the new ones compose to do
|
||||
everything the old ones did and more.
|
||||
|
||||
---
|
||||
|
||||
## 4. The system prompt rewrite
|
||||
|
||||
The AI's prompt today says *"call gitea_file_write to push code."* It
|
||||
becomes:
|
||||
|
||||
> You have a real Linux dev environment for this project at `/workspace`.
> Use `shell.exec` to run any command (npm, git, tea, python, anything).
> Use `fs.edit` for surgical changes, `fs.write` for new files.
>
> Standard loop:
> 1. `shell.exec { cmd: "git status" }` to see what's there.
> 2. Edit / create files via `fs.edit` / `fs.write`.
> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
> 4. `dev_server.start` to give the user a live preview URL.
> 5. When the user says "ship it", call `ship`: that pushes and
>    triggers the production Coolify deploy.
>
> NEVER call `apps_create` to deploy code that hasn't been tested via
> `shell.exec` first. The dev container is your safety net.
|
||||
|
||||
---
|
||||
|
||||
## 5. Week-by-week execution
|
||||
|
||||
### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** AI can clone a repo, install deps, run a script.
|
||||
|
||||
- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
|
||||
- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
|
||||
- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
|
||||
- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
|
||||
- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
|
||||
- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
|
||||
- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
|
||||
- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
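
The "locked to `/workspace`" rule for the `fs.*` tools can be illustrated with a small path check. This is a sketch only: `resolveWorkspacePath` is a hypothetical helper, the check is purely lexical (it does not follow symlinks inside the container), and the shipped tools may enforce the boundary differently.

```typescript
const WORKSPACE_ROOT = "/workspace";

// Resolve a tool-supplied path against /workspace and reject anything
// that ends up outside it. Hypothetical helper, lexical check only.
function resolveWorkspacePath(userPath: string): string {
  // Relative paths start inside the workspace; absolute paths start at "/".
  const stack: string[] = userPath.startsWith("/") ? [] : ["workspace"];
  for (const segment of userPath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      stack.pop(); // popping an empty stack is a no-op, i.e. stays at "/"
      continue;
    }
    stack.push(segment);
  }
  const resolved = "/" + stack.join("/");
  const inside =
    resolved === WORKSPACE_ROOT || resolved.startsWith(WORKSPACE_ROOT + "/");
  if (!inside) {
    throw new Error(`path escapes ${WORKSPACE_ROOT}: ${userPath}`);
  }
  return resolved;
}
```

Comparing against `WORKSPACE_ROOT + "/"` rather than the bare prefix keeps a sibling directory such as `/workspace2` from passing the check.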
|
||||
|
||||
**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
|
||||
|
||||
### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
|
||||
|
||||
- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
|
||||
- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
|
||||
- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
|
||||
- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
|
||||
- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
|
||||
- [x] System prompt rewritten with shell-first recipe.
|
||||
|
||||
**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
|
||||
|
||||
### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** the dev container's working tree graduates to production.
|
||||
|
||||
- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
|
||||
- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
|
||||
- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
|
||||
- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
|
||||
- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
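
The selection step of the idle-suspend sweep can be sketched as a pure function over container records. The `DevContainer` shape and the `selectIdleContainers` name are assumptions for illustration; the real state lives in `fs_project_dev_containers`, whose columns are not described in this plan.

```typescript
// Hypothetical record shape for one project's dev container.
interface DevContainer {
  projectId: string;
  state: "running" | "suspended";
  lastActivityAt: Date; // assumed: time of the last tool call for the project
}

// Return the project IDs whose running containers have been idle for at
// least `idleMinutes` (default 30, matching the sweep endpoint's default).
function selectIdleContainers(
  containers: DevContainer[],
  now: Date,
  idleMinutes = 30,
): string[] {
  const cutoff = now.getTime() - idleMinutes * 60 * 1000;
  return containers
    .filter((c) => c.state === "running" && c.lastActivityAt.getTime() <= cutoff)
    .map((c) => c.projectId);
}
```

Keeping the selection separate from the suspend call makes the threshold logic easy to test before the sweep is wired to a cron.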
|
||||
|
||||
**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
|
||||
|
||||
### Week 4 — Eval, polish, IDE drop-in
|
||||
|
||||
**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
|
||||
|
||||
- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
|
||||
- [ ] **IDE drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same `/workspace` the AI's been editing.
|
||||
- [ ] **Bug fixes** found during eval.
|
||||
- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
|
||||
|
||||
**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
|
||||
Path A, ≥80% success rate on the 10 reference prompts.
|
||||
|
||||
---
|
||||
|
||||
## 6. OSS we will lean on (not reinvent)
|
||||
|
||||
| Need | OSS choice | Notes |
|---|---|---|
| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
|
||||
|
||||
---
|
||||
|
||||
## 7. Risks & open questions
|
||||
|
||||
| Risk | Mitigation |
|---|---|
| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
| **`shell.exec` is the universal escape hatch: security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (illustrated as `preview-mark-ceramic-market-7a3f.vibnai.com`; the 4-hex-character suffix shown carries only 16 bits, so the real suffix needs 16 hex characters to reach the intended ~64 bits of unguessability). Optional Vibn-session-cookie auth as paid-tier feature later. |
| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
| **Image size / pull time on first project.** ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag: all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
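
For the preview-URL mitigation, a suffix with roughly 64 bits of entropy is 8 random bytes, or 16 hex characters. The sketch below shows one way to build such a hostname; `previewHost` is a hypothetical helper, and the `preview-{ws}-{project}-{suffix}.vibnai.com` pattern is inferred from the examples in this plan rather than taken from the implementation.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: builds a preview hostname with a random suffix.
// 8 bytes from the CSPRNG = 64 bits = 16 lowercase hex characters.
function previewHost(workspace: string, project: string): string {
  const suffix = randomBytes(8).toString("hex");
  const label = `preview-${workspace}-${project}-${suffix}`;
  // A single DNS label is limited to 63 characters.
  if (label.length > 63) {
    throw new Error(`preview label too long (${label.length} > 63)`);
  }
  return `${label}.vibnai.com`;
}
```

Because the whole name is one DNS label, long workspace or project slugs would need truncating before the suffix is appended; that policy is left open here.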
|
||||
|
||||
---
|
||||
|
||||
## 8. Success metrics
|
||||
|
||||
We're not done until **all four** are true on the eval harness:
|
||||
|
||||
| Metric | Target | Today (Path A) |
|---|---|---|
| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
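
The four targets in the table could be checked mechanically once the eval harness produces per-prompt results. The sketch below assumes a hypothetical `EvalRun` record with one entry per reference prompt; the field names are illustrative and the thresholds are copied from the table.

```typescript
// Hypothetical per-prompt result emitted by the eval harness.
interface EvalRun {
  firstPreviewSec: number;
  iterationSec: number;
  toolCalls: number;
  success: boolean;
}

// p50 of a sample; averages the two middle values for even-sized samples.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True only when all four targets from the table hold.
function meetsTargets(runs: EvalRun[]): boolean {
  const successRate = runs.filter((r) => r.success).length / runs.length;
  return (
    median(runs.map((r) => r.firstPreviewSec)) <= 60 &&
    median(runs.map((r) => r.iterationSec)) <= 15 &&
    median(runs.map((r) => r.toolCalls)) <= 30 &&
    successRate >= 0.8
  );
}
```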
|
||||
|
||||
---
|
||||
|
||||
## 9. What this changes about the existing roadmap
|
||||
|
||||
- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1–C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
|
||||
- **Tier 1 P5.1–P5.4 are unchanged.** Domains, email, storage, workers — still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
|
||||
- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
|
||||
- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
|
||||
|
||||
---
|
||||
|
||||
## 10. Decision needed before week 1 starts
|
||||
|
||||
1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
|
||||
2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
|
||||
3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
|
||||
4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
|
||||
|
||||
Once those four are decided, week 1 starts.
|
||||
|
||||
---
|
||||
|
||||
## How to use this doc
|
||||
|
||||
- This is the *architectural* execution plan. The detailed task list
|
||||
goes into the agent's TodoWrite per-week, not into this file.
|
||||
- When an item ships, **move it from "planned" to "shipped"** in
|
||||
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
|
||||
- When a risk in §7 turns out to be real, document the mitigation
|
||||
outcome inline so future readers see what actually happened.
|
||||
- This doc supersedes the proposed Tier 1.5 in
|
||||
[`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
|
||||
one-line pointer there once approved.
|
||||
*(Refer to `lib/ai/vibn-tools.ts` and `app/api/mcp/route.ts` for the live implementation).*
|
||||
|
||||
@@ -1,275 +1,11 @@
|
||||
# Project Page Architecture — Product / Infrastructure / Hosting
|
||||
# Project Page Architecture
|
||||
|
||||
> The plan to collapse the 16-page sidebar mess at
|
||||
> `/[workspace]/project/[projectId]/*` into 3 founder-friendly
|
||||
> sections, and to make `/project/<id>` actually reflect what the AI
|
||||
> is doing in the dev container instead of stale Gitea/prod-Coolify
|
||||
> data.
|
||||
>
|
||||
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
|
||||
> (Path B is the engine; this doc is the dashboard for it).
|
||||
>
|
||||
> **Status:** week 1 doc + home-page redesign in flight (2026-04-28).
|
||||
> **Note:** The UI was heavily refactored. The primary surfaces for a project are now:
|
||||
|
||||
---
|
||||
1. **The Plan Tab (`/plan`):** Contains the project's vision/objective document, tasks, decisions, and raw ideas. The AI acts as a scribe here.
|
||||
2. **The Product Tab (`/product`):** Lists the live codebases (Gitea) and running images (Docker containers).
|
||||
3. **The Infrastructure Tab (`/infrastructure`):** Lists the underlying resources (PostgreSQL databases, Redis, etc.) managed by Coolify.
|
||||
4. **The Hosting Tab (`/hosting`):** Lists live runtime environments, logs, and preview URLs.
|
||||
5. **The Chat Panel:** Available on all project surfaces as a slide-out, used to orchestrate work.
|
||||
|
||||
## 1. Why this exists
|
||||
|
||||
Today the project page (`/[workspace]/project/[projectId]`) shows two
|
||||
tiles — Code + Infrastructure — and links to a sidebar with 16
|
||||
sub-routes (`build`, `run`, `infrastructure`, `deployment`,
|
||||
`overview`, `insights`, `analytics`, `prd`, `tasks`, `settings`,
|
||||
`assist`, `design`, `growth`, `grow`, `mvp-setup`, `code` — the last
|
||||
of which doesn't exist as a route, so the home tile is a dead link).
|
||||
|
||||
Two structural problems:
|
||||
|
||||
1. **The sidebar grew without an anchor concept.** Founders have no
|
||||
mental model of what the 16 pages map to; they just see a list
|
||||
and click around hoping for the right one. Half the pages are
|
||||
placeholders ("Coming soon"); the rest overlap.
|
||||
2. **None of the data sources have been updated for Path B.** The
|
||||
Code tile reads the Gitea repo (production master branch), but the
|
||||
AI now writes to the dev container's `/workspace`, often without
|
||||
pushing for hours. The Infrastructure tile reads production
|
||||
Coolify apps; new `dev_server.start` previews don't show up
|
||||
anywhere. So when AI does great work in chat, the project page
|
||||
doesn't update — the user has to tab back to chat to see anything.
|
||||
|
||||
---
|
||||
|
||||
## 2. The framing
|
||||
|
||||
Three sections with founder-friendly names; every project on Vibn maps
cleanly onto all three:
|
||||
|
||||
| Section | What it is | Founder asks… |
|---|---|---|
| **Product** | Custom code, design, content built for THIS vision | *"What did I build?"* |
| **Infrastructure** | Reusable, swappable third-party services (auth, db, email, payments…) | *"What do I depend on?"* |
| **Hosting** | Where the product runs and how people reach it (Coolify, domain, observability, cost) | *"Where does it live?"* |
|
||||
|
||||
### The boundary rule
|
||||
|
||||
> **Custom code = Product. Third-party service = Infrastructure.**
|
||||
> Runtime + reachability = Hosting.
|
||||
|
||||
Concrete edge cases:
|
||||
|
||||
- A custom `/api/upload` endpoint that calls S3 → endpoint is
|
||||
**Product**, S3 bucket + credentials are **Infrastructure**.
|
||||
- Custom job that sends a welcome email → job is **Product**, the
|
||||
job runner (Sidekiq/BullMQ) and email service (Resend) are
|
||||
**Infrastructure**.
|
||||
- Webhook handler that processes Stripe events → handler is
|
||||
**Product**, Stripe is **Infrastructure**.
|
||||
- Coolify scheduled task that runs your code → your code is
|
||||
**Product**, Coolify itself is **Hosting**.
|
||||
|
||||
---
|
||||
|
||||
## 3. Charters
|
||||
|
||||
### Product
|
||||
|
||||
Everything custom-built for this specific vision. The unique IP that
|
||||
wouldn't exist without this product.
|
||||
|
||||
**Includes:**
|
||||
- Frontend web app
|
||||
- Marketing site
|
||||
- Custom backend code & APIs
|
||||
- Custom business logic
|
||||
- Custom jobs / runners (the code, not the runner)
|
||||
- Brand, copy, design system
|
||||
- The repository itself
|
||||
- Customer base — the actual users you've earned
|
||||
|
||||
**Rule:** if you wrote it for this product, it's Product. If it's
|
||||
`node_modules` or a third-party SDK, it's not.
|
||||
|
||||
### Infrastructure
|
||||
|
||||
The reusable, swappable services your product depends on. The
|
||||
annoying multi-vendor world where you have to pick a provider.
|
||||
|
||||
**Includes:**
|
||||
- Auth provider (Clerk, Pocketbase, Authentik, Google OAuth, …)
|
||||
- Database (Postgres, MySQL, MongoDB, Redis, …)
|
||||
- File storage (S3, R2, MinIO)
|
||||
- Email (Resend, SendGrid, SES)
|
||||
- Payments (Stripe, Paddle, Lemon Squeezy)
|
||||
- Analytics (Plausible, PostHog, GA)
|
||||
- Search (Algolia, Meili, Typesense)
|
||||
- LLM provider (OpenAI, Anthropic, Gemini, Vertex)
|
||||
- Queues, maps, SMS, push notifications, …
|
||||
- Secrets and API keys that wire all of the above
|
||||
|
||||
**Rule:** if you could swap the vendor without changing your product
|
||||
code, it's Infrastructure.
|
||||
|
||||
### Hosting
|
||||
|
||||
Where the product physically runs and how people reach it.
|
||||
|
||||
**Includes:**
|
||||
- Container runtime (Coolify in our case)
|
||||
- Domain + DNS + SSL
|
||||
- CDN / edge
|
||||
- Observability (logs, errors, uptime)
|
||||
- Backups
|
||||
- Monthly cost
|
||||
|
||||
**Rule:** it's about *runtime and reachability,* not about what the
|
||||
software does.
|
||||
|
||||
---
|
||||
|
||||
## 4. Future sections (deferred)
|
||||
|
||||
Add as separate top-level cards once they become real concerns:
|
||||
|
||||
- **Models** — for AI-heavy products: which LLMs, which embedding
|
||||
model, prompt versions, eval scores, cost-per-call.
|
||||
- **Analytics** — when there are real users worth measuring.
|
||||
- **Marketing** — campaigns, blog, SEO, social, when there's a
|
||||
growth motion.
|
||||
- **Compliance** — Terms, Privacy, GDPR, SOC2, when shipping to
|
||||
paying customers.
|
||||
- **Support** — helpdesk, chat, status page, when there are
|
||||
customers complaining.
|
||||
- **Team** — when the project has more than one collaborator.
|
||||
|
||||
Same charter template each time. Same rule: code = Product,
|
||||
swappable = Infrastructure, runs/reachable = Hosting, otherwise it
|
||||
needs its own section.
|
||||
|
||||
---
|
||||
|
||||
## 5. Mapping today → tomorrow
|
||||
|
||||
| Today's page | Where it goes | Notes |
|---|---|---|
| `(home)/page.tsx` | New `(home)/page.tsx` (3-card grid) | Full redesign |
| `code` (404) | `product/` (new) | Stub the route, point home tile at it |
| `build` | Subroute under `product/files` (later) | Heavy 1626 lines; preserve the file tree component |
| `run` | `hosting/` | Production runtime |
| `infrastructure` | `hosting/` | Same data, different name |
| `deployment` | `hosting/deploys` (later) | Deploy history is Hosting |
| `overview` | Subroute under `product/` or merged into home | Decide once we see how home feels |
| `prd` | Subroute under `product/` (vision) | Or its own "Define" section if we add one |
| `tasks` | Subroute under `product/` (roadmap) | Or its own section later |
| `assist` | `product/` (it's emails/chat your product sends) | These ARE product features |
| `design` | `product/design` | Custom for this vision |
| `growth`, `grow`, `analytics`, `insights`, `mvp-setup` | Defer, probably absorbed into a future "Analytics" or "Marketing" section | Many are placeholders today |
| `settings` | Top-right gear (lives outside the 3 sections) | Project-level meta |
|
||||
|
||||
**Net:** 16 routes → 3 sections (+ settings). 8+ pages get rationalized
|
||||
into nothing because they were duplicating their neighbors.
|
||||
|
||||
---
|
||||
|
||||
## 6. Phased delivery
|
||||
|
||||
### Phase 1 — Tab navigation + section stubs (this session)
|
||||
|
||||
The three sections are TABS at the project level, not a card-grid
|
||||
landing page. A founder lands on the project URL and is immediately
|
||||
inside Product (the default tab); flipping to Infrastructure or
|
||||
Hosting is one click and stays in the same view. No
|
||||
intermediate "click a tile to drill in" step.
|
||||
|
||||
URL shape:
|
||||
|
||||
```
/[workspace]/project/[id]                 → 308 redirect to /product
/[workspace]/project/[id]/product         → Product tab
/[workspace]/project/[id]/infrastructure  → Infrastructure tab
/[workspace]/project/[id]/hosting         → Hosting tab
```
|
||||
|
||||
A shared layout at the project root renders:
|
||||
|
||||
- Project header (name, vision, stage pill, settings gear)
|
||||
- Tab bar (Product · Infrastructure · Hosting) — active tab
|
||||
highlighted; each tab carries a tiny status dot (green/amber/grey)
|
||||
- Slot for the active tab's page
|
||||
|
||||
The current `(home)/page.tsx` (the two-tile landing) is replaced by
|
||||
the redirect.
|
||||
|
||||
**Don't kill anything in `(workspace)/`.** Existing 16 routes stay
|
||||
alive while we migrate. Sidebar still works for them.
|
||||
|
||||
### Phase 2 — Wire data sources
|
||||
|
||||
- **Product card** reads from the dev container's `/workspace`:
|
||||
- File count + recent edits via `fs.list` against the project's
|
||||
dev container
|
||||
- User count from the project's auth provider (Pocketbase /
|
||||
Clerk / etc.)
|
||||
- Frontend URL from `dev_server.list` or production `apps_list`
|
||||
- **Infrastructure card** reads from Coolify databases, env vars,
|
||||
and known integrations:
|
||||
- Database type + size
|
||||
- Auth provider name
|
||||
- Wired services (any env var matching `STRIPE_*`, `RESEND_*`,
|
||||
etc.)
|
||||
- **Hosting card** reads from Coolify apps + domains + container metrics:
|
||||
- Production URL, SSL status, last deploy
|
||||
- Monthly cost (Coolify resource usage × pricing)
|
||||
- Recent error count (from logs)
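
The "wired services" detection for the Infrastructure data source can be sketched as prefix matching over env var names. Only the `STRIPE_*` and `RESEND_*` prefixes come from this plan; the `detectWiredServices` name and the table-driven shape are assumptions, and further prefixes would be added as integrations are supported.

```typescript
// Prefix table: only these two prefixes are named in the plan.
const SERVICE_PREFIXES: ReadonlyArray<[string, string]> = [
  ["STRIPE_", "Stripe"],
  ["RESEND_", "Resend"],
];

// Given the env var names set on a project's app, return the distinct
// third-party services they imply, sorted for stable display.
function detectWiredServices(envKeys: string[]): string[] {
  const found = new Set<string>();
  for (const key of envKeys) {
    for (const [prefix, service] of SERVICE_PREFIXES) {
      if (key.startsWith(prefix)) found.add(service);
    }
  }
  return Array.from(found).sort();
}
```

Matching on names alone means no secret values need to be read to populate the tab.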
|
||||
|
||||
### Phase 3 — Section detail pages
|
||||
|
||||
Build each of `/product`, `/infrastructure`, `/hosting` as a real,
|
||||
useful surface. Each page can have internal subnav for the bits
|
||||
listed in its charter (e.g., Product has Frontend, Backend, Jobs,
|
||||
Brand, Customers; Infrastructure has Auth, DB, Storage, Email,
|
||||
Payments, …).
|
||||
|
||||
### Phase 4 — Migration / deletion
|
||||
|
||||
Once the new structure is proven, redirect the legacy routes:
|
||||
|
||||
- `code` → `product`
|
||||
- `build` → `product/files`
|
||||
- `run` → `hosting`
|
||||
- `infrastructure` → `hosting`
|
||||
- `deployment` → `hosting/deploys`
|
||||
- `prd`, `tasks`, `assist` → `product/...`
|
||||
- `growth`, `grow`, `analytics`, `insights`, `mvp-setup` → soft-delete
|
||||
with a tombstone redirect to `product` or to a future section page.
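
The Phase 4 redirects can be expressed as a lookup table. This is a sketch under assumptions: `legacyRedirect` is a hypothetical helper, the tombstoned routes are pointed at `product` (one of the two options the list allows), and `prd`, `tasks`, and `assist` are left out because their `product/...` sub-routes are not specified. The legacy `infrastructure` segment is also omitted, since the same path is now the Infrastructure tab and redirecting it would shadow that tab.

```typescript
// Legacy sub-route → new sub-route, for the entries the plan pins down.
const LEGACY_REDIRECTS = new Map<string, string>([
  ["code", "product"],
  ["build", "product/files"],
  ["run", "hosting"],
  ["deployment", "hosting/deploys"],
  // Tombstoned routes: assumed to land on the Product tab for now.
  ["growth", "product"],
  ["grow", "product"],
  ["analytics", "product"],
  ["insights", "product"],
  ["mvp-setup", "product"],
]);

// Returns the redirect target for a legacy segment, or null when the
// segment is not a legacy route (or its target is still undecided).
function legacyRedirect(
  workspace: string,
  projectId: string,
  segment: string,
): string | null {
  const target = LEGACY_REDIRECTS.get(segment);
  return target === undefined
    ? null
    : `/${workspace}/project/${projectId}/${target}`;
}
```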
|
||||
|
||||
---
|
||||
|
||||
## 7. Open questions
|
||||
|
||||
- **Where do the chat threads live?** They're a per-project
|
||||
conversation surface today (right rail in the chat panel). I'd
|
||||
argue they're not a section — they're *across* sections, like the
|
||||
AI is. Keep as the persistent right rail.
|
||||
- **Settings is technically project-level meta**, not one of the
|
||||
three sections. Where does it surface? Gear icon in the page
|
||||
header, opens settings as a side sheet or as a separate route.
|
||||
Decide when we get there.
|
||||
- **Mobile layout** — three cards stack vertically; no special
|
||||
layout needed. The section detail pages need a layout pass when
|
||||
we get to phase 3.
|
||||
|
||||
---
|
||||
|
||||
## 8. Success criteria
|
||||
|
||||
You should be able to look at `/project/<id>` after AI activity in
|
||||
chat and immediately see:
|
||||
|
||||
1. *"What did the AI just build?"* → Product card updated count of
|
||||
files + recent diffs.
|
||||
2. *"What's it depending on?"* → Infrastructure card shows the new
|
||||
Postgres, the new Stripe key, etc.
|
||||
3. *"Is it live?"* → Hosting card shows the dev preview URL or the
|
||||
production URL with status.
|
||||
|
||||
If any of those three answers requires going back to the chat or
|
||||
checking another page, the redesign hasn't worked.
|
||||
*(Refer to `vibn-frontend/app/[workspace]/project/[projectId]` for the UI implementation).*
|
||||
|
||||
@@ -1,258 +1,9 @@
|
||||
# Sentry-as-Product — Proposal
|
||||
# Sentry as a Product (Shipped)
|
||||
|
||||
> Today's Sentry wiring catches errors in **the Vibn platform**.
|
||||
> The bigger opportunity is wiring Sentry into **every project Vibn
|
||||
> ships**, then feeding those errors back into the user's AI chat.
|
||||
> Difference between "an AI that codes" and "an AI that owns the
|
||||
> product."
|
||||
> **Note:** This spec was implemented in May 2026.
|
||||
|
||||
## TL;DR
|
||||
|
||||
Today, when a Vibn user's deployed app crashes for real users:
|
||||
|
||||
```
real user → site 500s → user closes tab, never tells founder
          → founder finds out hours/days later (or never)
          → AI in Vibn chat has zero idea anything is wrong
```
|
||||
|
||||
The fix is to make every Vibn project ship with Sentry pre-wired,
|
||||
then expose the error feed to the AI as a tool. Total effort:
|
||||
**~8 hours**, in 4 stages, each independently shippable.
|
||||
|
||||
| Stage | Capability | Effort | Unlocks |
|---|---|---|---|
| 1 | Auto-provision a Sentry project per Vibn project on first deploy | ~3 hr | Real-user errors captured at all |
| 2 | Bake Sentry into every scaffold template | ~2 hr | Capture works without user setup |
| 3 | Add `project_recent_errors` MCP tool for the AI | ~2 hr | AI can answer "is anything broken?" |
| 4 | Auto-surface unresolved errors at chat-turn start | ~1 hr | AI proactively offers fixes |
|
||||
|
||||
Total: **~8 hr**, no new infra (we already have Sentry org access,
|
||||
Coolify env API, scaffold templates, MCP tool registry).
|
||||
|
||||
---
|
||||
|
||||
## Why this is the right next investment
|
||||
|
||||
### The current loop is broken at the seam between user and platform
|
||||
|
||||
Vibn's value proposition is "the AI is your technical co-founder."
|
||||
That promise breaks the moment the AI's last commit causes a real
|
||||
user error and the AI doesn't know about it. The current loop:
|
||||
|
||||
```
1. User describes feature in chat
2. AI ships code
3. AI says "deployed, give it a try"
4. (silence)
5. Real users hit edge cases → 500s → bounce
6. Founder eventually notices via support ticket / analytics dip
7. Founder pastes error back to AI
8. AI fixes
```
|
||||
|
||||
Steps 4–6 are dead air for the founder, **and the AI cannot help
|
||||
during them.** This is the gap that separates Vibn from "any IDE
|
||||
with an LLM."
|
||||
|
||||
### What it looks like with this proposal shipped
|
||||
|
||||
```
1. User describes feature in chat
2. AI ships code
3. AI says "deployed, give it a try"
4. Real users hit edge cases → 500s → Sentry captures
5. (Founder opens Vibn chat 3 hrs later for unrelated reason)
6. AI: "Hey, checkout has 500'd for 3 users in the last hour
   because `customer.email` is undefined on
   app/checkout/route.ts:47. Want me to fix it?"
7. AI fixes, deploys, marks issue resolved in Sentry
```
|
||||
|
||||
The AI becomes the on-call engineer. This is what "technical
|
||||
co-founder" actually means and we are 8 hours away from it.
|
||||
|
||||
### Why now (not Phase 4)
|
||||
|
||||
- The Sentry wiring we just shipped for vibn-frontend gave us:
|
||||
- A working Sentry org (`vibnai`)
|
||||
- An auth token with project-management scope
|
||||
- Verified knowledge that the build args / source maps flow works
|
||||
- A working `withSentryConfig` recipe in `vibn-frontend/next.config.ts`
|
||||
- All of those are reusable for stage 1 and 2 of this proposal.
|
||||
- Doing this **before** the beta means user projects start emitting
|
||||
error data on day one, so by the time we're debugging real beta
|
||||
user pain, we have a month of history to reason about.
|
||||
- Doing it after the beta means we'd have to retroactively
|
||||
instrument projects that have already been deployed for weeks.
|
||||
|
||||
---
|
||||
|
||||
## Stage 1: Auto-provision a Sentry project per Vibn project (~3 hr)

**Goal:** when a user creates a Vibn project, the platform creates a
matching Sentry project under the `vibnai` org and stores the DSN
and auth token in Coolify env vars on the user's app.

**What gets built:**

1. **A `provisionSentryProject(projectId, name)` helper** in
   `vibn-frontend/lib/integrations/sentry.ts`. It calls Sentry's
   `POST /api/0/teams/vibnai/{team}/projects/` with the project
   slug and returns the DSN.
2. **Hook into the project-create flow**: on first successful deploy,
   call the helper and write the resulting DSN and auth token into
   Coolify env vars (`NEXT_PUBLIC_SENTRY_DSN`,
   `SENTRY_AUTH_TOKEN`) for that app via the same Coolify API we
   used today.
3. **Idempotency**: if the Sentry project already exists, fetch
   its DSN instead of creating a duplicate. Use the same project name
   convention every time: `vibn-{workspace}-{projectSlug}`.
4. **Storage**: store `sentryProjectSlug` and `sentryAuthTokenId`
   on the Postgres `projects` row so we can look them up later
   without re-walking the Sentry org.

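The read-before-write flow in items 1 and 3 could be sketched roughly as follows. This is a minimal illustration, not the shipped helper: the `SentryEnv` shape, the injected `fetchImpl`, the `sentrySlug` function, and the changed signature (an env object instead of `projectId`) are all assumptions made so the sketch can run against a fake. It also assumes the DSN is read from Sentry's project client-keys endpoint after the project exists.

```typescript
// Hypothetical sketch: everything except the endpoint paths quoted in the
// proposal is an assumption made for illustration.

// Minimal fetch shape so the sketch can be exercised with a fake.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface SentryEnv {
  baseUrl: string; // e.g. "https://sentry.io"
  org: string; // "vibnai" in the proposal
  team: string; // assumed team slug
  authToken: string;
  fetchImpl: FetchLike;
}

// Naming convention from the proposal: vibn-{workspace}-{projectSlug}.
function sentrySlug(workspace: string, projectSlug: string): string {
  return `vibn-${workspace}-${projectSlug}`;
}

// Read before write: create the project only when the lookup returns 404,
// then return the public DSN of the project's first client key.
async function provisionSentryProject(env: SentryEnv, slug: string): Promise<string> {
  const headers = {
    Authorization: `Bearer ${env.authToken}`,
    "Content-Type": "application/json",
  };
  const projectUrl = `${env.baseUrl}/api/0/projects/${env.org}/${slug}/`;

  const existing = await env.fetchImpl(projectUrl, { headers });
  if (existing.status === 404) {
    const created = await env.fetchImpl(
      `${env.baseUrl}/api/0/teams/${env.org}/${env.team}/projects/`,
      { method: "POST", headers, body: JSON.stringify({ name: slug, slug }) },
    );
    if (!created.ok) throw new Error(`Sentry project create failed: ${created.status}`);
  } else if (!existing.ok) {
    throw new Error(`Sentry project lookup failed: ${existing.status}`);
  }

  const keysRes = await env.fetchImpl(`${projectUrl}keys/`, { headers });
  if (!keysRes.ok) throw new Error(`Sentry key lookup failed: ${keysRes.status}`);
  const keys = (await keysRes.json()) as Array<{ dsn: { public: string } }>;
  if (keys.length === 0) throw new Error("Sentry project has no client keys");
  return keys[0].dsn.public;
}
```

Injecting `fetchImpl` keeps the helper testable without network access; in the real route the global `fetch` could be passed in. Calling the helper twice with the same slug issues only one `POST`, which is the idempotency property item 3 asks for.
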

**Risk:** Sentry's API rate-limits team-project creation. We avoid
hitting the limit by reading before writing, so the only API cost on
subsequent deploys is one GET.

**Definition of done:** create a fresh Vibn project → check Sentry
org → see a project named `vibn-{ws}-{slug}` → check Coolify env on
that app → see DSN populated.

---
## Stage 2: Bake Sentry into every scaffold template (~2 hr)

**Goal:** every Next.js / Vite / etc. starter template Vibn ships
already has Sentry wired up. The user does nothing.

**What gets built:**

1. **For each scaffold template in `vibn-frontend/lib/scaffold/`**,
   add the same files we shipped today:
   - `instrumentation.ts`
   - `instrumentation-client.ts`
   - `app/global-error.tsx` (Next.js) / equivalent boundary (Vite)
   - `next.config.ts` wrapped with `withSentryConfig` (Next.js)
   - `vite.config.ts` with `sentryVitePlugin` (Vite)
   - `Dockerfile` ARG declarations for `NEXT_PUBLIC_SENTRY_DSN` +
     `SENTRY_AUTH_TOKEN`
2. **Add `@sentry/nextjs` (or `@sentry/react` + `@sentry/vite-plugin`)
   to each template's `package.json` `dependencies`.**
3. **Document in the template README** that Sentry is pre-wired and the
   user doesn't need to do anything.

**Risk:** Sentry's wrapper sometimes interacts badly with custom
build configs (e.g. monorepos, custom webpack rules). Mitigation:
the `errorHandler` we set today (`console.warn` instead of throw)
ensures source map upload failures don't break builds.

**Definition of done:** scaffold a fresh Next.js project from Vibn
templates → deploy → throw a test error → see it in Sentry,
de-minified.

---
## Stage 3: Expose error feed to the AI as MCP tools (~2 hr)

**Goal:** the AI can ask Sentry "what's broken in project X?" and
get a real answer.

**What gets built:**

Three new MCP tools in `vibn-frontend/lib/ai/vibn-tools.ts`:

1. **`project_recent_errors { projectId, since?, limit? }`**
   - Returns: `[{ id, title, count, lastSeen, culprit, level }]`
   - Default `since`: 24h. Default `limit`: 10.
   - Filters to unresolved issues only.
   - Implementation: read `sentryProjectSlug` off the project row,
     call Sentry's `GET /api/0/projects/{org}/{slug}/issues/`.

2. **`project_error_detail { projectId, issueId }`**
   - Returns: `{ stacktrace, breadcrumbs, request, user, replay_url }`
   - Implementation: Sentry's `GET /api/0/issues/{id}/events/latest/`.

3. **`project_error_resolve { projectId, issueId }`**
   - Side-effect: marks the issue resolved in Sentry.
   - Used by the AI after it ships a fix and confirms via tests.
   - Implementation: Sentry's `PUT /api/0/issues/{id}/` with
     `status: "resolved"`.

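As a rough illustration of the `project_recent_errors` return shape, the sketch below maps raw issue objects to the tool's result, applying the unresolved filter and the `since` / `limit` defaults from the list above. The `toRecentErrors` name and the `SentryIssue` type are hypothetical; the sketch assumes Sentry returns `count` as a string and `lastSeen` as an ISO timestamp, and it leaves out the HTTP call and the project-row lookup entirely.

```typescript
// Hypothetical sketch of the result mapping for project_recent_errors.
// Field names follow the proposal's return shape; the input type is an
// assumption about what the issues endpoint returns.

interface SentryIssue {
  id: string;
  title: string;
  count: string; // assumed to arrive as a string
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
  level: string;
  status: string; // "unresolved", "resolved", ...
}

interface RecentError {
  id: string;
  title: string;
  count: number;
  lastSeen: string;
  culprit: string;
  level: string;
}

// Keep unresolved issues seen within the window, newest first, capped at `limit`.
function toRecentErrors(
  issues: SentryIssue[],
  now: Date,
  sinceHours: number = 24,
  limit: number = 10,
): RecentError[] {
  const cutoff = now.getTime() - sinceHours * 60 * 60 * 1000;
  return issues
    .filter((i) => i.status === "unresolved" && Date.parse(i.lastSeen) >= cutoff)
    .sort((a, b) => Date.parse(b.lastSeen) - Date.parse(a.lastSeen))
    .slice(0, limit)
    .map((i) => ({
      id: i.id,
      title: i.title,
      count: Number(i.count),
      lastSeen: i.lastSeen,
      culprit: i.culprit,
      level: i.level,
    }));
}
```

Filtering client-side keeps the sketch independent of Sentry's search syntax; a real implementation might push the same constraints into the request's query instead.
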

**Auth:** token storage is per-project (from Stage 1's `projects`
row). Each project's AI sees only its own project's errors. No
cross-project leakage.

**Definition of done:** in a Vibn chat for a project with known
errors, ask the AI "any errors lately?" → AI calls
`project_recent_errors` → shows real list.

---
## Stage 4: Auto-surface unresolved errors at chat-turn start (~1 hr)

**Goal:** the AI doesn't wait to be asked. When the user opens a
chat and there are unresolved errors, the AI mentions them on the
first turn.

**What gets built:**

In `vibn-frontend/app/api/chat/route.ts`, at the start of each chat
turn (before calling the model):

1. Call the same `project_recent_errors` logic Stage 3 exposed.
2. If it returns at least one issue, prepend a synthetic system message:

   ```
   [PROJECT HEALTH]
   {N} unresolved Sentry issues in the last 24 hours:
   - {title} (×{count}, last seen {time}) - {culprit}
   - ...

   If the user's first message is unrelated to these, you may still
   proactively mention them: "Quick FYI before we get into that:
   {X} has been failing for users."

   If their message IS about a broken thing, prefer the matching
   Sentry issue's stack trace over guessing.
   ```

3. Fire this at most once per N chat turns (configurable; the default
   is once per session opening) so we don't spam every turn.

**Risk:** false alarms (a Sentry issue from yesterday's deploy that
no one cares about anymore) make the AI annoying. Mitigation:
tighten the `since` window to the last 6h, and only surface issues
with `count >= 2` (one-off errors don't count).

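The mitigation above can be sketched as a small pure function that applies the 6h window and the `count >= 2` threshold before rendering the `[PROJECT HEALTH]` header lines. The `buildProjectHealthMessage` name, its parameters, and the `RecentError` type are hypothetical; the sketch returns `null` when nothing qualifies so the caller can skip the system message, and it omits the trailing behavioral instructions and the once-per-session gating.

```typescript
// Hypothetical sketch of the Stage 4 health-message builder.
// Thresholds mirror the mitigation in the text: 6h window, count >= 2.

interface RecentError {
  id: string;
  title: string;
  count: number;
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
  level: string;
}

function buildProjectHealthMessage(
  errors: RecentError[],
  now: Date,
  windowHours: number = 6,
  minCount: number = 2,
): string | null {
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;

  // Drop one-off errors and anything last seen outside the window.
  const relevant = errors.filter(
    (e) => e.count >= minCount && Date.parse(e.lastSeen) >= cutoff,
  );
  if (relevant.length === 0) return null;

  const noun = relevant.length === 1 ? "issue" : "issues";
  const lines = relevant.map(
    (e) => `- ${e.title} (×${e.count}, last seen ${e.lastSeen}) - ${e.culprit}`,
  );
  return [
    "[PROJECT HEALTH]",
    `${relevant.length} unresolved Sentry ${noun} in the last ${windowHours} hours:`,
    ...lines,
  ].join("\n");
}
```

Returning `null` rather than an empty block keeps the chat route's branching simple: it prepends a system message only when the function produces one.
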

**Definition of done:** intentionally break a deployed user
project, open chat, type "what's up?" → AI's first response
mentions the issue, with file path.

---
## Out of scope for this proposal

- **User-owned Sentry orgs.** Some users will eventually want their
  own Sentry account, not the shared `vibnai` org. Ship later;
  this doesn't block the loop. It is an easy retrofit because storage
  is already per-project.
- **Performance / tracing data.** Sentry also captures spans and
  traces. Useful for "this endpoint is slow" but not for the urgent
  product loop. Ship later.
- **Front-end UI for errors in Vibn.** A "Health" tab showing the
  Sentry feed in the Vibn UI is nice but not required for the AI
  loop to work. Ship later.

---
## Recommendation

Add a **Phase 2.9 (Sentry-as-product loop)** to `BETA_LAUNCH_PLAN.md`
covering Stages 1–4 as a single bundle. Estimate: **8 hr engineering**.

This is the second-highest-leverage item still ahead of beta,
behind only the deploy-failed webhook (which is 30 min). Every
hour spent here directly upgrades the value of every other beta
test session that follows it.

## Architecture

- Sentry is automatically provisioned for every new project (`lib/integrations/sentry.ts`).
- Environment variables (`NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`) are injected into the Coolify app.
- The AI has access to the `project_recent_errors`, `project_error_detail`, and `project_error_resolve` MCP tools, which let it read, diagnose, and resolve exceptions through the Sentry API.
- If unhandled exceptions are firing, the AI is prompted at the start of a conversation to address them (`app/api/chat/route.ts`).