docs: heavily compress and simplify remaining reference files to represent current state

2026-05-07 15:07:31 -07:00
parent 3563b98de1
commit 057115a9fc
8 changed files with 58 additions and 2926 deletions
--- a/AI_CAPABILITIES.md
+++ b/AI_CAPABILITIES.md
@@ -1,904 +1,22 @@
-# Vibn AI Capabilities
-
-> The full set of actions an AI agent can take on behalf of a Vibn workspace,
-> along with the REST endpoints, MCP tools, and safety rails that back them.
->
-> **Audience:** agent authors, Cursor rule writers, MCP tool designers, and
-> anyone building on the Vibn control plane.
->
-> **Scope:** everything an agent sees through `https://vibnai.com/api/*` and
-> the `/api/mcp` bridge. No Firestore, no internal agent orchestration —
-> just the tenant-safe capability surface.
-
---
-
-## 1. Mental model
-
-Every capability in this document operates on a single **workspace**. A
-workspace is Vibn's tenant boundary and maps 1:1 to:
-
-| Vibn concept | External identity | Example (`mark`) |
-|---|---|---|
-| Workspace | `vibn_workspaces.slug` | `mark` |
-| Gitea org | `gitea_org` | `vibn-mark` |
-| Gitea bot user | `gitea_bot_username` | `mark-bot` |
-| SSH deploy keypair | `coolify_private_key_uuid` + `gitea_bot_ssh_key_id` | registered on both sides |
-| Coolify project | `coolify_project_uuid` | `vibn-ws-mark` |
-| Coolify environment | `coolify_environment_name` | `production` |
-| Domain namespace | `*.{slug}.vibnai.com` | `*.mark.vibnai.com` |
-| AI token | `vibn_sk_…` | one per agent/device |
-
-A single agent token can only act on the workspace it was minted for. Cross-
-workspace access is structurally impossible — enforced in
-[`lib/coolify.ts`](./vibn-frontend/lib/coolify.ts) by matching every Coolify
-resource's `environment_id` against the workspace's project environments
-(`ensureResourceInProject`).
-
-### The three views
-
-All capabilities roll up into three user-facing surfaces:
-
- **Code** — every Gitea repo under `vibn-{slug}/`.
- **Live** — every Coolify app/database/service in `vibn-ws-{slug}`, each
-  reachable under `*.{slug}.vibnai.com`.
- **IDE** — Browser-based agent workspace sessions (outside the scope of this doc).
-
---
-
-## 2. Authentication
-
-Every agent-facing endpoint accepts **either**:
-
- `Authorization: Bearer vibn_sk_<base64url>` — a workspace-scoped API key
-  minted in the settings panel. Stored as a sha256 hash server-side; the
-  plaintext is shown exactly once on creation. Can be revoked at any time.
- A NextAuth session cookie — used for the dashboard UI and for browser
-  debugging. Not suitable for long-running agents.
-
-Helper: [`requireWorkspacePrincipal()`](./vibn-frontend/lib/auth/workspace-auth.ts)
-resolves either to a `WorkspacePrincipal { workspace, user?, source }`.
-
-**403 on a tenant mismatch means:** the token is valid, but the resource
-belongs to another workspace. The agent should stop and ask the user.
-
---
-
-## 3. MCP surface
-
-The MCP bridge lives at `POST https://vibnai.com/api/mcp`. It takes
-JSON-over-HTTP bodies shaped like:
-
-```json
-{ "tool": "<tool-name>", "params": { /* tool-specific */ } }
-```
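A minimal TypeScript sketch of assembling that body before sending it to the bridge. `McpRequest` and `buildMcpRequest` are hypothetical names introduced here for illustration; they are not part of any Vibn SDK.

```typescript
// Hypothetical request shape mirroring the { tool, params } body described above.
interface McpRequest {
  tool: string;
  params: Record<string, unknown>;
}

// Hypothetical helper: rejects empty tool names and defaults params to {}.
function buildMcpRequest(
  tool: string,
  params: Record<string, unknown> = {},
): McpRequest {
  if (tool.trim() === "") {
    throw new Error("tool name must be non-empty");
  }
  return { tool: tool, params: params };
}

// Serialized body for POST /api/mcp (the network call itself is omitted).
const requestBody = JSON.stringify(
  buildMcpRequest("apps.get", { uuid: "<app-uuid>" }),
);
```

The serialized string would be sent with the same `Authorization: Bearer vibn_sk_…` header used elsewhere in this document.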
-
-The Cursor / Claude Desktop config block is auto-generated in the settings
-panel and looks like:
-
-```json
-{
-  "mcpServers": {
-    "vibn-mark": {
-      "url": "https://vibnai.com/api/mcp",
-      "headers": { "Authorization": "Bearer vibn_sk_…" }
-    }
-  }
-}
-```
-
-`GET /api/mcp` returns a self-description with the current tool list.
-Version: **2.4.8** (see §10 for the version history).
-
-### 3.1 Workspace & identity tools
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `workspace.describe` | Returns slug, Coolify project uuid, Gitea org, provision status. | — |
-| `gitea.credentials` | Returns the bot's username, PAT, clone URL template, and SSH remote template. Use this for every `git clone`/push — never other credentials. | — |
-
-### 3.2 Project tools
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `projects.list` | Lists Vibn projects (PRDs, imports, etc.) in the workspace. | — |
-| `projects.get` | Single project details. | `{ projectId }` |
-
-### 3.3 Application tools
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `apps.list` | All Coolify apps in the workspace. | — |
-| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
-| `apps.create` | Create a Coolify app. **Four pathways** — pick the one that matches your source. **(1) Gitea repo** (user's own code): pass `repo`. Clones over HTTPS+PAT; no SSH. **(2) Docker image** (pre-built single-container third-party app, e.g. `nginx:alpine`): pass `image`. **(3) Inline Docker Compose YAML** (custom multi-service stack): pass `composeRaw`. **(4) Coolify one-click template** (RECOMMENDED for popular apps — Twenty, n8n, Supabase, Ghost, etc): pass `template` with a slug from `apps.templates.search`. Templates have battle-tested env defaults, healthchecks, and `depends_on` graphs. **Use pathway 4 over pathway 3 whenever a template exists** — it is dramatically more reliable. Auto-domain `{name}.{slug}.vibnai.com` for all pathways. | **(1) repo:** `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` **(2) image:** `{ image, name?, ports?, domain?, envs?, instantDeploy? }` **(3) composeRaw:** `{ composeRaw, name?, domain?, envs?, instantDeploy? }` **(4) template:** `{ template, name?, domain?, envs?, instantDeploy? }` |
-| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
-| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted |
-| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name |
-| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
-| `apps.deployments` | List recent deployments + status. | `{ uuid }` |
-| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 |
-| `apps.volumes.list` | List Docker volumes belonging to an app (name + size in bytes). Use before `apps.volumes.wipe` to know exact volume names. | `{ uuid }` |
-| `apps.volumes.wipe` | **Destructive / irreversible.** Stop all app containers, remove a specific volume, leave it ready for a fresh `apps.deploy`. Use to recover from stale DB state on first boot (the most common compose app failure). `confirm` must equal the exact volume name. | `{ uuid, volume, confirm }` |
-| `apps.containers.up` | Run `docker compose up -d` directly on the Coolify host for a compose app or service. Bypasses Coolify's queued-start worker (which routinely fails to actually invoke compose). Use after env or domain changes to recreate containers, or as a recovery path when `apps.create`/`apps.deploy` returned `started: false`. Idempotent — already-running containers are no-op'd. Up to 10 min timeout. Returns `{ ok, code, stdout, stderr, durationMs }`. | `{ uuid }` |
-| `apps.containers.ps` | `docker compose ps -a` against the rendered compose dir. Quick diagnostic for "why isn't my stack running?" — distinguishes `Created` (queued-start failure → use `apps.containers.up`), `Exited` (app crash → use `apps.logs`), `Restarting` (boot loop → use `apps.logs`), and `Up healthy/unhealthy`. | `{ uuid }` |
-| `apps.templates.list` | Browse the full Coolify one-click template catalog (320+ vetted apps: CRMs, AI tools, CMSes, dashboards, databases, …). Each entry is deployable via `apps.create({ template: <slug> })`. Returns `{ total, offset, limit, items: [{ slug, slogan, tags, port, documentation, logo }] }`. Catalog is fetched from upstream and cached for 1h. | `{ limit?, offset?, tag? }` — `limit` default 50, max 500; `tag` substring filter (e.g. `"crm"`, `"ai"`) |
-| `apps.templates.search` | Find templates by name, tag, or slogan. Ranked: exact-slug > slug-starts-with > slug-contains > tag-exact > tag-contains > slogan. Use this **before** `apps.create` to discover the right slug (e.g. `"twenty"`, `"n8n-with-postgres-and-worker"`, `"forgejo-with-postgresql"`). | `{ query, tag?, limit? }` — `limit` default 25, max 100. Either `query` or `tag` must be set |
-| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
-| `apps.domains.list` | Current domain set. | `{ uuid }` |
-| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
-| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
-| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
-| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
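The ranking order quoted for `apps.templates.search` (exact-slug > slug-starts-with > slug-contains > tag-exact > tag-contains > slogan) can be sketched as below. This is an illustration under assumptions: the `Template` shape, the function names, and the case-insensitive matching are hypothetical, and the server-side implementation may differ.

```typescript
// Hypothetical catalog entry; only the fields used for ranking are modelled.
interface Template {
  slug: string;
  tags: string[];
  slogan: string;
}

// Lower rank means a better match; null means the template does not match.
function rankTemplate(t: Template, query: string): number | null {
  const q = query.toLowerCase();
  const slug = t.slug.toLowerCase();
  const tags = t.tags.map((tag) => tag.toLowerCase());
  if (slug === q) return 0;                                  // exact slug
  if (slug.indexOf(q) === 0) return 1;                       // slug starts with query
  if (slug.indexOf(q) !== -1) return 2;                      // slug contains query
  if (tags.indexOf(q) !== -1) return 3;                      // exact tag
  if (tags.some((tag) => tag.indexOf(q) !== -1)) return 4;   // tag contains query
  if (t.slogan.toLowerCase().indexOf(q) !== -1) return 5;    // slogan contains query
  return null;
}

// Default limit of 25 mirrors the documented default for the tool.
function searchTemplates(items: Template[], query: string, limit = 25): Template[] {
  return items
    .map((t) => ({ t: t, rank: rankTemplate(t, query) }))
    .filter((r): r is { t: Template; rank: number } => r.rank !== null)
    .sort((a, b) => a.rank - b.rank)
    .slice(0, limit)
    .map((r) => r.t);
}
```

An agent would typically take the first hit's `slug` and pass it as `template` to `apps.create`.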
-
-### 3.4 Database tools
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `databases.list` | All databases in the workspace, across all flavors. | — |
-| `databases.create` | Provision a database. Supported `type`: `postgresql`, `mysql`, `mariadb`, `mongodb`, `redis`, `keydb`, `dragonfly`, `clickhouse`. | `{ type, name?, isPublic?, publicPort?, image?, credentials?, limits? }` |
-| `databases.get` | Details + internal connection URL. | `{ uuid }` |
-| `databases.update` | PATCH name, public visibility, image, limits. | `{ uuid, patch }` |
-| `databases.delete` | Destroy the database. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the db's exact name |
-
-### 3.5 Auth provider tools
-
-Authentication is a first-class capability. An agent cannot spin up arbitrary
-Coolify services — only vetted auth providers from an allowlist.
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `auth.list` | Auth providers currently deployed in the workspace (classified by Coolify's `service_type`). | — |
-| `auth.create` | Provision one of the allowed providers. | `{ provider, name?, description?, instantDeploy? }` |
-| `auth.delete` | Destroy an auth provider. Volumes (user data) kept by default. | `{ uuid, confirm }` — `confirm` must equal the service's exact name |
-
-**Allowed providers** (keys passed as `provider`):
-
- `pocketbase` — lightweight (SQLite) auth + data, single container.
- `authentik` — feature-rich self-hosted IDP.
- `keycloak` / `keycloak-with-postgres` — industry-standard OIDC/SAML.
- `pocket-id` / `pocket-id-with-postgresql` — passkey-first OIDC.
- `logto` — dev-first IDP.
- `supertokens-with-postgresql` — session/auth backend.
-
-Requesting anything outside this list returns 400 with a hint listing the
-allowed ones, so the agent can self-correct.
-
-### 3.6 Domain tools (P5.1 — custom apex domains)
-
-Custom apex domains are owned end-to-end by Vibn: the registrar is OpenSRS
-(Tucows), authoritative DNS is Google Cloud DNS in the Canadian project, and
-domains are pinned to the workspace that registered them. All four lifecycle
-steps — search, register, attach, inspect — are agent-callable.
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `domains.search` | Check availability + price for one or more candidate apex domains via OpenSRS. Stateless; does not reserve anything. | `{ names: string[], period?: number }` — `names` up to 25, `period` in years (auto-bumped for quirky TLDs like `.ai` which requires 2y minimum). |
-| `domains.register` | Register a domain through OpenSRS. Registers unlocked; locking happens automatically after `domains.attach` completes. Idempotent per `(workspace, domain)`. | `{ domain, period?, whoisPrivacy?, contact, nameservers?, ca?: { cprCategory, legalType } }` — `ca.*` required for `.ca`. |
-| `domains.list` | List all domains owned by the workspace with their status, registrar order id, expiry, and DNS provider/zone. | — |
-| `domains.get` | Full record + last 20 lifecycle events. | `{ domain }` |
-| `domains.attach` | Wire a registered domain to a Coolify app (or arbitrary IP/CNAME): create Cloud DNS zone, write A/CNAME rrsets, update registrar-side nameservers, append FQDNs to the Coolify app's domain list. Idempotent; safe to retry. | `{ domain, appUuid? \| ip? \| cname?, subdomains?: string[] (default ["@","www"]), updateRegistrarNs? }` |
-
-### Object storage (GCS via S3-compatible HMAC)
-
-Every workspace gets a Canada-hosted GCS bucket, a dedicated service
-account, and an HMAC keypair so agent-built apps can use any AWS S3
-SDK. The HMAC *secret* is never returned through the API — it's written
-directly into Coolify apps via `storage.inject_env`.
-
-| Tool | Purpose | Params |
-|---|---|---|
-| `storage.describe` | Report the workspace bucket name, region, S3 endpoint, access-key id, and provision status. No secret returned. | — |
-| `storage.provision` | Idempotently create/reconcile the workspace's GCP service account, JSON keyfile, bucket (`vibn-ws-{slug}-{rand}`), IAM binding, and HMAC key. Safe to re-run. | — |
-| `storage.inject_env` | Push `STORAGE_*` env vars (endpoint, region, bucket, access key id, secret access key, force_path_style) into a Coolify app. The secret is written server-side with `is_shown_once=true`; it never transits the response body. | `{ uuid, prefix? }` — `prefix` defaults to `STORAGE_`; use `S3_` for apps that expect AWS-standard names |
-
-The bucket is S3-compatible: point any `aws-sdk` / `@aws-sdk/client-s3`
-/ `boto3` at `STORAGE_ENDPOINT` with `force_path_style=true` (`STORAGE_*`
-env vars are set by `storage.inject_env`).
-
-**Residency note:** Cloud DNS is global anycast, so DNS zone configuration
-is not pinned to Canada at the storage layer. The workspace-level `dns_provider`
-flag (default `cloud_dns`) will let us swap in CIRA D-Zone for strict
-Canadian residency without touching the MCP surface.
-
-**Billing:** Every successful `domains.register` writes a `debit` row to
-`vibn_billing_ledger` with the OpenSRS order id as `ref_id`. The
-`vibn_domain_events` table keeps an append-only audit of every lifecycle
-call (`register.attempt`, `register.success`, `register.failed`,
-`attach.success`).
-
-**Verified end-to-end (2026-04-22)** against PROD GCP + OpenSRS sandbox +
-PROD Coolify (Coolify `v4.0.0-beta.473`); see
-`vibn-frontend/scripts/smoke-attach-e2e.ts`. **All 5 sub-systems green.**
-
- ✓ OpenSRS register against Horizon (sandbox) returns order id, response 200.
- ✓ Cloud DNS managed zone created in `master-ai-484822` with public anycast NS.
- ✓ A records (`@`, `www`) written to the zone.
- ✓ Registrar-side nameserver update accepts Cloud DNS NS values
-  (trailing-dot normalization in `lib/opensrs.ts`); sandbox returns 480
-  because its mock registry doesn't know real Google NS hosts, which is
-  expected — live mode talks to real registries that accept any resolvable NS.
- ✓ Unlock → update NS → relock fallback path verified (sandbox-recognized
-  nameservers return 200; the unlock/relock sequence is exercised when the
-  registry returns 405 lock-conflict).
- ✓ Coolify domain-list PATCH adds the apex + `www` to the application
-  `fqdn` column and the smoke test re-fetches it to confirm.
-
-> **Operational gotcha — the destination server must be proxy-enabled.**
-> Coolify's `update_by_uuid` controller accepts `domains` as a comma-separated
-> list and only maps it onto the model's `fqdn` column when the destination
-> server's `Server::isProxyShouldRun()` returns `true`. That helper requires
-> **both** `proxy.type ∈ {TRAEFIK, CADDY}` *and* `is_build_server = false`.
-> If either is misconfigured the PATCH returns 200 but the field is silently
-> dropped (Laravel mass-assignment ignores `domains` because it isn't in
-> `$fillable`, and the controller never copies it into `fqdn`). We hit this
-> on `coolify-server-mtl` (`zg4cwgc44ogc08804000gggo`), which had
-> `proxy=null` and `is_build_server=true`. Fixed by:
->
-> ```sql
-> UPDATE servers
->   SET proxy = jsonb_set(coalesce(proxy,'{}'::jsonb), '{type}', '"TRAEFIK"')
->   WHERE uuid = 'zg4cwgc44ogc08804000gggo';
-> UPDATE server_settings
->   SET is_build_server = false
->   WHERE server_id = (SELECT id FROM servers WHERE uuid = 'zg4cwgc44ogc08804000gggo');
-> ```
->
-> followed by `docker restart coolify` to clear Laravel's in-memory config.
-> Sending `fqdn` directly is **not** an alternative — the controller's
-> `$allowedFields` whitelist rejects it with 422 "This field is not allowed."
-
-### 3.7 Agent-side stdio MCP servers (`vibn-agent-runner`)
-
-Separate from the control-plane MCP at `/api/mcp` (which is what external
-agents call *into* Vibn), the `vibn-agent-runner` exposes its own in-house
-tool surface *outward* over stdio MCP. This lets Cursor, Claude Desktop,
-Goose, or any MCP-speaking client drive the same Coolify / Gitea / workspace
-tooling the Coder/PM/Marketing sub-agents use internally — with the same
-protected-repo and protected-app guardrails enforced centrally.
-
-Architecture: every tool now has three touch-points backed by one source of truth:
-
-```
-vibn-agent-runner/src/tools/<domain>-api.ts   ← pure, config-agnostic logic + security guards
-vibn-agent-runner/src/tools/<domain>.ts       ← thin registerTool() wrappers for the in-process agent loop
-vibn-agent-runner/src/mcp/<domain>-server.ts  ← stdio MCP server for external clients
-```
-
-| Server | Tools | Required env |
-|---|---|---|
-| `vibn-coolify-mcp` | 7 — list_projects, list_applications, deploy, get_logs, list_all_apps, get_app_status, deploy_app | `COOLIFY_API_URL`, `COOLIFY_API_TOKEN` |
-| `vibn-gitea-mcp` | 6 — create/list/close issues, list_repos, list_all_issues, read_repo_file | `GITEA_API_URL`, `GITEA_API_TOKEN`, `GITEA_USERNAME` |
-| `vibn-workspace-mcp` | 8 — read/write/replace/list/find/search_code, execute_command, git_commit_and_push | `WORKSPACE_ROOT` (+ Gitea creds for git push) |
-| `vibn-platform-mcp` | 7 — save_memory, list_memory, list_skills, get_skill, finalize_prd, get_prd, web_search | `SESSION_KEY` (optional), Gitea creds (for skills) |
-| `vibn-agent-mcp` | 2 — spawn_agent, get_job_status (dispatches into the runner's HTTP API) | `AGENT_RUNNER_URL` (defaults to `http://localhost:3333`) |
-
-Run locally with `npm run mcp:<name>` (or `:dev` via ts-node) in
-`vibn-agent-runner/`. Smoke-test any server with
-`node scripts/smoke-mcp.js <name>`. The in-process agent loop still sees
-the same 28 registered tools — no behavioral regression.
-
---
-
-## 4. REST surface
-
-Every MCP tool is also exposed as a plain HTTP endpoint under
-`/api/workspaces/{slug}/…`. Agents that prefer curl-style access can use
-these directly; the shape is identical to the MCP `params`. Auth is the
-same bearer header.
-
-### 4.1 Workspace & key management
-
-| Method | Path | Description |
-|---|---|---|
-| GET | `/api/workspaces` | All workspaces the principal has access to. |
-| GET | `/api/workspaces/{slug}` | Workspace details. |
-| POST | `/api/workspaces/{slug}/provision` | Idempotent re-run of Gitea org + bot + SSH keypair + Coolify project setup. |
-| GET | `/api/workspaces/{slug}/keys` | List API keys (metadata only). |
-| POST | `/api/workspaces/{slug}/keys` | Mint a new API key. Full token returned once. |
-| DELETE | `/api/workspaces/{slug}/keys/{keyId}` | Revoke a key. |
-| GET | `/api/workspaces/{slug}/gitea-credentials` | Return bot username, PAT (decrypted), clone/SSH templates. |
-| GET | `/api/workspaces/{slug}/bootstrap.sh` | Shell script that writes `.cursor/rules`, `.cursor/mcp.json`, `.env.local` into the cwd. |
-
-### 4.2 Applications
-
-| Method | Path | Description |
-|---|---|---|
-| GET | `/api/workspaces/{slug}/apps` | List apps. |
-| POST | `/api/workspaces/{slug}/apps` | Create an app from a workspace repo. |
-| GET | `/api/workspaces/{slug}/apps/{uuid}` | App details. |
-| PATCH | `/api/workspaces/{slug}/apps/{uuid}` | Update whitelisted fields. |
-| DELETE | `/api/workspaces/{slug}/apps/{uuid}?confirm=<exact-name>` | Destroy app. |
-| POST | `/api/workspaces/{slug}/apps/{uuid}/deploy` | Trigger deploy. |
-| GET | `/api/workspaces/{slug}/apps/{uuid}/deployments` | List deployments. |
-| GET | `/api/workspaces/{slug}/apps/{uuid}/domains` | List domains. |
-| PATCH | `/api/workspaces/{slug}/apps/{uuid}/domains` | Replace domain set. |
-| GET | `/api/workspaces/{slug}/apps/{uuid}/envs` | List env vars. |
-| PATCH | `/api/workspaces/{slug}/apps/{uuid}/envs` | Upsert env var(s). |
-| DELETE | `/api/workspaces/{slug}/apps/{uuid}/envs?key=FOO` | Delete env var. |
-| GET | `/api/workspaces/{slug}/deployments/{deploymentUuid}/logs` | Deployment logs. |
-
-### 4.3 Databases
-
-| Method | Path | Description |
-|---|---|---|
-| GET | `/api/workspaces/{slug}/databases` | List databases. |
-| POST | `/api/workspaces/{slug}/databases` | Create a database (8 flavors). |
-| GET | `/api/workspaces/{slug}/databases/{uuid}` | Database details + internal connection URL. |
-| PATCH | `/api/workspaces/{slug}/databases/{uuid}` | Update fields. |
-| DELETE | `/api/workspaces/{slug}/databases/{uuid}?confirm=<exact-name>` | Destroy database. |
-
-### 4.4 Auth providers
-
-| Method | Path | Description |
-|---|---|---|
-| GET | `/api/workspaces/{slug}/auth` | List deployed auth providers + the allowlist. |
-| POST | `/api/workspaces/{slug}/auth` | Provision a provider from the allowlist. |
-| GET | `/api/workspaces/{slug}/auth/{uuid}` | Provider details. |
-| DELETE | `/api/workspaces/{slug}/auth/{uuid}?confirm=<exact-name>` | Destroy provider. |
-
-### 4.5 Domains (P5.1)
-
-| Method | Path | Description |
-|---|---|---|
-| POST | `/api/workspaces/{slug}/domains/search` | Availability + pricing for up to 25 candidate names. |
-| GET | `/api/workspaces/{slug}/domains` | List workspace-owned domains. |
-| POST | `/api/workspaces/{slug}/domains` | Register a domain (idempotent per `(workspace, domain)`). |
-| GET | `/api/workspaces/{slug}/domains/{domain}` | Full record + last 20 events. |
-| POST | `/api/workspaces/{slug}/domains/{domain}/attach` | Create Cloud DNS zone, write records, update registrar NS, wire Coolify domain list. |
-
---
-
-## 5. Gitea surface
-
-AI agents **never** talk to the root Gitea admin token. They use the
-workspace's dedicated bot user.
-
-### 5.1 What the bot can do
-
- Fully own the `vibn-{slug}` org (added as the org's owner team).
- Read/write every repo in that org via its PAT.
- Push over SSH using the workspace's ed25519 deploy key (same keypair
-  Coolify uses to pull code).
- What it **cannot** do: touch any other org, the root admin surface, or
-  Gitea's `/admin/*` endpoints.
-
-### 5.2 How to get the bot credentials
-
-```http
-GET /api/workspaces/{slug}/gitea-credentials
-Authorization: Bearer vibn_sk_…
-```
-
-Returns:
-
-```json
-{
-  "bot": { "username": "mark-bot", "token": "…" },
-  "gitea": {
-    "apiBase": "https://git.vibnai.com/api/v1",
-    "host": "git.vibnai.com",
-    "cloneUrlTemplate": "https://mark-bot:{{token}}@git.vibnai.com/vibn-mark/{{repo}}.git",
-    "sshRemoteTemplate": "git@git.vibnai.com:vibn-mark/{{repo}}.git",
-    "webUrlTemplate": "https://git.vibnai.com/vibn-mark/{{repo}}"
-  },
-  "workspace": { "slug": "mark", "giteaOrg": "vibn-mark" }
-}
-```
-
-The PAT is stored **encrypted at rest** using AES-256-GCM with the
-`VIBN_SECRETS_KEY` server secret; the decrypt step runs only on this endpoint.
-
-### 5.3 Gitea operations via the standard Gitea API
-
-Once the agent has `{bot.token, gitea.apiBase}`, it can call any standard
-Gitea v1 endpoint as the bot, scoped to the workspace org. Common ones:
-
- `POST /orgs/{org}/repos` — create a repo.
- `PATCH /repos/{org}/{repo}` — update repo settings.
- `GET /repos/{org}/{repo}/contents/{path}` — read files.
- `PUT /repos/{org}/{repo}/contents/{path}` — write files (commits).
- `POST /repos/{org}/{repo}/pulls` — open PRs.
- `POST /repos/{org}/{repo}/branches` — create branches.
-
---
-
-## 6. Domain policy
-
-Every app gets an auto-generated domain under the workspace's namespace:
-
-```
-{app-slug}.{workspace-slug}.vibnai.com
-```
-
-For example, creating an app named `my-api` in workspace `mark` yields
-`my-api.mark.vibnai.com` automatically — no DNS config, no cert work,
-served by Coolify's wildcard Traefik.
-
-### 6.1 What agents can do
-
- Accept the auto-generated domain (default path).
- Replace the domain set via `PATCH /apps/{uuid}/domains`, provided every
-  entry ends with `.{workspace-slug}.vibnai.com`.
-
-### 6.2 What agents cannot do
-
- Point an app at a domain outside the workspace's namespace. The server
-  rejects this with 403 regardless of DNS state:
-
-  ```json
-  { "error": "Domain evil.com is not allowed; must end with .mark.vibnai.com",
-    "hint": "Use my-api.mark.vibnai.com" }
-  ```
-
-This is enforced by `isDomainUnderWorkspace()` in
-[`lib/naming.ts`](./vibn-frontend/lib/naming.ts).
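A minimal sketch of the suffix rule described in this section. `isUnderWorkspace` is a hypothetical stand-in written for illustration; the actual `isDomainUnderWorkspace()` in `lib/naming.ts` is not shown in this document and may behave differently (for example around casing or trailing dots).

```typescript
// Hypothetical re-implementation of the namespace rule:
// every domain must end with ".{workspace-slug}.vibnai.com".
function isUnderWorkspace(domain: string, workspaceSlug: string): boolean {
  const suffix = "." + workspaceSlug.toLowerCase() + ".vibnai.com";
  const host = domain.trim().toLowerCase();
  // Require at least one label in front of the suffix, and compare the
  // suffix including its leading dot so "evilmark.vibnai.com" cannot pass.
  return host.length > suffix.length && host.slice(-suffix.length) === suffix;
}
```

Comparing against the suffix with its leading dot is the design point here: a plain "ends with `mark.vibnai.com`" test would also accept lookalike hosts.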
-
-### 6.3 Custom (external) domains
-
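-Domains that were not registered through Vibn are not exposed to AI agents.
-A human can still add them through Coolify directly or through a future
-human-gated UI. Apex domains registered with `domains.register` are wired
-up through `domains.attach` (see §3.6).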
-
---
-
-## 7. Safety model
-
-### 7.1 Tenant enforcement
-
-Every resource-returning helper in `lib/coolify.ts` runs through
-`ensureResourceInProject()`. It:
-
-1. Trusts an explicit `project_uuid` on the resource if present, else
-2. Fetches the project's environment ids via `GET /projects/{uuid}` and
-   verifies the resource's `environment_id` is in that set.
-
-A token for `mark` that tries to read an app in `justine`'s project returns:
-
-```json
-{ "error": "Application <uuid> does not belong to project <mark-project-uuid>" }
-```
-
-with HTTP 403. Cross-workspace enumeration and access are not just
-discouraged — they fail at the helper level.
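The two-step check above can be sketched as follows. All names and shapes here (`ResourceRef`, `TenantCheck`, `checkResourceInProject`) are hypothetical, and the sketch assumes that an explicit but mismatched `project_uuid` is rejected; the real `ensureResourceInProject()` in `lib/coolify.ts` may differ.

```typescript
// Illustrative resource shape; real Coolify resources carry many more fields.
interface ResourceRef {
  uuid: string;
  project_uuid?: string;
  environment_id?: number;
}

type TenantCheck =
  | { ok: true }
  | { ok: false; status: 403; error: string };

// projectEnvironmentIds stands in for the ids fetched via GET /projects/{uuid}.
function checkResourceInProject(
  resource: ResourceRef,
  projectUuid: string,
  projectEnvironmentIds: number[],
): TenantCheck {
  const denied: TenantCheck = {
    ok: false,
    status: 403,
    error: "Resource " + resource.uuid + " does not belong to project " + projectUuid,
  };
  // Step 1: trust an explicit project_uuid when the resource carries one.
  if (resource.project_uuid !== undefined) {
    return resource.project_uuid === projectUuid ? { ok: true } : denied;
  }
  // Step 2: otherwise the environment id must be one of the project's environments.
  if (resource.environment_id === undefined) return denied;
  return projectEnvironmentIds.indexOf(resource.environment_id) !== -1
    ? { ok: true }
    : denied;
}
```

Returning a result object rather than throwing keeps the sketch easy to test; a route handler would map `ok: false` onto the HTTP 403 shown above.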
-
-### 7.2 Destructive operations
-
-Every delete endpoint requires `?confirm=<exact-resource-name>`:
-
-```
-DELETE /apps/{uuid}                → 409 "confirmation required"
-DELETE /apps/{uuid}?confirm=wrong  → 409 "confirmation required"
-DELETE /apps/{uuid}?confirm=my-api → 200 deleted
-```
-
-This means an agent hallucinating a delete call cannot cost you the
-resource — it must first know the exact name, which implies it just listed
-or just created it.
-
-**Volumes are kept by default** on delete. To also remove volumes, pass
-`?volumes=delete` (apps/dbs) — this is opt-in, per-call, never the default.
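A small sketch of the confirm gate and the opt-in volume flag, using the three cases listed above. `decideDelete` and `DeleteDecision` are hypothetical names; the status codes and messages are taken from the examples in this section, and everything else is an assumption.

```typescript
// Hypothetical decision object for a delete request.
interface DeleteDecision {
  status: 200 | 409;
  deleteVolumes: boolean;
  message: string;
}

// confirm and volumes model the ?confirm= and ?volumes= query parameters.
function decideDelete(
  resourceName: string,
  confirm?: string,
  volumes?: string,
): DeleteDecision {
  // A missing or wrong confirm value never deletes anything.
  if (confirm !== resourceName) {
    return { status: 409, deleteVolumes: false, message: "confirmation required" };
  }
  // Volumes are removed only when the caller opts in on this specific call.
  return { status: 200, deleteVolumes: volumes === "delete", message: "deleted" };
}
```

Because the comparison is exact, an agent has to read the resource name back from a list or get call before a delete can succeed.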
-
-### 7.3 Creation guardrails
-
- Apps can only be created from repos in the workspace's Gitea org.
- Auth providers can only be created from the allowlist (see §3.5).
- Database flavors are restricted to the 8 Coolify supports.
- Env var keys must match `/^[A-Z_][A-Z0-9_]*$/` (no shell-escape tricks).
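The env var key rule can be checked client-side before calling `apps.envs.upsert`. The regular expression below is the one quoted in the list above; the helper name `isValidEnvKey` is hypothetical.

```typescript
// Pattern copied from the guardrail above: an uppercase letter or underscore
// first, then uppercase letters, digits, or underscores.
const ENV_KEY_PATTERN = /^[A-Z_][A-Z0-9_]*$/;

// Hypothetical pre-flight check an agent could run before an upsert.
function isValidEnvKey(key: string): boolean {
  return ENV_KEY_PATTERN.test(key);
}
```

Validating locally only saves a round trip; the server-side check remains the authority.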
-
-### 7.4 Secrets handling
-
- `VIBN_API_KEY` is only shown **once** on mint. Server keeps a sha256 hash.
- Gitea bot PATs are **encrypted at rest** (AES-256-GCM with
-  `VIBN_SECRETS_KEY`).
- The SSH private key is held by Coolify, not by Vibn; the public key is
-  pushed to the Gitea bot user's key list. Rotating is a re-provision.
- Agent prompts and Cursor rules include a "treat VIBN_API_KEY like a
-  password — never print or commit it" directive.
-
---
-
-## 8. Worked examples
-
-### 8.1 "Build me a Next.js app with a Postgres and Pocketbase auth"
-
-From the agent's side, using MCP:
-
-```json
-// 1. Ensure a repo exists in the workspace org (standard Gitea API,
-//    using the bot PAT from gitea.credentials).
-POST https://git.vibnai.com/api/v1/orgs/vibn-mark/repos
-{ "name": "my-site", "private": true, "auto_init": true }
-
-// 2. Create the Coolify app. Auto-domain my-site.mark.vibnai.com.
-{ "tool": "apps.create",
-  "params": { "repo": "my-site", "ports": "3000", "instantDeploy": false } }
-
-// 3. Provision a Postgres.
-{ "tool": "databases.create",
-  "params": { "type": "postgresql", "name": "app-db" } }
-// → returns { internalUrl: "postgres://…@<uuid>:5432/postgres" }
-
-// 4. Wire the db URL into the app as an env var.
-{ "tool": "apps.envs.upsert",
-  "params": { "uuid": "<app-uuid>", "key": "DATABASE_URL",
-              "value": "<internalUrl>" } }
-
-// 5. Deploy Pocketbase as the auth layer.
-{ "tool": "auth.create",
-  "params": { "provider": "pocketbase", "name": "auth" } }
-
-// 6. First real deploy.
-{ "tool": "apps.deploy", "params": { "uuid": "<app-uuid>" } }
-
-// 7. Poll.
-{ "tool": "apps.deployments", "params": { "uuid": "<app-uuid>" } }
-// → [{ uuid, status: "finished" | "in_progress" | "failed" | "queued" }]
-```
-
-The agent hands the user back `https://my-site.mark.vibnai.com`.
-
-### 8.2 "Add an `api` subdomain to my app"
-
-```json
-{ "tool": "apps.domains.set",
-  "params": {
-    "uuid": "<app-uuid>",
-    "domains": ["my-site.mark.vibnai.com", "api.mark.vibnai.com"]
-  } }
-```
-
-Valid — both end with `.mark.vibnai.com`. `evil.com` or `my-site.justine.vibnai.com`
-would return 403.
-
-### 8.3 "Delete the whole thing"
-
-The agent must learn the resource's exact name first, or it will hit the confirm gate:
-
-```json
-// Learn the name.
-{ "tool": "apps.get", "params": { "uuid": "<app-uuid>" } }
-// → { name: "my-site", ... }
-
-// Delete with matching confirm.
-{ "tool": "apps.delete",
-  "params": { "uuid": "<app-uuid>", "confirm": "my-site" } }
-```
-
-A wrong `confirm` value returns `409 "confirmation required"`, as in §7.2.
-
---
-
-## 9. Error handling reference
-
-| Status | Meaning | What the agent should do |
-|---|---|---|
-| 400 | Bad request body (invalid JSON, missing required field, invalid type). | Fix the body, retry. |
-| 401 | No / bad bearer token. | Ask the user to mint a fresh key. |
-| 403 | **Tenant mismatch** — resource belongs to another workspace, domain outside workspace namespace, or repo not in workspace org. | **Stop.** Do not retry with guessed values. Ask the user. |
-| 404 | Resource not found (app/db/service/repo uuid wrong). | Re-list to find the right uuid. |
-| 409 | Delete confirmation missing or wrong. | Fetch the resource name first, then retry with `confirm=<name>`. |
-| 422 | Coolify validation failure (e.g. malformed domain). | Check the `details` field. |
-| 502 | Upstream Coolify/Gitea error. | Retry with backoff. |
-| 503 | Workspace not fully provisioned yet. | Call `POST /provision`, then retry. |
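One way an agent loop might encode the table above is a plain status-to-action mapping. The `AgentAction` labels and the `nextAction` function are hypothetical; only the status codes and the recommended behaviour come from the table.

```typescript
// Hypothetical action labels, one per row of the table above.
type AgentAction =
  | "fix_request_and_retry"
  | "ask_user_for_new_key"
  | "stop_and_ask_user"
  | "relist_resources"
  | "fetch_name_then_retry_with_confirm"
  | "inspect_details_field"
  | "retry_with_backoff"
  | "provision_then_retry"
  | "unhandled";

function nextAction(status: number): AgentAction {
  switch (status) {
    case 400: return "fix_request_and_retry";
    case 401: return "ask_user_for_new_key";
    case 403: return "stop_and_ask_user"; // never retry with guessed values
    case 404: return "relist_resources";
    case 409: return "fetch_name_then_retry_with_confirm";
    case 422: return "inspect_details_field";
    case 502: return "retry_with_backoff";
    case 503: return "provision_then_retry";
    default: return "unhandled";
  }
}
```

Keeping 403 as a hard stop, rather than folding it into a generic retry path, reflects the table's instruction not to retry tenant mismatches.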
-
---
-
-## 10. Versioning
-
-The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
-are append-only within a major version — agents can cache the tool list
-safely for the duration of a conversation but should re-fetch on 404.
-
-Current version: **2.4.8**.
-
- **1.x** — session-cookie-only MCP, no tenant keys.
- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
- **2.1** — create/update/delete for apps, 8 database flavors, auth
-  provider allowlist, domain policy enforcement, confirm-gated deletes.
- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
-  domain routing, runtime log tailing (`apps.logs`), in-container command
-  execution (`apps.exec`), and diagnostic `apps.update` responses.
- **2.3** — `apps.create` Docker-image and inline-composeRaw pathways (no
-  Gitea repo required for third-party apps), `apps.volumes.list` +
-  `apps.volumes.wipe` for self-service volume recovery.
- **2.4** — `apps.create` Coolify-template pathway (`{ template: "twenty" }`
-  etc.) for one-click deploy of 320+ vetted apps, plus `apps.templates.list`
-  / `apps.templates.search` for catalog discovery.
- **2.4.1** — `apps.containers.up` / `apps.containers.ps` to bypass Coolify's
-  unreliable queued-start worker. `apps.create` (template + composeRaw
-  pathways) now auto-falls-back to direct `docker compose up -d` over SSH
-  when Coolify's queue stalls, so a single `apps.create` call really does
-  leave a running stack.
- **2.4.2** — `apps.create` no longer reports `started: false` when only a
-  sidecar (worker / scheduler) failed its `depends_on: service_healthy`
-  gate. We now probe the host with `docker ps` after `compose up -d` and
-  return `started: true` whenever any container of the stack is running,
-  surfacing the compose stderr in `startDiag` so agents can decide whether
-  to re-run `apps.containers.up` later. This matches the real-world
-  behavior of slow-booting apps like Twenty (worker waits ~3 min for
-  twenty's healthcheck, exceeds compose's default depends_on timeout).
- **2.4.3** — Auto-attached stack containers to the `coolify` proxy network
-  after `compose up`, fixing Traefik 503s on third-party apps.
- **2.4.4** — Made the proxy-network attach selective (only `traefik.enable=true`
-  containers) to avoid DNS aliasing collisions where Twenty's `postgres`
-  hostname resolved to `coolify-db`.
- **2.4.5** — Architectural overhaul of `apps.create` for service templates.
-  We no longer run `docker compose up -d` over SSH as a deployment fallback
-  (that bypassed Coolify's compose generation, causing internal services to
-  land on the wrong networks). Instead `apps.create` now:
-  1. Calls Coolify's `start` and lets its queue do the full deploy
-     (volumes, internal networking, env interpolation, healthchecks).
-  2. Polls `service.applications[*].status` (the truthful per-app status
-     field — `service.status` itself routinely lies as
-     `starting:unknown` while containers are healthy).
-  3. Applies three surgical post-deploy fixes that Coolify's own
-     pipeline omits and that its REST API does not expose:
-       - rewrites `SERVICE_FQDN_*` / `SERVICE_URL_*` in the rendered
-         `.env` so frontends that bake their backend URL into the SPA
-         bundle (Twenty's `SERVER_URL`, etc.) point at the real
-         custom domain instead of the auto-generated sslip.io URL;
-       - injects the missing
-         `traefik.http.services.<svc>.loadbalancer.server.port` label
-         (Coolify generates the routing rules but forgets the port,
-         so Traefik logs `error: port is missing` and returns 503);
-       - connects `coolify-proxy` to the project's Docker network
-         (Coolify writes a `caddy_ingress_network=<uuid>` hint label
-         but never actually runs `docker network connect`), then
-         force-recreates ONLY the public-facing container so the new
-         env+label apply, and restarts the proxy so Traefik
-         re-discovers.
-
-  The response shape gains:
-    - `reachable` — boolean, true when `https://<fqdn>` answers 2xx/3xx
-    - `appStatus` — the truthful per-application status from Coolify
-    - `postDeploy` — step-by-step diagnostic for each of the three fixes
-  The previous `started`/`startMethod`/`startDiag` fields are kept for
-  back-compat. Internal services (Postgres, Redis, worker) stay on
-  their isolated project network — fixing the `password authentication
-  failed` regression introduced in 2.4.4.
- **2.4.6** — Two fixes for transient Coolify queue lag observed in
-  2.4.5:
-  - **Polling no longer false-fails on early `exited` status.**
-    Coolify's queue worker can take 60-120s to dequeue a `start`
-    request; during that window `service.applications[*].status`
-    returns the stale `exited` (= "never started") state. Previously
-    we treated that as terminal failure after 90s. Now we require
-    *evidence of activity* (`starting:*` or `running:*` was seen at
-    least once) before treating subsequent `exited` reports as
-    terminal. Until activity is observed, the loop just keeps polling
-    up to the 8-min health timeout. Eliminates the case where
-    `apps.create` returned `started: false` on a stack that was
-    actually about to come up healthy.
-  - **`apps.repair`** — new tool. Re-runs the three post-deploy
-    patches (env rewrite, port label, proxy network attach + recreate
-    + proxy restart) against an existing service without recreating
-    it. Useful when a deploy succeeded mechanically but ended up
-    serving Traefik 503 or Mixed Content errors, or whenever a user
-    rotates a custom domain. Params: `{ uuid, fqdn, publicAppName,
-    port? }`. Returns `{ reachable, postDeploy: { steps }, probe }`.
- **2.4.7** — `applyCoolifyPostDeployFixes` now schedules the
-  `coolify-proxy` restart (step 5) as a fire-and-forget background
-  job (`(sleep 3 && docker restart coolify-proxy) &`) instead of
-  blocking on it synchronously. The proxy restart kills any in-flight
-  TCP connection through the gateway — including the very request
-  that's running `apps.repair` / `apps.create` — so doing it inline
-  caused the agent to see a curl framing error (exit 16) right when
-  the work was in fact succeeding. Now the SSH command returns within
-  ~50ms, the HTTP response is delivered, and Traefik re-discovers
-  labels ~3s later.
- **2.4.8** — Massive simplification of post-deploy logic. Coolify's
-  template engine is fully capable of generating correct Traefik
-  labels and `SERVICE_FQDN_<APP>` / `SERVICE_URL_<APP>` env vars **if
-  the URL passed to `setServiceDomains` includes the upstream port**
-  (the "Required Port" hint in Coolify's UI: `https://crm.example.com:3000`,
-  not `https://crm.example.com`). 2.4.5–2.4.7 were missing that
-  detail, which is why they had to re-write the `.env` and inject
-  the loadbalancer port label as a workaround.
-
-  In 2.4.8 `apps.create` reads `template.port` from the catalog and
-  passes `https://<fqdn>:<port>` to `setServiceDomains`. Coolify then:
-    - generates `traefik.http.services.<svc>.loadbalancer.server.port=<port>`
-      automatically;
-    - rewrites `.env` so `SERVICE_FQDN_<APP>=<fqdn>` and
-      `SERVICE_URL_<APP>=https://<fqdn>` (no sslip.io leak);
-    - keeps `SERVICE_FQDN_<APP>_<PORT>` magic placeholders correctly
-      pointed at the user's host:port.
-
-  All that's left is the one thing Coolify still skips: connecting
-  `coolify-proxy` to the resource's project Docker network. So
-  `applyCoolifyPostDeployFixes` is now ~30 lines (down from ~200) and
-  no longer SSH-runs an embedded Python script inside a
-  `python:3-alpine` container. The `CoolifyPostDeployResult.steps`
-  shape now contains only `proxyNetwork` and `proxyRestart`; the old
-  `envRewrite` / `portLabel` / `recreate` step keys are removed.
-  `apps.repair` retains its API (`{ uuid, fqdn, publicAppName, port? }`)
-  but `port` is now informational only (not required for the helper
-  to function).
-
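The 2.4.6 polling rule in the version notes above (ignore `exited` until some activity has been seen) can be sketched as a pure function over the observed status sequence. This is an illustration, not the shipped code: `classifyPolls`, `PollVerdict`, and the exact `running:healthy` string are assumptions, and the 8-minute timeout is left to the caller.

```typescript
type PollVerdict = "keep_polling" | "healthy" | "failed";

// True when `status` begins with `prefix` (kept ES5-friendly on purpose).
function hasPrefix(status: string, prefix: string): boolean {
  return status.indexOf(prefix) === 0;
}

// Classify a chronological list of per-application statuses.
// Assumption: statuses look like "exited", "starting:unknown", "running:healthy".
function classifyPolls(statuses: string[]): PollVerdict {
  let sawActivity = false;
  for (let i = 0; i < statuses.length; i++) {
    const status = statuses[i];
    if (hasPrefix(status, "starting") || hasPrefix(status, "running")) {
      sawActivity = true;
    }
    if (status === "running:healthy") {
      return "healthy";
    }
    // An early "exited" may just mean the queue has not dequeued the start
    // request yet, so it only counts as failure once activity was observed.
    if (hasPrefix(status, "exited") && sawActivity) {
      return "failed";
    }
  }
  return "keep_polling";
}
```

For example, `["exited", "exited"]` keeps polling, while `["exited", "starting:unknown", "exited"]` is treated as a real failure.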
---
-
-## 11. Troubleshooting compose apps
-
-Most real-world app failures fall into a small number of patterns. The
-recipes below are the canonical diagnostic flow for an agent operating
-on behalf of a user.
-
-### 11.1 "Deployment succeeds but the app keeps restarting"
-
-Agents should NOT trust Coolify's deployment status alone. A successful
-build + healthcheck-pending response usually means the containers came
-up but the app logic is crashing. Investigate with:
-
-1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
-   services indicate containers never ran) and per-service stderr.
-2. If the logs show repeated DB errors like `relation "xxx" does not
-   exist` or `pq: no such table`, the app skipped its migration step.
-   This is common for Docker Compose apps that run migrations only from
-   a separate `worker` command, not from the `server` service itself.
-3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
-
-   ```json
-   {
-     "action": "apps.exec",
-     "params": {
-       "uuid": "<app-uuid>",
-       "service": "server",
-       "command": "yarn command:prod database:migrate:prod",
-       "timeout_ms": 300000
-     }
-   }
-   ```
-
-4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
-   wait for the next restart) and verify the container reports
-   `healthy`.
-
-### 11.2 "`apps.update` returned success but nothing changed"
-
-Check the `applied` / `ignored` / `rerouted` arrays in the response.
-The most common reroutes:
-
- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
-  the workspace's Gitea PAT embedded).
- `build_pack` — changing this mid-life for an existing app is not
-  supported. Recreate the app.
-
-### 11.3 "Compose app is up but the domain 502s"
-
-Coolify's API treats compose and single-container apps differently:
-compose apps use `docker_compose_domains` (array of `{name, domain}`),
-single-container apps use `domains` (comma-separated string).
-`apps.domains.set` handles both, but if you're seeing a 502:
-
-1. `apps.domains.list { uuid }` — confirm the domain is actually
-   attached to a **service** (not just the app).
-2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
-   — verify the upstream container is listening.
-3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
-   startup errors like `EADDRINUSE` or config failures.
-
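The two domain shapes described in 11.3 can be made concrete with a small payload builder. The field names `docker_compose_domains`, `name`, `domain`, and `domains` come from the text above; the function name, the type names, and the example host are hypothetical, and the sketch does not claim to match what `apps.domains.set` does internally.

```typescript
// Shape used by compose apps: one entry per service.
interface ComposeDomain {
  name: string;   // compose service name, e.g. "server"
  domain: string; // host or URL to attach to that service
}

type DomainPayload =
  | { docker_compose_domains: ComposeDomain[] }
  | { domains: string };

// Build the Coolify-style payload for either kind of app.
function buildDomainPayload(
  isCompose: boolean,
  service: string,
  hosts: string[]
): DomainPayload {
  if (isCompose) {
    // Compose apps: an array of { name, domain } objects.
    return {
      docker_compose_domains: hosts.map((h) => ({ name: service, domain: h })),
    };
  }
  // Single-container apps: one comma-separated string.
  return { domains: hosts.join(",") };
}
```

Seeing a `domains` string on a compose app (or the array on a single-container app) is a quick hint that the domain was attached to the wrong level.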
-### 11.4 "Choosing the right `apps.create` pathway"
-
-| Situation | Use |
-|---|---|
-| User's own code lives in their Gitea org | `repo` (pathway 1) |
-| Single-container third-party app (nginx, redis, a docker image) | `image` (pathway 2) |
-| Custom multi-service stack (no upstream template exists) | `composeRaw` (pathway 3) |
-| **Popular third-party app (Twenty, n8n, Supabase, Ghost, WordPress, …)** | **`template` (pathway 4) — strongly preferred** |
-
-**Always check `apps.templates.search { query: "<app name>" }` first.** Coolify ships 320+ vetted one-click templates. Each one has tested env defaults, healthchecks, `depends_on` graphs, and the right volume mounts. The same app deployed via `composeRaw` will hit application-specific quirks (URL validation, DB bootstrap order, secret generation) that the template author already solved.
-
-**Never** create a Gitea repo just to host a third-party app's compose file.
-
-**Recipe — deploying any popular app in 3 calls:**
-
-```json
-// 1. Find the right template slug
-{ "action": "apps.templates.search", "params": { "query": "twenty" } }
-// → { "items": [{ "slug": "twenty", "slogan": "Twenty is a CRM…", "tags": ["crm","self-hosted"], "port": 3000 }] }
-
-// 2. Deploy it
-{ "action": "apps.create", "params": { "template": "twenty", "name": "crm" } }
-// → { "uuid": "...", "domain": "crm.<slug>.vibnai.com", "started": true,
-//      "note": "First boot may take 1-5 min while Coolify pulls images and runs migrations." }
-
-// 3. Watch it come up
-{ "action": "apps.logs", "params": { "uuid": "...", "lines": 200 } }
-```
-
-For `composeRaw` (only when no template exists), fetch the app's official `docker-compose.yml` (from GitHub or Docker Hub) and pass it inline. Replace floating image tags such as `latest` with pinned versions for reproducibility.
-
-**Browsing the catalog** with `apps.templates.list { tag: "ai" }` returns all AI/ML templates; `{ tag: "crm" }` returns CRMs; etc. Useful when the user asks "what self-hosted analytics tools can I deploy?" or similar open-ended questions.
-
-### 11.5 "Compose app fails on second+ deploy — relation/table does not exist"
-
-Classic stale volume problem. Sequence of events:
-1. First deploy: Postgres starts and auto-creates an empty `default` database (from `POSTGRES_DB` env var)
-2. App server starts, tries to `CREATE DATABASE` or `DROP DATABASE` inside a transaction → Postgres rejects it
-3. Deploy fails, containers stop — but the volume persists with the half-initialized DB
-4. Second deploy: Postgres finds existing data, skips init — but schema is corrupt/incomplete
-5. Server errors cascade forever
-
-**Fix:**
-
-```json
-// Step 1: find the volume
-{ "action": "apps.volumes.list", "params": { "uuid": "<app-uuid>" } }
-// → { "volumes": [{ "name": "abc123_db-data", "sizeBytes": 8192 }] }
-
-// Step 2: wipe it
-{ "action": "apps.volumes.wipe", "params": { "uuid": "<app-uuid>", "volume": "abc123_db-data", "confirm": "abc123_db-data" } }
-
-// Step 3: redeploy clean
-{ "action": "apps.deploy", "params": { "uuid": "<app-uuid>" } }
-```
-
-If Postgres still auto-creates the database before the app server runs migrations, use `apps.exec` to drop it outside a transaction:
-
-```json
-{ "action": "apps.exec", "params": { "uuid": "<app-uuid>", "service": "db", "command": "psql -U postgres -c 'DROP DATABASE IF EXISTS \"default\";'" } }
-```
-
-Then redeploy.
-
-### 11.7 "Healthcheck times out on first deploy"
-
-Docker Compose healthchecks have a `start_period` grace window. Apps
-that run long-running migrations on first boot (Twenty, Directus,
-older Strapi versions) need a `start_period` that covers the cold
-start, typically 120–600s.
-
- Fix at the compose level: edit the repo's `docker-compose.yml` to
-  set `healthcheck.start_period: 300s` on the affected service, commit,
-  push, `apps.deploy`.
- Alternatively, handle migrations out-of-band via `apps.exec` and let
-  the default healthcheck succeed instantly.
-
-### 11.8 "I can't tell what's inside the container"
-
-`apps.exec` is the escape hatch. Useful shell one-liners:
-
-| Goal | Command |
-|---|---|
-| List running processes | `ps -ef` |
-| Show env vars | `env \| sort` |
-| Check file exists | `ls -la /path/to/file` |
-| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` |
-| Tail an app's internal log | `tail -200 /var/log/app.log` |
-| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
-| Inspect filesystem diff vs image | `find /app -newer /tmp/marker -type f 2>/dev/null` |
-
-Output is capped at 1 MB by default (bump with `max_bytes`). Commands
-that could exceed the wall-clock timeout should bump `timeout_ms`
-(max 600000 = 10 minutes).
-
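Since `timeout_ms` has a documented ceiling of 600000, an agent can clamp the value before calling `apps.exec` rather than discover the limit from an error. A minimal sketch follows; `buildExecParams` and `ExecParams` are hypothetical names, and the parameter keys are taken from the `apps.exec` examples earlier in this document.

```typescript
// Documented ceiling for apps.exec: 600000 ms = 10 minutes.
const MAX_TIMEOUT_MS = 600000;

interface ExecParams {
  uuid: string;
  service: string;
  command: string;
  timeout_ms?: number;
}

// Build apps.exec params, clamping any requested timeout into [1, MAX_TIMEOUT_MS].
// When no timeout is given the field is omitted so the server default applies.
function buildExecParams(
  uuid: string,
  service: string,
  command: string,
  timeoutMs?: number
): ExecParams {
  const params: ExecParams = { uuid, service, command };
  if (timeoutMs !== undefined) {
    params.timeout_ms = Math.min(Math.max(1, Math.floor(timeoutMs)), MAX_TIMEOUT_MS);
  }
  return params;
}
```

A request for 15 minutes (`900000`) would be sent as `600000`, which is the longest run the tool allows.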
-### 11.9 "The agent wants to run something interactively"
-
-It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
-no session resumption. For migrations and CLI invocations this is the
-right shape. For genuinely interactive work (a debug shell), the
-operator needs SSH + `docker exec -it` directly — outside the
-platform's AI surface.
-
---
-
-## 12. Where to look in the code
-
- `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
- `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
- `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
- `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
- `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
-  docker access (`apps.logs`, `apps.exec`). Uses a dedicated
-  `vibn-logs` user on the Coolify host with docker-group membership
-  and no shell.
- `lib/coolify-containers.ts` — container enumeration + service
-  resolution, shared between logs and exec paths.
- `lib/coolify-logs.ts` — compose-aware log tailing.
- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
-  timeout, output caps, and audit logging.
- `lib/naming.ts` — domain policy, slugify, SSH URL templates.
- `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
- `app/api/workspaces/[slug]/…` — REST surface.
- `app/api/mcp/route.ts` — MCP dispatcher and tool implementations.
- `components/workspace/WorkspaceKeysPanel.tsx` — settings UI.
+# Vibn AI Capabilities (Condensed)
+
+> **Note:** The definitive, ground-truth list of AI capabilities and instructions is maintained in the codebase at `vibn-frontend/lib/ai/vibn-tools.ts`. 
+
+## Core Architecture
+Vibn uses an MCP (Model Context Protocol) adapter to expose backend systems to the AI.
+The primary systems are:
+1. **Coolify:** For orchestrating Docker containers, PostgreSQL databases, reverse proxies (Traefik), and deploying third-party apps.
+2. **Gitea:** For hosting source code and managing repositories.
+3. **Dev Containers:** Persistent, per-project Docker environments (`vibn-dev-*`) where the AI can read, write, and execute code interactively before shipping.
+
+## Tool Categories
+- **Workspace & Identity:** Retrieve Gitea credentials and workspace metadata.
+- **Projects & Planning:** Create projects, read/write objective documents (`plan_vision_set`), manage tasks, log decisions.
+- **File System (`fs_*`):** Read, write, edit (with line-number granularity), grep, and tree codebase directories.
+- **Shell (`shell_exec`):** Run terminal commands inside the dev container (e.g. `npm install`).
+- **Dev Servers (`dev_server_*`):** Spin up background processes (like `npm run dev`), view their logs, and return live Preview URLs (`*.preview.vibnai.com`) backed by Traefik.
+- **Apps & Databases:** Create, list, configure, and delete Coolify applications and databases.
+- **Domains & Auth:** Manage DNS records via OpenSRS and deploy auth providers (NextAuth, Supabase, etc).
+- **GitHub & Web (`github_*`, `http_fetch`):** Source open-source reference material, read documentation, and import repositories.
+
+*Refer to the system prompt in `vibn-frontend/app/api/chat/route.ts` for exact rules on how the AI should behave.*
--- a/BETA_LAUNCH_PLAN.md
+++ b/BETA_LAUNCH_PLAN.md
@@ -73,14 +73,6 @@ a slow loop until this lands.

 | # | Task | Owner | Effort | Status |
 |---|---|---|---|---|
-| 1.1 | Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
-| 1.2 | Switch Namecheap nameservers to Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
-| 1.3 | Wait for propagation; verify `dig @1.1.1.1` from multiple resolvers | AI | 30–120 min | ✓ done — `34.19.250.135` from CF + Google resolvers |
-| 1.4 | Generate Cloudflare API token (DNS edit, `vibnai.com` only) | Mark | 2 min | ✓ done — stored in `.coolify.env` |
-| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy` |
-| 1.6 | Test wildcard cert issues for `*.preview.vibnai.com` (curl, browser) | AI | 10 min | ✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert |
-| 1.7 | Wire `dev_server.start` to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done — pre-baked labels for ports 3000–3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns` |
-| 1.8 | Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container | AI | 30 min | ✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below. |

 **Definition of done:** ✅ AI says "open a Vite dev server", user clicks the URL,
 sees Vite's welcome page, edits a file via `fs.edit`, change appears in
@@ -111,13 +103,6 @@ server: {
 |---|---|---|---|---|
 | 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1–2 hrs | Likely a server action / API route returning twice |
 | 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1–2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |
-| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | ✓ done 2026-05-01 | `@sentry/nextjs` v10 wired in `vibn-frontend`. `instrumentation.ts` (server+edge), `instrumentation-client.ts` (browser w/ Session Replay free tier, all text masked), `app/global-error.tsx`, `next.config.ts` wrapped with `withSentryConfig`. `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN` in Coolify env, with matching `ARG` lines in `vibn-frontend/Dockerfile`. End-to-end verified via `/sentry-example-page` 2026-05-01: client + server errors capture, breadcrumbs work, **stack traces de-minify to real filenames** (`app/sentry-example-page/page.tsx:49`). |
-| 2.4 | Wire deployment-failed Coolify webhook → Slack/email | AI | ✓ done 2026-05-01 | Slack webhook wired into `slack_notification_settings` for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping — confirmed in user's Slack. |
-| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`. |
-| 2.6 | Bake `HEALTHCHECK 127.0.0.1` into `vibn-frontend/Dockerfile` so future apps inherit | AI | ✓ done 2026-05-01 | Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap |
-| 2.7 | Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap | AI | ✓ done 2026-05-01 | Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added. |
-| 2.8 | **Tool-error recovery middleware** (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | `vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop. |
-| 2.9 | **Sentry-as-product loop** (SENTRY_AS_PRODUCT.md) — auto-provision per-project Sentry, bake into scaffolds, expose error feed to AI as MCP tools, auto-surface unresolved errors at chat-turn start | AI | ✓ done 2026-05-01 | All 4 stages shipped: (1) `lib/integrations/sentry.ts` provisions per-project Sentry under shared `vibnai` org from `POST /api/projects/create` and lazily on `apps.create`; injects `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_AUTH_TOKEN` into Coolify app env. (2) `lib/scaffold/sentry-snippets.ts` ships canonical Next.js + Vite snippets; AI system prompt instructs it to wire Sentry on every new app; `projects.get` returns `sentry: {slug, dsn}`. (3) Three MCP tools: `project_recent_errors`, `project_error_detail`, `project_error_resolve` (tenant-safe). (4) `app/api/chat/route.ts` injects `[PROJECT HEALTH]` block at chat-turn start when ≥2-occurrence unresolved issues exist in last 6h. End-to-end verification deferred to smoke test (4.1). |

 **Definition of done:** force-fail a route in staging → Sentry alert lands in
 < 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
@@ -136,13 +121,9 @@ or gets out of the way. No screens that exist "to teach the data model".
 | 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews" | AI | 4 hrs | Mark flagged earlier |
 | 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
 | 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
-| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | ✓ done 2026-05-01 | `components/project/project-header-urls.tsx`: bumped MAX_VISIBLE to 3, replaced title-tooltip with click-to-open popover (closes on outside-click + Escape). Each row in the popover is a real clickable link with icon + label + host. |
-| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | `components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `<COOLIFY_URL>/project/<coolifyProjectUuid>` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.) |
 | 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
 | 3.7 | **Scope-doc upload in Plan tab** — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat. |
 | 3.8 | **"Stop at something tangible" — three layers** | AI | partially done | Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof. |
-| 3.8a | System-prompt rule: dedicated "Stop at something the user can see" section + tightened build-me-X recipe so `previewUrl` is the explicit stopping point | AI | ✓ done 2026-05-04 | `app/api/chat/route.ts` `buildSystemPrompt`. For multi-service stacks, instructs AI to start the user-facing service first even if other services aren't done. |
-| 3.8b | ~~Persistent quick-action chips above the chat input~~ **REVERTED 2026-05-04** | AI | reverted | Tried it; pulled it. The chip menu was prescriptive ("here's what to type") which conflicts with the principle that the AI should drive toward the goal without presenting the user a menu of homework. Welcome-screen suggested prompts kept (different context — empty conversation, user genuinely needs a starting nudge). The `sendMessage(override)` refactor + welcome-screen auto-send shipped from this work survived; only the composer chip row was removed. |
 | 3.8c | Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker | AI | 1 hr | Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message. |

 **Definition of done:** a stranger lands on every tab in turn. None of them
@@ -160,10 +141,8 @@ concrete next action.
 |---|---|---|---|---|
 | 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. **Runbook below.** |
 | 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
-| 4.3 | "Delete project" UI in project settings (and underlying Coolify cleanup) | AI | ✓ done 2026-05-04 | `app/api/projects/delete/route.ts` now cascades: stops + deletes the dev container service (with volumes + docker-cleanup), deletes every linked Coolify resource via `fs_project_resources`, deletes the per-project Coolify project shell when no other Vibn project shares it, drops `fs_project_dev_containers` + `fs_project_resources` rows, unlinks `fs_sessions`, then deletes `fs_projects`. Gitea repo + Sentry project are deliberately preserved (returned in the response so the user can recover code/error history). Failure inside cascade is logged but doesn't abort; partial failure leaves the orphan in Coolify for manual cleanup, which is strictly better than rolling back to a half-state. Smoke test 2026-05-04 found 2 ghost containers from previously-deleted projects consuming the user's full quota; cleaned up manually + shipped this fix to prevent recurrence. |
 | 4.4 | "Delete workspace" UI — same | AI | 1 hr | |
 | 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
-| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error | AI | ✓ done 2026-05-01 | `lib/quotas.ts`: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via `VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE` / `VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE` env. Hits return HTTP 402 with structured payload; AI's error-recovery middleware has a `workspace-quota-exceeded` rule that explains the cap to the user without blind retries. Wired into `POST /api/projects/create` and `lib/dev-container.ts` ensure/resume paths. |
 | 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
 | 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |

@@ -179,13 +158,11 @@ that aren't covered above.

 | # | Task | Owner | Effort | Notes |
 |---|---|---|---|---|
-| 5.1 | Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`) | AI | ✓ done 2026-05-01 | Image `vibn-dev:latest` built 2026-04-30 on Coolify host (589 MB, last Dockerfile change Apr 28 so build is current). Smoke-tested as `vibn` user: ripgrep, git, mise all functional. Toolchains install on demand via mise. |
 | 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
 | 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
 | 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof Path B works |
 | 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
 | 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
-| 5.7 | **Persistent dev container ↔ Gitea wiring** — auto-clone repo into `/workspace/<slug>/` on first chat turn; auto-commit + push at end of every turn so AI work surfaces in the Product tab without manual `gitea_*` calls | AI | ✓ done 2026-05-04 | `lib/dev-container-git.ts` (`ensureProjectRepoCloned`, `commitAndPushIfDirty`) wired into `app/api/chat/route.ts` pre-loop + turn-end. Tri-state probe (`git` / `dir` / `absent`) so projects with files-but-no-git auto-heal on next turn. Production fix shipped today: `GITEA_USERNAME` was missing from prod env so `isGiteaConfigured()` silently no-op'd; added the env value AND a defensive fallback to `GITEA_ADMIN_USER` in code. Backfilled `vibn-mark/manifest` repo manually from the dev container after the env fix. Smoke-tested by inspecting `/workspace/manifest/` over SSH bridge — 64 tracked files pushed, all 6 phase directories present. |

 **Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
 vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
--- a/docs/AGENT_TELEMETRY_STREAMING_PROJECT.md
+++ b/docs/AGENT_TELEMETRY_STREAMING_PROJECT.md
@@ -1,292 +1,5 @@
-# Agent telemetry & live execution stream — project spec
+# Agent Telemetry Streaming (Historical)

-This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agent’s clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
+> **Note:** This historical spec covered the implementation of real-time streaming for the AI agent loop (Server-Sent Events) and timeline rendering.

---
-
-## 1. Why this exists
-
-### Current behavior (baseline)
-
-| Surface | How progress reaches the user | Limits |
-|--------|------------------------------|--------|
-| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
-| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
-| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
-
-### Product intent
-
- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status.
- **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.
-
---
-
-## 2. Goals
-
-1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
-2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel).
-3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
-4. **Single conceptual model** (`AgentEvent`) usable by:
-   - Build → **Agent** tab (sessions),
-   - **Job** flows (create/analyze-style),
-   - optionally **orchestrator** long runs later.
-5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.
-
-### Non-goals (for v1)
-
- Full **OpenTelemetry** export (optional later).
- **Real-time collaborative** multi-user cursors on the same session.
- Merging **claude-code-fork**—this spec is **API + UI + persistence** only.
-
---
-
-## 3. Concept: `AgentEvent`
-
-### Core shape (suggested)
-
-```ts
-type AgentEvent = {
-  seq: number;           // monotonic per stream (session_id or job_id)
-  ts: string;            // ISO-8601
-  runId: string;         // session UUID or job id — ties events to a run
-  runKind: 'session' | 'job';
-  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';
-
-  type: AgentEventType;
-  payload: Record<string, unknown>;  // type-specific
-};
-
-type AgentEventType =
-  | 'run.started'
-  | 'run.phase'              // e.g. planning, executing, committing
-  | 'llm.turn.start'
-  | 'llm.turn.end'
-  | 'tool.start'
-  | 'tool.end'
-  | 'tool.output'            // chunked stdout/stderr if needed
-  | 'safety.block'           // policy / protected path / command denied
-  | 'file.changed'           // maps to today’s changed_files semantics
-  | 'git.commit'
-  | 'deploy.triggered'
-  | 'deploy.status'
-  | 'error'
-  | 'run.completed'
-  | 'handoff'                // v2: parent → child agent
-  | 'child_job.started'      // v2: linked run id
-  ;
-```
-
-### Mapping from today’s session `outputLine`
-
-| Today (`outputLine.type`) | Suggested event(s) |
-|---------------------------|--------------------|
-| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
-| `stdout` / `stderr` | `tool.output` or dedicated stream events |
-| `error` | `error` + optional `safety.block` if policy-driven |
-| `done` | `run.completed` |
-
-Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.
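The mapping table above can be sketched as a small translation function. This is a minimal sketch, not the runner's actual code: the `OutputLine` shape (a `type` plus a `text` field) and the `stream` payload field are assumptions made for illustration; only the `type` values and target event names come from the table.

```ts
// Hypothetical legacy line shape; only the `type` values come from the table.
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string;
};

type MappedEvent = {
  type: 'run.phase' | 'tool.output' | 'error' | 'run.completed';
  payload: { message: string; stream?: 'stdout' | 'stderr' };
};

// Translate one legacy output line into the suggested event type.
function mapOutputLine(line: OutputLine): MappedEvent {
  switch (line.type) {
    case 'step':
    case 'info':
      return { type: 'run.phase', payload: { message: line.text } };
    case 'stdout':
    case 'stderr':
      // Keep the originating stream so the UI can style stderr differently.
      return { type: 'tool.output', payload: { message: line.text, stream: line.type } };
    case 'error':
      return { type: 'error', payload: { message: line.text } };
    case 'done':
      return { type: 'run.completed', payload: { message: line.text } };
  }
}
```

A policy-driven `error` would additionally emit `safety.block`; that branch is left out here because the table marks it as optional.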
-
---
-
-## 4. Architecture (high level)
-
-```mermaid
-flowchart LR
-  subgraph runner [vibn-agent-runner]
-    RA[runSessionAgent / runAgent]
-    EMIT[emitAgentEvent]
-  end
-  subgraph api [vibn-frontend Next.js]
-    ING[POST internal ingest or PATCH extend]
-    DB[(Postgres agent_events)]
-    SSE[SSE GET /api/.../stream]
-  end
-  subgraph browser [Browser]
-    UI[Timeline + live log]
-  end
-  RA --> EMIT
-  EMIT -->|HTTPS + secret or mTLS| ING
-  ING --> DB
-  UI -->|EventSource| SSE
-  SSE --> DB
-```
-
-**Principles**
-
- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today’s session model).
- Alternatively, the runner could expose **SSE directly**, but that is usually worse for **auth**, **CORS**, and keeping the product on **one domain**. Prefer **Next as the SSE endpoint**, reading from the DB.
-
---
-
-## 5. Backend: `vibn-agent-runner`
-
-### 5.1 Emit from execution paths
-
-| Location | Action |
-|----------|--------|
-| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. |
-| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
-| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
-| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |
-
-### 5.2 Transport runner → Next
-
-**Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append):
-
- Headers: `x-agent-runner-secret` (same as today’s PATCH).
- Body: single event or small batch `{ events: AgentEvent[] }` with server-assigned `seq` to avoid races.
-
-**Option B:** Runner writes to **Redis/Postgres** directly. This couples the runner to DB credentials; only do this if the runner already runs inside the same trust zone as the database and holds its URL.
-
-### 5.3 Jobs store
-
- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
-
---
-
-## 6. Backend: `vibn-frontend` (Next.js)
-
-### 6.1 Persistence
-
-**New table (example): `agent_run_events`**
-
-| Column | Notes |
-|--------|--------|
-| `id` | UUID |
-| `run_id` | Session id or job id (text) |
-| `run_kind` | `'session' \| 'job'` |
-| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
-| `project_id` | Nullable for jobs if not scoped |
-| `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` |
-| `created_at` | default now() |
-
-Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`).
-
-**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition.
-
-### 6.2 SSE route (example contract)
-
- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
-  - Auth: session cookie / same as GET session (user must own project).
-  - Query: `?afterSeq=123` for replay.
-  - Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
-  - Heartbeat comments every ~15–30s to keep proxies alive.
-
-For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.
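The wire format described in the SSE contract can be illustrated with a few pure helpers. This is a sketch under assumptions: the `StreamedEvent` shape and the function names are hypothetical, and a real route would write these strings to a streaming response rather than return them.

```ts
// Hypothetical event shape for illustration.
type StreamedEvent = { seq: number; type: string; payload: Record<string, unknown> };

// One SSE message: a single `data:` line followed by a blank line.
// JSON.stringify never emits raw newlines, so one data line is enough.
function formatSseMessage(event: StreamedEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// SSE comment line (starts with ':'); clients ignore it, proxies see traffic.
function formatHeartbeat(): string {
  return ': heartbeat\n\n';
}

// Replay filter for `?afterSeq=N`: keep strictly newer events, in seq order.
function eventsAfter(events: StreamedEvent[], afterSeq: number): StreamedEvent[] {
  return events.filter((e) => e.seq > afterSeq).sort((a, b) => a.seq - b.seq);
}
```

Sending the heartbeat as a comment rather than a `data:` message keeps it out of the client's event handlers, so the timeline never has to filter it.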
-
-### 6.3 Ingest route (runner-only)
-
- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
- Validates `x-agent-runner-secret`.
- Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`).
-
---
-
-## 7. Frontend (product UI)
-
-### 7.1 Agent tab — timeline
-
- **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
- **Timeline components**:
-  - Group by `llm.turn` / `tool.start`–`tool.end`.
-  - Expandable tool args (sanitized).
-  - Distinct styling for `safety.block` and `error`.
- **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event.
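The reconnect rule above can be sketched as a client-side reducer that tracks the last sequence number it has seen. The state shape and helper names are assumptions for illustration; in the browser, `reconnectUrl` would feed a new `EventSource`.

```ts
// Hypothetical client-side timeline state.
type TimelineEvent = { seq: number; type: string };
type TimelineState = { lastSeq: number; events: TimelineEvent[] };

const emptyTimeline: TimelineState = { lastSeq: 0, events: [] };

// Append an event unless it was already applied (e.g. replayed after a reconnect).
function applyEvent(state: TimelineState, event: TimelineEvent): TimelineState {
  if (event.seq <= state.lastSeq) return state;
  return { lastSeq: event.seq, events: [...state.events, event] };
}

// URL to reopen the stream from the last event the client actually received.
function reconnectUrl(streamUrl: string, state: TimelineState): string {
  return `${streamUrl}?afterSeq=${state.lastSeq}`;
}
```

Because the reducer ignores anything at or below `lastSeq`, an overlapping replay after reconnect is harmless: the server can resend generously and the timeline stays free of duplicates.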
-
-### 7.2 Jobs / analyze flows
-
- Same timeline component keyed by `jobId` if you surface those runs in UI.
- Unifies mental model: “every run has a stream.”
-
-### 7.3 Deprecate slow polling
-
- Poll `GET …/agent/sessions/[id]` less often while SSE is connected; keep a **single poll** for `status` / `changed_files` if those stay on the session row only, or **also** emit `file.changed` events and drive the UI from the stream plus one final consistency read.
-
---
-
-## 8. Security & privacy
-
- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
- **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior.
- **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.
-
---
-
-## 9. Environment variables
-
-| Variable | Where | Purpose |
-|----------|--------|---------|
-| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
-| `VIBN_API_URL` | Runner | Base URL for callbacks |
-| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |
-
-Add if needed:
-
-| Variable | Purpose |
-|----------|---------|
-| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
-| `SSE_MAX_BUFFER` | Cap replay batch size |
-
---
-
-## 10. Phased roadmap (suggested)
-
-### Phase 1 — Foundation
-
- [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend.
- [ ] Create `agent_run_events` (or equivalent) + migration.
- [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
- [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks.
-
-### Phase 2 — Push
-
- [ ] SSE route + **EventSource** in Agent tab.
- [ ] Backfill UI from DB on mount; then live tail.
- [ ] Lower or gate polling on `GET` session.
-
-### Phase 3 — Jobs + durability
-
- [ ] Emit same events from **job** execution path; persist by `jobId`.
- [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later).
-
-### Phase 4 — Rich semantics
-
- [ ] `safety.block` from policy layer.
- [ ] `deploy.*` events if Coolify integration is user-visible.
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.
-
---
-
-## 11. Success metrics
-
- Time-to-first-visible-step after **Run** < **1s** p95 (SSE).
- After hard refresh mid-run, user sees **consistent history** (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
-
---
-
-## 12. Related code (repo anchors)
-
-Use these when implementing:
-
- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`
-
---
-
-## 13. Open decisions
-
-1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
-2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic.
-3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest.
-4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.
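One possible resolution of decisions 2 and 3 is a per-run counter plus dedupe on a runner-supplied `eventId`. The in-memory class below is only a sketch of that behavior; its names are hypothetical, and a real implementation would rely on Postgres constraints rather than a `Set` and a `Map`.

```ts
// Hypothetical ingest payload: the runner supplies eventId, the server assigns seq.
type IncomingEvent = { eventId: string; runId: string; type: string };
type StoredEvent = IncomingEvent & { seq: number };

class InMemoryEventLog {
  private readonly seen = new Set<string>();
  private readonly nextSeq = new Map<string, number>();
  readonly rows: StoredEvent[] = [];

  // Returns the stored row, or null when this eventId was already ingested.
  append(event: IncomingEvent): StoredEvent | null {
    if (this.seen.has(event.eventId)) return null; // runner retry: drop silently
    this.seen.add(event.eventId);
    const seq = this.nextSeq.get(event.runId) ?? 1; // per-run sequence starts at 1
    this.nextSeq.set(event.runId, seq + 1);
    const row: StoredEvent = { ...event, seq };
    this.rows.push(row);
    return row;
  }
}
```

Assigning `seq` only after the dedupe check means a retried event never consumes a sequence number, so each run's history stays gap-free.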
-
---
-
-*Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*
+The streaming system is fully implemented in `app/api/chat/route.ts` and rendered in the frontend via `Timeline`, `ThinkingBubble`, and `TimelineToolGroup` components inside `chat-panel.tsx`.
--- a/docs/AI_CAPABILITIES_ROADMAP.md
+++ b/docs/AI_CAPABILITIES_ROADMAP.md
@@ -1,673 +1,5 @@
-# Vibn AI Capability Roadmap
+# AI Capabilities Roadmap (Historical)

-> **⚠ See also:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
-> — proposed pivot to a Claude-Code-style persistent dev container per
-> project. Once approved, that doc supersedes any "code authoring" item
-> in this roadmap; this file remains the source of truth for
-> infrastructure primitives (P5.x, P6.x, P7.x).
->
-> The ordered plan for closing the gap between what the Vibn agent can do
-> today and what it needs to do for a real customer to ship, operate, and
-> scale a SaaS through it.
->
-> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
->
-> **Prioritization framing:**
-> 1. Does it unblock *shipping a real product* (not a demo)?
-> 2. Does it unblock *surviving past the first paying customer*?
-> 3. Does it only matter once usage scales?
->
-> Tier 1 = (1). Tier 2 = (2). Tier 3 = (3). Tier 4 = revisit when demanded.
->
-> **Sequencing rule:** complete Tier 1 before any Tier 2 item. The trap
-> is polishing safety rails (audit, scopes, quotas) before the product is
-> actually shippable.
+> **Note:** This is a historical roadmap document. Most of the core Path B capabilities (persistent dev containers, Gitea mirroring, Traefik wildcard proxies) have been successfully shipped.

---
-
-## 0. Substrate & constraints
-
-Vibn runs on a two-cloud substrate, constrained to Canadian data residency:
-
-| Layer | Provider | Region | Purpose |
-|---|---|---|---|
-| **App hosting** | Coolify (self-managed) | Montreal VPS | All app / database / auth containers. Current state. |
-| **Managed services** | **Google Cloud** | `northamerica-northeast1` (Montreal) | Object storage, cron, queues, logs, backups, monitoring, secrets. |
-| **Domain registration** | OpenSRS (Tucows) | Toronto | Wholesale domain API. Canadian company, pre-funded float account. |
-| **Authoritative DNS** | Cloud DNS (default) / CIRA D-Zone (strict) | Global anycast / Canadian | Managed DNS for workspace-owned domains. |
-| **Transactional email** | Amazon SES | `ca-central-1` (Montreal) | No GCP equivalent; AWS's Canadian region keeps data in-country. |
-
-**Absolute rule: no customer data leaves Canada.** Every workspace-owned
-resource (storage bucket, database, log bucket, task queue, scheduler
-job, email message body) must be pinned to a Canadian region.
-
-### Why mix clouds?
- **Coolify stays** because we already built the workspace-scoped
-  provisioning around it (Phase 4). Migrating apps to Cloud Run is a
-  rewrite we don't need.
- **GCP-CA** fills every managed-service gap Coolify has. Cheaper and
-  more reliable than self-hosting MinIO/Loki/scheduler.
- **AWS SES for email** because GCP has no first-party transactional
-  email service and SES `ca-central-1` is the only credible
-  Canadian-resident managed option.
- **OpenSRS for domains** because it's the wholesale API behind most
-  Canadian registrars, and we already have the deposit.
-
-### Compliance upgrade path (Tier 4 territory)
-For regulated customers (healthcare, financial, public sector):
- **Assured Workloads for Canada** on GCP — enforces Canadian personnel
-  access + data residency contractually.
- **CIRA D-Zone** instead of Cloud DNS — first-party Canadian managed DNS.
- Keep the SES and OpenSRS pieces as-is (already Canadian-resident).
-
-Document the caveat on a public trust page. Build the Assured-Workloads
-variant when a real customer asks.
-
---
-
-## Current state (Phase 4 + P5.1 verified, Apr 2026)
-
- Workspace tenancy: Gitea org + Coolify project + SSH deploy key per
-  workspace.
- Agent can: create repos, create apps, provision 8 database flavors,
-  deploy 8 vetted auth providers, manage env vars, deploy + poll,
-  update, delete (with `?confirm=<name>`), set domains under
-  `*.{slug}.vibnai.com`.
- Control-plane MCP: 24 tools + full REST surface at `/api/mcp`.
-  API-key scoped per workspace.
- **P5.1 custom apex domains** — OpenSRS + Cloud DNS + Coolify
-  lifecycle (search / register / attach / inspect) shipped and
-  verified end-to-end against PROD GCP + OpenSRS sandbox + PROD
-  Coolify on `v4.0.0-beta.473` (2026-04-22). All 5 sub-systems green
-  in `smoke-attach-e2e.ts`: register → zone → A records → registrar
-  NS update → Coolify `fqdn` patch → cleanup. Required a server-side
-  config fix on `coolify-server-mtl` (proxy.type=TRAEFIK,
-  is_build_server=false) so `Server::isProxyShouldRun()` returns
-  true and the controller maps `domains` → `fqdn` — see
-  [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) § 3.6 for the gory details.
- **Agent-runner stdio MCP bridge** — `vibn-agent-runner` now exposes
-  its full in-house toolkit (28 tools) outward over 5 stdio MCP
-  servers so external clients (Cursor, Claude Desktop, Goose) can
-  drive the same Coolify / Gitea / workspace / memory / search /
-  sub-agent surface as the internal Coder/PM/Marketing agents, with
-  shared protected-repo + protected-app guardrails. Every tool now
-  has a pure `*-api.ts` module, a registry wrapper for the in-process
-  loop, and an MCP server wrapper — single source of truth, verified
-  by `scripts/smoke-mcp.js`.
- Enforced: tenant isolation, domain policy, delete confirms,
-  secrets-at-rest encryption, protected-repo / protected-app guards.
-
-See [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (§ 3.6 for P5.1,
-§ 3.7 for the stdio MCP bridge) for the complete current surface.
-
---
-
-## Tier 1 — Blocks shipping a real product
-
-Without these, anything the agent builds is *demo-shaped*. Ship these
-next, in the recommended sequence below.
-
-### P5.1 · Custom apex domains via OpenSRS
-
-**Goal:** agent buys `mysaas.com` on the user's behalf and attaches it
-to a Coolify app with automatic TLS.
-
-**Why now:** you already opened an OpenSRS reseller account with a $100
-float. Unlocks real branding, DKIM for email (P5.2 depends on this),
-and gives you a revenue line (markup on domains).
-
-**Surface:**
-
-| Tool / endpoint | Purpose |
-|---|---|
-| `domains.search` | Live availability + suggestions via OpenSRS `lookup`. |
-| `domains.check_price` | Per-TLD price from OpenSRS + markup. |
-| `domains.register` | Debits workspace float, registers via OpenSRS. |
-| `domains.list` | Workspace's owned domains. |
-| `domains.renew` / `domains.transfer` | Lifecycle. |
-| `domains.{name}.attach` | Attach to a Coolify app: DNS records + Coolify `fqdn` + Let's Encrypt. |
-| `domains.{name}.detach` | Free a domain from an app, keep registration. |
-| `domains.{name}.attach_status` | Polls DNS propagation + cert issuance (async). |
-
-**Infra:**
- **OpenSRS client** (their XML/SOAP or REST API).
- **Cloud DNS** for zone management (default). CIRA D-Zone available as a
-  workspace-level preference for strict-residency customers.
- **Workspace float ledger** (`vibn_workspace_billing_float`) — a
-  prepaid balance in CAD, debited on register/renew. Reconciled nightly
-  against the OpenSRS master deposit.
- `VIBN_OPENSRS_DEPOSIT_ACCOUNT` as the master float handle.
-
-**New columns** on `vibn_workspaces`:
- `preferred_dns_provider TEXT DEFAULT 'cloud_dns'`
- `cloud_dns_zone_name TEXT`  ← GCP managed zone for this workspace.
-
-**Risks:**
- DNS propagation is human-scale (minutes–hours). Agents need the
-  async `attach_status` polling loop, not a sync call.
- Cert issuance via Let's Encrypt is rate-limited (50 certificates per
-  registered domain per week). Prevent abuse with per-workspace rate caps.
-
-**Estimate:** **2 weeks.**
-
---
-
-### P5.2 · Transactional email (AWS SES `ca-central-1`)
-
-**Goal:** auth providers can send password-reset emails; agents can
-`email.send` from `noreply@mysaas.com`.
-
-**Why now:** every auth provider on the allowlist is broken without
-SMTP. Also pairs with P5.1 — per-workspace sender domains need DKIM on
-domains you own.
-
-**Why SES ca-central-1 specifically:** GCP has no first-party
-transactional email service. All mainstream providers (Postmark,
-Resend, Mailgun, SendGrid) are US-primary. SES's Montreal region is the
-only credible managed option that keeps message bodies in Canada.
-
-**Two-phase rollout:**
-
-**Phase A — shared-sender MVP (1 week):**
- One SES-verified sender domain `mail.vibnai.com`.
- Every workspace can send from `noreply@mail.vibnai.com` out of the box.
- `email.send` tool + injected `SMTP_*` env vars.
- Bounce / complaint webhooks routed via SNS → a Cloud Run service
-  that writes per-workspace notifications.
-
-**Phase B — per-workspace sender domains (1 week, depends on P5.1):**
- `email.verify_sender_domain` creates the SPF/DKIM/DMARC records via
-  the Cloud DNS / CIRA D-Zone client on a workspace-owned domain.
- Polls SES verification; flips `verified=true` when done.
- Workspace can now `email.send from: founder@mysaas.com`.
-
-**Surface:**
-
-| Tool | Purpose |
-|---|---|
-| `email.send` | Single message; returns SES `message_id`. |
-| `email.send_batch` | Up to 100 at a time. |
-| `email.list_messages` | Recent sent mail + delivery state (from SES + our log). |
-| `email.verify_sender_domain` | Kick off DKIM for a workspace-owned domain. |
-| `email.sender_status` | Poll verification state. |
-| `email.webhooks.list` | Recent bounces/complaints. |
-
-**Infra:**
- SES identity per workspace-owned sender domain.
- SNS topic → Cloud Run webhook receiver (in `northamerica-northeast1`)
-  for bounce/complaint ingestion.
- Rate limits: start in SES sandbox (200/day), request production limits
-  after first real customer.
-
-**Estimate:** **2 weeks total** (1 week Phase A + 1 week Phase B).
-
---
-
-### P5.3 · Object storage (Google Cloud Storage, `northamerica-northeast1`)
-
-**Goal:** any SaaS the agent builds can take user uploads — avatars,
-attachments, exports, images — without the user pasting in third-party
-credentials.
-
-**Why now:** "can users upload a file?" is the #1 post-demo question.
-Blocks ~half of realistic SaaS ideas.
-
-**GCP collapses this item.** No MinIO container to babysit; GCS provides
-managed bucket + signed URLs + lifecycle policies + encryption out of
-the box.
-
-**Surface:**
-
-| Tool | Purpose |
-|---|---|
-| `storage.buckets.list` | Buckets in this workspace (filtered by `workspace={slug}` label). |
-| `storage.buckets.create` | New bucket. Optional `public_read`. Enforced region: `northamerica-northeast1`. |
-| `storage.buckets.delete` | Destroy bucket. `confirm` gate. |
-| `storage.presign_upload` | PUT URL, TTL, content-type constraint. |
-| `storage.presign_download` | GET URL, TTL. |
-| `storage.list_objects` | Pagination + prefix filter. |
-| `storage.delete_object` | Single object. |
-| `storage.set_lifecycle` | TTL delete, multipart cleanup, archive tiering. |
-
-**Provisioning additions:**
- Default bucket `vibn-ws-{slug}` created on workspace provision.
- Uniform bucket-level access enabled by default.
- Per-workspace GCP service account `vibn-ws-{slug}@...`, scoped to its
-  own bucket via `roles/storage.objectAdmin`.
- Keyfile stored encrypted (AES-256-GCM, same `VIBN_SECRETS_KEY`) in
-  `vibn_workspaces.gcp_service_account_key_encrypted`.
-
-**New columns** on `vibn_workspaces`:
- `gcs_bucket_name TEXT`
- `gcp_service_account_email TEXT`
- `gcp_service_account_key_encrypted BYTEA`
-
-**Env injection:**
- `STORAGE_ENDPOINT=https://storage.googleapis.com`
- `STORAGE_BUCKET={workspace-bucket-name}`
- `STORAGE_ACCESS_KEY`, `STORAGE_SECRET_KEY` (S3-compatible via GCS HMAC keys)
-  — auto-injected on app creation so agent code uses standard S3 SDKs.
-
-**Estimate:** **3 days.**
-
---
-
-### P5.4 · Workers, cron, and queues (Cloud Tasks + Cloud Scheduler + Cloud Run Jobs)
-
-**Goal:** agents can declare async workers, scheduled jobs, and queued
-tasks. Anything that isn't a single `ports: 3000` web container.
-
-**Why now:** webhooks, retries, nightly cleanup, image processing,
-email sending — every real SaaS needs a non-web process. Current
-workaround (second Coolify app) is brittle and manual.
-
-**Hybrid approach — Coolify for compute, GCP for orchestration:**
-
-Option evaluated and chosen:
- **Cloud Scheduler** (`northamerica-northeast1`) for cron: fires
-  HTTP webhooks into the app at the scheduled time.
- **Cloud Tasks** (`northamerica-northeast1`) for queue: agent code
-  calls `enqueue(task)`, Cloud Tasks dispatches to the app's worker
-  endpoint with retries, backoff, and at-least-once semantics.
- **Worker process** stays on Coolify as a second app-per-repo with a
-  different start command, exposed on an internal URL.
-
-Rejected alternative: migrate everything to Cloud Run Jobs. More managed
-but splits the "Live" view across two deploy targets and changes the
-agent's mental model. Not worth it for MVP.
-
-**Shape — extend `apps.create`:**
-
-```json
-{
-  "repo": "my-site",
-  "services": {
-    "web":    { "command": "npm start",      "ports": "3000" },
-    "worker": { "command": "npm run worker", "replicas": 2 }
-  },
-  "cron": [
-    { "name": "nightly-backup", "schedule": "0 3 * * *", "path": "/tasks/backup" },
-    { "name": "sync",           "schedule": "*/10 * * * *", "path": "/tasks/sync" }
-  ],
-  "queues": [
-    { "name": "emails" },
-    { "name": "image-processing" }
-  ]
-}
-```
-
-Internally creates: two Coolify apps (web + worker), N Cloud Scheduler
-jobs labeled `workspace={slug}`, N Cloud Tasks queues.
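The expansion from one `apps.create` request into underlying resources can be sketched as a pure planning step. This is an illustration under assumptions: the `${repo}-${service}` app naming and the `ResourcePlan` shape are hypothetical, and no Coolify or GCP calls are made.

```ts
// Mirrors the request shape shown in the JSON example.
type AppSpec = {
  repo: string;
  services: Record<string, { command: string; ports?: string; replicas?: number }>;
  cron: { name: string; schedule: string; path: string }[];
  queues: { name: string }[];
};

// Hypothetical plan: what would be created, not how.
type ResourcePlan = {
  coolifyApps: string[];
  schedulerJobs: { name: string; schedule: string; labels: { workspace: string } }[];
  taskQueues: string[];
};

function planResources(slug: string, spec: AppSpec): ResourcePlan {
  return {
    // One Coolify app per declared service (naming is an assumption).
    coolifyApps: Object.keys(spec.services).map((service) => `${spec.repo}-${service}`),
    // One Cloud Scheduler job per cron entry, labeled with the workspace slug.
    schedulerJobs: spec.cron.map((job) => ({
      name: job.name,
      schedule: job.schedule,
      labels: { workspace: slug },
    })),
    // One Cloud Tasks queue per declared queue.
    taskQueues: spec.queues.map((queue) => queue.name),
  };
}
```

Computing the plan before touching any provider makes it easy to show the agent what a request will create, and to validate it, before anything is provisioned.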
-
-**Surface additions:**
-
-| Tool | Purpose |
-|---|---|
-| `apps.services.list` | All processes in an app. |
-| `apps.services.update` | Scale replicas, change command. |
-| `apps.services.logs` | Per-process logs. |
-| `cron.list` | Scheduler jobs in this workspace. |
-| `cron.create` / `cron.update` / `cron.delete` | Manage scheduled jobs. |
-| `cron.run_now` | Fire a scheduled job immediately (useful for agent testing). |
-| `queues.list` | Cloud Tasks queues in this workspace. |
-| `queues.create` / `queues.delete` | Manage queues. |
-| `queues.enqueue` | (Normally called from app code, but exposed for agent-driven testing.) |
-| `queues.pause` / `queues.resume` | Emergency ops. |
-
-**New columns** on `vibn_workspaces`:
- `cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1'`
- `cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1'`
-
-**Auth to GCP:** per-workspace service account (provisioned in P5.3) is
-extended with `roles/cloudscheduler.admin` and `roles/cloudtasks.admin`
-*scoped to resources labeled `workspace={slug}`* via IAM conditions.
-Agents can only act on their own workspace's jobs/queues.
-
-**Estimate:** **1 week.**
-
---
-
-### Tier 1 total: ~5 weeks of focused work
-
-After Tier 1 lands, an agent can:
- Buy `mysaas.com`, point it at a Next.js app.
- Deploy Authentik with working password-reset emails from `noreply@mysaas.com`.
- Offer user uploads (avatars, attachments).
- Run `0 3 * * *` nightly cleanup cron.
- Process Stripe webhooks idempotently via a retry queue.
-
-That's a shippable SaaS. Everything after this is about *keeping* it
-shipped.
-
---
-
-## Tier 2 — Blocks surviving past the first real customer
-
-Once users exist, these prevent silent failures.
-
-### P6.1 · Database backups + restore (GCS + wal-g)
-
-**Goal:** nightly backups, on-demand backups, one-call restore. No
-"agent ran `DROP TABLE` in a migration" permanent data loss.
-
-**Why:** scariest item on this list. Failure mode is irrecoverable.
-
-**Shape:**
- `databases.{uuid}.backup` — on-demand `pg_dump` / `mongodump` to the
-  workspace's GCS bucket (depends on P5.3).
- `databases.{uuid}.backups.list` — lists backups with timestamp + size.
- `databases.{uuid}.backups.restore` — `confirm`-gated restore from a
-  specific backup uuid.
- Per-database backup policy: daily / hourly / off, retention days.
- Default: every AI-created database gets daily backups + 7-day
-  retention on.
-
-**Infra:**
- Cron jobs run via P5.4's Cloud Scheduler primitive.
- Stored at `gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz`.
- Lifecycle rules auto-delete backups older than retention.
- Object-level retention lock available for "immutable backups" on
-  request (Tier 3 feature).
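The storage layout and retention rule above can be sketched as two small helpers. The function names are hypothetical, and in practice GCS lifecycle rules would delete expired objects server-side; `isPastRetention` only illustrates the rule.

```ts
// Build the object URI following the documented layout:
// gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz
function backupObjectUri(slug: string, dbUuid: string, takenAt: Date): string {
  return `gs://vibn-ws-${slug}/backups/${dbUuid}/${takenAt.toISOString()}.sql.gz`;
}

// True when a backup is older than the retention window (default policy: 7 days).
function isPastRetention(takenAt: Date, retentionDays: number, now: Date): boolean {
  const ageMs = now.getTime() - takenAt.getTime();
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}
```

ISO-8601 timestamps sort lexicographically, so listing a database's prefix returns backups in chronological order without extra metadata.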
-
-**Upgrade path:**
- **Postgres point-in-time recovery** via `wal-g` shipping WAL segments
-  to the same GCS bucket. Adds RPO < 5 min.
- **ClickHouse**: `clickhouse-backup` to GCS.
- **MongoDB**: `mongodump` incremental.
-
-**Estimate:** **3 days** for MVP (pg_dump + schedule + restore).
-**+1 week** for wal-g PITR if/when a customer asks.
-
---
-
-### P6.2 · Runtime log streaming (Cloud Logging)
-
-**Goal:** agent can see "is the app erroring at 10 req/s right now?",
-not just "did the build succeed."
-
-**Why:** today deploy logs are surfaced but container stdout/stderr is
-not. An agent that "fixed a bug" can't verify the fix without a human
-SSH-ing into Coolify.
-
-**GCP collapses this item** — ship container logs to Cloud Logging with
-a workspace label, query via the logs API.
-
-**Shape:**
- Fluent-bit sidecar (or Coolify label) ships container stdout/stderr
-  to Cloud Logging in `northamerica-northeast1` with labels
-  `workspace={slug}`, `app={app-uuid}`, `service={web|worker|...}`.
- Per-workspace log bucket for retention isolation.
-
-**Surface:**
-
-| Tool | Purpose |
-|---|---|
-| `apps.logs` | Last N lines across replicas. Filter by timestamp, severity. |
-| `apps.logs.tail` | SSE stream of new log lines. |
-| `apps.logs.search` | Thin wrapper on Cloud Logging's query API — grep, severity filter, time window. |
-| `apps.services.logs` | Same, scoped to a single service. |
-
-**Retention:** default 30 days in the workspace log bucket; exportable
-to the workspace's GCS bucket on request for long-term storage.
-
-**Estimate:** **3 days** (fluent-bit config + thin API wrapper).
-
---
-
-### P6.3 · Scoped API keys
-
-**Goal:** invite a CI bot or teammate without giving root on the
-workspace.
-
-**Why:** solo-builder flow survives without it. Breaks the moment a
-second principal enters.
-
-**Shape:**
- Keys gain `scopes: string[]` and optional `expires_at`.
- Scope tokens: `apps:read`, `apps:write`, `apps:delete`,
-  `databases:*`, `auth:*`, `domains:read`, `domains:write`,
-  `storage:*`, `email:send`, `cron:*`, `queues:*`, `deploy:*`.
- Per-scope rate limits optional (Tier 3; API shape supports it from
-  day one).
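The scope check in the principal resolver could look roughly like the following. This is a sketch under assumptions: the `ApiKey` shape and function names are hypothetical, and the only wildcard handled is the `resource:*` form that appears in the scope list.

```ts
// Hypothetical key record; `expiresAt` is an optional ISO-8601 string.
type ApiKey = { scopes: string[]; expiresAt?: string };

// `databases:*` grants every action on `databases`; otherwise require an exact match.
function scopeAllows(granted: string, required: string): boolean {
  if (granted === required) return true;
  const [grantedResource, grantedAction] = granted.split(':');
  const [requiredResource] = required.split(':');
  return grantedAction === '*' && grantedResource === requiredResource;
}

// A key is usable when it has not expired and holds a matching scope.
function isAuthorized(key: ApiKey, required: string, now: Date): boolean {
  if (key.expiresAt !== undefined && new Date(key.expiresAt).getTime() <= now.getTime()) {
    return false;
  }
  return key.scopes.some((granted) => scopeAllows(granted, required));
}
```

Each handler would then declare one required scope, for example `apps:delete`, and call the check once after resolving the key.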
-
-**Surface changes:**
-
-| Tool | Change |
-|---|---|
-| `keys.create` | Accepts `scopes`, `expires_at`. |
-| `keys.list` | Returns scopes per key. |
-| `keys.rotate` | Mints new token, preserves scope set. |
-
-Every MCP/REST handler gets a scope requirement checked in the
-principal resolver.
-
-**Estimate:** **1 week.**
-
---
-
-### Tier 2 total: ~2 weeks
-
-After Tier 2 lands, a SaaS shipped on Vibn can survive without you
-dropping into a psql REPL at 3am.
-
---
-
-## Tier 3 — Matters once usage scales
-
-Don't build these until at least one real customer is hitting them.
-Building them pre-market is the classic infra-overinvestment trap.
-
-### P7.1 · Per-workspace quotas + cost caps
-Max apps, max dbs, max GCS GB, max egress, max SES messages/month, max
-OpenSRS spend/month. Per-plan configurable. Hallucinating agents can't
-OOM the cluster or burn your SES reputation.
-
-### P7.2 · Audit log
-Append-only per-workspace log of (principal, action, params, timestamp,
-result). Cloud Logging with a dedicated `audit-logs` log-bucket, 400-day
-retention. Read API for the settings panel. Needed for any
-SOC-2-adjacent buyer.
-
-### P7.3 · Preview-per-PR environments
-Open a PR → `pr-42.mark.vibnai.com` deploys automatically with a
-throw-away database. Teardown on PR close/merge. Unblocks multi-agent
-flows.
-
-### P7.4 · Atomic multi-resource operations (`stacks`)
-`POST /stacks` takes a full app + db + auth + domain + cron spec;
-creates atomically, rolls back on failure. Agent ergonomics win once
-demo flow is routine.
-
-### P7.5 · Billing integration
-Stripe subscriptions for Vibn itself (workspace billing), plus
-per-workspace float top-ups, plus reconciliation to the OpenSRS master
-deposit and GCP / SES cost allocation. Only needed when you charge
-real dollars.
-
-### P7.6 · Assured Workloads for Canada
-GCP policy-enforced Canadian residency + Canadian personnel access.
-For regulated customers (healthcare, financial, public sector). Priced
-accordingly; ship only when a real customer needs it.
-
-### P7.7 · CIRA D-Zone as a workspace DNS option
-Swap Cloud DNS → CIRA D-Zone for a workspace with strict residency
-requirements. API-compatible wrapper so nothing agent-facing changes.
-
---
-
-## Tier 4 — Revisit when demanded
-
-Items to explicitly *not* build until a concrete customer asks.
-
- **Multi-region** — single-region Canada is fine for B2B SaaS makers
-  (our early market).
- **Cloud Run migration** — would rewrite most of Coolify-based
-  capabilities. Revisit if/when Coolify becomes a bottleneck.
- **Managed search / vector DB as first-class types** — agents can
-  deploy Meilisearch / Typesense / pgvector-Postgres as regular services.
- **mTLS / custom CAs / BYO-cert upload** — enterprise creep.
- **MCP protocol polish** (streaming, resources, prompts, per-tool
-  schemas) — current JSON-over-HTTP works. Revisit on real friction.
- **Per-app basic auth, IP allowlists, WAF** — Traefik middleware
-  manually until someone asks.
-
---
-
-## Roadmap at a glance
-
-| Phase | Items | Est. | Unblocks |
-|---|---|---|---|
-| **P5 — Real SaaS primitives** | Domains, email, storage, workers/cron/queues | ~5 wk | Shipping a real product |
-| **P6 — Keep-it-running** | Backups, runtime logs, scoped keys | ~2 wk | First real customer survives |
-| **P7 — Scale** | Quotas, audit, previews, stacks, billing, Assured Workloads, D-Zone | demand-driven | Platform grows past 1st cohort |
-| **P8+** | Tier 4 items | never, unless pulled by customer | — |
-
-**Total to "agent ships a SaaS a founder would pay $29/mo for":**
-P5 + P6 = **~7 weeks** (was ~11 before GCP-CA; ~40% compression from
-managed-service leverage).
-
---
-
-## Dependency graph
-
-```
-P5.1 Domains ──┬──→ P5.2 Email Phase B (per-domain DKIM)
-               ├──→ P7.7 CIRA D-Zone swap
-               └──→ (future: customer-owned sub-domain routing)
-
-P5.3 Storage ──┬──→ P6.1 Database backups (backups need a bucket)
-               └──→ P7.2 Audit log export
-
-P5.4 Workers/cron/queues ──┬──→ P6.1 Database backups (run via scheduler)
-                           └──→ most real SaaS patterns
-
-P6.2 Runtime logs — independent, can land anytime
-P6.3 Scoped keys — independent, can land anytime
-P7.6 Assured Workloads — wraps everything; build once demanded
-```
-
-**Parallelizable (three people):**
- Track A: P5.1 → P5.2
- Track B: P5.3 → P6.1
- Track C: P5.4 → P6.2
-
-Track C finishes earliest; use that slack to land P6.3.
-
---
-
-## Per-workspace GCP provisioning (shared across P5.3, P5.4, P6.1, P6.2)
-
-`ensureWorkspaceProvisioned()` gains a GCP-CA block that runs once per
-workspace, idempotently. All resources are created in
-`northamerica-northeast1`.
-
-| Resource | Name pattern | Notes |
-|---|---|---|
-| GCS bucket | `vibn-ws-{slug}` | Uniform bucket-level access. Lifecycle policies off by default. |
-| Cloud DNS managed zone | `vibn-ws-{slug}-zone` | Created per workspace-owned domain in P5.1, not on workspace provision. |
-| Cloud Logging log bucket | `vibn-ws-{slug}-logs` | 30-day retention default. |
-| Cloud Tasks location | `northamerica-northeast1` | Queues created per-app in P5.4, not here. |
-| GCP service account | `vibn-ws-{slug}@{project}.iam` | Single SA per workspace, narrow roles. |
-| Service account key | stored encrypted in `vibn_workspaces` | AES-256-GCM, same `VIBN_SECRETS_KEY`. |
-
-**New columns** on `vibn_workspaces` (cumulative across P5.1-P6.2):
-
-```sql
-- P5.1
-preferred_dns_provider TEXT DEFAULT 'cloud_dns',
-cloud_dns_zone_name   TEXT,
-
-- P5.3
-gcs_bucket_name                   TEXT,
-gcp_service_account_email         TEXT,
-gcp_service_account_key_encrypted BYTEA,
-
-- P5.4
-cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1',
-cloud_tasks_location     TEXT DEFAULT 'northamerica-northeast1',
-
-- P6.2
-cloud_logging_bucket_name TEXT
-```
-
-Four migration steps, one per phase listed above. All guarded by the
-existing admin-gated `POST /api/admin/migrate` endpoint.
-
---
-
-## Non-goals (stated explicitly so they don't creep in)
-
- **A general-purpose PaaS.** Vibn is an agent-driven SaaS builder, not
-  a Heroku / Fly clone. Every capability must answer "what does an agent
-  need to build a SaaS?" — not "what does a dev need to deploy a
-  container?"
- **Support for non-allowlisted auth providers, databases, services.**
-  The curated surface is the feature. "Any Coolify service" would blow
-  up the tenant-safety model and dilute agent decision-making.
- **A consumer-facing OpenSRS UI.** OpenSRS is plumbing for the agent.
-  Humans should never see an OpenSRS checkout screen — only
-  `domains.register { name: "mysaas.com" }` from the agent.
- **Multi-cloud abstraction layer.** One Coolify cluster + GCP-CA +
-  SES-CA + OpenSRS is the contract. If customers want to bring their
-  own, that's Tier 4.
- **Anything that moves customer data out of Canada.** Even for
-  performance. If a managed service only has US regions, we self-host
-  in Canada or we don't offer it.
-
---
-
-## Recommended execution order (opinionated)
-
-Given dependencies and quick-wins-first philosophy:
-
-**Week 1:**
- P5.3 Storage (GCS wrap, 3 days) → proves the GCP-CA provisioning pattern.
- P5.4 Workers/cron/queues (starts in parallel; depends on P5.3 only for
-  the service account).
-
-**Week 2:**
- P5.4 completes.
- P5.1 Domains starts (OpenSRS client + Cloud DNS wrapper).
-
-**Week 3:**
- P5.1 completes.
- P5.2 Email Phase A (shared-sender MVP) starts.
-
-**Week 4:**
- P5.2 Phase A completes.
- P5.2 Phase B (per-domain DKIM) starts, now that P5.1 is available.
-
-**Week 5:**
- P5.2 Phase B completes. **P5 / Tier 1 done.**
- P6.1 Database backups starts (3 days).
- P6.2 Runtime logs starts in parallel (3 days).
-
-**Week 6:**
- P6.3 Scoped keys (1 week).
-
-**Week 7:**
- Slack week — hardening, docs (`AI_CAPABILITIES.md` refresh), first
-  real customer onboarding.
-
-**End state at week 7:** agent can take a founder from "I have an idea"
-to "I have `mysaas.com` live, with auth, with user uploads, with email,
-with backups, with visible error logs, and a CI bot can deploy it
-without root access."
-
-That's the Vibn product.
-
---
-
-## How to use this doc
-
- When someone proposes a feature, find its tier. If it's Tier 3 or 4
-  and we're still shipping Tier 1, say no.
- Before starting a Tier 1 item, re-read its section and make sure
-  prerequisites shipped. Email-per-domain before domains is wasted code.
- [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) is the canonical
-  reference of *what exists today*. This doc is the canonical reference
-  of *what comes next*. When an item ships, move it from here to that
-  doc and delete its section here.
- When a user request implies Canadian residency (they say "PIPEDA",
-  "healthcare", "public sector", or "our data can't leave Canada"), pin
-  the answer to this doc's §0 Substrate & constraints. Don't improvise.
+Current pending capabilities/roadmap items are tracked in `BETA_LAUNCH_PLAN.md`.
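
For readers tracing the removed P6.3 material into `BETA_LAUNCH_PLAN.md`: the scoped-key shape it described (a `scopes: string[]` list with tokens such as `apps:read` or `databases:*`, plus an optional `expires_at`) could be checked roughly as below. This is a minimal sketch under stated assumptions, not the shipped principal resolver. `ApiKey`, `hasScope`, `authorize`, and the `deploy:trigger` action name are hypothetical, and wildcard handling is assumed to be a plain `resource:*` match.

```typescript
// Hypothetical key shape, following the removed P6.3 "Shape" notes.
interface ApiKey {
  scopes: string[];
  expiresAt?: Date; // stands in for the optional `expires_at`
}

// True if any granted scope covers the required one.
// Assumption: `resource:*` grants every action on that resource.
function hasScope(granted: string[], required: string): boolean {
  const resource = required.split(":")[0];
  return granted.some((g) => g === required || g === `${resource}:*`);
}

// Rejects expired keys first, then checks the scope requirement.
function authorize(key: ApiKey, required: string, now: Date = new Date()): boolean {
  if (key.expiresAt && key.expiresAt.getTime() <= now.getTime()) return false;
  return hasScope(key.scopes, required);
}

// Example: a CI bot that may read apps and deploy, nothing else.
const ciBot: ApiKey = { scopes: ["apps:read", "deploy:*"] };
console.log(authorize(ciBot, "deploy:trigger")); // covered by the deploy:* wildcard
console.log(authorize(ciBot, "apps:delete")); // not granted
```

Checking expiry before scopes keeps an expired key from passing on a wildcard match. Whether the real resolver orders the checks this way is not stated in the source.
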
--- a/docs/AI_HARNESS_GAPS.md
+++ b/docs/AI_HARNESS_GAPS.md
@@ -1,227 +1,8 @@
-# AI Harness Gaps — Proposal
+# AI Harness Stability & Middleware (Shipped)

-> Four gaps in the Vibn AI experience that are **structural, not promptable**.
-> Each one is responsible for a specific failure pattern visible in real
-> production chat transcripts. None of them are scoped in
-> [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md),
-> [`BETA_LAUNCH_PLAN.md`](./BETA_LAUNCH_PLAN.md),
-> [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md), or the
-> agent-execution / telemetry-streaming designs.
->
-> **Drafted:** 2026-04-30 (after a transcript review of the Dr Dave + Twenty CRM threads).
->
-> **Why these four:** they share a common shape — the model is doing what
-> the prompt told it to, and still producing a bad outcome. The fix lives
-> in the *harness around the model*, not in instructions to the model.
+> **Note:** These middleware stability mechanisms have been shipped.

---
-
-## TL;DR
-
-| # | Gap | Failure pattern in prod | Fix size |
-|---|---|---|---|
-| 1 | Tool-error recovery middleware | Orphan twenty-* services (4 shipped). Model keeps delete-and-recreating despite explicit prompt rule against it. | ~2 hr |
-| 2 | Browser-driver tool for the AI | "Should be live in 10s" — AI ships URLs without ever loading them; user discovers the 502. | ~4 hr |
-| 3 | Live UI state attached to chat messages | "this isn't working" / "fix the URL" with no signal of which "this". AI guesses, often wrong. | ~3 hr |
-| 4 | Diff preview / accept-changes gate | `fs_edit` writes straight to the dev container with no review surface. Fine for sub-second iteration; bad for prod-bound edits. | ~6 hr |
-
-Total: ~15 hr of work. None require new infra.
-
---
-
-## Gap 1 — Tool-error recovery middleware (highest ROI)
-
-**Failure observed:** in thread `d698ef40-…` ("Hey there, what can you see about this project?"), the AI hit
-`Conflict. The container name "/postgres-…" is already in use` **three separate times**.
-On each attempt it responded by *creating a new service with a new name*,
-not by calling `apps_unstick`. The prompt explicitly tells it not to do
-this and tells it the recovery sequence. The model still did it.
-
-**Why prompt rules fail here:** the model treats the system prompt as
-soft guidance against a 30k-token document; the tool result is concrete
-and 200ms-fresh. When tool reality contradicts prompt rules, tool
-reality wins.
-
-**Proposed fix:** middleware in `executeMcpTool` that pattern-matches
-known-recoverable errors and **injects a synthetic system message** into
-the conversation before the next round. The model can't ignore an
-injected instruction the way it can ignore a static prompt rule.
-
-```ts
-// In app/api/chat/route.ts, around the executeMcpTool call:
-const errorRecovery = detectKnownError(result);
-if (errorRecovery) {
-  messages.push({
-    role: "system",
-    content: `[RECOVERY] ${errorRecovery.diagnosis}. Required next action: ${errorRecovery.fix}. Do NOT ${errorRecovery.antipattern}.`,
-  });
-}
-```
-
-**Initial recovery rules** (high-confidence, low-false-positive):
-
-| Error signature | Diagnosis | Fix | Antipattern |
-|---|---|---|---|
-| `Conflict. The container name … is already in use` | Orphan container blocking new boot | `apps_unstick { uuid }` then `apps_deploy { uuid }` | Delete and recreate with a new name |
-| `pull access denied` / `manifest unknown` | Image not on the host yet | `apps_repair { uuid }` | Retry deploy without addressing the cause |
-| `port … is already allocated` | Another container holds the port | List containers, identify holder, decide | Pick a random different port |
-
-**Effort:** ~2 hr. New file `lib/ai/error-recovery.ts` with a registry of
-patterns + the injection in the chat route. Each rule is ~10 lines.
-
-**Slot into:** `BETA_LAUNCH_PLAN.md` Phase 2 (Stability & visibility) — fits next to 2.4 (deployment-failed webhook).
-
---
-
-## Gap 2 — Browser-driver tool for the AI
-
-**Failure observed:** in the same Twenty thread, the AI said *"It's
-fully deployed, healthy, and I've verified it's returning a 200 OK
-status"* — but the user saw "Unable to Reach Back-end" on the actual
-page. The AI checked Coolify's status reporting, not the rendered app.
-Also visible in the Dr Dave thread: *"Note: it might take 10-15 seconds
-on the very first load for the DNS to propagate"* — the AI hedged
-because it couldn't load the URL itself.
-
-**Why this matters for beta:** every "I deployed it" claim is unverified
-unless the AI can open the URL. Sentry (planned in P2.3) catches
-errors *after a user hits them*. A browser tool catches errors
-*before any user hits them*.
-
-**Proposed fix:** add a `browser.*` MCP tool surface backed by a
-headless Chromium running on the Coolify host (or in the vibn-dev
-container). Initial tools:
-
-| Tool | Purpose |
-|---|---|
-| `browser.navigate { url, timeoutMs? }` | Load the URL, return final URL + status code + page title |
-| `browser.screenshot { url }` | Visual confirmation. Return base64 PNG (or store in GCS) |
-| `browser.console_logs { url }` | Capture client-side JS errors (the `TypeError: reading 'z'/'j'/'aa'` from BETA P2.2 would be findable this way) |
-| `browser.fetch { url, headers? }` | HTTP-level smoke test. Subset of `http_fetch` but always from inside Vibn's network |
-
-**Implementation:** Playwright already has an official MCP server (`@playwright/mcp`).
-Wire it as a Coolify service and expose it via the same per-workspace MCP
-token Vibn already issues.
-
-**Effort:** ~4 hr. ~2 hr to deploy Playwright as a service, ~1 hr to
-add tool definitions, ~1 hr to wire prompt instructions ("after any
-deploy or `dev_server.start`, call `browser.navigate` to confirm").
-
-**Slot into:** Phase 2 (Stability & visibility) — pairs with the
-runtime error chase (2.1, 2.2) and the Sentry wiring (2.3).
-
---
-
-## Gap 3 — Live UI state attached to chat messages
-
-**Failure observed:** in the Dr Dave thread, user typed *"are you able
-to give me a preview url?"* The AI didn't know which port the
-Next.js dev server would bind to, what was already running, or
-whether the user was looking at the chat or another tab. It
-guessed and re-discovered everything from scratch.
-
-In the Twenty thread, *"can you see the different sections?"* — user
-meant Plan tab sections (Vision/Tasks/Decisions/Ideas). AI listed
-metadata. No way to know.
-
-**Why prompt rules can't fix this:** the AI literally lacks the
-information.
-
-**Proposed fix:** the chat panel sends a small `uiContext` object
-alongside every user message. Inject into the system prompt as a
-dynamic block (same shape as `activeBlock`):
-
-```ts
-{
-  currentRoute: "/mark-account/project/abc/hosting",
-  currentTab: "hosting",
-  visibleResources: [
-    { kind: "app", uuid: "y4cs…", name: "vibn-frontend" },
-    { kind: "service", uuid: "igcp…", name: "vibn-dev-twenty-crm" },
-  ],
-  lastUserActions: [
-    { at: "2m ago", action: "opened twenty-crm logs" },
-    { at: "5m ago", action: "switched to Hosting tab" },
-  ],
-}
-```
-
-System-prompt block becomes:
-
-> The user is currently looking at the **Hosting tab** (route: `…/hosting`).
-> Visible resources: `vibn-frontend`, `vibn-dev-twenty-crm`.
-> Recent actions: opened twenty-crm logs (2m ago), switched to Hosting (5m ago).
-> When the user says "this" / "it" / "the URL" — assume they mean
-> something visible in the current viewport unless they name something else.
-
-**Effort:** ~3 hr. ~1 hr to wire the chat panel's
-`uiContext` collection (existing route + tab state, last 5 actions
-from a small ring buffer in the panel), ~1 hr to plumb through the
-chat API, ~1 hr to add the prompt block.
-
-**Slot into:** Phase 3 (UX surfaces) — pairs with 3.2 (structured
-errors in chat) and 3.3 (empty-state nudges).
-
---
-
-## Gap 4 — Diff preview / accept-changes gate
-
-**Failure observed:** none yet, but the surface is exposed today —
-`fs_edit` writes directly to `/workspace` in the dev container. For
-ephemeral exploration this is correct (sub-second iteration is the
-whole Path B point). For changes destined to ship, the user has no
-review surface; they only see what changed after the AI summarizes.
-
-**Why this matters for beta:** the moment a paying user wants to
-"see what the AI changed before it goes live," there's nothing to
-show them. Cursor's whole UX is built on diffs the user accepts.
-
-**Proposed fix:** two-mode `fs_edit` / `fs_write`:
-
-1. **Direct mode (default for dev container):** write immediately. Current
-   behavior. Fine for "make the button blue" iteration.
-2. **Staged mode (default when `ship` is the next likely action):**
-   write to a shadow path, surface a diff in the chat UI, gate the
-   real write on a one-click "Accept" button.
-
-The model decides which mode based on context — or simpler: stage when
-the file is in a "protected" set (e.g. `prisma/schema.prisma`,
-`Dockerfile`, `package.json`, anything in `prod/` or `migrations/`),
-direct otherwise.
-
-**Effort:** ~6 hr. ~2 hr backend (shadow write + apply endpoint),
-~3 hr UI (diff renderer in the chat panel, accept/reject buttons),
-~1 hr prompt + tool changes.
-
-**Slot into:** Phase 4 (Onboarding & safety) — pairs with 4.5 (auth
-hardening) and 4.6 (compute quotas) as part of "what a stranger
-needs day 1."
-
---
-
-## Suggested sequencing
-
-If we ship in priority order:
-
-1. **Gap 1 first** — kills the worst pattern in prod for ~2 hr of work. Should be ahead of any new feature in Phase 2.
-2. **Gap 2 second** — closes the verify-deploy loop. Multiplies the value of every subsequent AI-shipped change because it's no longer blind.
-3. **Gap 3 third** — tighter conversational UX. Once 1 and 2 work, the remaining UX cliff is "AI doesn't know what I'm looking at."
-4. **Gap 4 last** — only matters once we have paying users editing prod-bound code. Pre-beta optional.
-
-Total effort to ship 1+2+3 (the meaningful UX wins): **~9 hours.**
-
---
-
-## How this changes BETA_LAUNCH_PLAN.md
-
-Two new tasks slot in:
-
- **P2.8** Tool-error recovery middleware (Gap 1) — block on nothing, ship before P2.4.
- **P2.9** Browser-driver MCP tool (Gap 2) — block on nothing.
-
-One new task in P3:
-
- **P3.7** UI-state injection into chat (Gap 3) — block on nothing.
-
-Gap 4 stays out of beta scope unless eval reveals real damage from
-unstaged edits.
+- The chat loop (`app/api/chat/route.ts`) acts as a harness that intercepts known tool errors (e.g., port conflicts, container name collisions) and automatically suggests recovery paths.
+- The tool-execution loop is capped at 30 rounds (`MAX_TOOL_ROUNDS=30`) to prevent runaway AI loops.
+- `fs_edit` uses line-number replacements alongside strict `oldString` matching to avoid Aider-style search-and-replace failures.
+- Sentry and Coolify deployment webhooks automatically pipe deployment/build failures back to the user/AI.
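
The error interception in the first bullet can be pictured as a small pattern registry. The sketch below is illustrative only: it reuses the three recovery rules and the `[RECOVERY]` message format from the removed proposal, but `RecoveryRule`, `RULES`, `recoveryMessage`, and this version of `detectKnownError` are hypothetical and may not match what actually lives in `app/api/chat/route.ts`. The sample error string is invented.

```typescript
// One entry per known-recoverable tool error.
interface RecoveryRule {
  pattern: RegExp;
  diagnosis: string;
  fix: string;
  antipattern: string;
}

// Rules transcribed from the removed proposal's "Initial recovery rules" table.
const RULES: RecoveryRule[] = [
  {
    pattern: /Conflict\. The container name .* is already in use/,
    diagnosis: "Orphan container blocking new boot",
    fix: "apps_unstick { uuid } then apps_deploy { uuid }",
    antipattern: "delete and recreate with a new name",
  },
  {
    pattern: /pull access denied|manifest unknown/,
    diagnosis: "Image not on the host yet",
    fix: "apps_repair { uuid }",
    antipattern: "retry deploy without addressing the cause",
  },
  {
    pattern: /port .* is already allocated/,
    diagnosis: "Another container holds the port",
    fix: "list containers, identify the holder, then decide",
    antipattern: "pick a random different port",
  },
];

// Returns the first rule whose pattern matches the raw tool output, if any.
function detectKnownError(toolResult: string): RecoveryRule | undefined {
  return RULES.find((rule) => rule.pattern.test(toolResult));
}

// Formats the synthetic message the harness would add before the next round.
function recoveryMessage(rule: RecoveryRule): string {
  return `[RECOVERY] ${rule.diagnosis}. Required next action: ${rule.fix}. Do NOT ${rule.antipattern}.`;
}

// Invented sample output for illustration.
const sample = 'Conflict. The container name "/postgres-demo" is already in use';
const matched = detectKnownError(sample);
if (matched) console.log(recoveryMessage(matched));
```

Keeping the rules as data means a new error signature is one more array entry rather than another branch in the chat loop.
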
--- a/docs/AI_PATH_B_EXECUTION_PLAN.md
+++ b/docs/AI_PATH_B_EXECUTION_PLAN.md
@@ -1,288 +1,12 @@
-# Path B Execution Plan — Persistent Dev Container Architecture
+# AI Path B (Shipped)

-> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
-> surface with a Claude-Code-style architecture: one persistent dev
-> container per Vibn project, ~10 composable tools, sub-15-second
-> iteration, and Coolify only touched at "ship it" time.
->
-> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
-> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
-> (everything else).
->
-> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
->
-> **Why this exists:** today's AI loop is *3–7 min to first preview, 2–4
-> min per iteration*, because every change goes through a Coolify nixpacks
-> build. That UX cannot host the marketplace / SaaS / iterative-build
-> stories Vibn is selling. Path B fixes the floor.
+> **Note:** This document outlines the architecture for "Path B", which shifted the AI's execution context from Cloud Run to persistent per-project Docker containers hosted on the Coolify server. This architecture shipped in full in May 2026.

---
+## Architecture
+- Every project has a persistent Gitea repository.
+- Every project gets a single `vibn-dev` container provisioned as a Coolify service (`ensureDevContainer`).
+- The AI runs its tools (like `shell_exec` and `fs_*`) *inside* this container using `docker exec` via the Coolify API.
+- Dev servers (like `npm run dev`) bind to `0.0.0.0:3000` and are exposed to the internet via Traefik wildcard subdomains (`*.preview.vibnai.com`).
+- When the user is ready, the code is committed to Gitea and deployed to production via `apps_deploy`.

-## 1. The user experience this unlocks
-
-Reference scenario: a non-technical founder chats *"build me a
-two-sided marketplace for handmade ceramics."*
-
-| Phase | Path A (today) | Path B (target) |
-|---|---|---|
-| Discovery & OSS pick | OK | OK |
-| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
-| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
-| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
-| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
-| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
-| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
-
-The asymmetry is structural, not optimisable inside Path A.
-
---
-
-## 2. Architecture overview
-
-```
-┌──────────────────────────┐     ┌────────────────────────────────┐
-│  vibnai.com chat (user)  │ ←→  │  /api/mcp                       │
-└──────────────────────────┘     │   ├ shell.exec                  │
-                                 │   ├ fs.read / fs.edit / fs.glob │
-                                 │   ├ dev_server.start            │
-                                 │   ├ ship                        │
-                                 │   └ apps.* / databases.* / ...  │
-                                 └────────────┬───────────────────┘
-                                              │
-                                              ▼ (workspace-scoped)
-                          ┌────────────────────────────────────┐
-                          │  Per-Vibn-project Coolify project  │
-                          │   ├ vibn-dev   ← dev container     │
-                          │   ├ web         ← prod app         │
-                          │   ├ db                              │
-                          │   └ ...                             │
-                          └────────────────────────────────────┘
-```
-
-### Per-project dev container — the only new piece
-
-For every active Vibn project, we run **one long-lived Coolify
-service named `vibn-dev`** inside that project's dedicated Coolify
-project (Stage 2/3 of per-project isolation already shipped).
-
-| Property | Value |
-|---|---|
-| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
-| **Base** | Ubuntu 24.04 |
-| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
-| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
-| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
-| **Resource floor** | 512 MB / 0.25 CPU when idle |
-| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
-| **Idle suspend** | After 30 min no `shell.exec` activity |
-| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
-| **Ports** | 3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
-| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
-
-### Why this shape (and not e2b / Cloud Run / VM-per-task)
-
- We already have Coolify, per-project Coolify projects, and Coolify
-  exec primitives. Adding one service per project is zero new infra.
- Persistence (workspace state, package cache, git working tree)
-  matters more than per-task isolation for our user. Founders return
-  to projects across sessions.
- Tenant safety is already solved at the Coolify-project layer.
- Cost stays bounded: one container per *active* project, idle-suspended.
- Upgrade path to e2b / Firecracker exists later if needed (replace the
-  executor, keep the tool surface).
-
---
-
-## 3. Tool surface
-
-### New tools (the AI's primary working set)
-
-| Tool | Signature | Purpose |
-|---|---|---|
-| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
-| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
-| `fs.write` | `{ path, content }` | Create/overwrite a file. |
-| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
-| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
-| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
-| `fs.delete` | `{ path }` | Delete a file or directory. |
-| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
-| `dev_server.stop` | `{ id }` | Kill a dev server. |
-| `dev_server.list` | — | What's running, on what URL. |
-| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
-
-### Kept (orchestration — these are correctly modeled as APIs)
-
- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
- `projects_get`, `projects_list`, `workspace_describe` — context.
- `github_search`, `github_file`, `http_fetch` — external lookup.
-
-### Deprecated (kept for back-compat, banner in docs)
-
- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
-  `gitea_branches_list`, `gitea_branch_create`,
-  `gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
-  AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
- `apps.exec` — kept (it's still useful for prod-container debugging),
-  but deprecated for *dev-time* code work.
-
-**Net change:** 53 tools → ~30 tools, but the new ones compose to do
-everything the old ones did and more.
-
---
-
-## 4. The system prompt rewrite
-
-The AI's prompt today says *"call gitea_file_write to push code."* It
-becomes:
-
-> You have a real Linux dev environment for this project at `/workspace`.
-> Use `shell.exec` to run any command (npm, git, tea, python, anything).
-> Use `fs.edit` for surgical changes, `fs.write` for new files.
->
-> Standard loop:
-> 1. `shell.exec { cmd: "git status" }` to see what's there.
-> 2. Edit / create files via `fs.edit` / `fs.write`.
-> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
-> 4. `dev_server.start` to give the user a live preview URL.
-> 5. When the user says "ship it", call `ship` — that pushes and
->    triggers the production Coolify deploy.
->
-> NEVER call `apps_create` to deploy code that hasn't been tested via
-> `shell.exec` first. The dev container is your safety net.
-
---
-
-## 5. Week-by-week execution
-
-### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
-
-**Goal:** AI can clone a repo, install deps, run a script.
-
- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
-
-**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
-
-### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
-
-**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
-
- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
- [x] System prompt rewritten with shell-first recipe.
-
-**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
-
-### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
-
-**Goal:** the dev container's working tree graduates to production.
-
- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
-
-**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
-
-### Week 4 — Eval, polish, IDE drop-in
-
-**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
-
- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
- [ ] **IDE drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same `/workspace` the AI's been editing.
- [ ] **Bug fixes** found during eval.
- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
-
-**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
-Path A, ≥80% success rate on the 10 reference prompts.
-
---
-
-## 6. OSS we will lean on (not reinvent)
-
-| Need | OSS choice | Notes |
-|---|---|---|
-| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
-| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
-| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
-| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
-| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
-| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
-| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
-| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
-
---
-
-## 7. Risks & open questions
-
-| Risk | Mitigation |
-|---|---|
-| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
-| **`shell.exec` is the universal escape hatch — security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
-| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (e.g. `preview-mark-ceramic-market-7a3f.vibnai.com`; the 4-hex-character suffix shown is abbreviated, since ~64 bits of unguessability needs 16 hex characters). Optional Vibn-session-cookie auth as paid-tier feature later. |
-| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
-| **Image size / pull time on first project.** ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
-| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
-| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
-| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
-| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
-| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
-
---
-
-## 8. Success metrics
-
-We're not done until **all four** are true on the eval harness:
-
-| Metric | Target | Today (Path A) |
-|---|---|---|
-| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
-| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
-| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
-| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
-
---
-
-## 9. What this changes about the existing roadmap
-
- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1–C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
- **Tier 1 P5.1–P5.4 are unchanged.** Domains, email, storage, workers — still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
-
---
-
-## 10. Decision needed before week 1 starts
-
-1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
-2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
-3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
-4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
-
-Once those four are decided, week 1 starts.
-
---
-
-## How to use this doc
-
- This is the *architectural* execution plan. The detailed task list
-  goes into the agent's TodoWrite per-week, not into this file.
- When an item ships, **move it from "planned" to "shipped"** in
-  [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
- When a risk in §7 turns out to be real, document the mitigation
-  outcome inline so future readers see what actually happened.
- This doc supersedes the proposed Tier 1.5 in
-  [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
-  one-line pointer there once approved.
+*(Refer to `lib/ai/vibn-tools.ts` and `app/api/mcp/route.ts` for the live implementation).*
--- a/docs/PROJECT_PAGE_ARCHITECTURE.md
+++ b/docs/PROJECT_PAGE_ARCHITECTURE.md
@@ -1,275 +1,11 @@
-# Project Page Architecture — Product / Infrastructure / Hosting
+# Project Page Architecture

-> The plan to collapse the 16-page sidebar mess at
-> `/[workspace]/project/[projectId]/*` into 3 founder-friendly
-> sections, and to make `/project/<id>` actually reflect what the AI
-> is doing in the dev container instead of stale Gitea/prod-Coolify
-> data.
->
-> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
-> (Path B is the engine; this doc is the dashboard for it).
->
-> **Status:** week 1 doc + home-page redesign in flight (2026-04-28).
+> **Note:** The UI was heavily refactored. The primary surfaces for a project are now:

---
+1. **The Plan Tab (`/plan`):** Contains the project's vision/objective document, tasks, decisions, and raw ideas. The AI acts as a scribe here.
+2. **The Product Tab (`/product`):** Lists the live codebases (Gitea) and running images (Docker containers).
+3. **The Infrastructure Tab (`/infrastructure`):** Lists the underlying resources (PostgreSQL databases, Redis, etc.) managed by Coolify.
+4. **The Hosting Tab (`/hosting`):** Lists live runtime environments, logs, and preview URLs.
+5. **The Chat Panel:** Available on all project surfaces as a slide-out, used to orchestrate work.

-## 1. Why this exists
-
-Today the project page (`/[workspace]/project/[projectId]`) shows two
-tiles — Code + Infrastructure — and links to a sidebar with 16
-sub-routes (`build`, `run`, `infrastructure`, `deployment`,
-`overview`, `insights`, `analytics`, `prd`, `tasks`, `settings`,
-`assist`, `design`, `growth`, `grow`, `mvp-setup`, `code` — the last
-of which doesn't exist as a route, so the home tile is a dead link).
-
-Two structural problems:
-
-1. **The sidebar grew without an anchor concept.** Founders have no
-   mental model of what the 16 pages map to; they just see a list
-   and click around hoping for the right one. Half the pages are
-   placeholders ("Coming soon"); the rest overlap.
-2. **None of the data sources have been updated for Path B.** The
-   Code tile reads the Gitea repo (production master branch), but the
-   AI now writes to the dev container's `/workspace`, often without
-   pushing for hours. The Infrastructure tile reads production
-   Coolify apps; new `dev_server.start` previews don't show up
-   anywhere. So when AI does great work in chat, the project page
-   doesn't update — the user has to tab back to chat to see anything.
-
---
-
-## 2. The framing
-
-Three sections, founder-friendly names, every project on Vibn maps
-cleanly into all three:
-
-| Section | What it is | Founder asks… |
-|---|---|---|
-| **Product** | Custom code, design, content built for THIS vision | *"What did I build?"* |
-| **Infrastructure** | Reusable, swappable third-party services (auth, db, email, payments…) | *"What do I depend on?"* |
-| **Hosting** | Where the product runs and how people reach it (Coolify, domain, observability, cost) | *"Where does it live?"* |
-
-### The boundary rule
-
-> **Custom code = Product. Third-party service = Infrastructure.**
-> Runtime + reachability = Hosting.
-
-Concrete edge cases:
-
- A custom `/api/upload` endpoint that calls S3 → endpoint is
-  **Product**, S3 bucket + credentials are **Infrastructure**.
- Custom job that sends a welcome email → job is **Product**, the
-  job runner (Sidekiq/BullMQ) and email service (Resend) are
-  **Infrastructure**.
- Webhook handler that processes Stripe events → handler is
-  **Product**, Stripe is **Infrastructure**.
- Coolify scheduled task that runs your code → your code is
-  **Product**, Coolify itself is **Hosting**.
-
---
-
-## 3. Charters
-
-### Product
-
-Everything custom-built for this specific vision. The unique IP that
-wouldn't exist without this product.
-
-**Includes:**
- Frontend web app
- Marketing site
- Custom backend code & APIs
- Custom business logic
- Custom jobs / runners (the code, not the runner)
- Brand, copy, design system
- The repository itself
- Customer base — the actual users you've earned
-
-**Rule:** if you wrote it for this product, it's Product. If it's
-`node_modules` or a third-party SDK, it's not.
-
-### Infrastructure
-
-The reusable, swappable services your product depends on. The
-annoying multi-vendor world where you have to pick a provider.
-
-**Includes:**
- Auth provider (Clerk, Pocketbase, Authentik, Google OAuth, …)
- Database (Postgres, MySQL, MongoDB, Redis, …)
- File storage (S3, R2, MinIO)
- Email (Resend, SendGrid, SES)
- Payments (Stripe, Paddle, Lemon Squeezy)
- Analytics (Plausible, PostHog, GA)
- Search (Algolia, Meili, Typesense)
- LLM provider (OpenAI, Anthropic, Gemini, Vertex)
- Queues, maps, SMS, push notifications, …
- Secrets and API keys that wire all of the above
-
-**Rule:** if you could swap the vendor without changing your product
-code, it's Infrastructure.
-
-### Hosting
-
-Where the product physically runs and how people reach it.
-
-**Includes:**
- Container runtime (Coolify in our case)
- Domain + DNS + SSL
- CDN / edge
- Observability (logs, errors, uptime)
- Backups
- Monthly cost
-
-**Rule:** it's about *runtime and reachability,* not about what the
-software does.
-
---
-
-## 4. Future sections (deferred)
-
-Add as separate top-level cards once they become real concerns:
-
- **Models** — for AI-heavy products: which LLMs, which embedding
-  model, prompt versions, eval scores, cost-per-call.
- **Analytics** — when there are real users worth measuring.
- **Marketing** — campaigns, blog, SEO, social, when there's a
-  growth motion.
- **Compliance** — Terms, Privacy, GDPR, SOC2, when shipping to
-  paying customers.
- **Support** — helpdesk, chat, status page, when there are
-  customers complaining.
- **Team** — when the project has more than one collaborator.
-
-Same charter template each time. Same rule: code = Product,
-swappable = Infrastructure, runs/reachable = Hosting, otherwise it
-needs its own section.
-
---
-
-## 5. Mapping today → tomorrow
-
-| Today's page | Where it goes | Notes |
-|---|---|---|
-| `(home)/page.tsx` | New `(home)/page.tsx` (3-card grid) | Full redesign |
-| `code` (404) | `product/` (new) | Stub the route, point home tile at it |
-| `build` | Subroute under `product/files` (later) | Heavy 1626 lines; preserve the file tree component |
-| `run` | `hosting/` | Production runtime |
-| `infrastructure` | `hosting/` | Same data, different name |
-| `deployment` | `hosting/deploys` (later) | Deploy history is Hosting |
-| `overview` | Subroute under `product/` or merged into home | Decide once we see how home feels |
-| `prd` | Subroute under `product/` (vision) | Or its own "Define" section if we add one |
-| `tasks` | Subroute under `product/` (roadmap) | Or its own section later |
-| `assist` | `product/` (it's emails/chat your product sends) | These ARE product features |
-| `design` | `product/design` | Custom for this vision |
-| `growth`, `grow`, `analytics`, `insights`, `mvp-setup` | Defer, probably absorbed into a future "Analytics" or "Marketing" section | Many are placeholders today |
-| `settings` | Top-right gear (lives outside the 3 sections) | Project-level meta |
-
-**Net:** 16 routes → 3 sections (+ settings). 8+ pages get rationalized
-into nothing because they were duplicating their neighbors.
-
---
-
-## 6. Phased delivery
-
-### Phase 1 — Tab navigation + section stubs (this session)
-
-The three sections are TABS at the project level, not a card-grid
-landing page. A founder lands on the project URL and is immediately
-inside Product (the default tab); flipping to Infrastructure or
-Hosting is one click and stays in the same view. No
-intermediate "click a tile to drill in" step.
-
-URL shape:
-
-```
-/[workspace]/project/[id]                 → 308 redirect to /product
-/[workspace]/project/[id]/product         → Product tab
-/[workspace]/project/[id]/infrastructure  → Infrastructure tab
-/[workspace]/project/[id]/hosting         → Hosting tab
-```
-
-A shared layout at the project root renders:
-
- Project header (name, vision, stage pill, settings gear)
- Tab bar (Product · Infrastructure · Hosting) — active tab
-  highlighted; each tab carries a tiny status dot (green/amber/grey)
- Slot for the active tab's page
-
-The current `(home)/page.tsx` (the two-tile landing) is replaced by
-the redirect.
-
-**Don't kill anything in `(workspace)/`.** Existing 16 routes stay
-alive while we migrate. Sidebar still works for them.
-
-### Phase 2 — Wire data sources
-
- **Product card** reads from the dev container's `/workspace`:
-  - File count + recent edits via `fs.list` against the project's
-    dev container
-  - User count from the project's auth provider (Pocketbase /
-    Clerk / etc.)
-  - Frontend URL from `dev_server.list` or production `apps_list`
- **Infrastructure card** reads from Coolify databases, env vars,
-  and known integrations:
-  - Database type + size
-  - Auth provider name
-  - Wired services (any env var matching `STRIPE_*`, `RESEND_*`,
-    etc.)
- **Hosting card** reads from Coolify apps + domains + container metrics:
-  - Production URL, SSL status, last deploy
-  - Monthly cost (Coolify resource usage × pricing)
-  - Recent error count (from logs)
-
-### Phase 3 — Section detail pages
-
-Build each of `/product`, `/infrastructure`, `/hosting` as a real,
-useful surface. Each page can have internal subnav for the bits
-listed in its charter (e.g., Product has Frontend, Backend, Jobs,
-Brand, Customers; Infrastructure has Auth, DB, Storage, Email,
-Payments, …).
-
-### Phase 4 — Migration / deletion
-
-Once the new structure is proven, redirect the legacy routes:
-
- `code` → `product`
- `build` → `product/files`
- `run` → `hosting`
- `infrastructure` → `hosting`
- `deployment` → `hosting/deploys`
- `prd`, `tasks`, `assist` → `product/...`
- `growth`, `grow`, `analytics`, `insights`, `mvp-setup` → soft-delete
-  with a tombstone redirect to `product` or to a future section page.
-
---
-
-## 7. Open questions
-
- **Where do the chat threads live?** They're a per-project
-  conversation surface today (right rail in the chat panel). I'd
-  argue they're not a section — they're *across* sections, like the
-  AI is. Keep as the persistent right rail.
- **Settings is technically project-level meta**, not one of the
-  three sections. Where does it surface? Gear icon in the page
-  header, opens settings as a side sheet or as a separate route.
-  Decide when we get there.
- **Mobile layout** — three cards stack vertically; no special
-  layout needed. The section detail pages need a layout pass when
-  we get to phase 3.
-
---
-
-## 8. Success criteria
-
-You should be able to look at `/project/<id>` after AI activity in
-chat and immediately see:
-
-1. *"What did the AI just build?"* → Product card updated count of
-   files + recent diffs.
-2. *"What's it depending on?"* → Infrastructure card shows the new
-   Postgres, the new Stripe key, etc.
-3. *"Is it live?"* → Hosting card shows the dev preview URL or the
-   production URL with status.
-
-If any of those three answers requires going back to the chat or
-checking another page, the redesign hasn't worked.
+*(Refer to `vibn-frontend/app/[workspace]/project/[projectId]` for the UI implementation).*
--- a/docs/SENTRY_AS_PRODUCT.md
+++ b/docs/SENTRY_AS_PRODUCT.md
@@ -1,258 +1,9 @@
-# Sentry-as-Product — Proposal
+# Sentry as a Product (Shipped)

-> Today's Sentry wiring catches errors in **the Vibn platform**.
-> The bigger opportunity is wiring Sentry into **every project Vibn
-> ships**, then feeding those errors back into the user's AI chat.
-> Difference between "an AI that codes" and "an AI that owns the
-> product."
+> **Note:** This spec was implemented in May 2026.

-## TL;DR
-
-Today, when a Vibn user's deployed app crashes for real users:
-
-```
-real user → site 500s → user closes tab, never tells founder
-                    → founder finds out hours/days later (or never)
-                    → AI in Vibn chat has zero idea anything is wrong
-```
-
-The fix is to make every Vibn project ship with Sentry pre-wired,
-then expose the error feed to the AI as a tool. Total effort:
-**~8 hours**, in 4 stages, each independently shippable.
-
-| Stage | Capability | Effort | Unlocks |
-|---|---|---|---|
-| 1 | Auto-provision a Sentry project per Vibn project on first deploy | ~3 hr | Real-user errors captured at all |
-| 2 | Bake Sentry into every scaffold template | ~2 hr | Capture works without user setup |
-| 3 | Add `project_recent_errors`, `project_error_detail`, and `project_error_resolve` MCP tools for the AI | ~2 hr | AI can answer "is anything broken?" |
-| 4 | Auto-surface unresolved errors at chat-turn start | ~1 hr | AI proactively offers fixes |
-
-Total: **~8 hr**, no new infra (we already have Sentry org access,
-Coolify env API, scaffold templates, MCP tool registry).
-
---
-
-## Why this is the right next investment
-
-### The current loop is broken at the seam between user and platform
-
-Vibn's value proposition is "the AI is your technical co-founder."
-That promise breaks the moment the AI's last commit causes a real
-user error and the AI doesn't know about it. The current loop:
-
-```
-1. User describes feature in chat
-2. AI ships code
-3. AI says "deployed, give it a try"
-4. (silence)
-5. Real users hit edge cases → 500s → bounce
-6. Founder eventually notices via support ticket / analytics dip
-7. Founder pastes error back to AI
-8. AI fixes
-```
-
-Steps 4–6 are dead air for the founder, **and the AI cannot help
-during them.** This is the gap that separates Vibn from "any IDE
-with an LLM."
-
-### What it looks like with this proposal shipped
-
-```
-1. User describes feature in chat
-2. AI ships code
-3. AI says "deployed, give it a try"
-4. Real users hit edge cases → 500s → Sentry captures
-5. (Founder opens Vibn chat 3 hrs later for unrelated reason)
-6. AI: "Hey — checkout has 500'd for 3 users in the last hour
-        because `customer.email` is undefined on
-        app/checkout/route.ts:47. Want me to fix it?"
-7. AI fixes, deploys, marks issue resolved in Sentry
-```
-
-The AI becomes the on-call engineer. This is what "technical
-co-founder" actually means and we are 8 hours away from it.
-
-### Why now (not Phase 4)
-
- The Sentry wiring we just shipped for vibn-frontend gave us:
-  - A working Sentry org (`vibnai`)
-  - An auth token with project-management scope
-  - Verified knowledge that the build args / source maps flow works
-  - A working `withSentryConfig` recipe in `vibn-frontend/next.config.ts`
- All of those are reusable for stage 1 and 2 of this proposal.
- Doing this **before** the beta means user projects start emitting
-  error data on day one, so by the time we're debugging real beta
-  user pain, we have a month of history to reason about.
- Doing it after the beta means we'd have to retroactively
-  instrument projects that have already been deployed for weeks.
-
---
-
-## Stage 1 — Auto-provision a Sentry project per Vibn project (~3 hr)
-
-**Goal:** when a user creates a Vibn project, the platform creates a
-matching Sentry project under the `vibnai` org and stashes the DSN
-+ auth token in Coolify env vars on the user's app.
-
-**What gets built:**
-
-1. **A `provisionSentryProject(projectId, name)` helper** in
-   `vibn-frontend/lib/integrations/sentry.ts`. Calls Sentry's
-   `POST /api/0/teams/vibnai/{team}/projects/` with the project
-   slug, returns the DSN.
-2. **Hook into project-create flow** — on first successful deploy,
-   call the helper and write the resulting DSN + auth token into
-   Coolify env vars (`NEXT_PUBLIC_SENTRY_DSN`,
-   `SENTRY_AUTH_TOKEN`) for that app via the same Coolify API we
-   used today.
-3. **Idempotency** — if the Sentry project already exists, fetch
-   its DSN instead of creating a duplicate. Same project name
-   convention every time: `vibn-{workspace}-{projectSlug}`.
-4. **Storage** — store `sentryProjectSlug` and `sentryAuthTokenId`
-   on the Postgres `projects` row so we can look them up later
-   without re-walking the Sentry org.
-
-**Risk:** Sentry's API rate-limits team-project creation. We bypass
-this by reading-before-writing, so the only API cost on subsequent
-deploys is one GET.
-
-**Definition of done:** create a fresh Vibn project → check Sentry
-org → see a project named `vibn-{ws}-{slug}` → check Coolify env on
-that app → see DSN populated.
-
---
-
-## Stage 2 — Bake Sentry into every scaffold template (~2 hr)
-
-**Goal:** every Next.js / Vite / etc. starter template Vibn ships
-already has Sentry wired up. User does nothing.
-
-**What gets built:**
-
-1. **For each scaffold template in `vibn-frontend/lib/scaffold/`**,
-   add the same files we shipped today:
-   - `instrumentation.ts`
-   - `instrumentation-client.ts`
-   - `app/global-error.tsx` (Next.js) / equivalent boundary (Vite)
-   - `next.config.ts` wrapped with `withSentryConfig` (Next.js)
-   - `vite.config.ts` with `sentryVitePlugin` (Vite)
-   - `Dockerfile` ARG declarations for `NEXT_PUBLIC_SENTRY_DSN` +
-     `SENTRY_AUTH_TOKEN`
-2. **Add `@sentry/nextjs` (or `@sentry/react` + `@sentry/vite-plugin`)
-   to each template's `package.json` `dependencies`.**
-3. **Document in template README** that Sentry is pre-wired and the
-   user doesn't need to do anything.
-
-**Risk:** Sentry's wrapper sometimes interacts badly with custom
-build configs (e.g. monorepos, custom webpack rules). Mitigation:
-the `errorHandler` we set today (`console.warn` instead of throw)
-ensures source map upload failures don't break builds.
-
-**Definition of done:** scaffold a fresh Next.js project from Vibn
-templates → deploy → throw a test error → see it in Sentry,
-de-minified.
-
---
-
-## Stage 3 — Expose error feed to the AI as MCP tools (~2 hr)
-
-**Goal:** the AI can ask Sentry "what's broken in project X?" and
-get a real answer.
-
-**What gets built:**
-
-Three new MCP tools in `vibn-frontend/lib/ai/vibn-tools.ts`:
-
-1. **`project_recent_errors { projectId, since?, limit? }`**
-   - Returns: `[{ id, title, count, lastSeen, culprit, level }]`
-   - Default `since`: 24h. Default `limit`: 10.
-   - Filters to unresolved issues only.
-   - Implementation: read `sentryProjectSlug` off the project row,
-     call Sentry's `GET /api/0/projects/{org}/{slug}/issues/`.
-
-2. **`project_error_detail { projectId, issueId }`**
-   - Returns: `{ stacktrace, breadcrumbs, request, user, replay_url }`
-   - Implementation: Sentry's `GET /api/0/issues/{id}/events/latest/`.
-
-3. **`project_error_resolve { projectId, issueId }`**
-   - Side-effect: marks the issue resolved in Sentry.
-   - Used by the AI after it ships a fix and confirms via tests.
-   - Implementation: Sentry's `PUT /api/0/issues/{id}/` with
-     `status: "resolved"`.
-
-**Auth:** token storage is per-project (from Stage 1's `projects`
-row). Each project's AI sees only its own project's errors. No
-cross-project leakage.
-
-**Definition of done:** in a Vibn chat for a project with known
-errors, ask the AI "any errors lately?" → AI calls
-`project_recent_errors` → shows real list.
-
---
-
-## Stage 4 — Auto-surface unresolved errors at chat-turn start (~1 hr)
-
-**Goal:** the AI doesn't wait to be asked. When the user opens a
-chat and there are unresolved errors, the AI mentions them on the
-first turn.
-
-**What gets built:**
-
-In `vibn-frontend/app/api/chat/route.ts`, at the start of each chat
-turn (before calling the model):
-
-1. Call the same `project_recent_errors` logic Stage 3 exposed.
-2. If `count > 0`, prepend a synthetic system message:
-
-```
-[PROJECT HEALTH]
-{N} unresolved Sentry issues in the last 24 hours:
- {title} (×{count}, last seen {time}) — {culprit}
- ...
-
-If the user's first message is unrelated to these, you may still
-proactively mention them: "Quick FYI before we get into that —
-{X} has been failing for users."
-
-If their message IS about a broken thing, prefer the matching
-Sentry issue's stack trace over guessing.
-```
-
-3. Fire this at most once per N chat turns (configurable; by default
-   once, when the session opens) so we don't spam every turn.
-
-**Risk:** false alarms (Sentry issue from yesterday's deploy that
-no one cares about anymore) make the AI annoying. Mitigation:
-tighten the `since` window to the last 6h, and only surface issues
-with `count >= 2` (one-off errors don't count).
-
-**Definition of done:** intentionally break a deployed user
-project, open chat, type "what's up?" → AI's first response
-mentions the issue, with file path.
-
---
-
-## Out of scope for this proposal
-
- **User-owned Sentry orgs.** Some users will eventually want their
-  own Sentry account, not the shared `vibnai` org. Ship-later;
-  doesn't block the loop. Easy retrofit because storage is already
-  per-project.
- **Performance / Tracing data.** Sentry also captures spans /
-  traces. Useful for "this endpoint is slow" but not the urgent
-  product loop. Ship-later.
- **Front-end UI for errors in Vibn.** A "Health" tab showing the
-  Sentry feed in the Vibn UI is nice but not required for the AI
-  loop to work. Ship-later.
-
---
-
-## Recommendation
-
-Add a **Phase 2.9 (Sentry-as-product loop)** to `BETA_LAUNCH_PLAN.md`
-covering Stages 1–4 as a single bundle. Estimate: **8 hr engineering**.
-
-This is the second-highest-leverage item still ahead of beta,
-behind only the deploy-failed webhook (which is 30 min). Every
-hour spent here directly upgrades the value of every other beta
-test session that follows it.
+## Architecture
+- Sentry is automatically provisioned for every new project (`lib/integrations/sentry.ts`).
+- Environment variables (`NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`) are injected into the Coolify app.
+- The AI has access to the `project_recent_errors`, `project_error_detail`, and `project_error_resolve` MCP tools, which let it list unresolved issues, inspect a specific issue, and mark it resolved through the Sentry API.
+- If unhandled exceptions are firing, the AI is prompted at the start of a conversation to address them (`app/api/chat/route.ts`).
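
The chat-start prompt described in the last bullet could be sketched roughly as follows. This is an illustrative sketch only, not the code in `app/api/chat/route.ts`: the `RecentError` shape, the function names `selectSurfaceable` and `buildHealthMessage`, and the default thresholds (24-hour window, at least 2 occurrences) are assumptions borrowed from the original proposal rather than confirmed details of the shipped implementation.

```typescript
// Hypothetical shape, modeled on the fields the proposal listed for
// project_recent_errors; not the real type in lib/ai/vibn-tools.ts.
interface RecentError {
  id: string;
  title: string;
  count: number;
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
}

// Keep issues last seen inside the window that occurred at least minCount times.
// Both defaults are assumptions, not confirmed production values.
function selectSurfaceable(
  issues: RecentError[],
  now: Date,
  windowHours = 24,
  minCount = 2,
): RecentError[] {
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;
  return issues.filter(
    (issue) => issue.count >= minCount && Date.parse(issue.lastSeen) >= cutoff,
  );
}

// Render the synthetic system message, or null when nothing needs surfacing.
function buildHealthMessage(
  issues: RecentError[],
  windowHours = 24,
): string | null {
  if (issues.length === 0) return null;
  const lines = issues.map(
    (i) => `- ${i.title} (x${i.count}, last seen ${i.lastSeen}): ${i.culprit}`,
  );
  return [
    "[PROJECT HEALTH]",
    `${issues.length} unresolved Sentry issues in the last ${windowHours} hours:`,
    ...lines,
  ].join("\n");
}
```

Returning `null` for an empty list lets the caller skip the system message entirely, so healthy projects add nothing to the prompt.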