docs: heavily compress and simplify remaining reference files to represent current state

2026-05-07 15:07:31 -07:00
parent 3563b98de1
commit 057115a9fc
8 changed files with 58 additions and 2926 deletions

@@ -1,904 +1,22 @@
# Vibn AI Capabilities
> The full set of actions an AI agent can take on behalf of a Vibn workspace,
> along with the REST endpoints, MCP tools, and safety rails that back them.
>
> **Audience:** agent authors, Cursor rule writers, MCP tool designers, and
> anyone building on the Vibn control plane.
>
> **Scope:** everything an agent sees through `https://vibnai.com/api/*` and
> the `/api/mcp` bridge. No Firestore, no internal agent orchestration —
> just the tenant-safe capability surface.
---
## 1. Mental model
Every capability in this document operates on a single **workspace**. A
workspace is Vibn's tenant boundary and maps 1:1 to:
| Vibn concept | External identity | Example (`mark`) |
|---|---|---|
| Workspace | `vibn_workspaces.slug` | `mark` |
| Gitea org | `gitea_org` | `vibn-mark` |
| Gitea bot user | `gitea_bot_username` | `mark-bot` |
| SSH deploy keypair | `coolify_private_key_uuid` + `gitea_bot_ssh_key_id` | registered on both sides |
| Coolify project | `coolify_project_uuid` | `vibn-ws-mark` |
| Coolify environment | `coolify_environment_name` | `production` |
| Domain namespace | `*.{slug}.vibnai.com` | `*.mark.vibnai.com` |
| AI token | `vibn_sk_…` | one per agent/device |
A single agent token can only act on the workspace it was minted for.
Cross-workspace access is structurally impossible:
[`lib/coolify.ts`](./vibn-frontend/lib/coolify.ts) enforces this by matching every Coolify
resource's `environment_id` against the workspace's project environments
(`ensureResourceInProject`).
### The three views
All capabilities roll up into three user-facing surfaces:
- **Code** — every Gitea repo under `vibn-{slug}/`.
- **Live** — every Coolify app/database/service in `vibn-ws-{slug}`, each
reachable under `*.{slug}.vibnai.com`.
- **IDE** — Browser-based agent workspace sessions (outside the scope of this doc).
---
## 2. Authentication
Every agent-facing endpoint accepts **either**:
- `Authorization: Bearer vibn_sk_<base64url>` — a workspace-scoped API key
minted in the settings panel. Stored as a sha256 hash server-side; the
plaintext is shown exactly once on creation. Can be revoked at any time.
- A NextAuth session cookie — used for the dashboard UI and for browser
debugging. Not suitable for long-running agents.
Helper: [`requireWorkspacePrincipal()`](./vibn-frontend/lib/auth/workspace-auth.ts)
resolves either to a `WorkspacePrincipal { workspace, user?, source }`.
**403 on a tenant mismatch means:** the token is valid, but the resource
belongs to another workspace. The agent should stop and ask the user.
---
## 3. MCP surface
The MCP bridge lives at `POST https://vibnai.com/api/mcp`. It takes
JSON-over-HTTP bodies shaped like:
```json
{ "tool": "<tool-name>", "params": { /* tool-specific */ } }
```
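A minimal sketch of how a client might wrap that body shape. The helper name `buildMcpCall` and the `Content-Type` header are assumptions for illustration; only the endpoint URL, the bearer header, and the `{ tool, params }` shape come from this document.

```typescript
// Hypothetical helper: builds the request for one MCP tool call.
// Only the endpoint URL, the bearer header, and the { tool, params }
// body shape are taken from this document.
interface McpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpCall(
  tool: string,
  params: Record<string, unknown>,
  token: string,
): McpCall {
  return {
    url: "https://vibnai.com/api/mcp",
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json", // assumed, not stated in the doc
    },
    body: JSON.stringify({ tool, params }),
  };
}

// Example: describe the workspace the token was minted for.
const describeCall = buildMcpCall("workspace.describe", {}, "vibn_sk_example");
```

The returned object can be passed to any HTTP client, for example `fetch(describeCall.url, describeCall)`.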
The Cursor / Claude Desktop config block is auto-generated in the settings
panel and looks like:
```json
{
  "mcpServers": {
    "vibn-mark": {
      "url": "https://vibnai.com/api/mcp",
      "headers": { "Authorization": "Bearer vibn_sk_…" }
    }
  }
}
```
`GET /api/mcp` returns a self-description with the current tool list.
Version: **2.4.8** (see §10).
### 3.1 Workspace & identity tools
| Tool | Purpose | Params |
|---|---|---|
| `workspace.describe` | Returns slug, Coolify project uuid, Gitea org, provision status. | — |
| `gitea.credentials` | Returns the bot's username, PAT, clone URL template, and SSH remote template. Use this for every `git clone`/push — never other credentials. | — |
### 3.2 Project tools
| Tool | Purpose | Params |
|---|---|---|
| `projects.list` | Lists Vibn projects (PRDs, imports, etc.) in the workspace. | — |
| `projects.get` | Single project details. | `{ projectId }` |
### 3.3 Application tools
| Tool | Purpose | Params |
|---|---|---|
| `apps.list` | All Coolify apps in the workspace. | — |
| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
| `apps.create` | Create a Coolify app. **Four pathways** — pick the one that matches your source. **(1) Gitea repo** (user's own code): pass `repo`. Clones over HTTPS+PAT; no SSH. **(2) Docker image** (pre-built single-container third-party app, e.g. `nginx:alpine`): pass `image`. **(3) Inline Docker Compose YAML** (custom multi-service stack): pass `composeRaw`. **(4) Coolify one-click template** (RECOMMENDED for popular apps — Twenty, n8n, Supabase, Ghost, etc): pass `template` with a slug from `apps.templates.search`. Templates have battle-tested env defaults, healthchecks, and `depends_on` graphs. **Use pathway 4 over pathway 3 whenever a template exists** — it is dramatically more reliable. Auto-domain `{name}.{slug}.vibnai.com` for all pathways. | **(1) repo:** `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` **(2) image:** `{ image, name?, ports?, domain?, envs?, instantDeploy? }` **(3) composeRaw:** `{ composeRaw, name?, domain?, envs?, instantDeploy? }` **(4) template:** `{ template, name?, domain?, envs?, instantDeploy? }` |
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` (`repo` is optional; inferred from the current URL if omitted) |
| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` (`confirm` must equal the app's exact name) |
| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
| `apps.deployments` | List recent deployments + status. | `{ uuid }` |
| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` (`service` filter applies to compose apps only; `lines` defaults to 200, max 5000) |
| `apps.volumes.list` | List Docker volumes belonging to an app (name + size in bytes). Use before `apps.volumes.wipe` to know exact volume names. | `{ uuid }` |
| `apps.volumes.wipe` | **Destructive / irreversible.** Stop all app containers, remove a specific volume, leave it ready for a fresh `apps.deploy`. Use to recover from stale DB state on first boot (the most common compose app failure). `confirm` must equal the exact volume name. | `{ uuid, volume, confirm }` |
| `apps.containers.up` | Run `docker compose up -d` directly on the Coolify host for a compose app or service. Bypasses Coolify's queued-start worker (which routinely fails to actually invoke compose). Use after env or domain changes to recreate containers, or as a recovery path when `apps.create`/`apps.deploy` returned `started: false`. Idempotent — already-running containers are no-op'd. Up to 10 min timeout. Returns `{ ok, code, stdout, stderr, durationMs }`. | `{ uuid }` |
| `apps.containers.ps` | `docker compose ps -a` against the rendered compose dir. Quick diagnostic for "why isn't my stack running?" — distinguishes `Created` (queued-start failure → use `apps.containers.up`), `Exited` (app crash → use `apps.logs`), `Restarting` (boot loop → use `apps.logs`), and `Up healthy/unhealthy`. | `{ uuid }` |
| `apps.templates.list` | Browse the full Coolify one-click template catalog (320+ vetted apps: CRMs, AI tools, CMSes, dashboards, databases, …). Each entry is deployable via `apps.create({ template: <slug> })`. Returns `{ total, offset, limit, items: [{ slug, slogan, tags, port, documentation, logo }] }`. Catalog is fetched from upstream and cached for 1h. | `{ limit?, offset?, tag? }` (`limit` defaults to 50, max 500; `tag` is a substring filter, e.g. `"crm"`, `"ai"`) |
| `apps.templates.search` | Find templates by name, tag, or slogan. Ranked: exact-slug > slug-starts-with > slug-contains > tag-exact > tag-contains > slogan. Use this **before** `apps.create` to discover the right slug (e.g. `"twenty"`, `"n8n-with-postgres-and-worker"`, `"forgejo-with-postgresql"`). | `{ query, tag?, limit? }` (`limit` defaults to 25, max 100; either `query` or `tag` must be set) |
| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
| `apps.domains.list` | Current domain set. | `{ uuid }` |
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
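The ranking order listed for `apps.templates.search` can be sketched as a simple scoring function. This is an assumption about how such a ranking could be implemented, not the server's actual code; the `Template` shape borrows three of the fields returned by `apps.templates.list`.

```typescript
// Illustrative ranking: exact-slug > slug-starts-with > slug-contains >
// tag-exact > tag-contains > slogan. Not the server's implementation.
interface Template {
  slug: string;
  slogan: string;
  tags: string[];
}

// Lower rank means a better match; Infinity means no match at all.
function rankTemplate(t: Template, query: string): number {
  const q = query.toLowerCase();
  const slug = t.slug.toLowerCase();
  const tags = t.tags.map((tag) => tag.toLowerCase());
  if (slug === q) return 0;
  if (slug.startsWith(q)) return 1;
  if (slug.includes(q)) return 2;
  if (tags.includes(q)) return 3;
  if (tags.some((tag) => tag.includes(q))) return 4;
  if (t.slogan.toLowerCase().includes(q)) return 5;
  return Infinity;
}

function searchTemplates(all: Template[], query: string, limit = 25): Template[] {
  return all
    .map((t) => ({ t, rank: rankTemplate(t, query) }))
    .filter((r) => r.rank !== Infinity)
    .sort((a, b) => a.rank - b.rank) // stable sort keeps catalog order within a rank
    .slice(0, Math.min(limit, 100)) // documented default 25, max 100
    .map((r) => r.t);
}
```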
### 3.4 Database tools
| Tool | Purpose | Params |
|---|---|---|
| `databases.list` | All databases in the workspace, across all flavors. | — |
| `databases.create` | Provision a database. Supported `type`: `postgresql`, `mysql`, `mariadb`, `mongodb`, `redis`, `keydb`, `dragonfly`, `clickhouse`. | `{ type, name?, isPublic?, publicPort?, image?, credentials?, limits? }` |
| `databases.get` | Details + internal connection URL. | `{ uuid }` |
| `databases.update` | PATCH name, public visibility, image, limits. | `{ uuid, patch }` |
| `databases.delete` | Destroy the database. Volumes kept by default. | `{ uuid, confirm }` (`confirm` must equal the database's exact name) |
### 3.5 Auth provider tools
Authentication is a first-class capability. An agent cannot spin up arbitrary
Coolify services — only vetted auth providers from an allowlist.
| Tool | Purpose | Params |
|---|---|---|
| `auth.list` | Auth providers currently deployed in the workspace (classified by Coolify's `service_type`). | — |
| `auth.create` | Provision one of the allowed providers. | `{ provider, name?, description?, instantDeploy? }` |
| `auth.delete` | Destroy an auth provider. Volumes (user data) kept by default. | `{ uuid, confirm }` (`confirm` must equal the service's exact name) |
**Allowed providers** (keys passed as `provider`):
- `pocketbase` — lightweight (SQLite) auth + data, single container.
- `authentik` — feature-rich self-hosted IDP.
- `keycloak` / `keycloak-with-postgres` — industry-standard OIDC/SAML.
- `pocket-id` / `pocket-id-with-postgresql` — passkey-first OIDC.
- `logto` — dev-first IDP.
- `supertokens-with-postgresql` — session/auth backend.
Requesting anything outside this list returns 400 with a hint listing the
allowed ones, so the agent can self-correct.
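A minimal sketch of that allowlist check, assuming a plain string comparison. The provider keys are copied from the list above; the function name, the return shape, and the error wording are hypothetical.

```typescript
// Provider keys copied from the allowlist above. The check itself and
// its return shape are illustrative assumptions.
const ALLOWED_AUTH_PROVIDERS = [
  "pocketbase",
  "authentik",
  "keycloak",
  "keycloak-with-postgres",
  "pocket-id",
  "pocket-id-with-postgresql",
  "logto",
  "supertokens-with-postgresql",
] as const;

type AuthProvider = (typeof ALLOWED_AUTH_PROVIDERS)[number];

type ProviderCheck =
  | { ok: true; provider: AuthProvider }
  | { ok: false; status: 400; error: string; hint: string };

function checkAuthProvider(requested: string): ProviderCheck {
  const match = ALLOWED_AUTH_PROVIDERS.find((p) => p === requested);
  if (match !== undefined) return { ok: true, provider: match };
  return {
    ok: false,
    status: 400,
    error: `Provider ${requested} is not allowed`,
    // The hint lists every allowed key so the caller can self-correct.
    hint: `Allowed providers: ${ALLOWED_AUTH_PROVIDERS.join(", ")}`,
  };
}
```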
### 3.6 Domain tools (P5.1 — custom apex domains)
Custom apex domains are owned end-to-end by Vibn: the registrar is OpenSRS
(Tucows), authoritative DNS is Google Cloud DNS in the Canadian project, and
domains are pinned to the workspace that registered them. All four lifecycle
steps — search, register, attach, inspect — are agent-callable.
| Tool | Purpose | Params |
|---|---|---|
| `domains.search` | Check availability + price for one or more candidate apex domains via OpenSRS. Stateless; does not reserve anything. | `{ names: string[], period?: number }` (`names` accepts up to 25 entries; `period` is in years and is auto-bumped for TLDs such as `.ai`, which requires a 2-year minimum) |
| `domains.register` | Register a domain through OpenSRS. Registers unlocked; locking happens automatically after `domains.attach` completes. Idempotent per `(workspace, domain)`. | `{ domain, period?, whoisPrivacy?, contact, nameservers?, ca?: { cprCategory, legalType } }` (`ca.*` is required for `.ca`) |
| `domains.list` | List all domains owned by the workspace with their status, registrar order id, expiry, and DNS provider/zone. | — |
| `domains.get` | Full record + last 20 lifecycle events. | `{ domain }` |
| `domains.attach` | Wire a registered domain to a Coolify app (or arbitrary IP/CNAME): create Cloud DNS zone, write A/CNAME rrsets, update registrar-side nameservers, append FQDNs to the Coolify app's domain list. Idempotent; safe to retry. | `{ domain, appUuid? \| ip? \| cname?, subdomains?: string[] (default ["@","www"]), updateRegistrarNs? }` |
### Object storage (GCS via S3-compatible HMAC)
Every workspace gets a Canada-hosted GCS bucket, a dedicated service
account, and an HMAC keypair so agent-built apps can use any AWS S3
SDK. The HMAC *secret* is never returned through the API — it's written
directly into Coolify apps via `storage.inject_env`.
| Tool | Purpose | Params |
|---|---|---|
| `storage.describe` | Report the workspace bucket name, region, S3 endpoint, access-key id, and provision status. No secret returned. | — |
| `storage.provision` | Idempotently create/reconcile the workspace's GCP service account, JSON keyfile, bucket (`vibn-ws-{slug}-{rand}`), IAM binding, and HMAC key. Safe to re-run. | — |
| `storage.inject_env` | Push `STORAGE_*` env vars (endpoint, region, bucket, access key id, secret access key, force_path_style) into a Coolify app. The secret is written server-side with `is_shown_once=true`; it never transits the response body. | `{ uuid, prefix? }` (`prefix` defaults to `STORAGE_`; use `S3_` for apps that expect AWS-standard names) |
The bucket is S3-compatible: point any `aws-sdk` / `@aws-sdk/client-s3`
/ `boto3` at `STORAGE_ENDPOINT` with `force_path_style=true` (`STORAGE_*`
env vars are set by `storage.inject_env`).
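As a rough sketch, an app could turn the injected variables into S3 client options like this. Only `STORAGE_ENDPOINT` and the `STORAGE_` prefix are named in this document; the remaining variable names (`STORAGE_REGION`, `STORAGE_ACCESS_KEY_ID`, `STORAGE_SECRET_ACCESS_KEY`) and the helper itself are assumptions.

```typescript
// Builds an options object in the shape accepted by the S3Client
// constructor in @aws-sdk/client-s3 (endpoint, region, forcePathStyle,
// credentials). Variable names other than STORAGE_ENDPOINT are assumed.
interface S3Options {
  endpoint: string;
  region: string;
  forcePathStyle: boolean;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

function s3OptionsFromEnv(
  env: Record<string, string | undefined>,
  prefix = "STORAGE_",
): S3Options {
  // Fail fast with a clear message when a variable was not injected.
  const need = (key: string): string => {
    const value = env[prefix + key];
    if (!value) throw new Error(`Missing env var ${prefix}${key}`);
    return value;
  };
  return {
    endpoint: need("ENDPOINT"),
    region: need("REGION"), // assumed variable name
    forcePathStyle: true, // path-style addressing, as noted above
    credentials: {
      accessKeyId: need("ACCESS_KEY_ID"), // assumed variable name
      secretAccessKey: need("SECRET_ACCESS_KEY"), // assumed variable name
    },
  };
}
```

In an app this would typically be used as `new S3Client(s3OptionsFromEnv(process.env))`, or with `"S3_"` as the prefix if the variables were injected under AWS-style names.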
**Residency note (domains):** Cloud DNS is global anycast, so DNS
configuration is not Canadian-pinned at the storage layer. The workspace-level
`dns_provider` flag (default `cloud_dns`) will let us swap in CIRA D-Zone for
strict Canadian residency without touching the MCP surface.
**Billing:** Every successful `domains.register` writes a `debit` row to
`vibn_billing_ledger` with the OpenSRS order id as `ref_id`. The
`vibn_domain_events` table keeps an append-only audit of every lifecycle
call (`register.attempt`, `register.success`, `register.failed`,
`attach.success`).
**Verified end-to-end (2026-04-22)** against PROD GCP + OpenSRS sandbox +
PROD Coolify (Coolify `v4.0.0-beta.473`); see
`vibn-frontend/scripts/smoke-attach-e2e.ts`. **All 5 sub-systems green.**
- ✓ OpenSRS register against Horizon (sandbox) returns order id, response 200.
- ✓ Cloud DNS managed zone created in `master-ai-484822` with public anycast NS.
- ✓ A records (`@`, `www`) written to the zone.
- ✓ Registrar-side nameserver update accepts Cloud DNS NS values
(trailing-dot normalization in `lib/opensrs.ts`); sandbox returns 480
because its mock registry doesn't know real Google NS hosts, which is
expected — live mode talks to real registries that accept any resolvable NS.
- ✓ Unlock → update NS → relock fallback path verified (sandbox-recognized
nameservers return 200; the unlock/relock sequence is exercised when the
registry returns 405 lock-conflict).
- ✓ Coolify domain-list PATCH adds the apex + `www` to the application
`fqdn` column and the smoke test re-fetches it to confirm.
> **Operational gotcha — the destination server must be proxy-enabled.**
> Coolify's `update_by_uuid` controller accepts `domains` as a comma-separated
> list and only maps it onto the model's `fqdn` column when the destination
> server's `Server::isProxyShouldRun()` returns `true`. That helper requires
> **both** `proxy.type ∈ {TRAEFIK, CADDY}` *and* `is_build_server = false`.
> If either is misconfigured the PATCH returns 200 but the field is silently
> dropped (Laravel mass-assignment ignores `domains` because it isn't in
> `$fillable`, and the controller never copies it into `fqdn`). We hit this
> on `coolify-server-mtl` (`zg4cwgc44ogc08804000gggo`), which had
> `proxy=null` and `is_build_server=true`. Fixed by:
>
> ```sql
> UPDATE servers
> SET proxy = jsonb_set(coalesce(proxy,'{}'::jsonb), '{type}', '"TRAEFIK"')
> WHERE uuid = 'zg4cwgc44ogc08804000gggo';
> UPDATE server_settings
> SET is_build_server = false
> WHERE server_id = (SELECT id FROM servers WHERE uuid = 'zg4cwgc44ogc08804000gggo');
> ```
>
> followed by `docker restart coolify` to clear Laravel's in-memory config.
> Sending `fqdn` directly is **not** an alternative — the controller's
> `$allowedFields` whitelist rejects it with 422 "This field is not allowed."
### 3.7 Agent-side stdio MCP servers (`vibn-agent-runner`)
Separate from the control-plane MCP at `/api/mcp` (which is what external
agents call *into* Vibn), the `vibn-agent-runner` exposes its own in-house
tool surface *outward* over stdio MCP. This lets Cursor, Claude Desktop,
Goose, or any MCP-speaking client drive the same Coolify / Gitea / workspace
tooling the Coder/PM/Marketing sub-agents use internally — with the same
protected-repo and protected-app guardrails enforced centrally.
Architecture: every tool now has three touch-points backed by one source of truth:
```
vibn-agent-runner/src/tools/<domain>-api.ts ← pure, config-agnostic logic + security guards
vibn-agent-runner/src/tools/<domain>.ts ← thin registerTool() wrappers for the in-process agent loop
vibn-agent-runner/src/mcp/<domain>-server.ts ← stdio MCP server for external clients
```
| Server | Tools | Required env |
|---|---|---|
| `vibn-coolify-mcp` | 7 — list_projects, list_applications, deploy, get_logs, list_all_apps, get_app_status, deploy_app | `COOLIFY_API_URL`, `COOLIFY_API_TOKEN` |
| `vibn-gitea-mcp` | 6 — create/list/close issues, list_repos, list_all_issues, read_repo_file | `GITEA_API_URL`, `GITEA_API_TOKEN`, `GITEA_USERNAME` |
| `vibn-workspace-mcp` | 8 — read/write/replace/list/find/search_code, execute_command, git_commit_and_push | `WORKSPACE_ROOT` (+ Gitea creds for git push) |
| `vibn-platform-mcp` | 7 — save_memory, list_memory, list_skills, get_skill, finalize_prd, get_prd, web_search | `SESSION_KEY` (optional), Gitea creds (for skills) |
| `vibn-agent-mcp` | 2 — spawn_agent, get_job_status (dispatches into the runner's HTTP API) | `AGENT_RUNNER_URL` (defaults to `http://localhost:3333`) |
Run locally with `npm run mcp:<name>` (or `:dev` via ts-node) in
`vibn-agent-runner/`. Smoke-test any server with
`node scripts/smoke-mcp.js <name>`. The in-process agent loop still sees
the same 28 registered tools — no behavioral regression.
---
## 4. REST surface
Every MCP tool is also exposed as a plain HTTP endpoint under
`/api/workspaces/{slug}/…`. Agents that prefer curl-style access can use
these directly; the shape is identical to the MCP `params`. Auth is the
same bearer header.
### 4.1 Workspace & key management
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces` | All workspaces the principal has access to. |
| GET | `/api/workspaces/{slug}` | Workspace details. |
| POST | `/api/workspaces/{slug}/provision` | Idempotent re-run of Gitea org + bot + SSH keypair + Coolify project setup. |
| GET | `/api/workspaces/{slug}/keys` | List API keys (metadata only). |
| POST | `/api/workspaces/{slug}/keys` | Mint a new API key. Full token returned once. |
| DELETE | `/api/workspaces/{slug}/keys/{keyId}` | Revoke a key. |
| GET | `/api/workspaces/{slug}/gitea-credentials` | Return bot username, PAT (decrypted), clone/SSH templates. |
| GET | `/api/workspaces/{slug}/bootstrap.sh` | Shell script that writes `.cursor/rules`, `.cursor/mcp.json`, `.env.local` into the cwd. |
### 4.2 Applications
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/apps` | List apps. |
| POST | `/api/workspaces/{slug}/apps` | Create an app from a workspace repo. |
| GET | `/api/workspaces/{slug}/apps/{uuid}` | App details. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}` | Update whitelisted fields. |
| DELETE | `/api/workspaces/{slug}/apps/{uuid}?confirm=<exact-name>` | Destroy app. |
| POST | `/api/workspaces/{slug}/apps/{uuid}/deploy` | Trigger deploy. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/deployments` | List deployments. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/domains` | List domains. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}/domains` | Replace domain set. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/envs` | List env vars. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}/envs` | Upsert env var(s). |
| DELETE | `/api/workspaces/{slug}/apps/{uuid}/envs?key=FOO` | Delete env var. |
| GET | `/api/workspaces/{slug}/deployments/{deploymentUuid}/logs` | Deployment logs. |
### 4.3 Databases
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/databases` | List databases. |
| POST | `/api/workspaces/{slug}/databases` | Create a database (8 flavors). |
| GET | `/api/workspaces/{slug}/databases/{uuid}` | Database details + internal connection URL. |
| PATCH | `/api/workspaces/{slug}/databases/{uuid}` | Update fields. |
| DELETE | `/api/workspaces/{slug}/databases/{uuid}?confirm=<exact-name>` | Destroy database. |
### 4.4 Auth providers
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/auth` | List deployed auth providers + the allowlist. |
| POST | `/api/workspaces/{slug}/auth` | Provision a provider from the allowlist. |
| GET | `/api/workspaces/{slug}/auth/{uuid}` | Provider details. |
| DELETE | `/api/workspaces/{slug}/auth/{uuid}?confirm=<exact-name>` | Destroy provider. |
### 4.5 Domains (P5.1)
| Method | Path | Description |
|---|---|---|
| POST | `/api/workspaces/{slug}/domains/search` | Availability + pricing for up to 25 candidate names. |
| GET | `/api/workspaces/{slug}/domains` | List workspace-owned domains. |
| POST | `/api/workspaces/{slug}/domains` | Register a domain (idempotent per `(workspace, domain)`). |
| GET | `/api/workspaces/{slug}/domains/{domain}` | Full record + last 20 events. |
| POST | `/api/workspaces/{slug}/domains/{domain}/attach` | Create Cloud DNS zone, write records, update registrar NS, wire Coolify domain list. |
---
## 5. Gitea surface
AI agents **never** talk to the root Gitea admin token. They use the
workspace's dedicated bot user.
### 5.1 What the bot can do
- Fully own the `vibn-{slug}` org (added as the org's owner team).
- Read/write every repo in that org via its PAT.
- Push over SSH using the workspace's ed25519 deploy key (same keypair
Coolify uses to pull code).
- What it **cannot** do: touch any other org, the root admin surface, or
Gitea's `/admin/*` endpoints.
### 5.2 How to get the bot credentials
```http
GET /api/workspaces/{slug}/gitea-credentials
Authorization: Bearer vibn_sk_…
```
Returns:
```json
{
  "bot": { "username": "mark-bot", "token": "…" },
  "gitea": {
    "apiBase": "https://git.vibnai.com/api/v1",
    "host": "git.vibnai.com",
    "cloneUrlTemplate": "https://mark-bot:{{token}}@git.vibnai.com/vibn-mark/{{repo}}.git",
    "sshRemoteTemplate": "git@git.vibnai.com:vibn-mark/{{repo}}.git",
    "webUrlTemplate": "https://git.vibnai.com/vibn-mark/{{repo}}"
  },
  "workspace": { "slug": "mark", "giteaOrg": "vibn-mark" }
}
```
The PAT is stored **encrypted at rest** using AES-256-GCM with the
`VIBN_SECRETS_KEY` server secret; the decrypt step runs only on this endpoint.
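The `{{token}}` and `{{repo}}` placeholders in the returned templates can be filled with a small substitution helper. The sketch below is one possible approach; `fillTemplate` is a hypothetical name, and URL-encoding the values is an assumption made to keep the result a valid URL.

```typescript
// Replaces {{name}} placeholders with values from the map.
// Hypothetical helper; unknown placeholders are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (whole, key: string) => {
    const value = values[key];
    return value === undefined ? whole : encodeURIComponent(value);
  });
}

// Example with the cloneUrlTemplate shown above and a placeholder token.
const cloneUrl = fillTemplate(
  "https://mark-bot:{{token}}@git.vibnai.com/vibn-mark/{{repo}}.git",
  { token: "abc123", repo: "my-site" },
);
```

Because the filled clone URL embeds the PAT, it should not be logged or committed.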
### 5.3 Gitea operations via the standard Gitea API
Once the agent has `{bot.token, gitea.apiBase}`, it can call any standard
Gitea v1 endpoint as the bot, scoped to the workspace org. Common ones:
- `POST /orgs/{org}/repos` — create a repo.
- `PATCH /repos/{org}/{repo}` — update repo settings.
- `GET /repos/{org}/{repo}/contents/{path}` — read files.
- `PUT /repos/{org}/{repo}/contents/{path}` — write files (commits).
- `POST /repos/{org}/{repo}/pulls` — open PRs.
- `POST /repos/{org}/{repo}/branches` — create branches.
---
## 6. Domain policy
Every app gets an auto-generated domain under the workspace's namespace:
```
{app-slug}.{workspace-slug}.vibnai.com
```
For example, creating an app named `my-api` in workspace `mark` yields
`my-api.mark.vibnai.com` automatically — no DNS config, no cert work,
served by Coolify's wildcard Traefik.
### 6.1 What agents can do
- Accept the auto-generated domain (default path).
- Replace the domain set via `PATCH /apps/{uuid}/domains`, provided every
entry ends with `.{workspace-slug}.vibnai.com`.
### 6.2 What agents cannot do
- Point an app at a domain outside the workspace's namespace. The server
rejects this with 403 regardless of DNS state:
```json
{ "error": "Domain evil.com is not allowed; must end with .mark.vibnai.com",
"hint": "Use my-api.mark.vibnai.com" }
```
This is enforced by `isDomainUnderWorkspace()` in
[`lib/naming.ts`](./vibn-frontend/lib/naming.ts).
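The suffix rule can be sketched as follows. The real `isDomainUnderWorkspace()` is not shown in this document, so this version (including the lowercasing and the requirement of at least one label before the suffix) is an assumption about its behavior.

```typescript
// Illustrative suffix check; not the code in lib/naming.ts.
function isUnderWorkspace(domain: string, slug: string): boolean {
  const suffix = `.${slug}.vibnai.com`;
  const host = domain.trim().toLowerCase();
  // Require at least one label in front of the workspace suffix.
  return host.endsWith(suffix) && host.length > suffix.length;
}

// Mirrors the documented outcome: 403 if any entry falls outside the namespace.
function checkDomains(
  domains: string[],
  slug: string,
): { status: 200 | 403; rejected: string[] } {
  const rejected = domains.filter((d) => !isUnderWorkspace(d, slug));
  return { status: rejected.length === 0 ? 200 : 403, rejected };
}
```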
### 6.3 Custom (external) domains
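Domains that were not registered through Vibn are not exposed to AI agents. A
human can still add them through Coolify directly or through a future
human-gated UI. Apex domains registered via `domains.register` are attached
with `domains.attach` (see §3.6).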
---
## 7. Safety model
### 7.1 Tenant enforcement
Every resource-returning helper in `lib/coolify.ts` runs through
`ensureResourceInProject()`. It:
1. Trusts an explicit `project_uuid` on the resource if present, else
2. Fetches the project's environment ids via `GET /projects/{uuid}` and
verifies the resource's `environment_id` is in that set.
A token for `mark` that tries to read an app in `justine`'s project returns:
```json
{ "error": "Application <uuid> does not belong to project <mark-project-uuid>" }
```
with HTTP 403. Cross-workspace enumeration and access are not just
discouraged — they fail at the helper level.
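The two-step check can be sketched like this. The types, the function name, and the injected `fetchEnvironmentIds` callback are hypothetical stand-ins for the Coolify client in `lib/coolify.ts`, which is not reproduced here.

```typescript
// Illustrative version of the tenant check; not the code in lib/coolify.ts.
interface CoolifyResource {
  uuid: string;
  project_uuid?: string;
  environment_id?: number;
}

type TenantCheck = { ok: true } | { ok: false; status: 403; error: string };

async function checkResourceInProject(
  resource: CoolifyResource,
  projectUuid: string,
  fetchEnvironmentIds: (projectUuid: string) => Promise<number[]>,
): Promise<TenantCheck> {
  const denied: TenantCheck = {
    ok: false,
    status: 403,
    error: `Application ${resource.uuid} does not belong to project ${projectUuid}`,
  };
  // Step 1: trust an explicit project_uuid when the resource carries one.
  if (resource.project_uuid !== undefined) {
    return resource.project_uuid === projectUuid ? { ok: true } : denied;
  }
  // Step 2: otherwise compare environment_id with the project's environments.
  const envIds = await fetchEnvironmentIds(projectUuid);
  const envId = resource.environment_id;
  return envId !== undefined && envIds.includes(envId) ? { ok: true } : denied;
}
```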
### 7.2 Destructive operations
Every delete endpoint requires `?confirm=<exact-resource-name>`:
```
DELETE /apps/{uuid} → 409 "confirmation required"
DELETE /apps/{uuid}?confirm=wrong → 409 "confirmation required"
DELETE /apps/{uuid}?confirm=my-api → 200 deleted
```
This means an agent hallucinating a delete call cannot cost you the
resource — it must first know the exact name, which implies it just listed
or just created it.
**Volumes are kept by default** on delete. To also remove volumes, pass
`?volumes=delete` (apps/dbs) — this is opt-in, per-call, never the default.
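A minimal sketch of the confirm gate, assuming an exact, case-sensitive string match. The function and type names are hypothetical; the status codes and the `volumes=delete` opt-in follow the description above.

```typescript
// Illustrative confirm gate for delete endpoints.
interface DeleteDecision {
  status: 200 | 409;
  message: string;
  removeVolumes: boolean;
}

function decideDelete(
  resourceName: string,
  query: { confirm?: string; volumes?: string },
): DeleteDecision {
  // A missing or mismatched confirm value is rejected the same way.
  if (query.confirm !== resourceName) {
    return { status: 409, message: "confirmation required", removeVolumes: false };
  }
  // Volumes are kept unless the caller opts in on this specific call.
  return {
    status: 200,
    message: "deleted",
    removeVolumes: query.volumes === "delete",
  };
}
```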
### 7.3 Creation guardrails
- Repo-based apps (`apps.create` pathway 1) can only be created from repos in the workspace's Gitea org.
- Auth providers can only be created from the allowlist (see §3.5).
- Database flavors are restricted to the 8 Coolify supports.
- Env var keys must match `/^[A-Z_][A-Z0-9_]*$/` (no shell-escape tricks).
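For illustration, the env var key rule can be checked client-side before calling `apps.envs.upsert`. The regular expression is the one quoted above; the helper name is hypothetical.

```typescript
// Pattern copied from the guardrail above: uppercase letters, digits, and
// underscores only, and the first character may not be a digit.
const ENV_KEY_PATTERN = /^[A-Z_][A-Z0-9_]*$/;

function isValidEnvKey(key: string): boolean {
  return ENV_KEY_PATTERN.test(key);
}
```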
### 7.4 Secrets handling
- `VIBN_API_KEY` is only shown **once** on mint. Server keeps a sha256 hash.
- Gitea bot PATs are **encrypted at rest** (AES-256-GCM with
`VIBN_SECRETS_KEY`).
- The SSH private key is held by Coolify, not by Vibn; the public key is
pushed to the Gitea bot user's key list. Rotating is a re-provision.
- Agent prompts and Cursor rules include a "treat VIBN_API_KEY like a
password — never print or commit it" directive.
---
## 8. Worked examples
### 8.1 "Build me a Next.js app with a Postgres and Pocketbase auth"
From the agent's side, using MCP:
```json
// 1. Ensure a repo exists in the workspace org (standard Gitea API,
// using the bot PAT from gitea.credentials).
POST https://git.vibnai.com/api/v1/orgs/vibn-mark/repos
{ "name": "my-site", "private": true, "auto_init": true }
// 2. Create the Coolify app. Auto-domain my-site.mark.vibnai.com.
{ "tool": "apps.create",
"params": { "repo": "my-site", "ports": "3000", "instantDeploy": false } }
// 3. Provision a Postgres.
{ "tool": "databases.create",
"params": { "type": "postgresql", "name": "app-db" } }
// → returns { internalUrl: "postgres://…@<uuid>:5432/postgres" }
// 4. Wire the db URL into the app as an env var.
{ "tool": "apps.envs.upsert",
"params": { "uuid": "<app-uuid>", "key": "DATABASE_URL",
"value": "<internalUrl>" } }
// 5. Deploy Pocketbase as the auth layer.
{ "tool": "auth.create",
"params": { "provider": "pocketbase", "name": "auth" } }
// 6. First real deploy.
{ "tool": "apps.deploy", "params": { "uuid": "<app-uuid>" } }
// 7. Poll.
{ "tool": "apps.deployments", "params": { "uuid": "<app-uuid>" } }
// → [{ uuid, status: "finished" | "in_progress" | "failed" | "queued" }]
```
The agent hands the user back `https://my-site.mark.vibnai.com`.
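Step 7 of the sequence above (polling) might look like the following sketch. `callTool` is an injected, hypothetical transport (for example a wrapper around `POST /api/mcp`), the interval and attempt defaults are arbitrary, and treating the first array element as the newest deployment is an assumption.

```typescript
// Polls apps.deployments until the newest deployment is finished or failed.
type DeploymentStatus = "finished" | "in_progress" | "failed" | "queued";
interface Deployment {
  uuid: string;
  status: DeploymentStatus;
}
type CallTool = (tool: string, params: Record<string, unknown>) => Promise<unknown>;

async function waitForDeployment(
  callTool: CallTool,
  appUuid: string,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<DeploymentStatus> {
  const { intervalMs = 5000, maxAttempts = 60 } = opts;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const deployments = (await callTool("apps.deployments", { uuid: appUuid })) as Deployment[];
    const latest = deployments[0]; // assumed: newest deployment comes first
    if (latest && (latest.status === "finished" || latest.status === "failed")) {
      return latest.status;
    }
    // Still queued or in progress: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Deployment of ${appUuid} did not settle after ${maxAttempts} attempts`);
}
```

Bounding the number of attempts keeps an agent from polling forever if a deployment stays queued.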
### 8.2 "Add an `api` subdomain to my app"
```json
{ "tool": "apps.domains.set",
"params": {
"uuid": "<app-uuid>",
"domains": ["my-site.mark.vibnai.com", "api.mark.vibnai.com"]
} }
```
Valid — both end with `.mark.vibnai.com`. `evil.com` or `my-site.justine.vibnai.com`
would return 403.
### 8.3 "Delete the whole thing"
Agent must learn the resource names first (or it'll hit the confirm gate):
```json
// Learn the name.
{ "tool": "apps.get", "params": { "uuid": "<app-uuid>" } }
// → { name: "my-site", ... }
// Delete with matching confirm.
{ "tool": "apps.delete",
"params": { "uuid": "<app-uuid>", "confirm": "my-site" } }
```
Wrong confirm returns `409 "Confirmation required"`.
---
## 9. Error handling reference
| Status | Meaning | What the agent should do |
|---|---|---|
| 400 | Bad request body (invalid JSON, missing required field, invalid type). | Fix the body, retry. |
| 401 | No / bad bearer token. | Ask the user to mint a fresh key. |
| 403 | **Tenant mismatch** — resource belongs to another workspace, domain outside workspace namespace, or repo not in workspace org. | **Stop.** Do not retry with guessed values. Ask the user. |
| 404 | Resource not found (app/db/service/repo uuid wrong). | Re-list to find the right uuid. |
| 409 | Delete confirmation missing or wrong. | Fetch the resource name first, then retry with `confirm=<name>`. |
| 422 | Coolify validation failure (e.g. malformed domain). | Check the `details` field. |
| 502 | Upstream Coolify/Gitea error. | Retry with backoff. |
| 503 | Workspace not fully provisioned yet. | Call `POST /provision`, then retry. |
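The table above can be encoded as a lookup so an agent loop reacts consistently. The action labels and function names below are hypothetical; the status-to-behavior mapping follows the table.

```typescript
// Hypothetical action labels, one per row of the table above.
type AgentAction =
  | "fix-body-and-retry"
  | "ask-user-for-new-key"
  | "stop-and-ask-user"
  | "relist-resources"
  | "fetch-name-then-confirm"
  | "check-details-field"
  | "retry-with-backoff"
  | "provision-then-retry"
  | "unhandled";

const ACTION_BY_STATUS: Record<number, AgentAction> = {
  400: "fix-body-and-retry",
  401: "ask-user-for-new-key",
  403: "stop-and-ask-user",
  404: "relist-resources",
  409: "fetch-name-then-confirm",
  422: "check-details-field",
  502: "retry-with-backoff",
  503: "provision-then-retry",
};

function actionForStatus(status: number): AgentAction {
  return ACTION_BY_STATUS[status] ?? "unhandled";
}

// Only upstream and provisioning errors are retried without new input.
function isAutoRetryable(status: number): boolean {
  return status === 502 || status === 503;
}
```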
---
## 10. Versioning
The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
are append-only within a major version — agents can cache the tool list
safely for the duration of a conversation but should re-fetch on 404.
Current version: **2.4.8**.
- **1.x** — session-cookie-only MCP, no tenant keys.
- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
- **2.1** — create/update/delete for apps, 8 database flavors, auth
provider allowlist, domain policy enforcement, confirm-gated deletes.
- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
domain routing, runtime log tailing (`apps.logs`), in-container command
execution (`apps.exec`), and diagnostic `apps.update` responses.
- **2.3** — `apps.create` Docker-image and inline-composeRaw pathways (no
Gitea repo required for third-party apps), `apps.volumes.list` +
`apps.volumes.wipe` for self-service volume recovery.
- **2.4** — `apps.create` Coolify-template pathway (`{ template: "twenty" }`
etc.) for one-click deploy of 320+ vetted apps, plus `apps.templates.list`
/ `apps.templates.search` for catalog discovery.
- **2.4.1** — `apps.containers.up` / `apps.containers.ps` to bypass Coolify's
unreliable queued-start worker. `apps.create` (template + composeRaw
pathways) now auto-falls-back to direct `docker compose up -d` over SSH
when Coolify's queue stalls, so a single `apps.create` call really does
leave a running stack.
- **2.4.2** — `apps.create` no longer reports `started: false` when only a
sidecar (worker / scheduler) failed its `depends_on: service_healthy`
gate. We now probe the host with `docker ps` after `compose up -d` and
return `started: true` whenever any container of the stack is running,
surfacing the compose stderr in `startDiag` so agents can decide whether
to re-run `apps.containers.up` later. This matches the real-world
behavior of slow-booting apps like Twenty (worker waits ~3 min for
twenty's healthcheck, exceeds compose's default depends_on timeout).
- **2.4.3** — Auto-attached stack containers to the `coolify` proxy network
after `compose up`, fixing Traefik 503s on third-party apps.
- **2.4.4** — Made the proxy-network attach selective (only `traefik.enable=true`
containers) to avoid DNS aliasing collisions where Twenty's `postgres`
hostname resolved to `coolify-db`.
- **2.4.5** — Architectural overhaul of `apps.create` for service templates.
We no longer run `docker compose up -d` over SSH as a deployment fallback
(that bypassed Coolify's compose generation, causing internal services to
land on the wrong networks). Instead `apps.create` now:
  1. Calls Coolify's `start` and lets its queue do the full deploy
     (volumes, internal networking, env interpolation, healthchecks).
  2. Polls `service.applications[*].status` (the truthful per-app status
     field — `service.status` itself routinely lies as
     `starting:unknown` while containers are healthy).
  3. Applies three surgical post-deploy fixes that Coolify's own
     pipeline omits but its REST API does not expose:
     - rewrites `SERVICE_FQDN_*` / `SERVICE_URL_*` in the rendered
       `.env` so frontends that bake their backend URL into the SPA
       bundle (Twenty's `SERVER_URL`, etc.) point at the real
       custom domain instead of the auto-generated sslip.io URL;
     - injects the missing
       `traefik.http.services.<svc>.loadbalancer.server.port` label
       (Coolify generates the routing rules but forgets the port,
       so Traefik logs `error: port is missing` and returns 503);
     - connects `coolify-proxy` to the project's Docker network
       (Coolify writes a `caddy_ingress_network=<uuid>` hint label
       but never actually runs `docker network connect`), then
       force-recreates ONLY the public-facing container so the new
       env+label apply, and restarts the proxy so Traefik
       re-discovers.

  The response shape gains:
  - `reachable` — boolean, true when `https://<fqdn>` answers 2xx/3xx
  - `appStatus` — the truthful per-application status from Coolify
  - `postDeploy` — step-by-step diagnostic for each of the three fixes

  The previous `started`/`startMethod`/`startDiag` fields are kept for
  back-compat. Internal services (Postgres, Redis, worker) stay on
  their isolated project network — fixing the `password authentication
  failed` regression introduced in 2.4.4.
- **2.4.6** — Two fixes for transient Coolify queue lag observed in
2.4.5:
  - **Polling no longer false-fails on early `exited` status.**
    Coolify's queue worker can take 60-120s to dequeue a `start`
    request; during that window `service.applications[*].status`
    returns the stale `exited` (= "never started") state. Previously
    we treated that as terminal failure after 90s. Now we require
    *evidence of activity* (`starting:*` or `running:*` was seen at
    least once) before treating subsequent `exited` reports as
    terminal. Until activity is observed, the loop just keeps polling
    up to the 8-min health timeout. Eliminates the case where
    `apps.create` returned `started: false` on a stack that was
    actually about to come up healthy.
  - **`apps.repair`** — new tool. Re-runs the three post-deploy
    patches (env rewrite, port label, proxy network attach + recreate
    + proxy restart) against an existing service without recreating
    it. Useful when a deploy succeeded mechanically but ended up
    serving Traefik 503 or Mixed Content errors, or whenever a user
    rotates a custom domain. Params: `{ uuid, fqdn, publicAppName,
    port? }`. Returns `{ reachable, postDeploy: { steps }, probe }`.
- **2.4.7** — `applyCoolifyPostDeployFixes` now schedules the
`coolify-proxy` restart (step 5) as a fire-and-forget background
job (`(sleep 3 && docker restart coolify-proxy) &`) instead of
blocking on it synchronously. The proxy restart kills any in-flight
TCP connection through the gateway — including the very request
that's running `apps.repair` / `apps.create` — so doing it inline
caused the agent to see a curl framing error (exit 16) right when
the work was in fact succeeding. Now the SSH command returns within
~50ms, the HTTP response is delivered, and Traefik re-discovers
labels ~3s later.
- **2.4.8** — Massive simplification of post-deploy logic. Coolify's
template engine is fully capable of generating correct Traefik
labels and `SERVICE_FQDN_<APP>` / `SERVICE_URL_<APP>` env vars **if
the URL passed to `setServiceDomains` includes the upstream port**
(the "Required Port" hint in Coolify's UI: `https://crm.example.com:3000`,
not `https://crm.example.com`). 2.4.5 through 2.4.7 were missing that
detail, which is why they had to re-write the `.env` and inject
the loadbalancer port label as a workaround.
In 2.4.8 `apps.create` reads `template.port` from the catalog and
passes `https://<fqdn>:<port>` to `setServiceDomains`. Coolify then:
- generates `traefik.http.services.<svc>.loadbalancer.server.port=<port>`
automatically;
- rewrites `.env` so `SERVICE_FQDN_<APP>=<fqdn>` and
`SERVICE_URL_<APP>=https://<fqdn>` (no sslip.io leak);
- keeps `SERVICE_FQDN_<APP>_<PORT>` magic placeholders correctly
pointed at the user's host:port.
All that's left is the one thing Coolify still skips: connecting
`coolify-proxy` to the resource's project Docker network. So
`applyCoolifyPostDeployFixes` is now ~30 lines (down from ~200) and
no longer SSH-runs an embedded Python script inside a
`python:3-alpine` container. The `CoolifyPostDeployResult.steps`
shape gains/keeps `proxyNetwork` + `proxyRestart` only; the old
`envRewrite` / `portLabel` / `recreate` step keys are removed.
`apps.repair` retains its API (`{ uuid, fqdn, publicAppName, port? }`)
but `port` is now informational only (not required for the helper
to function).
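
The "evidence of activity" polling rule described under 2.4.6 can be sketched as a small decision function. This is an illustration under assumptions, not the code in `apps.create`: the type and function names are hypothetical, and `running:healthy` as the success status is assumed (only `starting:*`, `running:*`, and `exited` appear in the notes above).

```typescript
// Illustrative only; names are hypothetical and "running:healthy" is an assumed status string.
type PollVerdict = "keep-polling" | "healthy" | "failed" | "timed-out";

interface PollState {
  sawActivity: boolean; // true once any starting:* or running:* status was observed
}

const HEALTH_TIMEOUT_MS = 8 * 60 * 1000; // the 8-minute health timeout from 2.4.6

function stepPoll(state: PollState, status: string, elapsedMs: number): PollVerdict {
  if (status.indexOf("starting:") === 0 || status.indexOf("running:") === 0) {
    state.sawActivity = true;
  }
  if (status === "running:healthy") return "healthy";
  // `exited` is terminal only after the stack has shown signs of life;
  // before that it is the stale "never started" state from the queue lag.
  if (status.indexOf("exited") === 0 && state.sawActivity) return "failed";
  if (elapsedMs >= HEALTH_TIMEOUT_MS) return "timed-out";
  return "keep-polling";
}
```
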
---
## 11. Troubleshooting compose apps
Most real-world app failures fall into a small number of patterns. The
recipes below are the canonical diagnostic flow for an agent operating
on behalf of a user.
### 11.1 "Deployment succeeds but the app keeps restarting"
Agents should NOT trust Coolify's deployment status alone. A successful
build + healthcheck-pending response usually means the containers came
up but the app logic is crashing. Investigate with:
1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
services indicate containers never ran) and per-service stderr.
2. If the logs show repeated DB errors like `relation "xxx" does not
exist` or `pq: no such table`, the app skipped its migration step.
This is common for Docker Compose apps whose migrations run only in
a separate `worker` command rather than in the `server` service.
3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
   ```json
   {
     "action": "apps.exec",
     "params": {
       "uuid": "<app-uuid>",
       "service": "server",
       "command": "yarn command:prod database:migrate:prod",
       "timeout_ms": 300000
     }
   }
   ```
4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
wait for the next restart) and verify the container reports
`healthy`.
### 11.2 "`apps.update` returned success but nothing changed"
Check the `applied` / `ignored` / `rerouted` arrays in the response.
The most common reroutes:
- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
the workspace's Gitea PAT embedded).
- `build_pack` — changing this mid-life for an existing app is not
supported. Recreate the app.
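
A minimal sketch of how an agent might act on the `rerouted` array, assuming it holds plain field names (the exact response schema is not shown here, so `UpdateResponse`, `REROUTE_HINTS`, and `followUps` are hypothetical):

```typescript
// Hypothetical helper: maps fields reported in `rerouted` to the tool named in the list above.
interface UpdateResponse {
  applied: string[];
  ignored: string[];
  rerouted: string[];
}

const REROUTE_HINTS: Record<string, string> = {
  fqdn: "apps.domains.set",
  domains: "apps.domains.set",
  docker_compose_domains: "apps.domains.set",
  git_repository: "apps.rewire_git",
  build_pack: "recreate the app (changing build_pack in place is not supported)",
};

// Build one human-readable follow-up per rerouted field, without duplicates.
function followUps(res: UpdateResponse): string[] {
  const hints: string[] = [];
  for (const field of res.rerouted) {
    const known = Object.prototype.hasOwnProperty.call(REROUTE_HINTS, field);
    const text = known
      ? field + " -> " + REROUTE_HINTS[field]
      : field + " -> no known reroute; inspect the response";
    if (hints.indexOf(text) === -1) hints.push(text);
  }
  return hints;
}
```

An empty result together with a non-empty `ignored` array suggests the update was dropped rather than rerouted.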
### 11.3 "Compose app is up but the domain 502s"
Coolify's API treats compose and single-container apps differently:
compose apps use `docker_compose_domains` (array of `{name, domain}`),
single-container apps use `domains` (comma-separated string).
`apps.domains.set` handles both, but if you're seeing a 502:
1. `apps.domains.list { uuid }` — confirm the domain is actually
attached to a **service** (not just the app).
2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
— verify the upstream container is listening.
3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
startup errors like `EADDRINUSE` or config failures.
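
To make the two domain shapes concrete, here is a hedged sketch of a payload builder. Only the field names `docker_compose_domains` and `domains` come from the text above; `ComposeDomain`, `DomainPayload`, and `buildDomainPayload` are hypothetical, and `apps.domains.set` already does this for you.

```typescript
// Illustrative payload builder; type and function names are hypothetical.
interface ComposeDomain { name: string; domain: string; }

type DomainPayload =
  | { docker_compose_domains: ComposeDomain[] }
  | { domains: string };

function buildDomainPayload(kind: "compose" | "single", domains: string[], service?: string): DomainPayload {
  if (kind === "compose") {
    // Compose apps attach each domain to a named service.
    if (!service) throw new Error("compose apps need the target service name");
    const svc: string = service;
    return { docker_compose_domains: domains.map((d) => ({ name: svc, domain: d })) };
  }
  // Single-container apps take one comma-separated string.
  return { domains: domains.join(",") };
}
```
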
### 11.4 "Choosing the right `apps.create` pathway"
| Situation | Use |
|---|---|
| User's own code lives in their Gitea org | `repo` (pathway 1) |
| Single-container third-party app (nginx, redis, a docker image) | `image` (pathway 2) |
| Custom multi-service stack (no upstream template exists) | `composeRaw` (pathway 3) |
| **Popular third-party app (Twenty, n8n, Supabase, Ghost, WordPress, …)** | **`template` (pathway 4) — strongly preferred** |
**Always check `apps.templates.search { query: "<app name>" }` first.** Coolify ships 320+ vetted one-click templates. Each one has tested env defaults, healthchecks, `depends_on` graphs, and the right volume mounts. The same app deployed via `composeRaw` will hit application-specific quirks (URL validation, DB bootstrap order, secret generation) that the template author already solved.
**Never** create a Gitea repo just to host a third-party app's compose file.
**Recipe — deploying any popular app in 3 calls:**
```json
// 1. Find the right template slug
{ "action": "apps.templates.search", "params": { "query": "twenty" } }
// → { "items": [{ "slug": "twenty", "slogan": "Twenty is a CRM…", "tags": ["crm","self-hosted"], "port": 3000 }] }
// 2. Deploy it
{ "action": "apps.create", "params": { "template": "twenty", "name": "crm" } }
// → { "uuid": "...", "domain": "crm.<slug>.vibnai.com", "started": true,
// "note": "First boot may take 1-5 min while Coolify pulls images and runs migrations." }
// 3. Watch it come up
{ "action": "apps.logs", "params": { "uuid": "...", "lines": 200 } }
```
For `composeRaw` (only when no template exists), fetch the app's official `docker-compose.yml` (from GitHub/DockerHub) and pass it inline. Override any hard-coded image tags with pinned versions for reproducibility.
**Browsing the catalog** with `apps.templates.list { tag: "ai" }` returns all AI/ML templates; `{ tag: "crm" }` returns CRMs; etc. Useful when the user asks "what self-hosted analytics tools can I deploy?" or similar open-ended questions.
### 11.5 "Compose app fails on second+ deploy — relation/table does not exist"
Classic stale volume problem. Sequence of events:
1. First deploy: Postgres starts and auto-creates an empty `default` database (from `POSTGRES_DB` env var)
2. App server starts, tries to `CREATE DATABASE` or `DROP DATABASE` inside a transaction → Postgres rejects it
3. Deploy fails, containers stop — but the volume persists with the half-initialized DB
4. Second deploy: Postgres finds existing data, skips init — but schema is corrupt/incomplete
5. Server errors cascade forever
**Fix:**
```json
// Step 1: find the volume
{ "action": "apps.volumes.list", "params": { "uuid": "<app-uuid>" } }
// → { "volumes": [{ "name": "abc123_db-data", "sizeBytes": 8192 }] }
// Step 2: wipe it
{ "action": "apps.volumes.wipe", "params": { "uuid": "<app-uuid>", "volume": "abc123_db-data", "confirm": "abc123_db-data" } }
// Step 3: redeploy clean
{ "action": "apps.deploy", "params": { "uuid": "<app-uuid>" } }
```
If Postgres still auto-creates the database before the app server runs migrations, use `apps.exec` to drop it outside a transaction:
```json
{ "action": "apps.exec", "params": { "uuid": "<app-uuid>", "service": "db", "command": "psql -U postgres -c 'DROP DATABASE IF EXISTS \"default\";'" } }
```
Then redeploy.
### 11.7 "Healthcheck times out on first deploy"
Docker Compose healthchecks have a `start_period` grace window. Apps
that run long-running migrations on first boot (Twenty, Directus,
older Strapi versions) need a `start_period` that covers the cold
start, typically 120-600s.
- Fix at the compose level: edit the repo's `docker-compose.yml` to
set `healthcheck.start_period: 300s` on the affected service, commit,
push, `apps.deploy`.
- Alternatively, handle migrations out-of-band via `apps.exec` and let
the default healthcheck succeed instantly.
### 11.8 "I can't tell what's inside the container"
`apps.exec` is the escape hatch. Useful shell one-liners:
| Goal | Command |
|---|---|
| List running processes | `ps -ef` |
| Show env vars | `env \| sort` |
| Check file exists | `ls -la /path/to/file` |
| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` |
| Tail an app's internal log | `tail -200 /var/log/app.log` |
| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
| Inspect filesystem diff vs image | `find /app -newer /tmp/marker -type f 2>/dev/null` |
Output is capped at 1 MB by default (raise it with `max_bytes`). For
commands that may exceed the wall-clock timeout, raise `timeout_ms`
(max 600000 = 10 minutes).
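
A small sketch of applying those limits before calling `apps.exec`. The parameter names `timeout_ms` and `max_bytes` are the ones used above; `ExecParams` and `withLimits` are hypothetical, and whether the server rejects or clamps an oversized `timeout_ms` is not stated here, so the helper clamps defensively.

```typescript
// Sketch only: clamps the requested timeout to the documented 10-minute ceiling.
const MAX_TIMEOUT_MS = 600000;

interface ExecParams {
  uuid: string;
  service: string;
  command: string;
  timeout_ms?: number;
  max_bytes?: number;
}

// Copy the base params and add limits only when the caller asked for them,
// so server-side defaults still apply otherwise.
function withLimits(base: ExecParams, expectedRuntimeMs?: number, expectedOutputBytes?: number): ExecParams {
  const out: ExecParams = { uuid: base.uuid, service: base.service, command: base.command };
  if (expectedRuntimeMs !== undefined) {
    out.timeout_ms = Math.min(MAX_TIMEOUT_MS, Math.ceil(expectedRuntimeMs));
  }
  if (expectedOutputBytes !== undefined) {
    out.max_bytes = Math.ceil(expectedOutputBytes);
  }
  return out;
}
```
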
### 11.9 "The agent wants to run something interactively"
It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
no session resumption. For migrations and CLI invocations this is the
right shape. For genuinely interactive work (a debug shell), the
operator needs SSH + `docker exec -it` directly — outside the
platform's AI surface.
---
## 12. Where to look in the code
- `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
- `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
- `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
- `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
- `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
docker access (`apps.logs`, `apps.exec`). Uses a dedicated
`vibn-logs` user on the Coolify host with docker-group membership
and no shell.
- `lib/coolify-containers.ts` — container enumeration + service
resolution, shared between logs and exec paths.
- `lib/coolify-logs.ts` — compose-aware log tailing.
- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
timeout, output caps, and audit logging.
- `lib/naming.ts` — domain policy, slugify, SSH URL templates.
- `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
- `app/api/workspaces/[slug]/…` — REST surface.
- `app/api/mcp/route.ts` — MCP dispatcher and tool implementations.
- `components/workspace/WorkspaceKeysPanel.tsx` — settings UI.
# Vibn AI Capabilities (Condensed)
> **Note:** The definitive, ground-truth list of AI capabilities and instructions is maintained in the codebase at `vibn-frontend/lib/ai/vibn-tools.ts`.
## Core Architecture
Vibn uses an MCP (Model Context Protocol) adapter to expose backend systems to the AI.
The primary systems are:
1. **Coolify:** For orchestrating Docker containers, PostgreSQL databases, reverse proxies (Traefik), and deploying third party apps.
2. **Gitea:** For hosting source code and managing repositories.
3. **Dev Containers:** Persistent, per-project Docker environments (`vibn-dev-*`) where the AI can read, write, and execute code interactively before shipping.
## Tool Categories
- **Workspace & Identity:** Retrieve Gitea credentials and workspace metadata.
- **Projects & Planning:** Create projects, read/write objective documents (`plan_vision_set`), manage tasks, log decisions.
- **File System (`fs_*`):** Read, write, edit (with line-number granularity), grep, and tree codebase directories.
- **Shell (`shell_exec`):** Run terminal commands inside the dev container (e.g. `npm install`).
- **Dev Servers (`dev_server_*`):** Spin up background processes (like `npm run dev`), view their logs, and return live Preview URLs (`*.preview.vibnai.com`) backed by Traefik.
- **Apps & Databases:** Create, list, configure, and delete Coolify applications and databases.
- **Domains & Auth:** Manage DNS records via OpenSRS and deploy auth providers (NextAuth, Supabase, etc).
- **GitHub & Web (`github_*`, `http_fetch`):** Source open-source reference material, read documentation, and import repositories.
*Refer to the system prompt in `vibn-frontend/app/api/chat/route.ts` for exact rules on how the AI should behave.*


@@ -73,14 +73,6 @@ a slow loop until this lands.
| # | Task | Owner | Effort | Status |
|---|---|---|---|---|
| 1.1 | Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
| 1.2 | Switch Namecheap nameservers to Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
| 1.3 | Wait for propagation; verify `dig @1.1.1.1` from multiple resolvers | AI | 30-120 min | ✓ done — `34.19.250.135` from CF + Google resolvers |
| 1.4 | Generate Cloudflare API token (DNS edit, `vibnai.com` only) | Mark | 2 min | ✓ done — stored in `.coolify.env` |
| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy` |
| 1.6 | Test wildcard cert issues for `*.preview.vibnai.com` (curl, browser) | AI | 10 min | ✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert |
| 1.7 | Wire `dev_server.start` to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done — pre-baked labels for ports 3000-3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns` |
| 1.8 | Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container | AI | 30 min | ✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below. |
**Definition of done:** ✅ AI says "open a Vite dev server", user clicks the URL,
sees Vite's welcome page, edits a file via `fs.edit`, change appears in
@@ -111,13 +103,6 @@ server: {
|---|---|---|---|---|
| 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1-2 hrs | Likely a server action / API route returning twice |
| 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1-2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |
| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | ✓ done 2026-05-01 | `@sentry/nextjs` v10 wired in `vibn-frontend`. `instrumentation.ts` (server+edge), `instrumentation-client.ts` (browser w/ Session Replay free tier, all text masked), `app/global-error.tsx`, `next.config.ts` wrapped with `withSentryConfig`. `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN` in Coolify env, with matching `ARG` lines in `vibn-frontend/Dockerfile`. End-to-end verified via `/sentry-example-page` 2026-05-01: client + server errors capture, breadcrumbs work, **stack traces de-minify to real filenames** (`app/sentry-example-page/page.tsx:49`). |
| 2.4 | Wire deployment-failed Coolify webhook → Slack/email | AI | ✓ done 2026-05-01 | Slack webhook wired into `slack_notification_settings` for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping — confirmed in user's Slack. |
| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`. |
| 2.6 | Bake `HEALTHCHECK 127.0.0.1` into `vibn-frontend/Dockerfile` so future apps inherit | AI | ✓ done 2026-05-01 | Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap |
| 2.7 | Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap | AI | ✓ done 2026-05-01 | Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added. |
| 2.8 | **Tool-error recovery middleware** (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | `vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop. |
| 2.9 | **Sentry-as-product loop** (SENTRY_AS_PRODUCT.md) — auto-provision per-project Sentry, bake into scaffolds, expose error feed to AI as MCP tools, auto-surface unresolved errors at chat-turn start | AI | ✓ done 2026-05-01 | All 4 stages shipped: (1) `lib/integrations/sentry.ts` provisions per-project Sentry under shared `vibnai` org from `POST /api/projects/create` and lazily on `apps.create`; injects `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_AUTH_TOKEN` into Coolify app env. (2) `lib/scaffold/sentry-snippets.ts` ships canonical Next.js + Vite snippets; AI system prompt instructs it to wire Sentry on every new app; `projects.get` returns `sentry: {slug, dsn}`. (3) Three MCP tools: `project_recent_errors`, `project_error_detail`, `project_error_resolve` (tenant-safe). (4) `app/api/chat/route.ts` injects `[PROJECT HEALTH]` block at chat-turn start when ≥2-occurrence unresolved issues exist in last 6h. End-to-end verification deferred to smoke test (4.1). |
**Definition of done:** force-fail a route in staging → Sentry alert lands in
< 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
@@ -136,13 +121,9 @@ or gets out of the way. No screens that exist "to teach the data model".
| 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews" | AI | 4 hrs | Mark flagged earlier |
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | ✓ done 2026-05-01 | `components/project/project-header-urls.tsx`: bumped MAX_VISIBLE to 3, replaced title-tooltip with click-to-open popover (closes on outside-click + Escape). Each row in the popover is a real clickable link with icon + label + host. |
| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | `components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `<COOLIFY_URL>/project/<coolifyProjectUuid>` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.) |
| 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
| 3.7 | **Scope-doc upload in Plan tab** — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat. |
| 3.8 | **"Stop at something tangible" — three layers** | AI | partially done | Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof. |
| 3.8a | System-prompt rule: dedicated "Stop at something the user can see" section + tightened build-me-X recipe so `previewUrl` is the explicit stopping point | AI | ✓ done 2026-05-04 | `app/api/chat/route.ts` `buildSystemPrompt`. For multi-service stacks, instructs AI to start the user-facing service first even if other services aren't done. |
| 3.8b | ~~Persistent quick-action chips above the chat input~~ **REVERTED 2026-05-04** | AI | reverted | Tried it; pulled it. The chip menu was prescriptive ("here's what to type") which conflicts with the principle that the AI should drive toward the goal without presenting the user a menu of homework. Welcome-screen suggested prompts kept (different context — empty conversation, user genuinely needs a starting nudge). The `sendMessage(override)` refactor + welcome-screen auto-send shipped from this work survived; only the composer chip row was removed. |
| 3.8c | Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker | AI | 1 hr | Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message. |
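
The 3.8c safety net could be tracked with a per-turn counter along these lines. This is a sketch under assumptions: the tool names come from the row above, while `TurnStats`, `recordToolCall`, `recoveryInstruction`, and the instruction wording are hypothetical. It also counts every `fs_write`, whereas the task only targets source files.

```typescript
// Hypothetical tracker for roadmap item 3.8c; names and wording are illustrative.
const FS_WRITE_THRESHOLD = 10;

interface TurnStats {
  fsWrites: number;          // fs_write calls seen this turn
  startedSomething: boolean; // true once dev_server_start or apps_deploy ran
}

function recordToolCall(stats: TurnStats, tool: string): void {
  if (tool === "fs_write") stats.fsWrites += 1;
  if (tool === "dev_server_start" || tool === "apps_deploy") stats.startedSomething = true;
}

// Returns the synthetic instruction to append, or null when none is needed.
function recoveryInstruction(stats: TurnStats): string | null {
  if (stats.fsWrites >= FS_WRITE_THRESHOLD && !stats.startedSomething) {
    return "You wrote " + stats.fsWrites +
      " files without starting a dev server or deploying. Start one now or explain the blocker.";
  }
  return null;
}
```
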
**Definition of done:** a stranger lands on every tab in turn. None of them
@@ -160,10 +141,8 @@ concrete next action.
|---|---|---|---|---|
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. **Runbook below.** |
| 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
| 4.3 | "Delete project" UI in project settings (and underlying Coolify cleanup) | AI | ✓ done 2026-05-04 | `app/api/projects/delete/route.ts` now cascades: stops + deletes the dev container service (with volumes + docker-cleanup), deletes every linked Coolify resource via `fs_project_resources`, deletes the per-project Coolify project shell when no other Vibn project shares it, drops `fs_project_dev_containers` + `fs_project_resources` rows, unlinks `fs_sessions`, then deletes `fs_projects`. Gitea repo + Sentry project are deliberately preserved (returned in the response so the user can recover code/error history). Failure inside cascade is logged but doesn't abort; partial failure leaves the orphan in Coolify for manual cleanup, which is strictly better than rolling back to a half-state. Smoke test 2026-05-04 found 2 ghost containers from previously-deleted projects consuming the user's full quota; cleaned up manually + shipped this fix to prevent recurrence. |
| 4.4 | "Delete workspace" UI — same | AI | 1 hr | |
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error | AI | ✓ done 2026-05-01 | `lib/quotas.ts`: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via `VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE` / `VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE` env. Hits return HTTP 402 with structured payload; AI's error-recovery middleware has a `workspace-quota-exceeded` rule that explains the cap to the user without blind retries. Wired into `POST /api/projects/create` and `lib/dev-container.ts` ensure/resume paths. |
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |
@@ -179,13 +158,11 @@ that aren't covered above.
| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 5.1 | Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`) | AI | ✓ done 2026-05-01 | Image `vibn-dev:latest` built 2026-04-30 on Coolify host (589 MB, last Dockerfile change Apr 28 so build is current). Smoke-tested as `vibn` user: ripgrep, git, mise all functional. Toolchains install on demand via mise. |
| 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
| 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
| 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1-2 days | The actual proof Path B works |
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
| 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
| 5.7 | **Persistent dev container ↔ Gitea wiring** — auto-clone repo into `/workspace/<slug>/` on first chat turn; auto-commit + push at end of every turn so AI work surfaces in the Product tab without manual `gitea_*` calls | AI | ✓ done 2026-05-04 | `lib/dev-container-git.ts` (`ensureProjectRepoCloned`, `commitAndPushIfDirty`) wired into `app/api/chat/route.ts` pre-loop + turn-end. Tri-state probe (`git` / `dir` / `absent`) so projects with files-but-no-git auto-heal on next turn. Production fix shipped today: `GITEA_USERNAME` was missing from prod env so `isGiteaConfigured()` silently no-op'd; added the env value AND a defensive fallback to `GITEA_ADMIN_USER` in code. Backfilled `vibn-mark/manifest` repo manually from the dev container after the env fix. Smoke-tested by inspecting `/workspace/manifest/` over SSH bridge — 64 tracked files pushed, all 6 phase directories present. |
**Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
vs. Path A baseline, ≥80% success rate across the 10 reference prompts.


@@ -1,292 +1,5 @@
# Agent telemetry & live execution stream — project spec
# Agent Telemetry Streaming (Historical)
This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agent's clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
> **Note:** This historical spec covered the implementation of real-time streaming for the AI agent loop (Server-Sent Events) and timeline rendering.
---
## 1. Why this exists
### Current behavior (baseline)
| Surface | How progress reaches the user | Limits |
|--------|------------------------------|--------|
| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | Latency, reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
### Product intent
- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status.
- **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.
---
## 2. Goals
1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel).
3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
4. **Single conceptual model** (`AgentEvent`) usable by:
- Build → **Agent** tab (sessions),
- **Job** flows (create/analyze-style),
- optionally **orchestrator** long runs later.
5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.
### Non-goals (for v1)
- Full **OpenTelemetry** export (optional later).
- **Real-time collaborative** multi-user cursors on the same session.
- Merging **claude-code-fork**—this spec is **API + UI + persistence** only.
---
## 3. Concept: `AgentEvent`
### Core shape (suggested)
```ts
type AgentEvent = {
  seq: number;            // monotonic per stream (session_id or job_id)
  ts: string;             // ISO-8601
  runId: string;          // session UUID or job id; ties events to a run
  runKind: 'session' | 'job';
  phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';
  type: AgentEventType;
  payload: Record<string, unknown>; // type-specific
};

type AgentEventType =
  | 'run.started'
  | 'run.phase'           // e.g. planning, executing, committing
  | 'llm.turn.start'
  | 'llm.turn.end'
  | 'tool.start'
  | 'tool.end'
  | 'tool.output'         // chunked stdout/stderr if needed
  | 'safety.block'        // policy / protected path / command denied
  | 'file.changed'        // maps to today's changed_files semantics
  | 'git.commit'
  | 'deploy.triggered'
  | 'deploy.status'
  | 'error'
  | 'run.completed'
  | 'handoff'             // v2: parent → child agent
  | 'child_job.started'   // v2: linked run id
  ;
```
### Mapping from today's session `outputLine`
| Today (`outputLine.type`) | Suggested event(s) |
|---------------------------|--------------------|
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
| `error` | `error` + optional `safety.block` if policy-driven |
| `done` | `run.completed` |
Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.
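The mapping table can be sketched as a pure function. This is a minimal illustration, not the runner's actual code: the `OutputLine` shape and the `policyBlocked` flag are assumptions, and only the `type` values and the target event names come from this spec.

```ts
// Hypothetical shape of a legacy session output line (assumption).
type OutputLine = {
  type: 'step' | 'info' | 'stdout' | 'stderr' | 'error' | 'done';
  text: string;
  policyBlocked?: boolean; // assumption: set when the policy layer caused the error
};

type MappedEvent = { type: string; payload: Record<string, unknown> };

// Translate one legacy output line into zero or more structured events.
function mapOutputLine(line: OutputLine): MappedEvent[] {
  switch (line.type) {
    case 'step':
    case 'info':
      return [{ type: 'run.phase', payload: { message: line.text } }];
    case 'stdout':
    case 'stderr':
      // Keep the original stream name so the UI can style stderr differently.
      return [{ type: 'tool.output', payload: { stream: line.type, message: line.text } }];
    case 'error': {
      const events: MappedEvent[] = [{ type: 'error', payload: { message: line.text } }];
      if (line.policyBlocked === true) {
        events.push({ type: 'safety.block', payload: { message: line.text } });
      }
      return events;
    }
    case 'done':
      return [{ type: 'run.completed', payload: { message: line.text } }];
  }
}
```

Returning an array keeps the "error plus optional `safety.block`" row expressible without special cases at the call site.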
---
## 4. Architecture (high level)
```mermaid
flowchart LR
subgraph runner [vibn-agent-runner]
RA[runSessionAgent / runAgent]
EMIT[emitAgentEvent]
end
subgraph api [vibn-frontend Next.js]
ING[POST internal ingest or PATCH extend]
DB[(Postgres agent_events)]
SSE[SSE GET /api/.../stream]
end
subgraph browser [Browser]
UI[Timeline + live log]
end
RA --> EMIT
EMIT -->|HTTPS + secret or mTLS| ING
ING --> DB
UI -->|EventSource| SSE
SSE --> DB
```
**Principles**
- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today's session model).
- Alternatively, runner could expose **SSE directly**—usually worse for **auth**, **CORS**, and **one domain** for the product. Prefer **Next as SSE endpoint** reading from DB.
---
## 5. Backend: `vibn-agent-runner`
### 5.1 Emit from execution paths
| Location | Action |
|----------|--------|
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. |
| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |
### 5.2 Transport runner → Next
**Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append):
- Headers: `x-agent-runner-secret` (same as today's PATCH).
- Body: single event or small batch `{ events: AgentEvent[] }` with server-assigned `seq` to avoid races.
**Option B:** Runner writes to **Redis/Postgres** directly—couples runner to DB credentials; only do if you already run runner inside the same trust zone with DB URL.
### 5.3 Jobs store
- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
---
## 6. Backend: `vibn-frontend` (Next.js)
### 6.1 Persistence
**New table (example): `agent_run_events`**
| Column | Notes |
|--------|--------|
| `id` | UUID |
| `run_id` | Session id or job id (text) |
| `run_kind` | `'session' \| 'job'` |
| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
| `project_id` | Nullable for jobs if not scoped |
| `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` |
| `created_at` | default now() |
Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`).
**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition.
### 6.2 SSE route (example contract)
- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
- Auth: session cookie / same as GET session (user must own project).
- Query: `?afterSeq=123` for replay.
- Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
- Heartbeat comments every ~15–30s to keep proxies alive.
For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.
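The wire format in this contract is small enough to sketch directly. The helper names below are hypothetical. The optional `id:` line is an addition not required by the contract above; it is included only because the SSE format supports it alongside the `?afterSeq` query parameter.

```ts
type StoredEvent = { seq: number; type: string; payload: Record<string, unknown> };

// One SSE message per event. JSON.stringify escapes newlines, so the body
// always fits on a single `data:` line.
function toSseFrame(event: StoredEvent): string {
  return 'id: ' + event.seq + '\ndata: ' + JSON.stringify(event) + '\n\n';
}

// Lines starting with ":" are comments in the SSE format; clients ignore them,
// but they keep idle connections from being closed by proxies.
function heartbeatFrame(): string {
  return ': heartbeat\n\n';
}

// Replay everything after the client's last seen seq, in order.
function replayFrames(events: StoredEvent[], afterSeq: number): string {
  return events
    .filter((e) => e.seq > afterSeq)
    .sort((a, b) => a.seq - b.seq)
    .map((e) => toSseFrame(e))
    .join('');
}
```

A route handler would write `replayFrames(...)` first, then append live frames and a heartbeat on a timer.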
### 6.3 Ingest route (runner-only)
- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
- Validates `x-agent-runner-secret`.
- Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`).
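The ingest semantics (server-assigned `seq`, safe under runner retries) can be illustrated with an in-memory stand-in for the table. This is a sketch only: a real implementation would do the same work inside a Postgres transaction or under an advisory lock, and the `eventId` field assumes the runner-supplied dedupe key that is still listed as an open decision.

```ts
type IncomingEvent = { eventId: string; type: string; payload: Record<string, unknown> };
type StoredRow = IncomingEvent & { runId: string; seq: number };

class InMemoryEventLog {
  // Prototype-free objects so arbitrary run ids are safe as keys.
  private streams: Record<string, StoredRow[]> = Object.create(null);
  private seen: Record<string, true> = Object.create(null);

  // Append a batch; returns only the rows that were actually inserted.
  append(runId: string, batch: IncomingEvent[]): StoredRow[] {
    let stream = this.streams[runId];
    if (stream === undefined) {
      stream = [];
      this.streams[runId] = stream;
    }
    const inserted: StoredRow[] = [];
    for (const event of batch) {
      const key = runId + ':' + event.eventId;
      if (this.seen[key] === true) continue; // runner retry: already stored
      this.seen[key] = true;
      // The server, not the runner, picks seq: next position in this run's stream.
      const row: StoredRow = { ...event, runId, seq: stream.length + 1 };
      stream.push(row);
      inserted.push(row);
    }
    return inserted;
  }

  // Range read used for replay (`seq > afterSeq`).
  since(runId: string, afterSeq: number): StoredRow[] {
    return (this.streams[runId] || []).filter((row) => row.seq > afterSeq);
  }
}
```

Because duplicates are dropped before a `seq` is assigned, a retried batch leaves no gaps in the stream.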
---
## 7. Frontend (product UI)
### 7.1 Agent tab — timeline
- **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
- **Timeline components**:
- Group by `llm.turn` / `tool.start` → `tool.end`.
- Expandable tool args (sanitized).
- Distinct styling for `safety.block` and `error`.
- **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event.
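The reconnect rule can be kept in one small object on the client. The sketch below is a hypothetical helper, not existing frontend code; it tracks the last seen `seq`, drops replayed duplicates, and builds the URL to reopen with.

```ts
class StreamCursor {
  private lastSeq = 0;
  private readonly baseUrl: string;

  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
  }

  // Returns true when the event is new and should be rendered;
  // false for anything already seen (for example, overlap after a reconnect).
  accept(seq: number): boolean {
    if (seq <= this.lastSeq) return false;
    this.lastSeq = seq;
    return true;
  }

  // URL to reopen the stream from, resuming after the last rendered event.
  reconnectUrl(): string {
    return this.baseUrl + '?afterSeq=' + this.lastSeq;
  }
}
```

In the browser this would typically be wired so that each `EventSource` message is parsed and passed through `accept(event.seq)`, and the error handler closes the source and opens a new one at `reconnectUrl()`.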
### 7.2 Jobs / analyze flows
- Same timeline component keyed by `jobId` if you surface those runs in UI.
- Unifies mental model: “every run has a stream.”
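A shared timeline component needs a run-agnostic way to pair `tool.start` with `tool.end`. One possible grouping is sketched below; it assumes each tool call carries a `callId` in its payload, which is a hypothetical field not defined in this spec.

```ts
type TimelineEvent = {
  seq: number;
  ts: string; // ISO-8601
  type: string;
  payload: { tool?: string; callId?: string };
};

type ToolSpan = { tool: string; startSeq: number; endSeq?: number; durationMs?: number };

function groupToolSpans(events: TimelineEvent[]): ToolSpan[] {
  const spans: ToolSpan[] = [];
  // Open (unfinished) calls keyed by callId; prototype-free so any id is a safe key.
  const open: Record<string, { span: ToolSpan; startedAt: number }> = Object.create(null);
  const ordered = events.slice().sort((a, b) => a.seq - b.seq);
  for (const event of ordered) {
    const callId = event.payload.callId;
    if (callId === undefined) continue;
    if (event.type === 'tool.start') {
      const span: ToolSpan = { tool: event.payload.tool || 'unknown', startSeq: event.seq };
      spans.push(span);
      open[callId] = { span, startedAt: Date.parse(event.ts) };
    } else if (event.type === 'tool.end') {
      const entry = open[callId];
      if (entry === undefined) continue; // end without a matching start: ignore
      entry.span.endSeq = event.seq;
      entry.span.durationMs = Date.parse(event.ts) - entry.startedAt;
      delete open[callId];
    }
  }
  return spans;
}
```

Spans that never receive a `tool.end` keep `endSeq` undefined, which the UI can render as "still running".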
### 7.3 Deprecate slow polling
- Reduce `GET …/agent/sessions/[id]` poll interval when SSE connected; keep **single poll** for `status` / `changed_files` if those stay on session row only, or **also** emit `file.changed` events and drive UI from stream + one final consistency read.
---
## 8. Security & privacy
- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
- **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior.
- **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.
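A minimal sketch of the "truncate and redact by default" rule, applied to an event payload before it is emitted. The key pattern, the length limit, and the function name are assumptions chosen for illustration; a real policy would live alongside `security.ts`.

```ts
// Keys whose values should never leave the runner (assumed pattern).
const SENSITIVE_KEY = /token|secret|password|api[_-]?key|authorization/i;
// Longest string value allowed through unmodified (assumed limit).
const MAX_STRING_LENGTH = 200;

function sanitizePayload(payload: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    const value = payload[key];
    if (SENSITIVE_KEY.test(key)) {
      safe[key] = '[redacted]';
    } else if (typeof value === 'string' && value.length > MAX_STRING_LENGTH) {
      // Keep a prefix for context and record the original size.
      safe[key] = value.slice(0, MAX_STRING_LENGTH) + '... [' + value.length + ' chars]';
    } else {
      safe[key] = value;
    }
  }
  return safe;
}
```

This version only inspects top-level keys; nested objects would need a recursive pass before it could be relied on.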
---
## 9. Environment variables
| Variable | Where | Purpose |
|----------|--------|---------|
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
| `VIBN_API_URL` | Runner | Base URL for callbacks |
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |
Add if needed:
| Variable | Purpose |
|----------|---------|
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
| `SSE_MAX_BUFFER` | Cap replay batch size |
---
## 10. Phased roadmap (suggested)
### Phase 1 — Foundation
- [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend.
- [ ] Create `agent_run_events` (or equivalent) + migration.
- [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
- [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks.
### Phase 2 — Push
- [ ] SSE route + **EventSource** in Agent tab.
- [ ] Backfill UI from DB on mount; then live tail.
- [ ] Lower or gate polling on `GET` session.
### Phase 3 — Jobs + durability
- [ ] Emit same events from **job** execution path; persist by `jobId`.
- [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later).
### Phase 4 — Rich semantics
- [ ] `safety.block` from policy layer.
- [ ] `deploy.*` events if Coolify integration is user-visible.
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.
---
## 11. Success metrics
- Time-to-first-visible-step after **Run** < **1s** p95 (SSE).
- After hard refresh mid-run, user sees **consistent history** (no duplicate seq, no gaps if you guarantee at-least-once ingest with idempotency keys later).
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
---
## 12. Related code (repo anchors)
Use these when implementing:
- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`
---
## 13. Open decisions
1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic.
3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest.
4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.
---
*Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*
The streaming system is fully implemented in `app/api/chat/route.ts` and rendered in the frontend via `Timeline`, `ThinkingBubble`, and `TimelineToolGroup` components inside `chat-panel.tsx`.


@@ -1,673 +1,5 @@
# Vibn AI Capability Roadmap
# AI Capabilities Roadmap (Historical)
> **⚠ See also:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
> — proposed pivot to a Claude-Code-style persistent dev container per
> project. Once approved, that doc supersedes any "code authoring" item
> in this roadmap; this file remains the source of truth for
> infrastructure primitives (P5.x, P6.x, P7.x).
>
> The ordered plan for closing the gap between what the Vibn agent can do
> today and what it needs to do for a real customer to ship, operate, and
> scale a SaaS through it.
>
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
>
> **Prioritization framing:**
> 1. Does it unblock *shipping a real product* (not a demo)?
> 2. Does it unblock *surviving past the first paying customer*?
> 3. Does it only matter once usage scales?
>
> Tier 1 = (1). Tier 2 = (2). Tier 3 = (3). Tier 4 = revisit when demanded.
>
> **Sequencing rule:** complete Tier 1 before any Tier 2 item. The trap
> is polishing safety rails (audit, scopes, quotas) before the product is
> actually shippable.
> **Note:** This is a historical roadmap document. Most of the core Path B capabilities (persistent dev containers, Gitea mirroring, Traefik wildcard proxies) have been successfully shipped.
---
## 0. Substrate & constraints
Vibn runs on a two-cloud substrate, constrained to Canadian data residency:
| Layer | Provider | Region | Purpose |
|---|---|---|---|
| **App hosting** | Coolify (self-managed) | Montreal VPS | All app / database / auth containers. Current state. |
| **Managed services** | **Google Cloud** | `northamerica-northeast1` (Montreal) | Object storage, cron, queues, logs, backups, monitoring, secrets. |
| **Domain registration** | OpenSRS (Tucows) | Toronto | Wholesale domain API. Canadian company, pre-funded float account. |
| **Authoritative DNS** | Cloud DNS (default) / CIRA D-Zone (strict) | Global anycast / Canadian | Managed DNS for workspace-owned domains. |
| **Transactional email** | Amazon SES | `ca-central-1` (Montreal) | No GCP equivalent; AWS's Canadian region keeps data in-country. |
**Absolute rule: no customer data leaves Canada.** Every workspace-owned
resource (storage bucket, database, log bucket, task queue, scheduler
job, email message body) must be pinned to a Canadian region.
### Why mix clouds?
- **Coolify stays** because we already built the workspace-scoped
provisioning around it (Phase 4). Migrating apps to Cloud Run is a
rewrite we don't need.
- **GCP-CA** fills every managed-service gap Coolify has. Cheaper and
more reliable than self-hosting MinIO/Loki/scheduler.
- **AWS SES for email** because GCP has no first-party transactional
email service and SES `ca-central-1` is the only credible
Canadian-resident managed option.
- **OpenSRS for domains** because it's the wholesale API behind most
Canadian registrars, and we already have the deposit.
### Compliance upgrade path (Tier 4 territory)
For regulated customers (healthcare, financial, public sector):
- **Assured Workloads for Canada** on GCP — enforces Canadian personnel
access + data residency contractually.
- **CIRA D-Zone** instead of Cloud DNS — first-party Canadian managed DNS.
- Keep the SES and OpenSRS pieces as-is (already Canadian-resident).
Document the caveat on a public trust page. Build the Assured-Workloads
variant when a real customer asks.
---
## Current state (Phase 4 + P5.1 verified, Apr 2026)
- Workspace tenancy: Gitea org + Coolify project + SSH deploy key per
workspace.
- Agent can: create repos, create apps, provision 8 database flavors,
deploy 8 vetted auth providers, manage env vars, deploy + poll,
update, delete (with `?confirm=<name>`), set domains under
`*.{slug}.vibnai.com`.
- Control-plane MCP: 24 tools + full REST surface at `/api/mcp`.
API-key scoped per workspace.
- **P5.1 custom apex domains** — OpenSRS + Cloud DNS + Coolify
lifecycle (search / register / attach / inspect) shipped and
verified end-to-end against PROD GCP + OpenSRS sandbox + PROD
Coolify on `v4.0.0-beta.473` (2026-04-22). All 5 sub-systems green
in `smoke-attach-e2e.ts`: register → zone → A records → registrar
NS update → Coolify `fqdn` patch → cleanup. Required a server-side
config fix on `coolify-server-mtl` (proxy.type=TRAEFIK,
is_build_server=false) so `Server::isProxyShouldRun()` returns
true and the controller maps `domains` → `fqdn`; see
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) § 3.6 for the gory details.
- **Agent-runner stdio MCP bridge** — `vibn-agent-runner` now exposes
its full in-house toolkit (28 tools) outward over 5 stdio MCP
servers so external clients (Cursor, Claude Desktop, Goose) can
drive the same Coolify / Gitea / workspace / memory / search /
sub-agent surface as the internal Coder/PM/Marketing agents, with
shared protected-repo + protected-app guardrails. Every tool now
has a pure `*-api.ts` module, a registry wrapper for the in-process
loop, and an MCP server wrapper — single source of truth, verified
by `scripts/smoke-mcp.js`.
- Enforced: tenant isolation, domain policy, delete confirms,
secrets-at-rest encryption, protected-repo / protected-app guards.
See [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (§ 3.6 for P5.1,
§ 3.7 for the stdio MCP bridge) for the complete current surface.
---
## Tier 1 — Blocks shipping a real product
Without these, anything the agent builds is *demo-shaped*. Ship these
next, in the recommended sequence below.
### P5.1 · Custom apex domains via OpenSRS
**Goal:** agent buys `mysaas.com` on the user's behalf and attaches it
to a Coolify app with automatic TLS.
**Why now:** you already opened an OpenSRS reseller account with a $100
float. Unlocks real branding, DKIM for email (P5.2 depends on this),
and gives you a revenue line (markup on domains).
**Surface:**
| Tool / endpoint | Purpose |
|---|---|
| `domains.search` | Live availability + suggestions via OpenSRS `lookup`. |
| `domains.check_price` | Per-TLD price from OpenSRS + markup. |
| `domains.register` | Debits workspace float, registers via OpenSRS. |
| `domains.list` | Workspace's owned domains. |
| `domains.renew` / `domains.transfer` | Lifecycle. |
| `domains.{name}.attach` | Attach to a Coolify app: DNS records + Coolify `fqdn` + Let's Encrypt. |
| `domains.{name}.detach` | Free a domain from an app, keep registration. |
| `domains.{name}.attach_status` | Polls DNS propagation + cert issuance (async). |
**Infra:**
- **OpenSRS client** (their XML/SOAP or REST API).
- **Cloud DNS** for zone management (default). CIRA D-Zone available as a
workspace-level preference for strict-residency customers.
- **Workspace float ledger** (`vibn_workspace_billing_float`) — a
prepaid balance in CAD, debited on register/renew. Reconciled nightly
against the OpenSRS master deposit.
- `VIBN_OPENSRS_DEPOSIT_ACCOUNT` as the master float handle.
**New columns** on `vibn_workspaces`:
- `preferred_dns_provider TEXT DEFAULT 'cloud_dns'`
- `cloud_dns_zone_name TEXT` ← GCP managed zone for this workspace.
**Risks:**
- DNS propagation is human-scale (minutes to hours). Agents need the
async `attach_status` polling loop, not a sync call.
- Cert issuance via Let's Encrypt is rate-limited (50/week per domain).
Abuse-prevent with per-workspace rate caps.
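Because propagation can take minutes to hours, a caller polling `attach_status` benefits from a bounded backoff schedule rather than a tight loop. The helper below is a sketch under assumed defaults (5 s initial delay, 5 min cap); none of these numbers come from this roadmap.

```ts
// Returns the sequence of waits (in ms) between attach_status polls.
// Delays double each attempt, never exceed capMs, and sum to exactly maxWaitMs.
function attachPollDelays(maxWaitMs: number, initialMs: number = 5000, capMs: number = 300000): number[] {
  if (initialMs <= 0 || capMs <= 0) {
    throw new Error('initialMs and capMs must be positive');
  }
  const delays: number[] = [];
  let elapsed = 0;
  let next = initialMs;
  while (elapsed < maxWaitMs) {
    // Clamp to the cap and to whatever time budget is left.
    const delay = Math.min(next, capMs, maxWaitMs - elapsed);
    delays.push(delay);
    elapsed += delay;
    next *= 2;
  }
  return delays;
}
```

An agent loop would sleep for each delay in turn, call `attach_status`, and stop early once both DNS and certificate issuance report success.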
**Estimate:** **2 weeks.**
---
### P5.2 · Transactional email (AWS SES `ca-central-1`)
**Goal:** auth providers can send password-reset emails; agents can
`email.send` from `noreply@mysaas.com`.
**Why now:** every auth provider on the allowlist is broken without
SMTP. Also pairs with P5.1 — per-workspace sender domains need DKIM on
domains you own.
**Why SES ca-central-1 specifically:** GCP has no first-party
transactional email service. All mainstream providers (Postmark,
Resend, Mailgun, SendGrid) are US-primary. SES's Montreal region is the
only credible managed option that keeps message bodies in Canada.
**Two-phase rollout:**
**Phase A — shared-sender MVP (1 week):**
- One SES-verified sender domain `mail.vibnai.com`.
- Every workspace can send from `noreply@mail.vibnai.com` out of the box.
- `email.send` tool + injected `SMTP_*` env vars.
- Bounce / complaint webhooks routed via SNS → a Cloud Run service
that writes per-workspace notifications.
**Phase B — per-workspace sender domains (1 week, depends on P5.1):**
- `email.verify_sender_domain` creates the SPF/DKIM/DMARC records via
the Cloud DNS / CIRA D-Zone client on a workspace-owned domain.
- Polls SES verification; flips `verified=true` when done.
- Workspace can now `email.send from: founder@mysaas.com`.
**Surface:**
| Tool | Purpose |
|---|---|
| `email.send` | Single message; returns SES `message_id`. |
| `email.send_batch` | Up to 100 at a time. |
| `email.list_messages` | Recent sent mail + delivery state (from SES + our log). |
| `email.verify_sender_domain` | Kick off DKIM for a workspace-owned domain. |
| `email.sender_status` | Poll verification state. |
| `email.webhooks.list` | Recent bounces/complaints. |
**Infra:**
- SES identity per workspace-owned sender domain.
- SNS topic → Cloud Run webhook receiver (in `northamerica-northeast1`)
for bounce/complaint ingestion.
- Rate limits: start in SES sandbox (200/day), request production limits
after first real customer.
**Estimate:** **2 weeks total** (1 week Phase A + 1 week Phase B).
---
### P5.3 · Object storage (Google Cloud Storage, `northamerica-northeast1`)
**Goal:** any SaaS the agent builds can take user uploads — avatars,
attachments, exports, images — without the user pasting in third-party
credentials.
**Why now:** "can users upload a file?" is the #1 post-demo question.
Blocks ~half of realistic SaaS ideas.
**GCP collapses this item.** No MinIO container to babysit; GCS provides
managed bucket + signed URLs + lifecycle policies + encryption out of
the box.
**Surface:**
| Tool | Purpose |
|---|---|
| `storage.buckets.list` | Buckets in this workspace (filtered by `workspace={slug}` label). |
| `storage.buckets.create` | New bucket. Optional `public_read`. Enforced region: `northamerica-northeast1`. |
| `storage.buckets.delete` | Destroy bucket. `confirm` gate. |
| `storage.presign_upload` | PUT URL, TTL, content-type constraint. |
| `storage.presign_download` | GET URL, TTL. |
| `storage.list_objects` | Pagination + prefix filter. |
| `storage.delete_object` | Single object. |
| `storage.set_lifecycle` | TTL delete, multipart cleanup, archive tiering. |
**Provisioning additions:**
- Default bucket `vibn-ws-{slug}` created on workspace provision.
- Uniform bucket-level access enabled by default.
- Per-workspace GCP service account `vibn-ws-{slug}@...`, scoped to its
own bucket via `roles/storage.objectAdmin`.
- Keyfile stored encrypted (AES-256-GCM, same `VIBN_SECRETS_KEY`) in
`vibn_workspaces.gcp_service_account_key_encrypted`.
**New columns** on `vibn_workspaces`:
- `gcs_bucket_name TEXT`
- `gcp_service_account_email TEXT`
- `gcp_service_account_key_encrypted BYTEA`
**Env injection:**
- `STORAGE_ENDPOINT=https://storage.googleapis.com`
- `STORAGE_BUCKET={workspace-bucket-name}`
- `STORAGE_ACCESS_KEY`, `STORAGE_SECRET_KEY` (S3-compatible via GCS HMAC keys)
— auto-injected on app creation so agent code uses standard S3 SDKs.
**Estimate:** **3 days.**
---
### P5.4 · Workers, cron, and queues (Cloud Tasks + Cloud Scheduler + Cloud Run Jobs)
**Goal:** agents can declare async workers, scheduled jobs, and queued
tasks. Anything that isn't a single `ports: 3000` web container.
**Why now:** webhooks, retries, nightly cleanup, image processing,
email sending — every real SaaS needs a non-web process. Current
workaround (second Coolify app) is brittle and manual.
**Hybrid approach — Coolify for compute, GCP for orchestration:**
Option evaluated and chosen:
- **Cloud Scheduler** (`northamerica-northeast1`) for cron: fires
HTTP webhooks into the app at the scheduled time.
- **Cloud Tasks** (`northamerica-northeast1`) for queue: agent code
calls `enqueue(task)`, Cloud Tasks dispatches to the app's worker
endpoint with retries, backoff, and at-least-once semantics.
- **Worker process** stays on Coolify as a second app-per-repo with a
different start command, exposed on an internal URL.
Rejected alternative: migrate everything to Cloud Run Jobs. More managed
but splits the "Live" view across two deploy targets and changes the
agent's mental model. Not worth it for MVP.
**Shape — extend `apps.create`:**
```json
{
  "repo": "my-site",
  "services": {
    "web": { "command": "npm start", "ports": "3000" },
    "worker": { "command": "npm run worker", "replicas": 2 }
  },
  "cron": [
    { "name": "nightly-backup", "schedule": "0 3 * * *", "path": "/tasks/backup" },
    { "name": "sync", "schedule": "*/10 * * * *", "path": "/tasks/sync" }
  ],
  "queues": [
    { "name": "emails" },
    { "name": "image-processing" }
  ]
}
```
Internally creates: two Coolify apps (web + worker), N Cloud Scheduler
jobs labeled `workspace={slug}`, N Cloud Tasks queues.
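The fan-out from one `apps.create` spec to several backing resources can be sketched as a planning step. The resource kinds and the `{repo}-{service}` naming are hypothetical, chosen only to make the expansion concrete.

```ts
type AppSpec = {
  repo: string;
  services: Record<string, { command: string; ports?: string; replicas?: number }>;
  cron?: { name: string; schedule: string; path: string }[];
  queues?: { name: string }[];
};

type PlannedResource = {
  kind: 'coolify_app' | 'scheduler_job' | 'tasks_queue';
  name: string;
  labels: { workspace: string };
};

// Expand a spec into the list of resources to create, without creating anything.
function planResources(slug: string, spec: AppSpec): PlannedResource[] {
  const plan: PlannedResource[] = [];
  // One Coolify app per declared service (web, worker, ...).
  for (const service of Object.keys(spec.services)) {
    plan.push({ kind: 'coolify_app', name: spec.repo + '-' + service, labels: { workspace: slug } });
  }
  // One Cloud Scheduler job per cron entry.
  for (const job of spec.cron || []) {
    plan.push({ kind: 'scheduler_job', name: job.name, labels: { workspace: slug } });
  }
  // One Cloud Tasks queue per queue entry.
  for (const queue of spec.queues || []) {
    plan.push({ kind: 'tasks_queue', name: queue.name, labels: { workspace: slug } });
  }
  return plan;
}
```

Separating planning from execution also gives a natural place to validate the spec and to show the agent what will be created before anything is provisioned.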
**Surface additions:**
| Tool | Purpose |
|---|---|
| `apps.services.list` | All processes in an app. |
| `apps.services.update` | Scale replicas, change command. |
| `apps.services.logs` | Per-process logs. |
| `cron.list` | Scheduler jobs in this workspace. |
| `cron.create` / `cron.update` / `cron.delete` | Manage scheduled jobs. |
| `cron.run_now` | Fire a scheduled job immediately (useful for agent testing). |
| `queues.list` | Cloud Tasks queues in this workspace. |
| `queues.create` / `queues.delete` | Manage queues. |
| `queues.enqueue` | (Normally called from app code, but exposed for agent-driven testing.) |
| `queues.pause` / `queues.resume` | Emergency ops. |
**New columns** on `vibn_workspaces`:
- `cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1'`
- `cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1'`
**Auth to GCP:** per-workspace service account (provisioned in P5.3) is
extended with `roles/cloudscheduler.admin` and `roles/cloudtasks.admin`
*scoped to resources labeled `workspace={slug}`* via IAM conditions.
Agents can only act on their own workspace's jobs/queues.
**Estimate:** **1 week.**
---
### Tier 1 total: ~5 weeks of focused work
After Tier 1 lands, an agent can:
- Buy `mysaas.com`, point it at a Next.js app.
- Deploy Authentik with working password-reset emails from `noreply@mysaas.com`.
- Offer user uploads (avatars, attachments).
- Run `0 3 * * *` nightly cleanup cron.
- Process Stripe webhooks idempotently via a retry queue.
That's a shippable SaaS. Everything after this is about *keeping* it
shipped.
---
## Tier 2 — Blocks surviving past the first real customer
Once users exist, these prevent silent failures.
### P6.1 · Database backups + restore (GCS + wal-g)
**Goal:** nightly backups, on-demand backups, one-call restore. No
"agent ran `DROP TABLE` in a migration" permanent data loss.
**Why:** scariest item on this list. Failure mode is irrecoverable.
**Shape:**
- `databases.{uuid}.backup` — on-demand `pg_dump` / `mongodump` to the
workspace's GCS bucket (depends on P5.3).
- `databases.{uuid}.backups.list` — lists backups with timestamp + size.
- `databases.{uuid}.backups.restore`: `confirm`-gated restore from a
specific backup uuid.
- Per-database backup policy: daily / hourly / off, retention days.
- Default: every AI-created database gets daily backups + 7-day
retention on.
**Infra:**
- Cron jobs run via P5.4's Cloud Scheduler primitive.
- Stored at `gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz`.
- Lifecycle rules auto-delete backups older than retention.
- Object-level retention lock available for "immutable backups" on
request (Tier 3 feature).
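The object layout and retention rule can be expressed as two small helpers. This is an illustrative sketch with hypothetical function names; in practice the expiry would be enforced by GCS lifecycle rules as described in the list, not by application code.

```ts
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Builds gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz
function backupObjectPath(slug: string, dbUuid: string, takenAt: Date): string {
  return 'gs://vibn-ws-' + slug + '/backups/' + dbUuid + '/' + takenAt.toISOString() + '.sql.gz';
}

// True when a backup is older than the retention window.
function backupIsExpired(takenAt: Date, now: Date, retentionDays: number): boolean {
  const ageMs = now.getTime() - takenAt.getTime();
  return ageMs > retentionDays * MS_PER_DAY;
}
```

Using an ISO-8601 timestamp in the object name means a plain lexicographic listing of the prefix is also in chronological order.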
**Upgrade path:**
- **Postgres point-in-time recovery** via `wal-g` shipping WAL segments
to the same GCS bucket. Adds RPO < 5 min.
- **ClickHouse**: `clickhouse-backup` to GCS.
- **MongoDB**: `mongodump` incremental.
**Estimate:** **3 days** for MVP (pg_dump + schedule + restore).
**+1 week** for wal-g PITR if/when a customer asks.
---
### P6.2 · Runtime log streaming (Cloud Logging)
**Goal:** agent can see "is the app erroring at 10 req/s right now?",
not just "did the build succeed."
**Why:** today deploy logs are surfaced but container stdout/stderr is
not. An agent that "fixed a bug" can't verify the fix without a human
SSH-ing into Coolify.
**GCP collapses this item** — ship container logs to Cloud Logging with
a workspace label, query via the logs API.
**Shape:**
- Fluent-bit sidecar (or Coolify label) ships container stdout/stderr
to Cloud Logging in `northamerica-northeast1` with labels
`workspace={slug}`, `app={app-uuid}`, `service={web|worker|...}`.
- Per-workspace log bucket for retention isolation.
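With those labels in place, the log tools reduce to building a Cloud Logging filter string. The sketch below assumes the labels land on the entry's `labels` map (a sidecar could equally put them in the JSON payload, which would change the field paths), and the function names are hypothetical.

```ts
type LogQuery = {
  workspace: string;
  app?: string;
  minSeverity?: 'DEBUG' | 'INFO' | 'WARNING' | 'ERROR' | 'CRITICAL';
  since?: Date;
};

// Escape backslashes and double quotes so a value cannot break out of its quoted string.
function quoteFilterValue(value: string): string {
  return '"' + value.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

function buildLogFilter(query: LogQuery): string {
  // The workspace clause is always present: it is the tenant boundary.
  const clauses: string[] = ['labels.workspace=' + quoteFilterValue(query.workspace)];
  if (query.app !== undefined) {
    clauses.push('labels.app=' + quoteFilterValue(query.app));
  }
  if (query.minSeverity !== undefined) {
    clauses.push('severity>=' + query.minSeverity);
  }
  if (query.since !== undefined) {
    clauses.push('timestamp>=' + quoteFilterValue(query.since.toISOString()));
  }
  return clauses.join(' AND ');
}
```

Deriving `workspace` from the authenticated principal rather than from tool input keeps an agent from querying another tenant's logs.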
**Surface:**
| Tool | Purpose |
|---|---|
| `apps.logs` | Last N lines across replicas. Filter by timestamp, severity. |
| `apps.logs.tail` | SSE stream of new log lines. |
| `apps.logs.search` | Thin wrapper on Cloud Logging's query API — grep, severity filter, time window. |
| `apps.services.logs` | Same, scoped to a single service. |
**Retention:** default 30 days in the workspace log bucket; exportable
to the workspace's GCS bucket on request for long-term storage.
**Estimate:** **3 days** (fluent-bit config + thin API wrapper).
---
### P6.3 · Scoped API keys
**Goal:** invite a CI bot or teammate without giving root on the
workspace.
**Why:** solo-builder flow survives without it. Breaks the moment a
second principal enters.
**Shape:**
- Keys gain `scopes: string[]` and optional `expires_at`.
- Scope tokens: `apps:read`, `apps:write`, `apps:delete`,
`databases:*`, `auth:*`, `domains:read`, `domains:write`,
`storage:*`, `email:send`, `cron:*`, `queues:*`, `deploy:*`.
- Per-scope rate limits optional (Tier 3; API shape supports it from
day one).
**Surface changes:**
| Tool | Change |
|---|---|
| `keys.create` | Accepts `scopes`, `expires_at`. |
| `keys.list` | Returns scopes per key. |
| `keys.rotate` | Mints new token, preserves scope set. |
Every MCP/REST handler gets a scope requirement checked in the
principal resolver.
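The check the principal resolver would perform can be sketched as follows. The `ApiKey` shape and function names are assumptions; the scope strings and the `resource:*` wildcard form come from the list above.

```ts
type ApiKey = { scopes: string[]; expiresAt?: string }; // expiresAt: ISO-8601, optional

// True when any granted scope covers the required one, either exactly
// or through a `resource:*` wildcard.
function scopeAllows(granted: string[], required: string): boolean {
  const resource = required.split(':')[0];
  for (const scope of granted) {
    if (scope === required) return true;
    if (scope === resource + ':*') return true;
  }
  return false;
}

// Combined gate: reject expired keys first, then check scope.
function authorize(key: ApiKey, required: string, now: Date): boolean {
  if (key.expiresAt !== undefined && Date.parse(key.expiresAt) <= now.getTime()) {
    return false;
  }
  return scopeAllows(key.scopes, required);
}
```

Each handler then declares a single required scope (for example `apps:delete`), and the resolver calls `authorize` before dispatching.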
**Estimate:** **1 week.**
---
### Tier 2 total: ~2 weeks
After Tier 2 lands, a SaaS shipped on Vibn can survive without you
dropping into a psql REPL at 3am.
---
## Tier 3 — Matters once usage scales
Don't build these until at least one real customer is hitting them.
Building them pre-market is the classic infra-overinvestment trap.
### P7.1 · Per-workspace quotas + cost caps
Max apps, max dbs, max GCS GB, max egress, max SES messages/month, max
OpenSRS spend/month. Per-plan configurable. Hallucinating agents can't
OOM the cluster or burn your SES reputation.
### P7.2 · Audit log
Append-only per-workspace log of (principal, action, params, timestamp,
result). Cloud Logging with a dedicated `audit-logs` log-bucket, 400-day
retention. Read API for the settings panel. Needed for any
SOC-2-adjacent buyer.
### P7.3 · Preview-per-PR environments
Open a PR → `pr-42.mark.vibnai.com` deploys automatically with a
throw-away database. Teardown on PR close/merge. Unblocks multi-agent
flows.
### P7.4 · Atomic multi-resource operations (`stacks`)
`POST /stacks` takes a full app + db + auth + domain + cron spec;
creates atomically, rolls back on failure. Agent ergonomics win once
demo flow is routine.
### P7.5 · Billing integration
Stripe subscriptions for Vibn itself (workspace billing), plus
per-workspace float top-ups, plus reconciliation to the OpenSRS master
deposit and GCP / SES cost allocation. Only needed when you charge
real dollars.
### P7.6 · Assured Workloads for Canada
GCP policy-enforced Canadian residency + Canadian personnel access.
For regulated customers (healthcare, financial, public sector). Priced
accordingly; ship only when a real customer needs it.
### P7.7 · CIRA D-Zone as a workspace DNS option
Swap Cloud DNS → CIRA D-Zone for a workspace with strict residency
requirements. API-compatible wrapper so nothing agent-facing changes.
---
## Tier 4 — Revisit when demanded
Items to explicitly *not* build until a concrete customer asks.
- **Multi-region** — single-region Canada is fine for B2B SaaS makers
(our early market).
- **Cloud Run migration** — would rewrite most of Coolify-based
capabilities. Revisit if/when Coolify becomes a bottleneck.
- **Managed search / vector DB as first-class types** — agents can
deploy Meilisearch / Typesense / pgvector-Postgres as regular services.
- **mTLS / custom CAs / BYO-cert upload** — enterprise creep.
- **MCP protocol polish** (streaming, resources, prompts, per-tool
schemas) — current JSON-over-HTTP works. Revisit on real friction.
- **Per-app basic auth, IP allowlists, WAF** — Traefik middleware
manually until someone asks.
---
## Roadmap at a glance
| Phase | Items | Est. | Unblocks |
|---|---|---|---|
| **P5 — Real SaaS primitives** | Domains, email, storage, workers/cron/queues | ~5 wk | Shipping a real product |
| **P6 — Keep-it-running** | Backups, runtime logs, scoped keys | ~2 wk | First real customer survives |
| **P7 — Scale** | Quotas, audit, previews, stacks, billing, Assured Workloads, D-Zone | demand-driven | Platform grows past 1st cohort |
| **P8+** | Tier 4 items | never, unless pulled by customer | — |
**Total to "agent ships a SaaS a founder would pay $29/mo for":**
P5 + P6 = **~7 weeks** (was ~11 before GCP-CA; ~40% compression from
managed-service leverage).
---
## Dependency graph
```
P5.1 Domains ──┬──→ P5.2 Email Phase B (per-domain DKIM)
├──→ P7.7 CIRA D-Zone swap
└──→ (future: customer-owned sub-domain routing)
P5.3 Storage ──┬──→ P6.1 Database backups (backups need a bucket)
└──→ P7.2 Audit log export
P5.4 Workers/cron/queues ──┬──→ P6.1 Database backups (run via scheduler)
└──→ most real SaaS patterns
P6.2 Runtime logs — independent, can land anytime
P6.3 Scoped keys — independent, can land anytime
P7.6 Assured Workloads — wraps everything; build once demanded
```
**Parallelizable (three people):**
- Track A: P5.1 → P5.2
- Track B: P5.3 → P6.1
- Track C: P5.4 → P6.2
Track C finishes earliest; use that slack to land P6.3.
---
## Per-workspace GCP provisioning (shared across P5.3, P5.4, P6.1, P6.2)
`ensureWorkspaceProvisioned()` gains a GCP-CA block that runs once per
workspace, idempotently. All resources are created in
`northamerica-northeast1`.
| Resource | Name pattern | Notes |
|---|---|---|
| GCS bucket | `vibn-ws-{slug}` | Uniform bucket-level access. Lifecycle policies off by default. |
| Cloud DNS managed zone | `vibn-ws-{slug}-zone` | Created per workspace-owned domain in P5.1, not on workspace provision. |
| Cloud Logging log bucket | `vibn-ws-{slug}-logs` | 30-day retention default. |
| Cloud Tasks location | `northamerica-northeast1` | Queues created per-app in P5.4, not here. |
| GCP service account | `vibn-ws-{slug}@{project}.iam.gserviceaccount.com` | Single SA per workspace, narrow roles. |
| Service account key | stored encrypted in `vibn_workspaces` | AES-256-GCM, same `VIBN_SECRETS_KEY`. |
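
Because every name in the table derives from the workspace slug, the idempotent provisioning step can compute them up front. The sketch below assumes a helper named `gcpNamesFor` and a slug validation rule. Neither is taken from the Vibn codebase.

```ts
// Names derived from the workspace slug, following the table above.
// `gcpNamesFor` and the slug regex are hypothetical.
interface WorkspaceGcpNames {
  bucket: string;
  logBucket: string;
  serviceAccountId: string;
  region: string;
}

function gcpNamesFor(slug: string): WorkspaceGcpNames {
  // Assumed rule: lowercase letters, digits, and hyphens only.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(slug)) {
    throw new Error(`invalid slug: ${slug}`);
  }
  return {
    bucket: `vibn-ws-${slug}`,
    logBucket: `vibn-ws-${slug}-logs`,
    serviceAccountId: `vibn-ws-${slug}`,
    region: "northamerica-northeast1",
  };
}
```

Deriving names deterministically is what lets a re-run look up existing resources instead of creating duplicates.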
**New columns** on `vibn_workspaces` (cumulative across P5.1-P6.2):
```sql
-- P5.1
preferred_dns_provider TEXT DEFAULT 'cloud_dns',
cloud_dns_zone_name TEXT,
-- P5.3
gcs_bucket_name TEXT,
gcp_service_account_email TEXT,
gcp_service_account_key_encrypted BYTEA,
-- P5.4
cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1',
cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1',
-- P6.2
cloud_logging_bucket_name TEXT
```
Four migration steps, one per phase listed above (P5.1, P5.3, P5.4,
P6.2). All guarded by the existing admin-gated
`POST /api/admin/migrate` endpoint.
---
## Non-goals (stated explicitly so they don't creep in)
- **A general-purpose PaaS.** Vibn is an agent-driven SaaS builder, not
a Heroku / Fly clone. Every capability must answer "what does an agent
need to build a SaaS?" — not "what does a dev need to deploy a
container?"
- **Support for non-allowlisted auth providers, databases, services.**
The curated surface is the feature. "Any Coolify service" would blow
up the tenant-safety model and dilute agent decision-making.
- **A consumer-facing OpenSRS UI.** OpenSRS is plumbing for the agent.
Humans should never see an OpenSRS checkout screen — only
`domains.register { name: "mysaas.com" }` from the agent.
- **Multi-cloud abstraction layer.** One Coolify cluster + GCP-CA +
SES-CA + OpenSRS is the contract. If customers want to bring their
own, that's Tier 4.
- **Anything that moves customer data out of Canada.** Even for
performance. If a managed service only has US regions, we self-host
in Canada or we don't offer it.
---
## Recommended execution order (opinionated)
Given dependencies and quick-wins-first philosophy:
**Week 1:**
- P5.3 Storage (GCS wrap, 3 days) → proves the GCP-CA provisioning pattern.
- P5.4 Workers/cron/queues (starts in parallel; depends on P5.3 only for
the service account).
**Week 2:**
- P5.4 completes.
- P5.1 Domains starts (OpenSRS client + Cloud DNS wrapper).
**Week 3:**
- P5.1 completes.
- P5.2 Email Phase A (shared-sender MVP) starts.
**Week 4:**
- P5.2 Phase A completes.
- P5.2 Phase B (per-domain DKIM) starts, now that P5.1 is available.
**Week 5:**
- P5.2 Phase B completes. **P5 / Tier 1 done.**
- P6.1 Database backups starts (3 days).
- P6.2 Runtime logs starts in parallel (3 days).
**Week 6:**
- P6.3 Scoped keys (1 week).
**Week 7:**
- Slack week — hardening, docs (`AI_CAPABILITIES.md` refresh), first
real customer onboarding.
**End state at week 7:** agent can take a founder from "I have an idea"
to "I have `mysaas.com` live, with auth, with user uploads, with email,
with backups, with visible error logs, and a CI bot can deploy it
without root access."
That's the Vibn product.
---
## How to use this doc
- When someone proposes a feature, find its tier. If it's Tier 3 or 4
and we're still shipping Tier 1, say no.
- Before starting a Tier 1 item, re-read its section and make sure
prerequisites shipped. Email-per-domain before domains is wasted code.
- [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) is the canonical
reference of *what exists today*. This doc is the canonical reference
of *what comes next*. When an item ships, move it from here to that
doc and delete its section here.
- When a user request implies Canadian residency (they say "PIPEDA",
"healthcare", "public sector", or "our data can't leave Canada"), pin
the answer to this doc's §0 Substrate & constraints. Don't improvise.
Current pending capabilities/roadmap items are tracked in `BETA_LAUNCH_PLAN.md`.


@@ -1,227 +1,8 @@
# AI Harness Gaps — Proposal
# AI Harness Stability & Middleware (Shipped)
> Four gaps in the Vibn AI experience that are **structural, not promptable**.
> Each one is responsible for a specific failure pattern visible in real
> production chat transcripts. None of them are scoped in
> [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md),
> [`BETA_LAUNCH_PLAN.md`](./BETA_LAUNCH_PLAN.md),
> [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md), or the
> agent-execution / telemetry-streaming designs.
>
> **Drafted:** 2026-04-30 (after a transcript review of the Dr Dave + Twenty CRM threads).
>
> **Why these four:** they share a common shape — the model is doing what
> the prompt told it to, and still producing a bad outcome. The fix lives
> in the *harness around the model*, not in instructions to the model.
> **Note:** These middleware stability mechanisms have been shipped.
---
## TL;DR
| # | Gap | Failure pattern in prod | Fix size |
|---|---|---|---|
| 1 | Tool-error recovery middleware | Orphan twenty-* services (4 shipped). Model keeps delete-and-recreating despite explicit prompt rule against it. | ~2 hr |
| 2 | Browser-driver tool for the AI | "Should be live in 10s" — AI ships URLs without ever loading them; user discovers the 502. | ~4 hr |
| 3 | Live UI state attached to chat messages | "this isn't working" / "fix the URL" with no signal of which "this". AI guesses, often wrong. | ~3 hr |
| 4 | Diff preview / accept-changes gate | `fs_edit` writes straight to the dev container with no review surface. Fine for sub-second iteration; bad for prod-bound edits. | ~6 hr |
Total: ~15 hr of work. None require new infra.
---
## Gap 1 — Tool-error recovery middleware (highest ROI)
**Failure observed:** in thread `d698ef40-…` ("Hey there, what can you see about this project?"), the AI hit
`Conflict. The container name "/postgres-…" is already in use` **three separate times**.
On each attempt it responded by *creating a new service with a new name*,
not by calling `apps_unstick`. The prompt explicitly tells it not to do
this and tells it the recovery sequence. The model still did it.
**Why prompt rules fail here:** the model treats the system prompt as
soft guidance buried in a 30k-token document, while the tool result is
concrete and 200ms-fresh. When tool reality contradicts prompt rules,
tool reality wins.
**Proposed fix:** middleware in `executeMcpTool` that pattern-matches
known-recoverable errors and **injects a synthetic system message** into
the conversation before the next round. The model can't ignore an
injected instruction the way it can ignore a static prompt rule.
```ts
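// In app/api/chat/route.ts, around the executeMcpTool call.
// detectKnownError (proposed, lib/ai/error-recovery.ts) returns the
// matching recovery rule, or a falsy value when nothing matches.
const errorRecovery = detectKnownError(result);
if (errorRecovery) {
  // Inject the recovery instruction before the next model round.
  messages.push({
    role: "system",
    content: `[RECOVERY] ${errorRecovery.diagnosis}. Required next action: ${errorRecovery.fix}. Do NOT ${errorRecovery.antipattern}.`,
  });
}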
```
**Initial recovery rules** (high-confidence, low-false-positive):
| Error signature | Diagnosis | Fix | Antipattern |
|---|---|---|---|
| `Conflict. The container name … is already in use` | Orphan container blocking new boot | `apps_unstick { uuid }` then `apps_deploy { uuid }` | Delete and recreate with a new name |
| `pull access denied` / `manifest unknown` | Image not on the host yet | `apps_repair { uuid }` | Retry deploy without addressing the cause |
| `port … is already allocated` | Another container holds the port | List containers, identify holder, decide | Pick a random different port |
**Effort:** ~2 hr. New file `lib/ai/error-recovery.ts` with a registry of
patterns + the injection in the chat route. Each rule is ~10 lines.
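
The registry described above could look roughly like the following. This is a sketch under stated assumptions: the type names, the regular expressions, and the `string` input to `detectKnownError` are illustrative, and only the three rules from the table are encoded.

```ts
// Hypothetical shape for lib/ai/error-recovery.ts; names are illustrative.
interface RecoveryHint {
  diagnosis: string;
  fix: string;
  antipattern: string;
}

interface RecoveryRule extends RecoveryHint {
  pattern: RegExp;
}

// One entry per row of the recovery-rule table.
const RECOVERY_RULES: RecoveryRule[] = [
  {
    pattern: /Conflict\. The container name .+ is already in use/i,
    diagnosis: "Orphan container blocking new boot",
    fix: "apps_unstick { uuid } then apps_deploy { uuid }",
    antipattern: "delete and recreate with a new name",
  },
  {
    pattern: /pull access denied|manifest unknown/i,
    diagnosis: "Image not on the host yet",
    fix: "apps_repair { uuid }",
    antipattern: "retry deploy without addressing the cause",
  },
  {
    pattern: /port .+ is already allocated/i,
    diagnosis: "Another container holds the port",
    fix: "list containers, identify the holder, then decide",
    antipattern: "pick a random different port",
  },
];

// Returns the first matching hint, or null when the output is not a known error.
function detectKnownError(toolOutput: string): RecoveryHint | null {
  for (const { pattern, ...hint } of RECOVERY_RULES) {
    if (pattern.test(toolOutput)) return hint;
  }
  return null;
}
```

Keeping the rules as plain data means a new failure signature is a one-entry change, which fits the "each rule is ~10 lines" estimate.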
**Slot into:** `BETA_LAUNCH_PLAN.md` Phase 2 (Stability & visibility) — fits next to 2.4 (deployment-failed webhook).
---
## Gap 2 — Browser-driver tool for the AI
**Failure observed:** in the same Twenty thread, the AI said *"It's
fully deployed, healthy, and I've verified it's returning a 200 OK
status"* — but the user saw "Unable to Reach Back-end" on the actual
page. The AI checked Coolify's status reporting, not the rendered app.
Also visible in the Dr Dave thread: *"Note: it might take 10-15 seconds
on the very first load for the DNS to propagate"* — the AI hedged
because it couldn't load the URL itself.
**Why this matters for beta:** every "I deployed it" claim is unverified
unless the AI can open the URL. Sentry (planned in P2.3) catches
errors *after a user hits them*. A browser tool catches errors
*before any user hits them*.
**Proposed fix:** add a `browser.*` MCP tool surface backed by a
headless Chromium running on the Coolify host (or in the vibn-dev
container). Initial tools:
| Tool | Purpose |
|---|---|
| `browser.navigate { url, timeoutMs? }` | Load the URL, return final URL + status code + page title |
| `browser.screenshot { url }` | Visual confirmation. Return base64 PNG (or store in GCS) |
| `browser.console_logs { url }` | Capture client-side JS errors (the `TypeError: reading 'z'/'j'/'aa'` from BETA P2.2 would be findable this way) |
| `browser.fetch { url, headers? }` | HTTP-level smoke test. Subset of `http_fetch` but always from inside Vibn's network |
**Implementation:** Playwright already has an MCP server (`@playwright/mcp`).
Wire it as a Coolify service, expose via the same per-workspace MCP
token Vibn already issues.
**Effort:** ~4 hr. ~2 hr to deploy Playwright as a service, ~1 hr to
add tool definitions, ~1 hr to wire prompt instructions ("after any
deploy or `dev_server.start`, call `browser.navigate` to confirm").
**Slot into:** Phase 2 (Stability & visibility) — pairs with the
runtime error chase (2.1, 2.2) and the Sentry wiring (2.3).
---
## Gap 3 — Live UI state attached to chat messages
**Failure observed:** in the Dr Dave thread, user typed *"are you able
to give me a preview url?"* The AI didn't know which port the
Next.js dev server would bind to, what was already running, or
whether the user was looking at the chat or another tab. It
guessed and re-discovered everything from scratch.
In the Twenty thread, *"can you see the different sections?"* — user
meant Plan tab sections (Vision/Tasks/Decisions/Ideas). AI listed
metadata. No way to know.
**Why prompt rules can't fix this:** the AI literally lacks the
information.
**Proposed fix:** the chat panel sends a small `uiContext` object
alongside every user message. Inject into the system prompt as a
dynamic block (same shape as `activeBlock`):
```ts
{
  currentRoute: "/mark-account/project/abc/hosting",
  currentTab: "hosting",
  visibleResources: [
    { kind: "app", uuid: "y4cs…", name: "vibn-frontend" },
    { kind: "service", uuid: "igcp…", name: "vibn-dev-twenty-crm" },
  ],
  lastUserActions: [
    { at: "2m ago", action: "opened twenty-crm logs" },
    { at: "5m ago", action: "switched to Hosting tab" },
  ],
}
```
System-prompt block becomes:
> The user is currently looking at the **Hosting tab** (route: `…/hosting`).
> Visible resources: `vibn-frontend`, `vibn-dev-twenty-crm`.
> Recent actions: opened twenty-crm logs (2m ago), switched to Hosting (5m ago).
> When the user says "this" / "it" / "the URL" — assume they mean
> something visible in the current viewport unless they name something else.
**Effort:** ~3 hr. ~1 hr to wire the chat panel's
`uiContext` collection (existing route + tab state, last 5 actions
from a small ring buffer in the panel), ~1 hr to plumb through the
chat API, ~1 hr to add the prompt block.
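
Turning the `uiContext` object into the system-prompt block is a small pure function. The sketch below assumes the field names from the example object. The `UiContext` type and `renderUiContextBlock` are hypothetical names, and the exact wording of the block is illustrative.

```ts
// Assumed shape, mirroring the example uiContext object.
interface UiContext {
  currentRoute: string;
  currentTab: string;
  visibleResources: { kind: string; uuid: string; name: string }[];
  lastUserActions: { at: string; action: string }[];
}

// Renders the dynamic prompt block; empty lists render as "none".
function renderUiContextBlock(ctx: UiContext): string {
  const resources = ctx.visibleResources.map((r) => r.name).join(", ") || "none";
  const actions =
    ctx.lastUserActions.map((a) => `${a.action} (${a.at})`).join(", ") || "none";
  return [
    `The user is currently looking at the ${ctx.currentTab} tab (route: ${ctx.currentRoute}).`,
    `Visible resources: ${resources}.`,
    `Recent actions: ${actions}.`,
    `When the user says "this" / "it" / "the URL", assume they mean something visible in the current viewport unless they name something else.`,
  ].join("\n");
}
```

A pure renderer like this can be unit-tested without the chat route and keeps the prompt wording in one place.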
**Slot into:** Phase 3 (UX surfaces) — pairs with 3.2 (structured
errors in chat) and 3.3 (empty-state nudges).
---
## Gap 4 — Diff preview / accept-changes gate
**Failure observed:** none yet, but the surface is exposed today —
`fs_edit` writes directly to `/workspace` in the dev container. For
ephemeral exploration this is correct (sub-second iteration is the
whole Path B point). For changes destined to ship, the user has no
review surface; they only see what changed after the AI summarizes.
**Why this matters for beta:** the moment a paying user wants to
"see what the AI changed before it goes live," there's nothing to
show them. Cursor's whole UX is built on diffs the user accepts.
**Proposed fix:** two-mode `fs_edit` / `fs_write`:
1. **Direct mode (default for dev container):** write immediately. Current
behavior. Fine for "make the button blue" iteration.
2. **Staged mode (default when `ship` is the next likely action):**
write to a shadow path, surface a diff in the chat UI, gate the
real write on a one-click "Accept" button.
The model decides which mode based on context — or simpler: stage when
the file is in a "protected" set (e.g. `prisma/schema.prisma`,
`Dockerfile`, `package.json`, anything in `prod/` or `migrations/`),
direct otherwise.
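
The simpler "protected set" rule can be sketched as a path check. The file and directory names below come from the examples in the text. The function name, the `/workspace/` prefix stripping, and the exact matching rules are assumptions.

```ts
type EditMode = "direct" | "staged";

// Protected paths taken from the examples above; the real set is undecided.
const PROTECTED_FILES = new Set(["prisma/schema.prisma", "Dockerfile", "package.json"]);
const PROTECTED_DIRS = ["prod", "migrations"];

// Decides whether an edit is written immediately or staged for review.
function chooseEditMode(path: string): EditMode {
  // Normalize to a path relative to the working tree.
  const normalized = path.replace(/^\/workspace\//, "").replace(/^\.\//, "");
  if (PROTECTED_FILES.has(normalized)) return "staged";
  // Any parent directory named prod/ or migrations/ stages the edit.
  const parents = normalized.split("/").slice(0, -1);
  return parents.some((segment) => PROTECTED_DIRS.includes(segment)) ? "staged" : "direct";
}
```

A path-based rule is deterministic and easy to audit, which is the main argument for it over letting the model pick the mode.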
**Effort:** ~6 hr. ~2 hr backend (shadow write + apply endpoint),
~3 hr UI (diff renderer in the chat panel, accept/reject buttons),
~1 hr prompt + tool changes.
**Slot into:** Phase 4 (Onboarding & safety) — pairs with 4.5 (auth
hardening) and 4.6 (compute quotas) as part of "what a stranger
needs day 1."
---
## Suggested sequencing
If we ship in priority order:
1. **Gap 1 first** — kills the worst pattern in prod for ~2 hr of work. Should be ahead of any new feature in Phase 2.
2. **Gap 2 second** — closes the verify-deploy loop. Multiplies the value of every subsequent AI-shipped change because it's no longer blind.
3. **Gap 3 third** — tighter conversational UX. Once 1 and 2 work, the remaining UX cliff is "AI doesn't know what I'm looking at."
4. **Gap 4 last** — only matters once we have paying users editing prod-bound code. Pre-beta optional.
Total effort to ship 1+2+3 (the meaningful UX wins): **~9 hours.**
---
## How this changes BETA_LAUNCH_PLAN.md
Two new tasks slot in:
- **P2.8** Tool-error recovery middleware (Gap 1) — block on nothing, ship before P2.4.
- **P2.9** Browser-driver MCP tool (Gap 2) — block on nothing.
One new task in P3:
- **P3.7** UI-state injection into chat (Gap 3) — block on nothing.
Gap 4 stays out of beta scope unless eval reveals real damage from
unstaged edits.
- The chat loop (`app/api/chat/route.ts`) acts as a robust harness that intercepts tool errors and automatically suggests recovery paths (e.g., port conflicts, container collisions).
- The maximum tool execution loop is capped (`MAX_TOOL_ROUNDS=30`) to prevent runaway AI loops.
- `fs_edit` uses line-number replacements alongside strict `oldString` matching to avoid Aider-style search-and-replace failures.
- Sentry and Coolify deployment webhooks automatically pipe deployment/build failures back to the user/AI.
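
The round cap mentioned above can be illustrated with a small loop. This is a simplified, synchronous sketch: `runToolLoop`, `ModelTurn`, and the `step` callback are hypothetical stand-ins for one model round plus tool execution, and the real chat route is presumably asynchronous.

```ts
const MAX_TOOL_ROUNDS = 30;

// One model round: tool calls requested, plus any final text.
interface ModelTurn {
  toolCalls: string[];
  text: string;
}

interface LoopResult {
  rounds: number;
  text: string;
  capped: boolean;
}

// Runs rounds until the model stops requesting tools or the cap is hit.
function runToolLoop(
  step: (round: number) => ModelTurn,
  maxRounds: number = MAX_TOOL_ROUNDS,
): LoopResult {
  for (let round = 1; round <= maxRounds; round++) {
    const turn = step(round);
    if (turn.toolCalls.length === 0) {
      return { rounds: round, text: turn.text, capped: false };
    }
  }
  // The cap was reached while the model was still asking for tools.
  return { rounds: maxRounds, text: "", capped: true };
}
```

Returning an explicit `capped` flag lets the caller tell the user that the run was cut short instead of silently ending.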

View File

@@ -1,288 +1,12 @@
# Path B Execution Plan — Persistent Dev Container Architecture
# AI Path B (Shipped)
> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
> surface with a Claude-Code-style architecture: one persistent dev
> container per Vibn project, ~10 composable tools, sub-15-second
> iteration, and Coolify only touched at "ship it" time.
>
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
> (everything else).
>
> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
>
> **Why this exists:** today's AI loop is *3-7 min to first preview, 2-4
> min per iteration*, because every change goes through a Coolify nixpacks
> build. That UX cannot host the marketplace / SaaS / iterative-build
> stories Vibn is selling. Path B fixes the floor.
> **Note:** This document outlines the architecture for "Path B", which shifted the AI's execution context from Cloud Run to persistent per-project Docker containers hosted on the Coolify server. This architecture fully shipped in May 2026.
---
## Architecture
- Every project has a persistent Gitea repository.
- Every project gets a single `vibn-dev` container provisioned as a Coolify service (`ensureDevContainer`).
- The AI runs its tools (like `shell_exec` and `fs_*`) *inside* this container using `docker exec` via the Coolify API.
- Dev servers (like `npm run dev`) bind to `0.0.0.0:3000` and are exposed to the internet via Traefik wildcard subdomains (`*.preview.vibnai.com`).
- When the user is ready, the code is committed to Gitea and deployed to production via `apps_deploy`.
## 1. The user experience this unlocks
Reference scenario: a non-technical founder chats *"build me a
two-sided marketplace for handmade ceramics."*
| Phase | Path A (today) | Path B (target) |
|---|---|---|
| Discovery & OSS pick | OK | OK |
| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
| First live preview | 3-7 min (Coolify build) | ~30s (Vite HMR in dev container) |
| Each iteration | 2-4 min (rebuild) | 3-15s (HMR / process restart) |
| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
| Total time to live, polished marketplace | 30-60 min, often abandoned | ~20 min, mostly the user thinking |
The asymmetry is structural, not optimisable inside Path A.
---
## 2. Architecture overview
```
┌──────────────────────────┐    ┌────────────────────────────────┐
│ vibnai.com chat (user)   │ ←→ │ /api/mcp                       │
└──────────────────────────┘    │  ├ shell.exec                  │
                                │  ├ fs.read / fs.edit / fs.glob │
                                │  ├ dev_server.start            │
                                │  ├ ship                        │
                                │  └ apps.* / databases.* / ...  │
                                └────────────┬───────────────────┘
                                             ▼ (workspace-scoped)
                                ┌────────────────────────────────────┐
                                │ Per-Vibn-project Coolify project   │
                                │  ├ vibn-dev ← dev container        │
                                │  ├ web      ← prod app             │
                                │  ├ db                              │
                                │  └ ...                             │
                                └────────────────────────────────────┘
```
### Per-project dev container — the only new piece
For every active Vibn project, we run **one long-lived Coolify
service named `vibn-dev`** inside that project's dedicated Coolify
project (Stage 2/3 of per-project isolation already shipped).
| Property | Value |
|---|---|
| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
| **Base** | Ubuntu 24.04 |
| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
| **Resource floor** | 512 MB / 0.25 CPU when idle |
| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
| **Idle suspend** | After 30 min no `shell.exec` activity |
| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
| **Ports** | 3000-9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
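
The idle-suspend rule in the table reduces to a timestamp comparison. A minimal sketch, assuming each container record carries a last-activity timestamp. The `DevContainer` shape and `selectIdle` are hypothetical and do not reflect the real schema.

```ts
// Assumed record shape; the real table columns may differ.
interface DevContainer {
  projectId: string;
  lastActivityAt: number; // epoch milliseconds of the last shell.exec / fs.* / dev_server.* call
  suspended: boolean;
}

const IDLE_SUSPEND_MS = 30 * 60 * 1000; // 30 minutes, per the table

// Returns the project IDs whose containers should be suspended now.
function selectIdle(
  containers: DevContainer[],
  nowMs: number,
  idleMs: number = IDLE_SUSPEND_MS,
): string[] {
  return containers
    .filter((c) => !c.suspended && nowMs - c.lastActivityAt >= idleMs)
    .map((c) => c.projectId);
}
```

Passing `nowMs` in explicitly keeps the selection deterministic, so a periodic sweep can be tested without waiting on a real clock.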
### Why this shape (and not e2b / Cloud Run / VM-per-task)
- We already have Coolify, per-project Coolify projects, and Coolify
exec primitives. Adding one service per project is zero new infra.
- Persistence (workspace state, package cache, git working tree)
matters more than per-task isolation for our user. Founders return
to projects across sessions.
- Tenant safety is already solved at the Coolify-project layer.
- Cost stays bounded: one container per *active* project, idle-suspended.
- Upgrade path to e2b / Firecracker exists later if needed (replace the
executor, keep the tool surface).
---
## 3. Tool surface
### New tools (the AI's primary working set)
| Tool | Signature | Purpose |
|---|---|---|
| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
| `fs.write` | `{ path, content }` | Create/overwrite a file. |
| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
| `fs.delete` | `{ path }` | Delete a file or directory. |
| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
| `dev_server.stop` | `{ id }` | Kill a dev server. |
| `dev_server.list` | — | What's running, on what URL. |
| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
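
The `fs.edit` contract (fail when `oldString` is missing or not unique) can be sketched as a pure string function. The 404 and 409 status codes follow the week 2 notes later in this plan. Everything else is an assumption for illustration: the `applyEdit` name, the `EditResult` type, and the choice to reject an empty `oldString` with a 404.

```ts
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; status: 404 | 409; error: string };

// Applies a search/replace edit with strict uniqueness checks.
function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll: boolean = false,
): EditResult {
  // Assumption: an empty search string is treated as "not found".
  if (oldString === "") return { ok: false, status: 404, error: "oldString is empty" };
  // Count non-overlapping occurrences without regex escaping concerns.
  const parts = content.split(oldString);
  const count = parts.length - 1;
  if (count === 0) return { ok: false, status: 404, error: "oldString not found" };
  if (count > 1 && !replaceAll) {
    return { ok: false, status: 409, error: `oldString matches ${count} times` };
  }
  // split/join avoids the special "$" patterns of String.prototype.replace.
  return { ok: true, content: parts.join(newString), replacements: count };
}
```

Failing loudly on ambiguous matches forces the model to send more surrounding context rather than silently editing the wrong occurrence.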
### Kept (orchestration — these are correctly modeled as APIs)
- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
- `projects_get`, `projects_list`, `workspace_describe` — context.
- `github_search`, `github_file`, `http_fetch` — external lookup.
### Deprecated (kept for back-compat, banner in docs)
- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
`gitea_branches_list`, `gitea_branch_create`,
`gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
- `apps.exec` — kept (it's still useful for prod-container debugging),
but deprecated for *dev-time* code work.
**Net change:** 53 tools → ~30 tools, but the new ones compose to do
everything the old ones did and more.
---
## 4. The system prompt rewrite
The AI's prompt today says *"call gitea_file_write to push code."* It
becomes:
> You have a real Linux dev environment for this project at `/workspace`.
> Use `shell.exec` to run any command (npm, git, tea, python, anything).
> Use `fs.edit` for surgical changes, `fs.write` for new files.
>
> Standard loop:
> 1. `shell.exec { cmd: "git status" }` to see what's there.
> 2. Edit / create files via `fs.edit` / `fs.write`.
> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
> 4. `dev_server.start` to give the user a live preview URL.
> 5. When the user says "ship it", call `ship` — that pushes and
> triggers the production Coolify deploy.
>
> NEVER call `apps_create` to deploy code that hasn't been tested via
> `shell.exec` first. The dev container is your safety net.
---
## 5. Week-by-week execution
### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
**Goal:** AI can clone a repo, install deps, run a script.
- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
- [x] System prompt rewritten with shell-first recipe.
**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
**Goal:** the dev container's working tree graduates to production.
- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
### Week 4 — Eval, polish, IDE drop-in
**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
- [ ] **IDE drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same `/workspace` the AI's been editing.
- [ ] **Bug fixes** found during eval.
- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
Path A, ≥80% success rate on the 10 reference prompts.
---
## 6. OSS we will lean on (not reinvent)
| Need | OSS choice | Notes |
|---|---|---|
| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
---
## 7. Risks & open questions
| Risk | Mitigation |
|---|---|
| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
| **`shell.exec` is the universal escape hatch — security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (`preview-mark-ceramic-market-7a3f.vibnai.com` shows a shortened suffix; ~64 bits of unguessability needs 16 hex characters). Optional Vibn-session-cookie auth as paid-tier feature later. |
| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
| **Image size / pull time on first project.** ~1 GB pull adds 30-60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
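
For the preview-URL mitigation, a suffix with roughly 64 bits of entropy is 8 random bytes, or 16 hex characters. A minimal sketch using Node's `crypto.randomBytes`. The `previewHost` helper and the hostname layout are assumptions based on the pattern shown in the table.

```ts
import { randomBytes } from "node:crypto";

// Builds a preview hostname with an unguessable suffix.
// 8 random bytes -> 16 hex characters -> 64 bits of entropy.
function previewHost(ws: string, project: string, suffixBytes: number = 8): string {
  const suffix = randomBytes(suffixBytes).toString("hex");
  return `preview-${ws}-${project}-${suffix}.vibnai.com`;
}
```

One caveat worth checking before adopting a scheme like this: a single DNS label is limited to 63 characters, so long workspace or project slugs may need truncation.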
---
## 8. Success metrics
We're not done until **all four** are true on the eval harness:
| Metric | Target | Today (Path A) |
|---|---|---|
| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
---
## 9. What this changes about the existing roadmap
- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1-C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
- **Tier 1 P5.1-P5.4 are unchanged.** Domains, email, storage, and workers are still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
---
## 10. Decision needed before week 1 starts
1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
Once those four are decided, week 1 starts.
---
## How to use this doc
- This is the *architectural* execution plan. The detailed task list
goes into the agent's TodoWrite per-week, not into this file.
- When an item ships, **move it from "planned" to "shipped"** in
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
- When a risk in §7 turns out to be real, document the mitigation
outcome inline so future readers see what actually happened.
- This doc supersedes the proposed Tier 1.5 in
[`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
one-line pointer there once approved.
*(Refer to `lib/ai/vibn-tools.ts` and `app/api/mcp/route.ts` for the live implementation).*


@@ -1,275 +1,11 @@
# Project Page Architecture — Product / Infrastructure / Hosting
# Project Page Architecture
> The plan to collapse the 16-page sidebar mess at
> `/[workspace]/project/[projectId]/*` into 3 founder-friendly
> sections, and to make `/project/<id>` actually reflect what the AI
> is doing in the dev container instead of stale Gitea/prod-Coolify
> data.
>
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
> (Path B is the engine; this doc is the dashboard for it).
>
> **Status:** week 1 doc + home-page redesign in flight (2026-04-28).
> **Note:** The UI was heavily refactored. The primary surfaces for a project are now:
---
1. **The Plan Tab (`/plan`):** Contains the project's vision/objective document, tasks, decisions, and raw ideas. The AI acts as a scribe here.
2. **The Product Tab (`/product`):** Lists the live codebases (Gitea) and running images (Docker containers).
3. **The Infrastructure Tab (`/infrastructure`):** Lists the underlying resources (PostgreSQL databases, Redis, etc.) managed by Coolify.
4. **The Hosting Tab (`/hosting`):** Lists live runtime environments, logs, and preview URLs.
5. **The Chat Panel:** Available on all project surfaces as a slide-out, used to orchestrate work.
## 1. Why this exists
Today the project page (`/[workspace]/project/[projectId]`) shows two
tiles — Code + Infrastructure — and links to a sidebar with 16
sub-routes (`build`, `run`, `infrastructure`, `deployment`,
`overview`, `insights`, `analytics`, `prd`, `tasks`, `settings`,
`assist`, `design`, `growth`, `grow`, `mvp-setup`, `code` — the last
of which doesn't exist as a route, so the home tile is a dead link).
Two structural problems:
1. **The sidebar grew without an anchor concept.** Founders have no
mental model of what the 16 pages map to; they just see a list
and click around hoping for the right one. Half the pages are
placeholders ("Coming soon"); the rest overlap.
2. **None of the data sources have been updated for Path B.** The
Code tile reads the Gitea repo (production master branch), but the
AI now writes to the dev container's `/workspace`, often without
pushing for hours. The Infrastructure tile reads production
Coolify apps; new `dev_server.start` previews don't show up
anywhere. So when AI does great work in chat, the project page
doesn't update — the user has to tab back to chat to see anything.
---
## 2. The framing
Three sections, founder-friendly names, every project on Vibn maps
cleanly into all three:
| Section | What it is | Founder asks… |
|---|---|---|
| **Product** | Custom code, design, content built for THIS vision | *"What did I build?"* |
| **Infrastructure** | Reusable, swappable third-party services (auth, db, email, payments…) | *"What do I depend on?"* |
| **Hosting** | Where the product runs and how people reach it (Coolify, domain, observability, cost) | *"Where does it live?"* |
### The boundary rule
> **Custom code = Product. Third-party service = Infrastructure.**
> Runtime + reachability = Hosting.
Concrete edge cases:
- A custom `/api/upload` endpoint that calls S3 → endpoint is
**Product**, S3 bucket + credentials are **Infrastructure**.
- Custom job that sends a welcome email → job is **Product**, the
job runner (Sidekiq/BullMQ) and email service (Resend) are
**Infrastructure**.
- Webhook handler that processes Stripe events → handler is
**Product**, Stripe is **Infrastructure**.
- Coolify scheduled task that runs your code → your code is
**Product**, Coolify itself is **Hosting**.
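
One way to read the boundary rule is as an ordered decision: custom code wins first, then runtime and reachability, and whatever remains is a swappable third-party service. The sketch below encodes that ordering; the `Component` shape and its two flags are assumptions made for illustration.

```typescript
// Hypothetical sketch of the boundary rule; the Component shape is an assumption.
type Section = "Product" | "Infrastructure" | "Hosting";

type Component = {
  name: string;
  customCode: boolean;           // written for this specific product
  runtimeOrReachability: boolean; // where it runs or how people reach it
};

function classify(c: Component): Section {
  if (c.customCode) return "Product";            // custom code = Product
  if (c.runtimeOrReachability) return "Hosting"; // runtime + reachability = Hosting
  return "Infrastructure";                       // third-party service = Infrastructure
}
```

Under these assumptions the Stripe webhook handler classifies as Product, Stripe itself as Infrastructure, and Coolify as Hosting, matching the edge cases above.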
---
## 3. Charters
### Product
Everything custom-built for this specific vision. The unique IP that
wouldn't exist without this product.
**Includes:**
- Frontend web app
- Marketing site
- Custom backend code & APIs
- Custom business logic
- Custom jobs / runners (the code, not the runner)
- Brand, copy, design system
- The repository itself
- Customer base — the actual users you've earned
**Rule:** if you wrote it for this product, it's Product. If it's
`node_modules` or a third-party SDK, it's not.
### Infrastructure
The reusable, swappable services your product depends on. The
annoying multi-vendor world where you have to pick a provider.
**Includes:**
- Auth provider (Clerk, Pocketbase, Authentik, Google OAuth, …)
- Database (Postgres, MySQL, MongoDB, Redis, …)
- File storage (S3, R2, MinIO)
- Email (Resend, SendGrid, SES)
- Payments (Stripe, Paddle, Lemon Squeezy)
- Analytics (Plausible, PostHog, GA)
- Search (Algolia, Meili, Typesense)
- LLM provider (OpenAI, Anthropic, Gemini, Vertex)
- Queues, maps, SMS, push notifications, …
- Secrets and API keys that wire all of the above
**Rule:** if you could swap the vendor without changing your product
code, it's Infrastructure.
### Hosting
Where the product physically runs and how people reach it.
**Includes:**
- Container runtime (Coolify in our case)
- Domain + DNS + SSL
- CDN / edge
- Observability (logs, errors, uptime)
- Backups
- Monthly cost
**Rule:** it's about *runtime and reachability,* not about what the
software does.
---
## 4. Future sections (deferred)
Add as separate top-level cards once they become real concerns:
- **Models** — for AI-heavy products: which LLMs, which embedding
model, prompt versions, eval scores, cost-per-call.
- **Analytics** — when there are real users worth measuring.
- **Marketing** — campaigns, blog, SEO, social, when there's a
growth motion.
- **Compliance** — Terms, Privacy, GDPR, SOC2, when shipping to
paying customers.
- **Support** — helpdesk, chat, status page, when there are
customers complaining.
- **Team** — when the project has more than one collaborator.
Same charter template each time. Same rule: code = Product,
swappable = Infrastructure, runs/reachable = Hosting, otherwise it
needs its own section.
---
## 5. Mapping today → tomorrow
| Today's page | Where it goes | Notes |
|---|---|---|
| `(home)/page.tsx` | New `(home)/page.tsx` (3-card grid) | Full redesign |
| `code` (404) | `product/` (new) | Stub the route, point home tile at it |
| `build` | Subroute under `product/files` (later) | Heavy (1626 lines); preserve the file tree component |
| `run` | `hosting/` | Production runtime |
| `infrastructure` | `hosting/` | Same data, different name |
| `deployment` | `hosting/deploys` (later) | Deploy history is Hosting |
| `overview` | Subroute under `product/` or merged into home | Decide once we see how home feels |
| `prd` | Subroute under `product/` (vision) | Or its own "Define" section if we add one |
| `tasks` | Subroute under `product/` (roadmap) | Or its own section later |
| `assist` | `product/` (it's emails/chat your product sends) | These ARE product features |
| `design` | `product/design` | Custom for this vision |
| `growth`, `grow`, `analytics`, `insights`, `mvp-setup` | Defer, probably absorbed into a future "Analytics" or "Marketing" section | Many are placeholders today |
| `settings` | Top-right gear (lives outside the 3 sections) | Project-level meta |
**Net:** 16 routes → 3 sections (+ settings). 8+ pages get rationalized
into nothing because they were duplicating their neighbors.
---
## 6. Phased delivery
### Phase 1 — Tab navigation + section stubs (this session)
The three sections are TABS at the project level, not a card-grid
landing page. A founder lands on the project URL and is immediately
inside Product (the default tab); flipping to Infrastructure or
Hosting is one click and stays in the same view. No
intermediate "click a tile to drill in" step.
URL shape:
```
/[workspace]/project/[id] → 308 redirect to /product
/[workspace]/project/[id]/product → Product tab
/[workspace]/project/[id]/infrastructure → Infrastructure tab
/[workspace]/project/[id]/hosting → Hosting tab
```
A shared layout at the project root renders:
- Project header (name, vision, stage pill, settings gear)
- Tab bar (Product · Infrastructure · Hosting) — active tab
highlighted; each tab carries a tiny status dot (green/amber/grey)
- Slot for the active tab's page
The current `(home)/page.tsx` (the two-tile landing) is replaced by
the redirect.
**Don't kill anything in `(workspace)/`.** Existing 16 routes stay
alive while we migrate. Sidebar still works for them.
### Phase 2 — Wire data sources
- **Product card** reads from the dev container's `/workspace`:
- File count + recent edits via `fs.list` against the project's
dev container
- User count from the project's auth provider (Pocketbase /
Clerk / etc.)
- Frontend URL from `dev_server.list` or production `apps_list`
- **Infrastructure card** reads from Coolify databases, env vars,
and known integrations:
- Database type + size
- Auth provider name
- Wired services (any env var matching `STRIPE_*`, `RESEND_*`,
etc.)
- **Hosting card** reads from Coolify apps + domains + container metrics:
- Production URL, SSL status, last deploy
- Monthly cost (Coolify resource usage × pricing)
- Recent error count (from logs)
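
The "wired services" detection for the Infrastructure card could be as simple as prefix matching on env var names. A minimal sketch, assuming the card receives the names only (never the values); the prefix table and function name are hypothetical, and only `STRIPE_` and `RESEND_` are taken from the list above.

```typescript
// Hypothetical sketch: only the STRIPE_ and RESEND_ prefixes come from the doc.
const KNOWN_PREFIXES: Array<[prefix: string, service: string]> = [
  ["STRIPE_", "Stripe"],
  ["RESEND_", "Resend"],
];

// Takes env var NAMES only, so secret values never reach the UI layer.
function detectWiredServices(envVarNames: string[]): string[] {
  const found = new Set<string>();
  for (const name of envVarNames) {
    for (const [prefix, service] of KNOWN_PREFIXES) {
      if (name.startsWith(prefix)) found.add(service);
    }
  }
  // Sorted so the card renders in a stable order.
  return Array.from(found).sort();
}
```

Several variables for the same vendor (for example a secret key and a webhook secret) collapse into one entry, which is what the card needs.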
### Phase 3 — Section detail pages
Build each of `/product`, `/infrastructure`, `/hosting` as a real,
useful surface. Each page can have internal subnav for the bits
listed in its charter (e.g., Product has Frontend, Backend, Jobs,
Brand, Customers; Infrastructure has Auth, DB, Storage, Email,
Payments, …).
### Phase 4 — Migration / deletion
Once the new structure is proven, redirect the legacy routes:
- `code` → `product`
- `build` → `product/files`
- `run` → `hosting`
- `infrastructure` → `hosting`
- `deployment` → `hosting/deploys`
- `prd`, `tasks`, `assist``product/...`
- `growth`, `grow`, `analytics`, `insights`, `mvp-setup` → soft-delete
with a tombstone redirect to `product` or to a future section page.
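
The Phase 4 redirects can be expressed as a lookup table plus a resolver. The sketch below is an assumption-laden illustration: the function name is hypothetical, only the fully specified targets are included, and `infrastructure` is left out on purpose because that slug is also the new Infrastructure tab from Phase 1.

```typescript
// Hypothetical sketch: only mappings the plan spells out in full are listed.
// `prd`, `tasks`, `assist` are omitted because their targets are elided ("product/...").
// `infrastructure` is omitted because the slug collides with the new tab route.
const LEGACY_TO_NEW = new Map<string, string>([
  ["code", "product"],
  ["build", "product/files"],
  ["run", "hosting"],
  ["deployment", "hosting/deploys"],
]);

// Returns the new path for a legacy sub-route, or null when no redirect applies.
function resolveLegacyRoute(
  workspace: string,
  projectId: string,
  legacySegment: string,
): string | null {
  const target = LEGACY_TO_NEW.get(legacySegment);
  if (target === undefined) return null;
  return `/${workspace}/project/${projectId}/${target}`;
}
```

A `Map` is used instead of a plain object so that segments such as `constructor` cannot accidentally match an inherited property.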
---
## 7. Open questions
- **Where do the chat threads live?** They're a per-project
conversation surface today (right rail in the chat panel). I'd
argue they're not a section — they're *across* sections, like the
AI is. Keep as the persistent right rail.
- **Settings is technically project-level meta**, not one of the
three sections. Where does it surface? Gear icon in the page
header, opens settings as a side sheet or as a separate route.
Decide when we get there.
- **Mobile layout** — three cards stack vertically; no special
layout needed. The section detail pages need a layout pass when
we get to phase 3.
---
## 8. Success criteria
You should be able to look at `/project/<id>` after AI activity in
chat and immediately see:
1. *"What did the AI just build?"* → Product card shows an updated file
count + recent diffs.
2. *"What's it depending on?"* → Infrastructure card shows the new
Postgres, the new Stripe key, etc.
3. *"Is it live?"* → Hosting card shows the dev preview URL or the
production URL with status.
If any of those three answers requires going back to the chat or
checking another page, the redesign hasn't worked.
*(Refer to `vibn-frontend/app/[workspace]/project/[projectId]` for the UI implementation).*


@@ -1,258 +1,9 @@
# Sentry-as-Product — Proposal
# Sentry as a Product (Shipped)
> Today's Sentry wiring catches errors in **the Vibn platform**.
> The bigger opportunity is wiring Sentry into **every project Vibn
> ships**, then feeding those errors back into the user's AI chat.
> Difference between "an AI that codes" and "an AI that owns the
> product."
> **Note:** This spec was implemented in May 2026.
## TL;DR
Today, when a Vibn user's deployed app crashes for real users:
```
real user → site 500s → user closes tab, never tells founder
→ founder finds out hours/days later (or never)
→ AI in Vibn chat has zero idea anything is wrong
```
The fix is to make every Vibn project ship with Sentry pre-wired,
then expose the error feed to the AI as a tool. Total effort:
**~8 hours**, in 4 stages, each independently shippable.
| Stage | Capability | Effort | Unlocks |
|---|---|---|---|
| 1 | Auto-provision a Sentry project per Vibn project on first deploy | ~3 hr | Real-user errors captured at all |
| 2 | Bake Sentry into every scaffold template | ~2 hr | Capture works without user setup |
| 3 | Add `project_recent_errors` MCP tool for the AI | ~2 hr | AI can answer "is anything broken?" |
| 4 | Auto-surface unresolved errors at chat-turn start | ~1 hr | AI proactively offers fixes |
Total: **~8 hr**, no new infra (we already have Sentry org access,
Coolify env API, scaffold templates, MCP tool registry).
---
## Why this is the right next investment
### The current loop is broken at the seam between user and platform
Vibn's value proposition is "the AI is your technical co-founder."
That promise breaks the moment the AI's last commit causes a real
user error and the AI doesn't know about it. The current loop:
```
1. User describes feature in chat
2. AI ships code
3. AI says "deployed, give it a try"
4. (silence)
5. Real users hit edge cases → 500s → bounce
6. Founder eventually notices via support ticket / analytics dip
7. Founder pastes error back to AI
8. AI fixes
```
Steps 4-6 are dead air for the founder, **and the AI cannot help
during them.** This is the gap that separates Vibn from "any IDE
with an LLM."
### What it looks like with this proposal shipped
```
1. User describes feature in chat
2. AI ships code
3. AI says "deployed, give it a try"
4. Real users hit edge cases → 500s → Sentry captures
5. (Founder opens Vibn chat 3 hrs later for unrelated reason)
6. AI: "Hey — checkout has 500'd for 3 users in the last hour
because `customer.email` is undefined on
app/checkout/route.ts:47. Want me to fix it?"
7. AI fixes, deploys, marks issue resolved in Sentry
```
The AI becomes the on-call engineer. This is what "technical
co-founder" actually means and we are 8 hours away from it.
### Why now (not Phase 4)
- The Sentry wiring we just shipped for vibn-frontend gave us:
- A working Sentry org (`vibnai`)
- An auth token with project-management scope
- Verified knowledge that the build args / source maps flow works
- A working `withSentryConfig` recipe in `vibn-frontend/next.config.ts`
- All of those are reusable for stage 1 and 2 of this proposal.
- Doing this **before** the beta means user projects start emitting
error data on day one, so by the time we're debugging real beta
user pain, we have a month of history to reason about.
- Doing it after the beta means we'd have to retroactively
instrument projects that have already been deployed for weeks.
---
## Stage 1 — Auto-provision a Sentry project per Vibn project (~3 hr)
**Goal:** when a user creates a Vibn project, the platform creates a
matching Sentry project under the `vibnai` org and stashes the DSN
+ auth token in Coolify env vars on the user's app.
**What gets built:**
1. **A `provisionSentryProject(projectId, name)` helper** in
`vibn-frontend/lib/integrations/sentry.ts`. Calls Sentry's
`POST /api/0/teams/vibnai/{team}/projects/` with the project
slug, returns the DSN.
2. **Hook into project-create flow** — on first successful deploy,
call the helper and write the resulting DSN + auth token into
Coolify env vars (`NEXT_PUBLIC_SENTRY_DSN`,
`SENTRY_AUTH_TOKEN`) for that app via the same Coolify API we
used today.
3. **Idempotency** — if the Sentry project already exists, fetch
its DSN instead of creating a duplicate. Same project name
convention every time: `vibn-{workspace}-{projectSlug}`.
4. **Storage** — store `sentryProjectSlug` and `sentryAuthTokenId`
on the Postgres `projects` row so we can look them up later
without re-walking the Sentry org.
**Risk:** Sentry's API rate-limits team-project creation. We avoid
hitting the limit by reading before writing, so the only API cost on
subsequent deploys is one GET.
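
The read-before-write idempotency described above might look like the sketch below. It is not the real helper: `SentryProjectsApi` is a hypothetical thin wrapper (not the Sentry SDK), and `ensureSentryProject` deliberately uses a different name from the planned `provisionSentryProject`. Only the `vibn-{workspace}-{projectSlug}` naming convention comes from the text.

```typescript
// Hypothetical wrapper around the Sentry HTTP API; not a real SDK interface.
interface SentryProjectsApi {
  getDsn(slug: string): Promise<string | null>; // null when the project does not exist
  create(slug: string): Promise<string>;        // creates the project, returns its DSN
}

// Naming convention from the proposal: vibn-{workspace}-{projectSlug}.
function sentryProjectSlug(workspace: string, projectSlug: string): string {
  return `vibn-${workspace}-${projectSlug}`;
}

async function ensureSentryProject(
  api: SentryProjectsApi,
  workspace: string,
  projectSlug: string,
): Promise<{ slug: string; dsn: string; created: boolean }> {
  const slug = sentryProjectSlug(workspace, projectSlug);
  // Read first: repeat deploys cost a single lookup and never create duplicates.
  const existing = await api.getDsn(slug);
  if (existing !== null) return { slug, dsn: existing, created: false };
  const dsn = await api.create(slug);
  return { slug, dsn, created: true };
}
```

Injecting the API wrapper keeps the idempotency logic testable with an in-memory fake, without touching the real Sentry org.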
**Definition of done:** create a fresh Vibn project → check Sentry
org → see a project named `vibn-{ws}-{slug}` → check Coolify env on
that app → see DSN populated.
---
## Stage 2 — Bake Sentry into every scaffold template (~2 hr)
**Goal:** every Next.js / Vite / etc. starter template Vibn ships
already has Sentry wired up. User does nothing.
**What gets built:**
1. **For each scaffold template in `vibn-frontend/lib/scaffold/`**,
add the same files we shipped today:
- `instrumentation.ts`
- `instrumentation-client.ts`
- `app/global-error.tsx` (Next.js) / equivalent boundary (Vite)
- `next.config.ts` wrapped with `withSentryConfig` (Next.js)
- `vite.config.ts` with `sentryVitePlugin` (Vite)
- `Dockerfile` ARG declarations for `NEXT_PUBLIC_SENTRY_DSN` +
`SENTRY_AUTH_TOKEN`
2. **Add `@sentry/nextjs` (or `@sentry/react` + `@sentry/vite-plugin`)
to each template's `package.json` `dependencies`.**
3. **Document in template README** that Sentry is pre-wired and the
user doesn't need to do anything.
**Risk:** Sentry's wrapper sometimes interacts badly with custom
build configs (e.g. monorepos, custom webpack rules). Mitigation:
the `errorHandler` we set today (`console.warn` instead of throw)
ensures source map upload failures don't break builds.
**Definition of done:** scaffold a fresh Next.js project from Vibn
templates → deploy → throw a test error → see it in Sentry,
de-minified.
---
## Stage 3 — Expose error feed to the AI as MCP tools (~2 hr)
**Goal:** the AI can ask Sentry "what's broken in project X?" and
get a real answer.
**What gets built:**
Three new MCP tools in `vibn-frontend/lib/ai/vibn-tools.ts`:
1. **`project_recent_errors { projectId, since?, limit? }`**
- Returns: `[{ id, title, count, lastSeen, culprit, level }]`
- Default `since`: 24h. Default `limit`: 10.
- Filters to unresolved issues only.
- Implementation: read `sentryProjectSlug` off the project row,
call Sentry's `GET /api/0/projects/{org}/{slug}/issues/`.
2. **`project_error_detail { projectId, issueId }`**
- Returns: `{ stacktrace, breadcrumbs, request, user, replay_url }`
- Implementation: Sentry's `GET /api/0/issues/{id}/events/latest/`.
3. **`project_error_resolve { projectId, issueId }`**
- Side-effect: marks the issue resolved in Sentry.
- Used by the AI after it ships a fix and confirms via tests.
- Implementation: Sentry's `PUT /api/0/issues/{id}/` with
`status: "resolved"`.
**Auth:** token storage is per-project (from Stage 1's `projects`
row). Each project's AI sees only its own project's errors. No
cross-project leakage.
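
To make the `project_recent_errors` defaults concrete, here is a sketch of the filtering step applied after the Sentry response has been fetched. The `RawIssue` shape is a simplified assumption (Sentry's real payload carries more fields and different types), and `selectRecentErrors` is a hypothetical name.

```typescript
// Simplified, assumed shape of an issue after fetching; not Sentry's exact payload.
type RawIssue = {
  id: string;
  title: string;
  count: number;
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
  level: string;
  status: string;   // e.g. "unresolved" or "resolved"
};

type ErrorSummary = Omit<RawIssue, "status">;

function selectRecentErrors(
  issues: RawIssue[],
  nowMs: number,
  opts: { sinceHours?: number; limit?: number } = {},
): ErrorSummary[] {
  const sinceHours = opts.sinceHours ?? 24; // default `since`: 24h
  const limit = opts.limit ?? 10;           // default `limit`: 10
  const cutoffMs = nowMs - sinceHours * 60 * 60 * 1000;
  return issues
    .filter((i) => i.status === "unresolved" && Date.parse(i.lastSeen) >= cutoffMs)
    .sort((a, b) => Date.parse(b.lastSeen) - Date.parse(a.lastSeen)) // newest first
    .slice(0, limit)
    .map((i) => ({
      id: i.id,
      title: i.title,
      count: i.count,
      lastSeen: i.lastSeen,
      culprit: i.culprit,
      level: i.level,
    }));
}
```

Sorting newest-first is a choice made for this sketch; the proposal does not specify an ordering.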
**Definition of done:** in a Vibn chat for a project with known
errors, ask the AI "any errors lately?" → AI calls
`project_recent_errors` → shows real list.
---
## Stage 4 — Auto-surface unresolved errors at chat-turn start (~1 hr)
**Goal:** the AI doesn't wait to be asked. When the user opens a
chat and there are unresolved errors, the AI mentions them on the
first turn.
**What gets built:**
In `vibn-frontend/app/api/chat/route.ts`, at the start of each chat
turn (before calling the model):
1. Call the same `project_recent_errors` logic Stage 3 exposed.
2. If `count > 0`, prepend a synthetic system message:
```
[PROJECT HEALTH]
{N} unresolved Sentry issues in the last 24 hours:
- {title} (×{count}, last seen {time}) — {culprit}
- ...
If the user's first message is unrelated to these, you may still
proactively mention them: "Quick FYI before we get into that —
{X} has been failing for users."
If their message IS about a broken thing, prefer the matching
Sentry issue's stack trace over guessing.
```
3. Fire this at most once per session opening by default
(configurable as once per N chat turns), so it doesn't spam every turn.
**Risk:** false alarms (Sentry issue from yesterday's deploy that
no one cares about anymore) make the AI annoying. Mitigation:
tighten the `since` window to the last 6h, and only surface issues
with `count >= 2` (one-off errors don't count).
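
Putting the Stage 4 rules together (once per session, a 6h window, and `count >= 2`), the synthetic message could be built roughly as follows. This is a sketch under assumptions: `HealthIssue`, `buildHealthMessage`, and the `alreadyShownThisSession` flag are hypothetical, and the bullet wording is simplified relative to the template above.

```typescript
// Hypothetical sketch; names and the exact bullet wording are illustrative.
type HealthIssue = {
  title: string;
  count: number;
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
};

const HEALTH_WINDOW_HOURS = 6; // tightened `since` window from the mitigation
const MIN_EVENT_COUNT = 2;     // one-off errors are not surfaced

// Returns the system message to prepend, or null when nothing should be shown.
function buildHealthMessage(
  issues: HealthIssue[],
  nowMs: number,
  alreadyShownThisSession: boolean,
): string | null {
  if (alreadyShownThisSession) return null;
  const cutoffMs = nowMs - HEALTH_WINDOW_HOURS * 60 * 60 * 1000;
  const relevant = issues.filter(
    (i) => i.count >= MIN_EVENT_COUNT && Date.parse(i.lastSeen) >= cutoffMs,
  );
  if (relevant.length === 0) return null;
  const bullets = relevant.map(
    (i) => `- ${i.title} (x${i.count}, last seen ${i.lastSeen}), culprit: ${i.culprit}`,
  );
  return [
    "[PROJECT HEALTH]",
    `${relevant.length} unresolved Sentry issues in the last ${HEALTH_WINDOW_HOURS} hours:`,
    ...bullets,
  ].join("\n");
}
```

Returning `null` for the quiet cases lets the chat route skip the system message entirely instead of prepending an empty health block.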
**Definition of done:** intentionally break a deployed user
project, open chat, type "what's up?" → AI's first response
mentions the issue, with file path.
---
## Out of scope for this proposal
- **User-owned Sentry orgs.** Some users will eventually want their
own Sentry account, not the shared `vibnai` org. Ship-later;
doesn't block the loop. Easy retrofit because storage is already
per-project.
- **Performance / Tracing data.** Sentry also captures spans /
traces. Useful for "this endpoint is slow" but not the urgent
product loop. Ship-later.
- **Front-end UI for errors in Vibn.** A "Health" tab showing the
Sentry feed in the Vibn UI is nice but not required for the AI
loop to work. Ship-later.
---
## Recommendation
Add a **Phase 2.9 (Sentry-as-product loop)** to `BETA_LAUNCH_PLAN.md`
covering Stages 1-4 as a single bundle. Estimate: **8 hr engineering**.
This is the second-highest-leverage item still ahead of beta,
behind only the deploy-failed webhook (which is 30 min). Every
hour spent here directly upgrades the value of every other beta
test session that follows it.
## Architecture
- Sentry is automatically provisioned for every new project (`lib/integrations/sentry.ts`).
- Environment variables (`NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`) are injected into the Coolify app.
- The AI has access to the `project_recent_errors`, `project_error_detail`, and `project_error_resolve` MCP tools, so it can list recent issues, inspect their details, and mark them resolved through the Sentry API after shipping a fix.
- If unhandled exceptions are firing, the AI is prompted at the start of a conversation to address them (`app/api/chat/route.ts`).