docs: heavily compress and simplify remaining reference files to represent current state
@@ -1,904 +1,22 @@
|
||||
# Vibn AI Capabilities
|
||||
|
||||
> The full set of actions an AI agent can take on behalf of a Vibn workspace,
|
||||
> along with the REST endpoints, MCP tools, and safety rails that back them.
|
||||
>
|
||||
> **Audience:** agent authors, Cursor rule writers, MCP tool designers, and
|
||||
> anyone building on the Vibn control plane.
|
||||
>
|
||||
> **Scope:** everything an agent sees through `https://vibnai.com/api/*` and
|
||||
> the `/api/mcp` bridge. No Firestore, no internal agent orchestration —
|
||||
> just the tenant-safe capability surface.
|
||||
|
||||
---
|
||||
|
||||
## 1. Mental model
|
||||
|
||||
Every capability in this document operates on a single **workspace**. A
|
||||
workspace is Vibn's tenant boundary and maps 1:1 to:
|
||||
|
||||
| Vibn concept | External identity | Example (`mark`) |
|---|---|---|
| Workspace | `vibn_workspaces.slug` | `mark` |
| Gitea org | `gitea_org` | `vibn-mark` |
| Gitea bot user | `gitea_bot_username` | `mark-bot` |
| SSH deploy keypair | `coolify_private_key_uuid` + `gitea_bot_ssh_key_id` | registered on both sides |
| Coolify project | `coolify_project_uuid` | `vibn-ws-mark` |
| Coolify environment | `coolify_environment_name` | `production` |
| Domain namespace | `*.{slug}.vibnai.com` | `*.mark.vibnai.com` |
| AI token | `vibn_sk_…` | one per agent/device |
|
||||
|
||||
A single agent token can only act on the workspace it was minted for.
Cross-workspace access is structurally impossible:
[`lib/coolify.ts`](./vibn-frontend/lib/coolify.ts) matches every Coolify
resource's `environment_id` against the workspace's project environments
(`ensureResourceInProject`).
|
||||
|
||||
### The three views
|
||||
|
||||
All capabilities roll up into three user-facing surfaces:
|
||||
|
||||
- **Code** — every Gitea repo under `vibn-{slug}/`.
|
||||
- **Live** — every Coolify app/database/service in `vibn-ws-{slug}`, each
|
||||
reachable under `*.{slug}.vibnai.com`.
|
||||
- **IDE** — Browser-based agent workspace sessions (outside the scope of this doc).
|
||||
|
||||
---
|
||||
|
||||
## 2. Authentication
|
||||
|
||||
Every agent-facing endpoint accepts **either**:
|
||||
|
||||
- `Authorization: Bearer vibn_sk_<base64url>` — a workspace-scoped API key
|
||||
minted in the settings panel. Stored as a sha256 hash server-side; the
|
||||
plaintext is shown exactly once on creation. Can be revoked at any time.
|
||||
- A NextAuth session cookie — used for the dashboard UI and for browser
|
||||
debugging. Not suitable for long-running agents.
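
The hash-only storage described for API keys can be sketched as follows. This is a minimal Python illustration, not the server's actual code: the helper names `mint_api_key` and `verify_api_key`, the 32-byte key length, and the hex encoding of the digest are all assumptions; only the `vibn_sk_` prefix, the base64url body, and the use of sha256 come from this document.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

KEY_PREFIX = "vibn_sk_"


def mint_api_key(num_bytes: int = 32) -> Tuple[str, str]:
    """Return (plaintext, sha256_hex). Only the hash would be persisted."""
    # num_bytes is an assumption; the document does not state the key length.
    plaintext = KEY_PREFIX + secrets.token_urlsafe(num_bytes)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    if not presented.startswith(KEY_PREFIX):
        return False
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Because only the digest is kept, a lost key cannot be recovered from the server side; it has to be revoked and minted again.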
|
||||
|
||||
Helper: [`requireWorkspacePrincipal()`](./vibn-frontend/lib/auth/workspace-auth.ts)
|
||||
resolves either to a `WorkspacePrincipal { workspace, user?, source }`.
|
||||
|
||||
**403 on a tenant mismatch means:** the token is valid, but the resource
|
||||
belongs to another workspace. The agent should stop and ask the user.
|
||||
|
||||
---
|
||||
|
||||
## 3. MCP surface
|
||||
|
||||
The MCP bridge lives at `POST https://vibnai.com/api/mcp`. It takes
|
||||
JSON-over-HTTP bodies shaped like:
|
||||
|
||||
```json
{ "tool": "<tool-name>", "params": { /* tool-specific */ } }
```
|
||||
|
||||
The Cursor / Claude Desktop config block is auto-generated in the settings
|
||||
panel and looks like:
|
||||
|
||||
```json
{
  "mcpServers": {
    "vibn-mark": {
      "url": "https://vibnai.com/api/mcp",
      "headers": { "Authorization": "Bearer vibn_sk_…" }
    }
  }
}
```
|
||||
|
||||
`GET /api/mcp` returns a self-description with the current tool list.
|
||||
Version: **2.1.0**.
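
A call to the bridge can be assembled with the Python standard library alone. The sketch below only builds the request object and does not send it. The helper name `build_mcp_request` and the `Content-Type` header are assumptions; the URL, the bearer header, and the `{ tool, params }` body shape come from this section.

```python
import json
import urllib.request

MCP_URL = "https://vibnai.com/api/mcp"


def build_mcp_request(tool: str, params: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the MCP bridge for one tool call."""
    body = json.dumps({"tool": tool, "params": params}).encode("utf-8")
    return urllib.request.Request(
        MCP_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumed header; the document only shows the JSON body shape.
            "Content-Type": "application/json",
        },
    )
```

Sending it would be `urllib.request.urlopen(build_mcp_request("apps.list", {}, token))`. The response shape depends on the tool and is not sketched here.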
|
||||
|
||||
### 3.1 Workspace & identity tools
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `workspace.describe` | Returns slug, Coolify project uuid, Gitea org, provision status. | — |
|
||||
| `gitea.credentials` | Returns the bot's username, PAT, clone URL template, and SSH remote template. Use this for every `git clone`/push — never other credentials. | — |
|
||||
|
||||
### 3.2 Project tools
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `projects.list` | Lists Vibn projects (PRDs, imports, etc.) in the workspace. | — |
|
||||
| `projects.get` | Single project details. | `{ projectId }` |
|
||||
|
||||
### 3.3 Application tools
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `apps.list` | All Coolify apps in the workspace. | — |
|
||||
| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
|
||||
| `apps.create` | Create a Coolify app. **Four pathways** — pick the one that matches your source. **(1) Gitea repo** (user's own code): pass `repo`. Clones over HTTPS+PAT; no SSH. **(2) Docker image** (pre-built single-container third-party app, e.g. `nginx:alpine`): pass `image`. **(3) Inline Docker Compose YAML** (custom multi-service stack): pass `composeRaw`. **(4) Coolify one-click template** (RECOMMENDED for popular apps — Twenty, n8n, Supabase, Ghost, etc): pass `template` with a slug from `apps.templates.search`. Templates have battle-tested env defaults, healthchecks, and `depends_on` graphs. **Use pathway 4 over pathway 3 whenever a template exists** — it is dramatically more reliable. Auto-domain `{name}.{slug}.vibnai.com` for all pathways. | **(1) repo:** `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` **(2) image:** `{ image, name?, ports?, domain?, envs?, instantDeploy? }` **(3) composeRaw:** `{ composeRaw, name?, domain?, envs?, instantDeploy? }` **(4) template:** `{ template, name?, domain?, envs?, instantDeploy? }` |
|
||||
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
|
||||
| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted |
|
||||
| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name |
|
||||
| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
|
||||
| `apps.deployments` | List recent deployments + status. | `{ uuid }` |
|
||||
| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 |
|
||||
| `apps.volumes.list` | List Docker volumes belonging to an app (name + size in bytes). Use before `apps.volumes.wipe` to know exact volume names. | `{ uuid }` |
|
||||
| `apps.volumes.wipe` | **Destructive / irreversible.** Stop all app containers, remove a specific volume, leave it ready for a fresh `apps.deploy`. Use to recover from stale DB state on first boot (the most common compose app failure). `confirm` must equal the exact volume name. | `{ uuid, volume, confirm }` |
|
||||
| `apps.containers.up` | Run `docker compose up -d` directly on the Coolify host for a compose app or service. Bypasses Coolify's queued-start worker (which routinely fails to actually invoke compose). Use after env or domain changes to recreate containers, or as a recovery path when `apps.create`/`apps.deploy` returned `started: false`. Idempotent — already-running containers are no-op'd. Up to 10 min timeout. Returns `{ ok, code, stdout, stderr, durationMs }`. | `{ uuid }` |
|
||||
| `apps.containers.ps` | `docker compose ps -a` against the rendered compose dir. Quick diagnostic for "why isn't my stack running?" — distinguishes `Created` (queued-start failure → use `apps.containers.up`), `Exited` (app crash → use `apps.logs`), `Restarting` (boot loop → use `apps.logs`), and `Up healthy/unhealthy`. | `{ uuid }` |
|
||||
| `apps.templates.list` | Browse the full Coolify one-click template catalog (320+ vetted apps: CRMs, AI tools, CMSes, dashboards, databases, …). Each entry is deployable via `apps.create({ template: <slug> })`. Returns `{ total, offset, limit, items: [{ slug, slogan, tags, port, documentation, logo }] }`. Catalog is fetched from upstream and cached for 1h. | `{ limit?, offset?, tag? }` — `limit` default 50, max 500; `tag` substring filter (e.g. `"crm"`, `"ai"`) |
|
||||
| `apps.templates.search` | Find templates by name, tag, or slogan. Ranked: exact-slug > slug-starts-with > slug-contains > tag-exact > tag-contains > slogan. Use this **before** `apps.create` to discover the right slug (e.g. `"twenty"`, `"n8n-with-postgres-and-worker"`, `"forgejo-with-postgresql"`). | `{ query, tag?, limit? }` — `limit` default 25, max 100. Either `query` or `tag` must be set |
|
||||
| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
|
||||
| `apps.domains.list` | Current domain set. | `{ uuid }` |
|
||||
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
|
||||
| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
|
||||
| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
|
||||
| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
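
The `apps.containers.ps` row above implies a small decision table for what to call next. A Python sketch of that table follows. It assumes the container state string begins with the status word (`Created`, `Exited`, `Restarting`, `Up`); the helper name `next_tool_for_state` is hypothetical, and the real response format of `apps.containers.ps` is not specified in this document.

```python
from typing import Optional

# State word (lowercased) -> follow-up tool, as described for apps.containers.ps.
NEXT_TOOL_BY_STATE = {
    "created": "apps.containers.up",  # queued-start failure
    "exited": "apps.logs",            # app crash
    "restarting": "apps.logs",        # boot loop
}


def next_tool_for_state(state: str) -> Optional[str]:
    """Return the suggested follow-up tool, or None when no action is implied."""
    words = state.strip().lower().split()
    if not words:
        return None
    return NEXT_TOOL_BY_STATE.get(words[0])
```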
|
||||
|
||||
### 3.4 Database tools
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `databases.list` | All databases in the workspace, across all flavors. | — |
|
||||
| `databases.create` | Provision a database. Supported `type`: `postgresql`, `mysql`, `mariadb`, `mongodb`, `redis`, `keydb`, `dragonfly`, `clickhouse`. | `{ type, name?, isPublic?, publicPort?, image?, credentials?, limits? }` |
|
||||
| `databases.get` | Details + internal connection URL. | `{ uuid }` |
|
||||
| `databases.update` | PATCH name, public visibility, image, limits. | `{ uuid, patch }` |
|
||||
| `databases.delete` | Destroy the database. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the db's exact name |
|
||||
|
||||
### 3.5 Auth provider tools
|
||||
|
||||
Authentication is a first-class capability. An agent cannot spin up arbitrary
|
||||
Coolify services — only vetted auth providers from an allowlist.
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `auth.list` | Auth providers currently deployed in the workspace (classified by Coolify's `service_type`). | — |
|
||||
| `auth.create` | Provision one of the allowed providers. | `{ provider, name?, description?, instantDeploy? }` |
|
||||
| `auth.delete` | Destroy an auth provider. Volumes (user data) kept by default. | `{ uuid, confirm }` — `confirm` must equal the service's exact name |
|
||||
|
||||
**Allowed providers** (keys passed as `provider`):
|
||||
|
||||
- `pocketbase` — lightweight (SQLite) auth + data, single container.
|
||||
- `authentik` — feature-rich self-hosted IDP.
|
||||
- `keycloak` / `keycloak-with-postgres` — industry-standard OIDC/SAML.
|
||||
- `pocket-id` / `pocket-id-with-postgresql` — passkey-first OIDC.
|
||||
- `logto` — dev-first IDP.
|
||||
- `supertokens-with-postgresql` — session/auth backend.
|
||||
|
||||
Requesting anything outside this list returns 400 with a hint listing the
|
||||
allowed ones, so the agent can self-correct.
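
The allowlist check can be pictured as a simple membership test that returns a self-correcting hint. The Python sketch below uses the provider keys listed above; the function name `validate_auth_provider` and the exact error body are assumptions, since the document only says that a 400 with a hint is returned.

```python
from typing import Tuple

ALLOWED_AUTH_PROVIDERS = (
    "pocketbase",
    "authentik",
    "keycloak",
    "keycloak-with-postgres",
    "pocket-id",
    "pocket-id-with-postgresql",
    "logto",
    "supertokens-with-postgresql",
)


def validate_auth_provider(provider: str) -> Tuple[int, dict]:
    """Return (status, body); 400 carries a hint listing the allowed keys."""
    if provider in ALLOWED_AUTH_PROVIDERS:
        return 200, {"provider": provider}
    # Hypothetical error shape, modeled on the error/hint pairs used elsewhere.
    return 400, {
        "error": f"Provider {provider!r} is not allowed",
        "hint": "Allowed providers: " + ", ".join(ALLOWED_AUTH_PROVIDERS),
    }
```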
|
||||
|
||||
### 3.6 Domain tools (P5.1 — custom apex domains)
|
||||
|
||||
Custom apex domains are owned end-to-end by Vibn: the registrar is OpenSRS
|
||||
(Tucows), authoritative DNS is Google Cloud DNS in the Canadian project, and
|
||||
domains are pinned to the workspace that registered them. All four lifecycle
|
||||
steps — search, register, attach, inspect — are agent-callable.
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `domains.search` | Check availability + price for one or more candidate apex domains via OpenSRS. Stateless; does not reserve anything. | `{ names: string[], period?: number }`; `names` accepts up to 25 entries, `period` is in years (auto-bumped for TLDs with longer minimums, such as `.ai`, which requires at least 2 years). |
|
||||
| `domains.register` | Register a domain through OpenSRS. Registers unlocked; locking happens automatically after `domains.attach` completes. Idempotent per `(workspace, domain)`. | `{ domain, period?, whoisPrivacy?, contact, nameservers?, ca?: { cprCategory, legalType } }` — `ca.*` required for `.ca`. |
|
||||
| `domains.list` | List all domains owned by the workspace with their status, registrar order id, expiry, and DNS provider/zone. | — |
|
||||
| `domains.get` | Full record + last 20 lifecycle events. | `{ domain }` |
|
||||
| `domains.attach` | Wire a registered domain to a Coolify app (or arbitrary IP/CNAME): create Cloud DNS zone, write A/CNAME rrsets, update registrar-side nameservers, append FQDNs to the Coolify app's domain list. Idempotent; safe to retry. | `{ domain, appUuid? \| ip? \| cname?, subdomains?: string[] (default ["@","www"]), updateRegistrarNs? }` |
|
||||
|
||||
### Object storage (GCS via S3-compatible HMAC)
|
||||
|
||||
Every workspace gets a Canada-hosted GCS bucket, a dedicated service
|
||||
account, and an HMAC keypair so agent-built apps can use any AWS S3
|
||||
SDK. The HMAC *secret* is never returned through the API — it's written
|
||||
directly into Coolify apps via `storage.inject_env`.
|
||||
|
||||
| Tool | Purpose | Params |
|
||||
|---|---|---|
|
||||
| `storage.describe` | Report the workspace bucket name, region, S3 endpoint, access-key id, and provision status. No secret returned. | — |
|
||||
| `storage.provision` | Idempotently create/reconcile the workspace's GCP service account, JSON keyfile, bucket (`vibn-ws-{slug}-{rand}`), IAM binding, and HMAC key. Safe to re-run. | — |
|
||||
| `storage.inject_env` | Push `STORAGE_*` env vars (endpoint, region, bucket, access key id, secret access key, force_path_style) into a Coolify app. The secret is written server-side with `is_shown_once=true`; it never transits the response body. | `{ uuid, prefix? }` — `prefix` defaults to `STORAGE_`; use `S3_` for apps that expect AWS-standard names |
|
||||
|
||||
The bucket is S3-compatible: point any `aws-sdk` / `@aws-sdk/client-s3`
|
||||
/ `boto3` at `STORAGE_ENDPOINT` with `force_path_style=true` (`STORAGE_*`
|
||||
env vars are set by `storage.inject_env`).
|
||||
|
||||
**Residency note (domains, §3.6):** Cloud DNS is global anycast, so DNS
configuration is not pinned to Canada at the storage layer. The
workspace-level `dns_provider` flag (default `cloud_dns`) will let us swap in
CIRA D-Zone for strict Canadian residency without touching the MCP surface.
|
||||
|
||||
**Billing:** Every successful `domains.register` writes a `debit` row to
|
||||
`vibn_billing_ledger` with the OpenSRS order id as `ref_id`. The
|
||||
`vibn_domain_events` table keeps an append-only audit of every lifecycle
|
||||
call (`register.attempt`, `register.success`, `register.failed`,
|
||||
`attach.success`).
|
||||
|
||||
**Verified end-to-end (2026-04-22)** against PROD GCP + OpenSRS sandbox +
|
||||
PROD Coolify (Coolify `v4.0.0-beta.473`); see
|
||||
`vibn-frontend/scripts/smoke-attach-e2e.ts`. **All 5 sub-systems green.**
|
||||
|
||||
- ✓ OpenSRS register against Horizon (sandbox) returns order id, response 200.
|
||||
- ✓ Cloud DNS managed zone created in `master-ai-484822` with public anycast NS.
|
||||
- ✓ A records (`@`, `www`) written to the zone.
|
||||
- ✓ Registrar-side nameserver update accepts Cloud DNS NS values
|
||||
(trailing-dot normalization in `lib/opensrs.ts`); sandbox returns 480
|
||||
because its mock registry doesn't know real Google NS hosts, which is
|
||||
expected — live mode talks to real registries that accept any resolvable NS.
|
||||
- ✓ Unlock → update NS → relock fallback path verified (sandbox-recognized
|
||||
nameservers return 200; the unlock/relock sequence is exercised when the
|
||||
registry returns 405 lock-conflict).
|
||||
- ✓ Coolify domain-list PATCH adds the apex + `www` to the application
|
||||
`fqdn` column and the smoke test re-fetches it to confirm.
|
||||
|
||||
> **Operational gotcha: the destination server must be proxy-enabled.**
> Coolify's `update_by_uuid` controller accepts `domains` as a comma-separated
> list and only maps it onto the model's `fqdn` column when the destination
> server's `Server::isProxyShouldRun()` returns `true`. That helper requires
> **both** `proxy.type ∈ {TRAEFIK, CADDY}` *and* `is_build_server = false`.
> If either is misconfigured the PATCH returns 200 but the field is silently
> dropped (Laravel mass-assignment ignores `domains` because it isn't in
> `$fillable`, and the controller never copies it into `fqdn`). We hit this
> on `coolify-server-mtl` (`zg4cwgc44ogc08804000gggo`), which had
> `proxy=null` and `is_build_server=true`. Fixed by:
>
> ```sql
> UPDATE servers
>    SET proxy = jsonb_set(coalesce(proxy,'{}'::jsonb), '{type}', '"TRAEFIK"')
>  WHERE uuid = 'zg4cwgc44ogc08804000gggo';
> UPDATE server_settings
>    SET is_build_server = false
>  WHERE server_id = (SELECT id FROM servers WHERE uuid = 'zg4cwgc44ogc08804000gggo');
> ```
>
> followed by `docker restart coolify` to clear Laravel's in-memory config.
> Sending `fqdn` directly is **not** an alternative: the controller's
> `$allowedFields` whitelist rejects it with 422 "This field is not allowed."
|
||||
|
||||
### 3.7 Agent-side stdio MCP servers (`vibn-agent-runner`)
|
||||
|
||||
Separate from the control-plane MCP at `/api/mcp` (which is what external
|
||||
agents call *into* Vibn), the `vibn-agent-runner` exposes its own in-house
|
||||
tool surface *outward* over stdio MCP. This lets Cursor, Claude Desktop,
|
||||
Goose, or any MCP-speaking client drive the same Coolify / Gitea / workspace
|
||||
tooling the Coder/PM/Marketing sub-agents use internally — with the same
|
||||
protected-repo and protected-app guardrails enforced centrally.
|
||||
|
||||
Architecture: every tool now has three touch-points backed by one source of truth:
|
||||
|
||||
```
vibn-agent-runner/src/tools/<domain>-api.ts   ← pure, config-agnostic logic + security guards
vibn-agent-runner/src/tools/<domain>.ts       ← thin registerTool() wrappers for the in-process agent loop
vibn-agent-runner/src/mcp/<domain>-server.ts  ← stdio MCP server for external clients
```
|
||||
|
||||
| Server | Tools | Required env |
|
||||
|---|---|---|
|
||||
| `vibn-coolify-mcp` | 7 — list_projects, list_applications, deploy, get_logs, list_all_apps, get_app_status, deploy_app | `COOLIFY_API_URL`, `COOLIFY_API_TOKEN` |
|
||||
| `vibn-gitea-mcp` | 6 — create/list/close issues, list_repos, list_all_issues, read_repo_file | `GITEA_API_URL`, `GITEA_API_TOKEN`, `GITEA_USERNAME` |
|
||||
| `vibn-workspace-mcp` | 8 — read/write/replace/list/find/search_code, execute_command, git_commit_and_push | `WORKSPACE_ROOT` (+ Gitea creds for git push) |
|
||||
| `vibn-platform-mcp` | 7 — save_memory, list_memory, list_skills, get_skill, finalize_prd, get_prd, web_search | `SESSION_KEY` (optional), Gitea creds (for skills) |
|
||||
| `vibn-agent-mcp` | 2 — spawn_agent, get_job_status (dispatches into the runner's HTTP API) | `AGENT_RUNNER_URL` (defaults to `http://localhost:3333`) |
|
||||
|
||||
Run locally with `npm run mcp:<name>` (or `:dev` via ts-node) in
|
||||
`vibn-agent-runner/`. Smoke-test any server with
|
||||
`node scripts/smoke-mcp.js <name>`. The in-process agent loop still sees
|
||||
the same 28 registered tools — no behavioral regression.
|
||||
|
||||
---
|
||||
|
||||
## 4. REST surface
|
||||
|
||||
Every MCP tool is also exposed as a plain HTTP endpoint under
|
||||
`/api/workspaces/{slug}/…`. Agents that prefer curl-style access can use
|
||||
these directly; the shape is identical to the MCP `params`. Auth is the
|
||||
same bearer header.
|
||||
|
||||
### 4.1 Workspace & key management
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces` | All workspaces the principal has access to. |
| GET | `/api/workspaces/{slug}` | Workspace details. |
| POST | `/api/workspaces/{slug}/provision` | Idempotent re-run of Gitea org + bot + SSH keypair + Coolify project setup. |
| GET | `/api/workspaces/{slug}/keys` | List API keys (metadata only). |
| POST | `/api/workspaces/{slug}/keys` | Mint a new API key. Full token returned once. |
| DELETE | `/api/workspaces/{slug}/keys/{keyId}` | Revoke a key. |
| GET | `/api/workspaces/{slug}/gitea-credentials` | Return bot username, PAT (decrypted), clone/SSH templates. |
| GET | `/api/workspaces/{slug}/bootstrap.sh` | Shell script that writes `.cursor/rules`, `.cursor/mcp.json`, `.env.local` into the cwd. |
|
||||
|
||||
### 4.2 Applications
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/apps` | List apps. |
| POST | `/api/workspaces/{slug}/apps` | Create an app from a workspace repo. |
| GET | `/api/workspaces/{slug}/apps/{uuid}` | App details. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}` | Update whitelisted fields. |
| DELETE | `/api/workspaces/{slug}/apps/{uuid}?confirm=<exact-name>` | Destroy app. |
| POST | `/api/workspaces/{slug}/apps/{uuid}/deploy` | Trigger deploy. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/deployments` | List deployments. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/domains` | List domains. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}/domains` | Replace domain set. |
| GET | `/api/workspaces/{slug}/apps/{uuid}/envs` | List env vars. |
| PATCH | `/api/workspaces/{slug}/apps/{uuid}/envs` | Upsert env var(s). |
| DELETE | `/api/workspaces/{slug}/apps/{uuid}/envs?key=FOO` | Delete env var. |
| GET | `/api/workspaces/{slug}/deployments/{deploymentUuid}/logs` | Deployment logs. |
|
||||
|
||||
### 4.3 Databases
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/databases` | List databases. |
| POST | `/api/workspaces/{slug}/databases` | Create a database (8 flavors). |
| GET | `/api/workspaces/{slug}/databases/{uuid}` | Database details + internal connection URL. |
| PATCH | `/api/workspaces/{slug}/databases/{uuid}` | Update fields. |
| DELETE | `/api/workspaces/{slug}/databases/{uuid}?confirm=<exact-name>` | Destroy database. |
|
||||
|
||||
### 4.4 Auth providers
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| GET | `/api/workspaces/{slug}/auth` | List deployed auth providers + the allowlist. |
| POST | `/api/workspaces/{slug}/auth` | Provision a provider from the allowlist. |
| GET | `/api/workspaces/{slug}/auth/{uuid}` | Provider details. |
| DELETE | `/api/workspaces/{slug}/auth/{uuid}?confirm=<exact-name>` | Destroy provider. |
|
||||
|
||||
### 4.5 Domains (P5.1)
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| POST | `/api/workspaces/{slug}/domains/search` | Availability + pricing for up to 25 candidate names. |
| GET | `/api/workspaces/{slug}/domains` | List workspace-owned domains. |
| POST | `/api/workspaces/{slug}/domains` | Register a domain (idempotent per `(workspace, domain)`). |
| GET | `/api/workspaces/{slug}/domains/{domain}` | Full record + last 20 events. |
| POST | `/api/workspaces/{slug}/domains/{domain}/attach` | Create Cloud DNS zone, write records, update registrar NS, wire Coolify domain list. |
|
||||
|
||||
---
|
||||
|
||||
## 5. Gitea surface
|
||||
|
||||
AI agents **never** talk to the root Gitea admin token. They use the
|
||||
workspace's dedicated bot user.
|
||||
|
||||
### 5.1 What the bot can do
|
||||
|
||||
- Fully own the `vibn-{slug}` org (added as the org's owner team).
|
||||
- Read/write every repo in that org via its PAT.
|
||||
- Push over SSH using the workspace's ed25519 deploy key (same keypair
|
||||
Coolify uses to pull code).
|
||||
- What it **cannot** do: touch any other org, the root admin surface, or
|
||||
Gitea's `/admin/*` endpoints.
|
||||
|
||||
### 5.2 How to get the bot credentials
|
||||
|
||||
```http
GET /api/workspaces/{slug}/gitea-credentials
Authorization: Bearer vibn_sk_…
```
|
||||
|
||||
Returns:
|
||||
|
||||
```json
{
  "bot": { "username": "mark-bot", "token": "…" },
  "gitea": {
    "apiBase": "https://git.vibnai.com/api/v1",
    "host": "git.vibnai.com",
    "cloneUrlTemplate": "https://mark-bot:{{token}}@git.vibnai.com/vibn-mark/{{repo}}.git",
    "sshRemoteTemplate": "git@git.vibnai.com:vibn-mark/{{repo}}.git",
    "webUrlTemplate": "https://git.vibnai.com/vibn-mark/{{repo}}"
  },
  "workspace": { "slug": "mark", "giteaOrg": "vibn-mark" }
}
```
|
||||
|
||||
The PAT is stored **encrypted at rest** using AES-256-GCM with the
|
||||
`VIBN_SECRETS_KEY` server secret; the decrypt step runs only on this endpoint.
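
The `{{token}}` and `{{repo}}` placeholders in the templates returned by this endpoint are meant to be filled in by the caller. A minimal Python sketch of that substitution follows. The helper name `fill_template` is hypothetical, and percent-encoding the values is a defensive assumption rather than documented behavior.

```python
from urllib.parse import quote


def fill_template(template: str, token: str, repo: str) -> str:
    """Substitute {{token}} and {{repo}} into a credentials template."""
    # Percent-encode both values so unusual characters cannot break the URL.
    return (
        template
        .replace("{{token}}", quote(token, safe=""))
        .replace("{{repo}}", quote(repo, safe=""))
    )
```

For example, filling `cloneUrlTemplate` with the bot token and `my-site` gives a URL that can be passed straight to `git clone`. The result embeds the PAT, so it should not be logged or committed.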
|
||||
|
||||
### 5.3 Gitea operations via the standard Gitea API
|
||||
|
||||
Once the agent has `{bot.token, gitea.apiBase}`, it can call any standard
|
||||
Gitea v1 endpoint as the bot, scoped to the workspace org. Common ones:
|
||||
|
||||
- `POST /orgs/{org}/repos` — create a repo.
|
||||
- `PATCH /repos/{org}/{repo}` — update repo settings.
|
||||
- `GET /repos/{org}/{repo}/contents/{path}` — read files.
|
||||
- `PUT /repos/{org}/{repo}/contents/{path}` — write files (commits).
|
||||
- `POST /repos/{org}/{repo}/pulls` — open PRs.
|
||||
- `POST /repos/{org}/{repo}/branches` — create branches.
|
||||
|
||||
---
|
||||
|
||||
## 6. Domain policy
|
||||
|
||||
Every app gets an auto-generated domain under the workspace's namespace:
|
||||
|
||||
```
{app-slug}.{workspace-slug}.vibnai.com
```
|
||||
|
||||
For example, creating an app named `my-api` in workspace `mark` yields
|
||||
`my-api.mark.vibnai.com` automatically — no DNS config, no cert work,
|
||||
served by Coolify's wildcard Traefik.
|
||||
|
||||
### 6.1 What agents can do
|
||||
|
||||
- Accept the auto-generated domain (default path).
|
||||
- Replace the domain set via `PATCH /apps/{uuid}/domains`, provided every
|
||||
entry ends with `.{workspace-slug}.vibnai.com`.
|
||||
|
||||
### 6.2 What agents cannot do
|
||||
|
||||
- Point an app at a domain outside the workspace's namespace. The server
|
||||
rejects this with 403 regardless of DNS state:
|
||||
|
||||
```json
{ "error": "Domain evil.com is not allowed; must end with .mark.vibnai.com",
  "hint": "Use my-api.mark.vibnai.com" }
```
|
||||
|
||||
This is enforced by `isDomainUnderWorkspace()` in
|
||||
[`lib/naming.ts`](./vibn-frontend/lib/naming.ts).
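
The suffix rule can be illustrated in a few lines. This is a Python sketch of the rule as stated here, not the code in `lib/naming.ts`; the lowercasing and trailing-dot normalization are assumptions.

```python
BASE_DOMAIN = "vibnai.com"


def is_domain_under_workspace(domain: str, slug: str) -> bool:
    """True when the domain is a proper subdomain of {slug}.vibnai.com."""
    suffix = f".{slug}.{BASE_DOMAIN}"
    # Assumed normalization: case-insensitive, tolerate a trailing root dot.
    host = domain.strip().lower().rstrip(".")
    # The leading dot in the suffix is what rejects look-alikes such as
    # "evilmark.vibnai.com"; the length check rejects the bare suffix itself.
    return host.endswith(suffix) and len(host) > len(suffix)
```

With workspace `mark`, `my-api.mark.vibnai.com` passes, while `evil.com` and `my-site.justine.vibnai.com` do not.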
|
||||
|
||||
### 6.3 Custom (external) domains
|
||||
|
||||
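Domains outside the workspace namespace cannot be set through
`apps.domains.set`. Domains the workspace registered through Vibn are wired
up with `domains.attach` (§3.6). Any other external domain is not exposed to
AI agents; a human can still add it through Coolify directly or through a
future human-gated UI.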
|
||||
|
||||
---
|
||||
|
||||
## 7. Safety model
|
||||
|
||||
### 7.1 Tenant enforcement
|
||||
|
||||
Every resource-returning helper in `lib/coolify.ts` runs through
|
||||
`ensureResourceInProject()`. It:
|
||||
|
||||
1. If the resource carries an explicit `project_uuid`, trusts it and compares
   it with the workspace's project.
2. Otherwise, fetches the project's environment ids via `GET /projects/{uuid}`
   and verifies that the resource's `environment_id` is in that set.
|
||||
|
||||
A token for `mark` that tries to read an app in `justine`'s project returns:
|
||||
|
||||
```json
{ "error": "Application <uuid> does not belong to project <mark-project-uuid>" }
```
|
||||
|
||||
with HTTP 403. Cross-workspace enumeration and access are not just
|
||||
discouraged — they fail at the helper level.
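
The two-step check can be sketched as a pure function. This Python version is an illustration under assumptions, not the TypeScript helper itself: the resource is modeled as a plain dict, the environment ids are passed in rather than fetched from `GET /projects/{uuid}`, and the function returns a boolean where the real helper fails the request with 403.

```python
from typing import Set


def resource_belongs_to_project(
    resource: dict, project_uuid: str, project_env_ids: Set[int]
) -> bool:
    """Mirror the two-step ownership check described for tenant enforcement."""
    explicit = resource.get("project_uuid")
    if explicit is not None:
        # Step 1: an explicit project_uuid on the resource decides the outcome.
        return explicit == project_uuid
    # Step 2: fall back to membership of environment_id in the project's
    # environments (project_env_ids stands in for the fetched set).
    return resource.get("environment_id") in project_env_ids
```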
|
||||
|
||||
### 7.2 Destructive operations
|
||||
|
||||
Every delete endpoint requires `?confirm=<exact-resource-name>`:
|
||||
|
||||
```
DELETE /apps/{uuid}                 → 409 "confirmation required"
DELETE /apps/{uuid}?confirm=wrong   → 409 "confirmation required"
DELETE /apps/{uuid}?confirm=my-api  → 200 deleted
```
|
||||
|
||||
This means an agent hallucinating a delete call cannot cost you the
|
||||
resource — it must first know the exact name, which implies it just listed
|
||||
or just created it.
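
The confirm gate reduces to an exact string comparison. A Python sketch follows, with the status codes taken from the listing above; the function name `check_delete_confirmation` and the tuple return shape are hypothetical.

```python
from typing import Optional, Tuple


def check_delete_confirmation(resource_name: str, confirm: Optional[str]) -> Tuple[int, str]:
    """Return (status, message) for a delete request's confirm parameter."""
    # A missing and a wrong value are treated the same way: no deletion.
    if confirm is None or confirm != resource_name:
        return 409, "confirmation required"
    return 200, "deleted"
```

The comparison is exact and case-sensitive in this sketch, so `My-Api` does not confirm the deletion of `my-api`.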
|
||||
|
||||
**Volumes are kept by default** on delete. To also remove volumes, pass
|
||||
`?volumes=delete` (apps/dbs) — this is opt-in, per-call, never the default.
|
||||
|
||||
### 7.3 Creation guardrails
|
||||
|
||||
- Repo-backed apps (`apps.create` with `repo`) can only be created from repos in the workspace's Gitea org.
|
||||
- Auth providers can only be created from the allowlist (see §3.5).
|
||||
- Database flavors are restricted to the 8 Coolify supports.
|
||||
- Env var keys must match `/^[A-Z_][A-Z0-9_]*$/` (no shell-escape tricks).
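
The env var key rule from the list above can be checked client-side before calling `apps.envs.upsert`. A Python sketch: it uses `fullmatch` because Python's `$` also matches just before a trailing newline, which the anchored pattern is presumably meant to reject. The helper name `is_valid_env_key` is hypothetical.

```python
import re

# Same character classes as the documented pattern /^[A-Z_][A-Z0-9_]*$/.
ENV_KEY_RE = re.compile(r"[A-Z_][A-Z0-9_]*")


def is_valid_env_key(key: str) -> bool:
    """True when the whole key matches the documented pattern."""
    return ENV_KEY_RE.fullmatch(key) is not None
```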
|
||||
|
||||
### 7.4 Secrets handling
|
||||
|
||||
- `VIBN_API_KEY` is only shown **once** on mint. Server keeps a sha256 hash.
|
||||
- Gitea bot PATs are **encrypted at rest** (AES-256-GCM with
|
||||
`VIBN_SECRETS_KEY`).
|
||||
- The SSH private key is held by Coolify, not by Vibn; the public key is
|
||||
pushed to the Gitea bot user's key list. Rotating is a re-provision.
|
||||
- Agent prompts and Cursor rules include a "treat VIBN_API_KEY like a
|
||||
password — never print or commit it" directive.
|
||||
|
||||
---
|
||||
|
||||
## 8. Worked examples
|
||||
|
||||
### 8.1 "Build me a Next.js app with a Postgres and Pocketbase auth"
|
||||
|
||||
From the agent's side, using MCP:
|
||||
|
||||
```json
// 1. Ensure a repo exists in the workspace org (standard Gitea API,
//    using the bot PAT from gitea.credentials).
POST https://git.vibnai.com/api/v1/orgs/vibn-mark/repos
{ "name": "my-site", "private": true, "auto_init": true }

// 2. Create the Coolify app. Auto-domain my-site.mark.vibnai.com.
{ "tool": "apps.create",
  "params": { "repo": "my-site", "ports": "3000", "instantDeploy": false } }

// 3. Provision a Postgres.
{ "tool": "databases.create",
  "params": { "type": "postgresql", "name": "app-db" } }
// → returns { internalUrl: "postgres://…@<uuid>:5432/postgres" }

// 4. Wire the db URL into the app as an env var.
{ "tool": "apps.envs.upsert",
  "params": { "uuid": "<app-uuid>", "key": "DATABASE_URL",
              "value": "<internalUrl>" } }

// 5. Deploy Pocketbase as the auth layer.
{ "tool": "auth.create",
  "params": { "provider": "pocketbase", "name": "auth" } }

// 6. First real deploy.
{ "tool": "apps.deploy", "params": { "uuid": "<app-uuid>" } }

// 7. Poll.
{ "tool": "apps.deployments", "params": { "uuid": "<app-uuid>" } }
// → [{ uuid, status: "finished" | "in_progress" | "failed" | "queued" }]
```
|
||||
|
||||
The agent hands the user back `https://my-site.mark.vibnai.com`.
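
Step 7's polling decision can be sketched as a small pure function over the four statuses listed in the example. The type and function names are hypothetical; only the status strings come from the response shape above.

```typescript
// The four deployment statuses shown in the apps.deployments response.
type DeploymentStatus = "finished" | "in_progress" | "failed" | "queued";
type PollDecision = "done" | "failed" | "wait";

// Decide what the agent should do after reading one deployment's status.
function decidePoll(status: DeploymentStatus): PollDecision {
  if (status === "finished") return "done";
  if (status === "failed") return "failed";
  // "in_progress" and "queued" both mean: poll again later.
  return "wait";
}
```

An agent would call `apps.deployments` on an interval and stop as soon as `decidePoll` returns anything other than `"wait"`.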
|
||||
|
||||
### 8.2 "Add an `api` subdomain to my app"
|
||||
|
||||
```json
|
||||
{ "tool": "apps.domains.set",
|
||||
"params": {
|
||||
"uuid": "<app-uuid>",
|
||||
"domains": ["my-site.mark.vibnai.com", "api.mark.vibnai.com"]
|
||||
} }
|
||||
```
|
||||
|
||||
Valid — both end with `.mark.vibnai.com`. `evil.com` or `my-site.justine.vibnai.com`
|
||||
would return 403.
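
A rough client-side version of this namespace check, useful for avoiding a guaranteed 403, might look like the sketch below. The helper name is hypothetical, and the real policy in `lib/naming.ts` may apply further rules (for example on label characters) that are not modelled here.

```typescript
// Hypothetical pre-check: a domain is acceptable only inside *.{slug}.vibnai.com.
function isDomainInWorkspace(domain: string, slug: string): boolean {
  const suffix = `.${slug}.vibnai.com`;
  const host = domain.trim().toLowerCase();
  // Require at least one label in front of the workspace suffix.
  return host.endsWith(suffix) && host.length > suffix.length;
}
```

Matching on the suffix with its leading dot is what keeps a lookalike such as `evilmark.vibnai.com` out of the `mark` namespace.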
|
||||
|
||||
### 8.3 "Delete the whole thing"
|
||||
|
||||
The agent must learn the resource name first, or it will hit the confirm gate:
|
||||
|
||||
```json
|
||||
// Learn the name.
|
||||
{ "tool": "apps.get", "params": { "uuid": "<app-uuid>" } }
|
||||
// → { name: "my-site", ... }
|
||||
|
||||
// Delete with matching confirm.
|
||||
{ "tool": "apps.delete",
|
||||
"params": { "uuid": "<app-uuid>", "confirm": "my-site" } }
|
||||
```
|
||||
|
||||
Wrong confirm returns `409 "Confirmation required"`.
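
The gate itself amounts to an exact string comparison. The following sketch shows the idea under the assumption that `confirm` must equal the resource name exactly; the type and function names are illustrative and not taken from the server code.

```typescript
type GateResult =
  | { status: 200 }
  | { status: 409; error: "Confirmation required" };

// Allow the delete only when `confirm` matches the resource name exactly.
function checkDeleteConfirm(resourceName: string, confirm: string | undefined): GateResult {
  if (confirm !== resourceName) {
    return { status: 409, error: "Confirmation required" };
  }
  return { status: 200 };
}
```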
|
||||
|
||||
---
|
||||
|
||||
## 9. Error handling reference
|
||||
|
||||
| Status | Meaning | What the agent should do |
|
||||
|---|---|---|
|
||||
| 400 | Bad request body (invalid JSON, missing required field, invalid type). | Fix the body, retry. |
|
||||
| 401 | No / bad bearer token. | Ask the user to mint a fresh key. |
|
||||
| 403 | **Tenant mismatch** — resource belongs to another workspace, domain outside workspace namespace, or repo not in workspace org. | **Stop.** Do not retry with guessed values. Ask the user. |
|
||||
| 404 | Resource not found (app/db/service/repo uuid wrong). | Re-list to find the right uuid. |
|
||||
| 409 | Delete confirmation missing or wrong. | Fetch the resource name first, then retry with `confirm=<name>`. |
|
||||
| 422 | Coolify validation failure (e.g. malformed domain). | Check the `details` field. |
|
||||
| 502 | Upstream Coolify/Gitea error. | Retry with backoff. |
|
||||
| 503 | Workspace not fully provisioned yet. | Call `POST /provision`, then retry. |
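
An agent harness could encode the table above as a lookup plus a backoff helper for the 502 case. This is a sketch only: the action labels are invented for illustration, and the backoff base and cap are assumed values, not documented limits.

```typescript
type AgentAction =
  | "fix_body_and_retry"
  | "ask_user_for_new_key"
  | "stop_and_ask_user"
  | "relist_resources"
  | "fetch_name_then_confirm"
  | "inspect_details"
  | "retry_with_backoff"
  | "provision_then_retry"
  | "unhandled";

// Map an HTTP status from the Vibn API to the recovery step in the table.
function actionForStatus(status: number): AgentAction {
  switch (status) {
    case 400: return "fix_body_and_retry";
    case 401: return "ask_user_for_new_key";
    case 403: return "stop_and_ask_user"; // never retry with guessed values
    case 404: return "relist_resources";
    case 409: return "fetch_name_then_confirm";
    case 422: return "inspect_details";
    case 502: return "retry_with_backoff";
    case 503: return "provision_then_retry";
    default: return "unhandled";
  }
}

// Exponential backoff for upstream errors; 500 ms base and 8 s cap are assumptions.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```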
|
||||
|
||||
---
|
||||
|
||||
## 10. Versioning
|
||||
|
||||
The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
|
||||
are append-only within a major version — agents can cache the tool list
|
||||
safely for the duration of a conversation but should re-fetch on 404.
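
One way to act on this rule is sketched below. The 404 trigger comes from the paragraph above; also re-fetching when the major version changes is an assumption extrapolated from "append-only within a major version", and both function names are hypothetical.

```typescript
// Extract the major component from a semver string such as "2.4.8".
function majorOf(version: string): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) {
    throw new Error(`not a semver string: ${version}`);
  }
  return major;
}

// A cached tool list stays usable unless a call returned 404
// or the descriptor's major version moved (assumed policy).
function shouldRefetchTools(cachedVersion: string, liveVersion: string, sawNotFound: boolean): boolean {
  return sawNotFound || majorOf(cachedVersion) !== majorOf(liveVersion);
}
```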
|
||||
|
||||
Current version: **2.4.8**.
|
||||
|
||||
- **1.x** — session-cookie-only MCP, no tenant keys.
|
||||
- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
|
||||
- **2.1** — create/update/delete for apps, 8 database flavors, auth
|
||||
provider allowlist, domain policy enforcement, confirm-gated deletes.
|
||||
- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
|
||||
domain routing, runtime log tailing (`apps.logs`), in-container command
|
||||
execution (`apps.exec`), and diagnostic `apps.update` responses.
|
||||
- **2.3** — `apps.create` Docker-image and inline-composeRaw pathways (no
|
||||
Gitea repo required for third-party apps), `apps.volumes.list` +
|
||||
`apps.volumes.wipe` for self-service volume recovery.
|
||||
- **2.4** — `apps.create` Coolify-template pathway (`{ template: "twenty" }`
|
||||
etc.) for one-click deploy of 320+ vetted apps, plus `apps.templates.list`
|
||||
/ `apps.templates.search` for catalog discovery.
|
||||
- **2.4.1** — `apps.containers.up` / `apps.containers.ps` to bypass Coolify's
|
||||
unreliable queued-start worker. `apps.create` (template + composeRaw
|
||||
pathways) now auto-falls-back to direct `docker compose up -d` over SSH
|
||||
when Coolify's queue stalls, so a single `apps.create` call really does
|
||||
leave a running stack.
|
||||
- **2.4.2** — `apps.create` no longer reports `started: false` when only a
|
||||
sidecar (worker / scheduler) failed its `depends_on: service_healthy`
|
||||
gate. We now probe the host with `docker ps` after `compose up -d` and
|
||||
return `started: true` whenever any container of the stack is running,
|
||||
surfacing the compose stderr in `startDiag` so agents can decide whether
|
||||
to re-run `apps.containers.up` later. This matches the real-world
|
||||
behavior of slow-booting apps like Twenty (worker waits ~3 min for
|
||||
Twenty's healthcheck, which exceeds compose's default `depends_on` timeout).
|
||||
- **2.4.3** — Auto-attached stack containers to the `coolify` proxy network
|
||||
after `compose up`, fixing Traefik 503s on third-party apps.
|
||||
- **2.4.4** — Made the proxy-network attach selective (only `traefik.enable=true`
|
||||
containers) to avoid DNS aliasing collisions where Twenty's `postgres`
|
||||
hostname resolved to `coolify-db`.
|
||||
- **2.4.5** — Architectural overhaul of `apps.create` for service templates.
|
||||
We no longer run `docker compose up -d` over SSH as a deployment fallback
|
||||
(that bypassed Coolify's compose generation, causing internal services to
|
||||
land on the wrong networks). Instead `apps.create` now:
|
||||
1. Calls Coolify's `start` and lets its queue do the full deploy
|
||||
(volumes, internal networking, env interpolation, healthchecks).
|
||||
2. Polls `service.applications[*].status` (the truthful per-app status
|
||||
field — `service.status` itself routinely lies as
|
||||
`starting:unknown` while containers are healthy).
|
||||
3. Applies three surgical post-deploy fixes that Coolify's own
|
||||
pipeline omits and that its REST API does not expose:
|
||||
- rewrites `SERVICE_FQDN_*` / `SERVICE_URL_*` in the rendered
|
||||
`.env` so frontends that bake their backend URL into the SPA
|
||||
bundle (Twenty's `SERVER_URL`, etc.) point at the real
|
||||
custom domain instead of the auto-generated sslip.io URL;
|
||||
- injects the missing
|
||||
`traefik.http.services.<svc>.loadbalancer.server.port` label
|
||||
(Coolify generates the routing rules but forgets the port,
|
||||
so Traefik logs `error: port is missing` and returns 503);
|
||||
- connects `coolify-proxy` to the project's Docker network
|
||||
(Coolify writes a `caddy_ingress_network=<uuid>` hint label
|
||||
but never actually runs `docker network connect`), then
|
||||
force-recreates ONLY the public-facing container so the new
|
||||
env+label apply, and restarts the proxy so Traefik
|
||||
re-discovers.
|
||||
|
||||
The response shape gains:
|
||||
- `reachable` — boolean, true when `https://<fqdn>` answers 2xx/3xx
|
||||
- `appStatus` — the truthful per-application status from Coolify
|
||||
- `postDeploy` — step-by-step diagnostic for each of the three fixes
|
||||
The previous `started`/`startMethod`/`startDiag` fields are kept for
|
||||
back-compat. Internal services (Postgres, Redis, worker) stay on
|
||||
their isolated project network — fixing the `password authentication
|
||||
failed` regression introduced in 2.4.4.
|
||||
- **2.4.6** — Two fixes for transient Coolify queue lag observed in
|
||||
2.4.5:
|
||||
- **Polling no longer false-fails on early `exited` status.**
|
||||
Coolify's queue worker can take 60-120s to dequeue a `start`
|
||||
request; during that window `service.applications[*].status`
|
||||
returns the stale `exited` (= "never started") state. Previously
|
||||
we treated that as terminal failure after 90s. Now we require
|
||||
*evidence of activity* (`starting:*` or `running:*` was seen at
|
||||
least once) before treating subsequent `exited` reports as
|
||||
terminal. Until activity is observed, the loop just keeps polling
|
||||
up to the 8-min health timeout. Eliminates the case where
|
||||
`apps.create` returned `started: false` on a stack that was
|
||||
actually about to come up healthy.
|
||||
- **`apps.repair`** — new tool. Re-runs the three post-deploy
|
||||
patches (env rewrite, port label, proxy network attach + recreate
|
||||
+ proxy restart) against an existing service without recreating
|
||||
it. Useful when a deploy succeeded mechanically but ended up
|
||||
serving Traefik 503 or Mixed Content errors, or whenever a user
|
||||
rotates a custom domain. Params: `{ uuid, fqdn, publicAppName,
|
||||
port? }`. Returns `{ reachable, postDeploy: { steps }, probe }`.
|
||||
- **2.4.7** — `applyCoolifyPostDeployFixes` now schedules the
|
||||
`coolify-proxy` restart (step 5) as a fire-and-forget background
|
||||
job (`(sleep 3 && docker restart coolify-proxy) &`) instead of
|
||||
blocking on it synchronously. The proxy restart kills any in-flight
|
||||
TCP connection through the gateway — including the very request
|
||||
that's running `apps.repair` / `apps.create` — so doing it inline
|
||||
caused the agent to see a curl framing error (exit 16) right when
|
||||
the work was in fact succeeding. Now the SSH command returns within
|
||||
~50ms, the HTTP response is delivered, and Traefik re-discovers
|
||||
labels ~3s later.
|
||||
- **2.4.8** — Massive simplification of post-deploy logic. Coolify's
|
||||
template engine is fully capable of generating correct Traefik
|
||||
labels and `SERVICE_FQDN_<APP>` / `SERVICE_URL_<APP>` env vars **if
|
||||
the URL passed to `setServiceDomains` includes the upstream port**
|
||||
(the "Required Port" hint in Coolify's UI: `https://crm.example.com:3000`,
|
||||
not `https://crm.example.com`). 2.4.5–2.4.7 were missing that
|
||||
detail, which is why they had to re-write the `.env` and inject
|
||||
the loadbalancer port label as a workaround.
|
||||
|
||||
In 2.4.8 `apps.create` reads `template.port` from the catalog and
|
||||
passes `https://<fqdn>:<port>` to `setServiceDomains`. Coolify then:
|
||||
- generates `traefik.http.services.<svc>.loadbalancer.server.port=<port>`
|
||||
automatically;
|
||||
- rewrites `.env` so `SERVICE_FQDN_<APP>=<fqdn>` and
|
||||
`SERVICE_URL_<APP>=https://<fqdn>` (no sslip.io leak);
|
||||
- keeps `SERVICE_FQDN_<APP>_<PORT>` magic placeholders correctly
|
||||
pointed at the user's host:port.
|
||||
|
||||
All that's left is the one thing Coolify still skips: connecting
|
||||
`coolify-proxy` to the resource's project Docker network. So
|
||||
`applyCoolifyPostDeployFixes` is now ~30 lines (down from ~200) and
|
||||
no longer SSH-runs an embedded Python script inside a
|
||||
`python:3-alpine` container. The `CoolifyPostDeployResult.steps`
|
||||
shape now contains only `proxyNetwork` and `proxyRestart`; the old
|
||||
`envRewrite` / `portLabel` / `recreate` step keys are removed.
|
||||
`apps.repair` retains its API (`{ uuid, fqdn, publicAppName, port? }`)
|
||||
but `port` is now informational only (not required for the helper
|
||||
to function).
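
The 2.4.6 polling rule (an `exited` report is terminal only after activity has been observed) can be sketched as a pure function over the status history. The function name is hypothetical, and matching on the `exited` prefix is an assumption about how Coolify formats the status string.

```typescript
// `history` is the ordered list of statuses read from
// service.applications[*].status during one polling session.
function isTerminalExit(history: readonly string[]): boolean {
  const latest = history[history.length - 1];
  if (latest === undefined || !latest.startsWith("exited")) {
    return false;
  }
  // Treat "exited" as terminal only if the stack was seen starting or running
  // at least once; before that, "exited" just means "not dequeued yet".
  return history.some((s) => s.startsWith("starting:") || s.startsWith("running:"));
}
```

While `isTerminalExit` returns false the caller keeps polling until the 8-minute health timeout mentioned above.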
|
||||
|
||||
---
|
||||
|
||||
## 11. Troubleshooting compose apps
|
||||
|
||||
Most real-world app failures fall into a small number of patterns. The
|
||||
recipes below are the canonical diagnostic flow for an agent operating
|
||||
on behalf of a user.
|
||||
|
||||
### 11.1 "Deployment succeeds but the app keeps restarting"
|
||||
|
||||
Agents should NOT trust Coolify's deployment status alone. A successful
|
||||
build + healthcheck-pending response usually means the containers came
|
||||
up but the app logic is crashing. Investigate with:
|
||||
|
||||
1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
|
||||
services indicate containers never ran) and per-service stderr.
|
||||
2. If the logs show repeated DB errors like `relation "xxx" does not
|
||||
exist` or `pq: no such table`, the app skipped its migration step.
|
||||
This is common for Docker Compose apps that run migrations only in a
separate `worker` command, never in the `server` service itself.
|
||||
3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
|
||||
|
||||
```json
|
||||
{
|
||||
"action": "apps.exec",
|
||||
"params": {
|
||||
"uuid": "<app-uuid>",
|
||||
"service": "server",
|
||||
"command": "yarn command:prod database:migrate:prod",
|
||||
"timeout_ms": 300000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
|
||||
wait for the next restart) and verify the container reports
|
||||
`healthy`.
|
||||
|
||||
### 11.2 "`apps.update` returned success but nothing changed"
|
||||
|
||||
Check the `applied` / `ignored` / `rerouted` arrays in the response.
|
||||
The most common reroutes:
|
||||
|
||||
- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
|
||||
- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
|
||||
the workspace's Gitea PAT embedded).
|
||||
- `build_pack` — changing this mid-life for an existing app is not
|
||||
supported. Recreate the app.
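
An agent can turn the `rerouted` array into follow-up tool calls with a small lookup. This sketch assumes `rerouted` is a list of field names, which the text above does not spell out; the table contents come from the bullets, while the helper name is illustrative.

```typescript
// Field name -> tool that actually handles it, per the list above.
const REROUTES: Record<string, string> = {
  fqdn: "apps.domains.set",
  domains: "apps.domains.set",
  docker_compose_domains: "apps.domains.set",
  git_repository: "apps.rewire_git",
};

// Returns the distinct tools the agent should call next, in first-seen order.
function followUpTools(rerouted: readonly string[]): string[] {
  const tools = new Set<string>();
  for (const field of rerouted) {
    const tool = REROUTES[field];
    if (tool !== undefined) {
      tools.add(tool);
    }
  }
  return Array.from(tools);
}
```

`build_pack` is deliberately absent from the table because the documented fix is to recreate the app, not to call another tool.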
|
||||
|
||||
### 11.3 "Compose app is up but the domain 502s"
|
||||
|
||||
Coolify's API treats compose and single-container apps differently:
|
||||
compose apps use `docker_compose_domains` (array of `{name, domain}`),
|
||||
single-container apps use `domains` (comma-separated string).
|
||||
`apps.domains.set` handles both, but if you're seeing a 502:
|
||||
|
||||
1. `apps.domains.list { uuid }` — confirm the domain is actually
|
||||
attached to a **service** (not just the app).
|
||||
2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
|
||||
— verify the upstream container is listening.
|
||||
3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
|
||||
startup errors like `EADDRINUSE` or config failures.
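
The two payload shapes described at the top of this section can be sketched as follows. The field names `docker_compose_domains` and `domains` come from the text; the helper name, and the assumption that `name` holds the compose service name, are illustrative.

```typescript
type ComposeDomain = { name: string; domain: string };
type DomainPayload =
  | { docker_compose_domains: ComposeDomain[] }
  | { domains: string };

// Build the Coolify domain payload for either app kind.
function buildDomainPayload(isCompose: boolean, domains: readonly string[], service = "server"): DomainPayload {
  if (isCompose) {
    // Compose apps: one { name, domain } entry per domain, keyed by service.
    return { docker_compose_domains: domains.map((domain) => ({ name: service, domain })) };
  }
  // Single-container apps: one comma-separated string.
  return { domains: domains.join(",") };
}
```

Sending the single-container shape to a compose app is one plausible way to end up with a domain attached to the app but not to any service.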
|
||||
|
||||
### 11.4 "Choosing the right `apps.create` pathway"
|
||||
|
||||
| Situation | Use |
|
||||
|---|---|
|
||||
| User's own code lives in their Gitea org | `repo` (pathway 1) |
|
||||
| Single-container third-party app (nginx, redis, a docker image) | `image` (pathway 2) |
|
||||
| Custom multi-service stack (no upstream template exists) | `composeRaw` (pathway 3) |
|
||||
| **Popular third-party app (Twenty, n8n, Supabase, Ghost, WordPress, …)** | **`template` (pathway 4), strongly preferred** |
|
||||
|
||||
**Always check `apps.templates.search { query: "<app name>" }` first.** Coolify ships 320+ vetted one-click templates. Each one has tested env defaults, healthchecks, `depends_on` graphs, and the right volume mounts. The same app deployed via `composeRaw` will hit application-specific quirks (URL validation, DB bootstrap order, secret generation) that the template author already solved.
|
||||
|
||||
**Never** create a Gitea repo just to host a third-party app's compose file.
|
||||
|
||||
**Recipe — deploying any popular app in 3 calls:**
|
||||
|
||||
```json
|
||||
// 1. Find the right template slug
|
||||
{ "action": "apps.templates.search", "params": { "query": "twenty" } }
|
||||
// → { "items": [{ "slug": "twenty", "slogan": "Twenty is a CRM…", "tags": ["crm","self-hosted"], "port": 3000 }] }
|
||||
|
||||
// 2. Deploy it
|
||||
{ "action": "apps.create", "params": { "template": "twenty", "name": "crm" } }
|
||||
// → { "uuid": "...", "domain": "crm.<slug>.vibnai.com", "started": true,
|
||||
// "note": "First boot may take 1-5 min while Coolify pulls images and runs migrations." }
|
||||
|
||||
// 3. Watch it come up
|
||||
{ "action": "apps.logs", "params": { "uuid": "...", "lines": 200 } }
|
||||
```
|
||||
|
||||
For `composeRaw` (only when no template exists), fetch the app's official `docker-compose.yml` (from GitHub or Docker Hub) and pass it inline. Override any hard-coded image tags with pinned versions for reproducibility.
|
||||
|
||||
**Browsing the catalog** with `apps.templates.list { tag: "ai" }` returns all AI/ML templates; `{ tag: "crm" }` returns CRMs; etc. Useful when the user asks "what self-hosted analytics tools can I deploy?" or similar open-ended questions.
|
||||
|
||||
### 11.5 "Compose app fails on second+ deploy — relation/table does not exist"
|
||||
|
||||
Classic stale volume problem. Sequence of events:
|
||||
1. First deploy: Postgres starts and auto-creates an empty `default` database (from `POSTGRES_DB` env var)
|
||||
2. App server starts, tries to `CREATE DATABASE` or `DROP DATABASE` inside a transaction → Postgres rejects it
|
||||
3. Deploy fails, containers stop — but the volume persists with the half-initialized DB
|
||||
4. Second deploy: Postgres finds existing data, skips init — but schema is corrupt/incomplete
|
||||
5. Server errors cascade forever
|
||||
|
||||
**Fix:**
|
||||
|
||||
```json
|
||||
// Step 1: find the volume
|
||||
{ "action": "apps.volumes.list", "params": { "uuid": "<app-uuid>" } }
|
||||
// → { "volumes": [{ "name": "abc123_db-data", "sizeBytes": 8192 }] }
|
||||
|
||||
// Step 2: wipe it
|
||||
{ "action": "apps.volumes.wipe", "params": { "uuid": "<app-uuid>", "volume": "abc123_db-data", "confirm": "abc123_db-data" } }
|
||||
|
||||
// Step 3: redeploy clean
|
||||
{ "action": "apps.deploy", "params": { "uuid": "<app-uuid>" } }
|
||||
```
|
||||
|
||||
If Postgres still auto-creates the database before the app server runs migrations, use `apps.exec` to drop it outside a transaction:
|
||||
|
||||
```json
|
||||
{ "action": "apps.exec", "params": { "uuid": "<app-uuid>", "service": "db", "command": "psql -U postgres -c 'DROP DATABASE IF EXISTS \"default\";'" } }
|
||||
```
|
||||
|
||||
Then redeploy.
|
||||
|
||||
### 11.7 "Healthcheck times out on first deploy"
|
||||
|
||||
Docker Compose healthchecks have a `start_period` grace window. Apps
|
||||
that run long-running migrations on first boot (Twenty, Directus,
|
||||
older Strapi versions) need a `start_period` that covers the cold
|
||||
start, typically 120–600s.
|
||||
|
||||
- Fix at the compose level: edit the repo's `docker-compose.yml` to
|
||||
set `healthcheck.start_period: 300s` on the affected service, commit,
|
||||
push, `apps.deploy`.
|
||||
- Alternatively, handle migrations out-of-band via `apps.exec` and let
|
||||
the default healthcheck succeed instantly.
|
||||
|
||||
### 11.8 "I can't tell what's inside the container"
|
||||
|
||||
`apps.exec` is the escape hatch. Useful shell one-liners:
|
||||
|
||||
| Goal | Command |
|
||||
|---|---|
|
||||
| List running processes | `ps -ef` |
|
||||
| Show env vars | `env \| sort` |
|
||||
| Check file exists | `ls -la /path/to/file` |
|
||||
| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` |
|
||||
| Tail an app's internal log | `tail -200 /var/log/app.log` |
|
||||
| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
|
||||
| Inspect filesystem diff vs image | `find /app -newer /tmp/marker -type f 2>/dev/null` |
|
||||
|
||||
Output is capped at 1 MB by default (bump with `max_bytes`). Commands
|
||||
that could exceed the wall-clock timeout should bump `timeout_ms`
|
||||
(max 600000 = 10 minutes).
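
A caller can clamp `timeout_ms` to the documented ceiling before sending the request. The sketch below uses the `action`/`params` envelope from the examples in this section; the builder name and the choice to omit `timeout_ms` when none is given are assumptions.

```typescript
const MAX_EXEC_TIMEOUT_MS = 600000; // 10 minutes, the documented maximum

interface ExecCall {
  action: "apps.exec";
  params: { uuid: string; service: string; command: string; timeout_ms?: number };
}

// Build an apps.exec call, clamping any requested timeout to the ceiling.
function buildExecCall(uuid: string, service: string, command: string, timeoutMs?: number): ExecCall {
  const params: ExecCall["params"] = { uuid, service, command };
  if (timeoutMs !== undefined) {
    params.timeout_ms = Math.min(Math.max(1, Math.floor(timeoutMs)), MAX_EXEC_TIMEOUT_MS);
  }
  return { action: "apps.exec", params };
}
```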
|
||||
|
||||
### 11.9 "The agent wants to run something interactively"
|
||||
|
||||
It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
|
||||
no session resumption. For migrations and CLI invocations this is the
|
||||
right shape. For genuinely interactive work (a debug shell), the
|
||||
operator needs SSH + `docker exec -it` directly — outside the
|
||||
platform's AI surface.
|
||||
|
||||
---
|
||||
|
||||
## 12. Where to look in the code
|
||||
|
||||
- `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
|
||||
- `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
|
||||
- `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
|
||||
- `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
|
||||
- `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
|
||||
- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
|
||||
docker access (`apps.logs`, `apps.exec`). Uses a dedicated
|
||||
`vibn-logs` user on the Coolify host with docker-group membership
|
||||
and no shell.
|
||||
- `lib/coolify-containers.ts` — container enumeration + service
|
||||
resolution, shared between logs and exec paths.
|
||||
- `lib/coolify-logs.ts` — compose-aware log tailing.
|
||||
- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
|
||||
timeout, output caps, and audit logging.
|
||||
- `lib/naming.ts` — domain policy, slugify, SSH URL templates.
|
||||
- `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
|
||||
- `app/api/workspaces/[slug]/…` — REST surface.
|
||||
- `app/api/mcp/route.ts` — MCP dispatcher and tool implementations.
|
||||
- `components/workspace/WorkspaceKeysPanel.tsx` — settings UI.
|
||||
# Vibn AI Capabilities (Condensed)
|
||||
|
||||
> **Note:** The definitive, ground-truth list of AI capabilities and instructions is maintained in the codebase at `vibn-frontend/lib/ai/vibn-tools.ts`.
|
||||
|
||||
## Core Architecture
|
||||
Vibn uses an MCP (Model Context Protocol) adapter to expose backend systems to the AI.
|
||||
The primary systems are:
|
||||
1. **Coolify:** For orchestrating Docker containers, PostgreSQL databases, reverse proxies (Traefik), and deploying third-party apps.
|
||||
2. **Gitea:** For hosting source code and managing repositories.
|
||||
3. **Dev Containers:** Persistent, per-project Docker environments (`vibn-dev-*`) where the AI can read, write, and execute code interactively before shipping.
|
||||
|
||||
## Tool Categories
|
||||
- **Workspace & Identity:** Retrieve Gitea credentials and workspace metadata.
|
||||
- **Projects & Planning:** Create projects, read/write objective documents (`plan_vision_set`), manage tasks, log decisions.
|
||||
- **File System (`fs_*`):** Read, write, edit (with line-number granularity), grep, and tree codebase directories.
|
||||
- **Shell (`shell_exec`):** Run terminal commands inside the dev container (e.g. `npm install`).
|
||||
- **Dev Servers (`dev_server_*`):** Spin up background processes (like `npm run dev`), view their logs, and return live Preview URLs (`*.preview.vibnai.com`) backed by Traefik.
|
||||
- **Apps & Databases:** Create, list, configure, and delete Coolify applications and databases.
|
||||
- **Domains & Auth:** Manage DNS records via OpenSRS and deploy auth providers (NextAuth, Supabase, etc.).
|
||||
- **GitHub & Web (`github_*`, `http_fetch`):** Source open-source reference material, read documentation, and import repositories.
|
||||
|
||||
*Refer to the system prompt in `vibn-frontend/app/api/chat/route.ts` for exact rules on how the AI should behave.*
|
||||
|
||||
@@ -73,14 +73,6 @@ a slow loop until this lands.
|
||||
|
||||
| # | Task | Owner | Effort | Status |
|
||||
|---|---|---|---|---|
|
||||
| 1.1 | Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
|
||||
| 1.2 | Switch Namecheap nameservers to Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
|
||||
| 1.3 | Wait for propagation; verify `dig @1.1.1.1` from multiple resolvers | AI | 30–120 min | ✓ done — `34.19.250.135` from CF + Google resolvers |
|
||||
| 1.4 | Generate Cloudflare API token (DNS edit, `vibnai.com` only) | Mark | 2 min | ✓ done — stored in `.coolify.env` |
|
||||
| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy` |
|
||||
| 1.6 | Test wildcard cert issues for `*.preview.vibnai.com` (curl, browser) | AI | 10 min | ✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert |
|
||||
| 1.7 | Wire `dev_server.start` to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done — pre-baked labels for ports 3000–3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns` |
|
||||
| 1.8 | Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container | AI | 30 min | ✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below. |
|
||||
|
||||
**Definition of done:** ✅ AI says "open a Vite dev server", user clicks the URL,
|
||||
sees Vite's welcome page, edits a file via `fs.edit`, change appears in
|
||||
@@ -111,13 +103,6 @@ server: {
|
||||
|---|---|---|---|---|
|
||||
| 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1–2 hrs | Likely a server action / API route returning twice |
|
||||
| 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1–2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |
|
||||
| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | ✓ done 2026-05-01 | `@sentry/nextjs` v10 wired in `vibn-frontend`. `instrumentation.ts` (server+edge), `instrumentation-client.ts` (browser w/ Session Replay free tier, all text masked), `app/global-error.tsx`, `next.config.ts` wrapped with `withSentryConfig`. `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN` in Coolify env, with matching `ARG` lines in `vibn-frontend/Dockerfile`. End-to-end verified via `/sentry-example-page` 2026-05-01: client + server errors capture, breadcrumbs work, **stack traces de-minify to real filenames** (`app/sentry-example-page/page.tsx:49`). |
|
||||
| 2.4 | Wire deployment-failed Coolify webhook → Slack/email | AI | ✓ done 2026-05-01 | Slack webhook wired into `slack_notification_settings` for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping — confirmed in user's Slack. |
|
||||
| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`. |
|
||||
| 2.6 | Bake `HEALTHCHECK 127.0.0.1` into `vibn-frontend/Dockerfile` so future apps inherit | AI | ✓ done 2026-05-01 | Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap |
|
||||
| 2.7 | Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap | AI | ✓ done 2026-05-01 | Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added. |
|
||||
| 2.8 | **Tool-error recovery middleware** (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | `vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop. |
|
||||
| 2.9 | **Sentry-as-product loop** (SENTRY_AS_PRODUCT.md) — auto-provision per-project Sentry, bake into scaffolds, expose error feed to AI as MCP tools, auto-surface unresolved errors at chat-turn start | AI | ✓ done 2026-05-01 | All 4 stages shipped: (1) `lib/integrations/sentry.ts` provisions per-project Sentry under shared `vibnai` org from `POST /api/projects/create` and lazily on `apps.create`; injects `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_AUTH_TOKEN` into Coolify app env. (2) `lib/scaffold/sentry-snippets.ts` ships canonical Next.js + Vite snippets; AI system prompt instructs it to wire Sentry on every new app; `projects.get` returns `sentry: {slug, dsn}`. (3) Three MCP tools: `project_recent_errors`, `project_error_detail`, `project_error_resolve` (tenant-safe). (4) `app/api/chat/route.ts` injects `[PROJECT HEALTH]` block at chat-turn start when ≥2-occurrence unresolved issues exist in last 6h. End-to-end verification deferred to smoke test (4.1). |
|
||||
|
||||
**Definition of done:** force-fail a route in staging → Sentry alert lands in
|
||||
< 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
|
||||
@@ -136,13 +121,9 @@ or gets out of the way. No screens that exist "to teach the data model".
|
||||
| 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews" | AI | 4 hrs | Mark flagged earlier |
|
||||
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
|
||||
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
|
||||
| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | ✓ done 2026-05-01 | `components/project/project-header-urls.tsx`: bumped MAX_VISIBLE to 3, replaced title-tooltip with click-to-open popover (closes on outside-click + Escape). Each row in the popover is a real clickable link with icon + label + host. |
|
||||
| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | `components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `<COOLIFY_URL>/project/<coolifyProjectUuid>` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.) |
|
||||
| 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
|
||||
| 3.7 | **Scope-doc upload in Plan tab** — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat. |
|
||||
| 3.8 | **"Stop at something tangible" — three layers** | AI | partially done | Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof. |
|
||||
| 3.8a | System-prompt rule: dedicated "Stop at something the user can see" section + tightened build-me-X recipe so `previewUrl` is the explicit stopping point | AI | ✓ done 2026-05-04 | `app/api/chat/route.ts` `buildSystemPrompt`. For multi-service stacks, instructs AI to start the user-facing service first even if other services aren't done. |
|
||||
| 3.8b | ~~Persistent quick-action chips above the chat input~~ **REVERTED 2026-05-04** | AI | reverted | Tried it; pulled it. The chip menu was prescriptive ("here's what to type") which conflicts with the principle that the AI should drive toward the goal without presenting the user a menu of homework. Welcome-screen suggested prompts kept (different context — empty conversation, user genuinely needs a starting nudge). The `sendMessage(override)` refactor + welcome-screen auto-send shipped from this work survived; only the composer chip row was removed. |
|
||||
| 3.8c | Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker | AI | 1 hr | Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message. |
|
||||
|
||||
**Definition of done:** a stranger lands on every tab in turn. None of them
|
||||
@@ -160,10 +141,8 @@ concrete next action.
|
||||
|---|---|---|---|---|
|
||||
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. **Runbook below.** |
|
||||
| 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
|
||||
| 4.3 | "Delete project" UI in project settings (and underlying Coolify cleanup) | AI | ✓ done 2026-05-04 | `app/api/projects/delete/route.ts` now cascades: stops + deletes the dev container service (with volumes + docker-cleanup), deletes every linked Coolify resource via `fs_project_resources`, deletes the per-project Coolify project shell when no other Vibn project shares it, drops `fs_project_dev_containers` + `fs_project_resources` rows, unlinks `fs_sessions`, then deletes `fs_projects`. Gitea repo + Sentry project are deliberately preserved (returned in the response so the user can recover code/error history). Failure inside cascade is logged but doesn't abort; partial failure leaves the orphan in Coolify for manual cleanup, which is strictly better than rolling back to a half-state. Smoke test 2026-05-04 found 2 ghost containers from previously-deleted projects consuming the user's full quota; cleaned up manually + shipped this fix to prevent recurrence. |
|
||||
| 4.4 | "Delete workspace" UI — same | AI | 1 hr | |
|
||||
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
|
||||
| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error | AI | ✓ done 2026-05-01 | `lib/quotas.ts`: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via `VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE` / `VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE` env. Hits return HTTP 402 with structured payload; AI's error-recovery middleware has a `workspace-quota-exceeded` rule that explains the cap to the user without blind retries. Wired into `POST /api/projects/create` and `lib/dev-container.ts` ensure/resume paths. |
|
||||
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
|
||||
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |
|
||||
|
||||
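The quota behaviour described in 4.6 could look roughly like the sketch below. It is an illustration under assumptions, not the contents of `lib/quotas.ts`: the `QuotaResult` payload shape and the `DevContainer` type are hypothetical, and the environment is passed in as a plain record so the function stays testable. The default of 3, the env variable name, the 402 status, and the rule that suspended containers do not count come from the row above.

```ts
type QuotaEnv = Record<string, string | undefined>;

const DEFAULT_MAX = 3; // documented default for both limits

// Reads a non-negative integer override, falling back to the default.
function readLimit(env: QuotaEnv, key: string): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_MAX;
}

type DevContainer = { status: "running" | "suspended" };

// Hypothetical structured payload; only the 402 status is documented.
type QuotaResult =
  | { ok: true }
  | { ok: false; httpStatus: 402; code: "workspace-quota-exceeded"; limit: number; active: number };

function checkDevContainerQuota(env: QuotaEnv, containers: DevContainer[]): QuotaResult {
  const limit = readLimit(env, "VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE");
  // Suspended containers do not count against the cap.
  const active = containers.filter((c) => c.status !== "suspended").length;
  return active >= limit
    ? { ok: false, httpStatus: 402, code: "workspace-quota-exceeded", limit, active }
    : { ok: true };
}
```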
@@ -179,13 +158,11 @@ that aren't covered above.
|
||||
|
||||
| # | Task | Owner | Effort | Notes |
|
||||
|---|---|---|---|---|
|
||||
| 5.1 | Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`) | AI | ✓ done 2026-05-01 | Image `vibn-dev:latest` built 2026-04-30 on Coolify host (589 MB, last Dockerfile change Apr 28 so build is current). Smoke-tested as `vibn` user: ripgrep, git, mise all functional. Toolchains install on demand via mise. |
|
||||
| 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
|
||||
| 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
|
||||
| 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof Path B works |
|
||||
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
|
||||
| 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
|
||||
| 5.7 | **Persistent dev container ↔ Gitea wiring** — auto-clone repo into `/workspace/<slug>/` on first chat turn; auto-commit + push at end of every turn so AI work surfaces in the Product tab without manual `gitea_*` calls | AI | ✓ done 2026-05-04 | `lib/dev-container-git.ts` (`ensureProjectRepoCloned`, `commitAndPushIfDirty`) wired into `app/api/chat/route.ts` pre-loop + turn-end. Tri-state probe (`git` / `dir` / `absent`) so projects with files-but-no-git auto-heal on next turn. Production fix shipped today: `GITEA_USERNAME` was missing from prod env so `isGiteaConfigured()` silently no-op'd; added the env value AND a defensive fallback to `GITEA_ADMIN_USER` in code. Backfilled `vibn-mark/manifest` repo manually from the dev container after the env fix. Smoke-tested by inspecting `/workspace/manifest/` over SSH bridge — 64 tracked files pushed, all 6 phase directories present. |
|
||||
|
||||
**Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
|
||||
vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
|
||||
|
||||
@@ -1,292 +1,5 @@
|
||||
# Agent telemetry & live execution stream — project spec
|
||||
# Agent Telemetry Streaming (Historical)
|
||||
|
||||
This document captures **concrete product and engineering additions** discussed for Vibn: moving from **poll-based session updates** and **in-memory jobs** to a **durable, ordered, push-friendly execution timeline**—the web equivalent of a terminal agent’s clarity (step-by-step visibility, tool boundaries, failures, and later multi-agent signals).
|
||||
> **Note:** This historical spec covered the implementation of real-time streaming for the AI agent loop (Server-Sent Events) and timeline rendering.
|
||||
|
||||
---
|
||||
|
||||
## 1. Why this exists
|
||||
|
||||
### Current behavior (baseline)
|
||||
|
||||
| Surface | How progress reaches the user | Limits |
|
||||
|--------|------------------------------|--------|
|
||||
| **Agent sessions** (`agent_sessions`) | Runner `PATCH`es `output`, `status`, `changed_files` to Next; UI **polls** `GET …/agent/sessions/[id]`. | High latency, weak reconnect story, no single ordered stream; rich semantics encoded only in `text`. |
|
||||
| **Jobs** (`/api/agent/run`, `/api/jobs/:id`) | In-memory `job-store` (`progress`, `toolCalls[]`); UI polls job endpoint. | Lost on restart; not shared across runner replicas; not unified with session UI. |
|
||||
| **Orchestrator / Atlas chat** | Request/response to runner; advisor path may be remote URL. | No execution timeline for “long COO run” in-product unless you add the same event layer. |
|
||||
|
||||
### Product intent
|
||||
|
||||
- **Trust during long runs**: users see *what* happened, *when*, and *whether something was blocked*—not only a final status.
|
||||
- **Differentiation**: “Ink-like” clarity in the browser—structured steps, not a blob of logs.
|
||||
- **Foundation for multi-agent**: handoffs, child work, and safety events need a **common event pipe**, not ad-hoc strings.
|
||||
|
||||
---
|
||||
|
||||
## 2. Goals
|
||||
|
||||
1. **Append-only execution events** with **monotonic ordering** (per session or per job), suitable for replay after refresh.
|
||||
2. **Server-push to the client** (recommend **SSE** first; WebSocket if you need bi-directional on the same channel).
|
||||
3. **Persistence** so reconnect, refresh, and horizontal scaling do not lose history.
|
||||
4. **Single conceptual model** (`AgentEvent`) usable by:
|
||||
- Build → **Agent** tab (sessions),
|
||||
- **Job** flows (create/analyze-style),
|
||||
- optionally **orchestrator** long runs later.
|
||||
5. **Backward compatibility** during rollout: existing `PATCH` + `output` can remain as a fallback or be fed from the same emitter.
|
||||
|
||||
### Non-goals (for v1)
|
||||
|
||||
- Full **OpenTelemetry** export (optional later).
|
||||
- **Real-time collaborative** multi-user cursors on the same session.
|
||||
- Merging **claude-code-fork**—this spec is **API + UI + persistence** only.
|
||||
|
||||
---
|
||||
|
||||
## 3. Concept: `AgentEvent`
|
||||
|
||||
### Core shape (suggested)
|
||||
|
||||
```ts
|
||||
type AgentEvent = {
|
||||
seq: number; // monotonic per stream (session_id or job_id)
|
||||
ts: string; // ISO-8601
|
||||
runId: string; // session UUID or job id — ties events to a run
|
||||
runKind: 'session' | 'job';
|
||||
phase: 'queued' | 'running' | 'completed' | 'failed' | 'stopped';
|
||||
|
||||
type: AgentEventType;
|
||||
payload: Record<string, unknown>; // type-specific
|
||||
};
|
||||
|
||||
type AgentEventType =
|
||||
| 'run.started'
|
||||
| 'run.phase' // e.g. planning, executing, committing
|
||||
| 'llm.turn.start'
|
||||
| 'llm.turn.end'
|
||||
| 'tool.start'
|
||||
| 'tool.end'
|
||||
| 'tool.output' // chunked stdout/stderr if needed
|
||||
| 'safety.block' // policy / protected path / command denied
|
||||
| 'file.changed' // maps to today’s changed_files semantics
|
||||
| 'git.commit'
|
||||
| 'deploy.triggered'
|
||||
| 'deploy.status'
|
||||
| 'error'
|
||||
| 'run.completed'
|
||||
| 'handoff' // v2: parent → child agent
|
||||
| 'child_job.started' // v2: linked run id
|
||||
;
|
||||
```
|
||||
|
||||
### Mapping from today’s session `outputLine`
|
||||
|
||||
| Today (`outputLine.type`) | Suggested event(s) |
|
||||
|---------------------------|--------------------|
|
||||
| `step` / `info` | `run.phase` or `llm.turn.*` with summary in `payload.message` |
|
||||
| `stdout` / `stderr` | `tool.output` or dedicated stream events |
|
||||
| `error` | `error` + optional `safety.block` if policy-driven |
|
||||
| `done` | `run.completed` |
|
||||
|
||||
Keep **human-readable `message`** on events for UI defaults; add **structured fields** (`tool`, `argsSummary`, `durationMs`) for timeline rendering and filters.
|
||||
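The mapping table above can be expressed as one exhaustive switch. This is a sketch under assumptions: the `OutputLine` shape (a `type` plus a `text` field) and the `stream` payload field are hypothetical, and `step` / `info` are mapped to `run.phase` only, although the table also allows `llm.turn.*`.

```ts
type OutputLineType = "step" | "info" | "stdout" | "stderr" | "error" | "done";
type OutputLine = { type: OutputLineType; text: string }; // assumed legacy shape

type MappedEvent = {
  type: "run.phase" | "tool.output" | "error" | "run.completed";
  payload: { message: string; stream?: "stdout" | "stderr" };
};

// Translates one legacy output line into the suggested event type,
// keeping the human-readable text in payload.message.
function mapOutputLine(line: OutputLine): MappedEvent {
  switch (line.type) {
    case "step":
    case "info":
      return { type: "run.phase", payload: { message: line.text } };
    case "stdout":
    case "stderr":
      return { type: "tool.output", payload: { message: line.text, stream: line.type } };
    case "error":
      return { type: "error", payload: { message: line.text } };
    case "done":
      return { type: "run.completed", payload: { message: line.text } };
  }
}
```

A policy-driven error would additionally emit `safety.block`; that decision needs context the legacy line does not carry, so it is left out here.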
|
||||
---
|
||||
|
||||
## 4. Architecture (high level)
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph runner [vibn-agent-runner]
|
||||
RA[runSessionAgent / runAgent]
|
||||
EMIT[emitAgentEvent]
|
||||
end
|
||||
subgraph api [vibn-frontend Next.js]
|
||||
ING[POST internal ingest or PATCH extend]
|
||||
DB[(Postgres agent_events)]
|
||||
SSE[SSE GET /api/.../stream]
|
||||
end
|
||||
subgraph browser [Browser]
|
||||
UI[Timeline + live log]
|
||||
end
|
||||
RA --> EMIT
|
||||
EMIT -->|HTTPS + secret or mTLS| ING
|
||||
ING --> DB
|
||||
UI -->|EventSource| SSE
|
||||
SSE --> DB
|
||||
```
|
||||
|
||||
**Principles**
|
||||
|
||||
- **Runner remains stateless** regarding “truth”: it emits events; **Next + DB** are the source of truth for the UI (matches today’s session model).
|
||||
- Alternatively, the runner could expose **SSE directly**, but that is usually worse for **auth** and **CORS**, and it breaks the **one-domain** story for the product. Prefer **Next as the SSE endpoint**, reading from the DB.
|
||||
|
||||
---
|
||||
|
||||
## 5. Backend: `vibn-agent-runner`
|
||||
|
||||
### 5.1 Emit from execution paths
|
||||
|
||||
| Location | Action |
|
||||
|----------|--------|
|
||||
| `agent-session-runner.ts` | Replace or supplement `patchSession` output-only updates with **`emitAgentEvent`** each turn / tool / error. |
|
||||
| `runAgent` / tool loop (`executeTool`) | Same emitter for **job** runs. |
|
||||
| `server.ts` `/agent/execute` | Emit `run.started` after 202; `run.completed` / `error` on exit. |
|
||||
| Security / blocked tools (`security.ts` or equivalent) | Emit `safety.block` with reason code (no secrets in payload). |
|
||||
|
||||
### 5.2 Transport runner → Next
|
||||
|
||||
**Option A (recommended):** extend existing **PATCH** or add **`POST /api/internal/agent-events`** (or per-session batch append):
|
||||
|
||||
- Headers: `x-agent-runner-secret` (same as today’s PATCH).
|
||||
- Body: single event or small batch `{ events: AgentEvent[] }` with server-assigned `seq` to avoid races.
|
||||
|
||||
**Option B:** Runner writes to **Redis/Postgres** directly. This couples the runner to DB credentials; only do it if the runner already runs inside the same trust zone and has the DB URL.
|
||||
|
||||
### 5.3 Jobs store
|
||||
|
||||
- **Short term:** continue in-memory for job metadata; **persist events** to Postgres keyed by `jobId`.
|
||||
- **Medium term:** optional **Redis** for job status + pub/sub to Next for low-latency SSE fanout (only if DB polling becomes a bottleneck).
|
||||
|
||||
---
|
||||
|
||||
## 6. Backend: `vibn-frontend` (Next.js)
|
||||
|
||||
### 6.1 Persistence
|
||||
|
||||
**New table (example): `agent_run_events`**
|
||||
|
||||
| Column | Notes |
|
||||
|--------|--------|
|
||||
| `id` | UUID |
|
||||
| `run_id` | Session id or job id (text) |
|
||||
| `run_kind` | `'session' \| 'job'` |
|
||||
| `seq` | BIGSERIAL or per-run sequence enforced with unique constraint `(run_id, seq)` |
|
||||
| `project_id` | Nullable for jobs if not scoped |
|
||||
| `event` | JSONB — full `AgentEvent` or `{ type, ts, payload }` |
|
||||
| `created_at` | default now() |
|
||||
|
||||
Index: `(run_id, seq)` for range queries (`WHERE run_id = $1 AND seq > $lastSeen`).
|
||||
|
||||
**Optional:** migrate legacy `agent_sessions.output` to be **derived** (last N lines for email export) or **dual-write** during transition.
|
||||
|
||||
### 6.2 SSE route (example contract)
|
||||
|
||||
- **`GET /api/projects/[projectId]/agent/sessions/[sessionId]/events/stream`**
|
||||
- Auth: session cookie / same as GET session (user must own project).
|
||||
- Query: `?afterSeq=123` for replay.
|
||||
- Response: `text/event-stream`; each message: `data: {JSON}\n\n`.
|
||||
- Heartbeat comments every ~15–30s to keep proxies alive.
|
||||
|
||||
For **jobs** (if not project-scoped): `GET /api/jobs/[jobId]/events/stream` with appropriate auth.
|
||||
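The wire format in the contract above is small enough to sketch as pure string helpers. The `StoredEvent` shape and the function names are assumptions for illustration; the `data: {JSON}\n\n` framing, the heartbeat comments, and the `afterSeq` replay semantics come from the bullets above.

```ts
type StoredEvent = { seq: number; event: unknown }; // hypothetical row shape

// One SSE message per event. JSON.stringify never emits raw newlines,
// so the payload always fits on a single data: line.
function formatSseMessage(row: StoredEvent): string {
  return `data: ${JSON.stringify(row.event)}\n\n`;
}

// Lines starting with ":" are comments in the SSE format; clients ignore
// them, but they keep idle proxy connections open.
function formatHeartbeat(): string {
  return ": heartbeat\n\n";
}

// Builds the replay chunk for ?afterSeq=N: only newer events, in seq order.
function replayAfter(rows: StoredEvent[], afterSeq: number): string {
  return rows
    .filter((row) => row.seq > afterSeq)
    .sort((a, b) => a.seq - b.seq)
    .map((row) => formatSseMessage(row))
    .join("");
}
```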
|
||||
### 6.3 Ingest route (runner-only)
|
||||
|
||||
- **`POST /api/internal/agent-events`** (or nested under project/session as you prefer).
|
||||
- Validates `x-agent-runner-secret`.
|
||||
- Inserts rows with **server-generated `seq`** (transaction per run or advisory lock per `run_id`).
|
||||
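Server-assigned `seq` can be illustrated with an in-memory stand-in for the events table. This is a sketch, not the ingest route: the class and field names are hypothetical, the `Map` plays the role that a transaction or advisory lock per `run_id` would play in Postgres, and the `eventId` dedupe is the option floated under open decisions rather than a settled design.

```ts
type IncomingEvent = {
  eventId: string; // runner-generated id, used only for dedupe (assumption)
  runId: string;
  type: string;
  payload: Record<string, unknown>;
};
type StoredRow = IncomingEvent & { seq: number };

class InMemoryEventLog {
  private rows: StoredRow[] = [];
  private lastSeq = new Map<string, number>();
  private seen = new Set<string>();

  // Appends a batch, assigning the next per-run seq to each new event.
  // Events already seen (runner retries) are skipped, so seq stays gap-free.
  append(events: IncomingEvent[]): StoredRow[] {
    const inserted: StoredRow[] = [];
    for (const event of events) {
      const key = `${event.runId}:${event.eventId}`;
      if (this.seen.has(key)) continue;
      const seq = (this.lastSeq.get(event.runId) ?? 0) + 1;
      this.lastSeq.set(event.runId, seq);
      this.seen.add(key);
      const row: StoredRow = { ...event, seq };
      this.rows.push(row);
      inserted.push(row);
    }
    return inserted;
  }

  // Equivalent of: WHERE run_id = $1 AND seq > $lastSeen
  after(runId: string, lastSeen: number): StoredRow[] {
    return this.rows.filter((row) => row.runId === runId && row.seq > lastSeen);
  }
}
```

Because the runner never chooses `seq`, two concurrent batches for the same run cannot collide on ordering; the store decides.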
|
||||
---
|
||||
|
||||
## 7. Frontend (product UI)
|
||||
|
||||
### 7.1 Agent tab — timeline
|
||||
|
||||
- **EventSource** (SSE) subscription when session is `running`; on load, **fetch historical** events (`GET …/events?afterSeq=0` or SSE from 0).
|
||||
- **Timeline components**:
|
||||
- Group by `llm.turn` / `tool.start`–`tool.end`.
|
||||
- Expandable tool args (sanitized).
|
||||
- Distinct styling for `safety.block` and `error`.
|
||||
- **Reconnect**: on `EventSource` error, reopen with `lastSeq` from last received event.
|
||||
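The reconnect rule above comes down to tracking the last `seq` received and resuming from it. A minimal client-side sketch, with hypothetical names (`StreamState`, `applyEvent`, `streamUrl`) and the `afterSeq` query parameter taken from the SSE contract:

```ts
type TimelineEvent = { seq: number; type: string };
type StreamState = { lastSeq: number; events: TimelineEvent[] };

// Adds an event to the timeline unless it was already received,
// which can happen when a replay overlaps the live tail.
function applyEvent(state: StreamState, event: TimelineEvent): StreamState {
  if (event.seq <= state.lastSeq) return state;
  return { lastSeq: event.seq, events: [...state.events, event] };
}

// URL to reopen the stream with after an EventSource error.
function streamUrl(base: string, state: StreamState): string {
  return `${base}?afterSeq=${state.lastSeq}`;
}
```

In the browser, the error handler would close the old `EventSource` and open a new one on `streamUrl(base, state)`; the reducer makes any overlap between replay and live events harmless.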
|
||||
### 7.2 Jobs / analyze flows
|
||||
|
||||
- Same timeline component keyed by `jobId` if you surface those runs in UI.
|
||||
- Unifies mental model: “every run has a stream.”
|
||||
|
||||
### 7.3 Deprecate slow polling
|
||||
|
||||
- Poll `GET …/agent/sessions/[id]` less often while SSE is connected; keep a **single poll** for `status` / `changed_files` if those stay on the session row only, or **also** emit `file.changed` events and drive the UI from the stream plus one final consistency read.
|
||||
|
||||
---
|
||||
|
||||
## 8. Security & privacy
|
||||
|
||||
- **Never** put tokens, env values, or full file contents in events by default; use **truncation** and **hashes** where needed.
|
||||
- **`safety.block`**: log reason **code** + user-safe message; align with `security.ts` behavior.
|
||||
- **Rate limits** on ingest endpoint (per `run_id` / per IP) to avoid abuse if misconfigured.
|
||||
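The "no secrets, truncate by default" rule could be enforced at the emitter with a small sanitizer. The key pattern, the 200-character cap, and the function name below are assumptions; the sketch is shallow (nested objects pass through unchanged) and does not cover hashing.

```ts
// Assumed list of key names that suggest a secret value.
const SENSITIVE_KEY = /token|secret|password|authorization|api[_-]?key/i;
const MAX_STRING_LENGTH = 200; // assumed cap for string values

// Returns a copy of the payload with secret-looking keys redacted
// and long strings truncated. Top-level keys only.
function sanitizePayload(payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    const value = payload[key];
    if (SENSITIVE_KEY.test(key)) {
      out[key] = "[redacted]";
    } else if (typeof value === "string" && value.length > MAX_STRING_LENGTH) {
      const dropped = value.length - MAX_STRING_LENGTH;
      out[key] = `${value.slice(0, MAX_STRING_LENGTH)}... [truncated ${dropped} chars]`;
    } else {
      out[key] = value;
    }
  }
  return out;
}
```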
|
||||
---
|
||||
|
||||
## 9. Environment variables
|
||||
|
||||
| Variable | Where | Purpose |
|
||||
|----------|--------|---------|
|
||||
| `AGENT_RUNNER_SECRET` | Runner + Next | Ingest / extended PATCH auth |
|
||||
| `VIBN_API_URL` | Runner | Base URL for callbacks |
|
||||
| `AGENT_RUNNER_URL` | Next | Start runs (unchanged) |
|
||||
|
||||
Add if needed:
|
||||
|
||||
| Variable | Purpose |
|
||||
|----------|---------|
|
||||
| `AGENT_EVENTS_INGEST_PATH` | Optional override for ingest URL |
|
||||
| `SSE_MAX_BUFFER` | Cap replay batch size |
|
||||
|
||||
---
|
||||
|
||||
## 10. Phased roadmap (suggested)
|
||||
|
||||
### Phase 1 — Foundation
|
||||
|
||||
- [ ] Define `AgentEvent` TypeScript types in a **shared package** or duplicated minimal types in runner + frontend.
|
||||
- [ ] Create `agent_run_events` (or equivalent) + migration.
|
||||
- [ ] Implement **ingest** endpoint; wire **runner session path** to emit core events: `run.started`, `tool.start` / `tool.end`, `error`, `run.completed`, `file.changed`.
|
||||
- [ ] **Dual-write**: keep existing `PATCH` `outputLine` so nothing breaks.
|
||||
|
||||
### Phase 2 — Push
|
||||
|
||||
- [ ] SSE route + **EventSource** in Agent tab.
|
||||
- [ ] Backfill UI from DB on mount; then live tail.
|
||||
- [ ] Lower or gate polling on `GET` session.
|
||||
|
||||
### Phase 3 — Jobs + durability
|
||||
|
||||
- [ ] Emit same events from **job** execution path; persist by `jobId`.
|
||||
- [ ] Optional: replace in-memory job list with DB for **multi-instance** runner (later).
|
||||
|
||||
### Phase 4 — Rich semantics
|
||||
|
||||
- [ ] `safety.block` from policy layer.
|
||||
- [ ] `deploy.*` events if Coolify integration is user-visible.
|
||||
- [ ] **Multi-agent**: `handoff`, `child_job.*` with links in payload.
|
||||
|
||||
---
|
||||
|
||||
## 11. Success metrics
|
||||
|
||||
- Time-to-first-visible-step after **Run** < **1s** p95 (SSE).
|
||||
- After a hard refresh mid-run, the user sees a **consistent history**: no duplicate `seq` values, and no gaps once ingest is at-least-once with idempotency keys (a later addition).
|
||||
- Support tickets / confusion drops on “what is the agent doing?” (qualitative).
|
||||
|
||||
---
|
||||
|
||||
## 12. Related code (repo anchors)
|
||||
|
||||
Use these when implementing:
|
||||
|
||||
- Runner session loop + PATCH bridge: `vibn-agent-runner/src/agent-session-runner.ts`
|
||||
- Runner HTTP: `vibn-agent-runner/src/server.ts` (`/agent/execute`, `/agent/stop`, `/agent/approve`, `/api/agent/run`, `/api/jobs/:id`)
|
||||
- In-memory jobs: `vibn-agent-runner/src/job-store.ts`
|
||||
- Next session API + runner callback: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/[sessionId]/route.ts`
|
||||
- Session create + fire-and-forget execute: `vibn-frontend/app/api/projects/[projectId]/agent/sessions/route.ts`
|
||||
|
||||
---
|
||||
|
||||
## 13. Open decisions
|
||||
|
||||
1. **Single table** for sessions + jobs vs **two tables** (simpler queries vs flexibility).
|
||||
2. **Seq generation**: DB sequence per `run_id` vs global monotonic with `(run_id, seq)` composite only in app logic.
|
||||
3. **Idempotency**: runner retries may duplicate events—use **`event_id` UUID** from runner for dedupe on ingest.
|
||||
4. **Orchestrator chat**: treat as v2 unless you need a **COO run** timeline immediately.
|
||||
|
||||
---
|
||||
|
||||
*Document version: 1.0 — aligned with discussion of runner ↔ frontend telemetry, SSE-first delivery, Postgres persistence, and future multi-agent event types.*
|
||||
The streaming system is fully implemented in `app/api/chat/route.ts` and rendered in the frontend via `Timeline`, `ThinkingBubble`, and `TimelineToolGroup` components inside `chat-panel.tsx`.
|
||||
|
||||
@@ -1,673 +1,5 @@
|
||||
# Vibn AI Capability Roadmap
|
||||
# AI Capabilities Roadmap (Historical)
|
||||
|
||||
> **⚠ See also:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
|
||||
> — proposed pivot to a Claude-Code-style persistent dev container per
|
||||
> project. Once approved, that doc supersedes any "code authoring" item
|
||||
> in this roadmap; this file remains the source of truth for
|
||||
> infrastructure primitives (P5.x, P6.x, P7.x).
|
||||
>
|
||||
> The ordered plan for closing the gap between what the Vibn agent can do
|
||||
> today and what it needs to do for a real customer to ship, operate, and
|
||||
> scale a SaaS through it.
|
||||
>
|
||||
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
|
||||
>
|
||||
> **Prioritization framing:**
|
||||
> 1. Does it unblock *shipping a real product* (not a demo)?
|
||||
> 2. Does it unblock *surviving past the first paying customer*?
|
||||
> 3. Does it only matter once usage scales?
|
||||
>
|
||||
> Tier 1 = (1). Tier 2 = (2). Tier 3 = (3). Tier 4 = revisit when demanded.
|
||||
>
|
||||
> **Sequencing rule:** complete Tier 1 before any Tier 2 item. The trap
|
||||
> is polishing safety rails (audit, scopes, quotas) before the product is
|
||||
> actually shippable.
|
||||
> **Note:** This is a historical roadmap document. Most of the core Path B capabilities (persistent dev containers, Gitea mirroring, Traefik wildcard proxies) have been successfully shipped.
|
||||
|
||||
---
|
||||
|
||||
## 0. Substrate & constraints
|
||||
|
||||
Vibn runs on a two-cloud substrate, constrained to Canadian data residency:
|
||||
|
||||
| Layer | Provider | Region | Purpose |
|
||||
|---|---|---|---|
|
||||
| **App hosting** | Coolify (self-managed) | Montreal VPS | All app / database / auth containers. Current state. |
|
||||
| **Managed services** | **Google Cloud** | `northamerica-northeast1` (Montreal) | Object storage, cron, queues, logs, backups, monitoring, secrets. |
|
||||
| **Domain registration** | OpenSRS (Tucows) | Toronto | Wholesale domain API. Canadian company, pre-funded float account. |
|
||||
| **Authoritative DNS** | Cloud DNS (default) / CIRA D-Zone (strict) | Global anycast / Canadian | Managed DNS for workspace-owned domains. |
|
||||
| **Transactional email** | Amazon SES | `ca-central-1` (Montreal) | No GCP equivalent; AWS's Canadian region keeps data in-country. |
|
||||
|
||||
**Absolute rule: no customer data leaves Canada.** Every workspace-owned
|
||||
resource (storage bucket, database, log bucket, task queue, scheduler
|
||||
job, email message body) must be pinned to a Canadian region.
|
||||
|
||||
### Why mix clouds?
|
||||
- **Coolify stays** because we already built the workspace-scoped
|
||||
provisioning around it (Phase 4). Migrating apps to Cloud Run is a
|
||||
rewrite we don't need.
|
||||
- **GCP-CA** fills every managed-service gap Coolify has. Cheaper and
|
||||
more reliable than self-hosting MinIO/Loki/scheduler.
|
||||
- **AWS SES for email** because GCP has no first-party transactional
|
||||
email service and SES `ca-central-1` is the only credible
|
||||
Canadian-resident managed option.
|
||||
- **OpenSRS for domains** because it's the wholesale API behind most
|
||||
Canadian registrars, and we already have the deposit.
|
||||
|
||||
### Compliance upgrade path (Tier 4 territory)
|
||||
For regulated customers (healthcare, financial, public sector):
|
||||
- **Assured Workloads for Canada** on GCP — enforces Canadian personnel
|
||||
access + data residency contractually.
|
||||
- **CIRA D-Zone** instead of Cloud DNS — first-party Canadian managed DNS.
|
||||
- Keep the SES and OpenSRS pieces as-is (already Canadian-resident).
|
||||
|
||||
Document the caveat on a public trust page. Build the Assured-Workloads
|
||||
variant when a real customer asks.
|
||||
|
||||
---
|
||||
|
||||
## Current state (Phase 4 + P5.1 verified, Apr 2026)
|
||||
|
||||
- Workspace tenancy: Gitea org + Coolify project + SSH deploy key per
|
||||
workspace.
|
||||
- Agent can: create repos, create apps, provision 8 database flavors,
|
||||
deploy 8 vetted auth providers, manage env vars, deploy + poll,
|
||||
update, delete (with `?confirm=<name>`), set domains under
|
||||
`*.{slug}.vibnai.com`.
|
||||
- Control-plane MCP: 24 tools + full REST surface at `/api/mcp`.
|
||||
API-key scoped per workspace.
|
||||
- **P5.1 custom apex domains** — OpenSRS + Cloud DNS + Coolify
|
||||
lifecycle (search / register / attach / inspect) shipped and
|
||||
verified end-to-end against PROD GCP + OpenSRS sandbox + PROD
|
||||
Coolify on `v4.0.0-beta.473` (2026-04-22). All 5 sub-systems green
|
||||
in `smoke-attach-e2e.ts`: register → zone → A records → registrar
|
||||
NS update → Coolify `fqdn` patch → cleanup. Required a server-side
|
||||
config fix on `coolify-server-mtl` (proxy.type=TRAEFIK,
|
||||
is_build_server=false) so `Server::isProxyShouldRun()` returns
|
||||
true and the controller maps `domains` → `fqdn` — see
|
||||
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) § 3.6 for the gory details.
|
||||
- **Agent-runner stdio MCP bridge** — `vibn-agent-runner` now exposes
|
||||
its full in-house toolkit (28 tools) outward over 5 stdio MCP
|
||||
servers so external clients (Cursor, Claude Desktop, Goose) can
|
||||
drive the same Coolify / Gitea / workspace / memory / search /
|
||||
sub-agent surface as the internal Coder/PM/Marketing agents, with
|
||||
shared protected-repo + protected-app guardrails. Every tool now
|
||||
has a pure `*-api.ts` module, a registry wrapper for the in-process
|
||||
loop, and an MCP server wrapper — single source of truth, verified
|
||||
by `scripts/smoke-mcp.js`.
|
||||
- Enforced: tenant isolation, domain policy, delete confirms,
|
||||
secrets-at-rest encryption, protected-repo / protected-app guards.
|
||||
|
||||
See [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (§ 3.6 for P5.1,
|
||||
§ 3.7 for the stdio MCP bridge) for the complete current surface.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1 — Blocks shipping a real product
|
||||
|
||||
Without these, anything the agent builds is *demo-shaped*. Ship these
|
||||
next, in the recommended sequence below.
|
||||
|
||||
### P5.1 · Custom apex domains via OpenSRS
|
||||
|
||||
**Goal:** agent buys `mysaas.com` on the user's behalf and attaches it
|
||||
to a Coolify app with automatic TLS.
|
||||
|
||||
**Why now:** you already opened an OpenSRS reseller account with a $100
|
||||
float. Unlocks real branding, DKIM for email (P5.2 depends on this),
|
||||
and gives you a revenue line (markup on domains).
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool / endpoint | Purpose |
|
||||
|---|---|
|
||||
| `domains.search` | Live availability + suggestions via OpenSRS `lookup`. |
|
||||
| `domains.check_price` | Per-TLD price from OpenSRS + markup. |
|
||||
| `domains.register` | Debits workspace float, registers via OpenSRS. |
|
||||
| `domains.list` | Workspace's owned domains. |
|
||||
| `domains.renew` / `domains.transfer` | Lifecycle. |
|
||||
| `domains.{name}.attach` | Attach to a Coolify app: DNS records + Coolify `fqdn` + Let's Encrypt. |
|
||||
| `domains.{name}.detach` | Free a domain from an app, keep registration. |
|
||||
| `domains.{name}.attach_status` | Polls DNS propagation + cert issuance (async). |
|
||||
|
||||
**Infra:**
|
||||
- **OpenSRS client** (their XML/SOAP or REST API).
|
||||
- **Cloud DNS** for zone management (default). CIRA D-Zone available as a
|
||||
workspace-level preference for strict-residency customers.
|
||||
- **Workspace float ledger** (`vibn_workspace_billing_float`) — a
|
||||
prepaid balance in CAD, debited on register/renew. Reconciled nightly
|
||||
against the OpenSRS master deposit.
|
||||
- `VIBN_OPENSRS_DEPOSIT_ACCOUNT` as the master float handle.
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `preferred_dns_provider TEXT DEFAULT 'cloud_dns'`
|
||||
- `cloud_dns_zone_name TEXT` ← GCP managed zone for this workspace.
|
||||
|
||||
**Risks:**
|
||||
- DNS propagation is human-scale (minutes–hours). Agents need the
|
||||
async `attach_status` polling loop, not a sync call.
|
||||
- Cert issuance via Let's Encrypt is rate-limited (50 certificates per registered domain per week).
|
||||
Prevent abuse with per-workspace rate caps.
|
||||
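Because propagation takes minutes to hours, the `attach_status` loop wants capped exponential backoff instead of a tight poll. The decision step can be sketched as a pure function; the `AttachStatus` fields, the delay constants, and the attempt cap are all assumptions chosen for illustration.

```ts
type AttachStatus = { dnsPropagated: boolean; certIssued: boolean }; // hypothetical shape

type PollDecision =
  | { action: "done" }
  | { action: "wait"; delayMs: number }
  | { action: "give_up" };

const BASE_DELAY_MS = 15000; // first retry after 15 s (assumed)
const MAX_DELAY_MS = 600000; // never wait longer than 10 min (assumed)
const MAX_ATTEMPTS = 40; // assumed ceiling before reporting failure

// Decides what the agent should do after the given zero-based poll attempt.
function nextPoll(status: AttachStatus, attempt: number): PollDecision {
  if (status.dnsPropagated && status.certIssued) return { action: "done" };
  if (attempt >= MAX_ATTEMPTS) return { action: "give_up" };
  // Double the delay each attempt, capped at MAX_DELAY_MS.
  const delayMs = Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
  return { action: "wait", delayMs };
}
```

With these constants the delays run 15 s, 30 s, 60 s, and so on, then settle at 10 minutes from the seventh attempt, which keeps polling cheap across a multi-hour propagation window.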
|
||||
**Estimate:** **2 weeks.**
|
||||
|
||||
---
|
||||
|
||||
### P5.2 · Transactional email (AWS SES `ca-central-1`)
|
||||
|
||||
**Goal:** auth providers can send password-reset emails; agents can
|
||||
`email.send` from `noreply@mysaas.com`.
|
||||
|
||||
**Why now:** every auth provider on the allowlist is broken without
|
||||
SMTP. Also pairs with P5.1 — per-workspace sender domains need DKIM on
|
||||
domains you own.
|
||||
|
||||
**Why SES ca-central-1 specifically:** GCP has no first-party
|
||||
transactional email service. All mainstream providers (Postmark,
|
||||
Resend, Mailgun, SendGrid) are US-primary. SES's Montreal region is the
|
||||
only credible managed option that keeps message bodies in Canada.
|
||||
|
||||
**Two-phase rollout:**
|
||||
|
||||
**Phase A — shared-sender MVP (1 week):**
|
||||
- One SES-verified sender domain `mail.vibnai.com`.
|
||||
- Every workspace can send from `noreply@mail.vibnai.com` out of the box.
|
||||
- `email.send` tool + injected `SMTP_*` env vars.
|
||||
- Bounce / complaint webhooks routed via SNS → a Cloud Run service
|
||||
that writes per-workspace notifications.
|
||||
|
||||
**Phase B — per-workspace sender domains (1 week, depends on P5.1):**
|
||||
- `email.verify_sender_domain` creates the SPF/DKIM/DMARC records via
|
||||
the Cloud DNS / CIRA D-Zone client on a workspace-owned domain.
|
||||
- Polls SES verification; flips `verified=true` when done.
|
||||
- Workspace can now `email.send from: founder@mysaas.com`.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|
||||
|---|---|
|
||||
| `email.send` | Single message; returns SES `message_id`. |
|
||||
| `email.send_batch` | Up to 100 at a time. |
|
||||
| `email.list_messages` | Recent sent mail + delivery state (from SES + our log). |
|
||||
| `email.verify_sender_domain` | Kick off DKIM for a workspace-owned domain. |
|
||||
| `email.sender_status` | Poll verification state. |
|
||||
| `email.webhooks.list` | Recent bounces/complaints. |
|
||||
|
||||
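The 100-message cap on `email.send_batch` means callers with larger lists need to chunk. A small helper, sketched with a hypothetical name and a generic message type, might look like this:

```ts
const MAX_BATCH_SIZE = 100; // cap from the email.send_batch row

// Splits a message list into batches no larger than the given size.
function chunkForSendBatch<T>(messages: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_SIZE) {
    throw new RangeError(`batch size must be between 1 and ${MAX_BATCH_SIZE}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < messages.length; i += size) {
    batches.push(messages.slice(i, i + size));
  }
  return batches;
}
```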
**Infra:**
|
||||
- SES identity per workspace-owned sender domain.
|
||||
- SNS topic → Cloud Run webhook receiver (in `northamerica-northeast1`)
|
||||
for bounce/complaint ingestion.
|
||||
- Rate limits: start in SES sandbox (200/day), request production limits
|
||||
after first real customer.
|
||||
|
||||
**Estimate:** **2 weeks total** (1 week Phase A + 1 week Phase B).
|
||||
|
||||
---
|
||||
|
||||
### P5.3 · Object storage (Google Cloud Storage, `northamerica-northeast1`)
|
||||
|
||||
**Goal:** any SaaS the agent builds can take user uploads — avatars,
|
||||
attachments, exports, images — without the user pasting in third-party
|
||||
credentials.
|
||||
|
||||
**Why now:** "can users upload a file?" is the #1 post-demo question.
|
||||
Blocks ~half of realistic SaaS ideas.
|
||||
|
||||
**GCP collapses this item.** No MinIO container to babysit; GCS provides
|
||||
managed bucket + signed URLs + lifecycle policies + encryption out of
|
||||
the box.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|
||||
|---|---|
|
||||
| `storage.buckets.list` | Buckets in this workspace (filtered by `workspace={slug}` label). |
|
||||
| `storage.buckets.create` | New bucket. Optional `public_read`. Enforced region: `northamerica-northeast1`. |
|
||||
| `storage.buckets.delete` | Destroy bucket. `confirm` gate. |
|
||||
| `storage.presign_upload` | PUT URL, TTL, content-type constraint. |
|
||||
| `storage.presign_download` | GET URL, TTL. |
|
||||
| `storage.list_objects` | Pagination + prefix filter. |
|
||||
| `storage.delete_object` | Single object. |
|
||||
| `storage.set_lifecycle` | TTL delete, multipart cleanup, archive tiering. |
|
||||
|
||||
**Provisioning additions:**
|
||||
- Default bucket `vibn-ws-{slug}` created on workspace provision.
|
||||
- Uniform bucket-level access enabled by default.
|
||||
- Per-workspace GCP service account `vibn-ws-{slug}@...`, scoped to its
|
||||
own bucket via `roles/storage.objectAdmin`.
|
||||
- Keyfile stored encrypted (AES-256-GCM, same `VIBN_SECRETS_KEY`) in
|
||||
`vibn_workspaces.gcp_service_account_key_encrypted`.
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `gcs_bucket_name TEXT`
|
||||
- `gcp_service_account_email TEXT`
|
||||
- `gcp_service_account_key_encrypted BYTEA`
|
||||
|
||||
**Env injection:**
|
||||
- `STORAGE_ENDPOINT=https://storage.googleapis.com`
|
||||
- `STORAGE_BUCKET={workspace-bucket-name}`
|
||||
- `STORAGE_ACCESS_KEY`, `STORAGE_SECRET_KEY` (S3-compatible via GCS HMAC keys)
|
||||
— auto-injected on app creation so agent code uses standard S3 SDKs.
|
||||
|
||||
**Estimate:** **3 days.**
|
||||
|
||||
---
|
||||
|
||||
### P5.4 · Workers, cron, and queues (Cloud Tasks + Cloud Scheduler + Cloud Run Jobs)
|
||||
|
||||
**Goal:** agents can declare async workers, scheduled jobs, and queued
|
||||
tasks. Anything that isn't a single `ports: 3000` web container.
|
||||
|
||||
**Why now:** webhooks, retries, nightly cleanup, image processing,
|
||||
email sending — every real SaaS needs a non-web process. Current
|
||||
workaround (second Coolify app) is brittle and manual.
|
||||
|
||||
**Hybrid approach — Coolify for compute, GCP for orchestration:**
|
||||
|
||||
Option evaluated and chosen:
|
||||
- **Cloud Scheduler** (`northamerica-northeast1`) for cron: fires
|
||||
HTTP webhooks into the app at the scheduled time.
|
||||
- **Cloud Tasks** (`northamerica-northeast1`) for queue: agent code
|
||||
calls `enqueue(task)`, Cloud Tasks dispatches to the app's worker
|
||||
endpoint with retries, backoff, and at-least-once semantics.
|
||||
- **Worker process** stays on Coolify as a second app-per-repo with a
|
||||
different start command, exposed on an internal URL.
|
||||
|
||||
Rejected alternative: migrate everything to Cloud Run Jobs. More managed
|
||||
but splits the "Live" view across two deploy targets and changes the
|
||||
agent's mental model. Not worth it for MVP.
|
||||
|
||||
**Shape — extend `apps.create`:**
|
||||
|
||||
```json
|
||||
{
|
||||
"repo": "my-site",
|
||||
"services": {
|
||||
"web": { "command": "npm start", "ports": "3000" },
|
||||
"worker": { "command": "npm run worker", "replicas": 2 }
|
||||
},
|
||||
"cron": [
|
||||
{ "name": "nightly-backup", "schedule": "0 3 * * *", "path": "/tasks/backup" },
|
||||
{ "name": "sync", "schedule": "*/10 * * * *", "path": "/tasks/sync" }
|
||||
],
|
||||
"queues": [
|
||||
{ "name": "emails" },
|
||||
{ "name": "image-processing" }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Internally creates: two Coolify apps (web + worker), N Cloud Scheduler
|
||||
jobs labeled `workspace={slug}`, N Cloud Tasks queues.
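
As a rough illustration of that expansion, the sketch below turns an `apps.create` spec into a flat list of resources to create. The type names, `planResources`, and the `{repo}-{service}` app naming are assumptions made for this example, not the actual implementation; only the resource kinds and the `workspace={slug}` label come from the text above.

```ts
// Hypothetical types mirroring the JSON spec; field names follow the example.
interface ServiceSpec { command: string; ports?: string; replicas?: number }
interface CronSpec { name: string; schedule: string; path: string }
interface QueueSpec { name: string }
interface AppSpec {
  repo: string;
  services: Record<string, ServiceSpec>;
  cron?: CronSpec[];
  queues?: QueueSpec[];
}

type PlannedResource =
  | { kind: "coolify_app"; name: string; command: string }
  | { kind: "scheduler_job"; name: string; schedule: string; path: string; labels: Record<string, string> }
  | { kind: "tasks_queue"; name: string };

// Expand one spec into the resources that would be created for a workspace.
function planResources(slug: string, spec: AppSpec): PlannedResource[] {
  const plan: PlannedResource[] = [];

  // One Coolify app per declared service (web, worker, ...).
  for (const [service, s] of Object.entries(spec.services)) {
    plan.push({ kind: "coolify_app", name: `${spec.repo}-${service}`, command: s.command });
  }

  // One Cloud Scheduler job per cron entry, labeled with the workspace slug.
  for (const job of spec.cron ?? []) {
    // Cheap sanity check: a standard cron expression has five fields.
    if (job.schedule.trim().split(/\s+/).length !== 5) {
      throw new Error(`cron "${job.name}" needs a 5-field schedule`);
    }
    plan.push({
      kind: "scheduler_job",
      name: job.name,
      schedule: job.schedule,
      path: job.path,
      labels: { workspace: slug },
    });
  }

  // One Cloud Tasks queue per queue entry.
  for (const q of spec.queues ?? []) {
    plan.push({ kind: "tasks_queue", name: q.name });
  }
  return plan;
}
```

Keeping the plan as plain data would let the handler validate the whole spec before touching Coolify or GCP, which also suits a later rollback story.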
|
||||
|
||||
**Surface additions:**
|
||||
|
||||
| Tool | Purpose |
|
||||
|---|---|
|
||||
| `apps.services.list` | All processes in an app. |
|
||||
| `apps.services.update` | Scale replicas, change command. |
|
||||
| `apps.services.logs` | Per-process logs. |
|
||||
| `cron.list` | Scheduler jobs in this workspace. |
|
||||
| `cron.create` / `cron.update` / `cron.delete` | Manage scheduled jobs. |
|
||||
| `cron.run_now` | Fire a scheduled job immediately (useful for agent testing). |
|
||||
| `queues.list` | Cloud Tasks queues in this workspace. |
|
||||
| `queues.create` / `queues.delete` | Manage queues. |
|
||||
| `queues.enqueue` | (Normally called from app code, but exposed for agent-driven testing.) |
|
||||
| `queues.pause` / `queues.resume` | Emergency ops. |
|
||||
|
||||
**New columns** on `vibn_workspaces`:
|
||||
- `cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1'`
|
||||
- `cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1'`
|
||||
|
||||
**Auth to GCP:** per-workspace service account (provisioned in P5.3) is
|
||||
extended with `roles/cloudscheduler.admin` and `roles/cloudtasks.admin`
|
||||
*scoped to resources labeled `workspace={slug}`* via IAM conditions.
|
||||
Agents can only act on their own workspace's jobs/queues.
|
||||
|
||||
**Estimate:** **1 week.**
|
||||
|
||||
---
|
||||
|
||||
### Tier 1 total: ~5 weeks of focused work
|
||||
|
||||
After Tier 1 lands, an agent can:
|
||||
- Buy `mysaas.com`, point it at a Next.js app.
|
||||
- Deploy Authentik with working password-reset emails from `noreply@mysaas.com`.
|
||||
- Offer user uploads (avatars, attachments).
|
||||
- Run `0 3 * * *` nightly cleanup cron.
|
||||
- Process Stripe webhooks idempotently via a retry queue.
|
||||
|
||||
That's a shippable SaaS. Everything after this is about *keeping* it
|
||||
shipped.
|
||||
|
||||
---
|
||||
|
||||
## Tier 2 — Blocks surviving past the first real customer
|
||||
|
||||
Once users exist, these prevent silent failures.
|
||||
|
||||
### P6.1 · Database backups + restore (GCS + wal-g)
|
||||
|
||||
**Goal:** nightly backups, on-demand backups, one-call restore. No
|
||||
"agent ran `DROP TABLE` in a migration" permanent data loss.
|
||||
|
||||
**Why:** scariest item on this list. Failure mode is irrecoverable.
|
||||
|
||||
**Shape:**
|
||||
- `databases.{uuid}.backup` — on-demand `pg_dump` / `mongodump` to the
|
||||
workspace's GCS bucket (depends on P5.3).
|
||||
- `databases.{uuid}.backups.list` — lists backups with timestamp + size.
|
||||
- `databases.{uuid}.backups.restore` — `confirm`-gated restore from a
|
||||
specific backup uuid.
|
||||
- Per-database backup policy: daily / hourly / off, retention days.
|
||||
- Default: every AI-created database gets daily backups + 7-day
|
||||
retention on.
|
||||
|
||||
**Infra:**
|
||||
- Cron jobs run via P5.4's Cloud Scheduler primitive.
|
||||
- Stored at `gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz`.
|
||||
- Lifecycle rules auto-delete backups older than retention.
|
||||
- Object-level retention lock available for "immutable backups" on
|
||||
request (Tier 3 feature).
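
A minimal sketch of the storage layout and the retention check, assuming backups are keyed by the ISO timestamp of their creation. `backupObjectPath` and `isExpired` are hypothetical helpers; in the plan above, actual deletion is handled by GCS lifecycle rules, so `isExpired` only shows the rule those policies would encode.

```ts
// Build the object path described above:
// gs://vibn-ws-{slug}/backups/{db-uuid}/{iso-timestamp}.sql.gz
function backupObjectPath(slug: string, dbUuid: string, at: Date): string {
  return `gs://vibn-ws-${slug}/backups/${dbUuid}/${at.toISOString()}.sql.gz`;
}

// True when a backup is older than the retention window (in days).
function isExpired(createdAt: Date, retentionDays: number, now: Date): boolean {
  const ageMs = now.getTime() - createdAt.getTime();
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}
```

With the default 7-day retention, a backup taken eight days ago would be reported as expired and one taken six days ago would not.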
|
||||
|
||||
**Upgrade path:**
|
||||
- **Postgres point-in-time recovery** via `wal-g` shipping WAL segments
|
||||
to the same GCS bucket. Adds RPO < 5 min.
|
||||
- **ClickHouse**: `clickhouse-backup` to GCS.
|
||||
- **MongoDB**: `mongodump` incremental.
|
||||
|
||||
**Estimate:** **3 days** for MVP (pg_dump + schedule + restore).
|
||||
**+1 week** for wal-g PITR if/when a customer asks.
|
||||
|
||||
---
|
||||
|
||||
### P6.2 · Runtime log streaming (Cloud Logging)
|
||||
|
||||
**Goal:** agent can see "is the app erroring at 10 req/s right now?",
|
||||
not just "did the build succeed."
|
||||
|
||||
**Why:** today deploy logs are surfaced but container stdout/stderr is
|
||||
not. An agent that "fixed a bug" can't verify the fix without a human
|
||||
SSH-ing into Coolify.
|
||||
|
||||
**GCP collapses this item** — ship container logs to Cloud Logging with
|
||||
a workspace label, query via the logs API.
|
||||
|
||||
**Shape:**
|
||||
- Fluent-bit sidecar (or Coolify label) ships container stdout/stderr
|
||||
to Cloud Logging in `northamerica-northeast1` with labels
|
||||
`workspace={slug}`, `app={app-uuid}`, `service={web|worker|...}`.
|
||||
- Per-workspace log bucket for retention isolation.
|
||||
|
||||
**Surface:**
|
||||
|
||||
| Tool | Purpose |
|
||||
|---|---|
|
||||
| `apps.logs` | Last N lines across replicas. Filter by timestamp, severity. |
|
||||
| `apps.logs.tail` | SSE stream of new log lines. |
|
||||
| `apps.logs.search` | Thin wrapper on Cloud Logging's query API — grep, severity filter, time window. |
|
||||
| `apps.services.logs` | Same, scoped to a single service. |
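
To make the "thin wrapper" idea concrete, here is a sketch of how `apps.logs.search` might translate its arguments into a Cloud Logging filter string. `LogQuery` and `buildLogFilter` are hypothetical names, and the sketch assumes the `workspace`, `app`, and `service` labels end up in each entry's `labels` map; where they actually land depends on how the fluent-bit output is configured.

```ts
// Hypothetical query shape for apps.logs.search.
interface LogQuery {
  slug: string;
  appUuid?: string;
  service?: string;
  minSeverity?: "DEBUG" | "INFO" | "WARNING" | "ERROR";
  since?: Date;
  text?: string;
}

// Escape backslashes and double quotes so values stay inside their quoted string.
function escapeFilterValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Build a Logging query-language filter. The workspace clause is always present,
// so a caller cannot widen the search beyond its own tenant.
function buildLogFilter(q: LogQuery): string {
  const clauses = [`labels.workspace="${escapeFilterValue(q.slug)}"`];
  if (q.appUuid) clauses.push(`labels.app="${escapeFilterValue(q.appUuid)}"`);
  if (q.service) clauses.push(`labels.service="${escapeFilterValue(q.service)}"`);
  if (q.minSeverity) clauses.push(`severity>=${q.minSeverity}`);
  if (q.since) clauses.push(`timestamp>="${q.since.toISOString()}"`);
  if (q.text) clauses.push(`textPayload:"${escapeFilterValue(q.text)}"`);
  return clauses.join(" AND ");
}
```

Deriving the workspace clause from the authenticated principal rather than from tool arguments would keep the tenant boundary out of the agent's hands.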
|
||||
|
||||
**Retention:** default 30 days in the workspace log bucket; exportable
|
||||
to the workspace's GCS bucket on request for long-term storage.
|
||||
|
||||
**Estimate:** **3 days** (fluent-bit config + thin API wrapper).
|
||||
|
||||
---
|
||||
|
||||
### P6.3 · Scoped API keys
|
||||
|
||||
**Goal:** invite a CI bot or teammate without giving root on the
|
||||
workspace.
|
||||
|
||||
**Why:** solo-builder flow survives without it. Breaks the moment a
|
||||
second principal enters.
|
||||
|
||||
**Shape:**
|
||||
- Keys gain `scopes: string[]` and optional `expires_at`.
|
||||
- Scope tokens: `apps:read`, `apps:write`, `apps:delete`,
|
||||
`databases:*`, `auth:*`, `domains:read`, `domains:write`,
|
||||
`storage:*`, `email:send`, `cron:*`, `queues:*`, `deploy:*`.
|
||||
- Per-scope rate limits optional (Tier 3; API shape supports it from
|
||||
day one).
|
||||
|
||||
**Surface changes:**
|
||||
|
||||
| Tool | Change |
|
||||
|---|---|
|
||||
| `keys.create` | Accepts `scopes`, `expires_at`. |
|
||||
| `keys.list` | Returns scopes per key. |
|
||||
| `keys.rotate` | Mints new token, preserves scope set. |
|
||||
|
||||
Every MCP/REST handler gets a scope requirement checked in the
|
||||
principal resolver.
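
One possible shape for that check, assuming scopes are `resource:action` strings and that `resource:*` grants every action on that resource. `ApiKey`, `scopeAllows`, and `keyPermits` are illustrative names, not the existing resolver API.

```ts
// Hypothetical key record: scopes plus an optional expiry.
interface ApiKey {
  scopes: string[];
  expiresAt?: Date;
}

// True when the granted scopes cover the required "resource:action" scope,
// either exactly or through a "resource:*" wildcard.
function scopeAllows(granted: string[], required: string): boolean {
  const [resource] = required.split(":");
  return granted.some((g) => g === required || g === `${resource}:*`);
}

// Full check a handler might run: the key must be unexpired and in scope.
function keyPermits(key: ApiKey, required: string, now: Date): boolean {
  if (key.expiresAt && key.expiresAt.getTime() <= now.getTime()) return false;
  return scopeAllows(key.scopes, required);
}
```

A CI bot holding `["apps:read", "deploy:*"]` could then trigger deploys but would be refused on `databases:write`.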
|
||||
|
||||
**Estimate:** **1 week.**
|
||||
|
||||
---
|
||||
|
||||
### Tier 2 total: ~2 weeks
|
||||
|
||||
After Tier 2 lands, a SaaS shipped on Vibn can survive without you
|
||||
dropping into a psql REPL at 3am.
|
||||
|
||||
---
|
||||
|
||||
## Tier 3 — Matters once usage scales
|
||||
|
||||
Don't build these until at least one real customer is hitting them.
|
||||
Building them pre-market is the classic infra-overinvestment trap.
|
||||
|
||||
### P7.1 · Per-workspace quotas + cost caps
|
||||
Max apps, max dbs, max GCS GB, max egress, max SES messages/month, max
|
||||
OpenSRS spend/month. Per-plan configurable. Hallucinating agents can't
|
||||
OOM the cluster or burn your SES reputation.
|
||||
|
||||
### P7.2 · Audit log
|
||||
Append-only per-workspace log of (principal, action, params, timestamp,
|
||||
result). Cloud Logging with a dedicated `audit-logs` log-bucket, 400-day
|
||||
retention. Read API for the settings panel. Needed for any
|
||||
SOC-2-adjacent buyer.
|
||||
|
||||
### P7.3 · Preview-per-PR environments
|
||||
Open a PR → `pr-42.mark.vibnai.com` deploys automatically with a
|
||||
throw-away database. Teardown on PR close/merge. Unblocks multi-agent
|
||||
flows.
|
||||
|
||||
### P7.4 · Atomic multi-resource operations (`stacks`)
|
||||
`POST /stacks` takes a full app + db + auth + domain + cron spec;
|
||||
creates atomically, rolls back on failure. Agent ergonomics win once
|
||||
demo flow is routine.
|
||||
|
||||
### P7.5 · Billing integration
|
||||
Stripe subscriptions for Vibn itself (workspace billing), plus
|
||||
per-workspace float top-ups, plus reconciliation to the OpenSRS master
|
||||
deposit and GCP / SES cost allocation. Only needed when you charge
|
||||
real dollars.
|
||||
|
||||
### P7.6 · Assured Workloads for Canada
|
||||
GCP policy-enforced Canadian residency + Canadian personnel access.
|
||||
For regulated customers (healthcare, financial, public sector). Priced
|
||||
accordingly; ship only when a real customer needs it.
|
||||
|
||||
### P7.7 · CIRA D-Zone as a workspace DNS option
|
||||
Swap Cloud DNS → CIRA D-Zone for a workspace with strict residency
|
||||
requirements. API-compatible wrapper so nothing agent-facing changes.
|
||||
|
||||
---
|
||||
|
||||
## Tier 4 — Revisit when demanded
|
||||
|
||||
Items to explicitly *not* build until a concrete customer asks.
|
||||
|
||||
- **Multi-region** — single-region Canada is fine for B2B SaaS makers
|
||||
(our early market).
|
||||
- **Cloud Run migration** — would rewrite most of Coolify-based
|
||||
capabilities. Revisit if/when Coolify becomes a bottleneck.
|
||||
- **Managed search / vector DB as first-class types** — agents can
|
||||
deploy Meilisearch / Typesense / pgvector-Postgres as regular services.
|
||||
- **mTLS / custom CAs / BYO-cert upload** — enterprise creep.
|
||||
- **MCP protocol polish** (streaming, resources, prompts, per-tool
|
||||
schemas) — current JSON-over-HTTP works. Revisit on real friction.
|
||||
- **Per-app basic auth, IP allowlists, WAF** — Traefik middleware
|
||||
manually until someone asks.
|
||||
|
||||
---
|
||||
|
||||
## Roadmap at a glance
|
||||
|
||||
| Phase | Items | Est. | Unblocks |
|
||||
|---|---|---|---|
|
||||
| **P5 — Real SaaS primitives** | Domains, email, storage, workers/cron/queues | ~5 wk | Shipping a real product |
|
||||
| **P6 — Keep-it-running** | Backups, runtime logs, scoped keys | ~2 wk | First real customer survives |
|
||||
| **P7 — Scale** | Quotas, audit, previews, stacks, billing, Assured Workloads, D-Zone | demand-driven | Platform grows past 1st cohort |
|
||||
| **P8+** | Tier 4 items | never, unless pulled by customer | — |
|
||||
|
||||
**Total to "agent ships a SaaS a founder would pay $29/mo for":**
|
||||
P5 + P6 = **~7 weeks** (was ~11 before GCP-CA; ~40% compression from
|
||||
managed-service leverage).
|
||||
|
||||
---
|
||||
|
||||
## Dependency graph
|
||||
|
||||
```
|
||||
P5.1 Domains ──┬──→ P5.2 Email Phase B (per-domain DKIM)
|
||||
├──→ P7.7 CIRA D-Zone swap
|
||||
└──→ (future: customer-owned sub-domain routing)
|
||||
|
||||
P5.3 Storage ──┬──→ P6.1 Database backups (backups need a bucket)
|
||||
└──→ P7.2 Audit log export
|
||||
|
||||
P5.4 Workers/cron/queues ──┬──→ P6.1 Database backups (run via scheduler)
|
||||
└──→ most real SaaS patterns
|
||||
|
||||
P6.2 Runtime logs — independent, can land anytime
|
||||
P6.3 Scoped keys — independent, can land anytime
|
||||
P7.6 Assured Workloads — wraps everything; build once demanded
|
||||
```
|
||||
|
||||
**Parallelizable (three people):**
|
||||
- Track A: P5.1 → P5.2
|
||||
- Track B: P5.3 → P6.1
|
||||
- Track C: P5.4 → P6.2
|
||||
|
||||
Track C finishes earliest; use that slack to land P6.3.
|
||||
|
||||
---
|
||||
|
||||
## Per-workspace GCP provisioning (shared across P5.3, P5.4, P6.1, P6.2)
|
||||
|
||||
`ensureWorkspaceProvisioned()` gains a GCP-CA block that runs once per
|
||||
workspace, idempotently. All resources are created in
|
||||
`northamerica-northeast1`.
|
||||
|
||||
| Resource | Name pattern | Notes |
|
||||
|---|---|---|
|
||||
| GCS bucket | `vibn-ws-{slug}` | Uniform bucket-level access. Lifecycle policies off by default. |
|
||||
| Cloud DNS managed zone | `vibn-ws-{slug}-zone` | Created per workspace-owned domain in P5.1, not on workspace provision. |
|
||||
| Cloud Logging log bucket | `vibn-ws-{slug}-logs` | 30-day retention default. |
|
||||
| Cloud Tasks location | `northamerica-northeast1` | Queues created per-app in P5.4, not here. |
|
||||
| GCP service account | `vibn-ws-{slug}@{project}.iam` | Single SA per workspace, narrow roles. |
|
||||
| Service account key | stored encrypted in `vibn_workspaces` | AES-256-GCM, same `VIBN_SECRETS_KEY`. |
|
||||
|
||||
**New columns** on `vibn_workspaces` (cumulative across P5.1-P6.2):
|
||||
|
||||
```sql
|
||||
-- P5.1
|
||||
preferred_dns_provider TEXT DEFAULT 'cloud_dns',
|
||||
cloud_dns_zone_name TEXT,
|
||||
|
||||
-- P5.3
|
||||
gcs_bucket_name TEXT,
|
||||
gcp_service_account_email TEXT,
|
||||
gcp_service_account_key_encrypted BYTEA,
|
||||
|
||||
-- P5.4
|
||||
cloud_scheduler_location TEXT DEFAULT 'northamerica-northeast1',
|
||||
cloud_tasks_location TEXT DEFAULT 'northamerica-northeast1',
|
||||
|
||||
-- P6.2
|
||||
cloud_logging_bucket_name TEXT
|
||||
```
|
||||
|
||||
Four migration steps, one per phase (P5.1, P5.3, P5.4, P6.2). All guarded by the existing
|
||||
admin-gated `POST /api/admin/migrate` endpoint.
|
||||
|
||||
---
|
||||
|
||||
## Non-goals (stated explicitly so they don't creep in)
|
||||
|
||||
- **A general-purpose PaaS.** Vibn is an agent-driven SaaS builder, not
|
||||
a Heroku / Fly clone. Every capability must answer "what does an agent
|
||||
need to build a SaaS?" — not "what does a dev need to deploy a
|
||||
container?"
|
||||
- **Support for non-allowlisted auth providers, databases, services.**
|
||||
The curated surface is the feature. "Any Coolify service" would blow
|
||||
up the tenant-safety model and dilute agent decision-making.
|
||||
- **A consumer-facing OpenSRS UI.** OpenSRS is plumbing for the agent.
|
||||
Humans should never see an OpenSRS checkout screen — only
|
||||
`domains.register { name: "mysaas.com" }` from the agent.
|
||||
- **Multi-cloud abstraction layer.** One Coolify cluster + GCP-CA +
|
||||
SES-CA + OpenSRS is the contract. If customers want to bring their
|
||||
own, that's Tier 4.
|
||||
- **Anything that moves customer data out of Canada.** Even for
|
||||
performance. If a managed service only has US regions, we self-host
|
||||
in Canada or we don't offer it.
|
||||
|
||||
---
|
||||
|
||||
## Recommended execution order (opinionated)
|
||||
|
||||
Given dependencies and quick-wins-first philosophy:
|
||||
|
||||
**Week 1:**
|
||||
- P5.3 Storage (GCS wrap, 3 days) → proves the GCP-CA provisioning pattern.
|
||||
- P5.4 Workers/cron/queues (starts in parallel; depends on P5.3 only for
|
||||
the service account).
|
||||
|
||||
**Week 2:**
|
||||
- P5.4 completes.
|
||||
- P5.1 Domains starts (OpenSRS client + Cloud DNS wrapper).
|
||||
|
||||
**Week 3:**
|
||||
- P5.1 completes.
|
||||
- P5.2 Email Phase A (shared-sender MVP) starts.
|
||||
|
||||
**Week 4:**
|
||||
- P5.2 Phase A completes.
|
||||
- P5.2 Phase B (per-domain DKIM) starts, now that P5.1 is available.
|
||||
|
||||
**Week 5:**
|
||||
- P5.2 Phase B completes. **P5 / Tier 1 done.**
|
||||
- P6.1 Database backups starts (3 days).
|
||||
- P6.2 Runtime logs starts in parallel (3 days).
|
||||
|
||||
**Week 6:**
|
||||
- P6.3 Scoped keys (1 week).
|
||||
|
||||
**Week 7:**
|
||||
- Slack week — hardening, docs (`AI_CAPABILITIES.md` refresh), first
|
||||
real customer onboarding.
|
||||
|
||||
**End state at week 7:** agent can take a founder from "I have an idea"
|
||||
to "I have `mysaas.com` live, with auth, with user uploads, with email,
|
||||
with backups, with visible error logs, and a CI bot can deploy it
|
||||
without root access."
|
||||
|
||||
That's the Vibn product.
|
||||
|
||||
---
|
||||
|
||||
## How to use this doc
|
||||
|
||||
- When someone proposes a feature, find its tier. If it's Tier 3 or 4
|
||||
and we're still shipping Tier 1, say no.
|
||||
- Before starting a Tier 1 item, re-read its section and make sure
|
||||
prerequisites shipped. Email-per-domain before domains is wasted code.
|
||||
- [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) is the canonical
|
||||
reference of *what exists today*. This doc is the canonical reference
|
||||
of *what comes next*. When an item ships, move it from here to that
|
||||
doc and delete its section here.
|
||||
- When a user request implies Canadian residency (they say "PIPEDA",
|
||||
"healthcare", "public sector", or "our data can't leave Canada"), pin
|
||||
the answer to this doc's §0 Substrate & constraints. Don't improvise.
|
||||
Current pending capabilities/roadmap items are tracked in `BETA_LAUNCH_PLAN.md`.
|
||||
|
||||
@@ -1,227 +1,8 @@
|
||||
# AI Harness Gaps — Proposal
|
||||
# AI Harness Stability & Middleware (Shipped)
|
||||
|
||||
> Four gaps in the Vibn AI experience that are **structural, not promptable**.
|
||||
> Each one is responsible for a specific failure pattern visible in real
|
||||
> production chat transcripts. None of them are scoped in
|
||||
> [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md),
|
||||
> [`BETA_LAUNCH_PLAN.md`](./BETA_LAUNCH_PLAN.md),
|
||||
> [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md), or the
|
||||
> agent-execution / telemetry-streaming designs.
|
||||
>
|
||||
> **Drafted:** 2026-04-30 (after a transcript review of the Dr Dave + Twenty CRM threads).
|
||||
>
|
||||
> **Why these four:** they share a common shape — the model is doing what
|
||||
> the prompt told it to, and still producing a bad outcome. The fix lives
|
||||
> in the *harness around the model*, not in instructions to the model.
|
||||
> **Note:** These middleware stability mechanisms have been shipped.
|
||||
|
||||
---
|
||||
|
||||
## TL;DR
|
||||
|
||||
| # | Gap | Failure pattern in prod | Fix size |
|
||||
|---|---|---|---|
|
||||
| 1 | Tool-error recovery middleware | Orphan twenty-* services (4 shipped). Model keeps delete-and-recreating despite explicit prompt rule against it. | ~2 hr |
|
||||
| 2 | Browser-driver tool for the AI | "Should be live in 10s" — AI ships URLs without ever loading them; user discovers the 502. | ~4 hr |
|
||||
| 3 | Live UI state attached to chat messages | "this isn't working" / "fix the URL" with no signal of which "this". AI guesses, often wrong. | ~3 hr |
|
||||
| 4 | Diff preview / accept-changes gate | `fs_edit` writes straight to the dev container with no review surface. Fine for sub-second iteration; bad for prod-bound edits. | ~6 hr |
|
||||
|
||||
Total: ~15 hr of work. None require new infra.
|
||||
|
||||
---
|
||||
|
||||
## Gap 1 — Tool-error recovery middleware (highest ROI)
|
||||
|
||||
**Failure observed:** in thread `d698ef40-…` ("Hey there, what can you see about this project?"), the AI hit
|
||||
`Conflict. The container name "/postgres-…" is already in use` **three separate times**.
|
||||
On each attempt it responded by *creating a new service with a new name*,
|
||||
not by calling `apps_unstick`. The prompt explicitly tells it not to do
|
||||
this and tells it the recovery sequence. The model still did it.
|
||||
|
||||
**Why prompt rules fail here:** the model treats the system prompt as
|
||||
soft guidance buried in a 30k-token document; the tool result is concrete
|
||||
and 200ms-fresh. When tool reality contradicts prompt rules, tool
|
||||
reality wins.
|
||||
|
||||
**Proposed fix:** middleware in `executeMcpTool` that pattern-matches
|
||||
known-recoverable errors and **injects a synthetic system message** into
|
||||
the conversation before the next round. The model can't ignore an
|
||||
injected instruction the way it can ignore a static prompt rule.
|
||||
|
||||
```ts
|
||||
// In app/api/chat/route.ts, around the executeMcpTool call:
|
||||
const errorRecovery = detectKnownError(result);
|
||||
if (errorRecovery) {
|
||||
messages.push({
|
||||
role: "system",
|
||||
content: `[RECOVERY] ${errorRecovery.diagnosis}. Required next action: ${errorRecovery.fix}. Do NOT ${errorRecovery.antipattern}.`,
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**Initial recovery rules** (high-confidence, low-false-positive):
|
||||
|
||||
| Error signature | Diagnosis | Fix | Antipattern |
|
||||
|---|---|---|---|
|
||||
| `Conflict. The container name … is already in use` | Orphan container blocking new boot | `apps_unstick { uuid }` then `apps_deploy { uuid }` | Delete and recreate with a new name |
|
||||
| `pull access denied` / `manifest unknown` | Image not on the host yet | `apps_repair { uuid }` | Retry deploy without addressing the cause |
|
||||
| `port … is already allocated` | Another container holds the port | List containers, identify holder, decide | Pick a random different port |
|
||||
|
||||
**Effort:** ~2 hr. New file `lib/ai/error-recovery.ts` with a registry of
|
||||
patterns + the injection in the chat route. Each rule is ~10 lines.
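
A sketch of what that registry could look like, assuming the tool result is available as a plain string. The `RecoveryRule` shape and the regular expressions are illustrative guesses at the three signatures in the table; only the diagnosis, fix, and antipattern text comes from it.

```ts
// One entry per known-recoverable error.
interface RecoveryRule {
  pattern: RegExp;
  diagnosis: string;
  fix: string;
  antipattern: string;
}

const RECOVERY_RULES: RecoveryRule[] = [
  {
    pattern: /Conflict\. The container name .+ is already in use/,
    diagnosis: "Orphan container blocking new boot",
    fix: "apps_unstick { uuid } then apps_deploy { uuid }",
    antipattern: "delete and recreate with a new name",
  },
  {
    pattern: /pull access denied|manifest unknown/,
    diagnosis: "Image not on the host yet",
    fix: "apps_repair { uuid }",
    antipattern: "retry deploy without addressing the cause",
  },
  {
    pattern: /port\b.* is already allocated/,
    diagnosis: "Another container holds the port",
    fix: "list containers, identify the holder, then decide",
    antipattern: "pick a random different port",
  },
];

// Return the first matching rule, or null when the output is not a known error.
function detectKnownError(toolResult: string): RecoveryRule | null {
  return RECOVERY_RULES.find((rule) => rule.pattern.test(toolResult)) ?? null;
}
```

Keeping the rules as data means a new failure pattern found in a transcript review can be added without touching the chat route.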
|
||||
|
||||
**Slot into:** `BETA_LAUNCH_PLAN.md` Phase 2 (Stability & visibility) — fits next to 2.4 (deployment-failed webhook).
|
||||
|
||||
---
|
||||
|
||||
## Gap 2 — Browser-driver tool for the AI
|
||||
|
||||
**Failure observed:** in the same Twenty thread, the AI said *"It's
|
||||
fully deployed, healthy, and I've verified it's returning a 200 OK
|
||||
status"* — but the user saw "Unable to Reach Back-end" on the actual
|
||||
page. The AI checked Coolify's status reporting, not the rendered app.
|
||||
Also visible in the Dr Dave thread: *"Note: it might take 10-15 seconds
|
||||
on the very first load for the DNS to propagate"* — the AI hedged
|
||||
because it couldn't load the URL itself.
|
||||
|
||||
**Why this matters for beta:** every "I deployed it" claim is unverified
|
||||
unless the AI can open the URL. Sentry (planned in P2.3) catches
|
||||
errors *after a user hits them*. A browser tool catches errors
|
||||
*before any user hits them*.
|
||||
|
||||
**Proposed fix:** add a `browser.*` MCP tool surface backed by a
|
||||
headless Chromium running on the Coolify host (or in the vibn-dev
|
||||
container). Initial tools:
|
||||
|
||||
| Tool | Purpose |
|
||||
|---|---|
|
||||
| `browser.navigate { url, timeoutMs? }` | Load the URL, return final URL + status code + page title |
|
||||
| `browser.screenshot { url }` | Visual confirmation. Return base64 PNG (or store in GCS) |
|
||||
| `browser.console_logs { url }` | Capture client-side JS errors (the `TypeError: reading 'z'/'j'/'aa'` from BETA P2.2 would be findable this way) |
|
||||
| `browser.fetch { url, headers? }` | HTTP-level smoke test. Subset of `http_fetch` but always from inside Vibn's network |
|
||||
|
||||
**Implementation:** Playwright already has an MCP server (Microsoft's `@playwright/mcp`).
|
||||
Wire it as a Coolify service, expose via the same per-workspace MCP
|
||||
token Vibn already issues.
|
||||
|
||||
**Effort:** ~4 hr. ~2 hr to deploy Playwright as a service, ~1 hr to
|
||||
add tool definitions, ~1 hr to wire prompt instructions ("after any
|
||||
deploy or `dev_server.start`, call `browser.navigate` to confirm").
|
||||
|
||||
**Slot into:** Phase 2 (Stability & visibility) — pairs with the
|
||||
runtime error chase (2.1, 2.2) and the Sentry wiring (2.3).
|
||||
|
||||
---
|
||||
|
||||
## Gap 3 — Live UI state attached to chat messages
|
||||
|
||||
**Failure observed:** in the Dr Dave thread, user typed *"are you able
|
||||
to give me a preview url?"* The AI didn't know which port the
|
||||
Next.js dev server would bind to, what was already running, or
|
||||
whether the user was looking at the chat or another tab. It
|
||||
guessed and re-discovered everything from scratch.
|
||||
|
||||
In the Twenty thread, *"can you see the different sections?"* — user
|
||||
meant Plan tab sections (Vision/Tasks/Decisions/Ideas). AI listed
|
||||
metadata. No way to know.
|
||||
|
||||
**Why prompt rules can't fix this:** the AI literally lacks the
|
||||
information.
|
||||
|
||||
**Proposed fix:** the chat panel sends a small `uiContext` object
|
||||
alongside every user message. Inject into the system prompt as a
|
||||
dynamic block (same shape as `activeBlock`):
|
||||
|
||||
```ts
|
||||
{
|
||||
currentRoute: "/mark-account/project/abc/hosting",
|
||||
currentTab: "hosting",
|
||||
visibleResources: [
|
||||
{ kind: "app", uuid: "y4cs…", name: "vibn-frontend" },
|
||||
{ kind: "service", uuid: "igcp…", name: "vibn-dev-twenty-crm" },
|
||||
],
|
||||
lastUserActions: [
|
||||
{ at: "2m ago", action: "opened twenty-crm logs" },
|
||||
{ at: "5m ago", action: "switched to Hosting tab" },
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
System-prompt block becomes:
|
||||
|
||||
> The user is currently looking at the **Hosting tab** (route: `…/hosting`).
|
||||
> Visible resources: `vibn-frontend`, `vibn-dev-twenty-crm`.
|
||||
> Recent actions: opened twenty-crm logs (2m ago), switched to Hosting (5m ago).
|
||||
> When the user says "this" / "it" / "the URL" — assume they mean
|
||||
> something visible in the current viewport unless they name something else.
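
For illustration, a sketch of rendering the `uiContext` object into that prompt block. `UiContext` mirrors the example object above; `renderUiContextBlock` and the five-action cap applied here are assumptions for this example.

```ts
interface UiContext {
  currentRoute: string;
  currentTab: string;
  visibleResources: { kind: string; uuid: string; name: string }[];
  lastUserActions: { at: string; action: string }[];
}

// Turn the per-message UI snapshot into a short system-prompt block.
function renderUiContextBlock(ctx: UiContext): string {
  const resources = ctx.visibleResources.map((r) => r.name).join(", ");
  // Keep only the five most recent actions so the block stays small.
  const actions = ctx.lastUserActions
    .slice(0, 5)
    .map((a) => `${a.action} (${a.at})`)
    .join(", ");
  return [
    `The user is currently looking at the ${ctx.currentTab} tab (route: ${ctx.currentRoute}).`,
    `Visible resources: ${resources || "none"}.`,
    `Recent actions: ${actions || "none"}.`,
    `When the user says "this" / "it" / "the URL", assume they mean something visible ` +
      `in the current viewport unless they name something else.`,
  ].join("\n");
}
```

Because the block is rebuilt on every message, it would stay accurate as the user switches tabs mid-conversation.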
|
||||
|
||||
**Effort:** ~3 hr. ~1 hr to wire the chat panel's
|
||||
`uiContext` collection (existing route + tab state, last 5 actions
|
||||
from a small ring buffer in the panel), ~1 hr to plumb through the
|
||||
chat API, ~1 hr to add the prompt block.
|
||||
|
||||
**Slot into:** Phase 3 (UX surfaces) — pairs with 3.2 (structured
|
||||
errors in chat) and 3.3 (empty-state nudges).
|
||||
|
||||
---
|
||||
|
||||
## Gap 4 — Diff preview / accept-changes gate
|
||||
|
||||
**Failure observed:** none yet, but the surface is exposed today —
|
||||
`fs_edit` writes directly to `/workspace` in the dev container. For
|
||||
ephemeral exploration this is correct (sub-second iteration is the
|
||||
whole Path B point). For changes destined to ship, the user has no
|
||||
review surface; they only see what changed after the AI summarizes.
|
||||
|
||||
**Why this matters for beta:** the moment a paying user wants to
|
||||
"see what the AI changed before it goes live," there's nothing to
|
||||
show them. Cursor's whole UX is built on diffs the user accepts.
|
||||
|
||||
**Proposed fix:** two-mode `fs_edit` / `fs_write`:
|
||||
|
||||
1. **Direct mode (default for dev container):** write immediately. Current
|
||||
behavior. Fine for "make the button blue" iteration.
|
||||
2. **Staged mode (default when `ship` is the next likely action):**
|
||||
write to a shadow path, surface a diff in the chat UI, gate the
|
||||
real write on a one-click "Accept" button.
|
||||
|
||||
The model decides which mode based on context — or simpler: stage when
|
||||
the file is in a "protected" set (e.g. `prisma/schema.prisma`,
|
||||
`Dockerfile`, `package.json`, anything in `prod/` or `migrations/`),
|
||||
direct otherwise.
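
The simpler path-based rule could be sketched as follows, assuming paths are given relative to the repository root. `editModeFor` and the two constants are hypothetical; the protected entries are the examples named above.

```ts
type EditMode = "direct" | "staged";

// Files that always go through the diff/accept gate.
const PROTECTED_FILES = new Set(["prisma/schema.prisma", "Dockerfile", "package.json"]);
// Directory names that mark everything beneath them as protected.
const PROTECTED_DIRS = ["prod", "migrations"];

// Decide whether an edit is written immediately or staged for review.
function editModeFor(path: string): EditMode {
  // Drop a leading "./" or "/" so "./package.json" and "package.json" compare equal.
  const normalized = path.replace(/^\.?\/+/, "");
  if (PROTECTED_FILES.has(normalized)) return "staged";
  // Only directory segments count; the final segment is the file name.
  const dirs = normalized.split("/").slice(0, -1);
  return dirs.some((d) => PROTECTED_DIRS.includes(d)) ? "staged" : "direct";
}
```

A static rule like this is predictable for the user: the same file always behaves the same way, regardless of what the model infers about intent.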
|
||||
|
||||
**Effort:** ~6 hr. ~2 hr backend (shadow write + apply endpoint),
|
||||
~3 hr UI (diff renderer in the chat panel, accept/reject buttons),
|
||||
~1 hr prompt + tool changes.
|
||||
|
||||
**Slot into:** Phase 4 (Onboarding & safety) — pairs with 4.5 (auth
|
||||
hardening) and 4.6 (compute quotas) as part of "what a stranger
|
||||
needs day 1."
|
||||
|
||||
---
|
||||
|
||||
## Suggested sequencing
|
||||
|
||||
If we ship in priority order:
|
||||
|
||||
1. **Gap 1 first** — kills the worst pattern in prod for ~2 hr of work. Should be ahead of any new feature in Phase 2.
|
||||
2. **Gap 2 second** — closes the verify-deploy loop. Multiplies the value of every subsequent AI-shipped change because it's no longer blind.
|
||||
3. **Gap 3 third** — tighter conversational UX. Once 1 and 2 work, the remaining UX cliff is "AI doesn't know what I'm looking at."
|
||||
4. **Gap 4 last** — only matters once we have paying users editing prod-bound code. Pre-beta optional.
|
||||
|
||||
Total effort to ship 1+2+3 (the meaningful UX wins): **~9 hours.**
|
||||
|
||||
---
|
||||
|
||||
## How this changes BETA_LAUNCH_PLAN.md
|
||||
|
||||
Two new tasks slot in:
|
||||
|
||||
- **P2.8** Tool-error recovery middleware (Gap 1) — block on nothing, ship before P2.4.
|
||||
- **P2.9** Browser-driver MCP tool (Gap 2) — block on nothing.
|
||||
|
||||
One new task in P3:
|
||||
|
||||
- **P3.7** UI-state injection into chat (Gap 3) — block on nothing.
|
||||
|
||||
Gap 4 stays out of beta scope unless eval reveals real damage from
|
||||
unstaged edits.
|
||||
- The chat loop (`app/api/chat/route.ts`) acts as a harness that intercepts known tool errors (e.g., port conflicts, container collisions) and automatically suggests recovery paths.
|
||||
- The tool-execution loop is capped at 30 rounds (`MAX_TOOL_ROUNDS=30`) to prevent runaway AI loops.
|
||||
- `fs_edit` uses line-number replacements alongside strict `oldString` matching to avoid Aider-style search-and-replace failures.
|
||||
- Sentry and Coolify deployment webhooks automatically pipe deployment/build failures back to the user/AI.
|
||||
|
||||
@@ -1,288 +1,12 @@
|
||||
# Path B Execution Plan — Persistent Dev Container Architecture
|
||||
# AI Path B (Shipped)
|
||||
|
||||
> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
|
||||
> surface with a Claude-Code-style architecture: one persistent dev
|
||||
> container per Vibn project, ~10 composable tools, sub-15-second
|
||||
> iteration, and Coolify only touched at "ship it" time.
|
||||
>
|
||||
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
|
||||
> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
|
||||
> (everything else).
|
||||
>
|
||||
> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
|
||||
>
|
||||
> **Why this exists:** today's AI loop is *3–7 min to first preview, 2–4
|
||||
> min per iteration*, because every change goes through a Coolify nixpacks
|
||||
> build. That UX cannot host the marketplace / SaaS / iterative-build
|
||||
> stories Vibn is selling. Path B fixes the floor.
|
||||
> **Note:** This document outlines the architecture for "Path B", which shifted the AI's execution context from Cloud Run to persistent per-project Docker containers hosted on the Coolify server. This architecture fully shipped in May 2026.
|
||||
|
||||
---
|
||||
## Architecture
|
||||
- Every project has a persistent Gitea repository.
|
||||
- Every project gets a single `vibn-dev` container provisioned as a Coolify service (`ensureDevContainer`).
|
||||
- The AI runs its tools (like `shell_exec` and `fs_*`) *inside* this container using `docker exec` via the Coolify API.
|
||||
- Dev servers (like `npm run dev`) bind to `0.0.0.0:3000` and are exposed to the internet via Traefik wildcard subdomains (`*.preview.vibnai.com`).
|
||||
- When the user is ready, the code is committed to Gitea and deployed to production via `apps_deploy`.
|
||||
|
||||
## 1. The user experience this unlocks
|
||||
|
||||
Reference scenario: a non-technical founder chats *"build me a
|
||||
two-sided marketplace for handmade ceramics."*
|
||||
|
||||
| Phase | Path A (today) | Path B (target) |
|
||||
|---|---|---|
|
||||
| Discovery & OSS pick | OK | OK |
|
||||
| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
|
||||
| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
|
||||
| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
|
||||
| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
|
||||
| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
|
||||
| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
|
||||
|
||||
The asymmetry is structural, not optimisable inside Path A.
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture overview
|
||||
|
||||
```
|
||||
┌──────────────────────────┐ ┌────────────────────────────────┐
|
||||
│ vibnai.com chat (user) │ ←→ │ /api/mcp │
|
||||
└──────────────────────────┘ │ ├ shell.exec │
|
||||
│ ├ fs.read / fs.edit / fs.glob │
|
||||
│ ├ dev_server.start │
|
||||
│ ├ ship │
|
||||
│ └ apps.* / databases.* / ... │
|
||||
└────────────┬───────────────────┘
|
||||
│
|
||||
▼ (workspace-scoped)
|
||||
┌────────────────────────────────────┐
|
||||
│ Per-Vibn-project Coolify project │
|
||||
│ ├ vibn-dev ← dev container │
|
||||
│ ├ web ← prod app │
|
||||
│ ├ db │
|
||||
│ └ ... │
|
||||
└────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Per-project dev container — the only new piece
|
||||
|
||||
For every active Vibn project, we run **one long-lived Coolify
|
||||
service named `vibn-dev`** inside that project's dedicated Coolify
|
||||
project (Stage 2/3 of per-project isolation already shipped).
|
||||
|
||||
| Property | Value |
|
||||
|---|---|
|
||||
| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
|
||||
| **Base** | Ubuntu 24.04 |
|
||||
| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
|
||||
| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
|
||||
| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
|
||||
| **Resource floor** | 512 MB / 0.25 CPU when idle |
|
||||
| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
|
||||
| **Idle suspend** | After 30 min no `shell.exec` activity |
|
||||
| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
|
||||
| **Ports** | 3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
|
||||
| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
|
||||
|
||||
### Why this shape (and not e2b / Cloud Run / VM-per-task)
|
||||
|
||||
- We already have Coolify, per-project Coolify projects, and Coolify
|
||||
exec primitives. Adding one service per project is zero new infra.
|
||||
- Persistence (workspace state, package cache, git working tree)
|
||||
matters more than per-task isolation for our user. Founders return
|
||||
to projects across sessions.
|
||||
- Tenant safety is already solved at the Coolify-project layer.
|
||||
- Cost stays bounded: one container per *active* project, idle-suspended.
|
||||
- Upgrade path to e2b / Firecracker exists later if needed (replace the
|
||||
executor, keep the tool surface).
|
||||
|
||||
---
|
||||
|
||||
## 3. Tool surface
|
||||
|
||||
### New tools (the AI's primary working set)
|
||||
|
||||
| Tool | Signature | Purpose |
|
||||
|---|---|---|
|
||||
| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
|
||||
| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
|
||||
| `fs.write` | `{ path, content }` | Create/overwrite a file. |
|
||||
| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
|
||||
| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
|
||||
| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
|
||||
| `fs.delete` | `{ path }` | Delete a file or directory. |
|
||||
| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
|
||||
| `dev_server.stop` | `{ id }` | Kill a dev server. |
|
||||
| `dev_server.list` | — | What's running, on what URL. |
|
||||
| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
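
The `fs.edit` contract in the table above (fail when `oldString` is missing or not unique) can be sketched as a pure function. This is a minimal illustration, not the shipped implementation: the names `applyEdit`, `EditResult`, and `countOccurrences` are hypothetical, and letting `replaceAll` permit multiple matches is an assumption about how the flag behaves.

```typescript
// Hypothetical sketch of fs.edit's matching rules; names are illustrative only.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; reason: "not_found" | "ambiguous"; matches: number };

// Count non-overlapping literal occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll: boolean = false,
): EditResult {
  const matches = countOccurrences(content, oldString);
  if (matches === 0) {
    return { ok: false, reason: "not_found", matches };
  }
  if (matches > 1 && !replaceAll) {
    // Assumption: more than one match is only acceptable when replaceAll is set.
    return { ok: false, reason: "ambiguous", matches };
  }
  // split/join replaces every literal occurrence without regex or `$` pattern surprises.
  const updated = content.split(oldString).join(newString);
  return { ok: true, content: updated, replacements: matches };
}
```

A caller could map `not_found` and `ambiguous` onto whatever error codes the endpoint uses. Keeping the matching logic free of I/O makes it easy to test separately from the container exec path.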
|
||||
|
||||
### Kept (orchestration — these are correctly modeled as APIs)
|
||||
|
||||
- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
|
||||
- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
|
||||
- `projects_get`, `projects_list`, `workspace_describe` — context.
|
||||
- `github_search`, `github_file`, `http_fetch` — external lookup.
|
||||
|
||||
### Deprecated (kept for back-compat, banner in docs)
|
||||
|
||||
- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
|
||||
`gitea_branches_list`, `gitea_branch_create`,
|
||||
`gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
|
||||
AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
|
||||
- `apps.exec` — kept (it's still useful for prod-container debugging),
|
||||
but deprecated for *dev-time* code work.
|
||||
|
||||
**Net change:** 53 tools → ~30 tools, but the new ones compose to do
|
||||
everything the old ones did and more.
|
||||
|
||||
---
|
||||
|
||||
## 4. The system prompt rewrite
|
||||
|
||||
The AI's prompt today says *"call gitea_file_write to push code."* It
|
||||
becomes:
|
||||
|
||||
> You have a real Linux dev environment for this project at `/workspace`.
|
||||
> Use `shell.exec` to run any command (npm, git, tea, python, anything).
|
||||
> Use `fs.edit` for surgical changes, `fs.write` for new files.
|
||||
>
|
||||
> Standard loop:
|
||||
> 1. `shell.exec { cmd: "git status" }` to see what's there.
|
||||
> 2. Edit / create files via `fs.edit` / `fs.write`.
|
||||
> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
|
||||
> 4. `dev_server.start` to give the user a live preview URL.
|
||||
> 5. When the user says "ship it", call `ship` — that pushes and
|
||||
> triggers the production Coolify deploy.
|
||||
>
|
||||
> NEVER call `apps_create` to deploy code that hasn't been tested via
|
||||
> `shell.exec` first. The dev container is your safety net.
|
||||
|
||||
---
|
||||
|
||||
## 5. Week-by-week execution
|
||||
|
||||
### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** AI can clone a repo, install deps, run a script.
|
||||
|
||||
- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
|
||||
- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
|
||||
- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
|
||||
- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
|
||||
- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
|
||||
- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
|
||||
- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
|
||||
- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
|
||||
|
||||
**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
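
The "locked to `/workspace`" rule for the `fs.*` tools can be illustrated with a small path check. This is a sketch under stated assumptions: `resolveWorkspacePath` and `WORKSPACE_ROOT` are hypothetical names, and the check is purely lexical, so it does not account for symlinks inside the container.

```typescript
// Hypothetical helper: normalize a requested path and reject anything outside /workspace.
const WORKSPACE_ROOT = "/workspace";

function resolveWorkspacePath(requested: string): string {
  // Relative paths are interpreted against the workspace root.
  const base = requested.startsWith("/") ? requested : `${WORKSPACE_ROOT}/${requested}`;
  const parts: string[] = [];
  for (const segment of base.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      parts.pop(); // popping an empty stack is a no-op, so "/.." stays at "/"
      continue;
    }
    parts.push(segment);
  }
  const resolved = "/" + parts.join("/");
  const inside = resolved === WORKSPACE_ROOT || resolved.startsWith(WORKSPACE_ROOT + "/");
  if (!inside) {
    throw new Error(`path escapes ${WORKSPACE_ROOT}: ${requested}`);
  }
  return resolved;
}
```

Comparing against `WORKSPACE_ROOT + "/"` rather than the bare prefix keeps sibling directories such as `/workspace-evil` from passing the check.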
|
||||
|
||||
### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
|
||||
|
||||
- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
|
||||
- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
|
||||
- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
|
||||
- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
|
||||
- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
|
||||
- [x] System prompt rewritten with shell-first recipe.
|
||||
|
||||
**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
|
||||
|
||||
### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
|
||||
|
||||
**Goal:** the dev container's working tree graduates to production.
|
||||
|
||||
- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
|
||||
- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
|
||||
- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
|
||||
- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
|
||||
- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
|
||||
|
||||
**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
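
The selection step of the idle-suspend sweep can be sketched as a pure function over container records. The record shape below (`projectId`, `lastActivityAt`, `suspended`) is an assumption for illustration and is not taken from the `fs_project_dev_containers` schema; the 30-minute default mirrors the `?minutes=30` parameter above.

```typescript
// Hypothetical record shape; the real table columns are not shown in this document.
interface DevContainerRecord {
  projectId: string;
  lastActivityAt: Date; // last shell.exec / fs.* / dev_server.* call
  suspended: boolean;
}

// Return the project IDs whose containers have been idle for at least `idleMinutes`.
function selectIdleContainers(
  records: DevContainerRecord[],
  now: Date,
  idleMinutes: number = 30,
): string[] {
  const cutoffMs = now.getTime() - idleMinutes * 60 * 1000;
  return records
    .filter((r) => !r.suspended && r.lastActivityAt.getTime() <= cutoffMs)
    .map((r) => r.projectId);
}
```

Passing `now` in explicitly keeps the function deterministic, which helps if the sweep is later wired to a cron and needs regression tests.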
|
||||
|
||||
### Week 4 — Eval, polish, IDE drop-in
|
||||
|
||||
**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
|
||||
|
||||
- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
|
||||
- [ ] **IDE drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user who is becoming a developer drop into the same `/workspace` the AI has been editing.
|
||||
- [ ] **Bug fixes** found during eval.
|
||||
- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
|
||||
|
||||
**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
|
||||
Path A, ≥80% success rate on the 10 reference prompts.
|
||||
|
||||
---
|
||||
|
||||
## 6. OSS we will lean on (not reinvent)
|
||||
|
||||
| Need | OSS choice | Notes |
|
||||
|---|---|---|
|
||||
| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
|
||||
| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
|
||||
| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
|
||||
| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
|
||||
| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
|
||||
| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
|
||||
| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
|
||||
| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
|
||||
|
||||
---
|
||||
|
||||
## 7. Risks & open questions
|
||||
|
||||
| Risk | Mitigation |
|
||||
|---|---|
|
||||
| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
|
||||
| **`shell.exec` is the universal escape hatch — security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
|
||||
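| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (`preview-mark-ceramic-market-7a3f.vibnai.com`, shortened here for readability). A 4-hex-character suffix is only 16 bits, so the real suffix needs about 16 hex characters to reach the ~64 bits of unguessability intended. Optional Vibn-session-cookie auth as paid-tier feature later. |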
|
||||
| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
|
||||
| **Image size / pull time on first project.** ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
|
||||
| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
|
||||
| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
|
||||
| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
|
||||
| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
|
||||
| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
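
The random-suffix mitigation for preview URL leaks can be sketched as follows. The function names are hypothetical, and the caller is assumed to supply bytes from a cryptographically secure source (for example `crypto.getRandomValues(new Uint8Array(8))`); eight bytes hex-encode to 16 characters, or 64 bits.

```typescript
// Hex-encode random bytes, two characters per byte.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical helper: build a preview hostname with an unguessable suffix.
function previewHost(workspace: string, project: string, randomBytes: Uint8Array): string {
  if (randomBytes.length < 8) {
    throw new Error("need at least 8 random bytes (64 bits)");
  }
  const label = `preview-${workspace}-${project}-${toHex(randomBytes)}`;
  if (label.length > 63) {
    // A single DNS label is limited to 63 octets.
    throw new Error(`DNS label too long (${label.length} > 63): ${label}`);
  }
  return `${label}.vibnai.com`;
}
```

Taking the random bytes as a parameter keeps the function deterministic under test, while the production call site decides which secure generator to use.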
|
||||
|
||||
---
|
||||
|
||||
## 8. Success metrics
|
||||
|
||||
We're not done until **all four** are true on the eval harness:
|
||||
|
||||
| Metric | Target | Today (Path A) |
|
||||
|---|---|---|
|
||||
| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
|
||||
| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
|
||||
| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
|
||||
| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
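
The four targets in the table can be checked mechanically once the eval harness produces per-prompt results. The sketch below assumes a hypothetical `EvalRun` shape and computes medians over all runs, including failed ones; whether failed runs should count toward the timing medians is a choice this document does not specify.

```typescript
// Median of a non-empty list of numbers (p50).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical per-prompt result record; field names are illustrative.
interface EvalRun {
  firstPreviewSec: number;
  iterationSec: number;
  toolCalls: number;
  succeeded: boolean;
}

// True only when all four targets from the table are met.
function meetsTargets(runs: EvalRun[]): boolean {
  if (runs.length === 0) return false;
  const successRate = runs.filter((r) => r.succeeded).length / runs.length;
  return (
    median(runs.map((r) => r.firstPreviewSec)) <= 60 &&
    median(runs.map((r) => r.iterationSec)) <= 15 &&
    median(runs.map((r) => r.toolCalls)) <= 30 &&
    successRate >= 0.8
  );
}
```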
|
||||
|
||||
---
|
||||
|
||||
## 9. What this changes about the existing roadmap
|
||||
|
||||
- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1–C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
|
||||
- **Tier 1 P5.1–P5.4 are unchanged.** Domains, email, storage, workers — still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
|
||||
- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
|
||||
- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
|
||||
|
||||
---
|
||||
|
||||
## 10. Decision needed before week 1 starts
|
||||
|
||||
1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
|
||||
2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
|
||||
3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
|
||||
4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
|
||||
|
||||
Once those four are decided, week 1 starts.
|
||||
|
||||
---
|
||||
|
||||
## How to use this doc
|
||||
|
||||
- This is the *architectural* execution plan. The detailed task list
|
||||
goes into the agent's TodoWrite per-week, not into this file.
|
||||
- When an item ships, **move it from "planned" to "shipped"** in
|
||||
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
|
||||
- When a risk in §7 turns out to be real, document the mitigation
|
||||
outcome inline so future readers see what actually happened.
|
||||
- This doc supersedes the proposed Tier 1.5 in
|
||||
[`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
|
||||
one-line pointer there once approved.
|
||||
*(Refer to `lib/ai/vibn-tools.ts` and `app/api/mcp/route.ts` for the live implementation).*
|
||||
|
||||
@@ -1,275 +1,11 @@
|
||||
# Project Page Architecture — Product / Infrastructure / Hosting
|
||||
# Project Page Architecture
|
||||
|
||||
> The plan to collapse the 16-page sidebar mess at
|
||||
> `/[workspace]/project/[projectId]/*` into 3 founder-friendly
|
||||
> sections, and to make `/project/<id>` actually reflect what the AI
|
||||
> is doing in the dev container instead of stale Gitea/prod-Coolify
|
||||
> data.
|
||||
>
|
||||
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
|
||||
> (Path B is the engine; this doc is the dashboard for it).
|
||||
>
|
||||
> **Status:** week 1 doc + home-page redesign in flight (2026-04-28).
|
||||
> **Note:** The UI was heavily refactored. The primary surfaces for a project are now:
|
||||
|
||||
---
|
||||
1. **The Plan Tab (`/plan`):** Contains the project's vision/objective document, tasks, decisions, and raw ideas. The AI acts as a scribe here.
|
||||
2. **The Product Tab (`/product`):** Lists the live codebases (Gitea) and running images (Docker containers).
|
||||
3. **The Infrastructure Tab (`/infrastructure`):** Lists the underlying resources (PostgreSQL databases, Redis, etc.) managed by Coolify.
|
||||
4. **The Hosting Tab (`/hosting`):** Lists live runtime environments, logs, and preview URLs.
|
||||
5. **The Chat Panel:** Available on all project surfaces as a slide-out, used to orchestrate work.
|
||||
|
||||
## 1. Why this exists
|
||||
|
||||
Today the project page (`/[workspace]/project/[projectId]`) shows two
|
||||
tiles — Code + Infrastructure — and links to a sidebar with 16
|
||||
sub-routes (`build`, `run`, `infrastructure`, `deployment`,
|
||||
`overview`, `insights`, `analytics`, `prd`, `tasks`, `settings`,
|
||||
`assist`, `design`, `growth`, `grow`, `mvp-setup`, `code` — the last
|
||||
of which doesn't exist as a route, so the home tile is a dead link).
|
||||
|
||||
Two structural problems:
|
||||
|
||||
1. **The sidebar grew without an anchor concept.** Founders have no
|
||||
mental model of what the 16 pages map to; they just see a list
|
||||
and click around hoping for the right one. Half the pages are
|
||||
placeholders ("Coming soon"); the rest overlap.
|
||||
2. **None of the data sources have been updated for Path B.** The
|
||||
Code tile reads the Gitea repo (production master branch), but the
|
||||
AI now writes to the dev container's `/workspace`, often without
|
||||
pushing for hours. The Infrastructure tile reads production
|
||||
Coolify apps; new `dev_server.start` previews don't show up
|
||||
anywhere. So when AI does great work in chat, the project page
|
||||
doesn't update — the user has to tab back to chat to see anything.
|
||||
|
||||
---
|
||||
|
||||
## 2. The framing
|
||||
|
||||
Three sections, founder-friendly names, every project on Vibn maps
|
||||
cleanly into all three:
|
||||
|
||||
| Section | What it is | Founder asks… |
|
||||
|---|---|---|
|
||||
| **Product** | Custom code, design, content built for THIS vision | *"What did I build?"* |
|
||||
| **Infrastructure** | Reusable, swappable third-party services (auth, db, email, payments…) | *"What do I depend on?"* |
|
||||
| **Hosting** | Where the product runs and how people reach it (Coolify, domain, observability, cost) | *"Where does it live?"* |
|
||||
|
||||
### The boundary rule
|
||||
|
||||
> **Custom code = Product. Third-party service = Infrastructure.**
|
||||
> Runtime + reachability = Hosting.
|
||||
|
||||
Concrete edge cases:
|
||||
|
||||
- A custom `/api/upload` endpoint that calls S3 → endpoint is
|
||||
**Product**, S3 bucket + credentials are **Infrastructure**.
|
||||
- Custom job that sends a welcome email → job is **Product**, the
|
||||
job runner (Sidekiq/BullMQ) and email service (Resend) are
|
||||
**Infrastructure**.
|
||||
- Webhook handler that processes Stripe events → handler is
|
||||
**Product**, Stripe is **Infrastructure**.
|
||||
- Coolify scheduled task that runs your code → your code is
|
||||
**Product**, Coolify itself is **Hosting**.
|
||||
|
||||
---
|
||||
|
||||
## 3. Charters
|
||||
|
||||
### Product
|
||||
|
||||
Everything custom-built for this specific vision. The unique IP that
|
||||
wouldn't exist without this product.
|
||||
|
||||
**Includes:**
|
||||
- Frontend web app
|
||||
- Marketing site
|
||||
- Custom backend code & APIs
|
||||
- Custom business logic
|
||||
- Custom jobs / runners (the code, not the runner)
|
||||
- Brand, copy, design system
|
||||
- The repository itself
|
||||
- Customer base — the actual users you've earned
|
||||
|
||||
**Rule:** if you wrote it for this product, it's Product. If it's
|
||||
`node_modules` or a third-party SDK, it's not.
|
||||
|
||||
### Infrastructure
|
||||
|
||||
The reusable, swappable services your product depends on. The
|
||||
annoying multi-vendor world where you have to pick a provider.
|
||||
|
||||
**Includes:**
|
||||
- Auth provider (Clerk, Pocketbase, Authentik, Google OAuth, …)
|
||||
- Database (Postgres, MySQL, MongoDB, Redis, …)
|
||||
- File storage (S3, R2, MinIO)
|
||||
- Email (Resend, SendGrid, SES)
|
||||
- Payments (Stripe, Paddle, Lemon Squeezy)
|
||||
- Analytics (Plausible, PostHog, GA)
|
||||
- Search (Algolia, Meili, Typesense)
|
||||
- LLM provider (OpenAI, Anthropic, Gemini, Vertex)
|
||||
- Queues, maps, SMS, push notifications, …
|
||||
- Secrets and API keys that wire all of the above
|
||||
|
||||
**Rule:** if you could swap the vendor without changing your product
|
||||
code, it's Infrastructure.
|
||||
|
||||
### Hosting
|
||||
|
||||
Where the product physically runs and how people reach it.
|
||||
|
||||
**Includes:**
|
||||
- Container runtime (Coolify in our case)
|
||||
- Domain + DNS + SSL
|
||||
- CDN / edge
|
||||
- Observability (logs, errors, uptime)
|
||||
- Backups
|
||||
- Monthly cost
|
||||
|
||||
**Rule:** it's about *runtime and reachability,* not about what the
|
||||
software does.
|
||||
|
||||
---
|
||||
|
||||
## 4. Future sections (deferred)
|
||||
|
||||
Add as separate top-level cards once they become real concerns:
|
||||
|
||||
- **Models** — for AI-heavy products: which LLMs, which embedding
|
||||
model, prompt versions, eval scores, cost-per-call.
|
||||
- **Analytics** — when there are real users worth measuring.
|
||||
- **Marketing** — campaigns, blog, SEO, social, when there's a
|
||||
growth motion.
|
||||
- **Compliance** — Terms, Privacy, GDPR, SOC2, when shipping to
|
||||
paying customers.
|
||||
- **Support** — helpdesk, chat, status page, when there are
|
||||
customers complaining.
|
||||
- **Team** — when the project has more than one collaborator.
|
||||
|
||||
Same charter template each time. Same rule: code = Product,
|
||||
swappable = Infrastructure, runs/reachable = Hosting, otherwise it
|
||||
needs its own section.
|
||||
|
||||
---
|
||||
|
||||
## 5. Mapping today → tomorrow
|
||||
|
||||
| Today's page | Where it goes | Notes |
|
||||
|---|---|---|
|
||||
| `(home)/page.tsx` | New `(home)/page.tsx` (3-card grid) | Full redesign |
|
||||
| `code` (404) | `product/` (new) | Stub the route, point home tile at it |
|
||||
| `build` | Subroute under `product/files` (later) | Heavy (1,626 lines); preserve the file tree component |
|
||||
| `run` | `hosting/` | Production runtime |
|
||||
| `infrastructure` | `hosting/` | Same data, different name |
|
||||
| `deployment` | `hosting/deploys` (later) | Deploy history is Hosting |
|
||||
| `overview` | Subroute under `product/` or merged into home | Decide once we see how home feels |
|
||||
| `prd` | Subroute under `product/` (vision) | Or its own "Define" section if we add one |
|
||||
| `tasks` | Subroute under `product/` (roadmap) | Or its own section later |
|
||||
| `assist` | `product/` (it's emails/chat your product sends) | These ARE product features |
|
||||
| `design` | `product/design` | Custom for this vision |
|
||||
| `growth`, `grow`, `analytics`, `insights`, `mvp-setup` | Defer, probably absorbed into a future "Analytics" or "Marketing" section | Many are placeholders today |
|
||||
| `settings` | Top-right gear (lives outside the 3 sections) | Project-level meta |
|
||||
|
||||
**Net:** 16 routes → 3 sections (+ settings). 8+ pages get rationalized
|
||||
into nothing because they were duplicating their neighbors.
|
||||
|
||||
---
|
||||
|
||||
## 6. Phased delivery
|
||||
|
||||
### Phase 1 — Tab navigation + section stubs (this session)
|
||||
|
||||
The three sections are TABS at the project level, not a card-grid
|
||||
landing page. A founder lands on the project URL and is immediately
|
||||
inside Product (the default tab); flipping to Infrastructure or
|
||||
Hosting is one click and stays in the same view. No
|
||||
intermediate "click a tile to drill in" step.
|
||||
|
||||
URL shape:
|
||||
|
||||
```
|
||||
/[workspace]/project/[id] → 308 redirect to /product
|
||||
/[workspace]/project/[id]/product → Product tab
|
||||
/[workspace]/project/[id]/infrastructure → Infrastructure tab
|
||||
/[workspace]/project/[id]/hosting → Hosting tab
|
||||
```
|
||||
|
||||
A shared layout at the project root renders:
|
||||
|
||||
- Project header (name, vision, stage pill, settings gear)
|
||||
- Tab bar (Product · Infrastructure · Hosting) — active tab
|
||||
highlighted; each tab carries a tiny status dot (green/amber/grey)
|
||||
- Slot for the active tab's page
|
||||
|
||||
The current `(home)/page.tsx` (the two-tile landing) is replaced by
|
||||
the redirect.
|
||||
|
||||
**Don't kill anything in `(workspace)/`.** Existing 16 routes stay
|
||||
alive while we migrate. Sidebar still works for them.
|
||||
|
||||
### Phase 2 — Wire data sources
|
||||
|
||||
- **Product card** reads from the dev container's `/workspace`:
|
||||
- File count + recent edits via `fs.list` against the project's
|
||||
dev container
|
||||
- User count from the project's auth provider (Pocketbase /
|
||||
Clerk / etc.)
|
||||
- Frontend URL from `dev_server.list` or production `apps_list`
|
||||
- **Infrastructure card** reads from Coolify databases, env vars,
|
||||
and known integrations:
|
||||
- Database type + size
|
||||
- Auth provider name
|
||||
- Wired services (any env var matching `STRIPE_*`, `RESEND_*`,
|
||||
etc.)
|
||||
- **Hosting card** reads from Coolify apps + domains + container metrics:
|
||||
- Production URL, SSL status, last deploy
|
||||
- Monthly cost (Coolify resource usage × pricing)
|
||||
- Recent error count (from logs)
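
The "wired services" detection for the Infrastructure card (any env var matching `STRIPE_*`, `RESEND_*`, etc.) could look roughly like this. Only the two prefixes named above are included; the map and function names are hypothetical, and a real list would need to be extended deliberately.

```typescript
// Prefix-to-service map; only the prefixes named in the plan are listed here.
const SERVICE_PREFIXES: Record<string, string> = {
  STRIPE_: "Stripe",
  RESEND_: "Resend",
};

// Given env var names (never values), return the distinct services they imply.
function detectWiredServices(envKeys: string[]): string[] {
  const found = new Set<string>();
  for (const key of envKeys) {
    for (const prefix of Object.keys(SERVICE_PREFIXES)) {
      if (key.startsWith(prefix)) {
        found.add(SERVICE_PREFIXES[prefix]);
      }
    }
  }
  return Array.from(found).sort();
}
```

Working from key names alone means the card never needs to read secret values to show what is wired.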
|
||||
|
||||
### Phase 3 — Section detail pages
|
||||
|
||||
Build each of `/product`, `/infrastructure`, `/hosting` as a real,
|
||||
useful surface. Each page can have internal subnav for the bits
|
||||
listed in its charter (e.g., Product has Frontend, Backend, Jobs,
|
||||
Brand, Customers; Infrastructure has Auth, DB, Storage, Email,
|
||||
Payments, …).
|
||||
|
||||
### Phase 4 — Migration / deletion
|
||||
|
||||
Once the new structure is proven, redirect the legacy routes:
|
||||
|
||||
- `code` → `product`
|
||||
- `build` → `product/files`
|
||||
- `run` → `hosting`
|
||||
- `infrastructure` → `hosting`
|
||||
- `deployment` → `hosting/deploys`
|
||||
- `prd`, `tasks`, `assist` → `product/...`
|
||||
- `growth`, `grow`, `analytics`, `insights`, `mvp-setup` → soft-delete
|
||||
with a tombstone redirect to `product` or to a future section page.
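
The redirect list above can be expressed as a lookup table. This is a sketch under stated assumptions: `LEGACY_REDIRECTS` and `legacyRedirect` are hypothetical names; the tombstoned routes point at `product`, which is one of the two options the plan allows; `prd`, `tasks`, and `assist` are left out because their target subroutes are not specified; and `infrastructure` is left out because that path is also the new Infrastructure tab, so redirecting it is a decision this plan leaves open.

```typescript
// Legacy sub-route -> new sub-route. Entries mirror the list above where the target is explicit.
const LEGACY_REDIRECTS: Record<string, string> = {
  code: "product",
  build: "product/files",
  run: "hosting",
  deployment: "hosting/deploys",
  // Tombstoned routes: assumed to land on `product` until a future section exists.
  growth: "product",
  grow: "product",
  analytics: "product",
  insights: "product",
  "mvp-setup": "product",
};

// Return the new path for a legacy route, or null when no redirect is defined.
function legacyRedirect(workspace: string, projectId: string, legacyRoute: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(LEGACY_REDIRECTS, legacyRoute)) {
    return null;
  }
  return `/${workspace}/project/${projectId}/${LEGACY_REDIRECTS[legacyRoute]}`;
}
```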
|
||||
|
||||
---
|
||||
|
||||
## 7. Open questions
|
||||
|
||||
- **Where do the chat threads live?** They're a per-project
|
||||
conversation surface today (right rail in the chat panel). I'd
|
||||
argue they're not a section — they're *across* sections, like the
|
||||
AI is. Keep as the persistent right rail.
|
||||
- **Settings is technically project-level meta**, not one of the
|
||||
three sections. Where does it surface? Gear icon in the page
|
||||
header, opens settings as a side sheet or as a separate route.
|
||||
Decide when we get there.
|
||||
- **Mobile layout** — three cards stack vertically; no special
|
||||
layout needed. The section detail pages need a layout pass when
|
||||
we get to phase 3.
|
||||
|
||||
---
|
||||
|
||||
## 8. Success criteria
|
||||
|
||||
You should be able to look at `/project/<id>` after AI activity in
|
||||
chat and immediately see:
|
||||
|
||||
1. *"What did the AI just build?"* → Product card shows the updated
|
||||
file count and recent diffs.
|
||||
2. *"What's it depending on?"* → Infrastructure card shows the new
|
||||
Postgres, the new Stripe key, etc.
|
||||
3. *"Is it live?"* → Hosting card shows the dev preview URL or the
|
||||
production URL with status.
|
||||
|
||||
If any of those three answers requires going back to the chat or
|
||||
checking another page, the redesign hasn't worked.
|
||||
*(Refer to `vibn-frontend/app/[workspace]/project/[projectId]` for the UI implementation).*
|
||||
|
||||
@@ -1,258 +1,9 @@
|
||||
# Sentry-as-Product — Proposal
|
||||
# Sentry as a Product (Shipped)
|
||||
|
||||
> Today's Sentry wiring catches errors in **the Vibn platform**.
|
||||
> The bigger opportunity is wiring Sentry into **every project Vibn
|
||||
> ships**, then feeding those errors back into the user's AI chat.
|
||||
> Difference between "an AI that codes" and "an AI that owns the
|
||||
> product."
|
||||
> **Note:** This spec was implemented in May 2026.
|
||||
|
||||
## TL;DR
|
||||
|
||||
Today, when a Vibn user's deployed app crashes for real users:
|
||||
|
||||
```
|
||||
real user → site 500s → user closes tab, never tells founder
|
||||
→ founder finds out hours/days later (or never)
|
||||
→ AI in Vibn chat has zero idea anything is wrong
|
||||
```
|
||||
|
||||
The fix is to make every Vibn project ship with Sentry pre-wired,
|
||||
then expose the error feed to the AI as a tool. Total effort:
|
||||
**~8 hours**, in 4 stages, each independently shippable.
|
||||
|
||||
| Stage | Capability | Effort | Unlocks |
|
||||
|---|---|---|---|
|
||||
| 1 | Auto-provision a Sentry project per Vibn project on first deploy | ~3 hr | Real-user errors captured at all |
|
||||
| 2 | Bake Sentry into every scaffold template | ~2 hr | Capture works without user setup |
|
||||
| 3 | Add `project_recent_errors` MCP tool for the AI | ~2 hr | AI can answer "is anything broken?" |
|
||||
| 4 | Auto-surface unresolved errors at chat-turn start | ~1 hr | AI proactively offers fixes |
|
||||
|
||||
Total: **~8 hr**, no new infra (we already have Sentry org access,
|
||||
Coolify env API, scaffold templates, MCP tool registry).
|
||||
|
||||
---
|
||||
|
||||
## Why this is the right next investment
|
||||
|
||||
### The current loop is broken at the seam between user and platform
|
||||
|
||||
Vibn's value proposition is "the AI is your technical co-founder."
|
||||
That promise breaks the moment the AI's last commit causes a real
|
||||
user error and the AI doesn't know about it. The current loop:
|
||||
|
||||
```
|
||||
1. User describes feature in chat
|
||||
2. AI ships code
|
||||
3. AI says "deployed, give it a try"
|
||||
4. (silence)
|
||||
5. Real users hit edge cases → 500s → bounce
|
||||
6. Founder eventually notices via support ticket / analytics dip
|
||||
7. Founder pastes error back to AI
|
||||
8. AI fixes
|
||||
```
|
||||
|
||||
Steps 4–6 are dead air for the founder, **and the AI cannot help
|
||||
during them.** This is the gap that separates Vibn from "any IDE
|
||||
with an LLM."
|
||||
|
||||
### What it looks like with this proposal shipped
|
||||
|
||||
```
|
||||
1. User describes feature in chat
|
||||
2. AI ships code
|
||||
3. AI says "deployed, give it a try"
|
||||
4. Real users hit edge cases → 500s → Sentry captures
|
||||
5. (Founder opens Vibn chat 3 hrs later for unrelated reason)
|
||||
6. AI: "Hey — checkout has 500'd for 3 users in the last hour
|
||||
because `customer.email` is undefined on
|
||||
app/checkout/route.ts:47. Want me to fix it?"
|
||||
7. AI fixes, deploys, marks issue resolved in Sentry
|
||||
```
|
||||
|
||||
The AI becomes the on-call engineer. This is what "technical
|
||||
co-founder" actually means and we are 8 hours away from it.
|
||||
|
||||
### Why now (not Phase 4)
|
||||
|
||||
- The Sentry wiring we just shipped for vibn-frontend gave us:
|
||||
- A working Sentry org (`vibnai`)
|
||||
- An auth token with project-management scope
|
||||
- Verified knowledge that the build args / source maps flow works
|
||||
- A working `withSentryConfig` recipe in `vibn-frontend/next.config.ts`
|
||||
- All of those are reusable for stage 1 and 2 of this proposal.
|
||||
- Doing this **before** the beta means user projects start emitting
|
||||
error data on day one, so by the time we're debugging real beta
|
||||
user pain, we have a month of history to reason about.
|
||||
- Doing it after the beta means we'd have to retroactively
|
||||
instrument projects that have already been deployed for weeks.
|
||||
|
||||
---
|
||||
|
||||
## Stage 1 — Auto-provision a Sentry project per Vibn project (~3 hr)
|
||||
|
||||
**Goal:** when a user creates a Vibn project, the platform creates a
|
||||
matching Sentry project under the `vibnai` org and stashes the DSN
|
||||
+ auth token in Coolify env vars on the user's app.
|
||||
|
||||
**What gets built:**
|
||||
|
||||
1. **A `provisionSentryProject(projectId, name)` helper** in
|
||||
`vibn-frontend/lib/integrations/sentry.ts`. Calls Sentry's
|
||||
`POST /api/0/teams/vibnai/{team}/projects/` with the project
|
||||
slug, returns the DSN.
|
||||
2. **Hook into project-create flow** — on first successful deploy,
|
||||
call the helper and write the resulting DSN + auth token into
|
||||
Coolify env vars (`NEXT_PUBLIC_SENTRY_DSN`,
|
||||
`SENTRY_AUTH_TOKEN`) for that app via the same Coolify API we
|
||||
used today.
|
||||
3. **Idempotency** — if the Sentry project already exists, fetch
|
||||
its DSN instead of creating a duplicate. Same project name
|
||||
convention every time: `vibn-{workspace}-{projectSlug}`.
|
||||
4. **Storage** — store `sentryProjectSlug` and `sentryAuthTokenId`
|
||||
on the Postgres `projects` row so we can look them up later
|
||||
without re-walking the Sentry org.
|
||||
|
||||
**Risk:** Sentry's API rate-limits team-project creation. We stay
|
||||
under the limit by reading before writing, so the only API cost on
|
||||
subsequent deploys is one GET.
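
The read-before-write idea can be sketched as follows. This is a minimal illustration, not the shipped helper: the `SentryClient` interface, `sentrySlug`, and `ensureSentryProject` are hypothetical names, and a real client would wrap Sentry's REST API behind these two methods.

```typescript
// Hypothetical minimal client; a real one would wrap Sentry's REST API.
interface SentryClient {
  // Resolves to the DSN of an existing project, or null if there is none.
  findDsn(slug: string): Promise<string | null>;
  // Creates the project and resolves to its DSN.
  createProject(slug: string): Promise<string>;
}

// Naming convention from the spec: vibn-{workspace}-{projectSlug}.
function sentrySlug(workspace: string, projectSlug: string): string {
  return `vibn-${workspace}-${projectSlug}`;
}

// Read before write: when the project already exists, only the lookup runs.
async function ensureSentryProject(
  client: SentryClient,
  workspace: string,
  projectSlug: string,
): Promise<{ slug: string; dsn: string; created: boolean }> {
  const slug = sentrySlug(workspace, projectSlug);
  const existing = await client.findDsn(slug);
  if (existing !== null) {
    return { slug, dsn: existing, created: false };
  }
  const dsn = await client.createProject(slug);
  return { slug, dsn, created: true };
}
```

Because the slug is derived deterministically from the workspace and project, repeated deploys always look up the same Sentry project and never create a duplicate.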
|
||||
|
||||
**Definition of done:** create a fresh Vibn project → check Sentry
|
||||
org → see a project named `vibn-{ws}-{slug}` → check Coolify env on
|
||||
that app → see DSN populated.
|
||||
|
||||
---
|
||||
|
||||
## Stage 2 — Bake Sentry into every scaffold template (~2 hr)
|
||||
|
||||
**Goal:** every Next.js / Vite / etc. starter template Vibn ships
|
||||
already has Sentry wired up. User does nothing.
|
||||
|
||||
**What gets built:**
|
||||
|
||||
1. **For each scaffold template in `vibn-frontend/lib/scaffold/`**,
|
||||
add the same files we shipped today:
|
||||
- `instrumentation.ts`
|
||||
- `instrumentation-client.ts`
|
||||
- `app/global-error.tsx` (Next.js) / equivalent boundary (Vite)
|
||||
- `next.config.ts` wrapped with `withSentryConfig` (Next.js)
|
||||
- `vite.config.ts` with `sentryVitePlugin` (Vite)
|
||||
- `Dockerfile` ARG declarations for `NEXT_PUBLIC_SENTRY_DSN` +
|
||||
`SENTRY_AUTH_TOKEN`
|
||||
2. **Add `@sentry/nextjs` (or `@sentry/react` + `@sentry/vite-plugin`)
|
||||
to each template's `package.json` `dependencies`.**
|
||||
3. **Document in template README** that Sentry is pre-wired and the
|
||||
user doesn't need to do anything.
|
||||
|
||||
**Risk:** Sentry's wrapper sometimes interacts badly with custom
|
||||
build configs (e.g. monorepos, custom webpack rules). Mitigation:
|
||||
the `errorHandler` we set today (`console.warn` instead of throw)
|
||||
ensures source map upload failures don't break builds.
|
||||
|
||||
**Definition of done:** scaffold a fresh Next.js project from Vibn
|
||||
templates → deploy → throw a test error → see it in Sentry,
|
||||
de-minified.
|
||||
|
||||
---
|
||||
|
||||
## Stage 3 — Expose error feed to the AI as MCP tools (~2 hr)
|
||||
|
||||
**Goal:** the AI can ask Sentry "what's broken in project X?" and
|
||||
get a real answer.
|
||||
|
||||
**What gets built:**
|
||||
|
||||
Three new MCP tools in `vibn-frontend/lib/ai/vibn-tools.ts`:
|
||||
|
||||
1. **`project_recent_errors { projectId, since?, limit? }`**
|
||||
- Returns: `[{ id, title, count, lastSeen, culprit, level }]`
|
||||
- Default `since`: 24h. Default `limit`: 10.
|
||||
- Filters to unresolved issues only.
|
||||
- Implementation: read `sentryProjectSlug` off the project row,
|
||||
call Sentry's `GET /api/0/projects/{org}/{slug}/issues/`.
|
||||
|
||||
2. **`project_error_detail { projectId, issueId }`**
|
||||
- Returns: `{ stacktrace, breadcrumbs, request, user, replay_url }`
|
||||
- Implementation: Sentry's `GET /api/0/issues/{id}/events/latest/`.
|
||||
|
||||
3. **`project_error_resolve { projectId, issueId }`**
|
||||
- Side-effect: marks the issue resolved in Sentry.
|
||||
- Used by the AI after it ships a fix and confirms via tests.
|
||||
- Implementation: Sentry's `PUT /api/0/issues/{id}/` with
|
||||
`status: "resolved"`.
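
As a rough sketch of the `project_recent_errors` return shape, the function below maps raw Sentry issues to the list described above, applying the unresolved filter, the 24h default window, and the default limit of 10. The `SentryIssue` interface lists only the fields this sketch relies on, and `toRecentErrors` is a hypothetical name; the real tool may push some of this filtering into the Sentry query instead.

```typescript
// Subset of the fields on a Sentry issue that this sketch relies on.
interface SentryIssue {
  id: string;
  title: string;
  count: string; // event count, assumed to arrive as a string
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
  level: string;
  status: string;
}

interface RecentError {
  id: string;
  title: string;
  count: number;
  lastSeen: string;
  culprit: string;
  level: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keeps unresolved issues seen within `sinceMs`, newest first, capped at `limit`.
function toRecentErrors(
  issues: SentryIssue[],
  now: Date,
  sinceMs: number = DAY_MS,
  limit: number = 10,
): RecentError[] {
  const cutoff = now.getTime() - sinceMs;
  return issues
    .filter((i) => i.status === "unresolved" && Date.parse(i.lastSeen) >= cutoff)
    .sort((a, b) => Date.parse(b.lastSeen) - Date.parse(a.lastSeen))
    .slice(0, limit)
    .map((i) => ({
      id: i.id,
      title: i.title,
      count: Number(i.count),
      lastSeen: i.lastSeen,
      culprit: i.culprit,
      level: i.level,
    }));
}
```

Passing `now` in explicitly keeps the function pure, which makes the time window easy to unit-test.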
|
||||
|
||||
**Auth:** token storage is per-project (from Stage 1's `projects`
|
||||
row). Each project's AI sees only its own project's errors. No
|
||||
cross-project leakage.
|
||||
|
||||
**Definition of done:** in a Vibn chat for a project with known
|
||||
errors, ask the AI "any errors lately?" → AI calls
|
||||
`project_recent_errors` → shows real list.
|
||||
|
||||
---
|
||||
|
||||
## Stage 4 — Auto-surface unresolved errors at chat-turn start (~1 hr)
|
||||
|
||||
**Goal:** the AI doesn't wait to be asked. When the user opens a
|
||||
chat and there are unresolved errors, the AI mentions them on the
|
||||
first turn.
|
||||
|
||||
**What gets built:**
|
||||
|
||||
In `vibn-frontend/app/api/chat/route.ts`, at the start of each chat
|
||||
turn (before calling the model):
|
||||
|
||||
1. Call the same `project_recent_errors` logic Stage 3 exposed.
|
||||
2. If `count > 0`, prepend a synthetic system message:
|
||||
|
||||
```
|
||||
[PROJECT HEALTH]
|
||||
{N} unresolved Sentry issues in the last 24 hours:
|
||||
- {title} (×{count}, last seen {time}) — {culprit}
|
||||
- ...
|
||||
|
||||
If the user's first message is unrelated to these, you may still
|
||||
proactively mention them: "Quick FYI before we get into that —
|
||||
{X} has been failing for users."
|
||||
|
||||
If their message IS about a broken thing, prefer the matching
|
||||
Sentry issue's stack trace over guessing.
|
||||
```
|
||||
|
||||
3. Fire this at most once per N chat turns (configurable; the
|
||||
default is once per session opening) so we don't spam every turn.
|
||||
|
||||
**Risk:** false alarms (Sentry issue from yesterday's deploy that
|
||||
no one cares about anymore) make the AI annoying. Mitigation:
|
||||
tighten the `since` window to the last 6h, and only surface issues
|
||||
with `count >= 2` (one-off errors don't count).
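
A minimal sketch of how the synthetic message could be assembled with the mitigation applied (6h window, `count >= 2`). The names `HealthIssue` and `buildProjectHealthMessage` are hypothetical, the line format follows the template only loosely, and the trailing guidance paragraphs of the template are omitted for brevity.

```typescript
interface HealthIssue {
  title: string;
  count: number;
  lastSeen: string; // ISO 8601 timestamp
  culprit: string;
}

const HOUR_MS = 60 * 60 * 1000;

// Returns the synthetic system message, or null when nothing is worth surfacing.
function buildProjectHealthMessage(
  issues: HealthIssue[],
  now: Date,
  windowMs: number = 6 * HOUR_MS,
  minCount: number = 2,
): string | null {
  const cutoff = now.getTime() - windowMs;
  // Drop one-off errors and anything older than the window.
  const relevant = issues.filter(
    (i) => i.count >= minCount && Date.parse(i.lastSeen) >= cutoff,
  );
  if (relevant.length === 0) return null;

  const hours = Math.round(windowMs / HOUR_MS);
  const lines = relevant.map(
    (i) => `- ${i.title} (×${i.count}, last seen ${i.lastSeen}): ${i.culprit}`,
  );
  return [
    "[PROJECT HEALTH]",
    `${relevant.length} unresolved Sentry issues in the last ${hours} hours:`,
    ...lines,
  ].join("\n");
}
```

Returning `null` when no issue passes the filter lets the chat route skip the system message entirely instead of sending an empty health block.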
|
||||
|
||||
**Definition of done:** intentionally break a deployed user
|
||||
project, open chat, type "what's up?" → AI's first response
|
||||
mentions the issue, with file path.
|
||||
|
||||
---
|
||||
|
||||
## Out of scope for this proposal
|
||||
|
||||
- **User-owned Sentry orgs.** Some users will eventually want their
|
||||
own Sentry account, not the shared `vibnai` org. Ship-later;
|
||||
doesn't block the loop. Easy retrofit because storage is already
|
||||
per-project.
|
||||
- **Performance / Tracing data.** Sentry also captures spans /
|
||||
traces. Useful for "this endpoint is slow" but not the urgent
|
||||
product loop. Ship-later.
|
||||
- **Front-end UI for errors in Vibn.** A "Health" tab showing the
|
||||
Sentry feed in the Vibn UI is nice but not required for the AI
|
||||
loop to work. Ship-later.
|
||||
|
||||
---
|
||||
|
||||
## Recommendation
|
||||
|
||||
Add a **Phase 2.9 (Sentry-as-product loop)** to `BETA_LAUNCH_PLAN.md`
|
||||
covering Stages 1–4 as a single bundle. Estimate: **8 hr engineering**.
|
||||
|
||||
This is the second-highest-leverage item still ahead of beta,
|
||||
behind only the deploy-failed webhook (which is 30 min). Every
|
||||
hour spent here directly upgrades the value of every other beta
|
||||
test session that follows it.
|
||||
## Architecture
|
||||
- Sentry is automatically provisioned for every new project (`lib/integrations/sentry.ts`).
|
||||
- Environment variables (`NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`) are injected into the Coolify app.
|
||||
- The AI has access to the `project_recent_errors`, `project_error_detail`, and `project_error_resolve` MCP tools, which let it read, diagnose, and resolve exceptions through the Sentry API.
|
||||
- If unhandled exceptions are firing, the AI is prompted at the start of a conversation to address them (`app/api/chat/route.ts`).
|
||||
|
||||