From 057115a9fc4e93bd02688b068086446e6b7592e5 Mon Sep 17 00:00:00 2001 From: mawkone Date: Thu, 7 May 2026 15:07:31 -0700 Subject: [PATCH] docs: heavily compress and simplify remaining reference files to represent current state --- AI_CAPABILITIES.md | 926 +--------------------- BETA_LAUNCH_PLAN.md | 23 - docs/AGENT_TELEMETRY_STREAMING_PROJECT.md | 293 +------ docs/AI_CAPABILITIES_ROADMAP.md | 674 +--------------- docs/AI_HARNESS_GAPS.md | 231 +----- docs/AI_PATH_B_EXECUTION_PLAN.md | 294 +------ docs/PROJECT_PAGE_ARCHITECTURE.md | 280 +------ docs/SENTRY_AS_PRODUCT.md | 263 +----- 8 files changed, 58 insertions(+), 2926 deletions(-) diff --git a/AI_CAPABILITIES.md b/AI_CAPABILITIES.md index 66a86f8..85df5e7 100644 --- a/AI_CAPABILITIES.md +++ b/AI_CAPABILITIES.md @@ -1,904 +1,22 @@ -# Vibn AI Capabilities - -> The full set of actions an AI agent can take on behalf of a Vibn workspace, -> along with the REST endpoints, MCP tools, and safety rails that back them. -> -> **Audience:** agent authors, Cursor rule writers, MCP tool designers, and -> anyone building on the Vibn control plane. -> -> **Scope:** everything an agent sees through `https://vibnai.com/api/*` and -> the `/api/mcp` bridge. No Firestore, no internal agent orchestration — -> just the tenant-safe capability surface. - ---- - -## 1. Mental model - -Every capability in this document operates on a single **workspace**. 
A -workspace is Vibn's tenant boundary and maps 1:1 to: - -| Vibn concept | External identity | Example (`mark`) | -|---|---|---| -| Workspace | `vibn_workspaces.slug` | `mark` | -| Gitea org | `gitea_org` | `vibn-mark` | -| Gitea bot user | `gitea_bot_username` | `mark-bot` | -| SSH deploy keypair | `coolify_private_key_uuid` + `gitea_bot_ssh_key_id` | registered on both sides | -| Coolify project | `coolify_project_uuid` | `vibn-ws-mark` | -| Coolify environment | `coolify_environment_name` | `production` | -| Domain namespace | `*.{slug}.vibnai.com` | `*.mark.vibnai.com` | -| AI token | `vibn_sk_…` | one per agent/device | - -A single agent token can only act on the workspace it was minted for. Cross- -workspace access is structurally impossible — enforced in -[`lib/coolify.ts`](./vibn-frontend/lib/coolify.ts) by matching every Coolify -resource's `environment_id` against the workspace's project environments -(`ensureResourceInProject`). - -### The three views - -All capabilities roll up into three user-facing surfaces: - -- **Code** — every Gitea repo under `vibn-{slug}/`. -- **Live** — every Coolify app/database/service in `vibn-ws-{slug}`, each - reachable under `*.{slug}.vibnai.com`. -- **IDE** — Browser-based agent workspace sessions (outside the scope of this doc). - ---- - -## 2. Authentication - -Every agent-facing endpoint accepts **either**: - -- `Authorization: Bearer vibn_sk_` — a workspace-scoped API key - minted in the settings panel. Stored as a sha256 hash server-side; the - plaintext is shown exactly once on creation. Can be revoked at any time. -- A NextAuth session cookie — used for the dashboard UI and for browser - debugging. Not suitable for long-running agents. - -Helper: [`requireWorkspacePrincipal()`](./vibn-frontend/lib/auth/workspace-auth.ts) -resolves either to a `WorkspacePrincipal { workspace, user?, source }`. - -**403 on a tenant mismatch means:** the token is valid, but the resource -belongs to another workspace. 
The agent should stop and ask the user. - ---- - -## 3. MCP surface - -The MCP bridge lives at `POST https://vibnai.com/api/mcp`. It takes -JSON-over-HTTP bodies shaped like: - -```json -{ "tool": "", "params": { /* tool-specific */ } } -``` - -The Cursor / Claude Desktop config block is auto-generated in the settings -panel and looks like: - -```json -{ - "mcpServers": { - "vibn-mark": { - "url": "https://vibnai.com/api/mcp", - "headers": { "Authorization": "Bearer vibn_sk_…" } - } - } -} -``` - -`GET /api/mcp` returns a self-description with the current tool list. -Version: **2.1.0**. - -### 3.1 Workspace & identity tools - -| Tool | Purpose | Params | -|---|---|---| -| `workspace.describe` | Returns slug, Coolify project uuid, Gitea org, provision status. | — | -| `gitea.credentials` | Returns the bot's username, PAT, clone URL template, and SSH remote template. Use this for every `git clone`/push — never other credentials. | — | - -### 3.2 Project tools - -| Tool | Purpose | Params | -|---|---|---| -| `projects.list` | Lists Vibn projects (PRDs, imports, etc.) in the workspace. | — | -| `projects.get` | Single project details. | `{ projectId }` | - -### 3.3 Application tools - -| Tool | Purpose | Params | -|---|---|---| -| `apps.list` | All Coolify apps in the workspace. | — | -| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` | -| `apps.create` | Create a Coolify app. **Four pathways** — pick the one that matches your source. **(1) Gitea repo** (user's own code): pass `repo`. Clones over HTTPS+PAT; no SSH. **(2) Docker image** (pre-built single-container third-party app, e.g. `nginx:alpine`): pass `image`. **(3) Inline Docker Compose YAML** (custom multi-service stack): pass `composeRaw`. **(4) Coolify one-click template** (RECOMMENDED for popular apps — Twenty, n8n, Supabase, Ghost, etc): pass `template` with a slug from `apps.templates.search`. Templates have battle-tested env defaults, healthchecks, and `depends_on` graphs. 
**Use pathway 4 over pathway 3 whenever a template exists** — it is dramatically more reliable. Auto-domain `{name}.{slug}.vibnai.com` for all pathways. | **(1) repo:** `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` **(2) image:** `{ image, name?, ports?, domain?, envs?, instantDeploy? }` **(3) composeRaw:** `{ composeRaw, name?, domain?, envs?, instantDeploy? }` **(4) template:** `{ template, name?, domain?, envs?, instantDeploy? }` | -| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` | -| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted | -| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name | -| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` | -| `apps.deployments` | List recent deployments + status. | `{ uuid }` | -| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 | -| `apps.volumes.list` | List Docker volumes belonging to an app (name + size in bytes). 
Use before `apps.volumes.wipe` to know exact volume names. | `{ uuid }` | -| `apps.volumes.wipe` | **Destructive / irreversible.** Stop all app containers, remove a specific volume, leave it ready for a fresh `apps.deploy`. Use to recover from stale DB state on first boot (the most common compose app failure). `confirm` must equal the exact volume name. | `{ uuid, volume, confirm }` | -| `apps.containers.up` | Run `docker compose up -d` directly on the Coolify host for a compose app or service. Bypasses Coolify's queued-start worker (which routinely fails to actually invoke compose). Use after env or domain changes to recreate containers, or as a recovery path when `apps.create`/`apps.deploy` returned `started: false`. Idempotent — already-running containers are no-op'd. Up to 10 min timeout. Returns `{ ok, code, stdout, stderr, durationMs }`. | `{ uuid }` | -| `apps.containers.ps` | `docker compose ps -a` against the rendered compose dir. Quick diagnostic for "why isn't my stack running?" — distinguishes `Created` (queued-start failure → use `apps.containers.up`), `Exited` (app crash → use `apps.logs`), `Restarting` (boot loop → use `apps.logs`), and `Up healthy/unhealthy`. | `{ uuid }` | -| `apps.templates.list` | Browse the full Coolify one-click template catalog (320+ vetted apps: CRMs, AI tools, CMSes, dashboards, databases, …). Each entry is deployable via `apps.create({ template: })`. Returns `{ total, offset, limit, items: [{ slug, slogan, tags, port, documentation, logo }] }`. Catalog is fetched from upstream and cached for 1h. | `{ limit?, offset?, tag? }` — `limit` default 50, max 500; `tag` substring filter (e.g. `"crm"`, `"ai"`) | -| `apps.templates.search` | Find templates by name, tag, or slogan. Ranked: exact-slug > slug-starts-with > slug-contains > tag-exact > tag-contains > slogan. Use this **before** `apps.create` to discover the right slug (e.g. `"twenty"`, `"n8n-with-postgres-and-worker"`, `"forgejo-with-postgresql"`). | `{ query, tag?, limit? 
}` — `limit` default 25, max 100. Either `query` or `tag` must be set | -| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` | -| `apps.domains.list` | Current domain set. | `{ uuid }` | -| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` | -| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` | -| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` | -| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` | - -### 3.4 Database tools - -| Tool | Purpose | Params | -|---|---|---| -| `databases.list` | All databases in the workspace, across all flavors. | — | -| `databases.create` | Provision a database. Supported `type`: `postgresql`, `mysql`, `mariadb`, `mongodb`, `redis`, `keydb`, `dragonfly`, `clickhouse`. | `{ type, name?, isPublic?, publicPort?, image?, credentials?, limits? }` | -| `databases.get` | Details + internal connection URL. | `{ uuid }` | -| `databases.update` | PATCH name, public visibility, image, limits. | `{ uuid, patch }` | -| `databases.delete` | Destroy the database. 
Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the db's exact name | - -### 3.5 Auth provider tools - -Authentication is a first-class capability. An agent cannot spin up arbitrary -Coolify services — only vetted auth providers from an allowlist. - -| Tool | Purpose | Params | -|---|---|---| -| `auth.list` | Auth providers currently deployed in the workspace (classified by Coolify's `service_type`). | — | -| `auth.create` | Provision one of the allowed providers. | `{ provider, name?, description?, instantDeploy? }` | -| `auth.delete` | Destroy an auth provider. Volumes (user data) kept by default. | `{ uuid, confirm }` — `confirm` must equal the service's exact name | - -**Allowed providers** (keys passed as `provider`): - -- `pocketbase` — lightweight (SQLite) auth + data, single container. -- `authentik` — feature-rich self-hosted IDP. -- `keycloak` / `keycloak-with-postgres` — industry-standard OIDC/SAML. -- `pocket-id` / `pocket-id-with-postgresql` — passkey-first OIDC. -- `logto` — dev-first IDP. -- `supertokens-with-postgresql` — session/auth backend. - -Requesting anything outside this list returns 400 with a hint listing the -allowed ones, so the agent can self-correct. - -### 3.6 Domain tools (P5.1 — custom apex domains) - -Custom apex domains are owned end-to-end by Vibn: the registrar is OpenSRS -(Tucows), authoritative DNS is Google Cloud DNS in the Canadian project, and -domains are pinned to the workspace that registered them. All four lifecycle -steps — search, register, attach, inspect — are agent-callable. - -| Tool | Purpose | Params | -|---|---|---| -| `domains.search` | Check availability + price for one or more candidate apex domains via OpenSRS. Stateless; does not reserve anything. | `{ names: string[], period?: number }` — `names` up to 25, `period` in years (auto-bumped for quirky TLDs like `.ai` which requires 2y minimum). | -| `domains.register` | Register a domain through OpenSRS. 
Registers unlocked; locking happens automatically after `domains.attach` completes. Idempotent per `(workspace, domain)`. | `{ domain, period?, whoisPrivacy?, contact, nameservers?, ca?: { cprCategory, legalType } }` — `ca.*` required for `.ca`. | -| `domains.list` | List all domains owned by the workspace with their status, registrar order id, expiry, and DNS provider/zone. | — | -| `domains.get` | Full record + last 20 lifecycle events. | `{ domain }` | -| `domains.attach` | Wire a registered domain to a Coolify app (or arbitrary IP/CNAME): create Cloud DNS zone, write A/CNAME rrsets, update registrar-side nameservers, append FQDNs to the Coolify app's domain list. Idempotent; safe to retry. | `{ domain, appUuid? \| ip? \| cname?, subdomains?: string[] (default ["@","www"]), updateRegistrarNs? }` | - -### Object storage (GCS via S3-compatible HMAC) - -Every workspace gets a Canada-hosted GCS bucket, a dedicated service -account, and an HMAC keypair so agent-built apps can use any AWS S3 -SDK. The HMAC *secret* is never returned through the API — it's written -directly into Coolify apps via `storage.inject_env`. - -| Tool | Purpose | Params | -|---|---|---| -| `storage.describe` | Report the workspace bucket name, region, S3 endpoint, access-key id, and provision status. No secret returned. | — | -| `storage.provision` | Idempotently create/reconcile the workspace's GCP service account, JSON keyfile, bucket (`vibn-ws-{slug}-{rand}`), IAM binding, and HMAC key. Safe to re-run. | — | -| `storage.inject_env` | Push `STORAGE_*` env vars (endpoint, region, bucket, access key id, secret access key, force_path_style) into a Coolify app. The secret is written server-side with `is_shown_once=true`; it never transits the response body. | `{ uuid, prefix? 
}` — `prefix` defaults to `STORAGE_`; use `S3_` for apps that expect AWS-standard names | - -The bucket is S3-compatible: point any `aws-sdk` / `@aws-sdk/client-s3` -/ `boto3` at `STORAGE_ENDPOINT` with `force_path_style=true` (`STORAGE_*` -env vars are set by `storage.inject_env`). - -**Residency note:** Cloud DNS is global anycast — configuration is not -Canadian-pinned at the storage layer. The workspace-level `dns_provider` -flag (default `cloud_dns`) will let us swap in CIRA D-Zone for strict -Canadian residency without touching the MCP surface. - -**Billing:** Every successful `domains.register` writes a `debit` row to -`vibn_billing_ledger` with the OpenSRS order id as `ref_id`. The -`vibn_domain_events` table keeps an append-only audit of every lifecycle -call (`register.attempt`, `register.success`, `register.failed`, -`attach.success`). - -**Verified end-to-end (2026-04-22)** against PROD GCP + OpenSRS sandbox + -PROD Coolify (Coolify `v4.0.0-beta.473`); see -`vibn-frontend/scripts/smoke-attach-e2e.ts`. **All 5 sub-systems green.** - -- ✓ OpenSRS register against Horizon (sandbox) returns order id, response 200. -- ✓ Cloud DNS managed zone created in `master-ai-484822` with public anycast NS. -- ✓ A records (`@`, `www`) written to the zone. -- ✓ Registrar-side nameserver update accepts Cloud DNS NS values - (trailing-dot normalization in `lib/opensrs.ts`); sandbox returns 480 - because its mock registry doesn't know real Google NS hosts, which is - expected — live mode talks to real registries that accept any resolvable NS. -- ✓ Unlock → update NS → relock fallback path verified (sandbox-recognized - nameservers return 200; the unlock/relock sequence is exercised when the - registry returns 405 lock-conflict). -- ✓ Coolify domain-list PATCH adds the apex + `www` to the application - `fqdn` column and the smoke test re-fetches it to confirm. 
- -> **Operational gotcha — the destination server must be proxy-enabled.** -> Coolify's `update_by_uuid` controller accepts `domains` as a comma-separated -> list and only maps it onto the model's `fqdn` column when the destination -> server's `Server::isProxyShouldRun()` returns `true`. That helper requires -> **both** `proxy.type ∈ {TRAEFIK, CADDY}` *and* `is_build_server = false`. -> If either is misconfigured the PATCH returns 200 but the field is silently -> dropped (Laravel mass-assignment ignores `domains` because it isn't in -> `$fillable`, and the controller never copies it into `fqdn`). We hit this -> on `coolify-server-mtl` (`zg4cwgc44ogc08804000gggo`), which had -> `proxy=null` and `is_build_server=true`. Fixed by: -> -> ```sql -> UPDATE servers -> SET proxy = jsonb_set(coalesce(proxy,'{}'::jsonb), '{type}', '"TRAEFIK"') -> WHERE uuid = 'zg4cwgc44ogc08804000gggo'; -> UPDATE server_settings -> SET is_build_server = false -> WHERE server_id = (SELECT id FROM servers WHERE uuid = 'zg4cwgc44ogc08804000gggo'); -> ``` -> -> followed by `docker restart coolify` to clear Laravel's in-memory config. -> Sending `fqdn` directly is **not** an alternative — the controller's -> `$allowedFields` whitelist rejects it with 422 "This field is not allowed." - -### 3.7 Agent-side stdio MCP servers (`vibn-agent-runner`) - -Separate from the control-plane MCP at `/api/mcp` (which is what external -agents call *into* Vibn), the `vibn-agent-runner` exposes its own in-house -tool surface *outward* over stdio MCP. This lets Cursor, Claude Desktop, -Goose, or any MCP-speaking client drive the same Coolify / Gitea / workspace -tooling the Coder/PM/Marketing sub-agents use internally — with the same -protected-repo and protected-app guardrails enforced centrally. 
- -Architecture: every tool now has three touch-points backed by one source of truth: - -``` -vibn-agent-runner/src/tools/-api.ts ← pure, config-agnostic logic + security guards -vibn-agent-runner/src/tools/.ts ← thin registerTool() wrappers for the in-process agent loop -vibn-agent-runner/src/mcp/-server.ts ← stdio MCP server for external clients -``` - -| Server | Tools | Required env | -|---|---|---| -| `vibn-coolify-mcp` | 7 — list_projects, list_applications, deploy, get_logs, list_all_apps, get_app_status, deploy_app | `COOLIFY_API_URL`, `COOLIFY_API_TOKEN` | -| `vibn-gitea-mcp` | 6 — create/list/close issues, list_repos, list_all_issues, read_repo_file | `GITEA_API_URL`, `GITEA_API_TOKEN`, `GITEA_USERNAME` | -| `vibn-workspace-mcp` | 8 — read/write/replace/list/find/search_code, execute_command, git_commit_and_push | `WORKSPACE_ROOT` (+ Gitea creds for git push) | -| `vibn-platform-mcp` | 7 — save_memory, list_memory, list_skills, get_skill, finalize_prd, get_prd, web_search | `SESSION_KEY` (optional), Gitea creds (for skills) | -| `vibn-agent-mcp` | 2 — spawn_agent, get_job_status (dispatches into the runner's HTTP API) | `AGENT_RUNNER_URL` (defaults to `http://localhost:3333`) | - -Run locally with `npm run mcp:` (or `:dev` via ts-node) in -`vibn-agent-runner/`. Smoke-test any server with -`node scripts/smoke-mcp.js `. The in-process agent loop still sees -the same 28 registered tools — no behavioral regression. - ---- - -## 4. REST surface - -Every MCP tool is also exposed as a plain HTTP endpoint under -`/api/workspaces/{slug}/…`. Agents that prefer curl-style access can use -these directly; the shape is identical to the MCP `params`. Auth is the -same bearer header. - -### 4.1 Workspace & key management - -| Method | Path | Description | -|---|---|---| -| GET | `/api/workspaces` | All workspaces the principal has access to. | -| GET | `/api/workspaces/{slug}` | Workspace details. 
| -| POST | `/api/workspaces/{slug}/provision` | Idempotent re-run of Gitea org + bot + SSH keypair + Coolify project setup. | -| GET | `/api/workspaces/{slug}/keys` | List API keys (metadata only). | -| POST | `/api/workspaces/{slug}/keys` | Mint a new API key. Full token returned once. | -| DELETE | `/api/workspaces/{slug}/keys/{keyId}` | Revoke a key. | -| GET | `/api/workspaces/{slug}/gitea-credentials` | Return bot username, PAT (decrypted), clone/SSH templates. | -| GET | `/api/workspaces/{slug}/bootstrap.sh` | Shell script that writes `.cursor/rules`, `.cursor/mcp.json`, `.env.local` into the cwd. | - -### 4.2 Applications - -| Method | Path | Description | -|---|---|---| -| GET | `/api/workspaces/{slug}/apps` | List apps. | -| POST | `/api/workspaces/{slug}/apps` | Create an app from a workspace repo. | -| GET | `/api/workspaces/{slug}/apps/{uuid}` | App details. | -| PATCH | `/api/workspaces/{slug}/apps/{uuid}` | Update whitelisted fields. | -| DELETE | `/api/workspaces/{slug}/apps/{uuid}?confirm=` | Destroy app. | -| POST | `/api/workspaces/{slug}/apps/{uuid}/deploy` | Trigger deploy. | -| GET | `/api/workspaces/{slug}/apps/{uuid}/deployments` | List deployments. | -| GET | `/api/workspaces/{slug}/apps/{uuid}/domains` | List domains. | -| PATCH | `/api/workspaces/{slug}/apps/{uuid}/domains` | Replace domain set. | -| GET | `/api/workspaces/{slug}/apps/{uuid}/envs` | List env vars. | -| PATCH | `/api/workspaces/{slug}/apps/{uuid}/envs` | Upsert env var(s). | -| DELETE | `/api/workspaces/{slug}/apps/{uuid}/envs?key=FOO` | Delete env var. | -| GET | `/api/workspaces/{slug}/deployments/{deploymentUuid}/logs` | Deployment logs. | - -### 4.3 Databases - -| Method | Path | Description | -|---|---|---| -| GET | `/api/workspaces/{slug}/databases` | List databases. | -| POST | `/api/workspaces/{slug}/databases` | Create a database (8 flavors). | -| GET | `/api/workspaces/{slug}/databases/{uuid}` | Database details + internal connection URL. 
| -| PATCH | `/api/workspaces/{slug}/databases/{uuid}` | Update fields. | -| DELETE | `/api/workspaces/{slug}/databases/{uuid}?confirm=` | Destroy database. | - -### 4.4 Auth providers - -| Method | Path | Description | -|---|---|---| -| GET | `/api/workspaces/{slug}/auth` | List deployed auth providers + the allowlist. | -| POST | `/api/workspaces/{slug}/auth` | Provision a provider from the allowlist. | -| GET | `/api/workspaces/{slug}/auth/{uuid}` | Provider details. | -| DELETE | `/api/workspaces/{slug}/auth/{uuid}?confirm=` | Destroy provider. | - -### 4.5 Domains (P5.1) - -| Method | Path | Description | -|---|---|---| -| POST | `/api/workspaces/{slug}/domains/search` | Availability + pricing for up to 25 candidate names. | -| GET | `/api/workspaces/{slug}/domains` | List workspace-owned domains. | -| POST | `/api/workspaces/{slug}/domains` | Register a domain (idempotent per `(workspace, domain)`). | -| GET | `/api/workspaces/{slug}/domains/{domain}` | Full record + last 20 events. | -| POST | `/api/workspaces/{slug}/domains/{domain}/attach` | Create Cloud DNS zone, write records, update registrar NS, wire Coolify domain list. | - ---- - -## 5. Gitea surface - -AI agents **never** talk to the root Gitea admin token. They use the -workspace's dedicated bot user. - -### 5.1 What the bot can do - -- Fully own the `vibn-{slug}` org (added as the org's owner team). -- Read/write every repo in that org via its PAT. -- Push over SSH using the workspace's ed25519 deploy key (same keypair - Coolify uses to pull code). -- What it **cannot** do: touch any other org, the root admin surface, or - Gitea's `/admin/*` endpoints. 
- -### 5.2 How to get the bot credentials - -```http -GET /api/workspaces/{slug}/gitea-credentials -Authorization: Bearer vibn_sk_… -``` - -Returns: - -```json -{ - "bot": { "username": "mark-bot", "token": "…" }, - "gitea": { - "apiBase": "https://git.vibnai.com/api/v1", - "host": "git.vibnai.com", - "cloneUrlTemplate": "https://mark-bot:{{token}}@git.vibnai.com/vibn-mark/{{repo}}.git", - "sshRemoteTemplate": "git@git.vibnai.com:vibn-mark/{{repo}}.git", - "webUrlTemplate": "https://git.vibnai.com/vibn-mark/{{repo}}" - }, - "workspace": { "slug": "mark", "giteaOrg": "vibn-mark" } -} -``` - -The PAT is stored **encrypted at rest** using AES-256-GCM with the -`VIBN_SECRETS_KEY` server secret; the decrypt step runs only on this endpoint. - -### 5.3 Gitea operations via the standard Gitea API - -Once the agent has `{bot.token, gitea.apiBase}`, it can call any standard -Gitea v1 endpoint as the bot, scoped to the workspace org. Common ones: - -- `POST /orgs/{org}/repos` — create a repo. -- `PATCH /repos/{org}/{repo}` — update repo settings. -- `GET /repos/{org}/{repo}/contents/{path}` — read files. -- `PUT /repos/{org}/{repo}/contents/{path}` — write files (commits). -- `POST /repos/{org}/{repo}/pulls` — open PRs. -- `POST /repos/{org}/{repo}/branches` — create branches. - ---- - -## 6. Domain policy - -Every app gets an auto-generated domain under the workspace's namespace: - -``` -{app-slug}.{workspace-slug}.vibnai.com -``` - -For example, creating an app named `my-api` in workspace `mark` yields -`my-api.mark.vibnai.com` automatically — no DNS config, no cert work, -served by Coolify's wildcard Traefik. - -### 6.1 What agents can do - -- Accept the auto-generated domain (default path). -- Replace the domain set via `PATCH /apps/{uuid}/domains`, provided every - entry ends with `.{workspace-slug}.vibnai.com`. - -### 6.2 What agents cannot do - -- Point an app at a domain outside the workspace's namespace. 
The server - rejects this with 403 regardless of DNS state: - - ```json - { "error": "Domain evil.com is not allowed; must end with .mark.vibnai.com", - "hint": "Use my-api.mark.vibnai.com" } - ``` - -This is enforced by `isDomainUnderWorkspace()` in -[`lib/naming.ts`](./vibn-frontend/lib/naming.ts). - -### 6.3 Custom (external) domains - -Not exposed to AI agents. A human can still add them through Coolify -directly or through a future human-gated UI. - ---- - -## 7. Safety model - -### 7.1 Tenant enforcement - -Every resource-returning helper in `lib/coolify.ts` runs through -`ensureResourceInProject()`. It: - -1. Trusts an explicit `project_uuid` on the resource if present, else -2. Fetches the project's environment ids via `GET /projects/{uuid}` and - verifies the resource's `environment_id` is in that set. - -A token for `mark` that tries to read an app in `justine`'s project returns: - -```json -{ "error": "Application does not belong to project " } -``` - -with HTTP 403. Cross-workspace enumeration and access are not just -discouraged — they fail at the helper level. - -### 7.2 Destructive operations - -Every delete endpoint requires `?confirm=`: - -``` -DELETE /apps/{uuid} → 409 "confirmation required" -DELETE /apps/{uuid}?confirm=wrong → 409 "confirmation required" -DELETE /apps/{uuid}?confirm=my-api → 200 deleted -``` - -This means an agent hallucinating a delete call cannot cost you the -resource — it must first know the exact name, which implies it just listed -or just created it. - -**Volumes are kept by default** on delete. To also remove volumes, pass -`?volumes=delete` (apps/dbs) — this is opt-in, per-call, never the default. - -### 7.3 Creation guardrails - -- Apps can only be created from repos in the workspace's Gitea org. -- Auth providers can only be created from the allowlist (see §3.5). -- Database flavors are restricted to the 8 Coolify supports. -- Env var keys must match `/^[A-Z_][A-Z0-9_]*$/` (no shell-escape tricks). 
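The env-var key rule in the list above, together with the namespace rule from the domain policy section, can be pre-checked on the agent side before a call is made. The following is a minimal sketch, not the server's implementation: the function names `isValidEnvKey` and `isDomainInNamespace` are hypothetical, and the exact-match, case-sensitive suffix comparison is an assumption. Only the regex and the `.{slug}.vibnai.com` suffix come from the text, and the server remains the source of truth.

```typescript
// Hypothetical client-side pre-checks mirroring two documented guardrails.
// They only avoid obviously bad calls; the server still enforces the rules.

// Regex quoted in the creation guardrails for env var keys.
const ENV_KEY_PATTERN = /^[A-Z_][A-Z0-9_]*$/;

function isValidEnvKey(key: string): boolean {
  return ENV_KEY_PATTERN.test(key);
}

// Domain policy: every entry must end with ".{slug}.vibnai.com".
// Assumption: comparison is exact and case-sensitive.
function isDomainInNamespace(domain: string, slug: string): boolean {
  const suffix = `.${slug}.vibnai.com`;
  // Require at least one label in front of the suffix.
  return domain.endsWith(suffix) && domain.length > suffix.length;
}
```

A caller might run both checks before `apps.envs.upsert` or `apps.domains.set` and skip the request when either returns `false`, rather than relying on a 400 or 403 round trip.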
- -### 7.4 Secrets handling - -- `VIBN_API_KEY` is only shown **once** on mint. Server keeps a sha256 hash. -- Gitea bot PATs are **encrypted at rest** (AES-256-GCM with - `VIBN_SECRETS_KEY`). -- The SSH private key is held by Coolify, not by Vibn; the public key is - pushed to the Gitea bot user's key list. Rotating is a re-provision. -- Agent prompts and Cursor rules include a "treat VIBN_API_KEY like a - password — never print or commit it" directive. - ---- - -## 8. Worked examples - -### 8.1 "Build me a Next.js app with a Postgres and Pocketbase auth" - -From the agent's side, using MCP: - -```json -// 1. Ensure a repo exists in the workspace org (standard Gitea API, -// using the bot PAT from gitea.credentials). -POST https://git.vibnai.com/api/v1/orgs/vibn-mark/repos -{ "name": "my-site", "private": true, "auto_init": true } - -// 2. Create the Coolify app. Auto-domain my-site.mark.vibnai.com. -{ "tool": "apps.create", - "params": { "repo": "my-site", "ports": "3000", "instantDeploy": false } } - -// 3. Provision a Postgres. -{ "tool": "databases.create", - "params": { "type": "postgresql", "name": "app-db" } } -// → returns { internalUrl: "postgres://…@:5432/postgres" } - -// 4. Wire the db URL into the app as an env var. -{ "tool": "apps.envs.upsert", - "params": { "uuid": "", "key": "DATABASE_URL", - "value": "" } } - -// 5. Deploy Pocketbase as the auth layer. -{ "tool": "auth.create", - "params": { "provider": "pocketbase", "name": "auth" } } - -// 6. First real deploy. -{ "tool": "apps.deploy", "params": { "uuid": "" } } - -// 7. Poll. -{ "tool": "apps.deployments", "params": { "uuid": "" } } -// → [{ uuid, status: "finished" | "in_progress" | "failed" | "queued" }] -``` - -The agent hands the user back `https://my-site.mark.vibnai.com`. 
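Each MCP step in the walkthrough above has the same wire shape: a `POST` to `https://vibnai.com/api/mcp` with a `{ tool, params }` JSON body and the bearer header. A minimal sketch of building that request is shown here. The helper name `buildMcpRequest`, the `McpRequest` type, and the placeholder uuid and key are hypothetical, and the `Content-Type` header is an assumption based on the bridge being described as JSON-over-HTTP.

```typescript
// Hypothetical request builder for the MCP bridge; it performs no network I/O.
interface McpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpRequest(
  tool: string,
  params: Record<string, unknown>,
  apiKey: string,
): McpRequest {
  return {
    url: "https://vibnai.com/api/mcp",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      // Assumption: the bridge expects a JSON content type.
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tool, params }),
  };
}

// Placeholder values only; a real call needs a real app uuid and key.
const deployRequest = buildMcpRequest(
  "apps.deploy",
  { uuid: "app-uuid-placeholder" },
  "vibn_sk_placeholder",
);
```

The resulting object can be handed to any HTTP client. Keeping the builder free of I/O makes it easy to log or unit-test the exact payload without exposing the key to a live endpoint.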
- -### 8.2 "Add an `api` subdomain to my app" - -```json -{ "tool": "apps.domains.set", - "params": { - "uuid": "", - "domains": ["my-site.mark.vibnai.com", "api.mark.vibnai.com"] - } } -``` - -Valid — both end with `.mark.vibnai.com`. `evil.com` or `my-site.justine.vibnai.com` -would return 403. - -### 8.3 "Delete the whole thing" - -Agent must learn the resource names first (or it'll hit the confirm gate): - -```json -// Learn the name. -{ "tool": "apps.get", "params": { "uuid": "" } } -// → { name: "my-site", ... } - -// Delete with matching confirm. -{ "tool": "apps.delete", - "params": { "uuid": "", "confirm": "my-site" } } -``` - -Wrong confirm returns `409 "Confirmation required"`. - ---- - -## 9. Error handling reference - -| Status | Meaning | What the agent should do | -|---|---|---| -| 400 | Bad request body (invalid JSON, missing required field, invalid type). | Fix the body, retry. | -| 401 | No / bad bearer token. | Ask the user to mint a fresh key. | -| 403 | **Tenant mismatch** — resource belongs to another workspace, domain outside workspace namespace, or repo not in workspace org. | **Stop.** Do not retry with guessed values. Ask the user. | -| 404 | Resource not found (app/db/service/repo uuid wrong). | Re-list to find the right uuid. | -| 409 | Delete confirmation missing or wrong. | Fetch the resource name first, then retry with `confirm=`. | -| 422 | Coolify validation failure (e.g. malformed domain). | Check the `details` field. | -| 502 | Upstream Coolify/Gitea error. | Retry with backoff. | -| 503 | Workspace not fully provisioned yet. | Call `POST /provision`, then retry. | - ---- - -## 10. Versioning - -The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names -are append-only within a major version — agents can cache the tool list -safely for the duration of a conversation but should re-fetch on 404. - -Current version: **2.4.8**. - -- **1.x** — session-cookie-only MCP, no tenant keys. 
-- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project. -- **2.1** — create/update/delete for apps, 8 database flavors, auth - provider allowlist, domain policy enforcement, confirm-gated deletes. -- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware - domain routing, runtime log tailing (`apps.logs`), in-container command - execution (`apps.exec`), and diagnostic `apps.update` responses. -- **2.3** — `apps.create` Docker-image and inline-composeRaw pathways (no - Gitea repo required for third-party apps), `apps.volumes.list` + - `apps.volumes.wipe` for self-service volume recovery. -- **2.4** — `apps.create` Coolify-template pathway (`{ template: "twenty" }` - etc.) for one-click deploy of 320+ vetted apps, plus `apps.templates.list` - / `apps.templates.search` for catalog discovery. -- **2.4.1** — `apps.containers.up` / `apps.containers.ps` to bypass Coolify's - unreliable queued-start worker. `apps.create` (template + composeRaw - pathways) now auto-falls-back to direct `docker compose up -d` over SSH - when Coolify's queue stalls, so a single `apps.create` call really does - leave a running stack. -- **2.4.2** — `apps.create` no longer reports `started: false` when only a - sidecar (worker / scheduler) failed its `depends_on: service_healthy` - gate. We now probe the host with `docker ps` after `compose up -d` and - return `started: true` whenever any container of the stack is running, - surfacing the compose stderr in `startDiag` so agents can decide whether - to re-run `apps.containers.up` later. This matches the real-world - behavior of slow-booting apps like Twenty (worker waits ~3 min for - twenty's healthcheck, exceeds compose's default depends_on timeout). -- **2.4.3** — Auto-attached stack containers to the `coolify` proxy network - after `compose up`, fixing Traefik 503s on third-party apps. 
-- **2.4.4** — Made the proxy-network attach selective (only `traefik.enable=true` - containers) to avoid DNS aliasing collisions where Twenty's `postgres` - hostname resolved to `coolify-db`. -- **2.4.5** — Architectural overhaul of `apps.create` for service templates. - We no longer run `docker compose up -d` over SSH as a deployment fallback - (that bypassed Coolify's compose generation, causing internal services to - land on the wrong networks). Instead `apps.create` now: - 1. Calls Coolify's `start` and lets its queue do the full deploy - (volumes, internal networking, env interpolation, healthchecks). - 2. Polls `service.applications[*].status` (the truthful per-app status - field — `service.status` itself routinely lies as - `starting:unknown` while containers are healthy). - 3. Applies three surgical post-deploy fixes that Coolify's own - pipeline omits but its REST API does not expose: - - rewrites `SERVICE_FQDN_*` / `SERVICE_URL_*` in the rendered - `.env` so frontends that bake their backend URL into the SPA - bundle (Twenty's `SERVER_URL`, etc.) point at the real - custom domain instead of the auto-generated sslip.io URL; - - injects the missing - `traefik.http.services..loadbalancer.server.port` label - (Coolify generates the routing rules but forgets the port, - so Traefik logs `error: port is missing` and returns 503); - - connects `coolify-proxy` to the project's Docker network - (Coolify writes a `caddy_ingress_network=` hint label - but never actually runs `docker network connect`), then - force-recreates ONLY the public-facing container so the new - env+label apply, and restarts the proxy so Traefik - re-discovers. - - The response shape gains: - - `reachable` — boolean, true when `https://` answers 2xx/3xx - - `appStatus` — the truthful per-application status from Coolify - - `postDeploy` — step-by-step diagnostic for each of the three fixes - The previous `started`/`startMethod`/`startDiag` fields are kept for - back-compat. 
Internal services (Postgres, Redis, worker) stay on - their isolated project network — fixing the `password authentication - failed` regression introduced in 2.4.4. -- **2.4.6** — Two fixes for transient Coolify queue lag observed in - 2.4.5: - - **Polling no longer false-fails on early `exited` status.** - Coolify's queue worker can take 60-120s to dequeue a `start` - request; during that window `service.applications[*].status` - returns the stale `exited` (= "never started") state. Previously - we treated that as terminal failure after 90s. Now we require - *evidence of activity* (`starting:*` or `running:*` was seen at - least once) before treating subsequent `exited` reports as - terminal. Until activity is observed, the loop just keeps polling - up to the 8-min health timeout. Eliminates the case where - `apps.create` returned `started: false` on a stack that was - actually about to come up healthy. - - **`apps.repair`** — new tool. Re-runs the three post-deploy - patches (env rewrite, port label, proxy network attach + recreate - + proxy restart) against an existing service without recreating - it. Useful when a deploy succeeded mechanically but ended up - serving Traefik 503 or Mixed Content errors, or whenever a user - rotates a custom domain. Params: `{ uuid, fqdn, publicAppName, - port? }`. Returns `{ reachable, postDeploy: { steps }, probe }`. -- **2.4.7** — `applyCoolifyPostDeployFixes` now schedules the - `coolify-proxy` restart (step 5) as a fire-and-forget background - job (`(sleep 3 && docker restart coolify-proxy) &`) instead of - blocking on it synchronously. The proxy restart kills any in-flight - TCP connection through the gateway — including the very request - that's running `apps.repair` / `apps.create` — so doing it inline - caused the agent to see a curl framing error (exit 16) right when - the work was in fact succeeding. Now the SSH command returns within - ~50ms, the HTTP response is delivered, and Traefik re-discovers - labels ~3s later. 
-- **2.4.8** — Massive simplification of post-deploy logic. Coolify's - template engine is fully capable of generating correct Traefik - labels and `SERVICE_FQDN_` / `SERVICE_URL_` env vars **if - the URL passed to `setServiceDomains` includes the upstream port** - (the "Required Port" hint in Coolify's UI: `https://crm.example.com:3000`, - not `https://crm.example.com`). 2.4.5–2.4.7 were missing that - detail, which is why they had to re-write the `.env` and inject - the loadbalancer port label as a workaround. - - In 2.4.8 `apps.create` reads `template.port` from the catalog and - passes `https://:` to `setServiceDomains`. Coolify then: - - generates `traefik.http.services..loadbalancer.server.port=` - automatically; - - rewrites `.env` so `SERVICE_FQDN_=` and - `SERVICE_URL_=https://` (no sslip.io leak); - - keeps `SERVICE_FQDN__` magic placeholders correctly - pointed at the user's host:port. - - All that's left is the one thing Coolify still skips: connecting - `coolify-proxy` to the resource's project Docker network. So - `applyCoolifyPostDeployFixes` is now ~30 lines (down from ~200) and - no longer SSH-runs an embedded Python script inside a - `python:3-alpine` container. The `CoolifyPostDeployResult.steps` - shape gains/keeps `proxyNetwork` + `proxyRestart` only; the old - `envRewrite` / `portLabel` / `recreate` step keys are removed. - `apps.repair` retains its API (`{ uuid, fqdn, publicAppName, port? }`) - but `port` is now informational only (not required for the helper - to function). - ---- - -## 11. Troubleshooting compose apps - -Most real-world app failures fall into a small number of patterns. The -recipes below are the canonical diagnostic flow for an agent operating -on behalf of a user. - -### 11.1 "Deployment succeeds but the app keeps restarting" - -Agents should NOT trust Coolify's deployment status alone. A successful -build + healthcheck-pending response usually means the containers came -up but the app logic is crashing. 
Investigate with: - -1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty - services indicate containers never ran) and per-service stderr. -2. If the logs show repeated DB errors like `relation "xxx" does not - exist` or `pq: no such table`, the app skipped its migration step. - This is common for Docker Compose apps whose `server` service only - runs migrations on a separate `worker` command. -3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty: - - ```json - { - "action": "apps.exec", - "params": { - "uuid": "", - "service": "server", - "command": "yarn command:prod database:migrate:prod", - "timeout_ms": 300000 - } - } - ``` - -4. Re-check logs — errors should be gone. Then `apps.deploy` (or just - wait for the next restart) and verify the container reports - `healthy`. - -### 11.2 "`apps.update` returned success but nothing changed" - -Check the `applied` / `ignored` / `rerouted` arrays in the response. -The most common reroutes: - -- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`. -- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with - the workspace's Gitea PAT embedded). -- `build_pack` — changing this mid-life for an existing app is not - supported. Recreate the app. - -### 11.3 "Compose app is up but the domain 502s" - -Coolify's API treats compose and single-container apps differently: -compose apps use `docker_compose_domains` (array of `{name, domain}`), -single-container apps use `domains` (comma-separated string). -`apps.domains.set` handles both, but if you're seeing a 502: - -1. `apps.domains.list { uuid }` — confirm the domain is actually - attached to a **service** (not just the app). -2. `apps.exec { uuid, service: "server", command: "nc -vz localhost " }` - — verify the upstream container is listening. -3. `apps.logs { uuid, service: "server", lines: 200 }` — look for - startup errors like `EADDRINUSE` or config failures. 
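The two payload shapes described in 11.3 can be sketched in TypeScript as follows. This is a minimal illustration, not the actual `apps.domains.set` implementation: only the field names `domains` and `docker_compose_domains` and the `{name, domain}` entry shape come from the text above. The helper name `buildDomainPayload`, the `kind` discriminator, and the default service name `server` are assumptions made for the example.

```typescript
// Hypothetical helper: field names follow section 11.3, everything else is assumed.
type ComposeDomain = { name: string; domain: string };

type DomainPayload =
  | { domains: string }
  | { docker_compose_domains: ComposeDomain[] };

function buildDomainPayload(
  kind: "single" | "compose",
  domains: string[],
  service: string = "server", // assumed compose service name
): DomainPayload {
  if (kind === "compose") {
    // Compose apps: one { name, domain } entry per domain, bound to a service.
    return {
      docker_compose_domains: domains.map((domain) => ({ name: service, domain })),
    };
  }
  // Single-container apps: a single comma-separated string.
  return { domains: domains.join(",") };
}
```

Seeing the shapes side by side shows why a domain can be attached to the app but not to a service: for a compose app, an entry without the right service `name` has nothing to route to.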
- -### 11.4 "Choosing the right `apps.create` pathway" - -| Situation | Use | -|---|---| -| User's own code lives in their Gitea org | `repo` (pathway 1) | -| Single-container third-party app (nginx, redis, a docker image) | `image` (pathway 2) | -| Custom multi-service stack (no upstream template exists) | `composeRaw` (pathway 3) | -| **Popular third-party app (Twenty, n8n, Supabase, Ghost, Wordpress, …)** | **`template` (pathway 4) — strongly preferred** | - -**Always check `apps.templates.search { query: "" }` first.** Coolify ships 320+ vetted one-click templates. Each one has tested env defaults, healthchecks, `depends_on` graphs, and the right volume mounts. The same app deployed via `composeRaw` will hit application-specific quirks (URL validation, DB bootstrap order, secret generation) that the template author already solved. - -**Never** create a Gitea repo just to host a third-party app's compose file. - -**Recipe — deploying any popular app in 3 calls:** - -```json -// 1. Find the right template slug -{ "action": "apps.templates.search", "params": { "query": "twenty" } } -// → { "items": [{ "slug": "twenty", "slogan": "Twenty is a CRM…", "tags": ["crm","self-hosted"], "port": 3000 }] } - -// 2. Deploy it -{ "action": "apps.create", "params": { "template": "twenty", "name": "crm" } } -// → { "uuid": "...", "domain": "crm..vibnai.com", "started": true, -// "note": "First boot may take 1-5 min while Coolify pulls images and runs migrations." } - -// 3. Watch it come up -{ "action": "apps.logs", "params": { "uuid": "...", "lines": 200 } } -``` - -For `composeRaw` (only when no template exists), fetch the app's official `docker-compose.yml` (from GitHub/DockerHub) and pass it inline. Override any hard-coded image tags with pinned versions for reproducibility. - -**Browsing the catalog** with `apps.templates.list { tag: "ai" }` returns all AI/ML templates; `{ tag: "crm" }` returns CRMs; etc. 
Useful when the user asks "what self-hosted analytics tools can I deploy?" or similar open-ended questions. - -### 11.5 "Compose app fails on second+ deploy — relation/table does not exist" - -Classic stale volume problem. Sequence of events: -1. First deploy: Postgres starts and auto-creates an empty `default` database (from `POSTGRES_DB` env var) -2. App server starts, tries to `CREATE DATABASE` or `DROP DATABASE` inside a transaction → Postgres rejects it -3. Deploy fails, containers stop — but the volume persists with the half-initialized DB -4. Second deploy: Postgres finds existing data, skips init — but schema is corrupt/incomplete -5. Server errors cascade forever - -**Fix:** - -```json -// Step 1: find the volume -{ "action": "apps.volumes.list", "params": { "uuid": "" } } -// → { "volumes": [{ "name": "abc123_db-data", "sizeBytes": 8192 }] } - -// Step 2: wipe it -{ "action": "apps.volumes.wipe", "params": { "uuid": "", "volume": "abc123_db-data", "confirm": "abc123_db-data" } } - -// Step 3: redeploy clean -{ "action": "apps.deploy", "params": { "uuid": "" } } -``` - -If Postgres still auto-creates the database before the app server runs migrations, use `apps.exec` to drop it outside a transaction: - -```json -{ "action": "apps.exec", "params": { "uuid": "", "service": "db", "command": "psql -U postgres -c 'DROP DATABASE IF EXISTS \"default\";'" } } -``` - -Then redeploy. - -### 11.7 "Healthcheck times out on first deploy" - -Docker Compose healthchecks have a `start_period` grace window. Apps -that run long-running migrations on first boot (Twenty, Directus, -older Strapi versions) need a `start_period` that covers the cold -start, typically 120–600s. - -- Fix at the compose level: edit the repo's `docker-compose.yml` to - set `healthcheck.start_period: 300s` on the affected service, commit, - push, `apps.deploy`. -- Alternatively, handle migrations out-of-band via `apps.exec` and let - the default healthcheck succeed instantly. 
- -### 11.8 "I can't tell what's inside the container" - -`apps.exec` is the escape hatch. Useful shell one-liners: - -| Goal | Command | -|---|---| -| List running processes | `ps -ef` | -| Show env vars | `env \| sort` | -| Check file exists | `ls -la /path/to/file` | -| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` | -| Tail an app's internal log | `tail -200 /var/log/app.log` | -| Run a framework CLI | `yarn