From 537e697dabef6137db48b1c6f8ec2c68bc684ca9 Mon Sep 17 00:00:00 2001
From: mawkone
Date: Thu, 23 Apr 2026 14:19:01 -0700
Subject: [PATCH] docs: apps.exec + compose troubleshooting appendix

Documents the new apps.exec tool alongside apps.logs and adds a
troubleshooting cookbook covering the common failure patterns we've
seen in the wild: missing migrations, silent apps.update reroutes,
502s on compose domains, healthcheck timeouts, and how to use
apps.exec as the platform's escape hatch for in-container inspection.

Bumps MCP version to 2.2.0 in the changelog and bumps the
vibn-frontend submodule to ship the apps.exec implementation.

Also includes setup-vibn-logs-user.sh (the script that installs the
locked-down SSH user on the Coolify host) which was already running
in production but not yet committed.

Made-with: Cursor
---
 .gitignore                      |   1 +
 AI_CAPABILITIES.md              | 124 ++++++++++++++++++++++++++++++--
 scripts/setup-vibn-logs-user.sh |  47 ++++++++++++
 vibn-frontend                   |   2 +-
 4 files changed, 169 insertions(+), 5 deletions(-)
 create mode 100644 scripts/setup-vibn-logs-user.sh

diff --git a/.gitignore b/.gitignore
index 0505140f..e5e3c204 100644
--- a/.gitignore
+++ b/.gitignore
@@ -29,3 +29,4 @@
 **/.next/
 **/.turbo/
 **/coverage/
+.secrets/
diff --git a/AI_CAPABILITIES.md b/AI_CAPABILITIES.md
index e0f639a4..fc100d51 100644
--- a/AI_CAPABILITIES.md
+++ b/AI_CAPABILITIES.md
@@ -110,13 +110,15 @@ Version: **2.1.0**.
 | `apps.list` | All Coolify apps in the workspace. | — |
 | `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
 | `apps.create` | Create a Coolify app from a Gitea repo in the workspace's org. Clones over **HTTPS with the workspace bot's PAT embedded in the URL** — SSH is not used because Gitea's SSH isn't reachable on the default port. Auto-domain `{name}.{slug}.vibnai.com`. | `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` |
-| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). | `{ uuid, patch }` |
+| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
 | `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted |
 | `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name |
 | `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
 | `apps.deployments` | List recent deployments + status. | `{ uuid }` |
+| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 |
+| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
 | `apps.domains.list` | Current domain set. | `{ uuid }` |
-| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. | `{ uuid, domains: string[] }` |
+| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
 | `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
 | `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
 | `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
@@ -579,22 +581,136 @@ The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names are
 append-only within a major version — agents can cache the tool list safely for
 the duration of a conversation but should re-fetch on 404.
 
-Current version: **2.1.0**.
+Current version: **2.2.0**.
 
 - **1.x** — session-cookie-only MCP, no tenant keys.
 - **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
 - **2.1** — create/update/delete for apps, 8 database flavors, auth provider
   allowlist, domain policy enforcement, confirm-gated deletes.
+- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
+  domain routing, runtime log tailing (`apps.logs`), in-container command
+  execution (`apps.exec`), and diagnostic `apps.update` responses.
 
 ---
 
-## 11. Where to look in the code
+## 11. Troubleshooting compose apps
+
+Most real-world app failures fall into a small number of patterns. The
+recipes below are the canonical diagnostic flow for an agent operating
+on behalf of a user.
+
+### 11.1 "Deployment succeeds but the app keeps restarting"
+
+Agents should NOT trust Coolify's deployment status alone. A successful
+build + healthcheck-pending response usually means the containers came
+up but the app logic is crashing. Investigate with:
+
+1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
+   services indicate containers never ran) and per-service stderr.
+2. If the logs show repeated DB errors like `relation "xxx" does not
+   exist` or `pq: no such table`, the app skipped its migration step.
+   This is common for Docker Compose apps whose `server` service only
+   runs migrations on a separate `worker` command.
+3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
+
+   ```json
+   {
+     "action": "apps.exec",
+     "params": {
+       "uuid": "",
+       "service": "server",
+       "command": "yarn command:prod database:migrate:prod",
+       "timeout_ms": 300000
+     }
+   }
+   ```
+
+4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
+   wait for the next restart) and verify the container reports
+   `healthy`.
+
+### 11.2 "`apps.update` returned success but nothing changed"
+
+Check the `applied` / `ignored` / `rerouted` arrays in the response.
+The most common reroutes:
+
+- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
+- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
+  the workspace's Gitea PAT embedded).
+- `build_pack` — changing this mid-life for an existing app is not
+  supported. Recreate the app.
+
+### 11.3 "Compose app is up but the domain 502s"
+
+Coolify's API treats compose and single-container apps differently:
+compose apps use `docker_compose_domains` (array of `{name, domain}`),
+single-container apps use `domains` (comma-separated string).
+`apps.domains.set` handles both, but if you're seeing a 502:
+
+1. `apps.domains.list { uuid }` — confirm the domain is actually
+   attached to a **service** (not just the app).
+2. `apps.exec { uuid, service: "server", command: "nc -vz localhost " }`
+   — verify the upstream container is listening.
+3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
+   startup errors like `EADDRINUSE` or config failures.
+
+### 11.4 "Healthcheck times out on first deploy"
+
+Docker Compose healthchecks have a `start_period` grace window. Apps
+that run long-running migrations on first boot (Twenty, Directus,
+older Strapi versions) need a `start_period` that covers the cold
+start, typically 120–600s.
+
+- Fix at the compose level: edit the repo's `docker-compose.yml` to
+  set `healthcheck.start_period: 300s` on the affected service, commit,
+  push, `apps.deploy`.
+- Alternatively, handle migrations out-of-band via `apps.exec` and let
+  the default healthcheck succeed instantly.
+
+### 11.5 "I can't tell what's inside the container"
+
+`apps.exec` is the escape hatch. Useful shell one-liners:
+
+| Goal | Command |
+|---|---|
+| List running processes | `ps -ef` |
+| Show env vars | `env \| sort` |
+| Check file exists | `ls -la /path/to/file` |
+| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` |
+| Tail an app's internal log | `tail -200 /var/log/app.log` |
+| Run a framework CLI | `yarn