docs: apps.exec + compose troubleshooting appendix

Documents the new apps.exec tool alongside apps.logs and adds a troubleshooting cookbook covering the common failure patterns we've seen in the wild: missing migrations, silent apps.update reroutes, 502s on compose domains, and healthcheck timeouts. It also shows how to use apps.exec as the platform's escape hatch for in-container inspection.

Bumps MCP version to 2.2.0 in the changelog and bumps the vibn-frontend submodule to ship the apps.exec implementation.

Also includes setup-vibn-logs-user.sh (the script that installs the locked-down SSH user on the Coolify host), which was already running in production but not yet committed.

Made-with: Cursor
@@ -110,13 +110,15 @@ Version: **2.1.0**.
| `apps.list` | All Coolify apps in the workspace. | — |
| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
| `apps.create` | Create a Coolify app from a Gitea repo in the workspace's org. Clones over **HTTPS with the workspace bot's PAT embedded in the URL** — SSH is not used because Gitea's SSH isn't reachable on the default port. Auto-domain `{name}.{slug}.vibnai.com`. | `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` |
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). | `{ uuid, patch }` |
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted |
| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name |
| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
| `apps.deployments` | List recent deployments + status. | `{ uuid }` |
| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 |
| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
| `apps.domains.list` | Current domain set. | `{ uuid }` |
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. | `{ uuid, domains: string[] }` |
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
@@ -579,22 +581,136 @@ The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
are append-only within a major version — agents can cache the tool list
safely for the duration of a conversation but should re-fetch on 404.

Current version: **2.1.0**.
Current version: **2.2.0**.

- **1.x** — session-cookie-only MCP, no tenant keys.
- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
- **2.1** — create/update/delete for apps, 8 database flavors, auth
  provider allowlist, domain policy enforcement, confirm-gated deletes.
- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
  domain routing, runtime log tailing (`apps.logs`), in-container command
  execution (`apps.exec`), and diagnostic `apps.update` responses.

---

## 11. Where to look in the code
## 11. Troubleshooting compose apps

Most real-world app failures fall into a small number of patterns. The
recipes below are the canonical diagnostic flow for an agent operating
on behalf of a user.

### 11.1 "Deployment succeeds but the app keeps restarting"

Agents should NOT trust Coolify's deployment status alone. A successful
build with a healthcheck-pending response usually means the containers
came up but the application itself is crashing. Investigate with:

1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
   services indicate containers never ran) and per-service stderr.
2. If the logs show repeated DB errors like `relation "xxx" does not
   exist` (Postgres) or `no such table` (SQLite), the app skipped its
   migration step. This is common for Docker Compose apps whose
   migrations run only through a separate `worker` command rather than
   in the `server` service.
3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:

   ```json
   {
     "action": "apps.exec",
     "params": {
       "uuid": "<app-uuid>",
       "service": "server",
       "command": "yarn command:prod database:migrate:prod",
       "timeout_ms": 300000
     }
   }
   ```

4. Re-check the logs; the errors should be gone. Then `apps.deploy` (or
   just wait for the next restart) and verify the container reports
   `healthy`.
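
Step 2 can be automated with a small filter over the log text returned by `apps.logs`. The following is a minimal sketch, not part of the platform: the helper name `findMissingSchemaErrors` and the sample log lines are hypothetical, and the two patterns are only the examples named in step 2.

```typescript
// Hypothetical helper: scans log text for signatures of a skipped migration.
// The patterns mirror the examples in step 2; extend them for other databases.
const MISSING_SCHEMA_PATTERNS: RegExp[] = [
  /relation "[^"]+" does not exist/i, // Postgres
  /no such table/i, // SQLite
];

function findMissingSchemaErrors(logText: string): string[] {
  // Keep only the lines that match at least one known signature.
  return logText
    .split("\n")
    .filter((line) => MISSING_SCHEMA_PATTERNS.some((re) => re.test(line)));
}

// Made-up sample input, not real platform output.
const sampleLog = [
  'server | ERROR: relation "workspace" does not exist',
  "server | listening on :3000",
  "worker | SqliteError: no such table: jobs",
].join("\n");

const schemaErrors = findMissingSchemaErrors(sampleLog);
console.log(schemaErrors);
```

A non-empty result is a hint to run the migration command from step 3 before redeploying.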

### 11.2 "`apps.update` returned success but nothing changed"

Check the `applied` / `ignored` / `rerouted` arrays in the response.
The fields that most often fail to persist:

- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
  the workspace's Gitea PAT embedded).
- `build_pack` — changing this mid-life for an existing app is not
  supported. Recreate the app.
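
An agent can turn those arrays into follow-up actions mechanically. The sketch below assumes a response shape: the text above only states that `applied`, `ignored`, and `rerouted` arrays exist, so the entry fields `field` and `useTool`, the type names, and `explainUpdate` are all hypothetical.

```typescript
// Assumed shape of a rerouted entry; the real field names may differ.
interface RerouteEntry {
  field: string;
  useTool: string;
}

// Assumed shape of the apps.update response, based on the three arrays
// named in the text.
interface UpdateResult {
  applied: string[];
  ignored: string[];
  rerouted: RerouteEntry[];
}

// Produces one human-readable note per field that did not persist.
function explainUpdate(result: UpdateResult): string[] {
  const notes: string[] = [];
  for (const field of result.ignored) {
    notes.push(`${field}: ignored, nothing was persisted`);
  }
  for (const { field, useTool } of result.rerouted) {
    notes.push(`${field}: not applied, call ${useTool} instead`);
  }
  return notes;
}

// Made-up example response.
const updateNotes = explainUpdate({
  applied: ["name"],
  ignored: ["unknown_field"],
  rerouted: [{ field: "fqdn", useTool: "apps.domains.set" }],
});
console.log(updateNotes);
```

An empty list means every requested field landed in `applied`.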

### 11.3 "Compose app is up but the domain 502s"

Coolify's API treats compose and single-container apps differently:
compose apps use `docker_compose_domains` (array of `{name, domain}`),
single-container apps use `domains` (comma-separated string).
`apps.domains.set` handles both, but if you're seeing a 502:

1. `apps.domains.list { uuid }` — confirm the domain is actually
   attached to a **service** (not just the app).
2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
   — verify the upstream container is listening.
3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
   startup errors like `EADDRINUSE` or config failures.
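
To make the two payload shapes concrete, here is a minimal sketch of how a client might build each one. It is an illustration only, not the code behind `apps.domains.set`: the function name `buildDomainPatch` and the `kind` discriminator are hypothetical, and attaching every domain to one service is an assumption based on the `server` default described for `apps.domains.set`.

```typescript
type ComposeDomain = { name: string; domain: string };

type DomainPatch =
  | { docker_compose_domains: ComposeDomain[] }
  | { domains: string };

// Builds the field each app kind expects, per the description above.
function buildDomainPatch(
  kind: "compose" | "single",
  domains: string[],
  service: string = "server",
): DomainPatch {
  if (kind === "compose") {
    // Compose apps: one { name, domain } entry per domain, bound to a service.
    return {
      docker_compose_domains: domains.map((domain) => ({ name: service, domain })),
    };
  }
  // Single-container apps: a comma-separated string.
  return { domains: domains.join(",") };
}

const composePatch = buildDomainPatch("compose", ["app.acme.vibnai.com"]);
const singlePatch = buildDomainPatch("single", [
  "app.acme.vibnai.com",
  "www.acme.vibnai.com",
]);
console.log(JSON.stringify(composePatch));
console.log(JSON.stringify(singlePatch));
```

If `apps.domains.list` shows a domain without a service binding on a compose app, the first shape is the one that was missed.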

### 11.4 "Healthcheck times out on first deploy"

Docker Compose healthchecks have a `start_period` grace window during
which failed probes do not count toward the retry limit. Apps that run
long migrations on first boot (Twenty, Directus, older Strapi versions)
need a `start_period` that covers the cold start, typically 120–600s.

- Fix at the compose level: edit the repo's `docker-compose.yml` to
  set `healthcheck.start_period: 300s` on the affected service, commit,
  push, `apps.deploy`.
- Alternatively, handle migrations out-of-band via `apps.exec` so the
  default healthcheck can pass without an extended grace window.

### 11.5 "I can't tell what's inside the container"

`apps.exec` is the escape hatch. Useful shell one-liners:

| Goal | Command |
|---|---|
| List running processes | `ps -ef` |
| Show env vars | `env \| sort` |
| Check file exists | `ls -la /path/to/file` |
| Test DB connection | `nc -vz postgres 5432` or `psql "$POSTGRES_URL" -c 'select 1'` |
| Tail an app's internal log | `tail -n 200 /var/log/app.log` |
| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
| List files changed since a marker file (`/tmp/marker` must already exist) | `find /app -newer /tmp/marker -type f 2>/dev/null` |

Output is capped at 1 MB by default (raise it with `max_bytes`, max 5 MB).
Commands that could exceed the default 60s timeout should raise `timeout_ms`
(max 600000 = 10 minutes).
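
A caller can apply these limits before sending a request. The sketch below is a client-side convenience under stated assumptions: the helper `resolveExecLimits` is hypothetical, 1 MB is taken to mean 1024 × 1024 bytes, and whether the server clamps or rejects out-of-range values is not stated in this document.

```typescript
// Limits quoted from the text; the byte interpretation of "MB" is an assumption.
const DEFAULT_TIMEOUT_MS = 60000;
const MAX_TIMEOUT_MS = 600000;
const DEFAULT_MAX_BYTES = 1024 * 1024;
const MAX_MAX_BYTES = 5 * 1024 * 1024;

interface ExecLimits {
  timeout_ms: number;
  max_bytes: number;
}

// Fills in defaults and clamps requested values into the documented range.
function resolveExecLimits(requested: Partial<ExecLimits> = {}): ExecLimits {
  const clamp = (value: number | undefined, fallback: number, max: number): number =>
    Math.min(Math.max(value ?? fallback, 1), max);
  return {
    timeout_ms: clamp(requested.timeout_ms, DEFAULT_TIMEOUT_MS, MAX_TIMEOUT_MS),
    max_bytes: clamp(requested.max_bytes, DEFAULT_MAX_BYTES, MAX_MAX_BYTES),
  };
}

const defaultLimits = resolveExecLimits();
const clampedLimits = resolveExecLimits({ timeout_ms: 900000 });
console.log(defaultLimits, clampedLimits);
```

Clamping locally keeps a long migration request from being sent with a timeout the platform would not honor.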

### 11.6 "The agent wants to run something interactively"

It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
no session resumption. For migrations and CLI invocations this is the
right shape. For genuinely interactive work (a debug shell), the
operator needs SSH + `docker exec -it` directly — outside the
platform's AI surface.

---

## 12. Where to look in the code

- `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
- `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
- `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
- `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
- `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
  docker access (`apps.logs`, `apps.exec`). Uses a dedicated
  `vibn-logs` user on the Coolify host with docker-group membership
  and no shell.
- `lib/coolify-containers.ts` — container enumeration + service
  resolution, shared between logs and exec paths.
- `lib/coolify-logs.ts` — compose-aware log tailing.
- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
  timeout, output caps, and audit logging.
- `lib/naming.ts` — domain policy, slugify, SSH URL templates.
- `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
- `app/api/workspaces/[slug]/…` — REST surface.