docs: apps.exec + compose troubleshooting appendix

Documents the new apps.exec tool alongside apps.logs and adds a
troubleshooting cookbook covering the common failure patterns we've
seen in the wild: missing migrations, silent apps.update reroutes,
502s on compose domains, healthcheck timeouts, and how to use
apps.exec as the platform's escape hatch for in-container inspection.

Bumps MCP version to 2.2.0 in the changelog and bumps the
vibn-frontend submodule to ship the apps.exec implementation.

Also includes setup-vibn-logs-user.sh (the script that installs the
locked-down SSH user on the Coolify host) which was already running
in production but not yet committed.

Made-with: Cursor
2026-04-23 14:19:01 -07:00
parent d04bd64474
commit 537e697dab
4 changed files with 169 additions and 5 deletions

.gitignore vendored

@@ -29,3 +29,4 @@
**/.next/
**/.turbo/
**/coverage/
.secrets/


@@ -110,13 +110,15 @@ Version: **2.1.0**.
| `apps.list` | All Coolify apps in the workspace. | — |
| `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
| `apps.create` | Create a Coolify app from a Gitea repo in the workspace's org. Clones over **HTTPS with the workspace bot's PAT embedded in the URL** — SSH is not used because Gitea's SSH isn't reachable on the default port. Auto-domain `{name}.{slug}.vibnai.com`. | `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` |
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). | `{ uuid, patch }` |
| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
| `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }`; `repo` optional, inferred from current URL if omitted |
| `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }`; `confirm` must equal the app's exact name |
| `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
| `apps.deployments` | List recent deployments + status. | `{ uuid }` |
| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }`; `service` filter (compose only), `lines` default 200, max 5000 |
| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
| `apps.domains.list` | Current domain set. | `{ uuid }` |
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. | `{ uuid, domains: string[] }` |
| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
| `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
| `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
| `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
@@ -579,22 +581,136 @@ The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
are append-only within a major version — agents can cache the tool list
safely for the duration of a conversation but should re-fetch on 404.
Current version: **2.1.0**.
Current version: **2.2.0**.
- **1.x** — session-cookie-only MCP, no tenant keys.
- **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
- **2.1** — create/update/delete for apps, 8 database flavors, auth
provider allowlist, domain policy enforcement, confirm-gated deletes.
- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
domain routing, runtime log tailing (`apps.logs`), in-container command
execution (`apps.exec`), and diagnostic `apps.update` responses.
---
## 11. Where to look in the code
## 11. Troubleshooting compose apps
Most real-world app failures fall into a small number of patterns. The
recipes below are the canonical diagnostic flow for an agent operating
on behalf of a user.
### 11.1 "Deployment succeeds but the app keeps restarting"
Agents should NOT trust Coolify's deployment status alone. A successful
build whose healthcheck is still pending usually means the containers
started but the application itself is crashing. Investigate with:
1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
services indicate containers never ran) and per-service stderr.
2. If the logs show repeated DB errors like `relation "xxx" does not
exist` or `pq: no such table`, the app skipped its migration step.
This is common for Docker Compose apps that run migrations only in a
separate `worker` command rather than in the `server` service.
3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
```json
{
  "action": "apps.exec",
  "params": {
    "uuid": "<app-uuid>",
    "service": "server",
    "command": "yarn command:prod database:migrate:prod",
    "timeout_ms": 300000
  }
}
```
4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
wait for the next restart) and verify the container reports
`healthy`.
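The check in step 2 and the payload in step 3 could be wrapped roughly as follows. This is a minimal TypeScript sketch, not code from the repo: `looksLikeMissingMigration`, `buildMigrationExec`, and `ExecRequest` are hypothetical names, and the patterns cover only the two error messages quoted above.

```typescript
// Hypothetical helper types and names; only the payload fields come from the docs.
interface ExecRequest {
  action: "apps.exec";
  params: { uuid: string; service: string; command: string; timeout_ms: number };
}

// Patterns for the two missing-schema errors quoted in step 2.
const MISSING_SCHEMA_PATTERNS: RegExp[] = [
  /relation "[^"]+" does not exist/,
  /no such table/,
];

/** True when the log text contains a missing-schema error. */
function looksLikeMissingMigration(logText: string): boolean {
  return MISSING_SCHEMA_PATTERNS.some((pattern) => pattern.test(logText));
}

/** Build the apps.exec payload from step 3. The migration command is app-specific. */
function buildMigrationExec(uuid: string, service: string, command: string): ExecRequest {
  return {
    action: "apps.exec",
    // 300000 ms (5 minutes) mirrors the example payload above.
    params: { uuid, service, command, timeout_ms: 300000 },
  };
}
```

An agent would pass the text returned by `apps.logs` to `looksLikeMissingMigration` and only build the exec request when it returns `true`.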
### 11.2 "`apps.update` returned success but nothing changed"
Check the `applied` / `ignored` / `rerouted` arrays in the response.
The most common reroutes:
- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
the workspace's Gitea PAT embedded).
- `build_pack` — changing this mid-life for an existing app is not
supported. Recreate the app.
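One way an agent might turn that response into follow-up actions is sketched below. Only the array names `applied`, `ignored`, and `rerouted` come from the docs. The element shapes (plain field names, and `{ field, tool }` for reroutes) are assumptions made for illustration, as is the helper name `explainUnapplied`.

```typescript
// Assumed response shape: the element types are NOT confirmed by the docs.
interface RerouteEntry {
  field: string; // patch field that was not applied
  tool: string;  // tool the caller should use instead
}

interface UpdateResult {
  applied: string[];
  ignored: string[];
  rerouted: RerouteEntry[];
}

/** Return one readable note per patch field that did not persist. */
function explainUnapplied(result: UpdateResult): string[] {
  const notes: string[] = [];
  for (const entry of result.rerouted) {
    notes.push(`${entry.field}: not applied, call ${entry.tool} instead`);
  }
  for (const field of result.ignored) {
    notes.push(`${field}: ignored by apps.update`);
  }
  return notes;
}
```

An empty result means every field in the patch appears in `applied`, so a "nothing changed" report has to be explained some other way, for example a deploy that has not run yet.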
### 11.3 "Compose app is up but the domain 502s"
Coolify's API treats compose and single-container apps differently:
compose apps use `docker_compose_domains` (array of `{name, domain}`),
single-container apps use `domains` (comma-separated string).
`apps.domains.set` handles both, but if you're seeing a 502:
1. `apps.domains.list { uuid }` — confirm the domain is actually
attached to a **service** (not just the app).
2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
— verify the upstream container is listening.
3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
startup errors like `EADDRINUSE` or config failures.
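The two payload shapes described at the top of this section can be illustrated with a small sketch. The field names `docker_compose_domains` and `domains` and their shapes come from the text above. The function name `toDomainPatch`, the `DomainPatch` type, the choice of one array entry per domain, and the bare `,` separator are assumptions, not the behaviour of `apps.domains.set`.

```typescript
// Hypothetical types and function name; only the field names come from the docs.
type DomainPatch =
  | { docker_compose_domains: { name: string; domain: string }[] }
  | { domains: string };

function toDomainPatch(
  buildPack: string,
  domains: string[],
  service: string = "server", // documented default service for compose apps
): DomainPatch {
  if (buildPack === "dockercompose") {
    // Compose apps: each domain is attached to a named service.
    // Emitting one entry per domain is an assumption.
    return {
      docker_compose_domains: domains.map((domain) => ({ name: service, domain })),
    };
  }
  // Single-container apps: one comma-separated string.
  return { domains: domains.join(",") };
}
```

A domain that appears in the app-level `domains` string of a compose app, rather than in a `docker_compose_domains` entry, matches the "attached to the app but not to a service" case that step 1 checks for.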
### 11.4 "Healthcheck times out on first deploy"
Docker Compose healthchecks have a `start_period` grace window. Apps
that run long-running migrations on first boot (Twenty, Directus,
older Strapi versions) need a `start_period` that covers the cold
start, typically 120 to 600s.
- Fix at the compose level: edit the repo's `docker-compose.yml` to
set `healthcheck.start_period: 300s` on the affected service, commit,
push, `apps.deploy`.
- Alternatively, handle migrations out-of-band via `apps.exec` and let
the default healthcheck succeed instantly.
### 11.5 "I can't tell what's inside the container"
`apps.exec` is the escape hatch. Useful shell one-liners:
| Goal | Command |
|---|---|
| List running processes | `ps -ef` |
| Show env vars | `env \| sort` |
| Check file exists | `ls -la /path/to/file` |
| Test DB connection | `nc -vz postgres 5432` or `psql "$POSTGRES_URL" -c 'select 1'` |
| Tail an app's internal log | `tail -200 /var/log/app.log` |
| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
| List files changed since a marker file (`/tmp/marker` must already exist) | `find /app -newer /tmp/marker -type f 2>/dev/null` |
Output is capped at 1 MB by default (raise with `max_bytes`, up to
5 MB). Commands that may run longer than the default 60s timeout
should raise `timeout_ms` (max 600000 = 10 minutes).
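A caller can normalise these two limits before sending the request. The sketch below is a client-side illustration only. The defaults and maxima are the documented ones, but the docs do not say whether the server clamps or rejects out-of-range values. Treating 1 MB as 1024 * 1024 bytes and the names `resolveExecLimits` and `EXEC_LIMITS` are also assumptions.

```typescript
// Assumption: 1 MB = 1024 * 1024 bytes. The docs do not define the unit.
const BYTES_PER_MB = 1024 * 1024;

interface Limit {
  fallback: number; // value used when the caller passes nothing
  max: number;      // documented upper bound
}

const EXEC_LIMITS: { timeoutMs: Limit; maxBytes: Limit } = {
  timeoutMs: { fallback: 60000, max: 600000 },                   // 60s default, 10 min max
  maxBytes: { fallback: 1 * BYTES_PER_MB, max: 5 * BYTES_PER_MB }, // 1 MB default, 5 MB max
};

/** Use the fallback when unset; otherwise clamp to the range [1, max]. */
function clampLimit(value: number | undefined, limit: Limit): number {
  if (value === undefined) return limit.fallback;
  return Math.min(Math.max(1, Math.floor(value)), limit.max);
}

function resolveExecLimits(opts: { timeout_ms?: number; max_bytes?: number }) {
  return {
    timeout_ms: clampLimit(opts.timeout_ms, EXEC_LIMITS.timeoutMs),
    max_bytes: clampLimit(opts.max_bytes, EXEC_LIMITS.maxBytes),
  };
}
```

Under these assumptions, `resolveExecLimits({ timeout_ms: 900000 })` yields a 600000 ms timeout and the default 1 MB output cap.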
### 11.6 "The agent wants to run something interactively"
It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
no session resumption. For migrations and CLI invocations this is the
right shape. For genuinely interactive work (a debug shell), the
operator needs SSH + `docker exec -it` directly — outside the
platform's AI surface.
---
## 12. Where to look in the code
- `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
- `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
- `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
- `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
- `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
docker access (`apps.logs`, `apps.exec`). Uses a dedicated
`vibn-logs` user on the Coolify host with docker-group membership,
a locked password, and no PTY.
- `lib/coolify-containers.ts` — container enumeration + service
resolution, shared between logs and exec paths.
- `lib/coolify-logs.ts` — compose-aware log tailing.
- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
timeout, output caps, and audit logging.
- `lib/naming.ts` — domain policy, slugify, SSH URL templates.
- `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
- `app/api/workspaces/[slug]/…` — REST surface.
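To make the `docker exec` plus `sh -lc` shape more concrete, here is a hedged sketch of how such a command line could be assembled. It is not the code in `lib/coolify-exec.ts`. `shQuote`, `buildDockerExec`, and `ExecTarget` are hypothetical names, and the real module also handles timeouts, output caps, and audit logging. Only the `sh -lc` wrapping and the `user` / `workdir` parameters come from the tool description; `--user` and `--workdir` are standard `docker exec` flags.

```typescript
/** Quote a string for POSIX sh: wrap in single quotes, escape embedded ones. */
function shQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical input type for this sketch.
interface ExecTarget {
  container: string;
  command: string;
  user?: string;
  workdir?: string;
}

/** Assemble a one-shot, non-interactive `docker exec` command line. */
function buildDockerExec(target: ExecTarget): string {
  const args: string[] = ["docker", "exec"];
  if (target.user) args.push("--user", shQuote(target.user));
  if (target.workdir) args.push("--workdir", shQuote(target.workdir));
  // No -i / -t flags: the call has no stdin and no TTY.
  // The user command goes through `sh -lc` so shell syntax works.
  args.push(shQuote(target.container), "sh", "-lc", shQuote(target.command));
  return args.join(" ");
}
```

Quoting every caller-supplied value is the important design choice here: the string is interpreted by a shell on the Coolify host before `docker` ever sees it, so an unquoted `command` could escape the container.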


@@ -0,0 +1,47 @@
#!/usr/bin/env bash
# Run as sudo on coolify-server-mtl:
# bash /tmp/setup-vibn-logs-user.sh
#
# Creates a locked-down `vibn-logs` user that the vibn-frontend
# control plane can SSH to. Membership in the `docker` group lets
# it run `docker ps` / `docker logs` without sudo; no shell login,
# no password, single authorized key.
set -euo pipefail
USER=vibn-logs
PUBKEY='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINkn70ItA4LUZTZDIu8fC8QkuHAewk5VH9ogF+52UTT0 vibn-logs@vibn-frontend'
if id "$USER" &>/dev/null; then
  echo "user $USER already exists"
else
  useradd -m -s /bin/bash "$USER"
  echo "created user $USER"
fi
usermod -aG docker "$USER"
passwd -l "$USER" >/dev/null
mkdir -p "/home/$USER/.ssh"
chmod 700 "/home/$USER/.ssh"
# Exactly one authorized key (force-restrict: no PTY, no agent forwarding,
# no X11 forwarding, no port forwarding). The control plane only needs
# to run docker commands.
AUTH_FILE="/home/$USER/.ssh/authorized_keys"
RESTRICTIONS='no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-pty'
echo "$RESTRICTIONS $PUBKEY" > "$AUTH_FILE"
chmod 600 "$AUTH_FILE"
chown -R "$USER:$USER" "/home/$USER/.ssh"
echo "$USER ready"
echo " groups: $(id -nG "$USER")"
echo " authorized_keys:"
sed 's/^/ /' "$AUTH_FILE"
# Verify docker access
su - "$USER" -s /bin/bash -c 'docker ps --format "table {{.Names}}" | head -3' || {
echo "⚠ docker access test failed — user may not be able to run docker commands"
exit 1
}
echo "✓ docker access verified"

Submodule vibn-frontend updated: 9959eaeeaa...8c83f8c490