docs: apps.exec + compose troubleshooting appendix
Documents the new apps.exec tool alongside apps.logs and adds a troubleshooting cookbook covering the common failure patterns we've seen in the wild: missing migrations, silent apps.update reroutes, 502s on compose domains, healthcheck timeouts, and how to use apps.exec as the platform's escape hatch for in-container inspection.

Bumps MCP version to 2.2.0 in the changelog and bumps the vibn-frontend submodule to ship the apps.exec implementation.

Also includes setup-vibn-logs-user.sh (the script that installs the locked-down SSH user on the Coolify host) which was already running in production but not yet committed.

Made-with: Cursor
1
.gitignore
vendored
@@ -29,3 +29,4 @@
 **/.next/
 **/.turbo/
 **/coverage/
+.secrets/
@@ -110,13 +110,15 @@ Version: **2.1.0**.
 | `apps.list` | All Coolify apps in the workspace. | — |
 | `apps.get` | Single app details (status, fqdn, domains, git info). | `{ uuid }` |
 | `apps.create` | Create a Coolify app from a Gitea repo in the workspace's org. Clones over **HTTPS with the workspace bot's PAT embedded in the URL** — SSH is not used because Gitea's SSH isn't reachable on the default port. Auto-domain `{name}.{slug}.vibnai.com`. | `{ repo, branch?, name?, ports?, buildPack?, domain?, envs?, instantDeploy?, dockerComposeLocation?, dockerfileLocation?, baseDirectory? }` |
-| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). | `{ uuid, patch }` |
+| `apps.update` | PATCH a whitelisted set of fields (name, description, git branch/commit, ports, build commands, base directory, Dockerfile location, docker-compose location…). Returns `applied`, `ignored`, and `rerouted` arrays so the agent can see exactly what persisted; setting `fqdn`/`domains`/`docker_compose_domains` returns a `rerouted` entry pointing at `apps.domains.set`, and setting `git_repository` returns one pointing at `apps.rewire_git`. | `{ uuid, patch }` |
 | `apps.rewire_git` | Re-point an app's `git_repository` at the canonical HTTPS+PAT clone URL. Use to recover older apps that were created with SSH URLs, or to refresh a rotated bot PAT. | `{ uuid, repo? }` — `repo` optional; inferred from current URL if omitted |
 | `apps.delete` | Destroy the app. Volumes kept by default. | `{ uuid, confirm }` — `confirm` must equal the app's exact name |
 | `apps.deploy` | Trigger a new deployment. | `{ uuid, force? }` |
 | `apps.deployments` | List recent deployments + status. | `{ uuid }` |
+| `apps.logs` | Runtime logs for a running app. Compose-aware: returns per-service logs for `dockercompose` build packs, single stream for `dockerfile`/`nixpacks`. Includes container status and any diagnostic warnings. | `{ uuid, service?, lines? }` — `service` filter (compose only), `lines` default 200, max 5000 |
+| `apps.exec` | Run a one-shot command inside an app container (via `docker exec` on the Coolify host). Compose-aware: pass `service` when the app has >1 container. Returns `{ container, service, code, stdout, stderr, truncated, durationMs, containerHealth }`. Default timeout 60s (max 10 min); default output cap 1 MB (max 5 MB). Command is run through `sh -lc` so shell syntax works. Use this for database migrations, seeds, CLI invocations, and ad-hoc debugging. Every call is audit-logged (command + target, not output). | `{ uuid, command, service?, user?, workdir?, timeout_ms?, max_bytes? }` |
 | `apps.domains.list` | Current domain set. | `{ uuid }` |
-| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. | `{ uuid, domains: string[] }` |
+| `apps.domains.set` | Replace the domain set. All entries must end with `.{slug}.vibnai.com`. Compose-aware: for `dockercompose` apps the domain is attached to a specific service (`server` by default; override with `service`). | `{ uuid, domains: string[], service? }` |
 | `apps.envs.list` | List env vars. Values returned are redacted for `shown-once` secrets. | `{ uuid }` |
 | `apps.envs.upsert` | Create or update an env var. `is_build_time` is **ignored** — Coolify derives build-vs-runtime from Dockerfile `ARG` usage. | `{ uuid, key, value, isPreview?, isMultiline?, isLiteral?, isShownOnce? }` |
 | `apps.envs.delete` | Delete an env var. | `{ uuid, key }` |
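
As a rough illustration of how a caller might consume the `apps.exec` result listed in the table, here is a minimal TypeScript sketch. The field names come from the table row; the field types, the `ExecResult` interface name, and the `summarizeExec` helper are assumptions made for illustration and are not taken from the vibn-frontend code.

```typescript
// Assumed shape: field names are from the tool table, the types are guesses.
interface ExecResult {
  container: string;
  service: string | null;
  code: number; // process exit code (assumed numeric)
  stdout: string;
  stderr: string;
  truncated: boolean; // assumed to be true when output hit the byte cap
  durationMs: number;
  containerHealth: string | null;
}

// Hypothetical helper: turn a result into a one-line summary for an agent.
function summarizeExec(r: ExecResult): string {
  const status = r.code === 0 ? "ok" : `failed (exit ${r.code})`;
  const note = r.truncated ? ", output truncated" : "";
  return `${r.container}: ${status} in ${r.durationMs}ms${note}`;
}
```

Checking `code` and `truncated` before reading `stdout` keeps a clipped or failed run from being mistaken for a clean one.
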
@@ -579,22 +581,136 @@ The MCP descriptor at `GET /api/mcp` reports a semver `version`. Tool names
 are append-only within a major version — agents can cache the tool list
 safely for the duration of a conversation but should re-fetch on 404.
 
-Current version: **2.1.0**.
+Current version: **2.2.0**.
 
 - **1.x** — session-cookie-only MCP, no tenant keys.
 - **2.0** — `vibn_sk_…` keys, workspace-scoped Gitea bot + Coolify project.
 - **2.1** — create/update/delete for apps, 8 database flavors, auth
   provider allowlist, domain policy enforcement, confirm-gated deletes.
+- **2.2** — per-workspace GCS object storage (`storage.*`), compose-aware
+  domain routing, runtime log tailing (`apps.logs`), in-container command
+  execution (`apps.exec`), and diagnostic `apps.update` responses.
 
 ---
 
-## 11. Where to look in the code
+## 11. Troubleshooting compose apps
+
+Most real-world app failures fall into a small number of patterns. The
+recipes below are the canonical diagnostic flow for an agent operating
+on behalf of a user.
+
+### 11.1 "Deployment succeeds but the app keeps restarting"
+
+Agents should NOT trust Coolify's deployment status alone. A successful
+build + healthcheck-pending response usually means the containers came
+up but the app logic is crashing. Investigate with:
+
+1. `apps.logs { uuid, lines: 300 }` — look for `warnings` (empty
+   services indicate containers never ran) and per-service stderr.
+2. If the logs show repeated DB errors like `relation "xxx" does not
+   exist` or `pq: no such table`, the app skipped its migration step.
+   This is common for Docker Compose apps whose `server` service only
+   runs migrations on a separate `worker` command.
+3. Run the app's migration CLI via `apps.exec`, e.g. for Twenty:
+
+   ```json
+   {
+     "action": "apps.exec",
+     "params": {
+       "uuid": "<app-uuid>",
+       "service": "server",
+       "command": "yarn command:prod database:migrate:prod",
+       "timeout_ms": 300000
+     }
+   }
+   ```
+
+4. Re-check logs — errors should be gone. Then `apps.deploy` (or just
+   wait for the next restart) and verify the container reports
+   `healthy`.
+
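
The log check in step 2 of section 11.1 can be automated. The following is a minimal sketch, assuming the agent already holds the log text returned by `apps.logs` as a single string; the patterns are the two error strings quoted in that step, while the helper name `countMissingSchemaErrors` is hypothetical.

```typescript
// Patterns taken from the error strings quoted in step 2; other database
// drivers will phrase a missing table differently, so extend as needed.
const MISSING_SCHEMA_PATTERNS: RegExp[] = [
  /relation "[^"]+" does not exist/,
  /no such table/,
];

// Hypothetical helper: count log lines that look like a skipped migration.
function countMissingSchemaErrors(logText: string): number {
  return logText
    .split("\n")
    .filter((line) => MISSING_SCHEMA_PATTERNS.some((p) => p.test(line)))
    .length;
}
```

A non-zero count is only a hint that the migration step was skipped; the agent should still read the surrounding log lines before running a migration command.
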
+### 11.2 "`apps.update` returned success but nothing changed"
+
+Check the `applied` / `ignored` / `rerouted` arrays in the response.
+The most common reroutes:
+
+- `fqdn`, `domains`, `docker_compose_domains` → use `apps.domains.set`.
+- `git_repository` → use `apps.rewire_git` (rewrites the clone URL with
+  the workspace's Gitea PAT embedded).
+- `build_pack` — changing this mid-life for an existing app is not
+  supported. Recreate the app.
+
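
One way an agent could act on an `apps.update` response is sketched below. The three array names come from the documentation; the element shapes (`string[]` for `applied` and `ignored`, and `{ field, tool }` objects for `rerouted`) are assumptions, as is the `followUps` helper, so adjust them to the real response before relying on this.

```typescript
// Assumed element shape for a reroute entry; both property names are hypothetical.
interface RerouteEntry {
  field: string; // the patched field that was not applied
  tool: string; // the tool to call instead, e.g. "apps.domains.set"
}

// Array names are from the docs; element types are guesses.
interface UpdateResponse {
  applied: string[];
  ignored: string[];
  rerouted: RerouteEntry[];
}

// Returns a human-readable follow-up for every field that did not persist.
function followUps(res: UpdateResponse): string[] {
  const out: string[] = [];
  for (const r of res.rerouted) out.push(`${r.field}: call ${r.tool} instead`);
  for (const f of res.ignored) out.push(`${f}: not applied by apps.update`);
  return out;
}
```

An empty result means every patched field landed in `applied`, which is the only case where "success" can be taken at face value.
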
+### 11.3 "Compose app is up but the domain 502s"
+
+Coolify's API treats compose and single-container apps differently:
+compose apps use `docker_compose_domains` (array of `{name, domain}`),
+single-container apps use `domains` (comma-separated string).
+`apps.domains.set` handles both, but if you're seeing a 502:
+
+1. `apps.domains.list { uuid }` — confirm the domain is actually
+   attached to a **service** (not just the app).
+2. `apps.exec { uuid, service: "server", command: "nc -vz localhost <port>" }`
+   — verify the upstream container is listening.
+3. `apps.logs { uuid, service: "server", lines: 200 }` — look for
+   startup errors like `EADDRINUSE` or config failures.
+
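
To make the two payload shapes from section 11.3 concrete, here is a small sketch of how a client could build either one. It assumes that `name` in each `{name, domain}` entry is the compose service name and that the build pack string for compose apps is `dockercompose`; the `buildDomainPayload` helper is hypothetical and is not the code behind `apps.domains.set`.

```typescript
type DomainPayload =
  | { docker_compose_domains: { name: string; domain: string }[] }
  | { domains: string };

// Hypothetical helper mirroring the two shapes described in section 11.3.
function buildDomainPayload(
  buildPack: string,
  domains: string[],
  service: string = "server" // default service, per the apps.domains.set row
): DomainPayload {
  if (buildPack === "dockercompose") {
    // Compose apps: one { name, domain } entry per domain (name assumed = service).
    return {
      docker_compose_domains: domains.map((domain) => ({ name: service, domain })),
    };
  }
  // Single-container apps: one comma-separated string.
  return { domains: domains.join(",") };
}
```

Sending the single-container shape to a compose app is one plausible way to end up with a domain that is attached to the app but to no service, which is what step 1 checks for.
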
+### 11.4 "Healthcheck times out on first deploy"
+
+Docker Compose healthchecks have a `start_period` grace window. Apps
+that run long-running migrations on first boot (Twenty, Directus,
+older Strapi versions) need a `start_period` that covers the cold
+start, typically 120–600s.
+
+- Fix at the compose level: edit the repo's `docker-compose.yml` to
+  set `healthcheck.start_period: 300s` on the affected service, commit,
+  push, `apps.deploy`.
+- Alternatively, handle migrations out-of-band via `apps.exec` and let
+  the default healthcheck succeed instantly.
+
+### 11.5 "I can't tell what's inside the container"
+
+`apps.exec` is the escape hatch. Useful shell one-liners:
+
+| Goal | Command |
+|---|---|
+| List running processes | `ps -ef` |
+| Show env vars | `env \| sort` |
+| Check file exists | `ls -la /path/to/file` |
+| Test DB connection | `nc -vz postgres 5432` or `psql $POSTGRES_URL -c 'select 1'` |
+| Tail an app's internal log | `tail -200 /var/log/app.log` |
+| Run a framework CLI | `yarn <script>`, `npm run <script>`, `python manage.py <cmd>` |
+| Inspect filesystem diff vs image | `find /app -newer /tmp/marker -type f 2>/dev/null` |
+
+Output is capped at 1 MB by default (bump with `max_bytes`). Commands
+that could exceed the wall-clock timeout should bump `timeout_ms`
+(max 600000 = 10 minutes).
+
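
A caller may want to keep `timeout_ms` and `max_bytes` inside the documented limits before sending a request. The sketch below is a client-side illustration only: the defaults and maximums are the ones stated in the docs, "1 MB" and "5 MB" are assumed to mean 1 MiB and 5 MiB, and whether the server clamps or rejects out-of-range values is not stated in the source. The helper names are hypothetical.

```typescript
const DEFAULT_TIMEOUT_MS = 60000; // 60s default, per the docs
const MAX_TIMEOUT_MS = 600000; // 10 minutes, per the docs
const DEFAULT_MAX_BYTES = 1048576; // assumes "1 MB" means 1 MiB
const MAX_MAX_BYTES = 5 * 1048576; // assumes "5 MB" means 5 MiB

// Use the fallback for missing or invalid values, otherwise cap at max.
function clamp(value: number | undefined, fallback: number, max: number): number {
  if (value === undefined || !Number.isFinite(value) || value <= 0) return fallback;
  return Math.min(Math.floor(value), max);
}

// Hypothetical helper: normalize the two optional limit parameters.
function normalizeExecLimits(opts: { timeout_ms?: number; max_bytes?: number }) {
  return {
    timeout_ms: clamp(opts.timeout_ms, DEFAULT_TIMEOUT_MS, MAX_TIMEOUT_MS),
    max_bytes: clamp(opts.max_bytes, DEFAULT_MAX_BYTES, MAX_MAX_BYTES),
  };
}
```

For a long migration, `normalizeExecLimits({ timeout_ms: 300000 })` keeps the requested five minutes and fills in the default output cap.
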
+### 11.6 "The agent wants to run something interactively"
+
+It can't. `apps.exec` is strictly non-interactive: no TTY, no stdin,
+no session resumption. For migrations and CLI invocations this is the
+right shape. For genuinely interactive work (a debug shell), the
+operator needs SSH + `docker exec -it` directly — outside the
+platform's AI surface.
+
+---
+
+## 12. Where to look in the code
 
 - `lib/auth/workspace-auth.ts` — `requireWorkspacePrincipal`, the gate.
 - `lib/auth/secret-box.ts` — AES-256-GCM encryption of Gitea PATs.
 - `lib/workspaces.ts` — `ensureWorkspaceProvisioned` (the idempotent setup).
 - `lib/gitea.ts` — Gitea client (orgs, users, PATs, SSH keys).
 - `lib/coolify.ts` — Coolify client, tenant helpers, all resource CRUD.
+- `lib/coolify-ssh.ts` — SSH transport for tools that need host-level
+  docker access (`apps.logs`, `apps.exec`). Uses a dedicated
+  `vibn-logs` user on the Coolify host with docker-group membership
+  and no shell.
+- `lib/coolify-containers.ts` — container enumeration + service
+  resolution, shared between logs and exec paths.
+- `lib/coolify-logs.ts` — compose-aware log tailing.
+- `lib/coolify-exec.ts` — one-shot `docker exec` over SSH with
+  timeout, output caps, and audit logging.
 - `lib/naming.ts` — domain policy, slugify, SSH URL templates.
 - `lib/ssh-keys.ts` — ed25519 keypair generation + OpenSSH formatting.
 - `app/api/workspaces/[slug]/…` — REST surface.
47
scripts/setup-vibn-logs-user.sh
Normal file
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+# Run as sudo on coolify-server-mtl:
+#   bash /tmp/setup-vibn-logs-user.sh
+#
+# Creates a locked-down `vibn-logs` user that the vibn-frontend
+# control plane can SSH to. Membership in the `docker` group lets
+# it run `docker ps` / `docker logs` without sudo; no shell login,
+# no password, single authorized key.
+
+set -euo pipefail
+
+USER=vibn-logs
+PUBKEY='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINkn70ItA4LUZTZDIu8fC8QkuHAewk5VH9ogF+52UTT0 vibn-logs@vibn-frontend'
+
+if id "$USER" &>/dev/null; then
+  echo "user $USER already exists"
+else
+  useradd -m -s /bin/bash "$USER"
+  echo "created user $USER"
+fi
+
+usermod -aG docker "$USER"
+passwd -l "$USER" >/dev/null
+
+mkdir -p "/home/$USER/.ssh"
+chmod 700 "/home/$USER/.ssh"
+
+# Exactly one authorized key (force-restrict: no PTY, no agent forwarding,
+# no X11 forwarding, no port forwarding). The control plane only needs
+# to run docker commands.
+AUTH_FILE="/home/$USER/.ssh/authorized_keys"
+RESTRICTIONS='no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-pty'
+echo "$RESTRICTIONS $PUBKEY" > "$AUTH_FILE"
+chmod 600 "$AUTH_FILE"
+chown -R "$USER:$USER" "/home/$USER/.ssh"
+
+echo "✓ $USER ready"
+echo " groups: $(id -nG "$USER")"
+echo " authorized_keys:"
+sed 's/^/ /' "$AUTH_FILE"
+
+# Verify docker access
+su - "$USER" -s /bin/bash -c 'docker ps --format "table {{.Names}}" | head -3' || {
+  echo "⚠ docker access test failed — user may not be able to run docker commands"
+  exit 1
+}
+echo "✓ docker access verified"
Submodule vibn-frontend updated: 9959eaeeaa...8c83f8c490