Rewrite VIBNDEV.md as comprehensive infra guide with all lessons learned

2026-05-06 14:24:10 -07:00
parent bc21756eae
commit 87577e69a4

VIBNDEV.md Normal file

@@ -0,0 +1,245 @@
# Vibn Development — Infrastructure Reference
## Architecture Overview
```
Your Mac (local dev)
├─ pnpm dev → http://localhost:3000 (vibn-frontend Next.js)
│ ├─ Local Postgres via Docker on port 5433
│ ├─ Reads .env.local (NOT root .env files)
│ ├─ Dev bypass: mark@getacquired.com / NEXT_PUBLIC_DEV_LOCAL_AUTH_EMAIL
│ └─ NEXT_PUBLIC_DEV_BYPASS_PROJECT_AUTH=true skips auth on API routes
├─ gcloud compute ssh → GCP VM (full root access via sudo)
│ Project: master-ai-484822
│ Instance: coolify-server-mtl (northamerica-northeast1-a)
│ IP: 34.19.250.135
├─ SSH → vibn-logs@34.19.250.135 (Docker-only, no shell)
│ Key: ~/.ssh/vibn-logs-local
└─ Git → https://git.vibnai.com/mark/vibn-frontend.git (Gitea)
Coolify Host (GCP VM: coolify-server-mtl, 34.19.250.135)
├─ Coolify API: http://34.19.250.135:8000
│ Token in .coolify.env
├─ vibn-frontend app: y4cscsc8s08c8808go0448s0
│ FQDN: https://vibnai.com
│ Git: https://git.vibnai.com/mark/vibn-frontend.git (main)
│ Deploy: POST /api/v1/deploy?uuid=y4cscsc8s08c8808go0448s0
├─ vibn-api app: m84cc4wsc0ckws8g8k44kkk8
├─ vibn-agent-runner app: jss08wssogw4kw8gok0sk0w0
├─ Traefik: *.vibnai.com + *.preview.vibnai.com wildcard TLS
│ DNS: Cloudflare → 34.19.250.135
└─ Per-project dev containers (vibn-dev image)
Compose files live at: /data/coolify/services/<service_uuid>/
Gitea (https://git.vibnai.com)
Token: in .gitea.env
User: mark
```
## Access
### GCP VM (full access)
```sh
# Always works — no SSH key setup needed
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a
# Run a command remotely (prefix with sudo for Docker)
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker ps"
```
### Coolify API
All calls use the token from `.coolify.env`. Source it first:
```sh
source /Users/markhenderson/master-ai/.coolify.env
```
Then use `$COOLIFY_URL` and `$COOLIFY_API_TOKEN`.
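Every call depends on those two variables, so a small guard can fail fast when the file was not sourced. This is a minimal sketch; the function name `require_coolify_env` is hypothetical and not part of the repo.

```sh
# Hypothetical helper: returns non-zero and names the first missing variable.
require_coolify_env() {
  for name in COOLIFY_URL COOLIFY_API_TOKEN; do
    # Indirect variable lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing $name (did you source .coolify.env?)" >&2
      return 1
    fi
  done
}
```

Calling `require_coolify_env || exit 1` at the top of a script would stop it before any `curl` goes out with an empty token.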
## Local Dev
```sh
cd /Users/markhenderson/master-ai/vibn-frontend
# Start local Postgres
docker compose -f docker-compose.local-db.yml up -d
# Start frontend
pnpm dev
```
`.env.local` needs: `DATABASE_URL`, `NEXTAUTH_URL`, `NEXTAUTH_SECRET`, `NEXT_PUBLIC_DEV_LOCAL_AUTH_EMAIL`, `NEXT_PUBLIC_DEV_BYPASS_PROJECT_AUTH`, `GOOGLE_API_KEY`, `COOLIFY_*`, `GITEA_*`, `VIBN_SECRETS_KEY`, plus optionally `VIBN_CHAT_PROVIDER=deepseek` and `DEEPSEEK_API_KEY`.
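A quick way to see which of the required entries are absent is to grep the file for each name. The sketch below assumes one `KEY=value` per line (optionally prefixed with `export`); the helper name `missing_env_keys` is hypothetical, and the optional DeepSeek variables are deliberately not checked.

```sh
# Hypothetical helper: print each required key (or key family) absent from an env file.
# $1 = path to the env file, e.g. .env.local
missing_env_keys() {
  env_file=$1
  # Exact names taken from the required list above.
  for key in DATABASE_URL NEXTAUTH_URL NEXTAUTH_SECRET \
             NEXT_PUBLIC_DEV_LOCAL_AUTH_EMAIL NEXT_PUBLIC_DEV_BYPASS_PROJECT_AUTH \
             GOOGLE_API_KEY VIBN_SECRETS_KEY; do
    grep -Eq "^(export[[:space:]]+)?${key}=" "$env_file" || echo "$key"
  done
  # COOLIFY_* and GITEA_* are families; require at least one entry of each.
  for prefix in COOLIFY_ GITEA_; do
    grep -Eq "^(export[[:space:]]+)?${prefix}[A-Za-z0-9_]*=" "$env_file" || echo "${prefix}*"
  done
}
```

Running `missing_env_keys .env.local` prints nothing when every required name is present.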
## Deploy vibn-frontend
```sh
cd /Users/markhenderson/master-ai/vibn-frontend
git add -A && git commit -m "message" && git push origin main
# Then trigger deploy (correct endpoint for Coolify v4):
source /Users/markhenderson/master-ai/.coolify.env
curl -s -X POST \
-H "Authorization: Bearer $COOLIFY_API_TOKEN" \
"$COOLIFY_URL/api/v1/deploy?uuid=y4cscsc8s08c8808go0448s0"
```
**Note:** On Coolify v4, `/api/v1/applications/{uuid}/start` and `/api/v1/applications/{uuid}/deploy` both return 404. The correct deploy path is `/api/v1/deploy?uuid=...`; append `&force=true` to force a full rebuild.
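To avoid retyping the path (and drifting back to one of the 404 variants), the URL can be assembled in one place. A minimal sketch; `coolify_deploy_url` is a hypothetical helper name, and only the `/api/v1/deploy?uuid=...` path and the `force=true` parameter come from the notes above.

```sh
# Hypothetical helper: build the deploy URL.
# $1 = base URL, $2 = resource UUID, $3 = "force" to request a full rebuild
coolify_deploy_url() {
  # Strip one trailing slash from the base so the path is not doubled.
  url="${1%/}/api/v1/deploy?uuid=$2"
  if [ "${3:-}" = "force" ]; then
    url="$url&force=true"
  fi
  printf '%s\n' "$url"
}

# Usage (assumes .coolify.env has been sourced):
#   curl -s -X POST -H "Authorization: Bearer $COOLIFY_API_TOKEN" \
#     "$(coolify_deploy_url "$COOLIFY_URL" y4cscsc8s08c8808go0448s0 force)"
```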
## Coolify API Reference
```sh
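# The calls below use short names; map them from the variables in .coolify.env
URL="$COOLIFY_URL"
TOKEN="$COOLIFY_API_TOKEN"
# Applications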
curl -s -H "Authorization: Bearer $TOKEN" "$URL/api/v1/applications" # list all
curl -s -H "Authorization: Bearer $TOKEN" "$URL/api/v1/applications/<uuid>" # get one
# Services (dev containers, databases, etc.)
curl -s -H "Authorization: Bearer $TOKEN" "$URL/api/v1/services" # list all
curl -s -H "Authorization: Bearer $TOKEN" "$URL/api/v1/services/<uuid>" # get one
curl -s -X POST -H "Authorization: Bearer $TOKEN" "$URL/api/v1/services/<uuid>/start"
curl -s -X POST -H "Authorization: Bearer $TOKEN" "$URL/api/v1/services/<uuid>/stop"
# Deploy (works for both apps and services)
curl -s -X POST -H "Authorization: Bearer $TOKEN" "$URL/api/v1/deploy?uuid=<uuid>"
curl -s -X POST -H "Authorization: Bearer $TOKEN" "$URL/api/v1/deploy?uuid=<uuid>&force=true"
# Deployments
curl -s -H "Authorization: Bearer $TOKEN" "$URL/api/v1/deployments?resource_uuid=<uuid>&per_page=5"
```
There is no `/services/{uuid}/deploy` or `/applications/{uuid}/deploy` — those return 404. Always use `/deploy?uuid=...`.
## vibn-dev Docker Image
### Building
The image must be built ON the x86_64 Coolify host (Mac is ARM):
```sh
cd /Users/markhenderson/master-ai/vibn-dev
# Copy build context to host
gcloud compute scp --zone=northamerica-northeast1-a --recurse . coolify-server-mtl:/tmp/vibn-dev/
# Build on host
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="cd /tmp/vibn-dev && sudo docker build -t vibn-dev:latest ."
# Verify
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker images vibn-dev:latest --format '{{.Tag}} {{.Size}} {{.CreatedSince}}'"
```
### Critical: Tag Loss Problem
Every project's docker-compose file references `vibn-dev:latest` with `pull_policy: never`, so Docker never falls back to a registry. If the `vibn-dev:latest` tag goes missing (e.g., after a Docker prune, or when a subsequent build leaves it untagged), **ALL new dev containers will silently fail** with "No such image." Running containers survive because Docker keeps the image layers they were started from, but the tag itself is gone.
**Symptoms:**
- New project's dev container stays `exited` in Coolify
- `docker compose up` fails with "No such image: vibn-dev:latest"
- `devcontainer.status` returns `likelyFailed: true` but the AI can't see why
**Fix:** Rebuild the image (see above), then restart the container:
```sh
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker compose -f /data/coolify/services/<service_uuid>/docker-compose.yml up -d"
```
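Because the failure is silent, it can help to check for the tag before bringing a compose project up. The sketch below is meant to run on the Coolify host as a user allowed to talk to Docker (for example under `sudo`); both function names are hypothetical.

```sh
# Hypothetical helper: succeed only if the image reference resolves locally.
image_tag_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Hypothetical wrapper: refuse to start a compose project when the tag is missing.
# $1 = path to the service's docker-compose.yml
safe_compose_up() {
  if ! image_tag_exists vibn-dev:latest; then
    echo "vibn-dev:latest is missing; rebuild the image first" >&2
    return 1
  fi
  docker compose -f "$1" up -d
}
```

With this in place the missing tag shows up as an explicit message instead of an `exited` container.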
### Image Contents
The image is built from `ubuntu:24.04` and includes:
- Node.js LTS (v24.x) + npm
- Python 3.12 + pip
- Go 1.23 (via `/etc/profile.d/go.sh`, only in login shells)
- git, ripgrep, jq, build-essential, curl, wget, lsof, net-tools
- Supervisor + tini
- Runs as user `vibn` (uid 1000), working dir `/workspace`
No mise, nvm, or lazy installers — everything is pre-installed at the OS level.
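One consequence of the Go setup: `/etc/profile.d/go.sh` is only read by login shells, so a command run with `bash -c` may not find `go` while the same command under `bash -lc` should. A minimal sketch of a wrapper that always goes through a login shell; the name `exec_login` is hypothetical.

```sh
# Hypothetical helper: run a command inside a container through a login shell,
# so that /etc/profile.d/*.sh (including go.sh) is sourced first.
# $1 = container name, $2 = command string
exec_login() {
  docker exec "$1" bash -lc "$2"
}

# Usage on the Coolify host (container name is a placeholder):
#   exec_login <name> 'go version'
```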
## Debugging Dev Containers
### Direct Docker inspection (via gcloud)
```sh
# All vibn-dev containers
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker ps -a --filter 'name=vibn-dev' --format '{{.Names}} {{.Status}} {{.Image}}'"
# Check if vibn-dev image tag exists (MUST exist for new containers)
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker images vibn-dev --format '{{.Tag}} {{.Size}} {{.CreatedSince}}'"
# Docker Compose status — which services are actually running
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker compose ls"
# Why did a container exit?
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker inspect <name> --format 'ExitCode: {{.State.ExitCode}} Error: {{.State.Error}}'"
# Check tools installed in a container
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo docker exec <name> bash -c 'node --version; npm --version; python3 --version'"
# List compose files on disk
gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a \
--command="sudo ls /data/coolify/services/"
```
### Via vibn-logs SSH (limited Docker access)
```sh
ssh -i ~/.ssh/vibn-logs-local vibn-logs@34.19.250.135 \
"docker ps --filter 'name=vibn-dev' --format '{{.Names}} {{.Status}}'"
```
## Common Failure Modes
### 1. Dev container stuck `exited`
**Cause:** `vibn-dev:latest` tag missing from Docker host.
**Fix:** Rebuild image + restart compose (see "vibn-dev Docker Image" section).
### 2. Dev container stuck `provisioning` for minutes
**Cause:** Container never came up (image missing, build failed, resource issue).
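**AI sees:** `devcontainer.status → { likelyFailed: true }`. Since the `getDevContainerStatus` fix (see #7), it also gets `coolifyStatus` and `blockedReason` from Coolify's API.
**Fix:** Check the service's status through the Coolify API (`GET /api/v1/services/<uuid>`), then inspect the container on the host (see "Debugging Dev Containers").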
### 3. `npm: command not found` inside container
**Cause:** Container was created before the image was updated to pre-install Node. The old image relied on mise, which has since been removed.
**Fix:** Rebuild the vibn-dev image and restart the container.
### 4. Dev server shows `npm: command not found` in logs
Same cause and fix as #3: the container doesn't have Node. Rebuild the vibn-dev image and restart the container.
### 5. 15+ stale dev server rows
**Cause:** `startDevServer` wasn't cleaning up old rows when the process died. Each new start created a new row without marking old ones stopped.
**Fix:** Deployed — `startDevServer` now reaps ALL existing rows on the target port before creating a new one. Also force-kills orphaned listeners.
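For a manual check inside a dev container, orphaned listeners on a port can be found with `lsof`, which the image ships. This is only a shell sketch of the idea, not the `startDevServer` implementation; both function names are hypothetical.

```sh
# Hypothetical helper: print the PIDs of processes listening on a TCP port.
# $1 = port number
listener_pids() {
  lsof -t -iTCP:"$1" -sTCP:LISTEN
}

# Hypothetical helper: force-kill whatever still listens on the port.
kill_listeners() {
  for pid in $(listener_pids "$1"); do
    kill -9 "$pid"
  done
}
```

`listener_pids 3000` printing nothing means the port is free and a stale row for it can be treated as dead.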
### 6. DeepSeek 400 errors
**Cause:** OpenAI-compatible APIs require `tool_calls` to be immediately followed by matching `tool` messages. Historical messages with stale `toolCalls` (no tool responses persisted) trigger validation errors.
**Fix:** History loading strips `toolCalls` from persisted assistant messages. Diagnostic logging added to `callOpenAiCompatibleChat` — check server logs for `[deepseek]` entries.
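The transformation can be illustrated with `jq` on a JSON array of messages. This is a sketch only: the real stripping happens in the app's history loader, and the message shape used here (objects with `role` and `toolCalls` fields) is an assumption made for illustration.

```sh
# Hypothetical sketch: drop toolCalls from assistant messages in a JSON array
# read from stdin. Assumes each message is an object with a "role" field.
strip_stale_tool_calls() {
  jq -c 'map(if .role == "assistant" then del(.toolCalls) else . end)'
}

# Usage (history.json is a placeholder file name):
#   strip_stale_tool_calls < history.json
```

Messages from other roles pass through unchanged, so the history no longer contains tool calls without matching tool responses.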
### 7. AI loops on `devcontainer.status`
**Cause:** The AI had no visibility into WHY the container was stuck — only got `{ likelyFailed: true }` with no diagnostic detail.
**Fix:** Deployed — `getDevContainerStatus` now fetches Coolify's service status and returns `blockedReason` + `blockedHint`. System prompt tells AI to stop polling and report the reason to the user.
## Key Architecture Decisions
- **No Firebase**: Auth uses NextAuth.js with PostgreSQL.
- **No mise/nvm**: vibn-dev image pre-installs Node, Python, Go at the OS level.
- **Port 3000 default**: Only ports 3000-3009 have Traefik routers pre-allocated per project.
- **DeepSeek compat**: Historical `toolCalls` stripped on load. OpenAI-compatible APIs require tool responses to follow tool calls immediately.
- **Preview priority**: `dev-preview-priority.ts` sorts frontend dev servers first.
- **pull_policy: never**: Dev containers reference the local `vibn-dev:latest` image directly — no registry. The tag must exist on the host.
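The port constraint above is easy to check before starting a dev server. A minimal sketch; `is_routable_port` is a hypothetical helper, and the 3000-3009 range is the only fact taken from the list.

```sh
# Hypothetical helper: succeed if the port falls in the pre-routed 3000-3009 range.
is_routable_port() {
  case "$1" in
    300[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_routable_port "$PORT" || echo "port $PORT has no Traefik router" >&2
```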