Coolify's Service::parse() (called inside the deploy job) iterates ALL
ServiceApplications in a multi-container template and overwrites
SERVICE_URL_<NAME> / SERVICE_FQDN_<NAME> env vars based on whichever
inner app's fqdn it sees first — frequently a worker still pointing at
the auto-generated *.sslip.io fallback. The result: the user's custom
domain is saved, the SPA loads, but its baked-in REACT_APP_SERVER_BASE_URL
points at sslip.io and every API call 404s.
apps_domains_set now substitutes ${SERVICE_URL_<NAME>} and
${SERVICE_FQDN_<NAME>} (with optional _<port> suffix) directly in the
docker_compose_raw before calling updateCompose(), and stops calling
service->parse() — so the literal URL survives any future redeploy.
Solves Twenty CRM domain bug; also unlocks reliable custom domains for
n8n, Plausible, Mautic, and every other Coolify service template that
follows the same SERVICE_URL_X pattern.
Made-with: Cursor
Two product gaps surfaced from the twenty-live debugging session:
1. Vibn frontend now has a healthcheck on / port 3000. Coolify will
wait for the new container to be healthy before swapping traffic,
so deploys no longer drop in-flight chat SSE streams. (Setting was
applied via Coolify API; commit just documents.)
2. apps_domains_set now handles SERVICES (template-based apps like
Twenty CRM, n8n) — not just applications. Setting service_apps.fqdn
in the DB alone gets reverted by Coolify's deploy pipeline, so we
replicate the Livewire EditDomain.php save flow via tinker over SSH:
write fqdn → save → updateCompose() → service.parse(). After this
apps_deploy regenerates Traefik labels with the custom domain.
Auto-detects service vs application by uuid lookup. New { port }
parameter lets the AI pin the upstream port for services that
require one (Coolify hard-fails the save without it).
Tool description rewritten with the new behavior + a worked example
so the AI uses the right pipeline first try.
Made-with: Cursor
Round two of AI-hardening based on what bit us with the twenty-* fan-out:
1. apps_create idempotency now covers ALL four pathways (template /
image / composeRaw / repo), not just templates. Same dedup-by-name
check inside the project, same alreadyExisted: true response shape.
Pass force: true to opt out for legitimate dev/staging duplicates.
2. databases_create gets the same idempotency treatment — and now
also scopes to the per-project Coolify project when projectId is
supplied (previously only apps_create did this).
3. New shared helper findExistingResourceByName scans apps + services
+ databases in a project and matches case-insensitively on name.
4. System prompt: three new hard rules teaching the model how to
handle tool results and anchor on reality:
- Tool results are authoritative; conversation history is not.
If a tool contradicts an earlier assertion, discard the
assertion. Don't keep telling the user it's broken when
apps_get now says it's healthy.
- When the user reports an error, FIRST tool call is a
current-state read (apps_get / databases_get / apps_logs).
Stop re-debugging problems that were already fixed.
- Trust idempotency. alreadyExisted means done; don't loop
trying a different name.
Made-with: Cursor
Three changes that compound to fix the "4 orphan twenty-* services"
problem we just hit:
1. apps_create is now idempotent within a project. If a service from
the same template already exists in the same Vibn projectId, return
it with alreadyExisted: true instead of creating a clone. Pass
{ force: true } to opt out for legitimate dev/staging duplicates.
2. New apps_unstick tool. SSH-cleans orphan Docker containers
matching the resource UUID so a deploy that hit "Conflict.
The container name X is already in use" can recover without
deleting the entire service.
3. System prompt hardened with two new hard rules:
- ALWAYS apps_list before apps_create (idempotency in spirit, not
just at the API boundary)
- NEVER delete-and-recreate a service to escape an error. The
recovery for container conflicts is apps_unstick + apps_deploy.
Already cleaned the 3 duplicate twenty-* services from prod
(kept twenty-live, freshest healthy). Frees ~9 GB RAM on the host.
Made-with: Cursor
Adds MCP tools so the AI can capture decisions, tasks, ideas, and the
vision in the moment instead of just reading them:
- plan_get read full plan for context
- plan_vision_set update vision when user refines their pitch
- plan_decision_log log a decision PROACTIVELY when one gets settled
(no permission ask) so the next session doesn't
re-litigate it
- plan_task_add track multi-step work or user-side follow-ups
- plan_task_complete mark done as we go
- plan_idea_add park stray ideas
System prompt is updated with a "be the user's scribe" section that
instructs the model to use these proactively with brief acks instead
of long confirmations.
Also reorders the Plan tab UI to: Vision · Tasks · Decisions · Ideas
(Ideas moved to bottom — it's the lowest-signal pile).
Made-with: Cursor
Anatomy + UI rewrite — locked the conceptual model after user feedback:
Product = "what makes up the thing you're shipping":
- Codebases (Gitea repos)
- Images (Coolify services backed by upstream Docker images: Twenty
CRM, n8n, etc.)
- Dev containers no longer surface here. The vibn-dev-* container is
the AI's workshop, not a product surface; previews it serves still
appear under Hosting → Previews.
Hosting = "where it lives + how it gets there", unified:
- Live: every running endpoint as one list. Each item carries a
source badge ("repo" | "image"), status dot, attached domain, and
last-build summary inline. No separate Build, Domains or Services
categories — those are properties on each Live item.
- Previews: dev container preview URLs (unchanged).
Anatomy endpoint reshaped accordingly:
- product.{codebases, images}
- hosting.{live, previews} (was production/services/previewUrls/domains)
- lastBuild summary fetched per repo-app via listApplicationDeployments
in parallel.
ProjectStagePill rewired to derive Live/Down/Building from hosting.live
+ hosting.previews. dev-container-detail.tsx removed.
services.* MCP tools added so AI agents can manage Coolify services
(Twenty CRM, n8n, …) the same way they manage apps:
- services.list, services.get
- services.start, services.stop
- services.envs.list, services.envs.upsert
All tenant-scoped via getServiceInWorkspace + getOwnedCoolifyProjectUuids.
vibn-dev-* containers stay hidden from services.list.
Made-with: Cursor
After "ship" succeeded the AI was burning 7+ follow-up tool calls
(gitea_repos_list, gitea_credentials, shell.exec×4, apps_list) trying
to verify what actually got pushed and where it deployed. That ate
through MAX_TOOL_ROUNDS and the user got tool-icon spam with no
narrative summary.
Three fixes:
1. ship now returns commitSha (parsed from `git rev-parse HEAD`),
giteaCommitUrl, giteaBranchUrl, coolifyDeployUrl, coolifyAppUuid,
and a summaryHint string telling the AI exactly what to say next.
2. ship's tool description now explicitly tells Gemini "do NOT call
gitea_*, shell_exec, or apps_* afterwards to verify — the result
is authoritative."
3. MAX_TOOL_ROUNDS 12 → 18 as a safety net for genuinely long chains.
Net effect: ship goes from ~12 tool calls to verify a deploy down to
1 (just ship itself), and the next text turn has the SHA + URL
inline.
Made-with: Cursor
Five focused improvements rolled into one deploy:
1. Pre-allocated preview ports + Traefik labels.
Bake docker labels for ports 3000-3009 into every dev-container
compose at ensureDevContainer() time. Each port has its own
subdomain: preview-<slot>-<projectSlug>-<token>.preview.vibnai.com.
Token is derived from projectId so URLs are stable across restarts
but not enumerable across projects. Joins the coolify external
network so Traefik can reach the container.
This avoids the runtime compose-mutation approach (which would
have required a Coolify redeploy on every dev_server.start, ~30s
latency). The trade-off is a hard cap of 10 concurrent dev servers
per project — fine for the "frontend + API" scenario, the only one
we can practically envision.
Wildcard DNS + Traefik DNS-01 cert remain a manual one-time setup
(see vibn-dev/PREVIEWS.md).
2. dev_server.start: port-collision handling.
Detect listeners via `ss` + `lsof` before launching. Three outcomes:
- port out of slot range → PortOutOfRangeError → 400 with allowedRange
- port owned by a different process → PortBusyError → 409
- port owned by a tracked vibn dev server (same project) → kill
the stale row and reuse the slot (most-recent-write-wins; matches
AI mental model when it does an edit-restart loop)
Surfaced via dedicated MCP error codes so the AI can recover
intelligently instead of looping the same start call.
3. gitea_file_{read,write,delete}: hard-removed from AI tool list.
These tools competed with fs.* and tempted the AI into the slow
path. Pulled from VIBN_TOOL_DEFINITIONS but kept in the MCP
dispatcher for 30 days for any external clients still using them.
System prompt rewritten to make Path B the only documented way to
author code; gitea_repo_* + gitea_branches_* remain because they
handle one-time orchestration with no fs.* equivalent.
4. System prompt: HMR + preview-port discipline.
New section covering Vite HMR (clientPort:443 wss), Next dev
(-H 0.0.0.0), and Express (HOST=0.0.0.0). Explicit "ports must be
3000-3009" rule + "if PORT_BUSY don't blindly retry" guidance.
5. Cron docs (vibn-dev/CRON.md).
/etc/cron.d/vibn-path-b template + smoke commands for autosave
and idle-sweep. Wires both 5-minute jobs that already have admin
endpoints (POST /api/admin/path-b/{autosave,idle-sweep}).
MCP version bump 2.6.0 -> 2.7.0. Smoke test: 65 tool defs (down from
68 after gitea_file_* removal), all accepted by Gemini.
Made-with: Cursor
Two bugs caught by the live end-to-end test:
1. Tool dispatch mismatch.
Gemini tool name "dev_server_list" runs through executeMcpTool's
_-to-. converter (toolName.replace(/_/g, '.')) and arrives as
"dev.server.list". The dispatcher only had cases for "dev_server.list",
so all four dev_server.* tools 404'd as "Unknown tool".
The AI gracefully fell back to shell.exec + nohup, so Express still
ran — but the dev_servers table never got populated and the preview
URL machinery was bypassed. Add aliases for both underscore and
fully-dotted forms.
2. State machine never transitioned.
ensureDevContainer wrote state='provisioning'; nothing ever flipped
it to 'running'. As a result the idle-sweep (which filters by
state='running') never saw a candidate to suspend.
Use the first successful exec as the authoritative liveness signal:
touchActivity() now also flips provisioning|suspended → running and
clears suspended_at.
Surfaced by the live trace: AI tried dev_server_list, got 404, fell
back to manually grepping the process table.
Made-with: Cursor
Completes the rest of the Path B tool surface:
- dev_server.{start,stop,list,logs}: nohup processes inside the dev
container, track PID/port/preview-url in fs_dev_servers. Each gets
a randomized preview subdomain (preview.vibnai.com base; Traefik
wildcard wiring is staged in /vibn-dev/PREVIEWS.md but the Coolify
compose hot-update step is deferred — see file for the recommended
pre-allocated-port-range approach).
- ship: git init (if needed) -> add/commit/push to the project's
Gitea repo via the workspace bot PAT, then triggers a Coolify
production deploy if the project is linked to one. Returns push
output + deployment_uuid.
- /api/admin/path-b/autosave [POST { projectId | sweep:true }]:
force-pushes /workspace to vibn-autosave/main in Gitea. Throttled
to once per 5 min per project. Records every push in fs_dev_autosaves
for audit. Treat Gitea as canonical, container disk as ephemeral.
- /api/admin/path-b/idle-sweep [POST?minutes=30]: suspends every
running dev container whose last_active_at is older than `minutes`.
Wire to a 5-min cron. Idempotent.
- Compose template hardened: pull_policy: never (use locally-built
image, no registry round-trip) + per-project bridge network
(vibn-dev-net-<slug>) so dev containers can't reach internal Vibn
services.
- vibn-dev/setup-on-coolify.sh: one-shot script to build vibn-dev:latest
on the Coolify host. Run before first chat session uses Path B.
- vibn-tools.ts: dev_server_{start,stop,list,logs} + ship Gemini tool
defs added. Smoke test passes — 68 tool definitions accepted.
- MCP version 2.5.0 -> 2.6.0 so /api/mcp tells us when the new build
is live.
Plan doc updated to reflect what shipped vs what's still manual
(DNS wildcard, Traefik cert, build-on-host script run, gitea_file_*
hard-remove deferred to allow A/B).
Made-with: Cursor
Kicks off Path B (AI_PATH_B_EXECUTION_PLAN.md): each Vibn project gets
its own vibn-dev Coolify service that the AI drives directly via shell
and filesystem tools. Sub-second iteration vs the 5-min Gitea redeploy
loop.
What's in this commit (week 1, slice 1):
- vibn-dev Dockerfile: small Ubuntu base (~500 MB target). git, ripgrep,
python3, mise. Language toolchains lazy-install on first use.
- lib/dev-container.ts: ensureDevContainer / suspend / resume /
execInDevContainer. Backed by a new fs_project_dev_containers table.
- lib/feature-flags.ts + /api/admin/path-b/{disable,enable}: kill switch.
Bearer NEXTAUTH_SECRET flips path_b_disabled, propagates in ~10s.
- New MCP tools wired into /api/mcp: devcontainer.{ensure,status,suspend},
shell.exec, fs.{read,write,edit,list,delete,glob,grep}. All enforce
workspace isolation via fs_projects ownership check.
- vibn-tools.ts: 11 new Gemini tool defs (smoke test passes, 63 total).
- chat system prompt: shell-first guidance; gitea_file_* marked
deprecated for iterative work (still available, removed week 3).
Safety nets baked in:
- pathBGuard() returns 503 from every Path B tool when the kill switch
flips
- fs.* paths locked to /workspace
- ensureResourceInWorkspaceProjects via fs_project_dev_containers PK
- per-project resource limits (1 vCPU, 1 GiB RAM) on the compose spec
Still pending (queued):
- dev_server.* (preview URLs through Traefik)
- ship tool (push to Gitea + trigger prod deploy)
- auto-push autosave to vibn-autosave/main every 5 min
- idle-suspend cron after 30 min inactivity
- HMR-through-Traefik spike
- eval harness
Made-with: Cursor
Closes the AI's self-reported gap: "I cannot directly commit or push code".
New MCP capabilities (8) — all scoped to the workspace's Gitea org via
requireGiteaOrg + ensureRepoOwnerInOrg:
- gitea.repos.list — discover existing repos
- gitea.repo.get — metadata (default branch, clone URL)
- gitea.repo.create — mint a new private repo with auto-init
- gitea.file.read — read a file (or list a directory)
- gitea.file.write — create/update one file in one commit
- gitea.file.delete — delete a file (auto-resolves sha)
- gitea.branches.list — list branches with head sha
- gitea.branch.create — branch off an existing branch
Wired through:
- lib/gitea.ts: giteaReadFile, giteaListContents, giteaListBranches,
giteaCreateBranch, giteaListOrgRepos, giteaDeleteFile.
- lib/ai/vibn-tools.ts: 8 new Gemini tool declarations (53 total).
- app/api/chat/route.ts: system prompt now teaches the end-to-end
scaffold-then-deploy recipe so the AI stops deferring to the user.
MCP capability descriptor bumped to version 2.5.0.
Made-with: Cursor
Stage 3 of per-project Coolify isolation. Adds an authoritative ownership
table so apps_list { projectId } returns ONLY the resources actually owned
by that Vibn project, even when multiple Vibn projects share a single
Coolify project (the legacy workspace-level vibn-ws-{slug}).
- New table fs_project_resources (project_id, resource_uuid, type, workspace).
Auto-created on first use.
- lib/projects.ts: linkResourceToProject / unlinkResource /
getProjectResourceUuids / getProjectIdForResource helpers.
- apps_list { projectId }: when the project's coolifyProjectUuid equals the
legacy workspace project, restrict results to explicitly-linked resources.
When it has a dedicated Coolify project, return everything in that project.
- apps_create / databases_create: auto-link the newly-created resource to
the requesting Vibn project.
- apps_delete / databases_delete / services_delete: unlink on success.
- projects_get → possibleDeployments: prefer explicit links; fuzzy-match
fallback only fires when no link table entry exists yet.
- POST /api/projects/backfill-isolation: idempotent migration that mints a
dedicated Coolify project for every Vibn project AND records existing
coolifyServiceUuid/coolifyAppUuid/coolifyDatabaseUuid links. Resolves
the "Twenty CRM project shows n8n" bug for legacy projects without
needing to physically move services in Coolify.
Made-with: Cursor
Each Vibn project now gets its OWN Coolify project named
vibn-{workspace-slug}-{project-slug}. All apps/databases/services
deployed for the project land inside that Coolify project, giving
us clean grouping, cascading delete, and per-project domain
namespaces.
Changes:
- New lib/projects.ts: ensureProjectCoolifyProject (idempotent
create/lookup), getProjectCoolifyUuid, getOwnedCoolifyProjectUuids
- /api/projects/create: pre-insert row, mint per-project Coolify
project, then complete the row with productData (preserves the
coolifyProjectUuid that was just set)
- apps.list (MCP): without projectId, aggregates across ALL
workspace-owned Coolify projects; with projectId, scopes to
that project's Coolify project. Returns coolifyProjectUuid
on each result so the AI knows where things live.
- apps.create (MCP): accepts projectId; auto-mints the Vibn
project's Coolify project on first deploy if missing
- apps_list/apps_create tool defs: projectId param surfaced
- System prompt: Project as first-class — planning + live as
facets of ONE thing, never as separate worlds. AI told to
always pass projectId on apps_create.
Stage 2 (next): set-aware ensureResourceInProject across all
single-resource MCP tools (apps.get/delete/exec/etc.) and
cascading delete via projects.delete.
Made-with: Cursor
Project records and Coolify deployments live in separate worlds —
nothing writes the Coolify UUID back into fs_projects.data on deploy.
Now projects.get also scans apps + services in the workspace and
returns any whose name fuzzy-matches the project (lowercased token
overlap), plus any explicitly-linked one. Self-healing forever; the
AI can immediately tell the user what's running for a project even
when the link was never stored.
Made-with: Cursor
Coolify's /api/v1/services response does not include a `project` field.
Services belong to environments and environments belong to projects.
The old filter checked s.project.uuid (always undefined) and silently
dropped every service from the result, so compose-stack apps like
Twenty CRM never showed up in apps.list.
Now we resolve the project's environment IDs via getProject() and
filter services where environment_id is in that set. Also surface the
public service's fqdn in the response (extracted from s.applications)
so the AI can immediately tell the user where the app lives.
Made-with: Cursor
projects_get was dumping raw JSONB including turborepo scaffold fields
(product/website/admin/storybook sub-app configs), which Gemini mistook
for live deployed services. Now returns a clean summary with only the
fields relevant to the AI. Also updated the system prompt to explicitly
distinguish Vibn project records (planning artifacts) from Coolify apps
(actual running services), instructing the model to call apps_list when
the user asks what's live.
Made-with: Cursor
The Coolify UI shows a "Required Port: 3000 — All domains must
include this port number" hint on service templates. That hint is
load-bearing: when the URL passed to `setServiceDomains` includes
:<upstream_port>, Coolify's template engine auto-generates everything
that 2.4.5-2.4.7 were doing by hand:
- traefik.http.services.<svc>.loadbalancer.server.port label
- SERVICE_FQDN_<APP>=<fqdn> (no sslip.io leak)
- SERVICE_URL_<APP>=https://<fqdn>
- SERVICE_FQDN_<APP>_<PORT>=<fqdn>:<port>
- SERVICE_URL_<APP>_<PORT>=https://<fqdn>:<port>
Verified end-to-end with twenty:
setServiceDomains(uuid, [{ name:'twenty', url:'https://crm.mark.vibnai.com:3000' }])
followed by `compose up -d --force-recreate twenty` produced HTTP/2
200 from https://crm.mark.vibnai.com on first hit, with the
loadbalancer label present, .env clean, and zero env-rewriting
required.
Changes:
- apps.create template path now reads template.port from the catalog
and calls setServiceDomains with https://<fqdn>:<port>
- listServiceTemplates now accepts port as either number or numeric
string (Coolify ships both shapes in the catalog)
- applyCoolifyPostDeployFixes simplified from ~200 lines to ~50:
drops env rewrite, label injection, and force-recreate steps;
keeps proxy network attach + (background) proxy restart
- CoolifyPostDeployResult.steps shrinks to { proxyNetwork, proxyRestart }
- Removes the python:3-alpine SSH dependency entirely
- buildPythonRunner helper removed
Made-with: Cursor
The post-deploy step that restarts coolify-proxy was running
synchronously inside the HTTP request handler. coolify-proxy is the
same gateway that's serving the request itself, so the restart
killed our outbound response mid-flight — the agent saw curl exit
16 (HTTP/2 framing error) instead of our nicely-formatted result.
Switch to a fire-and-forget shell:
nohup sh -c '(sleep 3 && docker restart coolify-proxy) ...' &
The SSH command returns within ~50ms, we finish the HTTP response,
and Traefik re-discovers labels 3s later — same end state as before
but without breaking the calling request.
Made-with: Cursor
Two fixes for transient Coolify queue lag observed when smoke-testing
v2.4.5:
1. ensureServiceReachable no longer false-fails on early `exited` status.
Coolify's queue worker can take 60-120s to dequeue a `start` request;
during that window service.applications[*].status returns the stale
`exited` (= "never started") state. Previously the polling loop
treated that as terminal failure after 90s, returning started:false
on stacks that were about to come up healthy.
The new logic requires "evidence of activity" (status starting:* or
running:* seen at least once) before treating subsequent `exited`
reports as terminal. Until activity is observed, the loop just keeps
polling up to the 8-min health timeout.
2. apps.repair (new tool). Re-runs the three post-deploy patches
(env rewrite, traefik port label, coolify-proxy network attach +
force-recreate + proxy restart) against an existing service without
recreating it. Useful when:
- apps.create returned started:false but containers eventually
came up (now the polling fix should make this rare)
- a deploy succeeded mechanically but is serving Traefik 503 or
Mixed Content
- a user rotates a custom domain on an existing app
Params: { uuid, fqdn, publicAppName, port? }
Returns: { reachable, postDeploy: { steps }, probe }
Version bumped to 2.4.6.
Made-with: Cursor
apps.create for service templates now lets Coolify's queue do the
full deploy (compose generation, volumes, internal networking,
healthchecks) and applies three surgical post-deploy fixes that
Coolify's REST API does NOT expose:
1. Rewrites SERVICE_FQDN_* / SERVICE_URL_* in the rendered .env so
frontends that bake their backend URL into the SPA bundle
(Twenty's SERVER_URL, n8n, etc.) point at the real custom domain
instead of the auto-generated sslip.io URL. Without this fix
Twenty's frontend loads on the real HTTPS domain but fires XHRs
at insecure sslip.io, blocking everything as Mixed Content.
2. Injects the missing
traefik.http.services.<svc>.loadbalancer.server.port label.
Coolify generates the routing rules but forgets the port, so
Traefik logs "error: port is missing" and returns 503 forever.
3. Connects coolify-proxy to the project network (Coolify writes a
caddy_ingress_network=<uuid> hint label but never actually runs
docker network connect), then force-recreates ONLY the
public-facing container so the new env+label apply, and
restarts the proxy so Traefik re-discovers.
Polling switches from service.status (which routinely lies as
"starting:unknown" while containers are actually healthy) to the
truthful per-application service.applications[*].status field.
Removes the SSH "docker compose up -d" fallback that v2.4.1-2.4.4
used. That fallback bypassed Coolify's full pipeline, causing
internal services like Postgres/Redis to land on the shared coolify
network where DNS aliases collided with coolify-db/coolify-redis,
producing the "password authentication failed" regression we saw
on Twenty deploys. With v2.4.5 internal services stay on their
isolated project network — only the public app crosses to the
proxy.
Response shape gains: reachable (boolean for HTTPS 2xx/3xx),
appStatus (truthful per-app status from Coolify), postDeploy
(step-by-step diagnostic for each of the three fixes). Existing
started/startDiag fields kept for back-compat.
apps.containers.up / apps.containers.ps remain unchanged for
manual user recovery.
Made-with: Cursor
v2.4.3 attached every stack container to the `coolify` network so
Traefik could reach the public container. But that network also hosts
coolify-db (alias `postgres`) and coolify-redis (alias `redis`).
Docker's embedded DNS resolves unqualified hostnames to the first
container with that name on the network, so once Twenty's
`postgres-<uuid>` joined the coolify network, Twenty's connection
string `postgres://postgres:5432/...` started resolving to coolify-db
and auth-failing in a tight restart loop.
Coolify's own pipeline only attaches the proxied container — filter
by the `traefik.enable=true` label so internal stack members (db,
redis, worker) stay isolated on the project network.
Made-with: Cursor
The Twenty (and any service-template) stack was reachable on its private
project network but invisible to coolify-proxy/Traefik because no
container was joined to the `coolify` network. Public URLs like
crm.mark.vibnai.com returned 503 "no available server" even though the
underlying app was healthy.
Coolify's UI deploy attaches the proxy network as a post-step after the
full stack is up. When a sidecar (e.g. Twenty's worker, which waits ~3
min on twenty's healthcheck) fails its depends_on gate, that post-step
can be skipped and the stack is left isolated.
composeUp now calls attachToCoolifyProxyNetwork() after compose
finishes (best-effort, idempotent), and ensureServiceUp does the same
on the Coolify-queue happy path. Single apps.create call should now
result in a publicly reachable app.
Made-with: Cursor
Coolify's `compose up -d` returns non-zero whenever any sidecar container
hits a `depends_on: condition: service_healthy` timeout. For slow-booting
apps like Twenty (where the worker waits ~3 min for twenty's healthcheck),
this caused apps.create to return started=false even when the primary
stack was running fine.
Now ensureServiceUp probes the host with `docker ps` after a non-zero
compose exit and returns started=true whenever any container is running,
surfacing the compose stderr in startDiag so agents can decide whether
to retry apps.containers.up later.
Made-with: Cursor
Coolify's POST /services/{uuid}/start writes the rendered compose
files but its Laravel queue worker routinely fails to actually
invoke `docker compose up -d`. Until now agents had to SSH to
recover. For an MVP that promises "tell vibn what app you want,
get a URL", that's unacceptable.
- lib/coolify-compose.ts: composeUp/composeDown/composePs over SSH
via a one-shot docker:cli container that bind-mounts the rendered
compose dir (works around vibn-logs being in docker group but not
having read access to /data/coolify/services).
- apps.create (template + composeRaw pathways) now uses
ensureServiceUp which probes whether Coolify's queue actually
spawned containers and falls back to direct docker compose up -d
if not. Result includes startMethod for visibility.
- apps.containers.up / apps.containers.ps exposed as MCP tools for
recovery scenarios and post-env-change recreations.
- Tenant safety: resolveAppOrService validates uuid against the
caller's project before touching anything on the host.
Made-with: Cursor
Adds Coolify one-click template support — 320+ vetted apps deployable
in one MCP call (Twenty, n8n, Supabase, Ghost, etc).
- apps.create gains a 4th pathway: { template: "<slug>", ... }. Auto-
rewrites the Coolify-assigned sslip URL to the workspace FQDN and
applies user envs before starting.
- apps.templates.list / apps.templates.search expose the catalog so
agents can discover slugs. Catalog is fetched from upstream GitHub
and cached in-memory for 1h.
- lib/coolify.ts: + setServiceDomains, updateService, listService-
Templates, searchServiceTemplates. Reuses existing createService.
- next.config.ts: externalize ssh2 + cpu-features from turbopack so
`next build` can complete (native .node binaries can't be ESM-bundled).
Made-with: Cursor
Coolify's /applications/dockercompose creates a Service (not Application)
with its own API surface. Wire it up correctly:
lib/coolify.ts
- createDockerComposeApp returns { uuid, resourceType: 'service' }
- Add startService, stopService, getService, listAllServices helpers
- Add listServiceEnvs, upsertServiceEnv, bulkUpsertServiceEnvs for
the /services/{uuid}/envs endpoint
app/api/mcp/route.ts
- toolAppsList: includes Services (compose stacks) alongside Applications
- toolAppsDeploy: falls back to /services/{uuid}/start for service UUIDs
- toolAppsCreate composeRaw path: uses upsertServiceEnv + startService
instead of Application deploy; notes that domain routing must be
configured post-startup via SERVER_URL env
Made-with: Cursor
Third-party apps (Twenty, Directus, Cal.com, Plane…) should never need
a Gitea repo. This adds two new apps.create pathways:
image: "twentyhq/twenty:1.23.0" → Coolify /applications/dockerimage
composeRaw: "services:\n..." → Coolify /applications/dockercompose
No repo is created, no git clone, no PAT embedding. Agents can fetch
the official docker-compose.yml and pass it inline, or just give an
image name. Pathway 1 (repo) is unchanged.
Also adds volume management tools so agents can self-recover from the
most common compose failure (stale DB volume blocking fresh migrations):
apps.volumes.list { uuid } → list volumes + sizes
apps.volumes.wipe { uuid, volume, confirm } → stop containers,
rm volume, done
Both volume tools go through the same vibn-logs SSH channel. The wipe
tool requires confirm == volume name to prevent accidents and verifies
the volume belongs to the target app (uuid in name).
lib/coolify.ts: createDockerImageApp + createDockerComposeApp helpers,
dockerimage added to CoolifyBuildPack union.
app/api/mcp/route.ts: resolveFqdn + applyEnvsAndDeploy extracted as
shared helpers; toolAppsCreate now dispatches on image/composeRaw/repo.
toolAppsVolumesList + toolAppsVolumesWipe added.
sq() moved to module scope (shared by exec + volumes tools).
Version bumped to 2.3.0.
Made-with: Cursor
Companion to apps.logs. SSH to the Coolify host as vibn-logs, resolve
the target container by app uuid + service, and run the caller's
command through `docker exec ... sh -lc`. No TTY, no stdin — this is
the write-path sibling of apps.logs, purpose-built for migrations,
seeds, CLI invocations, and ad-hoc debugging.
- lib/coolify-containers.ts extracts container enumeration + service
resolution into a shared helper used by both logs and exec.
- lib/coolify-exec.ts wraps docker exec with timeout (60s default,
10-min cap), output byte cap (1 MB default, 5 MB cap), optional
--user / --workdir, and structured audit logging of the command +
target (never the output).
- app/api/mcp/route.ts wires `apps.exec` into the dispatcher and
advertises it in the capabilities manifest.
- app/api/workspaces/[slug]/apps/[uuid]/exec/route.ts exposes the same
tool over REST for session-cookie callers.
Tenant safety: every entrypoint runs getApplicationInProject before
touching SSH, so an agent can only exec in apps belonging to their
workspace.
Made-with: Cursor
Two live-test bugs surfaced while deploying Twenty CRM:
1. apps.domains.set silently 422'd on compose apps
Coolify hard-rejects top-level `domains` for dockercompose build
packs — they must use `docker_compose_domains` (per-service JSON).
setApplicationDomains now detects build_pack (fetched via GET if
not passed) and dispatches correctly. Default service is `server`
(matches Twenty, Plane, Cal.com); override with `service` param.
2. apps.update silently dropped unrecognised fields
Caller got `{ok:true}` even when zero fields persisted. This
created false-positive "bug reports" (e.g. the user-reported
"fqdn returns ok but doesn't persist" — fqdn was never forwarded
at all). apps.update now returns:
- applied: fields that were forwarded to Coolify
- ignored: unknown fields (agent typos, stale field names)
- rerouted: fields that belong to a different tool
(fqdn/domains → apps.domains.set,
git_repository → apps.rewire_git)
400 when nothing applied, 200 with diagnostics otherwise.
Made-with: Cursor
The per-workspace GCS backend (bucket, service account, HMAC keys) was
already provisioned for P5.3 but wasn't reachable through MCP, so
agents using vibn_sk_* tokens couldn't actually use object storage.
Three new tools:
- storage.describe → bucket, region, endpoint, access_key_id.
No secret in response.
- storage.provision → idempotent ensureWorkspaceGcsProvisioned().
- storage.inject_env → writes STORAGE_* (or user-chosen prefix) env
vars into a Coolify app. SECRET_ACCESS_KEY is
tagged is_shown_once so Coolify masks it in
the UI, and it never leaves our backend — the
agent kicks off injection, but the HMAC secret
is read from our DB and pushed directly to
Coolify.
Apps can then hit the bucket with any S3 SDK (aws-sdk, boto3, etc.)
using force_path_style=true and the standard endpoint.
Made-with: Cursor
Coolify was failing all Gitea clones with "Permission denied (publickey)"
because the helper container's SSH hits git.vibnai.com:22 (Ubuntu host
sshd, which doesn't know Gitea keys), while Gitea's builtin SSH is on
host port 22222 (not publicly reachable).
Rather than fight the SSH topology, switch every Vibn-provisioned app
to clone over HTTPS with the workspace bot's PAT embedded in the URL.
The PAT is already stored encrypted per workspace and scoped to that
org, so this gives equivalent isolation with zero SSH dependency.
Changes:
- lib/naming.ts: add giteaHttpsUrl() + redactGiteaHttpsUrl(); mark
giteaSshUrl() as deprecated-for-deploys with a comment.
- lib/coolify.ts: extend CreatePublicAppOpts with install/build/start
commands, base_directory, dockerfile_location, docker_compose_location,
manual_webhook_secret_gitea so it's at parity with the SSH variant.
- app/api/mcp/route.ts:
- apps.create now uses createPublicApp(giteaHttpsUrl(...)) and pulls
the bot PAT via getWorkspaceBotCredentials(). No more private-
deploy-key path for new apps.
- apps.update adds git_commit_sha + docker_compose_location to the
whitelist.
- New apps.rewire_git tool: re-points an app's git_repository at the
canonical HTTPS+PAT URL. Unblocks older apps stuck on SSH URLs
and provides a path for PAT rotation without rebuilding the app.
- lib/gitea.ts: createUser() now issues an immediate PATCH to set
active: true. Gitea's admin-create endpoint creates users as inactive
by default, and inactive users fail permission checks even though
they're org members. GiteaUser gains optional `active` field.
- scripts/activate-workspace-bots.ts: idempotent backfill that flips
active=true for any existing workspace bot that was created before
this fix. Safe to re-run.
- AI_CAPABILITIES.md: document apps.rewire_git; clarify apps.create
uses HTTPS+PAT (no SSH).
Already unblocked in prod for the mark workspace:
- vibn-bot-mark activated.
- twenty-crm's git_repository PATCHed to HTTPS+PAT form; git clone
now succeeds (remaining unrelated error: docker-compose file path).
Made-with: Cursor
Coolify v4's POST/PATCH /applications/{uuid}/envs only accepts key,
value, is_preview, is_literal, is_multiline, is_shown_once. Sending
is_build_time triggers a 422 "This field is not allowed." — it's now
a derived read-only flag (is_buildtime) computed from Dockerfile ARG
usage. Breaks agents trying to upsert env vars.
Three-layer fix so this can't regress:
- lib/coolify.ts: COOLIFY_ENV_WRITE_FIELDS whitelist enforced at the
network boundary, regardless of caller shape
- app/api/workspaces/[slug]/apps/[uuid]/envs: stops forwarding the
field; returns a deprecation warning when callers send it; GET
reads both is_buildtime and is_build_time for version parity
- app/api/mcp/route.ts: same treatment in the MCP dispatcher;
AI_CAPABILITIES.md doc corrected
Also bundles (not related to the above):
- Workspace API keys are now revealable from settings. New
key_encrypted column stores AES-256-GCM(VIBN_SECRETS_KEY, token).
POST /api/workspaces/[slug]/keys/[keyId]/reveal returns plaintext
for session principals only; API-key principals cannot reveal
siblings. Legacy keys stay valid for auth but can't reveal.
- P5.3 Object storage: lib/gcp/storage.ts + lib/workspace-gcs.ts
idempotently provision a per-workspace GCS bucket, service
account, IAM binding and HMAC key. New POST /api/workspaces/
[slug]/storage/buckets endpoint. Migration script + smoke test
included. Proven end-to-end against prod master-ai-484822.
Made-with: Cursor
Adds end-to-end custom apex domain support: workspace-scoped
registration via OpenSRS (Tucows), authoritative DNS via Google
Cloud DNS, and one-call attach that wires registrar nameservers,
DNS records, and Coolify app routing in a single transactional
flow.
Schema (additive, idempotent — run /api/admin/migrate after deploy)
- vibn_workspaces.dns_provider TEXT DEFAULT 'cloud_dns'
Per-workspace DNS backend choice. Future: 'cira_dzone' for
strict CA-only residency on .ca.
- vibn_domains
One row per registered/intended apex. Tracks status
(pending|active|failed|expired), registrar order id, encrypted
registrar manage-user creds (AES-256-GCM, VIBN_SECRETS_KEY),
period, dates, dns_provider/zone_id/nameservers, and a
created_by audit field.
- vibn_domain_events
Append-only lifecycle audit (register.attempt/success/fail,
attach.success, ns.update, lock.toggle, etc).
- vibn_billing_ledger
Workspace-scoped money ledger (CAD by default) with
ref_type/ref_id back to the originating row.
OpenSRS XML client (lib/opensrs.ts)
- Mode-gated host/key (OPENSRS_MODE=test → horizon sandbox,
rejectUnauthorized:false; live → rr-n1-tor, strict TLS).
- MD5 double-hash signature.
- Pure Node https module (no undici dep).
- Verbs: lookupDomain, getDomainPrice, checkDomain, registerDomain,
updateDomainNameservers, setDomainLock, getResellerBalance.
- TLD policy: minPeriodFor() bumps .ai to 2y; CPR/legalType
plumbed through for .ca; registrations default to UNLOCKED so
immediate NS updates succeed without a lock toggle.
DNS provider abstraction (lib/dns/{provider,cloud-dns}.ts)
- DnsProvider interface (createZone/getZone/setRecords/deleteZone)
so the workspace residency knob can swap backends later.
- cloudDnsProvider implementation against Google Cloud DNS using
the existing vibn-workspace-provisioner SA (roles/dns.admin).
- Idempotent zone creation, additions+deletions diff for rrsets.
Shared GCP auth (lib/gcp-auth.ts)
- Single getGcpAccessToken() helper used by Cloud DNS today and
future GCP integrations. Prefers GOOGLE_SERVICE_ACCOUNT_KEY_B64,
falls back to ADC.
Workspace-scoped helpers (lib/domains.ts)
- listDomainsForWorkspace, getDomainForWorkspace, createDomainIntent,
markDomainRegistered, markDomainFailed, markDomainAttached,
recordDomainEvent, recordLedgerEntry.
Attach orchestrator (lib/domain-attach.ts)
Single function attachDomain() reused by REST + MCP. For one
apex it:
1. Resolves target → Coolify app uuid OR raw IP OR CNAME.
2. Ensures Cloud DNS managed zone exists.
3. Writes A / CNAME records (apex + requested subdomains).
4. Updates registrar nameservers, with auto unlock-retry-relock
fallback for TLDs that reject NS changes while locked.
5. PATCHes the Coolify application's domain list so Traefik
routes the new hostname.
6. Persists dns_provider/zone_id/nameservers and emits an
attach.success domain_event.
AttachError carries a stable .tag + http status so the caller
can map registrar/dns/coolify failures cleanly.
REST endpoints
- POST /api/workspaces/[slug]/domains/search
- GET /api/workspaces/[slug]/domains
- POST /api/workspaces/[slug]/domains
- GET /api/workspaces/[slug]/domains/[domain]
- POST /api/workspaces/[slug]/domains/[domain]/attach
All routes go through requireWorkspacePrincipal (session OR
Authorization: Bearer vibn_sk_...). Register is idempotent:
re-issuing for an existing intent re-attempts at OpenSRS without
duplicating the row or charging twice.
MCP bridge (app/api/mcp/route.ts → version 2.2.0)
Adds five tools backed by the same library code:
- domains.search (batch availability + pricing)
- domains.list (workspace-owned)
- domains.get (single + recent events)
- domains.register (idempotent OpenSRS register)
- domains.attach (full Cloud DNS + registrar + Coolify)
Sandbox smoke tests (scripts/smoke-opensrs-*.ts)
Standalone Node scripts validating each new opensrs.ts call against
horizon.opensrs.net: balance + lookup + check, TLD policy
(.ca/.ai/.io/.com), full register flow, NS update with systemdns
nameservers, and the lock/unlock toggle that backs the attach
fallback path.
Post-deploy checklist
1. POST https://vibnai.com/api/admin/migrate
-H "x-admin-secret: $ADMIN_MIGRATE_SECRET"
2. Set OPENSRS_* env vars on the vibn-frontend Coolify app
(RESELLER_USERNAME, API_KEY_LIVE, API_KEY_TEST, HOST_LIVE,
HOST_TEST, PORT, MODE). Without them, only domains.list/get
work; search/register/attach return 500.
3. GCP_PROJECT_ID is read from env or defaults to master-ai-484822.
4. Live attach end-to-end against a real apex is queued as a
follow-up — sandbox path is fully proven.
Not in this commit (deliberate)
- The 100+ unrelated in-flight files (mvp-setup wizard, justine
homepage rework, BuildLivePlanPanel, etc) — kept local to keep
blast radius minimal.
Made-with: Cursor
Workspace-owned deploy infra so AI agents can create and destroy
Coolify resources without ever touching the root admin token.
vibn_workspaces
+ coolify_server_uuid, coolify_destination_uuid
+ coolify_environment_name (default "production")
+ coolify_private_key_uuid, gitea_bot_ssh_key_id
ensureWorkspaceProvisioned
+ generates an ed25519 keypair per workspace
+ pushes pubkey to the Gitea bot user (read/write scoped by team)
+ registers privkey in Coolify as a reusable deploy key
New endpoints under /api/workspaces/[slug]/
apps/ POST (private-deploy-key from Gitea repo)
apps/[uuid] PATCH, DELETE?confirm=<name>
apps/[uuid]/domains GET, PATCH (policy: *.{ws}.vibnai.com only)
databases/ GET, POST (8 types incl. postgres, clickhouse, dragonfly)
databases/[uuid] GET, PATCH, DELETE?confirm=<name>
auth/ GET, POST (Pocketbase, Authentik, Keycloak, Pocket-ID, Logto, Supertokens)
auth/[uuid] DELETE?confirm=<name>
MCP (/api/mcp) gains 15 new tools that mirror the REST surface and
enforce the same workspace tenancy + delete-confirm guard.
Safety: destructive ops require ?confirm=<exact-resource-name>; volumes
are kept by default (pass delete_volumes=true to drop).
Made-with: Cursor
Ship Phases 1–3 of the multi-tenant AI access plan so an AI agent can
act on a Vibn workspace with one bearer token and zero admin reach.
Phase 1 — Gitea bot per workspace
- Add gitea_bot_username / gitea_bot_user_id / gitea_bot_token_encrypted
columns to vibn_workspaces (migrate route).
- New lib/auth/secret-box.ts (AES-256-GCM, VIBN_SECRETS_KEY) for PAT at rest.
- Extend lib/gitea.ts with createUser, createAccessTokenFor (Sudo PAT),
createOrgTeam, addOrgTeamMember, ensureOrgTeamMembership.
- ensureWorkspaceProvisioned now mints a vibn-bot-<slug> user, adds it to
a Writers team (write perms only) on the workspace's org, and stores
its PAT encrypted.
- GET /api/workspaces/[slug]/gitea-credentials returns a workspace-scoped
bot PAT + clone URL template; session or vibn_sk_ bearer auth.
Phase 2 — Tenant-safe Coolify proxy + real MCP
- lib/coolify.ts: projectUuidOf, listApplicationsInProject,
getApplicationInProject, TenantError, env CRUD, deployments list.
- Workspace-scoped REST endpoints (all filtered by coolify_project_uuid):
GET/POST /api/workspaces/[slug]/apps/[uuid](/deploy|/envs|/deployments),
GET /api/workspaces/[slug]/deployments/[deploymentUuid]/logs.
- Full rewrite of /api/mcp off legacy Firebase onto Postgres vibn_sk_
keys, exposing workspace.describe, gitea.credentials, projects.*,
apps.* (list/get/deploy/deployments, envs.list/upsert/delete).
Phase 3 — Settings UI AI bundle
- GET /api/workspaces/[slug]/bootstrap.sh: curl|sh installer that writes
.cursor/rules, .cursor/mcp.json and appends VIBN_* to .env.local.
Embeds the caller's vibn_sk_ token when invoked with bearer auth.
- WorkspaceKeysPanel: single AiAccessBundleCard with system-prompt block,
one-line bootstrap, Reveal-bot-PAT button, collapsible manual-setup
fallback. Minted-key modal also shows the bootstrap one-liner.
Ops prerequisites:
- Set VIBN_SECRETS_KEY (>=16 chars) on the frontend.
- Run /api/admin/migrate to add the three bot columns.
- GITEA_API_TOKEN must be a site-admin token (needed for admin/users
+ Sudo PAT mint); otherwise provision_status lands on 'partial'.
Made-with: Cursor