Path B Execution Plan — Persistent Dev Container Architecture
The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent surface with a Claude-Code-style architecture: one persistent dev container per Vibn project, ~10 composable tools, sub-15-second iteration, and Coolify only touched at "ship it" time.
Companion to: AI_CAPABILITIES.md (current state) and AI_CAPABILITIES_ROADMAP.md (everything else).
Status: proposed. Not started. Decision document.
Why this exists: today's AI loop is 3–7 min to first preview, 2–4 min per iteration, because every change goes through a Coolify nixpacks build. That UX cannot host the marketplace / SaaS / iterative-build stories Vibn is selling. Path B fixes the floor.
1. The user experience this unlocks
Reference scenario: a non-technical founder chats "build me a two-sided marketplace for handmade ceramics."
| Phase | Path A (today) | Path B (target) |
|---|---|---|
| Discovery & OSS pick | OK | OK |
| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | git clone in 8s |
| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
The asymmetry is structural, not optimisable inside Path A.
2. Architecture overview
┌──────────────────────────┐    ┌────────────────────────────────┐
│  vibnai.com chat (user)  │ ←→ │ /api/mcp                       │
└──────────────────────────┘    │ ├ shell.exec                   │
                                │ ├ fs.read / fs.edit / fs.glob  │
                                │ ├ dev_server.start             │
                                │ ├ ship                         │
                                │ └ apps.* / databases.* / ...   │
                                └────────────┬───────────────────┘
                                             │
                                             ▼ (workspace-scoped)
                          ┌────────────────────────────────────┐
                          │ Per-Vibn-project Coolify project   │
                          │ ├ vibn-dev  ← dev container        │
                          │ ├ web       ← prod app             │
                          │ ├ db                               │
                          │ └ ...                              │
                          └────────────────────────────────────┘
Per-project dev container — the only new piece
For every active Vibn project, we run one long-lived Coolify
service named vibn-dev inside that project's dedicated Coolify
project (Stage 2/3 of per-project isolation already shipped).
| Property | Value |
|---|---|
| Image | ghcr.io/vibnai/vibn-dev:latest (we build & maintain) |
| Base | Ubuntu 24.04 |
| Pre-installed | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, tea (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
| Default cwd | /workspace (persistent volume containing the Gitea working tree) |
| Persistent volumes | /workspace (git tree), /cache/{npm,pip,go,cargo} (package caches) |
| Resource floor | 512 MB / 0.25 CPU when idle |
| Resource ceiling | 4 GB / 2 CPU during builds (configurable per workspace plan) |
| Idle suspend | After 30 min no shell.exec activity |
| Re-wake | Any shell.exec / fs.* / dev_server.* call |
| Ports | 3000–9999 reserved for the AI's dev server, exposed at https://preview-{ws}-{project}.vibnai.com via Traefik wildcard |
| Tenancy | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
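The idle-suspend and re-wake rows above boil down to a timestamp comparison. A minimal sketch, assuming a hypothetical `DevContainer` record (the plan does not define one) and assuming every tool call refreshes the activity timestamp, not only `shell.exec`:

```typescript
// Hypothetical record; field names are illustrative, not an existing Vibn type.
interface DevContainer {
  projectId: string;
  state: "running" | "suspended";
  lastActivityMs: number; // epoch ms of the most recent tool call
}

const IDLE_LIMIT_MS = 30 * 60 * 1000; // "after 30 min" from the table above

// Running containers whose last activity is at least 30 minutes old.
function selectIdle(containers: DevContainer[], nowMs: number): DevContainer[] {
  return containers.filter(
    (c) => c.state === "running" && nowMs - c.lastActivityMs >= IDLE_LIMIT_MS
  );
}

// Any shell.exec / fs.* / dev_server.* call re-wakes the container
// and refreshes its activity timestamp.
function touch(c: DevContainer, nowMs: number): DevContainer {
  return { ...c, state: "running", lastActivityMs: nowMs };
}
```

A periodic job would pass `selectIdle(...)` results to whatever suspends the Coolify service; the suspend call itself is not shown because the plan does not name it.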
Why this shape (and not e2b / Cloud Run / VM-per-task)
- We already have Coolify, per-project Coolify projects, and Coolify exec primitives. Adding one service per project is zero new infra.
- Persistence (workspace state, package cache, git working tree) matters more than per-task isolation for our user. Founders return to projects across sessions.
- Tenant safety is already solved at the Coolify-project layer.
- Cost stays bounded: one container per active project, idle-suspended.
- Upgrade path to e2b / Firecracker exists later if needed (replace the executor, keep the tool surface).
3. Tool surface
New tools (the AI's primary working set)
| Tool | Signature | Purpose |
|---|---|---|
| shell.exec | { cmd, cwd?, timeoutSec?, env? } | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
| fs.read | { path, ref? } | Read a file (or directory listing) from /workspace. |
| fs.write | { path, content } | Create/overwrite a file. |
| fs.edit | { path, oldString, newString, replaceAll? } | Aider-style search/replace. Fails if oldString not found / not unique. |
| fs.glob | { pattern, cwd? } | List files matching a pattern (e.g. **/*.tsx). |
| fs.grep | { pattern, glob?, contextLines? } | ripgrep-backed code search. |
| fs.delete | { path } | Delete a file or directory. |
| dev_server.start | { cmd, port, name? } | Start a long-running process (e.g. npm run dev). Returns a public preview URL. |
| dev_server.stop | { id } | Kill a dev server. |
| dev_server.list | (none) | What's running, on what URL. |
| ship | { projectId, commitMsg, deploy? } | git add . && git commit && git push to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
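The fs.edit contract ("fails if oldString not found / not unique") can be sketched as a pure function over file contents. This is an illustration of those semantics only: the result shape and error names are assumptions, and it is not Aider's implementation.

```typescript
// Hypothetical result type; the plan only specifies the failure conditions.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: "not_found" | "not_unique" | "empty_old_string" };

// Counts non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack: string, needle: string): number {
  let count = 0;
  let from = 0;
  while (true) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) return count;
    count += 1;
    from = idx + needle.length;
  }
}

function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll: boolean = false
): EditResult {
  if (oldString.length === 0) return { ok: false, error: "empty_old_string" };
  const n = countOccurrences(content, oldString);
  if (n === 0) return { ok: false, error: "not_found" };
  if (n > 1 && !replaceAll) return { ok: false, error: "not_unique" };
  // split/join replaces literally, so "$&"-style sequences in newString
  // are not interpreted the way String.prototype.replace would.
  return { ok: true, content: content.split(oldString).join(newString), replacements: n };
}
```

Returning a typed error instead of throwing lets the tool layer hand the model a precise reason ("not_unique": add more context to oldString) rather than a generic failure.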
Kept (orchestration — these are correctly modeled as APIs)
- apps.*: Coolify app CRUD, logs, domains, env vars, etc.
- databases.*, auth.*, domains.*, storage.*: infrastructure primitives.
- projects_get, projects_list, workspace_describe: context.
- github_search, github_file, http_fetch: external lookup.
Deprecated (kept for back-compat, banner in docs)
- gitea_file_read, gitea_file_write, gitea_file_delete, gitea_branches_list, gitea_branch_create, gitea_repo_create, gitea_repo_get, gitea_repos_list: the AI uses shell.exec (git / tea CLI) and fs.* instead.
- apps.exec: kept (it's still useful for prod-container debugging), but deprecated for dev-time code work.
Net change: 53 tools → ~30 tools, but the new ones compose to do everything the old ones did and more.
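One way to model "deprecated but kept for back-compat" together with the §7 mitigation (hard removal from the AI's tool list, 30-day grace period for external MCP clients, then 410 Gone) is a status flag on each tool definition. A sketch under those assumptions; `ToolDef` and both function names are hypothetical, not the shape of vibn-tools.ts:

```typescript
// Hypothetical tool registry entry.
interface ToolDef {
  name: string;
  status: "active" | "deprecated";
  deprecatedAtMs?: number; // epoch ms when the tool was marked deprecated
}

const GRACE_MS = 30 * 24 * 60 * 60 * 1000; // 30-day grace period from §7

// Tool names advertised to the AI: deprecated tools are dropped outright.
function toolsForAgent(tools: ToolDef[]): string[] {
  return tools.filter((t) => t.status === "active").map((t) => t.name);
}

// HTTP status for an external MCP client that calls a tool by name.
function externalCallStatus(tool: ToolDef | undefined, nowMs: number): 200 | 404 | 410 {
  if (tool === undefined) return 404;
  if (tool.status === "active") return 200;
  const since = tool.deprecatedAtMs ?? nowMs; // no timestamp: treat as just deprecated
  return nowMs - since > GRACE_MS ? 410 : 200;
}
```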
4. The system prompt rewrite
The AI's prompt today says "call gitea_file_write to push code." It becomes:
You have a real Linux dev environment for this project at /workspace. Use shell.exec to run any command (npm, git, tea, python, anything). Use fs.edit for surgical changes, fs.write for new files.

Standard loop:
1. shell.exec { cmd: "git status" } to see what's there.
2. Edit / create files via fs.edit / fs.write.
3. shell.exec { cmd: "npm test" } (or relevant test runner).
4. dev_server.start to give the user a live preview URL.
5. When the user says "ship it", call ship: that pushes and triggers the production Coolify deploy.

NEVER call apps_create to deploy code that hasn't been tested via shell.exec first. The dev container is your safety net.
5. Week-by-week execution
Week 1 — Foundations (dev container + shell)
Goal: AI can clone a repo, install deps, run a script.
- Build ghcr.io/vibnai/vibn-dev:latest Docker image (Ubuntu 24.04 + toolchains). Push to a registry the Coolify host can pull from.
- Add lib/dev-container.ts: helpers to mint, locate, ensure-running, suspend, resume the per-project vibn-dev Coolify service.
- Add MCP tool dev_container.ensure { projectId } (internal/auto): spins up the container if not present. Returns its UUID + status.
- Add MCP tools: shell.exec, fs.read, fs.write, fs.list, fs.delete, fs.glob, fs.grep. All proxy through Coolify's exec API to the dev container.
- Smoke test (scripts/smoke-path-b.ts): boots a dev container, clones a Gitea repo, runs npm init -y && npm install lodash, reads package.json, succeeds.
- Update vibn-tools.ts and ship a chat UI that streams shell.exec stdout to the user as it happens (existing terminal-style component if we have one, or a new one).
Exit criteria: an internal user can chat "clone the express hello-world repo and run it" and see the output stream live.
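Before any Week 1 tool call is proxied to the container, two cheap checks are worth having: keep fs.* paths under /workspace and clamp timeoutSec to the 15-minute cap. A minimal sketch; the confinement rule and the 60-second default are assumptions, and the check is lexical only (it does not resolve symlinks inside the container):

```typescript
const WORKSPACE_ROOT = "/workspace";
const MAX_TIMEOUT_SEC = 15 * 60; // "Capped 15 min" from the tool table
const DEFAULT_TIMEOUT_SEC = 60; // assumption: the plan does not state a default

// Resolves a tool-supplied path against cwd and rejects anything that
// lands outside /workspace after "." and ".." are collapsed.
function resolveInWorkspace(path: string, cwd: string = WORKSPACE_ROOT): string {
  const joined = path.startsWith("/") ? path : `${cwd}/${path}`;
  const parts: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      parts.pop();
      continue;
    }
    parts.push(seg);
  }
  const resolved = "/" + parts.join("/");
  if (resolved !== WORKSPACE_ROOT && !resolved.startsWith(WORKSPACE_ROOT + "/")) {
    throw new Error(`path escapes ${WORKSPACE_ROOT}: ${path}`);
  }
  return resolved;
}

// Missing or invalid values fall back to the default; large values hit the cap.
function clampTimeout(timeoutSec?: number): number {
  if (timeoutSec === undefined || !Number.isFinite(timeoutSec) || timeoutSec <= 0) {
    return DEFAULT_TIMEOUT_SEC;
  }
  return Math.min(Math.max(1, Math.floor(timeoutSec)), MAX_TIMEOUT_SEC);
}
```

shell.exec itself can still reach anything the container can; this guard only keeps the structured fs.* tools honest.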
Week 2 — Preview URLs + iteration
Goal: AI starts a dev server, user clicks a preview URL, sees their app.
- Traefik wildcard rule on the Coolify host: *.preview.vibnai.com → terminates TLS, forwards to dev container based on subdomain (preview-{ws}-{project}.vibnai.com → vibn-dev of that project's Coolify project).
- Add MCP tools: dev_server.start, dev_server.stop, dev_server.list. Implementation: starts the process inside the dev container under a small supervisor (e.g. tini / supervisord), tracks PID/port, registers a Traefik label on the dev container.
- Add fs.edit (Aider-format search/replace, with explicit error when oldString not found / ambiguous).
- Per-workspace plan-tier resource caps on the dev container (free tier: 1 GB / 0.5 CPU; paid: 4 GB / 2 CPU).
- System prompt rewrite (see §4). Update the AI's deploy recipes to start with shell.exec and dev_server.start rather than apps_create.
Exit criteria: the marketplace scenario from §1 works end-to-end up through "user makes 5 styling changes in 3 minutes."
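The bookkeeping behind dev_server.start / stop / list can be sketched independently of process spawning and Traefik labels. The registry below is in-memory and hypothetical: it only tracks entries and enforces the 3000–9999 port range, and the port-to-URL mapping is injected because the plan does not say how multiple ports map to preview hostnames.

```typescript
interface DevServer {
  id: string;
  name: string;
  cmd: string;
  port: number;
  url: string;
}

const PORT_MIN = 3000;
const PORT_MAX = 9999;

class DevServerRegistry {
  private servers: DevServer[] = [];
  private nextId = 1;
  private readonly urlFor: (port: number) => string;

  // urlFor is a hypothetical hook that turns a port into a public preview URL.
  constructor(urlFor: (port: number) => string) {
    this.urlFor = urlFor;
  }

  // Records a new dev server; a real implementation would also spawn cmd
  // under the in-container supervisor and register the Traefik label.
  start(cmd: string, port: number, name: string = cmd): DevServer {
    if (!Number.isInteger(port) || port < PORT_MIN || port > PORT_MAX) {
      throw new Error(`port ${port} is outside ${PORT_MIN}-${PORT_MAX}`);
    }
    const clash = this.servers.find((s) => s.port === port);
    if (clash) throw new Error(`port ${port} is already used by ${clash.id}`);
    const server: DevServer = {
      id: `dev-${this.nextId++}`,
      name,
      cmd,
      port,
      url: this.urlFor(port),
    };
    this.servers.push(server);
    return server;
  }

  // Returns true when an entry with that id existed and was removed.
  stop(id: string): boolean {
    const before = this.servers.length;
    this.servers = this.servers.filter((s) => s.id !== id);
    return this.servers.length < before;
  }

  list(): DevServer[] {
    return this.servers.slice();
  }
}
```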
Week 3 — Ship-it path + cleanup
Goal: the dev container's working tree graduates to production.
- Add ship tool: runs git add . && git commit -m {msg} && git push inside the container, then either calls apps_deploy { uuid } (if a prod app exists) or apps_create { projectId, repo } (first ship).
- Auto-link the prod app to fs_project_resources so the existing project-isolation accounting stays consistent.
- Idle-suspend logic: cron job every 5 min checks last shell.exec timestamp per dev container; suspends after 30 min idle. Auto-resume on next call.
- Deprecation pass: mark gitea_file_* tools as deprecated in vibn-tools.ts (keep working, add a banner in their description).
- Update AI_CAPABILITIES.md to reflect the new architecture and tool surface.
Exit criteria: end-to-end clean run: user prompt → AI scaffolds in dev container → user iterates → user says "ship" → app live on real domain. Total time logged.
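The branching inside ship (push, then deploy an existing prod app or create one on first ship) can be sketched as a planner that returns the steps to run. The input fields `repo` and `prodAppUuid` are hypothetical, and treating an omitted `deploy` as true is an assumption:

```typescript
// Hypothetical input: the tool signature only lists projectId, commitMsg, deploy?.
interface ShipInput {
  projectId: string;
  commitMsg: string;
  repo: string;
  prodAppUuid?: string; // set when a prod app already exists
  deploy?: boolean;
}

type ShipStep =
  | { kind: "shell"; cmd: string }
  | { kind: "apps_deploy"; uuid: string }
  | { kind: "apps_create"; projectId: string; repo: string };

// POSIX single-quote escaping so the commit message cannot inject shell syntax.
function shQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

function planShip(input: ShipInput): ShipStep[] {
  const steps: ShipStep[] = [
    {
      kind: "shell",
      cmd: `git add . && git commit -m ${shQuote(input.commitMsg)} && git push`,
    },
  ];
  if (input.deploy === false) return steps; // push only
  steps.push(
    input.prodAppUuid !== undefined
      ? { kind: "apps_deploy", uuid: input.prodAppUuid }
      : { kind: "apps_create", projectId: input.projectId, repo: input.repo }
  );
  return steps;
}
```

Note that `git commit` exits non-zero when there is nothing to commit, so the `&&` chain stops before the push; a real tool would probably need to treat that case as a no-op rather than a failure.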
Week 4 — Eval, polish, IDE drop-in
Goal: measure that this actually delivers the promised UX, ship the optional graduation path.
- Eval harness: 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
- IDE drop-in: expose openvscode-server (already in the image) at https://ide-{ws}-{project}.vibnai.com. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same /workspace the AI's been editing.
- Bug fixes found during eval.
- Docs: update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
Exit criteria: eval shows ≥3× speedup on time-to-first-preview vs. Path A, ≥80% success rate on the 10 reference prompts.
6. OSS we will lean on (not reinvent)
| Need | OSS choice | Notes |
|---|---|---|
| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
| In-browser IDE (week 4 graduation path) | openvscode-server (gitpod-io/openvscode-server, MIT) | Pre-installed in the image. Optional toggle. |
| Edit format | Aider's search/replace block format (Aider-AI/aider, Apache 2.0) | Borrow the format + error semantics. |
| Process supervision inside the container | tini (already standard) + a tiny in-house supervisor for dev_server.* | No need for full systemd. |
| Code search inside the container | ripgrep (BurntSushi/ripgrep, MIT) | Pre-installed. fs.grep is a thin wrapper. |
| Git inside the container | git + tea (Gitea CLI, MIT) | tea lets the AI do PR ops without us building gitea_pr_* tools. |
| Reference for end-to-end agent loops | All-Hands-AI/OpenHands (MIT) | Read their runtime + tool design. Don't import their code. |
| Reference for fast iteration UX | bolt.new (stackblitz/bolt.new) | UX north star, not a code source. |
7. Risks & open questions
| Risk | Mitigation |
|---|---|
| Dev containers eat money. 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
| shell.exec is the universal escape hatch (security?). AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred. (c) Audit log on every shell.exec call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
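| Preview URL leaks. https://preview-mark-ceramic-market.vibnai.com is publicly resolvable. | Default: random suffix in subdomain (e.g. preview-mark-ceramic-market-7a3f.vibnai.com, shortened here for readability). The ~64 bits of unguessability we want needs 16 hex characters; a 4-character suffix like the one shown gives only 16 bits. Optional Vibn-session-cookie auth as paid-tier feature later. |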
| Hot reload through Traefik. WebSocket / HMR can be finicky over a reverse proxy. | Spike on week 1, day 1: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
| Image size / pull time on first project. ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via mise on first project use. Prevents the image from bloating to 4 GB six months from now. |
| Dependency cache poisoning. Cached node_modules from project A bleeds into project B. | Caches are per-project (volume vibn-dev-cache-{projectId}). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
| AI keeps calling gitea_file_* instead of shell.exec. | Hard removal from AI's tool list in week 3, not soft deprecation. Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
| What if the user has no Vibn project yet? | First chat creates a project + provisions its Coolify project + spins up vibn-dev lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
| Coolify host disk dies → users lose unshipped /workspace work. | Auto-push to Gitea vibn-autosave/main branch every 5 min of activity, plus before idle-suspend. Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
| Path B turns out to be wrong; we need to revert. | Kill-switch admin endpoint (POST /api/admin/path-b/disable) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain. ~10-min revert window. Built week 1. |
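For the preview-URL mitigation, a ~64-bit suffix means 8 random bytes, i.e. 16 hex characters. A minimal sketch of building such a hostname; the slug rules and function names are assumptions, and the random source is injected so the function stays testable:

```typescript
const SUFFIX_BYTES = 8; // 8 bytes = 64 bits = 16 hex characters

function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i] ?? 0;
    out += (b < 16 ? "0" : "") + b.toString(16);
  }
  return out;
}

// Lowercases and collapses anything outside [a-z0-9] into single hyphens.
function slug(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// randomBytes must be a cryptographically secure source in production,
// e.g. (n) => crypto.getRandomValues(new Uint8Array(n)).
function previewHost(
  ws: string,
  project: string,
  randomBytes: (n: number) => Uint8Array
): string {
  const suffix = toHex(randomBytes(SUFFIX_BYTES));
  const label = `preview-${slug(ws)}-${slug(project)}-${suffix}`;
  // A single DNS label may not exceed 63 characters.
  if (label.length > 63) throw new Error("DNS label exceeds 63 characters");
  return `${label}.vibnai.com`;
}
```

The 63-character label limit matters here: long workspace and project names plus a 16-character suffix can overflow it, so a real implementation would need a truncation or hashing rule that this sketch leaves out.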
8. Success metrics
We're not done until all four are true on the eval harness:
| Metric | Target | Today (Path A) |
|---|---|---|
| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
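The four targets above can be checked mechanically from eval-harness output. A sketch, assuming a hypothetical `EvalRun` record per reference prompt (the harness format is not specified in this plan):

```typescript
// Hypothetical per-prompt result emitted by the eval harness.
interface EvalRun {
  prompt: string;
  firstPreviewSec: number;
  iterationSec: number;
  toolCalls: number;
  success: boolean;
}

// p50: middle value of the sorted list, averaging the two middle values
// when the list has even length.
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of an empty list");
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid]! : (s[mid - 1]! + s[mid]!) / 2;
}

// Returns the names of the metrics that miss their target; empty means "done".
function checkTargets(runs: EvalRun[]): string[] {
  const failures: string[] = [];
  if (median(runs.map((r) => r.firstPreviewSec)) > 60) failures.push("time-to-first-preview");
  if (median(runs.map((r) => r.iterationSec)) > 15) failures.push("iteration loop");
  if (median(runs.map((r) => r.toolCalls)) > 30) failures.push("tool calls");
  const successRate = runs.filter((r) => r.success).length / runs.length;
  if (successRate < 0.8) failures.push("success rate");
  return failures;
}
```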
9. What this changes about the existing roadmap
- Tier 1.5 ("Code authoring capability") is collapsed into this doc. C1–C9 mostly disappear (replaced by shell.exec + fs.edit); C10 ("persistent agent dev workspace") is Path B.
- Tier 1 P5.1–P5.4 are unchanged. Domains, email, storage, workers: still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
- Tier 2 P6.x (backups, runtime logs, scoped keys): unchanged.
- gitea_* tools shipped 2026-04-28 are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
10. Decision needed before week 1 starts
- Approve Path B as the primary architecture for code authoring. (If no, this doc dies here.)
- Approve the dev-container-as-Coolify-service implementation choice. Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
- Approve the deprecation of gitea_file_* tools. They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
- Approve the resource cap defaults (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
Once those four are decided, week 1 starts.
How to use this doc
- This is the architectural execution plan. The detailed task list goes into the agent's TodoWrite per-week, not into this file.
- When an item ships, move it from "planned" to "shipped" in AI_CAPABILITIES.md and link the commit/PR.
- When a risk in §7 turns out to be real, document the mitigation outcome inline so future readers see what actually happened.
- This doc supersedes the proposed Tier 1.5 in AI_CAPABILITIES_ROADMAP.md. Add a one-line pointer there once approved.