mawkone 2491363b5c docs(path-b): execution plan + vibn-dev image scaffold

- AI_PATH_B_EXECUTION_PLAN.md: add 3 safety nets (auto-push, kill
  switch, hard tool removal), tighten 4 risks (network policy week 1,
  HMR spike day 1, lean image + lazy mise, random preview suffix).
- AI_CAPABILITIES_ROADMAP.md: pointer note already in place.
- vibn-dev/Dockerfile + supervisord.conf + mise.default.toml + README:
  scaffold for the per-project dev container image. Ubuntu 24.04 +
  git + ripgrep + python3 + mise. Toolchains lazy-install on first
  `mise install`. Container runs as uid 1000 vibn (sudo available).

Frontend wiring lives in vibn-frontend (separate commit).

Made-with: Cursor

2026-04-28 12:53:31 -07:00

Path B Execution Plan — Persistent Dev Container Architecture

The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent surface with a Claude-Code-style architecture: one persistent dev container per Vibn project, ~10 composable tools, sub-15-second iteration, and Coolify only touched at "ship it" time.

Companion to: AI_CAPABILITIES.md (current state) and AI_CAPABILITIES_ROADMAP.md (everything else).

Status: proposed. Not started. Decision document.

Why this exists: today's AI loop is 3–7 min to first preview, 2–4 min per iteration, because every change goes through a Coolify nixpacks build. That UX cannot host the marketplace / SaaS / iterative-build stories Vibn is selling. Path B fixes the floor.

1. The user experience this unlocks

Reference scenario: a non-technical founder chats "build me a two-sided marketplace for handmade ceramics."

Phase	Path A (today)	Path B (target)
Discovery & OSS pick	OK	OK
Fork an OSS base (e.g. Sharetribe, 800 files)	~15 min of single-file commits, 800 webhook fires	`git clone` in 8s
First live preview	3–7 min (Coolify build)	~30s (Vite HMR in dev container)
Each iteration	2–4 min (rebuild)	3–15s (HMR / process restart)
User makes 10 small decisions	~40 min of staring at spinners	~3 min of conversation
"Ship it" → real domain	already 3 min	3 min (unchanged — this is the only Coolify build)
Total time to live, polished marketplace	30–60 min, often abandoned	~20 min, mostly the user thinking

The asymmetry is structural, not optimisable inside Path A.

2. Architecture overview

┌──────────────────────────┐     ┌────────────────────────────────┐
│  vibnai.com chat (user)  │ ←→  │  /api/mcp                       │
└──────────────────────────┘     │   ├ shell.exec                  │
                                 │   ├ fs.read / fs.edit / fs.glob │
                                 │   ├ dev_server.start            │
                                 │   ├ ship                        │
                                 │   └ apps.* / databases.* / ...  │
                                 └────────────┬───────────────────┘
                                              │
                                              ▼ (workspace-scoped)
                          ┌────────────────────────────────────┐
                          │  Per-Vibn-project Coolify project  │
                          │   ├ vibn-dev   ← dev container     │
                          │   ├ web         ← prod app         │
                          │   ├ db                              │
                          │   └ ...                             │
                          └────────────────────────────────────┘

Per-project dev container — the only new piece

For every active Vibn project, we run one long-lived Coolify service named vibn-dev inside that project's dedicated Coolify project (Stage 2/3 of per-project isolation already shipped).

Property	Value
Image	`ghcr.io/vibnai/vibn-dev:latest` (we build & maintain)
Base	Ubuntu 24.04
Pre-installed	Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server
Default `cwd`	`/workspace` (persistent volume containing the Gitea working tree)
Persistent volumes	`/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches)
Resource floor	512 MB / 0.25 CPU when idle
Resource ceiling	4 GB / 2 CPU during builds (configurable per workspace plan)
Idle suspend	After 30 min no `shell.exec` activity
Re-wake	Any `shell.exec` / `fs.*` / `dev_server.*` call
Ports	3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard
Tenancy	Inherits per-project Coolify isolation — workspace can never reach into another's dev container
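The idle-suspend and re-wake rows above reduce to two small predicates. The following is a minimal sketch, assuming a hypothetical `DevContainer` record that tracks the last `shell.exec` timestamp; the type and function names are illustrative and are not taken from `lib/dev-container.ts`.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
type ContainerState = "running" | "suspended";

interface DevContainer {
  projectId: string;
  state: ContainerState;
  lastActivityAt: number; // epoch ms of the last shell.exec call
}

// "After 30 min no shell.exec activity"
const IDLE_LIMIT_MS = 30 * 60 * 1000;

// True when a running container has been idle for at least the limit.
function shouldSuspend(c: DevContainer, now: number): boolean {
  return c.state === "running" && now - c.lastActivityAt >= IDLE_LIMIT_MS;
}

// True for the tool families listed in the Re-wake row.
function wakesContainer(toolName: string): boolean {
  return (
    toolName === "shell.exec" ||
    toolName.startsWith("fs.") ||
    toolName.startsWith("dev_server.")
  );
}
```

Keeping both checks pure (time is passed in rather than read from the clock) makes them easy to call from a periodic job and from the MCP request path alike.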

Why this shape (and not e2b / Cloud Run / VM-per-task)

We already have Coolify, per-project Coolify projects, and Coolify exec primitives. Adding one service per project is zero new infra.
Persistence (workspace state, package cache, git working tree) matters more than per-task isolation for our user. Founders return to projects across sessions.
Tenant safety is already solved at the Coolify-project layer.
Cost stays bounded: one container per active project, idle-suspended.
Upgrade path to e2b / Firecracker exists later if needed (replace the executor, keep the tool surface).

3. Tool surface

New tools (the AI's primary working set)

Tool	Signature	Purpose
`shell.exec`	`{ cmd, cwd?, timeoutSec?, env? }`	Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min.
`fs.read`	`{ path, ref? }`	Read a file (or directory listing) from `/workspace`.
`fs.write`	`{ path, content }`	Create/overwrite a file.
`fs.edit`	`{ path, oldString, newString, replaceAll? }`	Aider-style search/replace. Fails if `oldString` not found / not unique.
`fs.glob`	`{ pattern, cwd? }`	List files matching a pattern (e.g. `**/*.tsx`).
`fs.grep`	`{ pattern, glob?, contextLines? }`	ripgrep-backed code search.
`fs.delete`	`{ path }`	Delete a file or directory.
`dev_server.start`	`{ cmd, port, name? }`	Start a long-running process (e.g. `npm run dev`). Returns a public preview URL.
`dev_server.stop`	`{ id }`	Kill a dev server.
`dev_server.list`	—	What's running, on what URL.
`ship`	`{ projectId, commitMsg, deploy? }`	`git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool.
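The `fs.edit` row carries the most semantics: it must fail when `oldString` is missing or ambiguous. The following is a minimal sketch of that rule over an in-memory string, assuming hypothetical `EditRequest` / `EditResult` shapes and error codes; a real tool would also read and write the file at `path`.

```typescript
// Hypothetical request/result shapes; the error codes are assumptions.
interface EditRequest {
  oldString: string;
  newString: string;
  replaceAll?: boolean;
}

type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: "empty_old_string" | "not_found" | "not_unique"; matches: number };

// Count non-overlapping occurrences of needle (needle must be non-empty).
function countOccurrences(haystack: string, needle: string): number {
  let count = 0;
  let idx = haystack.indexOf(needle);
  while (idx !== -1) {
    count++;
    idx = haystack.indexOf(needle, idx + needle.length);
  }
  return count;
}

function applyEdit(content: string, req: EditRequest): EditResult {
  if (req.oldString.length === 0) {
    return { ok: false, error: "empty_old_string", matches: 0 };
  }
  const matches = countOccurrences(content, req.oldString);
  if (matches === 0) return { ok: false, error: "not_found", matches };
  if (matches > 1 && !req.replaceAll) return { ok: false, error: "not_unique", matches };
  // split/join treats newString literally, unlike String.replace with "$" patterns.
  const updated = content.split(req.oldString).join(req.newString);
  return { ok: true, content: updated, replacements: matches };
}
```

Returning the match count with the error gives the model enough information to retry with a longer, unique `oldString`.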

Kept (orchestration — these are correctly modeled as APIs)

apps.* — Coolify app CRUD, logs, domains, env vars, etc.
databases.*, auth.*, domains.*, storage.* — infrastructure primitives.
projects_get, projects_list, workspace_describe — context.
github_search, github_file, http_fetch — external lookup.

Deprecated (kept for back-compat, banner in docs)

gitea_file_read, gitea_file_write, gitea_file_delete, gitea_branches_list, gitea_branch_create, gitea_repo_create, gitea_repo_get, gitea_repos_list — the AI uses shell.exec (git/tea CLI) and fs.* instead.
apps.exec — kept (it's still useful for prod-container debugging), but deprecated for dev-time code work.

Net change: 53 tools → ~30 tools, but the new ones compose to do everything the old ones did and more.

4. The system prompt rewrite

The AI's prompt today says "call gitea_file_write to push code." It becomes:

You have a real Linux dev environment for this project at /workspace. Use shell.exec to run any command (npm, git, tea, python, anything). Use fs.edit for surgical changes, fs.write for new files.

Standard loop:

1. shell.exec { cmd: "git status" } to see what's there.
2. Edit / create files via fs.edit / fs.write.
3. shell.exec { cmd: "npm test" } (or relevant test runner).
4. dev_server.start to give the user a live preview URL.
5. When the user says "ship it", call ship; that pushes and triggers the production Coolify deploy.

NEVER call apps_create to deploy code that hasn't been tested via shell.exec first. The dev container is your safety net.

5. Week-by-week execution

Week 1 — Foundations (dev container + shell)

Goal: AI can clone a repo, install deps, run a script.

Build ghcr.io/vibnai/vibn-dev:latest Docker image (Ubuntu 24.04 + toolchains). Push to a registry the Coolify host can pull from.
Add lib/dev-container.ts: helpers to mint, locate, ensure-running, suspend, resume the per-project vibn-dev Coolify service.
Add MCP tool dev_container.ensure { projectId } — internal/auto, spins up the container if not present. Returns its UUID + status.
Add MCP tools: shell.exec, fs.read, fs.write, fs.delete, fs.glob, fs.grep (directory listings go through fs.read, as in §3, so no separate fs.list is needed). All proxy through Coolify's exec API to the dev container.
Smoke test (scripts/smoke-path-b.ts): boots a dev container, clones a Gitea repo, runs npm init -y && npm install lodash, reads package.json, succeeds.
Update vibn-tools.ts and ship a chat UI that streams shell.exec stdout to the user as it happens (existing terminal-style component if we have one, or a new one).
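Before a `shell.exec` call is proxied to the container, the request has to be filled in with defaults and capped at the 15-minute limit from §3. The following is a minimal sketch of that step; the 120-second default timeout and the function name are assumptions, not values from this plan.

```typescript
// Mirrors the shell.exec signature in §3: { cmd, cwd?, timeoutSec?, env? }.
interface ShellExecRequest {
  cmd: string;
  cwd?: string;
  timeoutSec?: number;
  env?: Record<string, string>;
}

interface NormalizedExec {
  cmd: string;
  cwd: string;
  timeoutSec: number;
  env: Record<string, string>;
}

const MAX_TIMEOUT_SEC = 15 * 60;   // "Capped 15 min"
const DEFAULT_TIMEOUT_SEC = 120;   // assumption: not specified in the plan
const DEFAULT_CWD = "/workspace";  // default cwd of the dev container

function normalizeShellExec(req: ShellExecRequest): NormalizedExec {
  const cmd = req.cmd.trim();
  if (cmd.length === 0) throw new Error("shell.exec: cmd must not be empty");

  const requested = req.timeoutSec ?? DEFAULT_TIMEOUT_SEC;
  if (!Number.isFinite(requested) || requested <= 0) {
    throw new Error("shell.exec: timeoutSec must be a positive number");
  }

  return {
    cmd,
    cwd: req.cwd ?? DEFAULT_CWD,
    timeoutSec: Math.min(requested, MAX_TIMEOUT_SEC), // clamp rather than reject
    env: { ...(req.env ?? {}) },
  };
}
```

Clamping an oversized timeout instead of rejecting it is a design choice made here for illustration; rejecting with an explicit error would be equally defensible.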

Exit criteria: an internal user can chat "clone the express hello-world repo and run it" and see the output stream live.

Week 2 — Preview URLs + iteration

Goal: AI starts a dev server, user clicks a preview URL, sees their app.

Traefik wildcard rule on the Coolify host: match hostnames of the form preview-{ws}-{project}.vibnai.com (covered by a *.vibnai.com wildcard certificate, since a *.preview.vibnai.com wildcard would not match them), terminate TLS, and forward to the vibn-dev service of that project's Coolify project.
Add MCP tools: dev_server.start, dev_server.stop, dev_server.list. Implementation: starts the process inside the dev container under a small supervisor (e.g. tini / supervisord), tracks PID/port, registers a Traefik label on the dev container.
Add fs.edit (Aider-format search/replace, with explicit error when oldString not found / ambiguous).
Per-workspace plan-tier resource caps on the dev container (free tier: 1 GB / 0.5 CPU; paid: 4 GB / 2 CPU).
System prompt rewrite (see §4). Update the AI's deploy recipes to start with shell.exec and dev_server.start rather than apps_create.
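Registering a Traefik label on the dev container, as `dev_server.start` is meant to do, amounts to emitting a router rule and a service port. The following is a minimal sketch using Traefik's Docker label keys; the `websecure` entrypoint name and the router naming scheme are assumptions that depend on how Traefik is configured on the Coolify host.

```typescript
interface PreviewTarget {
  workspace: string;
  project: string;
  port: number;    // port the dev server listens on inside vibn-dev
  suffix?: string; // optional random suffix (see §7)
}

const PREVIEW_DOMAIN = "vibnai.com";

// preview-{ws}-{project}[-{suffix}].vibnai.com
function previewHost(t: PreviewTarget): string {
  const parts = ["preview", t.workspace, t.project];
  if (t.suffix) parts.push(t.suffix);
  return `${parts.join("-")}.${PREVIEW_DOMAIN}`;
}

// Build Docker labels that route the preview hostname to the dev server port.
function traefikLabels(t: PreviewTarget, entrypoint = "websecure"): Record<string, string> {
  if (!Number.isInteger(t.port) || t.port < 3000 || t.port > 9999) {
    throw new Error("port must be in the reserved 3000-9999 range");
  }
  const host = previewHost(t);
  const name = host.split(".")[0]; // reuse the first DNS label as router/service name
  return {
    "traefik.enable": "true",
    [`traefik.http.routers.${name}.rule`]: `Host(\`${host}\`)`,
    [`traefik.http.routers.${name}.entrypoints`]: entrypoint,
    [`traefik.http.routers.${name}.tls`]: "true",
    [`traefik.http.routers.${name}.service`]: name,
    [`traefik.http.services.${name}.loadbalancer.server.port`]: String(t.port),
  };
}
```

For example, `traefikLabels({ workspace: "mark", project: "ceramic-market", port: 5173 })` yields a router rule for `preview-mark-ceramic-market.vibnai.com` pointing at port 5173, Vite's default.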

Exit criteria: the marketplace scenario from §1 works end-to-end up through "user makes 5 styling changes in 3 minutes."

Week 3 — Ship-it path + cleanup

Goal: the dev container's working tree graduates to production.

Add ship tool: runs git add . && git commit -m {msg} && git push inside the container, then either calls apps_deploy { uuid } (if a prod app exists) or apps_create { projectId, repo } (first ship).
Auto-link the prod app to fs_project_resources so the existing project-isolation accounting stays consistent.
Idle-suspend logic: cron job every 5 min checks last shell.exec timestamp per dev container; suspends after 30 min idle. Auto-resume on next call.
Deprecation pass: mark gitea_file_* tools as deprecated in vibn-tools.ts (keep working, add a banner in their description).
Update AI_CAPABILITIES.md to reflect the new architecture and tool surface.
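The `ship` step in the list above has two parts: a git command run inside the container, then a choice between `apps_deploy` and `apps_create`. The following is a minimal sketch of that planning logic; `ProjectState`, `planShip`, and the default of deploying when `deploy` is omitted are assumptions made for illustration.

```typescript
// Mirrors the ship signature in §3: { projectId, commitMsg, deploy? }.
interface ShipRequest {
  projectId: string;
  commitMsg: string;
  deploy?: boolean;
}

// Hypothetical lookup result: the Gitea repo and the prod app, if one exists.
interface ProjectState {
  repo: string;
  prodAppUuid?: string;
}

type DeployCall =
  | { tool: "apps_deploy"; args: { uuid: string } }
  | { tool: "apps_create"; args: { projectId: string; repo: string } };

interface ShipPlan {
  cmd: string;                   // command to pass to shell.exec
  deployCall: DeployCall | null; // follow-up tool call, if any
}

// POSIX single-quote escaping so the commit message cannot break out of -m '...'.
function shQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

function planShip(req: ShipRequest, state: ProjectState): ShipPlan {
  if (req.commitMsg.trim() === "") throw new Error("ship: commitMsg must not be empty");
  const cmd = `git add . && git commit -m ${shQuote(req.commitMsg)} && git push`;
  if (req.deploy === false) return { cmd, deployCall: null };
  const deployCall: DeployCall = state.prodAppUuid
    ? { tool: "apps_deploy", args: { uuid: state.prodAppUuid } }
    : { tool: "apps_create", args: { projectId: req.projectId, repo: state.repo } };
  return { cmd, deployCall };
}
```

One caveat worth handling in a real implementation: `git commit` exits non-zero when there is nothing to commit, which stops the `&&` chain before `git push`.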

Exit criteria: end-to-end clean run: user prompt → AI scaffolds in dev container → user iterates → user says "ship" → app live on real domain. Total time logged.

Week 4 — Eval, polish, IDE drop-in

Goal: measure that this actually delivers the promised UX, ship the optional graduation path.

Eval harness: 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
IDE drop-in: expose openvscode-server (already in the image) at https://ide-{ws}-{project}.vibnai.com. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same /workspace the AI's been editing.
Bug fixes found during eval.
Docs: update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.

Exit criteria: eval shows ≥3× speedup on time-to-first-preview vs. Path A, ≥80% success rate on the 10 reference prompts.

6. OSS we will lean on (not reinvent)

Need	OSS choice	Notes
Dev container image base	Ubuntu 24.04 + toolchains	We bake & maintain. ~1 GB.
In-browser IDE (week 4 graduation path)	`openvscode-server` (`gitpod-io/openvscode-server`, MIT)	Pre-installed in the image. Optional toggle.
Edit format	Aider's search/replace block format (`Aider-AI/aider`, Apache 2.0)	Borrow the format + error semantics.
Process supervision inside the container	`tini` (already standard) + a tiny in-house supervisor for `dev_server.*`	No need for full systemd.
Code search inside the container	`ripgrep` (`BurntSushi/ripgrep`, MIT)	Pre-installed. `fs.grep` is a thin wrapper.
Git inside the container	`git` + `tea` (Gitea CLI, MIT)	`tea` lets the AI do PR ops without us building gitea_pr_* tools.
Reference for end-to-end agent loops	`All-Hands-AI/OpenHands` (MIT)	Read their runtime + tool design. Don't import their code.
Reference for fast iteration UX	`bolt.new` (`stackblitz/bolt.new`)	UX north star, not a code source.

7. Risks & open questions

Risk	Mitigation
Dev containers eat money. 100 active projects × 24/7 = ~$50/mo wasted.	Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days.
`shell.exec` is the universal escape hatch — security? AI inside a single workspace's container can do anything that container can do.	(a) Per-project Coolify isolation. (b) Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred. (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts.
Preview URL leaks. `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable.	Default: random suffix in subdomain (`preview-mark-ceramic-market-7a3f.vibnai.com`, suffix shortened here for readability). Reaching ~64 bits of unguessability takes 16 hex characters; the four shown would give only 16 bits. Optional Vibn-session-cookie auth as paid-tier feature later.
Hot reload through Traefik. WebSocket / HMR can be finicky over a reverse proxy.	Spike on week 1, day 1: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early.
Image size / pull time on first project. ~1 GB pull adds 30–60s to first dev container spin-up.	(a) Pre-pull image on every Coolify host on deploy. (b) Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use. Prevents the image from bloating to 4 GB six months from now.
Dependency cache poisoning. Cached `node_modules` from project A bleeds into project B.	Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone.
AI keeps calling `gitea_file_*` instead of `shell.exec`.	Hard removal from AI's tool list in week 3, not soft deprecation. Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed.
What if the user has no Vibn project yet?	First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot.
Coolify host disk dies → users lose unshipped `/workspace` work.	Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend. Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional).
Path B turns out to be wrong; we need to revert.	Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain. ~10-min revert window. Built week 1.
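The random preview suffix in the table above is easy to get wrong on length: each hex character carries 4 bits, so 64 bits means 16 characters. The following is a minimal sketch using Node's `crypto.randomBytes`; the function names are hypothetical, and the 63-character check reflects the DNS limit on a single label.

```typescript
import { randomBytes } from "node:crypto";

const MAX_DNS_LABEL = 63; // a single DNS label may not exceed 63 octets

// Hex suffix with the requested number of random bits (2 hex chars per byte).
function randomSuffix(bits = 64): string {
  if (!Number.isInteger(bits) || bits <= 0 || bits % 8 !== 0) {
    throw new Error("bits must be a positive multiple of 8");
  }
  return randomBytes(bits / 8).toString("hex");
}

// preview-{ws}-{project}-{suffix}, rejected if it would not fit in one DNS label.
function previewLabel(workspace: string, project: string, suffix = randomSuffix()): string {
  const label = `preview-${workspace}-${project}-${suffix}`;
  if (label.length > MAX_DNS_LABEL) {
    throw new Error(`label is ${label.length} chars; DNS allows at most ${MAX_DNS_LABEL}`);
  }
  return label;
}
```

With a 16-character suffix, the `preview-` prefix, and two separators, workspace and project slugs together have 37 characters left, so long project names may need truncating before they reach this function.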

8. Success metrics

We're not done until all four are true on the eval harness:

Metric	Target	Today (Path A)
Time-to-first-preview (10 reference prompts, p50)	≤ 60 s	~5 min
Iteration loop (small edit → user sees change) p50	≤ 15 s	~3 min
Tool calls per "build me X" task (median)	≤ 30	~80
End-to-end success rate (live deployable result)	≥ 80%	~50%
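The four targets above can be checked mechanically once the eval harness has recorded its runs. The following is a minimal sketch, assuming a hypothetical `EvalRun` record per reference prompt and computing medians over all runs, successful or not; the plan does not say which runs count, so that choice is an assumption.

```typescript
// Hypothetical per-prompt record produced by the eval harness.
interface EvalRun {
  prompt: string;
  firstPreviewSec: number;
  iterationSec: number;
  toolCalls: number;
  success: boolean;
}

// Targets from the success-metrics table.
const TARGETS = { firstPreviewSec: 60, iterationSec: 15, toolCalls: 30, successRate: 0.8 };

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarize(runs: EvalRun[]) {
  const firstPreviewP50 = median(runs.map((r) => r.firstPreviewSec));
  const iterationP50 = median(runs.map((r) => r.iterationSec));
  const toolCallsMedian = median(runs.map((r) => r.toolCalls));
  const successRate = runs.filter((r) => r.success).length / runs.length;
  // All four targets must hold for the run set to pass.
  const pass =
    firstPreviewP50 <= TARGETS.firstPreviewSec &&
    iterationP50 <= TARGETS.iterationSec &&
    toolCallsMedian <= TARGETS.toolCalls &&
    successRate >= TARGETS.successRate;
  return { firstPreviewP50, iterationP50, toolCallsMedian, successRate, pass };
}
```

Because the result requires all four conditions at once, a single regression (for example, a success rate of 75%) fails the whole set even when the latency numbers look good.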

9. What this changes about the existing roadmap

Tier 1.5 ("Code authoring capability") is collapsed into this doc. C1–C9 mostly disappear (replaced by shell.exec + fs.edit); C10 ("persistent agent dev workspace") is Path B.
Tier 1 P5.1–P5.4 are unchanged. Domains, email, storage, workers — still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
Tier 2 P6.x (backups, runtime logs, scoped keys) — unchanged.
gitea_* tools shipped 2026-04-28 are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.

10. Decision needed before week 1 starts

Approve Path B as the primary architecture for code authoring. (If no, this doc dies here.)
Approve the dev-container-as-Coolify-service implementation choice. Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
Approve the deprecation of gitea_file_* tools. They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
Approve the resource cap defaults (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.

Once those four are decided, week 1 starts.

How to use this doc

This is the architectural execution plan. The detailed task list goes into the agent's TodoWrite per-week, not into this file.
When an item ships, move it from "planned" to "shipped" in AI_CAPABILITIES.md and link the commit/PR.
When a risk in §7 turns out to be real, document the mitigation outcome inline so future readers see what actually happened.
This doc supersedes the proposed Tier 1.5 in AI_CAPABILITIES_ROADMAP.md. Add a one-line pointer there once approved.
