This repository has been archived on 2026-06-07.
# Path B Execution Plan — Persistent Dev Container Architecture
> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
> surface with a Claude-Code-style architecture: one persistent dev
> container per Vibn project, ~10 composable tools, sub-15-second
> iteration, and Coolify only touched at "ship it" time.
>
> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
> (everything else).
>
> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
>
> **Why this exists:** today's AI loop is *3–7 min to first preview, 2–4
> min per iteration*, because every change goes through a Coolify nixpacks
> build. That UX cannot host the marketplace / SaaS / iterative-build
> stories Vibn is selling. Path B fixes the floor.
---
## 1. The user experience this unlocks
Reference scenario: a non-technical founder chats *"build me a
two-sided marketplace for handmade ceramics."*
| Phase | Path A (today) | Path B (target) |
|---|---|---|
| Discovery & OSS pick | OK | OK |
| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
The asymmetry is structural, not optimisable inside Path A.

---
## 2. Architecture overview
```
┌──────────────────────────┐    ┌────────────────────────────────┐
│  vibnai.com chat (user)  │ ←→ │ /api/mcp                       │
└──────────────────────────┘    │ ├ shell.exec                   │
                                │ ├ fs.read / fs.edit / fs.glob  │
                                │ ├ dev_server.start             │
                                │ ├ ship                         │
                                │ └ apps.* / databases.* / ...   │
                                └────────────┬───────────────────┘
                                             ▼ (workspace-scoped)
                                ┌────────────────────────────────────┐
                                │ Per-Vibn-project Coolify project   │
                                │ ├ vibn-dev ← dev container         │
                                │ ├ web      ← prod app              │
                                │ ├ db                               │
                                │ └ ...                              │
                                └────────────────────────────────────┘
```
### Per-project dev container — the only new piece
For every active Vibn project, we run **one long-lived Coolify
service named `vibn-dev`** inside that project's dedicated Coolify
project (Stage 2/3 of per-project isolation already shipped).
| Property | Value |
|---|---|
| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
| **Base** | Ubuntu 24.04 |
| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
| **Resource floor** | 512 MB / 0.25 CPU when idle |
| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
| **Idle suspend** | After 30 min no `shell.exec` activity |
| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
| **Ports** | 3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
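As a rough illustration of the port and URL conventions in the table, the sketch below checks a dev-server port against the reserved range and fills in the preview URL template. The helper names (`isReservedPort`, `toSlug`, `previewUrl`) and the slug-cleaning rule are assumptions made for this example, not taken from the Vibn codebase.

```typescript
// Hypothetical helpers; names and slug rules are illustrative assumptions.
const PORT_MIN = 3000;
const PORT_MAX = 9999;

/** True if `port` is an integer inside the range reserved for the AI's dev servers. */
function isReservedPort(port: number): boolean {
  return Number.isInteger(port) && port >= PORT_MIN && port <= PORT_MAX;
}

/** Lowercase a name and collapse runs of anything outside a-z / 0-9 into single hyphens. */
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

/** Fill in the `https://preview-{ws}-{project}.vibnai.com` template from the table. */
function previewUrl(ws: string, project: string): string {
  return `https://preview-${toSlug(ws)}-${toSlug(project)}.vibnai.com`;
}
```

A caller would reject a `dev_server.start` request whose port fails `isReservedPort` before asking the container to bind it.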
### Why this shape (and not e2b / Cloud Run / VM-per-task)
- We already have Coolify, per-project Coolify projects, and Coolify
exec primitives. Adding one service per project is zero new infra.
- Persistence (workspace state, package cache, git working tree)
matters more than per-task isolation for our user. Founders return
to projects across sessions.
- Tenant safety is already solved at the Coolify-project layer.
- Cost stays bounded: one container per *active* project, idle-suspended.
- Upgrade path to e2b / Firecracker exists later if needed (replace the
executor, keep the tool surface).
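The "replace the executor, keep the tool surface" idea can be sketched as a small interface boundary. Everything named here (`Executor`, `ExecResult`, `FakeExecutor`, `shellExecTool`) is hypothetical; the shipped helpers in `lib/dev-container.ts` may be shaped differently.

```typescript
// Hypothetical interface; the real helpers may differ.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

interface Executor {
  exec(projectId: string, cmd: string): Promise<ExecResult>;
}

/** In-memory stand-in. A Coolify- or e2b-backed class would implement the same interface. */
class FakeExecutor implements Executor {
  readonly calls: string[] = [];

  async exec(projectId: string, cmd: string): Promise<ExecResult> {
    this.calls.push(`${projectId}: ${cmd}`);
    return { exitCode: 0, stdout: `ran ${cmd}`, stderr: "" };
  }
}

/** The tool handler depends only on the interface, so the backend can be swapped later. */
async function shellExecTool(
  executor: Executor,
  projectId: string,
  cmd: string,
): Promise<string> {
  const result = await executor.exec(projectId, cmd);
  if (result.exitCode !== 0) {
    throw new Error(`command failed (${result.exitCode}): ${result.stderr}`);
  }
  return result.stdout;
}
```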
---
## 3. Tool surface
### New tools (the AI's primary working set)
| Tool | Signature | Purpose |
|---|---|---|
| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
| `fs.write` | `{ path, content }` | Create/overwrite a file. |
| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
| `fs.delete` | `{ path }` | Delete a file or directory. |
| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
| `dev_server.stop` | `{ id }` | Kill a dev server. |
| `dev_server.list` | — | What's running, on what URL. |
| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
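To make the `fs.edit` contract concrete, here is a minimal in-memory sketch of the search/replace rule from the table: fail when `oldString` is missing, fail when it is ambiguous unless `replaceAll` is set. The `EditResult` shape and the error names are assumptions for illustration; the real tool works on files in `/workspace` and reports errors over HTTP.

```typescript
// Illustrative only: result shape and error names are assumptions, not the shipped API.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: "not_found" | "ambiguous" };

/** Count non-overlapping occurrences of `needle` in `haystack`. */
function countOccurrences(haystack: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

/** Apply one search/replace edit to a string, following the rules in the table row. */
function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): EditResult {
  const matches = countOccurrences(content, oldString);
  if (matches === 0) return { ok: false, error: "not_found" };
  if (matches > 1 && !replaceAll) return { ok: false, error: "ambiguous" };
  // split/join treats both strings literally (no regex or `$` replacement patterns).
  const updated = content.split(oldString).join(newString);
  return { ok: true, content: updated, replacements: matches };
}
```

Refusing ambiguous matches forces the model to quote enough surrounding context to pin down a single location, which is the main reason to prefer this format over line-number edits.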
### Kept (orchestration — these are correctly modeled as APIs)
- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
- `projects_get`, `projects_list`, `workspace_describe` — context.
- `github_search`, `github_file`, `http_fetch` — external lookup.
### Deprecated (kept for back-compat, banner in docs)
- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
`gitea_branches_list`, `gitea_branch_create`,
`gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
- `apps.exec` — kept (it's still useful for prod-container debugging),
but deprecated for *dev-time* code work.
**Net change:** 53 tools → ~30 tools, but the new ones compose to do
everything the old ones did and more.

---
## 4. The system prompt rewrite
The AI's prompt today says *"call gitea_file_write to push code."* It
becomes:
> You have a real Linux dev environment for this project at `/workspace`.
> Use `shell.exec` to run any command (npm, git, tea, python, anything).
> Use `fs.edit` for surgical changes, `fs.write` for new files.
>
> Standard loop:
> 1. `shell.exec { cmd: "git status" }` to see what's there.
> 2. Edit / create files via `fs.edit` / `fs.write`.
> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
> 4. `dev_server.start` to give the user a live preview URL.
> 5. When the user says "ship it", call `ship` — that pushes and
> triggers the production Coolify deploy.
>
> NEVER call `apps_create` to deploy code that hasn't been tested via
> `shell.exec` first. The dev container is your safety net.
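The standard loop in the prompt can be sketched as an ordered list of tool calls that stops at the first failure. `callTool` is a hypothetical stand-in for whatever transport reaches `/api/mcp`, and the file path, commands, port, and commit message are made-up example values; only the tool names and argument keys come from the tool table in §3.

```typescript
// Hypothetical client shape; only tool names and argument keys come from the doc.
type ToolCall =
  | { tool: "shell.exec"; args: { cmd: string; cwd?: string; timeoutSec?: number } }
  | { tool: "fs.edit"; args: { path: string; oldString: string; newString: string; replaceAll?: boolean } }
  | { tool: "dev_server.start"; args: { cmd: string; port: number; name?: string } }
  | { tool: "ship"; args: { projectId: string; commitMsg: string; deploy?: boolean } };

type CallTool = (call: ToolCall) => Promise<{ ok: boolean }>;

/** Run the standard loop, stopping at the first failing step. Returns the tools that ran. */
async function standardLoop(
  callTool: CallTool,
  projectId: string,
  userSaidShipIt: boolean,
): Promise<string[]> {
  const steps: ToolCall[] = [
    { tool: "shell.exec", args: { cmd: "git status" } },
    { tool: "fs.edit", args: { path: "src/App.tsx", oldString: "Hello", newString: "Hello, ceramics" } },
    { tool: "shell.exec", args: { cmd: "npm test" } },
    { tool: "dev_server.start", args: { cmd: "npm run dev", port: 5173 } },
  ];
  if (userSaidShipIt) {
    steps.push({ tool: "ship", args: { projectId, commitMsg: "Initial marketplace" } });
  }

  const ran: string[] = [];
  for (const step of steps) {
    const { ok } = await callTool(step);
    ran.push(step.tool);
    if (!ok) break; // e.g. failing tests: do not start the preview or ship
  }
  return ran;
}
```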
---
## 5. Week-by-week execution
### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
**Goal:** AI can clone a repo, install deps, run a script.
- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
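The "all locked to `/workspace`" rule for `fs.*` could look roughly like the path check below. This is a sketch under assumptions: the function name is hypothetical, it only normalises the path string, and it does not resolve symlinks, which a real implementation would also need to handle inside the container.

```typescript
const WORKSPACE_ROOT = "/workspace";

/**
 * Resolve a tool-supplied path against /workspace and reject anything that escapes it.
 * Hypothetical helper: string normalisation only, symlinks are not followed or checked.
 */
function resolveWorkspacePath(userPath: string): string {
  // Absolute inputs start from "/", relative inputs start from the workspace root.
  const segments: string[] = userPath.startsWith("/")
    ? []
    : WORKSPACE_ROOT.split("/").filter(Boolean);

  for (const part of userPath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") segments.pop(); // popping at the root is a no-op
    else segments.push(part);
  }

  const resolved = "/" + segments.join("/");
  const inside = resolved === WORKSPACE_ROOT || resolved.startsWith(`${WORKSPACE_ROOT}/`);
  if (!inside) {
    throw new Error(`path escapes ${WORKSPACE_ROOT}: ${userPath}`);
  }
  return resolved;
}
```

Checking for the `/workspace/` prefix with the trailing slash matters: a bare `startsWith("/workspace")` would wrongly accept a sibling directory such as `/workspace-evil`.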
### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
- [x] System prompt rewritten with shell-first recipe.
**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
**Goal:** the dev container's working tree graduates to production.
- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
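The autosave throttle and idle sweep described in this week's items reduce to two time comparisons per container. The sketch below assumes a hypothetical `ContainerActivity` record and treats "nothing new since the last autosave" as a reason to skip the push; neither detail is confirmed by the endpoints above.

```typescript
const AUTOSAVE_INTERVAL_MS = 5 * 60 * 1000; // "throttled to once per 5 min"
const DEFAULT_IDLE_MINUTES = 30; // default of the idle-sweep endpoint

// Hypothetical record; the real tracking table may store different fields.
interface ContainerActivity {
  projectId: string;
  lastActivityAt: number; // epoch ms of the last shell.exec / fs.* / dev_server.* call
  lastAutosaveAt: number | null; // epoch ms, or null if never autosaved
}

/** Autosave only when there is newer work and the throttle window has passed. */
function dueForAutosave(c: ContainerActivity, now: number): boolean {
  if (c.lastAutosaveAt === null) return true;
  const hasNewWork = c.lastActivityAt > c.lastAutosaveAt;
  const throttleOver = now - c.lastAutosaveAt >= AUTOSAVE_INTERVAL_MS;
  return hasNewWork && throttleOver;
}

/** A container is idle once no tool call has touched it for `idleMinutes`. */
function isIdle(c: ContainerActivity, now: number, idleMinutes = DEFAULT_IDLE_MINUTES): boolean {
  return now - c.lastActivityAt >= idleMinutes * 60 * 1000;
}

/** One sweep: which projects to autosave, and which to suspend. */
function planSweep(containers: ContainerActivity[], now: number) {
  return {
    autosave: containers.filter((c) => dueForAutosave(c, now)).map((c) => c.projectId),
    suspend: containers.filter((c) => isIdle(c, now)).map((c) => c.projectId),
  };
}
```

Keeping the decision logic pure (time passed in as `now`) makes the sweep easy to test before it is wired to a cron.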
### Week 4 — Eval, polish, IDE drop-in
**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
- [ ] **IDE drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user who is becoming a developer drop into the same `/workspace` the AI has been editing.
- [ ] **Bug fixes** found during eval.
- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
Path A, ≥80% success rate on the 10 reference prompts.

---
## 6. OSS we will lean on (not reinvent)
| Need | OSS choice | Notes |
|---|---|---|
| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
---
## 7. Risks & open questions
| Risk | Mitigation |
|---|---|
| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
| **`shell.exec` is the universal escape hatch — security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (`preview-mark-ceramic-market-7a3f.vibnai.com`, suffix shortened here for readability). Reaching the intended ~64 bits of unguessability takes 16 hex characters; the 4 shown would give only 16 bits. Optional Vibn-session-cookie auth as paid-tier feature later. |
| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
| **Image size / pull time on first project.** ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
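For the preview-URL mitigation, a suffix with roughly 64 bits of unguessability needs 8 random bytes, i.e. 16 hex characters. The sketch below assumes a runtime that exposes the Web Crypto global `crypto` (browsers and recent Node releases); the function names are hypothetical, and the host format simply extends the example in the risk table.

```typescript
/** Return `byteLength` random bytes as lowercase hex (two hex characters per byte). */
function randomHexSuffix(byteLength = 8): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes); // Web Crypto global, assumed to be available
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

/** Hypothetical helper: append the suffix to the first DNS label of the preview host. */
function unguessablePreviewHost(ws: string, project: string): string {
  const label = `preview-${ws}-${project}-${randomHexSuffix()}`;
  // A single DNS label is limited to 63 characters.
  if (label.length > 63) {
    throw new Error(`preview label too long (${label.length} chars)`);
  }
  return `${label}.vibnai.com`;
}
```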
---
## 8. Success metrics
We're not done until **all four** are true on the eval harness:
| Metric | Target | Today (Path A) |
|---|---|---|
| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
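One way the eval harness could check the four targets in the table is sketched below. The `EvalRun` shape and function names are assumptions for illustration, and the median stands in for p50; the thresholds are the ones in the Target column.

```typescript
// Hypothetical record of one reference-prompt run.
interface EvalRun {
  prompt: string;
  firstPreviewSec: number;
  iterationSec: number;
  toolCalls: number;
  success: boolean;
}

/** Median of a non-empty list (mean of the two middle values for even lengths). */
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Compare a non-empty set of runs against the four targets. */
function meetsTargets(runs: EvalRun[]) {
  const successRate = runs.filter((r) => r.success).length / runs.length;
  const checks = {
    firstPreview: median(runs.map((r) => r.firstPreviewSec)) <= 60,
    iteration: median(runs.map((r) => r.iterationSec)) <= 15,
    toolCalls: median(runs.map((r) => r.toolCalls)) <= 30,
    successRate: successRate >= 0.8,
  };
  return { ...checks, allFour: Object.values(checks).every(Boolean) };
}
```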
---
## 9. What this changes about the existing roadmap
- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1–C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
- **Tier 1 P5.1–P5.4 are unchanged.** Domains, email, storage, workers: still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
---
## 10. Decision needed before week 1 starts
1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
Once those four are decided, week 1 starts.

---
## How to use this doc
- This is the *architectural* execution plan. The detailed task list
goes into the agent's TodoWrite per-week, not into this file.
- When an item ships, **move it from "planned" to "shipped"** in
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
- When a risk in §7 turns out to be real, document the mitigation
outcome inline so future readers see what actually happened.
- This doc supersedes the proposed Tier 1.5 in
[`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
one-line pointer there once approved.