docs: heavily compress and simplify remaining reference files to represent current state

2026-05-07 15:07:31 -07:00
parent 3563b98de1
commit 057115a9fc
8 changed files with 58 additions and 2926 deletions
--- a/docs/AI_PATH_B_EXECUTION_PLAN.md
+++ b/docs/AI_PATH_B_EXECUTION_PLAN.md
@@ -1,288 +1,12 @@
-# Path B Execution Plan — Persistent Dev Container Architecture
+# AI Path B (Shipped)

-> The plan to replace Vibn's current "API-wrap-every-Coolify-action" agent
-> surface with a Claude-Code-style architecture: one persistent dev
-> container per Vibn project, ~10 composable tools, sub-15-second
-> iteration, and Coolify only touched at "ship it" time.
->
-> **Companion to:** [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current
-> state) and [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md)
-> (everything else).
->
-> **Status:** week 1 shipped (2026-04-28). Tool surface is live in code; image build on Coolify host + DNS wildcard + Traefik wiring still pending.
->
-> **Why this exists:** today's AI loop is *3–7 min to first preview, 2–4
-> min per iteration*, because every change goes through a Coolify nixpacks
-> build. That UX cannot host the marketplace / SaaS / iterative-build
-> stories Vibn is selling. Path B fixes the floor.
+> **Note:** This document outlines the architecture for "Path B", which shifted the AI's execution context from Cloud Run to persistent per-project Docker containers hosted on the Coolify server. This architecture shipped in full in May 2026.

----
+## Architecture
+- Every project has a persistent Gitea repository.
+- Every project gets a single `vibn-dev` container provisioned as a Coolify service (`ensureDevContainer`).
+- The AI runs its tools (like `shell_exec` and `fs_*`) *inside* this container using `docker exec` via the Coolify API.
+- Dev servers (like `npm run dev`) bind to `0.0.0.0:3000` and are exposed to the internet via Traefik wildcard subdomains (`*.preview.vibnai.com`).
+- When the user is ready, the code is committed to Gitea and deployed to production via `apps_deploy`.

-## 1. The user experience this unlocks
-
-Reference scenario: a non-technical founder chats *"build me a
-two-sided marketplace for handmade ceramics."*
-
-| Phase | Path A (today) | Path B (target) |
-|---|---|---|
-| Discovery & OSS pick | OK | OK |
-| Fork an OSS base (e.g. Sharetribe, 800 files) | ~15 min of single-file commits, 800 webhook fires | `git clone` in 8s |
-| First live preview | 3–7 min (Coolify build) | ~30s (Vite HMR in dev container) |
-| Each iteration | 2–4 min (rebuild) | 3–15s (HMR / process restart) |
-| User makes 10 small decisions | ~40 min of staring at spinners | ~3 min of conversation |
-| "Ship it" → real domain | already 3 min | 3 min (unchanged — this is the only Coolify build) |
-| Total time to live, polished marketplace | 30–60 min, often abandoned | ~20 min, mostly the user thinking |
-
-The asymmetry is structural, not optimisable inside Path A.
-
----
-
-## 2. Architecture overview
-
-```
-┌──────────────────────────┐     ┌────────────────────────────────┐
-│  vibnai.com chat (user)  │ ←→  │  /api/mcp                       │
-└──────────────────────────┘     │   ├ shell.exec                  │
-                                 │   ├ fs.read / fs.edit / fs.glob │
-                                 │   ├ dev_server.start            │
-                                 │   ├ ship                        │
-                                 │   └ apps.* / databases.* / ...  │
-                                 └────────────┬───────────────────┘
-                                              │
-                                              ▼ (workspace-scoped)
-                          ┌────────────────────────────────────┐
-                          │  Per-Vibn-project Coolify project  │
-                          │   ├ vibn-dev   ← dev container     │
-                          │   ├ web         ← prod app         │
-                          │   ├ db                              │
-                          │   └ ...                             │
-                          └────────────────────────────────────┘
-```
-
-### Per-project dev container — the only new piece
-
-For every active Vibn project, we run **one long-lived Coolify
-service named `vibn-dev`** inside that project's dedicated Coolify
-project (Stage 2/3 of per-project isolation already shipped).
-
-| Property | Value |
-|---|---|
-| **Image** | `ghcr.io/vibnai/vibn-dev:latest` (we build & maintain) |
-| **Base** | Ubuntu 24.04 |
-| **Pre-installed** | Node 20, bun, pnpm, Python 3.12 + uv, Go 1.23, Rust, git, gh, `tea` (Gitea CLI), ripgrep, fd, jq, curl, tar, openvscode-server |
-| **Default `cwd`** | `/workspace` (persistent volume containing the Gitea working tree) |
-| **Persistent volumes** | `/workspace` (git tree), `/cache/{npm,pip,go,cargo}` (package caches) |
-| **Resource floor** | 512 MB / 0.25 CPU when idle |
-| **Resource ceiling** | 4 GB / 2 CPU during builds (configurable per workspace plan) |
-| **Idle suspend** | After 30 min no `shell.exec` activity |
-| **Re-wake** | Any `shell.exec` / `fs.*` / `dev_server.*` call |
-| **Ports** | 3000–9999 reserved for the AI's dev server, exposed at `https://preview-{ws}-{project}.vibnai.com` via Traefik wildcard |
-| **Tenancy** | Inherits per-project Coolify isolation — workspace can never reach into another's dev container |
-
-### Why this shape (and not e2b / Cloud Run / VM-per-task)
-
-- We already have Coolify, per-project Coolify projects, and Coolify
-  exec primitives. Adding one service per project is zero new infra.
-- Persistence (workspace state, package cache, git working tree)
-  matters more than per-task isolation for our user. Founders return
-  to projects across sessions.
-- Tenant safety is already solved at the Coolify-project layer.
-- Cost stays bounded: one container per *active* project, idle-suspended.
-- Upgrade path to e2b / Firecracker exists later if needed (replace the
-  executor, keep the tool surface).
-
----
-
-## 3. Tool surface
-
-### New tools (the AI's primary working set)
-
-| Tool | Signature | Purpose |
-|---|---|---|
-| `shell.exec` | `{ cmd, cwd?, timeoutSec?, env? }` | Run any shell command in the dev container. Streams stdout/stderr back. Capped 15 min. |
-| `fs.read` | `{ path, ref? }` | Read a file (or directory listing) from `/workspace`. |
-| `fs.write` | `{ path, content }` | Create/overwrite a file. |
-| `fs.edit` | `{ path, oldString, newString, replaceAll? }` | Aider-style search/replace. Fails if `oldString` not found / not unique. |
-| `fs.glob` | `{ pattern, cwd? }` | List files matching a pattern (e.g. `**/*.tsx`). |
-| `fs.grep` | `{ pattern, glob?, contextLines? }` | ripgrep-backed code search. |
-| `fs.delete` | `{ path }` | Delete a file or directory. |
-| `dev_server.start` | `{ cmd, port, name? }` | Start a long-running process (e.g. `npm run dev`). Returns a public preview URL. |
-| `dev_server.stop` | `{ id }` | Kill a dev server. |
-| `dev_server.list` | — | What's running, on what URL. |
-| `ship` | `{ projectId, commitMsg, deploy? }` | `git add . && git commit && git push` to Gitea, then trigger Coolify deploy of the prod app. The "graduate to production" tool. |
-
-### Kept (orchestration — these are correctly modeled as APIs)
-
-- `apps.*` — Coolify app CRUD, logs, domains, env vars, etc.
-- `databases.*`, `auth.*`, `domains.*`, `storage.*` — infrastructure primitives.
-- `projects_get`, `projects_list`, `workspace_describe` — context.
-- `github_search`, `github_file`, `http_fetch` — external lookup.
-
-### Deprecated (kept for back-compat, banner in docs)
-
-- `gitea_file_read`, `gitea_file_write`, `gitea_file_delete`,
-  `gitea_branches_list`, `gitea_branch_create`,
-  `gitea_repo_create`, `gitea_repo_get`, `gitea_repos_list` — the
-  AI uses `shell.exec` (`git`/`tea` CLI) and `fs.*` instead.
-- `apps.exec` — kept (it's still useful for prod-container debugging),
-  but deprecated for *dev-time* code work.
-
-**Net change:** 53 tools → ~30 tools, but the new ones compose to do
-everything the old ones did and more.
-
----
-
-## 4. The system prompt rewrite
-
-The AI's prompt today says *"call gitea_file_write to push code."* It
-becomes:
-
-> You have a real Linux dev environment for this project at `/workspace`.
-> Use `shell.exec` to run any command (npm, git, tea, python, anything).
-> Use `fs.edit` for surgical changes, `fs.write` for new files.
->
-> Standard loop:
-> 1. `shell.exec { cmd: "git status" }` to see what's there.
-> 2. Edit / create files via `fs.edit` / `fs.write`.
-> 3. `shell.exec { cmd: "npm test" }` (or relevant test runner).
-> 4. `dev_server.start` to give the user a live preview URL.
-> 5. When the user says "ship it", call `ship` — that pushes and
->    triggers the production Coolify deploy.
->
-> NEVER call `apps_create` to deploy code that hasn't been tested via
-> `shell.exec` first. The dev container is your safety net.
-
----
-
-## 5. Week-by-week execution
-
-### Week 1 — Foundations (dev container + shell) — **SHIPPED 2026-04-28**
-
-**Goal:** AI can clone a repo, install deps, run a script.
-
-- [x] `vibn-dev/Dockerfile` (Ubuntu 24.04 + git + ripgrep + python3 + mise lazy toolchains). `setup-on-coolify.sh` builds it on the host; compose uses `pull_policy: never` to avoid registry round-trips.
-- [x] `lib/dev-container.ts`: ensure / exec / suspend / resume helpers. Backed by `fs_project_dev_containers` (auto-created).
-- [x] `devcontainer.{ensure,status,suspend}` MCP tools.
-- [x] `shell.exec` + `fs.{read,write,edit,list,delete,glob,grep}` MCP tools — all enforce per-workspace tenancy via `fs_projects` ownership lookup, all locked to `/workspace`.
-- [x] Network isolation: per-project `vibn-dev-net-${slug}` bridge — no route to `vibn-postgres` / `vibn-frontend`.
-- [x] Kill switch: `/api/admin/path-b/{disable,enable}` flips a feature flag in <10s.
-- [x] `vibn-tools.ts`: 11 new Gemini tool defs, smoke test passes (63 tools accepted).
-- [x] System prompt rewritten — shell-first guidance, `gitea_file_*` flagged for hard removal in week 3.
-
-**Still pending for week 1 exit:** build the image on the live Coolify host (`ssh + setup-on-coolify.sh`), end-to-end verify `devcontainer.ensure → shell.exec ls` against a real project once the frontend deploy lands.
-
-### Week 2 — Preview URLs + iteration — **PARTIALLY SHIPPED 2026-04-28**
-
-**Goal:** AI starts a dev server, user clicks a preview URL, sees their app.
-
-- [ ] DNS: `*.preview.vibnai.com → coolify-host-ip` in OpenSRS. **Manual step, not yet done.**
-- [ ] Traefik wildcard cert via DNS-01 against OpenSRS. **Config staged in `vibn-dev/PREVIEWS.md`, not yet applied to live Traefik.**
-- [x] `dev_server.{start,stop,list,logs}` MCP tools. Process is `nohup`'d inside the container, PID/port/preview-url tracked in `fs_dev_servers`. Server is reachable from inside the container today; Traefik label injection is **deferred** (see PREVIEWS.md for the recommended pre-allocated-port-range approach).
-- [x] `fs.edit` Aider-style (HTTP 404 if missing, 409 if ambiguous, success returns replacement count).
-- [x] Per-container CPU/RAM caps: 1 vCPU / 1 GiB by default. Tier scaling via env var.
-- [x] System prompt rewritten with shell-first recipe.
-
-**Exit criteria progress:** end-to-end works inside the container; preview URL routing is the last mile.
-
-### Week 3 — Ship-it path + cleanup — **PARTIALLY SHIPPED 2026-04-28**
-
-**Goal:** the dev container's working tree graduates to production.
-
-- [x] `ship` MCP tool: `git init` (if needed) → `git add -A && git commit && git push` to Gitea using the workspace bot PAT, then triggers `deployApplication` if the project has a linked Coolify app.
-- [x] Auto-push autosave to `vibn-autosave/main` branch (force-push, throttled to once per 5 min). Endpoint: `POST /api/admin/path-b/autosave { projectId | sweep:true }`.
-- [x] Idle-suspend sweep: `POST /api/admin/path-b/idle-sweep[?minutes=30]`. Wire to a 5-min cron once we trust the suspend path.
-- [ ] Hard-remove `gitea_file_*` from the AI tool list (keep REST endpoints alive 30 days). **Deferred to next week so we can A/B the new tools first.**
-- [ ] Update `AI_CAPABILITIES.md`. **Deferred — will rewrite once eval data is in.**
-
-**Exit criteria progress:** ship loop is functionally complete. Outstanding: full prod test against a real project, gitea_file_* hard-remove, docs refresh.
-
-### Week 4 — Eval, polish, IDE drop-in
-
-**Goal:** measure that this actually delivers the promised UX, ship the optional graduation path.
-
-- [ ] **Eval harness:** 10 reference prompts (TODO app, marketplace, blog with auth, kanban, image-uploader, AI chatbot, simple e-commerce, dashboard, REST API + DB, static site). Measure: time-to-first-preview, time-to-shipped, AI tool-call count, success rate. Compare to a baseline run on Path A.
-- [ ] **Theia drop-in:** expose openvscode-server (already in the image) at `https://ide-{ws}-{project}.vibnai.com`. Optional toggle in chat UI: "Open IDE." Lets a user-becoming-developer drop into the same `/workspace` the AI's been editing.
-- [ ] **Bug fixes** found during eval.
-- [ ] **Docs:** update Vibn's user-facing pages to reflect the new "describe → live preview in seconds → iterate → ship" flow.
-
-**Exit criteria:** eval shows ≥3× speedup on time-to-first-preview vs.
-Path A, ≥80% success rate on the 10 reference prompts.
-
----
-
-## 6. OSS we will lean on (not reinvent)
-
-| Need | OSS choice | Notes |
-|---|---|---|
-| Dev container image base | Ubuntu 24.04 + toolchains | We bake & maintain. ~1 GB. |
-| In-browser IDE (week 4 graduation path) | `openvscode-server` (`gitpod-io/openvscode-server`, MIT) | Pre-installed in the image. Optional toggle. |
-| Edit format | **Aider's search/replace block format** (`Aider-AI/aider`, Apache 2.0) | Borrow the format + error semantics. |
-| Process supervision inside the container | `tini` (already standard) + a tiny in-house supervisor for `dev_server.*` | No need for full systemd. |
-| Code search inside the container | `ripgrep` (`BurntSushi/ripgrep`, MIT) | Pre-installed. `fs.grep` is a thin wrapper. |
-| Git inside the container | `git` + `tea` (Gitea CLI, MIT) | `tea` lets the AI do PR ops without us building gitea_pr_* tools. |
-| Reference for end-to-end agent loops | `All-Hands-AI/OpenHands` (MIT) | Read their runtime + tool design. Don't import their code. |
-| Reference for fast iteration UX | `bolt.new` (`stackblitz/bolt.new`) | UX north star, not a code source. |
-
----
-
-## 7. Risks & open questions
-
-| Risk | Mitigation |
-|---|---|
-| **Dev containers eat money.** 100 active projects × 24/7 = ~$50/mo wasted. | Idle-suspend after 30 min. Resume in <5s. Per-plan caps. Auto-delete suspended-and-untouched volumes after 30 days. |
-| **`shell.exec` is the universal escape hatch — security?** AI inside a single workspace's container can do anything that container can do. | (a) Per-project Coolify isolation. (b) **Network policy: dev containers have NO route to internal Vibn services (vibn-postgres, vibn-frontend, Coolify control plane). Implemented via Docker network rules in week 1, not deferred.** (c) Audit log on every `shell.exec` call. (d) Per-container CPU/RAM caps absorb fork-bomb / coin-mining attempts. |
-| **Preview URL leaks.** `https://preview-mark-ceramic-market.vibnai.com` is publicly resolvable. | Default: random suffix in subdomain (`preview-mark-ceramic-market-7a3f.vibnai.com`) — ~64 bits of unguessability. Optional Vibn-session-cookie auth as paid-tier feature later. |
-| **Hot reload through Traefik.** WebSocket / HMR can be finicky over a reverse proxy. | **Spike on week 1, day 1**: bring up a Vite dev server inside vibn-dev, expose via Traefik, edit a file, verify HMR fires. Failure here is the biggest "things look fine until you actually test" risk; de-risk early. |
-| **Image size / pull time on first project.** ~1 GB pull adds 30–60s to first dev container spin-up. | (a) Pre-pull image on every Coolify host on deploy. (b) **Keep base image small (~500 MB: OS + git + ripgrep + supervisord + IDE server). Lazy-install language toolchains via `mise` on first project use.** Prevents the image from bloating to 4 GB six months from now. |
-| **Dependency cache poisoning.** Cached `node_modules` from project A bleeds into project B. | Caches are per-project (volume `vibn-dev-cache-{projectId}`). Never share. Take the slower-first-install hit; add a Verdaccio mirror later only if it bothers anyone. |
-| **AI keeps calling `gitea_file_*` instead of `shell.exec`.** | **Hard removal from AI's tool list in week 3, not soft deprecation.** Keep REST endpoints alive for a 30-day grace period for any external MCP client. After 30 days, return 410 Gone. The AI has no muscle memory; no graceful migration needed. |
-| **What if the user has no Vibn project yet?** | First chat creates a project + provisions its Coolify project + spins up `vibn-dev` lazily. ~10s overhead, one-time. Stream progress to the chat ("creating workspace... installing tools..."). Same UX bolt.new uses while WebContainers boot. |
-| **Coolify host disk dies → users lose unshipped `/workspace` work.** | **Auto-push to Gitea `vibn-autosave/main` branch every 5 min of activity, plus before idle-suspend.** Treat Gitea as canonical, container disk as ephemeral. Built in week 1, day 2 (not optional). |
-| **Path B turns out to be wrong; we need to revert.** | **Kill-switch admin endpoint (`POST /api/admin/path-b/disable`) flips a feature flag — all new chat sessions go back to Path A; existing dev containers drain.** ~10-min revert window. Built week 1. |
-
----
-
-## 8. Success metrics
-
-We're not done until **all four** are true on the eval harness:
-
-| Metric | Target | Today (Path A) |
-|---|---|---|
-| Time-to-first-preview (10 reference prompts, p50) | ≤ 60 s | ~5 min |
-| Iteration loop (small edit → user sees change) p50 | ≤ 15 s | ~3 min |
-| Tool calls per "build me X" task (median) | ≤ 30 | ~80 |
-| End-to-end success rate (live deployable result) | ≥ 80% | ~50% |
-
----
-
-## 9. What this changes about the existing roadmap
-
-- **Tier 1.5 ("Code authoring capability") is collapsed into this doc.** C1–C9 mostly disappear (replaced by `shell.exec` + `fs.edit`); C10 ("persistent agent dev workspace") **is** Path B.
-- **Tier 1 P5.1–P5.4 are unchanged.** Domains, email, storage, workers — still the right next infra primitives. Path B doesn't replace them; it makes the AI capable enough to actually use them.
-- **Tier 2 P6.x** (backups, runtime logs, scoped keys) — unchanged.
-- **`gitea_*` tools shipped 2026-04-28** are now legacy. Mark deprecated in week 3. Remove in a future cleanup once telemetry confirms zero usage.
-
----
-
-## 10. Decision needed before week 1 starts
-
-1. **Approve Path B as the primary architecture for code authoring.** (If no, this doc dies here.)
-2. **Approve the dev-container-as-Coolify-service implementation choice.** Alternatives: separate dev-host, e2b self-host, Cloud Run jobs. Picked Coolify-service for zero new infra; flag if you want to revisit.
-3. **Approve the deprecation of `gitea_file_*` tools.** They were shipped today; deprecating them within 3 weeks is fine if the path forward is clearer, embarrassing if we keep them around as half-working alternates.
-4. **Approve the resource cap defaults** (free: 1 GB / 0.5 CPU, paid: 4 GB / 2 CPU). Or set different numbers.
-
-Once those four are decided, week 1 starts.
-
----
-
-## How to use this doc
-
-- This is the *architectural* execution plan. The detailed task list
-  goes into the agent's TodoWrite per-week, not into this file.
-- When an item ships, **move it from "planned" to "shipped"** in
-  [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) and link the commit/PR.
-- When a risk in §7 turns out to be real, document the mitigation
-  outcome inline so future readers see what actually happened.
-- This doc supersedes the proposed Tier 1.5 in
-  [`AI_CAPABILITIES_ROADMAP.md`](./AI_CAPABILITIES_ROADMAP.md). Add a
-  one-line pointer there once approved.
+*(Refer to `lib/ai/vibn-tools.ts` and `app/api/mcp/route.ts` for the live implementation).*
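The removed plan specified `fs.edit` as an Aider-style search/replace: fail with HTTP 404 if `oldString` is missing, 409 if it is ambiguous, and otherwise return the replacement count. It also required every `fs.*` tool to stay locked to `/workspace`. The following is a minimal TypeScript sketch of those semantics under stated assumptions. `ToolError`, `resolveWorkspacePath`, `applyEdit`, and `fsEdit` are hypothetical names, not the code in `lib/ai/vibn-tools.ts`. An in-memory `Map` stands in for the container filesystem. The 400 (empty `oldString`), 403 (path escape), and 404 (missing file) statuses are guesses that the plan does not state.

```typescript
// Hypothetical sketch of fs.edit semantics; not the code in lib/ai/vibn-tools.ts.
// An in-memory Map stands in for the dev container's filesystem.

const WORKSPACE_ROOT = "/workspace";

class ToolError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Normalize a tool-supplied path and reject anything outside /workspace.
// The 403 status for an escape attempt is an assumption.
function resolveWorkspacePath(p: string): string {
  const joined = p.startsWith("/") ? p : `${WORKSPACE_ROOT}/${p}`;
  const parts: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  const resolved = "/" + parts.join("/");
  if (resolved !== WORKSPACE_ROOT && !resolved.startsWith(WORKSPACE_ROOT + "/")) {
    throw new ToolError(403, `path escapes ${WORKSPACE_ROOT}: ${p}`);
  }
  return resolved;
}

// Count overlapping occurrences, so "aa" inside "aaa" counts as ambiguous.
function countMatches(haystack: string, needle: string): number {
  let n = 0;
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + 1)) n++;
  return n;
}

interface EditResult {
  content: string;
  replacements: number;
}

// Aider-style search/replace: 404 if oldString is absent, 409 if it is not
// unique and replaceAll is false, otherwise the new content plus a count.
function applyEdit(content: string, oldString: string, newString: string, replaceAll = false): EditResult {
  if (oldString === "") throw new ToolError(400, "oldString must be non-empty");
  const matches = countMatches(content, oldString);
  if (matches === 0) throw new ToolError(404, "oldString not found");
  if (matches > 1 && !replaceAll) {
    throw new ToolError(409, `oldString matches ${matches} times; add context or set replaceAll`);
  }
  if (replaceAll) {
    const pieces = content.split(oldString);
    return { content: pieces.join(newString), replacements: pieces.length - 1 };
  }
  // A replacer function stops "$&"-style patterns in newString from being expanded.
  return { content: content.replace(oldString, () => newString), replacements: 1 };
}

interface FsEditArgs {
  path: string;
  oldString: string;
  newString: string;
  replaceAll?: boolean;
}

// Tool entry point: resolve the path, apply the edit, write back, return the count.
function fsEdit(files: Map<string, string>, args: FsEditArgs): number {
  const target = resolveWorkspacePath(args.path);
  const current = files.get(target);
  if (current === undefined) throw new ToolError(404, `no such file: ${target}`);
  const result = applyEdit(current, args.oldString, args.newString, args.replaceAll ?? false);
  files.set(target, result.content);
  return result.replacements;
}
```

Counting overlapping matches errs toward a 409. This pushes the model to include more surrounding context in `oldString` rather than silently editing the wrong occurrence.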