Archived

This repository has been archived on 2026-06-07. You can view files and clone it. You cannot open issues or pull requests or push a commit.


mawkone 78d468d365 plan: add Phase 6 (artifact-first UX) + model-assignment convention

Phase 6 captures the Bolt parity work that came out of today's
review of Bolt screenshots — two-pane shell, preview-as-default,
plan-as-checklist, settings popover, project-level secrets.

Adds a "Model assignment convention" section so we can
explicitly route mechanical work to a cheaper model and reserve
Opus for judgment-heavy tasks. Each Phase 6 row tagged opus /
cheap / opus-spec→cheap. Net: 9 hrs Opus, 8 hrs cheap.

Also brings forward two items shipped today that weren't in
the plan yet:
  - 5.7 dev container <-> Gitea wiring (auto-clone +
    auto-commit + GITEA_USERNAME fallback fix)
  - 3.8a/b/c "stop at something tangible" rule + reverted
    composer chip row + queued server-side enforcement

Sequencing diagram + cadence note updated to include P6.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-04 13:22:21 -07:00


Beta Launch Execution Plan

The path from "shipping to ourselves" to "5–10 friendly testers can use Vibn end-to-end without us hand-holding."

Companion to: AI_PATH_B_EXECUTION_PLAN.md (architecture) and AI_CAPABILITIES.md (current state).

Drafted: 2026-04-30. Owner: Mark + AI.

Scope: Everything we agreed in the 2026-04-30 review that's NOT already shipped. Pulls in the unfinished items from Path B (DNS, cert, previews, eval) AND the "before strangers see this" gaps that Path B doesn't cover (runtime errors, error surfaces, onboarding smoke test, landing page, safety rails).

North star for the beta

A non-technical founder receives a Vibn invite link, signs up, describes what they want to build, sees a working preview within a few minutes, can iterate on it through chat without seeing a stack trace, and can ship it to a real domain — all without us reaching into Coolify on their behalf.

If any of those steps requires us in the loop, beta isn't ready.

Phase ordering

Sequenced by leverage × blocking risk. Earlier phases unblock later ones.

P1 Previews unlock        ── enables fast-iteration UX & demos ──┐
P2 Stability & visibility ── stops silent rot ───────────────────┤
P3 UX surfaces            ── what the user actually touches ─────┼─── INVITE
P4 Onboarding & safety    ── what a stranger needs day 1 ────────┤
P5 Path B closeout        ── ship the architectural commitments ─┤
P6 Artifact-first UX      ── two-pane shell, preview-as-default ─┘

Model assignment convention

Opus is reserved for judgment-heavy work — anything that touches multiple subsystems, has security implications, designs a protocol, or requires reading existing architecture before deciding what to ship. Mechanical, well-specified work goes to a cheaper coder model.

Per-task tags inside each Phase table:

opus — architectural, cross-cutting, security-sensitive. Opus reads the relevant code, decides the approach, writes the code.
cheap — well-specified, single-file or local-scope, pattern exists. Cheaper model executes from the row's notes.
opus-spec → cheap — Opus writes a tight one-paragraph spec in the row's notes (schema columns, function signature, exact files to touch); cheaper model implements verbatim.

If a row has no model tag, the default is cheap. The expensive model is opt-in, not opt-out.
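
A minimal sketch of how the tag convention above could be resolved in code. The `ModelAssignment` shape and the `assignModel` name are hypothetical, for illustration only; the three tags and the cheap default come from this section.

```typescript
// The three per-task tags used in the Phase tables.
type ModelTag = "opus" | "cheap" | "opus-spec → cheap";

// Hypothetical shape: who writes the spec (if anyone) and who writes the code.
interface ModelAssignment {
  specWriter: "opus" | null;
  implementer: "opus" | "cheap";
}

// Resolve a row's tag. A missing tag falls through to the cheap default.
function assignModel(tag?: ModelTag): ModelAssignment {
  switch (tag) {
    case "opus":
      return { specWriter: null, implementer: "opus" };
    case "opus-spec → cheap":
      return { specWriter: "opus", implementer: "cheap" };
    default:
      return { specWriter: null, implementer: "cheap" };
  }
}
```

Usage: `assignModel()` for an untagged row yields the cheap implementer, so Opus only appears where a row asks for it.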

Phase 1 — Previews unlock — SHIPPED 2026-05-01

Goal: dev_server.start returns a clickable https://*.preview.vibnai.com URL that loads in <30s, with HMR working over the proxy.

Why first: the single biggest UX cliff today is "user iterates → 3-7 min Coolify build". Previews collapse it to seconds. Everything else is polish on a slow loop until this lands.

#	Task	Owner	Effort	Status
1.1	Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A)	Mark	15 min	✓ done
1.2	Switch Namecheap nameservers to Cloudflare-assigned NS pair	Mark	2 min	✓ done
1.3	Wait for propagation; verify `dig @1.1.1.1` from multiple resolvers	AI	30–120 min	✓ done — `34.19.250.135` from CF + Google resolvers
1.4	Generate Cloudflare API token (DNS edit, `vibnai.com` only)	Mark	2 min	✓ done — stored in `.coolify.env`
1.5	Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token	AI	20 min	✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy`
1.6	Test wildcard cert issues for `*.preview.vibnai.com` (curl, browser)	AI	10 min	✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert
1.7	Wire `dev_server.start` to mint Traefik labels with the wildcard host	AI	1 hr	✓ done — pre-baked labels for ports 3000–3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns`
1.8	Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container	AI	30 min	✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below.

Definition of done: ✅ AI says "open a Vite dev server", user clicks the URL, sees Vite's welcome page, edits a file via fs.edit, change appears in browser within 5s without manual reload.

Verified Vite config for HMR through Traefik (the system prompt should advertise this exact shape when scaffolding Vite projects):

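// vite.config.ts
import { defineConfig } from 'vite'

export default defineConfig({
  server: {
    host: '0.0.0.0',       // listen on all interfaces so the proxy can reach the dev server
    port: 3001,            // any 3000–3009
    strictPort: true,      // exit if the port is taken instead of silently moving to another one
    hmr: {
      clientPort: 443,     // the browser connects through Traefik on HTTPS
      protocol: 'wss',     // secure WebSocket for HMR
      host: 'preview-{slot}-{slug}-{token}.preview.vibnai.com', // placeholder; substitute the real preview host
    },
  },
})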
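
A sketch of how the `hmr.host` placeholder could be filled in. It assumes the slot is the port offset from 3000 (port 3001 gives slot 1); the plan does not state that mapping, so treat it and the helper names `previewHost` / `hmrOptions` as hypothetical.

```typescript
const PREVIEW_PORT_BASE = 3000; // first pre-baked port
const PREVIEW_SLOT_COUNT = 10;  // ports 3000–3009

// Build the preview hostname. ASSUMPTION: slot = port - 3000.
function previewHost(port: number, slug: string, token: string): string {
  const slot = port - PREVIEW_PORT_BASE;
  if (!Number.isInteger(slot) || slot < 0 || slot >= PREVIEW_SLOT_COUNT) {
    throw new RangeError(`port ${port} is outside 3000–3009`);
  }
  return `preview-${slot}-${slug}-${token}.preview.vibnai.com`;
}

// The hmr block from the verified config, with the host substituted.
function hmrOptions(port: number, slug: string, token: string) {
  return {
    clientPort: 443,
    protocol: "wss",
    host: previewHost(port, slug, token),
  };
}
```

Rejecting ports outside the range early keeps a scaffold from advertising a URL that has no Traefik label behind it.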

Phase 2 — Stability & visibility

Goal: when something breaks in production, we hear about it before users do.

#	Task	Owner	Effort	Notes
2.1	Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs	AI	1–2 hrs	Likely a server action / API route returning twice
2.2	Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle	AI	1–2 hrs	Minified prod error; suspect `react-markdown` server/client boundary
2.3	Wire Sentry (or alternative) for both client + server runtime errors	AI	✓ done 2026-05-01	`@sentry/nextjs` v10 wired in `vibn-frontend`. `instrumentation.ts` (server+edge), `instrumentation-client.ts` (browser w/ Session Replay free tier, all text masked), `app/global-error.tsx`, `next.config.ts` wrapped with `withSentryConfig`. `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN` in Coolify env, with matching `ARG` lines in `vibn-frontend/Dockerfile`. End-to-end verified via `/sentry-example-page` 2026-05-01: client + server errors capture, breadcrumbs work, stack traces de-minify to real filenames (`app/sentry-example-page/page.tsx:49`).
2.4	Wire deployment-failed Coolify webhook → Slack/email	AI	✓ done 2026-05-01	Slack webhook wired into `slack_notification_settings` for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping — confirmed in user's Slack.
2.5	Tighten Coolify docker prune to every 6 hrs (vs daily)	AI	✓ done 2026-05-01	Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`.
2.6	Bake `HEALTHCHECK 127.0.0.1` into `vibn-frontend/Dockerfile` so future apps inherit	AI	✓ done 2026-05-01	Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap
2.7	Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap	AI	✓ done 2026-05-01	Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added.
2.8	Tool-error recovery middleware (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round	AI	✓ done 2026-05-01	`vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop.
2.9	Sentry-as-product loop (SENTRY_AS_PRODUCT.md) — auto-provision per-project Sentry, bake into scaffolds, expose error feed to AI as MCP tools, auto-surface unresolved errors at chat-turn start	AI	✓ done 2026-05-01	All 4 stages shipped: (1) `lib/integrations/sentry.ts` provisions per-project Sentry under shared `vibnai` org from `POST /api/projects/create` and lazily on `apps.create`; injects `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_AUTH_TOKEN` into Coolify app env. (2) `lib/scaffold/sentry-snippets.ts` ships canonical Next.js + Vite snippets; AI system prompt instructs it to wire Sentry on every new app; `projects.get` returns `sentry: {slug, dsn}`. (3) Three MCP tools: `project_recent_errors`, `project_error_detail`, `project_error_resolve` (tenant-safe). (4) `app/api/chat/route.ts` injects `[PROJECT HEALTH]` block at chat-turn start when ≥2-occurrence unresolved issues exist in last 6h. End-to-end verification deferred to smoke test (4.1).
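
A sketch of the stage (4) filter in row 2.9: only unresolved issues with at least 2 occurrences seen in the last 6 hours produce a `[PROJECT HEALTH]` block. The `IssueSummary` fields and the `buildProjectHealthBlock` name are hypothetical; they are not the Sentry API shape or the shipped code.

```typescript
// Hypothetical, trimmed-down issue shape for illustration.
interface IssueSummary {
  title: string;
  count: number;                       // occurrences
  status: "unresolved" | "resolved" | "ignored";
  lastSeen: string;                    // ISO 8601 timestamp
}

const MIN_OCCURRENCES = 2;
const WINDOW_MS = 6 * 60 * 60 * 1000;  // 6 hours

// Returns the block to prepend to the chat turn, or null when nothing qualifies.
function buildProjectHealthBlock(issues: IssueSummary[], now: Date): string | null {
  const cutoff = now.getTime() - WINDOW_MS;
  const relevant = issues.filter(
    (issue) =>
      issue.status === "unresolved" &&
      issue.count >= MIN_OCCURRENCES &&
      Date.parse(issue.lastSeen) >= cutoff,
  );
  if (relevant.length === 0) return null;
  const lines = relevant.map(
    (issue) => `- ${issue.title} (${issue.count}x, last seen ${issue.lastSeen})`,
  );
  return ["[PROJECT HEALTH]", ...lines].join("\n");
}
```

Returning `null` rather than an empty block keeps healthy projects from paying any prompt tokens for the feature.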

Definition of done: force-fail a route in staging → Sentry alert lands in < 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an orphan-container conflict in prod → model calls apps_unstick instead of delete-and-recreate.
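
A sketch of the pattern-match step behind row 2.8, covering the three initial rules. The regexes and instruction wording are assumptions modeled on common Docker error text, not the contents of `error-recovery.ts`; `apps_unstick` is the tool named in the definition of done above.

```typescript
interface RecoveryRule {
  id: string;
  pattern: RegExp;      // no `g` flag, so .test() stays stateless
  instruction: string;  // synthetic instruction injected before the next round
}

// ASSUMPTION: patterns and wording are illustrative only.
const RECOVERY_RULES: RecoveryRule[] = [
  {
    id: "orphan-container-conflict",
    pattern: /container name .* is already in use/i,
    instruction: "An orphaned container holds this name. Call apps_unstick; do not delete and recreate the app.",
  },
  {
    id: "image-pull-denied",
    pattern: /pull access denied/i,
    instruction: "The image could not be pulled. Check the image name and registry access before retrying.",
  },
  {
    id: "port-allocated",
    pattern: /port is already allocated/i,
    instruction: "The host port is taken. Pick a free port or stop the process using it before retrying.",
  },
];

// Returns the instruction for the first matching rule, or null for unknown errors.
function recoveryInstructionFor(toolError: string): string | null {
  const rule = RECOVERY_RULES.find((candidate) => candidate.pattern.test(toolError));
  return rule ? `[RECOVERY:${rule.id}] ${rule.instruction}` : null;
}
```

Unknown errors return `null` so they surface unchanged instead of being papered over with a guess.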

Phase 3 — UX surfaces (what users actually touch)

Goal: every screen a beta tester lands on either does something useful or gets out of the way. No screens that exist "to teach the data model".

#	Task	Owner	Effort	Notes
3.1	Hosting tab rewrite — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews"	AI	4 hrs	Mark flagged earlier
3.2	Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why	AI	2 hrs	Critical — currently zero feedback
3.3	Empty states across Plan/Product/Infrastructure/Hosting that suggest the next AI prompt to try (not just "nothing here")	AI	2 hrs	Vibe coders need a nudge
3.4	Project header URL chips: collapse to a "+N" pill when there are >3 endpoints	AI	✓ done 2026-05-01	`components/project/project-header-urls.tsx`: bumped MAX_VISIBLE to 3, replaced title-tooltip with click-to-open popover (closes on outside-click + Escape). Each row in the popover is a real clickable link with icon + label + host.
3.5	Status pill: tooltip should link directly to Coolify build logs	AI	✓ done 2026-05-01	`components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `<COOLIFY_URL>/project/<coolifyProjectUuid>` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.)
3.6	Product tab: confirm it's actually useful day-to-day. Revise scope if not	Mark + AI	1 hr	Open question
3.7	Scope-doc upload in Plan tab — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan.	AI	3 hrs	Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat.
3.8	"Stop at something tangible" — three layers	AI	partially done	Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof.
3.8a	System-prompt rule: dedicated "Stop at something the user can see" section + tightened build-me-X recipe so `previewUrl` is the explicit stopping point	AI	✓ done 2026-05-04	`app/api/chat/route.ts` `buildSystemPrompt`. For multi-service stacks, instructs AI to start the user-facing service first even if other services aren't done.
3.8b	~~Persistent quick-action chips above the chat input~~ REVERTED 2026-05-04	AI	reverted	Tried it; pulled it. The chip menu was prescriptive ("here's what to type") which conflicts with the principle that the AI should drive toward the goal without presenting the user a menu of homework. Welcome-screen suggested prompts kept (different context — empty conversation, user genuinely needs a starting nudge). The `sendMessage(override)` refactor + welcome-screen auto-send shipped from this work survived; only the composer chip row was removed.
3.8c	Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker	AI	1 hr	Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message.
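
The collapse rule from row 3.4 can be sketched as a pure function: show up to `MAX_VISIBLE` chips and fold the rest into a "+N" pill. The `splitUrlChips` name and return shape are hypothetical; only `MAX_VISIBLE = 3` comes from the row's notes.

```typescript
const MAX_VISIBLE = 3;

// Split endpoints into visible chips and the overflow shown in the popover.
function splitUrlChips<T>(urls: T[]): { visible: T[]; overflow: T[]; pillLabel: string | null } {
  const visible = urls.slice(0, MAX_VISIBLE);
  const overflow = urls.slice(MAX_VISIBLE);
  return {
    visible,
    overflow,
    // No pill at all when everything fits.
    pillLabel: overflow.length > 0 ? `+${overflow.length}` : null,
  };
}
```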

Definition of done: a stranger lands on every tab in turn. None of them make us cringe. Each one either shows useful info or gives the user a concrete next action.
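
A sketch of the per-turn tracker described in row 3.8c. The class name, the `isSourceFile` flag, and the instruction wording are hypothetical; the threshold of 10 `fs_write` calls and the two tools that count as tangible output come from the row.

```typescript
const SOURCE_WRITE_THRESHOLD = 10;
const TANGIBLE_TOOLS = new Set(["dev_server_start", "apps_deploy"]);

// One instance per chat turn; feed it every tool call in the tool loop.
class TangibleOutputTracker {
  private sourceWrites = 0;
  private sawTangibleOutput = false;

  // ASSUMPTION: the caller decides what counts as a source file.
  record(toolName: string, isSourceFile: boolean = true): void {
    if (toolName === "fs_write" && isSourceFile) this.sourceWrites += 1;
    if (TANGIBLE_TOOLS.has(toolName)) this.sawTangibleOutput = true;
  }

  // Returns the synthetic instruction to append, or null if the turn is fine.
  recoveryInstruction(): string | null {
    if (this.sawTangibleOutput || this.sourceWrites < SOURCE_WRITE_THRESHOLD) return null;
    return (
      `You wrote ${this.sourceWrites} source files this turn without starting a server or deploying. ` +
      "Call dev_server_start now, or explain what is blocking you."
    );
  }
}
```

Keeping the tracker free of I/O means it can be unit-tested without spinning up the chat route.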

Phase 4 — Onboarding & safety

Goal: a stranger with the invite link can get from "what is this" to "I shipped a thing" without us in the chat.

#	Task	Owner	Effort	Notes
4.1	End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy	Mark + AI	2 hrs	Walk through with an empty cookie jar; fix everything broken. Runbook below.
4.2	Landing page at `vibnai.com` that explains the product in 30s	Mark + AI	4 hrs	Currently a login screen
4.3	"Delete project" UI in project settings (and underlying Coolify cleanup)	AI	✓ done 2026-05-04	`app/api/projects/delete/route.ts` now cascades: stops + deletes the dev container service (with volumes + docker-cleanup), deletes every linked Coolify resource via `fs_project_resources`, deletes the per-project Coolify project shell when no other Vibn project shares it, drops `fs_project_dev_containers` + `fs_project_resources` rows, unlinks `fs_sessions`, then deletes `fs_projects`. Gitea repo + Sentry project are deliberately preserved (returned in the response so the user can recover code/error history). Failure inside cascade is logged but doesn't abort; partial failure leaves the orphan in Coolify for manual cleanup, which is strictly better than rolling back to a half-state. Smoke test 2026-05-04 found 2 ghost containers from previously-deleted projects consuming the user's full quota; cleaned up manually + shipped this fix to prevent recurrence.
4.4	"Delete workspace" UI — same	AI	1 hr
4.5	Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review	AI	2 hrs
4.6	Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error	AI	✓ done 2026-05-01	`lib/quotas.ts`: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via `VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE` / `VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE` env. Hits return HTTP 402 with structured payload; AI's error-recovery middleware has a `workspace-quota-exceeded` rule that explains the cap to the user without blind retries. Wired into `POST /api/projects/create` and `lib/dev-container.ts` ensure/resume paths.
4.7	Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete)	AI	2 hrs	We need this when something goes wrong
4.8	Invite link / waitlist page (manual approval) so we control who joins	Mark + AI	1 hr

Definition of done: Mark hands the invite link to one non-developer friend, they get to "shipped a thing" without messaging Mark for help.
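
A sketch of the quota check from row 4.6: default caps of 3, env overrides, and a structured 402 payload. The payload field names and function names are hypothetical; the env variable names, the defaults, and the `workspace-quota-exceeded` rule name come from the row's notes.

```typescript
type Env = Record<string, string | undefined>;

interface QuotaLimits {
  maxProjects: number;
  maxDevContainers: number;
}

// Parse a non-negative integer override, falling back to the default.
function readLimit(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadLimits(env: Env): QuotaLimits {
  return {
    maxProjects: readLimit(env["VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE"], 3),
    maxDevContainers: readLimit(env["VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE"], 3),
  };
}

// Hypothetical structured payload returned with HTTP 402.
interface QuotaExceeded {
  status: 402;
  code: "workspace-quota-exceeded";
  resource: "projects";
  limit: number;
  current: number;
}

// Called before creating a project; null means the create may proceed.
function checkProjectQuota(activeProjects: number, limits: QuotaLimits): QuotaExceeded | null {
  if (activeProjects < limits.maxProjects) return null;
  return {
    status: 402,
    code: "workspace-quota-exceeded",
    resource: "projects",
    limit: limits.maxProjects,
    current: activeProjects,
  };
}
```

Passing `env` in as a parameter, rather than reading a global, keeps the check testable; in a Node runtime the caller would pass `process.env`.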

Phase 5 — Path B closeout

Goal: finish the architectural commitments in AI_PATH_B_EXECUTION_PLAN.md that aren't covered above.

#	Task	Owner	Effort	Notes
5.1	Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`)	AI	✓ done 2026-05-01	Image `vibn-dev:latest` built 2026-04-30 on Coolify host (589 MB, last Dockerfile change Apr 28 so build is current). Smoke-tested as `vibn` user: ripgrep, git, mise all functional. Toolchains install on demand via mise.
5.2	Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header	AI	1 hr	Path B week 3 task
5.3	Update `AI_CAPABILITIES.md` to reflect everything that shipped	AI	1 hr
5.4	Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate	AI	1–2 days	The actual proof Path B works
5.5	Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com`	AI	4 hrs	Week 4 nice-to-have; gates the "user becomes developer" graduation
5.6	Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it	AI	30 min	Keeps cost bounded
5.7	Persistent dev container ↔ Gitea wiring — auto-clone repo into `/workspace/<slug>/` on first chat turn; auto-commit + push at end of every turn so AI work surfaces in the Product tab without manual `gitea_*` calls	AI	✓ done 2026-05-04	`lib/dev-container-git.ts` (`ensureProjectRepoCloned`, `commitAndPushIfDirty`) wired into `app/api/chat/route.ts` pre-loop + turn-end. Tri-state probe (`git` / `dir` / `absent`) so projects with files-but-no-git auto-heal on next turn. Production fix shipped today: `GITEA_USERNAME` was missing from prod env so `isGiteaConfigured()` silently no-op'd; added the env value AND a defensive fallback to `GITEA_ADMIN_USER` in code. Backfilled `vibn-mark/manifest` repo manually from the dev container after the env fix. Smoke-tested by inspecting `/workspace/manifest/` over SSH bridge — 64 tracked files pushed, all 6 phase directories present.

Definition of done: eval harness reports ≥3× speedup on time-to-first-preview vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
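
A sketch of how the definition of done could be scored from eval runs. Using the median time-to-first-preview and a single baseline number is an assumption; the plan fixes only the thresholds (≥3× speedup, ≥80% success). All names here are hypothetical.

```typescript
interface EvalRun {
  prompt: string;
  success: boolean;
  secondsToFirstPreview: number | null; // null when no preview was reached
}

function median(values: number[]): number {
  if (values.length === 0) return Number.NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// ASSUMPTION: speedup = Path A baseline seconds / median Path B seconds.
function scoreEval(runs: EvalRun[], baselineSeconds: number) {
  const successRate =
    runs.length === 0 ? 0 : runs.filter((run) => run.success).length / runs.length;
  const times = runs
    .map((run) => run.secondsToFirstPreview)
    .filter((t): t is number => t !== null);
  const speedup = times.length === 0 ? 0 : baselineSeconds / median(times);
  return { successRate, speedup, passed: successRate >= 0.8 && speedup >= 3 };
}
```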

Phase 6 — Artifact-first UX (Bolt parity, Vibn taste)

Goal: the running app is the dominant surface on every project page, not a thing-to-summon. The user should never have to wonder if the AI actually built something — it's right there. Lift the structural patterns Bolt/Lovable/v0 have proven (two-pane, preview chrome, plan-as-checklist) without the dark glassy aesthetic.

Why now (after smoke test prep): today the AI can scaffold a full six-service stack and a non-technical founder still has no idea where "their app" is. The composer chip + header Preview button (3.8) helped, but the structural fix is two-pane.

#	Task	Owner	Effort	Model	Notes
6.A1	Two-pane structural refactor — replace `app/[workspace]/project/[projectId]/(home)/layout.tsx` shell. Left side: workspace sidebar (unchanged) → vertical icon rail (👁 / ⟨⟩ / 🗄 / ⚙) → permanent chat column ~380px. Right side: pure content pane keyed off icon rail selection. Default view = Preview. Coordinates with existing `ProjectStagePill`, `ProjectAssociationPrompt`, `--chat-panel-width` CSS var (now obsolete on project pages, kept elsewhere).	AI	3 hrs	opus	Cross-cutting; touches layout, chat panel, header. Has to keep `ProjectHeaderUrls` working.
6.A2	PreviewPane component — iframe of `previewUrl \|\| fqdn`, with empty-state placeholder ("Your preview will appear here" + Vibn V mark). Reads from `useAnatomy()`. Exports `kind: "preview"` rendered by 6.A1's right pane.	AI	1 hr
6.A3	Iframe chrome (artifact-local) — top-right of the iframe: ↻ reload (force iframe `key` bump), ↗ open-in-new-tab, ⛶ fullscreen (toggles a panel-level `expanded` state that hides chat column), 📱 device-frame (desktop / tablet / mobile widths, persisted to localStorage per-project).	AI	1 hr	cheap	Pure UI. Width tokens: desktop=100%, tablet=820px, mobile=390px. Frame is a CSS wrapper, not real device emulation.
6.A4	Code view — when ⟨⟩ is selected on the rail, render the existing `gitea-file-tree.tsx` + `gitea-file-viewer.tsx` in the right pane. Two-column inside the right pane: tree on the left, viewer on the right.	AI	30 min	cheap	Components already exist; just compose them. Shared `selectedPath` state.
6.A5	Resources view — when 🗄 is selected, render the existing `database-table-tree.tsx` + `table-viewer.tsx` plus a small list of running services (from `anatomy.hosting.live[]` and `apps_containers_list`).	AI	1 hr	cheap	Same wire-up pattern as 6.A4.
6.B1	Persist last-known dev server config — new table `fs_project_dev_servers (project_id PK, command, port, framework, last_started_at, status)`. Hook `dev_server_start` MCP tool to upsert on success; `dev_server_stop` to flip status.	AI	1 hr	opus-spec → cheap	Spec: schema is `project_id UUID PK, command TEXT NOT NULL, port INT NOT NULL, framework TEXT, last_started_at TIMESTAMPTZ, status TEXT CHECK (status IN ('running','stopped','crashed'))`. Migration in `lib/db-postgres-migrations.ts` pattern. Upsert in `lib/dev-server-manager.ts` (or wherever `dev_server_start` lives — find via `Grep`).
6.B2	Auto-resume dev server on project page mount — server-render hook on the new layout: if (a) saved server config exists AND (b) `getDevContainerStatus()` returns `running` or `provisioning` AND (c) no live preview already in `useAnatomy().hosting.previews[]` → fire the saved `dev_server_start` server-side BEFORE the page paints. User lands; preview is live.	AI	2 hrs	opus	Risky if naive — could resume a server the user explicitly stopped, could thrash on idle-suspended containers, could race the existing on-mount `devcontainer_ensure`. Needs careful state-machine read. Idempotency comes from `dev_server_start` returning `alreadyRunning: true` when a process matches command+port.
6.C1	SSE `plan` event protocol — server emits `{ type: "plan", taskId, text, status: "queued"\|"in_progress"\|"done" }` whenever `plan_task_add` / `plan_task_complete` (or a new `plan_task_start`) MCP tool fires inside a chat turn. Coexists with existing `text` and `toolCall` events.	AI	2 hrs	opus	Protocol design — has to handle ordering (plan event must land before the tool's `toolResult`), client-side reconciliation with `fs_projects.plan.tasks[]` on next page load (server is source of truth, SSE is a hot stream), and the case where the AI calls `plan_task_complete` for a task added in a prior turn.
6.C2	Client TimelineEntry of `kind: "plan"` — render a checklist with status circles (○ queued / ◐ in-progress / ● done) inside the assistant message timeline. Each new `plan` SSE event upserts by `taskId`. Ledger pattern matches the existing `kind: "text"` / `kind: "tool"` rendering in `chat-panel.tsx`.	AI	1.5 hrs	opus-spec → cheap	Spec written into 6.C1's notes. Visual: indented under a "Plan" mini-header, same Outfit/Newsreader palette, status circles in `#a09a90` → `#3a3530` → `#1a1a1a`.
6.C3	Share + Publish buttons on the new shell — top-right of the right pane (next to artifact chrome). Share = copy `previewUrl \|\| fqdn`. Publish = fire existing `ship` MCP tool with auto-generated commit message.	AI	30 min	cheap	Both are existing tool calls; just buttons.
6.D1	⚙ Settings popover — single popover off the icon rail's ⚙. Sections: Domain (shows current `fqdn`, link to rename), Sentry (link to project's Sentry dashboard from `projects.get`), Secrets (links to 6.D2), Quick-add database (fires 6.D3 modal), All project settings (link to `/[workspace]/project/[id]/settings`).	AI	1 hr	cheap	Pattern matches existing `project-header-urls` popover.
6.D2	Project-level secret scratchpad — new table `fs_project_secrets (project_id, key, value_encrypted, created_at, updated_at)`. Encryption-at-rest via existing `lib/crypto.ts` (or scaffold one if absent — use AES-GCM with a workspace-scoped key derived from a master `VIBN_SECRETS_KEY` env). MCP tools: `secrets_get { projectId, key }`, `secrets_set { projectId, key, value }`, `secrets_list { projectId }` (returns keys only). AI can read/write user-supplied API keys before they're injected into a deploy.	AI	2 hrs	opus	Security-sensitive: encryption scheme, key rotation story, tenant isolation, what gets logged. Needs careful handling, not pattern-matching.
6.D3	Quick-add database modal — fires `databases_create` MCP tool, blocks until `databases_get` returns a connection URL, surfaces the URL with a copy button + "I'll inject this into your app's env" affordance that calls `apps_envs_set` if a target app exists.	AI	1 hr	cheap	Each step is an existing MCP call; modal orchestrates them.
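
A sketch of the client side of rows 6.C1 and 6.C2: each `plan` event upserts by `taskId`, and the checklist renders one status circle per task. The event shape is taken from row 6.C1; the reducer and renderer names are hypothetical, and a real renderer would emit React elements rather than a string.

```typescript
type PlanStatus = "queued" | "in_progress" | "done";

// Event shape as described in row 6.C1.
interface PlanEvent {
  type: "plan";
  taskId: string;
  text: string;
  status: PlanStatus;
}

// Upsert by taskId, preserving the order in which tasks first appeared.
function applyPlanEvent(tasks: PlanEvent[], event: PlanEvent): PlanEvent[] {
  const index = tasks.findIndex((task) => task.taskId === event.taskId);
  if (index === -1) return [...tasks, event];
  return tasks.map((task, i) => (i === index ? { ...task, ...event } : task));
}

const STATUS_GLYPH: Record<PlanStatus, string> = {
  queued: "○",
  in_progress: "◐",
  done: "●",
};

// Plain-text rendering of the checklist, one task per line.
function renderChecklist(tasks: PlanEvent[]): string {
  return tasks.map((task) => `${STATUS_GLYPH[task.status]} ${task.text}`).join("\n");
}
```

Because the reducer returns a new array and never reorders, replaying the same event is harmless, which matters if the stream reconnects mid-turn.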

Definition of done: open any project that's been built, the running preview is already live in the right pane without clicking anything; clicking ⟨⟩ shows the source tree; clicking 🗄 shows the databases; the AI's plan streams in as a checklist instead of paragraphs; ⚙ opens a single popover with all project config one click away.
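
A sketch of the auto-resume decision in row 6.B2, written as a pure predicate over conditions (a), (b), and (c). Refusing to resume when the saved status is `stopped` is an assumption prompted by the row's risk note, not a decided behavior; the container states other than `running` and `provisioning` are hypothetical.

```typescript
type ContainerStatus = "running" | "provisioning" | "suspended" | "absent";

// Mirrors the fs_project_dev_servers columns from row 6.B1.
interface SavedDevServer {
  command: string;
  port: number;
  status: "running" | "stopped" | "crashed";
}

function shouldAutoResume(
  saved: SavedDevServer | null,
  container: ContainerStatus,
  livePreviewCount: number,
): boolean {
  // (a) a saved server config must exist
  if (saved === null) return false;
  // ASSUMPTION: never resume a server the user explicitly stopped
  if (saved.status === "stopped") return false;
  // (b) the dev container must be running or provisioning
  if (container !== "running" && container !== "provisioning") return false;
  // (c) no live preview may already exist
  return livePreviewCount === 0;
}
```

Keeping the decision separate from the `dev_server_start` call makes the state machine reviewable on its own; the call itself stays idempotent through `alreadyRunning`.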

Sequencing inside Phase 6:

6.A1 (structural) ──> 6.A2..A5 (panels)  ──┐
                                            ├── 6.D1 (settings popover)
6.B1 (persist) ──> 6.B2 (auto-resume)  ──┘
6.C1 (SSE)   ──> 6.C2 (client renderer) ──> 6.C3 (share/publish)
                                          └─ 6.D2 (secrets) ── 6.D3 (db modal)

A-track is the user-visible spine. B-track makes the spine populated on first paint. C-track makes the AI feel like it's working. D-track fills the settings drawer.

Out of scope for Phase 6 (intentional cuts, captured here so they don't get pulled in):

Built-app authentication scaffolding (auth-as-a-service for users' apps) — multi-week, real product call.
First-party connectors marketplace (Stripe / Twilio "click to add") — multi-week per integration. AI can install via shell today.
Multi-model picker / "Plan vs. Build" toggle on the composer — defer until BYOK lands and we have something to switch between.
Design-system picker on the composer — real product call.
Knowledge-base / RAG beyond scope-doc upload (3.7).
Server functions à la Bolt — different deploy model, not a gap.
Mobile preview QR — only matters once mobile is a real target.

Sequencing & dependencies

P1.1 → P1.2 → P1.3 → P1.4 → P1.5 → P1.6 → P1.7 → P1.8 ──┐
                                                          │
P2.1, P2.2, P2.3 (parallel)                              │
P2.4, P2.5, P2.6, P2.7 (parallel, low priority)          │
                                                          ├─ P3 (parallel internally)
                                                          │
                                                          ├─ P4.1 (depends on P3 being not-cringe)
                                                          ├─ P4.2 (parallel)
                                                          ├─ P4.3..4.8 (parallel)
                                                          │
                                                          ├─ P5 (parallel; some pieces gated by P1)
                                                          │
                                                          └─ P6 (gated by P1 + P5.7;
                                                                 6.A1 unblocks the rest)

P1 is the long pole. P6 is the next big-leverage move once smoke test passes — pre-invite UX upgrade, depends on previews (P1) and the auto-clone wiring (5.7) both being solid.

Suggested cadence

Today (in flight): P1.1 — Cloudflare signup + record verification.
Tonight / tomorrow: P1.2–P1.8 once nameservers propagate. AI does the cert + Traefik wiring; Mark does the clicks at Cloudflare/Namecheap.
Day 2: P2.1–P2.3 (runtime error chase + Sentry) + P3.1 (Hosting rewrite) in parallel.
Day 3: P3.2–P3.6 + P4.1 smoke test.
Day 4: P4.2 landing page + P4.3–P4.5 deletion/auth.
Day 5: P4.6–P4.8 quotas/audit/invite + P5.1 vibn-dev image.
Days 6–10: P5.2–P5.6 closeout, eval harness, polish, then invite first testers.

10 working days from today to "first 5 testers". Tight but doable if no nasty discoveries in P2.

What we are not doing for beta

Logged so we don't accidentally pull them in:

Stripe / billing (post-beta — we want to know what to charge for first)
Mobile-responsive polish (desktop-first beta)
Multi-region Coolify (single-host is fine for <50 users)
Replacing Coolify (out of scope; Path B is the abstraction over it)
Replacing Gitea (Path B's shell.exec already abstracts most of it)
Plugin marketplace, template marketplace, monetization paths
Anything requiring us to redo NextAuth / migrate to a different auth
Theme system / dark mode

Risks specific to this plan

Risk	Mitigation
Cloudflare DNS propagation breaks email forwarding	We pre-verified MX records in the audit; double-check at Cloudflare review screen before switching nameservers
Traefik wildcard cert acquisition fails on first try	DNS-01 against Cloudflare is well-trodden; if it fails it's fixable, not catastrophic. Old certs keep serving until replaced.
Runtime errors in P2 turn out to be a deeper architectural issue	Time-box investigation to 4 hrs each; if not solved, document workaround and ship anyway, debug after invite
Eval harness reveals Path B is slower than promised	Acceptable to invite testers without 100% Path B coverage as long as the prod-deploy-only path works. Path B is an upgrade, not a gate.
New users hit 100 unforeseen edge cases	This is the point of beta. Triage daily, fix the top-3 each morning.

Smoke-test runbook (4.1)

Goal: prove the user-visible flow from "first visit" through "shipped a deployed app" works end-to-end with all the new wiring (Sentry per-project, quotas, recovery middleware, URL chip popover, status-pill deep-link, deploy-failed Slack alerts).

Setup: open an incognito window. Have your Slack channel and Sentry dashboard visible in side tabs. You'll be the fresh user.

Steps

Visit https://vibnai.com → sign up with Google (use a different gmail than your normal one if possible — keeps test data clean). Confirm you land on the workspace home.
Create a project (any path: build / oss / import). Pick a slug like smoke-test-2026-05-01.
- Verify in Sentry: within ~10s, a new project named vibn-{your-workspace-slug}-smoke-test-2026-05-01 should appear at https://vibnai.sentry.io/projects/.
- Verify in DB (optional): fs_projects.data.sentry.dsn is populated for the new row.
Land in chat. AI should greet you and offer to scaffold something. Ask it to build something simple ("a Next.js todo app").
Watch the preview start. AI should call devcontainer_ensure, scaffold, then dev_server_start. A preview URL like preview-0-{slug}-{token}.preview.vibnai.com should be returned. Click it. Page should load over HTTPS with a valid cert.
Edit something via chat. Ask AI to add a button or change copy. HMR should update the preview without reload.
Ship it. Tell AI "ship it." It should apps_create against your Gitea repo + trigger Coolify deploy. Watch the project header status pill go Empty → Deploying → Live.
- Verify in Coolify env: the new app's env vars include NEXT_PUBLIC_SENTRY_DSN and SENTRY_AUTH_TOKEN.
- Verify Slack: if the deploy fails for any reason, your Slack channel pings within 30s. If it succeeds, no message (by design — we're noise-conscious).
Trigger a real error in the deployed app. Open the live URL, click around until something breaks. (If nothing breaks, ask AI to add a button that calls myUndefinedFunction().)
- Verify in Sentry: the error lands in the new Sentry project within ~10s, with a real stack trace (file/line in your project's source). Session Replay should be available.
- Open a new chat with this project and say "what's broken?" → AI should call project_recent_errors and surface the issue with a fix suggestion. This is the killer-feature path.
Hit the quota cap. Try to create a 4th project. Should get a friendly 402 with the "delete one or contact support" wording, NOT a generic error. AI in chat should explain the cap clearly without retrying.
Test the URL chip popover. Once you have ≥4 URLs on a project (e.g. preview + live + 2 services), the project header should collapse to 3 chips + a +N pill. Click it; popover opens with the rest as clickable links. Click outside; popover closes. Press Escape; closes.
Test the status-pill Logs link. During a deploy, the "Logs" link next to the pill should one-click into the Coolify project page (not the root).

What to do when something breaks

Take a screenshot, open a Vibn chat in a separate (parent-account) tab, paste the screenshot, and say "this just broke during smoke test." AI now has Sentry access + can read recent errors itself.
If a step is very broken, file a P0 against this checklist with the step number and what you saw.

Pass criteria

All 10 steps complete with no manual intervention by the AI's parent operator.
Every "Verify" line returns the expected positive signal.
Worst case the AI surfaces is a quota cap or known-recoverable error — never a generic "something went wrong."

How to use this doc

Treat phase boundaries as soft. If a P2 task unblocks a P3 task and you're there, do it.
When a task ships, check it off and move it under "Shipped" in AI_CAPABILITIES.md.
When the plan changes (it will), edit this doc directly, don't fork it.
Beta success criteria: 5 testers, all reach "shipped a thing", weekly active rate >60% in week 2. If we miss those, the next plan is "what did we get wrong."
