Beta Launch Execution Plan
The path from "shipping to ourselves" to "5–10 friendly testers can use Vibn end-to-end without us hand-holding."
Companion to: AI_PATH_B_EXECUTION_PLAN.md (architecture) and AI_CAPABILITIES.md (current state).
Drafted: 2026-04-30. Owner: Mark + AI.
Scope: Everything we agreed in the 2026-04-30 review that's NOT already shipped. Pulls in the unfinished items from Path B (DNS, cert, previews, eval) AND the "before strangers see this" gaps that Path B doesn't cover (runtime errors, error surfaces, onboarding smoke test, landing page, safety rails).
North star for the beta
A non-technical founder receives a Vibn invite link, signs up, describes what they want to build, sees a working preview within a few minutes, can iterate on it through chat without seeing a stack trace, and can ship it to a real domain — all without us reaching into Coolify on their behalf.
If any of those steps requires us in the loop, beta isn't ready.
Phase ordering
Sequenced by leverage × blocking risk. Earlier phases unblock later ones.
```
P1 Previews unlock        ── enables fast-iteration UX & demos ──┐
P2 Stability & visibility ── stops silent rot ───────────────────┤
P3 UX surfaces            ── what the user actually touches ─────┼─── INVITE
P4 Onboarding & safety    ── what a stranger needs day 1 ────────┤
P5 Path B closeout        ── ship the architectural commitments ─┘
```
Phase 1 — Previews unlock — SHIPPED 2026-05-01
Goal: dev_server.start returns a clickable https://*.preview.vibnai.com URL that loads in under 30s, with HMR working over the proxy.
Why first: the single biggest UX cliff today is "user iterates → 3-7 min Coolify build". Previews collapse it to seconds. Everything else is polish on a slow loop until this lands.
| # | Task | Owner | Effort | Status |
|---|---|---|---|---|
| 1.1 | Sign up for Cloudflare; add vibnai.com; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
| 1.2 | Switch Namecheap nameservers to the Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
| 1.3 | Wait for propagation; verify with dig against multiple resolvers (e.g. @1.1.1.1) | AI | 30–120 min | ✓ done: Cloudflare and Google resolvers both return 34.19.250.135 |
| 1.4 | Generate a Cloudflare API token (DNS edit, vibnai.com only) | Mark | 2 min | ✓ done: stored in .coolify.env |
| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done: letsencrypt-dns resolver wired into coolify-proxy |
| 1.6 | Test that the wildcard cert issues for *.preview.vibnai.com (curl, browser) | AI | 10 min | ✓ done: both *.vibnai.com and *.preview.vibnai.com certs issued; curl https://test.preview.vibnai.com returns a valid LE cert |
| 1.7 | Wire dev_server.start to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done: pre-baked labels for ports 3000–3009 in the vibn-dev compose file; YAML escape bug fixed; cert resolver corrected to letsencrypt-dns |
| 1.8 | Spike: WebSocket / Vite HMR through Traefik against the vibn-dev container | AI | 30 min | ✓ done: 101 Switching Protocols, vite-hmr subprotocol negotiated, js-update messages fire within ~1s of a file edit. See verified config below. |
Definition of done: ✅ AI says "open a Vite dev server", the user clicks the URL and sees Vite's welcome page, a file is edited via fs.edit, and the change appears in the browser within 5s without a manual reload.
Verified Vite config for HMR through Traefik (the system prompt should advertise this exact shape when scaffolding Vite projects):
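```typescript
// vite.config.ts: verified shape for Vite HMR through Traefik
import { defineConfig } from 'vite'

export default defineConfig({
  server: {
    host: '0.0.0.0',   // listen on all interfaces so Traefik can reach the dev server
    port: 3001,        // any port in 3000–3009 (the pre-baked Traefik labels)
    strictPort: true,  // fail instead of silently moving to another port
    hmr: {
      clientPort: 443, // the browser reaches the HMR socket through Traefik on 443
      protocol: 'wss',
      // {slot}, {slug}, {token} are placeholders filled in per preview
      host: 'preview-{slot}-{slug}-{token}.preview.vibnai.com',
    },
  },
})
```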
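The hmr.host value above is a single DNS label (preview-{slot}-{slug}-{token}) under preview.vibnai.com. A wildcard certificate only matches one label, and DNS caps a label at 63 characters, so a long or dotted slug can produce a preview URL the cert will not cover. The sketch below builds and checks that hostname. previewHost and hmrConfig are hypothetical helpers, and mapping slot n to port 3000 + n is our assumption, not confirmed behavior of dev_server.start.

```typescript
// Hypothetical helper (not the dev_server.start code): builds the preview hostname.
// ASSUMPTION: slot n corresponds to container port 3000 + n (labels cover 3000–3009).
const PREVIEW_ZONE = "preview.vibnai.com";
// Lowercase letters, digits and inner hyphens only; no dots, so it stays one label.
const DNS_LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

function previewHost(port: number, slug: string, token: string): string {
  if (!Number.isInteger(port) || port < 3000 || port > 3009) {
    throw new RangeError(`port ${port} is outside the pre-baked 3000–3009 range`);
  }
  const slot = port - 3000;
  const label = `preview-${slot}-${slug}-${token}`;
  // *.preview.vibnai.com matches exactly one label, and a DNS label is at most 63 chars.
  if (label.length > 63 || !DNS_LABEL.test(label)) {
    throw new Error(`"${label}" is not a valid single DNS label`);
  }
  return `${label}.${PREVIEW_ZONE}`;
}

// The hmr block for a given preview host, mirroring the verified config above.
function hmrConfig(host: string) {
  return { clientPort: 443, protocol: "wss", host };
}
```

Under that assumption, previewHost(3001, "smoke-test", "ab12cd") yields preview-1-smoke-test-ab12cd.preview.vibnai.com, matching port 3001 in the config.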
Phase 2 — Stability & visibility
Goal: when something breaks in production, we hear about it before users do.
| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 2.1 | Reproduce + diagnose ERR_HTTP_HEADERS_SENT from prod logs | AI | 1–2 hrs | Likely a server action / API route sending a response twice |
| 2.2 | Reproduce + diagnose TypeError: reading 'z'/'j'/'aa' in the prod bundle | AI | 1–2 hrs | Minified prod error; suspect the react-markdown server/client boundary |
| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | ✓ done 2026-05-01 | @sentry/nextjs v10 wired into vibn-frontend: instrumentation.ts (server + edge), instrumentation-client.ts (browser, with Session Replay on the free tier and all text masked), app/global-error.tsx, and next.config.ts wrapped with withSentryConfig. NEXT_PUBLIC_SENTRY_DSN and SENTRY_AUTH_TOKEN are set in the Coolify env, with matching ARG lines in vibn-frontend/Dockerfile. Verified end-to-end via /sentry-example-page on 2026-05-01: client and server errors are captured, breadcrumbs work, and stack traces de-minify to real filenames (app/sentry-example-page/page.tsx:49). |
| 2.4 | Wire the deployment-failed Coolify webhook → Slack/email | AI | ✓ done 2026-05-01 | Slack webhook wired into slack_notification_settings for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping; confirmed arriving in the user's Slack. |
| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs. daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use docker_cleanup_frequency: "0 */6 * * *" with force_docker_cleanup: true. Verified via /api/v1/servers. |
| 2.6 | Bake a HEALTHCHECK that targets 127.0.0.1 (not localhost) into vibn-frontend/Dockerfile so future apps inherit it | AI | ✓ done 2026-05-01 | Already in vibn-frontend/Dockerfile:67-68; a comment there explains the IPv6 trap |
| 2.7 | Audit other Dockerfile-based apps for the same localhost/IPv6 trap | AI | ✓ done 2026-05-01 | Audited vibn-dev/Dockerfile and vibn-agent-runner/Dockerfile: neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit if either gains a healthcheck. |
| 2.8 | Tool-error recovery middleware (AI_HARNESS_GAPS.md §1): pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | Lives in vibn-frontend/lib/ai/error-recovery.ts. Initial rules: orphan container conflict, image pull denied, port already allocated. Wired into the tool-result loop in app/api/chat/route.ts. |
| 2.9 | Sentry-as-product loop (SENTRY_AS_PRODUCT.md): auto-provision per-project Sentry, bake it into scaffolds, expose the error feed to the AI as MCP tools, and auto-surface unresolved errors at chat-turn start | AI | ✓ done 2026-05-01 | All 4 stages shipped. (1) lib/integrations/sentry.ts provisions a per-project Sentry project under the shared vibnai org, from POST /api/projects/create and lazily on apps.create, and injects NEXT_PUBLIC_SENTRY_DSN + SENTRY_AUTH_TOKEN into the Coolify app env. (2) lib/scaffold/sentry-snippets.ts ships canonical Next.js + Vite snippets; the AI system prompt instructs it to wire Sentry into every new app; projects.get returns sentry: {slug, dsn}. (3) Three MCP tools: project_recent_errors, project_error_detail, project_error_resolve (tenant-safe). (4) app/api/chat/route.ts injects a [PROJECT HEALTH] block at chat-turn start when unresolved issues with ≥2 occurrences exist in the last 6h. End-to-end verification deferred to the smoke test (4.1). |
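Row 2.8 describes matching known-recoverable tool errors and injecting a synthetic instruction before the model's next round. A minimal sketch of that shape follows. The RecoveryRule type, the regexes, and the instruction wording are hypothetical stand-ins, not the contents of lib/ai/error-recovery.ts; the regexes assume the tools pass through typical Docker daemon wording (e.g. "port is already allocated").

```typescript
// Hypothetical sketch of a tool-error recovery pass (not the real error-recovery.ts).
interface RecoveryRule {
  id: string;
  pattern: RegExp;     // matched against the raw error text of a failed tool call
  instruction: string; // synthetic instruction injected before the model's next round
}

const RULES: RecoveryRule[] = [
  {
    id: "orphan-container-conflict",
    pattern: /container name .* is already in use/i,
    instruction: "An orphaned container holds this name. Call apps_unstick; do not delete and recreate the app.",
  },
  {
    id: "image-pull-denied",
    pattern: /pull access denied/i,
    instruction: "The image could not be pulled. Check the image name and registry access before retrying.",
  },
  {
    id: "port-allocated",
    pattern: /port is already allocated/i,
    instruction: "The host port is taken. Pick another free port instead of retrying the same one.",
  },
];

interface ToolResult {
  tool: string;
  ok: boolean;
  error?: string;
}

// Returns one synthetic hint per failed tool result that matches a known rule.
// Unmatched failures produce no hint and reach the model unchanged.
function recoveryHints(results: ToolResult[]): string[] {
  const hints: string[] = [];
  for (const r of results) {
    if (r.ok || !r.error) continue;
    const message = r.error;
    const rule = RULES.find((candidate) => candidate.pattern.test(message));
    if (rule) hints.push(`[RECOVERY:${rule.id}] ${r.tool}: ${rule.instruction}`);
  }
  return hints;
}
```

First match wins, so more specific rules belong earlier in the list.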
Definition of done: force-fail a route in staging → Sentry alert lands in under 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an orphan-container conflict in prod → model calls apps_unstick instead of delete-and-recreate.
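Stage (4) of row 2.9 only injects the [PROJECT HEALTH] block when unresolved issues with at least 2 occurrences exist in the last 6h. A sketch of that filter, assuming a simplified, hypothetical SentryIssue shape (not the route.ts types) and reading "in last 6h" as "last seen within 6h", which is our interpretation:

```typescript
// Hypothetical, simplified issue shape; real Sentry API payloads differ.
interface SentryIssue {
  shortId: string;
  title: string;
  status: "unresolved" | "resolved" | "ignored";
  count: number;  // total occurrences
  lastSeen: Date; // most recent occurrence
}

const WINDOW_MS = 6 * 60 * 60 * 1000; // 6 hours
const MIN_OCCURRENCES = 2;

// Returns the block to prepend at chat-turn start, or null when nothing qualifies.
function projectHealthBlock(issues: SentryIssue[], now: Date = new Date()): string | null {
  const relevant = issues
    .filter((i) => i.status === "unresolved")
    .filter((i) => i.count >= MIN_OCCURRENCES)
    .filter((i) => now.getTime() - i.lastSeen.getTime() <= WINDOW_MS)
    .sort((a, b) => b.count - a.count); // noisiest issue first
  if (relevant.length === 0) return null;
  const lines = relevant.map((i) => `- ${i.shortId} (${i.count}x): ${i.title}`);
  return ["[PROJECT HEALTH]", ...lines].join("\n");
}
```

Returning null rather than an empty block keeps healthy projects from paying any prompt tokens.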
Phase 3 — UX surfaces (what users actually touch)
Goal: every screen a beta tester lands on either does something useful or gets out of the way. No screens that exist "to teach the data model".
| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 3.1 | Hosting tab rewrite: focus on the domain (live URL, redeploy, env, logs) instead of a "live + previews" master-detail view | AI | 4 hrs | Flagged earlier by Mark |
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show which tool failed and why | AI | 2 hrs | Critical: currently zero feedback |
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the next AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | ✓ done 2026-05-01 | components/project/project-header-urls.tsx: bumped MAX_VISIBLE to 3 and replaced the title tooltip with a click-to-open popover (closes on outside click or Escape). Each row in the popover is a real clickable link with icon + label + host. |
| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | components/project/project-stage-pill.tsx: the "Logs" affordance now appears on deploying, down, and build_failed (not just failures). It deep-links to <COOLIFY_URL>/project/<coolifyProjectUuid>, one click away from the build logs. (A direct deployment-UUID link is blocked on extending anatomy to surface deployment UUIDs; tracked, low priority.) |
| 3.6 | Product tab: confirm it's actually useful day-to-day; revise scope if not | Mark + AI | 1 hr | Open question |
| 3.7 | Scope-doc upload in the Plan tab: drop a PDF/.md/.docx/.txt as the project brief. The server extracts the text, stores it on fs_projects.brief_text + brief_meta, and exposes it via a [PROJECT BRIEF] block in the system prompt plus a project_brief MCP tool for on-demand grep. New file: lib/integrations/brief-extract.ts. The upload empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke-test prep: users will arrive with scope docs (PDF, Notion export, Doc), and right now the only way to hand the AI the source of truth is pasting it into chat. |
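The collapse rule in row 3.4 is small enough to state as a pure function. A sketch, assuming a hypothetical Endpoint shape; only MAX_VISIBLE = 3 comes from the notes above, and project-header-urls.tsx may structure this differently:

```typescript
// Hypothetical endpoint shape for illustration.
interface Endpoint {
  label: string; // e.g. "preview", "live"
  url: string;
}

const MAX_VISIBLE = 3;

interface ChipLayout {
  visible: Endpoint[];  // rendered inline as chips
  overflow: Endpoint[]; // listed in the "+N" popover
  pill: string | null;  // "+N" label, or null when everything fits
}

function splitChips(endpoints: Endpoint[]): ChipLayout {
  if (endpoints.length <= MAX_VISIBLE) {
    return { visible: endpoints, overflow: [], pill: null };
  }
  const overflow = endpoints.slice(MAX_VISIBLE);
  return { visible: endpoints.slice(0, MAX_VISIBLE), overflow, pill: `+${overflow.length}` };
}
```

With 4 endpoints this gives 3 chips plus a "+1" pill, which is the case the smoke-test runbook exercises.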
Definition of done: a stranger lands on every tab in turn. None of them make us cringe. Each one either shows useful info or gives the user a concrete next action.
Phase 4 — Onboarding & safety
Goal: a stranger with the invite link can get from "what is this" to "I shipped a thing" without us in the chat.
| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. Runbook below. |
| 4.2 | Landing page at vibnai.com that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
| 4.3 | "Delete project" UI in project settings (and the underlying Coolify cleanup) | AI | ✓ done 2026-05-04 | app/api/projects/delete/route.ts now cascades: stops and deletes the dev container service (with volumes + docker-cleanup), deletes every linked Coolify resource via fs_project_resources, deletes the per-project Coolify project shell when no other Vibn project shares it, drops the fs_project_dev_containers + fs_project_resources rows, unlinks fs_sessions, then deletes fs_projects. The Gitea repo and Sentry project are deliberately preserved (returned in the response so the user can recover code and error history). A failure inside the cascade is logged but doesn't abort; a partial failure leaves the orphan in Coolify for manual cleanup, which is strictly better than rolling back to a half-state. The 2026-05-04 smoke test found 2 ghost containers from previously deleted projects consuming the user's full quota; we cleaned them up manually and shipped this fix to prevent recurrence. |
| 4.4 | "Delete workspace" UI: same as 4.3 | AI | 1 hr | |
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with a friendly error | AI | ✓ done 2026-05-01 | lib/quotas.ts: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via the VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE / VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE env vars. Hits return HTTP 402 with a structured payload; the AI's error-recovery middleware has a workspace-quota-exceeded rule that explains the cap to the user without blind retries. Wired into POST /api/projects/create and the ensure/resume paths in lib/dev-container.ts. |
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |
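The key design choice in row 4.3 is that a failing cascade step is logged and skipped rather than aborting or rolling back. A minimal sketch of that best-effort pattern; CascadeStep and runCascade are hypothetical names, not the code in app/api/projects/delete/route.ts:

```typescript
// Hypothetical best-effort cascade runner.
interface CascadeStep {
  name: string;
  run: () => Promise<void>;
}

interface CascadeReport {
  completed: string[];
  failed: { name: string; error: string }[];
}

// Runs every step in order. A failing step is recorded and the cascade moves on,
// so one stuck Coolify resource cannot block the rest of the cleanup.
async function runCascade(steps: CascadeStep[]): Promise<CascadeReport> {
  const report: CascadeReport = { completed: [], failed: [] };
  for (const step of steps) {
    try {
      await step.run();
      report.completed.push(step.name);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      console.error(`[project-delete] step "${step.name}" failed: ${message}`);
      report.failed.push({ name: step.name, error: message });
    }
  }
  return report;
}
```

In the route, the steps would be the ones listed in 4.3, in that order, ending with the fs_projects delete; the failed list tells whoever does the manual cleanup exactly which orphans to look for.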
Definition of done: Mark hands the invite link to one non-developer friend, they get to "shipped a thing" without messaging Mark for help.
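Row 4.6 caps each workspace at 3 active projects and 3 active dev containers, ignores suspended containers, allows env overrides, and answers with HTTP 402 plus a structured payload. A sketch of that check: the env variable names and the "delete one or contact support" wording come from this doc, while QuotaUsage, the payload fields, and checkQuota itself are hypothetical stand-ins for whatever lib/quotas.ts returns.

```typescript
type ContainerState = "running" | "suspended";
type QuotaResource = "projects" | "dev_containers";

interface QuotaUsage {
  activeProjects: number;
  containers: ContainerState[];
}

interface QuotaError {
  status: 402;
  body: { code: "workspace-quota-exceeded"; resource: QuotaResource; limit: number; message: string };
}

// Reads a positive integer override from the environment, else falls back to the default of 3.
function limitFromEnv(env: Record<string, string | undefined>, key: string, fallback = 3): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Returns null when one more resource may be created, otherwise a 402 payload.
function checkQuota(
  usage: QuotaUsage,
  resource: QuotaResource,
  env: Record<string, string | undefined> = {},
): QuotaError | null {
  const limit =
    resource === "projects"
      ? limitFromEnv(env, "VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE")
      : limitFromEnv(env, "VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE");
  const used =
    resource === "projects"
      ? usage.activeProjects
      : usage.containers.filter((s) => s !== "suspended").length; // suspended containers don't count
  if (used < limit) return null;
  const noun = resource === "projects" ? "projects" : "dev containers";
  return {
    status: 402,
    body: {
      code: "workspace-quota-exceeded",
      resource,
      limit,
      message: `This workspace is limited to ${limit} active ${noun}. Delete one or contact support.`,
    },
  };
}
```

A stable code string like this is what lets the error-recovery middleware recognize the cap and explain it instead of retrying.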
Phase 5 — Path B closeout
Goal: finish the architectural commitments in AI_PATH_B_EXECUTION_PLAN.md that aren't covered above.
| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 5.1 | Build ghcr.io/vibnai/vibn-dev:latest on the live Coolify host (ssh + setup-on-coolify.sh) | AI | ✓ done 2026-05-01 | Image vibn-dev:latest built 2026-04-30 on the Coolify host (589 MB; the last Dockerfile change was Apr 28, so the build is current). Smoke-tested as the vibn user: ripgrep, git, and mise all functional. Toolchains install on demand via mise. |
| 5.2 | Hard-remove gitea_file_* from the AI tool list; keep the REST routes alive for 30 days with a deprecation header | AI | 1 hr | Path B week 3 task |
| 5.3 | Update AI_CAPABILITIES.md to reflect everything that shipped | AI | 1 hr | |
| 5.4 | Eval harness: 10 reference prompts; measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof that Path B works |
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → https://ide-{ws}-{project}.vibnai.com | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
| 5.6 | Idle-suspend cron: wire POST /api/admin/path-b/idle-sweep to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
Definition of done: eval harness reports ≥3× speedup on time-to-first-preview vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
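The Phase 5 definition of done reduces to two numbers. A sketch of how the harness could score a run against it, assuming a hypothetical EvalRun record per reference prompt. The doc does not say which statistic to compare, so using the median time-to-first-preview is our assumption; it keeps one slow outlier build from dominating a 10-prompt sample.

```typescript
// Hypothetical per-prompt result record.
interface EvalRun {
  prompt: string;
  succeeded: boolean;
  timeToFirstPreviewSec: number | null; // null if no preview was produced
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function previewTimes(runs: EvalRun[]): number[] {
  return runs.map((r) => r.timeToFirstPreviewSec).filter((t): t is number => t !== null);
}

// Pass = at least 3x median speedup vs. the Path A baseline AND at least 80% success.
function meetsDefinitionOfDone(pathA: EvalRun[], pathB: EvalRun[]) {
  const successRate = pathB.length ? pathB.filter((r) => r.succeeded).length / pathB.length : 0;
  const aTimes = previewTimes(pathA);
  const bTimes = previewTimes(pathB);
  const speedup = aTimes.length && bTimes.length ? median(aTimes) / median(bTimes) : 0;
  return { successRate, speedup, pass: successRate >= 0.8 && speedup >= 3 };
}
```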
Sequencing & dependencies
```
P1.1 → P1.2 → P1.3 → P1.4 → P1.5 → P1.6 → P1.7 → P1.8 ──┐
                                                        │
P2.1, P2.2, P2.3 (parallel)                             │
P2.4, P2.5, P2.6, P2.7 (parallel, low priority)         │
                                                        ├─ P3 (parallel internally)
                                                        │
                                                        ├─ P4.1 (depends on P3 being not-cringe)
                                                        ├─ P4.2 (parallel)
                                                        ├─ P4.3..4.8 (parallel)
                                                        │
                                                        └─ P5 (parallel; some pieces gated by P1)
```
P1 is the long pole. Everything else can mostly proceed in parallel once P1 unblocks the iteration loop.
Suggested cadence
- Today (in flight): P1.1 — Cloudflare signup + record verification.
- Tonight / tomorrow: P1.2–P1.8 once nameservers propagate. AI does the cert + Traefik wiring; Mark does the clicks at Cloudflare/Namecheap.
- Day 2: P2.1–P2.3 (runtime error chase + Sentry) + P3.1 (Hosting rewrite) in parallel.
- Day 3: P3.2–P3.6 + P4.1 smoke test.
- Day 4: P4.2 landing page + P4.3–P4.5 deletion/auth.
- Day 5: P4.6–P4.8 quotas/audit/invite + P5.1 vibn-dev image.
- Days 6–10: P5.2–P5.6 closeout, eval harness, polish, then invite first testers.
10 working days from today to "first 5 testers". Tight but doable if no nasty discoveries in P2.
What we are not doing for beta
Logged so we don't accidentally pull them in:
- Stripe / billing (post-beta — we want to know what to charge for first)
- Mobile-responsive polish (desktop-first beta)
- Multi-region Coolify (single-host is fine for <50 users)
- Replacing Coolify (out of scope; Path B is the abstraction over it)
- Replacing Gitea (Path B's shell.exec already abstracts most of it)
- Plugin marketplace, template marketplace, monetization paths
- Anything requiring us to redo NextAuth / migrate to a different auth
- Theme system / dark mode
Risks specific to this plan
| Risk | Mitigation |
|---|---|
| Cloudflare DNS propagation breaks email forwarding | We pre-verified MX records in the audit; double-check them on Cloudflare's review screen before switching nameservers |
| Traefik wildcard cert acquisition fails on first try | DNS-01 against Cloudflare is well-trodden; if it fails it's fixable, not catastrophic. Old certs keep serving until replaced. |
| Runtime errors in P2 turn out to be a deeper architectural issue | Time-box investigation to 4 hrs each; if not solved, document workaround and ship anyway, debug after invite |
| Eval harness reveals Path B is slower than promised | Acceptable to invite testers without 100% Path B coverage as long as the prod-deploy-only path works. Path B is an upgrade, not a gate. |
| New users hit 100 unforeseen edge cases | This is the point of beta. Triage daily, fix the top-3 each morning. |
Smoke-test runbook (4.1)
Goal: prove the user-visible flow from "first visit" through "shipped a deployed app" works end-to-end with all the new wiring (Sentry per-project, quotas, recovery middleware, URL chip popover, status-pill deep-link, deploy-failed Slack alerts).
Setup: open an incognito window. Have your Slack channel and Sentry dashboard visible in side tabs. You'll be the fresh user.
Steps
- Visit https://vibnai.com → sign up with Google (use a different Gmail account than your normal one if possible; it keeps test data clean). Confirm you land on the workspace home.
- Create a project (any path: build / oss / import). Pick a slug like smoke-test-2026-05-01.
  - Verify in Sentry: within ~10s, a new project named vibn-{your-workspace-slug}-smoke-test-2026-05-01 should appear at https://vibnai.sentry.io/projects/.
  - Verify in DB (optional): fs_projects.data.sentry.dsn is populated for the new row.
- Land in chat. The AI should greet you and offer to scaffold something. Ask it to build something simple ("a Next.js todo app").
- Watch the preview start. The AI should call devcontainer_ensure, scaffold, then call dev_server_start. It should return a preview URL like preview-0-{slug}-{token}.preview.vibnai.com. Click it; the page should load over HTTPS with a valid cert.
- Edit something via chat. Ask the AI to add a button or change some copy. HMR should update the preview without a reload.
- Ship it. Tell the AI "ship it." It should call apps_create against your Gitea repo and trigger a Coolify deploy. Watch the project header status pill go Empty → Deploying → Live.
  - Verify in Coolify env: the new app's env vars include NEXT_PUBLIC_SENTRY_DSN and SENTRY_AUTH_TOKEN.
  - Verify Slack: if the deploy fails for any reason, your Slack channel pings within 30s. If it succeeds, there is no message (by design; we're noise-conscious).
- Trigger a real error in the deployed app. Open the live URL and click around until something breaks. (If nothing breaks, ask the AI to add a button that calls myUndefinedFunction().)
  - Verify in Sentry: the error lands in the new Sentry project within ~10s, with a real stack trace (file/line in your project's source). Session Replay should be available.
- Open a new chat with this project and say "what's broken?" → the AI should call project_recent_errors and surface the issue with a fix suggestion. This is the killer-feature path.
- Hit the quota cap. Try to create a 4th project. You should get a friendly 402 with the "delete one or contact support" wording, NOT a generic error. The AI in chat should explain the cap clearly without retrying.
- Test the URL chip popover. Once a project has ≥4 URLs (e.g. preview + live + 2 services), the project header should collapse to 3 chips + a +N pill. Click it; the popover opens with the rest as clickable links. Click outside; it closes. Reopen it and press Escape; it closes.
- Test the status-pill Logs link. During a deploy, the "Logs" link next to the pill should take you in one click to the Coolify project page (not the root).
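Several Verify steps above boil down to "poll until X shows up, give up after N seconds" (the Sentry project within ~10s, the Sentry error within ~10s, the Slack ping within 30s). If parts of this runbook get scripted later, a generic polling helper keeps those timeouts explicit. waitFor is a hypothetical helper; the probe you pass in (for example a call to the Sentry or Coolify API) is up to the script.

```typescript
// Polls `probe` until it returns a non-null value or `timeoutMs` elapses.
async function waitFor<T>(
  probe: () => Promise<T | null>,
  { timeoutMs, intervalMs = 1000 }: { timeoutMs: number; intervalMs?: number },
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For the Sentry project step, a script would call waitFor with timeoutMs: 10_000 and a probe that returns the project once it exists, or null until then.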
What to do when something breaks
- Take a screenshot, open a Vibn chat in a separate (parent-account) tab, paste the screenshot, and say "this just broke during smoke test." The AI now has Sentry access and can read recent errors itself.
- If a step is very broken, file a P0 against this checklist with the step number and what you saw.
Pass criteria
- All 10 steps complete with no manual intervention by the AI's parent operator.
- Every "Verify" line returns the expected positive signal.
- Worst case the AI surfaces is a quota cap or known-recoverable error — never a generic "something went wrong."
How to use this doc
- Treat phase boundaries as soft. If a P2 task unblocks a P3 task and you're there, do it.
- When a task ships, check it off and move it under "Shipped" in AI_CAPABILITIES.md.
- When the plan changes (it will), edit this doc directly; don't fork it.
- Beta success criteria: 5 testers, all reach "shipped a thing", weekly active rate >60% in week 2. If we miss those, the next plan is "what did we get wrong."