master-ai/BETA_LAUNCH_PLAN.md

# Beta Launch Execution Plan

> The path from "shipping to ourselves" to **"5–10 friendly testers can use
> Vibn end-to-end without us hand-holding."**
>
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
> (architecture) and [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
>
> **Drafted:** 2026-04-30. **Owner:** Mark + AI.
>
> **Scope:** Everything we agreed in the 2026-04-30 review that's NOT already
> shipped. Pulls in the unfinished items from Path B (DNS, cert, previews,
> eval) AND the "before strangers see this" gaps that Path B doesn't cover
> (runtime errors, error surfaces, onboarding smoke test, landing page,
> safety rails).

---

## North star for the beta

A non-technical founder receives a Vibn invite link, signs up, describes
what they want to build, sees a working preview within a few minutes, can
iterate on it through chat without seeing a stack trace, and can ship it
to a real domain — all without us reaching into Coolify on their behalf.

If any of those steps requires us in the loop, beta isn't ready.

---

## Phase ordering

Sequenced by **leverage × blocking risk**. Earlier phases unblock later ones.

```
P1 Previews unlock      ── enables fast-iteration UX & demos ──┐
P2 Stability & visibility ── stops silent rot ─────────────────┤
P3 UX surfaces          ── what the user actually touches ─────┼─── INVITE
P4 Onboarding & safety  ── what a stranger needs day 1 ────────┤
P5 Path B closeout      ── ship the architectural commitments ─┘
```

---

## Phase 1 — Previews unlock — **SHIPPED 2026-05-01**

**Goal:** `dev_server.start` returns a clickable `https://*.preview.vibnai.com`
URL that loads in <30s, with HMR working over the proxy.

**Why first:** the single biggest UX cliff today is "user iterates → 3–7 min
Coolify build". Previews collapse it to seconds. Everything else is polish on
a slow loop until this lands.

| # | Task | Owner | Effort | Status |
|---|---|---|---|---|
| 1.1 | Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
| 1.2 | Switch Namecheap nameservers to Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
| 1.3 | Wait for propagation; verify with `dig` against multiple public resolvers (e.g. `dig @1.1.1.1`, `dig @8.8.8.8`) | AI | 30–120 min | ✓ done — `34.19.250.135` from CF + Google resolvers |
| 1.4 | Generate Cloudflare API token (DNS edit, `vibnai.com` only) | Mark | 2 min | ✓ done — stored in `.coolify.env` |
| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy` |
| 1.6 | Verify the wildcard cert issues for `*.preview.vibnai.com` (curl, browser) | AI | 10 min | ✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert |
| 1.7 | Wire `dev_server.start` to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done — pre-baked labels for ports 3000–3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns` |
| 1.8 | Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container | AI | 30 min | ✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below. |
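
Row 1.7 relies on Traefik labels pre-baked for ports 3000–3009. The sketch below shows one way to generate that label set. It assumes one router per port, Coolify's `https` entrypoint name, and a slot number equal to `port - 3000`. The router naming and the `hostFor` callback are hypothetical and not taken from the shipped `vibn-dev` compose file. The label keys themselves (`rule`, `entrypoints`, `tls.certresolver`, `service`, `loadbalancer.server.port`) are standard Traefik Docker-provider labels.

```typescript
// Hypothetical generator for the pre-baked preview labels (row 1.7).
// Assumptions: one Traefik router per port, entrypoint named "https",
// and the slot number in the hostname equals port - 3000.
const FIRST_PORT = 3000;
const LAST_PORT = 3009;
const CERT_RESOLVER = "letsencrypt-dns"; // resolver name from row 1.5

function previewLabels(
  containerName: string,
  hostFor: (slot: number) => string,
): Record<string, string> {
  const labels: Record<string, string> = { "traefik.enable": "true" };
  for (let port = FIRST_PORT; port <= LAST_PORT; port++) {
    const slot = port - FIRST_PORT;
    const router = `${containerName}-p${port}`;
    labels[`traefik.http.routers.${router}.rule`] = `Host(\`${hostFor(slot)}\`)`;
    labels[`traefik.http.routers.${router}.entrypoints`] = "https";
    labels[`traefik.http.routers.${router}.tls.certresolver`] = CERT_RESOLVER;
    // Tie each router to its own service so every slot proxies to its own port.
    labels[`traefik.http.routers.${router}.service`] = router;
    labels[`traefik.http.services.${router}.loadbalancer.server.port`] = String(port);
  }
  return labels;
}
```

With ten routers sharing one container, every router has to name its service explicitly. Otherwise Traefik cannot tell which of the ten port services a given host should reach.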

**Definition of done:** ✅ the user asks the AI to open a Vite dev server,
clicks the returned URL, and sees Vite's welcome page; when the AI edits a
file via `fs.edit`, the change appears in the browser within 5s without a
manual reload.

**Verified Vite config for HMR through Traefik** (the system prompt should advertise this exact shape when scaffolding Vite projects):

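```js
// vite.config.js
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    host: '0.0.0.0',     // listen on all interfaces so Traefik can reach the dev server
    port: 3001,          // any 3000–3009 (the ports with pre-baked Traefik labels)
    strictPort: true,    // exit if the port is taken instead of drifting to an unrouted one
    hmr: {
      clientPort: 443,   // the browser connects through Traefik's TLS entrypoint
      protocol: 'wss',
      // Placeholders: fill in the slot, slug and token of this preview's URL.
      host: 'preview-{slot}-{slug}-{token}.preview.vibnai.com',
    },
  },
});
```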
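
The `{slot}`, `{slug}` and `{token}` placeholders must be resolved for each preview. Below is a sketch of a helper the scaffolder could use to fill them in. It assumes the slot is `port - 3000`, which is consistent with the `preview-0-…` URLs in the smoke-test runbook but is not confirmed here. `viteServerConfig` is a hypothetical name.

```typescript
// Hypothetical helper: resolve the HMR host placeholders for one preview.
// Assumption: slot = port - 3000, so port 3001 maps to preview-1-...
interface ViteServerConfig {
  host: string;
  port: number;
  strictPort: boolean;
  hmr: { clientPort: number; protocol: "wss"; host: string };
}

function viteServerConfig(port: number, slug: string, token: string): ViteServerConfig {
  if (!Number.isInteger(port) || port < 3000 || port > 3009) {
    // Only 3000–3009 have pre-baked Traefik labels; any other port is unreachable.
    throw new RangeError(`port ${port} is outside the routed 3000–3009 range`);
  }
  const slot = port - 3000;
  return {
    host: "0.0.0.0",
    port,
    strictPort: true,
    hmr: {
      clientPort: 443, // browser talks to Traefik's TLS entrypoint, not the container port
      protocol: "wss",
      host: `preview-${slot}-${slug}-${token}.preview.vibnai.com`,
    },
  };
}
```

Failing loudly on Vite's default port (5173) is deliberate. A preview that starts on an unrouted port looks healthy inside the container but is unreachable from the browser.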

---

## Phase 2 — Stability & visibility

**Goal:** when something breaks in production, we hear about it before users do.

| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1–2 hrs | Likely a server action / API route returning twice |
| 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1–2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |
| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | ✓ done 2026-05-01 | `@sentry/nextjs` v10 wired in `vibn-frontend`. `instrumentation.ts` (server+edge), `instrumentation-client.ts` (browser w/ Session Replay free tier, all text masked), `app/global-error.tsx`, `next.config.ts` wrapped with `withSentryConfig`. `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN` in Coolify env, with matching `ARG` lines in `vibn-frontend/Dockerfile`. End-to-end verified via `/sentry-example-page` 2026-05-01: client + server errors capture, breadcrumbs work, **stack traces de-minify to real filenames** (`app/sentry-example-page/page.tsx:49`). |
| 2.4 | Wire deployment-failed Coolify webhook → Slack/email | AI | ✓ done 2026-05-01 | Slack webhook wired into `slack_notification_settings` for both Coolify teams. Defaults: failure events on (deploy, backup, scheduled task, docker cleanup, server unreachable, disk usage), success events off. Tested with a manual webhook ping — confirmed in user's Slack. |
| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`. |
| 2.6 | Bake a `HEALTHCHECK` that probes `127.0.0.1` (not `localhost`) into `vibn-frontend/Dockerfile` so future apps built from it inherit the fix | AI | ✓ done 2026-05-01 | Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap |
| 2.7 | Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap | AI | ✓ done 2026-05-01 | Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added. |
| 2.8 | **Tool-error recovery middleware** (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | `vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop. |
| 2.9 | **Sentry-as-product loop** (SENTRY_AS_PRODUCT.md) — auto-provision per-project Sentry, bake into scaffolds, expose error feed to AI as MCP tools, auto-surface unresolved errors at chat-turn start | AI | ✓ done 2026-05-01 | All 4 stages shipped: (1) `lib/integrations/sentry.ts` provisions per-project Sentry under the shared `vibnai` org from `POST /api/projects/create` and lazily on `apps.create`; injects `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_AUTH_TOKEN` into Coolify app env. (2) `lib/scaffold/sentry-snippets.ts` ships canonical Next.js + Vite snippets; AI system prompt instructs it to wire Sentry on every new app; `projects.get` returns `sentry: {slug, dsn}`. (3) Three MCP tools: `project_recent_errors`, `project_error_detail`, `project_error_resolve` (tenant-safe). (4) `app/api/chat/route.ts` injects a `[PROJECT HEALTH]` block at chat-turn start when unresolved issues with ≥2 occurrences exist in the last 6h. End-to-end verification deferred to smoke test (4.1). |
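
Stage (4) of row 2.9 filters Sentry issues before it injects the `[PROJECT HEALTH]` block. The sketch below shows that filter. It assumes a simplified issue shape with `status`, `count` and `lastSeen` fields, loosely modelled on Sentry issues, and a made-up output line format. It also reads "in the last 6h" as "last seen within 6h". The real logic in `app/api/chat/route.ts` may differ on any of these points.

```typescript
// Hypothetical issue shape; field names are assumptions, not the Sentry API contract.
interface ProjectIssue {
  id: string;
  title: string;
  status: "unresolved" | "resolved" | "ignored";
  count: number; // total occurrences
  lastSeen: Date;
}

const WINDOW_MS = 6 * 60 * 60 * 1000; // "last 6h"
const MIN_OCCURRENCES = 2;

// Returns the text to prepend at chat-turn start, or null when nothing qualifies.
function projectHealthBlock(issues: ProjectIssue[], now: Date = new Date()): string | null {
  const cutoff = now.getTime() - WINDOW_MS;
  const hot = issues
    .filter((i) => i.status === "unresolved")
    .filter((i) => i.count >= MIN_OCCURRENCES)
    .filter((i) => i.lastSeen.getTime() >= cutoff)
    .sort((a, b) => b.count - a.count); // noisiest first
  if (hot.length === 0) return null;
  const lines = hot.map((i) => `- ${i.title} (${i.count}x, issue ${i.id})`);
  return ["[PROJECT HEALTH]", ...lines].join("\n");
}
```

Returning `null` rather than an empty block keeps quiet projects from paying any prompt tokens.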

**Definition of done:** force-fail a route in staging → Sentry alert lands in
< 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
orphan-container conflict in prod → model calls `apps_unstick` instead of
delete-and-recreate.
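
Row 2.8's middleware is what should make the model reach for `apps_unstick` in the last check above. Here is a minimal sketch of the pattern-match-and-inject idea. The match strings are real Docker error phrases. The rule ids, the `402`/"quota exceeded" pattern and the hint wording are illustrative assumptions, not the contents of `lib/ai/error-recovery.ts`; only `workspace-quota-exceeded` is named in row 4.6.

```typescript
// Illustrative rule table; ids, patterns and hint text are assumptions.
interface RecoveryRule {
  id: string;
  pattern: RegExp;
  hint: string; // synthetic instruction injected before the model's next round
}

const RULES: RecoveryRule[] = [
  {
    id: "orphan-container-conflict",
    pattern: /container name .* is already in use/i,
    hint: "An orphaned container holds this name. Call apps_unstick; do not delete and recreate the app.",
  },
  {
    id: "image-pull-denied",
    pattern: /pull access denied/i,
    hint: "The image cannot be pulled. Check the image name and registry before retrying.",
  },
  {
    id: "port-allocated",
    pattern: /port is already allocated/i,
    hint: "The host port is taken. Pick a different port instead of retrying the same one.",
  },
  {
    id: "workspace-quota-exceeded",
    pattern: /\b402\b|quota exceeded/i,
    hint: "The workspace is at its quota. Explain the cap to the user; do not retry.",
  },
];

// Returns one synthetic hint per tool error that matches a known rule (first match wins).
function recoveryHints(toolErrors: string[]): string[] {
  const hints: string[] = [];
  for (const err of toolErrors) {
    const rule = RULES.find((r) => r.pattern.test(err));
    if (rule) hints.push(`[recovery:${rule.id}] ${rule.hint}`);
  }
  return hints;
}
```

Unmatched errors produce no hint, so the model sees the raw tool error exactly as it did before the middleware existed.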

---

## Phase 3 — UX surfaces (what users actually touch)

**Goal:** every screen a beta tester lands on either does something useful
or gets out of the way. No screens that exist "to teach the data model".

| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of a master-detail view of "live + previews" | AI | 4 hrs | Mark flagged earlier |
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | ✓ done 2026-05-01 | `components/project/project-header-urls.tsx`: bumped MAX_VISIBLE to 3, replaced title-tooltip with click-to-open popover (closes on outside-click + Escape). Each row in the popover is a real clickable link with icon + label + host. |
| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | `components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `<COOLIFY_URL>/project/<coolifyProjectUuid>` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.) |
| 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
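
Rows 3.4 and 3.5 reduce to two small pieces of UI logic. The sketch below is framework-free. It assumes a flat list of endpoint URLs and lowercase stage names based on the states mentioned in row 3.5 and the runbook ("Empty → Deploying → Live"). `splitChips`, `logsHref` and `Stage` are illustrative names, not lifted from `project-header-urls.tsx` or `project-stage-pill.tsx`. `MAX_VISIBLE = 3` comes from row 3.4.

```typescript
// Header-chip overflow (row 3.4): show up to MAX_VISIBLE chips and fold the rest
// into a "+N" pill whose popover lists them.
const MAX_VISIBLE = 3;

interface ChipLayout {
  visible: string[];
  overflow: string[];
  pillLabel: string | null; // e.g. "+2", or null when nothing overflows
}

function splitChips(urls: string[], max: number = MAX_VISIBLE): ChipLayout {
  const visible = urls.slice(0, max);
  const overflow = urls.slice(max);
  return { visible, overflow, pillLabel: overflow.length > 0 ? `+${overflow.length}` : null };
}

// Stage-pill "Logs" affordance (row 3.5). Stage names are assumed.
type Stage = "empty" | "deploying" | "live" | "down" | "build_failed";
const STAGES_WITH_LOGS: ReadonlySet<Stage> = new Set<Stage>(["deploying", "down", "build_failed"]);

function logsHref(stage: Stage, coolifyUrl: string, coolifyProjectUuid: string): string | null {
  if (!STAGES_WITH_LOGS.has(stage)) return null;
  // Project-level deep link; a per-deployment link needs deployment UUIDs (not yet surfaced).
  return `${coolifyUrl.replace(/\/+$/, "")}/project/${coolifyProjectUuid}`;
}
```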

**Definition of done:** a stranger lands on every tab in turn. None of them
make us cringe. Each one either shows useful info or gives the user a
concrete next action.

---

## Phase 4 — Onboarding & safety

**Goal:** a stranger with the invite link can get from "what is this" to
"I shipped a thing" without us in the chat.

| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. **Runbook below.** |
| 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
| 4.3 | "Delete project" UI in project settings (and underlying Coolify cleanup) | AI | 2 hrs | Today only AI can clean up via MCP |
| 4.4 | "Delete workspace" UI, with the same underlying Coolify cleanup as 4.3 | AI | 1 hr | |
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error | AI | ✓ done 2026-05-01 | `lib/quotas.ts`: 3 active projects + 3 active dev containers per workspace (suspended containers don't count). Overridable via `VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE` / `VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE` env. Hits return HTTP 402 with structured payload; AI's error-recovery middleware has a `workspace-quota-exceeded` rule that explains the cap to the user without blind retries. Wired into `POST /api/projects/create` and `lib/dev-container.ts` ensure/resume paths. |
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |
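
Here is a minimal sketch of the row 4.6 check, assuming the active counts are already loaded. The env var names, the default cap of 3, the HTTP 402 status and the "delete one or contact support" wording come from this doc. The function names and payload fields are illustrative, not the actual `lib/quotas.ts` contract.

```typescript
// Env var names and the default of 3 come from row 4.6; everything else is illustrative.
type QuotaKind = "projects" | "devContainers";

const ENV_KEYS: Record<QuotaKind, string> = {
  projects: "VIBN_QUOTA_MAX_PROJECTS_PER_WORKSPACE",
  devContainers: "VIBN_QUOTA_MAX_DEV_CONTAINERS_PER_WORKSPACE",
};
const DEFAULT_LIMIT = 3;

// Falls back to the default when the override is missing or not a positive integer.
function quotaLimit(kind: QuotaKind, env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env[ENV_KEYS[kind]] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_LIMIT;
}

interface QuotaDenied {
  status: 402;
  body: { error: "workspace-quota-exceeded"; kind: QuotaKind; limit: number; active: number; message: string };
}

// `active` must already exclude suspended dev containers (they don't count toward the cap).
function checkQuota(kind: QuotaKind, active: number, env: Record<string, string | undefined>): QuotaDenied | null {
  const limit = quotaLimit(kind, env);
  if (active < limit) return null; // room for one more
  const noun = kind === "projects" ? "projects" : "dev containers";
  return {
    status: 402,
    body: {
      error: "workspace-quota-exceeded",
      kind,
      limit,
      active,
      message: `This workspace already has ${active} of ${limit} allowed ${noun}. Delete one or contact support.`,
    },
  };
}
```

In a route handler this would be called with `process.env` before any Coolify resources are created. A denial then costs nothing to clean up.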

**Definition of done:** Mark hands the invite link to one non-developer
friend, they get to "shipped a thing" without messaging Mark for help.
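
Row 4.7 is still open. One possible shape, sketched under assumptions, is to wrap each mutating MCP handler so that every call appends an entry (who, which tool, which args, outcome) to a per-workspace sink. The tool names in `MUTATING` follow the `apps_create` naming used elsewhere in this doc, but the exact list, the entry fields and the sink callback are hypothetical.

```typescript
// Hypothetical audit wrapper; tool names, entry fields and the sink are assumptions.
interface AuditEntry {
  workspaceId: string;
  actor: string; // user id, or "ai" for model-initiated calls
  tool: string;
  args: unknown;
  ok: boolean;
  error?: string;
  at: string; // ISO timestamp of the call
}

type ToolHandler = (args: unknown) => Promise<unknown>;

const MUTATING = new Set([
  "apps_create", "apps_delete",
  "databases_create", "databases_delete",
  "services_create", "services_delete",
]);

function withAudit(
  tool: string,
  handler: ToolHandler,
  ctx: { workspaceId: string; actor: string },
  sink: (entry: AuditEntry) => void,
): ToolHandler {
  if (!MUTATING.has(tool)) return handler; // read-only tools are not audited
  return async (args) => {
    const base = { workspaceId: ctx.workspaceId, actor: ctx.actor, tool, args, at: new Date().toISOString() };
    try {
      const result = await handler(args);
      sink({ ...base, ok: true });
      return result;
    } catch (err) {
      // Record the failure too, then rethrow so the caller still sees it.
      sink({ ...base, ok: false, error: err instanceof Error ? err.message : String(err) });
      throw err;
    }
  };
}
```

Logging failed calls matters as much as successful ones. A half-finished delete is exactly the "something went wrong" case the row is meant to cover.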

---

## Phase 5 — Path B closeout

**Goal:** finish the architectural commitments in `AI_PATH_B_EXECUTION_PLAN.md`
that aren't covered above.

| # | Task | Owner | Effort / Status | Notes |
|---|---|---|---|---|
| 5.1 | Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`) | AI | ✓ done 2026-05-01 | Image `vibn-dev:latest` built 2026-04-30 on Coolify host (589 MB, last Dockerfile change Apr 28 so build is current). Smoke-tested as `vibn` user: ripgrep, git, mise all functional. Toolchains install on demand via mise. |
| 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
| 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
| 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof Path B works |
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
| 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
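
Row 5.6 only needs a scheduler to call the existing endpoint. Before trusting it on a 5-minute cadence, though, it helps to pin down what "idle" means. The sketch below covers the selection step. It assumes each dev container records a `lastActivityAt` timestamp and uses a 30-minute threshold chosen purely for illustration. The real sweep behind `POST /api/admin/path-b/idle-sweep` may use different signals.

```typescript
// Hypothetical record; the real sweep may track activity differently.
interface DevContainer {
  id: string;
  state: "running" | "suspended";
  lastActivityAt: Date; // last chat turn, tool call or preview request (assumed signal)
}

const IDLE_THRESHOLD_MS = 30 * 60 * 1000; // illustrative, not a decided value

// Returns ids of running containers that have been idle for at least the threshold.
function containersToSuspend(containers: DevContainer[], now: Date = new Date()): string[] {
  return containers
    .filter((c) => c.state === "running") // already-suspended containers need no action
    .filter((c) => now.getTime() - c.lastActivityAt.getTime() >= IDLE_THRESHOLD_MS)
    .map((c) => c.id);
}
```

On a 5-minute schedule, a container is suspended between 30 and 35 minutes after its last activity. The threshold, not the cron interval, is the knob that trades cost against resume latency.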

**Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
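
Here is a sketch of how the harness could score this definition of done. It assumes one run per reference prompt per path and compares median time-to-first-preview. The plan does not say whether the 3× is on medians or means; the median is chosen here so that one stuck run does not dominate. `RunResult` and `scorePathB` are hypothetical names.

```typescript
// Hypothetical per-run record produced by the eval harness.
interface RunResult {
  prompt: string;
  success: boolean;
  timeToFirstPreviewMs: number | null; // null if no preview ever appeared
}

function median(xs: number[]): number {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Runs that never produced a preview are excluded from timing (they count against success).
function previewTimes(runs: RunResult[]): number[] {
  return runs
    .map((r) => r.timeToFirstPreviewMs)
    .filter((t): t is number => t !== null);
}

// Definition of done: >= 3x faster median time-to-first-preview and >= 80% success.
function scorePathB(pathA: RunResult[], pathB: RunResult[]) {
  const speedup = median(previewTimes(pathA)) / median(previewTimes(pathB));
  const successRate = pathB.filter((r) => r.success).length / pathB.length;
  return { speedup, successRate, passes: speedup >= 3 && successRate >= 0.8 };
}
```

Empty inputs yield `NaN`, which fails both comparisons. A harness that silently produced no runs therefore cannot pass.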

---

## Sequencing & dependencies

```
P1.1 → P1.2 → P1.3 → P1.4 → P1.5 → P1.6 → P1.7 → P1.8 ──┐
                                                          │
P2.1, P2.2, P2.3 (parallel)                              │
P2.4, P2.5, P2.6, P2.7 (parallel, low priority)          │
                                                          ├─ P3 (parallel internally)
                                                          │
                                                          ├─ P4.1 (depends on P3 being not-cringe)
                                                          ├─ P4.2 (parallel)
                                                          ├─ P4.3..4.8 (parallel)
                                                          │
                                                          └─ P5 (parallel; some pieces gated by P1)
```

P1 is the long pole. Everything else can mostly proceed in parallel once P1
unblocks the iteration loop.

---

## Suggested cadence

- **Today (in flight):** P1.1 — Cloudflare signup + record verification.
- **Tonight / tomorrow:** P1.2–P1.8 once nameservers propagate. **AI does
  the cert + Traefik wiring; Mark does the clicks at Cloudflare/Namecheap.**
- **Day 2:** P2.1–P2.3 (runtime error chase + Sentry) + P3.1 (Hosting rewrite)
  in parallel.
- **Day 3:** P3.2–P3.6 + P4.1 smoke test.
- **Day 4:** P4.2 landing page + P4.3–P4.5 deletion/auth.
- **Day 5:** P4.6–P4.8 quotas/audit/invite + P5.1 vibn-dev image.
- **Days 6–10:** P5.2–P5.6 closeout, eval harness, polish, then invite first
  testers.

10 working days from the draft date (2026-04-30) to "first 5 testers". Tight
but doable if P2 turns up no nasty surprises.

---

## What we are *not* doing for beta

Logged so we don't accidentally pull them in:

- Stripe / billing (post-beta — we want to know what to charge for first)
- Mobile-responsive polish (desktop-first beta)
- Multi-region Coolify (single-host is fine for <50 users)
- Replacing Coolify (out of scope; Path B is the abstraction over it)
- Replacing Gitea (Path B's `shell.exec` already abstracts most of it)
- Plugin marketplace, template marketplace, monetization paths
- Anything requiring us to redo NextAuth / migrate to a different auth
- Theme system / dark mode

---

## Risks specific to this plan

| Risk | Mitigation |
|---|---|
| Cloudflare DNS propagation breaks email forwarding | We pre-verified MX records in the audit; double-check them on the Cloudflare review screen before switching nameservers |
| Traefik wildcard cert acquisition fails on first try | DNS-01 against Cloudflare is well-trodden; if it fails it's fixable, not catastrophic. Old certs keep serving until replaced. |
| Runtime errors in P2 turn out to be a deeper architectural issue | Time-box investigation to 4 hrs each; if not solved, document workaround and ship anyway, debug after invite |
| Eval harness reveals Path B is slower than promised | Acceptable to invite testers without 100% Path B coverage as long as the prod-deploy-only path works. Path B is an upgrade, not a gate. |
| New users hit 100 unforeseen edge cases | This is the point of beta. Triage daily, fix the top-3 each morning. |

---

## Smoke-test runbook (4.1)

**Goal:** prove the user-visible flow from "first visit" through "shipped a deployed app" works end-to-end with all the new wiring (Sentry per-project, quotas, recovery middleware, URL chip popover, status-pill deep-link, deploy-failed Slack alerts).

**Setup:** open an incognito window. Have your Slack channel and Sentry dashboard visible in side tabs. You'll be the fresh user.

### Steps

1. **Visit `https://vibnai.com`** → sign up with Google. If possible, use a different Gmail account than your usual one; it keeps test data clean. Confirm you land on the workspace home.
2. **Create a project** (any path: build / oss / import). Pick a slug like `smoke-test-2026-05-01`.
   - **Verify in Sentry:** within ~10s, a new project named `vibn-{your-workspace-slug}-smoke-test-2026-05-01` should appear at <https://vibnai.sentry.io/projects/>.
   - **Verify in DB (optional):** `fs_projects.data.sentry.dsn` is populated for the new row.
3. **Land in chat.** AI should greet you and offer to scaffold something. Ask it to build something simple ("a Next.js todo app").
4. **Watch the preview start.** AI should call `devcontainer_ensure`, scaffold, then `dev_server_start`. A preview URL like `preview-0-{slug}-{token}.preview.vibnai.com` should be returned. Click it. Page should load over HTTPS with a valid cert.
5. **Edit something via chat.** Ask AI to add a button or change copy. HMR should update the preview without reload.
6. **Ship it.** Tell AI "ship it." It should call `apps_create` against your Gitea repo and trigger a Coolify deploy. Watch the project header status pill go Empty → Deploying → Live.
   - **Verify in Coolify env:** the new app's env vars include `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`.
   - **Verify Slack:** if the deploy fails for any reason, your Slack channel pings within 30s. If it succeeds, no message (by design — we're noise-conscious).
7. **Trigger a real error in the deployed app.** Open the live URL, click around until something breaks. (If nothing breaks, ask AI to add a button that calls `myUndefinedFunction()`.)
   - **Verify in Sentry:** the error lands in the new Sentry project within ~10s, **with a real stack trace** (file/line in your project's source). Session Replay should be available.
   - **Open a new chat with this project** and say "what's broken?" → AI should call `project_recent_errors` and surface the issue with a fix suggestion. This is the killer-feature path.
8. **Hit the quota cap.** Create projects until the workspace has 3 active ones (the default cap), then try to create a 4th. You should get a friendly 402 with the "delete one or contact support" wording, NOT a generic error. AI in chat should explain the cap clearly without retrying.
9. **Test the URL chip popover.** Once you have ≥4 URLs on a project (e.g. preview + live + 2 services), the project header should collapse to 3 chips + a `+N` pill. Click it; the popover opens with the rest as clickable links. Click outside; it closes. Reopen it and press Escape; it closes again.
10. **Test the status-pill Logs link.** During a deploy, the "Logs" link next to the pill should one-click into the Coolify project page (not the root).
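
Some of the "Verify" checks above are pure string patterns and can be checked mechanically. The sketch below covers two of them. It assumes the Sentry naming convention from step 2 and the preview-host shape from step 4. The character classes for slugs (lowercase kebab-case) and tokens (lowercase alphanumeric) are guesses, not documented here.

```typescript
// Expected Sentry project name from step 2: vibn-{workspace-slug}-{project-slug}.
function expectedSentryProject(workspaceSlug: string, projectSlug: string): string {
  return `vibn-${workspaceSlug}-${projectSlug}`;
}

// Step 4: preview URLs look like https://preview-{slot}-{slug}-{token}.preview.vibnai.com
// Assumption: one-digit slot (ports 3000–3009), kebab-case slug, alphanumeric token.
const PREVIEW_URL = /^https:\/\/preview-(\d)-([a-z0-9-]+)-([a-z0-9]+)\.preview\.vibnai\.com\/?$/;

function parsePreviewUrl(url: string): { slot: number; slug: string; token: string } | null {
  const m = PREVIEW_URL.exec(url); // plain http:// never matches; step 4 requires HTTPS
  return m ? { slot: Number(m[1]), slug: m[2], token: m[3] } : null;
}
```

The greedy slug group backtracks to the last hyphen before `.preview`, so slugs that contain hyphens, such as `smoke-test-2026-05-01`, still split correctly from the token.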

### What to do when something breaks

- Take a screenshot, open a Vibn chat in a separate (parent-account) tab, paste the screenshot, and say "this just broke during smoke test." AI now has Sentry access + can read recent errors itself.
- If a step is *very* broken, file a P0 against this checklist with the step number and what you saw.

### Pass criteria

- All 10 steps complete with no manual intervention by the AI's parent operator.
- Every "Verify" line returns the expected positive signal.
- The worst the AI surfaces is a quota cap or a known-recoverable error, never a generic "something went wrong."

---

## How to use this doc

- Treat phase boundaries as soft. If a P2 task unblocks a P3 task and you're
  there, do it.
- When a task ships, check it off and move it under "Shipped" in
  [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md).
- When the plan changes (it will), edit this doc directly, don't fork it.
- Beta success criteria: **5 testers, all reach "shipped a thing", weekly
  active rate >60% in week 2.** If we miss those, the next plan is "what
  did we get wrong."