# Beta Launch Execution Plan
> The path from "shipping to ourselves" to **"5-10 friendly testers can use
> Vibn end-to-end without us hand-holding."**
>
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
> (architecture) and [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
>
> **Drafted:** 2026-04-30. **Owner:** Mark + AI.
>
> **Scope:** Everything we agreed in the 2026-04-30 review that's NOT already
> shipped. Pulls in the unfinished items from Path B (DNS, cert, previews,
> eval) AND the "before strangers see this" gaps that Path B doesn't cover
> (runtime errors, error surfaces, onboarding smoke test, landing page,
> safety rails).
---
## North star for the beta
A non-technical founder receives a Vibn invite link, signs up, describes
what they want to build, sees a working preview within a few minutes, can
iterate on it through chat without seeing a stack trace, and can ship it
to a real domain — all without us reaching into Coolify on their behalf.
If any of those steps requires us in the loop, beta isn't ready.

---
## Phase ordering
Sequenced by **leverage × blocking risk**. Earlier phases unblock later ones.
```
P1  Previews unlock        ── enables fast-iteration UX & demos ───┐
P2  Stability & visibility ── stops silent rot ────────────────────┤
P3  UX surfaces            ── what the user actually touches ──────┼─── INVITE
P4  Onboarding & safety    ── what a stranger needs day 1 ─────────┤
P5  Path B closeout        ── ship the architectural commitments ──┘
```
---
## Phase 1 — Previews unlock — **SHIPPED 2026-05-01**
**Goal:** `dev_server.start` returns a clickable `https://*.preview.vibnai.com`
URL that loads in <30s, with HMR working over the proxy.
**Why first:** the single biggest UX cliff today is "user iterates → 3-7 min
Coolify build". Previews collapse it to seconds. Everything else is polish on
a slow loop until this lands.
| # | Task | Owner | Effort | Status |
|---|---|---|---|---|
| 1.1 | Sign up for Cloudflare; add `vibnai.com`; verify imported records (MX, SPF, wildcard A, apex A) | Mark | 15 min | ✓ done |
| 1.2 | Switch Namecheap nameservers to Cloudflare-assigned NS pair | Mark | 2 min | ✓ done |
| 1.3 | Wait for propagation; verify `dig @1.1.1.1` from multiple resolvers | AI | 30-120 min | ✓ done — `34.19.250.135` from CF + Google resolvers |
| 1.4 | Generate Cloudflare API token (DNS edit, `vibnai.com` only) | Mark | 2 min | ✓ done — stored in `.coolify.env` |
| 1.5 | Configure Traefik Let's Encrypt DNS-01 with the Cloudflare token | AI | 20 min | ✓ done — `letsencrypt-dns` resolver wired in `coolify-proxy` |
| 1.6 | Test wildcard cert issues for `*.preview.vibnai.com` (curl, browser) | AI | 10 min | ✓ done — both `*.vibnai.com` and `*.preview.vibnai.com` certs issued; `curl https://test.preview.vibnai.com` returns valid LE cert |
| 1.7 | Wire `dev_server.start` to mint Traefik labels with the wildcard host | AI | 1 hr | ✓ done — pre-baked labels for ports 3000-3009 in `vibn-dev` compose; YAML escape bug fixed; cert resolver fixed to `letsencrypt-dns` |
| 1.8 | Spike: WebSocket / Vite HMR through Traefik against `vibn-dev` container | AI | 30 min | ✓ done — `101 Switching Protocols`, `vite-hmr` subprotocol negotiated, `js-update` messages fire within ~1s of file edit. See verified config below. |
**Definition of done:** ✅ AI says "open a Vite dev server", user clicks the URL,
sees Vite's welcome page, edits a file via `fs.edit`, change appears in
browser within 5s without manual reload.
**Verified Vite config for HMR through Traefik** (the system prompt should advertise this exact shape when scaffolding Vite projects):
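```js
// vite.config.js
import { defineConfig } from 'vite'

export default defineConfig({
  server: {
    host: '0.0.0.0', // listen on all interfaces so the proxy can reach the container
    port: 3001, // any of 3000-3009
    strictPort: true, // fail instead of silently moving to another port
    hmr: {
      clientPort: 443, // the browser connects to the HMR socket on the HTTPS port
      protocol: 'wss',
      host: 'preview-{slot}-{slug}-{token}.preview.vibnai.com',
    },
  },
})
```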
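
When scaffolding, the `server` block above could be generated per preview slot rather than hand-written. A minimal sketch in TypeScript: `previewServerConfig` is a hypothetical helper, and mapping slot `n` to port `3000 + n` is an assumption (the plan confirms only the 3000-3009 range and the hostname pattern).

```ts
// Hypothetical helper: builds the Vite `server` block for one preview slot.
// Assumption: slot n (0-9) maps to port 3000 + n.
interface PreviewServerConfig {
  host: string;
  port: number;
  strictPort: boolean;
  hmr: { clientPort: number; protocol: "wss"; host: string };
}

function previewServerConfig(slot: number, slug: string, token: string): PreviewServerConfig {
  if (!Number.isInteger(slot) || slot < 0 || slot > 9) {
    throw new RangeError(`slot must be an integer 0-9, got ${slot}`);
  }
  return {
    host: "0.0.0.0",
    port: 3000 + slot,
    strictPort: true,
    hmr: {
      clientPort: 443,
      protocol: "wss",
      host: `preview-${slot}-${slug}-${token}.preview.vibnai.com`,
    },
  };
}

// Example: slot 1 for a project slug "todo-app" with a made-up token.
const exampleConfig = previewServerConfig(1, "todo-app", "a1b2c3");
console.log(exampleConfig.port, exampleConfig.hmr.host);
```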
---
## Phase 2 — Stability & visibility
**Goal:** when something breaks in production, we hear about it before users do.
| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1-2 hrs | Likely a server action / API route returning twice |
| 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1-2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |
| 2.3 | Wire Sentry (or alternative) for both client + server runtime errors | AI | 2 hrs | Free tier, scoped DSN per environment |
| 2.4 | Wire deployment-failed Coolify webhook → Slack/email | AI | 30 min | So we don't find out by users complaining |
| 2.5 | Tighten Coolify docker prune to every 6 hrs (vs daily) | AI | ✓ done 2026-05-01 | Already configured: both servers use `docker_cleanup_frequency: "0 */6 * * *"` with `force_docker_cleanup: true`. Verified via `/api/v1/servers`. |
| 2.6 | Bake `HEALTHCHECK 127.0.0.1` into `vibn-frontend/Dockerfile` so future apps inherit | AI | ✓ done 2026-05-01 | Already in `vibn-frontend/Dockerfile:67-68`; comment explains the IPv6 trap |
| 2.7 | Audit other Dockerfile-based apps for the same `localhost`/IPv6 trap | AI | ✓ done 2026-05-01 | Audited `vibn-dev/Dockerfile` and `vibn-agent-runner/Dockerfile` — neither defines a HEALTHCHECK, so neither can hit the localhost/IPv6 trap. No action needed today; revisit when either gets a healthcheck added. |
| 2.8 | **Tool-error recovery middleware** (AI_HARNESS_GAPS.md §1) — pattern-match known-recoverable tool errors and inject synthetic instructions before the model's next round | AI | ✓ done 2026-05-01 | `vibn-frontend/lib/ai/error-recovery.ts`. Initial rules: orphan container conflict, image pull denied, port allocated. Wired into `app/api/chat/route.ts` tool-result loop. |
**Definition of done:** force-fail a route in staging → Sentry alert lands in
< 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
orphan-container conflict in prod → model calls `apps_unstick` instead of
delete-and-recreate.
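
The recovery middleware in 2.8 boils down to a rule table plus a first-match lookup. A minimal sketch, not the contents of `error-recovery.ts`: the three rule names come from the table above, while the regexes, the hint wording, and the `recoveryHintFor` name are assumptions. A non-null hint would be appended to the tool result before the model's next round; unknown errors pass through unchanged.

```ts
// Sketch only: regexes and hints are assumptions, not the shipped rules.
interface RecoveryRule {
  name: string;
  pattern: RegExp;
  hint: string; // synthetic instruction injected before the model's next round
}

const recoveryRules: RecoveryRule[] = [
  {
    name: "orphan-container-conflict",
    pattern: /container name .* is already in use/i,
    hint: "An orphaned container is holding the name. Call apps_unstick instead of deleting and recreating the app.",
  },
  {
    name: "image-pull-denied",
    pattern: /pull access denied/i,
    hint: "The image could not be pulled. Check the image name and registry credentials before retrying.",
  },
  {
    name: "port-allocated",
    pattern: /port is already allocated/i,
    hint: "The host port is taken. Pick a different port or stop whatever holds it.",
  },
];

// Returns the hint of the first matching rule, or null when the error is not known-recoverable.
function recoveryHintFor(toolError: string): string | null {
  const rule = recoveryRules.find((r) => r.pattern.test(toolError));
  return rule ? rule.hint : null;
}
```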

---
## Phase 3 — UX surfaces (what users actually touch)
**Goal:** every screen a beta tester lands on either does something useful
or gets out of the way. No screens that exist "to teach the data model".
| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews" | AI | 4 hrs | Mark flagged earlier |
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
| 3.4 | Project header URL chips: collapse to a "+N" pill when there are >3 endpoints | AI | 30 min | Polish |
| 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | 30 min | When user sees "Build failed" they want to know why |
| 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
**Definition of done:** a stranger lands on every tab in turn. None of them
make us cringe. Each one either shows useful info or gives the user a
concrete next action.
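
For 3.2, one possible shape for a structured chat error is sketched below. The `ChatToolError` fields and the `renderChatError` name are assumptions, not the shipped schema; the fallback string is the generic message the chat shows today.

```ts
// Sketch of task 3.2. All field names are assumptions.
interface ChatToolError {
  tool: string; // which tool failed, e.g. "dev_server.start"
  reason: string; // short, human-readable cause
  retryable: boolean; // whether suggesting a retry makes sense
}

function renderChatError(err: ChatToolError | null): string {
  // No structured detail available: fall back to the current generic message.
  if (err === null) {
    return "Failed to get response. Please try again.";
  }
  const next = err.retryable
    ? "You can try again."
    : "Retrying will not help until this is fixed.";
  return `${err.tool} failed: ${err.reason}. ${next}`;
}

console.log(
  renderChatError({
    tool: "dev_server.start",
    reason: "port 3001 is already allocated",
    retryable: true,
  }),
);
```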

---
## Phase 4 — Onboarding & safety
**Goal:** a stranger with the invite link can get from "what is this" to
"I shipped a thing" without us in the chat.
| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken |
| 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
| 4.3 | "Delete project" UI in project settings (and underlying Coolify cleanup) | AI | 2 hrs | Today only AI can clean up via MCP |
| 4.4 | "Delete workspace" UI — same | AI | 1 hr | |
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
| 4.6 | Per-workspace compute quota: max N Coolify projects, max N dev containers, soft cap with friendly error | AI | 3 hrs | One bad actor today = unbounded GCE bill |
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |
**Definition of done:** Mark hands the invite link to one non-developer
friend, they get to "shipped a thing" without messaging Mark for help.
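
The soft cap in 4.6 could be a pure check that runs before any create call reaches Coolify. A sketch under stated assumptions: the type names, the `checkQuota` function, and the message wording are hypothetical, and the limits are left as parameters because the plan does not fix N.

```ts
// Sketch of task 4.6. Names and wording are placeholders.
interface WorkspaceUsage {
  projects: number;
  devContainers: number;
}

interface WorkspaceQuota {
  maxProjects: number;
  maxDevContainers: number;
}

type QuotaResult = { ok: true } | { ok: false; message: string };

function checkQuota(
  usage: WorkspaceUsage,
  quota: WorkspaceQuota,
  creating: "project" | "devContainer",
): QuotaResult {
  const used = creating === "project" ? usage.projects : usage.devContainers;
  const max = creating === "project" ? quota.maxProjects : quota.maxDevContainers;
  const noun = creating === "project" ? "projects" : "dev containers";
  if (used >= max) {
    // Soft cap: refuse the create with a friendly message instead of failing deep in the stack.
    return {
      ok: false,
      message: `This workspace is using ${used} of ${max} ${noun}. Delete one to free up a slot.`,
    };
  }
  return { ok: true };
}
```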

---
## Phase 5 — Path B closeout
**Goal:** finish the architectural commitments in `AI_PATH_B_EXECUTION_PLAN.md`
that aren't covered above.
| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 5.1 | Build `ghcr.io/vibnai/vibn-dev:latest` on the live Coolify host (`ssh + setup-on-coolify.sh`) | AI | 30 min | Pre-req for any new project's dev container |
| 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
| 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
| 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1-2 days | The actual proof Path B works |
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
| 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
**Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
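
The pass/fail rule above can be written down as a small check over the harness output. A sketch only: the `EvalRun` shape and the function names are assumptions, and using the median time-to-first-preview is a choice made here because the plan does not say which aggregate to compare against the Path A baseline.

```ts
// Sketch of the Phase 5 definition-of-done check. Record shape is an assumption.
interface EvalRun {
  prompt: string;
  succeeded: boolean;
  timeToFirstPreviewSec: number; // measured on Path B
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function evalPasses(
  runs: EvalRun[],
  pathABaselineSec: number,
): { speedup: number; successRate: number; pass: boolean } {
  if (runs.length === 0) {
    throw new Error("no eval runs to score");
  }
  const successRate = runs.filter((r) => r.succeeded).length / runs.length;
  const speedup = pathABaselineSec / median(runs.map((r) => r.timeToFirstPreviewSec));
  // Thresholds from the plan: at least 3x faster and at least 80% success.
  return { speedup, successRate, pass: speedup >= 3 && successRate >= 0.8 };
}
```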

---
## Sequencing & dependencies
```
P1.1 → P1.2 → P1.3 → P1.4 → P1.5 → P1.6 → P1.7 → P1.8 ──┐
P2.1, P2.2, P2.3 (parallel)                             │
P2.4, P2.5, P2.6, P2.7 (parallel, low priority)         │
                                                        ├─ P3 (parallel internally)
                                                        ├─ P4.1 (depends on P3 being not-cringe)
                                                        ├─ P4.2 (parallel)
                                                        ├─ P4.3..4.8 (parallel)
                                                        └─ P5 (parallel; some pieces gated by P1)
```
P1 is the long pole. Everything else can mostly proceed in parallel once P1
unblocks the iteration loop.

---
## Suggested cadence
- **Today (in flight):** P1.1 — Cloudflare signup + record verification.
- **Tonight / tomorrow:** P1.2-P1.8 once nameservers propagate. **AI does
  the cert + Traefik wiring; Mark does the clicks at Cloudflare/Namecheap.**
- **Day 2:** P2.1-P2.3 (runtime error chase + Sentry) + P3.1 (Hosting rewrite)
  in parallel.
- **Day 3:** P3.2-P3.6 + P4.1 smoke test.
- **Day 4:** P4.2 landing page + P4.3-P4.5 deletion/auth.
- **Day 5:** P4.6-P4.8 quotas/audit/invite + P5.1 vibn-dev image.
- **Days 6-10:** P5.2-P5.6 closeout, eval harness, polish, then invite first
  testers.
10 working days from today to "first 5 testers". Tight but doable if no
nasty discoveries in P2.

---
## What we are *not* doing for beta
Logged so we don't accidentally pull them in:
- Stripe / billing (post-beta — we want to know what to charge for first)
- Mobile-responsive polish (desktop-first beta)
- Multi-region Coolify (single-host is fine for <50 users)
- Replacing Coolify (out of scope; Path B is the abstraction over it)
- Replacing Gitea (Path B's `shell.exec` already abstracts most of it)
- Plugin marketplace, template marketplace, monetization paths
- Anything requiring us to redo NextAuth / migrate to a different auth
- Theme system / dark mode
---
## Risks specific to this plan
| Risk | Mitigation |
|---|---|
| Cloudflare DNS propagation breaks email forwarding | We pre-verified MX records in the audit; double-check them on the Cloudflare review screen before switching nameservers |
| Traefik wildcard cert acquisition fails on first try | DNS-01 against Cloudflare is well-trodden; if it fails it's fixable, not catastrophic. Old certs keep serving until replaced. |
| Runtime errors in P2 turn out to be a deeper architectural issue | Time-box investigation to 4 hrs each; if not solved, document workaround and ship anyway, debug after invite |
| Eval harness reveals Path B is slower than promised | Acceptable to invite testers without 100% Path B coverage as long as the prod-deploy-only path works. Path B is an upgrade, not a gate. |
| New users hit 100 unforeseen edge cases | This is the point of beta. Triage daily, fix the top-3 each morning. |
---
## How to use this doc
- Treat phase boundaries as soft. If a P2 task unblocks a P3 task and you're
there, do it.
- When a task ships, check it off and move it under "Shipped" in
[`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md).
- When the plan changes (it will), edit this doc directly; don't fork it.
- Beta success criteria: **5 testers, all reach "shipped a thing", weekly
active rate >60% in week 2.** If we miss those, the next plan is "what
did we get wrong."