# Beta Launch Execution Plan

> The path from "shipping to ourselves" to **"5–10 friendly testers can use
> Vibn end-to-end without us hand-holding."**
>
> **Companion to:** [`AI_PATH_B_EXECUTION_PLAN.md`](./AI_PATH_B_EXECUTION_PLAN.md)
> (architecture) and [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md) (current state).
>
> **Drafted:** 2026-04-30. **Owner:** Mark + AI.
>
> **Scope:** Everything we agreed in the 2026-04-30 review that's NOT already
> shipped. Pulls in the unfinished items from Path B (DNS, cert, previews,
> eval) AND the "before strangers see this" gaps that Path B doesn't cover
> (runtime errors, error surfaces, onboarding smoke test, landing page,
> safety rails).

---

## North star for the beta

A non-technical founder receives a Vibn invite link, signs up, describes
what they want to build, sees a working preview within a few minutes, can
iterate on it through chat without seeing a stack trace, and can ship it
to a real domain — all without us reaching into Coolify on their behalf.

If any of those steps requires us in the loop, beta isn't ready.

---

## Phase ordering

Sequenced by **leverage × blocking risk**. Earlier phases unblock later ones.

```
P1 Previews unlock        ── enables fast-iteration UX & demos ──┐
P2 Stability & visibility ── stops silent rot ───────────────────┤
P3 UX surfaces            ── what the user actually touches ─────┼─── INVITE
P4 Onboarding & safety    ── what a stranger needs day 1 ────────┤
P5 Path B closeout        ── ship the architectural commitments ─┤
P6 Artifact-first UX      ── two-pane shell, preview-as-default ─┘
```

## Model assignment convention

**Opus is reserved for judgment-heavy work** — anything that touches
multiple subsystems, has security implications, designs a protocol,
or requires reading existing architecture before deciding what to
ship. Mechanical, well-specified work goes to a cheaper coder model.

Per-task tags inside each Phase table:

- **opus** — architectural, cross-cutting, security-sensitive. Opus
  reads the relevant code, decides the approach, writes the code.
- **cheap** — well-specified, single-file or local-scope, pattern
  exists. Cheaper model executes from the row's notes.
- **opus-spec → cheap** — Opus writes a tight one-paragraph spec
  in the row's notes (schema columns, function signature, exact
  files to touch); cheaper model implements verbatim.

If a row has no model tag, default is `cheap`. The expensive default
is opt-in, not opt-out.

---

## Phase 1 — Previews unlock — **SHIPPED 2026-05-01**

**Goal:** `dev_server.start` returns a clickable `https://*.preview.vibnai.com`
URL that loads in <30s, with HMR working over the proxy.

**Why first:** the single biggest UX cliff today is "user iterates → 3-7 min
Coolify build". Previews collapse it to seconds. Everything else is polish on
a slow loop until this lands.

| # | Task | Owner | Effort | Status |
|---|---|---|---|---|

**Definition of done:** ✅ AI says "open a Vite dev server", user clicks the URL,
sees Vite's welcome page, edits a file via `fs.edit`, change appears in
browser within 5s without manual reload.

**Verified Vite config for HMR through Traefik** (the system prompt should advertise this exact shape when scaffolding Vite projects):

```js
server: {
  host: '0.0.0.0',
  port: 3001, // any 3000–3009
  strictPort: true,
  hmr: {
    clientPort: 443,
    protocol: 'wss',
    host: 'preview-{slot}-{slug}-{token}.preview.vibnai.com',
  },
}
```

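A minimal sketch of how a scaffolder could fill in that shape for one preview, assuming `{slot}`, `{slug}`, and `{token}` arrive as plain values from the caller. `buildViteServerBlock`, `previewHost`, and the parameter names are hypothetical and not part of Vite or the existing codebase; only the field values come from the verified config above.

```typescript
// Hypothetical helper: fills the verified Vite `server` shape for one preview.
interface PreviewParams {
  slot: number;  // preview slot as it appears in the hostname
  slug: string;  // project slug
  token: string; // opaque preview token
  port: number;  // dev server port, must be in 3000-3009
}

interface ViteServerBlock {
  host: string;
  port: number;
  strictPort: boolean;
  hmr: { clientPort: number; protocol: "wss"; host: string };
}

// Builds the public preview hostname from its three placeholders.
function previewHost(p: { slot: number; slug: string; token: string }): string {
  return `preview-${p.slot}-${p.slug}-${p.token}.preview.vibnai.com`;
}

function buildViteServerBlock(p: PreviewParams): ViteServerBlock {
  // The plan only allows ports 3000-3009; reject anything else early.
  if (!Number.isInteger(p.port) || p.port < 3000 || p.port > 3009) {
    throw new RangeError(`port ${p.port} is outside 3000-3009`);
  }
  return {
    host: "0.0.0.0",
    port: p.port,
    strictPort: true,
    // HMR websocket goes through the TLS proxy, so the browser must use wss on 443.
    hmr: { clientPort: 443, protocol: "wss", host: previewHost(p) },
  };
}

const exampleBlock = buildViteServerBlock({ slot: 0, slug: "todo", token: "abc123", port: 3001 });
console.log(exampleBlock.hmr.host);
```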
---

## Phase 2 — Stability & visibility

**Goal:** when something breaks in production, we hear about it before users do.

| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 2.1 | Reproduce + diagnose `ERR_HTTP_HEADERS_SENT` from prod logs | AI | 1–2 hrs | Likely a server action / API route returning twice |
| 2.2 | Reproduce + diagnose `TypeError: reading 'z'/'j'/'aa'` in prod bundle | AI | 1–2 hrs | Minified prod error; suspect `react-markdown` server/client boundary |

**Definition of done:** force-fail a route in staging → Sentry alert lands in
< 1 min. Force-fail a Coolify deploy → notification fires. Reproduce an
orphan-container conflict in prod → model calls `apps_unstick` instead of
delete-and-recreate.

---

## Phase 3 — UX surfaces (what users actually touch)

**Goal:** every screen a beta tester lands on either does something useful
or gets out of the way. No screens that exist "to teach the data model".

| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 3.1 | **Hosting tab rewrite** — focus on the domain (live URL, redeploy, env, logs) instead of master-detail of "live + previews" | AI | 4 hrs | Mark flagged earlier |
| 3.2 | Replace the chat's "⚠️ Failed to get response. Please try again." with structured errors that show what tool failed and why | AI | 2 hrs | Critical — currently zero feedback |
| 3.3 | Empty states across Plan/Product/Infrastructure/Hosting that suggest the **next** AI prompt to try (not just "nothing here") | AI | 2 hrs | Vibe coders need a nudge |
| 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
| 3.7 | **Scope-doc upload in Plan tab** — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat. |
| 3.8 | **"Stop at something tangible" — three layers** | AI | partially done | Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof. |
| 3.8c | Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker | AI | 1 hr | Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message. |

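The 3.8c safety net can be sketched as a small per-turn tracker. This is an illustration under assumptions, not the code in `app/api/chat/route.ts`: the `ToolCall` shape, the `SOURCE_FILE` pattern that decides what counts as a source file, and the wording of the recovery instruction are all hypothetical. Only the tool names and the threshold of 10 come from the table.

```typescript
// Hypothetical per-turn tracker for the 3.8c safety net.
type ToolCall = { name: string; args: { path?: string } };

// Assumed definition of "source file"; the plan does not specify one.
const SOURCE_FILE = /\.(tsx?|jsx?|css|html|json|py|go|rs)$/;
const WRITE_THRESHOLD = 10;
const TANGIBLE_TOOLS = new Set(["dev_server_start", "apps_deploy"]);

class TangibleOutputTracker {
  private sourceWrites = 0;
  private producedTangible = false;

  // Call once for every tool call executed inside the chat turn.
  record(call: ToolCall): void {
    if (TANGIBLE_TOOLS.has(call.name)) {
      this.producedTangible = true;
    } else if (call.name === "fs_write" && SOURCE_FILE.test(call.args.path ?? "")) {
      this.sourceWrites += 1;
    }
  }

  // Returns the synthetic instruction to append, or null when none is needed.
  recoveryInstruction(): string | null {
    if (this.producedTangible || this.sourceWrites < WRITE_THRESHOLD) return null;
    return (
      `You wrote ${this.sourceWrites} source files this turn without starting a preview or deploying. ` +
      "Call dev_server_start or apps_deploy now, or explain what is blocking you."
    );
  }
}

// Usage: ten source writes and no server start should trigger the instruction.
const demoTracker = new TangibleOutputTracker();
for (let i = 0; i < 10; i++) {
  demoTracker.record({ name: "fs_write", args: { path: `src/file${i}.tsx` } });
}
console.log(demoTracker.recoveryInstruction() !== null);
```

Keeping the tracker free of I/O means the tool loop only has to call `record` per tool call and check `recoveryInstruction` once before building the next system message.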
**Definition of done:** a stranger lands on every tab in turn. None of them
make us cringe. Each one either shows useful info or gives the user a
concrete next action.

---

## Phase 4 — Onboarding & safety

**Goal:** a stranger with the invite link can get from "what is this" to
"I shipped a thing" without us in the chat.

| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 4.1 | End-to-end smoke test on a fresh account: signup → workspace → project → first chat → first preview → first deploy | Mark + AI | 2 hrs | Walk through with an empty cookie jar; fix everything broken. **Runbook below.** |
| 4.2 | Landing page at `vibnai.com` that explains the product in 30s | Mark + AI | 4 hrs | Currently a login screen |
| 4.4 | "Delete workspace" UI — same | AI | 1 hr | |
| 4.5 | Auth hardening pass: NextAuth session expiry, CSRF on mutating routes, GitHub OAuth scope review | AI | 2 hrs | |
| 4.7 | Per-workspace audit log of mutating MCP calls (apps/databases/services create/delete) | AI | 2 hrs | We need this when something goes wrong |
| 4.8 | Invite link / waitlist page (manual approval) so we control who joins | Mark + AI | 1 hr | |

**Definition of done:** Mark hands the invite link to one non-developer
friend, they get to "shipped a thing" without messaging Mark for help.

---

## Phase 5 — Path B closeout

**Goal:** finish the architectural commitments in `AI_PATH_B_EXECUTION_PLAN.md`
that aren't covered above.

| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 5.2 | Hard-remove `gitea_file_*` from the AI tool list; keep REST routes alive 30 days with deprecation header | AI | 1 hr | Path B week 3 task |
| 5.3 | Update `AI_CAPABILITIES.md` to reflect everything that shipped | AI | 1 hr | |
| 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof Path B works |
| 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
| 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |

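The four metrics in 5.4 and this phase's pass thresholds (≥3× speedup on time-to-first-preview, ≥80% success rate) can be checked with a short summary function. The sketch below is one possible shape: `EvalRun`, `summarize`, and the choice of the median as the aggregate are assumptions, since the plan does not say how per-prompt timings are combined.

```typescript
// Hypothetical result record for one reference prompt.
interface EvalRun {
  prompt: string;
  succeeded: boolean;
  timeToFirstPreviewSec: number | null; // null when no preview was produced
  timeToShippedSec: number | null;      // null when the run never shipped
  toolCalls: number;
}

interface EvalSummary {
  successRate: number;
  medianFirstPreviewSec: number;
  speedup: number;
  meanToolCalls: number;
  passes: boolean;
}

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// baselineFirstPreviewSec is the Path A time-to-first-preview, measured separately.
function summarize(runs: EvalRun[], baselineFirstPreviewSec: number): EvalSummary {
  const successRate = runs.filter((r) => r.succeeded).length / runs.length;
  const previews = runs
    .map((r) => r.timeToFirstPreviewSec)
    .filter((t): t is number => t !== null);
  const medianFirstPreviewSec = median(previews);
  const speedup = baselineFirstPreviewSec / medianFirstPreviewSec;
  const meanToolCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0) / runs.length;
  // Thresholds from the Phase 5 definition of done.
  const passes = speedup >= 3 && successRate >= 0.8;
  return { successRate, medianFirstPreviewSec, speedup, meanToolCalls, passes };
}
```

With no runs or no previews the ratios become `NaN`, so `passes` is `false` rather than a misleading pass.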
**Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
vs. Path A baseline, ≥80% success rate across the 10 reference prompts.

---

## Phase 6 — Artifact-first UX (Bolt parity, Vibn taste)

**Goal:** the running app is the dominant surface on every project page,
not a thing-to-summon. The user should never have to wonder if the AI
actually built something — it's right there. Lift the structural
patterns Bolt/Lovable/v0 have proven (two-pane, preview chrome,
plan-as-checklist) without the dark glassy aesthetic.

**Why now (after smoke test prep):** today the AI can scaffold a full
six-service stack and a non-technical founder still has no idea where
"their app" is. The composer chip + header Preview button (3.8) helped,
but the structural fix is two-pane.

| # | Task | Owner | Effort | Model | Notes |
|---|---|---|---|---|---|
| 6.A1 | **Two-pane structural refactor** — replace `app/[workspace]/project/[projectId]/(home)/layout.tsx` shell. Workspace sidebar (unchanged) → **main column**: horizontal icon bar (👁 / ⟨⟩ / … / ⚙) above a row of permanent chat ~380px + pure artifact pane. Bar spans full width above chat+content so the vertical strip does not steal horizontal space from the preview. Default view = Preview. Coordinates with existing `ProjectStagePill`, `ProjectAssociationPrompt`, `--chat-panel-width` CSS var (obsolete on project pages). | AI | 3 hrs | **opus** | Cross-cutting; touches layout, chat panel, header. Has to keep `ProjectHeaderUrls` working. |
| 6.A2 | **PreviewPane component** — iframe of `previewUrl \|\| fqdn`, with empty-state placeholder ("Your preview will appear here" + Vibn V mark). Reads from `useAnatomy()`. Exports `kind: "preview"` rendered by 6.A1's right pane. | AI | 1 hr | opus-spec → cheap | Spec: read `anatomy.hosting.previews[0].url` first, fall back to `anatomy.hosting.live[0].fqdn`, fall back to placeholder. Same poll cadence as `ProjectHeaderUrls`. Iframe sandboxed with `allow-scripts allow-forms allow-same-origin allow-popups`. |
| 6.A3 | **Iframe chrome (artifact-local)** — top-right of the iframe: ↻ reload (force iframe `key` bump), ↗ open-in-new-tab, ⛶ fullscreen (toggles a panel-level `expanded` state that hides chat column), 📱 device-frame (desktop / tablet / mobile widths, persisted to localStorage per-project). | AI | 1 hr | cheap | Pure UI. Width tokens: desktop=100%, tablet=820px, mobile=390px. Frame is a CSS wrapper, not real device emulation. |
| 6.A4 | **Code view** — when ⟨⟩ is selected on the rail, render the existing `gitea-file-tree.tsx` + `gitea-file-viewer.tsx` in the right pane. Two-column inside the right pane: tree on the left, viewer on the right. | AI | 30 min | cheap | Components already exist; just compose them. Shared `selectedPath` state. |
| 6.A5 | **Resources view** — when 🗄 is selected, render the existing `database-table-tree.tsx` + `table-viewer.tsx` plus a small list of running services (from `anatomy.hosting.live[]` and `apps_containers_list`). | AI | 1 hr | cheap | Same wire-up pattern as 6.A4. |
| 6.B1 | **Persist last-known dev server config** — new table `fs_project_dev_servers (project_id PK, command, port, framework, last_started_at, status)`. Hook `dev_server_start` MCP tool to upsert on success; `dev_server_stop` to flip status. | AI | 1 hr | opus-spec → cheap | Spec: schema is `project_id UUID PK, command TEXT NOT NULL, port INT NOT NULL, framework TEXT, last_started_at TIMESTAMPTZ, status TEXT CHECK (status IN ('running','stopped','crashed'))`. Migration in `lib/db-postgres-migrations.ts` pattern. Upsert in `lib/dev-server-manager.ts` (or wherever `dev_server_start` lives — find via `Grep`). |
| 6.B2 | **Auto-resume dev server on project page mount** — server-render hook on the new layout: if (a) saved server config exists AND (b) `getDevContainerStatus()` returns `running` or `provisioning` AND (c) no live preview already in `useAnatomy().hosting.previews[]` → fire the saved `dev_server_start` server-side BEFORE the page paints. User lands; preview is live. | AI | 2 hrs | **opus** | Risky if naive — could resume a server the user explicitly stopped, could thrash on idle-suspended containers, could race the existing on-mount `devcontainer_ensure`. Needs careful state-machine read. Idempotency comes from `dev_server_start` returning `alreadyRunning: true` when a process matches command+port. |
| 6.C1 | **SSE `plan` event protocol** — server emits `{ type: "plan", taskId, text, status: "queued"\|"in_progress"\|"done" }` whenever `plan_task_add` / `plan_task_complete` (or a new `plan_task_start`) MCP tool fires inside a chat turn. Coexists with existing `text` and `toolCall` events. | AI | 2 hrs | **opus** | Protocol design — has to handle ordering (plan event must land before the tool's `toolResult`), client-side reconciliation with `fs_projects.plan.tasks[]` on next page load (server is source of truth, SSE is a hot stream), and the case where the AI calls `plan_task_complete` for a task added in a prior turn. |
| 6.C2 | **Client TimelineEntry of `kind: "plan"`** — render a checklist with status circles (○ queued / ◐ in-progress / ● done) inside the assistant message timeline. Each new `plan` SSE event upserts by `taskId`. Ledger pattern matches the existing `kind: "text"` / `kind: "tool"` rendering in `chat-panel.tsx`. | AI | 1.5 hrs | opus-spec → cheap | Spec written into 6.C1's notes. Visual: indented under a "Plan" mini-header, same Outfit/Newsreader palette, status circles in `#a09a90` → `#3a3530` → `#1a1a1a`. |
| 6.C3 | **Share + Publish buttons on the new shell** — top-right of the right pane (next to artifact chrome). Share = copy `previewUrl \|\| fqdn`. Publish = fire existing `ship` MCP tool with auto-generated commit message. | AI | 30 min | cheap | Both are existing tool calls; just buttons. |
| 6.D1 | **⚙ Settings popover** — single popover off the icon rail's ⚙. Sections: Domain (shows current `fqdn`, link to rename), Sentry (link to project's Sentry dashboard from `projects.get`), Secrets (links to 6.D2), Quick-add database (fires 6.D3 modal), All project settings (link to `/[workspace]/project/[id]/settings`). | AI | 1 hr | cheap | Pattern matches existing `project-header-urls` popover. |
| 6.D2 | **Project-level secret scratchpad** — new table `fs_project_secrets (project_id, key, value_encrypted, created_at, updated_at)`. Encryption-at-rest via existing `lib/crypto.ts` (or scaffold one if absent — use AES-GCM with a workspace-scoped key derived from a master `VIBN_SECRETS_KEY` env). MCP tools: `secrets_get { projectId, key }`, `secrets_set { projectId, key, value }`, `secrets_list { projectId }` (returns keys only). AI can read/write user-supplied API keys before they're injected into a deploy. | AI | 2 hrs | **opus** | Security-sensitive: encryption scheme, key rotation story, tenant isolation, what gets logged. Needs careful handling, not pattern-matching. |
| 6.D3 | **Quick-add database modal** — fires `databases_create` MCP tool, blocks until `databases_get` returns a connection URL, surfaces the URL with a copy button + "I'll inject this into your app's env" affordance that calls `apps_envs_set` if a target app exists. | AI | 1 hr | cheap | Each step is an existing MCP call; modal orchestrates them. |

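The `plan` event from 6.C1 and the upsert-by-`taskId` rule from 6.C2 can be sketched together. The event fields and status glyphs follow the table; the `data:` line framing, the function names, and the validation strategy are assumptions about a protocol that has not been designed yet.

```typescript
type PlanStatus = "queued" | "in_progress" | "done";

interface PlanEvent {
  type: "plan";
  taskId: string;
  text: string;
  status: PlanStatus;
}

// Parses one SSE `data:` line; returns null for anything that is not a valid plan event.
function parsePlanEvent(line: string): PlanEvent | null {
  if (!line.startsWith("data:")) return null;
  let payload: unknown;
  try {
    payload = JSON.parse(line.slice("data:".length).trim());
  } catch {
    return null;
  }
  if (typeof payload !== "object" || payload === null) return null;
  const { type, taskId, text, status } = payload as Record<string, unknown>;
  if (type !== "plan" || typeof taskId !== "string" || typeof text !== "string") return null;
  if (status !== "queued" && status !== "in_progress" && status !== "done") return null;
  return { type: "plan", taskId, text, status: status as PlanStatus };
}

// Upsert by taskId: a later event for the same task replaces the earlier one in place.
function upsertPlanEvent(tasks: readonly PlanEvent[], ev: PlanEvent): PlanEvent[] {
  const index = tasks.findIndex((t) => t.taskId === ev.taskId);
  if (index === -1) return [...tasks, ev];
  const next = [...tasks];
  next[index] = ev;
  return next;
}

const STATUS_GLYPH: Record<PlanStatus, string> = { queued: "○", in_progress: "◐", done: "●" };

// Plain-text stand-in for the checklist that the timeline entry would render.
function renderChecklist(tasks: readonly PlanEvent[]): string {
  return tasks.map((t) => `${STATUS_GLYPH[t.status]} ${t.text}`).join("\n");
}
```

Because the upsert keeps the original position of a task, a `plan_task_complete` for a task added in an earlier turn updates its row instead of appending a duplicate, provided the client seeded its list from `fs_projects.plan.tasks[]` on page load.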
**Definition of done:** open any project that's been built, the running
preview is already live in the right pane without clicking anything;
clicking ⟨⟩ shows the source tree; clicking 🗄 shows the databases;
the AI's plan streams in as a checklist instead of paragraphs; ⚙
opens a single popover with all project config one click away.

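The "already live without clicking anything" requirement depends on the 6.B2 auto-resume check. Its three conditions reduce to a pure predicate, sketched here under assumptions: the type names, the extra container statuses (`suspended`, `stopped`), and the guard that skips a saved config whose status is `stopped` are hypothetical. That guard is one possible answer to the "resumes a server the user explicitly stopped" risk called out in the 6.B2 notes, not a decided design.

```typescript
// Container statuses: `running` and `provisioning` come from 6.B2; the others are assumed.
type ContainerStatus = "running" | "provisioning" | "suspended" | "stopped";

// Mirrors the columns proposed for fs_project_dev_servers in 6.B1.
interface SavedDevServer {
  command: string;
  port: number;
  status: "running" | "stopped" | "crashed";
}

interface ResumeInput {
  saved: SavedDevServer | null;      // (a) last-known dev server config, if any
  containerStatus: ContainerStatus;  // (b) result of getDevContainerStatus()
  livePreviewCount: number;          // (c) length of hosting.previews[]
}

function shouldAutoResume(input: ResumeInput): boolean {
  const { saved, containerStatus, livePreviewCount } = input;
  if (saved === null) return false; // (a) nothing to resume
  // Assumed guard: "stopped" is written by dev_server_stop, so treat it as user intent.
  if (saved.status === "stopped") return false;
  // (b) never wake an idle-suspended or stopped container just to show a preview.
  if (containerStatus !== "running" && containerStatus !== "provisioning") return false;
  // (c) a live preview already exists, so there is nothing to do.
  return livePreviewCount === 0;
}
```

Keeping the decision separate from the side effect makes the state machine easy to test; the actual `dev_server_start` call can then rely on its `alreadyRunning: true` response for idempotency.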
**Sequencing inside Phase 6:**

```
6.A1 (structural) ──> 6.A2..A5 (panels)  ──┐
                                           ├── 6.D1 (settings popover)
6.B1 (persist)    ──> 6.B2 (auto-resume) ──┘
6.C1 (SSE)        ──> 6.C2 (client renderer) ──> 6.C3 (share/publish)
                                           └─ 6.D2 (secrets) ── 6.D3 (db modal)
```

A-track is the user-visible spine. B-track makes the spine populated
on first paint. C-track makes the AI feel like it's working. D-track
fills the settings drawer.

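As one concrete piece of the A-track, the fallback order in the 6.A2 spec (first preview URL, then first live `fqdn`, then placeholder) could look like the sketch below. The `Anatomy` shape is inferred from the field paths quoted in the spec, and `resolvePreviewSrc` plus the scheme handling for a bare `fqdn` are assumptions.

```typescript
// Shape inferred from `anatomy.hosting.previews[0].url` and `anatomy.hosting.live[0].fqdn`.
interface Anatomy {
  hosting: {
    previews: { url: string }[];
    live: { fqdn: string }[];
  };
}

// Returns the iframe src, or null when the caller should render the placeholder.
function resolvePreviewSrc(anatomy: Anatomy | null | undefined): string | null {
  const previewUrl = anatomy?.hosting.previews[0]?.url;
  if (previewUrl) return previewUrl;

  const fqdn = anatomy?.hosting.live[0]?.fqdn;
  if (fqdn) {
    // Assumption: fqdn may be stored without a scheme, so add https:// when missing.
    return /^https?:\/\//.test(fqdn) ? fqdn : `https://${fqdn}`;
  }
  return null;
}

const exampleSrc = resolvePreviewSrc({
  hosting: { previews: [], live: [{ fqdn: "todo.example.com" }] },
});
console.log(exampleSrc);
```

Returning `null` instead of a placeholder URL keeps the empty state a rendering concern of the pane itself.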
**Out of scope for Phase 6 (intentional cuts, captured here so they
don't get pulled in):**

- Built-app authentication scaffolding (auth-as-a-service for users'
  apps) — multi-week, real product call.
- First-party connectors marketplace (Stripe / Twilio "click to add")
  — multi-week per integration. AI can install via shell today.
- Multi-model picker / "Plan vs. Build" toggle on the composer —
  defer until BYOK lands and we have something to switch between.
- Design-system picker on the composer — real product call.
- Knowledge-base / RAG beyond scope-doc upload (3.7).
- Server functions à la Bolt — different deploy model, not a gap.
- Mobile preview QR — only matters once mobile is a real target.

---

## Sequencing & dependencies

```
P1.1 → P1.2 → P1.3 → P1.4 → P1.5 → P1.6 → P1.7 → P1.8 ──┐
                                                        │
P2.1, P2.2, P2.3 (parallel)                             │
P2.4, P2.5, P2.6, P2.7 (parallel, low priority)         │
                                                        ├─ P3 (parallel internally)
                                                        │
                                                        ├─ P4.1 (depends on P3 being not-cringe)
                                                        ├─ P4.2 (parallel)
                                                        ├─ P4.3..4.8 (parallel)
                                                        │
                                                        ├─ P5 (parallel; some pieces gated by P1)
                                                        │
                                                        └─ P6 (gated by P1 + P5.7;
                                                           6.A1 unblocks the rest)
```

P1 is the long pole. P6 is the next big-leverage move once smoke test
passes — pre-invite UX upgrade, depends on previews (P1) and the
auto-clone wiring (5.7) both being solid.

---

## Suggested cadence

- **Today (in flight):** P1.1 — Cloudflare signup + record verification.
- **Tonight / tomorrow:** P1.2–P1.8 once nameservers propagate. **AI does
  the cert + Traefik wiring; Mark does the clicks at Cloudflare/Namecheap.**
- **Day 2:** P2.1–P2.3 (runtime error chase + Sentry) + P3.1 (Hosting rewrite)
  in parallel.
- **Day 3:** P3.2–P3.6 + P4.1 smoke test.
- **Day 4:** P4.2 landing page + P4.3–P4.5 deletion/auth.
- **Day 5:** P4.6–P4.8 quotas/audit/invite + P5.1 vibn-dev image.
- **Days 6–10:** P5.2–P5.6 closeout, eval harness, polish, then invite first
  testers.

10 working days from today to "first 5 testers". Tight but doable if no
nasty discoveries in P2.

---

## What we are *not* doing for beta

Logged so we don't accidentally pull them in:

- Stripe / billing (post-beta — we want to know what to charge for first)
- Mobile-responsive polish (desktop-first beta)
- Multi-region Coolify (single-host is fine for <50 users)
- Replacing Coolify (out of scope; Path B is the abstraction over it)
- Replacing Gitea (Path B's `shell.exec` already abstracts most of it)
- Plugin marketplace, template marketplace, monetization paths
- Anything requiring us to redo NextAuth / migrate to a different auth
- Theme system / dark mode

---

## Risks specific to this plan

| Risk | Mitigation |
|---|---|
| Cloudflare DNS propagation breaks email forwarding | We pre-verified MX records in the audit; double-check at Cloudflare review screen before switching nameservers |
| Traefik wildcard cert acquisition fails on first try | DNS-01 against Cloudflare is well-trodden; if it fails it's fixable, not catastrophic. Old certs keep serving until replaced. |
| Runtime errors in P2 turn out to be a deeper architectural issue | Time-box investigation to 4 hrs each; if not solved, document workaround and ship anyway, debug after invite |
| Eval harness reveals Path B is slower than promised | Acceptable to invite testers without 100% Path B coverage as long as the prod-deploy-only path works. Path B is an upgrade, not a gate. |
| New users hit 100 unforeseen edge cases | This is the point of beta. Triage daily, fix the top-3 each morning. |

---

## Smoke-test runbook (4.1)

**Goal:** prove the user-visible flow from "first visit" through "shipped a deployed app" works end-to-end with all the new wiring (Sentry per-project, quotas, recovery middleware, URL chip popover, status-pill deep-link, deploy-failed Slack alerts).

**Setup:** open an incognito window. Have your Slack channel and Sentry dashboard visible in side tabs. You'll be the fresh user.

### Steps

1. **Visit `https://vibnai.com`** → sign up with Google (use a different Gmail account than your normal one if possible — keeps test data clean). Confirm you land on the workspace home.
2. **Create a project** (any path: build / oss / import). Pick a slug like `smoke-test-2026-05-01`.
   - **Verify in Sentry:** within ~10s, a new project named `vibn-{your-workspace-slug}-smoke-test-2026-05-01` should appear at <https://vibnai.sentry.io/projects/>.
   - **Verify in DB (optional):** `fs_projects.data.sentry.dsn` is populated for the new row.
3. **Land in chat.** AI should greet you and offer to scaffold something. Ask it to build something simple ("a Next.js todo app").
4. **Watch the preview start.** AI should call `devcontainer_ensure`, scaffold, then `dev_server_start`. A preview URL like `preview-0-{slug}-{token}.preview.vibnai.com` should be returned. Click it. Page should load over HTTPS with a valid cert.
5. **Edit something via chat.** Ask AI to add a button or change copy. HMR should update the preview without reload.
6. **Ship it.** Tell AI "ship it." It should `apps_create` against your Gitea repo + trigger Coolify deploy. Watch the project header status pill go Empty → Deploying → Live.
   - **Verify in Coolify env:** the new app's env vars include `NEXT_PUBLIC_SENTRY_DSN` and `SENTRY_AUTH_TOKEN`.
   - **Verify Slack:** if the deploy fails for any reason, your Slack channel pings within 30s. If it succeeds, no message (by design — we're noise-conscious).
7. **Trigger a real error in the deployed app.** Open the live URL, click around until something breaks. (If nothing breaks, ask AI to add a button that calls `myUndefinedFunction()`.)
   - **Verify in Sentry:** the error lands in the new Sentry project within ~10s, **with a real stack trace** (file/line in your project's source). Session Replay should be available.
   - **Open a new chat with this project** and say "what's broken?" → AI should call `project_recent_errors` and surface the issue with a fix suggestion. This is the killer-feature path.
8. **Hit the quota cap.** Try to create a 4th project. Should get a friendly 402 with the "delete one or contact support" wording, NOT a generic error. AI in chat should explain the cap clearly without retrying.
9. **Test the URL chip popover.** Once you have ≥4 URLs on a project (e.g. preview + live + 2 services), the project header should collapse to 3 chips + a `+N` pill. Click it; popover opens with the rest as clickable links. Click outside; popover closes. Press Escape; popover closes.
10. **Test the status-pill Logs link.** During a deploy, the "Logs" link next to the pill should one-click into the Coolify project page (not the root).

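The collapse rule that step 9 checks (3 chips plus a `+N` pill once there are 4 or more URLs) is small enough to pin down in code. This is a sketch of the expected behavior, not the existing header component; `collapseUrlChips` and `ChipLayout` are hypothetical names.

```typescript
interface ChipLayout {
  chips: string[];          // URLs rendered directly in the header
  overflow: string[];       // URLs listed inside the popover
  pillLabel: string | null; // "+N" label, or null when everything fits
}

// Shows up to `maxChips` URLs inline and moves the rest behind a "+N" pill.
function collapseUrlChips(urls: readonly string[], maxChips = 3): ChipLayout {
  if (urls.length <= maxChips) {
    return { chips: [...urls], overflow: [], pillLabel: null };
  }
  const overflow = urls.slice(maxChips);
  return {
    chips: urls.slice(0, maxChips),
    overflow,
    pillLabel: `+${overflow.length}`,
  };
}

// Example matching step 9: preview + live + 2 services gives 3 chips and a "+1" pill.
const exampleLayout = collapseUrlChips([
  "https://preview.example.com",
  "https://live.example.com",
  "https://svc-a.example.com",
  "https://svc-b.example.com",
]);
console.log(exampleLayout.pillLabel);
```

A tester can compare the header against this rule: with exactly 3 URLs there should be no pill at all.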
### What to do when something breaks

- Take a screenshot, open a Vibn chat in a separate (parent-account) tab, paste the screenshot, and say "this just broke during smoke test." AI now has Sentry access + can read recent errors itself.
- If a step is *very* broken, file a P0 against this checklist with the step number and what you saw.

### Pass criteria

- All 10 steps complete with no manual intervention by the AI's parent operator.
- Every "Verify" line returns the expected positive signal.
- Worst case the AI surfaces is a quota cap or known-recoverable error — never a generic "something went wrong."

---

## How to use this doc

- Treat phase boundaries as soft. If a P2 task unblocks a P3 task and you're
  there, do it.
- When a task ships, check it off and move it under "Shipped" in
  [`AI_CAPABILITIES.md`](./AI_CAPABILITIES.md).
- When the plan changes (it will), edit this doc directly, don't fork it.
- Beta success criteria: **5 testers, all reach "shipped a thing", weekly
  active rate >60% in week 2.** If we miss those, the next plan is "what
  did we get wrong."

---

## Phase 7 — Data Enrichment (Post-Beta)

**Goal:** Pre-populate BigQuery with national market sizes so the AI doesn't have to fetch TAM dynamically.

| # | Task | Owner | Effort | Notes |
|---|---|---|---|---|
| 7.1 | Run National TAM Backfill Script | Mark | 1 hr | Execute the `search/live` endpoint using `address_info.country_code = 'CA'` across all 4,000+ categories. This will cost ~$41.00 USD via DataForSEO. Write results to a new column in `vibn_market_data.gbp_categories` (e.g., `national_tam_ca`). |
| 7.2 | Run National TAM Backfill Script (US) | Mark | 1 hr | Same as above, using `country_code = 'US'`. |