From 78d468d365c1fee1d6996f9a5a1550ec7647a851 Mon Sep 17 00:00:00 2001
From: mawkone
Date: Mon, 4 May 2026 13:22:21 -0700
Subject: [PATCH] plan: add Phase 6 (artifact-first UX) + model-assignment convention
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 6 captures the Bolt parity work that came out of today's review of
Bolt screenshots — two-pane shell, preview-as-default, plan-as-checklist,
settings popover, project-level secrets.

Adds a "Model assignment convention" section so we can explicitly route
mechanical work to a cheaper model and reserve Opus for judgment-heavy
tasks. Each Phase 6 row tagged opus / cheap / opus-spec→cheap. Net: 9 hrs
Opus, 8 hrs cheap.

Also brings forward two items shipped today that weren't in the plan yet:
- 5.7 dev container <-> Gitea wiring (auto-clone + auto-commit +
  GITEA_USERNAME fallback fix)
- 3.8a/b/c "stop at something tangible" rule + reverted composer chip row
  + queued server-side enforcement

Sequencing diagram + cadence note updated to include P6.

Co-authored-by: Cursor
---
 BETA_LAUNCH_PLAN.md | 111 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 103 insertions(+), 8 deletions(-)

diff --git a/BETA_LAUNCH_PLAN.md b/BETA_LAUNCH_PLAN.md
index 7963c11..d9e8fd6 100644
--- a/BETA_LAUNCH_PLAN.md
+++ b/BETA_LAUNCH_PLAN.md
@@ -32,13 +32,34 @@ If any of those steps requires us in the loop, beta isn't ready.
 Sequenced by **leverage × blocking risk**. Earlier phases unblock later ones.
 
 ```
-P1 Previews unlock ── enables fast-iteration UX & demos ──┐
-P2 Stability & visibility ── stops silent rot ─────────────────┤
-P3 UX surfaces ── what the user actually touches ─────┼─── INVITE
-P4 Onboarding & safety ── what a stranger needs day 1 ────────┤
-P5 Path B closeout ── ship the architectural commitments ─┘
+P1 Previews unlock        ── enables fast-iteration UX & demos ──┐
+P2 Stability & visibility ── stops silent rot ───────────────────┤
+P3 UX surfaces            ── what the user actually touches ─────┼─── INVITE
+P4 Onboarding & safety    ── what a stranger needs day 1 ────────┤
+P5 Path B closeout        ── ship the architectural commitments ─┤
+P6 Artifact-first UX      ── two-pane shell, preview-as-default ─┘
 ```
 
+## Model assignment convention
+
+**Opus is reserved for judgment-heavy work** — anything that touches
+multiple subsystems, has security implications, designs a protocol,
+or requires reading existing architecture before deciding what to
+ship. Mechanical, well-specified work goes to a cheaper coder model.
+
+Per-task tags inside each Phase table:
+
+- **opus** — architectural, cross-cutting, security-sensitive. Opus
+  reads the relevant code, decides the approach, writes the code.
+- **cheap** — well-specified, single-file or local-scope, pattern
+  exists. Cheaper model executes from the row's notes.
+- **opus-spec → cheap** — Opus writes a tight one-paragraph spec
+  in the row's notes (schema columns, function signature, exact
+  files to touch); cheaper model implements verbatim.
+
+If a row has no model tag, default is `cheap`. The expensive default
+is opt-in, not opt-out.
+
 ---
 
 ## Phase 1 — Previews unlock — **SHIPPED 2026-05-01**
@@ -119,6 +140,10 @@ or gets out of the way. No screens that exist "to teach the data model".
 | 3.5 | Status pill: tooltip should link directly to Coolify build logs | AI | ✓ done 2026-05-01 | `components/project/project-stage-pill.tsx`: "Logs" affordance now appears on `deploying`, `down`, and `build_failed` (not just failures). Deep-links to `/project/` — one click from build logs. (Direct deployment-uuid link blocked on extending anatomy to surface deployment UUIDs; tracked but low priority.) |
 | 3.6 | Product tab: confirm it's actually useful day-to-day. Revise scope if not | Mark + AI | 1 hr | Open question |
 | 3.7 | **Scope-doc upload in Plan tab** — drop a PDF/.md/.docx/.txt as the project brief; server extracts text, stores on `fs_projects.brief_text` + `brief_meta`, exposes via `[PROJECT BRIEF]` block in system prompt and a `project_brief` MCP tool for on-demand grep. New file: `lib/integrations/brief-extract.ts`. Empty state replaces "nothing here" on Plan. | AI | 3 hrs | Came up during smoke test prep — users will arrive with scope docs (PDF/Notion-export/Doc); right now there's no way to hand the AI the source of truth except paste-into-chat. |
+| 3.8 | **"Stop at something tangible" — three layers** | AI | partially done | Came up watching Manifest scaffold — AI stopped at "everything is wired together" with no preview, leaving the user to wonder if any of it was real. Code on disk is invisible; preview URL is the proof. |
+| 3.8a | System-prompt rule: dedicated "Stop at something the user can see" section + tightened build-me-X recipe so `previewUrl` is the explicit stopping point | AI | ✓ done 2026-05-04 | `app/api/chat/route.ts` `buildSystemPrompt`. For multi-service stacks, instructs AI to start the user-facing service first even if other services aren't done. |
+| 3.8b | ~~Persistent quick-action chips above the chat input~~ **REVERTED 2026-05-04** | AI | reverted | Tried it; pulled it. The chip menu was prescriptive ("here's what to type") which conflicts with the principle that the AI should drive toward the goal without presenting the user a menu of homework. Welcome-screen suggested prompts kept (different context — empty conversation, user genuinely needs a starting nudge). The `sendMessage(override)` refactor + welcome-screen auto-send shipped from this work survived; only the composer chip row was removed. |
+| 3.8c | Server-side enforcement: if a turn called `fs_write` ≥10 times for source files but never `dev_server_start` or `apps_deploy`, append a synthetic recovery instruction telling the model to either start a server or explain the blocker | AI | 1 hr | Safety net for when the model ignores the prompt rule under load. Add a tracker in `app/api/chat/route.ts` tool loop, fire the instruction inside the round 2 system message. |
 
 **Definition of done:** a stranger lands on every tab in turn. None of them
 make us cringe. Each one either shows useful info or gives the user a
@@ -160,12 +185,78 @@ that aren't covered above.
 | 5.4 | Eval harness: 10 reference prompts, measure time-to-first-preview, time-to-shipped, tool-call count, success rate | AI | 1–2 days | The actual proof Path B works |
 | 5.5 | Theia / openvscode-server toggle: "Open IDE" button in chat → `https://ide-{ws}-{project}.vibnai.com` | AI | 4 hrs | Week 4 nice-to-have; gates the "user becomes developer" graduation |
 | 5.6 | Idle-suspend cron — wire `POST /api/admin/path-b/idle-sweep` to a 5-min schedule once we trust it | AI | 30 min | Keeps cost bounded |
+| 5.7 | **Persistent dev container ↔ Gitea wiring** — auto-clone repo into `/workspace//` on first chat turn; auto-commit + push at end of every turn so AI work surfaces in the Product tab without manual `gitea_*` calls | AI | ✓ done 2026-05-04 | `lib/dev-container-git.ts` (`ensureProjectRepoCloned`, `commitAndPushIfDirty`) wired into `app/api/chat/route.ts` pre-loop + turn-end. Tri-state probe (`git` / `dir` / `absent`) so projects with files-but-no-git auto-heal on next turn. Production fix shipped today: `GITEA_USERNAME` was missing from prod env so `isGiteaConfigured()` silently no-op'd; added the env value AND a defensive fallback to `GITEA_ADMIN_USER` in code. Backfilled `vibn-mark/manifest` repo manually from the dev container after the env fix. Smoke-tested by inspecting `/workspace/manifest/` over SSH bridge — 64 tracked files pushed, all 6 phase directories present. |
 
 **Definition of done:** eval harness reports ≥3× speedup on time-to-first-preview
 vs. Path A baseline, ≥80% success rate across the 10 reference prompts.
 
 ---
 
+## Phase 6 — Artifact-first UX (Bolt parity, Vibn taste)
+
+**Goal:** the running app is the dominant surface on every project page,
+not a thing-to-summon. The user should never have to wonder if the AI
+actually built something — it's right there. Lift the structural
+patterns Bolt/Lovable/v0 have proven (two-pane, preview chrome, plan-
+as-checklist) without the dark glassy aesthetic.
+
+**Why now (after smoke test prep):** today the AI can scaffold a full
+six-service stack and a non-technical founder still has no idea where
+"their app" is. The composer chip + header Preview button (3.8) helped,
+but the structural fix is two-pane.
+
+| # | Task | Owner | Effort | Model | Notes |
+|---|---|---|---|---|---|
+| 6.A1 | **Two-pane structural refactor** — replace `app/[workspace]/project/[projectId]/(home)/layout.tsx` shell. Left side: workspace sidebar (unchanged) → vertical icon rail (👁 / ⟨⟩ / 🗄 / ⚙) → permanent chat column ~380px. Right side: pure content pane keyed off icon rail selection. Default view = Preview. Coordinates with existing `ProjectStagePill`, `ProjectAssociationPrompt`, `--chat-panel-width` CSS var (now obsolete on project pages, kept elsewhere). | AI | 3 hrs | **opus** | Cross-cutting; touches layout, chat panel, header. Has to keep `ProjectHeaderUrls` working. |
+| 6.A2 | **PreviewPane component** — iframe of `previewUrl || fqdn`, with empty-state placeholder ("Your preview will appear here" + Vibn V mark). Reads from `useAnatomy()`. Exports `kind: "preview"` rendered by 6.A1's right pane. | AI | 1 hr | opus-spec → cheap | Spec: read `anatomy.hosting.previews[0].url` first, fall back to `anatomy.hosting.live[0].fqdn`, fall back to placeholder. Same poll cadence as `ProjectHeaderUrls`. Iframe sandboxed with `allow-scripts allow-forms allow-same-origin allow-popups`. |
+| 6.A3 | **Iframe chrome (artifact-local)** — top-right of the iframe: ↻ reload (force iframe `key` bump), ↗ open-in-new-tab, ⛶ fullscreen (toggles a panel-level `expanded` state that hides chat column), 📱 device-frame (desktop / tablet / mobile widths, persisted to localStorage per-project). | AI | 1 hr | cheap | Pure UI. Width tokens: desktop=100%, tablet=820px, mobile=390px. Frame is a CSS wrapper, not real device emulation. |
+| 6.A4 | **Code view** — when ⟨⟩ is selected on the rail, render the existing `gitea-file-tree.tsx` + `gitea-file-viewer.tsx` in the right pane. Two-column inside the right pane: tree on the left, viewer on the right. | AI | 30 min | cheap | Components already exist; just compose them. Shared `selectedPath` state. |
+| 6.A5 | **Resources view** — when 🗄 is selected, render the existing `database-table-tree.tsx` + `table-viewer.tsx` plus a small list of running services (from `anatomy.hosting.live[]` and `apps_containers_list`). | AI | 1 hr | cheap | Same wire-up pattern as 6.A4. |
+| 6.B1 | **Persist last-known dev server config** — new table `fs_project_dev_servers (project_id PK, command, port, framework, last_started_at, status)`. Hook `dev_server_start` MCP tool to upsert on success; `dev_server_stop` to flip status. | AI | 1 hr | opus-spec → cheap | Spec: schema is `project_id UUID PK, command TEXT NOT NULL, port INT NOT NULL, framework TEXT, last_started_at TIMESTAMPTZ, status TEXT CHECK IN ('running','stopped','crashed')`. Migration in `lib/db-postgres-migrations.ts` pattern. Upsert in `lib/dev-server-manager.ts` (or wherever `dev_server_start` lives — find via `Grep`). |
+| 6.B2 | **Auto-resume dev server on project page mount** — server-render hook on the new layout: if (a) saved server config exists AND (b) `getDevContainerStatus()` returns `running` or `provisioning` AND (c) no live preview already in `useAnatomy().hosting.previews[]` → fire the saved `dev_server_start` server-side BEFORE the page paints. User lands; preview is live. | AI | 2 hrs | **opus** | Risky if naive — could resume a server the user explicitly stopped, could thrash on idle-suspended containers, could race the existing on-mount `devcontainer_ensure`. Needs careful state-machine read. Idempotency comes from `dev_server_start` returning `alreadyRunning: true` when a process matches command+port. |
+| 6.C1 | **SSE `plan` event protocol** — server emits `{ type: "plan", taskId, text, status: "queued"\|"in_progress"\|"done" }` whenever `plan_task_add` / `plan_task_complete` (or a new `plan_task_start`) MCP tool fires inside a chat turn. Coexists with existing `text` and `toolCall` events. | AI | 2 hrs | **opus** | Protocol design — has to handle ordering (plan event must land before the tool's `toolResult`), client-side reconciliation with `fs_projects.plan.tasks[]` on next page load (server is source of truth, SSE is a hot stream), and the case where the AI calls `plan_task_complete` for a task added in a prior turn. |
+| 6.C2 | **Client TimelineEntry of `kind: "plan"`** — render a checklist with status circles (○ queued / ◐ in-progress / ● done) inside the assistant message timeline. Each new `plan` SSE event upserts by `taskId`. Ledger pattern matches the existing `kind: "text"` / `kind: "tool"` rendering in `chat-panel.tsx`. | AI | 1.5 hrs | opus-spec → cheap | Spec written into 6.C1's notes. Visual: indented under a "Plan" mini-header, same Outfit/Newsreader palette, status circles in `#a09a90` → `#3a3530` → `#1a1a1a`. |
+| 6.C3 | **Share + Publish buttons on the new shell** — top-right of the right pane (next to artifact chrome). Share = copy `previewUrl \|\| fqdn`. Publish = fire existing `ship` MCP tool with auto-generated commit message. | AI | 30 min | cheap | Both are existing tool calls; just buttons. |
+| 6.D1 | **⚙ Settings popover** — single popover off the icon rail's ⚙. Sections: Domain (shows current `fqdn`, link to rename), Sentry (link to project's Sentry dashboard from `projects.get`), Secrets (links to 6.D2), Quick-add database (fires 6.D3 modal), All project settings (link to `/[workspace]/project/[id]/settings`). | AI | 1 hr | cheap | Pattern matches existing `project-header-urls` popover. |
+| 6.D2 | **Project-level secret scratchpad** — new table `fs_project_secrets (project_id, key, value_encrypted, created_at, updated_at)`. Encryption-at-rest via existing `lib/crypto.ts` (or scaffold one if absent — use AES-GCM with a workspace-scoped key derived from a master `VIBN_SECRETS_KEY` env). MCP tools: `secrets_get { projectId, key }`, `secrets_set { projectId, key, value }`, `secrets_list { projectId }` (returns keys only). AI can read/write user-supplied API keys before they're injected into a deploy. | AI | 2 hrs | **opus** | Security-sensitive: encryption scheme, key rotation story, tenant isolation, what gets logged. Needs careful handling, not pattern-matching. |
+| 6.D3 | **Quick-add database modal** — fires `databases_create` MCP tool, blocks until `databases_get` returns a connection URL, surfaces the URL with a copy button + "I'll inject this into your app's env" affordance that calls `apps_envs_set` if a target app exists. | AI | 1 hr | cheap | Each step is an existing MCP call; modal orchestrates them. |
+
+**Definition of done:** open any project that's been built, the running
+preview is already live in the right pane without clicking anything;
+clicking ⟨⟩ shows the source tree; clicking 🗄 shows the databases;
+the AI's plan streams in as a checklist instead of paragraphs; ⚙
+opens a single popover with all project config one click away.
+
+**Sequencing inside Phase 6:**
+
+```
+6.A1 (structural) ──> 6.A2..A5 (panels) ──┐
+                                          ├── 6.D1 (settings popover)
+6.B1 (persist)   ──> 6.B2 (auto-resume) ──┘
+6.C1 (SSE) ──> 6.C2 (client renderer) ──> 6.C3 (share/publish)
+                                          └─ 6.D2 (secrets) ── 6.D3 (db modal)
+```
+
+A-track is the user-visible spine. B-track makes the spine populated
+on first paint. C-track makes the AI feel like it's working. D-track
+fills the settings drawer.
+
+**Out of scope for Phase 6 (intentional cuts, captured here so they
+don't get pulled in):**
+
+- Built-app authentication scaffolding (auth-as-a-service for users'
+  apps) — multi-week, real product call.
+- First-party connectors marketplace (Stripe / Twilio "click to add")
+  — multi-week per integration. AI can install via shell today.
+- Multi-model picker / "Plan vs. Build" toggle on the composer —
+  defer until BYOK lands and we have something to switch between.
+- Design-system picker on the composer — real product call.
+- Knowledge-base / RAG beyond scope-doc upload (3.7).
+- Server functions à la Bolt — different deploy model, not a gap.
+- Mobile preview QR — only matters once mobile is a real target.
+
+---
+
 ## Sequencing & dependencies
 
 ```
@@ -179,11 +270,15 @@ P2.4, P2.5, P2.6, P2.7 (parallel, low priority) │
  ├─ P4.2 (parallel)
  ├─ P4.3..4.8 (parallel)
  │
- └─ P5 (parallel; some pieces gated by P1)
+ ├─ P5 (parallel; some pieces gated by P1)
+ │
+ └─ P6 (gated by P1 + P5.7;
+    6.A1 unblocks the rest)
 ```
 
-P1 is the long pole. Everything else can mostly proceed in parallel once P1
-unblocks the iteration loop.
+P1 is the long pole. P6 is the next big-leverage move once smoke test
+passes — pre-invite UX upgrade, depends on previews (P1) and the
+auto-clone wiring (5.7) both being solid.
 
 ---
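
Reviewer sketch (not part of the patch): rows 6.C1 and 6.C2 above say each `plan` SSE event upserts by `taskId`, with the persisted `fs_projects.plan.tasks[]` as the source of truth on page load. The TypeScript below is one way that client-side reducer might look. The event fields come from row 6.C1; the `PlanTask` shape and the names `upsertPlanTask`, `applyPlanEvents`, and `renderChecklist` are hypothetical, since the plan does not specify how tasks are stored or rendered.

```typescript
// Event shape as quoted in row 6.C1.
type PlanStatus = "queued" | "in_progress" | "done";

interface PlanEvent {
  type: "plan";
  taskId: string;
  text: string;
  status: PlanStatus;
}

// ASSUMPTION: persisted tasks carry the same three fields as the event.
interface PlanTask {
  taskId: string;
  text: string;
  status: PlanStatus;
}

// Status circles from row 6.C2.
const STATUS_GLYPH: Record<PlanStatus, string> = {
  queued: "○",
  in_progress: "◐",
  done: "●",
};

// Upsert by taskId: replace a known task in place, append an unknown one.
// Returns a new array so it can serve as a state reducer.
function upsertPlanTask(tasks: readonly PlanTask[], event: PlanEvent): PlanTask[] {
  const next: PlanTask = { taskId: event.taskId, text: event.text, status: event.status };
  const index = tasks.findIndex((t) => t.taskId === event.taskId);
  if (index === -1) return [...tasks, next];
  return tasks.map((t, i) => (i === index ? next : t));
}

// Seed from the persisted plan, then fold the hot SSE stream over it.
// Seeding first covers a plan_task_complete for a task added in a prior turn.
function applyPlanEvents(
  persisted: readonly PlanTask[],
  events: readonly PlanEvent[],
): PlanTask[] {
  return events.reduce<PlanTask[]>(upsertPlanTask, [...persisted]);
}

// Plain-text stand-in for the checklist UI.
function renderChecklist(tasks: readonly PlanTask[]): string {
  return tasks.map((t) => `${STATUS_GLYPH[t.status]} ${t.text}`).join("\n");
}

// Usage: one task persisted from an earlier turn, two events in this turn.
const persistedTasks: PlanTask[] = [
  { taskId: "t1", text: "Scaffold API", status: "queued" },
];
const turnEvents: PlanEvent[] = [
  { type: "plan", taskId: "t1", text: "Scaffold API", status: "done" },
  { type: "plan", taskId: "t2", text: "Start dev server", status: "in_progress" },
];
console.log(renderChecklist(applyPlanEvents(persistedTasks, turnEvents)));
```

Keeping the reducer pure leaves the ordering requirement from 6.C1 (plan event before the tool's `toolResult`) as a server-side concern; the client only needs to apply events in arrival order.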