feat: enable marketing site registration and launch-prompt preservation (T12)

2026-06-06 18:16:19 -07:00
parent d1cb116e30
commit 135fc2d1e6
7 changed files with 153 additions and 625 deletions

View File

@@ -8,6 +8,10 @@
 >
 > **Drafted:** 2026-04-30. **Owner:** Mark + AI.
 >
+> **Scope note for AI:** this plan is about the **vibnai.com web product** beta — it is *not* the `vibn-code`
+> desktop thin-client effort (that's `VIBNCODE_THIN_CLIENT_CHANGES.md`). Treat dates/phases as historical;
+> verify status against the codebase before acting.
+>
 > **Scope:** Everything we agreed in the 2026-04-30 review that's NOT already
 > shipped. Pulls in the unfinished items from Path B (DNS, cert, previews,
 > eval) AND the "before strangers see this" gaps that Path B doesn't cover

View File

@@ -1,5 +1,9 @@
 # VibnCode: Cloud-Powered Agent Desktop IDE Architecture & Implementation Plan
+> **This is the original product VISION.** For the live, prioritized work (with exact files, steps, status, and
+> what's already shipped), use **`VIBNCODE_THIN_CLIENT_CHANGES.md`**. Infra/deploy details are in `VIBNDEV.md`;
+> new-thread bootstrap context is in `ai-new-thread.md`.
 **Project Name:** `vibncode` (formerly TalkCody)
 **Target Architecture:** Desktop Thin Client with Monaco + Native Cloud Hosting Integration
 **Backend Platform:** Vibnai Cloud Infrastructure (`vibn-frontend`, `vibn-agent-runner`, Gitea, Coolify)

View File

@@ -82,20 +82,56 @@ pnpm dev
 `.env.local` needs: `DATABASE_URL`, `NEXTAUTH_URL`, `NEXTAUTH_SECRET`, `NEXT_PUBLIC_DEV_LOCAL_AUTH_EMAIL`, `NEXT_PUBLIC_DEV_BYPASS_PROJECT_AUTH`, `GOOGLE_API_KEY`, `COOLIFY_*`, `GITEA_*`, `VIBN_SECRETS_KEY`, plus optionally `VIBN_CHAT_PROVIDER=deepseek` and `DEEPSEEK_API_KEY`.
-## Deploy vibn-frontend
+## Git topology & deploying apps
+**`master-ai` is ONE git repo.** `vibn-frontend/`, `vibn-agent-runner/`, and `vibn-api/` are **subfolders** of it
+(not separate repos). `vibn-code/` is a **nested submodule** with its own `.git`. Each cloud app builds from its
+**own Gitea remote**, from the matching subfolder (Coolify's base-directory points at the subfolder):
+| App | Coolify app uuid | Push remote (run from anywhere in `master-ai`) | Builds from subfolder |
+|---|---|---|---|
+| vibn-frontend | `y4cscsc8s08c8808go0448s0` | `coolify_gitea` | `vibn-frontend/` |
+| vibn-agent-runner | `jss08wssogw4kw8gok0sk0w0` | `coolify_agent_gitea` | `vibn-agent-runner/` |
+| vibn-api | `m84cc4wsc0ckws8g8k44kkk8` | `coolify_api_gitea` | `vibn-api/` |
+- `master-ai.git` (`gitea` remote) and GitHub (`origin`) are **share/mirror only — builds do NOT use them.**
+- Secret `.env*` files at the repo root are **gitignored** (verified). Never commit them.
+- These remotes share history, so `git push <remote> HEAD:main` fast-forwards (no force needed).
+### Deploy steps (any app)
 ```sh
-cd /Users/markhenderson/master-ai/vibn-frontend
-git add -A && git commit -m "message" && git push origin main
+cd /Users/markhenderson/master-ai
+# 1. Commit the change (stage only the app's subfolder to keep commits scoped)
+git add vibn-agent-runner/ && git commit -m "message"
-# Then trigger deploy (correct endpoint for Coolify v4):
+# 2. Push to the app's deploy remote's main branch
+git push coolify_agent_gitea HEAD:main # runner
+# git push coolify_gitea HEAD:main # frontend
+# 3. Trigger the Coolify deploy (correct endpoint for Coolify v4)
 source /Users/markhenderson/master-ai/.coolify.env
-curl -s -X POST \
-  -H "Authorization: Bearer $COOLIFY_API_TOKEN" \
-  "$COOLIFY_URL/api/v1/deploy?uuid=y4cscsc8s08c8808go0448s0"
+curl -s -X POST -H "Authorization: Bearer $COOLIFY_API_TOKEN" \
+  "$COOLIFY_URL/api/v1/deploy?uuid=jss08wssogw4kw8gok0sk0w0" # runner uuid; use the frontend uuid for the frontend
 ```
-**Note:** `/api/v1/applications/{uuid}/start` or `/deploy` returns 404 on Coolify v4. The correct deploy path is `/api/v1/deploy?uuid=...`. Add `&force=true` to force a full rebuild.
+**Notes:**
+- `/api/v1/applications/{uuid}/start` or `/deploy` returns 404 on Coolify v4. The correct deploy path is `/api/v1/deploy?uuid=...`. Add `&force=true` to force a full rebuild.
+- The runner builds from `vibn-agent-runner/Dockerfile`, which runs `npm run build` (tsc) on `src/` — you do **not** need to hand-build `dist/` for the deploy (but keeping `dist/` in sync is tidy).
+## The agent runner (chat backend)
+`vibn-agent-runner` (FQDN `https://agents.vibnai.com`, port 3333) is what actually answers desktop/web chat:
+- Frontend `POST /api/projects/:id/agent/sessions` inserts an `agent_sessions` row and fire-and-forgets
+  `POST {AGENT_RUNNER_URL}/agent/execute` to the runner. The runner clones the project's Gitea repo, runs the
+  **Coder** agent, and `PATCH`es output/status back to the session row (auth via `x-agent-runner-secret`).
+- The desktop/web then polls `GET /api/projects/:id/agent/sessions/:sid` for streamed output.
+- **Model:** set by the runner env `GEMINI_MODEL` (currently `gemini-3.1-pro-preview`). The desktop model picker
+  is cosmetic until model-passthrough is wired.
+- Health check: `curl https://agents.vibnai.com/health` → `{"status":"ok"}`.
+- The happy path of `/agent/execute` has **no logging** — only failures log. To inspect:
+  `gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a --project=master-ai-484822 --command="sudo docker logs --tail 100 jss08wssogw4kw8gok0sk0w0-<suffix>"` (find the exact container name with `docker ps`).
 ## Coolify API Reference
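
Reviewer sketch (not part of this commit): the deploy notes above name a single working endpoint, `/api/v1/deploy?uuid=...`, with an optional `&force=true`. A minimal TypeScript helper that assembles that URL could look like the following. The function name `buildDeployUrl` and the example host are assumptions made for illustration; only the path and query parameters come from the notes.

```typescript
// Hypothetical helper: builds the Coolify v4 deploy URL described in the notes.
// `baseUrl` plays the role of $COOLIFY_URL; `appUuid` is one of the app uuids in the table.
function buildDeployUrl(baseUrl: string, appUuid: string, force: boolean = false): string {
  // Drop trailing slashes so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  const query = "uuid=" + encodeURIComponent(appUuid) + (force ? "&force=true" : "");
  return root + "/api/v1/deploy?" + query;
}

// Example with a placeholder host; the uuid is the runner's from the table above.
const runnerDeployUrl = buildDeployUrl(
  "https://coolify.example.com",
  "jss08wssogw4kw8gok0sk0w0",
  true,
);
```

The result would still be sent as a `POST` with the `Authorization: Bearer` header, as in the `curl` call.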

View File

@@ -59,8 +59,14 @@ DO NOT treat `master-ai` as a single monorepo on Gitea. You must push changes in
 * `coolify_agent_gitea` : `https://git.vibnai.com/mark/vibn-agent-runner.git`
 * `coolify_gitea` : `https://git.vibnai.com/mark/vibn-frontend.git`
 * `coolify_api_gitea` : `https://git.vibnai.com/mark/vibn-api.git`
-* `gitea` : `https://git.vibnai.com/mark/master-ai.git`
-* `origin` : `https://github.com/MawkOne/master-ai.git`
+* `gitea` : `https://git.vibnai.com/mark/master-ai.git` *(share-only: for a coworker's local setup; **builds do NOT use this**)*
+* `origin` : `https://github.com/MawkOne/master-ai.git` *(GitHub mirror)*
+**How deploys actually work:** `master-ai` is a single git repo. Each cloud app builds from its **own** Gitea
+remote, from the matching subfolder. To ship a change, commit in `master-ai`, then
+`git push <remote> HEAD:main` (e.g. `git push coolify_agent_gitea HEAD:main` for the runner), then trigger the
+Coolify deploy for that app (see `VIBNDEV.md`). `vibn-code` is a nested submodule with its own `.git` — commit &
+push it via its own `origin`. Secret `.env*` files at the repo root are gitignored — never commit them.
 ---
@@ -128,13 +134,30 @@ VibnCode overrides local OS actions to communicate with your cloud containers (o
 ---
-## 6. Where We Left Off (As of May 28, 2026)
+## 6. Where We Left Off (As of May 31, 2026)
-* **Deep-Link Protocol Scheme Resolved**:
-  Fixed `src-tauri/Info.plist` which was still configured with `com.talkcody` / `talkcody`. macOS Launch Services now correctly maps `vibncode://` deep links directly to the local dev app.
-* **Rust Compiling Errors Resolved**:
-  Patched cargo clippy errors in `dashscope.rs`, `openai_responses_protocol.rs`, and `openai_responses_ws.rs` (collapsed match statements and annotated unused structs).
-* **Repositories Synchronized**:
-  Merged, committed, and pushed all updated code:
-  * `vibn-code` pushed to Gitea `origin main`.
-  * `vibn-agent-runner` and `vibn-frontend` modifications pushed to `coolify_agent_gitea` and `coolify_gitea` on branch `frontend-deploy-13`.
+**Read `VIBNCODE_THIN_CLIENT_CHANGES.md` first** — it is the live, prioritized change list with exact files,
+steps, and acceptance criteria for the thin-client conversion, plus a STATUS section of what's done.
+**Chat works end-to-end.** A desktop message → `POST /api/projects/:id/agent/sessions` → cloud runner executes
+the Coder agent (Gemini) → output polled back into the Monaco chat. Recent fixes that got it there:
+* **Local SQLite was wiping chats (fixed):** `database-service.ts` used `INSERT OR REPLACE INTO projects`, which
+  (via `ON DELETE CASCADE`) deleted the active conversation mid-run. Switched to UPSERT; made `task-service`
+  persistence non-blocking. The cloud is the source of truth; local SQLite is just a cache.
+* **Empty `appPath` broke every run (fixed):** the desktop sent `appPath: ""`; the runner's `/agent/execute`
+  rejects falsy `appPath` with HTTP 400 and does nothing (no logs). Desktop now sends `appPath: "."`.
+* **Agent tools `fetch failed` (fixed, pushed):** the runner's `buildContext()` hardcoded
+  `vibnApiUrl: 'http://localhost:3000'` and an empty `mcpToken`, so tool calls fetched a dead port. Now
+  `/agent/execute` reads `mcpToken` from the body and sets `ctx.vibnApiUrl` (from `VIBN_API_URL`) + `mcpToken`.
+  Pushed to `coolify_agent_gitea/main` — confirm the runner redeploy.
+* **Single model:** desktop model picker restricted to the VibnAI model, relabeled "Gemini 3.5 Flash". The
+  runner's real model is set by `GEMINI_MODEL` env (currently `gemini-3.1-pro-preview`); the desktop label is
+  cosmetic until model-passthrough is wired (CHANGE 4.1 in the change doc).
+**Known open items (in the change doc):** the desktop still has a hardcoded `vibn_sk_` API key to remove;
+`/agent/sessions/:id/stop` returns 401 to the desktop (uses browser-session auth, not the workspace key); runner
+early-failures are silently swallowed (failure PATCHes omit the `x-agent-runner-secret` header).
+**Earlier (still true):** `vibncode://` deep link scheme is registered in `src-tauri/Info.plist`; Rust clippy is
+treated as errors on commit.
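
Reviewer sketch (not part of this commit): the empty-`appPath` fix above amounts to never sending a falsy path to `/agent/execute`. A small client-side guard, written here in TypeScript under the assumption that the desktop builds the request body itself, might look like this. `normalizeAppPath` is a hypothetical name; the `"."` default is the value the notes say the desktop now sends.

```typescript
// Hypothetical guard: fall back to "." (repo root) when appPath is missing or blank,
// because the runner is described as rejecting falsy appPath values with HTTP 400.
function normalizeAppPath(appPath: string | null | undefined): string {
  const trimmed = (appPath === null || appPath === undefined ? "" : appPath).trim();
  return trimmed.length > 0 ? trimmed : ".";
}

// Building a request body with the guard applied (other body fields omitted).
const executeBody = { appPath: normalizeAppPath("") };
```

Centralizing the default in one function keeps the `""` regression from reappearing in other call sites.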

File diff suppressed because one or more lines are too long

View File

@@ -1,601 +0,0 @@
# Vibn Chat Harness — Fix Checklist
Work through items in order. Each fix has a clear **What**, **Where**,
**How**, and **Verify** section. Don't skip the verify step — many of
these fixes interact with each other and silent failures will
compound.
Mark `[x]` as you complete each item. If you can't complete an item,
add a short note under it explaining why and move on.
---
## Phase 1 — Backend fixes (highest leverage; do these first)
These three fix the failure modes the prompt currently promises but
the backend doesn't deliver. Until they're done, the prompt's hard
rules are partly fiction.
### [ ] 1. Add `sha256` and `bytes` to `fs.write` and `fs.edit` responses
**What:** The prompt's hard rules tell the model to cite `sha256` and
`bytes` as evidence of file changes. The tools don't return those
fields today, so the model is looking for evidence that doesn't exist.
**Where:** `app/api/mcp/route.ts` — functions `toolFsWrite` and
`toolFsEdit`.
**How:**
- In `toolFsWrite`, after the `runFsCmd` success branch, before
returning, compute the sha256 of `content` and return it alongside
`bytesWritten` renamed to `bytes`:
```ts
import { createHash } from "crypto";
// ...
const bytes = Buffer.byteLength(content, "utf8");
const sha256 = createHash("sha256").update(content, "utf8").digest("hex");
return NextResponse.json({
result: { ok: true, path, bytes, sha256 },
});
```
- In `toolFsEdit`, you don't have the final content in memory. Add a
second command that prints the sha + bytes after the edit:
```ts
const cmd = `python3 -c "$(printf %s ${shq(pyB64)} | base64 -d)" <<< "$(printf %s ${shq(b64)} | base64 -d)" && echo "---" && sha256sum ${shq(path)} | cut -d' ' -f1 && wc -c < ${shq(path)}`;
```
Then parse the trailing two lines after `---` for sha and bytes.
- Update the response shape:
```ts
return NextResponse.json({
result: { ok: true, path, replacements, bytes, sha256 },
});
```
**Verify:**
- [ ] Call `fs_write` with `{ path: "test.txt", content: "hello" }`.
Confirm response contains `sha256` (64 hex chars) and `bytes: 5`.
- [ ] Call `fs_edit` to change the same file. Confirm response
contains a new `sha256` and updated `bytes`.
- [ ] Replay a turn that does `fs_write` followed by `fs_read` of the
same file in chat. The model should now produce text like
"Updated `test.txt` (sha=a3f5c2…, 5b)" instead of a bare claim.
---
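
A possible shape for the "parse the trailing two lines after `---`" step in item 1, as a sketch only: it assumes the command's stdout ends with the `---` marker, then the hex digest, then the byte count, exactly as the `cmd` string in item 1 arranges. The names `EditEvidence` and `parseEditEvidence` are hypothetical and do not exist in `app/api/mcp/route.ts`.

```typescript
interface EditEvidence {
  sha256: string;
  bytes: number;
}

// Reads the digest and byte count that follow the last "---" marker line.
// Returns null when the marker or either value is missing or malformed.
function parseEditEvidence(stdout: string): EditEvidence | null {
  const lines = stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const marker = lines.lastIndexOf("---");
  if (marker === -1 || lines.length < marker + 3) return null;
  const sha256 = lines[marker + 1];
  const bytes = parseInt(lines[marker + 2], 10);
  if (!/^[0-9a-f]{64}$/.test(sha256) || isNaN(bytes)) return null;
  return { sha256, bytes };
}
```

Trimming each line also absorbs the leading spaces that some `wc` implementations print before the count.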
### [ ] 2. Add project-slug scoping to `normalizeFsPath`
**What:** The prompt tells the model to use paths like `src/app/page.tsx`
and claims the tool layer rejects writes outside the project root.
The tool layer does NOT do this today. It resolves all relative paths
under `/workspace` (workspace-level), so `fs_write { path: "src/app/page.tsx" }`
ends up at `/workspace/src/app/page.tsx` — the ghost file from the
failing session. Five different path conventions were used for the
same file in one session because nothing enforces the rule.
**Where:** `app/api/mcp/route.ts` — function `normalizeFsPath` and
every caller in `toolFsRead`, `toolFsWrite`, `toolFsEdit`,
`toolFsList`, `toolFsDelete`, `toolFsGlob`, `toolFsGrep`, `toolFsTree`,
`toolRequestVisualQA`, `toolGenerateMedia`.
**How:**
- Change `normalizeFsPath` to accept an optional `projectSlug`:
```ts
function normalizeFsPath(
p: string,
projectSlug?: string,
): string | NextResponse {
if (!p || typeof p !== "string") {
return NextResponse.json(
{ error: 'Param "path" is required' },
{ status: 400 },
);
}
const projectRoot = projectSlug ? `${FS_ROOT}/${projectSlug}` : FS_ROOT;
let abs: string;
if (p.startsWith("/")) {
abs = p;
} else {
abs = `${projectRoot}/${p}`.replace(/\/+/g, "/");
}
const norm = abs.replace(/\/[^/]+\/\.\.(?=\/|$)/g, "").replace(/\/+/g, "/");
// When projectSlug is set, REJECT paths outside the project root.
if (projectSlug) {
if (!norm.startsWith(projectRoot) && norm !== projectRoot) {
return NextResponse.json(
{
ok: false,
error: `PATH_OUTSIDE_PROJECT: path "${p}" resolves to "${norm}" which is outside the active project at "${projectRoot}". Did you mean "${projectRoot}/${p.replace(/^\/+/, "")}"?`,
},
{ status: 400 },
);
}
} else {
// Workspace-level fallback (legacy behaviour)
if (!norm.startsWith(FS_ROOT) && norm !== FS_ROOT) {
return NextResponse.json(
{ error: `Path "${p}" is outside ${FS_ROOT}; use shell.exec for system paths.` },
{ status: 400 },
);
}
}
return norm;
}
```
- In every fs_* tool that already calls `resolveProjectOr404`, pass
the slug:
```ts
const path = normalizeFsPath(String(params.path ?? ""), project.slug);
```
- `toolFsRead`, `toolFsWrite`, `toolFsEdit`, `toolFsDelete`,
`toolRequestVisualQA`, `toolGenerateMedia` all already have
`project` in scope — pass `project.slug`.
- `toolFsList`, `toolFsGlob`, `toolFsGrep`, `toolFsTree` use
`params.path` or `params.cwd` — same treatment, pass `project.slug`.
**Verify:**
- [ ] From a project-scoped thread, call `fs_write { path: "/workspace/src/app/page.tsx", content: "x" }`.
Expect `PATH_OUTSIDE_PROJECT` error.
- [ ] From the same thread, call `fs_write { path: "src/app/page.tsx", content: "x" }`.
Expect success at `/workspace/<slug>/src/app/page.tsx`.
- [ ] Confirm `dev_server_start` with `command: "npm run dev"` runs
from the project root, not `/workspace`. (This is mostly already
true via dev-container logic; just confirm.)
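
One detail worth checking when implementing item 2: a bare string-prefix test accepts any sibling directory whose name merely begins with the slug (for a root of `/workspace/app`, the path `/workspace/app-evil/x` has the same prefix). The sketch below compares on a path-segment boundary instead. It is an illustration under assumptions, not the route's actual code: `resolveAgainst` and `isInsideProject` are hypothetical names, and paths are assumed to be POSIX-style.

```typescript
// Joins `p` onto `root` unless `p` is absolute, then collapses "." and ".." segments.
function resolveAgainst(root: string, p: string): string {
  const joined = p.charAt(0) === "/" ? p : root + "/" + p;
  const out: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop(); // climbing above "/" is a no-op
      continue;
    }
    out.push(seg);
  }
  return "/" + out.join("/");
}

// True only when `p` resolves to the project root itself or to something beneath it.
function isInsideProject(projectRoot: string, p: string): boolean {
  const root = resolveAgainst("/", projectRoot);
  const resolved = resolveAgainst(root, p);
  return resolved === root || resolved.indexOf(root + "/") === 0;
}
```

With a root of `/workspace/app`, `src/app/page.tsx` passes, while `/workspace/src/app/page.tsx`, `../other/x`, and `/workspace/app-evil/x` are all rejected.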
---
### [ ] 3. Fix the broken `plan-extract` block
**What:** The fire-and-forget `plan-extract` block in
`app/api/chat/route.ts` has a syntax error — the `try` block builds a
transcript and then hits `}` followed by `catch` with no actual call
to `autoExtractPlanUpdates`. The body of the try is missing. Either
the auto-extraction was intentionally removed (in which case the
dead transcript-building code should also be deleted) or it was
accidentally truncated (in which case the call needs to be restored).
**Where:** `app/api/chat/route.ts`, around the second fire-and-forget
block (after the title/summary block, before `emit({ type: "done" })`).
**How:**
- Decide first: do we want auto-extraction to run? If YES, restore
the call:
```ts
(async () => {
try {
if (!threadProjectId) return;
const allMessages = [...history, finalMsg];
if (allMessages.length < 2) return;
const transcript = allMessages
.map((m) => {
const text =
typeof m.content === "string"
? m.content
: JSON.stringify(m.content);
return `${m.role.toUpperCase()}: ${text.slice(0, 1200)}`;
})
.join("\n\n");
const result = await autoExtractPlanUpdates(
threadProjectId,
transcript,
);
if (result) {
console.log(
"[chat] plan-extract:",
`${result.tasks} tasks, ${result.decisions} decisions, vision=${result.vision}`,
);
}
} catch (err) {
console.warn("[chat] plan-extract failed (non-fatal):", err);
}
})().catch(() => {});
```
And re-add the `import { autoExtractPlanUpdates } from "@/lib/ai/plan-extract";`
at the top of the file.
- If NO (you removed it intentionally), delete the entire IIFE
including the transcript-building so the file compiles cleanly.
**Verify:**
- [ ] Run `tsc --noEmit` on the file. Confirm no syntax errors.
- [ ] If auto-extraction restored: have a chat that mentions a
decision (e.g. "let's use Postgres"). Confirm a new entry appears
in the project's `plan.decisions` with `confidence: "auto"`.
- [ ] Tail prod logs for `[chat] plan-extract:` — should fire on
every turn with content.
---
## Phase 2 — Prompt fixes (now that the backend matches)
These bring the prompt into line with what the tools actually do.
### [ ] 4. Fix the `apps_containers_list` typo in the prompt
**What:** The troubleshooting section references `apps_containers_list`
but the actual tool is `apps_containers_ps`. The model will call a
tool that doesn't exist.
**Where:** `app/api/chat/route.ts`, inside `buildSystemPrompt`, in the
"## Troubleshooting" section.
**How:**
- Find: `apps_logs { uuid } + apps_containers_list { uuid }`
- Replace: `apps_logs { uuid } + apps_containers_ps { uuid }`
**Verify:**
- [ ] Grep the prompt for `apps_containers_list` — no matches.
- [ ] Grep for `apps_containers_ps` — should appear in
troubleshooting and at least once in the apps section.
---
### [ ] 5. Soften the `ok` field rule
**What:** The current rule says "If `ok` is false (or absent, or
`exitCode` is non-zero, or `healthCheck.status` is >= 400) the
operation FAILED." The "or absent" clause is wrong — many tools
return data without an `ok` field (e.g. `projects_get`, `apps_list`,
`databases_get`). The model will treat every read as a failure.
**Where:** `buildSystemPrompt`, "Hard rules" section, the "Trust the
`ok` field" bullet.
**How:**
Replace the current rule with:
```
- **Read tool results carefully.** A tool FAILED when ANY of these
signals are present: `ok: false`, `error: "..."`, a non-zero
`exitCode`, or a `healthCheck.status` >= 400. If NONE of those
signals are present, look at the actual content of the response
to decide whether the operation succeeded. Many read-only tools
return data directly without an `ok` field — that's not a failure.
```
**Verify:**
- [ ] Pick a recent thread where the agent called `projects_get` or
`apps_list`. Confirm the agent didn't treat the response as a
failure (look at its post-tool text — should be a normal summary,
not "the operation failed").
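
The replacement rule in item 5 can be read as a small predicate, which may help when spot-checking threads. The sketch below is an illustration only: `ToolResult` is an assumed, loosely typed shape covering the four signals the rule names, and `isToolFailure` is a hypothetical helper rather than something in the codebase.

```typescript
// Assumed shape: only the fields named in the rule are typed; anything else passes through.
interface ToolResult {
  ok?: boolean;
  error?: unknown;
  exitCode?: number;
  healthCheck?: { status?: number };
  [key: string]: unknown;
}

// A result counts as failed only when an explicit failure signal is present.
// A missing `ok` field is not treated as a failure.
function isToolFailure(result: ToolResult): boolean {
  if (result.ok === false) return true;
  if (typeof result.error === "string" && result.error.length > 0) return true;
  if (typeof result.exitCode === "number" && result.exitCode !== 0) return true;
  const status = result.healthCheck ? result.healthCheck.status : undefined;
  if (typeof status === "number" && status >= 400) return true;
  return false;
}
```

A read-only response such as `{ apps: [] }` carries none of the four signals, so it is not flagged, which is the behavior the softened rule asks for.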
---
### [ ] 6. Tighten the status-nudge threshold
**What:** Current thresholds are `roundsSinceText >= 8 ||
toolCallsSinceText >= 12`. With `MAX_TOOL_ROUNDS = 8`, the round-based
nudge can never fire (loop ends first). The tool-call threshold of 12
is also too lenient — users typed "test" / "hello" by round 4-5 of
silence in the failing session.
**Where:** `app/api/chat/route.ts`, near the top of the main while
loop, the `isSilent` constant.
**How:**
```ts
const isSilent = roundsSinceText >= 3 || toolCallsSinceText >= 6;
```
**Verify:**
- [ ] Replay a chat that triggers 6+ tool calls without text. Confirm
the `[STATUS NUDGE]` system addendum is injected before the next
round.
- [ ] Confirm the model produces a one-line status sentence in
response to the nudge.
---
### [ ] 7. Update the path-convention guidance in the prompt
**What:** After Fix 2 ships, the path convention is now enforced. The
prompt should state this plainly without the "cd into your project"
workaround.
**Where:** `buildSystemPrompt`, inside `activeBlock`, the path
guidance section. Also inside "Dev servers" → "Directory" bullet.
**How:**
Replace the "Directory" bullet under "Dev servers":
```
- **Directory:** Tool paths are scoped to your project root
automatically. Pass `command: "npm run dev"` directly — no `cd`
prefix needed. The tool rejects any `fs_*` write outside
`/workspace/<slug>/`.
```
And after the "Project repo is auto-cloned" paragraph in
`activeBlock`, add:
```
**Path convention for fs_* tools:** Pass paths relative to the
project root — `src/app/page.tsx`, NOT `/workspace/<slug>/src/app/page.tsx`
and NOT `<slug>/src/app/page.tsx`. The tool layer rejects writes
outside the project root with a `PATH_OUTSIDE_PROJECT` error
suggesting the corrected path.
```
**Verify:**
- [ ] In a fresh chat, ask the agent to "edit the homepage". Confirm
the first `fs_read` call uses `src/app/page.tsx` (no slug prefix,
no `/workspace/` prefix).
---
## Phase 3 — Polish and safety nets
These are lower-priority but each removes a small foot-gun.
### [ ] 8. Add `fs_tree` recommendation to first-turn behavior
**What:** The agent ran `fs_tree` 5 times and `fs_glob` 9+ times in
the failing session, re-discovering paths it should have learned once.
The tool description already says "ALWAYS call this first" but the
prompt doesn't reinforce it.
**Where:** `buildSystemPrompt`, in the "Writing code" section.
**How:**
Add this near the top of "Writing code — dev container is the default":
```
**Orient yourself once.** On the first code-modifying turn of a
chat, call `fs_tree` once to learn the repo layout. Don't re-run it
on every turn — the layout doesn't change between user messages.
```
**Verify:**
- [ ] Manual review of the next 5 sessions: confirm `fs_tree` is
called at most once per chat (not per turn).
---
### [ ] 9. Add `browser_navigate` and `browser_console` as verification primitives
**What:** The backend has `browser_navigate` and `browser_console`
tools that headlessly render a page and capture console errors. The
prompt never mentions them. These are the missing post-deploy
verification step that the `healthCheck` field gestures at.
**Where:** `buildSystemPrompt`, in the "Dev servers" subsection or as
its own section after "Visual QA".
**How:**
Add a new bullet right after "Visual QA" guidance:
```
**Verify the page actually renders:**
- After `dev_server_start` returns a `previewUrl` AND `healthCheck.status === 200`,
for any UI-facing turn, call `browser_console { url: previewUrl }` to
capture frontend console errors. Hydration errors, missing assets,
and uncaught exceptions show up here even when the server is
technically "running".
- If `browser_console` returns errors, fix them with `fs_edit`
before declaring done. A green `healthCheck` plus a clean console
is the real "done" signal for UI work.
- Skip this for backend / SQL / config-only changes.
```
**Verify:**
- [ ] On the next UI-modifying chat, confirm `browser_console` is
called once after `dev_server_start`.
- [ ] Confirm any errors it returns get acknowledged in the agent's
reply.
---
### [ ] 10. Add the market research stack to the prompt
**What:** The backend exposes six market research tools
(`market_categories_suggest`, `market_research_run`, `market_seo_analyze`,
`tech_stack_analyze`, `market_competitor_research`,
`market_aggregate_insights`). The prompt never mentions them. For a
non-technical-founder product, these are some of the highest-leverage
tools — they answer "should I build for dentists or summer camps?"
with real TAM counts.
**Where:** `buildSystemPrompt`, add a new section after "Common
questions → tools" or before "How to deploy".
**How:**
Add this section:
```
## Helping the user pick what to build
Vibn has a market-research toolkit for non-technical founders who
need data on their target niche. Use it when the user is undecided,
validating an idea, or comparing markets:
- **"How big is the market for X in <location>?"** → `market_categories_suggest { niche }` to
propose Google Business categories, then `market_research_run` after
the user approves. Returns TAM count, sample domains, and review
data. NOTE: `market_research_run` costs real money — always confirm
with the user and pass `user_explicitly_approved: true`.
- **"What are competitors spending on Google Ads?"** → `market_seo_analyze { domain }`.
Returns organic traffic, paid traffic, ad spend, and top paid
keywords. Use to tell the user how aggressive a market is.
- **"What software do these businesses already use?"** → `tech_stack_analyze { urls, software_category_id }`.
Detects WordPress, Shopify, named competitors, and any custom
domains/scripts you pass. Use to find "X businesses use WordPress
but lack Y" market gaps.
- **"What are customers complaining about?"** → `market_aggregate_insights { category, location }`.
Returns top review topics — use the actual words customers use as
marketing copy and value-prop seeds.
- **"Who are the players in this niche?"** → `market_competitor_research { niche }`.
Returns proprietary competitors with pricing AND open-source
alternatives that could be forked.
These are conversational research tools — they don't build anything.
Use them BEFORE scaffolding when the user is exploring direction;
SKIP them once the user has committed to building.
```
**Verify:**
- [ ] In a fresh chat, ask the agent "should I build for dentists or
summer camps?". Confirm it proposes using `market_categories_suggest`
or `market_aggregate_insights` rather than guessing.
---
### [ ] 11. Surface `apps_exec`, `auth_create`, `generate_media`, `storage_*` briefly
**What:** Several capabilities the agent has are completely absent
from the prompt. The agent doesn't know it can run commands inside
production containers, deploy real auth servers, generate images, or
wire S3 storage. One-line mentions are enough.
**Where:** `buildSystemPrompt`, scattered across existing sections.
**How:**
- Under "Common questions → tools", add:
```
- "Run a migration / psql in prod" → `apps_exec { uuid, command }`.
- "Generate a hero image / illustration" → `generate_media { prompt, type, outputPath }`.
- "Wire up file storage / uploads" → `storage_provision` (if not already), then `storage_inject_env { uuid }`.
```
- In the "Decision defaults" section, augment the auth bullet:
```
- **Auth:** NextAuth with email magic-link for in-app auth.
Deploy a separate Pocketbase / Authentik / Keycloak service via
`auth_create { provider }` only if the user needs SSO, multi-app
SSO, or admin user management.
```
**Verify:**
- [ ] Ask the agent "can you run a SQL migration on prod?". Confirm
it references `apps_exec`.
- [ ] Ask "I need a hero image for the landing page". Confirm it
references `generate_media`.
---
### [ ] 12. Flip the `fs_edit` preference to match the tool's documentation
**What:** The tool description says `startLine`/`endLine` is
preferred and `oldString` is the fallback. The prompt currently says
the opposite ("prefer `oldString` for small replacements"). The tool
is authoritative — match it.
**Where:** `buildSystemPrompt`, in the "Iterate" bullet under "Writing
code", the `fs_edit` guidance.
**How:**
Replace the current `fs_edit` guidance with:
```
- `fs_read` / `fs_write` / `fs_edit { path, oldString, newString, startLine, endLine }`.
**For `fs_edit`:** prefer `startLine`/`endLine` (deterministic; never
fails on duplicate strings). Use `oldString` only when you cannot
read the file first to get line numbers — and when you do, include
2-3 lines of surrounding context for uniqueness. If `fs_edit` keeps
failing, do NOT escape to `shell_exec` with patch scripts — read
the file fresh with `fs_read`, get the line numbers, and try again.
```
**Verify:**
- [ ] Confirm next 3 `fs_edit` calls in the wild use `startLine`/`endLine`,
not `oldString`.
---
## Phase 4 — Monitoring (after Phases 1-3 land)
Once the changes are in production, watch these for a week. Tune if
the numbers don't move.
### [ ] 13. Track the recovery-summary fire rate
**What:** The `needsRecovery` path runs when a turn ends badly (hit
the round cap, hit a loop, or last tool returned failure). It should
fire on <10% of turns. If it fires more often, the cap is too low or
the model is hitting real bugs.
**How:** Add a metric. In the `needsRecovery` block, before calling
`callVibnChat`, emit:
```ts
console.log("[chat] recovery_fired", {
turnId,
reason: loopBreakReason ? "loop"
: round >= MAX_TOOL_ROUNDS ? "round_cap"
: assistantText.trim().length === 0 ? "no_text"
: "tool_failure",
toolCalls: assistantToolCalls.length,
});
```
**Verify:**
- [ ] Aggregate over 1 week. If `round_cap` is the dominant reason
and the turns look legitimate, raise `MAX_TOOL_ROUNDS` to 10 or 12.
- [ ] If `loop` is dominant, the fingerprinter may need tuning.
---
### [ ] 14. Track conversational-guard fire rate
**What:** The first-turn conversational guard (regex match on user
message → no tools on round 1) is the biggest single behavioral
change in this revision. We want to know how often it fires and
whether it ever causes a problem.
**How:** Before the loop, after computing `firstMessageIsConversational`:
```ts
console.log("[chat] turn_start", {
turnId,
firstMessageIsConversational,
messagePreview: message.trim().slice(0, 80),
});
```
**Verify:**
- [ ] Aggregate over 1 week. Should fire on ~30-40% of first messages
in a chat.
- [ ] Spot-check 10 cases where it fired: confirm none were
legitimate "build me X" requests being miscategorized.
---
### [ ] 15. Track `PATH_OUTSIDE_PROJECT` rejections
**What:** After Fix 2 ships, this error tells you how often the model
was about to write to the wrong path. Should taper toward zero as the
prompt guidance + enforcement settles in.
**How:** Server-side log in `normalizeFsPath` when the project-scoped
rejection fires.
**Verify:**
- [ ] Week 1: count rejections per day.
- [ ] Week 2: count should be lower (model has internalized the rule
via the error messages it's seen).
---
## What this checklist deliberately doesn't include
A few things from earlier reviews that I'm intentionally leaving off:
- **Phase-aware behavior.** Phases were removed from the product; the
prompt no longer references them. No work needed.
- **Codebase summary auto-generation.** Lower priority once Fix 2
ships (paths can no longer drift) and Fix 8 lands (one `fs_tree`
per chat instead of nine).
- **Tool history reconstruction for DeepSeek compatibility.** Already
shipped in the current code via the `_rawToolResults` compact
summary. No additional work.
Don't add work for these unless a clear failure mode appears in
production after Phases 1-3 land.

View File

@@ -2354,7 +2354,7 @@ function Closing() {
 <div className="closing-cta">
   <div className="row">
-    <a href="Beta Signup.html" className="btn btn-primary">
+    <a href="/auth?new=1" className="btn btn-primary">
       Request invite <Arrow />
     </a>
     <a href="#how" className="btn btn-ghost">
@@ -2598,6 +2598,17 @@ function LaunchModal({ prompt, onClose }) {
     return () => window.removeEventListener("keydown", onKey);
   }, [onClose]);
+  // Preserve prompt for onboarding seeding (T12)
+  useEffect(() => {
+    if (typeof window !== "undefined" && prompt) {
+      try {
+        localStorage.setItem("vibn:firstName", prompt);
+      } catch (err) {
+        console.error("Failed to save hero prompt to localStorage:", err);
+      }
+    }
+  }, [prompt]);
   const [step, setStep] = useState(0);
   useEffect(() => {
     if (step >= 4) return undefined;
@@ -2605,6 +2616,17 @@ function LaunchModal({ prompt, onClose }) {
     return () => clearTimeout(t);
   }, [step]);
+  const [redirectCount, setRedirectCount] = useState(3);
+  useEffect(() => {
+    if (step < 4) return undefined;
+    if (redirectCount <= 0) {
+      window.location.href = "/auth";
+      return undefined;
+    }
+    const t = setTimeout(() => setRedirectCount(redirectCount - 1), 1000);
+    return () => clearTimeout(t);
+  }, [step, redirectCount]);
   return (
     <div className="modal-backdrop" onClick={onClose}>
       <style>{`
@@ -2693,9 +2715,50 @@ function LaunchModal({ prompt, onClose }) {
           ))}
         </div>
-        <div className="modal-foot">
-          No homework · No setup · No new tools to learn
-        </div>
+        {step === 4 ? (
+          <div
+            className="modal-actions"
+            style={{
+              marginTop: "24px",
+              display: "flex",
+              flexDirection: "column",
+              gap: "12px",
+              alignItems: "center",
+            }}
+          >
+            <a
+              href="/auth"
+              className="btn btn-primary"
+              style={{
+                width: "100%",
+                height: "48px",
+                display: "inline-flex",
+                alignItems: "center",
+                justifyContent: "center",
+                gap: "8px",
+                fontSize: "15px",
+                fontWeight: "600",
+                textDecoration: "none",
+              }}
+            >
+              Launch Your Workspace <Arrow size={14} />
+            </a>
+            <span
+              style={{
+                fontSize: "11px",
+                color: "var(--fg-faint)",
+                fontFamily: "var(--font-mono)",
+                letterSpacing: "0.04em",
+              }}
+            >
+              Redirecting to registration in {redirectCount}s...
+            </span>
+          </div>
+        ) : (
+          <div className="modal-foot">
+            No homework · No setup · No new tools to learn
+          </div>
+        )}
       </div>
     </div>
   );
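
Reviewer sketch (not part of this commit): `LaunchModal` now writes the hero prompt to `localStorage` under `vibn:firstName`, but the reading side is not shown in this diff. Assuming the onboarding flow reads the same key once and then clears it, the consumer could be as small as the following TypeScript. `StorageLike` and `consumeLaunchPrompt` are hypothetical names introduced here; passing the storage object in keeps the function usable outside a browser.

```typescript
// Minimal subset of the Web Storage interface that the sketch needs.
interface StorageLike {
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

// Key written by LaunchModal in this diff.
const LAUNCH_PROMPT_KEY = "vibn:firstName";

// Returns the preserved prompt once, then clears it so it cannot seed a second project.
// Storage access can throw (for example when storage is disabled), so failures yield null.
function consumeLaunchPrompt(storage: StorageLike): string | null {
  try {
    const value = storage.getItem(LAUNCH_PROMPT_KEY);
    if (value === null || value.trim() === "") return null;
    storage.removeItem(LAUNCH_PROMPT_KEY);
    return value;
  } catch (err) {
    return null;
  }
}
```

In the browser this would be called as `consumeLaunchPrompt(window.localStorage)` from the onboarding page.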