feat: enable marketing site registration and launch-prompt preservation (T12)
@@ -8,6 +8,10 @@
|
||||
>
|
||||
> **Drafted:** 2026-04-30. **Owner:** Mark + AI.
|
||||
>
|
||||
> **Scope note for AI:** this plan is about the **vibnai.com web product** beta — it is *not* the `vibn-code`
|
||||
> desktop thin-client effort (that's `VIBNCODE_THIN_CLIENT_CHANGES.md`). Treat dates/phases as historical;
|
||||
> verify status against the codebase before acting.
|
||||
>
|
||||
> **Scope:** Everything we agreed in the 2026-04-30 review that's NOT already
|
||||
> shipped. Pulls in the unfinished items from Path B (DNS, cert, previews,
|
||||
> eval) AND the "before strangers see this" gaps that Path B doesn't cover
|
||||
|
||||
@@ -1,5 +1,9 @@
|
||||
# VibnCode: Cloud-Powered Agent Desktop IDE Architecture & Implementation Plan
|
||||
|
||||
> **This is the original product VISION.** For the live, prioritized work (with exact files, steps, status, and
|
||||
> what's already shipped), use **`VIBNCODE_THIN_CLIENT_CHANGES.md`**. Infra/deploy details are in `VIBNDEV.md`;
|
||||
> new-thread bootstrap context is in `ai-new-thread.md`.
|
||||
|
||||
**Project Name:** `vibncode` (formerly TalkCody)
|
||||
**Target Architecture:** Desktop Thin Client with Monaco + Native Cloud Hosting Integration
|
||||
**Backend Platform:** Vibnai Cloud Infrastructure (`vibn-frontend`, `vibn-agent-runner`, Gitea, Coolify)
|
||||
|
||||
52
VIBNDEV.md
@@ -82,20 +82,56 @@ pnpm dev
|
||||
|
||||
`.env.local` needs: `DATABASE_URL`, `NEXTAUTH_URL`, `NEXTAUTH_SECRET`, `NEXT_PUBLIC_DEV_LOCAL_AUTH_EMAIL`, `NEXT_PUBLIC_DEV_BYPASS_PROJECT_AUTH`, `GOOGLE_API_KEY`, `COOLIFY_*`, `GITEA_*`, `VIBN_SECRETS_KEY`, plus optionally `VIBN_CHAT_PROVIDER=deepseek` and `DEEPSEEK_API_KEY`.
|
||||
|
||||
## Deploy vibn-frontend
|
||||
## Git topology & deploying apps
|
||||
|
||||
**`master-ai` is ONE git repo.** `vibn-frontend/`, `vibn-agent-runner/`, and `vibn-api/` are **subfolders** of it
|
||||
(not separate repos). `vibn-code/` is a **nested submodule** with its own `.git`. Each cloud app builds from its
|
||||
**own Gitea remote**, from the matching subfolder (Coolify's base-directory points at the subfolder):
|
||||
|
||||
| App | Coolify app uuid | Push remote (run from anywhere in `master-ai`) | Builds from subfolder |
|---|---|---|---|
| vibn-frontend | `y4cscsc8s08c8808go0448s0` | `coolify_gitea` | `vibn-frontend/` |
| vibn-agent-runner | `jss08wssogw4kw8gok0sk0w0` | `coolify_agent_gitea` | `vibn-agent-runner/` |
| vibn-api | `m84cc4wsc0ckws8g8k44kkk8` | `coolify_api_gitea` | `vibn-api/` |
|
||||
|
||||
- `master-ai.git` (`gitea` remote) and GitHub (`origin`) are **share/mirror only — builds do NOT use them.**
|
||||
- Secret `.env*` files at the repo root are **gitignored** (verified). Never commit them.
|
||||
- These remotes share history, so `git push <remote> HEAD:main` fast-forwards (no force needed).
|
||||
|
||||
### Deploy steps (any app)
|
||||
|
||||
```sh
|
||||
cd /Users/markhenderson/master-ai/vibn-frontend
|
||||
git add -A && git commit -m "message" && git push origin main
|
||||
cd /Users/markhenderson/master-ai
|
||||
# 1. Commit the change (stage only the app's subfolder to keep commits scoped)
|
||||
git add vibn-agent-runner/ && git commit -m "message"
|
||||
|
||||
# Then trigger deploy (correct endpoint for Coolify v4):
|
||||
# 2. Push to the app's deploy remote's main branch
|
||||
git push coolify_agent_gitea HEAD:main # runner
|
||||
# git push coolify_gitea HEAD:main # frontend
|
||||
|
||||
# 3. Trigger the Coolify deploy (correct endpoint for Coolify v4)
|
||||
source /Users/markhenderson/master-ai/.coolify.env
|
||||
curl -s -X POST \
|
||||
-H "Authorization: Bearer $COOLIFY_API_TOKEN" \
|
||||
"$COOLIFY_URL/api/v1/deploy?uuid=y4cscsc8s08c8808go0448s0"
|
||||
curl -s -X POST -H "Authorization: Bearer $COOLIFY_API_TOKEN" \
|
||||
"$COOLIFY_URL/api/v1/deploy?uuid=jss08wssogw4kw8gok0sk0w0" # runner uuid; use the frontend uuid for the frontend
|
||||
```
|
||||
|
||||
**Note:** `/api/v1/applications/{uuid}/start` or `/deploy` returns 404 on Coolify v4. The correct deploy path is `/api/v1/deploy?uuid=...`. Add `&force=true` to force a full rebuild.
|
||||
**Notes:**
|
||||
- `/api/v1/applications/{uuid}/start` or `/deploy` returns 404 on Coolify v4. The correct deploy path is `/api/v1/deploy?uuid=...`. Add `&force=true` to force a full rebuild.
|
||||
- The runner builds from `vibn-agent-runner/Dockerfile`, which runs `npm run build` (tsc) on `src/` — you do **not** need to hand-build `dist/` for the deploy (but keeping `dist/` in sync is tidy).
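
For scripted deploys, the endpoint from the notes above could be assembled by a small helper. This is a minimal sketch, not code from the repo: `coolifyDeployUrl` is a hypothetical name, and it assumes `COOLIFY_URL` is a plain origin with no required path prefix. Only the `/api/v1/deploy?uuid=...` path and the `force=true` flag come from the notes.

```ts
// Hypothetical helper: builds the Coolify v4 deploy URL described in the notes.
function coolifyDeployUrl(baseUrl: string, uuid: string, force = false): string {
  // Drop trailing slashes so the joined URL never contains "//api".
  const base = baseUrl.replace(/\/+$/, "");
  // `force=true` requests a full rebuild, per the note above.
  const query = `uuid=${encodeURIComponent(uuid)}${force ? "&force=true" : ""}`;
  return `${base}/api/v1/deploy?${query}`;
}
```

The result would be sent as a `POST` with the `Authorization: Bearer $COOLIFY_API_TOKEN` header, exactly as in the `curl` call.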
|
||||
|
||||
## The agent runner (chat backend)
|
||||
|
||||
`vibn-agent-runner` (FQDN `https://agents.vibnai.com`, port 3333) is what actually answers desktop/web chat:
|
||||
|
||||
- Frontend `POST /api/projects/:id/agent/sessions` inserts an `agent_sessions` row and fire-and-forgets
|
||||
`POST {AGENT_RUNNER_URL}/agent/execute` to the runner. The runner clones the project's Gitea repo, runs the
|
||||
**Coder** agent, and `PATCH`es output/status back to the session row (auth via `x-agent-runner-secret`).
|
||||
- The desktop/web then polls `GET /api/projects/:id/agent/sessions/:sid` for streamed output.
|
||||
- **Model:** set by the runner env `GEMINI_MODEL` (currently `gemini-3.1-pro-preview`). The desktop model picker
|
||||
is cosmetic until model-passthrough is wired.
|
||||
- Health check: `curl https://agents.vibnai.com/health` → `{"status":"ok"}`.
|
||||
- The happy path of `/agent/execute` has **no logging** — only failures log. To inspect:
|
||||
`gcloud compute ssh coolify-server-mtl --zone=northamerica-northeast1-a --project=master-ai-484822 --command="sudo docker logs --tail 100 jss08wssogw4kw8gok0sk0w0-<suffix>"` (find the exact container name with `docker ps`).
|
||||
|
||||
## Coolify API Reference
|
||||
|
||||
|
||||
@@ -59,8 +59,14 @@ DO NOT treat `master-ai` as a single monorepo on Gitea. You must push changes in
|
||||
* `coolify_agent_gitea` : `https://git.vibnai.com/mark/vibn-agent-runner.git`
|
||||
* `coolify_gitea` : `https://git.vibnai.com/mark/vibn-frontend.git`
|
||||
* `coolify_api_gitea` : `https://git.vibnai.com/mark/vibn-api.git`
|
||||
* `gitea` : `https://git.vibnai.com/mark/master-ai.git`
|
||||
* `origin` : `https://github.com/MawkOne/master-ai.git`
|
||||
* `gitea` : `https://git.vibnai.com/mark/master-ai.git` *(share-only: for a coworker's local setup; **builds do NOT use this**)*
|
||||
* `origin` : `https://github.com/MawkOne/master-ai.git` *(GitHub mirror)*
|
||||
|
||||
**How deploys actually work:** `master-ai` is a single git repo. Each cloud app builds from its **own** Gitea
|
||||
remote, from the matching subfolder. To ship a change, commit in `master-ai`, then
|
||||
`git push <remote> HEAD:main` (e.g. `git push coolify_agent_gitea HEAD:main` for the runner), then trigger the
|
||||
Coolify deploy for that app (see `VIBNDEV.md`). `vibn-code` is a nested submodule with its own `.git` — commit &
|
||||
push it via its own `origin`. Secret `.env*` files at the repo root are gitignored — never commit them.
|
||||
|
||||
---
|
||||
|
||||
@@ -128,13 +134,30 @@ VibnCode overrides local OS actions to communicate with your cloud containers (o
|
||||
|
||||
---
|
||||
|
||||
## 6. Where We Left Off (As of May 28, 2026)
|
||||
## 6. Where We Left Off (As of May 31, 2026)
|
||||
|
||||
* **Deep-Link Protocol Scheme Resolved**:
|
||||
Fixed `src-tauri/Info.plist` which was still configured with `com.talkcody` / `talkcody`. macOS Launch Services now correctly maps `vibncode://` deep links directly to the local dev app.
|
||||
* **Rust Compiling Errors Resolved**:
|
||||
Patched cargo clippy errors in `dashscope.rs`, `openai_responses_protocol.rs`, and `openai_responses_ws.rs` (collapsed match statements and annotated unused structs).
|
||||
* **Repositories Synchronized**:
|
||||
Merged, committed, and pushed all updated code:
|
||||
* `vibn-code` pushed to Gitea `origin main`.
|
||||
* `vibn-agent-runner` and `vibn-frontend` modifications pushed to `coolify_agent_gitea` and `coolify_gitea` on branch `frontend-deploy-13`.
|
||||
**Read `VIBNCODE_THIN_CLIENT_CHANGES.md` first** — it is the live, prioritized change list with exact files,
|
||||
steps, and acceptance criteria for the thin-client conversion, plus a STATUS section of what's done.
|
||||
|
||||
**Chat works end-to-end.** A desktop message → `POST /api/projects/:id/agent/sessions` → cloud runner executes
|
||||
the Coder agent (Gemini) → output polled back into the Monaco chat. Recent fixes that got it there:
|
||||
|
||||
* **Local SQLite was wiping chats (fixed):** `database-service.ts` used `INSERT OR REPLACE INTO projects`, which
|
||||
(via `ON DELETE CASCADE`) deleted the active conversation mid-run. Switched to UPSERT; made `task-service`
|
||||
persistence non-blocking. The cloud is the source of truth; local SQLite is just a cache.
|
||||
* **Empty `appPath` broke every run (fixed):** the desktop sent `appPath: ""`; the runner's `/agent/execute`
|
||||
rejects falsy `appPath` with HTTP 400 and does nothing (no logs). Desktop now sends `appPath: "."`.
|
||||
* **Agent tools `fetch failed` (fixed, pushed):** the runner's `buildContext()` hardcoded
|
||||
`vibnApiUrl: 'http://localhost:3000'` and an empty `mcpToken`, so tool calls fetched a dead port. Now
|
||||
`/agent/execute` reads `mcpToken` from the body and sets `ctx.vibnApiUrl` (from `VIBN_API_URL`) + `mcpToken`.
|
||||
Pushed to `coolify_agent_gitea/main` — confirm the runner redeploy.
|
||||
* **Single model:** desktop model picker restricted to the VibnAI model, relabeled "Gemini 3.5 Flash". The
|
||||
runner's real model is set by `GEMINI_MODEL` env (currently `gemini-3.1-pro-preview`); the desktop label is
|
||||
cosmetic until model-passthrough is wired (CHANGE 4.1 in the change doc).
|
||||
|
||||
**Known open items (in the change doc):** the desktop still has a hardcoded `vibn_sk_` API key to remove;
|
||||
`/agent/sessions/:id/stop` returns 401 to the desktop (uses browser-session auth, not the workspace key); runner
|
||||
early-failures are silently swallowed (failure PATCHes omit the `x-agent-runner-secret` header).
|
||||
|
||||
**Earlier (still true):** `vibncode://` deep link scheme is registered in `src-tauri/Info.plist`; Rust clippy is
|
||||
treated as errors on commit.
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1,601 +0,0 @@
|
||||
# Vibn Chat Harness — Fix Checklist
|
||||
|
||||
Work through items in order. Each fix has a clear **What**, **Where**,
|
||||
**How**, and **Verify** section. Don't skip the verify step — many of
|
||||
these fixes interact with each other and silent failures will
|
||||
compound.
|
||||
|
||||
Mark `[x]` as you complete each item. If you can't complete an item,
|
||||
add a short note under it explaining why and move on.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Backend fixes (highest leverage; do these first)
|
||||
|
||||
These three fix the failure modes the prompt currently promises but
|
||||
the backend doesn't deliver. Until they're done, the prompt's hard
|
||||
rules are partly fiction.
|
||||
|
||||
### [ ] 1. Add `sha256` and `bytes` to `fs.write` and `fs.edit` responses
|
||||
|
||||
**What:** The prompt's hard rules tell the model to cite `sha256` and
|
||||
`bytes` as evidence of file changes. The tools don't return those
|
||||
fields today, so the model is looking for evidence that doesn't exist.
|
||||
|
||||
**Where:** `app/api/mcp/route.ts` — functions `toolFsWrite` and
|
||||
`toolFsEdit`.
|
||||
|
||||
**How:**
|
||||
- In `toolFsWrite`, after the `runFsCmd` success branch, before
|
||||
returning, compute the sha256 of `content` and return it alongside
|
||||
`bytesWritten` renamed to `bytes`:
|
||||
```ts
import { createHash } from "crypto";
// ...
const bytes = Buffer.byteLength(content, "utf8");
const sha256 = createHash("sha256").update(content, "utf8").digest("hex");
return NextResponse.json({
  result: { ok: true, path, bytes, sha256 },
});
```
|
||||
- In `toolFsEdit`, you don't have the final content in memory. Add a
|
||||
second command that prints the sha + bytes after the edit:
|
||||
```ts
const cmd = `python3 -c "$(printf %s ${shq(pyB64)} | base64 -d)" <<< "$(printf %s ${shq(b64)} | base64 -d)" && echo "---" && sha256sum ${shq(path)} | cut -d' ' -f1 && wc -c < ${shq(path)}`;
```
|
||||
Then parse the trailing two lines after `---` for sha and bytes.
|
||||
- Update the response shape:
|
||||
```ts
return NextResponse.json({
  result: { ok: true, path, replacements, bytes, sha256 },
});
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Call `fs_write` with `{ path: "test.txt", content: "hello" }`.
|
||||
Confirm response contains `sha256` (64 hex chars) and `bytes: 5`.
|
||||
- [ ] Call `fs_edit` to change the same file. Confirm response
|
||||
contains a new `sha256` and updated `bytes`.
|
||||
- [ ] Replay a turn that does `fs_write` followed by `fs_read` of the
|
||||
same file in chat. The model should now produce text like
|
||||
"Updated `test.txt` (sha=a3f5c2…, 5b)" instead of a bare claim.
|
||||
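
The parsing step for `toolFsEdit` (the two lines printed after `---`) might look like the sketch below. It assumes stdout ends with the `---` marker, then the hex digest from `sha256sum`, then the byte count from `wc -c`. `parseShaAndBytes` and `FileEvidence` are hypothetical names, not existing code in `route.ts`.

```ts
// Hypothetical shape for the evidence fields added to the fs.edit response.
interface FileEvidence {
  sha256: string;
  bytes: number;
}

// Extracts the digest and size printed after the "---" marker, or null if absent.
function parseShaAndBytes(stdout: string): FileEvidence | null {
  const lines = stdout.split("\n").map((l) => l.trim());
  // Use the LAST marker so a "---" line printed by the edit script itself
  // cannot shift the parse.
  const marker = lines.lastIndexOf("---");
  if (marker === -1) return null;
  const tail = lines.slice(marker + 1).filter((l) => l.length > 0);
  const sha256 = tail[0] ?? "";
  const rawBytes = tail[1] ?? "";
  // sha256sum prints 64 lowercase hex characters; wc -c prints a decimal count
  // (possibly padded with spaces, which trim() removed above).
  if (!/^[0-9a-f]{64}$/.test(sha256) || !/^\d+$/.test(rawBytes)) return null;
  return { sha256, bytes: Number(rawBytes) };
}
```

Returning `null` instead of throwing lets the caller decide whether missing evidence should fail the edit or only omit the fields.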
|
||||
---
|
||||
|
||||
### [ ] 2. Add project-slug scoping to `normalizeFsPath`
|
||||
|
||||
**What:** The prompt tells the model to use paths like `src/app/page.tsx`
|
||||
and claims the tool layer rejects writes outside the project root.
|
||||
The tool layer does NOT do this today. It resolves all relative paths
|
||||
under `/workspace` (workspace-level), so `fs_write { path: "src/app/page.tsx" }`
|
||||
ends up at `/workspace/src/app/page.tsx` — the ghost file from the
|
||||
failing session. Five different path conventions were used for the
|
||||
same file in one session because nothing enforces the rule.
|
||||
|
||||
**Where:** `app/api/mcp/route.ts` — function `normalizeFsPath` and
|
||||
every caller in `toolFsRead`, `toolFsWrite`, `toolFsEdit`,
|
||||
`toolFsList`, `toolFsDelete`, `toolFsGlob`, `toolFsGrep`, `toolFsTree`,
|
||||
`toolRequestVisualQA`, `toolGenerateMedia`.
|
||||
|
||||
**How:**
|
||||
|
||||
- Change `normalizeFsPath` to accept an optional `projectSlug`:
|
||||
```ts
function normalizeFsPath(
  p: string,
  projectSlug?: string,
): string | NextResponse {
  if (!p || typeof p !== "string") {
    return NextResponse.json(
      { error: 'Param "path" is required' },
      { status: 400 },
    );
  }
  const projectRoot = projectSlug ? `${FS_ROOT}/${projectSlug}` : FS_ROOT;
  let abs: string;
  if (p.startsWith("/")) {
    abs = p;
  } else {
    abs = `${projectRoot}/${p}`.replace(/\/+/g, "/");
  }
  const norm = abs.replace(/\/[^/]+\/\.\.(?=\/|$)/g, "").replace(/\/+/g, "/");

  // When projectSlug is set, REJECT paths outside the project root.
  // Compare against `${projectRoot}/` so a sibling directory that merely
  // shares the prefix (e.g. "<slug>-old") is not accepted.
  if (projectSlug) {
    if (norm !== projectRoot && !norm.startsWith(`${projectRoot}/`)) {
      return NextResponse.json(
        {
          ok: false,
          error: `PATH_OUTSIDE_PROJECT: path "${p}" resolves to "${norm}" which is outside the active project at "${projectRoot}". Did you mean "${projectRoot}/${p.replace(/^\/+/, "")}"?`,
        },
        { status: 400 },
      );
    }
  } else {
    // Workspace-level fallback (legacy behaviour)
    if (norm !== FS_ROOT && !norm.startsWith(`${FS_ROOT}/`)) {
      return NextResponse.json(
        { error: `Path "${p}" is outside ${FS_ROOT}; use shell.exec for system paths.` },
        { status: 400 },
      );
    }
  }
  return norm;
}
```
|
||||
- In every fs_* tool that already calls `resolveProjectOr404`, pass
|
||||
the slug:
|
||||
```ts
const path = normalizeFsPath(String(params.path ?? ""), project.slug);
```
|
||||
- `toolFsRead`, `toolFsWrite`, `toolFsEdit`, `toolFsDelete`,
|
||||
`toolRequestVisualQA`, `toolGenerateMedia` all already have
|
||||
`project` in scope — pass `project.slug`.
|
||||
- `toolFsList`, `toolFsGlob`, `toolFsGrep`, `toolFsTree` use
|
||||
`params.path` or `params.cwd` — same treatment, pass `project.slug`.
|
||||
|
||||
**Verify:**
|
||||
- [ ] From a project-scoped thread, call `fs_write { path: "/workspace/src/app/page.tsx", content: "x" }`.
|
||||
Expect `PATH_OUTSIDE_PROJECT` error.
|
||||
- [ ] From the same thread, call `fs_write { path: "src/app/page.tsx", content: "x" }`.
|
||||
Expect success at `/workspace/<slug>/src/app/page.tsx`.
|
||||
- [ ] Confirm `dev_server_start` with `command: "npm run dev"` runs
|
||||
from the project root, not `/workspace`. (This is mostly already
|
||||
true via dev-container logic; just confirm.)
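
One thing to watch when scoping paths: a single regex pass over `..` segments can leave a trailing `..` behind (for example, `a/../..` collapses to `..`), and a bare prefix check would then still pass. Resolving the path segment by segment avoids that. The sketch below is self-contained and makes some assumptions: `resolveInsideRoot` and `ScopedPath` are hypothetical names, and `root` is assumed to be an absolute, already-normalized path with no trailing slash.

```ts
type ScopedPath = { ok: true; path: string } | { ok: false; error: string };

// Resolves `input` against `root` and rejects anything that lands outside it.
function resolveInsideRoot(root: string, input: string): ScopedPath {
  // Relative paths are joined to the root; absolute paths are checked as given.
  const joined = input.startsWith("/") ? input : `${root}/${input}`;
  const parts: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue; // skip empty and current-dir segments
    if (seg === "..") parts.pop();           // step up one level (no-op at "/")
    else parts.push(seg);
  }
  const resolved = `/${parts.join("/")}`;
  // Compare against `${root}/` so "/workspace/app-old" does not match "/workspace/app".
  if (resolved !== root && !resolved.startsWith(`${root}/`)) {
    return {
      ok: false,
      error: `PATH_OUTSIDE_PROJECT: "${input}" resolves to "${resolved}"`,
    };
  }
  return { ok: true, path: resolved };
}
```

Because every `..` is consumed while walking the segments, the resolved path never contains one, so the prefix comparison is checking the location that the shell would actually touch.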
|
||||
|
||||
---
|
||||
|
||||
### [ ] 3. Fix the broken `plan-extract` block
|
||||
|
||||
**What:** The fire-and-forget `plan-extract` block in
|
||||
`app/api/chat/route.ts` has a syntax error — the `try` block builds a
|
||||
transcript and then hits `}` followed by `catch` with no actual call
|
||||
to `autoExtractPlanUpdates`. The body of the try is missing. Either
|
||||
the auto-extraction was intentionally removed (in which case the
|
||||
dead transcript-building code should also be deleted) or it was
|
||||
accidentally truncated (in which case the call needs to be restored).
|
||||
|
||||
**Where:** `app/api/chat/route.ts`, around the second fire-and-forget
|
||||
block (after the title/summary block, before `emit({ type: "done" })`).
|
||||
|
||||
**How:**
|
||||
|
||||
- Decide first: do we want auto-extraction to run? If YES, restore
|
||||
the call:
|
||||
```ts
(async () => {
  try {
    if (!threadProjectId) return;
    const allMessages = [...history, finalMsg];
    if (allMessages.length < 2) return;
    const transcript = allMessages
      .map((m) => {
        const text =
          typeof m.content === "string"
            ? m.content
            : JSON.stringify(m.content);
        return `${m.role.toUpperCase()}: ${text.slice(0, 1200)}`;
      })
      .join("\n\n");
    const result = await autoExtractPlanUpdates(
      threadProjectId,
      transcript,
    );
    if (result) {
      console.log(
        "[chat] plan-extract:",
        `${result.tasks} tasks, ${result.decisions} decisions, vision=${result.vision}`,
      );
    }
  } catch (err) {
    console.warn("[chat] plan-extract failed (non-fatal):", err);
  }
})().catch(() => {});
```
|
||||
And re-add the `import { autoExtractPlanUpdates } from "@/lib/ai/plan-extract";`
|
||||
at the top of the file.
|
||||
- If NO (you removed it intentionally), delete the entire IIFE
|
||||
including the transcript-building so the file compiles cleanly.
|
||||
|
||||
**Verify:**
|
||||
- [ ] Run `tsc --noEmit` on the file. Confirm no syntax errors.
|
||||
- [ ] If auto-extraction restored: have a chat that mentions a
|
||||
decision (e.g. "let's use Postgres"). Confirm a new entry appears
|
||||
in the project's `plan.decisions` with `confidence: "auto"`.
|
||||
- [ ] Tail prod logs for `[chat] plan-extract:` — should fire on
|
||||
every turn with content.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Prompt fixes (now that the backend matches)
|
||||
|
||||
These bring the prompt into line with what the tools actually do.
|
||||
|
||||
### [ ] 4. Fix the `apps_containers_list` typo in the prompt
|
||||
|
||||
**What:** The troubleshooting section references `apps_containers_list`
|
||||
but the actual tool is `apps_containers_ps`. The model will call a
|
||||
tool that doesn't exist.
|
||||
|
||||
**Where:** `app/api/chat/route.ts`, inside `buildSystemPrompt`, in the
|
||||
"## Troubleshooting" section.
|
||||
|
||||
**How:**
|
||||
- Find: `apps_logs { uuid } + apps_containers_list { uuid }`
|
||||
- Replace: `apps_logs { uuid } + apps_containers_ps { uuid }`
|
||||
|
||||
**Verify:**
|
||||
- [ ] Grep the prompt for `apps_containers_list` — no matches.
|
||||
- [ ] Grep for `apps_containers_ps` — should appear in
|
||||
troubleshooting and at least once in the apps section.
|
||||
|
||||
---
|
||||
|
||||
### [ ] 5. Soften the `ok` field rule
|
||||
|
||||
**What:** The current rule says "If `ok` is false (or absent, or
|
||||
`exitCode` is non-zero, or `healthCheck.status` is >= 400) the
|
||||
operation FAILED." The "or absent" clause is wrong — many tools
|
||||
return data without an `ok` field (e.g. `projects_get`, `apps_list`,
|
||||
`databases_get`). The model will treat every read as a failure.
|
||||
|
||||
**Where:** `buildSystemPrompt`, "Hard rules" section, the "Trust the
|
||||
`ok` field" bullet.
|
||||
|
||||
**How:**
|
||||
|
||||
Replace the current rule with:
|
||||
|
||||
```
|
||||
- **Read tool results carefully.** A tool FAILED when ANY of these
|
||||
signals are present: `ok: false`, `error: "..."`, a non-zero
|
||||
`exitCode`, or a `healthCheck.status` >= 400. If NONE of those
|
||||
signals are present, look at the actual content of the response
|
||||
to decide whether the operation succeeded. Many read-only tools
|
||||
return data directly without an `ok` field — that's not a failure.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Pick a recent thread where the agent called `projects_get` or
|
||||
`apps_list`. Confirm the agent didn't treat the response as a
|
||||
failure (look at its post-tool text — should be a normal summary,
|
||||
not "the operation failed").
|
||||
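
The same failure signals can be expressed as a predicate, which is handy if the harness ever wants to mirror the prompt rule in code (for example, when deciding whether the last tool call failed). This is a minimal sketch: `toolResultFailed` is a hypothetical name, and the field names are taken from the rule text only, not from any verified tool schema.

```ts
// Returns true only when a tool result carries an explicit failure signal.
// A result with no `ok` field and no other signal is NOT treated as a failure.
function toolResultFailed(result: unknown): boolean {
  if (typeof result !== "object" || result === null) return false;
  const r = result as Record<string, unknown>;
  if (r["ok"] === false) return true;
  const error = r["error"];
  if (typeof error === "string" && error.length > 0) return true;
  const exitCode = r["exitCode"];
  if (typeof exitCode === "number" && exitCode !== 0) return true;
  const healthCheck = r["healthCheck"];
  if (typeof healthCheck === "object" && healthCheck !== null) {
    const status = (healthCheck as Record<string, unknown>)["status"];
    if (typeof status === "number" && status >= 400) return true;
  }
  return false;
}
```

A read-only response such as `{ id, name }` from `projects_get` returns `false` here, which is the behaviour the softened rule asks the model to follow.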
|
||||
---
|
||||
|
||||
### [ ] 6. Tighten the status-nudge threshold
|
||||
|
||||
**What:** Current thresholds are `roundsSinceText >= 8 ||
|
||||
toolCallsSinceText >= 12`. With `MAX_TOOL_ROUNDS = 8`, the round-based
|
||||
nudge can never fire (loop ends first). The tool-call threshold of 12
|
||||
is also too lenient — users typed "test" / "hello" by round 4-5 of
|
||||
silence in the failing session.
|
||||
|
||||
**Where:** `app/api/chat/route.ts`, near the top of the main while
|
||||
loop, the `isSilent` constant.
|
||||
|
||||
**How:**
|
||||
|
||||
```ts
const isSilent = roundsSinceText >= 3 || toolCallsSinceText >= 6;
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Replay a chat that triggers 6+ tool calls without text. Confirm
|
||||
the `[STATUS NUDGE]` system addendum is injected before the next
|
||||
round.
|
||||
- [ ] Confirm the model produces a one-line status sentence in
|
||||
response to the nudge.
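
To see why the old round threshold could never fire, the loop can be simulated under simplifying assumptions: `round` runs from 0 to `maxRounds - 1`, `isSilent` is evaluated at the top of each round, and every round is silent and makes a fixed number of tool calls. `firstNudgeRound` is a hypothetical helper for reasoning about thresholds, not code from `route.ts`.

```ts
// Returns the 1-based round at whose start the nudge would first be injected,
// or null if the loop ends before either threshold is reached.
function firstNudgeRound(
  maxRounds: number,
  roundThreshold: number,
  callThreshold: number,
  callsPerRound: number,
): number | null {
  let roundsSinceText = 0;
  let toolCallsSinceText = 0;
  for (let round = 0; round < maxRounds; round++) {
    const isSilent =
      roundsSinceText >= roundThreshold || toolCallsSinceText >= callThreshold;
    if (isSilent) return round + 1;
    // Assumption: this round produces tool calls only, no user-visible text.
    roundsSinceText += 1;
    toolCallsSinceText += callsPerRound;
  }
  return null;
}
```

Under these assumptions, `firstNudgeRound(8, 8, 12, 1)` returns `null` (the old thresholds never trigger with one call per round), while `firstNudgeRound(8, 3, 6, 1)` returns `4`.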
|
||||
|
||||
---
|
||||
|
||||
### [ ] 7. Update the path-convention guidance in the prompt
|
||||
|
||||
**What:** Once Fix 2 ships, the path convention is enforced. The
|
||||
prompt should state this plainly without the "cd into your project"
|
||||
workaround.
|
||||
|
||||
**Where:** `buildSystemPrompt`, inside `activeBlock`, the path
|
||||
guidance section. Also inside "Dev servers" → "Directory" bullet.
|
||||
|
||||
**How:**
|
||||
|
||||
Replace the "Directory" bullet under "Dev servers":
|
||||
|
||||
```
|
||||
- **Directory:** Tool paths are scoped to your project root
|
||||
automatically. Pass `command: "npm run dev"` directly — no `cd`
|
||||
prefix needed. The tool rejects any `fs_*` write outside
|
||||
`/workspace/<slug>/`.
|
||||
```
|
||||
|
||||
And after the "Project repo is auto-cloned" paragraph in
|
||||
`activeBlock`, add:
|
||||
|
||||
```
|
||||
**Path convention for fs_* tools:** Pass paths relative to the
|
||||
project root — `src/app/page.tsx`, NOT `/workspace/<slug>/src/app/page.tsx`
|
||||
and NOT `<slug>/src/app/page.tsx`. The tool layer rejects writes
|
||||
outside the project root with a `PATH_OUTSIDE_PROJECT` error
|
||||
suggesting the corrected path.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] In a fresh chat, ask the agent to "edit the homepage". Confirm
|
||||
the first `fs_read` call uses `src/app/page.tsx` (no slug prefix,
|
||||
no `/workspace/` prefix).
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — Polish and safety nets
|
||||
|
||||
These are lower-priority but each removes a small foot-gun.
|
||||
|
||||
### [ ] 8. Add `fs_tree` recommendation to first-turn behavior
|
||||
|
||||
**What:** The agent ran `fs_tree` 5 times and `fs_glob` 9+ times in
|
||||
the failing session, re-discovering paths it should have learned once.
|
||||
The tool description already says "ALWAYS call this first" but the
|
||||
prompt doesn't reinforce it.
|
||||
|
||||
**Where:** `buildSystemPrompt`, in the "Writing code" section.
|
||||
|
||||
**How:**
|
||||
|
||||
Add this near the top of "Writing code — dev container is the default":
|
||||
|
||||
```
|
||||
**Orient yourself once.** On the first code-modifying turn of a
|
||||
chat, call `fs_tree` once to learn the repo layout. Don't re-run it
|
||||
on every turn — the layout doesn't change between user messages.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Manual review of the next 5 sessions: confirm `fs_tree` is
|
||||
called at most once per chat (not per turn).
|
||||
|
||||
---
|
||||
|
||||
### [ ] 9. Add `browser_navigate` and `browser_console` as verification primitives
|
||||
|
||||
**What:** The backend has `browser_navigate` and `browser_console`
|
||||
tools that headlessly render a page and capture console errors. The
|
||||
prompt never mentions them. These are the missing post-deploy
|
||||
verification step that the `healthCheck` field gestures at.
|
||||
|
||||
**Where:** `buildSystemPrompt`, in the "Dev servers" subsection or as
|
||||
its own section after "Visual QA".
|
||||
|
||||
**How:**
|
||||
|
||||
Add a new bullet right after "Visual QA" guidance:
|
||||
|
||||
```
|
||||
**Verify the page actually renders:**
|
||||
- After `dev_server_start` returns a `previewUrl` AND `healthCheck.status === 200`,
|
||||
for any UI-facing turn, call `browser_console { url: previewUrl }` to
|
||||
capture frontend console errors. Hydration errors, missing assets,
|
||||
and uncaught exceptions show up here even when the server is
|
||||
technically "running".
|
||||
- If `browser_console` returns errors, fix them with `fs_edit`
|
||||
before declaring done. A green `healthCheck` plus a clean console
|
||||
is the real "done" signal for UI work.
|
||||
- Skip this for backend / SQL / config-only changes.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] On the next UI-modifying chat, confirm `browser_console` is
|
||||
called once after `dev_server_start`.
|
||||
- [ ] Confirm any errors it returns get acknowledged in the agent's
|
||||
reply.
|
||||
|
||||
---
|
||||
|
||||
### [ ] 10. Add the market research stack to the prompt
|
||||
|
||||
**What:** The backend exposes six market research tools
|
||||
(`market_categories_suggest`, `market_research_run`, `market_seo_analyze`,
|
||||
`tech_stack_analyze`, `market_competitor_research`,
|
||||
`market_aggregate_insights`). The prompt never mentions them. For a
|
||||
non-technical-founder product, these are some of the highest-leverage
|
||||
tools — they answer "should I build for dentists or summer camps?"
|
||||
with real TAM counts.
|
||||
|
||||
**Where:** `buildSystemPrompt`, add a new section after "Common
|
||||
questions → tools" or before "How to deploy".
|
||||
|
||||
**How:**
|
||||
|
||||
Add this section:
|
||||
|
||||
```
|
||||
## Helping the user pick what to build
|
||||
|
||||
Vibn has a market-research toolkit for non-technical founders who
|
||||
need data on their target niche. Use it when the user is undecided,
|
||||
validating an idea, or comparing markets:
|
||||
|
||||
- **"How big is the market for X in <location>?"** → `market_categories_suggest { niche }` to
|
||||
propose Google Business categories, then `market_research_run` after
|
||||
the user approves. Returns TAM count, sample domains, and review
|
||||
data. NOTE: `market_research_run` costs real money — always confirm
|
||||
with the user and pass `user_explicitly_approved: true`.
|
||||
- **"What are competitors spending on Google Ads?"** → `market_seo_analyze { domain }`.
|
||||
Returns organic traffic, paid traffic, ad spend, and top paid
|
||||
keywords. Use to tell the user how aggressive a market is.
|
||||
- **"What software do these businesses already use?"** → `tech_stack_analyze { urls, software_category_id }`.
|
||||
Detects WordPress, Shopify, named competitors, and any custom
|
||||
domains/scripts you pass. Use to find "X businesses use WordPress
|
||||
but lack Y" market gaps.
|
||||
- **"What are customers complaining about?"** → `market_aggregate_insights { category, location }`.
|
||||
Returns top review topics — use the actual words customers use as
|
||||
marketing copy and value-prop seeds.
|
||||
- **"Who are the players in this niche?"** → `market_competitor_research { niche }`.
|
||||
Returns proprietary competitors with pricing AND open-source
|
||||
alternatives that could be forked.
|
||||
|
||||
These are conversational research tools — they don't build anything.
|
||||
Use them BEFORE scaffolding when the user is exploring direction;
|
||||
SKIP them once the user has committed to building.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] In a fresh chat, ask the agent "should I build for dentists or
|
||||
summer camps?". Confirm it proposes using `market_categories_suggest`
|
||||
or `market_aggregate_insights` rather than guessing.
|
||||
|
||||
---
|
||||
|
||||
### [ ] 11. Surface `apps_exec`, `auth_create`, `generate_media`, `storage_*` briefly
|
||||
|
||||
**What:** Several capabilities the agent has are completely absent
|
||||
from the prompt. The agent doesn't know it can run commands inside
|
||||
production containers, deploy real auth servers, generate images, or
|
||||
wire S3 storage. One-line mentions are enough.
|
||||
|
||||
**Where:** `buildSystemPrompt`, scattered across existing sections.
|
||||
|
||||
**How:**
|
||||
|
||||
- Under "Common questions → tools", add:
|
||||
```
|
||||
- "Run a migration / psql in prod" → `apps_exec { uuid, command }`.
|
||||
- "Generate a hero image / illustration" → `generate_media { prompt, type, outputPath }`.
|
||||
- "Wire up file storage / uploads" → `storage_provision` (if not already), then `storage_inject_env { uuid }`.
|
||||
```
|
||||
|
||||
- In the "Decision defaults" section, augment the auth bullet:
|
||||
```
|
||||
- **Auth:** NextAuth with email magic-link for in-app auth.
|
||||
Deploy a separate Pocketbase / Authentik / Keycloak service via
|
||||
`auth_create { provider }` only if the user needs SSO, multi-app
|
||||
SSO, or admin user management.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Ask the agent "can you run a SQL migration on prod?". Confirm
|
||||
it references `apps_exec`.
|
||||
- [ ] Ask "I need a hero image for the landing page". Confirm it
|
||||
references `generate_media`.
|
||||
|
||||
---
|
||||
|
||||
### [ ] 12. Flip the `fs_edit` preference to match the tool's documentation
|
||||
|
||||
**What:** The tool description says `startLine`/`endLine` is
|
||||
preferred and `oldString` is the fallback. The prompt currently says
|
||||
the opposite ("prefer `oldString` for small replacements"). The tool
|
||||
is authoritative — match it.
|
||||
|
||||
**Where:** `buildSystemPrompt`, in the "Iterate" bullet under "Writing
|
||||
code", the `fs_edit` guidance.
|
||||
|
||||
**How:**
|
||||
|
||||
Replace the current `fs_edit` guidance with:
|
||||
|
||||
```
|
||||
- `fs_read` / `fs_write` / `fs_edit { path, oldString, newString, startLine, endLine }`.
|
||||
**For `fs_edit`:** prefer `startLine`/`endLine` (deterministic; never
|
||||
fails on duplicate strings). Use `oldString` only when you cannot
|
||||
read the file first to get line numbers — and when you do, include
|
||||
2-3 lines of surrounding context for uniqueness. If `fs_edit` keeps
|
||||
failing, do NOT escape to `shell_exec` with patch scripts — read
|
||||
the file fresh with `fs_read`, get the line numbers, and try again.
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
- [ ] Confirm the next 3 `fs_edit` calls in the wild use `startLine`/`endLine`,
|
||||
not `oldString`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — Monitoring (after Phases 1-3 land)

Once the changes are in production, watch these for a week. Tune if
the numbers don't move.

### [ ] 13. Track the recovery-summary fire rate

**What:** The `needsRecovery` path runs when a turn ends badly (hit
the round cap, hit a loop, or last tool returned failure). It should
fire on <10% of turns. If it fires more often, the cap is too low or
the model is hitting real bugs.

**How:** Add a metric. In the `needsRecovery` block, before calling
`callVibnChat`, emit:

```ts
console.log("[chat] recovery_fired", {
  turnId,
  reason: loopBreakReason ? "loop"
    : round >= MAX_TOOL_ROUNDS ? "round_cap"
    : assistantText.trim().length === 0 ? "no_text"
    : "tool_failure",
  toolCalls: assistantToolCalls.length,
});
```
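The weekly aggregation could look roughly like the sketch below. It assumes the `recovery_fired` log lines have already been parsed into objects and that the total turn count for the same window is known from elsewhere (for example, from a per-turn log). `summarizeRecovery` and the `RecoveryEvent` shape are illustrative names, not existing code.

```ts
// Illustrative aggregation over parsed `[chat] recovery_fired` records.
// Assumption: parsing the raw logs and counting total turns happen upstream.

type RecoveryReason = "loop" | "round_cap" | "no_text" | "tool_failure";

interface RecoveryEvent {
  turnId: string;
  reason: RecoveryReason;
  toolCalls: number;
}

function summarizeRecovery(totalTurns: number, events: RecoveryEvent[]) {
  const byReason: Record<RecoveryReason, number> = {
    loop: 0,
    round_cap: 0,
    no_text: 0,
    tool_failure: 0,
  };
  const seen = new Set<string>();
  for (const e of events) {
    if (seen.has(e.turnId)) continue; // count each turn at most once
    seen.add(e.turnId);
    byReason[e.reason] += 1;
  }
  const fired = seen.size;
  const fireRate = totalTurns > 0 ? fired / totalTurns : 0;
  const reasons = Object.keys(byReason) as RecoveryReason[];
  const top = reasons.reduce((a, b) => (byReason[b] > byReason[a] ? b : a));
  return {
    fireRate,
    byReason,
    dominant: fired > 0 ? top : null,
    overTarget: fireRate >= 0.1, // the target above is under 10% of turns
  };
}

// Example: 20 turns in the window, recovery fired on 3 of them.
const summary = summarizeRecovery(20, [
  { turnId: "t1", reason: "round_cap", toolCalls: 8 },
  { turnId: "t2", reason: "round_cap", toolCalls: 8 },
  { turnId: "t3", reason: "loop", toolCalls: 5 },
]);
```
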
**Verify:**

- [ ] Aggregate over 1 week. If `round_cap` is the dominant reason
  and the turns look legitimate, raise `MAX_TOOL_ROUNDS` to 10 or 12.
- [ ] If `loop` is dominant, the fingerprinter may need tuning.

---

### [ ] 14. Track conversational-guard fire rate

**What:** The first-turn conversational guard (regex match on user
message → no tools on round 1) is the biggest single behavioral
change in this revision. We want to know how often it fires and
whether it ever causes a problem.

**How:** Before the loop, after computing `firstMessageIsConversational`:

```ts
console.log("[chat] turn_start", {
  turnId,
  firstMessageIsConversational,
  messagePreview: message.trim().slice(0, 80),
});
```
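For the review step, a small helper along these lines may save some manual reading. It assumes the `turn_start` records have been parsed and already filtered down to the first message of each chat. The `BUILD_HINT` regex is a guess at what a "build me X" request looks like and is not the guard's own regex; treat anything it flags as a candidate for manual inspection, not as a confirmed miscategorization.

```ts
// Illustrative review helper over parsed `[chat] turn_start` records.
// Assumption: `events` contains only the first message of each chat.

interface TurnStart {
  turnId: string;
  firstMessageIsConversational: boolean;
  messagePreview: string;
}

// Hypothetical heuristic for "looks like a build request"; tune to real traffic.
const BUILD_HINT = /\b(build|create|make|add|deploy|fix|implement)\b/i;

function reviewGuard(events: TurnStart[]) {
  const fired = events.filter((e) => e.firstMessageIsConversational);
  const fireRate = events.length > 0 ? fired.length / events.length : 0;
  // Guard fired, but the preview reads like a work request: check these by hand.
  const suspicious = fired.filter((e) => BUILD_HINT.test(e.messagePreview));
  return { fireRate, suspicious };
}

const review = reviewGuard([
  { turnId: "a", firstMessageIsConversational: true, messagePreview: "hey, what can you do?" },
  { turnId: "b", firstMessageIsConversational: true, messagePreview: "build me a todo app" },
  { turnId: "c", firstMessageIsConversational: false, messagePreview: "deploy the landing page" },
  { turnId: "d", firstMessageIsConversational: false, messagePreview: "add a pricing section" },
]);
```
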
**Verify:**

- [ ] Aggregate over 1 week. Should fire on ~30-40% of first messages
  in a chat.
- [ ] Spot-check 10 cases where it fired: confirm none were
  legitimate "build me X" requests being miscategorized.

---

### [ ] 15. Track `PATH_OUTSIDE_PROJECT` rejections

**What:** After Fix 2 ships, this error tells you how often the model
was about to write to the wrong path. Should taper toward zero as the
prompt guidance + enforcement settles in.

**How:** Server-side log in `normalizeFsPath` when the project-scoped
rejection fires.

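A rough sketch of where that log line could sit is shown below. The real `normalizeFsPath` signature is not given here, so the parameters, the `[fs] path_rejected` log label, and the in-memory `rejections` array are all assumptions made for illustration. The sketch also assumes POSIX-style absolute project roots.

```ts
// Hypothetical shape; the real `normalizeFsPath` may differ.
interface PathRejection {
  code: "PATH_OUTSIDE_PROJECT";
  projectId: string;
  requestedPath: string;
}

// Kept in memory only so the example is observable; production would rely on logs.
const rejections: PathRejection[] = [];

function normalizeFsPath(projectRoot: string, projectId: string, requestedPath: string): string {
  const root = projectRoot.replace(/\/+$/, "");
  const joined = requestedPath.startsWith("/") ? requestedPath : `${root}/${requestedPath}`;

  // Resolve `.` and `..` segments without touching the filesystem.
  const parts: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      parts.pop();
      continue;
    }
    parts.push(seg);
  }
  const resolved = "/" + parts.join("/");

  if (resolved !== root && !resolved.startsWith(root + "/")) {
    const rejection: PathRejection = { code: "PATH_OUTSIDE_PROJECT", projectId, requestedPath };
    rejections.push(rejection);
    console.log("[fs] path_rejected", rejection); // the metric to count per day
    throw new Error("PATH_OUTSIDE_PROJECT");
  }
  return resolved;
}

// Inside the project: resolves normally.
const okPath = normalizeFsPath("/workspace/proj", "p1", "src/app.ts");

// Escaping the project: logged and rejected.
let rejected = false;
try {
  normalizeFsPath("/workspace/proj", "p1", "../other/secret.txt");
} catch {
  rejected = true;
}
```
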
**Verify:**

- [ ] Week 1: count rejections per day.
- [ ] Week 2: count should be lower (model has internalized the rule
  via the error messages it's seen).

---

## What this checklist deliberately doesn't include

A few things from earlier reviews that I'm intentionally leaving off:

- **Phase-aware behavior.** Phases were removed from the product; the
  prompt no longer references them. No work needed.
- **Codebase summary auto-generation.** Lower priority once Fix 2
  ships (paths can no longer drift) and Fix 8 lands (one `fs_tree`
  per chat instead of nine).
- **Tool history reconstruction for DeepSeek compatibility.** Already
  shipped in the current code via the `_rawToolResults` compact
  summary. No additional work.

Don't add work for these unless a clear failure mode appears in
production after Phases 1-3 land.

@@ -2354,7 +2354,7 @@ function Closing() {
 
         <div className="closing-cta">
           <div className="row">
-            <a href="Beta Signup.html" className="btn btn-primary">
+            <a href="/auth?new=1" className="btn btn-primary">
               Request invite <Arrow />
             </a>
             <a href="#how" className="btn btn-ghost">
@@ -2598,6 +2598,17 @@ function LaunchModal({ prompt, onClose }) {
     return () => window.removeEventListener("keydown", onKey);
   }, [onClose]);
 
+  // Preserve prompt for onboarding seeding (T12)
+  useEffect(() => {
+    if (typeof window !== "undefined" && prompt) {
+      try {
+        localStorage.setItem("vibn:firstName", prompt);
+      } catch (err) {
+        console.error("Failed to save hero prompt to localStorage:", err);
+      }
+    }
+  }, [prompt]);
+
   const [step, setStep] = useState(0);
   useEffect(() => {
     if (step >= 4) return undefined;
@@ -2605,6 +2616,17 @@ function LaunchModal({ prompt, onClose }) {
     return () => clearTimeout(t);
   }, [step]);
 
+  const [redirectCount, setRedirectCount] = useState(3);
+  useEffect(() => {
+    if (step < 4) return undefined;
+    if (redirectCount <= 0) {
+      window.location.href = "/auth";
+      return undefined;
+    }
+    const t = setTimeout(() => setRedirectCount(redirectCount - 1), 1000);
+    return () => clearTimeout(t);
+  }, [step, redirectCount]);
+
   return (
     <div className="modal-backdrop" onClick={onClose}>
       <style>{`
@@ -2693,9 +2715,50 @@ function LaunchModal({ prompt, onClose }) {
           ))}
         </div>
 
+        {step === 4 ? (
+          <div
+            className="modal-actions"
+            style={{
+              marginTop: "24px",
+              display: "flex",
+              flexDirection: "column",
+              gap: "12px",
+              alignItems: "center",
+            }}
+          >
+            <a
+              href="/auth"
+              className="btn btn-primary"
+              style={{
+                width: "100%",
+                height: "48px",
+                display: "inline-flex",
+                alignItems: "center",
+                justifyContent: "center",
+                gap: "8px",
+                fontSize: "15px",
+                fontWeight: "600",
+                textDecoration: "none",
+              }}
+            >
+              Launch Your Workspace <Arrow size={14} />
+            </a>
+            <span
+              style={{
+                fontSize: "11px",
+                color: "var(--fg-faint)",
+                fontFamily: "var(--font-mono)",
+                letterSpacing: "0.04em",
+              }}
+            >
+              Redirecting to registration in {redirectCount}s...
+            </span>
+          </div>
+        ) : (
         <div className="modal-foot">
           No homework · No setup · No new tools to learn
         </div>
+        )}
       </div>
     </div>
   );