fix(chat): never end a turn silent + loop detection + status nudge

The big UX failure: model fires 20 tool calls in silence, persists turn with content_len=0, user has to re-prompt to get any answer. Confirmed in prod (Dr Dave / "are you able to give me a preview url?" thread).

Five changes:

1. Recovery summary now fires on ANY silent-tool-tray turn end (not just MAX_TOOL_ROUNDS): hit the cap, broke a detected loop, OR ended with empty assistantText. Previously the recovery was gated to round-cap only, so voluntary silent stops slipped through.

2. Recovery summary has a deterministic fallback. If Gemini returns empty text on the recovery call, emit a static "ran N tools, didn't reach a clean stopping point" message instead of silently swallowing the empty string. The user always gets something readable.

3. Loop detection: track tool-call fingerprints (name + first 120 chars of args) per turn; if the same fingerprint fires 3× within the last 8 calls, break the loop and surface to user via recovery summary. Kills the dev_server.start → logs → stop → start → ... pattern at its root.

4. Status nudge every 4 silent rounds: inject a synthetic system instruction telling the model to send a one-liner before any more tool calls. The user's only signal of life on long chains.

5. Prompt: soften "don't narrate intent" → "don't narrate SINGLE calls; on chains 3+ deep send a one-liner before each batch". Adds explicit "never end a turn silent" rule.

Also: error-path now uses safeClose() instead of bare controller.close() to honor the streamClosed guard like every other close site.

Made-with: Cursor
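
For reviewers who want to reason about change 3 in isolation, here is a minimal standalone sketch of the fingerprint window check. It is not the code in route.ts: the `ToolCall` interface, the constant names, and the `fingerprint` / `detectLoop` helpers are hypothetical names introduced for illustration, and the real tool-call type may carry more fields.

```typescript
// Hypothetical minimal shape of a tool call; the real type in route.ts may differ.
interface ToolCall {
  name: string;
  args?: unknown;
}

const WINDOW = 8; // number of most recent calls inspected
const THRESHOLD = 3; // repeats inside the window that count as a loop
const ARG_PREFIX = 120; // characters of serialized args kept in the fingerprint

// Fingerprint = tool name + first ARG_PREFIX chars of the JSON-serialized args.
// Non-object args contribute an empty signature.
function fingerprint(tc: ToolCall): string {
  const argSig =
    tc.args !== null && typeof tc.args === 'object'
      ? JSON.stringify(tc.args).slice(0, ARG_PREFIX)
      : '';
  return `${tc.name}|${argSig}`;
}

// Returns the repeated tool name and its count if any fingerprint appears
// THRESHOLD or more times in the last WINDOW entries, otherwise null.
function detectLoop(history: string[]): { name: string; count: number } | null {
  const counts: Record<string, number> = {};
  for (const fp of history.slice(-WINDOW)) {
    counts[fp] = (counts[fp] || 0) + 1;
  }
  for (const fp of Object.keys(counts)) {
    if (counts[fp] >= THRESHOLD) {
      return { name: fp.slice(0, fp.indexOf('|')), count: counts[fp] };
    }
  }
  return null;
}
```

In this sketch, a start → logs → stop cycle trips the check on the third `start`, because all seven calls still fit in the 8-call window. Since `JSON.stringify` follows property insertion order, two calls with the same arguments in a different key order produce different fingerprints, so the check only catches byte-identical repeats.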
2026-04-30 23:18:46 -07:00
parent 6586c8ae1d
commit b395546529
1 changed file with 83 additions and 11 deletions
--- a/app/api/chat/route.ts
+++ b/app/api/chat/route.ts
@@ -122,8 +122,10 @@ You're talking to the owner of the "${workspace}" workspace. They have admin acc
 You are a high-agency product engineer. You own the outcome. Continue until the user's goal is actually resolved unless you're blocked on missing info, proceeding would be unsafe, or the user changes direction. You are not answering questions; you are building with the user. Translate engineering complexity into product momentum.

 ## Voice
- **Don't narrate intent before tool calls.** Skip "Okay, I'll read that file…" — just read it. Reasoning streams as a thinking pill; users see a tool tray. Don't play-by-play.
+- **Don't narrate single tool calls.** Skip "Okay, I'll read that file…" for a one-shot read. The user sees a tool tray; they don't need a play-by-play.
+- **DO send a one-liner before every batch on a long chain.** If you're about to fire 3+ tool calls, or you're already 3+ rounds deep, send a single sentence first: "Starting the dev server now and tailing logs." Then call the tools. The user is staring at silent ✓ pills otherwise — that's the worst UX in the app.
 - **Pack the post-tool summary into 1–3 punchy sentences:** what landed, the specific result the user needs (URL, SHA, env value, error), and the obvious next step. Don't recap every tool — they saw the tray.
+- **Never end a turn silent.** If you ran tools, you owe the user a sentence about what happened. Never finish a turn with content_len = 0.
 - **Have an opinion.** "Postgres or Mongo?" — pick one in a sentence and proceed. Founders need decisions, not menus. List options only if the user asks or tradeoffs genuinely matter.
 - **Push back when it matters.** Refuse "deploy to prod without backups." Suggest Pipedream over n8n once if it fits better, then defer. Yes-machines ship broken software.
 - **Surface adjacent risks unprompted.** Missing env var after a deploy, DNS not propagated yet, autosave hasn't fired in 30 min — say so. You're protecting their work.
@@ -348,6 +350,14 @@ export async function POST(request: Request) {
      };
      clientSignal.addEventListener('abort', onAbort);

+      // Track per-turn signals we use for loop detection and silent-stretch
+      // detection. The model has a strong tendency to grind through a
+      // dozen+ tool calls in total silence (the user just sees ✓ pills
+      // pile up); both safeguards below break that pattern.
+      const toolFingerprints: string[] = [];
+      let roundsSinceText = 0;
+      let loopBreakReason: string | null = null;
+
      try {
        // Tool-calling loop: use non-streaming so thought_signature is
        // always present in the complete response (required by thinking models).
@@ -356,11 +366,25 @@ export async function POST(request: Request) {
          round++;

          const toolDefs = mcp_token ? VIBN_TOOL_DEFINITIONS : [];
-          const resp = await callGeminiChat({ systemPrompt, messages, tools: toolDefs, temperature: 0.7 });
+
+          // Every 4 silent rounds, nudge the model to surface a one-liner
+          // status before continuing. This is the user's only signal of
+          // life when a tool chain runs long.
+          const extraSystem =
+            roundsSinceText >= 4
+              ? '\n\n[STATUS NUDGE] You have run several tool calls without sending the user any text. Before any more tool calls, send ONE short sentence describing what you are currently working on and why. The user is staring at a wall of tool pills and needs a signal of life.'
+              : '';
+
+          const resp = await callGeminiChat({
+            systemPrompt: systemPrompt + extraSystem,
+            messages,
+            tools: toolDefs,
+            temperature: 0.7,
+          });

          if (resp.error) {
            emit({ type: 'error', error: resp.error });
-            controller.close();
+            safeClose();
            return;
          }

@@ -368,6 +392,9 @@ export async function POST(request: Request) {
          if (resp.text) {
            assistantText += resp.text;
            emit({ type: 'text', text: resp.text });
+            roundsSinceText = 0;
+          } else if (resp.toolCalls.length) {
+            roundsSinceText++;
          }

          // Stream the model's reasoning narration as a separate SSE
@@ -394,6 +421,26 @@ export async function POST(request: Request) {
          if (!resp.toolCalls.length) break;
          if (aborted) break;

+          // Loop detection. If the same fingerprint (tool name + first
+          // 120 chars of JSON args) appears 3+ times in the last 8
+          // calls, the user is watching it spin. Bail out and hand
+          // control back via the recovery summary. The classic case:
+          // dev_server.start → logs → stop → start → logs → stop → ...
+          for (const tc of resp.toolCalls) {
+            const argSig =
+              tc.args && typeof tc.args === 'object'
+                ? JSON.stringify(tc.args).slice(0, 120)
+                : '';
+            toolFingerprints.push(`${tc.name}|${argSig}`);
+          }
+          const last8 = toolFingerprints.slice(-8);
+          const counts = new Map<string, number>();
+          for (const fp of last8) counts.set(fp, (counts.get(fp) ?? 0) + 1);
+          const repeated = [...counts.entries()].find(([, n]) => n >= 3);
+          if (repeated) {
+            loopBreakReason = `Same call (${repeated[0].split('|')[0]}) fired ${repeated[1]}× in the last 8 calls`;
+          }
+
          // Execute tool calls and add results
          for (const tc of resp.toolCalls) {
            if (aborted) break;
@@ -411,6 +458,8 @@ export async function POST(request: Request) {
              thoughtSignature: tc.thoughtSignature,
            });
          }
+
+          if (loopBreakReason) break;
        }

        // If the user clicked Stop, surface the cancel marker so the
@@ -426,32 +475,55 @@ export async function POST(request: Request) {
          emit({ type: 'aborted' });
        }

-        // If the loop exited because we hit MAX_TOOL_ROUNDS while the
-        // model still wanted to call tools, the user has only seen a
-        // tray of ✓ icons with no narrative. Force one final no-tools
-        // call so we always end on a human-readable summary.
+        // If the loop ended with the user staring at a tool tray and no
+        // narrative — whether because we hit MAX_TOOL_ROUNDS, broke a
+        // detected loop, or the model voluntarily stopped emitting tools
+        // without ever writing text — force one final no-tools summary
+        // so we never abandon the user with silent ✓ pills. Confirmed
+        // failure mode in prod: turn persisted with content_len=0 and
+        // 20 toolCalls, user had to re-prompt to get any answer.
        const lastTurnHadTools =
          messages.length > 0 &&
          messages[messages.length - 1].role === 'tool';
-        if (!aborted && round >= MAX_TOOL_ROUNDS && lastTurnHadTools) {
+        const needsRecovery =
+          !aborted &&
+          lastTurnHadTools &&
+          (round >= MAX_TOOL_ROUNDS || !!loopBreakReason || assistantText.trim().length === 0);
+
+        if (needsRecovery) {
+          const reason = loopBreakReason
+            ? `LOOP DETECTED: ${loopBreakReason}. Stop trying that approach. `
+            : round >= MAX_TOOL_ROUNDS
+              ? 'You hit the tool-round cap. '
+              : '';
          try {
            const summary = await callGeminiChat({
              systemPrompt:
                systemPrompt +
-                '\n\nYou have just executed a chain of tool calls. Summarize the result for the user in 1-3 sentences. Do NOT call any more tools.',
+                `\n\n[RECOVERY] ${reason}Send the user 1–3 short sentences right now: (a) what you actually accomplished or learned, (b) the specific blocker (last error message verbatim if there is one), (c) what you'll try next OR a question for the user. Do NOT call any tools.`,
              messages,
              tools: [],
              temperature: 0.3,
            });
-            if (summary.text) {
+            if (summary.text && summary.text.trim()) {
              assistantText += summary.text;
              emit({ type: 'text', text: summary.text });
+            } else {
+              // Gemini returned empty — fall back to a deterministic
+              // status so the user never sees silent ✓ pills.
+              const fallback = loopBreakReason
+                ? `\n\nI hit a loop while working on this — ${loopBreakReason}. Want me to try a different approach, or do you want to take a look?`
+                : `\n\nI ran a chain of ${assistantToolCalls.length} tool calls but didn't reach a clean stopping point. Want me to keep going, or take a different angle?`;
+              assistantText += fallback;
+              emit({ type: 'text', text: fallback });
            }
            if (summary.thoughts) {
              emit({ type: 'thinking', text: summary.thoughts });
            }
          } catch {
-            // Don't let a failed summary kill the stream.
+            const fallback = `\n\nI ran ${assistantToolCalls.length} tool calls but the wrap-up failed. Want me to retry, or try a different approach?`;
+            assistantText += fallback;
+            emit({ type: 'text', text: fallback });
          }
        }