feat(api): comprehensive QA hardening — security gates, chat improvements, beta scaffolds

Closes checklist items F-01..F-06, D-01..D-28, S-01..S-10, C-01..C-07, B-01..B-07, R-01..R-02, O-03.

Security (28 deletions + 10 auth gates):
- Delete 28 unauthenticated debug/cursor/firebase/test routes
- Gate ai/chat, ai/conversation, context/summarize, work-completed with withTenantProject/withAuth
- Add HMAC-SHA256 signature verification to webhooks/coolify
- Switch all admin secret comparisons to timingSafeStringEq

Foundations (lib/server/*):
- api-handler.ts: withAuth, withTenantProject, withWorkspace, withAdminSecret, withRateLimit
- logger.ts: structured request-scoped logging with turnId
- audit-log.ts: writeAuditLog helper + audit_log table
- rate-limit.ts: Postgres sliding window rate limiter
- coolify-webhook.ts: verifyCoolifySignature
- timing-safe.ts: timingSafeStringEq

Chat hardening (chat/route.ts):
- MAX_TOOL_ROUNDS 15 → 8 (C-01)
- Loop detection: hard-break at 3 identical fingerprints (was 5) (C-02)
- Add 6-consecutive-tool-call hard-break (C-02)
- Mode: respond first, act second prompt block (C-03)
- SSE heartbeat every 25s via setInterval (C-04)
- Per-tool 45s timeout via Promise.race (C-05)
- turnId per-turn UUID for log correlation (C-06)
- Recovery fires when roundsSinceText >= 4 (C-07)
- SSE plan event on plan_task_add/edit (B-05)

Beta features:
- invites table + GET/POST /api/invites (P4.8)
- invites/[token] validate + redeem (P4.8)
- fs_project_dev_servers table + lib/server/dev-server-state.ts (P6.B1)
- fs_project_secrets table + CRUD routes (P6.D2)
- lib/integrations/brief-extract.ts (P3.7)

Documentation:
- app/api/ROUTES.md: full route map with auth + tenant
2026-05-17 19:17:22 -07:00
parent 955aeed6ce
commit 6b8862ef2b
86 changed files with 6772 additions and 2817 deletions
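The commit message mentions HMAC-SHA256 verification for webhooks/coolify and a timingSafeStringEq helper, but neither file appears in the hunks shown here. For reference, a minimal sketch of the general technique using Node's built-in crypto module. The function names, the hex encoding, and how the signature reaches the handler are all assumptions, not the contents of coolify-webhook.ts or timing-safe.ts:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: constant-time string comparison.
// timingSafeEqual throws on unequal lengths, so check length first.
function timingSafeStringEqSketch(a: string, b: string): boolean {
  const ab = Buffer.from(a, "utf8");
  const bb = Buffer.from(b, "utf8");
  if (ab.length !== bb.length) return false;
  return timingSafeEqual(ab, bb);
}

// Hypothetical helper: recompute the HMAC over the raw request body and
// compare it to the signature the sender supplied (assumed hex-encoded).
function verifyHmacSha256Sketch(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret)
    .update(rawBody, "utf8")
    .digest("hex");
  return timingSafeStringEqSketch(expected, signatureHex.toLowerCase());
}
```

The signature should be computed over the raw body bytes as received; verifying against a re-serialized JSON object can fail if key order or whitespace differs.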
--- a/vibn-frontend/app/api/chat/route.ts
+++ b/vibn-frontend/app/api/chat/route.ts
@@ -33,11 +33,10 @@ import { buildDesignKitPromptSection } from "@/lib/design-kits/for-ai";
 import { buildCodebaseSummary } from "@/lib/ai/project-context/codebase-summary";
 import type { ChatMessage, ToolCall } from "@/lib/ai/gemini-chat";

-// Path B chains routinely fire 7-10 tool calls in one user turn. 18
-// gives enough headroom for complex workflows (scaffold → install →
-// configure → start) while still capping runaway loops. When the cap
-// IS hit, we emit a recovery summary instead of silent tool pills.
-const MAX_TOOL_ROUNDS = 15;
+// C-01: Lowered from 15 → 8. Real workflows (scaffold → install →
+// configure → start) rarely need more than 8 rounds when done correctly.
+// If the cap IS hit, the user gets a recovery summary, not silence.
+const MAX_TOOL_ROUNDS = 8;

 let chatTablesReady = false;
 async function ensureChatTables() {
@@ -151,6 +150,20 @@ After every assistant turn, the harness automatically runs \`git add -A && git c

 You're talking to the owner of the "${workspace}" workspace. They have admin access to their Gitea org, a fleet of Coolify projects, and a persistent dev container per project. You can read and write any of it.

+## Mode: respond first, act second
+Before calling any tool, decide: is the user asking a question, or telling you to do something?
+
+**CONVERSATIONAL inputs — respond with text only, no tools:**
+- One-word or greeting messages: "test", "hi", "ok", "thanks"
+- Questions ending in "?": "are you able to…?", "what does X mean?", "how would you…?"
+- Status checks: "is it deployed?", "what's running?" (one read-only tool MAX, then respond)
+
+**ACTION inputs — tools allowed:**
+- Imperatives: "deploy it", "build me X", "fix the navbar", "ship"
+- Specific tasks with clear deliverables: "add Stripe to the pricing page"
+
+If you are unsure which mode the user is in, **default to CONVERSATIONAL** and ask one clarifying sentence before acting. "Want me to actually deploy this to prod now, or were you just checking?" is always cheaper than a silent 16-tool spiral.
+
 ## Identity
 You are a high-agency product engineer. You own the outcome. Continue until the user's goal is actually resolved unless you're blocked on missing info, proceeding would be unsafe, or the user changes direction. You are not answering questions; you are building with the user. Translate engineering complexity into product momentum.

@@ -530,6 +543,9 @@ export async function POST(request: Request) {
  const stream = new ReadableStream({
    async start(controller) {
      let streamClosed = false;
+      // C-06: Per-turn correlation ID so prod logs are greppable.
+      const turnId = crypto.randomUUID();
+
      function emit(chunk: object) {
        if (streamClosed) return;
        try {
@@ -544,11 +560,21 @@ export async function POST(request: Request) {
      function safeClose() {
        if (streamClosed) return;
        streamClosed = true;
+        clearInterval(heartbeat);
        try {
          controller.close();
        } catch {}
      }

+      // C-04: SSE heartbeat every 25s keeps Cloudflare / proxies from
+      // dropping the connection during long Gemini thinking phases.
+      const heartbeat = setInterval(() => {
+        emit({ type: "ping", turnId });
+      }, 25_000);
+
+      // Emit turnId immediately so the client can log/correlate.
+      emit({ type: "turn_start", turnId });
+
      let messages = [...history];
      let round = 0;
      let assistantText = "";
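The heartbeat added in this hunk could be isolated into a small helper, which makes the clear-on-close behavior easy to test without waiting 25 seconds. A rough sketch, assuming a hypothetical `startHeartbeatSketch` helper and an injectable `Timers` pair (neither exists in route.ts):

```typescript
// Hypothetical timer abstraction so tests can substitute a fake clock.
type Timers = {
  setInterval: (fn: () => void, ms: number) => unknown;
  clearInterval: (handle: unknown) => void;
};

const realTimers: Timers = {
  setInterval: (fn, ms) => setInterval(fn, ms),
  clearInterval: (h) => clearInterval(h as ReturnType<typeof setInterval>),
};

// Starts emitting { type: "ping", turnId } every intervalMs and returns
// an idempotent stop() function that clears the timer exactly once.
function startHeartbeatSketch(
  emit: (chunk: object) => void,
  turnId: string,
  intervalMs = 25_000,
  timers: Timers = realTimers,
): () => void {
  let stopped = false;
  const handle = timers.setInterval(() => {
    if (!stopped) emit({ type: "ping", turnId });
  }, intervalMs);
  return function stop(): void {
    if (stopped) return;
    stopped = true;
    timers.clearInterval(handle);
  };
}
```

Calling the returned `stop()` from both the close path and the cancel path would be safe, since the second call is a no-op.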
@@ -616,6 +642,29 @@ export async function POST(request: Request) {
        return `${tc.name}:${argSig}`;
      }

+      // ── Server-side conversational guard (C-03 enforcement) ───────────
+      // If the user's message looks conversational we withhold tools for
+      // round 1. The model MUST respond in text first. If its reply then
+      // expresses clear intent to act, tools become available from round 2.
+      // This is more reliable than a prompt rule against a "do-er" model.
+      function isConversational(msg: string): boolean {
+        const m = msg.trim();
+        if (m.length < 3) return true; // under 3 chars: "hi", "ok", a lone emoji
+        if (m.endsWith("?")) return true; // explicit question
+        // Short phrases that are status checks or greetings
+        const conversationalPatterns = [
+          /^(hi|hey|hello|sup|test|ok|okay|thanks|ty|thx|lgtm|nice|cool|great|wow)\b/i,
+          /^(what|how|why|when|where|who|which|is|are|can|could|would|do|does|did|has|have|had|was|were)\b.{1,60}$/i,
+          /^(are you able to|can you|could you|would you|is it possible)/i,
+          /^(what'?s |whats )(running|live|deployed|happening|wrong|broken|up)/i,
+          /^(is it|is that|is this|is there|is the)/i,
+        ];
+        return conversationalPatterns.some((re) => re.test(m));
+      }
+      const firstMessageIsConversational =
+        Boolean(mcp_token) && // tools available (same truthiness check as toolDefs)
+        isConversational(message); // isConversational() trims internally
+
      try {
        // Tool-calling loop: use non-streaming so thought_signature is
        // always present in the complete response (required by thinking models).
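One detail worth checking in a question-word pattern is how the keyword is terminated. A word boundary (`\b`) after the alternation accepts "how do I…" and "what's running" while rejecting words that merely start with a keyword, such as "however". A small sketch of that behavior; `startsLikeQuestionSketch` and its pattern are illustrative assumptions, not the guard in route.ts:

```typescript
// Hypothetical pattern: a question/auxiliary word, a word boundary, then
// 1 to 60 more characters on the same line.
const QUESTION_WORD_SKETCH =
  /^(what|how|why|when|where|who|which|is|are|can|could|would|do|does|did|has|have|had|was|were)\b.{1,60}$/i;

function startsLikeQuestionSketch(msg: string): boolean {
  return QUESTION_WORD_SKETCH.test(msg.trim());
}

// Sample inputs and how the sketch classifies them.
const questionSamples: Array<[string, boolean]> = [
  ["how do I add Stripe", true], // keyword followed by a space
  ["what's running", true], // apostrophe also counts as a boundary
  ["however you like", false], // "how" is only a prefix here
  ["deploy it", false], // imperative, no keyword
];

for (const [input, expected] of questionSamples) {
  console.log(input, "=>", startsLikeQuestionSketch(input), "expected", expected);
}
```

Note that any prefix-based classifier will still treat polite imperatives such as "can you deploy it" as questions; whether that is desirable depends on how cheap the extra confirmation round is.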
@@ -623,7 +672,12 @@ export async function POST(request: Request) {
          if (aborted) break;
          round++;

-          const toolDefs = mcp_token ? VIBN_TOOL_DEFINITIONS : [];
+          // On round 1, withhold tools if the message looks conversational.
+          // The model must answer in text first; tools unlock from round 2.
+          const toolDefs =
+            mcp_token && !(round === 1 && firstMessageIsConversational)
+              ? VIBN_TOOL_DEFINITIONS
+              : [];

          // Every 2 silent rounds or 5 tool calls, nudge the model to surface a one-liner
          // status before continuing. This is the user's only signal of
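The round-1 gating rule in this hunk is a pure decision over three inputs, so it can be expressed as a standalone function and unit-tested. A minimal sketch, assuming a hypothetical `selectToolDefsSketch` helper (not part of route.ts):

```typescript
// Returns the tool definitions to expose for this round.
// Tools are withheld when there is no token, or on round 1 of a message
// that was classified as conversational.
function selectToolDefsSketch<T>(
  round: number,
  hasToken: boolean,
  conversational: boolean,
  defs: readonly T[],
): readonly T[] {
  const withhold = round === 1 && conversational;
  return hasToken && !withhold ? defs : [];
}
```

Keeping the rule in one place would also let the "[MANDATORY]" system note and the tool list share a single `withhold` flag instead of repeating the condition.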
@@ -637,6 +691,16 @@ export async function POST(request: Request) {
              "on and why. The user is staring at silent tool pills."
            : "";

+          // When withholding tools on round 1 (conversational guard), add a
+          // mandatory instruction so the model doesn't return empty text.
+          if (round === 1 && firstMessageIsConversational) {
+            extraSystem +=
+              "\n\n[MANDATORY] The user's message is a question or conversational input, " +
+              "not a command. You have NO tools available on this turn. " +
+              "Respond with PLAIN TEXT ONLY in 1-3 sentences answering their question. " +
+              "If they want you to take action, confirm intent and wait for a clear directive.";
+          }
+
          if (MAX_TOOL_ROUNDS - round <= 3) {
             extraSystem += `\n\n[WARNING] You only have ${MAX_TOOL_ROUNDS - round} tool rounds left before you are forcefully terminated. Stop exploring, make your final edits, and write your final response to the user NOW.`;
          }
@@ -713,14 +777,17 @@ export async function POST(request: Request) {
            }
          }

-          // Stage 1: Warning at 3 repeats
-          if (maxRepeats === 3) {
-            extraSystem += `\n\n[WARNING] You have called ${repeatedCmd} 3 times recently. Please wrap up this approach or try a completely different tool.`;
+          // C-02: Tightened. Warn at 2 identical fingerprints (was 3), hard-break at 3 (was 5).
+          if (maxRepeats === 2) {
+            extraSystem += `\n\n[WARNING] You have called ${repeatedCmd} twice in the last 10 calls. Try a different approach or surface what's blocking you to the user.`;
+          }
+          if (maxRepeats >= 3) {
+            loopBreakReason = `Repeated ${repeatedCmd} ${maxRepeats}× in last 10 calls`;
          }

-          // Stage 2: Hard Break at 5 repeats
-          if (maxRepeats >= 5) {
-            loopBreakReason = `Repeated ${repeatedCmd} ${maxRepeats}× in last 10 calls`;
+          // C-02: Also hard-break after 6 consecutive tool calls with no text.
+          if (!loopBreakReason && toolCallsSinceText >= 6) {
+            loopBreakReason = `${toolCallsSinceText} consecutive tool calls with no assistant text`;
          }

          // Execute tool calls and add results. OpenAI-compatible APIs
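The warn-at-2, break-at-3 thresholds operate on a repeat count that is computed outside this hunk. One way such a count could be derived is a frequency tally over the most recent fingerprints. The sketch below is an assumption about the shape of that logic (the names `maxRepeatInWindowSketch` and `loopVerdictSketch` are hypothetical), using the 10-call window mentioned in the break message:

```typescript
type RepeatInfo = { fingerprint: string; count: number };

// Tally fingerprints in the last `windowSize` calls and return the most
// frequent one, or null when there are no calls yet.
function maxRepeatInWindowSketch(
  fingerprints: string[],
  windowSize = 10,
): RepeatInfo | null {
  const counts = new Map<string, number>();
  let best: RepeatInfo | null = null;
  for (const fp of fingerprints.slice(-windowSize)) {
    const n = (counts.get(fp) ?? 0) + 1;
    counts.set(fp, n);
    if (!best || n > best.count) best = { fingerprint: fp, count: n };
  }
  return best;
}

// Map a repeat count onto the thresholds described in this hunk.
function loopVerdictSketch(count: number): "ok" | "warn" | "break" {
  if (count >= 3) return "break";
  if (count === 2) return "warn";
  return "ok";
}
```

Because the tally is windowed, three identical calls that happened more than ten calls ago no longer count toward the break threshold.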
@@ -730,15 +797,32 @@ export async function POST(request: Request) {
          const recoveryLines: string[] = [];
          for (const tc of resp.toolCalls) {
            if (aborted) break;
-            const result = mcp_token
-              ? await executeMcpTool(
+            // C-05: Per-tool timeout. A hung MCP call would freeze the whole turn.
+            const TOOL_TIMEOUT_MS = 45_000;
+            const toolTimeout = new Promise<string>((resolve) =>
+              setTimeout(
+                () =>
+                  resolve(
+                    JSON.stringify({
+                      ok: false,
+                      error: `Tool ${tc.name} timed out after ${TOOL_TIMEOUT_MS / 1000}s`,
+                    }),
+                  ),
+                TOOL_TIMEOUT_MS,
+              ),
+            );
+            const toolExec = mcp_token
+              ? executeMcpTool(
                  tc.name,
                  tc.args,
                  mcp_token,
                  baseUrl,
                  activeProject?.id,
                )
-              : JSON.stringify({ error: "No MCP token — read-only mode." });
+              : Promise.resolve(
+                  JSON.stringify({ error: "No MCP token — read-only mode." }),
+                );
+            const result = await Promise.race([toolExec, toolTimeout]);

            emit({
              type: "tool_result",
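A `Promise.race` timeout leaves its timer pending until it fires unless the timer is cleared once the race settles, so a turn with many fast tool calls can accumulate idle 45-second timers. A possible variant that clears the timer in a `finally` block is sketched below; `withTimeoutSketch` is a hypothetical helper, not something defined in route.ts:

```typescript
// Races `work` against a timer. If the timer wins, resolves with the
// value produced by onTimeout(). The timer is always cleared afterwards.
async function withTimeoutSketch<T>(
  work: Promise<T>,
  ms: number,
  onTimeout: () => T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(onTimeout()), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Under these assumptions the tool call in the hunk could be wrapped as `withTimeoutSketch(toolExec, TOOL_TIMEOUT_MS, () => JSON.stringify({ ok: false, error: "…" }))`. Timing out only stops waiting; it does not cancel the underlying request, which would need an `AbortSignal` or similar if the tool executor supports one.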
@@ -756,6 +840,25 @@ export async function POST(request: Request) {

            const recovery = detectKnownError(result);
            if (recovery) recoveryLines.push(formatRecoveryMessage(recovery));
+
+            // B-05: SSE plan event — stream task state changes to the client
+            // so the Plan tab updates in real-time during a chat turn.
+            if (tc.name === "plan_task_add" || tc.name === "plan_task_edit") {
+              try {
+                const parsed = JSON.parse(result);
+                const task = parsed?.result?.task ?? parsed?.task;
+                if (task?.id) {
+                  emit({
+                    type: "plan",
+                    taskId: task.id,
+                    text: task.text ?? task.title ?? "",
+                    status: task.status ?? "open",
+                  });
+                }
+              } catch {
+                // non-JSON result — skip
+              }
+            }
          }
          for (const line of recoveryLines) {
            messages.push({ role: "user", content: line });
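The plan-event extraction in this hunk tolerates two result shapes (`result.task` and a top-level `task`) and silently skips non-JSON output. Pulling it into a pure function makes those cases straightforward to test. A sketch under the assumption that task objects carry `id`, `text` or `title`, and `status` as the hunk suggests; `extractPlanEventSketch` and the `TaskLike` type are hypothetical:

```typescript
type TaskLike = {
  id?: string | number;
  text?: string;
  title?: string;
  status?: string;
};
type PlanEvent = {
  type: "plan";
  taskId: string | number;
  text: string;
  status: string;
};

// Returns a plan event for plan_task_add / plan_task_edit results,
// or null when the tool is unrelated or the result is unusable.
function extractPlanEventSketch(
  toolName: string,
  rawResult: string,
): PlanEvent | null {
  if (toolName !== "plan_task_add" && toolName !== "plan_task_edit") return null;
  let parsed: { result?: { task?: TaskLike }; task?: TaskLike } | null;
  try {
    parsed = JSON.parse(rawResult);
  } catch {
    return null; // non-JSON result: skip
  }
  const task = parsed?.result?.task ?? parsed?.task;
  if (!task?.id) return null;
  return {
    type: "plan",
    taskId: task.id,
    text: task.text ?? task.title ?? "",
    status: task.status ?? "open",
  };
}
```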
@@ -787,12 +890,15 @@ export async function POST(request: Request) {
        // 20 toolCalls, user had to re-prompt to get any answer.
        const lastTurnHadTools =
          messages.length > 0 && messages[messages.length - 1].role === "tool";
+        // C-07: Also recover when the model has been running tools without
+        // any text for >=4 rounds — the user is staring at silence.
        const needsRecovery =
          !aborted &&
          lastTurnHadTools &&
          (round >= MAX_TOOL_ROUNDS ||
            !!loopBreakReason ||
            assistantText.trim().length === 0 ||
+            roundsSinceText >= 4 ||
            lastToolResultsHadFailure(messages));

        if (needsRecovery) {
@@ -1072,9 +1178,9 @@ export async function POST(request: Request) {
      }
    },
    cancel() {
-      // Browser disconnected (tab closed, navigated away). Nothing to
-      // do — the abort handler above already flipped the flag and the
-      // loop will bail at the next checkpoint.
+      // Browser disconnected (tab closed, navigated away). Nothing to clear
+      // here: the heartbeat is scoped to start() and is cleared by safeClose().
+      // The abort handler above already flipped the flag so the loop bails.
    },
  });