From a4354a5294739708526aa6696cdda98b0ae2cedc Mon Sep 17 00:00:00 2001 From: mawkone Date: Fri, 15 May 2026 14:35:27 -0700 Subject: [PATCH] feat(ai): implement adaptive 5-dimensional QA rubric for multi-surface evaluation --- vibn-frontend/app/api/mcp/route.ts | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/vibn-frontend/app/api/mcp/route.ts b/vibn-frontend/app/api/mcp/route.ts index 3a461bc0..cd0e7584 100644 --- a/vibn-frontend/app/api/mcp/route.ts +++ b/vibn-frontend/app/api/mcp/route.ts @@ -4535,23 +4535,24 @@ async function toolRequestVisualQA( const fileContent = r.stdout; const prompt = `You are a strict, world-class Senior Design QA Engineer. -Your job is to evaluate the provided UI code against a strict 5-dimensional rubric and catch "AI slop" before the user sees it. -If the code is flawless and follows the design system perfectly, say "PASS". -If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give actionable critique. +Your job is to evaluate the provided UI code and catch "AI slop" before the user sees it. -## The Rubric (5 Dimensions) -1. Layout & Composition: Are there overlapping text elements? Is the grid broken on mobile? Do containers overflow unexpectedly? -2. Spacing & Padding: Is there enough breathing room? Are paddings consistent? (e.g., tight items should have 4-8px gap, sections should have 24-48px). -3. Contrast & Color: Are hardcoded hex colors (e.g. #FF0000) used instead of Design System CSS variables (e.g. var(--accent), var(--fg))? ALL colors MUST use CSS variables if a design system is active. -4. Hierarchy: Is the primary action obvious? Is secondary text properly muted? -5. Responsiveness: Does the UI stack gracefully on mobile (max-width 760px)? Are \`flex-col\` or \`grid-cols-1\` applied at the right breakpoints? +First, look at the file path and code, and silently classify the surface (e.g., Marketing Page, SaaS Dashboard, Editorial Content, Social Asset). -## The Code +Then, evaluate the code against these 5 adaptive dimensions based on that classification: +1. Philosophy & Intent: Does the layout match the purpose? (Marketing needs persuasion/trust; Tools need density/efficiency; Editorial needs readability). +2. Hierarchy: Is the focal point correct for this surface? (Marketing = CTAs/Value Props; Tools = Primary Actions/Data). +3. Execution: Are padding, alignment, and typography mathematically consistent? MUST use CSS variables/tokens, NO hardcoded colors. +4. Specificity: Does the UI use highly realistic, messy data/copy to prove text wrapping and layout hold up? (Reject Lorem Ipsum, "Metric A", or generic stat-slop). +5. Restraint: Is the signal-to-noise ratio appropriate? (Tools require low cognitive load and quiet colors; Marketing requires decisive, focused flourishes). + +## The Code (\`${targetPath}\`) \`\`\` ${fileContent.slice(0, 15000)} \`\`\` -Evaluate the code. Reply with "PASS" or a concise bulleted list of fixes.`; +If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give concise actionable critique. +If it is flawless and strictly adheres to these dimensions, reply ONLY with "PASS".`; try { const aiResponse = await callVibnChat({