feat(ai): implement adaptive 5-dimensional QA rubric for multi-surface evaluation

This commit is contained in:
2026-05-15 14:35:27 -07:00
parent 2db6bee780
commit dea388f99d

View File

@@ -4535,23 +4535,24 @@ async function toolRequestVisualQA(
const fileContent = r.stdout;
const prompt = `You are a strict, world-class Senior Design QA Engineer.
Your job is to evaluate the provided UI code against a strict 5-dimensional rubric and catch "AI slop" before the user sees it.
If the code is flawless and follows the design system perfectly, say "PASS".
If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give actionable critique.
Your job is to evaluate the provided UI code and catch "AI slop" before the user sees it.
## The Rubric (5 Dimensions)
1. Layout & Composition: Are there overlapping text elements? Is the grid broken on mobile? Do containers overflow unexpectedly?
2. Spacing & Padding: Is there enough breathing room? Are paddings consistent? (e.g., tight items should have 4-8px gap, sections should have 24-48px).
3. Contrast & Color: Are hardcoded hex colors (e.g. #FF0000) used instead of Design System CSS variables (e.g. var(--accent), var(--fg))? ALL colors MUST use CSS variables if a design system is active.
4. Hierarchy: Is the primary action obvious? Is secondary text properly muted?
5. Responsiveness: Does the UI stack gracefully on mobile (max-width 760px)? Are \`flex-col\` or \`grid-cols-1\` applied at the right breakpoints?
First, look at the file path and code, and silently classify the surface (e.g., Marketing Page, SaaS Dashboard, Editorial Content, Social Asset).
## The Code
Then, evaluate the code against these 5 adaptive dimensions based on that classification:
1. Philosophy & Intent: Does the layout match the purpose? (Marketing needs persuasion/trust; Tools need density/efficiency; Editorial needs readability).
2. Hierarchy: Is the focal point correct for this surface? (Marketing = CTAs/Value Props; Tools = Primary Actions/Data).
3. Execution: Are padding, alignment, and typography mathematically consistent? MUST use CSS variables/tokens, NO hardcoded colors.
4. Specificity: Does the UI use highly realistic, messy data/copy to prove text wrapping and layout hold up? (Reject Lorem Ipsum, "Metric A", or generic stat-slop).
5. Restraint: Is the signal-to-noise ratio appropriate? (Tools require low cognitive load and quiet colors; Marketing requires decisive, focused flourishes).
## The Code (\`${targetPath}\`)
\`\`\`
${fileContent.slice(0, 15000)}
\`\`\`
Evaluate the code. Reply with "PASS" or a concise bulleted list of fixes.`;
If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give concise actionable critique.
If it is flawless and strictly adheres to these dimensions, reply ONLY with "PASS".`;
try {
const aiResponse = await callVibnChat({