feat(ai): implement adaptive 5-dimensional QA rubric for multi-surface evaluation
This commit is contained in:
@@ -4535,23 +4535,24 @@ async function toolRequestVisualQA(
|
|||||||
const fileContent = r.stdout;
|
const fileContent = r.stdout;
|
||||||
|
|
||||||
const prompt = `You are a strict, world-class Senior Design QA Engineer.
|
const prompt = `You are a strict, world-class Senior Design QA Engineer.
|
||||||
Your job is to evaluate the provided UI code against a strict 5-dimensional rubric and catch "AI slop" before the user sees it.
|
Your job is to evaluate the provided UI code and catch "AI slop" before the user sees it.
|
||||||
If the code is flawless and follows the design system perfectly, say "PASS".
|
|
||||||
If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give actionable critique.
|
|
||||||
|
|
||||||
## The Rubric (5 Dimensions)
|
First, look at the file path and code, and silently classify the surface (e.g., Marketing Page, SaaS Dashboard, Editorial Content, Social Asset).
|
||||||
1. Layout & Composition: Are there overlapping text elements? Is the grid broken on mobile? Do containers overflow unexpectedly?
|
|
||||||
2. Spacing & Padding: Is there enough breathing room? Are paddings consistent? (e.g., tight items should have 4-8px gap, sections should have 24-48px).
|
|
||||||
3. Contrast & Color: Are hardcoded hex colors (e.g. #FF0000) used instead of Design System CSS variables (e.g. var(--accent), var(--fg))? ALL colors MUST use CSS variables if a design system is active.
|
|
||||||
4. Hierarchy: Is the primary action obvious? Is secondary text properly muted?
|
|
||||||
5. Responsiveness: Does the UI stack gracefully on mobile (max-width 760px)? Are \`flex-col\` or \`grid-cols-1\` applied at the right breakpoints?
|
|
||||||
|
|
||||||
## The Code
|
Then, evaluate the code against these 5 adaptive dimensions based on that classification:
|
||||||
|
1. Philosophy & Intent: Does the layout match the purpose? (Marketing needs persuasion/trust; Tools need density/efficiency; Editorial needs readability).
|
||||||
|
2. Hierarchy: Is the focal point correct for this surface? (Marketing = CTAs/Value Props; Tools = Primary Actions/Data).
|
||||||
|
3. Execution: Are padding, alignment, and typography mathematically consistent? MUST use CSS variables/tokens, NO hardcoded colors.
|
||||||
|
4. Specificity: Does the UI use highly realistic, messy data/copy to prove text wrapping and layout hold up? (Reject Lorem Ipsum, "Metric A", or generic stat-slop).
|
||||||
|
5. Restraint: Is the signal-to-noise ratio appropriate? (Tools require low cognitive load and quiet colors; Marketing requires decisive, focused flourishes).
|
||||||
|
|
||||||
|
## The Code (\`${targetPath}\`)
|
||||||
\`\`\`
|
\`\`\`
|
||||||
${fileContent.slice(0, 15000)}
|
${fileContent.slice(0, 15000)}
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
Evaluate the code. Reply with "PASS" or a concise bulleted list of fixes.`;
|
If it fails ANY criteria, list the exact issues and how to fix them. DO NOT rewrite the whole file, just give concise actionable critique.
|
||||||
|
If it is flawless and strictly adheres to these dimensions, reply ONLY with "PASS".`;
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const aiResponse = await callVibnChat({
|
const aiResponse = await callVibnChat({
|
||||||
|
|||||||
Reference in New Issue
Block a user