Collector → Extraction Flow: Dependency Order

Overview

This document explains the exact order of operations when a user completes the Collector phase and transitions to Extraction Review.


Phase Flow Diagram

User says "that's everything"
         ↓
[1] AI detects readiness
         ↓
[2] Handoff persisted to Firestore
         ↓
[3] Backend extraction triggered (async)
         ↓
[4] Phase transitions to extraction_review
         ↓
[5] Mode resolver detects new phase
         ↓
[6] AI responds in extraction_review_mode

Detailed Step-by-Step

Step 1: User Confirmation

Trigger: The user sends a message such as:

  • "that's everything"
  • "yes, analyze now"
  • "I'm ready"

What happens:

  • Message goes to /api/ai/chat POST handler
  • LLM is called with full conversation history
  • LLM returns structured response with collectorHandoff object

Location: /app/api/ai/chat/route.ts, lines 154-180


Step 2: Handoff Detection

Dependencies:

  • AI's reply.collectorHandoff?.readyForExtraction OR
  • Fallback: AI's reply text contains trigger phrases

What happens:

// Primary: Check structured output
let readyForExtraction = reply.collectorHandoff?.readyForExtraction ?? false;

// Fallback: Check reply text for phrases like "Perfect! Let me analyze"
if (!readyForExtraction && reply.reply) {
  const confirmPhrases = [
    'perfect! let me analyze',
    'perfect! i\'m starting',
    // ... etc
  ];
  const replyLower = reply.reply.toLowerCase();
  readyForExtraction = confirmPhrases.some(phrase => replyLower.includes(phrase));
}

Location: /app/api/ai/chat/route.ts, lines 191-210

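Critical: If neither the structured flag nor the fallback phrase check detects readiness, the flow STOPS here. The handoff is written with readyForNextPhase: false and backend extraction is never triggered.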
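The two checks above can be folded into one pure function, which makes the fallback easy to unit-test in isolation. This is a minimal sketch, not the code in route.ts: the `ChatReply` type, the `detectReadiness` name, and the two-entry phrase list are assumptions for illustration (the real phrase list is longer and is elided in this document).

```typescript
// Hypothetical shape of the structured LLM reply; only the fields used here.
interface ChatReply {
  reply?: string;
  collectorHandoff?: { readyForExtraction?: boolean };
}

// Illustrative subset of the fallback phrases, kept lowercase because the
// reply text is lowercased before matching.
const CONFIRM_PHRASES = ['perfect! let me analyze', "perfect! i'm starting"];

function detectReadiness(reply: ChatReply): boolean {
  // Primary: trust the structured flag when the model sets it.
  if (reply.collectorHandoff?.readyForExtraction === true) {
    return true;
  }
  // Fallback: scan the reply text for a confirmation phrase.
  const replyLower = (reply.reply ?? '').toLowerCase();
  return CONFIRM_PHRASES.some((phrase) => replyLower.includes(phrase));
}

console.log(detectReadiness({ reply: 'Perfect! Let me analyze what you shared.' })); // true
console.log(detectReadiness({ reply: 'Anything else to add?' })); // false
```

Keeping the detection pure means a missed trigger phrase (Issue 1 below) can be reproduced with a single function call instead of a full chat round trip.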


Step 3: Build and Persist Collector Handoff

Dependencies:

  • readyForExtraction === true (from Step 2)
  • Project context data (documents, GitHub, extension status)

What happens:

const handoff: CollectorPhaseHandoff = {
  phase: 'collector',
  readyForNextPhase: readyForExtraction,  // Must be true!
  confidence: readyForExtraction ? 0.9 : 0.5,
  confirmed: {
    hasDocuments: (context.knowledgeSummary.bySourceType['imported_document'] ?? 0) > 0,
    documentCount: context.knowledgeSummary.bySourceType['imported_document'] ?? 0,
    githubConnected: !!context.project.githubRepo,
    githubRepo: context.project.githubRepo,
    extensionLinked: context.project.extensionLinked ?? false,
  },
  // ... etc
};

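// Persist to Firestore.
// Note: set() with { merge: true } treats keys as literal field names, so the
// nested path is written as a nested object rather than a dotted string.
await adminDb.collection('projects').doc(projectId).set(
  { phaseData: { phaseHandoffs: { collector: handoff } } },
  { merge: true }
);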

Location: /app/api/ai/chat/route.ts, lines 212-242

Data written:

  • projects/{projectId}/phaseData.phaseHandoffs.collector
    • readyForNextPhase: true
    • confirmed: { hasDocuments, githubConnected, extensionLinked }

Step 4: Mark Collector Complete

Dependencies:

  • handoff.readyForNextPhase === true (from Step 3)

What happens:

if (handoff.readyForNextPhase) {
  console.log(`[AI Chat] Collector complete - triggering backend extraction`);
  
  // Mark collector as complete
  await adminDb.collection('projects').doc(projectId).update({
    'phaseData.collectorCompletedAt': new Date().toISOString(),
  });
  
  // ... (Step 5 happens next)
}

Location: /app/api/ai/chat/route.ts, lines 252-260

Data written:

  • projects/{projectId}/phaseData.collectorCompletedAt = timestamp

Step 5: Trigger Backend Extraction (Async)

Dependencies:

  • Collector marked complete (from Step 4)

What happens:

// Trigger backend extraction (fire-and-forget: intentionally not awaited)
import('@/lib/server/backend-extractor')
  .then(({ runBackendExtractionForProject }) => runBackendExtractionForProject(projectId))
  .catch((error) => {
    // Handles both a failed dynamic import and a failed extraction run
    console.error(`[AI Chat] Backend extraction failed for project ${projectId}:`, error);
  });

Location: /app/api/ai/chat/route.ts, lines 263-267

Critical: This is asynchronous - the chat response returns BEFORE extraction completes!
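The ordering consequence of not awaiting can be seen in a self-contained sketch. Everything here is a stand-in: `fakeExtraction` replaces runBackendExtractionForProject, `handleChatTurn` replaces the POST handler, and the 10 ms delay is arbitrary.

```typescript
const events: string[] = [];

// Stand-in for the real extraction; the delay simulates LLM and database work.
async function fakeExtraction(projectId: string): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  events.push(`extraction finished for ${projectId}`);
}

function handleChatTurn(projectId: string): string {
  // Fire-and-forget: start the work, attach an error handler, do not await.
  fakeExtraction(projectId).catch((error) => {
    console.error(`extraction failed for ${projectId}:`, error);
  });
  events.push('chat response returned');
  return 'chat response';
}

handleChatTurn('demo-project');
// Only the chat event has been recorded at this point; extraction is still pending.
console.log(events);
```

Because the handler returns first, anything that reads project state right after the chat response should expect the pre-extraction values.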


Step 6: Backend Extraction Runs

Dependencies:

  • Called from Step 5

What happens:

  1. Load project data

    const projectDoc = await adminDb.collection('projects').doc(projectId).get();
    const projectData = projectDoc.data();
    
  2. Load knowledge_items (documents)

    const knowledgeSnapshot = await adminDb
      .collection('knowledge_items')
      .where('projectId', '==', projectId)
      .where('sourceType', '==', 'imported_document')
      .get();
    
  3. Check if empty:

    • If NO documents: Skip document processing and go to substep 4, which builds an empty handoff
    • If HAS documents: Process each document (call LLM, extract insights, write chunks)
  4. Build extraction handoff:

    const extractionHandoff: PhaseHandoff = {
      phase: 'extraction',
      readyForNextPhase: boolean,  // true if insights found, false if no docs
      confidence: number,
      confirmed: { problems, targetUsers, features, constraints, opportunities },
      missing: [...],
      questionsForUser: [...],
      // ...
    };
    
  5. Persist extraction handoff and transition phase:

    await adminDb.collection('projects').doc(projectId).update({
      'phaseData.phaseHandoffs.extraction': extractionHandoff,
      currentPhase: 'extraction_review',  // ← PHASE TRANSITION!
      phaseStatus: 'in_progress',
      'phaseData.extractionCompletedAt': new Date().toISOString(),
    });
    

Location: /lib/server/backend-extractor.ts, entire file

Data written:

  • projects/{projectId}/currentPhase = "extraction_review"
  • projects/{projectId}/phaseData.phaseHandoffs.extraction = extraction results
  • chat_extractions/{id} = per-document extraction data (if documents exist)
  • knowledge_chunks (AlloyDB) = vectorized insights (if documents exist)

Duration: Could take 5-60 seconds depending on document count and size
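The branch in substep 3 matters because both paths must end with a handoff; otherwise the phase-transition write never happens. The sketch below shows one way to produce a handoff for either case. The type names, the `buildExtractionHandoff` and `mergeInsights` helpers, the confidence values, and the wording of the missing/question entries are all assumptions for illustration, not the contents of backend-extractor.ts.

```typescript
// Hypothetical per-document result; field names follow the `confirmed` block above.
interface ExtractedInsights {
  problems: string[];
  targetUsers: string[];
  features: string[];
  constraints: string[];
  opportunities: string[];
}

interface ExtractionHandoffSketch {
  phase: 'extraction';
  readyForNextPhase: boolean;
  confidence: number;
  confirmed: ExtractedInsights;
  missing: string[];
  questionsForUser: string[];
}

const KEYS = ['problems', 'targetUsers', 'features', 'constraints', 'opportunities'] as const;

// Concatenate the insights from every processed document.
function mergeInsights(docs: ExtractedInsights[]): ExtractedInsights {
  const merged: ExtractedInsights = {
    problems: [], targetUsers: [], features: [], constraints: [], opportunities: [],
  };
  for (const doc of docs) {
    for (const key of KEYS) {
      merged[key].push(...doc[key]);
    }
  }
  return merged;
}

function buildExtractionHandoff(docs: ExtractedInsights[]): ExtractionHandoffSketch {
  const confirmed = mergeInsights(docs);
  const insightCount = KEYS.reduce((total, key) => total + confirmed[key].length, 0);
  const hasInsights = insightCount > 0;
  return {
    phase: 'extraction',
    readyForNextPhase: hasInsights, // true if insights found, false if no docs
    confidence: hasInsights ? 0.8 : 0, // placeholder values, not from the source
    confirmed,
    missing: hasInsights ? [] : ['imported documents'],
    questionsForUser: hasInsights ? [] : ['Could you upload a document describing the product?'],
  };
}

// The no-documents path still yields a handoff that can be persisted.
console.log(buildExtractionHandoff([]).readyForNextPhase); // false
```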


Step 7: User Sends Next Message

Dependencies:

  • User sends a new message (e.g., "what did you find?")

What happens:

  1. Mode resolver is called:

    const resolvedMode = await resolveChatMode(projectId);
    
  2. Mode resolver logic (CRITICAL ORDER):

    // PRIORITY: Check explicit phase transitions FIRST
    if (projectData.currentPhase === 'extraction_review' || 
        projectData.currentPhase === 'analyzed') {
      return 'extraction_review_mode';  // ← Returns this!
    }
    
    // These checks are skipped because phase already transitioned:
    if (!hasKnowledge) {
      return 'collector_mode';
    }
    if (hasKnowledge && !hasExtractions) {
      return 'collector_mode';
    }
    
  3. Context builder loads extraction data:

    if (mode === 'extraction_review_mode') {
      context.phaseData.phaseHandoffs.extraction = ...;
      context.extractionSummary = ...;
      // Does NOT load raw documents
    }
    
  4. System prompt selected:

    const systemPrompt = EXTRACTION_REVIEW_V2.prompt;
    // Instructs AI to:
    // - NOT say "processing"
    // - Present extraction results
    // - Ask clarifying questions
    
  5. AI responds in extraction_review_mode

Location:

  • /lib/server/chat-mode-resolver.ts (mode resolution)
  • /lib/server/chat-context.ts (context building)
  • /lib/ai/prompts/extraction-review.ts (system prompt)
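The priority order in the mode resolver can be expressed as a pure function, which makes the ordering testable without Firestore. This is a sketch under assumptions: `ModeInputs` and `resolveMode` are hypothetical names, and the final branch (knowledge and extractions both present, phase not yet transitioned) is a guess, since this document does not show what the real resolver returns there.

```typescript
type ChatMode = 'collector_mode' | 'extraction_review_mode';

// Hypothetical inputs; the real resolver reads these from Firestore.
interface ModeInputs {
  currentPhase?: string;
  hasKnowledge: boolean;
  hasExtractions: boolean;
}

function resolveMode({ currentPhase, hasKnowledge, hasExtractions }: ModeInputs): ChatMode {
  // PRIORITY: an explicit phase transition wins over any data-presence check.
  if (currentPhase === 'extraction_review' || currentPhase === 'analyzed') {
    return 'extraction_review_mode';
  }
  // Data-presence checks only apply while the phase has not transitioned.
  if (!hasKnowledge || !hasExtractions) {
    return 'collector_mode';
  }
  // Assumption: not shown in the source document.
  return 'extraction_review_mode';
}

// A project with no documents still reaches review mode once the phase is set.
console.log(
  resolveMode({ currentPhase: 'extraction_review', hasKnowledge: false, hasExtractions: false }),
); // extraction_review_mode
```

If the `!hasKnowledge` check ran first, the example above would return collector_mode, which is the failure described under Issue 3.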

Critical Dependencies

For handoff to trigger:

  1. AI must return readyForExtraction: true OR its reply must contain a trigger phrase
  2. Firestore must persist phaseData.phaseHandoffs.collector

For backend extraction to run:

  1. handoff.readyForNextPhase === true
  2. runBackendExtractionForProject() must be called

For phase transition:

  1. Backend extraction must complete successfully
  2. Firestore must write currentPhase: 'extraction_review'

For mode to switch to extraction_review:

  1. currentPhase === 'extraction_review' in Firestore
  2. Mode resolver must check currentPhase BEFORE checking hasKnowledge

For AI to stop hallucinating:

  1. Mode must be extraction_review_mode (not collector_mode)
  2. System prompt must be EXTRACTION_REVIEW_V2
  3. Context must include phaseData.phaseHandoffs.extraction

What Can Go Wrong?

Issue 1: Handoff doesn't trigger

  • Symptom: AI keeps asking for more materials
  • Cause: readyForExtraction is false
  • Fix: Check fallback phrase detection is working

Issue 2: Backend extraction exits early

  • Symptom: Phase stays as collector, no extraction handoff
  • Cause: No documents were uploaded and the empty handoff was not created, so the phase-transition write never ran
  • Fix: Ensure empty handoff logic runs (lines 58-93 in backend-extractor.ts)

Issue 3: Mode stays as collector_mode

  • Symptom: projectPhase: "extraction_review" but mode: "collector_mode"
  • Cause: Mode resolver checking !hasKnowledge before currentPhase
  • Fix: Reorder mode resolver logic (priority to currentPhase)

Issue 4: AI still says "processing"

  • Symptom: AI says "I'm analyzing..." in extraction_review
  • Cause: Wrong system prompt being used
  • Fix: Verify mode is extraction_review_mode, not collector_mode
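The four issues fail in a fixed order, so a small triage helper can point at the first broken link. This is only a sketch: `FlowSnapshot` and `diagnoseFlow` are hypothetical, and the snapshot fields would need to be filled in by hand from Firestore and the chat API response.

```typescript
// Hypothetical snapshot of the state gathered while debugging.
interface FlowSnapshot {
  collectorReady: boolean;          // phaseHandoffs.collector.readyForNextPhase
  extractionHandoffExists: boolean; // phaseHandoffs.extraction is present
  currentPhase: string;             // projects/{projectId}.currentPhase
  mode: string;                     // mode returned by the chat API
  aiSaysProcessing: boolean;        // reply contains "processing" or "analyzing"
}

function diagnoseFlow(snapshot: FlowSnapshot): string {
  if (!snapshot.collectorReady) {
    return 'Issue 1: handoff did not trigger';
  }
  if (snapshot.currentPhase !== 'extraction_review' || !snapshot.extractionHandoffExists) {
    return 'Issue 2: backend extraction exited early or has not finished';
  }
  if (snapshot.mode !== 'extraction_review_mode') {
    return 'Issue 3: mode stayed as collector_mode';
  }
  if (snapshot.aiSaysProcessing) {
    return 'Issue 4: wrong system prompt in use';
  }
  return 'ok';
}

console.log(
  diagnoseFlow({
    collectorReady: true,
    extractionHandoffExists: true,
    currentPhase: 'extraction_review',
    mode: 'collector_mode',
    aiSaysProcessing: true,
  }),
); // Issue 3: mode stayed as collector_mode
```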

Testing Checklist

To verify the full flow works:

  1. Create new project
  2. AI welcomes user with collector checklist
  3. User connects GitHub OR uploads docs
  4. User says "that's everything"
  5. Check Firestore: phaseHandoffs.collector.readyForNextPhase === true
  6. Wait for async extraction to finish (5-60 seconds, depending on document count and size)
  7. Check Firestore: currentPhase === "extraction_review"
  8. Check Firestore: phaseHandoffs.extraction exists
  9. User sends message: "what did you find?"
  10. API returns mode: "extraction_review_mode"
  11. AI presents extraction results (or asks for missing info)
  12. AI does NOT say "processing" or "analyzing"
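Since extraction time varies, polling for the phase change is more reliable in an automated test than a fixed sleep. The helper below is a generic sketch: `waitFor` is a hypothetical name, the default timeout and interval are arbitrary, and `fakePhaseCheck` stands in for a real check that would read currentPhase from Firestore.

```typescript
// Poll `check` until it resolves to true or the timeout elapses.
async function waitFor(
  check: () => Promise<boolean>,
  { timeoutMs = 60000, intervalMs = 1000 } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) {
      return true;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Simulated check: reports the phase transition on the third poll.
let polls = 0;
const fakePhaseCheck = async (): Promise<boolean> => {
  polls += 1;
  return polls >= 3;
};

waitFor(fakePhaseCheck, { timeoutMs: 1000, intervalMs: 10 }).then((ready) => {
  console.log(ready ? 'phase transitioned' : 'timed out');
});
```

A false result after the timeout maps to Issue 2 above: the extraction either failed or never wrote the phase transition.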

Date

November 17, 2025