# Collector → Extraction Flow: Dependency Order ## Overview This document explains the **exact order of operations** when a user completes the Collector phase and transitions to Extraction Review. --- ## Phase Flow Diagram ``` User says "that's everything" ↓ [1] AI detects readiness ↓ [2] Handoff persisted to Firestore ↓ [3] Backend extraction triggered (async) ↓ [4] Phase transitions to extraction_review ↓ [5] Mode resolver detects new phase ↓ [6] AI responds in extraction_review_mode ``` --- ## Detailed Step-by-Step ### **Step 1: User Confirmation** **Trigger:** User sends message like: - "that's everything" - "yes, analyze now" - "I'm ready" **What happens:** - Message goes to `/api/ai/chat` POST handler - LLM is called with full conversation history - LLM returns structured response with `collectorHandoff` object **Location:** `/app/api/ai/chat/route.ts`, lines 154-180 --- ### **Step 2: Handoff Detection** **Dependencies:** - AI's `reply.collectorHandoff?.readyForExtraction` OR - Fallback: AI's reply text contains trigger phrases **What happens:** ```typescript // Primary: Check structured output let readyForExtraction = reply.collectorHandoff?.readyForExtraction ?? false; // Fallback: Check reply text for phrases like "Perfect! Let me analyze" if (!readyForExtraction && reply.reply) { const confirmPhrases = [ 'perfect! let me analyze', 'perfect! i\'m starting', // ... etc ]; const replyLower = reply.reply.toLowerCase(); readyForExtraction = confirmPhrases.some(phrase => replyLower.includes(phrase)); } ``` **Location:** `/app/api/ai/chat/route.ts`, lines 191-210 **Critical:** If this doesn't detect readiness, the flow STOPS here. --- ### **Step 3: Build and Persist Collector Handoff** **Dependencies:** - `readyForExtraction === true` (from Step 2) - Project context data (documents, GitHub, extension status) **What happens:** ```typescript const handoff: CollectorPhaseHandoff = { phase: 'collector', readyForNextPhase: readyForExtraction, // Must be true! confidence: readyForExtraction ? 0.9 : 0.5, confirmed: { hasDocuments: (context.knowledgeSummary.bySourceType['imported_document'] ?? 0) > 0, documentCount: context.knowledgeSummary.bySourceType['imported_document'] ?? 0, githubConnected: !!context.project.githubRepo, githubRepo: context.project.githubRepo, extensionLinked: context.project.extensionLinked ?? false, }, // ... etc }; // Persist to Firestore await adminDb.collection('projects').doc(projectId).set( { 'phaseData.phaseHandoffs.collector': handoff }, { merge: true } ); ``` **Location:** `/app/api/ai/chat/route.ts`, lines 212-242 **Data written:** - `projects/{projectId}/phaseData.phaseHandoffs.collector` - `readyForNextPhase: true` - `confirmed: { hasDocuments, githubConnected, extensionLinked }` --- ### **Step 4: Mark Collector Complete** **Dependencies:** - `handoff.readyForNextPhase === true` (from Step 3) **What happens:** ```typescript if (handoff.readyForNextPhase) { console.log(`[AI Chat] Collector complete - triggering backend extraction`); // Mark collector as complete await adminDb.collection('projects').doc(projectId).update({ 'phaseData.collectorCompletedAt': new Date().toISOString(), }); // ... (Step 5 happens next) } ``` **Location:** `/app/api/ai/chat/route.ts`, lines 252-260 **Data written:** - `projects/{projectId}/phaseData.collectorCompletedAt` = timestamp --- ### **Step 5: Trigger Backend Extraction (Async)** **Dependencies:** - Collector marked complete (from Step 4) **What happens:** ```typescript // Trigger backend extraction (async - don't await) import('@/lib/server/backend-extractor').then(({ runBackendExtractionForProject }) => { runBackendExtractionForProject(projectId).catch((error) => { console.error(`[AI Chat] Backend extraction failed for project ${projectId}:`, error); }); }); ``` **Location:** `/app/api/ai/chat/route.ts`, lines 263-267 **Critical:** This is **asynchronous** - the chat response returns BEFORE extraction completes! --- ### **Step 6: Backend Extraction Runs** **Dependencies:** - Called from Step 5 **What happens:** 1. **Load project data** ```typescript const projectDoc = await adminDb.collection('projects').doc(projectId).get(); const projectData = projectDoc.data(); ``` 2. **Load knowledge_items (documents)** ```typescript const knowledgeSnapshot = await adminDb .collection('knowledge_items') .where('projectId', '==', projectId) .where('sourceType', '==', 'imported_document') .get(); ``` 3. **Check if empty:** - **If NO documents:** Create empty handoff, skip to Step 6d - **If HAS documents:** Process each document (call LLM, extract insights, write chunks) 4. **Build extraction handoff:** ```typescript const extractionHandoff: PhaseHandoff = { phase: 'extraction', readyForNextPhase: boolean, // true if insights found, false if no docs confidence: number, confirmed: { problems, targetUsers, features, constraints, opportunities }, missing: [...], questionsForUser: [...], // ... }; ``` 5. **Persist extraction handoff and transition phase:** ```typescript await adminDb.collection('projects').doc(projectId).update({ 'phaseData.phaseHandoffs.extraction': extractionHandoff, currentPhase: 'extraction_review', // ← PHASE TRANSITION! phaseStatus: 'in_progress', 'phaseData.extractionCompletedAt': new Date().toISOString(), }); ``` **Location:** `/lib/server/backend-extractor.ts`, entire file **Data written:** - `projects/{projectId}/currentPhase` = `"extraction_review"` - `projects/{projectId}/phaseData.phaseHandoffs.extraction` = extraction results - `chat_extractions/{id}` = per-document extraction data (if documents exist) - `knowledge_chunks` (AlloyDB) = vectorized insights (if documents exist) **Duration:** Could take 5-60 seconds depending on document count and size --- ### **Step 7: User Sends Next Message** **Dependencies:** - User sends a new message (e.g., "what did you find?") **What happens:** 1. **Mode resolver is called:** ```typescript const resolvedMode = await resolveChatMode(projectId); ``` 2. **Mode resolver logic (CRITICAL ORDER):** ```typescript // PRIORITY: Check explicit phase transitions FIRST if (projectData.currentPhase === 'extraction_review' || projectData.currentPhase === 'analyzed') { return 'extraction_review_mode'; // ← Returns this! } // These checks are skipped because phase already transitioned: if (!hasKnowledge) { return 'collector_mode'; } if (hasKnowledge && !hasExtractions) { return 'collector_mode'; } ``` 3. **Context builder loads extraction data:** ```typescript if (mode === 'extraction_review_mode') { context.phaseData.phaseHandoffs.extraction = ...; context.extractionSummary = ...; // Does NOT load raw documents } ``` 4. **System prompt selected:** ```typescript const systemPrompt = EXTRACTION_REVIEW_V2.prompt; // Instructs AI to: // - NOT say "processing" // - Present extraction results // - Ask clarifying questions ``` 5. **AI responds in extraction_review_mode** **Location:** - `/lib/server/chat-mode-resolver.ts` (mode resolution) - `/lib/server/chat-context.ts` (context building) - `/lib/ai/prompts/extraction-review.ts` (system prompt) --- ## Critical Dependencies ### **For handoff to trigger:** 1. ✅ AI must return `readyForExtraction: true` OR say trigger phrase 2. ✅ Firestore must persist `phaseData.phaseHandoffs.collector` ### **For backend extraction to run:** 1. ✅ `handoff.readyForNextPhase === true` 2. ✅ `runBackendExtractionForProject()` must be called ### **For phase transition:** 1. ✅ Backend extraction must complete successfully 2. ✅ Firestore must write `currentPhase: 'extraction_review'` ### **For mode to switch to extraction_review:** 1. ✅ `currentPhase === 'extraction_review'` in Firestore 2. ✅ Mode resolver must check `currentPhase` BEFORE checking `hasKnowledge` ### **For AI to stop hallucinating:** 1. ✅ Mode must be `extraction_review_mode` (not `collector_mode`) 2. ✅ System prompt must be `EXTRACTION_REVIEW_V2` 3. ✅ Context must include `phaseData.phaseHandoffs.extraction` --- ## What Can Go Wrong? ### **Issue 1: Handoff doesn't trigger** - **Symptom:** AI keeps asking for more materials - **Cause:** `readyForExtraction` is false - **Fix:** Check fallback phrase detection is working ### **Issue 2: Backend extraction exits early** - **Symptom:** Phase stays as `collector`, no extraction handoff - **Cause:** No documents uploaded, empty handoff not created - **Fix:** Ensure empty handoff logic runs (lines 58-93 in `backend-extractor.ts`) ### **Issue 3: Mode stays as `collector_mode`** - **Symptom:** `projectPhase: "extraction_review"` but `mode: "collector_mode"` - **Cause:** Mode resolver checking `!hasKnowledge` before `currentPhase` - **Fix:** Reorder mode resolver logic (priority to `currentPhase`) ### **Issue 4: AI still says "processing"** - **Symptom:** AI says "I'm analyzing..." in extraction_review - **Cause:** Wrong system prompt being used - **Fix:** Verify mode is `extraction_review_mode`, not `collector_mode` --- ## Testing Checklist To verify the full flow works: 1. ✅ Create new project 2. ✅ AI welcomes user with collector checklist 3. ✅ User connects GitHub OR uploads docs 4. ✅ User says "that's everything" 5. ✅ Check Firestore: `phaseHandoffs.collector.readyForNextPhase === true` 6. ✅ Wait 5 seconds for async extraction 7. ✅ Check Firestore: `currentPhase === "extraction_review"` 8. ✅ Check Firestore: `phaseHandoffs.extraction` exists 9. ✅ User sends message: "what did you find?" 10. ✅ API returns `mode: "extraction_review_mode"` 11. ✅ AI presents extraction results (or asks for missing info) 12. ✅ AI does NOT say "processing" or "analyzing" --- ## Date November 17, 2025