354 lines
9.9 KiB
Markdown
354 lines
9.9 KiB
Markdown
# Collector → Extraction Flow: Dependency Order
|
|
|
|
## Overview
|
|
|
|
This document explains the **exact order of operations** when a user completes the Collector phase and transitions to Extraction Review.
|
|
|
|
---
|
|
|
|
## Phase Flow Diagram
|
|
|
|
```
|
|
User says "that's everything"
|
|
↓
|
|
[1] AI detects readiness
|
|
↓
|
|
[2] Handoff persisted to Firestore
|
|
↓
|
|
[3] Backend extraction triggered (async)
|
|
↓
|
|
[4] Phase transitions to extraction_review
|
|
↓
|
|
[5] Mode resolver detects new phase
|
|
↓
|
|
[6] AI responds in extraction_review_mode
|
|
```
|
|
|
|
---
|
|
|
|
## Detailed Step-by-Step
|
|
|
|
### **Step 1: User Confirmation**
|
|
|
|
**Trigger:** User sends message like:
|
|
- "that's everything"
|
|
- "yes, analyze now"
|
|
- "I'm ready"
|
|
|
|
**What happens:**
|
|
- Message goes to `/api/ai/chat` POST handler
|
|
- LLM is called with full conversation history
|
|
- LLM returns structured response with `collectorHandoff` object
|
|
|
|
**Location:** `/app/api/ai/chat/route.ts`, lines 154-180
|
|
|
|
---
|
|
|
|
### **Step 2: Handoff Detection**
|
|
|
|
**Dependencies:**
|
|
- AI's `reply.collectorHandoff?.readyForExtraction` OR
|
|
- Fallback: AI's reply text contains trigger phrases
|
|
|
|
**What happens:**
|
|
|
|
```typescript
|
|
// Primary: Check structured output
|
|
let readyForExtraction = reply.collectorHandoff?.readyForExtraction ?? false;
|
|
|
|
// Fallback: Check reply text for phrases like "Perfect! Let me analyze"
|
|
if (!readyForExtraction && reply.reply) {
|
|
const confirmPhrases = [
|
|
'perfect! let me analyze',
|
|
'perfect! i\'m starting',
|
|
// ... etc
|
|
];
|
|
const replyLower = reply.reply.toLowerCase();
|
|
readyForExtraction = confirmPhrases.some(phrase => replyLower.includes(phrase));
|
|
}
|
|
```
|
|
|
|
**Location:** `/app/api/ai/chat/route.ts`, lines 191-210
|
|
|
|
**Critical:** If this doesn't detect readiness, the flow STOPS here.
|
|
|
|
---
|
|
|
|
### **Step 3: Build and Persist Collector Handoff**
|
|
|
|
**Dependencies:**
|
|
- `readyForExtraction === true` (from Step 2)
|
|
- Project context data (documents, GitHub, extension status)
|
|
|
|
**What happens:**
|
|
|
|
```typescript
|
|
const handoff: CollectorPhaseHandoff = {
|
|
phase: 'collector',
|
|
readyForNextPhase: readyForExtraction, // Must be true!
|
|
confidence: readyForExtraction ? 0.9 : 0.5,
|
|
confirmed: {
|
|
hasDocuments: (context.knowledgeSummary.bySourceType['imported_document'] ?? 0) > 0,
|
|
documentCount: context.knowledgeSummary.bySourceType['imported_document'] ?? 0,
|
|
githubConnected: !!context.project.githubRepo,
|
|
githubRepo: context.project.githubRepo,
|
|
extensionLinked: context.project.extensionLinked ?? false,
|
|
},
|
|
// ... etc
|
|
};
|
|
|
|
// Persist to Firestore
|
|
await adminDb.collection('projects').doc(projectId).set(
|
|
{ 'phaseData.phaseHandoffs.collector': handoff },
|
|
{ merge: true }
|
|
);
|
|
```
|
|
|
|
**Location:** `/app/api/ai/chat/route.ts`, lines 212-242
|
|
|
|
**Data written:**
|
|
- `projects/{projectId}/phaseData.phaseHandoffs.collector`
|
|
- `readyForNextPhase: true`
|
|
- `confirmed: { hasDocuments, githubConnected, extensionLinked }`
|
|
|
|
---
|
|
|
|
### **Step 4: Mark Collector Complete**
|
|
|
|
**Dependencies:**
|
|
- `handoff.readyForNextPhase === true` (from Step 3)
|
|
|
|
**What happens:**
|
|
|
|
```typescript
|
|
if (handoff.readyForNextPhase) {
|
|
console.log(`[AI Chat] Collector complete - triggering backend extraction`);
|
|
|
|
// Mark collector as complete
|
|
await adminDb.collection('projects').doc(projectId).update({
|
|
'phaseData.collectorCompletedAt': new Date().toISOString(),
|
|
});
|
|
|
|
// ... (Step 5 happens next)
|
|
}
|
|
```
|
|
|
|
**Location:** `/app/api/ai/chat/route.ts`, lines 252-260
|
|
|
|
**Data written:**
|
|
- `projects/{projectId}/phaseData.collectorCompletedAt` = timestamp
|
|
|
|
---
|
|
|
|
### **Step 5: Trigger Backend Extraction (Async)**
|
|
|
|
**Dependencies:**
|
|
- Collector marked complete (from Step 4)
|
|
|
|
**What happens:**
|
|
|
|
```typescript
|
|
// Trigger backend extraction (async - don't await)
|
|
import('@/lib/server/backend-extractor').then(({ runBackendExtractionForProject }) => {
|
|
runBackendExtractionForProject(projectId).catch((error) => {
|
|
console.error(`[AI Chat] Backend extraction failed for project ${projectId}:`, error);
|
|
});
|
|
});
|
|
```
|
|
|
|
**Location:** `/app/api/ai/chat/route.ts`, lines 263-267
|
|
|
|
**Critical:** This is **asynchronous** - the chat response returns BEFORE extraction completes!
|
|
|
|
---
|
|
|
|
### **Step 6: Backend Extraction Runs**
|
|
|
|
**Dependencies:**
|
|
- Called from Step 5
|
|
|
|
**What happens:**
|
|
|
|
1. **Load project data**
|
|
```typescript
|
|
const projectDoc = await adminDb.collection('projects').doc(projectId).get();
|
|
const projectData = projectDoc.data();
|
|
```
|
|
|
|
2. **Load knowledge_items (documents)**
|
|
```typescript
|
|
const knowledgeSnapshot = await adminDb
|
|
.collection('knowledge_items')
|
|
.where('projectId', '==', projectId)
|
|
.where('sourceType', '==', 'imported_document')
|
|
.get();
|
|
```
|
|
|
|
3. **Check if empty:**
|
|
- **If NO documents:** Create empty handoff, skip to Step 6d
|
|
- **If HAS documents:** Process each document (call LLM, extract insights, write chunks)
|
|
|
|
4. **Build extraction handoff:**
|
|
```typescript
|
|
const extractionHandoff: PhaseHandoff = {
|
|
phase: 'extraction',
|
|
readyForNextPhase: boolean, // true if insights found, false if no docs
|
|
confidence: number,
|
|
confirmed: { problems, targetUsers, features, constraints, opportunities },
|
|
missing: [...],
|
|
questionsForUser: [...],
|
|
// ...
|
|
};
|
|
```
|
|
|
|
5. **Persist extraction handoff and transition phase:**
|
|
```typescript
|
|
await adminDb.collection('projects').doc(projectId).update({
|
|
'phaseData.phaseHandoffs.extraction': extractionHandoff,
|
|
currentPhase: 'extraction_review', // ← PHASE TRANSITION!
|
|
phaseStatus: 'in_progress',
|
|
'phaseData.extractionCompletedAt': new Date().toISOString(),
|
|
});
|
|
```
|
|
|
|
**Location:** `/lib/server/backend-extractor.ts`, entire file
|
|
|
|
**Data written:**
|
|
- `projects/{projectId}/currentPhase` = `"extraction_review"`
|
|
- `projects/{projectId}/phaseData.phaseHandoffs.extraction` = extraction results
|
|
- `chat_extractions/{id}` = per-document extraction data (if documents exist)
|
|
- `knowledge_chunks` (AlloyDB) = vectorized insights (if documents exist)
|
|
|
|
**Duration:** Could take 5-60 seconds depending on document count and size
|
|
|
|
---
|
|
|
|
### **Step 7: User Sends Next Message**
|
|
|
|
**Dependencies:**
|
|
- User sends a new message (e.g., "what did you find?")
|
|
|
|
**What happens:**
|
|
|
|
1. **Mode resolver is called:**
|
|
```typescript
|
|
const resolvedMode = await resolveChatMode(projectId);
|
|
```
|
|
|
|
2. **Mode resolver logic (CRITICAL ORDER):**
|
|
```typescript
|
|
// PRIORITY: Check explicit phase transitions FIRST
|
|
if (projectData.currentPhase === 'extraction_review' ||
|
|
projectData.currentPhase === 'analyzed') {
|
|
return 'extraction_review_mode'; // ← Returns this!
|
|
}
|
|
|
|
// These checks are skipped because phase already transitioned:
|
|
if (!hasKnowledge) {
|
|
return 'collector_mode';
|
|
}
|
|
if (hasKnowledge && !hasExtractions) {
|
|
return 'collector_mode';
|
|
}
|
|
```
|
|
|
|
3. **Context builder loads extraction data:**
|
|
```typescript
|
|
if (mode === 'extraction_review_mode') {
|
|
context.phaseData.phaseHandoffs.extraction = ...;
|
|
context.extractionSummary = ...;
|
|
// Does NOT load raw documents
|
|
}
|
|
```
|
|
|
|
4. **System prompt selected:**
|
|
```typescript
|
|
const systemPrompt = EXTRACTION_REVIEW_V2.prompt;
|
|
// Instructs AI to:
|
|
// - NOT say "processing"
|
|
// - Present extraction results
|
|
// - Ask clarifying questions
|
|
```
|
|
|
|
5. **AI responds in extraction_review_mode**
|
|
|
|
**Location:**
|
|
- `/lib/server/chat-mode-resolver.ts` (mode resolution)
|
|
- `/lib/server/chat-context.ts` (context building)
|
|
- `/lib/ai/prompts/extraction-review.ts` (system prompt)
|
|
|
|
---
|
|
|
|
## Critical Dependencies
|
|
|
|
### **For handoff to trigger:**
|
|
1. ✅ AI must return `readyForExtraction: true` OR say trigger phrase
|
|
2. ✅ Firestore must persist `phaseData.phaseHandoffs.collector`
|
|
|
|
### **For backend extraction to run:**
|
|
1. ✅ `handoff.readyForNextPhase === true`
|
|
2. ✅ `runBackendExtractionForProject()` must be called
|
|
|
|
### **For phase transition:**
|
|
1. ✅ Backend extraction must complete successfully
|
|
2. ✅ Firestore must write `currentPhase: 'extraction_review'`
|
|
|
|
### **For mode to switch to extraction_review:**
|
|
1. ✅ `currentPhase === 'extraction_review'` in Firestore
|
|
2. ✅ Mode resolver must check `currentPhase` BEFORE checking `hasKnowledge`
|
|
|
|
### **For AI to stop hallucinating:**
|
|
1. ✅ Mode must be `extraction_review_mode` (not `collector_mode`)
|
|
2. ✅ System prompt must be `EXTRACTION_REVIEW_V2`
|
|
3. ✅ Context must include `phaseData.phaseHandoffs.extraction`
|
|
|
|
---
|
|
|
|
## What Can Go Wrong?
|
|
|
|
### **Issue 1: Handoff doesn't trigger**
|
|
- **Symptom:** AI keeps asking for more materials
|
|
- **Cause:** `readyForExtraction` is false
|
|
- **Fix:** Check fallback phrase detection is working
|
|
|
|
### **Issue 2: Backend extraction exits early**
|
|
- **Symptom:** Phase stays as `collector`, no extraction handoff
|
|
- **Cause:** No documents uploaded, empty handoff not created
|
|
- **Fix:** Ensure empty handoff logic runs (lines 58-93 in `backend-extractor.ts`)
|
|
|
|
### **Issue 3: Mode stays as `collector_mode`**
|
|
- **Symptom:** `projectPhase: "extraction_review"` but `mode: "collector_mode"`
|
|
- **Cause:** Mode resolver checking `!hasKnowledge` before `currentPhase`
|
|
- **Fix:** Reorder mode resolver logic (priority to `currentPhase`)
|
|
|
|
### **Issue 4: AI still says "processing"**
|
|
- **Symptom:** AI says "I'm analyzing..." in extraction_review
|
|
- **Cause:** Wrong system prompt being used
|
|
- **Fix:** Verify mode is `extraction_review_mode`, not `collector_mode`
|
|
|
|
---
|
|
|
|
## Testing Checklist
|
|
|
|
To verify the full flow works:
|
|
|
|
1. ✅ Create new project
|
|
2. ✅ AI welcomes user with collector checklist
|
|
3. ✅ User connects GitHub OR uploads docs
|
|
4. ✅ User says "that's everything"
|
|
5. ✅ Check Firestore: `phaseHandoffs.collector.readyForNextPhase === true`
|
|
6. ✅ Wait 5 seconds for async extraction
|
|
7. ✅ Check Firestore: `currentPhase === "extraction_review"`
|
|
8. ✅ Check Firestore: `phaseHandoffs.extraction` exists
|
|
9. ✅ User sends message: "what did you find?"
|
|
10. ✅ API returns `mode: "extraction_review_mode"`
|
|
11. ✅ AI presents extraction results (or asks for missing info)
|
|
12. ✅ AI does NOT say "processing" or "analyzing"
|
|
|
|
---
|
|
|
|
## Date
|
|
|
|
November 17, 2025
|
|
|