# Backend-Led Extraction: Fixes Applied

## Summary

Fixed critical bugs preventing the backend-led extraction flow from working correctly. The system now properly transitions from `collector` → `extraction_review` phases and the AI no longer hallucinates "processing" messages.

---

## Issues Fixed

### 1. **Handoff Not Triggering** (`/app/api/ai/chat/route.ts`)

**Problem:** The AI wasn't consistently returning `collectorHandoff.readyForExtraction: true` in the structured JSON response when the user said "that's everything".

**Fix:** Added fallback detection that checks for trigger phrases in the AI's reply text:

```typescript
// Fallback: If AI says certain phrases, assume user confirmed readiness
if (!readyForExtraction && reply.reply) {
  const confirmPhrases = [
    'perfect! let me analyze',
    'perfect! i\'m starting',
    'great! i\'m running',
    'okay, i\'ll start',
    'i\'ll start digging',
    'i\'ll analyze what you',
  ];
  const replyLower = reply.reply.toLowerCase();
  readyForExtraction = confirmPhrases.some(phrase => replyLower.includes(phrase));
}
```

**Location:** Lines 194-210

---

### 2. **Backend Extractor Exiting Without Phase Transition** (`/lib/server/backend-extractor.ts`)

**Problem:** When a project had no documents uploaded (only GitHub connected), the backend extractor would exit early without updating `currentPhase` to `extraction_review`.

```typescript
if (knowledgeSnapshot.empty) {
  console.log(`No documents to extract`);
  return;  // ← Exits WITHOUT updating phase!
}
```

**Fix:** When no documents exist, create a minimal extraction handoff and still transition the phase:

```typescript
if (knowledgeSnapshot.empty) {
  console.log(`No documents to extract for project ${projectId} - creating empty handoff`);
  
  // Create a minimal extraction handoff even with no documents
  const emptyHandoff: PhaseHandoff = {
    phase: 'extraction',
    readyForNextPhase: false,
    confidence: 0,
    confirmed: {
      problems: [],
      targetUsers: [],
      features: [],
      constraints: [],
      opportunities: [],
    },
    uncertain: {},
    missing: ['No documents uploaded - need product requirements, specs, or notes'],
    questionsForUser: [
      'You haven\'t uploaded any documents yet. Do you have any product specs, requirements, or notes to share?',
    ],
    sourceEvidence: [],
    version: 'extraction_v1',
    timestamp: new Date().toISOString(),
  };
  
  await adminDb.collection('projects').doc(projectId).update({
    'phaseData.phaseHandoffs.extraction': emptyHandoff,
    currentPhase: 'extraction_review',
    phaseStatus: 'in_progress',
    'phaseData.extractionCompletedAt': new Date().toISOString(),
    updatedAt: new Date().toISOString(),
  });
  
  return;
}
```

**Location:** Lines 58-93

---

### 3. **Mode Resolver Not Detecting `extraction_review` Phase** (`/lib/server/chat-mode-resolver.ts`)

**Problem #1:** The mode resolver was checking for `currentPhase === 'analyzed'` but projects were being set to `currentPhase: 'extraction_review'`, causing a mismatch.

**Fix #1:** Added both phase values to the check:

```typescript
if (
  projectData.currentPhase === 'extraction_review' || 
  projectData.currentPhase === 'analyzed' || 
  (hasExtractions && !phaseData.canonicalProductModel)
) {
  return 'extraction_review_mode';
}
```

**Problem #2:** The mode resolver was querying **subcollections** (`projects/{id}/knowledge_items`) instead of the **top-level collections** (`knowledge_items` filtered by `projectId`).

**Fix #2:** Updated all collection queries to use top-level collections with `where` clauses:

```typescript
// Before (WRONG):
.collection('projects')
.doc(projectId)
.collection('knowledge_items')

// After (CORRECT):
.collection('knowledge_items')
.where('projectId', '==', projectId)
```

**Problem #3:** The mode resolver logic checked `!hasKnowledge` BEFORE checking `currentPhase`, causing projects with GitHub but no documents to always return `collector_mode`.

**Fix #3:** Reordered the logic to prioritize explicit phase transitions:

```typescript
// Apply resolution logic
// PRIORITY: Check explicit phase transitions FIRST (overrides knowledge checks)
if (projectData.currentPhase === 'extraction_review' || projectData.currentPhase === 'analyzed') {
  return 'extraction_review_mode';
}

if (!hasKnowledge) {
  return 'collector_mode';
}

// ... rest of logic
```

**Locations:** Lines 39-74, 107-112, 147-150

---

## Test Results

### Before Fixes

```json
{
  "mode": "collector_mode",          // ❌ Wrong mode
  "projectPhase": "extraction_review", // ✓ Phase transitioned
  "reply": "Perfect! Let me analyze..." // ❌ Hallucinating
}
```

### After Fixes

```json
{
  "mode": "extraction_review_mode",   // ✓ Correct mode
  "projectPhase": "extraction_review", // ✓ Phase transitioned
  "reply": "Thanks for your patience. I've finished the initial analysis... What is the core problem you're trying to solve?" // ✓ Asking clarifying questions
}
```

---

## Files Modified

1. `/app/api/ai/chat/route.ts` - Added fallback handoff detection
2. `/lib/server/backend-extractor.ts` - Handle empty documents gracefully
3. `/lib/server/chat-mode-resolver.ts` - Fixed collection queries and logic ordering

---

## Next Steps

1. ✅ Test with a project that has documents uploaded
2. ✅ Test with a project that only has GitHub (no documents)
3. ✅ Test with a new project (no materials at all)
4. Verify the checklist UI updates correctly
5. Verify extraction handoff data is stored correctly in Firestore

---

## Date

November 17, 2025