9.2 KiB
Collector & Extractor Refactor - Complete
Overview
Refactored the Collector and Extraction Review phases to implement a proactive, collaborative workflow that guides users through setup and only chunks content they confirm is important.
Changes Made
1. Collector Phase (v2 Prompt)
Location: lib/ai/prompts/collector.ts
New Behavior:
- ✅ Proactive Welcome - Greets new users with clear 3-step setup guide
- ✅ 3-Step Checklist Tracking:
- Upload documents 📄
- Connect GitHub repo 🔗
- Install browser extension 🔌
- ✅ Smart GitHub Analysis - Automatically analyzes connected repos and presents findings
- ✅ Conversational Handoff - Asks "Is that everything?" when materials are detected
- ✅ Automatic Transition - Moves to extraction_review_mode when user confirms
Key Changes:
- Removed "Click Analyze Context button" instruction
- Added explicit checklist tracking based on
knowledgeSummary.bySourceType - Added welcome message with step-by-step guidance
- Emphasized ONE question at a time (not overwhelming)
2. Extraction Review Phase (v2 Prompt)
Location: lib/ai/prompts/extraction-review.ts
New Behavior:
- ✅ Collaborative Review - Presents each potential insight and asks "Is this important?"
- ✅ Smart Chunking - Only chunks content the user confirms is V1-critical
- ✅ Semantic Boundaries - Chunks by meaning (feature, persona, constraint), not character count
- ✅ Tight Responses - Guides a review process, not essays
Workflow:
- Read & Identify - Find potential insights in documents/code
- Collaborative Review - Show user the text, ask "Should I save this?"
- Chunk & Store - Extract and store confirmed insights in AlloyDB
- Build Product Model - Synthesize confirmed insights into
canonicalProductModel
Key Changes:
- Removed automatic extraction behavior
- Added explicit "Is this important?" questioning pattern
- Emphasized showing ACTUAL TEXT from user's docs
- Added chunking strategy guidance (semantic, not arbitrary)
3. UI Changes
Location: app/[workspace]/project/[projectId]/v_ai_chat/page.tsx
Changes:
- ❌ Removed "Analyze Context" button
- ❌ Removed
isBatchExtractingstate - ❌ Removed
handleBatchExtractfunction - ❌ Removed
Sparklesicon import - ✅ Kept "Reset Chat" button
Rationale:
- Transition to extraction happens conversationally ("Is that everything?" → "yes" → auto-transition)
- No manual button click needed
- Cleaner, less cluttered UI
4. Auto-Chunking Disabled
Location: app/api/projects/[projectId]/knowledge/upload-document/route.ts
Changes:
- ✅ Commented out
writeKnowledgeChunksForItemfire-and-forget call - ✅ Added comment:
// NOTE: Auto-chunking disabled - Extractor AI will collaboratively chunk important sections
Rationale:
- Documents are stored whole in Firestore as
knowledge_items - Extractor AI reads them later and chunks only user-confirmed insights
- Prevents bloat in AlloyDB with irrelevant chunks
5. PhaseHandoff Type Updates
Location: lib/types/phase-handoff.ts
Changes:
- ✅ Added
'collector'toPhaseTypeunion - ✅ Created
CollectorPhaseHandoffinterface with checklist fields:confirmed: { hasDocuments?: boolean; documentCount?: number; githubConnected?: boolean; githubRepo?: string; extensionLinked?: boolean; } uncertain: { extensionDeclined?: boolean; noGithubYet?: boolean; } missing: string[]; - ✅ Added
CollectorPhaseHandofftoAnyPhaseHandoffunion
Location: lib/types/project-artifacts.ts
Changes:
- ✅ Updated
phaseHandoffsto include'collector'key
How It Works Now
User Journey:
-
Welcome (Collector)
- AI greets user: "Welcome to Vibn! Here's how this works: Step 1: Upload docs, Step 2: Connect GitHub, Step 3: Install extension"
- User uploads documents via Context tab → AI confirms: "✅ I see you've uploaded 2 document(s)"
- User connects GitHub → AI analyzes and presents: "✅ I can see your repo - it's built with Next.js, has 247 files..."
- User installs extension → AI confirms: "✅ I see your browser extension is connected"
-
Handoff Question (Collector)
- AI asks: "Is that everything you want me to work with for now? If so, I'll start digging into the details."
- User says: "yes" / "yep" / "go ahead"
-
Automatic Transition
- AI responds: "Perfect! Let me analyze what you've shared. This might take a moment..."
- System automatically transitions to
extraction_review_mode
-
Collaborative Extraction (Extractor)
- AI says: "I'm reading through everything you've shared. Let me walk through what I found..."
- AI presents each insight: "I found this section about [topic]: [quote]. Is this important for your V1 product? Should I save it?"
- User says: "yes" → AI chunks and stores: "✅ Saved! I'll remember this for later phases."
- User says: "no" → AI skips: "Got it, moving on..."
-
Product Model Built
- After reviewing all docs, AI asks: "I've identified 12 key requirements. Does that sound right?"
- AI synthesizes
canonicalProductModeland transitions to Vision phase
Extension Project Linking
Current Status:
- Extension uses
workspacePathheader to identify project context - Extension sends chats to Vibn proxy with
x-workspace-pathheader - Vibn API uses
extractProjectName(workspacePath)to link chats to projects - Limitation: Extension doesn't explicitly link to a Vibn project ID yet
Detection in Collector:
- Checks
knowledgeSummary.bySourceTypefor'extension'orcontextSourceswithtype='extension' - If found: "✅ I see your browser extension is connected"
- If not: "Have you installed the Vibn browser extension yet?"
Future Enhancement:
- Add explicit project ID linking in extension settings
- Allow users to select which Vibn project their workspace maps to
Files Changed
lib/ai/prompts/collector.ts- New v2 prompt (proactive, 3-step checklist)lib/ai/prompts/extraction-review.ts- New v2 prompt (collaborative chunking)app/[workspace]/project/[projectId]/v_ai_chat/page.tsx- Removed "Analyze Context" buttonapp/api/projects/[projectId]/knowledge/upload-document/route.ts- Disabled auto-chunkinglib/types/phase-handoff.ts- AddedCollectorPhaseHandofftypelib/types/project-artifacts.ts- UpdatedphaseHandoffsto include'collector'
Testing Checklist
Collector Phase:
- New project shows welcome message with 3-step guide
- Uploading doc triggers "✅ I see you've uploaded X document(s)"
- Connecting GitHub triggers repo analysis summary
- AI asks "Is that everything?" when materials exist
- User saying "yes" transitions to extraction_review_mode
Extraction Phase:
- AI presents insights one at a time
- AI shows actual text from user's docs
- User saying "yes" to insight triggers "✅ Saved!"
- User saying "no" to insight triggers skip
- After review, AI asks "I've identified X requirements. Does that sound right?"
- Confirmed insights are chunked and stored in AlloyDB
Upload Flow:
- Uploading document does NOT trigger auto-chunking
- Document is stored whole in Firestore
- Document appears in Context UI
- Extractor can read full document content later
Next Steps
-
Implement Extraction Chunking API
- Create endpoint for AI to chunk and store confirmed insights
/api/projects/[projectId]/knowledge/chunk-insight- Takes
knowledgeItemId,content,metadata(importance, tags, etc.)
-
Add CollectorPhaseHandoff Storage
- Update
/api/ai/chatto detect checklist status - Store
CollectorPhaseHandoffinphaseData.phaseHandoffs.collector - Use for analytics and debugging
- Update
-
Extension Project Linking
- Add Vibn project ID to extension settings
- Update extension to send
x-vibn-project-idheader - Update proxy to use explicit project ID instead of workspace path extraction
-
Mode Transition Logic
- Update
resolveChatModeto check for "is that everything?" confirmation - Add LLM structured output field:
readyForNextPhase: boolean - Auto-transition when
readyForNextPhase === true
- Update
Architecture Alignment
This refactor aligns with the "Why We Overhauled Vibn's Architecture" document:
✅ Clear, specialized phases - Collector and Extractor now have distinct, focused jobs
✅ Smart Handoff Protocol - CollectorPhaseHandoff with checklist fields
✅ Long-term semantic memory - Only user-confirmed insights are chunked to AlloyDB
✅ Structured outputs - Checklist and handoff data is machine-readable
✅ Better monitoring - Handoff contracts can be logged for debugging
Summary
The Collector and Extractor are now proactive, collaborative, and smart. Users are guided through setup, and only the content they confirm as important is chunked and stored for retrieval. This prevents bloat, increases relevance, and ensures the AI never works with irrelevant data.
Status: ✅ Complete and deployed (v2 prompts active)