vibn-frontend/COLLECTOR_EXTRACTOR_REFACTOR.md

Collector & Extractor Refactor - Complete

Overview

Refactored the Collector and Extraction Review phases to implement a proactive, collaborative workflow that guides users through setup and chunks only the content they confirm is important.


Changes Made

1. Collector Phase (v2 Prompt)

Location: lib/ai/prompts/collector.ts

New Behavior:

  • Proactive Welcome - Greets new users with clear 3-step setup guide
  • 3-Step Checklist Tracking:
    1. Upload documents 📄
    2. Connect GitHub repo 🔗
    3. Install browser extension 🔌
  • Smart GitHub Analysis - Automatically analyzes connected repos and presents findings
  • Conversational Handoff - Asks "Is that everything?" when materials are detected
  • Automatic Transition - Moves to extraction_review_mode when user confirms

Key Changes:

  • Removed "Click Analyze Context button" instruction
  • Added explicit checklist tracking based on knowledgeSummary.bySourceType
  • Added welcome message with step-by-step guidance
  • Emphasized ONE question at a time (not overwhelming)
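The checklist tracking described above can be sketched as follows. This is an illustrative assumption, not the actual implementation: only `knowledgeSummary.bySourceType` is a name from the codebase; the `KnowledgeSummary` and `SetupChecklist` shapes, the `deriveChecklist` helper, and the source-type keys are hypothetical.

```typescript
// Hypothetical sketch: the shapes and the source-type keys
// ("document", "github", "extension") are assumptions.
interface KnowledgeSummary {
  bySourceType: Record<string, number>;
}

interface SetupChecklist {
  hasDocuments: boolean;
  documentCount: number;
  githubConnected: boolean;
  extensionLinked: boolean;
}

function deriveChecklist(summary: KnowledgeSummary): SetupChecklist {
  const count = (type: string) => summary.bySourceType[type] ?? 0;
  return {
    hasDocuments: count("document") > 0,
    documentCount: count("document"),
    githubConnected: count("github") > 0,
    extensionLinked: count("extension") > 0,
  };
}
```

A derived checklist like this lets the prompt greet the user with exactly the steps still missing, rather than re-asking about completed ones.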

2. Extraction Review Phase (v2 Prompt)

Location: lib/ai/prompts/extraction-review.ts

New Behavior:

  • Collaborative Review - Presents each potential insight and asks "Is this important?"
  • Smart Chunking - Only chunks content the user confirms is V1-critical
  • Semantic Boundaries - Chunks by meaning (feature, persona, constraint), not character count
  • Tight Responses - Keeps replies brief; the AI guides a review process rather than writing essays

Workflow:

  1. Read & Identify - Find potential insights in documents/code
  2. Collaborative Review - Show user the text, ask "Should I save this?"
  3. Chunk & Store - Extract and store confirmed insights in AlloyDB
  4. Build Product Model - Synthesize confirmed insights into canonicalProductModel

Key Changes:

  • Removed automatic extraction behavior
  • Added explicit "Is this important?" questioning pattern
  • Emphasized showing ACTUAL TEXT from user's docs
  • Added chunking strategy guidance (semantic, not arbitrary)
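The chunking contract implied by the key changes above can be pictured like this. All type and function names here are assumptions for illustration; only the semantic boundary kinds (feature, persona, constraint) and the confirm-before-store rule come from the refactor itself.

```typescript
// Illustrative only: ChunkKind mirrors the semantic boundaries named in
// this section; field names are assumptions, not the real schema.
type ChunkKind = "feature" | "persona" | "constraint";

interface CandidateChunk {
  knowledgeItemId: string; // source document the quoted text came from
  kind: ChunkKind;         // semantic boundary, not a character offset
  text: string;            // the actual text shown to the user for review
  confirmed: boolean;      // set true only after the user says "yes"
}

// Only user-confirmed chunks are ever stored in AlloyDB.
function chunksToStore(candidates: CandidateChunk[]): CandidateChunk[] {
  return candidates.filter((chunk) => chunk.confirmed);
}
```

The design choice is that confirmation is a property of the chunk itself, so storage can stay a dumb filter and all judgment lives in the conversational review loop.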

3. UI Changes

Location: app/[workspace]/project/[projectId]/v_ai_chat/page.tsx

Changes:

  • Removed "Analyze Context" button
  • Removed isBatchExtracting state
  • Removed handleBatchExtract function
  • Removed Sparkles icon import
  • Kept "Reset Chat" button

Rationale:

  • Transition to extraction happens conversationally ("Is that everything?" → "yes" → auto-transition)
  • No manual button click needed
  • Cleaner, less cluttered UI

4. Auto-Chunking Disabled

Location: app/api/projects/[projectId]/knowledge/upload-document/route.ts

Changes:

  • Commented out writeKnowledgeChunksForItem fire-and-forget call
  • Added comment: // NOTE: Auto-chunking disabled - Extractor AI will collaboratively chunk important sections

Rationale:

  • Documents are stored whole in Firestore as knowledge_items
  • Extractor AI reads them later and chunks only user-confirmed insights
  • Prevents bloat in AlloyDB with irrelevant chunks
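The shape of this change can be sketched as below. `writeKnowledgeChunksForItem` is the name from the actual route; everything else (`storeKnowledgeItem`, the handler signature, the `KnowledgeItem` shape) is a hypothetical stand-in for the real Firestore-backed code.

```typescript
interface KnowledgeItem {
  id: string;
  name: string;
  content: string;
}

// Stand-in for the route's Firestore write; the real handler differs.
async function storeKnowledgeItem(
  projectId: string,
  doc: { name: string; content: string }
): Promise<KnowledgeItem> {
  return { id: `${projectId}:${doc.name}`, ...doc };
}

async function handleUpload(projectId: string, doc: { name: string; content: string }) {
  const item = await storeKnowledgeItem(projectId, doc); // stored whole as a knowledge_item

  // NOTE: Auto-chunking disabled - Extractor AI will collaboratively chunk important sections
  // void writeKnowledgeChunksForItem(item.id);

  return item;
}
```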

5. PhaseHandoff Type Updates

Location: lib/types/phase-handoff.ts

Changes:

  • Added 'collector' to PhaseType union
  • Created CollectorPhaseHandoff interface with checklist fields:
    confirmed: {
      hasDocuments?: boolean;
      documentCount?: number;
      githubConnected?: boolean;
      githubRepo?: string;
      extensionLinked?: boolean;
    }
    uncertain: {
      extensionDeclined?: boolean;
      noGithubYet?: boolean;
    }
    missing: string[];
    
  • Added CollectorPhaseHandoff to AnyPhaseHandoff union
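Assembled from the fields listed above, the interface plausibly looks like this. The `phase` discriminant is an assumption about how the `AnyPhaseHandoff` union is tagged; the exact declaration lives in lib/types/phase-handoff.ts.

```typescript
// Field lists come from the change notes above; the `phase`
// discriminant is an assumption about how the union is tagged.
interface CollectorPhaseHandoff {
  phase: "collector";
  confirmed: {
    hasDocuments?: boolean;
    documentCount?: number;
    githubConnected?: boolean;
    githubRepo?: string;
    extensionLinked?: boolean;
  };
  uncertain: {
    extensionDeclined?: boolean;
    noGithubYet?: boolean;
  };
  missing: string[]; // setup steps never completed, e.g. ["extension"]
}
```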

Location: lib/types/project-artifacts.ts

Changes:

  • Updated phaseHandoffs to include 'collector' key

How It Works Now

User Journey:

  1. Welcome (Collector)

    • AI greets user: "Welcome to Vibn! Here's how this works: Step 1: Upload docs, Step 2: Connect GitHub, Step 3: Install extension"
    • User uploads documents via Context tab → AI confirms: "I see you've uploaded 2 document(s)"
    • User connects GitHub → AI analyzes and presents: "I can see your repo - it's built with Next.js, has 247 files..."
    • User installs extension → AI confirms: "I see your browser extension is connected"
  2. Handoff Question (Collector)

    • AI asks: "Is that everything you want me to work with for now? If so, I'll start digging into the details."
    • User says: "yes" / "yep" / "go ahead"
  3. Automatic Transition

    • AI responds: "Perfect! Let me analyze what you've shared. This might take a moment..."
    • System automatically transitions to extraction_review_mode
  4. Collaborative Extraction (Extractor)

    • AI says: "I'm reading through everything you've shared. Let me walk through what I found..."
    • AI presents each insight: "I found this section about [topic]: [quote]. Is this important for your V1 product? Should I save it?"
    • User says: "yes" → AI chunks and stores: "Saved! I'll remember this for later phases."
    • User says: "no" → AI skips: "Got it, moving on..."
  5. Product Model Built

    • After reviewing all docs, AI asks: "I've identified 12 key requirements. Does that sound right?"
    • AI synthesizes canonicalProductModel and transitions to Vision phase

Extension Project Linking

Current Status:

  • Extension uses workspacePath header to identify project context
  • Extension sends chats to Vibn proxy with x-workspace-path header
  • Vibn API uses extractProjectName(workspacePath) to link chats to projects
  • Limitation: Extension doesn't explicitly link to a Vibn project ID yet
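A minimal sketch of what `extractProjectName` plausibly does, assuming the project name is the final segment of the workspace path; the real implementation may normalize paths differently.

```typescript
// Hedged sketch of extractProjectName: assumes the project name is the
// final path segment; the real implementation may apply other rules.
function extractProjectName(workspacePath: string): string {
  const segments = workspacePath
    .replace(/\\/g, "/") // tolerate Windows-style paths
    .split("/")
    .filter(Boolean);
  return segments[segments.length - 1] ?? "";
}
```

This last-segment heuristic is exactly why the limitation above exists: two workspaces with the same folder name would map to the same project, which the planned explicit project ID linking would resolve.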

Detection in Collector:

  • Checks knowledgeSummary.bySourceType for 'extension' or contextSources with type='extension'
  • If found: "I see your browser extension is connected"
  • If not: "Have you installed the Vibn browser extension yet?"

Future Enhancement:

  • Add explicit project ID linking in extension settings
  • Allow users to select which Vibn project their workspace maps to

Files Changed

  1. lib/ai/prompts/collector.ts - New v2 prompt (proactive, 3-step checklist)
  2. lib/ai/prompts/extraction-review.ts - New v2 prompt (collaborative chunking)
  3. app/[workspace]/project/[projectId]/v_ai_chat/page.tsx - Removed "Analyze Context" button
  4. app/api/projects/[projectId]/knowledge/upload-document/route.ts - Disabled auto-chunking
  5. lib/types/phase-handoff.ts - Added CollectorPhaseHandoff type
  6. lib/types/project-artifacts.ts - Updated phaseHandoffs to include 'collector'

Testing Checklist

Collector Phase:

  • New project shows welcome message with 3-step guide
  • Uploading doc triggers "I see you've uploaded X document(s)"
  • Connecting GitHub triggers repo analysis summary
  • AI asks "Is that everything?" when materials exist
  • User saying "yes" transitions to extraction_review_mode

Extraction Phase:

  • AI presents insights one at a time
  • AI shows actual text from user's docs
  • User saying "yes" to insight triggers "Saved!"
  • User saying "no" to insight triggers skip
  • After review, AI asks "I've identified X requirements. Does that sound right?"
  • Confirmed insights are chunked and stored in AlloyDB

Upload Flow:

  • Uploading document does NOT trigger auto-chunking
  • Document is stored whole in Firestore
  • Document appears in Context UI
  • Extractor can read full document content later

Next Steps

  1. Implement Extraction Chunking API

    • Create endpoint for AI to chunk and store confirmed insights
    • /api/projects/[projectId]/knowledge/chunk-insight
    • Takes knowledgeItemId, content, metadata (importance, tags, etc.)
  2. Add CollectorPhaseHandoff Storage

    • Update /api/ai/chat to detect checklist status
    • Store CollectorPhaseHandoff in phaseData.phaseHandoffs.collector
    • Use for analytics and debugging
  3. Extension Project Linking

    • Add Vibn project ID to extension settings
    • Update extension to send x-vibn-project-id header
    • Update proxy to use explicit project ID instead of workspace path extraction
  4. Mode Transition Logic

    • Update resolveChatMode to check for "is that everything?" confirmation
    • Add LLM structured output field: readyForNextPhase: boolean
    • Auto-transition when readyForNextPhase === true
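The planned transition logic in step 4 can be sketched as follows. `resolveChatMode` and `readyForNextPhase` are names from the plan above, but the signature, the `ChatMode` values beyond `extraction_review_mode`, and the structured-output shape are assumptions.

```typescript
// Hypothetical sketch of the planned transition check; the real
// resolveChatMode signature and output shape may differ.
type ChatMode = "collector" | "extraction_review_mode";

interface StructuredOutput {
  readyForNextPhase?: boolean; // set by the LLM once the user confirms "is that everything?"
}

function resolveChatMode(current: ChatMode, output: StructuredOutput): ChatMode {
  if (current === "collector" && output.readyForNextPhase === true) {
    return "extraction_review_mode"; // auto-transition, no button click
  }
  return current;
}
```

Keeping the flag in structured output rather than parsing free text means the "yes" / "yep" / "go ahead" variants are all handled by the model, not by brittle string matching.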

Architecture Alignment

This refactor aligns with the "Why We Overhauled Vibn's Architecture" document:

  • Clear, specialized phases - Collector and Extractor now have distinct, focused jobs
  • Smart Handoff Protocol - CollectorPhaseHandoff with checklist fields
  • Long-term semantic memory - Only user-confirmed insights are chunked to AlloyDB
  • Structured outputs - Checklist and handoff data is machine-readable
  • Better monitoring - Handoff contracts can be logged for debugging


Summary

The Collector and Extractor are now proactive, collaborative, and smart. Users are guided through setup, and only the content they confirm as important is chunked and stored for retrieval. This prevents bloat in AlloyDB, increases the relevance of retrieved context, and keeps later phases focused on data the user actually cares about.

Status: Complete and deployed (v2 prompts active)