vibn-frontend/UPLOAD_CHUNKING_REMOVED.md

Document Upload - Chunking Removed

Issue Found

Despite the Collector/Extractor refactor, document uploads were still auto-chunking files into semantic pieces.

What Was Happening (Before)

// upload-document/route.ts
const chunks = chunkDocument(content, {
  maxChunkSize: 2000,
  chunkOverlap: 200,
});

for (const [i, chunk] of chunks.entries()) {
  await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}

Result:

  • 1 file upload → 5-10 separate knowledge_items
  • Each chunk stored as separate record
  • Auto-chunking contradicted Extractor AI's collaborative approach

What Happens Now (After)

// upload-document/route.ts
const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['document', 'uploaded', 'pending_extraction'],
  },
});

Result:

  • 1 file upload → 1 knowledge_item
  • Whole document stored intact
  • Tagged as pending_extraction
  • Extractor AI will review and collaboratively chunk
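The new single-item path boils down to building one payload per upload. A minimal sketch in TypeScript; `buildKnowledgeItem` and the `KnowledgeItemInput` type are illustrative stand-ins, with only the field names and tags taken from the snippets above:

```typescript
// Shape of the payload handed to createKnowledgeItem (field names
// follow this document's examples; the type itself is illustrative).
interface KnowledgeItemInput {
  title: string;
  content: string;
  sourceMeta: { tags: string[] };
}

// One upload -> exactly one knowledge_item. Chunking is deferred
// to the collaborative Extractor AI phase.
function buildKnowledgeItem(fileName: string, content: string): KnowledgeItemInput {
  return {
    title: fileName, // plain filename, no "(chunk i/n)" suffix
    content,         // whole document, untouched
    sourceMeta: { tags: ['document', 'uploaded', 'pending_extraction'] },
  };
}

// The route would then do: await createKnowledgeItem(buildKnowledgeItem(...))
const item = buildKnowledgeItem('project-spec.md', '# Spec\n...');
console.log(item.title);                                          // "project-spec.md"
console.log(item.sourceMeta.tags.includes('pending_extraction')); // true
```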

Files Changed

1. app/api/projects/[projectId]/knowledge/upload-document/route.ts

Removed:

  • chunkDocument() import and calls
  • Loop creating multiple knowledge_items
  • Chunk metadata tracking

Added:

  • Single knowledge_item creation with full content
  • pending_extraction tag
  • Status tracking in contextSources

Before:

const chunks = chunkDocument(content, {...});
for (const [i, chunk] of chunks.entries()) {
  await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}

After:

const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['pending_extraction'],
  },
});

2. app/[workspace]/project/[projectId]/context/page.tsx

Changed UI text:

  • Before: "Documents will be automatically chunked and processed for AI context."
  • After: "Documents will be stored for the Extractor AI to review and process."

User Experience Changes

Upload Flow (Now):

  1. User uploads project-spec.md
  2. File saved to Firebase Storage
  3. Whole document stored as 1 knowledge_item
  4. Appears in Context page as "project-spec.md"
  5. Tagged pending_extraction

Extraction Flow (Later):

  1. User says "Is that everything?" → AI transitions
  2. Extractor AI mode activates
  3. AI reads whole documents
  4. AI asks: "I see this section about user roles - is this important for V1?"
  5. User confirms: "Yes, that's critical"
  6. AI calls /api/projects/{id}/knowledge/chunk-insight
  7. Creates targeted chunk as extracted_insight
  8. Chunks stored in AlloyDB for retrieval
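Steps 6-7 above can be sketched as a request-body builder for the chunk-insight call. The document only confirms the endpoint path and the `extracted_insight` label, so the field names below (`sourceKnowledgeItemId`, `insightText`, `tags`) are assumptions:

```typescript
// Hypothetical request body for POST /api/projects/{id}/knowledge/chunk-insight.
// Field names are illustrative; only the endpoint path and the
// 'extracted_insight' label come from this document.
interface ChunkInsightRequest {
  sourceKnowledgeItemId: string; // the whole-document item being mined
  insightText: string;           // the user-confirmed section
  tags: string[];                // marks the chunk as an extracted insight
}

function buildChunkInsightRequest(
  sourceKnowledgeItemId: string,
  insightText: string,
): ChunkInsightRequest {
  return {
    sourceKnowledgeItemId,
    insightText,
    tags: ['extracted_insight'],
  };
}

// The Extractor AI would then POST this body, e.g.:
// fetch(`/api/projects/${projectId}/knowledge/chunk-insight`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildChunkInsightRequest('abc123', 'User roles: ...')),
// });
```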

Why This Matters

Before (Auto-chunking):

  • System guessed what's important
  • Over-chunked irrelevant sections
  • Polluted vector database with noise
  • User had no control

After (Collaborative):

  • Extractor AI asks before chunking
  • Only important sections chunked
  • User confirms what matters for V1
  • Clean, relevant vector database

API Response Changes

Before:

{
  "success": true,
  "chunkCount": 8,
  "knowledgeItemIds": ["id1", "id2", "id3", ...]
}

After:

{
  "success": true,
  "knowledgeItemId": "single_id",
  "status": "stored",
  "message": "Document stored. Extractor AI will review and chunk important sections."
}
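Clients migrating from the old response shape can narrow the payload with a type guard. A sketch assuming the two JSON shapes shown above; the types and the guard itself are illustrative, not part of the codebase:

```typescript
// Illustrative types matching the before/after JSON shapes above.
interface LegacyUploadResponse {
  success: boolean;
  chunkCount: number;
  knowledgeItemIds: string[];
}

interface UploadResponse {
  success: boolean;
  knowledgeItemId: string;
  status: 'stored';
  message: string;
}

// Hypothetical guard a caller could use while both response
// formats are in the wild.
function isUploadResponse(x: unknown): x is UploadResponse {
  const r = x as UploadResponse;
  return (
    typeof r === 'object' && r !== null &&
    typeof r.knowledgeItemId === 'string' &&
    r.status === 'stored'
  );
}
```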

Database Structure

Firestore - knowledge_items:

{
  "id": "abc123",
  "projectId": "proj456",
  "sourceType": "imported_document",
  "title": "project-spec.md",
  "content": "< FULL DOCUMENT CONTENT >",
  "sourceMeta": {
    "filename": "project-spec.md",
    "tags": ["document", "uploaded", "pending_extraction"],
    "url": "https://storage.googleapis.com/..."
  }
}

Firestore - contextSources:

{
  "type": "document",
  "name": "project-spec.md",
  "summary": "Document (5423 characters) - pending extraction",
  "metadata": {
    "knowledgeItemId": "abc123",
    "status": "pending_extraction"
  }
}
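The `contextSources` entry is derived from the knowledge item it points at; a small helper could produce it from one upload. Field names and the summary format come from the example records above, everything else is illustrative:

```typescript
// Build the contextSources entry for a stored document, matching the
// example record above. Only the field names and the summary format
// come from this document; the helper itself is a sketch.
interface ContextSource {
  type: 'document';
  name: string;
  summary: string;
  metadata: { knowledgeItemId: string; status: 'pending_extraction' };
}

function buildContextSource(
  name: string,
  content: string,
  knowledgeItemId: string,
): ContextSource {
  return {
    type: 'document',
    name,
    summary: `Document (${content.length} characters) - pending extraction`,
    metadata: { knowledgeItemId, status: 'pending_extraction' },
  };
}

const src = buildContextSource('project-spec.md', 'x'.repeat(5423), 'abc123');
console.log(src.summary); // "Document (5423 characters) - pending extraction"
```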

Testing Checklist

  • Remove chunking logic from upload endpoint
  • Update UI text to reflect new behavior
  • Verify whole document is stored
  • Confirm pending_extraction tag is set
  • Test document upload with 3 files
  • Verify Collector checklist updates
  • Test Extractor AI reads full documents
  • Test /chunk-insight API creates extracted chunks

Related Documentation

  • TABLE_STAKES_IMPLEMENTATION.md - Full feature implementation
  • COLLECTOR_EXTRACTOR_REFACTOR.md - Refactor rationale
  • QA_FIXES_APPLIED.md - QA testing results

Status

  • Auto-chunking removed
  • UI text updated
  • Server restarted
  • 🔄 Ready for testing

The upload flow now correctly stores whole documents and defers chunking to the collaborative Extractor AI phase.