Commit: VIBN Frontend for Coolify deployment
New file: UPLOAD_CHUNKING_REMOVED.md (213 lines)
# Document Upload - Chunking Removed ✅

## Issue Found

Despite the Collector/Extractor refactor, document uploads were still auto-chunking files into semantic pieces.

## What Was Happening (Before)

```typescript
// upload-document/route.ts
const chunks = chunkDocument(content, {
  maxChunkSize: 2000,
  chunkOverlap: 200,
});

for (const [i, chunk] of chunks.entries()) {
  await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}
```

**Result:**
- 1 file upload → 5-10 separate knowledge_items
- Each chunk stored as a separate record
- Auto-chunking contradicted the Extractor AI's collaborative approach

## What Happens Now (After)

```typescript
// upload-document/route.ts
const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['document', 'uploaded', 'pending_extraction'],
  },
});
```

**Result:**
- 1 file upload → 1 knowledge_item
- Whole document stored intact
- Tagged as `pending_extraction`
- Extractor AI will review and collaboratively chunk

---

## Files Changed

### 1. `app/api/projects/[projectId]/knowledge/upload-document/route.ts`

**Removed:**
- `chunkDocument()` import and calls
- Loop creating multiple knowledge_items
- Chunk metadata tracking

**Added:**
- Single knowledge_item creation with full content
- `pending_extraction` tag
- Status tracking in contextSources

**Before:**
```typescript
const chunks = chunkDocument(content, {...});
for (const [i, chunk] of chunks.entries()) {
  const knowledgeItem = await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}
```

**After:**
```typescript
const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['pending_extraction'],
  },
});
```

### 2. `app/[workspace]/project/[projectId]/context/page.tsx`

**Changed UI text:**
- **Before:** "Documents will be automatically chunked and processed for AI context."
- **After:** "Documents will be stored for the Extractor AI to review and process."

---

## User Experience Changes

### Upload Flow (Now):

1. User uploads `project-spec.md`
2. File saved to Firebase Storage
3. **Whole document** stored as 1 knowledge_item
4. Appears in the Context page as "project-spec.md"
5. Tagged `pending_extraction`

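The upload steps above boil down to building a single knowledge_item payload. A minimal sketch follows; `buildUploadPayload` and the `KnowledgeItemInput` interface are illustrative names, not the actual route code, though the field values mirror this document's examples:

```typescript
// Hypothetical sketch of the payload the simplified upload route creates.
interface KnowledgeItemInput {
  title: string;
  content: string;
  sourceMeta: {
    filename: string;
    tags: string[];
  };
}

function buildUploadPayload(fileName: string, content: string): KnowledgeItemInput {
  return {
    title: fileName, // file name becomes the item title
    content,         // whole document, no chunking
    sourceMeta: {
      filename: fileName,
      tags: ['document', 'uploaded', 'pending_extraction'],
    },
  };
}

const item = buildUploadPayload('project-spec.md', '# Project Spec\n...');
console.log(item.sourceMeta.tags); // includes 'pending_extraction'
```

One upload, one payload: there is no loop anywhere for chunks to come from.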
### Extraction Flow (Later):

1. User says "Is that everything?" → AI transitions
2. Extractor AI mode activates
3. AI reads whole documents
4. AI asks: "I see this section about user roles - is this important for V1?"
5. User confirms: "Yes, that's critical"
6. AI calls `/api/projects/{id}/knowledge/chunk-insight`
7. Creates targeted chunk as `extracted_insight`
8. Chunks stored in AlloyDB for retrieval

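Step 6 can be sketched as the request the Extractor AI would issue. Only the URL path comes from this document; the body fields (`sourceItemId`, `excerpt`) and the builder itself are assumptions about the API contract:

```typescript
// Hypothetical request builder for the chunk-insight endpoint (step 6).
// The URL path is from this doc; the body shape is an assumption.
interface ChunkInsightRequest {
  url: string;
  body: {
    sourceItemId: string;     // the pending_extraction knowledge_item
    excerpt: string;          // the user-confirmed section
    tag: 'extracted_insight'; // how the resulting chunk is labeled (step 7)
  };
}

function buildChunkInsightRequest(
  projectId: string,
  sourceItemId: string,
  excerpt: string,
): ChunkInsightRequest {
  return {
    url: `/api/projects/${projectId}/knowledge/chunk-insight`,
    body: { sourceItemId, excerpt, tag: 'extracted_insight' },
  };
}

const req = buildChunkInsightRequest('proj456', 'abc123', 'User roles: ...');
console.log(req.url); // "/api/projects/proj456/knowledge/chunk-insight"
```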
---

## Why This Matters

### Before (Auto-chunking):

- ❌ System guessed what was important
- ❌ Over-chunked irrelevant sections
- ❌ Polluted the vector database with noise
- ❌ User had no control

### After (Collaborative):

- ✅ Extractor AI asks before chunking
- ✅ Only important sections are chunked
- ✅ User confirms what matters for V1
- ✅ Clean, relevant vector database

---

## API Response Changes

### Before:

```json
{
  "success": true,
  "chunkCount": 8,
  "knowledgeItemIds": ["id1", "id2", "id3", ...]
}
```

### After:

```json
{
  "success": true,
  "knowledgeItemId": "single_id",
  "status": "stored",
  "message": "Document stored. Extractor AI will review and chunk important sections."
}
```

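Client code can tell the two response shapes apart with a simple type guard. The interfaces below just transcribe the Before/After JSON examples; the guard function name is illustrative:

```typescript
// Response shapes transcribed from the Before/After examples above.
interface ChunkedUploadResponse {
  success: boolean;
  chunkCount: number;
  knowledgeItemIds: string[];
}

interface StoredUploadResponse {
  success: boolean;
  knowledgeItemId: string;
  status: 'stored';
  message: string;
}

// Narrow an unknown response to the new single-item shape.
function isStoredResponse(r: any): r is StoredUploadResponse {
  return typeof r?.knowledgeItemId === 'string' && r?.status === 'stored';
}

const res = {
  success: true,
  knowledgeItemId: 'single_id',
  status: 'stored',
  message: 'Document stored. Extractor AI will review and chunk important sections.',
};
console.log(isStoredResponse(res)); // true
```

Old chunked responses fail the guard because they carry `knowledgeItemIds` (plural) and no `status` field.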
---

## Database Structure

### Firestore - knowledge_items:

```json
{
  "id": "abc123",
  "projectId": "proj456",
  "sourceType": "imported_document",
  "title": "project-spec.md",
  "content": "< FULL DOCUMENT CONTENT >",
  "sourceMeta": {
    "filename": "project-spec.md",
    "tags": ["document", "uploaded", "pending_extraction"],
    "url": "https://storage.googleapis.com/..."
  }
}
```

### Firestore - contextSources:

```json
{
  "type": "document",
  "name": "project-spec.md",
  "summary": "Document (5423 characters) - pending extraction",
  "metadata": {
    "knowledgeItemId": "abc123",
    "status": "pending_extraction"
  }
}
```

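The `summary` string above looks derivable from the content length. A one-line sketch of that formatting, assuming the helper name (`buildPendingSummary`) is hypothetical:

```typescript
// Hypothetical helper producing the contextSources summary string shown above.
function buildPendingSummary(content: string): string {
  return `Document (${content.length} characters) - pending extraction`;
}

console.log(buildPendingSummary('hello'));
// "Document (5 characters) - pending extraction"
```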
---

## Testing Checklist

- [x] Remove chunking logic from upload endpoint
- [x] Update UI text to reflect new behavior
- [x] Verify whole document is stored
- [x] Confirm `pending_extraction` tag is set
- [ ] Test document upload with 3 files
- [ ] Verify Collector checklist updates
- [ ] Test Extractor AI reads full documents
- [ ] Test `/chunk-insight` API creates extracted chunks

---

## Related Documentation

- `TABLE_STAKES_IMPLEMENTATION.md` - Full feature implementation
- `COLLECTOR_EXTRACTOR_REFACTOR.md` - Refactor rationale
- `QA_FIXES_APPLIED.md` - QA testing results

---

## Status

✅ **Auto-chunking removed**
✅ **UI text updated**
✅ **Server restarted**
🔄 **Ready for testing**

The upload flow now correctly stores whole documents and defers chunking to the collaborative Extractor AI phase.