VIBN Frontend for Coolify deployment

This commit is contained in:
2026-02-15 19:25:52 -08:00
commit 40bf8428cd
398 changed files with 76513 additions and 0 deletions

UPLOAD_CHUNKING_REMOVED.md
# Document Upload - Chunking Removed ✅
## Issue Found
Despite the Collector/Extractor refactor, document uploads were still auto-chunking files into semantic pieces.
## What Was Happening (Before)
```typescript
// upload-document/route.ts (removed logic)
const chunks = chunkDocument(content, {
  maxChunkSize: 2000,
  chunkOverlap: 200,
});
for (const [i, chunk] of chunks.entries()) {
  await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}
```
**Result:**
- 1 file upload → 5-10 separate knowledge_items
- Each chunk stored as separate record
- Auto-chunking contradicted the Extractor AI's collaborative approach
## What Happens Now (After)
```typescript
// upload-document/route.ts
const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['document', 'uploaded', 'pending_extraction'],
  },
});
```
**Result:**
- 1 file upload → 1 knowledge_item
- Whole document stored intact
- Tagged as `pending_extraction`
- Extractor AI will review and collaboratively chunk
---
## Files Changed
### 1. `app/api/projects/[projectId]/knowledge/upload-document/route.ts`
**Removed:**
- `chunkDocument()` import and calls
- Loop creating multiple knowledge_items
- Chunk metadata tracking
**Added:**
- Single knowledge_item creation with full content
- `pending_extraction` tag
- Status tracking in contextSources
**Before:**
```typescript
const chunks = chunkDocument(content, {...});
for (const [i, chunk] of chunks.entries()) {
  const knowledgeItem = await createKnowledgeItem({
    title: `${file.name} (chunk ${i + 1}/${chunks.length})`,
    content: chunk.content,
  });
}
```
**After:**
```typescript
const knowledgeItem = await createKnowledgeItem({
  title: file.name,
  content: content, // Whole document
  sourceMeta: {
    tags: ['pending_extraction'],
  },
});
```
### 2. `app/[workspace]/project/[projectId]/context/page.tsx`
**Changed UI text:**
- **Before:** "Documents will be automatically chunked and processed for AI context."
- **After:** "Documents will be stored for the Extractor AI to review and process."
---
## User Experience Changes
### Upload Flow (Now):
1. User uploads `project-spec.md`
2. File saved to Firebase Storage
3. **Whole document** stored as 1 knowledge_item
4. Appears in Context page as "project-spec.md"
5. Tagged `pending_extraction`
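Steps 2–5 reduce to building two Firestore payloads, one for `knowledge_items` and one for `contextSources`. A pure sketch of that step, matching the schemas documented later in this file (the helper name `buildUploadRecords` and the `UploadRecords` type are illustrative, not from the codebase):

```typescript
// Illustrative helper: builds the two Firestore payloads created on upload.
// Field names follow the documented schemas; the helper itself is a sketch.
interface UploadRecords {
  knowledgeItem: {
    sourceType: string;
    title: string;
    content: string;
    sourceMeta: { filename: string; tags: string[] };
  };
  contextSource: {
    type: string;
    name: string;
    summary: string;
    metadata: { status: string };
  };
}

function buildUploadRecords(filename: string, content: string): UploadRecords {
  return {
    knowledgeItem: {
      sourceType: "imported_document",
      title: filename,
      content, // whole document, no chunking
      sourceMeta: { filename, tags: ["document", "uploaded", "pending_extraction"] },
    },
    contextSource: {
      type: "document",
      name: filename,
      summary: `Document (${content.length} characters) - pending extraction`,
      metadata: { status: "pending_extraction" },
    },
  };
}
```

Keeping this construction pure makes it easy to unit-test without touching Firestore.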
### Extraction Flow (Later):
1. User says "Is that everything?" → AI transitions
2. Extractor AI mode activates
3. AI reads whole documents
4. AI asks: "I see this section about user roles - is this important for V1?"
5. User confirms: "Yes, that's critical"
6. AI calls `/api/projects/{id}/knowledge/chunk-insight`
7. Creates a targeted chunk tagged `extracted_insight`
8. Chunks stored in AlloyDB for retrieval
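Step 6 might look like the following client-side call. Only the endpoint path comes from this document; the request body fields (`knowledgeItemId`, `content`, `tag`) are assumptions about its shape:

```typescript
// Hypothetical request to the chunk-insight endpoint (body shape assumed).
async function chunkInsight(projectId: string, knowledgeItemId: string, excerpt: string) {
  const res = await fetch(`/api/projects/${projectId}/knowledge/chunk-insight`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      knowledgeItemId,      // source document the chunk came from
      content: excerpt,     // the user-confirmed section
      tag: "extracted_insight",
    }),
  });
  return res.json();
}
```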
---
## Why This Matters
### Before (Auto-chunking):
- ❌ System guessed what's important
- ❌ Over-chunked irrelevant sections
- ❌ Polluted vector database with noise
- ❌ User had no control
### After (Collaborative):
- ✅ Extractor AI asks before chunking
- ✅ Only important sections chunked
- ✅ User confirms what matters for V1
- ✅ Clean, relevant vector database
---
## API Response Changes
### Before:
```json
{
"success": true,
"chunkCount": 8,
"knowledgeItemIds": ["id1", "id2", "id3", ...]
}
```
### After:
```json
{
"success": true,
"knowledgeItemId": "single_id",
"status": "stored",
"message": "Document stored. Extractor AI will review and chunk important sections."
}
```
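Consumers migrating from the old shape can narrow a response with a small type guard (the type and function names here are illustrative, not part of the API):

```typescript
// Illustrative types for the new single-item upload response.
interface UploadResponse {
  success: boolean;
  knowledgeItemId: string;
  status: string;
  message: string;
}

// Distinguishes the new response from the old chunked one, which carried
// `chunkCount` and `knowledgeItemIds` instead of a single `knowledgeItemId`.
function isSingleItemResponse(r: any): r is UploadResponse {
  return r?.success === true && typeof r.knowledgeItemId === "string";
}
```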
---
## Database Structure
### Firestore - knowledge_items:
```json
{
"id": "abc123",
"projectId": "proj456",
"sourceType": "imported_document",
"title": "project-spec.md",
"content": "< FULL DOCUMENT CONTENT >",
"sourceMeta": {
"filename": "project-spec.md",
"tags": ["document", "uploaded", "pending_extraction"],
"url": "https://storage.googleapis.com/..."
}
}
```
### Firestore - contextSources:
```json
{
"type": "document",
"name": "project-spec.md",
"summary": "Document (5423 characters) - pending extraction",
"metadata": {
"knowledgeItemId": "abc123",
"status": "pending_extraction"
}
}
```
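The Extractor's first step, finding documents that still await review, reduces to a tag filter over `knowledge_items`. A pure sketch mirroring the schema above (the function name is hypothetical; in production this would be a Firestore query rather than an in-memory filter):

```typescript
// Shape of a knowledge_items record, per the schema documented above.
interface KnowledgeItem {
  id: string;
  projectId: string;
  sourceType: string;
  title: string;
  content: string;
  sourceMeta: { filename: string; tags: string[] };
}

// Pure filter: documents in this project still tagged pending_extraction.
function pendingDocuments(items: KnowledgeItem[], projectId: string): KnowledgeItem[] {
  return items.filter(
    (it) => it.projectId === projectId && it.sourceMeta.tags.includes("pending_extraction"),
  );
}
```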
---
## Testing Checklist
- [x] Remove chunking logic from upload endpoint
- [x] Update UI text to reflect new behavior
- [x] Verify whole document is stored
- [x] Confirm `pending_extraction` tag is set
- [ ] Test document upload with 3 files
- [ ] Verify Collector checklist updates
- [ ] Test Extractor AI reads full documents
- [ ] Test `/chunk-insight` API creates extracted chunks
---
## Related Documentation
- `TABLE_STAKES_IMPLEMENTATION.md` - Full feature implementation
- `COLLECTOR_EXTRACTOR_REFACTOR.md` - Refactor rationale
- `QA_FIXES_APPLIED.md` - QA testing results
---
## Status
✅ **Auto-chunking removed**
✅ **UI text updated**
✅ **Server restarted**
🔄 **Ready for testing**
The upload flow now correctly stores whole documents and defers chunking to the collaborative Extractor AI phase.