
# Gathering Agent - Implementation Summary

## Overview

The **Gathering Agent** runs the first phase of Vibn's multi-agent system. It works through each context item (documents, GitHub repos, coding sessions) one at a time, extracts key insights, confirms them with the user, and stores them for later phases.

## Key Principle

**"SHOW, DON'T GUESS"**

Instead of the AI making assumptions about the project, it:
1. Lists what context is available
2. Goes through each item ONE AT A TIME
3. Shows what it found (with specific quotes/references)
4. Gets user confirmation
5. Stores confirmed insights
6. Moves to next item only after confirmation
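
The one-at-a-time rule above can be pictured as a small state machine. This is a minimal sketch, not the agent's actual implementation: `ReviewItem`, `ReviewState`, and the three helpers are hypothetical names introduced for illustration.

```typescript
// Hypothetical types and helpers, for illustration only.
interface ReviewItem {
  id: string;
  name: string;
}

interface ReviewState {
  items: ReviewItem[];
  index: number; // position of the item currently under review
  confirmedIds: string[]; // items the user has confirmed so far
}

function startReview(items: ReviewItem[]): ReviewState {
  return { items, index: 0, confirmedIds: [] };
}

// Returns the item awaiting confirmation, or null when every item is done.
function currentItem(state: ReviewState): ReviewItem | null {
  return state.items[state.index] ?? null;
}

// Records the confirmation and moves to the next item.
// Call this only after the user has confirmed the current item.
function confirmCurrent(state: ReviewState): ReviewState {
  const item = currentItem(state);
  if (item === null) return state; // nothing left to confirm
  return {
    ...state,
    index: state.index + 1,
    confirmedIds: [...state.confirmedIds, item.id],
  };
}
```

Because `confirmCurrent` is the only function that changes `index`, there is no way to reach a later item without confirming the earlier ones.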

## Files Created/Modified

### New Files

1. **`/prompts/GATHERING_AGENT.md`**
   - Complete system prompt for the Gathering Agent
   - Defines agent identity, goals, and process
   - Includes communication style guide
   - Provides edge case handling
   - Shows example flow

2. **`/lib/types/phases.ts`** (already created)
   - TypeScript definitions for project phases
   - Phase types: `gathering`, `vision`, `scope`, `blueprint`, `execution`
   - Status types: `not_started`, `in_progress`, `completed`, `skipped`, `failed`

3. **`/app/api/projects/phase/route.ts`** (already created)
   - API endpoint to update project phase and status
   - Handles phase transitions and history tracking

4. **`/docs/PHASE_SYSTEM.md`** (already created)
   - Documentation for the phase tracking system
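
For orientation, the phase and status values listed for `/lib/types/phases.ts` could be modeled as below. Only the literal values come from this document; the constant names, type names, and guard functions are assumptions, and the real file may be organized differently.

```typescript
// Assumed shape; only the literal values are taken from this document.
const PROJECT_PHASES = ["gathering", "vision", "scope", "blueprint", "execution"] as const;
const PHASE_STATUSES = ["not_started", "in_progress", "completed", "skipped", "failed"] as const;

type ProjectPhase = (typeof PROJECT_PHASES)[number];
type PhaseStatus = (typeof PHASE_STATUSES)[number];

// Runtime guards, useful when reading untyped data such as a database document.
function isProjectPhase(value: unknown): value is ProjectPhase {
  return (
    typeof value === "string" &&
    (PROJECT_PHASES as readonly string[]).indexOf(value) !== -1
  );
}

function isPhaseStatus(value: unknown): value is PhaseStatus {
  return (
    typeof value === "string" &&
    (PHASE_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}
```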

### Modified Files

1. **`/app/api/ai/chat/route.ts`**
   - Added `GATHERING_AGENT_PROMPT` import
   - Implemented phase-based agent selection logic
   - Reads `currentPhase` from project document
   - Selects appropriate agent prompt based on phase
   - Auto-marks phase as `in_progress` when first message is sent
   - Includes phase data in context payload

2. **`/app/[workspace]/project/[projectId]/v_ai_chat/page.tsx`**
   - Changed initial message from `[VISION_AGENT_AUTO_START]` to `"Hi! I'm here to help."`
   - Filters out auto-start messages from conversation history
   - Updated header from "Vision Agent" to "AI Assistant"
   - Updated description to "Building your project step-by-step"
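
The history filtering in the chat page might look roughly like the following. This is a sketch under assumptions: the `ChatMessage` shape and the `visibleHistory` helper are hypothetical, and only the `[VISION_AGENT_AUTO_START]` marker comes from this document.

```typescript
// Assumed message shape; the real chat page may store more fields.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Marker used by the previous auto-start behavior (from this document).
const AUTO_START_MARKER = "[VISION_AGENT_AUTO_START]";

// Hypothetical helper: drop auto-start messages so they never render in the UI.
function visibleHistory(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter((m) => m.content.trim() !== AUTO_START_MARKER);
}
```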

## How It Works

### 1. Project Initialization

When a project is created (or migrated), its document is initialized with these phase fields:
```typescript
{
  currentPhase: 'gathering',
  phaseStatus: 'not_started',
  phaseData: {},
  phaseHistory: []
}
```
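
For migrated projects, one way to backfill these fields is a helper that fills in only what is missing. This is a hedged sketch: `PhaseFields` and `withPhaseDefaults` are hypothetical names, and the actual migration logic is not described in this document.

```typescript
// Hypothetical shape of the phase fields shown above.
interface PhaseFields {
  currentPhase: string;
  phaseStatus: string;
  phaseData: Record<string, unknown>;
  phaseHistory: unknown[];
}

// Returns a complete set of phase fields, keeping any values already present.
// The caller is expected to merge the result into the project document.
function withPhaseDefaults(project: Partial<PhaseFields>): PhaseFields {
  return {
    currentPhase: project.currentPhase ?? "gathering",
    phaseStatus: project.phaseStatus ?? "not_started",
    phaseData: project.phaseData ?? {},
    phaseHistory: project.phaseHistory ?? [],
  };
}
```

Keeping existing values means a project that has already progressed past gathering is not reset by the migration.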

### 2. User Opens Chat

1. Frontend loads conversation history from Firestore
2. If no history exists, sends initial greeting: `"Hi! I'm here to help."`
3. Backend receives message and checks `project.currentPhase`
4. Since `currentPhase === 'gathering'`, it loads `GATHERING_AGENT_PROMPT`
5. If `phaseStatus === 'not_started'`, marks it as `in_progress`
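
Steps 3 to 5 can be sketched as a pure function. The prompt strings are placeholders, and `FALLBACK_PROMPT`, `ProjectPhaseState`, and `prepareChatTurn` are hypothetical names; the real logic lives in `/app/api/ai/chat/route.ts` and may differ.

```typescript
// Placeholder text; the real prompt is imported from the prompts directory.
const GATHERING_AGENT_PROMPT = "<gathering agent system prompt>";
// Hypothetical fallback for phases whose agents are not covered here.
const FALLBACK_PROMPT = "<default assistant prompt>";

interface ProjectPhaseState {
  currentPhase: string;
  phaseStatus: string;
}

// Picks the system prompt for the current phase and marks a phase that
// has not started yet as in progress.
function prepareChatTurn(project: ProjectPhaseState): {
  systemPrompt: string;
  project: ProjectPhaseState;
} {
  const systemPrompt =
    project.currentPhase === "gathering" ? GATHERING_AGENT_PROMPT : FALLBACK_PROMPT;
  const updated =
    project.phaseStatus === "not_started"
      ? { ...project, phaseStatus: "in_progress" }
      : project;
  return { systemPrompt, project: updated };
}
```

In a real route, the updated project state would also need to be written back to storage; that step is omitted here.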

### 3. Agent Starts Process

The Gathering Agent follows this flow:

**Step 1: Initial Greeting**
```
Hi! I'm here to help you gather everything about your project.

I can see you've connected:
- GitHub repo: [repo name or "Not connected"]
- [X] documents
- [Y] coding sessions

Let me go through each item with you to extract the key insights. Ready?
```

**Step 2: Go Through Items One-by-One**
```
📄 [Item Name] ([size])

From this, I found:

✓ [Specific insight with quote or reference]
✓ [Specific insight with quote or reference]
✓ [Specific insight with quote or reference]

Did I capture this correctly? Anything I missed or got wrong?
```

**Wait for user confirmation before proceeding to the next item.**

**Step 3: Store Insights**

After the user confirms, the agent calls:
```typescript
// POST /api/projects/phase
const timestamp = Date.now(); // assumption: epoch milliseconds; the real timestamp format may differ

const addInsightRequest = {
  projectId: "xxx",
  action: "add_insight",
  data: {
    id: `insight_${timestamp}`,
    source: "Document Name",
    sourceType: "document",
    sourceId: "doc_id",
    insight: "Specific finding from document",
    extractedAt: timestamp,
    confirmed: true,
    confirmedAt: timestamp,
    usedInVision: false,
    // One of: "feature" | "user" | "problem" | "competitor" | "tech" | "progress"
    category: "feature",
  },
};
```
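
A typed builder can keep the `category` field within the allowed values at compile time. The following is a sketch under assumptions: `InsightInput` and `buildAddInsightRequest` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```typescript
// Category values are taken from the request shape above.
type InsightCategory =
  | "feature"
  | "user"
  | "problem"
  | "competitor"
  | "tech"
  | "progress";

// Hypothetical input type: the fields the agent fills in per insight.
interface InsightInput {
  source: string;
  sourceType: string;
  sourceId: string;
  insight: string;
  category: InsightCategory;
}

// Builds the request body for a confirmed insight.
// `now` is passed in (assumed epoch milliseconds) so the result is deterministic.
function buildAddInsightRequest(projectId: string, input: InsightInput, now: number) {
  return {
    projectId,
    action: "add_insight" as const,
    data: {
      id: `insight_${now}`,
      ...input,
      extractedAt: now,
      confirmed: true,
      confirmedAt: now,
      usedInVision: false,
    },
  };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of the `POST` request.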

**Step 4: Final Summary & Handoff**
```
Perfect! I've analyzed everything you've connected:

From [X] documents:
- [Key theme 1 from multiple docs]
- [Key theme 2 from multiple docs]
- [Key theme 3 from multiple docs]

From GitHub:
- [Progress summary]

From Sessions:
- [Activity summary]

Total insights captured: [count]

Do you have anything else to add before I hand this off
to the Vision Agent to build your Product Vision Board?

[Add more / No, proceed to Vision]
```

**Step 5: Transition to Vision Phase**

When the user approves the handoff:
```typescript
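// POST /api/projects/phase
const confirmedInsights: unknown[] = []; // placeholder for all confirmed insights
const timestamp = Date.now(); // assumption: epoch milliseconds; the real timestamp format may differ

const handoffRequest = {
  projectId: "xxx",
  newPhase: "vision",
  newStatus: "in_progress",
  phaseSpecificData: {
    gatheredInsights: confirmedInsights,
    gatheredAt: timestamp,
  },
};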
```
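
Since only confirmed insights should reach the Vision phase, the handoff body could be assembled with an explicit filter. This is an illustrative sketch: `StoredInsight` and `buildVisionHandoff` are hypothetical names, and the timestamp is assumed to be epoch milliseconds.

```typescript
// Hypothetical minimal view of a stored insight.
interface StoredInsight {
  id: string;
  insight: string;
  confirmed: boolean;
}

// Builds the phase-transition body, passing along confirmed insights only.
function buildVisionHandoff(projectId: string, insights: StoredInsight[], now: number) {
  return {
    projectId,
    newPhase: "vision" as const,
    newStatus: "in_progress" as const,
    phaseSpecificData: {
      gatheredInsights: insights.filter((i) => i.confirmed),
      gatheredAt: now,
    },
  };
}
```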

## Context Data Structure

The agent receives this JSON payload:

```json
{
  "project": {
    "id": "string",
    "name": "string",
    "githubRepo": "string | null",
    "workspacePath": "string | null",
    "chatgptUrl": "string | null"
  },
  "phase": {
    "current": "gathering",
    "status": "in_progress",
    "data": {}
  },
  "contextSources": [
    {
      "id": "string",
      "name": "string",
      "type": "chat | document | file",
      "summary": "string",
      "contentPreview": "string (first 500 chars)",
      "contentLength": "number",
      "connectedAt": "timestamp"
    }
  ],
  "sessions": [
    {
      "id": "string",
      "workspaceName": "string",
      "createdAt": "timestamp",
      "linkedToProject": "boolean"
    }
  ]
}
```
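
The same payload can be expressed as TypeScript interfaces, together with a helper that derives the counts used in the initial greeting. This is a sketch under assumptions: the interface and function names are hypothetical, timestamps are assumed to be strings, and counting every context source as a "document" is a simplification.

```typescript
interface ContextSource {
  id: string;
  name: string;
  type: "chat" | "document" | "file";
  summary: string;
  contentPreview: string; // first 500 characters of the content
  contentLength: number;
  connectedAt: string; // assumption: timestamp serialized as a string
}

interface SessionRef {
  id: string;
  workspaceName: string;
  createdAt: string; // assumption: timestamp serialized as a string
  linkedToProject: boolean;
}

interface AgentContext {
  project: {
    id: string;
    name: string;
    githubRepo: string | null;
    workspacePath: string | null;
    chatgptUrl: string | null;
  };
  phase: { current: string; status: string; data: Record<string, unknown> };
  contextSources: ContextSource[];
  sessions: SessionRef[];
}

// Hypothetical helper: derives the values shown in the initial greeting.
function summarizeContext(ctx: AgentContext) {
  return {
    githubRepo: ctx.project.githubRepo ?? "Not connected",
    documentCount: ctx.contextSources.length,
    sessionCount: ctx.sessions.length,
    // True when there is nothing to review at all.
    isEmpty:
      ctx.project.githubRepo === null &&
      ctx.contextSources.length === 0 &&
      ctx.sessions.length === 0,
  };
}
```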

## Communication Style

### ✅ Good Examples

```
✅ "In your 'User Stories' doc, you listed 15 EMR features..."
✅ "Your 'Canadian EMR' doc mentions TELUS Health, Accuro, OSCAR..."
✅ "I found these features in your SmartClinix doc. Sound right?"
```

### ❌ Bad Examples

```
❌ "I think you're building an EMR system..."
❌ "There are several competitors in this space..."
❌ "ANALYSIS COMPLETE. FEATURES EXTRACTED."
```

## Critical Rules

1. **ONE ITEM AT A TIME** - Never jump ahead
2. **SHOW WHAT YOU FOUND** - Always cite specific content
3. **GET CONFIRMATION** - Never proceed without user approval
4. **NEVER GUESS** - Only extract what's explicitly stated
5. **STORE SILENTLY** - Don't tell user about data storage
6. **CITE SOURCES** - Always reference which doc/file/session
7. **NO INTERPRETATION** - Just extract facts, not conclusions
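
Rules 2 and 4 are enforced through the prompt, but a lightweight programmatic check is also conceivable: verify that a quoted snippet actually occurs in the source text. The sketch below is purely illustrative; `normalize` and `isQuoteInSource` are hypothetical helpers, not part of the current implementation.

```typescript
// Lowercases and collapses whitespace so line breaks in the source
// do not cause false negatives.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Returns true only when the non-empty quote appears in the source text.
function isQuoteInSource(quote: string, sourceText: string): boolean {
  const q = normalize(quote);
  return q.length > 0 && normalize(sourceText).indexOf(q) !== -1;
}
```

A substring check like this catches fabricated quotes but not paraphrases, so it complements user confirmation rather than replacing it.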

## Edge Cases

### No Context Available
```
I don't see any context sources yet. To help me understand your
project, could you:

1. Add documents - Click 'Context' in sidebar to add:
   - ChatGPT conversations
   - Product docs
   - User research

2. Connect GitHub - If you have code

Once you've added materials, I'll go through each one with you!
```

### User Says "Skip This"
```
Sure, moving on.

📄 [Next Item]...
```

### User Says "That's Outdated"
```
Got it - I'll note this as outdated. What's the current status?
```

## Testing

To test the Gathering Agent:

1. Reset your chat (click "Reset Chat" button)
2. Refresh the page at: `http://localhost:3000/[workspace]/project/[projectId]/v_ai_chat`
3. Send first message: "Ready"
4. Agent should start the gathering process

## Next Steps

After gathering is complete and the user approves:
1. Update project phase to `vision`
2. Load `VISION_AGENT_PROMPT` (to be created/updated)
3. Vision Agent uses gathered insights to fill out Product Vision Board
4. Process repeats for each subsequent phase
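
If phases advance in the order they are listed in `/lib/types/phases.ts`, the "process repeats" step could rely on a helper like the one below. Both the ordering and the `nextPhase` function are assumptions made for illustration.

```typescript
// Assumed progression order, taken from the order the phases are listed in.
const PHASE_ORDER = ["gathering", "vision", "scope", "blueprint", "execution"];

// Returns the phase that follows `current`, or null after the last phase.
function nextPhase(current: string): string | null {
  const i = PHASE_ORDER.indexOf(current);
  if (i === -1) throw new Error(`Unknown phase: ${current}`);
  return PHASE_ORDER[i + 1] ?? null;
}
```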

## Benefits of This Approach

1. **No More Guessing** - AI only uses confirmed information
2. **User Control** - User approves every insight before it's stored
3. **Transparency** - User sees exactly what AI found and where
4. **Accuracy** - Insights are tied to cited source content and confirmed by the user, which limits hallucinations and assumptions
5. **Progressive** - Builds foundation for later phases
6. **Resumable** - Phase state is stored on the project, so a reload resumes the current phase instead of starting over