Vertex AI Migration for Gemini 3 Pro
Summary
Migrated from Google AI SDK (@google/generative-ai) to Vertex AI SDK (@google-cloud/vertexai) to access Gemini 3 Pro Preview.
Changes Made
1. Package Installation
npm install @google-cloud/vertexai
2. Environment Variables Added
Added to .env.local:
VERTEX_AI_PROJECT_ID=gen-lang-client-0980079410
VERTEX_AI_LOCATION=us-central1
VERTEX_AI_MODEL=gemini-3-pro-preview
Existing credential (already configured):
GOOGLE_APPLICATION_CREDENTIALS=/Users/markhenderson/vibn-alloydb-key-v2.json
3. Code Changes
lib/ai/gemini-client.ts - Complete Rewrite
- Before: Used GoogleGenerativeAI from @google/generative-ai
- After: Uses VertexAI from @google-cloud/vertexai
Key changes:
- Imports: VertexAI instead of GoogleGenerativeAI
- Constructor: No API key needed (uses GOOGLE_APPLICATION_CREDENTIALS)
- Model: gemini-3-pro-preview (was gemini-2.5-pro)
- Temperature: Default 1.0 (was 0.2), per the Gemini 3 docs
- Response parsing: Updated for the Vertex AI response structure
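Putting those changes together, the rewritten client looks roughly like this (a minimal sketch, not the exact contents of lib/ai/gemini-client.ts; the generateText helper name is illustrative):

```typescript
import { VertexAI } from '@google-cloud/vertexai';

// Auth comes from GOOGLE_APPLICATION_CREDENTIALS; no API key is passed.
const vertexAI = new VertexAI({
  project: process.env.VERTEX_AI_PROJECT_ID!,
  location: process.env.VERTEX_AI_LOCATION ?? 'us-central1',
});

const model = vertexAI.getGenerativeModel({
  model: process.env.VERTEX_AI_MODEL ?? 'gemini-3-pro-preview',
  generationConfig: { temperature: 1.0 }, // Gemini 3 default; see Recommendations below
});

// Illustrative helper: Vertex AI nests text under candidates/content/parts.
export async function generateText(prompt: string): Promise<string> {
  const result = await model.generateContent(prompt);
  return result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
}
```

Note the response shape: the Google AI SDK exposed a `response.text()` convenience, while the Vertex AI SDK returns the raw candidates array, which is why the response-parsing code had to change.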
lib/ai/embeddings.ts - No Changes
- Still uses @google/generative-ai for text-embedding-004
- Embeddings don't require the Vertex AI migration
- Works fine with the Google AI SDK
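For reference, the unchanged embeddings path stays on the API-key-based Google AI SDK; roughly (a sketch, assuming the existing GEMINI_API_KEY env var and an illustrative embed helper name):

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

// Embeddings keep using the API-key-based Google AI SDK, unchanged.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const embedder = genAI.getGenerativeModel({ model: 'text-embedding-004' });

export async function embed(text: string): Promise<number[]> {
  const result = await embedder.embedContent(text);
  return result.embedding.values; // 768-dimensional vector for text-embedding-004
}
```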
Gemini 3 Pro Features
According to Vertex AI Documentation:
Capabilities:
- ✅ 1M token context window (64k output)
- ✅ Thinking mode - Internal reasoning control
- ✅ Function calling
- ✅ Structured output (JSON)
- ✅ System instructions
- ✅ Google Search grounding
- ✅ Code execution
- ✅ Context caching
- ✅ Knowledge cutoff: January 2025
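As one example of the list above, structured JSON output is requested through generationConfig. A hedged sketch of the shape (responseMimeType/responseSchema are the Vertex AI SDK field names; the schema itself is a made-up example, not from this project):

```typescript
// generationConfig requesting structured JSON output from Gemini 3 Pro.
// The schema below is illustrative only.
const generationConfig = {
  temperature: 1.0, // keep the Gemini 3 default
  maxOutputTokens: 8192,
  responseMimeType: 'application/json',
  responseSchema: {
    type: 'object',
    properties: {
      summary: { type: 'string' },
      confidence: { type: 'number' },
    },
    required: ['summary'],
  },
};

console.log(generationConfig.responseMimeType); // application/json
```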
Recommendations:
- 🔥 Temperature: Keep at 1.0 (default) - Gemini 3's reasoning is optimized for it
- ⚠️ Changing temperature (especially < 1.0) may cause looping or degraded performance
- 📝 Prompting: Be concise and direct - Gemini 3 prefers clear instructions over verbose prompt engineering
Required Permissions
The service account vibn-alloydb@gen-lang-client-0980079410.iam.gserviceaccount.com needs:
IAM Roles:
- ✅ roles/aiplatform.user - Access Vertex AI models
- ✅ roles/serviceusage.serviceUsageConsumer - Use the Vertex AI API
Check permissions:
gcloud projects get-iam-policy gen-lang-client-0980079410 \
--flatten="bindings[].members" \
--filter="bindings.members:vibn-alloydb@gen-lang-client-0980079410.iam.gserviceaccount.com"
Add permissions (if missing):
gcloud projects add-iam-policy-binding gen-lang-client-0980079410 \
--member="serviceAccount:vibn-alloydb@gen-lang-client-0980079410.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding gen-lang-client-0980079410 \
--member="serviceAccount:vibn-alloydb@gen-lang-client-0980079410.iam.gserviceaccount.com" \
--role="roles/serviceusage.serviceUsageConsumer"
Testing
Test in Vibn:
- Go to http://localhost:3000
- Send a message in the AI chat
- Check terminal/browser console for errors
Expected Success:
- AI responds normally
- Terminal logs: [AI Chat] Mode: collector_mode (or another mode)
- No "Model not found" or "403 Forbidden" errors
Expected Errors (if no access):
- Model gemini-3-pro-preview not found
- 403 Forbidden: Permission denied
- User does not have access to model
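A small, hypothetical helper for spotting these access errors in the catch path (plain string matching on the messages above; the function name is illustrative):

```typescript
// Classify Vertex AI access failures by the error messages listed above,
// so a rollback (see below) can be triggered or logged distinctly.
function isModelAccessError(message: string): boolean {
  return (
    /model .*not found/i.test(message) ||
    message.includes('403') ||
    /permission denied/i.test(message) ||
    /does not have access to model/i.test(message)
  );
}

console.log(isModelAccessError('Model gemini-3-pro-preview not found')); // true
console.log(isModelAccessError('403 Forbidden: Permission denied'));     // true
console.log(isModelAccessError('Request timed out'));                    // false
```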
Rollback Plan
If Gemini 3 Pro doesn't work:
Option 1: Use Gemini 2.5 Pro on Vertex AI
Change in .env.local:
VERTEX_AI_MODEL=gemini-2.5-pro
Option 2: Revert to Google AI SDK
- Uninstall: npm uninstall @google-cloud/vertexai
- Reinstall: npm install @google/generative-ai
- Revert lib/ai/gemini-client.ts to use GoogleGenerativeAI
- Use the GEMINI_API_KEY environment variable
Migration Benefits
- ✅ Access to latest models - Gemini 3 Pro and future releases
- ✅ Better reasoning - Gemini 3's thinking mode for complex tasks
- ✅ Unified GCP platform - same auth as AlloyDB, Firestore, etc.
- ✅ Enterprise features - context caching, batch prediction, provisioned throughput
- ✅ Better observability - logs and metrics in Cloud Console
Next Steps
- Verify service account has Vertex AI permissions (see "Required Permissions" above)
- Test the chat - Send a message and check for errors
- Monitor performance - Compare Gemini 3 vs 2.5 quality
- Adjust temperature if needed - Test with default 1.0 first
- Explore thinking mode - If beneficial for complex tasks