Theia rip-out (parent):
- Remove theia submodule entry (the local fork, Gitea repo, Coolify app,
Cloud Run services, and Artifact Registry image are all gone)
- Drop README.md + INFRASTRUCTURE.md (obsolete "Project OS" snapshots
that also leaked API tokens) and setup.sh (Theia clone bootstrap)
- Delete UI-DESIGN-GUIDE.md, BACKEND_AGENTS_PLAN.md, VIBN_BUILD_PLAN.md,
VISUAL_EDITOR_PLAN.md, core-packages.md, ai-packages.md, tools-list.md
(all 100% Theia-specific or superseded)
- Surgical scrubs of remaining Theia mentions in
AGENT_EXECUTION_ARCHITECTURE.md and TURBOREPO_MIGRATION_PLAN.md
Submodule bumps:
- vibn-agent-runner: Theia rip-out + MCP refactor (api/wrapper/server
pattern across shell/file/git/memory/prd/search/agent/gitea/coolify)
- vibn-frontend: Theia rip-out + P5.1 attach E2E + Justine UI WIP
Retire platform/ scaffold:
- Remove platform/backend/ (control-plane, executors, mcp-adapter),
platform/client-ide/ (gcp-productos extension), platform/contracts/,
platform/infra/terraform/, platform/scripts/templates/turborepo/
(replaced by vibn-agent-runner + vibn-frontend + Coolify direct)
- Drop architecture.md, technical_spec.md, vision-ext.md,
"1.Generate Control Plane API scaffold.md" (same era)
Docs / planning snapshots (new):
- AI_CAPABILITIES.md, AI_CAPABILITIES_ROADMAP.md
- AGENT_TELEMETRY_STREAMING_PROJECT.md
- VIBN_PRD.md, product-idea-a.md
Design assets (new):
- branding/{coolify,gitea,ux-testing}/ static brand collateral
- justine/ HTML mockups for the new onboarding/build flows
- preview-assist-ui/ Vite scratch app
- master-ai.code-workspace
Infra helpers (new):
- setup-coolify-montreal.sh provisioner
- gitea-docker-compose.yml
- vibn-coolify-schema.sql for the Coolify Postgres extensions
- prd-agent-prompt.pdf, prompt, root.txt, remixed-9edec9e9.tsx scratch
- flatten.sh helper
.gitignore: ignore **/node_modules, **/.next, **/.turbo, **/coverage
Made-with: Cursor
VibnAI Plan Summary — “Shopify Template Model” + Your Infra + Model Routing + Pricing
Below is the consolidated plan we’ve converged on: VibnAI as a template-first product builder (Shopify-style), with your own hosted infra, and usage-based AI credits powered by Vertex marketplace models with smart routing.
- Product Strategy: VibnAI Is Shopify for Building Software
Core positioning
VibnAI is not “blank page AI coding.”
VibnAI is:
Build production-ready apps from elite starter templates, then customize via guided AI workflows.
This reduces:
token burn
failure loops
architectural ambiguity
debugging chaos
And increases:
predictability
success rate
margins
retention
Template-first rule
No project starts from an empty repo by default.
Users must choose:
a starter template, or
“Advanced: Custom Build” (explicitly warned as costlier)
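The template-first rule can be enforced at project creation. The following is a minimal sketch, assuming a hypothetical request shape (`NewProjectRequest` and its field names are illustrative and not part of the plan):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewProjectRequest:
    # Hypothetical request shape; field names are illustrative only.
    template_id: Optional[str] = None
    custom_build: bool = False
    accepted_cost_warning: bool = False


def validate_new_project(req: NewProjectRequest) -> str:
    """Return the start mode, or raise if the template-first rule is violated."""
    if req.template_id and req.custom_build:
        raise ValueError("choose either a starter template or a custom build, not both")
    if req.template_id:
        return "template"
    if req.custom_build:
        # "Advanced: Custom Build" must be explicitly acknowledged as costlier.
        if not req.accepted_cost_warning:
            raise ValueError("custom builds require acknowledging the higher-cost warning")
        return "custom"
    raise ValueError("projects cannot start from an empty repo by default")
```

A request with neither a template nor an acknowledged custom build is rejected, so the empty-repo path never happens by accident.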
- Platform Architecture: Your Infra + Event-Driven AI
High-level architecture decisions
You host the infrastructure layer yourself (Hot + Cold tiers). AI compute is purchased via credits.
Hot tier (shared, always running)
API Gateway (auth, WebSockets, rate limits)
Orchestrator service (task routing + state machine)
Job queue + worker pool
Postgres (conversations, tasks, state)
Redis (optional: queue/pubsub)
Gitea (code/content source-of-truth)
Coolify (deploys, logs, runtime orchestration)
Key rule: The hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.
Cold tier (per-user, on-demand)
Agent workspace containers
Hibernate / wake-on-access
Persistent storage volumes
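Hibernate and wake-on-access for cold-tier workspaces could be modeled as a small state machine. This is a sketch under stated assumptions: the 15-minute idle timeout, the `Workspace` fields, and the `touch`/`sweep` names are all hypothetical, and real container start/stop calls are left out.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 15 * 60  # assumed value; the plan does not specify one


@dataclass
class Workspace:
    user_id: str
    state: str = "hibernated"  # "running" or "hibernated"
    last_access: float = 0.0   # seconds on any monotonic clock


def touch(ws: Workspace, now: float) -> Workspace:
    """Wake-on-access: any request wakes the workspace and records the access."""
    ws.state = "running"
    ws.last_access = now
    return ws


def sweep(ws: Workspace, now: float) -> Workspace:
    """Hibernate a running workspace that has been idle past the timeout."""
    if ws.state == "running" and now - ws.last_access >= IDLE_TIMEOUT_SECONDS:
        ws.state = "hibernated"
    return ws
```

A periodic job in the hot tier would call `sweep` for each workspace, while the API gateway would call `touch` on every request; persistent volumes keep the data across both states.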
“Master Orchestrator” behavior change (critical cost control)
Even if it’s “always running,” it should behave like:
event-driven
stateless compute
minimal model calls
structured memory, not replaying chat history
Structured memory > conversation replay
Instead of resending entire conversation history, persist and inject:
project summary
architecture summary
repo map summary
deploy state
open tasks
known bugs
This is a major cost reducer.
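The six memory fields above could be held in one record and rendered into a compact context block per model call. A minimal sketch, assuming hypothetical names (`ProjectMemory`, `render_context`) and a plain-text section layout chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectMemory:
    # One field per item the plan says to persist and inject.
    project_summary: str = ""
    architecture_summary: str = ""
    repo_map_summary: str = ""
    deploy_state: str = ""
    open_tasks: List[str] = field(default_factory=list)
    known_bugs: List[str] = field(default_factory=list)


def render_context(mem: ProjectMemory) -> str:
    """Build the context injected into a prompt; empty sections are skipped."""
    sections = [
        ("Project", mem.project_summary),
        ("Architecture", mem.architecture_summary),
        ("Repo map", mem.repo_map_summary),
        ("Deploy state", mem.deploy_state),
        ("Open tasks", "\n".join(f"- {t}" for t in mem.open_tasks)),
        ("Known bugs", "\n".join(f"- {b}" for b in mem.known_bugs)),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```

The prompt size then tracks the size of the summaries rather than the length of the conversation.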
- AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)
You’re building your own agents, but the principle applies: choose models per tool/task.
Tier A / Tier B / Tier C (the blend)
We landed on this operational blend:
40% Tier A (cheap)
45% Tier B (mid / workhorse coder)
15% Tier C (premium escalation)
This is not arbitrary; it reflects how tool and task work actually breaks down:
most actions are parsing, routing, search, summarizing (cheap)
most code edits and implementations are workhorse coding (mid)
only a small fraction require deep reasoning / high-stakes decisions (premium)
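To see what the 40/45/15 blend implies for spend, the average cost per call can be computed as a weighted sum. This is a sketch only: the per-call costs in the usage note are invented placeholder units, not real model prices.

```python
# Share of calls per tier, from the blend above.
BLEND = {"A": 0.40, "B": 0.45, "C": 0.15}


def blended_cost(cost_per_call: dict) -> float:
    """Average cost per call: sum of (share of calls * cost per call) over tiers."""
    return sum(BLEND[tier] * cost_per_call[tier] for tier in BLEND)


def spend_share(cost_per_call: dict) -> dict:
    """Fraction of total spend attributable to each tier."""
    total = blended_cost(cost_per_call)
    return {tier: BLEND[tier] * cost_per_call[tier] / total for tier in BLEND}
```

With hypothetical costs of 1, 10, and 50 units for Tiers A, B, and C, `blended_cost` gives 0.4 + 4.5 + 7.5 = 12.4 units per call, and Tier C accounts for roughly 60% of spend despite being 15% of calls. Under numbers like these, the escalation rate is the main cost lever.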
Tier purpose
Tier A — Cheap “Utility / Router”
Use for:
routing decisions
summarizing logs, errors, context
file discovery + search interpretation
command suggestion drafts
task context updates
chat summaries / naming
monitoring analysis
This tier should handle the majority of orchestration.
Tier B — Workhorse Coding Model
Use for:
generating diffs
writing/refactoring code
tests
standard bug fixes
“agent mode” loops when tasks are scoped
iterating on features inside templates
This tier should handle most coding.
Tier C — Premium Escalation Model
Use only for:
architecture decisions
high-risk changes (deploy, infra, migrations)
cross-service debugging
persistent failures (2 failed iterations)
very large diffs / multi-file refactors
security-sensitive changes
This tier should be rare by design.
- Vertex Models: What to Use in Each Tier
You wanted to stay on Google infra and Vertex marketplace/API models.
Recommended mapping (Vertex-first)
Tier A (cheap)
Gemini Flash-class model (fast, low cost)
Use for orchestration, summaries, extraction, routing, log parsing.
Tier B (mid / coding workhorse)
Pick one:
GLM-5 MaaS (Vertex) — strong reasoning + cost-effective
Qwen coder MaaS (Vertex) — strong coding, predictable cost
This model does the heavy lifting for code edits and feature building.
Tier C (premium escalation)
Pick one:
Claude Sonnet 4.6 on Vertex (reliability + long-chain coding)
or Gemini 3.1 Pro Preview (if it proves better for your workflows)
This is your “expert brain” used sparingly.
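The mapping above could live in a small config so that swapping a tier's model is a one-line change. In this sketch the strings are placeholder labels derived from the names in this plan, NOT real Vertex model IDs, and the `SELECTED` choices are arbitrary examples of "pick one":

```python
# Candidate models per tier, using this plan's labels.
# These strings are placeholders, NOT real Vertex model identifiers.
TIER_CANDIDATES = {
    "A": ["gemini-flash-class"],
    "B": ["glm-5-maas", "qwen-coder-maas"],
    "C": ["claude-sonnet-4.6", "gemini-3.1-pro-preview"],
}

# "Pick one" per tier; these selections are arbitrary examples.
SELECTED = {
    "A": "gemini-flash-class",
    "B": "qwen-coder-maas",
    "C": "claude-sonnet-4.6",
}


def model_for(tier: str) -> str:
    """Return the selected model label for a tier, checking it is a listed candidate."""
    model = SELECTED[tier]
    if model not in TIER_CANDIDATES[tier]:
        raise ValueError(f"{model!r} is not a candidate for tier {tier}")
    return model
```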
- Routing Policy: How the System Chooses Models
You’re not letting users pick models manually. The orchestrator routes based on task complexity and risk.
Default rules
All “read/search/list/summarize” → Tier A
Most code edits/refactors/tests → Tier B
High-risk or repeated failure → Tier C
Escalation triggers (simple + effective)
Escalate Tier B → Tier C when any of these happen:
2 failed iterations (tests still failing, same error persists)
Touching >5 files
Diff size exceeds ~400 LOC changed
Deployment / infra / secrets / migration steps involved
Context pressure (approaching model limits)
De-escalation rule
Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.
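The default rules, escalation triggers, and de-escalation rule can be sketched as a single function. Assumptions are labeled in the comments: the task kinds, the `Task` fields, and the 0.8 context-pressure threshold are hypothetical, and high-risk kinds are assumed to stay on Tier C even after de-escalation.

```python
from dataclasses import dataclass

READ_ONLY_KINDS = {"read", "search", "list", "summarize"}     # default Tier A
HIGH_RISK_KINDS = {"deploy", "infra", "secrets", "migration"}  # always Tier C (assumed)
CONTEXT_PRESSURE = 0.8  # assumed threshold for "approaching model limits"


@dataclass
class Task:
    kind: str
    failed_iterations: int = 0
    files_touched: int = 0
    loc_changed: int = 0
    context_fraction: float = 0.0   # share of the context window in use
    hard_part_resolved: bool = False


def choose_tier(task: Task) -> str:
    if task.kind in READ_ONLY_KINDS:
        return "A"
    if task.kind in HIGH_RISK_KINDS:
        return "C"
    if task.hard_part_resolved:
        # De-escalation: cause found or plan decided, implement on Tier B.
        return "B"
    escalate = (
        task.failed_iterations >= 2
        or task.files_touched > 5
        or task.loc_changed > 400
        or task.context_fraction >= CONTEXT_PRESSURE
    )
    return "C" if escalate else "B"
```

For example, `choose_tier(Task("edit", failed_iterations=2))` returns `"C"`, and the same task with `hard_part_resolved=True` returns `"B"`.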
- Business Model: Subscription + Credits (Not “Unlimited AI”)
You clarified the intended split:
Subscription covers your fixed costs
Subscription pays for:
your hosted infrastructure (hot tier + shared services)
agent workspace orchestration (cold tier)
your people costs (support, ops, ongoing development)
product value (templates, UX, dashboards, workflows)
baseline included usage / small AI overhead
Credits cover variable compute
Credits pay for:
model calls (Tier A/B/C)
heavy tasks (builds, refactors, debugging loops)
long chain tasks
autonomous agent execution
This protects you from heavy users and keeps margins predictable.
- Template Access as a Tiered Product (Shopify-style)
Templates are the moat
Templates reduce:
architecture planning cost
retry loops
token burn
complexity and failure rates
Templates also create:
differentiation
a marketplace opportunity later
compounding margins
Tiering via template access
Instead of just “more AI,” higher tiers unlock better starter systems.
Example approach:
Starter tier
landing page template
simple SaaS CRUD template
basic auth + Stripe
limited integrations
Builder tier
multi-tenant SaaS template
marketplace template
analytics dashboard template
stronger RBAC patterns
more integrations
Pro tier
“OpsOS / analytics warehouse” template
monitoring + alerting template
ML-ready pipeline template
advanced data model scaffolds
Enterprise
custom templates
compliance add-ons
private deployments
dedicated support / SLAs
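Template gating by plan could be a simple lookup. This sketch assumes that higher plans include every template from the lower plans, and the template identifiers are hypothetical slugs for the examples listed above:

```python
PLAN_ORDER = ["starter", "builder", "pro", "enterprise"]

# Minimum plan per template; slugs are illustrative, not real identifiers.
TEMPLATE_MIN_PLAN = {
    "landing-page": "starter",
    "saas-crud": "starter",
    "multi-tenant-saas": "builder",
    "marketplace": "builder",
    "analytics-dashboard": "builder",
    "opsos-analytics-warehouse": "pro",
    "monitoring-alerting": "pro",
    "ml-pipeline": "pro",
}


def can_use_template(plan: str, template_id: str) -> bool:
    """True if the plan is at or above the template's minimum plan."""
    required = TEMPLATE_MIN_PLAN.get(template_id)
    if required is None:
        return False  # unknown templates are denied by default (assumption)
    return PLAN_ORDER.index(plan) >= PLAN_ORDER.index(required)
```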
- Credit Pricing: Fixed Markup per Model
You said you want:
credits based on user actions, with fixed markup on every model
This implies:
Each model has an internal “true cost”
You charge credits at a consistent markup multiplier
Premium models may have a higher markup (optional), but you can keep it fixed if you prefer simplicity
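The markup rule reduces to one formula: credits = true cost × markup ÷ value of one credit, rounded up. In this sketch the 1.5× markup and the $0.01 credit value are placeholder numbers, not agreed pricing:

```python
from decimal import Decimal, ROUND_UP

MARKUP = Decimal("1.5")           # hypothetical fixed markup multiplier
USD_PER_CREDIT = Decimal("0.01")  # hypothetical value of one credit


def credits_for(true_cost_usd: Decimal, markup: Decimal = MARKUP) -> int:
    """Convert a model call's internal cost into credits, rounding up to a whole credit."""
    credits = (true_cost_usd * markup) / USD_PER_CREDIT
    return int(credits.to_integral_value(rounding=ROUND_UP))
```

With these placeholder values, a call with a true cost of $0.10 is charged 15 credits. Passing a different `markup` per model covers the optional higher premium-tier markup.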
How it should feel to the user
“This action will cost ~X credits”
“Set a spending cap per day/project”
“Require approval if a task is estimated > Y credits”
This prevents runaway spending and builds trust.
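The cap and approval prompts above amount to a pre-flight check on each estimated action. A minimal sketch, assuming a hypothetical `Budget` record and three outcome labels chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Budget:
    daily_cap: int            # user-set spending cap, in credits
    approval_threshold: int   # "Y" in the approval prompt above
    spent_today: int = 0


def check_action(budget: Budget, estimated_credits: int) -> str:
    """Decide whether an action may run, needs approval, or is blocked by the cap."""
    if budget.spent_today + estimated_credits > budget.daily_cap:
        return "blocked"
    if estimated_credits > budget.approval_threshold:
        return "needs_approval"
    return "allowed"
```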
- Key Risk Controls We Agreed Are Necessary
To make this sellable and safe:
Token and autonomy guardrails
max tokens per step
max retries per task
auto-summarize context aggressively
store structured memory, not chat replay
only send diffs / minimal file slices
caching where possible (especially for repeated prefixes)
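The first two guardrails can be expressed as hard limits around the agent loop. This sketch uses placeholder limits (4000 tokens, 3 retries) and a hypothetical `attempt` callback standing in for one model-driven step:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrails:
    max_tokens_per_step: int = 4000   # placeholder limit
    max_retries_per_task: int = 3     # placeholder limit


def run_task(attempt: Callable[[int], bool], limits: Guardrails) -> int:
    """Call attempt(max_tokens) until it succeeds; return the number of attempts used.

    Allows one initial try plus max_retries_per_task retries, then stops.
    """
    for n in range(1, limits.max_retries_per_task + 2):
        if attempt(limits.max_tokens_per_step):
            return n
    raise RuntimeError("retry budget exhausted; stop and report instead of looping")
```

Raising at the limit, rather than continuing, is what keeps a stuck task from burning credits silently.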
UX controls
show credit burn in real time
warn/approve for high-cost tasks
allow user-set budgets
explain why escalation happened (briefly)
- The End State
VibnAI becomes:
A template-first “product builder OS”
powered by multi-model orchestration
hosted on your infra
with predictable economics via subscription + credits
and a defensible moat via templates + routing intelligence