VibnAI Plan Summary: “Shopify Template Model” + Your Infra + Model Routing + Pricing

Below is the consolidated plan we’ve converged on: VibnAI as a template-first product builder (Shopify-style), with your own hosted infra, and usage-based AI credits powered by Vertex marketplace models with smart routing.

1) Product Strategy: VibnAI Is Shopify for Building Software

Core positioning

VibnAI is not “blank page AI coding.” VibnAI is: build production-ready apps from elite starter templates, then customize them via guided AI workflows.

This reduces:
- token burn
- failure loops
- architectural ambiguity
- debugging chaos

And increases:
- predictability
- success rate
- margins
- retention

Template-first rule

No project starts from an empty repo by default. Users must choose:
- a starter template, or
- “Advanced: Custom Build” (explicitly warned as costlier)

2) Platform Architecture: Your Infra + Event-Driven AI

High-level architecture decisions
- You host the infrastructure layer yourself (Hot + Cold tiers).
- AI compute is purchased via credits.

Hot tier (shared, always running)
- API Gateway (auth, WebSockets, rate limits)
- Orchestrator service (task routing + state machine)
- Job queue + worker pool
- Postgres (conversations, tasks, state)
- Redis (optional: queue/pubsub)
- Gitea (code/content source of truth)
- Coolify (deploys, logs, runtime orchestration)

Key rule: the hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.
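The hot-tier key rule can be sketched as a worker that blocks on a queue and does no work (and so makes no model calls) until an event arrives. This is a minimal sketch. It uses Python's `asyncio.Queue` as a stand-in for the real job queue, which in the plan would be Redis- or Postgres-backed. The `Event` type, its `kind` values, the `"shutdown"` sentinel, and the handler signature are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Event:
    # Hypothetical event shape; kind values such as "user_message" or
    # "build_failed" are illustrative, not a defined protocol.
    kind: str
    payload: dict


# A handler decides what to do with one event (e.g. route it to a model tier).
Handler = Callable[[Event], Awaitable[None]]


async def orchestrator(queue: asyncio.Queue, handle: Handler) -> None:
    """Event-driven loop: awaits the queue while idle, so an always-on
    process costs almost nothing between events."""
    while True:
        event = await queue.get()  # suspends here; no polling, no model calls
        try:
            if event.kind == "shutdown":  # hypothetical sentinel to stop the loop
                return
            await handle(event)
        finally:
            queue.task_done()  # mark the job finished even if the handler raised
```

Because the loop only wakes on `queue.get()`, expensive work happens once per event rather than on a timer.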
Cold tier (per-user, on-demand)
- Agent workspace containers
- Hibernate / wake-on-access
- Persistent storage volumes

“Master Orchestrator” behavior change (critical cost control)

Even though it is “always running,” it should behave like:
- event-driven
- stateless compute
- minimal model calls
- structured memory, not replayed chat history

Structured memory > conversation replay

Instead of resending the entire conversation history, persist and inject:
- project summary
- architecture summary
- repo map summary
- deploy state
- open tasks
- known bugs

This is a major cost reducer.

3) AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)

You’re building your own agents, but the principle still applies: choose models per tool/task.

Tier A / Tier B / Tier C (the blend)

We landed on this operational blend:
- 40% Tier A (cheap)
- 45% Tier B (mid / workhorse coder)
- 15% Tier C (premium escalation)

This is not arbitrary; it aligns with tool/task reality:
- most actions are parsing, routing, search, and summarizing (cheap)
- most code edits and implementations are workhorse coding (mid)
- only a small fraction require deep reasoning or high-stakes decisions (premium)

Tier purpose

Tier A: Cheap “Utility / Router”
Use for:
- routing decisions
- summarizing logs, errors, context
- file discovery + search interpretation
- command suggestion drafts
- task context updates
- chat summaries / naming
- monitoring analysis
This tier should handle the majority of orchestration.

Tier B: Workhorse Coding Model
Use for:
- generating diffs
- writing/refactoring code
- tests
- standard bug fixes
- “agent mode” loops when tasks are scoped
- iterating on features inside templates
This tier should handle most coding.

Tier C: Premium Escalation Model
Use only for:
- architecture decisions
- high-risk changes (deploy, infra, migrations)
- cross-service debugging
- persistent failures (2 failed iterations)
- very large diffs / multi-file refactors
- security-sensitive changes
This tier should be rare by design.
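The structured-memory rule from section 2 can be sketched as a small record that is persisted per project and rendered into a compact context block for each model call, instead of replaying the chat. This is a minimal sketch. The six fields mirror the list in section 2, but the class name, the Markdown-style rendering, the `max_items` cap, and `build_prompt` are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectMemory:
    # Hypothetical persisted record (e.g. one Postgres row per project);
    # the fields follow the "persist and inject" list in section 2.
    project_summary: str
    architecture_summary: str
    repo_map_summary: str
    deploy_state: str
    open_tasks: List[str] = field(default_factory=list)
    known_bugs: List[str] = field(default_factory=list)

    def to_context(self, max_items: int = 10) -> str:
        """Render a compact context block to send instead of chat history."""
        def bullets(items: List[str]) -> str:
            # Cap list length so the injected context stays small.
            return "\n".join(f"- {item}" for item in items[:max_items]) or "- (none)"

        return "\n".join([
            "## Project", self.project_summary,
            "## Architecture", self.architecture_summary,
            "## Repo map", self.repo_map_summary,
            "## Deploy state", self.deploy_state,
            "## Open tasks", bullets(self.open_tasks),
            "## Known bugs", bullets(self.known_bugs),
        ])


def build_prompt(memory: ProjectMemory, user_message: str) -> str:
    # Only structured memory plus the latest request is sent; earlier turns
    # are assumed to have already been folded into the summaries.
    return memory.to_context() + "\n\n## Request\n" + user_message
```

The prompt size then depends on the summaries, not on how long the conversation has run, which is where the cost reduction comes from.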
4) Vertex Models: What to Use in Each Tier

You wanted to stay on Google infra and Vertex marketplace/API models.

Recommended mapping (Vertex-first)

Tier A (cheap)
- Gemini Flash-class model (fast, low cost)
- Use for orchestration, summaries, extraction, routing, and log parsing.

Tier B (mid / coding workhorse)
Pick one:
- GLM-5 MaaS (Vertex): strong reasoning, cost-effective
- Qwen coder MaaS (Vertex): strong coding, predictable cost
This model does the heavy lifting for code edits and feature building.

Tier C (premium escalation)
Pick one:
- Claude Sonnet 4.6 on Vertex (reliability + long-chain coding), or
- Gemini 3.1 Pro Preview (if it proves better for your workflows)
This is your “expert brain,” used sparingly.

5) Routing Policy: How the System Chooses Models

You’re not letting users pick models manually. The orchestrator routes based on task complexity and risk.

Default rules
- All “read/search/list/summarize” → Tier A
- Most code edits/refactors/tests → Tier B
- High-risk work or repeated failure → Tier C

Escalation triggers (simple + effective)
Escalate Tier B → Tier C when any of these happen:
- 2 failed iterations (tests still failing, same error persists)
- touching more than 5 files
- diff size exceeds ~400 LOC changed
- deployment / infra / secrets / migration steps are involved
- context pressure (approaching model limits)

De-escalation rule
Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.
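The default rules, escalation triggers, and de-escalation rule in section 5 translate fairly directly into a routing function. This is a minimal sketch. The 2-iteration, 5-file, and ~400 LOC thresholds and the deploy/infra/secrets/migration areas come from the triggers above. The `TaskSignals` fields, the task `kind` strings, and the 0.8 context-usage threshold standing in for “context pressure” are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    A = "cheap utility/router"
    B = "workhorse coder"
    C = "premium escalation"


READ_ONLY_KINDS = {"read", "search", "list", "summarize"}
HIGH_RISK_AREAS = {"deploy", "infra", "secrets", "migration"}
CONTEXT_PRESSURE = 0.8  # assumption: fraction of the context window treated as "near the limit"


@dataclass
class TaskSignals:
    # Hypothetical signals the orchestrator tracks per task.
    kind: str                          # e.g. "read", "edit", "refactor", "test"
    failed_iterations: int = 0
    files_touched: int = 0
    diff_loc: int = 0
    risk_areas: frozenset = frozenset()
    context_fraction: float = 0.0      # share of the model's context already used
    hard_part_resolved: bool = False   # cause found / plan decided


def route(task: TaskSignals) -> Tier:
    # Default rule: read/search/list/summarize goes to the cheap tier.
    if task.kind in READ_ONLY_KINDS:
        return Tier.A
    # De-escalation rule: once the hard part is solved, implement on Tier B.
    if task.hard_part_resolved:
        return Tier.B
    # Escalation triggers: any single one moves the task to Tier C.
    escalate = (
        task.failed_iterations >= 2
        or task.files_touched > 5
        or task.diff_loc > 400
        or bool(task.risk_areas & HIGH_RISK_AREAS)
        or task.context_fraction >= CONTEXT_PRESSURE
    )
    return Tier.C if escalate else Tier.B
```

Keeping the thresholds as plain constants makes them easy to tune later if the observed mix drifts away from the intended 40/45/15 blend.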
6) Business Model: Subscription + Credits (Not “Unlimited AI”)

You clarified the intended split:

Subscription covers your fixed costs
Subscription pays for:
- your hosted infrastructure (hot tier + shared services)
- agent workspace orchestration (cold tier)
- your people costs (support, ops, ongoing development)
- product value (templates, UX, dashboards, workflows)
- baseline included usage / small AI overhead

Credits cover variable compute
Credits pay for:
- model calls (Tier A/B/C)
- heavy tasks (builds, refactors, debugging loops)
- long-chain tasks
- autonomous agent execution

This protects you from heavy users and keeps margins predictable.

7) Template Access as a Tiered Product (Shopify-style)

Templates are the moat

Templates reduce:
- architecture planning cost
- retry loops
- token burn
- complexity and failure rates

Templates also create:
- differentiation
- a marketplace opportunity later
- compounding margins

Tiering via template access

Instead of just offering “more AI,” higher tiers unlock better starter systems.
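Tiering via template access amounts to an ordered set of plans plus a catalog that records the lowest plan unlocking each template. This is a minimal sketch. The plan names follow the example tiers in section 7. The template ids and the catalog itself are hypothetical, and Enterprise custom templates are left out because they would be granted per customer.

```python
from enum import IntEnum


class PlanTier(IntEnum):
    # Ordered so a higher plan includes everything a lower plan unlocks.
    STARTER = 1
    BUILDER = 2
    PRO = 3
    ENTERPRISE = 4


# Hypothetical catalog: template id -> lowest plan that unlocks it.
TEMPLATE_MIN_TIER = {
    "landing-page": PlanTier.STARTER,
    "saas-crud": PlanTier.STARTER,
    "multi-tenant-saas": PlanTier.BUILDER,
    "analytics-dashboard": PlanTier.BUILDER,
    "monitoring-alerting": PlanTier.PRO,
    "ml-pipeline": PlanTier.PRO,
}


def can_use_template(plan: PlanTier, template_id: str) -> bool:
    """True if the plan unlocks the template; unknown ids are denied."""
    required = TEMPLATE_MIN_TIER.get(template_id)
    return required is not None and plan >= required


def available_templates(plan: PlanTier) -> list[str]:
    # Everything at or below the user's plan, sorted for stable display.
    return sorted(t for t, tier in TEMPLATE_MIN_TIER.items() if plan >= tier)
```

Using an `IntEnum` keeps the "higher tier includes lower tiers" rule as a single comparison instead of per-plan allow lists.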
Example approach:

Starter tier
- landing page template
- simple SaaS CRUD template
- basic auth + Stripe
- limited integrations

Builder tier
- multi-tenant SaaS template
- marketplace template
- analytics dashboard template
- stronger RBAC patterns
- more integrations

Pro tier
- “OpsOS / analytics warehouse” template
- monitoring + alerting template
- ML-ready pipeline template
- advanced data model scaffolds

Enterprise
- custom templates
- compliance add-ons
- private deployments
- dedicated support / SLAs

8) Credit Pricing: Fixed Markup per Model

You said you want credits based on user actions, with a fixed markup on every model. This implies:
- each model has an internal “true cost”
- you charge credits at a consistent markup multiplier
- premium models could optionally carry a higher markup, but a single fixed multiplier is simpler

How it should feel to the user
- “This action will cost ~X credits”
- “Set a spending cap per day/project”
- “Require approval if a task is estimated at more than Y credits”

This prevents runaway spending and builds trust.

9) Key Risk Controls We Agreed Are Necessary

To make this sellable and safe:

Token and autonomy guardrails
- max tokens per step
- max retries per task
- auto-summarize context aggressively
- store structured memory, not chat replay
- only send diffs / minimal file slices
- caching where possible (especially for repeated prefixes)

UX controls
- show credit burn in real time
- warn/approve for high-cost tasks
- allow user-set budgets
- explain why escalation happened (briefly)

10) The End State

VibnAI becomes:
- a template-first “product builder OS”
- powered by multi-model orchestration
- hosted on your infra
- with predictable economics via subscription + credits
- and a defensible moat via templates + routing intelligence
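The fixed-markup pricing in section 8, together with its spending cap and approval threshold, can be sketched as two functions: one converts a model's true cost into credits, the other checks an estimate against the user's budget. This is a minimal sketch. `USD_PER_CREDIT`, `MARKUP`, and the per-million-token prices are placeholder assumptions, not real Vertex prices. `Decimal` is used so the rounding up to whole credits is exact.

```python
from dataclasses import dataclass
from decimal import ROUND_CEILING, Decimal

USD_PER_CREDIT = Decimal("0.01")  # hypothetical: 1 credit = $0.01
MARKUP = Decimal("2.0")           # hypothetical fixed multiplier applied to every model


@dataclass(frozen=True)
class ModelPrice:
    # Placeholder internal "true cost" in USD per million tokens;
    # fill these in from your own billing data.
    input_usd_per_mtok: Decimal
    output_usd_per_mtok: Decimal


def estimate_credits(price: ModelPrice, input_tokens: int, output_tokens: int) -> int:
    """True cost x fixed markup, converted to credits and rounded up."""
    true_cost = (
        input_tokens * price.input_usd_per_mtok
        + output_tokens * price.output_usd_per_mtok
    ) / Decimal(1_000_000)
    credits = (true_cost * MARKUP / USD_PER_CREDIT).to_integral_value(rounding=ROUND_CEILING)
    return int(credits)


def check_budget(estimate: int, spent_today: int, daily_cap: int, approval_threshold: int) -> str:
    """Return "blocked" (would exceed the daily cap), "needs_approval"
    (estimate above the user's threshold), or "ok"."""
    if spent_today + estimate > daily_cap:
        return "blocked"
    if estimate > approval_threshold:
        return "needs_approval"
    return "ok"
```

For example, with placeholder prices of $1.00 in and $4.00 out per million tokens, a call with 100,000 input and 20,000 output tokens costs $0.18, which becomes 36 credits at a 2x markup. A per-model markup (the optional premium surcharge) could be added as a field on `ModelPrice` without changing the rest.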