VibnAI Plan Summary: “Shopify Template Model” + Your Infra + Model Routing + Pricing

Below is the consolidated plan we’ve converged on: VibnAI as a template-first product builder (Shopify-style), with your own hosted infra, and usage-based AI credits powered by Vertex marketplace models with smart routing.

1) Product Strategy: VibnAI Is Shopify for Building Software

Core positioning

VibnAI is not “blank page AI coding.”

VibnAI is:

Build production-ready apps from elite starter templates, then customize via guided AI workflows.

This reduces:

- token burn
- failure loops
- architectural ambiguity
- debugging chaos

And increases:

- predictability
- success rate
- margins
- retention

Template-first rule

No project starts from an empty repo by default.

Users must choose:

- a starter template, or
- “Advanced: Custom Build” (explicitly warned as costlier)

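One way the template-first rule could be enforced at project creation is sketched below. The `ProjectRequest` fields and the `resolve_start_mode` helper are hypothetical names chosen for illustration, not part of the plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectRequest:
    name: str
    template_id: Optional[str] = None      # hypothetical template identifier
    custom_build_confirmed: bool = False   # user accepted the higher-cost warning


def resolve_start_mode(req: ProjectRequest) -> str:
    """Return 'template' or 'custom'; never start from an empty repo by default."""
    if req.template_id:
        return "template"
    if req.custom_build_confirmed:
        return "custom"
    raise ValueError(
        "Choose a starter template, or confirm 'Advanced: Custom Build' "
        "(expect higher credit usage)."
    )
```

A request with neither a template nor an explicit custom-build confirmation is rejected, which keeps the empty-repo path opt-in.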
2) Platform Architecture: Your Infra + Event-Driven AI

High-level architecture decisions

You host the infrastructure layer yourself (Hot + Cold tiers). AI compute is purchased via credits.

Hot tier (shared, always running)

- API Gateway (auth, WebSockets, rate limits)
- Orchestrator service (task routing + state machine)
- Job queue + worker pool
- Postgres (conversations, tasks, state)
- Redis (optional: queue/pubsub)
- Gitea (code/content source-of-truth)
- Coolify (deploys, logs, runtime orchestration)

Key rule: The hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.

Cold tier (per-user, on-demand)

- Agent workspace containers
- Hibernate / wake-on-access
- Persistent storage volumes

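The hibernate / wake-on-access behavior can be pictured as a two-state lifecycle. This is a minimal sketch under assumptions: the `Workspace` fields, the 15-minute idle window, and the function names are illustrative, and a real system would start and stop containers through its runtime layer where the comments indicate.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    user_id: str
    volume_id: str               # persistent storage survives hibernation
    state: str = "hibernated"    # "hibernated" or "running"
    last_access: float = 0.0     # seconds, taken from any consistent clock


IDLE_TIMEOUT_S = 15 * 60  # assumed idle window; the plan does not specify one


def touch(ws: Workspace, now: float) -> Workspace:
    """Wake-on-access: any request wakes the workspace if it is asleep."""
    if ws.state == "hibernated":
        ws.state = "running"     # a real system would start the container here
    ws.last_access = now
    return ws


def sweep(ws: Workspace, now: float) -> Workspace:
    """Hibernate a running workspace that has been idle for too long."""
    if ws.state == "running" and now - ws.last_access >= IDLE_TIMEOUT_S:
        ws.state = "hibernated"  # compute is released; the volume is kept
    return ws
```

A periodic job would call `sweep` over all workspaces, while the gateway calls `touch` on each user request.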
“Master Orchestrator” behavior change (critical cost control)

Even if it’s “always running,” it should be:

- event-driven
- stateless in its compute
- sparing with model calls
- backed by structured memory rather than replayed chat history

Structured memory > conversation replay

Instead of resending the entire conversation history, persist and inject:

- project summary
- architecture summary
- repo map summary
- deploy state
- open tasks
- known bugs

This is a major cost reducer, because prompt size stops growing with the length of the conversation.

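The six persisted fields above map naturally onto a small record that is rendered into the prompt on each call. The sketch below assumes a plain-text rendering; the class name `ProjectMemory` and the `to_prompt` method are illustrative, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectMemory:
    project_summary: str = ""
    architecture_summary: str = ""
    repo_map_summary: str = ""
    deploy_state: str = ""
    open_tasks: List[str] = field(default_factory=list)
    known_bugs: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render a compact context block to inject instead of the chat history."""
        sections = [
            ("Project", self.project_summary),
            ("Architecture", self.architecture_summary),
            ("Repo map", self.repo_map_summary),
            ("Deploy state", self.deploy_state),
            ("Open tasks", "; ".join(self.open_tasks)),
            ("Known bugs", "; ".join(self.known_bugs)),
        ]
        # Skip empty fields so the injected context stays small.
        return "\n".join(f"{title}: {body}" for title, body in sections if body)
```

In practice the record would live in Postgres alongside the task state, and a cheap model would update the summaries after each completed task.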
3) AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)

You’re building your own agents, but the principle applies: choose models per tool/task.

Tier A / Tier B / Tier C (the blend)

We landed on this operational blend:

- 40% Tier A (cheap)
- 45% Tier B (mid / workhorse coder)
- 15% Tier C (premium escalation)

This is not arbitrary; it aligns with tool/task reality:

- most actions are parsing, routing, search, summarizing (cheap)
- most code edits and implementations are workhorse coding (mid)
- only a small fraction require deep reasoning / high-stakes decisions (premium)

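To see what the 40/45/15 blend means for cost, the expected cost per call can be computed as a weighted sum. The relative costs below are made-up ratios for illustration only (Tier A = 1 unit), not real prices.

```python
BLEND = {"A": 0.40, "B": 0.45, "C": 0.15}  # share of calls per tier, from the plan

# Hypothetical relative cost per call (Tier A = 1 unit). Not real prices.
RELATIVE_COST = {"A": 1.0, "B": 5.0, "C": 25.0}


def blended_cost(blend: dict, cost: dict) -> float:
    """Expected cost per call under a given tier mix."""
    if abs(sum(blend.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * cost[tier] for tier, share in blend.items())


def spend_share(blend: dict, cost: dict, tier: str) -> float:
    """Fraction of total spend attributable to one tier."""
    return blend[tier] * cost[tier] / blended_cost(blend, cost)
```

Under these assumed ratios the blend costs 6.4 units per call against 25 for an all-premium setup, and Tier C still accounts for more than half of the spend despite being 15% of calls. That is why the escalation rules matter more to margins than the choice of cheap model.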
Tier purpose

Tier A: Cheap “Utility / Router”

Use for:

- routing decisions
- summarizing logs, errors, context
- file discovery + search interpretation
- command suggestion drafts
- task context updates
- chat summaries / naming
- monitoring analysis

This tier should handle the majority of orchestration.

Tier B: Workhorse Coding Model

Use for:

- generating diffs
- writing/refactoring code
- tests
- standard bug fixes
- “agent mode” loops when tasks are scoped
- iterating on features inside templates

This tier should handle most coding.

Tier C: Premium Escalation Model

Use only for:

- architecture decisions
- high-risk changes (deploy, infra, migrations)
- cross-service debugging
- persistent failures (2 failed iterations)
- very large diffs / multi-file refactors
- security-sensitive changes

This tier should be rare by design.

4) Vertex Models: What to Use in Each Tier

You wanted to stay on Google infra and use Vertex marketplace/API models.

Recommended mapping (Vertex-first)

Tier A (cheap)

- Gemini Flash-class model (fast, low cost)
  Use for orchestration, summaries, extraction, routing, log parsing.

Tier B (mid / coding workhorse)

Pick one:

- GLM-5 MaaS (Vertex): strong reasoning + cost-effective
- Qwen coder MaaS (Vertex): strong coding, predictable cost

This model does the heavy lifting for code edits and feature building.

Tier C (premium escalation)

Pick one:

- Claude Sonnet 4.6 on Vertex (reliability + long-chain coding)
- or Gemini 3.1 Pro Preview (if it proves better for your workflows)

This is your “expert brain,” used sparingly.

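Keeping the tier-to-model mapping in one place makes it easy to swap a model without touching routing logic. The strings below are placeholder labels, not real Vertex model IDs; the actual identifiers would come from your Vertex AI project once the Tier B and Tier C picks are final.

```python
# Placeholder labels only; replace with the real model IDs from your Vertex project.
TIER_MODELS = {
    "A": "gemini-flash-class",
    "B": "glm-5-maas",          # alternative: a Qwen coder MaaS model
    "C": "claude-sonnet-4.6",   # alternative: Gemini 3.1 Pro Preview
}


def model_for(tier: str, overrides: dict = None) -> str:
    """Resolve the model label for a tier, allowing per-environment overrides."""
    merged = {**TIER_MODELS, **(overrides or {})}
    if tier not in merged:
        raise KeyError(f"unknown tier: {tier!r}")
    return merged[tier]
```

The `overrides` argument is an assumption that lets a staging environment trial the alternative Tier C model without changing the default.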
5) Routing Policy: How the System Chooses Models

You’re not letting users pick models manually. The orchestrator routes based on task complexity and risk.

Default rules

- All “read/search/list/summarize” → Tier A
- Most code edits/refactors/tests → Tier B
- High-risk or repeated failure → Tier C

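The default rules can be expressed as a small pure function. This is a sketch: the action names are illustrative labels for whatever tool or task types the orchestrator defines, and sending unknown actions to Tier B is an assumption rather than something the plan specifies.

```python
TIER_A_ACTIONS = {"read", "search", "list", "summarize"}
TIER_B_ACTIONS = {"edit", "refactor", "test"}


def default_tier(action: str, high_risk: bool = False, failed_iterations: int = 0) -> str:
    """Pick a tier from the default rules; risk and repeated failure win over action type."""
    if high_risk or failed_iterations >= 2:
        return "C"
    if action in TIER_A_ACTIONS:
        return "A"
    if action in TIER_B_ACTIONS:
        return "B"
    return "B"  # assumption: unrecognized actions go to the workhorse tier
```

Checking risk first means that even a read-only step inside a high-risk task is handled by the premium tier, which errs on the side of caution at some extra cost.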
Escalation triggers (simple + effective)

Escalate Tier B → Tier C when any of these occurs:

- 2 failed iterations (tests still failing, same error persists)
- The change touches more than 5 files
- Diff size exceeds ~400 LOC changed
- Deployment / infra / secrets / migration steps are involved
- Context pressure (approaching model limits)

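The triggers above translate directly into a check over per-task counters. In this sketch the `TaskStats` fields are hypothetical names, and the 0.8 context-pressure ratio is an assumed threshold, since the plan only says "approaching model limits".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskStats:
    failed_iterations: int = 0
    files_touched: int = 0
    loc_changed: int = 0
    touches_high_risk_area: bool = False   # deploy, infra, secrets, migrations
    context_used_ratio: float = 0.0        # tokens used / model context limit


CONTEXT_PRESSURE_RATIO = 0.8  # assumed threshold, not taken from the plan


def escalation_reasons(stats: TaskStats) -> List[str]:
    """Return the triggers that fired; an empty list means stay on Tier B."""
    reasons = []
    if stats.failed_iterations >= 2:
        reasons.append("2 failed iterations")
    if stats.files_touched > 5:
        reasons.append("touches more than 5 files")
    if stats.loc_changed > 400:
        reasons.append("diff exceeds ~400 changed lines")
    if stats.touches_high_risk_area:
        reasons.append("deployment, infra, secrets, or migration involved")
    if stats.context_used_ratio >= CONTEXT_PRESSURE_RATIO:
        reasons.append("context pressure")
    return reasons
```

Returning the reasons rather than a bare boolean gives the UI something concrete to show when it explains why a task moved to the premium tier.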
De-escalation rule

Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.

6) Business Model: Subscription + Credits (Not “Unlimited AI”)

You clarified the intended split:

Subscription covers your fixed costs

Subscription pays for:

- your hosted infrastructure (hot tier + shared services)
- agent workspace orchestration (cold tier)
- your people costs (support, ops, ongoing development)
- product value (templates, UX, dashboards, workflows)
- baseline included usage / small AI overhead

Credits cover variable compute

Credits pay for:

- model calls (Tier A/B/C)
- heavy tasks (builds, refactors, debugging loops)
- long chain tasks
- autonomous agent execution

This protects you from heavy users and keeps margins predictable.

7) Template Access as a Tiered Product (Shopify-style)

Templates are the moat

Templates reduce:

- architecture planning cost
- retry loops
- token burn
- complexity and failure rates

Templates also create:

- differentiation
- a marketplace opportunity later
- compounding margins

Tiering via template access

Instead of just “more AI,” higher tiers unlock better starter systems.

Example approach:

Starter tier

- landing page template
- simple SaaS CRUD template
- basic auth + Stripe
- limited integrations

Builder tier

- multi-tenant SaaS template
- marketplace template
- analytics dashboard template
- stronger RBAC patterns
- more integrations

Pro tier

- “OpsOS / analytics warehouse” template
- monitoring + alerting template
- ML-ready pipeline template
- advanced data model scaffolds

Enterprise

- custom templates
- compliance add-ons
- private deployments
- dedicated support / SLAs

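Template gating by plan can be modeled as a simple lookup. The sketch assumes access is cumulative (each plan includes the templates of the plans below it), which the plan implies but does not state; the template slugs are illustrative.

```python
PLAN_ORDER = ["starter", "builder", "pro", "enterprise"]

# Illustrative slugs for the templates listed in the example approach.
TEMPLATES_BY_PLAN = {
    "starter": ["landing-page", "saas-crud"],
    "builder": ["multi-tenant-saas", "marketplace", "analytics-dashboard"],
    "pro": ["opsos-analytics-warehouse", "monitoring-alerting", "ml-pipeline"],
    "enterprise": [],  # custom templates are provisioned per customer
}


def available_templates(plan: str) -> list:
    """List templates for a plan, including those of all lower plans (assumed)."""
    idx = PLAN_ORDER.index(plan)  # raises ValueError for unknown plans
    templates = []
    for lower_or_equal in PLAN_ORDER[: idx + 1]:
        templates.extend(TEMPLATES_BY_PLAN[lower_or_equal])
    return templates


def can_use(plan: str, template: str) -> bool:
    return template in available_templates(plan)
```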
8) Credit Pricing: Fixed Markup per Model

You said you want:

credits based on user actions, with fixed markup on every model

This implies:

- Each model has an internal “true cost”
- You charge credits at a consistent markup multiplier
- Premium models may have a higher markup (optional), but you can keep it fixed if you prefer simplicity

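A worked version of "true cost times fixed markup" is sketched below. Every number is an assumption for illustration: the per-token costs are invented, the credit is pegged at one cent, and the multiplier is 1.5. Only the structure (one multiplier applied to every model's true cost) comes from the plan.

```python
import math

CREDIT_VALUE_USD = 0.01   # assumed: 1 credit = 1 cent
MARKUP = 1.5              # assumed fixed multiplier applied to every model

# Hypothetical true costs in USD per 1M tokens (input, output). Not real prices.
TRUE_COST_PER_MTOK = {"A": (0.10, 0.40), "B": (0.50, 2.00), "C": (3.00, 15.00)}


def credits_for_call(tier: str, input_tokens: int, output_tokens: int) -> int:
    """Convert one model call into credits, rounding up to a whole credit."""
    cost_in, cost_out = TRUE_COST_PER_MTOK[tier]
    true_cost_usd = (input_tokens * cost_in + output_tokens * cost_out) / 1_000_000
    return math.ceil(true_cost_usd * MARKUP / CREDIT_VALUE_USD)
```

With these assumed prices, a call with 20,000 input and 2,000 output tokens costs 1 credit on Tier A and 14 credits on Tier C. Rounding up per call favors the platform on very small calls, so billing per task rather than per call may feel fairer to users.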
How it should feel to the user

- “This action will cost ~X credits”
- “Set a spending cap per day/project”
- “Require approval if a task is estimated > Y credits”

This prevents runaway spending and builds trust.

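The three user-facing controls above (estimate, cap, approval) reduce to one pre-flight check before a task runs. A minimal sketch, assuming a per-day cap; the `Budget` fields and return labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    daily_cap: int            # credits the user allows per day
    approval_threshold: int   # estimates above this need explicit approval
    spent_today: int = 0


def check_task(budget: Budget, estimated_credits: int) -> str:
    """Return 'run', 'needs_approval', or 'blocked' for an estimated task cost."""
    if budget.spent_today + estimated_credits > budget.daily_cap:
        return "blocked"
    if estimated_credits > budget.approval_threshold:
        return "needs_approval"
    return "run"
```

The cap is checked before the approval threshold so that a user is never asked to approve a task the budget would reject anyway.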
9) Key Risk Controls We Agreed Are Necessary

To make this sellable and safe:

Token and autonomy guardrails

- max tokens per step
- max retries per task
- auto-summarize context aggressively
- store structured memory, not chat replay
- only send diffs / minimal file slices
- caching where possible (especially for repeated prefixes)

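The first two guardrails (max tokens per step, max retries per task) can be enforced by the loop that drives an agent step. In this sketch `step` is a hypothetical callable standing in for one model-backed step, and the default limits are assumed values, since the plan names the limits but not their sizes.

```python
from typing import Callable, Dict, Tuple

MAX_RETRIES = 2              # assumed default
MAX_TOKENS_PER_STEP = 8_000  # assumed default


def run_with_guardrails(
    step: Callable[[int], Tuple[bool, int]],
    max_retries: int = MAX_RETRIES,
    max_tokens_per_step: int = MAX_TOKENS_PER_STEP,
) -> Dict[str, object]:
    """Run `step` until it succeeds or the retry budget is spent.

    `step(token_limit)` performs one attempt within the token limit and
    returns (succeeded, tokens_used).
    """
    total_tokens = 0
    attempts = 0
    for _ in range(1 + max_retries):   # the first attempt plus the allowed retries
        attempts += 1
        ok, used = step(max_tokens_per_step)
        total_tokens += used
        if ok:
            return {"ok": True, "attempts": attempts, "tokens": total_tokens}
    return {"ok": False, "attempts": attempts, "tokens": total_tokens}
```

A failed result is the natural point to hand the task to the escalation check or back to the user, rather than looping further.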
UX controls

- show credit burn in real time
- warn/approve for high-cost tasks
- allow user-set budgets
- explain why escalation happened (briefly)

10) The End State

VibnAI becomes:

- A template-first “product builder OS”
- powered by multi-model orchestration
- hosted on your infra
- with predictable economics via subscription + credits
- and a defensible moat via templates + routing intelligence