VibnAI Plan Summary — “Shopify Template Model” + Your Infra + Model Routing + Pricing
Below is the consolidated plan we've converged on: VibnAI as a template-first product builder (Shopify-style), running on your own hosted infra, with usage-based AI credits powered by Vertex marketplace models and smart routing.
1) Product Strategy: VibnAI Is Shopify for Building Software
Core positioning
VibnAI is not “blank page AI coding.”
VibnAI is:
Build production-ready apps from elite starter templates
then customize via guided AI workflows.
This reduces:
token burn
failure loops
architectural ambiguity
debugging chaos
And increases:
predictability
success rate
margins
retention
Template-first rule
No project starts from an empty repo by default.
Users must choose:
a starter template, or
“Advanced: Custom Build” (explicitly warned as costlier)
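The template-first rule can be enforced at project creation. A minimal sketch, assuming a hypothetical `ProjectRequest` shape and a made-up template ID; none of these names come from the existing codebase:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRequest:
    name: str
    template_id: Optional[str] = None  # hypothetical ID, e.g. "saas-crud"
    custom_build: bool = False         # the "Advanced: Custom Build" opt-in


def validate_project_request(req: ProjectRequest) -> str:
    """Return the start mode, or raise if the project would start from an empty repo."""
    if req.template_id and req.custom_build:
        raise ValueError("Choose a template or Custom Build, not both.")
    if req.template_id:
        return "template"
    if req.custom_build:
        # Allowed, but only as an explicit opt-in that the UI warns is costlier.
        return "custom"
    raise ValueError("Pick a starter template or opt in to Advanced: Custom Build.")
```

Rejecting the request outright, rather than silently defaulting to a template, keeps the choice visible to the user.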
2) Platform Architecture: Your Infra + Event-Driven AI
High-level architecture decisions
You host the infrastructure layer yourself (Hot + Cold tiers). AI compute is purchased via credits.
Hot tier (shared, always running)
API Gateway (auth, WebSockets, rate limits)
Orchestrator service (task routing + state machine)
Job queue + worker pool
Postgres (conversations, tasks, state)
Redis (optional: queue/pubsub)
Gitea (code/content source-of-truth)
Coolify (deploys, logs, runtime orchestration)
Key rule: The hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.
Cold tier (per-user, on-demand)
Agent workspace containers
Hibernate / wake-on-access
Persistent storage volumes
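Hibernate and wake-on-access can be modeled as a small state flip driven by last-access time. This is a sketch only: the 15-minute idle window and the `Workspace` fields are assumptions, and a real version would start and stop containers where the comments indicate.

```python
from dataclasses import dataclass
from typing import List

IDLE_TIMEOUT_S = 15 * 60  # assumed idle window before hibernating


@dataclass
class Workspace:
    user_id: str
    state: str = "hibernated"  # "running" or "hibernated"
    last_access: float = 0.0   # seconds, from any monotonic clock


def touch(ws: Workspace, now: float) -> None:
    """Wake-on-access: any request for the workspace wakes it and resets the idle timer."""
    if ws.state == "hibernated":
        ws.state = "running"  # a real system would start the container here
    ws.last_access = now


def sweep(workspaces: List[Workspace], now: float) -> List[str]:
    """Hibernate idle workspaces and return the affected user IDs."""
    hibernated = []
    for ws in workspaces:
        if ws.state == "running" and now - ws.last_access >= IDLE_TIMEOUT_S:
            ws.state = "hibernated"  # the persistent volume stays attached to the user
            hibernated.append(ws.user_id)
    return hibernated
```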
“Master Orchestrator” behavior change (critical cost control)
Even if it's “always running,” it should behave like:
event-driven
stateless compute
minimal model calls
structured memory, not replaying chat history
Structured memory > conversation replay
Instead of resending entire conversation history, persist and inject:
project summary
architecture summary
repo map summary
deploy state
open tasks
known bugs
This is a major cost reducer: each call carries a bounded summary rather than a transcript that grows with every turn.
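One way to picture structured memory is as a fixed record rendered into the prompt on every call. The field names below mirror the list above; the `ProjectMemory` class and the section layout are illustrative assumptions, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectMemory:
    project_summary: str
    architecture_summary: str
    repo_map_summary: str
    deploy_state: str
    open_tasks: List[str] = field(default_factory=list)
    known_bugs: List[str] = field(default_factory=list)


def build_context(memory: ProjectMemory, user_message: str) -> str:
    """Render persisted summaries plus the current request; no chat history is replayed."""
    def bullets(items: List[str]) -> str:
        return "\n".join(f"- {item}" for item in items) or "- none"

    sections = [
        ("Project", memory.project_summary),
        ("Architecture", memory.architecture_summary),
        ("Repo map", memory.repo_map_summary),
        ("Deploy state", memory.deploy_state),
        ("Open tasks", bullets(memory.open_tasks)),
        ("Known bugs", bullets(memory.known_bugs)),
    ]
    body = "\n\n".join(f"## {title}\n{content}" for title, content in sections)
    return f"{body}\n\n## Current request\n{user_message}"
```

The orchestrator would update these fields after each task, so prompt size tracks the size of the summaries rather than the length of the conversation.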
3) AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)
You're building your own agents, but the principle applies: choose models per tool/task.
Tier A / Tier B / Tier C (the blend)
We landed on this operational blend:
40% Tier A (cheap)
45% Tier B (mid / workhorse coder)
15% Tier C (premium escalation)
This is not arbitrary—it aligns with tool/task reality:
most actions are parsing, routing, search, summarizing (cheap)
most code edits and implementations are workhorse coding (mid)
only a small fraction require deep reasoning / high-stakes decisions (premium)
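To see what the blend does to average cost, here is a small worked calculation. The unit costs are invented for illustration (Tier B at 5x Tier A, Tier C at 25x) and are not real Vertex prices; the sketch also assumes the percentages are shares of calls and that calls are of similar size.

```python
TIER_SHARE = {"A": 0.40, "B": 0.45, "C": 0.15}

# ASSUMPTION: relative cost per call, in arbitrary units. Not real pricing.
ASSUMED_UNIT_COST = {"A": 1.0, "B": 5.0, "C": 25.0}


def blended_cost(share: dict, unit_cost: dict) -> float:
    """Expected cost per call: the sum of share * unit cost across tiers."""
    if abs(sum(share.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share[tier] * unit_cost[tier] for tier in share)


blend = blended_cost(TIER_SHARE, ASSUMED_UNIT_COST)
all_premium = ASSUMED_UNIT_COST["C"]
savings_ratio = 1 - blend / all_premium  # fraction saved versus routing everything to Tier C
```

Under these made-up numbers the blend costs 0.40 x 1 + 0.45 x 5 + 0.15 x 25 = 6.4 units per call, against 25 units if every call went to Tier C. Note that Tier C still accounts for more than half of the blended cost, which is why the escalation rate matters most.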
Tier purpose
Tier A — Cheap “Utility / Router”
Use for:
routing decisions
summarizing logs, errors, context
file discovery + search interpretation
command suggestion drafts
task context updates
chat summaries / naming
monitoring analysis
This tier should handle the majority of orchestration.
Tier B — Workhorse Coding Model
Use for:
generating diffs
writing/refactoring code
tests
standard bug fixes
“agent mode” loops when tasks are scoped
iterating on features inside templates
This tier should handle most coding.
Tier C — Premium Escalation Model
Use only for:
architecture decisions
high-risk changes (deploy, infra, migrations)
cross-service debugging
persistent failures (2 failed iterations)
very large diffs / multi-file refactors
security-sensitive changes
This tier should be rare by design.
4) Vertex Models: What to Use in Each Tier
You wanted to stay on Google infra and Vertex marketplace/API models.
Recommended mapping (Vertex-first)
Tier A (cheap)
Gemini Flash-class model (fast, low cost)
Use for orchestration, summaries, extraction, routing, log parsing.
Tier B (mid / coding workhorse)
Pick one:
GLM-5 MaaS (Vertex) — strong reasoning + cost-effective
Qwen coder MaaS (Vertex) — strong coding, predictable cost
This model does the heavy lifting for code edits and feature building.
Tier C (premium escalation)
Pick one:
Claude Sonnet 4.6 on Vertex (reliability + long-chain coding)
or Gemini 3.1 Pro Preview (if it proves better for your workflows)
This is your “expert brain” used sparingly.
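The mapping can live in one config table so that swapping a model is a one-line change. The strings below are the labels used in this plan, not Vertex API model IDs, and the `selected` entries are placeholder picks rather than recommendations.

```python
# Labels copied from the plan above. They are NOT Vertex API model identifiers.
TIER_MODELS = {
    "A": {"candidates": ["Gemini Flash-class"], "selected": "Gemini Flash-class"},
    "B": {"candidates": ["GLM-5 MaaS", "Qwen coder MaaS"], "selected": "GLM-5 MaaS"},
    "C": {
        "candidates": ["Claude Sonnet 4.6", "Gemini 3.1 Pro Preview"],
        "selected": "Claude Sonnet 4.6",
    },
}


def model_for(tier: str) -> str:
    """Return the configured model label for a tier, checking the config is consistent."""
    entry = TIER_MODELS[tier]
    if entry["selected"] not in entry["candidates"]:
        raise ValueError(f"Tier {tier}: selected model is not among its candidates")
    return entry["selected"]
```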
5) Routing Policy: How the System Chooses Models
You're not letting users pick models manually. The orchestrator routes based on task complexity and risk.
Default rules
All “read/search/list/summarize” → Tier A
Most code edits/refactors/tests → Tier B
High-risk or repeated failure → Tier C
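The default rules reduce to a short lookup. In this sketch the action names, the rule ordering (risk checks first), and the fallback to Tier B for unknown actions are assumptions; the threshold of 2 failed iterations is taken from the escalation criteria in this plan.

```python
READ_ONLY_ACTIONS = {"read", "search", "list", "summarize"}
CODE_ACTIONS = {"edit", "refactor", "test"}
REPEATED_FAILURE_THRESHOLD = 2  # "2 failed iterations"


def default_tier(action: str, high_risk: bool = False, failed_iterations: int = 0) -> str:
    """Pick a tier from the default rules; risk and repeated failure take priority."""
    if high_risk or failed_iterations >= REPEATED_FAILURE_THRESHOLD:
        return "C"
    if action in READ_ONLY_ACTIONS:
        return "A"
    if action in CODE_ACTIONS:
        return "B"
    # ASSUMPTION: anything unrecognized goes to the workhorse tier.
    return "B"
```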
Escalation triggers (simple + effective)
Escalate Tier B → Tier C when any of these happen:
2 failed iterations (tests still failing, same error persists)
Touching >5 files
Diff size exceeds ~400 LOC changed
Deployment / infra / secrets / migration steps involved
Context pressure (approaching model limits)
De-escalation rule
Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.
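The escalation triggers and the de-escalation rule can be written as one check that also returns the reasons, which is useful for explaining an escalation to the user. The numeric thresholds come from the list above; the `TaskState` fields, the 80% context-pressure ratio, and treating `plan_resolved` as an override are assumptions.

```python
from dataclasses import dataclass
from typing import List

MAX_FAILED_ITERATIONS = 2
MAX_FILES_TOUCHED = 5
MAX_LOC_CHANGED = 400
CONTEXT_PRESSURE_RATIO = 0.8  # ASSUMPTION: "approaching model limits" means 80% of the window


@dataclass
class TaskState:
    failed_iterations: int = 0
    files_touched: int = 0
    loc_changed: int = 0
    high_risk: bool = False         # deployment / infra / secrets / migration involved
    context_tokens: int = 0
    context_limit: int = 100_000    # ASSUMPTION: placeholder context window size
    plan_resolved: bool = False     # cause found / plan decided


def escalation_reasons(t: TaskState) -> List[str]:
    """Return every trigger that currently applies (an empty list means stay on Tier B)."""
    reasons = []
    if t.failed_iterations >= MAX_FAILED_ITERATIONS:
        reasons.append("repeated failure")
    if t.files_touched > MAX_FILES_TOUCHED:
        reasons.append("touches more than 5 files")
    if t.loc_changed > MAX_LOC_CHANGED:
        reasons.append("diff exceeds ~400 LOC")
    if t.high_risk:
        reasons.append("deployment/infra/secrets/migration involved")
    if t.context_tokens >= CONTEXT_PRESSURE_RATIO * t.context_limit:
        reasons.append("context pressure")
    return reasons


def next_tier(t: TaskState) -> str:
    """Escalate B -> C on any trigger; drop back to B once the hard part is resolved."""
    if t.plan_resolved:
        return "B"
    return "C" if escalation_reasons(t) else "B"
```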
6) Business Model: Subscription + Credits (Not “Unlimited AI”)
You clarified the intended split:
Subscription covers your fixed costs
Subscription pays for:
your hosted infrastructure (hot tier + shared services)
agent workspace orchestration (cold tier)
your people costs (support, ops, ongoing development)
product value (templates, UX, dashboards, workflows)
baseline included usage / small AI overhead
Credits cover variable compute
Credits pay for:
model calls (Tier A/B/C)
heavy tasks (builds, refactors, debugging loops)
long-chain tasks
autonomous agent execution
This protects you from heavy users and keeps margins predictable.
7) Template Access as a Tiered Product (Shopify-style)
Templates are the moat
Templates reduce:
architecture planning cost
retry loops
token burn
complexity and failure rates
Templates also create:
differentiation
a marketplace opportunity later
compounding margins
Tiering via template access
Instead of just “more AI,” higher tiers unlock better starter systems.
Example approach:
Starter tier
landing page template
simple SaaS CRUD template
basic auth + Stripe
limited integrations
Builder tier
multi-tenant SaaS template
marketplace template
analytics dashboard template
stronger RBAC patterns
more integrations
Pro tier
“OpsOS / analytics warehouse” template
monitoring + alerting template
ML-ready pipeline template
advanced data model scaffolds
Enterprise
custom templates
compliance add-ons
private deployments
dedicated support / SLAs
8) Credit Pricing: Fixed Markup per Model
You said you want:
credits based on user actions, with fixed markup on every model
This implies:
Each model has an internal “true cost”
You charge credits at a consistent markup multiplier
Premium models may have a higher markup (optional), but you can keep it fixed if you prefer simplicity
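The fixed-markup rule is a single multiplication plus rounding. In this sketch the credit value (1 credit = $0.01) and the 1.5x markup are placeholder numbers, not decided pricing; `Decimal` is used so that rounding up to whole credits is exact.

```python
from decimal import Decimal, ROUND_CEILING

CREDIT_VALUE_USD = Decimal("0.01")  # ASSUMPTION: what one credit is worth
DEFAULT_MARKUP = Decimal("1.5")     # ASSUMPTION: the fixed markup multiplier


def credits_for(true_cost_usd: Decimal, markup: Decimal = DEFAULT_MARKUP) -> int:
    """Convert a model call's true cost into credits, rounding up to a whole credit."""
    raw = true_cost_usd * markup / CREDIT_VALUE_USD
    return int(raw.to_integral_value(rounding=ROUND_CEILING))
```

For example, a call with a true cost of $0.02 is charged 3 credits at the default markup. Passing a different `markup` for premium models covers the optional higher-markup case without changing the formula.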
How it should feel to the user
“This action will cost ~X credits”
“Set a spending cap per day/project”
“Require approval if a task is estimated > Y credits”
This prevents runaway spending and builds trust.
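These three behaviors map onto a pre-flight check that runs before any task starts. The `Budget` fields and the three return values are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    daily_cap: int           # user-set spending cap, in credits
    approval_threshold: int  # the "Y credits" above which approval is required
    spent_today: int = 0


def check_task(budget: Budget, estimated_credits: int) -> str:
    """Decide whether a task may run, given its estimated credit cost."""
    if budget.spent_today + estimated_credits > budget.daily_cap:
        return "blocked"         # would exceed the cap
    if estimated_credits > budget.approval_threshold:
        return "needs_approval"  # show "This action will cost ~X credits" and wait
    return "allowed"
```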
9) Key Risk Controls We Agreed Are Necessary
To make this sellable and safe:
Token and autonomy guardrails
max tokens per step
max retries per task
auto-summarize context aggressively
store structured memory, not chat replay
only send diffs / minimal file slices
caching where possible (especially for repeated prefixes)
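The first two guardrails can wrap any agent step in a bounded loop. This is a sketch under stated assumptions: the limits are placeholder values, and `step` is a hypothetical callable that takes a token budget and returns `(succeeded, tokens_used)`.

```python
from typing import Callable, Dict, Tuple

MAX_TOKENS_PER_STEP = 4000  # ASSUMPTION: placeholder limit
MAX_RETRIES_PER_TASK = 2    # ASSUMPTION: placeholder limit


def run_with_guardrails(
    step: Callable[[int], Tuple[bool, int]],
    max_tokens: int = MAX_TOKENS_PER_STEP,
    max_retries: int = MAX_RETRIES_PER_TASK,
) -> Dict[str, object]:
    """Run a step at most 1 + max_retries times, enforcing the per-step token budget."""
    tokens_total = 0
    attempts = 0
    for _ in range(max_retries + 1):  # the first try plus the allowed retries
        attempts += 1
        succeeded, used = step(max_tokens)
        if used > max_tokens:
            raise RuntimeError("step exceeded its token budget")
        tokens_total += used
        if succeeded:
            return {"status": "done", "attempts": attempts, "tokens": tokens_total}
    # Out of retries: stop and hand back to the orchestrator instead of looping.
    return {"status": "gave_up", "attempts": attempts, "tokens": tokens_total}
```

A `gave_up` result is a natural point to escalate to a higher tier or ask the user, rather than retrying silently.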
UX controls
show credit burn in real time
warn/approve for high-cost tasks
allow user-set budgets
explain why escalation happened (briefly)
10) The End State
VibnAI becomes:
A template-first “product builder OS”
powered by multi-model orchestration
hosted on your infra
with predictable economics via subscription + credits
and a defensible moat via templates + routing intelligence