master-ai/product-idea-a.md at b679d0f6e69343c9e39efab0d49d748fb8f6a75b

Archived

This repository has been archived on 2026-06-07. You can view files and clone it. You cannot open issues or pull requests or push a commit.


mawkone 99deb546c8 Rip out Theia, bump submodules, retire platform/ scaffold, snapshot docs + design assets

2026-04-22 18:06:37 -07:00

7.7 KiB


VibnAI Plan Summary — “Shopify Template Model” + Your Infra + Model Routing + Pricing

Below is the consolidated plan we’ve converged on: VibnAI as a template-first product builder (Shopify-style), with your own hosted infra, and usage-based AI credits powered by Vertex marketplace models with smart routing.

Product Strategy: VibnAI Is Shopify for Building Software

Core positioning

VibnAI is not “blank page AI coding.”

VibnAI is:

Build production-ready apps from elite starter templates, then customize them via guided AI workflows.

This reduces:

token burn

failure loops

architectural ambiguity

debugging chaos

And increases:

predictability

success rate

margins

retention

Template-first rule

No project starts from an empty repo by default.

Users must choose:

a starter template, or

“Advanced: Custom Build” (explicitly warned as costlier)

Platform Architecture: Your Infra + Event-Driven AI

High-level architecture decisions

You host the infrastructure layer yourself (Hot + Cold tiers). AI compute is purchased via credits.

Hot tier (shared, always running)

API Gateway (auth, WebSockets, rate limits)

Orchestrator service (task routing + state machine)

Job queue + worker pool

Postgres (conversations, tasks, state)

Redis (optional: queue/pubsub)

Gitea (code/content source-of-truth)

Coolify (deploys, logs, runtime orchestration)

Key rule: The hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.

Cold tier (per-user, on-demand)

Agent workspace containers

Hibernate / wake-on-access

Persistent storage volumes

“Master Orchestrator” behavior change (critical cost control)

Even if it’s “always running,” it should behave like:

event-driven

stateless compute

minimal model calls

structured memory, not replaying chat history

Structured memory > conversation replay

Instead of resending the entire conversation history, persist and inject:

project summary

architecture summary

repo map summary

deploy state

open tasks

known bugs

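This is a major cost reducer: each model call carries a short, bounded summary instead of a transcript that grows with every turn.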
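The persisted fields listed above could be modeled roughly as follows. This is a minimal sketch, not the actual Vibn schema: the class name `ProjectMemory`, the field names, and the `render` helper are all hypothetical, chosen only to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectMemory:
    """Structured memory persisted per project (hypothetical field names)."""

    project_summary: str = ""
    architecture_summary: str = ""
    repo_map_summary: str = ""
    deploy_state: str = ""
    open_tasks: list[str] = field(default_factory=list)
    known_bugs: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Build the compact context block injected into a model call."""
        sections = [
            ("Project", self.project_summary),
            ("Architecture", self.architecture_summary),
            ("Repo map", self.repo_map_summary),
            ("Deploy state", self.deploy_state),
            ("Open tasks", "\n".join(f"- {t}" for t in self.open_tasks)),
            ("Known bugs", "\n".join(f"- {b}" for b in self.known_bugs)),
        ]
        # Skip empty sections so the injected context stays as small as possible.
        return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)


memory = ProjectMemory(
    project_summary="Invoice SaaS built from the CRUD starter template.",
    deploy_state="Last deploy succeeded.",
    open_tasks=["Add CSV export"],
)
context = memory.render()
```

The orchestrator would update these fields after each task and send `context` in place of the chat transcript, so prompt size tracks the state of the project rather than the length of the conversation.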

AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)

You’re building your own agents, but the principle applies: choose models per tool/task.

Tier A / Tier B / Tier C (the blend)

We landed on this operational blend:

40% Tier A (cheap)

45% Tier B (mid / workhorse coder)

15% Tier C (premium escalation)

This is not arbitrary—it aligns with tool/task reality:

most actions are parsing, routing, search, summarizing (cheap)

most code edits and implementations are workhorse coding (mid)

only a small fraction require deep reasoning / high-stakes decisions (premium)
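To see what the blend means for cost, here is a small worked calculation. The call shares come from the blend above; the relative per-call costs (1, 5, 25) are made-up placeholders, not real prices, and the function name is hypothetical.

```python
import math

# Share of calls per tier, taken from the blend above.
BLEND = {"A": 0.40, "B": 0.45, "C": 0.15}

# Hypothetical relative cost per call; placeholders, not real prices.
RELATIVE_COST = {"A": 1.0, "B": 5.0, "C": 25.0}


def blended_cost(blend: dict[str, float], cost: dict[str, float]) -> float:
    """Expected cost of one call when calls are split across tiers by `blend`."""
    if not math.isclose(sum(blend.values()), 1.0):
        raise ValueError("blend shares must sum to 1")
    return sum(share * cost[tier] for tier, share in blend.items())


average = blended_cost(BLEND, RELATIVE_COST)

# Fraction of total spend that goes to the premium tier.
premium_share = BLEND["C"] * RELATIVE_COST["C"] / average
```

With these assumed numbers the average call costs 6.4 units, versus 25 if everything went to Tier C. Tier C would still account for more than half of the spend despite handling only 15% of calls, which is why keeping escalation rare matters so much.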

Tier purpose

Tier A: Cheap “Utility / Router”

Use for:

routing decisions

summarizing logs, errors, context

file discovery + search interpretation

command suggestion drafts

task context updates

chat summaries / naming

monitoring analysis

This tier should handle the majority of orchestration.

Tier B — Workhorse Coding Model

Use for:

generating diffs

writing/refactoring code

tests

standard bug fixes

“agent mode” loops when tasks are scoped

iterating on features inside templates

This tier should handle most coding.

Tier C — Premium Escalation Model

Use only for:

architecture decisions

high-risk changes (deploy, infra, migrations)

cross-service debugging

persistent failures (2 failed iterations)

very large diffs / multi-file refactors

security-sensitive changes

This tier should be rare by design.

Vertex Models: What to Use in Each Tier

You wanted to stay on Google infra and Vertex marketplace/API models.

Recommended mapping (Vertex-first)

Tier A (cheap)

Gemini Flash-class model (fast, low cost)

Use for orchestration, summaries, extraction, routing, log parsing.

Tier B (mid / coding workhorse)

Pick one:

GLM-5 MaaS (Vertex) — strong reasoning + cost-effective

Qwen coder MaaS (Vertex) — strong coding, predictable cost

This model does the heavy lifting for code edits and feature building.

Tier C (premium escalation)

Pick one:

Claude Sonnet 4.6 on Vertex (reliability + long-chain coding)

or Gemini 3.1 Pro Preview (if it proves better for your workflows)

This is your “expert brain” used sparingly.
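One way to keep the "pick one" decisions out of the code is a small tier-to-model table. The sketch below is illustrative only: the strings are descriptive placeholders derived from the names above, not verified Vertex model IDs, and the fall-through to a second candidate is an assumption rather than something the plan specifies.

```python
# Candidate models per tier, in preference order. The strings are descriptive
# placeholders taken from the names above, NOT verified Vertex model IDs.
TIER_CANDIDATES: dict[str, list[str]] = {
    "A": ["gemini-flash-class"],
    "B": ["glm-5-maas", "qwen-coder-maas"],
    "C": ["claude-sonnet-4.6", "gemini-3.1-pro-preview"],
}


def model_for(tier: str, unavailable: frozenset[str] = frozenset()) -> str:
    """Return the first candidate for `tier` that is not marked unavailable."""
    for model in TIER_CANDIDATES[tier]:
        if model not in unavailable:
            return model
    raise LookupError(f"no available model configured for tier {tier!r}")
```

Swapping the Tier B or Tier C choice then becomes a one-line change to the table instead of a change to routing logic.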

Routing Policy: How the System Chooses Models

You’re not letting users pick models manually. The orchestrator routes based on task complexity and risk.

Default rules

All “read/search/list/summarize” → Tier A

Most code edits/refactors/tests → Tier B

High-risk or repeated failure → Tier C
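The three default rules can be sketched as a single function. The action names, the `default_tier` signature, and the choice to send unknown actions to Tier B are assumptions; the threshold of 2 failed iterations is borrowed from the escalation triggers in this plan.

```python
from enum import Enum


class Tier(Enum):
    A = "cheap"
    B = "workhorse"
    C = "premium"


# Hypothetical action names for the read-only rule.
READ_ACTIONS = {"read", "search", "list", "summarize"}


def default_tier(action: str, high_risk: bool = False, failed_iterations: int = 0) -> Tier:
    """Pick a tier from the default rules; risk and repeated failure win."""
    if high_risk or failed_iterations >= 2:
        return Tier.C
    if action in READ_ACTIONS:
        return Tier.A
    # Edits, refactors, tests, and (by assumption) anything unrecognized.
    return Tier.B
```

Checking risk first means a deploy-related step reaches Tier C even when the action itself looks cheap.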

Escalation triggers (simple + effective)

Escalate Tier B → Tier C when any of these happen:

2 failed iterations (tests still failing, same error persists)

Touching >5 files

Diff size exceeds ~400 LOC changed

Deployment / infra / secrets / migration steps involved

Context pressure (approaching model limits)
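The triggers above translate fairly directly into a check over per-task signals. This is a sketch under assumptions: `TaskSignals` and its fields are hypothetical, and the 0.8 context ratio is an invented stand-in for "approaching model limits", which the plan does not quantify.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Signals the orchestrator would track per task (hypothetical fields)."""

    failed_iterations: int = 0
    files_touched: int = 0
    loc_changed: int = 0
    touches_sensitive_ops: bool = False  # deploy / infra / secrets / migrations
    context_used_ratio: float = 0.0      # prompt tokens / model context limit


# Assumed threshold; the plan only says "approaching model limits".
CONTEXT_PRESSURE_RATIO = 0.8


def escalation_reasons(s: TaskSignals) -> list[str]:
    """Return every trigger that fired; an empty list means stay on Tier B."""
    reasons = []
    if s.failed_iterations >= 2:
        reasons.append("2 failed iterations")
    if s.files_touched > 5:
        reasons.append("touching more than 5 files")
    if s.loc_changed > 400:
        reasons.append("diff exceeds ~400 LOC")
    if s.touches_sensitive_ops:
        reasons.append("deploy / infra / secrets / migration step")
    if s.context_used_ratio >= CONTEXT_PRESSURE_RATIO:
        reasons.append("context pressure")
    return reasons
```

Returning the reasons instead of a bare boolean lets the UI tell the user briefly why a premium model was used.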

De-escalation rule

Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.

Business Model: Subscription + Credits (Not “Unlimited AI”)

You clarified the intended split:

Subscription covers your fixed costs

Subscription pays for:

your hosted infrastructure (hot tier + shared services)

Agent workspace orchestration (cold tier)

your people costs (support, ops, ongoing development)

product value (templates, UX, dashboards, workflows)

baseline included usage / small AI overhead

Credits cover variable compute

Credits pay for:

model calls (Tier A/B/C)

heavy tasks (builds, refactors, debugging loops)

long-chain tasks

autonomous agent execution

This protects you from heavy users and keeps margins predictable.

Template Access as a Tiered Product (Shopify-style)

Templates are the moat

Templates reduce:

architecture planning cost

retry loops

token burn

complexity and failure rates

Templates also create:

differentiation

a marketplace opportunity later

compounding margins

Tiering via template access

Instead of just “more AI,” higher tiers unlock better starter systems.

Example approach:

Starter tier

landing page template

simple SaaS CRUD template

basic auth + Stripe

limited integrations

Builder tier

multi-tenant SaaS template

marketplace template

analytics dashboard template

stronger RBAC patterns

more integrations

Pro tier

“OpsOS / analytics warehouse” template

monitoring + alerting template

ML-ready pipeline template

advanced data model scaffolds

Enterprise

custom templates

compliance add-ons

private deployments

dedicated support / SLAs
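The example tiers could be enforced with a simple entitlement check. The sketch below assumes that each plan includes everything from the plans below it, which the example list implies but does not state; the template slugs and function name are hypothetical.

```python
# Plans in ascending order; a higher plan is assumed to include lower ones.
PLAN_ORDER = ["starter", "builder", "pro", "enterprise"]

# Minimum plan per template. Slugs are hypothetical, derived from the list above.
TEMPLATE_MIN_PLAN = {
    "landing-page": "starter",
    "saas-crud": "starter",
    "multi-tenant-saas": "builder",
    "marketplace": "builder",
    "analytics-dashboard": "builder",
    "opsos-analytics-warehouse": "pro",
    "monitoring-alerting": "pro",
    "ml-ready-pipeline": "pro",
}


def can_use(plan: str, template: str) -> bool:
    """True if `plan` is at or above the template's minimum plan."""
    required = TEMPLATE_MIN_PLAN[template]
    return PLAN_ORDER.index(plan) >= PLAN_ORDER.index(required)


def unlocked(plan: str) -> list[str]:
    """All templates available on `plan`, in table order."""
    return [name for name in TEMPLATE_MIN_PLAN if can_use(plan, name)]
```

Enterprise custom templates are left out of the table because they would be defined per customer.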

Credit Pricing: Fixed Markup per Model

You said you want:

credits based on user actions, with fixed markup on every model

This implies:

Each model has an internal “true cost”

You charge credits at a consistent markup multiplier

Premium models may have a higher markup (optional), but you can keep it fixed if you prefer simplicity
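As a worked example of the markup rule, the sketch below converts a model call's true cost into credits. The 1.5x markup and the value of one credit ($0.01) are invented for illustration, as are the names; `Decimal` is used so that rounding up to whole credits is not thrown off by float error.

```python
from decimal import ROUND_CEILING, Decimal

# Hypothetical pricing parameters; the plan does not specify either value.
DEFAULT_MARKUP = Decimal("1.5")
USD_PER_CREDIT = Decimal("0.01")


def credits_for(true_cost_usd: Decimal, markup: Decimal = DEFAULT_MARKUP) -> int:
    """Credits charged for a call: true cost times markup, rounded up."""
    billed_usd = true_cost_usd * markup
    credits = (billed_usd / USD_PER_CREDIT).to_integral_value(rounding=ROUND_CEILING)
    return int(credits)


standard = credits_for(Decimal("0.02"))
# Optional higher markup for a premium model, per the last point above.
premium = credits_for(Decimal("0.02"), markup=Decimal("2"))
```

Under these assumptions a call with a true cost of $0.02 is charged 3 credits at the fixed markup and 4 credits at the optional premium markup.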

How it should feel to the user

“This action will cost ~X credits”

“Set a spending cap per day/project”

“Require approval if a task is estimated > Y credits”

This prevents runaway spending and builds trust.
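The three user-facing controls above could be combined into one pre-flight check that runs before a task starts. This is a sketch only; the `Decision` values, the parameter names, and the choice to check the cap before the approval threshold are assumptions.

```python
from enum import Enum


class Decision(Enum):
    RUN = "run"
    NEEDS_APPROVAL = "needs_approval"
    BLOCKED = "blocked"


def gate(estimated_credits: int, spent_today: int, daily_cap: int,
         approval_threshold: int) -> Decision:
    """Decide whether a task may start, given its estimated credit cost."""
    # The spending cap is a hard stop, so it is checked first.
    if spent_today + estimated_credits > daily_cap:
        return Decision.BLOCKED
    # Expensive tasks within the cap still need an explicit confirmation.
    if estimated_credits > approval_threshold:
        return Decision.NEEDS_APPROVAL
    return Decision.RUN
```

The same estimate that feeds this check is what the UI would show as "This action will cost ~X credits".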

Key Risk Controls We Agreed Are Necessary

To make this sellable and safe:

Token and autonomy guardrails

max tokens per step

max retries per task

auto-summarize context aggressively

store structured memory, not chat replay

only send diffs / minimal file slices

caching where possible (especially for repeated prefixes)
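The first two guardrails can be sketched as a bounded retry loop. The default limits are placeholders, and `step` stands in for whatever function performs one model-backed attempt; neither reflects an actual Vibn implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrails:
    max_tokens_per_step: int = 4000  # hypothetical default
    max_retries_per_task: int = 2    # hypothetical default


def run_with_guardrails(step: Callable[[int], bool],
                        limits: Guardrails) -> tuple[bool, int]:
    """Call `step` until it succeeds or the retry budget is spent.

    `step` receives the per-step token budget and returns True on success.
    Returns (succeeded, attempts made).
    """
    attempts = 0
    # One initial attempt plus at most `max_retries_per_task` retries.
    while attempts <= limits.max_retries_per_task:
        attempts += 1
        if step(limits.max_tokens_per_step):
            return True, attempts
    return False, attempts
```

When the loop returns `False`, the orchestrator can stop and report back instead of burning further credits on the same failure.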

UX controls

show credit burn in real time

warn/approve for high-cost tasks

allow user-set budgets

explain why escalation happened (briefly)

The End State

VibnAI becomes:

A template-first “product builder OS”

hosted on your infra

with predictable economics via subscription + credits

and a defensible moat via templates + routing intelligence
