vibn-agent-runner/docs_archive/product-idea-a.md at d5467bf2364658b36ffb91cde341f0c30034bf09


mawkone 3563b98de1 chore: clean up root directory, move docs to /docs and legacy plans to /docs_archive

2026-05-07 15:05:34 -07:00

7.7 KiB


VibnAI Plan Summary — “Shopify Template Model” + Your Infra + Model Routing + Pricing

Below is the consolidated plan we’ve converged on: VibnAI as a template-first product builder (Shopify-style), with your own hosted infra, and usage-based AI credits powered by Vertex marketplace models with smart routing.

Product Strategy: VibnAI Is Shopify for Building Software

Core positioning

VibnAI is not “blank page AI coding.”

VibnAI is:

Build production-ready apps from elite starter templates, then customize via guided AI workflows.

This reduces:

token burn

failure loops

architectural ambiguity

debugging chaos

And increases:

predictability

success rate

margins

retention

Template-first rule

No project starts from an empty repo by default.

Users must choose:

a starter template, or

“Advanced: Custom Build” (explicitly warned as costlier)
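A minimal sketch of how the template-first rule might be enforced at project creation. The names `ProjectRequest`, `resolve_start_mode`, the `custom_build_acknowledged` flag, and the template IDs are hypothetical; the plan only says a project starts from a template or from an explicitly warned custom build.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical template catalogue; real IDs would come from a template registry.
KNOWN_TEMPLATES = {"landing-page", "saas-crud", "multi-tenant-saas"}


@dataclass(frozen=True)
class ProjectRequest:
    name: str
    template_id: Optional[str] = None
    custom_build_acknowledged: bool = False  # user accepted the "costlier" warning


def resolve_start_mode(req: ProjectRequest) -> str:
    """Return "template:<id>" or "custom"; never allow a silent empty repo."""
    if req.template_id is not None:
        if req.template_id not in KNOWN_TEMPLATES:
            raise ValueError(f"unknown template: {req.template_id}")
        return f"template:{req.template_id}"
    if req.custom_build_acknowledged:
        return "custom"
    raise ValueError("choose a starter template or acknowledge Advanced: Custom Build")
```

Rejecting the request outright, rather than silently defaulting to an empty repo, keeps the costlier path an explicit user decision.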

Platform Architecture: Your Infra + Event-Driven AI

High-level architecture decisions

You host the infrastructure layer yourself (Hot + Cold tiers). AI compute is purchased via credits.

Hot tier (shared, always running)

API Gateway (auth, WebSockets, rate limits)

Orchestrator service (task routing + state machine)

Job queue + worker pool

Postgres (conversations, tasks, state)

Redis (optional: queue/pubsub)

Gitea (code/content source-of-truth)

Coolify (deploys, logs, runtime orchestration)

Key rule: The hot tier is always on, but it should be cheap to run because it is mostly event-driven and does not constantly call expensive models.

Cold tier (per-user, on-demand)

Agent workspace containers

Hibernate / wake-on-access

Persistent storage volumes
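The cold-tier lifecycle can be pictured as a two-state machine. This is a sketch only: the `Workspace` class, the state names, and the 15-minute idle window are assumptions, and a real implementation would start and stop containers through whatever runtime hosts them.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60  # hypothetical idle window before hibernation


@dataclass
class Workspace:
    user_id: str
    state: str = "hibernated"   # "hibernated" or "running"
    last_access: float = 0.0    # seconds, e.g. from time.monotonic()

    def access(self, now: float) -> bool:
        """Record an access; return True if the container had to be woken."""
        woke = self.state == "hibernated"
        self.state = "running"
        self.last_access = now
        return woke

    def hibernate_if_idle(self, now: float) -> bool:
        """Hibernate a running workspace that has been idle too long."""
        if self.state == "running" and now - self.last_access >= IDLE_TIMEOUT_S:
            self.state = "hibernated"  # the persistent volume is kept
            return True
        return False
```

A periodic sweep in the hot tier could call `hibernate_if_idle` for each workspace, so per-user compute is only paid for while it is in use.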

“Master Orchestrator” behavior change (critical cost control)

Even if it’s “always running,” it should behave like:

event-driven

stateless compute

minimal model calls

structured memory, not replaying chat history

Structured memory > conversation replay

Instead of resending entire conversation history, persist and inject:

project summary

architecture summary

repo map summary

deploy state

open tasks

known bugs

This is a major cost reducer.
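One way to represent the structured memory listed above, as a sketch. The `ProjectMemory` class and its `to_context` method are hypothetical names; the fields mirror the six items in the list, and the rendered text is what would be injected into a model call in place of the chat transcript.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectMemory:
    project_summary: str = ""
    architecture_summary: str = ""
    repo_map_summary: str = ""
    deploy_state: str = ""
    open_tasks: List[str] = field(default_factory=list)
    known_bugs: List[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Render a compact context block; empty sections are skipped."""
        sections = [
            ("Project", self.project_summary),
            ("Architecture", self.architecture_summary),
            ("Repo map", self.repo_map_summary),
            ("Deploy state", self.deploy_state),
            ("Open tasks", "\n".join(f"- {t}" for t in self.open_tasks)),
            ("Known bugs", "\n".join(f"- {b}" for b in self.known_bugs)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```

Because the record is updated in place after each task, its size stays roughly constant, whereas a replayed conversation grows with every turn.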

AI Model Strategy: 3-Tier Routing (Cost-Efficient Orchestration)

You’re building your own agents, but the principle applies: choose models per tool/task.

Tier A / Tier B / Tier C (the blend)

We landed on this operational blend:

40% Tier A (cheap)

45% Tier B (mid / workhorse coder)

15% Tier C (premium escalation)

This is not arbitrary; it aligns with tool/task reality:

most actions are parsing, routing, search, summarizing (cheap)

most code edits and implementations are workhorse coding (mid)

only a small fraction require deep reasoning / high-stakes decisions (premium)
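The effect of the blend on average cost is simple weighted arithmetic. In the sketch below the shares come from the blend above, but the relative per-call costs are placeholders chosen for illustration, not real model prices.

```python
# Share of calls per tier, taken from the blend above.
BLEND = {"A": 0.40, "B": 0.45, "C": 0.15}

# Placeholder relative costs per call (Tier A = 1 unit). These are NOT real prices.
RELATIVE_COST = {"A": 1.0, "B": 5.0, "C": 25.0}


def blended_cost(blend: dict, cost: dict) -> float:
    """Expected cost of one call under the given tier mix."""
    if abs(sum(blend.values()) - 1.0) > 1e-9:
        raise ValueError("blend shares must sum to 1")
    return sum(share * cost[tier] for tier, share in blend.items())
```

With these placeholder ratios the blend works out to about 6.4 units per call against 25 for an all-premium setup. Tier C would still account for more than half of spend at only 15% of calls, which is why the escalation rules matter more than the exact split.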

Tier purpose

Tier A — Cheap “Utility / Router”

Use for:

routing decisions

summarizing logs, errors, context

file discovery + search interpretation

command suggestion drafts

task context updates

chat summaries / naming

monitoring analysis

This tier should handle the majority of orchestration.

Tier B — Workhorse Coding Model

Use for:

generating diffs

writing/refactoring code

tests

standard bug fixes

“agent mode” loops when tasks are scoped

iterating on features inside templates

This tier should handle most coding.

Tier C — Premium Escalation Model

Use only for:

architecture decisions

high-risk changes (deploy, infra, migrations)

cross-service debugging

persistent failures (2 failed iterations)

very large diffs / multi-file refactors

security-sensitive changes

This tier should be rare by design.

Vertex Models: What to Use in Each Tier

You wanted to stay on Google infra and Vertex marketplace/API models.

Recommended mapping (Vertex-first)

Tier A (cheap)

Gemini Flash-class model (fast, low cost)

Use for orchestration, summaries, extraction, routing, log parsing.

Tier B (mid / coding workhorse)

Pick one:

GLM-5 MaaS (Vertex) — strong reasoning + cost-effective

Qwen coder MaaS (Vertex) — strong coding, predictable cost

This model does the heavy lifting for code edits and feature building.

Tier C (premium escalation)

Pick one:

Claude Sonnet 4.6 on Vertex (reliability + long-chain coding)

or Gemini 3.1 Pro Preview (if it proves better for your workflows)

This is your “expert brain” used sparingly.
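The mapping could live in a small config so that swapping a tier's model is a one-line change. This is a sketch: the strings are the names used in this plan, not verified Vertex model IDs, and `select_models` is a hypothetical helper.

```python
# Labels mirror the names used in this plan; they are not verified Vertex model IDs.
TIER_CANDIDATES = {
    "A": ["Gemini Flash-class"],
    "B": ["GLM-5 MaaS", "Qwen coder MaaS"],
    "C": ["Claude Sonnet 4.6", "Gemini 3.1 Pro Preview"],
}


def select_models(choices: dict) -> dict:
    """Validate one chosen model per tier against the candidate list."""
    selected = {}
    for tier, candidates in TIER_CANDIDATES.items():
        choice = choices.get(tier, candidates[0])  # assumption: default to first candidate
        if choice not in candidates:
            raise ValueError(f"{choice!r} is not a candidate for Tier {tier}")
        selected[tier] = choice
    return selected
```

Keeping the choice per tier in data rather than in routing code makes it cheap to trial the alternative Tier B or Tier C model on real workloads.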

Routing Policy: How the System Chooses Models

You’re not letting users pick models manually. The orchestrator routes based on task complexity and risk.

Default rules

All “read/search/list/summarize” → Tier A

Most code edits/refactors/tests → Tier B

High-risk or repeated failure → Tier C
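The default rules reduce to a small lookup. In this sketch the task-kind labels and the function name `default_tier` are hypothetical; only the grouping into tiers comes from the rules above.

```python
# Hypothetical task-kind labels grouped per the default rules.
TIER_A_KINDS = frozenset({"read", "search", "list", "summarize"})
TIER_B_KINDS = frozenset({"edit", "refactor", "test"})


def default_tier(kind: str, high_risk: bool = False, repeated_failure: bool = False) -> str:
    """Pick a starting tier; risk and repeated failure override the task kind."""
    if high_risk or repeated_failure:
        return "C"
    if kind in TIER_A_KINDS:
        return "A"
    if kind in TIER_B_KINDS:
        return "B"
    raise ValueError(f"unclassified task kind: {kind!r}")
```

Raising on an unclassified kind is a design choice for the sketch: it forces every new tool to be assigned a tier deliberately instead of inheriting one by accident.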

Escalation triggers (simple + effective)

Escalate Tier B → Tier C when any of these happen:

2 failed iterations (tests still failing, same error persists)

Touching >5 files

Diff size exceeds ~400 LOC changed

Deployment / infra / secrets / migration steps involved

Context pressure (approaching model limits)

De-escalation rule

Once the hard part is resolved (cause found / plan decided), drop back to Tier B for implementation.
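The escalation triggers and the de-escalation rule can be sketched as one check over per-task counters. The thresholds for iterations, files, and diff size come from the list above; the `TaskState` fields, the reason strings, and the 80% context-pressure ratio are assumptions.

```python
from dataclasses import dataclass

MAX_FAILED_ITERATIONS = 2
MAX_FILES_TOUCHED = 5
MAX_DIFF_LOC = 400
CONTEXT_PRESSURE_RATIO = 0.8  # assumption: "approaching limits" means 80% of the window


@dataclass
class TaskState:
    failed_iterations: int = 0
    files_touched: int = 0
    diff_loc: int = 0
    touches_infra: bool = False       # deploy / infra / secrets / migrations
    context_tokens: int = 0
    context_limit: int = 1            # model context window, in tokens
    hard_part_resolved: bool = False  # cause found or plan decided


def escalation_reasons(t: TaskState) -> list:
    """List every trigger that currently applies (empty list = stay on Tier B)."""
    reasons = []
    if t.failed_iterations >= MAX_FAILED_ITERATIONS:
        reasons.append("repeated failure")
    if t.files_touched > MAX_FILES_TOUCHED:
        reasons.append("many files")
    if t.diff_loc > MAX_DIFF_LOC:
        reasons.append("large diff")
    if t.touches_infra:
        reasons.append("high-risk step")
    if t.context_tokens >= CONTEXT_PRESSURE_RATIO * t.context_limit:
        reasons.append("context pressure")
    return reasons


def coding_tier(t: TaskState) -> str:
    """Tier C while any trigger applies; back to Tier B once the hard part is resolved."""
    if t.hard_part_resolved:
        return "B"
    return "C" if escalation_reasons(t) else "B"
```

Returning the reasons, not just a boolean, gives the UI something to show when it explains an escalation. Whether a resolved plan should also de-escalate high-risk infra steps is left open here; this sketch de-escalates unconditionally.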

Business Model: Subscription + Credits (Not “Unlimited AI”)

You clarified the intended split:

Subscription covers your fixed costs

Subscription pays for:

your hosted infrastructure (hot tier + shared services)

Agent workspace orchestration (cold tier)

your people costs (support, ops, ongoing development)

product value (templates, UX, dashboards, workflows)

baseline included usage / small AI overhead

Credits cover variable compute

Credits pay for:

model calls (Tier A/B/C)

heavy tasks (builds, refactors, debugging loops)

long chain tasks

autonomous agent execution

This protects you from heavy users and keeps margins predictable.

Template Access as a Tiered Product (Shopify-style)

Templates are the moat

Templates reduce:

architecture planning cost

retry loops

token burn

complexity and failure rates

Templates also create:

differentiation

a marketplace opportunity later

compounding margins

Tiering via template access

Instead of just “more AI,” higher tiers unlock better starter systems.

Example approach:

Starter tier

landing page template

simple SaaS CRUD template

basic auth + Stripe

limited integrations

Builder tier

multi-tenant SaaS template

marketplace template

analytics dashboard template

stronger RBAC patterns

more integrations

Pro tier

“OpsOS / analytics warehouse” template

monitoring + alerting template

ML-ready pipeline template

advanced data model scaffolds

Enterprise

custom templates

compliance add-ons

private deployments

dedicated support / SLAs
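Template gating by plan could be expressed as a lookup. This sketch assumes higher plans include everything from lower ones, which the plan implies but does not state outright; the template slugs are hypothetical names derived from the examples above.

```python
PLAN_ORDER = ["starter", "builder", "pro", "enterprise"]

# Hypothetical template slugs based on the examples above.
TEMPLATES_BY_PLAN = {
    "starter": ["landing-page", "saas-crud"],
    "builder": ["multi-tenant-saas", "marketplace", "analytics-dashboard"],
    "pro": ["opsos-warehouse", "monitoring-alerting", "ml-pipeline"],
    "enterprise": [],  # custom templates would be provisioned per customer
}


def available_templates(plan: str) -> list:
    """Templates unlocked by a plan, assuming higher plans include lower ones."""
    if plan not in PLAN_ORDER:
        raise ValueError(f"unknown plan: {plan}")
    unlocked = []
    for p in PLAN_ORDER[: PLAN_ORDER.index(plan) + 1]:
        unlocked.extend(TEMPLATES_BY_PLAN[p])
    return unlocked
```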

Credit Pricing: Fixed Markup per Model

You said you want:

credits based on user actions, with fixed markup on every model

This implies:

Each model has an internal “true cost”

You charge credits at a consistent markup multiplier

Premium models may have a higher markup (optional), but you can keep it fixed if you prefer simplicity
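The markup rule is a single multiplication plus rounding. In this sketch the credit value and both multipliers are placeholders, not proposed prices; `Decimal` is used so that rounding up to whole credits is not thrown off by binary floating-point error.

```python
from decimal import Decimal, ROUND_CEILING

CREDIT_VALUE_USD = Decimal("0.01")    # placeholder: what one credit is worth
DEFAULT_MARKUP = Decimal("2.0")       # placeholder fixed multiplier
MARKUP_BY_TIER = {"C": Decimal("2.5")}  # optional premium override; leave empty for one flat markup


def credits_for_call(true_cost_usd: Decimal, tier: str) -> int:
    """Convert a model call's true cost into whole credits, rounding up."""
    markup = MARKUP_BY_TIER.get(tier, DEFAULT_MARKUP)
    credits = true_cost_usd * markup / CREDIT_VALUE_USD
    return int(credits.to_integral_value(rounding=ROUND_CEILING))
```

Under these placeholder values, a Tier B call with a true cost of $0.03 is charged 6 credits, and the same cost on Tier C is charged 8.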

How it should feel to the user

“This action will cost ~X credits”

“Set a spending cap per day/project”

“Require approval if a task is estimated > Y credits”

This prevents runaway spending and builds trust.
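The cap and approval prompts above amount to a pre-flight check on each task's estimate. A minimal sketch, assuming a per-day cap; the `Budget` class, the `check_task` function, and the three result strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    daily_cap: int            # credits the user allows per day
    approval_threshold: int   # estimates above this need explicit approval
    spent_today: int = 0


def check_task(budget: Budget, estimated_credits: int, approved: bool = False) -> str:
    """Return "run", "needs_approval", or "blocked" for an estimated task cost."""
    if budget.spent_today + estimated_credits > budget.daily_cap:
        return "blocked"  # approval cannot override the cap the user set
    if estimated_credits > budget.approval_threshold and not approved:
        return "needs_approval"
    return "run"
```

Checking the cap before the approval threshold means a user's own approval never pushes spending past the limit they configured.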

Key Risk Controls We Agreed Are Necessary

To make this sellable and safe:

Token and autonomy guardrails

max tokens per step

max retries per task

auto-summarize context aggressively

store structured memory, not chat replay

only send diffs / minimal file slices

caching where possible (especially for repeated prefixes)
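Two of these guardrails, the per-step token cap and the per-task retry cap, can be sketched as a wrapper around a task step. The limits are placeholder values, and `run_task` and `GuardrailError` are hypothetical names.

```python
from typing import Callable

MAX_RETRIES_PER_TASK = 2     # placeholder
MAX_TOKENS_PER_STEP = 8_000  # placeholder


class GuardrailError(RuntimeError):
    """Raised when a task would exceed a configured limit."""


def run_task(step: Callable[[int], bool], step_tokens: int) -> int:
    """Run `step(attempt)` until it succeeds; return the number of attempts used."""
    if step_tokens > MAX_TOKENS_PER_STEP:
        raise GuardrailError("step exceeds token cap; summarize context first")
    for attempt in range(1, MAX_RETRIES_PER_TASK + 2):  # one try plus the allowed retries
        if step(attempt):
            return attempt
    raise GuardrailError("retry limit reached; stop and report instead of looping")
```

Failing loudly at the limit hands control back to the orchestrator, which can then escalate or ask the user instead of continuing to spend credits.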

UX controls

show credit burn in real time

warn/approve for high-cost tasks

allow user-set budgets

explain why escalation happened (briefly)

The End State

VibnAI becomes:

A template-first “product builder OS”

hosted on your infra

with predictable economics via subscription + credits

and a defensible moat via templates + routing intelligence
