master-ai/architecture.md
2026-01-21 15:35:57 -08:00

1. Recommended reference architecture (Web SaaS-first, 1 product = 1 GCP project per env)

Project model

One product = one GCP project per environment

product-foo-dev

product-foo-staging

product-foo-prod

Optional “platform” projects (yours, not the customers’):

productos-control-plane (your backend + tool registry + auth)

productos-observability (optional central dashboards / cross-product rollups)

productos-billing-export (optional BigQuery billing export aggregation)
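The per-env naming convention is mechanical enough to generate. A minimal sketch (the `project_ids` helper and the env tuple are illustrative, not part of any real API):

```python
# One GCP project per product per environment, named "<product>-<env>".
ENVS = ("dev", "staging", "prod")

def project_ids(product_slug):
    """Return the project ID for each environment of a product."""
    return {env: f"{product_slug}-{env}" for env in ENVS}

# project_ids("product-foo")
# -> {"dev": "product-foo-dev", "staging": "product-foo-staging", "prod": "product-foo-prod"}
```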

High-level runtime pattern

IDE + Supervisor AI never touch DBs/services directly. They call your Control Plane API, which routes to domain Executors (Cloud Run services) with least-privilege service accounts.
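A minimal sketch of that routing step, assuming a hypothetical in-memory registry and placeholder executor URLs (real dispatch would POST the payload to the executor with an identity token for its service account):

```python
# Hypothetical Control Plane routing: callers name a tool, never a backend.
TOOL_REGISTRY = {
    "cloudrun.deploy_service": {
        "executor": "https://deploy-executor.example.run.app",
        "risk": "medium",
    },
    "analytics.funnel_summary": {
        "executor": "https://analytics-executor.example.run.app",
        "risk": "low",
    },
}

def route_tool_call(tool_name, payload):
    """Resolve a tool call to its executor and risk tier."""
    entry = TOOL_REGISTRY.get(tool_name)
    if entry is None:
        return {"ok": False, "error": f"unknown tool: {tool_name}"}
    # A real implementation would POST `payload` to entry["executor"] here.
    return {"ok": True, "executor": entry["executor"], "risk": entry["risk"]}
```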

```
VSCodium IDE (Product OS UI)      Supervisor AI (Vertex)
              \                       /
               \                     /
                +--> Control Plane API
                           |
      +----------+---------+----------+------------+
      |          |         |          |            |
 Deploy Exec  Analytics  Firestore  SQL Exec   Marketing Exec
 (Cloud Build    Exec      Exec    (Cloud SQL) (Missinglettr,
  + Cloud Run) (BigQuery (Company               email provider)
                 jobs)     Brain)
```

Per-product (customer) project: “product-foo-prod”

Must-have services

Cloud Run: product services + executors (if you deploy executors into product project)

Cloud SQL (Postgres/MySQL): transactional app data

Firestore: config + “Company Brain” + style profiles + run metadata (if you keep metadata per product)

BigQuery: event warehouse + analytics datasets/views + experimentation tables

Pub/Sub: event bus for product events + tool events

Cloud Tasks / Workflows / Scheduler: durable automation + cron-based routines

Secret Manager: tokens, DB creds, OAuth secrets (never in code)

Logging/Monitoring/Trace: observability

Where to place executors

Simplest: executors live in the product project (tight coupling, simple data access)

More “platform”: executors live in your platform project, and access product resources cross-project (strong central control, but more IAM + org policy considerations)

For your “product per project” approach, I recommend:

Deploy executor can live in platform (deploy across projects)

Data executors (SQL/Firestore/BigQuery) often live in product project (least-cross-project permissions)

Data flows

Events: Product apps → Pub/Sub → BigQuery (raw + curated)

Causation/insights: Analytics Exec reads BigQuery → writes Insight Objects to:

BigQuery tables (truth)

GCS artifacts (reports)

Firestore (summary pointers for UI)

Marketing: Marketing Exec pulls Insight Objects + Company Brain → generates campaigns → publishes via Missinglettr/social APIs; stores outputs in GCS + metadata in Firestore
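To make the flow concrete, here is a hypothetical Insight Object plus a minimal pre-persist check. Field names mirror the `analytics.write_insight` contract; the values and the `validate_insight` helper are illustrative only:

```python
# Illustrative Insight Object as it flows BigQuery -> Firestore/GCS pointers.
insight = {
    "type": "funnel_drop",
    "title": "Checkout step 2 conversion fell week-over-week",
    "summary": "Conversion from cart to payment dropped sharply in the last 14 days.",
    "severity": "high",
    "confidence": 0.82,
    "window": {"range_days": 14},
    "recommendations": [
        {
            "action": "Review the latest pricing-page change",
            "rationale": "Drop coincides with the most recent deploy",
        }
    ],
}

REQUIRED = ("type", "title", "summary", "severity", "confidence", "window", "recommendations")

def validate_insight(obj):
    """Return a list of problems; empty list means the object is persistable."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in obj]
    if not 0 <= obj.get("confidence", -1) <= 1:
        errors.append("confidence must be in [0, 1]")
    if not obj.get("recommendations"):
        errors.append("at least one recommendation required")
    return errors
```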

2. Service-by-service IAM roles matrix (least privilege template)

Identities (service accounts)

You'll typically have:

sa-control-plane (platform): routes tool calls, enforces policy, writes run metadata/artifacts

sa-deploy-executor (platform): triggers builds and deploys to Cloud Run in product projects

sa-analytics-executor (product): reads BigQuery + writes insights

sa-firestore-executor (product): reads/writes Company Brain + configs

sa-sql-executor (product): connects to Cloud SQL (plus DB user for SQL-level permissions)

sa-marketing-executor (platform or product): reads insights + calls Missinglettr/email providers; reads secrets

Where I say “product project”, apply it to each env project (dev/staging/prod).

IAM matrix (by service)

| Service / Scope | Principal | Roles (suggested) | Notes |
| --- | --- | --- | --- |
| Cloud Run (product) | sa-deploy-executor | roles/run.admin (or narrower), roles/iam.serviceAccountUser (only on the runtime SA), roles/run.invoker (optional) | Deploy revisions. Narrow iam.serviceAccountUser to only the runtime SA used by the service being deployed. |
| Cloud Build (platform or product) | sa-deploy-executor | roles/cloudbuild.builds.editor (or builds.builder depending on workflow) | Triggers builds. Many teams keep builds centralized in platform. |
| Artifact Registry | sa-deploy-executor | roles/artifactregistry.writer | Push images. If per-product registries, scope accordingly. |
| Secret Manager (platform/product) | sa-marketing-executor, sa-deploy-executor | roles/secretmanager.secretAccessor | Only for the specific secrets needed. |
| BigQuery dataset (product) | sa-analytics-executor | roles/bigquery.dataViewer + roles/bigquery.jobUser | Dataset-level grants. Prefer views/curated datasets. |
| BigQuery dataset (product write) | sa-analytics-executor | roles/bigquery.dataEditor (only for insight tables dataset) | Separate datasets: events_raw (read), events_curated (read), insights (write). |
| Firestore (product) | sa-firestore-executor | roles/datastore.user (or roles/datastore.viewer) | Use viewer when possible; writer only for Brain/config updates. |
| Cloud SQL (product) | sa-sql-executor | roles/cloudsql.client | IAM to connect; SQL permissions handled by DB user(s). |
| Pub/Sub (product) | Producers | roles/pubsub.publisher | For product services emitting events. |
| Pub/Sub (product) | Consumers/executors | roles/pubsub.subscriber | For analytics/executor ingestion. |
| Cloud Tasks (product/platform) | sa-control-plane or orchestrator | roles/cloudtasks.enqueuer + roles/cloudtasks.viewer | If you queue tool runs or retries. |
| Workflows (product/platform) | sa-control-plane | roles/workflows.invoker | For orchestrated multi-step automations. |
| Cloud Storage (GCS artifacts) | sa-control-plane | roles/storage.objectAdmin (bucket-level) | Write run artifacts; consider objectCreator + separate delete policy if you want immutability. |
| Cloud Run executors (wherever hosted) | sa-control-plane | roles/run.invoker | Control Plane calls executors over HTTP. |

Strongly recommended scoping rules

Grant BigQuery roles at the dataset level, not project level.

Use separate datasets for raw, curated, and insights.

For Cloud SQL, enforce read-only DB users for most endpoints; create a separate writer user only when needed.

Keep a “high risk” policy that requires approval for:

pricing changes

billing actions

production destructive infra

legal/claim-heavy marketing copy
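A sketch of how that approval gate might sit in the Control Plane. The category names and the `requires_human_approval` helper are hypothetical, not an existing API:

```python
# Categories that always require a human sign-off, per the policy above.
APPROVAL_REQUIRED = {
    "pricing_change",
    "billing_action",
    "legal_sensitive_marketing_copy",
}

def requires_human_approval(action_category, env):
    """Gate high-risk actions; destructive infra only gates production."""
    if action_category == "prod_destructive_infra":
        return env == "prod"
    return action_category in APPROVAL_REQUIRED
```

The Control Plane would consult this before dispatching a tool call, queueing a review task instead of calling the executor when it returns True.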

3. Agent tool catalog (seed tool registry mapped to GCP services)

This is a starter “tool universe” your Supervisor AI + IDE can call. I've grouped the tools by module and listed the backing GCP service.

A) Code module (build/test/deploy)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| repo.apply_patch | Apply diff to repo (local or PR flow) | Control Plane / Repo service | GitHub App or local workspace |
| repo.open_pr | Open PR with changes | Control Plane | GitHub App |
| build.run_tests | Run unit tests | Executor (local/offline or remote) | Cloud Build / local runner |
| cloudrun.deploy_service | Build + deploy service | Deploy Exec | Cloud Build + Cloud Run |
| cloudrun.rollback_service | Roll back revision | Deploy Exec | Cloud Run |
| cloudrun.get_service_status | Health, revisions, URL | Deploy Exec | Cloud Run |
| logs.tail | Tail logs for service/run | Observability Exec | Cloud Logging |

B) Marketing module (campaign creation + publishing)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| brand.get_profile | Fetch voice/style/claims | Firestore Exec | Firestore |
| brand.update_profile | Update voice/style rules | Firestore Exec | Firestore |
| marketing.generate_campaign_plan | Create campaign plan from insight/product update | Marketing Exec | Vertex AI (Gemini) |
| marketing.generate_channel_posts | Generate platform-specific posts | Marketing Exec | Vertex AI (Gemini) |
| marketing.publish_missinglettr | Schedule/publish via Missinglettr | Marketing Exec | Missinglettr API + Secret Manager |
| marketing.publish_email | Send email campaign | Marketing Exec | Email provider (SendGrid/etc) + Secret Manager |
| marketing.store_assets | Save creatives/outputs | Marketing Exec | GCS |
| marketing.get_campaign_status | Poll publish status | Marketing Exec | Missinglettr / provider APIs |

C) Analytics module (events, funnels, causation)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| events.ingest | Ingest events (if you own ingestion endpoint) | Analytics/Ingress Exec | Pub/Sub + BigQuery |
| analytics.funnel_summary | Funnel metrics | Analytics Exec | BigQuery |
| analytics.cohort_retention | Retention cohorts | Analytics Exec | BigQuery |
| analytics.anomaly_detect | Detect anomalies in KPIs | Analytics Exec | BigQuery / BQML |
| analytics.top_drivers | Feature/sequence drivers | Analytics Exec | BigQuery / BQML / Vertex |
| analytics.causal_uplift | Uplift/causal impact estimate | Analytics Exec | BigQuery + Vertex (optional) |
| analytics.write_insight | Persist insight object | Analytics Exec | BigQuery + Firestore pointer + GCS artifact |

D) Growth module (onboarding + lifecycle optimization)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| growth.identify_dropoffs | Identify where users drop | Analytics Exec | BigQuery |
| growth.propose_experiment | Generate experiment hypothesis/design | Growth Exec | Gemini + policies |
| experiments.create | Create experiment definition | Experiments Exec | Firestore/SQL + your assignment service |
| experiments.evaluate | Evaluate results | Analytics/Experiments Exec | BigQuery |
| growth.generate_lifecycle_messages | Draft onboarding/lifecycle content | Marketing/Growth Exec | Gemini |

E) Support module (feedback + ticket assist)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| support.ingest_tickets | Pull tickets from provider | Support Exec | Zendesk/Intercom API |
| support.summarize_ticket | Summarize and classify | Support Exec | Gemini |
| support.draft_reply | Draft response | Support Exec | Gemini + brand profile |
| support.update_kb | Generate/update KB article | Support Exec | CMS/Docs + GCS |
| support.escalate_issue | Create issue/task | Support Exec | GitHub Issues/Jira/etc |

F) Infrastructure module (safe, templated ops only)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| infra.provision_service_template | Create a Cloud Run service template | Infra Exec | Terraform/Cloud APIs |
| infra.provision_database | Create Cloud SQL/Firestore config | Infra Exec | Cloud SQL / Firestore |
| infra.provision_pubsub | Topics/subscriptions | Infra Exec | Pub/Sub |
| infra.rotate_secret | Rotate/refresh secrets | Infra Exec | Secret Manager |
| infra.cost_report | Cost summary for product | Analytics/FinOps Exec | Billing export → BigQuery |

G) Auth module (product auth + internal access)

| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| auth.configure_identity_platform | Set up end-user auth | Auth/Infra Exec | Identity Platform |
| auth.configure_iap | Protect internal tools | Infra Exec | IAP |
| auth.create_oauth_client | Create OAuth creds for integrations | Infra Exec | Google OAuth / Secret Manager |

Recommended “v1 tool registry seed” (small, shippable)

If you want a tight first release, seed only these tools:

Code

cloudrun.deploy_service

cloudrun.get_service_status

logs.tail

Company Brain

brand.get_profile

brand.update_profile

Analytics

analytics.funnel_summary

analytics.top_drivers

analytics.write_insight

Marketing

marketing.generate_channel_posts

marketing.publish_missinglettr

That's enough to demonstrate the full Product OS loop: deploy → events → insight → campaign → publish → measure.

If you want, I can also output this as an actual tool-registry.yaml starter file (matching the schema we drafted earlier) with concrete JSON Schemas for each of the v1 tools.

Yes, go for it.

Here's a starter tool-registry.yaml you can drop into contracts/tool-registry.yaml. It matches the schema we defined earlier and includes concrete JSON Schemas for the v1 seed tools:

cloudrun.deploy_service

cloudrun.get_service_status

logs.tail

brand.get_profile

brand.update_profile

analytics.funnel_summary

analytics.top_drivers

analytics.write_insight

marketing.generate_channel_posts

marketing.publish_missinglettr

Replace the executor.url placeholders with your actual Cloud Run service URLs.

```yaml
version: 1

tools:
  # ----------------------------
  # CODE / DEPLOYMENT
  # ----------------------------

  cloudrun.deploy_service:
    description: Build and deploy a Cloud Run service using Cloud Build. Returns the service URL and deployed revision.
    risk: medium
    executor:
      kind: http
      url: https://deploy-executor-REPLACE.a.run.app
      path: /execute/cloudrun/deploy
    inputSchema:
      type: object
      additionalProperties: false
      required: [service_name, repo, ref, env, region]
      properties:
        service_name:
          type: string
          minLength: 1
          description: Cloud Run service name.
        repo:
          type: string
          minLength: 1
          description: Git repo URL (HTTPS).
        ref:
          type: string
          minLength: 1
          description: Git ref (branch/tag/SHA).
        env:
          type: string
          enum: [dev, staging, prod]
        region:
          type: string
          minLength: 1
          description: GCP region for the Cloud Run service (e.g., us-central1).
        build:
          type: object
          additionalProperties: false
          properties:
            dockerfile_path:
              type: string
              default: Dockerfile
            build_context:
              type: string
              default: "."
            env_vars:
              type: object
              additionalProperties:
                type: string
              description: Environment variables to set during build/deploy (non-secret).
        deploy:
          type: object
          additionalProperties: false
          properties:
            cpu:
              type: string
              description: Cloud Run CPU (e.g., "1", "2").
            memory:
              type: string
              description: Cloud Run memory (e.g., "512Mi", "1Gi").
            min_instances:
              type: integer
              minimum: 0
            max_instances:
              type: integer
              minimum: 1
            concurrency:
              type: integer
              minimum: 1
            timeout_seconds:
              type: integer
              minimum: 1
              maximum: 3600
            service_account_email:
              type: string
              description: Runtime service account email for the Cloud Run service.
            allow_unauthenticated:
              type: boolean
              default: false
    outputSchema:
      type: object
      additionalProperties: false
      required: [service_url, revision]
      properties:
        service_url:
          type: string
        revision:
          type: string
        build_id:
          type: string
        warnings:
          type: array
          items:
            type: string

  cloudrun.get_service_status:
    description: Fetch Cloud Run service status including latest revision and URL.
    risk: low
    executor:
      kind: http
      url: https://deploy-executor-REPLACE.a.run.app
      path: /execute/cloudrun/status
    inputSchema:
      type: object
      additionalProperties: false
      required: [service_name, region]
      properties:
        service_name:
          type: string
          minLength: 1
        region:
          type: string
          minLength: 1
    outputSchema:
      type: object
      additionalProperties: false
      required: [service_name, region, service_url, latest_ready_revision, status]
      properties:
        service_name:
          type: string
        region:
          type: string
        service_url:
          type: string
        latest_ready_revision:
          type: string
        status:
          type: string
          enum: [ready, deploying, error, unknown]
        last_deploy_time:
          type: string
          description: ISO timestamp if available.

  logs.tail:
    description: Tail recent logs for a Cloud Run service or for a specific run_id. Returns log lines (best-effort).
    risk: low
    executor:
      kind: http
      url: https://observability-executor-REPLACE.a.run.app
      path: /execute/logs/tail
    inputSchema:
      type: object
      additionalProperties: false
      required: [scope, limit]
      properties:
        scope:
          type: string
          enum: [service, run]
          description: Tail logs by service or by tool run.
        service_name:
          type: string
          description: Required if scope=service.
        region:
          type: string
          description: Optional when scope=service, depending on your log query strategy.
        run_id:
          type: string
          description: Required if scope=run.
        limit:
          type: integer
          minimum: 1
          maximum: 2000
          default: 200
        since_seconds:
          type: integer
          minimum: 1
          maximum: 86400
          default: 900
    outputSchema:
      type: object
      additionalProperties: false
      required: [lines]
      properties:
        lines:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [timestamp, text]
            properties:
              timestamp:
                type: string
              severity:
                type: string
              text:
                type: string

  # ----------------------------
  # COMPANY BRAIN (BRAND + STYLE)
  # ----------------------------

  brand.get_profile:
    description: Retrieve the tenant's brand profile (voice, tone, positioning, compliance constraints).
    risk: low
    executor:
      kind: http
      url: https://firestore-executor-REPLACE.a.run.app
      path: /execute/brand/get_profile
    inputSchema:
      type: object
      additionalProperties: false
      required: [profile_id]
      properties:
        profile_id:
          type: string
          minLength: 1
          description: Brand profile identifier (e.g., "default").
    outputSchema:
      type: object
      additionalProperties: false
      required: [profile_id, brand]
      properties:
        profile_id:
          type: string
        brand:
          type: object
          additionalProperties: false
          required: [name, voice, audience, claims_policy]
          properties:
            name:
              type: string
            voice:
              type: object
              additionalProperties: false
              required: [tone, style_notes, do, dont]
              properties:
                tone:
                  type: array
                  items: { type: string }
                style_notes:
                  type: array
                  items: { type: string }
                do:
                  type: array
                  items: { type: string }
                dont:
                  type: array
                  items: { type: string }
            audience:
              type: object
              additionalProperties: false
              properties:
                primary:
                  type: string
                secondary:
                  type: string
            claims_policy:
              type: object
              additionalProperties: false
              properties:
                forbidden_claims:
                  type: array
                  items: { type: string }
                required_disclaimers:
                  type: array
                  items: { type: string }
                compliance_notes:
                  type: array
                  items: { type: string }

  brand.update_profile:
    description: Update the tenant's brand profile. Write operations should be validated and audited.
    risk: medium
    executor:
      kind: http
      url: https://firestore-executor-REPLACE.a.run.app
      path: /execute/brand/update_profile
    inputSchema:
      type: object
      additionalProperties: false
      required: [profile_id, patch]
      properties:
        profile_id:
          type: string
          minLength: 1
        patch:
          type: object
          description: Partial update object; executor must validate allowed fields.
    outputSchema:
      type: object
      additionalProperties: false
      required: [ok, updated_at]
      properties:
        ok:
          type: boolean
        updated_at:
          type: string

  # ----------------------------
  # ANALYTICS / CAUSATION (V1 metrics + drivers)
  # ----------------------------

  analytics.funnel_summary:
    description: Return funnel metrics for a time window. Uses curated events in BigQuery.
    risk: low
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/funnel_summary
    inputSchema:
      type: object
      additionalProperties: false
      required: [range_days, funnel]
      properties:
        range_days:
          type: integer
          minimum: 1
          maximum: 365
        funnel:
          type: object
          additionalProperties: false
          required: [name, steps]
          properties:
            name:
              type: string
            steps:
              type: array
              minItems: 2
              items:
                type: object
                additionalProperties: false
                required: [event_name]
                properties:
                  event_name:
                    type: string
                  filter:
                    type: object
                    description: Optional event property filters (executor-defined).
        segment:
          type: object
          description: Optional segment definition (executor-defined).
    outputSchema:
      type: object
      additionalProperties: false
      required: [funnel_name, range_days, steps]
      properties:
        funnel_name:
          type: string
        range_days:
          type: integer
        steps:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [event_name, users, conversion_from_prev]
            properties:
              event_name:
                type: string
              users:
                type: integer
                minimum: 0
              conversion_from_prev:
                type: number
                minimum: 0
                maximum: 1

  analytics.top_drivers:
    description: "Identify top correlated drivers for a target metric/event (v1: correlation/feature importance; later: causality)."
    risk: low
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/top_drivers
    inputSchema:
      type: object
      additionalProperties: false
      required: [range_days, target]
      properties:
        range_days:
          type: integer
          minimum: 1
          maximum: 365
        target:
          type: object
          additionalProperties: false
          required: [metric]
          properties:
            metric:
              type: string
              description: Named metric (e.g., "trial_to_paid", "activation_rate") or event-based metric.
            event_name:
              type: string
              description: "Optional: if metric is event-based, supply event_name."
        candidate_features:
          type: array
          items:
            type: string
          description: Optional list of features/properties to consider.
        segment:
          type: object
          description: Optional segmentation.
    outputSchema:
      type: object
      additionalProperties: false
      required: [target, range_days, drivers]
      properties:
        target:
          type: object
        range_days:
          type: integer
        drivers:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [name, score, direction, evidence]
            properties:
              name:
                type: string
              score:
                type: number
              direction:
                type: string
                enum: [positive, negative, mixed, unknown]
              evidence:
                type: string
                description: Human-readable summary of why this driver matters.
              confidence:
                type: number
                minimum: 0
                maximum: 1

  analytics.write_insight:
    description: Persist an insight object (BigQuery table + Firestore pointer + GCS artifact). Returns an insight_id.
    risk: medium
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/write_insight
    inputSchema:
      type: object
      additionalProperties: false
      required: [insight]
      properties:
        insight:
          type: object
          additionalProperties: false
          required: [type, title, summary, severity, confidence, window, recommendations]
          properties:
            type:
              type: string
              enum: [funnel_drop, anomaly, driver, experiment_result, general]
            title:
              type: string
            summary:
              type: string
            severity:
              type: string
              enum: [info, low, medium, high, critical]
            confidence:
              type: number
              minimum: 0
              maximum: 1
            window:
              type: object
              additionalProperties: false
              required: [range_days]
              properties:
                range_days:
                  type: integer
                  minimum: 1
                  maximum: 365
            context:
              type: object
              description: Arbitrary structured context (metric names, segments, charts pointers).
            recommendations:
              type: array
              minItems: 1
              items:
                type: object
                additionalProperties: false
                required: [action, rationale]
                properties:
                  action:
                    type: string
                  rationale:
                    type: string
            links:
              type: array
              items:
                type: object
                additionalProperties: false
                required: [label, url]
                properties:
                  label: { type: string }
                  url: { type: string }
    outputSchema:
      type: object
      additionalProperties: false
      required: [insight_id, stored]
      properties:
        insight_id:
          type: string
        stored:
          type: object
          additionalProperties: false
          required: [bigquery, firestore, gcs]
          properties:
            bigquery:
              type: object
              additionalProperties: false
              required: [dataset, table]
              properties:
                dataset: { type: string }
                table: { type: string }
            firestore:
              type: object
              additionalProperties: false
              required: [collection, doc_id]
              properties:
                collection: { type: string }
                doc_id: { type: string }
            gcs:
              type: object
              additionalProperties: false
              required: [bucket, prefix]
              properties:
                bucket: { type: string }
                prefix: { type: string }

  # ----------------------------
  # MARKETING (GENERATION + PUBLISH)
  # ----------------------------

  marketing.generate_channel_posts:
    description: Generate platform-specific social posts from a campaign brief + brand profile.
    risk: low
    executor:
      kind: http
      url: https://marketing-executor-REPLACE.a.run.app
      path: /execute/marketing/generate_channel_posts
    inputSchema:
      type: object
      additionalProperties: false
      required: [brief, channels, brand_profile_id]
      properties:
        brand_profile_id:
          type: string
          description: Brand profile id to load (e.g., "default").
        brief:
          type: object
          additionalProperties: false
          required: [goal, product, audience, key_points]
          properties:
            goal:
              type: string
              description: What outcome are we driving? (e.g., "trial signups")
            product:
              type: string
            audience:
              type: string
            key_points:
              type: array
              minItems: 1
              items: { type: string }
            offer:
              type: string
            call_to_action:
              type: string
            landing_page_url:
              type: string
        channels:
          type: array
          minItems: 1
          items:
            type: string
            enum: [x, linkedin, facebook, instagram, tiktok, youtube, pinterest, reddit, google_business, mastodon, bluesky, threads]
        variations_per_channel:
          type: integer
          minimum: 1
          maximum: 10
          default: 3
        constraints:
          type: object
          additionalProperties: false
          properties:
            max_length:
              type: integer
              minimum: 50
              maximum: 4000
            emoji_level:
              type: string
              enum: [none, light, medium, heavy]
              default: light
            include_hashtags:
              type: boolean
              default: true
    outputSchema:
      type: object
      additionalProperties: false
      required: [channels]
      properties:
        channels:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [channel, posts]
            properties:
              channel:
                type: string
              posts:
                type: array
                items:
                  type: object
                  additionalProperties: false
                  required: [text]
                  properties:
                    text: { type: string }
                    title: { type: string }
                    alt_text: { type: string }
                    hashtags:
                      type: array
                      items: { type: string }
                    media_suggestions:
                      type: array
                      items: { type: string }

  marketing.publish_missinglettr:
    description: Publish or schedule a campaign via Missinglettr using stored OAuth/token secrets.
    risk: medium
    executor:
      kind: http
      url: https://marketing-executor-REPLACE.a.run.app
      path: /execute/marketing/publish_missinglettr
    inputSchema:
      type: object
      additionalProperties: false
      required: [campaign, schedule]
      properties:
        campaign:
          type: object
          additionalProperties: false
          required: [name, posts]
          properties:
            name:
              type: string
            posts:
              type: array
              minItems: 1
              items:
                type: object
                additionalProperties: false
                required: [channel, text]
                properties:
                  channel:
                    type: string
                    enum: [x, linkedin, facebook, instagram, tiktok, youtube, pinterest, reddit, google_business, mastodon, bluesky, threads]
                  text:
                    type: string
                  media_urls:
                    type: array
                    items: { type: string }
                  link_url:
                    type: string
        schedule:
          type: object
          additionalProperties: false
          required: [mode]
          properties:
            mode:
              type: string
              enum: [now, scheduled]
            start_time:
              type: string
              description: ISO timestamp required if mode=scheduled.
            timezone:
              type: string
              default: UTC
        idempotency_key:
          type: string
          description: Optional idempotency key to prevent duplicates.
    outputSchema:
      type: object
      additionalProperties: false
      required: [provider, campaign_id, status]
      properties:
        provider:
          type: string
          enum: [missinglettr]
        campaign_id:
          type: string
        status:
          type: string
          enum: [queued, scheduled, published, failed]
        provider_response:
          type: object
          description: Raw provider response (redacted as needed).
```
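Finally, a sketch of how the Control Plane could enforce a tool's inputSchema before dispatch. This hand-rolls only the `required` and `enum` checks against a pared-down copy of the cloudrun.deploy_service schema; a real implementation would run a full JSON Schema validator against the registry entry.

```python
# Pared-down subset of the cloudrun.deploy_service inputSchema (illustrative).
DEPLOY_INPUT_SCHEMA = {
    "required": ["service_name", "repo", "ref", "env", "region"],
    "enums": {"env": ["dev", "staging", "prod"]},
}

def check_input(schema, payload):
    """Return a list of violations; empty list means the call may be dispatched."""
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in payload]
    for field, allowed in schema.get("enums", {}).items():
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field} must be one of {allowed}")
    return errors
```

Rejecting malformed calls at the Control Plane keeps executors simple: by the time a request reaches an executor, its shape is already known-good.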