master-ai/architecture.md
2026-01-21 15:35:57 -08:00

1) Recommended reference architecture (Web SaaS-first, 1 product = 1 GCP project per env)
Project model
One product = one GCP project per environment
product-foo-dev
product-foo-staging
product-foo-prod
Optional “platform” projects (yours, not the customers’):
productos-control-plane (your backend + tool registry + auth)
productos-observability (optional central dashboards / cross-product rollups)
productos-billing-export (optional BigQuery billing export aggregation)
High-level runtime pattern
IDE + Supervisor AI never touch DBs/services directly.
They call your Control Plane API, which routes to domain Executors (Cloud Run services) with least-privilege service accounts.
```
VSCodium IDE (Product OS UI)      Supervisor AI (Vertex)
              \                          /
               \                        /
                v                      v
                 Control Plane API
                        |
    +---------+---------+---------+--------------+
    |         |         |         |              |
 Deploy    Analytics Firestore   SQL         Marketing
  Exec       Exec      Exec      Exec           Exec
(Cloud     (BigQuery (Company  (Cloud      (Missinglettr,
 Build +     jobs)     Brain)    SQL)       email provider)
 Cloud Run)
```
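The routing step above can be sketched in a few lines: the Control Plane keeps a tool → executor table and rejects anything it does not know. This is a hypothetical illustration; the tool names match the registry below, but the `Route` shape and URLs are placeholders, not a real API.

```python
# Hypothetical sketch of Control Plane routing: the IDE and Supervisor AI
# submit tool calls by name; the Control Plane resolves each to the executor
# service that owns it. URLs are placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    executor: str  # logical executor name
    url: str       # Cloud Run base URL (placeholder)

# Tool -> executor routing table (mirrors the diagram above).
ROUTES: dict[str, Route] = {
    "cloudrun.deploy_service": Route("deploy-exec", "https://deploy-executor-REPLACE.a.run.app"),
    "analytics.funnel_summary": Route("analytics-exec", "https://analytics-executor-REPLACE.a.run.app"),
    "brand.get_profile": Route("firestore-exec", "https://firestore-executor-REPLACE.a.run.app"),
    "marketing.publish_missinglettr": Route("marketing-exec", "https://marketing-executor-REPLACE.a.run.app"),
}

def route_tool_call(tool: str) -> Route:
    """Resolve a tool name to its executor; unknown tools are rejected."""
    if tool not in ROUTES:
        raise KeyError(f"unknown tool: {tool}")
    return ROUTES[tool]
```

Keeping this table in one place is what lets the IDE and Supervisor AI stay ignorant of databases and service URLs entirely.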
Per-product (customer) project: “product-foo-prod”
Must-have services
Cloud Run: product services + executors (if you deploy executors into product project)
Cloud SQL (Postgres/MySQL): transactional app data
Firestore: config + “Company Brain” + style profiles + run metadata (if you keep metadata per product)
BigQuery: event warehouse + analytics datasets/views + experimentation tables
Pub/Sub: event bus for product events + tool events
Cloud Tasks / Workflows / Scheduler: durable automation + cron-based routines
Secret Manager: tokens, DB creds, OAuth secrets (never in code)
Logging/Monitoring/Trace: observability
Where to place executors
Simplest: executors live in the product project (tight coupling, simple data access)
More “platform”: executors live in your platform project, and access product resources cross-project (strong central control, but more IAM + org policy considerations)
For your “product per project” approach, I recommend:
Deploy executor can live in platform (deploy across projects)
Data executors (SQL/Firestore/BigQuery) often live in product project (least-cross-project permissions)
Data flows
Events: Product apps → Pub/Sub → BigQuery (raw + curated)
Causation/insights: Analytics Exec reads BigQuery → writes Insight Objects to:
BigQuery tables (truth)
GCS artifacts (reports)
Firestore (summary pointers for UI)
Marketing: Marketing Exec pulls Insight Objects + Company Brain → generates campaigns → publishes via Missinglettr/social APIs; stores outputs in GCS + metadata in Firestore
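For the Events path (product app → Pub/Sub → BigQuery raw table), each event can travel as a small JSON envelope. The field names below are an assumption for illustration, not a fixed contract:

```python
# Illustrative event envelope for the Events data flow. Field names
# (event_id, event_name, occurred_at, ...) are assumptions, not a spec.
import json
import uuid
from datetime import datetime, timezone

def build_event(name: str, user_id: str, properties: dict) -> bytes:
    """Serialize one product event as a Pub/Sub message body."""
    envelope = {
        "event_id": str(uuid.uuid4()),                       # dedupe key in events_raw
        "event_name": name,                                  # e.g. "signup_completed"
        "user_id": user_id,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "properties": properties,                            # free-form, flattened downstream
    }
    return json.dumps(envelope).encode("utf-8")
```

A stable `event_id` per message makes the BigQuery ingestion idempotent, which matters once Pub/Sub's at-least-once delivery comes into play.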
2) Service-by-service IAM roles matrix (least privilege template)
Identities (service accounts)
You'll typically have:
sa-control-plane (platform): routes tool calls, enforces policy, writes run metadata/artifacts
sa-deploy-executor (platform): triggers builds and deploys to Cloud Run in product projects
sa-analytics-executor (product): reads BigQuery + writes insights
sa-firestore-executor (product): reads/writes Company Brain + configs
sa-sql-executor (product): connects to Cloud SQL (plus DB user for SQL-level permissions)
sa-marketing-executor (platform or product): reads insights + calls Missinglettr/email providers; reads secrets
Where I say “product project”, apply it to each env project (dev/staging/prod).
IAM matrix (by service)
| Service / Scope | Principal | Roles (suggested) | Notes |
| --- | --- | --- | --- |
| Cloud Run (product) | sa-deploy-executor | roles/run.admin (or narrower), roles/iam.serviceAccountUser (only on the runtime SA), roles/run.invoker (optional) | Deploy revisions. Narrow iam.serviceAccountUser to only the runtime SA used by the service being deployed. |
| Cloud Build (platform or product) | sa-deploy-executor | roles/cloudbuild.builds.editor (or builds.builder depending on workflow) | Triggers builds. Many teams keep builds centralized in platform. |
| Artifact Registry | sa-deploy-executor | roles/artifactregistry.writer | Push images. If per-product registries, scope accordingly. |
| Secret Manager (platform/product) | sa-marketing-executor, sa-deploy-executor | roles/secretmanager.secretAccessor | Only for the specific secrets needed. |
| BigQuery dataset (product) | sa-analytics-executor | roles/bigquery.dataViewer + roles/bigquery.jobUser | Dataset-level grants. Prefer views/curated datasets. |
| BigQuery dataset (product write) | sa-analytics-executor | roles/bigquery.dataEditor (only for insight tables dataset) | Separate datasets: events_raw (read), events_curated (read), insights (write). |
| Firestore (product) | sa-firestore-executor | roles/datastore.user (or roles/datastore.viewer) | Use viewer when possible; writer only for Brain/config updates. |
| Cloud SQL (product) | sa-sql-executor | roles/cloudsql.client | IAM to connect; SQL permissions handled by DB user(s). |
| Pub/Sub (product) | Producers | roles/pubsub.publisher | For product services emitting events. |
| Pub/Sub (product) | Consumers/executors | roles/pubsub.subscriber | For analytics/executor ingestion. |
| Cloud Tasks (product/platform) | sa-control-plane or orchestrator | roles/cloudtasks.enqueuer + roles/cloudtasks.viewer | If you queue tool runs or retries. |
| Workflows (product/platform) | sa-control-plane | roles/workflows.invoker | For orchestrated multi-step automations. |
| Cloud Storage (GCS artifacts) | sa-control-plane | roles/storage.objectAdmin (bucket-level) | Write run artifacts; consider objectCreator + separate delete policy if you want immutability. |
| Cloud Run executors (wherever hosted) | sa-control-plane | roles/run.invoker | Control Plane calls executors over HTTP. |
Strongly recommended scoping rules
Grant BigQuery roles at the dataset level, not project level.
Use separate datasets for raw, curated, and insights.
For Cloud SQL, enforce read-only DB users for most endpoints; create a separate writer user only when needed.
Keep a “high risk” policy that requires approval for:
pricing changes
billing actions
production destructive infra
legal/claim-heavy marketing copy
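The high-risk policy above can be enforced as a small gate the Control Plane runs before dispatch. The category names here are illustrative placeholders; the rule itself (listed categories always need a human, regardless of the tool's declared risk) is what matters:

```python
# Hypothetical policy gate: tool runs that touch a high-risk category always
# require human approval, no matter what risk level the tool declares.
# Category names are illustrative.
HIGH_RISK_CATEGORIES = {
    "pricing_change",
    "billing_action",
    "prod_destructive_infra",
    "claim_heavy_marketing_copy",
}

def requires_approval(tool_risk: str, categories: set[str]) -> bool:
    """Return True if a human must approve this tool run."""
    if categories & HIGH_RISK_CATEGORIES:
        return True
    return tool_risk == "high"
```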
3) Agent tool catalog (seed tool registry mapped to GCP services)
This is a starter “tool universe” your Supervisor AI + IDE can call. I've grouped the tools by module and listed the backing GCP service for each.
A) Code module (build/test/deploy)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| repo.apply_patch | Apply diff to repo (local or PR flow) | Control Plane / Repo service | GitHub App or local workspace |
| repo.open_pr | Open PR with changes | Control Plane | GitHub App |
| build.run_tests | Run unit tests | Executor (local/offline or remote) | Cloud Build / local runner |
| cloudrun.deploy_service | Build + deploy service | Deploy Exec | Cloud Build + Cloud Run |
| cloudrun.rollback_service | Roll back revision | Deploy Exec | Cloud Run |
| cloudrun.get_service_status | Health, revisions, URL | Deploy Exec | Cloud Run |
| logs.tail | Tail logs for service/run | Observability Exec | Cloud Logging |
B) Marketing module (campaign creation + publishing)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| brand.get_profile | Fetch voice/style/claims | Firestore Exec | Firestore |
| brand.update_profile | Update voice/style rules | Firestore Exec | Firestore |
| marketing.generate_campaign_plan | Create campaign plan from insight/product update | Marketing Exec | Vertex AI (Gemini) |
| marketing.generate_channel_posts | Generate platform-specific posts | Marketing Exec | Vertex AI (Gemini) |
| marketing.publish_missinglettr | Schedule/publish via Missinglettr | Marketing Exec | Missinglettr API + Secret Manager |
| marketing.publish_email | Send email campaign | Marketing Exec | Email provider (SendGrid/etc) + Secret Manager |
| marketing.store_assets | Save creatives/outputs | Marketing Exec | GCS |
| marketing.get_campaign_status | Poll publish status | Marketing Exec | Missinglettr / provider APIs |
C) Analytics module (events, funnels, causation)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| events.ingest | Ingest events (if you own ingestion endpoint) | Analytics/Ingress Exec | Pub/Sub + BigQuery |
| analytics.funnel_summary | Funnel metrics | Analytics Exec | BigQuery |
| analytics.cohort_retention | Retention cohorts | Analytics Exec | BigQuery |
| analytics.anomaly_detect | Detect anomalies in KPIs | Analytics Exec | BigQuery / BQML |
| analytics.top_drivers | Feature/sequence drivers | Analytics Exec | BigQuery / BQML / Vertex |
| analytics.causal_uplift | Uplift/causal impact estimate | Analytics Exec | BigQuery + Vertex (optional) |
| analytics.write_insight | Persist insight object | Analytics Exec | BigQuery + Firestore pointer + GCS artifact |
D) Growth module (onboarding + lifecycle optimization)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| growth.identify_dropoffs | Identify where users drop | Analytics Exec | BigQuery |
| growth.propose_experiment | Generate experiment hypothesis/design | Growth Exec | Gemini + policies |
| experiments.create | Create experiment definition | Experiments Exec | Firestore/SQL + your assignment service |
| experiments.evaluate | Evaluate results | Analytics/Experiments Exec | BigQuery |
| growth.generate_lifecycle_messages | Draft onboarding/lifecycle content | Marketing/Growth Exec | Gemini |
E) Support module (feedback + ticket assist)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| support.ingest_tickets | Pull tickets from provider | Support Exec | Zendesk/Intercom API |
| support.summarize_ticket | Summarize and classify | Support Exec | Gemini |
| support.draft_reply | Draft response | Support Exec | Gemini + brand profile |
| support.update_kb | Generate/update KB article | Support Exec | CMS/Docs + GCS |
| support.escalate_issue | Create issue/task | Support Exec | GitHub Issues/Jira/etc |
F) Infrastructure module (safe, templated ops only)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| infra.provision_service_template | Create a Cloud Run service template | Infra Exec | Terraform/Cloud APIs |
| infra.provision_database | Create Cloud SQL/Firestore config | Infra Exec | Cloud SQL / Firestore |
| infra.provision_pubsub | Topics/subscriptions | Infra Exec | Pub/Sub |
| infra.rotate_secret | Rotate/refresh secrets | Infra Exec | Secret Manager |
| infra.cost_report | Cost summary for product | Analytics/FinOps Exec | Billing export → BigQuery |
G) Auth module (product auth + internal access)
| Tool name | Purpose | Executes in | Backed by |
| --- | --- | --- | --- |
| auth.configure_identity_platform | Set up end-user auth | Auth/Infra Exec | Identity Platform |
| auth.configure_iap | Protect internal tools | Infra Exec | IAP |
| auth.create_oauth_client | Create OAuth creds for integrations | Infra Exec | Google OAuth / Secret Manager |
Recommended “v1 tool registry seed” (small, shippable)
If you want a tight first release, seed only these tools:
Code
cloudrun.deploy_service
cloudrun.get_service_status
logs.tail
Company Brain
brand.get_profile
brand.update_profile
Analytics
analytics.funnel_summary
analytics.top_drivers
analytics.write_insight
Marketing
marketing.generate_channel_posts
marketing.publish_missinglettr
That's enough to demonstrate the full Product OS loop:
deploy → events → insight → campaign → publish → measure.
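The loop can be written down as an ordered plan over the seed tools. This is a sketch only: the mapping of loop stages to tool names is an assumption, and the executor call is stubbed out behind a callback.

```python
# Sketch of the v1 Product OS loop as an ordered plan of seed-tool calls.
# Stage -> tool mapping is illustrative; `call` stands in for the Control
# Plane POSTing to the owning executor.
V1_LOOP = [
    ("cloudrun.deploy_service", "deploy"),
    ("analytics.funnel_summary", "measure events"),
    ("analytics.write_insight", "persist insight"),
    ("marketing.generate_channel_posts", "draft campaign"),
    ("marketing.publish_missinglettr", "publish"),
]

def run_loop(call):
    """Execute each step via call(tool_name); stop on the first failure."""
    results = []
    for tool, stage in V1_LOOP:
        ok = call(tool)
        results.append((stage, ok))
        if not ok:
            break
    return results
```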
Below is a starter tool-registry.yaml you can drop into contracts/tool-registry.yaml. It matches the schema drafted earlier and includes concrete JSON Schemas for each of the v1 seed tools:
cloudrun.deploy_service
cloudrun.get_service_status
logs.tail
brand.get_profile
brand.update_profile
analytics.funnel_summary
analytics.top_drivers
analytics.write_insight
marketing.generate_channel_posts
marketing.publish_missinglettr
Replace the executor.url placeholders with your actual Cloud Run service URLs.
```yaml
version: 1
tools:
  # ----------------------------
  # CODE / DEPLOYMENT
  # ----------------------------
  cloudrun.deploy_service:
    description: Build and deploy a Cloud Run service using Cloud Build. Returns the service URL and deployed revision.
    risk: medium
    executor:
      kind: http
      url: https://deploy-executor-REPLACE.a.run.app
      path: /execute/cloudrun/deploy
    inputSchema:
      type: object
      additionalProperties: false
      required: [service_name, repo, ref, env, region]
      properties:
        service_name:
          type: string
          minLength: 1
          description: Cloud Run service name.
        repo:
          type: string
          minLength: 1
          description: Git repo URL (HTTPS).
        ref:
          type: string
          minLength: 1
          description: Git ref (branch/tag/SHA).
        env:
          type: string
          enum: [dev, staging, prod]
        region:
          type: string
          minLength: 1
          description: GCP region for the Cloud Run service (e.g., us-central1).
        build:
          type: object
          additionalProperties: false
          properties:
            dockerfile_path:
              type: string
              default: Dockerfile
            build_context:
              type: string
              default: .
            env_vars:
              type: object
              additionalProperties:
                type: string
              description: Environment variables to set during build/deploy (non-secret).
        deploy:
          type: object
          additionalProperties: false
          properties:
            cpu:
              type: string
              description: Cloud Run CPU (e.g., "1", "2").
            memory:
              type: string
              description: Cloud Run memory (e.g., "512Mi", "1Gi").
            min_instances:
              type: integer
              minimum: 0
            max_instances:
              type: integer
              minimum: 1
            concurrency:
              type: integer
              minimum: 1
            timeout_seconds:
              type: integer
              minimum: 1
              maximum: 3600
            service_account_email:
              type: string
              description: Runtime service account email for the Cloud Run service.
            allow_unauthenticated:
              type: boolean
              default: false
    outputSchema:
      type: object
      additionalProperties: false
      required: [service_url, revision]
      properties:
        service_url:
          type: string
        revision:
          type: string
        build_id:
          type: string
        warnings:
          type: array
          items:
            type: string

  cloudrun.get_service_status:
    description: Fetch Cloud Run service status including latest revision and URL.
    risk: low
    executor:
      kind: http
      url: https://deploy-executor-REPLACE.a.run.app
      path: /execute/cloudrun/status
    inputSchema:
      type: object
      additionalProperties: false
      required: [service_name, region]
      properties:
        service_name:
          type: string
          minLength: 1
        region:
          type: string
          minLength: 1
    outputSchema:
      type: object
      additionalProperties: false
      required: [service_name, region, service_url, latest_ready_revision, status]
      properties:
        service_name:
          type: string
        region:
          type: string
        service_url:
          type: string
        latest_ready_revision:
          type: string
        status:
          type: string
          enum: [ready, deploying, error, unknown]
        last_deploy_time:
          type: string
          description: ISO timestamp if available.

  logs.tail:
    description: Tail recent logs for a Cloud Run service or for a specific run_id. Returns log lines (best-effort).
    risk: low
    executor:
      kind: http
      url: https://observability-executor-REPLACE.a.run.app
      path: /execute/logs/tail
    inputSchema:
      type: object
      additionalProperties: false
      required: [scope, limit]
      properties:
        scope:
          type: string
          enum: [service, run]
          description: Tail logs by service or by tool run.
        service_name:
          type: string
          description: Required if scope=service.
        region:
          type: string
          description: Optional when scope=service, depending on your log query strategy.
        run_id:
          type: string
          description: Required if scope=run.
        limit:
          type: integer
          minimum: 1
          maximum: 2000
          default: 200
        since_seconds:
          type: integer
          minimum: 1
          maximum: 86400
          default: 900
    outputSchema:
      type: object
      additionalProperties: false
      required: [lines]
      properties:
        lines:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [timestamp, text]
            properties:
              timestamp:
                type: string
              severity:
                type: string
              text:
                type: string

  # ----------------------------
  # COMPANY BRAIN (BRAND + STYLE)
  # ----------------------------
  brand.get_profile:
    description: Retrieve the tenant's brand profile (voice, tone, positioning, compliance constraints).
    risk: low
    executor:
      kind: http
      url: https://firestore-executor-REPLACE.a.run.app
      path: /execute/brand/get_profile
    inputSchema:
      type: object
      additionalProperties: false
      required: [profile_id]
      properties:
        profile_id:
          type: string
          minLength: 1
          description: Brand profile identifier (e.g., "default").
    outputSchema:
      type: object
      additionalProperties: false
      required: [profile_id, brand]
      properties:
        profile_id:
          type: string
        brand:
          type: object
          additionalProperties: false
          required: [name, voice, audience, claims_policy]
          properties:
            name:
              type: string
            voice:
              type: object
              additionalProperties: false
              required: [tone, style_notes, do, dont]
              properties:
                tone:
                  type: array
                  items: { type: string }
                style_notes:
                  type: array
                  items: { type: string }
                do:
                  type: array
                  items: { type: string }
                dont:
                  type: array
                  items: { type: string }
            audience:
              type: object
              additionalProperties: false
              properties:
                primary:
                  type: string
                secondary:
                  type: string
            claims_policy:
              type: object
              additionalProperties: false
              properties:
                forbidden_claims:
                  type: array
                  items: { type: string }
                required_disclaimers:
                  type: array
                  items: { type: string }
                compliance_notes:
                  type: array
                  items: { type: string }

  brand.update_profile:
    description: Update the tenant's brand profile. Write operations should be validated and audited.
    risk: medium
    executor:
      kind: http
      url: https://firestore-executor-REPLACE.a.run.app
      path: /execute/brand/update_profile
    inputSchema:
      type: object
      additionalProperties: false
      required: [profile_id, patch]
      properties:
        profile_id:
          type: string
          minLength: 1
        patch:
          type: object
          description: Partial update object; executor must validate allowed fields.
    outputSchema:
      type: object
      additionalProperties: false
      required: [ok, updated_at]
      properties:
        ok:
          type: boolean
        updated_at:
          type: string

  # ----------------------------
  # ANALYTICS / CAUSATION (V1 metrics + drivers)
  # ----------------------------
  analytics.funnel_summary:
    description: Return funnel metrics for a time window. Uses curated events in BigQuery.
    risk: low
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/funnel_summary
    inputSchema:
      type: object
      additionalProperties: false
      required: [range_days, funnel]
      properties:
        range_days:
          type: integer
          minimum: 1
          maximum: 365
        funnel:
          type: object
          additionalProperties: false
          required: [name, steps]
          properties:
            name:
              type: string
            steps:
              type: array
              minItems: 2
              items:
                type: object
                additionalProperties: false
                required: [event_name]
                properties:
                  event_name:
                    type: string
                  filter:
                    type: object
                    description: Optional event property filters (executor-defined).
        segment:
          type: object
          description: Optional segment definition (executor-defined).
    outputSchema:
      type: object
      additionalProperties: false
      required: [funnel_name, range_days, steps]
      properties:
        funnel_name:
          type: string
        range_days:
          type: integer
        steps:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [event_name, users, conversion_from_prev]
            properties:
              event_name:
                type: string
              users:
                type: integer
                minimum: 0
              conversion_from_prev:
                type: number
                minimum: 0
                maximum: 1

  analytics.top_drivers:
    description: "Identify top correlated drivers for a target metric/event (v1: correlation/feature importance; later: causality)."
    risk: low
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/top_drivers
    inputSchema:
      type: object
      additionalProperties: false
      required: [range_days, target]
      properties:
        range_days:
          type: integer
          minimum: 1
          maximum: 365
        target:
          type: object
          additionalProperties: false
          required: [metric]
          properties:
            metric:
              type: string
              description: Named metric (e.g., "trial_to_paid", "activation_rate") or event-based metric.
            event_name:
              type: string
              description: "Optional: if metric is event-based, supply event_name."
        candidate_features:
          type: array
          items:
            type: string
          description: Optional list of features/properties to consider.
        segment:
          type: object
          description: Optional segmentation.
    outputSchema:
      type: object
      additionalProperties: false
      required: [target, range_days, drivers]
      properties:
        target:
          type: object
        range_days:
          type: integer
        drivers:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [name, score, direction, evidence]
            properties:
              name:
                type: string
              score:
                type: number
              direction:
                type: string
                enum: [positive, negative, mixed, unknown]
              evidence:
                type: string
                description: Human-readable summary of why this driver matters.
              confidence:
                type: number
                minimum: 0
                maximum: 1

  analytics.write_insight:
    description: Persist an insight object (BigQuery table + Firestore pointer + GCS artifact). Returns an insight_id.
    risk: medium
    executor:
      kind: http
      url: https://analytics-executor-REPLACE.a.run.app
      path: /execute/analytics/write_insight
    inputSchema:
      type: object
      additionalProperties: false
      required: [insight]
      properties:
        insight:
          type: object
          additionalProperties: false
          required: [type, title, summary, severity, confidence, window, recommendations]
          properties:
            type:
              type: string
              enum: [funnel_drop, anomaly, driver, experiment_result, general]
            title:
              type: string
            summary:
              type: string
            severity:
              type: string
              enum: [info, low, medium, high, critical]
            confidence:
              type: number
              minimum: 0
              maximum: 1
            window:
              type: object
              additionalProperties: false
              required: [range_days]
              properties:
                range_days:
                  type: integer
                  minimum: 1
                  maximum: 365
            context:
              type: object
              description: Arbitrary structured context (metric names, segments, charts pointers).
            recommendations:
              type: array
              minItems: 1
              items:
                type: object
                additionalProperties: false
                required: [action, rationale]
                properties:
                  action:
                    type: string
                  rationale:
                    type: string
            links:
              type: array
              items:
                type: object
                additionalProperties: false
                required: [label, url]
                properties:
                  label: { type: string }
                  url: { type: string }
    outputSchema:
      type: object
      additionalProperties: false
      required: [insight_id, stored]
      properties:
        insight_id:
          type: string
        stored:
          type: object
          additionalProperties: false
          required: [bigquery, firestore, gcs]
          properties:
            bigquery:
              type: object
              additionalProperties: false
              required: [dataset, table]
              properties:
                dataset: { type: string }
                table: { type: string }
            firestore:
              type: object
              additionalProperties: false
              required: [collection, doc_id]
              properties:
                collection: { type: string }
                doc_id: { type: string }
            gcs:
              type: object
              additionalProperties: false
              required: [bucket, prefix]
              properties:
                bucket: { type: string }
                prefix: { type: string }

  # ----------------------------
  # MARKETING (GENERATION + PUBLISH)
  # ----------------------------
  marketing.generate_channel_posts:
    description: Generate platform-specific social posts from a campaign brief + brand profile.
    risk: low
    executor:
      kind: http
      url: https://marketing-executor-REPLACE.a.run.app
      path: /execute/marketing/generate_channel_posts
    inputSchema:
      type: object
      additionalProperties: false
      required: [brief, channels, brand_profile_id]
      properties:
        brand_profile_id:
          type: string
          description: Brand profile id to load (e.g., "default").
        brief:
          type: object
          additionalProperties: false
          required: [goal, product, audience, key_points]
          properties:
            goal:
              type: string
              description: What outcome are we driving? (e.g., "trial signups")
            product:
              type: string
            audience:
              type: string
            key_points:
              type: array
              minItems: 1
              items: { type: string }
            offer:
              type: string
            call_to_action:
              type: string
            landing_page_url:
              type: string
        channels:
          type: array
          minItems: 1
          items:
            type: string
            enum: [x, linkedin, facebook, instagram, tiktok, youtube, pinterest, reddit, google_business, mastodon, bluesky, threads]
        variations_per_channel:
          type: integer
          minimum: 1
          maximum: 10
          default: 3
        constraints:
          type: object
          additionalProperties: false
          properties:
            max_length:
              type: integer
              minimum: 50
              maximum: 4000
            emoji_level:
              type: string
              enum: [none, light, medium, heavy]
              default: light
            include_hashtags:
              type: boolean
              default: true
    outputSchema:
      type: object
      additionalProperties: false
      required: [channels]
      properties:
        channels:
          type: array
          items:
            type: object
            additionalProperties: false
            required: [channel, posts]
            properties:
              channel:
                type: string
              posts:
                type: array
                items:
                  type: object
                  additionalProperties: false
                  required: [text]
                  properties:
                    text: { type: string }
                    title: { type: string }
                    alt_text: { type: string }
                    hashtags:
                      type: array
                      items: { type: string }
                    media_suggestions:
                      type: array
                      items: { type: string }

  marketing.publish_missinglettr:
    description: Publish or schedule a campaign via Missinglettr using stored OAuth/token secrets.
    risk: medium
    executor:
      kind: http
      url: https://marketing-executor-REPLACE.a.run.app
      path: /execute/marketing/publish_missinglettr
    inputSchema:
      type: object
      additionalProperties: false
      required: [campaign, schedule]
      properties:
        campaign:
          type: object
          additionalProperties: false
          required: [name, posts]
          properties:
            name:
              type: string
            posts:
              type: array
              minItems: 1
              items:
                type: object
                additionalProperties: false
                required: [channel, text]
                properties:
                  channel:
                    type: string
                    enum: [x, linkedin, facebook, instagram, tiktok, youtube, pinterest, reddit, google_business, mastodon, bluesky, threads]
                  text:
                    type: string
                  media_urls:
                    type: array
                    items: { type: string }
                  link_url:
                    type: string
        schedule:
          type: object
          additionalProperties: false
          required: [mode]
          properties:
            mode:
              type: string
              enum: [now, scheduled]
            start_time:
              type: string
              description: ISO timestamp required if mode=scheduled.
            timezone:
              type: string
              default: UTC
        idempotency_key:
          type: string
          description: Optional idempotency key to prevent duplicates.
    outputSchema:
      type: object
      additionalProperties: false
      required: [provider, campaign_id, status]
      properties:
        provider:
          type: string
          enum: [missinglettr]
        campaign_id:
          type: string
        status:
          type: string
          enum: [queued, scheduled, published, failed]
        provider_response:
          type: object
          description: Raw provider response (redacted as needed).
```
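Before dispatching a call, the Control Plane should check the payload against the tool's inputSchema. The sketch below is deliberately minimal (it only enforces `required` and `additionalProperties: false`); a real implementation would use a full JSON Schema validator rather than this hand-rolled check.

```python
# Minimal sketch of inputSchema enforcement at the Control Plane. Covers only
# `required` and `additionalProperties: false`; not a full JSON Schema
# validator.
def check_input(schema: dict, payload: dict) -> list[str]:
    """Return a list of validation errors (empty list = valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for field in payload:
            if field not in allowed:
                errors.append(f"unexpected field: {field}")
    return errors

# Example: the cloudrun.get_service_status input schema from the registry.
STATUS_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["service_name", "region"],
    "properties": {"service_name": {"type": "string"}, "region": {"type": "string"}},
}
```

Rejecting unknown fields at the gate keeps executors free to assume payloads match the registry exactly.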