docs: add market research system summary and rename categories folder to market_data_assets

2026-05-07 21:17:48 -07:00
parent 3d8b4f0a37
commit 933eb31fb1
8 changed files with 73288 additions and 0 deletions

# Market Research & Data Co-op System Summary
> **Overview:** This document summarizes the "Business in a Box" market research pipeline built into the Vibn platform. It lets the AI autonomously identify target markets, collect business leads, analyze competitor technology stacks, and pull SEO and ad-spend data to generate a complete Go-To-Market (GTM) strategy for users.
## 1. BigQuery Database Schema (`vibn_market_data`)
The data foundation is a relational model hosted in Google BigQuery, pinned to the Montreal region (`northamerica-northeast1`) for Canadian data residency:
* **`gbp_categories`**: 4,000+ Google Business Profile categories (e.g., `gcid:dentist`).
* **`software_categories`**: 800+ SMB-relevant software categories (e.g., `dental-practice-management`).
* **`gbp_software_links`**: A junction table linking Main Street business types to the software they buy (19,000+ mapped rows).
* **`market_leads`**: The "Data Co-op" table containing geocoded business leads (name, address, phone, website, emails).
* **`software_providers`**: Proprietary SaaS competitors mapped to software categories (e.g., "Curve Dental").
* **`open_source_repos`**: MIT/Apache licensed GitHub starter kits mapped to software categories.
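The junction table is what lets the AI go from a business type to the software that business buys. The following is a minimal sketch of that lookup. The summary does not list columns, so the column names (`gcid`, `software_category_id`, `id`, `name`) are hypothetical. The SQL uses a BigQuery named parameter (`@gcid`), and the in-memory function mirrors the same inner join so the logic can run without a database.

```typescript
// Hypothetical row shapes; column names are assumptions, table names come from the summary.
interface SoftwareCategory {
  id: string; // e.g. "dental-practice-management"
  name: string;
}

interface GbpSoftwareLink {
  gcid: string; // e.g. "gcid:dentist" (references gbp_categories)
  softwareCategoryId: string; // references software_categories.id
}

// Server-side form of the lookup (assumed column names; @gcid is a named query parameter).
const SOFTWARE_FOR_BUSINESS_SQL = `
  SELECT sc.id, sc.name
  FROM \`vibn_market_data.gbp_software_links\` AS l
  JOIN \`vibn_market_data.software_categories\` AS sc
    ON sc.id = l.software_category_id
  WHERE l.gcid = @gcid`;

// In-memory equivalent of the inner JOIN above: the links for one business type,
// resolved to their software categories, skipping links with no matching category.
function softwareForBusiness(
  gcid: string,
  links: GbpSoftwareLink[],
  categories: SoftwareCategory[],
): SoftwareCategory[] {
  const byId = new Map(categories.map((c) => [c.id, c] as const));
  return links
    .filter((link) => link.gcid === gcid)
    .map((link) => byId.get(link.softwareCategoryId))
    .filter((c): c is SoftwareCategory => c !== undefined);
}
```

Because the join is inner, a dangling link row (for example, one pointing at a deleted category) drops out instead of producing an empty result.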
## 2. MCP Tools Added (`lib/ai/vibn-tools.ts`)
### `market_research_run`
* **Purpose:** Fetches a list of real-world business leads for a specific category and location.
* **Data Source:** DataForSEO Business Listings Live API.
* **Guardrails:**
  * Requires explicit user permission (`user_explicitly_approved: true`).
  * **Geospatial Caching:** Queries BigQuery first using its built-in geography function `ST_DWITHIN` (BigQuery GIS, not PostGIS). If leads already exist within a 20 km radius of the target coordinates, it serves them at no API cost instead of calling the paid API.
  * **Data Co-op:** Any newly fetched leads are automatically `INSERT`ed into the BigQuery `market_leads` table.
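The guardrails above amount to a cache-first flow. The following is a minimal sketch, not the tool's actual code. Only the 20 km radius, the approval flag, and the `market_leads` table come from the summary; every function and column name is hypothetical. The paid DataForSEO call and the BigQuery `INSERT` are injected as stubs, and a haversine distance stands in for `ST_DWITHIN` so the logic runs locally. Checking approval before the cache lookup is also an assumption.

```typescript
// Minimal cache-first sketch of market_research_run. All names are hypothetical.

interface Lead {
  name: string;
  lat: number;
  lng: number;
}

interface RunResult {
  leads: Lead[];
  source: "cache" | "api";
  apiCostUsd: number;
}

const CACHE_RADIUS_METERS = 20_000; // 20 km cache radius from the summary
const EARTH_RADIUS_METERS = 6_371_000; // mean Earth radius

// Server-side form (assumed columns: gcid, location GEOGRAPHY). In BigQuery,
// ST_DWITHIN takes a distance in meters and ST_GEOGPOINT takes (longitude, latitude).
const CACHE_LOOKUP_SQL = `
  SELECT *
  FROM \`vibn_market_data.market_leads\`
  WHERE gcid = @gcid
    AND ST_DWITHIN(location, ST_GEOGPOINT(@lng, @lat), ${CACHE_RADIUS_METERS})`;

// Haversine great-circle distance, standing in for ST_DWITHIN in this local sketch.
function distanceMeters(aLat: number, aLng: number, bLat: number, bLng: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(bLat - aLat);
  const dLng = toRad(bLng - aLng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(aLat)) * Math.cos(toRad(bLat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
}

function marketResearchRun(
  args: { lat: number; lng: number; userExplicitlyApproved: boolean },
  cachedLeads: Lead[], // market_leads rows already filtered to the requested category
  fetchFromApi: () => { leads: Lead[]; costUsd: number }, // stub for the paid API call
  insertIntoCoop: (leads: Lead[]) => void, // stub for the BigQuery INSERT
): RunResult {
  // Guardrail: refuse to run without explicit user approval.
  if (!args.userExplicitlyApproved) {
    throw new Error("market_research_run requires user_explicitly_approved: true");
  }
  // Cache check: co-op leads within 20 km are served without an API call.
  const nearby = cachedLeads.filter(
    (lead) => distanceMeters(args.lat, args.lng, lead.lat, lead.lng) <= CACHE_RADIUS_METERS,
  );
  if (nearby.length > 0) {
    return { leads: nearby, source: "cache", apiCostUsd: 0 };
  }
  // Cache miss: pay for fresh leads, then contribute them back to the co-op.
  const fresh = fetchFromApi();
  insertIntoCoop(fresh.leads);
  return { leads: fresh.leads, source: "api", apiCostUsd: fresh.costUsd };
}
```

Writing API results back on every miss is what makes the co-op compound: the next request near the same coordinates becomes a cache hit.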
### `tech_stack_analyze`
* **Purpose:** A free, native alternative to BuiltWith. Scans up to 100 URLs to determine which software, CMS, and tracking tools each site uses.
* **Intelligent Spidering:** Loads the homepage, extracts high-intent links (e.g., `/book`, `/contact`), and crawls those subpages (depth 2) to find booking widgets or portals that do not appear on the homepage.
* **Dynamic Competitor Injection:** Reads the `software_category_id`, pulls all known competitors for that category from the BigQuery `software_providers` table, and searches the target websites' source code for traces of those competitors.
* **Custom Checks:** Lets the AI pass a `custom_checks` array of arbitrary strings or domains to search for on the fly.
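The spidering and matching steps above can be sketched as plain functions. This is a minimal sketch under several assumptions, not the tool's implementation:

* The page fetcher is injected as a hypothetical synchronous `getHtml` stub.
* Links are found with a simple `href` regex rather than an HTML parser.
* The extra path fragments (`/appointment`, `/portal`) and the 5-subpage cap are assumptions.

Competitor names pulled from `software_providers` and any `custom_checks` entries are both treated as plain signature strings, matched case-insensitively against page source.

```typescript
// Path fragments treated as high intent. "/book" and "/contact" come from the
// summary; "/appointment" and "/portal" are illustrative assumptions.
const HIGH_INTENT_PATTERNS = ["/book", "/contact", "/appointment", "/portal"];

// Resolve an href against its page, returning null for unparseable values.
function tryParseUrl(href: string, base: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Collect same-site links whose path contains a high-intent fragment.
function extractHighIntentLinks(html: string, pageUrl: string): string[] {
  const origin = new URL(pageUrl).origin;
  const found = new Set<string>();
  const hrefRe = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefRe.exec(html)) !== null) {
    const url = tryParseUrl(match[1], pageUrl);
    if (url === null || url.origin !== origin) continue; // stay on the lead's own site
    const path = url.pathname.toLowerCase();
    if (HIGH_INTENT_PATTERNS.some((p) => path.includes(p))) found.add(url.href);
  }
  return Array.from(found);
}

// Case-insensitive substring match of each signature against raw page source.
function detectSignatures(html: string, signatures: string[]): string[] {
  const source = html.toLowerCase();
  return signatures.filter((sig) => source.includes(sig.toLowerCase()));
}

// Scan the homepage plus up to `maxSubpages` high-intent subpages (depth 2).
function analyzeSite(
  homeUrl: string,
  getHtml: (url: string) => string, // hypothetical fetcher, injected for testability
  signatures: string[],
  maxSubpages = 5,
): string[] {
  const homeHtml = getHtml(homeUrl);
  const hits = new Set(detectSignatures(homeHtml, signatures));
  for (const sub of extractHighIntentLinks(homeHtml, homeUrl).slice(0, maxSubpages)) {
    for (const sig of detectSignatures(getHtml(sub), signatures)) hits.add(sig);
  }
  return Array.from(hits);
}
```

A regex over static `href` attributes misses links that JavaScript injects after load. Catching those would require rendering pages in a headless browser, at a much higher cost per URL.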
### `market_seo_analyze`
* **Purpose:** Analyzes a competitor's domain for SEO and Google Ads metrics.
* **Data Source:** DataForSEO Labs (Domain Metrics & Ranked Keywords APIs).
* **Output:** Returns estimated organic traffic, paid Google Ads traffic, estimated monthly Ad Spend (USD), and the domain's top paid keywords.
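The output can be pictured as a small report object. The following is a minimal sketch of how the tool might shape it. DataForSEO's actual response fields are not reproduced here: the sketch assumes they have already been mapped into the hypothetical `DomainMetrics` and `RankedKeyword` shapes. Two further choices are also assumptions: ranking paid keywords by estimated traffic value, and normalizing the target to a bare domain without scheme or `www.`.

```typescript
// Hypothetical normalized shapes; real API fields would be mapped into these.
interface DomainMetrics {
  organicTrafficEst: number;
  paidTrafficEst: number;
  monthlyAdSpendUsd: number;
}

interface RankedKeyword {
  keyword: string;
  type: "organic" | "paid";
  estTrafficValue: number;
}

interface SeoReport extends DomainMetrics {
  domain: string;
  topPaidKeywords: string[];
}

// Reduce a URL or host to a bare domain, e.g. "https://www.x.com/a" -> "x.com".
function normalizeDomain(input: string): string {
  const host = new URL(input.includes("://") ? input : `https://${input}`).hostname;
  return host.replace(/^www\./, "");
}

// Combine domain-level metrics with the top N paid keywords, ranked by traffic value.
function buildSeoReport(
  domain: string,
  metrics: DomainMetrics,
  keywords: RankedKeyword[],
  topN = 10,
): SeoReport {
  const topPaidKeywords = keywords
    .filter((k) => k.type === "paid") // filter() copies, so sort() won't mutate the input
    .sort((a, b) => b.estTrafficValue - a.estTrafficValue)
    .slice(0, topN)
    .map((k) => k.keyword);
  return { domain, ...metrics, topPaidKeywords };
}
```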
## 3. The "Business in a Box" Workflow
When a founder asks to build software for a specific niche (e.g., "Dentists in BC"):
1. **TAM & Leads:** The AI runs `market_research_run` to size the Total Addressable Market (TAM) and collect real contact details.
2. **Competitor Teardown:** The AI identifies incumbents and runs `market_seo_analyze` to estimate their Ad Spend.
3. **Wedge Discovery:** The AI runs `tech_stack_analyze` on the leads to find technological gaps (e.g., "70% use WordPress but lack a booking widget").
4. **Plan Generation:** The AI writes a business plan to the dashboard, including a financial model, compliance warnings, a wedge strategy, and cold-email scripts.
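The wedge statistic in step 3 is a ratio over the analyzed leads. The following is a minimal sketch under three assumptions:

* Each lead's `tech_stack_analyze` result has been reduced to a list of detected signature strings. The `LeadStack` shape and the signature names are hypothetical.
* "Lacks a booking widget" means that no known booking-widget signature was found.
* The denominator is all analyzed leads.

```typescript
// Hypothetical per-lead result of tech_stack_analyze, reduced to detected signatures.
interface LeadStack {
  lead: string;
  detected: string[];
}

// Fraction of all analyzed leads that use `hasTech` but show none of the
// `lacksAny` signatures (e.g. known booking-widget vendors).
function wedgeShare(stacks: LeadStack[], hasTech: string, lacksAny: string[]): number {
  if (stacks.length === 0) return 0; // avoid dividing by zero
  const inWedge = stacks.filter(
    (s) => s.detected.includes(hasTech) && !lacksAny.some((sig) => s.detected.includes(sig)),
  );
  return inWedge.length / stacks.length;
}

// Turn the ratio into the kind of sentence used in the plan.
function describeWedge(share: number, hasLabel: string, lacksLabel: string): string {
  return `${Math.round(share * 100)}% use ${hasLabel} but lack ${lacksLabel}`;
}
```

Because a missing signature only means no *known* marker was found, this share can overstate the gap when leads use booking tools that are absent from the signature list.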