# Market Research & Data Co-op System Summary
> **Overview:** This document summarizes the "Business in a Box" market research pipeline built into the Vibn platform. It allows the AI to autonomously identify target markets, scrape leads, analyze competitor technology stacks, and pull SEO/Ad spend data to generate a complete Go-To-Market (GTM) strategy for users.
## 1. BigQuery Database Schema (`vibn_market_data`)
The data foundation is a highly scalable, relational model hosted in Google BigQuery (Montreal region for data residency):

* **`gbp_categories`**: 4,000+ Google Business Profile categories (e.g., `gcid:dentist`).
* **`software_categories`**: 800+ SMB-relevant software categories (e.g., `dental-practice-management`).
* **`gbp_software_links`**: A junction table linking Main Street business types to the software they buy (19,000+ mapped rows).
* **`market_leads`**: The "Data Co-op" table containing exact geospatial leads (name, address, phone, website, emails).
* **`software_providers`**: Proprietary SaaS competitors mapped to software categories (e.g., "Curve Dental").
* **`open_source_repos`**: MIT/Apache licensed GitHub starter kits mapped to software categories.
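To make the junction-table relationship concrete, here is a minimal in-memory sketch of the lookup that `gbp_software_links` enables. The field names (`gcid`, `softwareCategoryId`, `id`, `name`) and the sample rows are assumptions for illustration; the schema above names only the tables, not their columns.

```typescript
// Hypothetical row shapes; only the table names come from the schema above.
interface SoftwareCategory {
  id: string;
  name: string;
}

interface GbpSoftwareLink {
  gcid: string;
  softwareCategoryId: string;
}

// Resolve the software categories a business type buys, mirroring a JOIN
// from gbp_categories through gbp_software_links to software_categories.
function softwareForBusinessType(
  gcid: string,
  links: GbpSoftwareLink[],
  categories: SoftwareCategory[],
): SoftwareCategory[] {
  const byId = new Map(categories.map((c) => [c.id, c] as const));
  return links
    .filter((l) => l.gcid === gcid)
    .map((l) => byId.get(l.softwareCategoryId))
    .filter((c): c is SoftwareCategory => c !== undefined);
}

// Made-up sample rows.
const categories: SoftwareCategory[] = [
  { id: "dental-practice-management", name: "Dental Practice Management" },
  { id: "appointment-scheduling", name: "Appointment Scheduling" },
];
const links: GbpSoftwareLink[] = [
  { gcid: "gcid:dentist", softwareCategoryId: "dental-practice-management" },
  { gcid: "gcid:dentist", softwareCategoryId: "appointment-scheduling" },
  { gcid: "gcid:plumber", softwareCategoryId: "appointment-scheduling" },
];

const dentistSoftware = softwareForBusinessType("gcid:dentist", links, categories);
console.log(dentistSoftware.map((c) => c.id));
```

In BigQuery itself the same lookup would be a two-way `JOIN` on the junction table; the in-memory version only illustrates the many-to-many shape.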
## 2. MCP Tools Added (`lib/ai/vibn-tools.ts`)
### `market_research_run`
* **Purpose:** Fetches the exact Total Addressable Market (TAM) counts and extracts the raw lead data (emails, addresses, phones) for a specific category and location.
* **Data Source:** DataForSEO Business Listings Live API (using the `search/live` endpoint).
* **Quality Control:** Automatically applies strict filters (`is_claimed: true` and `current_status <> "closed_forever"`) to ensure only verified, active businesses are returned.
* **Guardrails:**
    * Requires explicit user permission (`user_explicitly_approved: true`).
    * **Geospatial Caching:** Queries BigQuery first using its native geography functions (`ST_DWITHIN`). If leads exist within a 20 km radius of the target coordinates, it serves them for $0.00 instead of hitting the paid API.
    * **Data Co-op:** Any newly fetched leads are automatically `INSERT`ed into the BigQuery `market_leads` table.
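The guardrail logic can be sketched as a small decision function plus the cache query it relies on. This is an illustration under stated assumptions, not the code in `lib/ai/vibn-tools.ts`: the `location` and `category` columns, the parameter names, the `planLeadFetch` helper, and the order of the checks are all hypothetical. Only the approval flag, the 20 km radius, and the two filter conditions come from the description above. The filter array follows DataForSEO's `[field, operator, value]` style as I understand it.

```typescript
// Hypothetical request shape; only user_explicitly_approved is named in the source.
interface ResearchRequest {
  user_explicitly_approved: boolean;
  category: string;
  latitude: number;
  longitude: number;
}

type FetchPlan =
  | { action: "reject"; reason: string }
  | { action: "serve_cache" }
  | { action: "call_paid_api" };

const CACHE_RADIUS_METERS = 20000; // the 20 km radius described above

// Parameterized BigQuery SQL for the cache lookup. `location` is an assumed
// GEOGRAPHY column. Note that ST_GEOGPOINT takes longitude first.
const CACHE_QUERY = `
  SELECT name, address, phone, website
  FROM \`vibn_market_data.market_leads\`
  WHERE category = @category
    AND ST_DWITHIN(location, ST_GEOGPOINT(@longitude, @latitude), @radius_m)
`;

// Quality-control filters sent to the paid API (assumed payload shape).
const LISTING_FILTERS = [
  ["is_claimed", "=", true],
  "and",
  ["current_status", "<>", "closed_forever"],
];

// Decide what to do once the number of cached leads in range is known.
function planLeadFetch(req: ResearchRequest, cachedLeadCount: number): FetchPlan {
  if (!req.user_explicitly_approved) {
    return { action: "reject", reason: "user approval required" };
  }
  return cachedLeadCount > 0
    ? { action: "serve_cache" } // free: leads already in the co-op
    : { action: "call_paid_api" }; // paid: fetch, then INSERT into market_leads
}

const request: ResearchRequest = {
  user_explicitly_approved: true,
  category: "gcid:dentist",
  latitude: 49.28,
  longitude: -123.12,
};
console.log(planLeadFetch(request, 0), CACHE_RADIUS_METERS, LISTING_FILTERS.length);
```

Keeping the decision separate from the I/O makes the cost guardrail easy to unit-test without touching BigQuery or the paid API.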
### `tech_stack_analyze`
* **Purpose:** A free, native alternative to BuiltWith. Scans a list of URLs (up to 100) to determine what software, CMS, and tracking tools they use.
* **Intelligent Spidering:** Loads the homepage, extracts high-intent links (`/book`, `/contact`), and dynamically crawls depth-2 subpages to find hidden booking widgets or portals.
* **Dynamic Competitor Injection:** Reads the `software_category_id`, pulls all known competitors from BigQuery, and dynamically searches the target websites' source code for traces of those competitors.
* **Custom Checks:** Allows the AI to pass a `custom_checks` array of custom strings/domains to look for on the fly.
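The two core steps, picking high-intent links and searching page source for known signatures, could look roughly like the sketch below. Both helper names, the regular expressions, and the sample page are hypothetical; the real spider's heuristics are not described in enough detail to reproduce, so treat this as one plausible shape rather than the actual implementation.

```typescript
// Paths that suggest a booking or contact page (assumed heuristic).
const HIGH_INTENT_PATTERN = /\/(book|contact)/i;

// Collect same-site links whose path looks like a booking or contact page.
function extractHighIntentLinks(html: string, baseUrl: string): string[] {
  const base = new URL(baseUrl);
  const found = new Set<string>();
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let url: URL;
    try {
      url = new URL(match[1], base); // resolves relative hrefs against the site
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.hostname === base.hostname && HIGH_INTENT_PATTERN.test(url.pathname)) {
      found.add(url.href);
    }
  }
  return Array.from(found);
}

// Return every signature found in the page source (case-insensitive).
// Signatures would come from software_providers rows plus custom_checks.
function detectSignatures(html: string, signatures: string[]): string[] {
  const haystack = html.toLowerCase();
  return signatures.filter((s) => haystack.indexOf(s.toLowerCase()) !== -1);
}

// Made-up sample page on a reserved .example domain.
const page = [
  '<a href="/book-online">Book</a>',
  '<a href="/blog/post">Blog</a>',
  '<a href="https://elsewhere.example/contact">Partner</a>',
  '<script src="https://cdn.curvedental.example/widget.js"></script>',
].join("\n");

const bookingLinks = extractHighIntentLinks(page, "https://clinic.example/");
const detected = detectSignatures(page, ["curvedental", "wp-content"]);
console.log(bookingLinks, detected);
```

Plain substring matching is cheap and good enough for a first pass, though it can produce false positives when a competitor's name appears in ordinary page copy.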
### `market_aggregate_insights`
* **Purpose:** Fetches aggregated insights for a specific market niche to uncover qualitative data before building a product.
* **Data Source:** DataForSEO Categories Aggregation Live API.
* **Output:** Returns a breakdown of sub-niches (e.g., Pediatric vs Cosmetic), the total number of businesses with/without websites (technical debt indicator), and crucially, the **Top Customer Review Topics** (e.g., "receptionist", "price", "wait time"). The AI uses these pain points to write the Value Proposition and positioning strategy.
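As a rough illustration of how that output might be reduced to inputs for positioning, the sketch below ranks review topics and computes the share of businesses without a website. The `AggregateInsights` shape, its field names, and all the numbers are invented for the example; the tool's actual response format is not documented here.

```typescript
// Hypothetical, simplified view of the aggregated output.
interface ReviewTopic {
  topic: string;
  mentions: number;
}

interface AggregateInsights {
  withWebsite: number;
  withoutWebsite: number;
  topReviewTopics: ReviewTopic[];
}

// Reduce the aggregate to the two signals the AI uses for positioning:
// how many businesses lack a website, and the most-mentioned pain points.
function summarizeInsights(insights: AggregateInsights, topN: number) {
  const total = insights.withWebsite + insights.withoutWebsite;
  const noWebsiteShare = total === 0 ? 0 : insights.withoutWebsite / total;
  const painPoints = insights.topReviewTopics
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.mentions - a.mentions)
    .slice(0, topN)
    .map((t) => t.topic);
  return { noWebsiteShare, painPoints };
}

// Made-up sample numbers.
const summary = summarizeInsights(
  {
    withWebsite: 80,
    withoutWebsite: 20,
    topReviewTopics: [
      { topic: "receptionist", mentions: 120 },
      { topic: "price", mentions: 300 },
      { topic: "wait time", mentions: 210 },
    ],
  },
  2,
);
console.log(summary);
```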
### `market_seo_analyze`
* **Purpose:** Analyzes a competitor's domain for SEO and Google Ads metrics.
* **Data Source:** DataForSEO Labs (Domain Metrics & Ranked Keywords APIs).
* **Output:** Returns estimated organic traffic, paid Google Ads traffic, estimated monthly Ad Spend (USD), and the domain's top paid keywords.
## 3. The "Business in a Box" Workflow
When a founder asks to build software for a specific niche (e.g., "Dentists in BC"):

1. **TAM & Leads:** The AI runs `market_research_run` to get the Total Addressable Market and real contact info.
2. **Competitor Teardown:** The AI identifies incumbents and runs `market_seo_analyze` to see their Ad Spend.
3. **Wedge Discovery:** The AI runs `tech_stack_analyze` on the leads to find technological gaps (e.g., "70% use WordPress but lack a booking widget").
4. **Plan Generation:** The AI writes a business plan to the dashboard, including a financial model, compliance warnings, a wedge strategy, and cold-email scripts.
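The wedge statistic in step 3 is simple arithmetic over the scan results: count the sites that have one technology but lack another, then divide by the number of sites scanned. A minimal sketch follows, assuming a hypothetical `SiteScan` shape and made-up technology labels; the real `tech_stack_analyze` output format may differ.

```typescript
// Hypothetical per-site scan result; field names are illustrative only.
interface SiteScan {
  url: string;
  detected: string[];
}

// Share of scanned sites that have `has` but lack `lacks`.
function wedgeShare(scans: SiteScan[], has: string, lacks: string): number {
  if (scans.length === 0) return 0;
  const hits = scans.filter(
    (s) => s.detected.indexOf(has) !== -1 && s.detected.indexOf(lacks) === -1,
  ).length;
  return hits / scans.length;
}

// Made-up sample: 10 sites, 7 on WordPress without a booking widget.
const scans: SiteScan[] = [];
for (let i = 0; i < 7; i++) {
  scans.push({ url: `https://clinic${i}.example`, detected: ["wordpress"] });
}
scans.push({ url: "https://clinic7.example", detected: ["wordpress", "booking-widget"] });
scans.push({ url: "https://clinic8.example", detected: ["wix"] });
scans.push({ url: "https://clinic9.example", detected: [] });

const share = wedgeShare(scans, "wordpress", "booking-widget");
console.log(`${Math.round(share * 100)}% use WordPress but lack a booking widget`);
```

Dividing by all scanned sites (rather than only WordPress sites) matches the phrasing of the example, where the percentage describes the whole lead list.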