# Market Research & Data Co-op System Summary
> **Overview:** This document summarizes the "Business in a Box" market research pipeline built into the Vibn platform. It allows the AI to autonomously identify target markets, scrape leads, analyze competitor technology stacks, and pull SEO/Ad spend data to generate a complete Go-To-Market (GTM) strategy for users.
## 1. BigQuery Database Schema (`vibn_market_data`)
The data foundation is a relational model hosted in Google BigQuery (Montreal region, for data residency):
* **`gbp_categories`**: 4,000+ Google Business Profile categories (e.g., `gcid:dentist`).
* **`software_categories`**: 800+ SMB-relevant software categories (e.g., `dental-practice-management`).
* **`gbp_software_links`**: A junction table linking Main Street business types to the software they buy (19,000+ mapped rows).
* **`market_leads`**: The "Data Co-op" table containing exact geospatial leads (name, address, phone, website, emails).
* **`software_providers`**: Proprietary SaaS competitors mapped to software categories (e.g., "Curve Dental").
* **`open_source_repos`**: MIT/Apache licensed GitHub starter kits mapped to software categories.
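
To make the junction-table relationship concrete, here is a minimal sketch that models two of the tables as TypeScript types and resolves the software categories linked to one GBP category. The column names (`gcid`, `software_category_id`, `id`, `name`) and the sample rows other than `gcid:dentist` and `dental-practice-management` are assumptions for illustration; the summary lists the tables but not their columns.

```typescript
// Hypothetical row shapes; the real BigQuery column names may differ.
interface SoftwareCategory {
  id: string;
  name: string;
}

interface GbpSoftwareLink {
  gcid: string;
  software_category_id: string;
}

// Resolve the software categories a business type buys, via the junction table.
function softwareForBusinessType(
  gcid: string,
  links: GbpSoftwareLink[],
  categories: SoftwareCategory[]
): SoftwareCategory[] {
  const wantedIds = links
    .filter((link) => link.gcid === gcid)
    .map((link) => link.software_category_id);
  return categories.filter((category) => wantedIds.indexOf(category.id) !== -1);
}

// Made-up in-memory rows standing in for BigQuery results.
const sampleLinks: GbpSoftwareLink[] = [
  { gcid: "gcid:dentist", software_category_id: "dental-practice-management" },
  { gcid: "gcid:dentist", software_category_id: "appointment-scheduling" },
  { gcid: "gcid:plumber", software_category_id: "field-service-management" },
];

const sampleCategories: SoftwareCategory[] = [
  { id: "dental-practice-management", name: "Dental Practice Management" },
  { id: "appointment-scheduling", name: "Appointment Scheduling" },
  { id: "field-service-management", name: "Field Service Management" },
];

const dentistSoftware = softwareForBusinessType(
  "gcid:dentist",
  sampleLinks,
  sampleCategories
);
```

In production this lookup would presumably be a SQL join inside BigQuery; the in-memory version only illustrates the shape of the mapping.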
## 2. MCP Tools Added (`lib/ai/vibn-tools.ts`)
### `market_research_run`
* **Purpose:** Fetches the exact Total Addressable Market (TAM) counts and extracts the raw lead data (emails, addresses, phones) for a specific category and location.
* **Data Source:** DataForSEO Business Listings Live API (using the `search/live` endpoint).
* **Quality Control:** Automatically applies strict filters (`is_claimed: true` and `current_status <> "closed_forever"`) to ensure only verified, active businesses are returned.
* **Guardrails:**
  * Requires explicit user permission (`user_explicitly_approved: true`).
  * **Geospatial Caching:** Queries BigQuery first, using its native geography functions (`ST_DWITHIN`). If leads already exist within a 20 km radius of the target coordinates, it serves them for $0.00 instead of hitting the paid API.
  * **Data Co-op:** Any newly fetched leads are automatically `INSERT`ed into the BigQuery `market_leads` table.
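
The cache-first guardrail can be sketched as a small decision function. This is an illustrative sketch, not the code in `vibn-tools.ts`: the `Lead` shape, the `planLeadFetch` name, the `location` column in the query string, and the choice to apply the permission check before the cache lookup are all assumptions. The haversine distance stands in for the radius test that `ST_DWITHIN` performs inside BigQuery.

```typescript
// Hypothetical lead shape; only the fields needed for the radius check.
interface Lead {
  name: string;
  lat: number;
  lng: number;
}

interface ResearchRequest {
  lat: number;
  lng: number;
  user_explicitly_approved: boolean;
}

interface FetchPlan {
  source: "denied" | "cache" | "paid_api";
  leads: Lead[];
}

const CACHE_RADIUS_KM = 20;
const EARTH_RADIUS_KM = 6371;

// In BigQuery the same filter could look like this (ST_DWITHIN takes metres,
// ST_GEOGPOINT takes longitude first). The `location` column is an assumption.
const CACHE_QUERY = `
  SELECT name, address, phone, website
  FROM vibn_market_data.market_leads
  WHERE ST_DWITHIN(location, ST_GEOGPOINT(@lng, @lat), 20000)`;

// Great-circle distance between two coordinates (haversine formula).
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Decide where the leads should come from: nowhere, the cache, or the paid API.
function planLeadFetch(request: ResearchRequest, cachedLeads: Lead[]): FetchPlan {
  if (!request.user_explicitly_approved) {
    return { source: "denied", leads: [] };
  }
  const nearby = cachedLeads.filter(
    (lead) => haversineKm(request.lat, request.lng, lead.lat, lead.lng) <= CACHE_RADIUS_KM
  );
  if (nearby.length > 0) {
    return { source: "cache", leads: nearby };
  }
  // The caller would now call the paid API and insert the results into market_leads.
  return { source: "paid_api", leads: [] };
}

// Example: a request centred on Vancouver with one nearby and one distant cached lead.
const vancouverRequest: ResearchRequest = {
  lat: 49.2827,
  lng: -123.1207,
  user_explicitly_approved: true,
};
const cachedSample: Lead[] = [
  { name: "Burnaby clinic", lat: 49.2488, lng: -122.9805 },
  { name: "Victoria clinic", lat: 48.4284, lng: -123.3656 },
];
const cachePlan = planLeadFetch(vancouverRequest, cachedSample);
```

Keeping the decision separate from the I/O makes the cost guardrail easy to unit test without touching BigQuery or the paid API.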
### `tech_stack_analyze`
* **Purpose:** A free, native alternative to BuiltWith. Scans a list of URLs (up to 100) to determine what software, CMS, and tracking tools they use.
* **Intelligent Spidering:** Loads the homepage, extracts high-intent links (`/book`, `/contact`), and dynamically crawls depth-2 subpages to find hidden booking widgets or portals.
* **Dynamic Competitor Injection:** Reads the `software_category_id`, pulls all known competitors from BigQuery, and searches the target websites' source code for traces of those competitors.
* **Custom Checks:** Allows the AI to pass a `custom_checks` array of additional strings or domains to look for on the fly.
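
A rough sketch of the fingerprinting idea follows: find high-intent links on a page, then search the page source for known signatures, injected competitor signatures, and custom checks. The `Signature` shape, function names, and signature patterns are assumptions for illustration, and a regular expression is used in place of a real HTML parser; the actual tool's detection rules are not described in this summary.

```typescript
// Hypothetical signature: a technology name plus substrings that reveal it in page source.
interface Signature {
  name: string;
  patterns: string[];
}

const BASE_SIGNATURES: Signature[] = [
  { name: "WordPress", patterns: ["wp-content/", "wp-includes/"] },
  { name: "Google Tag Manager", patterns: ["googletagmanager.com/gtm.js"] },
];

const HIGH_INTENT_HINTS = ["/book", "/contact"];

// Collect href values from anchor tags and keep the ones that look high-intent.
function extractHighIntentLinks(html: string): string[] {
  const hrefPattern = /<a\s[^>]*href=["']([^"']+)["']/gi;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1];
    const isHighIntent = HIGH_INTENT_HINTS.some(
      (hint) => href.toLowerCase().indexOf(hint) !== -1
    );
    if (isHighIntent && found.indexOf(href) === -1) {
      found.push(href);
    }
  }
  return found;
}

// Return the names of matched signatures, followed by any matched custom checks.
function detectTech(
  html: string,
  signatures: Signature[],
  customChecks: string[] = []
): string[] {
  const haystack = html.toLowerCase();
  const signatureHits = signatures
    .filter((sig) => sig.patterns.some((p) => haystack.indexOf(p.toLowerCase()) !== -1))
    .map((sig) => sig.name);
  const customHits = customChecks.filter(
    (check) => haystack.indexOf(check.toLowerCase()) !== -1
  );
  return signatureHits.concat(customHits);
}

// Made-up page source and a placeholder competitor standing in for rows from BigQuery.
const sampleHtml = `
  <link rel="stylesheet" href="/wp-content/themes/clinic/style.css">
  <a href="/about">About</a>
  <a class="btn" href="/book-online">Book</a>
  <a href="/contact">Contact</a>
  <script src="https://widgets.example-competitor.com/embed.js"></script>`;

const competitorSignatures: Signature[] = [
  { name: "Example Competitor", patterns: ["example-competitor.com"] },
];

const subpagesToCrawl = extractHighIntentLinks(sampleHtml);
const detected = detectTech(
  sampleHtml,
  BASE_SIGNATURES.concat(competitorSignatures),
  ["calendly.com"]
);
```

Here `subpagesToCrawl` holds the pages a crawler would visit next, and `detected` lists the technologies found on the homepage alone.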
### `market_aggregate_insights`
* **Purpose:** Fetches aggregated insights for a specific market niche to uncover qualitative data before building a product.
* **Data Source:** DataForSEO Categories Aggregation Live API.
* **Output:** Returns a breakdown of sub-niches (e.g., Pediatric vs Cosmetic), the total number of businesses with/without websites (technical debt indicator), and crucially, the **Top Customer Review Topics** (e.g., "receptionist", "price", "wait time"). The AI uses these pain points to write the Value Proposition and positioning strategy.
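
As an illustration of how this output might be consumed, the sketch below computes the share of businesses without a website and ranks review topics by mention count. The `AggregateInsights` shape, its field names, and all numbers are assumptions made for the example; they do not reflect the actual response format of the DataForSEO API or of the tool.

```typescript
// Hypothetical, simplified shape of the tool's output.
interface ReviewTopic {
  topic: string;
  mentions: number;
}

interface AggregateInsights {
  subNiches: { name: string; businesses: number }[];
  withWebsite: number;
  withoutWebsite: number;
  reviewTopics: ReviewTopic[];
}

// Fraction of businesses in the niche that have no website at all.
function shareWithoutWebsite(insights: AggregateInsights): number {
  const total = insights.withWebsite + insights.withoutWebsite;
  return total === 0 ? 0 : insights.withoutWebsite / total;
}

// The most frequently mentioned review topics, highest first.
function topPainPoints(insights: AggregateInsights, limit: number): string[] {
  return insights.reviewTopics
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.mentions - a.mentions)
    .slice(0, limit)
    .map((entry) => entry.topic);
}

// Made-up numbers for illustration only.
const sampleInsights: AggregateInsights = {
  subNiches: [
    { name: "Pediatric", businesses: 120 },
    { name: "Cosmetic", businesses: 280 },
  ],
  withWebsite: 300,
  withoutWebsite: 100,
  reviewTopics: [
    { topic: "price", mentions: 310 },
    { topic: "wait time", mentions: 420 },
    { topic: "receptionist", mentions: 150 },
  ],
};

const noWebsiteShare = shareWithoutWebsite(sampleInsights);
const painPoints = topPainPoints(sampleInsights, 2);
```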
### `market_seo_analyze`
* **Purpose:** Analyzes a competitor's domain for SEO and Google Ads metrics.
* **Data Source:** DataForSEO Labs (Domain Metrics & Ranked Keywords APIs).
* **Output:** Returns the domain's estimated organic traffic, paid Google Ads traffic, estimated monthly Ad Spend (USD), and top paid keywords.
## 3. The "Business in a Box" Workflow
When a founder asks to build software for a specific niche (e.g., "Dentists in BC"):
1. **TAM & Leads:** The AI runs `market_research_run` to get the Total Addressable Market and real contact info.
2. **Competitor Teardown:** The AI identifies incumbents and runs `market_seo_analyze` to see their Ad Spend.
3. **Wedge Discovery:** The AI runs `tech_stack_analyze` on the leads to find technological gaps (e.g., "70% use WordPress but lack a booking widget").
4. **Plan Generation:** The AI writes a business plan to the dashboard, including a financial model, compliance warnings, a wedge strategy, and cold-email scripts.
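
The first three steps can be sketched as a pipeline over injected tool functions. Everything here is a simplified assumption: the `GtmTools` interface, its synchronous signatures, the return shapes, the `booking-widget` tag, and the stub data are hypothetical, and the real tools are presumably asynchronous MCP calls. Step 4 (the financial model, compliance warnings, and cold-email scripts) is not modeled.

```typescript
interface SeoMetrics {
  domain: string;
  monthlyAdSpendUsd: number;
}

interface StackResult {
  url: string;
  technologies: string[];
}

// Hypothetical, synchronous stand-ins for the three tools used in steps 1 to 3.
interface GtmTools {
  marketResearchRun(category: string, location: string): { tam: number; leadUrls: string[] };
  marketSeoAnalyze(domain: string): SeoMetrics;
  techStackAnalyze(urls: string[]): StackResult[];
}

interface GtmPlan {
  tam: number;
  competitorsByAdSpend: SeoMetrics[];
  wedge: string;
}

function buildGtmPlan(
  tools: GtmTools,
  category: string,
  location: string,
  incumbents: string[]
): GtmPlan {
  // Step 1: TAM and leads.
  const research = tools.marketResearchRun(category, location);

  // Step 2: competitor teardown, highest ad spend first.
  const competitorsByAdSpend = incumbents
    .map((domain) => tools.marketSeoAnalyze(domain))
    .sort((a, b) => b.monthlyAdSpendUsd - a.monthlyAdSpendUsd);

  // Step 3: wedge discovery, here the share of leads on WordPress without a booking widget.
  const stacks = tools.techStackAnalyze(research.leadUrls);
  const gap = stacks.filter(
    (s) =>
      s.technologies.indexOf("WordPress") !== -1 &&
      s.technologies.indexOf("booking-widget") === -1
  );
  const pct = stacks.length === 0 ? 0 : Math.round((gap.length / stacks.length) * 100);

  return {
    tam: research.tam,
    competitorsByAdSpend,
    wedge: `${pct}% of scanned leads use WordPress but lack a booking widget`,
  };
}

// Stub tools with made-up data so the pipeline can run without any external service.
const adSpendByDomain: Record<string, number> = {
  "incumbent-a.example": 5000,
  "incumbent-b.example": 12000,
};

const stubTools: GtmTools = {
  marketResearchRun: () => ({
    tam: 1200,
    leadUrls: ["https://a.example", "https://b.example", "https://c.example", "https://d.example"],
  }),
  marketSeoAnalyze: (domain) => ({
    domain,
    monthlyAdSpendUsd: adSpendByDomain[domain] || 0,
  }),
  techStackAnalyze: (urls) =>
    urls.map((url, index) => ({
      url,
      technologies: index < 3 ? ["WordPress"] : ["WordPress", "booking-widget"],
    })),
};

const plan = buildGtmPlan(stubTools, "gcid:dentist", "BC", [
  "incumbent-a.example",
  "incumbent-b.example",
]);
```

Passing the tools in as an interface keeps the orchestration testable with stubs like the ones above, independent of the paid APIs.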