Market Research & Data Co-op System Summary

Overview: This document summarizes the "Business in a Box" market research pipeline built into the Vibn platform. It allows the AI to autonomously identify target markets, scrape leads, analyze competitor technology stacks, and pull SEO/Ad spend data to generate a complete Go-To-Market (GTM) strategy for users.

1. BigQuery Database Schema (vibn_market_data)

The data foundation is a relational model hosted in Google BigQuery (Montreal region, for data residency):

  • gbp_categories: 4,000+ Google Business Profile categories (e.g., gcid:dentist).
  • software_categories: 800+ SMB-relevant software categories (e.g., dental-practice-management).
  • gbp_software_links: A junction table linking Main Street business types to the software they buy (19,000+ mapped rows).
  • market_leads: The "Data Co-op" table containing exact geospatial leads (name, address, phone, website, emails).
  • software_providers: Proprietary SaaS competitors mapped to software categories (e.g., "Curve Dental").
  • open_source_repos: MIT/Apache licensed GitHub starter kits mapped to software categories.
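
To illustrate how the junction table connects business types to software categories, here is a minimal in-memory sketch. The table names come from the list above, but every column name (`gcid`, `softwareCategoryId`, `id`, `label`) and every sample row other than the dentist examples is an assumption made for illustration, not the actual BigQuery schema.

```typescript
// Hypothetical row shapes: column names are assumptions, not the real schema.
interface SoftwareCategory {
  id: string;
  label: string;
}
interface GbpSoftwareLink {
  gcid: string;
  softwareCategoryId: string;
}

// Resolve the software categories a business type buys by walking the
// junction table, the in-memory equivalent of joining
// gbp_software_links to software_categories.
function softwareForBusinessType(
  gcid: string,
  links: GbpSoftwareLink[],
  categories: SoftwareCategory[],
): SoftwareCategory[] {
  const wanted = new Set(
    links.filter((l) => l.gcid === gcid).map((l) => l.softwareCategoryId),
  );
  return categories.filter((c) => wanted.has(c.id));
}

// Invented sample rows (only gcid:dentist and dental-practice-management
// appear in the summary above).
const sampleLinks: GbpSoftwareLink[] = [
  { gcid: "gcid:dentist", softwareCategoryId: "dental-practice-management" },
  { gcid: "gcid:dentist", softwareCategoryId: "appointment-scheduling" },
  { gcid: "gcid:plumber", softwareCategoryId: "field-service-management" },
];
const sampleCategories: SoftwareCategory[] = [
  { id: "dental-practice-management", label: "Dental Practice Management" },
  { id: "appointment-scheduling", label: "Appointment Scheduling" },
  { id: "field-service-management", label: "Field Service Management" },
];

const dentistSoftware = softwareForBusinessType(
  "gcid:dentist",
  sampleLinks,
  sampleCategories,
);
```

With these sample rows, `dentistSoftware` holds the two categories linked to `gcid:dentist`. In BigQuery the same lookup would presumably be a join across the three tables.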

2. MCP Tools Added (lib/ai/vibn-tools.ts)

market_research_run

  • Purpose: Fetches the exact Total Addressable Market (TAM) counts and extracts the raw lead data (emails, addresses, phones) for a specific category and location.
  • Data Source: DataForSEO Business Listings Live API (using the search/live endpoint).
  • Quality Control: Automatically applies strict filters (is_claimed: true and current_status <> "closed_forever") to ensure only verified, active businesses are returned.
  • Guardrails:
    • Requires explicit user permission (user_explicitly_approved: true).
    • Geospatial Caching: Queries BigQuery first, using its native geography functions (ST_DWITHIN). If leads already exist within a 20 km radius of the target coordinates, it serves them for $0.00 instead of hitting the paid API.
  • Data Co-op: Any newly fetched leads are automatically INSERTed into the BigQuery market_leads table.
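
The guardrails above can be sketched as a small decision function. This is a minimal illustration, not the code in lib/ai/vibn-tools.ts: the `Lead` and `FetchPlan` shapes and the `planLeadFetch` name are hypothetical, the haversine formula stands in for the database-side radius check, and checking approval before the cache lookup is an assumption about ordering.

```typescript
// Hypothetical shapes; field names are assumptions, not the real schema.
interface LatLng {
  lat: number;
  lng: number;
}
interface Lead {
  name: string;
  location: LatLng;
}
type FetchPlan = { source: "cache"; leads: Lead[] } | { source: "paid_api" };

const EARTH_RADIUS_M = 6371000;
const CACHE_RADIUS_M = 20000; // the 20 km radius described above

// Great-circle distance in metres (haversine formula). Used here as a
// local stand-in for the radius test the database performs.
function distanceMeters(a: LatLng, b: LatLng): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(a.lat)) *
      Math.cos(toRad(b.lat)) *
      Math.sin(dLng / 2) *
      Math.sin(dLng / 2);
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Decide whether cached leads can be served or the paid API is needed.
function planLeadFetch(
  target: LatLng,
  cachedLeads: Lead[],
  userExplicitlyApproved: boolean,
): FetchPlan {
  if (!userExplicitlyApproved) {
    throw new Error("market_research_run requires explicit user approval");
  }
  const nearby = cachedLeads.filter(
    (lead) => distanceMeters(lead.location, target) <= CACHE_RADIUS_M,
  );
  return nearby.length > 0
    ? { source: "cache", leads: nearby }
    : { source: "paid_api" };
}

// Approximate city-centre coordinates, with invented lead names.
const vancouver: LatLng = { lat: 49.2827, lng: -123.1207 };
const burnabyLead: Lead = {
  name: "Example Dental (Burnaby)",
  location: { lat: 49.2488, lng: -122.9805 },
};
const victoriaLead: Lead = {
  name: "Example Dental (Victoria)",
  location: { lat: 48.4284, lng: -123.3656 },
};
```

With Vancouver as the target, the Burnaby lead is roughly 11 km away and is served from cache, while a cache holding only the Victoria lead (about 95 km away) falls through to the paid API.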

tech_stack_analyze

  • Purpose: A free, native alternative to BuiltWith. Scans a list of URLs (up to 100) to determine what software, CMS, and tracking tools they use.
  • Intelligent Spidering: Loads the homepage, extracts high-intent links (/book, /contact), and dynamically crawls depth-2 subpages to find hidden booking widgets or portals.
  • Dynamic Competitor Injection: Reads the software_category_id, pulls all known competitors from BigQuery, and dynamically searches the target websites' source code for traces of those competitors.
  • Custom Checks: Allows the AI to pass a custom_checks array of additional strings or domains to search for on the fly.
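
The spidering and signature matching described above might look roughly like the following. This is a simplified sketch under stated assumptions: the function names, the `Signature` shape, and the substring matching are all hypothetical, a regex stands in for a real HTML parser, and the `booking.example.com` needle is an invented placeholder for a competitor or custom check.

```typescript
// Path fragments treated as "high intent". /book and /contact come from the
// description above; matching them as substrings is an assumption.
const HIGH_INTENT_PATTERNS = ["/book", "/contact"];

// Collect href values from anchor tags and keep the high-intent ones.
// A regex is a simplification; a production crawler would parse the HTML.
function extractHighIntentLinks(html: string): string[] {
  const hrefPattern = /<a\s[^>]*href=["']([^"']+)["']/gi;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1];
    const lower = href.toLowerCase();
    const isHighIntent = HIGH_INTENT_PATTERNS.some((p) => lower.includes(p));
    if (isHighIntent && found.indexOf(href) === -1) {
      found.push(href);
    }
  }
  return found;
}

// A named string to look for in page source (hypothetical shape).
interface Signature {
  name: string;
  needle: string;
}

// Return the names of all signatures whose needle appears in the source.
// Built-in checks, competitor traces, and custom_checks entries could all
// be expressed as Signature values.
function detectSignatures(html: string, signatures: Signature[]): string[] {
  const haystack = html.toLowerCase();
  return signatures
    .filter((s) => haystack.includes(s.needle.toLowerCase()))
    .map((s) => s.name);
}

const sampleHtml = `
  <link rel="stylesheet" href="/wp-content/themes/clinic/style.css">
  <a href="/about">About</a>
  <a class="cta" href="/book-online">Book now</a>
  <a href="/contact">Contact</a>
`;

const sampleSignatures: Signature[] = [
  { name: "WordPress", needle: "wp-content" },
  { name: "Hypothetical booking widget", needle: "booking.example.com" },
];

const linksToCrawl = extractHighIntentLinks(sampleHtml);
const detected = detectSignatures(sampleHtml, sampleSignatures);
```

For the sample page, `linksToCrawl` contains `/book-online` and `/contact`, and `detected` contains only `WordPress`. Each link in `linksToCrawl` would then be fetched and scanned the same way to cover the deeper pages.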

market_aggregate_insights

  • Purpose: Fetches aggregated insights for a specific market niche to uncover qualitative data before building a product.
  • Data Source: DataForSEO Categories Aggregation Live API.
  • Output: Returns a breakdown of sub-niches (e.g., Pediatric vs. Cosmetic), the number of businesses with and without websites (an indicator of technical debt), and the Top Customer Review Topics (e.g., "receptionist", "price", "wait time"). The AI uses these pain points to write the Value Proposition and positioning strategy.
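
As a rough illustration of how that output could feed positioning, the sketch below reduces an aggregated result to a website-gap ratio and a ranked list of pain points. The `MarketInsights` shape is a simplified assumption and does not reproduce the actual DataForSEO response format; all numbers are invented.

```typescript
// Hypothetical, simplified result shape; field names are assumptions.
interface MarketInsights {
  withWebsite: number;
  withoutWebsite: number;
  reviewTopics: { topic: string; mentions: number }[];
}

interface InsightSummary {
  shareWithoutWebsite: number; // 0..1
  topPainPoints: string[];
}

// Compute the share of businesses lacking a website and the most
// frequently mentioned review topics.
function summarizeInsights(
  insights: MarketInsights,
  topN: number,
): InsightSummary {
  const total = insights.withWebsite + insights.withoutWebsite;
  const shareWithoutWebsite =
    total === 0 ? 0 : insights.withoutWebsite / total;
  const topPainPoints = insights.reviewTopics
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.mentions - a.mentions)
    .slice(0, topN)
    .map((t) => t.topic);
  return { shareWithoutWebsite, topPainPoints };
}

// Invented sample figures for illustration only.
const sampleInsights: MarketInsights = {
  withWebsite: 750,
  withoutWebsite: 250,
  reviewTopics: [
    { topic: "receptionist", mentions: 120 },
    { topic: "price", mentions: 95 },
    { topic: "wait time", mentions: 180 },
    { topic: "parking", mentions: 20 },
  ],
};

const insightSummary = summarizeInsights(sampleInsights, 3);
```

Here `insightSummary.shareWithoutWebsite` is 0.25 and the top three pain points are "wait time", "receptionist", and "price", in that order.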

market_seo_analyze

  • Purpose: Analyzes a competitor's domain for SEO and Google Ads metrics.
  • Data Source: DataForSEO Labs (Domain Metrics & Ranked Keywords APIs).
  • Output: Returns estimated organic traffic, paid Google Ads traffic, estimated monthly Ad Spend (USD), and their top paid keywords.

3. The "Business in a Box" Workflow

When a founder asks to build software for a specific niche (e.g., "Dentists in BC"):

  1. TAM & Leads: The AI runs market_research_run to get the Total Addressable Market and real contact info.
  2. Competitor Teardown: The AI identifies incumbents and runs market_seo_analyze to see their Ad Spend.
  3. Wedge Discovery: The AI runs tech_stack_analyze on the leads to find technological gaps (e.g., "70% use WordPress but lack a booking widget").
  4. Plan Generation: The AI writes a business plan to the dashboard, including a financial model, compliance warnings, a wedge strategy, and cold-email scripts.
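
The wedge statistic in step 3 can be made concrete with a short calculation. This sketch assumes a hypothetical per-site result shape (`SiteScan`) and assumes the percentage is taken over all scanned sites; the actual tool output and denominator may differ.

```typescript
// Hypothetical per-site result from tech_stack_analyze; names are assumptions.
interface SiteScan {
  url: string;
  detected: string[];
}

// Share of all scanned sites that have technology `has` but lack `lacks`.
function wedgeShare(scans: SiteScan[], has: string, lacks: string): number {
  if (scans.length === 0) return 0;
  const hits = scans.filter(
    (s) => s.detected.indexOf(has) !== -1 && s.detected.indexOf(lacks) === -1,
  );
  return hits.length / scans.length;
}

// Invented sample scans for illustration only.
const sampleScans: SiteScan[] = [
  { url: "https://a.example.com", detected: ["WordPress"] },
  { url: "https://b.example.com", detected: ["WordPress", "Google Analytics"] },
  { url: "https://c.example.com", detected: ["WordPress", "Booking widget"] },
  { url: "https://d.example.com", detected: ["WordPress"] },
];

const wordpressNoBooking = wedgeShare(sampleScans, "WordPress", "Booking widget");
```

In this sample, three of the four sites run WordPress without a booking widget, so `wordpressNoBooking` is 0.75. A figure like this is what would be reported as the technological gap.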