Startup Categorization and Similarity
Data and prizes courtesy of
Algorithmic Venture Capital
- Rocketship VC makes venture capital investments based on data and algorithms
- Team from Stanford, Amazon, NASA
Startup Categorization
- As a data-driven VC, Rocketship identifies about 1M new companies every month!
- A key step is to understand the nature of each company’s business
- Problem: classify these startups into a set of 74 categories
Examples
Company_Name: Lendable
Company_Domain: lendable.co.uk
Description: Provider of peer-to-peer lending platform. The company provides lending platform that makes borrowing money easy unlike banks, who hand out loans from piles of cash they look after on behalf of savers.
Category: Alternative Lending

Company_Name: OMVeterinary
Company_Domain: omveterinary.com
Description: Provider and developer of regenerative medicine techniques. The company provides joint preservation and wound care treatment.
Category: Biotech & Pharmaceuticals
Data Available for Categorization
- Manually curated set: 5,000 companies with names, domains, descriptions and categories
- Full data set: 500k companies with names, domains and descriptions
- Rocketship will create a test set from the full set of companies to help with evaluation
Categorization Baseline
Performance of our current (simple) algorithm:
- Category prediction accuracy: 60%
- Top 3 accuracy: 78%
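The two baseline figures can be read as top-1 and top-3 accuracy over a labeled evaluation set. The sketch below shows one way to compute both, assuming a classifier returns a ranked list of categories for each company. The function name `top_k_accuracy`, the sample predictions and the placeholder label "Hypothetical Category C" are illustrative assumptions, not part of the Rocketship data or tooling.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of companies whose true category is among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled company is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked predictions (best guess first) for two companies.
ranked = [
    ["Alternative Lending", "Biotech & Pharmaceuticals", "Hypothetical Category C"],
    ["Hypothetical Category C", "Alternative Lending", "Biotech & Pharmaceuticals"],
]
truth = ["Alternative Lending", "Biotech & Pharmaceuticals"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first company is a top-1 hit
top3 = top_k_accuracy(ranked, truth, k=3)  # both true labels appear in the top 3
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Reporting both numbers on the same evaluation set makes a new model directly comparable to the 60% and 78% baseline.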
Startup Similarity
Given two companies, how similar are they in terms of their customer value proposition?
- Direct Competitors (high similarity)
- Indirect Competitors (medium similarity)
- Unrelated (no similarity)
Use case: given a target company, find the K most similar companies
Ignore geography
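As a starting point for the "find the K most similar companies" use case, the sketch below ranks companies by TF-IDF cosine similarity between their descriptions, using only the Python standard library. This is an assumed lexical baseline, not Rocketship's method: word overlap is only a rough proxy for customer value proposition. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the smoothed IDF formula are choices made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(descriptions: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build one sparse TF-IDF vector per company (smoothed IDF, an assumption)."""
    counts = {name: Counter(tokenize(text)) for name, text in descriptions.items()}
    n_docs = len(counts)
    doc_freq: Counter = Counter()
    for token_counts in counts.values():
        doc_freq.update(token_counts.keys())
    return {
        name: {
            token: tf * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0)
            for token, tf in token_counts.items()
        }
        for name, token_counts in counts.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    target: str, descriptions: Dict[str, str], k: int
) -> List[Tuple[str, float]]:
    """Return the k companies most similar to the target, best first."""
    vectors = tfidf_vectors(descriptions)
    scores = [
        (name, cosine(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Descriptions taken from the example slide in this deck.
descriptions = {
    "Zillow": "Zillow is an online real estate marketplace for finding and "
    "sharing information about homes, real estate, and mortgages.",
    "Redfin": "Redfin provides real estate search and brokerage services through "
    "a combination of real estate web platforms and access to live agents.",
    "Openlistings": "Open Listings helps anyone buy a home without an individual "
    "real estate agent. Buyers use their self-service platform to manage the "
    "process online",
    "Moovo": "Movvo is a Behavioural Analytics and Location Based Marketing "
    "Company, with a focus on retail and Retail Real Estates",
}

results = most_similar("Zillow", descriptions, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because the scores depend on shared vocabulary, common words such as "and" or "real" can inflate the similarity of unrelated companies. The manually rated pairs are the natural way to check how far a baseline like this is from the Direct / Indirect / Irrelevant labels.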
Example
Name: Zillow
Domain: Zillow.com
Description: Zillow is an online real estate marketplace for finding and sharing information about homes, real estate, and mortgages.
Competitor Level: Target company

Name: Redfin
Domain: Redfin.com
Description: Redfin provides real estate search and brokerage services through a combination of real estate web platforms and access to live agents.
Competitor Level: Direct Competitor

Name: Openlistings
Domain: Openlistings.com
Description: Open Listings helps anyone buy a home without an individual real estate agent. Buyers use their self-service platform to manage the process online
Competitor Level: Indirect Competitor

Name: Moovo
Domain: Moovo.com
Description: Movvo is a Behavioural Analytics and Location Based Marketing Company, with a focus on retail and Retail Real Estates
Competitor Level: Irrelevant
Training and Test Data
Company data
- 500k companies with names, domains and descriptions
- 77k pairs of companies with some level of similarity (e.g., from Crunchbase)
Manually rated set
- 5,000 company pairs with names, domains and descriptions, rated as Direct Competitors, Indirect Competitors or Irrelevant
For more info on the dataset
Email me!
Anand Rajaraman
anand@cs.stanford.edu