Search Market and Technologies
Sudong Chung
Yahoo! Korea
October 8, 2004
Table of Contents
  Search Advertisement Market
    1. How do search portals make money?
    2. Who are the key players?
    3. What are the technologies?
    4. What are the challenges?
  Web Search Engines
    1. Who are the key players?
    2. What are the technologies?
    3. What are the challenges?
Revenue Sources of Portals
  Advertisement Revenue
    Paid listings
    Graphic ads (banner)
  User Fees
    E-mail, personal storage, etc.
  Other Services
    Games, job postings, etc.
Advertisement Market Size
  Estimated advertisement market size for 2004
    Overall US advertisement market (including offline): $476.7 billion
    Online advertisement market: $14.2 billion
      Paid listing advertisement market: $5.1 billion
      Graphical ads: $9.1 billion
Paid Listing Products
  Paid Inclusion
    Pay to place listings in the search database
    (ex) Directory inclusion, web (page) index inclusion
  Search Keyword Marketing
    Pay to place listings among the search results
    Database managed separately from the directory and web index
    (ex) Explicit or implicit user search results
Paid Search Key Players
  Overture (a Yahoo company)
    Paid search on web search, local search, product search, travel search
    Paid search on content (news articles, etc.)
  Google
    Paid search on web search, local search, product search
    Paid search on content (e-mail, news articles, etc.)
  FindWhat
  LookSmart
Key Players' Market Capitalization
  Estimated from performance between July 2003 and June 2004
  [Chart: market capitalization and listing service revenue, in millions of dollars]
Areas of Paid Search Market
  Web Search (web page, product, travel, etc.)
    Overture, Google, FindWhat, LookSmart
  Local Search (phone directory, local interests)
    Overture, Google, Verizon+FindWhat
  Content Match (news sites, community sites)
    Overture (Content Match), Google (AdSense), Quigo (AdSonar)
  Contextual Match (e-mail)
    Google
  Behavioral Targeting
    Tacoda Systems, Claria, Revenue Science, Kanoodle
Paid Search Technologies
  Keyword Index
    Punctuation symbol / noise word removal
    Keyword normalization
  Keyword Match/Search
    Exact: the whole query must match the ad keyword (or phrase)
    Phrase: a sub-phrase of the query must match the ad keyword (or phrase)
    Broad: the query contains the words of the ad keyword (or phrase)
    Expanded*: the query is relevant to the ad keyword
      * only available in Google
  Keyword Extraction
    Come up with keywords relevant to the content (at run time)
    Generate relevant ad keywords from behavioral studies
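The keyword index steps above (punctuation removal, noise word removal, normalization) can be sketched roughly as follows. This is only an illustration: the noise-word list and the function names `normalize_keyword` and `build_keyword_index` are assumptions made for the example, not part of any vendor's system.

```python
import re

# Hypothetical noise-word list; a real system would use a curated,
# language-specific list.
NOISE_WORDS = {"a", "an", "the", "of", "in", "for", "to", "and"}


def normalize_keyword(text):
    """Lowercase, replace punctuation with spaces, drop noise words."""
    lowered = text.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    tokens = [t for t in no_punct.split() if t not in NOISE_WORDS]
    return " ".join(tokens)


def build_keyword_index(ads):
    """Map each normalized ad keyword to the ad ids that bid on it.

    `ads` is an iterable of (ad_id, keyword) pairs.
    """
    index = {}
    for ad_id, keyword in ads:
        index.setdefault(normalize_keyword(keyword), []).append(ad_id)
    return index
```

With this sketch, "Hawaii Travel" and "hawaii-travel" end up under the same index key, so a single lookup on the normalized query finds both ads.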
Keyword Match Types
  Ad Keyword: Hawaii Travel
    Hawaii tour, hawaii flight    <expanded match>
    Travel waikiki hawaii         <broad match>
    Maui hawaii travel            <phrase match>
    Hawaii travel                 <exact match>
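The exact, phrase, and broad match types can be expressed as simple token comparisons. The sketch below assumes that "broad" means every word of the ad keyword appears somewhere in the query, in any order; that reading, and the function names, are assumptions. Expanded match needs a relevance model and is deliberately left out, so such queries return `None` here.

```python
def tokens(text):
    """Lowercase and split on whitespace, treating commas as separators."""
    return text.lower().replace(",", " ").split()


def contains_phrase(query_tokens, keyword_tokens):
    """True if keyword_tokens occurs as a contiguous run in query_tokens."""
    n = len(keyword_tokens)
    return any(
        query_tokens[i:i + n] == keyword_tokens
        for i in range(len(query_tokens) - n + 1)
    )


def match_type(query, ad_keyword):
    """Return the strictest match type that applies, or None."""
    q, k = tokens(query), tokens(ad_keyword)
    if q == k:
        return "exact"
    if contains_phrase(q, k):
        return "phrase"
    if set(k) <= set(q):
        return "broad"
    return None  # expanded match would require a relevance model
```

Checking the strictest type first means each query is labeled with the narrowest category it satisfies, which mirrors how the four example queries are labeled on the slide.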
Content Match Technology
  Latent Semantic Indexing
    http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm
  Multi-Dimensional Scaling
  Concept Network
Challenges
  Relevancy issues in phrase, broad, and expanded matches
    Identifying meaningful units of multiple words
      (ex) New York, Sony Theater, Tokyo University
    Homographs: the same word with multiple meanings
      (ex) nail, blind
    Mood detection
  Computing complexity
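One common way to find multi-word units such as "New York" is to score adjacent word pairs by pointwise mutual information (PMI) over a query log. The slides do not say which method is used, so this is a swapped-in illustration; the function name `bigram_pmi` and the `min_count` threshold are assumptions.

```python
import math
from collections import Counter


def bigram_pmi(queries, min_count=2):
    """Score adjacent word pairs by PMI; high scores suggest a unit."""
    unigrams = Counter()
    bigrams = Counter()
    for q in queries:
        toks = q.lower().split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (a, b), n in bigrams.items():
        if n < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_ab = n / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores
```

A pair that scores well above zero co-occurs more often than its individual word frequencies would predict, which is a hint that it should be matched as one unit rather than as two independent words.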
Web Search Engines (robot-based)
  Yahoo (Inktomi, AltaVista, AlltheWeb)
  Google
  LookSmart (WiseNut)
  AskJeeves/Teoma
  Vivisimo
Search Market Share (per web site)
  [Chart] Source: comScore Media Metrix (May 2004)
Search Market Share (per provider)
  [Chart] Source: comScore Media Metrix (May 2004)
Key Metrics
  R : Relevancy
    Are the search results relevant to the user query?
  C : Comprehensiveness
    Are the web documents that are likely to be searched crawled and indexed?
  F : Freshness
    Is the index fresh enough to be relevant when users visit the result?
  P : Presentation
    Is the search result page presented well enough for users to find what they are looking for?
Modern Search Engine Requirements
  Index over 4 billion documents
  Refresh the whole index in less than 3 months
  Refresh some part of the index every day
  Serve over 10,000 searches per second
  Search 4 billion documents in less than half a second
  Handle spam, aliases, duplicate documents, dead links
  Handle character encodings and languages
  Relevancy
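A quick back-of-the-envelope calculation shows what these requirements imply. The sketch below treats "3 months" as 90 days, which is an assumption; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_crawl_rate(num_docs, refresh_days):
    """Average pages per second needed to refetch every document once."""
    return num_docs / (refresh_days * SECONDS_PER_DAY)


def queries_per_day(queries_per_second):
    """Daily query volume if the per-second rate were sustained all day."""
    return queries_per_second * SECONDS_PER_DAY


rate = required_crawl_rate(4_000_000_000, 90)
print(f"crawl rate: about {rate:.0f} pages/sec")
print(f"queries/day at 10,000 qps: {queries_per_day(10_000):,}")
```

Refetching 4 billion documents in 90 days works out to a little over 500 pages per second on average, before counting the portion of the index that is refreshed daily, so the sustained crawl rate has to be higher than that baseline.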
Challenges
  Find documents with good content
    Not only for the text information but also for the linkage information
  Find the sweet spot between relevancy and computational complexity
  Spam detection
  Refresh scheme
  System/network management
Modern Search Engine Architecture
  Huge crawler (~1000 pages/sec on average)
  Incremental indexing
  Distributed Computing (Search) Operation
    Optimized disk access
    Solid fail-over mechanism
  Run-time ranking calculation
    Proximity, page importance, etc.
  Tiered architecture
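Distributed search is often organized as scatter-gather: each index shard computes its own top results at run time, and a front end merges them. The sketch below is a toy version under stated assumptions. The shard layout, the `importance` field, and the scoring formula (matched terms times page importance) are hypothetical stand-ins; a real ranker would also use proximity and many other signals.

```python
import heapq


def search_shard(shard, query_terms, k):
    """Score documents in one shard and return its local top-k.

    `shard` maps doc_id -> {"terms": set of words, "importance": float}.
    """
    scored = []
    for doc_id, doc in shard.items():
        hits = sum(1 for t in query_terms if t in doc["terms"])
        if hits == 0:
            continue
        # Toy run-time score: matched terms weighted by page importance.
        scored.append((hits * doc["importance"], doc_id))
    return heapq.nlargest(k, scored)


def distributed_search(shards, query, k=3):
    """Query every shard, then merge the partial results into a global top-k."""
    query_terms = query.lower().split()
    partials = [search_shard(s, query_terms, k) for s in shards]
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _score, doc_id in merged]
```

Because each shard returns only its own top-k, the merge step handles at most k results per shard regardless of index size, which is what keeps the final ranking step cheap. In this sketch the shards are queried sequentially; a real deployment would query them in parallel.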
New Technologies
  Allow users to look through more results easily
    Categorization
    Classification
  Help users refine their search queries
    Related searches
  Personalization
    Take user preferences on search results as input and apply them to later search results
  Sharing personalized information
    It is likely that other people have already found what you are looking for.
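Related searches can be approximated by counting which queries appear together in the same user session. The slides do not describe a specific method, so this is only one plausible sketch; the session format and the function names are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_related(sessions):
    """Count, for each query, the other queries seen in the same session."""
    related = defaultdict(Counter)
    for session in sessions:
        unique = {q.lower().strip() for q in session}
        for a, b in permutations(unique, 2):
            related[a][b] += 1
    return related


def related_searches(related, query, n=3):
    """Return up to n queries that most often co-occur with `query`."""
    counts = related.get(query.lower().strip(), Counter())
    return [q for q, _count in counts.most_common(n)]
```

The same co-occurrence table is also a simple form of sharing personalized information: suggestions for one user come from what other users searched for in similar sessions.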