Online Expansion of Rare Queries for Sponsored Search Defended by Mykell Miller.

Summary: The Short Version
This paper describes and evaluates a method for deciding which ads to display on a search engine result page. Users enter highly varied queries, so it is beneficial to show ads related not only to the query itself but also to related queries. However, previous methods for finding these related queries and turning them into ads take a long time, so they are run offline. This paper describes a method that lets some of this work be done on the fly without too much overhead.

Why it’s good: The Short Version
- Useful
  - Ads fund search engines.
  - If ads were more relevant, Jared might actually click on them.
  - The method shows a statistically significant improvement in ad relevance, at a low overhead.
- Interesting
  - Interestingness is subjective, but this is MY defense.
- Well-written
  - Well-organized.
  - I could actually understand the math because they very clearly told me what all the variables meant.
  - They defined all the relevant terms and summarized all the references, so I didn’t have to read 32 other papers.
- Time Travel
  - This paper is only three weeks old.
  - A paper that was published in April cited it.

Now for the long version…

What this paper is about
Broad matching: an ad is displayed when its bid phrase is similar to, but not exactly the same as, the query the user entered.
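To make the exact-versus-broad distinction concrete, here is a toy sketch. It scores similarity with plain word overlap (Jaccard similarity) and a hypothetical 0.5 threshold. Both are assumptions for illustration only; the paper instead matches through expanded feature vectors.

```python
from __future__ import annotations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap |a & b| / |a | b| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_type(query: str, bid_phrase: str, threshold: float = 0.5) -> str:
    """Classify a query/bid-phrase pair as 'exact', 'broad', or 'none'.

    The Jaccard measure and the 0.5 threshold are illustrative assumptions.
    """
    q, b = normalize(query), normalize(bid_phrase)
    if q == b:
        return "exact"
    if jaccard(set(q.split()), set(b.split())) >= threshold:
        return "broad"
    return "none"


print(match_type("banana bread", "Banana  Bread"))        # exact
print(match_type("vegan banana bread", "banana bread"))   # broad: 2 of 3 words shared
print(match_type("vegan banana bread", "car insurance"))  # none
```

A real broad-match system has to handle cases where the query and bid phrase share no words at all. Word overlap cannot do that, which is one reason the paper expands queries with related queries and semantic classes.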

What this paper is about
- Sponsored Search
  - A.K.A. paid search advertising
  - On search engine result pages
  - All major web search engines do this
- Context Match
  - A.K.A. contextual advertising
  - On other websites
  - What we looked at last Wednesday

More on Sponsored Search
- The authors assume a pay-per-click model
  - Google, Yahoo, and Microsoft all use this model
- Bid phrases
  - A bid phrase is the query that will cause this ad to be shown.
- Bidding system
  - An advertiser offers the search company whatever amount it wants to associate its ad with a bid phrase.
  - If an advertiser pays more, its ad gets a higher ranking.
- Example:
  - High Bidders pays $1,000,000,000,000,000,000,000 for the bid phrase “Dummy Query”.
  - Low Bidders pays $1 for the bid phrase “Dummy Query”.
  - When I search for “Dummy Query”, I see High Bidders’ ad first, then Low Bidders’ ad.
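The simplified bidding rule on this slide (exact bid-phrase match, higher bid ranks higher) can be sketched as follows. The `Ad` class and `rank_ads` function are hypothetical names. Real ad auctions typically also factor in ad quality or predicted click-through rate, which this sketch deliberately leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid_phrase: str
    bid: float  # amount offered per click (pay-per-click model)


def rank_ads(query: str, ads: list[Ad]) -> list[Ad]:
    """Return ads whose bid phrase exactly matches the query, highest bid first.

    Ranking purely by bid follows the slide's simplified model only.
    """
    q = " ".join(query.lower().split())
    matching = [ad for ad in ads if " ".join(ad.bid_phrase.lower().split()) == q]
    return sorted(matching, key=lambda ad: ad.bid, reverse=True)


ads = [
    Ad("Low Bidders", "Dummy Query", 1.0),
    Ad("High Bidders", "Dummy Query", 1e21),  # $1,000,000,000,000,000,000,000
    Ad("Someone Else", "Other Query", 50.0),
]
print([ad.advertiser for ad in rank_ads("dummy query", ads)])
# ['High Bidders', 'Low Bidders']
```

Note that a query like “Dummy Query recipe” would return no ads at all under exact matching. That gap is exactly what broad matching is meant to fill.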

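More on Sponsored Search
System hierarchy (each level can hold several items of the level below it):
- An advertiser
  - An account (plus more accounts)
    - An ad campaign (plus more ad campaigns)
      - An ad group (plus more ad groups)
        - Creative
        - Bid phrases
- Other advertisers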
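The hierarchy above maps naturally onto nested records. The sketch below is a hypothetical model (all class and field names are assumptions, not taken from the paper or any Yahoo API). It shows that bid phrases live at the ad-group level, which is why the paper extracts ad features per ad group.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AdGroup:
    creatives: list[str]    # ad text shown to users
    bid_phrases: list[str]  # phrases that can trigger this ad group


@dataclass
class Campaign:
    name: str
    ad_groups: list[AdGroup] = field(default_factory=list)


@dataclass
class Account:
    name: str
    campaigns: list[Campaign] = field(default_factory=list)


@dataclass
class Advertiser:
    name: str
    accounts: list[Account] = field(default_factory=list)


def all_bid_phrases(adv: Advertiser) -> list[str]:
    """Walk advertiser -> account -> campaign -> ad group and collect every bid phrase."""
    return [
        phrase
        for account in adv.accounts
        for campaign in account.campaigns
        for group in campaign.ad_groups
        for phrase in group.bid_phrases
    ]


bakery = Advertiser("Bakery", [
    Account("Main", [
        Campaign("Bread", [
            AdGroup(["Fresh banana bread!"], ["banana bread", "vegan banana bread"]),
        ]),
    ]),
])
print(all_bid_phrases(bakery))  # ['banana bread', 'vegan banana bread']
```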

Why Do This Paper?
- 30-40% of search engine result pages have no ads on them because Google, Yahoo, etc. don’t know which queries are similar to the bid phrases.
- Previous work has developed systems that are far too inefficient to use in real life.

My Own Experiment
[Search result pages for the queries “Banana Bread”, “Nut-Free Banana Bread”, and “Vegan Banana Bread”]

Why do tail queries have so few ads?
- They are often harder to interpret than more common (head and torso) queries.
- There are rarely exact matches with bid phrases.
- There is little historical click data.
- Search engines don’t like showing irrelevant ads.

What does this paper accomplish?
- Online query expansion for tail queries
- A new way to index query expansions for fast computation of query similarity
- A way to go from pre-expanded queries to expanding related queries on the fly
- A ranking and scoring method

The Architecture of their system

Query Feature Extraction
- Unigrams, processed via:
  - Stemming: reducing words like “Extraction” and “Extracting” to the stem “Extract”
  - Stop words: ignoring very common words that carry little meaning
- Phrases
  - Multi-word phrases come from a dictionary of ~10 million phrases gathered from query logs and web pages
- Semantic classes
  - The authors developed a hierarchical taxonomy of 6000 semantic classes
  - Each query is annotated with its 5 most likely semantic classes
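A toy version of the three feature types might look like the sketch below. Everything concrete here is a stand-in: the stop-word list, the two-entry phrase dictionary, the word-to-class lookup table, and the crude suffix-stripping stemmer are all hypothetical. The paper uses a ~10 million phrase dictionary and a classifier over 6000 semantic classes, and a real system would use a proper stemmer such as Porter's.

```python
from __future__ import annotations

from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}  # tiny illustrative list
PHRASES = {"banana bread", "sponsored search"}  # stand-in for the ~10M-phrase dictionary
SEMANTIC_CLASSES = {  # hypothetical word -> class table, not the paper's classifier
    "banana": "Food/Baking",
    "bread": "Food/Baking",
    "vegan": "Food/Diets",
}


def stem(word: str) -> str:
    """Crude suffix stripper (assumption): keeps at least 3 characters of the word."""
    for suffix in ("ion", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(query: str, top_k: int = 5) -> dict[str, Counter]:
    """Return unigram, phrase, and semantic-class feature counts for a query."""
    words = query.lower().split()
    content = [w for w in words if w not in STOP_WORDS]
    unigrams = Counter(stem(w) for w in content)
    # Phrases: adjacent word pairs found in the dictionary (bigrams only, for brevity).
    phrases = Counter(
        f"{a} {b}" for a, b in zip(words, words[1:]) if f"{a} {b}" in PHRASES
    )
    # Semantic classes: keep the top_k most frequent classes (the paper keeps 5).
    classes = Counter(SEMANTIC_CLASSES[w] for w in content if w in SEMANTIC_CLASSES)
    classes = Counter(dict(classes.most_common(top_k)))
    return {"unigram": unigrams, "phrase": phrases, "class": classes}


print(stem("extraction"), stem("extracting"))  # extract extract
print(extract_features("Vegan Banana Bread"))
```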

Related Query Retrieval
- We now have a pseudo-query made up of features.
- Compare this pseudo-query to our inverted index and pull out related pseudo-queries.
- The system looks up the pseudo-query’s key features in the index, then calculates similarity using a dot product.
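The retrieval step can be sketched as a standard inverted index with dot-product scoring. The `QueryIndex` class and its feature weights are hypothetical. The key point it illustrates is that scoring only touches the posting lists of features present in the incoming pseudo-query, not every indexed query.

```python
from __future__ import annotations

from collections import defaultdict


class QueryIndex:
    """Inverted index: feature -> list of (query_id, weight) over indexed pseudo-queries."""

    def __init__(self) -> None:
        self.postings: dict[str, list[tuple[str, float]]] = defaultdict(list)

    def add(self, query_id: str, features: dict[str, float]) -> None:
        for feat, weight in features.items():
            self.postings[feat].append((query_id, weight))

    def related(self, features: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
        """Top-k indexed queries by dot product with the given pseudo-query."""
        scores: dict[str, float] = defaultdict(float)
        for feat, q_weight in features.items():
            # .get avoids inserting empty posting lists into the defaultdict
            for query_id, d_weight in self.postings.get(feat, ()):
                scores[query_id] += q_weight * d_weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


idx = QueryIndex()
idx.add("banana bread recipe", {"banana": 1.0, "bread": 1.0, "recipe": 1.0})
idx.add("vegan cake", {"vegan": 1.0, "cake": 1.0})
idx.add("car insurance", {"car": 1.0, "insurance": 1.0})
print(idx.related({"vegan": 1.0, "banana": 1.0, "bread": 1.0}))
# [('banana bread recipe', 2.0), ('vegan cake', 1.0)]
```

“car insurance” never gets scored because it shares no features with the query, which is what keeps this lookup cheap enough to run online.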

Query Expansion
- Q* is the set of features describing the original query and its related queries.
- The weight of a given feature in Q* is a linear combination of its weights in the original and related queries.
- This expansion is efficient because you’re only looking at the features in related queries.
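One plausible form of that linear combination is sketched below: a mixing weight `lam` on the original query, plus a similarity-weighted average of the related queries’ feature vectors. The exact formula, the name `lam`, and its default of 0.5 are assumptions for illustration; the slide only says the weights are combined linearly.

```python
from __future__ import annotations

from collections import defaultdict


def expand_query(
    original: dict[str, float],
    related: list[tuple[dict[str, float], float]],
    lam: float = 0.5,
) -> dict[str, float]:
    """Build Q* = lam * original + (1 - lam) * similarity-weighted mean of related queries.

    `related` holds (feature_vector, similarity) pairs from the retrieval step.
    Only features appearing in the original or related queries are touched.
    """
    expanded: dict[str, float] = defaultdict(float)
    for feat, weight in original.items():
        expanded[feat] += lam * weight
    total_sim = sum(sim for _, sim in related)
    if total_sim > 0:
        for features, sim in related:
            for feat, weight in features.items():
                expanded[feat] += (1 - lam) * (sim / total_sim) * weight
    return dict(expanded)


q_star = expand_query(
    {"vegan": 1.0, "banana": 1.0, "bread": 1.0},
    [({"banana": 1.0, "bread": 1.0, "recipe": 1.0}, 2.0),
     ({"vegan": 1.0, "cake": 1.0}, 1.0)],
)
print({f: round(w, 3) for f, w in q_star.items()})
# {'vegan': 0.667, 'banana': 0.833, 'bread': 0.833, 'recipe': 0.333, 'cake': 0.167}
```

The expanded query picks up “recipe” and “cake”, features the rare original query never mentioned, which gives it more chances to match an ad.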

Ad Feature Weighting
- Extract the same features from the bid phrases of ad groups as from queries (unigrams, phrases, semantic classes).
- Since the weighting used for queries would unfairly benefit short ad groups, ad features are weighted with the BM25 scheme instead.
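For reference, here is a standard Okapi BM25 term weight, treating an ad group’s bid phrases as the “document”. It combines how often a feature occurs in the ad group (with diminishing returns set by `k1`), how rare the feature is across ad groups (idf), and the ad group’s length relative to the average (set by `b`). The defaults `k1 = 1.2`, `b = 0.75` and the “+1 inside the log” idf variant are common textbook choices, not values reported in the paper.

```python
import math


def bm25_weight(tf: int, df: int, n_docs: int, doc_len: int, avg_len: float,
                k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 weight of one feature in one ad group.

    tf: occurrences of the feature in this ad group's bid phrases
    df: number of ad groups containing the feature
    n_docs: total number of ad groups
    doc_len / avg_len: this ad group's length vs. the average length
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # never negative
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Repeating a feature helps, but saturates below idf * (k1 + 1).
print(bm25_weight(1, 10, 1000, 10, 10.0), bm25_weight(5, 10, 1000, 10, 10.0))
```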

Title Match Boosting
- Increases the score of ads whose titles match the original query very well
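The slide does not say how the title match is measured. A simple hypothetical stand-in is the fraction of the original query’s words that appear in the ad title, as sketched below. Note that it uses the original query, not the expanded Q*, which matches the slide’s wording.

```python
def title_match_boost(query: str, title: str) -> float:
    """Hypothetical boost: fraction of the original query's words found in the ad title."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    title_words = set(title.lower().split())
    return sum(1 for w in query_words if w in title_words) / len(query_words)


print(title_match_boost("vegan banana bread", "Easy Vegan Banana Bread"))  # 1.0
print(title_match_boost("vegan banana bread", "Banana Bread Mix"))         # 2 of 3 words
```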

Scoring Function
- The end result of all this
- A weighted sum of the dot products between query and ad features, plus the title match boost
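A minimal sketch of such a scoring function, assuming one dot product per feature type (unigrams, phrases, semantic classes), each scaled by a per-type weight, with the title boost added on top. The additive combination and every weight value here are illustrative assumptions, not the paper’s tuned parameters.

```python
from __future__ import annotations


def dot(u: dict[str, float], v: dict[str, float]) -> float:
    """Sparse dot product; iterates over the smaller vector."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(f, 0.0) for f, w in u.items())


def score_ad(
    query_vecs: dict[str, dict[str, float]],
    ad_vecs: dict[str, dict[str, float]],
    type_weights: dict[str, float],
    title_boost: float,
    boost_weight: float = 1.0,
) -> float:
    """Weighted sum of per-feature-type dot products plus a weighted title boost."""
    total = sum(
        alpha * dot(query_vecs.get(ftype, {}), ad_vecs.get(ftype, {}))
        for ftype, alpha in type_weights.items()
    )
    return total + boost_weight * title_boost


query = {"unigram": {"banana": 0.8, "bread": 0.8}, "class": {"Food/Baking": 1.0}}
ad = {"unigram": {"banana": 0.5, "bread": 0.5, "mix": 0.5}, "class": {"Food/Baking": 0.9}}
weights = {"unigram": 1.0, "phrase": 1.0, "class": 0.5}
# unigram: 0.8, phrase: 0 (no phrase features), class: 0.5 * 0.9 = 0.45, boost: 0.5
print(score_ad(query, ad, weights, title_boost=0.5))  # about 1.75
```

Ads would then be ranked by this score, and only the top few (the evaluation looks at the top 3) are shown.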

Now on to the results!

Test Set
- Test set: 400 random rare queries from Yahoo
  - 121 were in the lookup table, 279 were not
  - Eliminated the 10% of rare queries that were foreign
- Human editors judged the top 3 ads: 3,556 judgments
- The system was built from every ad Yahoo has and 100 million queries from U.S. Yahoo

Metrics
- Discounted Cumulative Gain (DCG)
  - “a measure of effectiveness of a Web search engine algorithm or related applications, often used in information retrieval. Using a graded relevance scale of documents in a search engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated cumulatively from the top of the result list to the bottom with the gain of each result discounted at lower ranks.” –Wikipedia
  - DCG is a number; higher numbers are better
- Precision-Recall Curves
  - Precision: fraction of returned results that are relevant
  - Recall: fraction of relevant results that are returned
  - A way to visualize the trade-off between the two; higher is better
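Both metrics are short to compute. The sketch below uses one common DCG formulation, where the gain at rank i is discounted by log2(i + 1). Some papers instead use a gain of 2^grade - 1, and the slide does not say which grading scale or variant the authors used. The precision-recall curve is built by computing precision and recall at each cutoff of the ranked list.

```python
from __future__ import annotations

import math


def dcg_at_k(grades: list[float], k: int) -> float:
    """DCG@k = sum over ranks i = 1..k of grade_i / log2(i + 1)."""
    # enumerate starts at 0, so rank = i + 1 and the discount is log2(i + 2)
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / returned; recall = hits / relevant."""
    hits = sum(1 for r in returned if r in relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(returned: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    """(precision, recall) after each cutoff k = 1..len(returned)."""
    return [precision_recall(returned[:k], relevant) for k in range(1, len(returned) + 1)]


print(dcg_at_k([3, 2, 1], 3))  # 3 + 2/log2(3) + 1/2
print(pr_curve(["ad1", "ad2", "ad3", "ad4"], {"ad1", "ad3", "ad9"}))
```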

Ad Matching Algorithms Tested
- Baseline: the original, unexpanded version of the query vector
- Offline Expansion: expands the original query by pre-processing offline only
- Online Expansion: expands the original query by processing online only
- Online + Offline Expansion: expands the original query using both the offline and online expansion algorithms

Test Results: Queries not found in lookup table
- Tested the baseline vs. online expansion
- Online expansion gave statistically significant improvements

Test Results: Queries found in lookup table
- Tested all 4 algorithms
- Best: offline expansion
- Second best: online + offline expansion
- The difference between the two was not statistically significant

Test Results: Full set
- Tested all four algorithms
- Best: online + offline expansion
- Online expansion also offers a statistically significant improvement
- Even better: a hybrid approach

Efficiency
- The table lookup takes only 1 ms.
- The system is least efficient when a query is not in the lookup table.
  - In that case there is a 50% overhead. This is bad.
  - But given the small proportion of queries not in the lookup table, the estimated average overhead is 12.5%. This is good.
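The average overhead is a weighted mix of the miss case and the hit case. If we assume queries found in the lookup table add negligible overhead (an assumption; the slide only gives the 1 ms lookup time), then the slide’s two numbers imply that roughly a quarter of queries miss the table: 12.5% / 50% = 25%. The function names below are hypothetical.

```python
def expected_overhead(miss_rate: float, miss_overhead: float = 0.50,
                      hit_overhead: float = 0.0) -> float:
    """Average overhead = miss_rate * miss_overhead + (1 - miss_rate) * hit_overhead."""
    return miss_rate * miss_overhead + (1 - miss_rate) * hit_overhead


def implied_miss_rate(avg_overhead: float, miss_overhead: float = 0.50) -> float:
    """Invert the formula above, assuming hits cost nothing extra."""
    return avg_overhead / miss_overhead


print(implied_miss_rate(0.125))   # 0.25
print(expected_overhead(0.25))    # 0.125
```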