Contextual Advertising by Combining Relevance with Click Feedback
D. Chakrabarti, D. Agarwal, V. Josifovski

Presentation transcript:

Contextual Advertising by Combining Relevance with Click Feedback. D. Chakrabarti, D. Agarwal, V. Josifovski

Motivation: Match ads to queries. In Sponsored Search, the query is a short piece of text input by the user; in Content Match, the query is a webpage on which ads can be displayed.

Motivation: Relevance-based matching (1) uses IR measures of match, such as cosine similarity and BM25, (2) uses domain knowledge, and (3) gives a score. Click-based matching (1) uses ML methods, such as Maximum Entropy, to learn a good matching function, (2) uses existing data, so it improves over time, and (3) typically gives a probability of click.

Motivation: Relevance-based matching also (4) has very low training cost, with at most one or two parameters that can be set by cross-validation, and (5) needs only simple computations at testing time, using the Weighted AND (WAND) algorithm. Click-based matching (4) has complicated training (scalability concerns, extremely imbalanced class sizes, problems interpreting non-clicks, and sampling methods that heavily affect accuracy) and (5) requires all features to be computed at test time, making good feature engineering critical.

Motivation: In short, relevance-based matching uses domain knowledge, has very low training cost, and needs only simple computations at testing time; click-based matching uses existing data, so it improves over time, but training is complicated and there are efficiency concerns during testing. We want to combine the two, keeping the benefits of both while controlling the costs.

Motivation: We want a system that computes matches over all ads (millions of them), not a re-ranking of results filtered by some other matching algorithm. Training can be done offline and should be parallelizable for scalability. Testing must be as fast and scalable as WAND and must give accurate results.

Outline Motivation WAND Background Proposed Method Experiments Conclusions

WAND Background: Example: the query is {Red, Ball}, with one posting list per word (Red: Ad 1, Ad 5, Ad 8; Ball: Ad 7, Ad 8, Ad 9). A cursor moves over each posting list and can skip ahead; here the candidate result is Ad 8, the first ad on both lists. More generally, query terms are weighted, and upper bounds on the score are computed to decide when cursors can skip.

WAND Background: Efficiency comes from cursor skipping, so it must be possible to compute score upper bounds quickly. In particular, the match scoring formula should not use features of the form “word X in query AND word Y in ad” for distinct words X and Y: such pairwise (“cross-product”) checks become very costly.
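To make the cursor-skipping idea concrete, here is a minimal Python sketch of a document-at-a-time WAND loop. It assumes sorted posting lists of (ad_id, weight) pairs and precomputed per-term score upper bounds; the function names and the score callback are illustrative, not the paper's actual serving code.

```python
# Minimal sketch of WAND-style cursor skipping (illustrative, not the production system).
import heapq

def wand_top_k(query_terms, postings, upper_bound, k, score):
    """query_terms: list of terms; postings: term -> sorted list of (ad_id, weight);
    upper_bound: term -> maximum possible score contribution of that term;
    score: function(dict term -> weight) -> float. Returns top-k (score, ad_id)."""
    cursors = {t: 0 for t in query_terms}   # current position in each posting list
    heap, threshold = [], 0.0               # min-heap of the current top-k scores

    def current_doc(t):
        i = cursors[t]
        return postings[t][i][0] if i < len(postings[t]) else None

    while True:
        live = [t for t in query_terms if current_doc(t) is not None]
        if not live:
            break
        live.sort(key=current_doc)
        # Pivot: first term at which accumulated upper bounds exceed the threshold.
        acc, pivot = 0.0, None
        for t in live:
            acc += upper_bound[t]
            if acc > threshold:
                pivot = t
                break
        if pivot is None:
            break                            # no remaining ad can beat the threshold
        pivot_doc = current_doc(pivot)
        if current_doc(live[0]) == pivot_doc:
            # All earlier cursors already sit on the pivot ad: score it fully.
            weights = {t: postings[t][cursors[t]][1]
                       for t in live if current_doc(t) == pivot_doc}
            s = score(weights)
            if len(heap) < k:
                heapq.heappush(heap, (s, pivot_doc))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0]
            for t in live:                   # advance all cursors on the pivot ad
                if current_doc(t) == pivot_doc:
                    cursors[t] += 1
        else:
            # Skip: move the earliest cursor directly past all ads before the pivot.
            t = live[0]
            while cursors[t] < len(postings[t]) and postings[t][cursors[t]][0] < pivot_doc:
                cursors[t] += 1
    return sorted(heap, reverse=True)
```

The key point is that a candidate ad is fully scored only when the accumulated upper bounds can beat the current top-k threshold; otherwise a cursor skips straight to the pivot document.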

Outline Motivation WAND Background Proposed Method Experiments Conclusions

Proposed Method: Only use features of the form “word X in both query AND ad”. Learn to predict click data using such features, and add some function of the IR scores as extra features. But what function?

Proposed Method: A logistic regression model for CTR, with a main effect for the page (how good the page is), a main effect for the ad (how good the ad is), and an interaction effect (words shared by the page and the ad), each with its own model parameters.

Proposed Method: The features are term frequencies: M_{p,w} = tf_{p,w}, M_{a,w} = tf_{a,w}, and the interaction feature is their product, I_{p,a,w} = tf_{p,w} * tf_{a,w}. So IR-based term-frequency measures are taken into account.
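Putting the last two slides together, one plausible way to write the model is the sketch below; the exact parameterization used in the paper may differ, so treat the symbols as illustrative.

```latex
% Sketch of the logistic CTR model described on the slides (assumed form).
\[
  \operatorname{logit}(p_{p,a}) \;=\; \log\frac{p_{p,a}}{1 - p_{p,a}}
  \;=\; \mu + \alpha_p + \beta_a + \sum_{w \in W} \gamma_w\, I_{p,a,w},
  \qquad I_{p,a,w} = tf_{p,w}\, tf_{a,w},
\]
```

Here alpha_p is the main effect for page p, beta_a the main effect for ad a, and gamma_w the interaction weight for word w in the selected vocabulary W.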

Proposed Method: Four sources of complexity: (1) adding in IR scores, (2) word selection for efficient learning, (3) finer resolutions than page-level or ad-level, and (4) fast implementation for training and testing.

Proposed Method: How can IR scores fit into the model? What is the relationship between logit(p_{ij}) and the cosine score? Empirically, it is roughly quadratic. [Plot: logit(p_{ij}) vs. cosine score.]

Proposed Method: This quadratic relationship can be used in two ways: put in cosine and cosine^2 as features, or use it as a prior.

Proposed Method: We tried both ways of using the quadratic relationship, and they give very similar results.
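As a concrete illustration of the first option (cosine and cosine^2 as extra features), here is a minimal Python sketch. The cosine score is assumed to be precomputed by the IR system, and the feature-dictionary layout and function name are hypothetical.

```python
# Minimal sketch: folding the quadratic IR-score relationship into the
# feature vector of a logistic regression model. "cosine_score" is assumed
# to be the relevance score already computed by the IR system.
def make_features(page_tf, ad_tf, cosine_score, vocabulary):
    """page_tf, ad_tf: dict word -> term frequency; vocabulary: selected words."""
    feats = {}
    for w in vocabulary:
        if w in page_tf and w in ad_tf:
            feats[("interaction", w)] = page_tf[w] * ad_tf[w]   # I_{p,a,w}
    feats[("ir", "cosine")] = cosine_score
    feats[("ir", "cosine_sq")] = cosine_score ** 2              # quadratic term
    return feats
```

A standard logistic regression trained on such feature dictionaries then learns one weight per selected word plus two weights for the IR terms, matching the quadratic shape observed in the data.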

Proposed Method: Four sources of complexity (recap): adding in IR scores; word selection for efficient learning; finer resolutions than page-level or ad-level; fast implementation for training and testing.

Proposed Method: Word selection. There are nearly 110k words in the corpus overall; learning a parameter for each word would be very expensive, would require a huge amount of data, and would suffer from diminishing returns. So we want to select roughly the top 1k words that will have the most impact.

Proposed Method: Word selection, two methods. Data-based: define an interaction measure for each word, giving higher values to words that have a higher-than-expected CTR when they occur on both the page and the ad.

Proposed Method: Relevance-based: compute the average tf-idf score of each word over all pages and ads; higher values imply higher relevance.
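A minimal sketch of how the two selection measures might be computed. The slide does not spell out the exact interaction measure, so the CTR-lift form below (observed CTR on co-occurrence impressions vs. overall CTR) is an assumption, as are the function names.

```python
from collections import defaultdict

def data_based_measure(impressions):
    """impressions: iterable of (page_words, ad_words, clicked) tuples.
    Returns word -> observed/expected CTR lift on impressions where the
    word appears on both the page and the ad (a hypothetical measure)."""
    views, clicks = defaultdict(int), defaultdict(int)
    total_views, total_clicks = 0, 0
    for page_words, ad_words, clicked in impressions:
        total_views += 1
        total_clicks += int(clicked)
        for w in set(page_words) & set(ad_words):
            views[w] += 1
            clicks[w] += int(clicked)
    overall_ctr = total_clicks / max(total_views, 1)
    return {w: (clicks[w] / views[w]) / overall_ctr
            for w in views if overall_ctr > 0}

def relevance_based_measure(tfidf_vectors):
    """tfidf_vectors: iterable of dicts word -> tf-idf, over all pages and ads.
    Returns the average tf-idf score of each word."""
    totals, count = defaultdict(float), 0
    for vec in tfidf_vectors:
        count += 1
        for w, s in vec.items():
            totals[w] += s
    return {w: s / max(count, 1) for w, s in totals.items()}

# Either way, keep the top 1000 words, e.g.:
# top_words = sorted(measure, key=measure.get, reverse=True)[:1000]
```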

Proposed Method: We picked the top 1000 words by each measure; the data-based measure gives better results. [Precision-recall curves omitted.]

Proposed Method: Four sources of complexity (recap): adding in IR scores; word selection for efficient learning; finer resolutions than page-level or ad-level; fast implementation for training and testing.

Proposed Method: Finer resolutions than page-level or ad-level. The data has finer granularity: words appear in “regions” such as the title, headers, boldface text, and metadata, and a word match in the title can be more important than one in the body. The model extends simply to region-specific features.
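One natural way to realize this extension is to key the interaction features by (region, word) instead of by word alone. The sketch below follows that assumption and mirrors the earlier feature sketch; it is not taken verbatim from the paper.

```python
# Minimal sketch: region-specific interaction features. Instead of one weight
# per word, the model gets one weight per (region, word) pair, so e.g. a title
# match can carry more weight than a body match.
def make_region_features(page_regions, ad_tf, vocabulary):
    """page_regions: dict region -> (dict word -> tf), e.g. {"title": {...}, "body": {...}}
    ad_tf: dict word -> tf for the ad; vocabulary: selected words."""
    feats = {}
    for region, page_tf in page_regions.items():
        for w in vocabulary:
            if w in page_tf and w in ad_tf:
                feats[("interaction", region, w)] = page_tf[w] * ad_tf[w]
    return feats
```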

Proposed Method: Four sources of complexity (recap): adding in IR scores; word selection for efficient learning; finer resolutions than page-level or ad-level; fast implementation for training and testing.

Proposed Method: Fast implementation, training. A Hadoop implementation of logistic regression: the data is split into random partitions, iterative Newton-Raphson is run on each split to produce mean and variance estimates of the parameters, and the per-split estimates are then combined into the learned model parameters.
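A minimal sketch of the combine step, assuming each partition returns per-parameter means and variances and that they are merged with a precision-weighted (inverse-variance) average; the actual combination rule in the paper may differ.

```python
import numpy as np

def combine_partition_estimates(means, variances):
    """means, variances: lists of 1-D arrays, one per data partition, holding the
    per-parameter Newton-Raphson estimates from that split. Returns the combined
    (mean, variance) per parameter via a precision-weighted average (assumed rule)."""
    means = np.asarray(means)          # shape: (num_partitions, num_params)
    variances = np.asarray(variances)  # same shape
    precision = 1.0 / variances        # weight each split by its certainty
    combined_var = 1.0 / precision.sum(axis=0)
    combined_mean = combined_var * (precision * means).sum(axis=0)
    return combined_mean, combined_var
```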

Proposed Method: Fast implementation, testing. When building the postings lists, the main effect for ads is used in the (static) ordering of ads within each postings list, and the interaction effects are used to modify the (static) idf table of words. The main effect for pages plays no role in ad serving, since the page is given.

Proposed Method: Fast implementation, testing (continued). The model can be integrated into the existing code with no loss of performance or scalability in the existing system.

Proposed Method: Four sources of complexity (recap): adding in IR scores; word selection for efficient learning; finer resolutions than page-level or ad-level; fast implementation for training and testing.

Outline Motivation WAND Background Proposed Method Experiments Conclusions

Experiments: Precision-recall curves show a 25% lift in precision at 10% recall. [Plot: precision vs. recall.]

Experiments: The same 25% lift in precision at 10% recall, magnified for the low-recall region. [Plot: precision vs. recall, low-recall region.]

Experiments: Increasing the number of words from 1000 to 3400 led to only marginal improvement, i.e., diminishing returns: the system already performs close to its limit without needing more training.

Outline Motivation WAND Background Proposed Method Experiments Conclusions

Conclusions: Relevance-based matching uses domain knowledge, has very low training cost, and needs only simple computations at testing time; click-based matching uses existing data, so it improves over time, but training is complicated and there are efficiency concerns during testing. The proposed approach combines the two: parameter fitting runs as parallel code, and the existing serving system is used as-is, with no code changes or efficiency bottlenecks.