1
Contextual Advertising by Combining Relevance with Click Feedback
D. Chakrabarti, D. Agarwal, V. Josifovski
2
Motivation
Match ads to queries.
Sponsored Search: the query is a short piece of text input by the user.
Content Match: the query is a webpage on which ads can be displayed.
3
Motivation
Relevance-based:
1. Uses IR measures of match (cosine similarity, BM25)
2. Uses domain knowledge
3. Gives a score
Click-based:
1. Uses ML methods (e.g., Maximum Entropy) to learn a good matching function
2. Uses existing data, so it improves over time
3. Typically gives a probability of click
4
Motivation
Relevance-based:
4. Very low training cost: at most one or two parameters, which can be set by cross-validation
5. Simple computations at testing time, using the Weighted AND (WAND) algorithm
Click-based:
4. Training is complicated: scalability concerns, extremely imbalanced class sizes, problems interpreting non-clicks, and sampling methods that heavily affect accuracy
5. All features must be computed at test time, so good feature engineering is critical
5
Motivation
Relevance-based: uses domain knowledge; very low training cost; simple computations at testing time.
Click-based: uses existing data, improving over time; training is complicated; efficiency concerns during testing.
Combine the two: get the benefits of both, while keeping the costs under control.
6
Motivation
We want a system that computes matches over all ads (~millions), NOT a re-ranking of filtered results from some other matching algorithm.
Training: can be done offline; should be parallelizable (for scalability).
Testing: must be as fast and scalable as WAND, and must give accurate results.
7
Outline Motivation WAND Background Proposed Method Experiments Conclusions
8
WAND Background
Example: Query = {Red, Ball}. Each query word has a posting list of ads containing it (Red → Ad 1, Ad 5, Ad 8; Ball → Ad 7, Ad 8, Ad 9) and a cursor into that list. Cursors skip ahead over ads that cannot match, and the candidate result is Ad 8. More generally, query words are weighted, and upper bounds on the score are computed to decide where to skip.
9
WAND Background
Efficiency comes from cursor skipping, which requires that upper bounds be computable quickly. The match scoring formula therefore should not use features of the form "word X in query AND word Y in ad": such pairwise ("cross-product") checks can become very costly.
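The cursor-skipping idea can be sketched in Python. This is a simplified, hypothetical rendition with a fixed threshold (real WAND maintains the threshold from a top-k heap, and would fully score each candidate); the function and variable names are invented for illustration.

```python
import bisect

# Simplified WAND-style candidate generation (illustrative sketch).
# postings: term -> sorted list of ad IDs containing the term
# upper_bounds: term -> precomputed upper bound on the term's score contribution
def wand_candidates(postings, upper_bounds, threshold):
    cursors = {t: 0 for t in postings}  # current index into each posting list
    while True:
        # Terms whose posting lists are not yet exhausted
        live = [t for t in postings if cursors[t] < len(postings[t])]
        if not live:
            return
        # Order terms by the ad ID currently under each cursor
        live.sort(key=lambda t: postings[t][cursors[t]])
        # Pivot: first term at which the accumulated upper bound beats the threshold
        acc, pivot = 0.0, None
        for t in live:
            acc += upper_bounds[t]
            if acc > threshold:
                pivot = t
                break
        if pivot is None:
            return  # even all terms together cannot beat the threshold
        pivot_doc = postings[pivot][cursors[pivot]]
        if postings[live[0]][cursors[live[0]]] == pivot_doc:
            yield pivot_doc  # candidate: would be fully scored outside this sketch
            for t in live:  # advance every cursor sitting on this ad
                if postings[t][cursors[t]] == pivot_doc:
                    cursors[t] += 1
        else:
            # Skip: jump cursors positioned before the pivot directly to pivot_doc
            for t in live:
                if postings[t][cursors[t]] >= pivot_doc:
                    break
                cursors[t] = bisect.bisect_left(postings[t], pivot_doc, cursors[t])
```

With the slide's example (Red → Ads 1, 5, 8; Ball → Ads 7, 8, 9) and a threshold that only two words together can beat, the sole candidate produced is Ad 8; the cursors never touch Ads 1, 5, or 7 individually beyond the skip decision.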
10
Outline Motivation WAND Background Proposed Method Experiments Conclusions
11
Proposed Method
Only use features of the form "word X in both query AND ad".
Learn to predict clicks using such features.
Add in some function of the IR scores as extra features. What function?
12
Proposed Method
A logistic regression model for CTR, with:
a main effect for the page (how good is the page),
a main effect for the ad (how good is the ad),
an interaction effect (words shared by page and ad),
all as learned model parameters.
13
Proposed Method
M_p,w = tf_p,w, M_a,w = tf_a,w, and I_p,a,w = tf_p,w · tf_a,w.
So IR-based term-frequency measures are taken into account.
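The model on the last two slides can be given as a minimal sketch, assuming the linear predictor logit(p) = mu + alpha_p + beta_a + sum_w gamma_w · I_p,a,w; the parameter names and function signature are assumptions for illustration, not the authors' code.

```python
import math

def predict_ctr(page_tf, ad_tf, mu, alpha_p, beta_a, gamma):
    """Hypothetical CTR predictor:
    logit(p) = mu + alpha_p + beta_a + sum_w gamma[w] * tf_page[w] * tf_ad[w].

    page_tf, ad_tf: {word: term frequency}; gamma: {word: interaction weight}.
    """
    logit = mu + alpha_p + beta_a
    # Interaction features I_p,a,w are nonzero only for words shared by page and ad
    for w in page_tf.keys() & ad_tf.keys():
        logit += gamma.get(w, 0.0) * page_tf[w] * ad_tf[w]
    return 1.0 / (1.0 + math.exp(-logit))
```

Note that only words present on *both* sides contribute, which is exactly the "word X in both query AND ad" feature shape that keeps WAND-style serving feasible.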
14
Proposed Method Four sources of complexity Adding in IR scores Word selection for efficient learning Finer resolutions than page-level or ad-level Fast implementation for training and testing
15
Proposed Method
How can IR scores fit into the model? What is the relationship between logit(p_ij) and the cosine score? Empirically, the relationship is quadratic (plot of logit(p_ij) vs. cosine score).
16
Proposed Method
How can IR scores fit into the model? This quadratic relationship can be used in two ways:
1. Put in cosine and cosine² as features
2. Use it as a prior
17
Proposed Method How can IR scores fit into the model? This quadratic relationship can be used in two ways We tried both, and they give very similar results
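The first of the two ways, putting cosine and cosine² in as features, can be sketched as below. The helper names and the dict-based tf vectors are assumptions for illustration.

```python
import math

def cosine(page_tf, ad_tf):
    """Cosine similarity between the tf vectors of a page and an ad."""
    dot = sum(page_tf[w] * ad_tf[w] for w in page_tf.keys() & ad_tf.keys())
    norm_p = math.sqrt(sum(v * v for v in page_tf.values()))
    norm_a = math.sqrt(sum(v * v for v in ad_tf.values()))
    return dot / (norm_p * norm_a) if norm_p and norm_a else 0.0

def ir_features(page_tf, ad_tf):
    """cosine and cosine^2, appended as two extra features so their learned
    weights can capture the observed quadratic relationship with logit(p)."""
    c = cosine(page_tf, ad_tf)
    return [c, c * c]
```

Since the two features get their own regression weights, the model can fit any quadratic in the cosine score, which is why this variant and the prior-based variant end up behaving similarly.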
18
Proposed Method Four sources of complexity Adding in IR scores Word selection for efficient learning Finer resolutions than page-level or ad-level Fast implementation for training and testing
19
Proposed Method
Word selection: the corpus contains nearly 110k distinct words. Learning parameters for each word would be very expensive, would require a huge amount of data, and would suffer from diminishing returns. So we want to select the ~1k top words that will have the most impact.
20
Proposed Method
Word selection, two methods.
Data-based: define an interaction measure for each word, taking higher values for words that have a higher-than-expected CTR when they occur on both the page and the ad.
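One plausible reading of such an interaction measure is sketched below: the ratio of the CTR observed on (page, ad) pairs that share the word to the overall CTR. The exact statistic in the paper may differ, and the event-tuple layout is an assumption.

```python
def interaction_measure(events, word):
    """Data-based word score (illustrative sketch, not the paper's formula).

    events: iterable of (page_words, ad_words, clicked) tuples,
            where clicked is 0 or 1.
    Returns the ratio of the word's co-occurrence CTR to the overall CTR;
    values > 1 mean higher-than-expected CTR when the word is on both sides.
    """
    clicks = views = both_clicks = both_views = 0
    for page_words, ad_words, clicked in events:
        views += 1
        clicks += clicked
        if word in page_words and word in ad_words:
            both_views += 1
            both_clicks += clicked
    if both_views == 0 or clicks == 0:
        return 0.0
    return (both_clicks / both_views) / (clicks / views)
```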
21
Proposed Method
Word selection, two methods.
Relevance-based: compute the average tf-idf score of each word over all pages and ads; higher values imply higher relevance.
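A sketch of the relevance-based score, under a simple reading of the slide (average tf-idf of the word across all pages and ads treated as one document collection); the exact tf-idf variant used in the paper is not stated, so the formula details here are assumptions.

```python
import math

def avg_tfidf(docs, word):
    """Average tf-idf of `word` over a collection of documents.

    docs: list of {word: tf} dicts, covering both pages and ads.
    """
    df = sum(1 for d in docs if word in d)  # document frequency
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    avg_tf = sum(d.get(word, 0) for d in docs) / len(docs)
    return avg_tf * idf
```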
22
Proposed Method
Word selection, two methods: we picked the top 1000 words by each measure. In a precision-recall comparison, the data-based method gives better results.
23
Proposed Method Four sources of complexity Adding in IR scores Word selection for efficient learning Finer resolutions than page-level or ad-level Fast implementation for training and testing
24
Proposed Method
Finer resolutions than page-level or ad-level: the data has finer granularity. Words appear in "regions" such as the title, headers, boldface text, and metadata, and a word match in the title can be more important than one in the body. A simple extension of the model uses region-specific features.
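The region extension can be sketched by indexing the interaction weight by (word, region) rather than by word alone; the region names and data structures below are illustrative assumptions.

```python
def interaction_score(page_regions, ad_tf, gamma):
    """Region-aware interaction effect (illustrative sketch).

    page_regions: {region: {word: tf}}, e.g. regions 'title', 'body', ...
    ad_tf: {word: tf} for the ad.
    gamma: {(word, region): weight}, so a title match for a word can
           carry a different weight than a body match for the same word.
    """
    score = 0.0
    for region, tf in page_regions.items():
        for w in tf.keys() & ad_tf.keys():
            score += gamma.get((w, region), 0.0) * tf[w] * ad_tf[w]
    return score
```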
25
Proposed Method Four sources of complexity Adding in IR scores Word selection for efficient learning Finer resolutions than page-level or ad-level Fast implementation for training and testing
26
Proposed Method
Fast implementation, training: a Hadoop implementation of logistic regression. The data is randomly split; iterative Newton-Raphson on each split produces mean and variance estimates, which are then combined into the learned model parameters.
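The combine step can be illustrated with inverse-variance weighting, one standard way to pool independent per-split estimates of a parameter; the slides do not spell out the exact formula, so treat this as an assumption.

```python
def combine_estimates(means, variances):
    """Pool per-split (mean, variance) parameter estimates into one.

    Each split i contributes weight 1/variances[i]; the pooled mean is the
    weighted average and the pooled variance is 1 / (sum of weights).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_mean = sum(w * m for w, m in zip(weights, means)) / total
    return pooled_mean, 1.0 / total
```

Because each split is fit independently, this reduce step is trivially parallelizable, which is what makes the Hadoop layout scale.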
27
Proposed Method
Fast implementation, testing (building the posting lists):
The main effect for ads is used in the ordering of ads in the posting lists (static).
The interaction effect is used to modify the idf table of words (static).
The main effect for pages plays no role in ad serving, since the page is given.
28
Proposed Method Fast Implementation Testing Model can be integrated into existing code No loss of performance or scalability of the existing system
29
Proposed Method Four sources of complexity Adding in IR scores Word selection for efficient learning Finer resolutions than page-level or ad-level Fast implementation for training and testing
30
Outline Motivation WAND Background Proposed Method Experiments Conclusions
31
Experiments
Precision-recall curve: a 25% lift in precision at 10% recall.
32
Experiments
Precision-recall curve, magnified for the low-recall region: 25% lift in precision at 10% recall.
33
Experiments
Increasing the number of words from 1000 to 3400 led to only a marginal improvement (diminishing returns): the system already performs close to its limit without needing more training.
34
Outline Motivation WAND Background Proposed Method Experiments Conclusions
35
Conclusions
Relevance-based: uses domain knowledge; very low training cost; simple computations at testing time.
Click-based: uses existing data, improving over time; training is complicated; efficiency concerns during testing.
Combining the two: parallel code for parameter fitting; reuses the existing serving system, with no code changes or efficiency bottlenecks.