Catching the Drift: Learning Broad Matches from Clickthrough Data Sonal Gupta, Mikhail Bilenko, Matthew Richardson University of Texas at Austin, Microsoft Research
Introduction. Keyword-based online advertising: bidded keywords are extracted from context. Context: the query (search ads) or the page (content ads). Broad matching: expanding keywords via a keyword-to-keywords mapping. Example: electric cars → tesla, hybrids, toyota prius, golf carts. Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), and the ad platform (higher monetization). [Figure: pipeline from Query or Web Page → Keyword Extraction → Extracted Keywords (kw 1 … kw n) → Broad Match Expansion → Expanded Keywords (kw 11, kw 12, …, kw n1, kw n2) → Ad Selection and Ranking → Selected Ads (Ad 1 … Ad k).]
Identifying Broad Matches. Good keyword mappings retrieve relevant ads that users click. How can we measure what is relevant and likely to be clicked? Human judgments: expensive, hard to scale. Past user clicks: provide click data for kw → kw' when the user was shown ad(kw') in the context of kw; highly available, less trustworthy. Which similarity functions may indicate the relevance of kw → kw'? Syntactic (edit distance, TF-IDF cosine, string kernels, …); co-occurrence (in documents, query sessions, bid campaigns, …); expanded representation (search result snippets, category bags, …).
Approach. Task: train a learner to estimate p(click | kw → kw') for any kw → kw'. Data: triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings. Features: convert each pair to a feature vector capturing similarities etc.: (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw', or both. For each triple, create an instance: (ϕ(kw, kw'), click). Learner: max-margin averaged perceptron (strong theory, very efficient).
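The learner named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide gives no update rule, so the margin threshold, the number of epochs, and the logistic mapping from score to a click probability are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (phi(kw, kw'), click)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_averaged_perceptron(
    instances: Sequence[Instance],
    epochs: int = 5,
    margin: float = 1.0,  # assumed margin threshold, not given on the slide
) -> List[float]:
    """Return the average of the weight vectors held after every example."""
    n = len(instances[0][0])
    w = [0.0] * n
    w_sum = [0.0] * n
    steps = 0
    for _ in range(epochs):
        for phi, click in instances:
            y = 1 if click else -1
            # Update on mistakes and on correct predictions inside the margin.
            if y * dot(w, phi) <= margin:
                w = [wi + y * xi for wi, xi in zip(w, phi)]
            w_sum = [si + wi for si, wi in zip(w_sum, w)]
            steps += 1
    return [si / steps for si in w_sum]


def click_probability(w: Sequence[float], phi: Sequence[float]) -> float:
    """Assumed calibration: logistic squashing of the linear score."""
    return 1.0 / (1.0 + math.exp(-dot(w, phi)))
```

Averaging the hypotheses, rather than keeping only the last one, makes the final weights less sensitive to the order of the most recent examples.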
Example: Creating an Instance
Historical broad match clickthrough data:
kw                  kw'                 ad(kw')                     click event
digital slr         canon rebel         Canon Rebel Kit for $499    click
seattle baseball    mariners tickets    Mariners season tickets     no click
Feature functions ϕ1, ϕ2, ϕ3 map each (original kw, broad match kw') pair to a feature vector, giving the instances:
digital slr → canon rebel: [ϕ1, ϕ2, ϕ3], 1
seattle baseball → mariners tickets: [ϕ1, ϕ2, ϕ3], 0
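The instance construction in this example can be sketched in a few lines. The three feature functions below are hypothetical stand-ins chosen for illustration (token overlap, a character-level similarity ratio from the standard library, and a length ratio); the slide's actual ϕ1, ϕ2, ϕ3 and their values are not recoverable from the source.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def token_jaccard(kw: str, kw2: str) -> float:
    """Share of distinct tokens the two keywords have in common."""
    a, b = set(kw.split()), set(kw2.split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def char_similarity(kw: str, kw2: str) -> float:
    """Character-level similarity ratio in [0, 1] (a syntactic feature)."""
    return SequenceMatcher(None, kw, kw2).ratio()


def length_ratio(kw: str, kw2: str) -> float:
    """Shorter length divided by longer length."""
    longer = max(len(kw), len(kw2))
    return min(len(kw), len(kw2)) / longer if longer else 0.0


# Illustrative feature set; any function of kw, kw', or both would qualify.
FEATURES: List[Callable[[str, str], float]] = [
    token_jaccard,
    char_similarity,
    length_ratio,
]


def make_instance(kw: str, kw2: str, clicked: bool) -> Tuple[List[float], int]:
    """Turn one logged (kw, kw', click) row into (phi(kw, kw'), label)."""
    return [f(kw, kw2) for f in FEATURES], int(clicked)


# The two rows from the slide's clickthrough table.
log_rows = [
    ("digital slr", "canon rebel", True),
    ("seattle baseball", "mariners tickets", False),
]
instances = [make_instance(*row) for row in log_rows]
```

Each element of `instances` is a (feature vector, label) pair with label 1 for a click and 0 for no click, which is the form a learner consumes.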
Experiments. Data: 2 months of previous broad match ads from Microsoft Content Ads logs; 1 month for training, 1 month for testing. 68 features (syntactic, co-occurrence based, etc.); greedy feature selection. Metrics: LogLoss; LogLoss Lift, the difference between the obtained LogLoss and that of an oracle that has access to the empirical p(click | kw → kw') in the test set; CTR and revenue results in a live test with users.
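The two offline metrics can be sketched as below. The slide's own LogLoss formula did not survive, so this uses the standard mean negative log-likelihood as an assumption; the sign convention of the lift (model minus oracle) and all function names are also assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def log_loss(probs: Sequence[float], clicks: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the observed clicks (lower is better)."""
    total = 0.0
    for p, y in zip(probs, clicks):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(clicks)


def oracle_probs(pairs: Sequence[Hashable], clicks: Sequence[int]) -> List[float]:
    """Empirical p(click | kw -> kw') computed on the test set itself."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for pair, y in zip(pairs, clicks):
        shown[pair] += 1
        clicked[pair] += y
    return [clicked[pair] / shown[pair] for pair in pairs]


def log_loss_gap_to_oracle(
    model_probs: Sequence[float],
    pairs: Sequence[Hashable],
    clicks: Sequence[int],
) -> float:
    """Model LogLoss minus oracle LogLoss (assumed sign convention)."""
    return log_loss(model_probs, clicks) - log_loss(oracle_probs(pairs, clicks), clicks)
```

Under this convention a gap of zero means the model matches the oracle, and a model that assigns a single probability to each kw → kw' pair cannot go below zero.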
Results
Live Test Results. Use CTR prediction to maximize expected revenue: re-rank mappings to incorporate revenue. Result: +18% revenue, -2% CTR.
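One way such a re-ranking could work is sketched below. The slide does not give the scoring formula, so scoring each mapping by predicted click probability times revenue per click is an assumption, as are the function name and the numbers in the usage note.

```python
from typing import List, Tuple

# (broad match keyword kw', predicted p(click | kw -> kw'), revenue per click)
Candidate = Tuple[str, float, float]


def rank_by_expected_revenue(candidates: List[Candidate]) -> List[Tuple[str, float]]:
    """Sort candidate mappings by p(click) * revenue per click, highest first."""
    scored = [(kw2, p_click * revenue) for kw2, p_click, revenue in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With made-up inputs such as `[("tesla", 0.04, 0.50), ("golf carts", 0.01, 3.00), ("hybrids", 0.03, 0.40)]`, the mapping with the lowest click probability ranks first because its revenue per click is much higher, which is consistent with a revenue gain accompanied by a small CTR drop.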
Online Learning with Amnesia. Advertisers, campaigns, bidded keywords, and delivery contexts change very rapidly: high concept drift. Recent data is more informative. Goal: utilize older data while capturing changes in distributions. The averaged perceptron doesn't capture drift. Solution: Amnesiac Averaged Perceptron, which applies exponential weight decay when averaging hypotheses.
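The idea of exponential decay in the hypothesis average can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the decay factor, the margin threshold, the normalization of the decayed sum, and the class name are all chosen here for the example.

```python
from typing import List, Sequence


class AmnesiacAveragedPerceptron:
    """Online perceptron whose averaged hypothesis discounts old weight vectors."""

    def __init__(self, n_features: int, decay: float = 0.99, margin: float = 1.0):
        self.decay = decay    # assumed forgetting factor in (0, 1]
        self.margin = margin  # assumed margin threshold
        self.w = [0.0] * n_features      # current hypothesis
        self.w_sum = [0.0] * n_features  # exponentially decayed sum of hypotheses
        self.norm = 0.0                  # exponentially decayed hypothesis count

    @staticmethod
    def _dot(w: Sequence[float], x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x))

    def update(self, phi: Sequence[float], click: int) -> None:
        """Process one (phi(kw, kw'), click) instance."""
        y = 1 if click else -1
        if y * self._dot(self.w, phi) <= self.margin:
            self.w = [wi + y * xi for wi, xi in zip(self.w, phi)]
        # Older hypotheses fade by a factor of `decay` at every step.
        self.w_sum = [self.decay * s + wi for s, wi in zip(self.w_sum, self.w)]
        self.norm = self.decay * self.norm + 1.0

    def averaged_weights(self) -> List[float]:
        if self.norm == 0.0:
            return list(self.w)
        return [s / self.norm for s in self.w_sum]

    def score(self, phi: Sequence[float]) -> float:
        """Linear score under the decayed average of hypotheses."""
        return self._dot(self.averaged_weights(), phi)
```

Setting `decay=1.0` recovers plain averaging, where every past hypothesis counts equally; values below 1 let the average follow a distribution that has shifted.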
Results. Table columns: Model, -LogLoss, LogL Lift. Models compared: Prior; Feature Selection + Online Learning + Amnesia; Online + Feature Selection, No Amnesia; Online + Amnesia, No Feature Selection; Feature Selection + Amnesia, Weekly Batch.
Contributions and Conclusions. Learning broad matches from implicit feedback: combining arbitrary similarity measures/features; using clickthrough logs as implicit feedback. Amnesiac Averaged Perceptron: exponentially weighted averaging, so distant examples “fade out”; online learning adapts to market dynamics.
Thank You!
Features and Feature Selection. Co-occurrence feature examples: user search sessions (keywords searched within 10 mins); advertiser campaigns (keywords co-bidded by the same advertiser). Other features: past clickthrough rates of original and broad matched keywords; various syntactic similarities; various existing broad matching lists; and so on. Feature selection: a total of 68 features; greedy feature selection.
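Greedy feature selection can be sketched as a forward search. The slide says only "greedy", so the forward direction, the stopping rule, and the `evaluate` callback (a hypothetical function returning a validation loss such as LogLoss for a given feature subset) are assumptions.

```python
from typing import Callable, List


def greedy_forward_selection(
    n_features: int,
    evaluate: Callable[[List[int]], float],  # hypothetical: subset -> validation loss
    min_gain: float = 0.0,
) -> List[int]:
    """Add the single most helpful feature at a time until none reduces the loss."""
    selected: List[int] = []
    best_loss = evaluate(selected)
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(evaluate(selected + [j]), j) for j in sorted(remaining)]
        loss, j = min(trials)
        if best_loss - loss <= min_gain:
            break  # no candidate improves the loss by more than min_gain
        selected.append(j)
        remaining.remove(j)
        best_loss = loss
    return selected
```

Each round costs one evaluation per remaining feature, so selecting k of 68 features takes at most about 68·k model evaluations.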
Additional Information. Estimation of the expected value of click over all the ads shown for a broad match mapping: E(p(click(ad(kw))|q)). Query Expansion vs. Broad Matching: our broad matching algorithm can be extended for query expansion, but broad matching is for a fixed set of bidded keywords. Forgetron vs. Amnesiac Averaged Perceptron: Forgetron maintains a set of support vectors within a budget, so it stores examples explicitly and does not take into account all the data. AAP: weighted average over all the examples, with no need to store examples explicitly.
Amnesiac Averaged Perceptron