Slide 1: Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski
Slide 2: The Content Match Problem
[Diagram: advertisers and an ads DB] Ad impression: showing an ad to a user.
Slide 3: The Content Match Problem
[Diagram: advertisers and an ads DB] Ad click: a user click generates revenue for the ad server and the content provider.
Slide 4: The Content Match Problem
The Content Match Problem: match ads to pages so as to maximize clicks.
Slide 5: The Content Match Problem
Maximizing the number of clicks means: for each webpage, find the ad with the best click-through rate (CTR), without wasting too many impressions in learning it.
Slide 6: Online Learning
Maximizing clicks requires:
- Dimensionality reduction
- Exploration
- Exploitation
Exploration and exploitation must occur together, and online learning is needed, since the system must continuously generate revenue.
Slide 7: Taxonomies for Dimensionality Reduction
Taxonomies (e.g., Apparel, Computers, Travel) already exist and are actively maintained, and existing classifiers map pages and ads to taxonomy nodes. Learning the matching from page nodes to ad nodes, rather than from individual pages to individual ads, gives the dimensionality reduction.
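The reduction can be illustrated with a toy sketch (the pages, ads, and taxonomy nodes below are hypothetical, not from the paper): instead of learning one CTR per (page, ad) pair, the policy learns one per (page node, ad node) pair.

```python
# Toy illustration: classifiers map each page and each ad to a taxonomy node.
page_to_node = {"page1": "Travel", "page2": "Travel", "page3": "Apparel"}
ad_to_node = {"ad1": "Travel", "ad2": "Apparel", "ad3": "Apparel"}

# Full matrix: one unknown CTR per (page, ad) pair.
full_pairs = len(page_to_node) * len(ad_to_node)            # 3 * 3 = 9

# Node-level matrix: one unknown CTR per (page node, ad node) pair.
node_pairs = (len(set(page_to_node.values()))
              * len(set(ad_to_node.values())))              # 2 * 2 = 4

print(full_pairs, node_pairs)
```

With real data (~10^9 pages, ~10^6 ads, a few hundred nodes) the gap is many orders of magnitude larger.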
Slide 8: Can Taxonomies Help in Explore/Exploit as Well?
Maximizing clicks requires dimensionality reduction, exploration, and exploitation. Taxonomies give the dimensionality reduction; can they help in explore/exploit as well?
Slide 9: Outline
- Problem
- Background: Multi-armed bandits
- Proposed Multi-level Policy
- Experiments
- Related Work
- Conclusions
Slide 10: Background: Bandits
Bandit "arms" have unknown payoff probabilities p1, p2, p3. Pull the arms sequentially so as to maximize the total expected reward: estimate the payoff probabilities pi, and bias the estimation process towards the better arms.
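As a concrete sketch of this pull-and-estimate loop, here is a minimal epsilon-greedy policy on three Bernoulli arms. The payoff probabilities are made-up CTR-like values, and epsilon-greedy is just one illustrative policy, not the one used in the paper.

```python
import random

# Hypothetical unknown payoff probabilities for three arms.
TRUE_P = [0.03, 0.05, 0.01]

def epsilon_greedy(n_pulls=10000, eps=0.1, seed=0):
    """Pull arms sequentially: usually exploit the best current estimate,
    occasionally explore, keeping per-arm click and pull counts."""
    rng = random.Random(seed)
    clicks = [0] * len(TRUE_P)
    pulls = [0] * len(TRUE_P)
    total = 0
    for _ in range(n_pulls):
        if rng.random() < eps:
            arm = rng.randrange(len(TRUE_P))        # explore a random arm
        else:
            # Exploit: pick the arm with the best estimated payoff;
            # unpulled arms get an optimistic estimate of 1.0.
            est = [clicks[i] / pulls[i] if pulls[i] else 1.0
                   for i in range(len(TRUE_P))]
            arm = max(range(len(TRUE_P)), key=est.__getitem__)
        reward = 1 if rng.random() < TRUE_P[arm] else 0
        clicks[arm] += reward
        pulls[arm] += 1
        total += reward
    return total, pulls
```

Biasing pulls towards arms that look better is exactly the exploration/exploitation trade-off from Slide 6.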
Slide 11: Background: Bandits
In content match there are ~10^9 pages and ~10^6 ads; for each webpage, the bandit "arms" are the ads.
Slide 12: Background: Bandits
Content Match is a matrix of webpages by ads: each row (webpage) is one bandit, and each cell has an unknown CTR.
Slide 13: Background: Bandits
A bandit policy has two parts:
- Allocation: assign a priority to each arm, then "pull" the arm with maximum priority and observe the reward.
- Estimation: update the priorities based on the observation.
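One standard way to implement this priority scheme is UCB1, shown below as an illustration; the paper's specific priority function may differ.

```python
import math

class UCB1:
    """Allocation/estimation loop: assign a priority to each arm,
    pull the max-priority arm, then update from the observed reward."""

    def __init__(self, n_arms):
        self.pulls = [0] * n_arms
        self.rewards = [0.0] * n_arms
        self.t = 0

    def select(self):
        # Allocation: any unpulled arm has effectively infinite priority.
        for arm, n in enumerate(self.pulls):
            if n == 0:
                return arm
        def priority(arm):
            mean = self.rewards[arm] / self.pulls[arm]
            bonus = math.sqrt(2 * math.log(self.t) / self.pulls[arm])
            return mean + bonus  # estimated payoff + exploration bonus
        return max(range(len(self.pulls)), key=priority)

    def update(self, arm, reward):
        # Estimation: updating the counts changes all future priorities.
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward
```

The exploration bonus shrinks as an arm is pulled more, so under-explored arms keep getting chances while well-estimated good arms dominate.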
Slide 14: Background: Bandits
Why not simply apply a bandit policy directly to our problem?
- Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit.
- Additional structure is available that can help: taxonomies.
Slide 15: Outline (next: Proposed Multi-level Policy)
Slide 16: Multi-level Policy
[Diagram: hierarchy of ad classes and webpage classes] Consider only two levels.
Slide 17: Multi-level Policy
[Diagram: ad parent classes (Apparel, Computers, Travel) and their ad child classes; each row of cells is one bandit, and the cells under one parent class form a block] Consider only two levels.
Slide 18: Multi-level Policy
[Same diagram of ad parent and child classes] Key idea: CTRs in a block are homogeneous.
Slide 19: Multi-level Policy
CTRs in a block are homogeneous. This is used in:
- Allocation (picking an ad for each new page)
- Estimation (updating priorities after each observation)
Slide 21: Multi-level Policy (Allocation)
Classify the webpage to get its page class and parent page class, then run a bandit on the ad parent classes to pick one ad parent class. ("We still haven't learnt that geeks and high fashion don't mix.")
Slide 22: Multi-level Policy (Allocation)
Classify the webpage to get its page class and parent page class. Run a bandit on the ad parent classes to pick one ad parent class, then run a bandit among the cells of that block to pick one ad class. In general, continue from root to leaf to reach the final ad.
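The two-level allocation described above can be sketched as follows. The taxonomy, class names, and the UCB-style per-level bandits are illustrative assumptions, not the paper's exact policy.

```python
import math

class Bandit:
    """Minimal UCB-style bandit used at each level of the hierarchy."""
    def __init__(self, arms):
        self.arms = list(arms)
        self.pulls = {a: 0 for a in self.arms}
        self.rewards = {a: 0.0 for a in self.arms}
        self.t = 0

    def select(self):
        for a in self.arms:
            if self.pulls[a] == 0:
                return a
        return max(self.arms, key=lambda a:
                   self.rewards[a] / self.pulls[a]
                   + math.sqrt(2 * math.log(self.t) / self.pulls[a]))

    def update(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward

# Hypothetical two-level taxonomy of ad classes.
TAXONOMY = {"Apparel": ["shoes", "shirts"],
            "Computers": ["laptops", "phones"],
            "Travel": ["flights", "hotels"]}

# One bandit over ad parent classes, and one child-level bandit per block.
parent_bandit = Bandit(TAXONOMY)
child_bandits = {p: Bandit(kids) for p, kids in TAXONOMY.items()}

def allocate():
    """Pick an ad: first choose an ad parent class, then a child within it."""
    parent = parent_bandit.select()
    child = child_bandits[parent].select()
    return parent, child

def record(parent, child, clicked):
    """An observation updates both levels, so the parent-level bandit
    aggregates information across all children in its block."""
    reward = 1.0 if clicked else 0.0
    parent_bandit.update(parent, reward)
    child_bandits[parent].update(child, reward)
```

Because the parent-level bandit pools every click in its block, it converges with far fewer observations than a flat bandit over all leaf ads.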
Slide 23: Multi-level Policy (Allocation)
Bandits at higher levels use aggregated information and have fewer arms, so they quickly figure out the best ad parent class.
Slide 25: Multi-level Policy (Estimation)
CTRs in a block are homogeneous, so observations from one cell also give information about the other cells in the block. How can we model this dependence?
Slide 26: Multi-level Policy (Estimation)
Shrinkage model, with N_cell = # impressions in the cell and S_cell = # clicks in the cell:
S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell)
CTR_cell ~ Beta(params_block)
All cells in a block come from the same distribution.
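A sketch of sampling from this generative model; the block's Beta parameters and impression counts below are made-up values for illustration.

```python
import random

def sample_block(n_cells, alpha, beta, n_impressions, seed=0):
    """Draw each cell's CTR from the shared Beta(alpha, beta) block prior,
    then draw its clicks as Binomial(N_cell, CTR_cell)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        ctr = rng.betavariate(alpha, beta)          # CTR_cell ~ Beta(params_block)
        clicks = sum(rng.random() < ctr             # S_cell | CTR_cell ~ Binomial
                     for _ in range(n_impressions))
        cells.append({"N": n_impressions, "S": clicks, "true_ctr": ctr})
    return cells

# Example: 5 cells in one block, with prior mean 2 / (2 + 98) = 0.02.
block = sample_block(n_cells=5, alpha=2.0, beta=98.0, n_impressions=1000)
```

Because all five cells share the same Beta prior, their CTRs cluster around the block mean, which is what lets one cell's observations inform the others.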
Slide 27: Multi-level Policy (Estimation)
Intuitively, this leads to shrinkage of cell CTRs towards the block CTR:
E[CTR] = alpha * prior_block + (1 - alpha) * S_cell / N_cell
where prior_block is the mean of the Beta prior (the "block CTR"), S_cell / N_cell is the observed CTR, and E[CTR] is the estimated CTR.
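Under a Beta(a, b) block prior, the formula above is the posterior mean with weight alpha = (a + b) / (a + b + N_cell). A small sketch (the prior parameters in the test are illustrative):

```python
def shrunk_ctr(s_cell, n_cell, block_alpha, block_beta):
    """Posterior-mean CTR estimate under CTR_cell ~ Beta(block_alpha, block_beta):
    a weighted average of the block prior mean and the observed CTR, with
    more weight on the prior when the cell has few impressions."""
    prior_mean = block_alpha / (block_alpha + block_beta)
    alpha = (block_alpha + block_beta) / (block_alpha + block_beta + n_cell)
    observed = s_cell / n_cell if n_cell else prior_mean
    return alpha * prior_mean + (1 - alpha) * observed
```

With no impressions the estimate is exactly the block CTR; as N_cell grows, alpha shrinks and the estimate approaches the cell's observed CTR.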
Slide 28: Outline (next: Experiments)
Slide 29: Experiments
Taxonomy structure: the root is at depth 0; depth 1 has 20 nodes; depth 2 has 221 nodes; the taxonomy continues down to depth 7, with ~7000 leaves. We use the two levels at depths 1 and 2.
Slide 30: Experiments
Data was collected over a 1-day period from a single server, under some other ad-matching rules (not our bandit): ~229M impressions. CTR values have been linearly transformed for confidentiality.
Slide 31: Experiments (Multi-level Policy)
[Plot: clicks vs. number of pulls] The multi-level policy gives a much higher number of clicks.
Slide 32: Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls] The multi-level policy gives a much better mean-squared error: it has learnt more from its explorations.
Slide 33: Experiments (Shrinkage)
[Plots: clicks and mean-squared error vs. number of pulls, with and without shrinkage] Shrinkage improved the mean-squared error, but gave no gain in clicks.
Slide 34: Outline (next: Related Work and Conclusions)
Slide 35: Related Work
- Typical multi-armed bandit problems do not consider dependencies and involve very few arms.
- Bandits with side information cannot handle dependencies among ads.
- General MDP solvers do not use the structure of the bandit problem; their emphasis is on learning the transition matrix, which is random in our problem.
Slide 36: Conclusions
Taxonomies exist for many datasets. They can be used for:
- Dimensionality reduction
- A multi-level bandit policy, giving a higher number of clicks
- Better estimation via shrinkage models, giving better MSE