Bandits for Taxonomies: A Model-based Approach

Presentation transcript:

Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

The Content Match Problem [diagram: Advertisers → Ads DB → ads shown on webpages]. Ad impression: showing an ad to a user.

The Content Match Problem. Ad click: a user's click yields revenue for the ad server and the content provider.

The Content Match Problem: match ads to pages so as to maximize clicks.

The Content Match Problem. Maximizing the number of clicks means: for each webpage, find the ad with the best click-through rate (CTR), without wasting too many impressions on learning this.

Online Learning. Maximizing clicks requires: dimensionality reduction, plus exploration and exploitation, which must occur together. Online learning is needed, since the system must continuously generate revenue.

Taxonomies for dimensionality reduction. Taxonomies already exist, are actively maintained, and come with existing classifiers that map pages and ads to taxonomy nodes [diagram: a tree with Root over Apparel, Computers, Travel]. Learn the matching from page nodes to ad nodes → dimensionality reduction.

Online Learning. Maximizing clicks requires: dimensionality reduction (→ taxonomy), exploration, and exploitation. Can taxonomies help in explore/exploit as well?

Outline: Problem; Background: multi-armed bandits; Proposed multi-level policy; Experiments; Related work; Conclusions.

Background: Bandits. Bandit "arms" have unknown payoff probabilities p1, p2, p3, ... Pull the arms sequentially so as to maximize the total expected reward: estimate the payoff probabilities p_i, and bias the estimation process towards better arms.
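
The slides do not commit to a particular bandit policy here, so as a minimal illustration, the following ε-greedy sketch shows the two ingredients just listed: estimating each payoff probability p_i from observed rewards, and biasing pulls towards the arms that currently look best. The payoff probabilities are made up.

```python
import random

# Hypothetical payoff probabilities for three arms (in the ad setting these
# would be unknown CTRs).
true_p = [0.02, 0.05, 0.03]

pulls = [0, 0, 0]   # times each arm was pulled
wins = [0, 0, 0]    # observed rewards per arm
epsilon = 0.1       # fraction of pulls spent exploring

def estimate(i):
    """Current payoff estimate for arm i (0 until the arm is tried)."""
    return wins[i] / pulls[i] if pulls[i] else 0.0

for _ in range(100_000):
    if random.random() < epsilon:                  # explore: random arm
        i = random.randrange(len(true_p))
    else:                                          # exploit: best estimate
        i = max(range(len(true_p)), key=estimate)
    pulls[i] += 1
    wins[i] += random.random() < true_p[i]         # Bernoulli reward

print([round(estimate(i), 4) for i in range(3)], "reward:", sum(wins))
```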

Background: Bandits. In our setting there are ~10^9 pages and ~10^6 ads; the bandit "arms" are ads, and each webpage poses one bandit problem.

Background: Bandits. Content match = a matrix: webpages as rows, ads as columns. Each row is one bandit, and each cell has an unknown CTR.

Background: Bandits. A bandit policy: assign a priority to each arm; "pull" the arm with the maximum priority and observe the reward (allocation); update the priorities (estimation).
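
One standard instantiation of this allocation/estimation loop is UCB1, where the priority of an arm is its empirical mean plus a confidence radius; the deck does not name the priority rule it actually uses, so this is an illustrative sketch only.

```python
import math
import random

def ucb1(payoff_prob, horizon=100_000):
    """Index policy: priority = empirical mean + confidence radius."""
    n = len(payoff_prob)
    pulls, wins = [0] * n, [0] * n

    def priority(a, t):
        if pulls[a] == 0:                # untried arms get top priority
            return float("inf")
        return wins[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])

    for t in range(1, horizon + 1):
        i = max(range(n), key=lambda a: priority(a, t))   # allocation
        pulls[i] += 1                                     # estimation:
        wins[i] += random.random() < payoff_prob[i]       #   update counts
    return pulls, wins

print(ucb1([0.02, 0.05, 0.03]))   # hypothetical CTRs
```

Arms with few pulls keep a large confidence radius, so the same priority drives both exploration and exploitation.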

Background: Bandits. Why not simply apply a bandit policy directly to our problem? Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit. But additional structure is available that can help → taxonomies.

Outline: Problem; Background: multi-armed bandits; Proposed multi-level policy; Experiments; Related work; Conclusions.

Multi-level Policy [diagram: a matrix of webpage classes by ad classes]. Consider only two levels of the taxonomy.

Multi-level Policy [diagram: ad parent classes (Apparel, Computers, Travel) split into ad child classes; the cells under one (page parent class, ad parent class) pair form a block, and each row is one bandit]. Consider only two levels.

Multi-level Policy. Key idea: CTRs in a block are homogeneous.

Multi-level Policy. CTRs in a block are homogeneous; this is used in allocation (picking the ad for each new page) and in estimation (updating priorities after each observation).

Multi-level Policy (Allocation) [diagram: a page classifier maps the incoming webpage into page classes A/C/T = Apparel, Computers, Travel; cartoon caption: "We still haven't learnt that geeks and high fashion don't mix."]. Classify the webpage → page class and parent page class. Run a bandit on the ad parent classes → pick one ad parent class.

Multi-level Policy (Allocation). Classify the webpage → page class and parent page class. Run a bandit on the ad parent classes → pick one ad parent class. Then run a bandit among the cells of that block → pick one ad class. In general, continue from root to leaf → final ad.
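
A compact sketch of this two-level allocation step, assuming a UCB1-style bandit at each level; the Bandit helper, the class names, and the data layout are hypothetical stand-ins, not the authors' implementation.

```python
import math

class Bandit:
    """Minimal UCB1-style bandit over named arms (illustrative)."""
    def __init__(self, arms):
        self.stats = {a: [0, 0] for a in arms}   # arm -> [pulls, wins]
        self.t = 0

    def pick(self):
        self.t += 1
        def priority(a):
            pulls, wins = self.stats[a]
            if pulls == 0:
                return float("inf")
            return wins / pulls + math.sqrt(2 * math.log(self.t) / pulls)
        return max(self.stats, key=priority)

    def update(self, arm, clicked):
        self.stats[arm][0] += 1
        self.stats[arm][1] += clicked

# One bandit over ad parent classes per page class, plus one bandit per block
# over the ad child classes inside that block (all names made up).
ad_parents = ["Apparel", "Computers", "Travel"]
parent_bandits = {"Tech": Bandit(ad_parents)}
child_bandits = {("Tech", p): Bandit([f"{p}/{k}" for k in range(3)])
                 for p in ad_parents}

def allocate(page_class):
    parent = parent_bandits[page_class].pick()            # level 1
    child = child_bandits[(page_class, parent)].pick()    # level 2 (block)
    return parent, child

def feedback(page_class, parent, child, clicked):
    # Propagate the observed click / non-click to both levels.
    parent_bandits[page_class].update(parent, clicked)
    child_bandits[(page_class, parent)].update(child, clicked)
```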

Multi-level Policy (Allocation). Bandits at higher levels use aggregated information and have fewer arms, so they quickly figure out the best ad parent class.
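
Concretely, a parent-level arm can score itself from the pooled counts of all its child cells, so its estimate stabilizes after far fewer pulls than any single cell's. A hypothetical helper:

```python
def pooled_ctr(child_stats):
    """CTR estimate for a parent-class arm from its children's pooled counts.

    child_stats: iterable of (clicks, impressions) pairs, one per child cell.
    """
    clicks = sum(c for c, _ in child_stats)
    impressions = sum(n for _, n in child_stats)
    return clicks / impressions if impressions else 0.0

# Three hypothetical child classes under one ad parent class:
print(pooled_ctr([(3, 200), (1, 150), (6, 400)]))   # 10/750 ~ 0.0133
```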

Multi-level Policy. CTRs in a block are homogeneous; used in allocation (picking the ad for each new page) and, next, in estimation (updating priorities after each observation).

Multi-level Policy (Estimation). CTRs in a block are homogeneous, so observations from one cell also give information about the other cells in the block. How can we model this dependence?

Multi-level Policy (Estimation). Shrinkage model: let N_cell be the number of impressions in a cell and S_cell the number of clicks in that cell. Then S_cell | CTR_cell ~ Bin(N_cell, CTR_cell), and CTR_cell ~ Beta(Params_block). All cells in a block come from the same distribution.

Multi-level Policy (Estimation). Intuitively, this leads to shrinkage of cell CTRs towards block CTRs: E[CTR_cell] = α · Prior_block + (1 - α) · S_cell / N_cell, i.e. the estimated CTR mixes the mean of the Beta prior (the "block CTR") with the observed CTR S_cell / N_cell.
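
For a Beta(a, b) block prior, this posterior mean has a closed form with α = (a + b) / (a + b + N_cell), so a + b acts as a pseudo-impression count. A small sketch; the prior parameters below are made up, whereas in practice they would be fitted from all the cells in the block (e.g. by empirical Bayes):

```python
def shrunk_ctr(clicks, impressions, a, b):
    """Posterior mean of a cell's CTR under a Beta(a, b) block prior.

    Equivalent to alpha * prior_mean + (1 - alpha) * observed_ctr
    with alpha = (a + b) / (a + b + impressions).
    """
    return (a + clicks) / (a + b + impressions)

# Hypothetical block prior: mean 0.02, strength a + b = 100 pseudo-impressions.
a, b = 2.0, 98.0
print(shrunk_ctr(0, 0, a, b))       # no data: falls back to block CTR 0.02
print(shrunk_ctr(5, 50, a, b))      # little data: pulled towards the block CTR
print(shrunk_ctr(500, 5000, a, b))  # lots of data: close to observed 0.10
```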

Outline: Problem; Background: multi-armed bandits; Proposed multi-level policy; Experiments; Related work; Conclusions.

Experiments [diagram: taxonomy structure]. Depth 0: root. Depth 1: 20 nodes. Depth 2: 221 nodes. ... Depth 7: ~7000 leaves. We use the two levels at depths 1 and 2.

Experiments. Data were collected over a one-day period from a single server, under other ad-matching rules (not our bandit): ~229M impressions. CTR values have been linearly transformed for confidentiality.

Experiments (Multi-level Policy) [plot: clicks vs. number of pulls]. The multi-level policy yields far more clicks.

Experiments (Multi-level Policy) [plot: mean-squared error vs. number of pulls]. The multi-level policy also gives much lower mean-squared error: it has learnt more from its explorations.

Experiments (Shrinkage) [plots: clicks and mean-squared error vs. number of pulls, with and without shrinkage]. Shrinkage → improved mean-squared error, but no gain in clicks.

Outline: Problem; Background: multi-armed bandits; Proposed multi-level policy; Experiments; Related work; Conclusions.

Related Work. Typical multi-armed bandit problems do not consider dependencies and involve very few arms. Bandits with side information cannot handle dependencies among ads. General MDP solvers do not use the structure of the bandit problem, and put their emphasis on learning the transition matrix, which is random in our problem.

Conclusions. Taxonomies exist for many datasets. They can be used for dimensionality reduction; a multi-level bandit policy → more clicks; better estimation via shrinkage models → better MSE.