Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski.

Presentation transcript:

Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

The Content Match Problem (diagram: advertisers, ads DB, ads). Ad impression: showing an ad to a user.

The Content Match Problem. Ad click: a user click, which leads to revenue for the ad server and the content provider.

The Content Match Problem. The goal: match ads to pages so as to maximize clicks.

The Content Match Problem. Maximizing the number of clicks means finding, for each webpage, the ad with the best click-through rate (CTR), but without wasting too many impressions in learning this.

Online Learning. Maximizing clicks requires dimensionality reduction, exploration, and exploitation. Exploration and exploitation must occur together. Online learning is needed, since the system must continuously generate revenue.

Taxonomies for dimensionality reduction. Example taxonomy: Root, with children Apparel, Computers, Travel. Taxonomies already exist, are actively maintained, and come with existing classifiers that map pages and ads to taxonomy nodes. Learn the matching from page nodes to ad nodes → dimensionality reduction.

Online Learning. Maximizing clicks requires dimensionality reduction, exploration, and exploitation. Taxonomies provide the dimensionality reduction. Can taxonomies help in explore/exploit as well?

Outline: Problem; Background: Multi-armed bandits; Proposed Multi-level Policy; Experiments; Related Work; Conclusions

Background: Bandits. Bandit “arms” have unknown payoff probabilities p_1, p_2, p_3. Pull arms sequentially so as to maximize the total expected reward: estimate the payoff probabilities p_i, and bias the estimation process towards better arms.

Background: Bandits. Each webpage (Webpage 1, Webpage 2, Webpage 3, ...) is a bandit whose “arms” are the ads: ~10^6 ads, ~10^9 pages.

Background: Bandits. Content match is a matrix of webpages against ads. Each row is one bandit, and each cell has an unknown CTR.

Background: Bandits. Bandit policy: (1) assign a priority to each arm; (2) “pull” the arm with the maximum priority, and observe the reward; (3) update the priorities. The policy has an allocation part (choosing the arm to pull) and an estimation part (updating the priorities).
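
The three steps of a bandit policy can be sketched in a few lines of Python. The slides do not say which priority function is used, so this sketch assumes the well-known UCB1 index (empirical mean plus an exploration bonus); the function names `ucb1_priority` and `run_bandit`, and the simulated CTRs, are hypothetical.

```python
import math
import random


def ucb1_priority(clicks, pulls, total_pulls):
    """Step 1: priority of one arm (UCB1 is an assumption, not from the slides)."""
    if pulls == 0:
        return float("inf")  # untried arms get the highest priority
    return clicks / pulls + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_bandit(true_ctrs, n_rounds, seed=0):
    """Simulate one bandit whose arms have the given (hidden) click probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    clicks = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, n_rounds + 1):
        # Step 1: assign a priority to each arm.
        priorities = [ucb1_priority(clicks[a], pulls[a], t) for a in range(n_arms)]
        # Step 2: pull the arm with the maximum priority and observe the reward.
        arm = max(range(n_arms), key=priorities.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        # Step 3: update the statistics that drive the priorities.
        clicks[arm] += reward
        pulls[arm] += 1
    return clicks, pulls


if __name__ == "__main__":
    clicks, pulls = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

With well-separated CTRs, most pulls should end up on the best arm, while every arm is still tried at least once.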

Background: Bandits. Why not simply apply a bandit policy directly to our problem? Convergence is too slow: there are ~10^9 bandits, with ~10^6 arms per bandit. Additional structure is available that can help → taxonomies.

Multi-level Policy. Ads and webpages are grouped into classes. Consider only two levels.

Multi-level Policy. Consider only two levels. (Figure: Apparel, Computers, and Travel on both axes; ad parent classes, ad child classes, one block, and one bandit are marked.)

Multi-level Policy. Key idea: CTRs in a block are homogeneous. (Figure: Apparel, Computers, and Travel on both axes; ad parent classes, ad child classes, one block, and one bandit are marked.)

Multi-level Policy. CTRs in a block are homogeneous. This is used in allocation (picking an ad for each new page) and in estimation (updating priorities after each observation).

Multi-level Policy (Allocation). Classify the webpage with the page classifier → page class, parent page class. Run a bandit on the ad parent classes → pick one ad parent class.

Multi-level Policy (Allocation). Classify the webpage with the page classifier → page class, parent page class. Run a bandit on the ad parent classes → pick one ad parent class. Run a bandit among the cells → pick one ad class. In general, continue from root to leaf → final ad.

Multi-level Policy (Allocation). Bandits at higher levels use aggregated information and have fewer bandit arms → they quickly figure out the best ad parent class.
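
The allocation steps can be sketched as follows for a single page class, that is, one row of the matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: UCB1 is assumed as the bandit at both levels, the parent-level bandit uses clicks and impressions aggregated over its child classes, and the class `TwoLevelBandit`, the helper `simulate`, and the child class names are hypothetical.

```python
import math
import random


def ucb1(clicks, pulls, total):
    """UCB1 priority (an assumed policy; the slides do not name one)."""
    if pulls == 0:
        return float("inf")  # untried arms go first
    return clicks / pulls + math.sqrt(2.0 * math.log(total) / pulls)


class TwoLevelBandit:
    """Two-level allocation for a single page class."""

    def __init__(self, taxonomy):
        # taxonomy: dict mapping ad parent class -> list of ad child classes
        self.taxonomy = taxonomy
        self.parent_stats = {p: [0, 0] for p in taxonomy}  # [clicks, pulls]
        self.child_stats = {(p, c): [0, 0] for p in taxonomy for c in taxonomy[p]}
        self.total = 0

    def select(self):
        # Level 1: bandit over ad parent classes, using aggregated counts.
        t = self.total + 1
        parent = max(self.taxonomy, key=lambda p: ucb1(*self.parent_stats[p], t))
        # Level 2: bandit over the child classes under the chosen parent.
        t_block = self.parent_stats[parent][1] + 1
        child = max(self.taxonomy[parent],
                    key=lambda c: ucb1(*self.child_stats[(parent, c)], t_block))
        return parent, child

    def update(self, parent, child, reward):
        # Every observation updates both the child cell and its parent aggregate.
        for stats in (self.parent_stats[parent], self.child_stats[(parent, child)]):
            stats[0] += reward
            stats[1] += 1
        self.total += 1


def simulate(taxonomy, true_ctr, n_rounds, seed=0):
    """Run the two-level policy against hidden per-child click probabilities."""
    rng = random.Random(seed)
    bandit = TwoLevelBandit(taxonomy)
    for _ in range(n_rounds):
        parent, child = bandit.select()
        reward = 1 if rng.random() < true_ctr[(parent, child)] else 0
        bandit.update(parent, child, reward)
    return bandit
```

Because the parent-level bandit has only as many arms as there are ad parent classes, and each of its arms pools the observations of all its children, it needs far fewer pulls to separate good parents from bad ones than a flat bandit over all child classes would.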

Multi-level Policy (Estimation). CTRs in a block are homogeneous, so observations from one cell also give information about the others in the block. How can we model this dependence?

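Multi-level Policy (Estimation). Shrinkage model: $S_{\text{cell}} \mid \text{CTR}_{\text{cell}} \sim \text{Bin}(N_{\text{cell}}, \text{CTR}_{\text{cell}})$ and $\text{CTR}_{\text{cell}} \sim \text{Beta}(\text{Params}_{\text{block}})$, where $S_{\text{cell}}$ is the number of clicks in the cell and $N_{\text{cell}}$ is the number of impressions in the cell. All cells in a block come from the same distribution.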

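Multi-level Policy (Estimation). Intuitively, this leads to shrinkage of cell CTRs towards block CTRs: $E[\text{CTR}_{\text{cell}}] = \alpha \cdot \text{Prior}_{\text{block}} + (1 - \alpha) \cdot S_{\text{cell}} / N_{\text{cell}}$. Here $E[\text{CTR}_{\text{cell}}]$ is the estimated CTR, $\text{Prior}_{\text{block}}$ is the Beta prior (“block CTR”), and $S_{\text{cell}} / N_{\text{cell}}$ is the observed CTR.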
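
The shrinkage form follows from Beta-Binomial conjugacy. As a sketch, assume the block prior is Beta(a, b), a parameterization the slides do not spell out. The posterior mean of the cell CTR is then (a + S) / (a + b + N), which equals the weighted average above with prior mean a / (a + b) and weight alpha = (a + b) / (a + b + N). The function names below are hypothetical.

```python
def posterior_mean(clicks, impressions, prior_a, prior_b):
    """Posterior mean of a Beta(prior_a, prior_b) prior after Binomial data."""
    return (prior_a + clicks) / (prior_a + prior_b + impressions)


def shrunk_ctr(clicks, impressions, prior_a, prior_b):
    """Same estimate, written as alpha * block prior + (1 - alpha) * observed CTR."""
    prior_mean = prior_a / (prior_a + prior_b)  # the "block CTR"
    if impressions == 0:
        return prior_mean  # no data yet: fall back entirely on the block prior
    alpha = (prior_a + prior_b) / (prior_a + prior_b + impressions)
    observed = clicks / impressions
    return alpha * prior_mean + (1 - alpha) * observed


if __name__ == "__main__":
    # Hypothetical numbers: block prior Beta(2, 38) has mean 0.05;
    # the cell has 3 clicks in 10 impressions (observed CTR 0.3).
    print(shrunk_ctr(3, 10, 2, 38))      # about 0.10, pulled towards the block CTR
    print(posterior_mean(3, 10, 2, 38))  # same value, up to rounding
```

As the cell accumulates impressions, alpha shrinks towards 0 and the estimate moves from the block CTR to the cell's own observed CTR.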

Experiments. Taxonomy structure: root (depth 0), 20 nodes (depth 1), 221 nodes (depth 2), …, ~7000 leaves (depth 7). We use 2 of these levels.

Experiments. Data were collected over a 1-day period, from only one server, under some other ad-matching rules (not our bandit): ~229M impressions. CTR values have been linearly transformed for purposes of confidentiality.

Experiments (Multi-level Policy). Plot of clicks against number of pulls: the multi-level policy gives a much higher #clicks.

Experiments (Multi-level Policy). Plot of mean-squared error against number of pulls: the multi-level policy gives a much better mean-squared error → it has learnt more from its explorations.

Experiments (Shrinkage). Plots of mean-squared error and clicks against number of pulls, without shrinkage and with shrinkage: shrinkage → improved mean-squared error, but no gain in #clicks.

Related Work. Typical multi-armed bandit problems: do not consider dependencies; very few arms. Bandits with side information: cannot handle dependencies among ads. General MDP solvers: do not use the structure of the bandit problem; emphasis on learning the transition matrix, which is random in our problem.

Conclusions. Taxonomies exist for many datasets. They can be used for dimensionality reduction; a multi-level bandit policy → higher #clicks; better estimation via shrinkage models → better MSE.