Handling Advertisements of Unknown Quality in Search Advertising
Sandeep Pandey, Christopher Olston (CMU and Yahoo! Research)
Sponsored Search
- How does it work?
  - The search engine displays ads next to search results
  - Advertisers pay the search engine per click
- Who benefits from it?
  - Main source of funding for search engines
  - Information flow from advertisers to users
Sponsored Search
- Click-through rate (CTR): given an ad and a query, CTR = probability that the ad receives a click
- Optimal policy to maximize the search engine's revenue: display the ads with the highest (CTR x bid) values
- [Figure: results page showing search query results alongside sponsored search results]
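As a toy illustration of the (CTR x bid) ranking rule (the ad names and numbers below are made up, not taken from the talk):

```python
# Rank ads by expected revenue per impression = CTR x bid.
ads = [
    {"name": "ad1", "ctr": 0.05, "bid": 1.00},  # expected revenue 0.050
    {"name": "ad2", "ctr": 0.02, "bid": 3.00},  # expected revenue 0.060
]
ranked = sorted(ads, key=lambda ad: ad["ctr"] * ad["bid"], reverse=True)
# ranked[0] is "ad2": a lower-CTR ad can still be shown first if its bid is high enough.
```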
Challenges in Sponsored Search
- Problem: CTRs are initially unknown; estimating CTRs requires going around the circle below
- Exploration/exploitation tradeoff:
  - explore ads to estimate CTRs
  - exploit known high-CTR ads to maximize revenue
- [Figure: feedback circle among "show ads", "record clicks", "refine CTR estimates", and "earn revenue"]
The Advertisement Problem
- Problem:
  - Advertiser A_i submits ad a_{i,j} for query phrase Q_j
  - A user click on a_{i,j} -> A_i pays b_{i,j} (the "bid value")
  - Queries arrive one after another
  - Select ads to show for each query, in an online fashion
- Constraints:
  - Show at most C ads per query
  - Advertisers have daily budgets: A_i pays at most d_i
- Goal: maximize the search engine's revenue
- [Figure: advertisers A_1, A_2, A_3 with budgets d_1, d_2, d_3, connected to query phrases Q_1, Q_2, Q_3 through their ads a_{1,1}, a_{2,1}, a_{1,3}, a_{3,2}]
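For concreteness, here is a minimal sketch of how a problem instance might be represented; the class and field names are my own, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: int   # i: index of advertiser A_i
    query: int        # j: index of query phrase Q_j
    bid: float        # b_{i,j}: amount A_i pays per click on a_{i,j}

@dataclass
class Instance:
    ads: list[Ad]
    budgets: dict[int, float]   # advertiser i -> daily budget d_i
    C: int                      # maximum number of ads shown per query
```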
Our Approach
- Unbudgeted advertisement problem
  - Isomorphic to the multi-armed bandit problem
- Budgeted advertisement problem
  - Similar to the bandit problem, but with additional budget constraints that span arms
  - We introduce the Budgeted Multi-armed Multi-bandit Problem (BMMP)
Unbudgeted Advertisement Problem as a Multi-armed Bandit Problem
- Bandit: classical example of online learning under the explore/exploit tradeoff
  - K arms; arm i has an associated reward r_i and an unknown payoff probability p_i
  - Pull C arms at each time instant to maximize the reward accrued over time
- Isomorphism: query phrase <-> bandit instance; ads <-> arms; CTR <-> payoff probability; bid <-> reward
- [Figure: slot-machine arms with payoff probabilities p_1, p_2, p_3]
Policy for Unbudgeted Problem
- Policy "MIX" (adopted from [Auer et al., ML'02])
- When query phrase Q_j arrives:
  - Compute the priority p_{i,j} of each ad a_{i,j}, where p_{i,j} = (e_{i,j} + sqrt(2 ln n_j / n_{i,j})) * b_{i,j}
    - e_{i,j}: the MLE of the CTR of a_{i,j}
    - b_{i,j}: the price (bid value) of ad a_{i,j}
    - n_{i,j}: number of times ad a_{i,j} has been shown so far
    - n_j: number of times query Q_j has been answered so far
  - Display the C highest-priority ads
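A minimal sketch of the MIX selection step, assuming a simple dict-based representation of ads; the field names ("e", "b", "n") and the convention of exploring never-shown ads first are my own, not from the paper:

```python
import math

def mix_select(ads, n_j, C):
    """Return the C highest-priority ads for a query phrase.

    Each ad is a dict with:
      "e": current MLE of its CTR (clicks / impressions),
      "b": its bid value b_{i,j},
      "n": number of times it has been shown for this query phrase.
    n_j is the number of times the query phrase has been answered so far.
    Priority: p_{i,j} = (e_{i,j} + sqrt(2 ln n_j / n_{i,j})) * b_{i,j}.
    """
    def priority(ad):
        if ad["n"] == 0:
            return float("inf")   # show never-displayed ads first (illustrative convention)
        bonus = math.sqrt(2 * math.log(n_j) / ad["n"])
        return (ad["e"] + bonus) * ad["b"]

    return sorted(ads, key=priority, reverse=True)[:C]
```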
Budgeted Multi-armed Multi-bandit Problem (BMMP)
- Finite set of bandit instances; each instance has a finite number of arms
- Each arm has an associated type; each type T_i has a budget d_i
  - Upper limit on the total amount of reward that can be generated by the arms of type T_i
- An external actor invokes a bandit instance at each time instant; the policy must choose C arms of the invoked instance
Meta-Policy for BMMP
- Input: a BMMP instance and a policy POL for the conventional multi-armed bandit problem
- Output: the following policy BPOL
  - Run POL in parallel for each bandit instance B_i (call the copy for B_i POL_i)
  - Whenever B_i is invoked:
    - Discard any arm(s) with depleted budget
    - If one or more arms were discarded, restart POL_i
    - Let POL_i decide which of the remaining arms to activate
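A sketch of how BPOL could be wired around an arbitrary bandit policy POL; the POL interface (a factory plus select/update methods) and the arm representation are assumptions of mine, not the paper's API:

```python
class BPOL:
    """Meta-policy: run one copy of POL per bandit instance, enforce budgets
    by discarding depleted arms, and restart that copy whenever arms are discarded."""

    def __init__(self, make_pol, budgets, C):
        self.make_pol = make_pol       # factory: list of arms -> fresh POL instance
        self.budgets = dict(budgets)   # type_id -> remaining budget d_i
        self.C = C
        self.arms = {}                 # instance_id -> live arms, arm = (arm_id, type_id, reward)
        self.pols = {}                 # instance_id -> POL copy (POL_i)

    def register_instance(self, inst_id, arms):
        self.arms[inst_id] = list(arms)
        self.pols[inst_id] = self.make_pol(self.arms[inst_id])

    def invoke(self, inst_id):
        live = [a for a in self.arms[inst_id] if self.budgets[a[1]] > 0]
        if len(live) < len(self.arms[inst_id]):       # some budget depleted since last time
            self.arms[inst_id] = live
            self.pols[inst_id] = self.make_pol(live)  # restart POL_i on the remaining arms
        return self.pols[inst_id].select(self.C)      # POL_i picks C of the remaining arms

    def record_reward(self, inst_id, arm, reward):
        self.budgets[arm[1]] = max(0.0, self.budgets[arm[1]] - reward)
        self.pols[inst_id].update(arm, reward)
```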
Performance Guarantee of BPOL
- OPT = algorithm that knows in advance:
  1. the full sequence of bandit invocations
  2. the payoff probabilities
- Claim: bpol(N) >= opt(N)/2 - O(f(N))
  - bpol(N): total expected reward of the BPOL policy after N bandit invocations
  - opt(N): total expected reward of OPT
  - f(N): regret of POL after N invocations of the regular bandit problem
Proof of Performance Guarantee
- Divide the time instants into 3 categories:
  - Category 1: BPOL chooses an arm of higher expected reward than OPT
    - opt_1(N) <= bpol_1(N)
  - Category 2: BPOL chooses an arm of lower expected reward because OPT's arm has run out of budget
    - opt_2(N) <= bpol(N) + (#types * max reward), since OPT's arm can only be depleted because BPOL has already collected (nearly) the entire budget of that type
  - Category 3: otherwise
    - opt_3(N) = O(f(N))
- Claim (follows from the above bounds):
  - opt(N) <= bpol(N) + bpol(N) + O(1) + O(f(N))
  - hence bpol(N) >= opt(N)/2 - O(f(N))
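Chaining the three category bounds (writing c for the constant #types * max reward, and using bpol_1(N) <= bpol(N)) gives the claim:

```latex
\begin{align*}
\mathrm{opt}(N) &= \mathrm{opt}_1(N) + \mathrm{opt}_2(N) + \mathrm{opt}_3(N) \\
                &\le \mathrm{bpol}_1(N) + \bigl(\mathrm{bpol}(N) + c\bigr) + O(f(N)) \\
                &\le 2\,\mathrm{bpol}(N) + O(1) + O(f(N)),
\end{align*}
```

which rearranges to bpol(N) >= opt(N)/2 - O(f(N)).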
Advertisement Policies
- BMIX: the output of our generic BPOL policy when given MIX as input
- BMIX-E: replace sqrt(2 ln n_j / n_{i,j}) in priority p_{i,j} by sqrt(min(0.25, V(n_{i,j}, n_j)) * ln n_j / n_{i,j}), where V(n_{i,j}, n_j) = e_{i,j} * (1 - e_{i,j}) + sqrt(2 ln n_j / n_{i,j})
  - Suggested in [Auer et al., ML'02]. Purpose: aggressive exploitation
- BMIX-T: replace b_{i,j} in priority p_{i,j} by b_{i,j} * throttle(d_i'), where throttle(d_i') = 1 - e^(-d_i'/d_i) and d_i' is the remaining budget of advertiser A_i
  - Suggested in [Mehta et al., FOCS'05]. Purpose: delay the depletion of advertisers' budgets
- BMIX-ET: with both the E and T modifications
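A sketch of the per-ad priority with both modifications applied; the argument names are mine, and the formulas simply transcribe the slide above:

```python
import math

def bmix_et_priority(e, b, n_ij, n_j, d_rem, d_total):
    """Priority of one ad under BMIX-ET.

    e       -- CTR estimate e_{i,j}
    b       -- bid b_{i,j}
    n_ij    -- times ad a_{i,j} has been shown
    n_j     -- times query phrase Q_j has been answered
    d_rem   -- advertiser's remaining daily budget d_i'
    d_total -- advertiser's daily budget d_i
    """
    if n_ij == 0:
        return float("inf")   # explore never-shown ads first (illustrative convention)

    # E modification: variance-aware confidence radius
    v = e * (1 - e) + math.sqrt(2 * math.log(n_j) / n_ij)
    radius = math.sqrt(min(0.25, v) * math.log(n_j) / n_ij)

    # T modification: throttle the bid as the advertiser's budget depletes
    throttle = 1 - math.exp(-d_rem / d_total)

    return (e + radius) * b * throttle
```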
Experiments
- Simulations over real data
- Data:
  - 85,000 query phrases from a Yahoo! query log
  - Yahoo! ads with daily budget constraints
  - CTRs drawn from Yahoo!'s CTR distribution
- Simulated user clicks using the CTR values
- Time horizon = multiple days; policies carried the CTR estimates over from one day to the next
Results
- GREEDY: select the ads with the highest current reward estimate (e_{i,j} * b_{i,j})
  - Does not explore; only exploits
- [Figure: revenue comparison of the policies; revenue values scaled for confidentiality reasons]
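For contrast with the MIX sketch earlier, GREEDY drops the exploration bonus entirely (using the same illustrative dict fields as before):

```python
def greedy_select(ads, C):
    # Rank purely by current estimated revenue e_{i,j} * b_{i,j}; no exploration bonus.
    return sorted(ads, key=lambda ad: ad["e"] * ad["b"], reverse=True)[:C]
```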
Conclusion
- Search advertisement problem
  - Exploration/exploitation tradeoff
  - Model as a multi-armed bandit
- Introduced a new bandit variant: the Budgeted Multi-armed Multi-bandit Problem (BMMP)
- New policy for BMMP with a performance guarantee
- In the paper:
  - Variable set of ads (ads come and go)
  - Prior CTR estimates