The Value of Knowing a Demand Curve: Regret Bounds for Online Posted-Price Auctions, by Bobby Kleinberg and Tom Leighton.

Similar presentations

Truthful Mechanisms for Combinatorial Auctions with Subadditive Bidders Speaker: Shahar Dobzinski Based on joint works with Noam Nisan & Michael Schapira.
Routing and Congestion Problems in General Networks Presented by Jun Zou CAS 744.
Online Learning for Online Pricing Problems Maria Florina Balcan.
Mechanism Design, Machine Learning, and Pricing Problems Maria-Florina Balcan.
Seminar in Auctions and Mechanism Design Based on J. Hartline’s book: Approximation in Economic Design Presented by: Miki Dimenshtein & Noga Levy.
Lecturer: Moni Naor Algorithmic Game Theory Uri Feige Robi Krauthgamer Moni Naor Lecture 8: Regret Minimization.
An Approximate Truthful Mechanism for Combinatorial Auctions An Internet Mathematics paper by Aaron Archer, Christos Papadimitriou, Kunal Talwar and Éva.
A Prior-Free Revenue Maximizing Auction for Secondary Spectrum Access Ajay Gopinathan and Zongpeng Li IEEE INFOCOM 2011, Shanghai, China.
A Simple Distribution- Free Approach to the Max k-Armed Bandit Problem Matthew Streeter and Stephen Smith Carnegie Mellon University.
Study Group Randomized Algorithms 21 st June 03. Topics Covered Game Tree Evaluation –its expected run time is better than the worst- case complexity.
Regret Minimizing Audits: A Learning-theoretic Basis for Privacy Protection Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha Carnegie Mellon.
Online Algorithms Amrinder Arora Permalink:
Regret Minimization and the Price of Total Anarchy Paper by A. Blum, M. Hajiaghayi, K. Ligett, A.Roth Presented by Michael Wunder.
Online learning, minimizing regret, and combining expert advice
1 Regret-based Incremental Partial Revelation Mechanism Design Nathanaël Hyafil, Craig Boutilier AAAI 2006 Department of Computer Science University of.
Tuning bandit algorithms in stochastic environments The 18th International Conference on Algorithmic Learning Theory October 3, 2007, Sendai International.
Chunyang Tong Sriram Dasu Information & Operations Management Marshall School of Business University of Southern California Los Angeles CA Dynamic.
Sponsored Search Presenter: Lory Al Moakar. Outline Motivation Problem Definition VCG solution GSP(Generalized Second Price) GSP vs. VCG Is GSP incentive.
Bundling Equilibrium in Combinatorial Auctions Written by: Presented by: Ron Holzman Rica Gonen Noa Kfir-Dahav Dov Monderer Moshe Tennenholtz.
Maria-Florina Balcan Approximation Algorithms and Online Mechanisms for Item Pricing Maria-Florina Balcan & Avrim Blum CMU, CSD.
Visual Recognition Tutorial
Evaluation of Algorithms for the List Update Problem Suporn Pongnumkul R. Ravi Kedar Dhamdhere.
Mortal Multi-Armed Bandits Deepayan Chakrabarti,Yahoo! Research Ravi Kumar,Yahoo! Research Filip Radlinski, Microsoft Research Eli Upfal,Brown University.
Dynamic Internet Congestion with Bursts Stefan Schmid Roger Wattenhofer Distributed Computing Group, ETH Zurich 13th International Conference On High Performance.
Item Pricing for Revenue Maximization in Combinatorial Auctions Maria-Florina Balcan, Carnegie Mellon University Joint with Avrim Blum and Yishay Mansour.
The price of anarchy of finite congestion games Kapelushnik Lior Based on the articles: “ The price of anarchy of finite congestion games ” by Christodoulou.
Jointly Optimal Transmission and Probing Strategies for Multichannel Systems Saswati Sarkar University of Pennsylvania Joint work with Sudipto Guha (Upenn)
Duality Lecture 10: Feb 9. Min-Max theorems In bipartite graph, Maximum matching = Minimum Vertex Cover In every graph, Maximum Flow = Minimum Cut Both.
Lower and Upper Bounds on Obtaining History Independence Niv Buchbinder and Erez Petrank Technion, Israel.
Sequences of Take-It-or-Leave-it Offers: Near-Optimal Auctions Without Full Valuation Revelation Tuomas Sandholm and Andrew Gilpin Carnegie Mellon University.
Mechanism Design: Online Auction or Packet Scheduling Online auction of a reusable good (packet slots) Agents types: (arrival, departure, value) –Agents.
Online Oblivious Routing Nikhil Bansal, Avrim Blum, Shuchi Chawla & Adam Meyerson Carnegie Mellon University 6/7/2003.
Machine Learning for Mechanism Design and Pricing Problems Avrim Blum Carnegie Mellon University Joint work with Maria-Florina Balcan, Jason Hartline,
1 Worst-Case Equilibria Elias Koutsoupias and Christos Papadimitriou Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science.
Chapter 11: Limitations of Algorithmic Power
Competitive Analysis of Incentive Compatible On-Line Auctions Ron Lavi and Noam Nisan SISL/IST, Cal-Tech Hebrew University.
Yang Cai Sep 15, An overview of today’s class Myerson’s Lemma (cont’d) Application of Myerson’s Lemma Revelation Principle Intro to Revenue Maximization.
Near-Optimal Simple and Prior-Independent Auctions Tim Roughgarden (Stanford)
Asaf Cohen Department of Mathematics University of Michigan Financial Mathematics Seminar University of Michigan September 10,
Approximation Algorithms for Stochastic Combinatorial Optimization Part I: Multistage problems Anupam Gupta Carnegie Mellon University.
Reinforcement Learning Evaluative Feedback and Bandit Problems Subramanian Ramamoorthy School of Informatics 20 January 2012.
online convex optimization (with partial information)
Multi-Unit Auctions with Budget Limits Shahar Dobzinski, Ron Lavi, and Noam Nisan.
Auction Seminar Optimal Mechanism Presentation by: Alon Resler Supervised by: Amos Fiat.
Yossi Azar Tel Aviv University Joint work with Ilan Cohen Serving in the Dark 1.
Preference elicitation Communicational Burden by Nisan, Segal, Lahaie and Parkes October 27th, 2004 Jella Pfeiffer.
Market Design and Analysis Lecture 5 Lecturer: Ning Chen ( 陈宁 )
1 Model 3 (Strategic informed trader) Kyle (Econometrica 1985) The economy A group of three agents trades a risky asset for a risk-less asset. One insider.
Unlimited Supply Infinitely many identical items. Each bidder wants one item. –Corresponds to a situation were we have no marginal production cost. –Very.
Regret Minimizing Equilibria of Games with Strict Type Uncertainty Stony Brook Conference on Game Theory Nathanaël Hyafil and Craig Boutilier Department.
Econ 805 Advanced Micro Theory 1 Dan Quint Fall 2007 Lecture 3 – Sept
Umans Complexity Theory Lectures Lecture 1a: Problems and Languages.
1 Monte-Carlo Planning: Policy Improvement Alan Fern.
Auctions serve the dual purpose of eliciting preferences and allocating resources between competing uses. A less fundamental but more practical reason.
CSCE 411H Design and Analysis of Algorithms Set 10: Lower Bounds Prof. Evdokia Nikolova* Spring 2013 CSCE 411H, Spring 2013: Set 10 1 * Slides adapted.
Item Pricing for Revenue Maximization in Combinatorial Auctions Maria-Florina Balcan.
The Message Passing Communication Model David Woodruff IBM Almaden.
Reconstructing Preferences from Opaque Transactions Avrim Blum Carnegie Mellon University Joint work with Yishay Mansour (Tel-Aviv) and Jamie Morgenstern.
Bayesian Optimization. Problem Formulation Goal  Discover the X that maximizes Y  Global optimization Active experimentation  We can choose which values.
Carnegie Mellon University
The Nonstochastic Multiarmed Bandit Problem
Algorithmic Problems Related To The Internet
Chapter 11 Limitations of Algorithm Power
The Byzantine Secretary Problem
Presentation transcript:

The Value of Knowing a Demand Curve: Regret Bounds for Online Posted-Price Auctions Bobby Kleinberg and Tom Leighton

Introduction How do we measure the value of knowing the demand curve for a good?

Introduction How do we measure the value of knowing the demand curve for a good? Mathematical formulation: What is the difference in expected revenue between an informed seller who knows the demand curve and an uninformed seller using an adaptive pricing strategy, assuming both pursue the optimal strategy?

Online Posted-Price Auctions 1 seller, n buyers, each wants one item. Buyers interact with seller one at a time. Transaction: Seller posts price.

Online Posted-Price Auctions 1 seller, n buyers, each wants one item. Buyers interact with seller one at a time. Transaction: Seller posts price (e.g. 6¢). Buyer arrives.

Online Posted-Price Auctions 1 seller, n buyers, each wants one item. Buyers interact with seller one at a time. Transaction: Seller posts price (e.g. 6¢). Buyer arrives. Buyer gives YES/NO response.

Online Posted-Price Auctions 1 seller, n buyers, each wants one item. Buyers interact with seller one at a time. Transaction: Seller posts price. Buyer arrives. Buyer gives YES/NO response. Seller may update price after each transaction (e.g. from 6¢ to 10¢ after a YES).
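The transaction loop on this slide can be sketched in a few lines of Python. The `run_auction` helper, the fixed 6¢ price, and the valuations are all my own illustration, not from the paper:

```python
def run_auction(seller, valuations):
    """Online posted-price loop: `seller` maps the (price, sold) history
    to the next posted price; each buyer answers YES iff the price is at
    most their private valuation, and revenue accrues only on a YES."""
    history, revenue = [], 0.0
    for v in valuations:
        p = seller(history)
        sold = p <= v                 # buyer's YES/NO response
        if sold:
            revenue += p
        history.append((p, sold))     # the seller never sees v itself
    return revenue

# Simplest possible seller: always post 6 cents (prices scaled to [0, 1]).
revenue = run_auction(lambda history: 0.06, [0.08, 0.05, 0.12])
print(round(revenue, 2))  # 0.12: buyers 1 and 3 say YES, buyer 2 says NO
```

An adaptive seller would inspect `history` before choosing the next price; that is exactly the freedom the uninformed seller has in this model.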

Online Posted-Price Auctions A natural transaction model for many forms of commerce, including web commerce. (Our motivation came from ticketmaster.com.)

Online Posted-Price Auctions A natural transaction model for many forms of commerce, including web commerce. (Our motivation came from ticketmaster.com.) Clearly strategyproof, since agents’ strategic behavior is limited to their YES/NO responses.

Informed vs. Uninformed Sellers A worked example compares the two sellers buyer by buyer. The numeric entries of the slide's table were lost in transcription; only its structure survives:

Value | Informed Ask | Informed Revenue | Uninformed Ask | Uninformed Revenue

On this example: Ex ante regret = 0.5. Ex post regret = 1.0.

Definition of Regret Regret = difference in expected revenue between the informed and uninformed sellers. Ex ante regret corresponds to asking, “What is the value of knowing the demand curve?” The competitive ratio was already considered by Blum, Kumar, et al. (SODA ’03), who exhibited a (1+ε)-competitive pricing strategy under a mild hypothesis on the informed seller’s revenue.

3 Problem Variants
Identical valuations: All buyers have the same threshold price v, which is unknown to the seller.
Random valuations: Buyers are independent samples from a fixed probability distribution (the demand curve), which is unknown to the seller.
Worst-case valuations: No assumptions about buyers’ valuations; they may be chosen by an oblivious adversary.
We always assume prices are between 0 and 1.
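A minimal sketch of the three generative models (the function names and the uniform example are my own illustration, not the paper's notation):

```python
import random

def identical_valuations(n, v):
    """Identical: every buyer shares the same threshold price v."""
    return [v] * n

def random_valuations(n, inverse_cdf, rng):
    """Random: i.i.d. draws from a fixed distribution; the induced
    demand curve is D(x) = Pr(valuation >= x)."""
    return [inverse_cdf(rng.random()) for _ in range(n)]

def worst_case_valuations(n, adversary):
    """Worst-case: an oblivious adversary fixes the whole sequence up
    front, before seeing any of the seller's prices."""
    return [adversary(i) for i in range(n)]

# Uniform valuations on [0,1] (inverse CDF is the identity): D(x) = 1 - x.
vals = random_valuations(5, lambda u: u, random.Random(0))
assert all(0.0 <= v <= 1.0 for v in vals)
```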

Regret Bounds for the Three Cases

Valuation Model | Lower Bound | Upper Bound
Identical | Ω(log log n) | O(log log n)
Random | Ω(n^1/2) | O((n log n)^1/2)
Worst-Case | Ω(n^2/3) | O(n^2/3 (log n)^1/3)

(For the random model, the lower bound holds ex ante and the upper bound ex post.)

Identical Valuations Exponentially better than binary search! Equivalent to a question considered by Karp, Koutsoupias, Papadimitriou, and Shenker in the context of congestion control (KKPS, FOCS 2000). Our lower bound settles two of their open questions.

Valuation Model | Lower Bound | Upper Bound
Identical | Ω(log log n) | O(log log n)
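For intuition on the baseline being beaten, here is binary-search pricing for the identical-valuations model. This is only a naive sketch (the threshold 0.73 and horizon 1000 are illustrative), not the paper's O(log log n) strategy: each NO costs the seller the full sale v, so binary search accumulates regret proportional to its search depth.

```python
def binary_search_pricing(v, n):
    """Naive strategy for identical valuations: binary-search the unknown
    threshold v, always posting the midpoint of the feasible interval.
    YES raises the lower end; NO (a lost sale) lowers the upper end."""
    lo, hi = 0.0, 1.0
    revenue = 0.0
    for _ in range(n):
        p = (lo + hi) / 2
        if p <= v:          # YES: sale at price p, threshold is >= p
            revenue += p
            lo = p
        else:               # NO: no sale, threshold is < p
            hi = p
    return revenue

v, n = 0.73, 1000
regret = n * v - binary_search_pricing(v, n)   # informed seller earns n * v
# The regret lands in the tens here, far below the 730 an informed seller earns.
```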

Random Valuations Demand curve: D(x) = Pr(accepting price x). [Figure: plot of D(x) against price x.]

Best “Informed” Strategy Expected revenue at price x: f(x) = xD(x). [Figure: demand curve with the revenue rectangle of area xD(x).]

Best “Informed” Strategy If the demand curve is known, the best strategy is the fixed price maximizing the area of the rectangle xD(x).

Best “Informed” Strategy If the demand curve is known, the best strategy is the fixed price maximizing the area of the rectangle. The best known uninformed strategy is based on the multi-armed bandit problem...
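The informed seller's computation is a one-dimensional maximization of f(x) = xD(x). A sketch over a price grid, using the illustrative uniform demand curve D(x) = 1 - x (my assumption, not a curve from the paper):

```python
def best_fixed_price(D, grid_size=10_000):
    """Informed seller: maximize f(x) = x * D(x), the area of the
    revenue rectangle, over a fine grid of prices in [0, 1]."""
    best_x, best_f = 0.0, 0.0
    for i in range(grid_size + 1):
        x = i / grid_size
        f = x * D(x)
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f

# Uniform valuations: D(x) = 1 - x, so f(x) = x(1 - x) peaks at x = 1/2.
x_star, f_star = best_fixed_price(lambda x: 1.0 - x)
print(x_star, f_star)  # 0.5 0.25
```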

The Multi-Armed Bandit Problem You are in a casino with K slot machines. Each generates random payoffs by i.i.d. sampling from an unknown distribution.

The Multi-Armed Bandit Problem You are in a casino with K slot machines. Each generates random payoffs by i.i.d. sampling from an unknown distribution. You choose a slot machine on each step and observe the payoff (e.g. 0.3).

The Multi-Armed Bandit Problem You are in a casino with K slot machines. Each generates random payoffs by i.i.d. sampling from an unknown distribution. You choose a slot machine on each step and observe the payoff. Your expected payoff is compared with that of the best single slot machine.

The Multi-Armed Bandit Problem Assuming best play: Ex ante regret = Θ(log n) [Lai-Robbins, 1986]. Ex post regret = Θ(√n) [Auer et al., 1995]. The ex post bound applies even if the payoffs are adversarial rather than random (oblivious adversary).
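A compact sketch of a logarithmic-regret index policy in the stochastic model: UCB1 (Auer, Cesa-Bianchi, and Fischer's finite-time counterpart of the Lai-Robbins bound). The two Bernoulli arms and the horizon are my illustrative assumptions:

```python
import math
import random

def ucb1(arms, n, rng):
    """UCB1: after pulling each arm once, always pull the arm maximizing
    (empirical mean) + sqrt(2 ln t / pulls). Payoffs must lie in [0, 1]."""
    K = len(arms)
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1                     # initialization round
        else:
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        x = arms[i](rng)                  # observe only this arm's payoff
        counts[i] += 1
        sums[i] += x
        total += x
    return total

rng = random.Random(1)
arms = [lambda r: float(r.random() < 0.3),   # Bernoulli(0.3)
        lambda r: float(r.random() < 0.6)]   # Bernoulli(0.6): the best arm
n = 5000
payoff = ucb1(arms, n, rng)
# Ex ante regret vs. the best single arm is 0.6 * n - payoff; it grows like log n.
```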

Application to Online Pricing Our problem resembles a multi-armed bandit problem with a continuum of “slot machines”, one for each price in [0,1]. Divide [0,1] into K subintervals, treat them as a finite set of slot machines.

Application to Online Pricing Our problem resembles a multi-armed bandit problem with a continuum of “slot machines”, one for each price in [0,1]. Divide [0,1] into K subintervals, treat them as a finite set of slot machines. The existing bandit algorithms then have regret O(K² log n + n/K²), provided xD(x) is smooth and has a unique global max in [0,1]. Optimizing K yields regret O((n log n)^1/2).
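The discretization can be sketched end to end: treat K evenly spaced prices as arms, run the same UCB-style index over them, and choose K ≈ (n / log n)^(1/4) so the two terms K² log n and n/K² from the slide balance. The linear demand curve and horizon are my illustrative assumptions; this is a sketch, not the paper's exact algorithm.

```python
import math
import random

def discretized_pricing(n, K, demand, rng):
    """Bandit over the K candidate prices (i+1)/K: posting price p earns
    p with probability demand(p) (the buyer's YES) and 0 otherwise."""
    prices = [(i + 1) / K for i in range(K)]
    counts, sums, revenue = [0] * K, [0.0] * K, 0.0
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1
        else:
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        p = prices[i]
        x = p if rng.random() < demand(p) else 0.0
        counts[i] += 1
        sums[i] += x
        revenue += x
    return revenue

n = 20_000
K = max(2, round((n / math.log(n)) ** 0.25))   # the slide's tuning of K
rev = discretized_pricing(n, K, lambda p: 1.0 - p, rng=random.Random(2))
# Informed benchmark: f(x) = x(1 - x) peaks at 0.25 per buyer (5000 total here).
```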

The Continuum-Armed Bandit The continuum-armed bandit problem has algorithms with regret O(n^3/4) when the expected payoff depends smoothly on the action chosen.

 | Finite-Armed | 2^ℵ₀-Armed
Ex ante | Θ(log n) | O(n^3/4)
Ex post | Θ(√n) |

The Continuum-Armed Bandit The continuum-armed bandit problem has algorithms with regret O(n^3/4) when the expected payoff depends smoothly on the action chosen. But: the best-known lower bound on regret was Ω(log n), coming from the finite-armed case.

 | Finite-Armed | 2^ℵ₀-Armed
Ex ante | Θ(log n) | Ω(log n), O(n^3/4)
Ex post | Θ(√n) | ?

The Continuum-Armed Bandit The continuum-armed bandit problem has algorithms with regret O(n^3/4) when the expected payoff depends smoothly on the action chosen. But: the best-known lower bound on regret was Ω(log n), coming from the finite-armed case. We prove: Ω(√n).

 | Finite-Armed | 2^ℵ₀-Armed
Ex ante | Θ(log n) | Ω(√n), O(n^3/4)
Ex post | Θ(√n) | ?

Lower Bound: Decision Tree Setup [Figure: the pricing strategy drawn as a binary decision tree over the prices ½; ¼, ¾; ⅛, ⅜, ⅝, ⅞ on the demand-curve plot of D(x) vs. x, together with a table whose columns are v_i, ALG, OPT, and Reg.]

How not to prove a lower bound! Natural idea: Lower bound the incremental regret at each level of the tree…

How not to prove a lower bound! Natural idea: Lower bound the incremental regret at each level… If regret is Ω(j^-1/2) at level j, then total regret after n steps would be Ω(√n): 1 + √½ + √⅓ + … = Ω(√n).

How not to prove a lower bound! Natural idea: Lower bound the incremental regret at each level… If regret is Ω(j^-1/2) at level j, then total regret after n steps would be Ω(√n): 1 + √½ + √⅓ + … = Ω(√n). This is how lower bounds were proved for the finite-armed bandit problem, for example.

How not to prove a lower bound! The problem: If you only want to minimize incremental regret at level j, you can typically make it O(1/j). Combining the lower bounds at each level then gives only the very weak bound Regret = Ω(log n): 1 + ½ + ⅓ + … = Ω(log n).
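The two series on these slides are easy to check numerically: the partial sums of j^(-1/2) grow like 2√n, while the harmonic partial sums grow like log n.

```python
import math

def partial_sum(n, term):
    """term(1) + term(2) + ... + term(n)."""
    return sum(term(j) for j in range(1, n + 1))

n = 1_000_000
s_sqrt = partial_sum(n, lambda j: j ** -0.5)   # 1 + sqrt(1/2) + sqrt(1/3) + ...
s_harm = partial_sum(n, lambda j: 1.0 / j)     # 1 + 1/2 + 1/3 + ...
print(round(s_sqrt / math.sqrt(n), 2))   # close to 2: Theta(sqrt(n)) growth
print(round(s_harm / math.log(n), 2))    # close to 1: Theta(log n) growth
```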

How to prove a lower bound So instead a subtler approach is required, one which accounts for the cost of experimentation. We define a measure of knowledge, K_D, such that regret scales at least linearly with K_D: K_D = ω(√n) → TOO COSTLY; K_D = o(√n) → TOO RISKY.

Discussion of lower bound Our lower bound doesn’t rely on a contrived demand curve. In fact, we show that it holds for almost every demand curve satisfying some “generic” axioms (e.g. smoothness).

Discussion of lower bound Our lower bound doesn’t rely on a contrived demand curve. In fact, we show that it holds for almost every demand curve satisfying some “generic” axioms (e.g. smoothness). The definition of K_D is quite subtle. This is the hard part of the proof.

Discussion of lower bound Our lower bound doesn’t rely on a contrived demand curve. In fact, we show that it holds for almost every demand curve satisfying some “generic” axioms (e.g. smoothness). The definition of K_D is quite subtle. This is the hard part of the proof. An ex post lower bound of Ω(√n) is easy. The difficulty is solely in strengthening it to an ex ante lower bound.

Open Problems Close the log-factor gaps in random and worst-case models.

Open Problems Close the log-factor gaps in random and worst-case models. What if buyers have some control over the timing of their arrival? Can a temporally strategyproof mechanism have o(n) regret? [Parkes]

Open Problems Close the log-factor gaps in random and worst-case models. What if buyers have some control over the timing of their arrival? Can a temporally strategyproof mechanism have o(n) regret? [Parkes] Investigate online posted-price combinatorial auctions, e.g. auctioning paths in a graph. [Hartline]