The Roles of Uncertainty and Randomness in Online Advertising. Ragavendran Gopalakrishnan, Eric Bax.


The Roles of Uncertainty and Randomness in Online Advertising
Ragavendran Gopalakrishnan, Eric Bax
Raga Gopalakrishnan: 2nd Year Graduate Student (Computer Science), Caltech
Eric Bax: Product Manager (Marketplace Design), Yahoo!

Display Advertising
[Figure: AD-SLOT]

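Simple Model for Display Advertising
– Inputs to the AD SELECTION ALGORITHM: ad calls (from the AD-SLOT on the webpage) and ads w/ bids
– Output: resultant matching (selected ad for each ad call)
– The matching is implemented on the webpage, and feedback returns to the algorithm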

Objective: Make Money!
– m ad calls for one ad slot; candidate ads: ad 1, ad 2, ..., ad n
– Ad calls per ad: k_1, k_2, ..., k_n
– Bid Value: b_1, b_2, ..., b_n
– Response Rate: s_1, s_2, ..., s_n
?
May not be the right thing to do, for two reasons:
– Reason 1: Not Incentive Compatible
– Reason 2: Coming up…
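To make the symbols on this slide concrete, here is a minimal Python sketch. It assumes that estimated expected revenue is the sum of k_i * b_i * s_i over the ads, which is consistent with the slide's symbols but is not stated in the transcript. The allocation rule shown is the "Single Winner" rule named later in the Simulations slide. The function names and the numeric values are hypothetical.

```python
from typing import List, Sequence


def expected_revenue(k: Sequence[float], b: Sequence[float], s: Sequence[float]) -> float:
    """Estimated expected revenue, assumed to be sum_i k_i * b_i * s_i."""
    return sum(k_i * b_i * s_i for k_i, b_i, s_i in zip(k, b, s))


def single_winner_allocation(m: int, b: Sequence[float], s: Sequence[float]) -> List[int]:
    """Give all m ad calls to the ad with the highest b_i * s_i."""
    best = max(range(len(b)), key=lambda i: b[i] * s[i])
    return [m if i == best else 0 for i in range(len(b))]


# Hypothetical bids and estimated response rates for three ads.
bids = [1.0, 10.0, 1.0]
rates = [0.001, 0.00012, 0.0009]

alloc = single_winner_allocation(1_000_000, bids, rates)
revenue = expected_revenue(alloc, bids, rates)
print(alloc, revenue)
```

Note that the rule depends entirely on the estimates s_i, so an error in a single estimate can redirect every ad call.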

The Caveat
The response rate is not known; it has to be estimated. The actual revenue differs from the estimated expected revenue due to two factors:
– Uncertainty (error in estimating response rates s_i)
– Randomness (fluctuations around the response rate)

How bad can Uncertainty be?
Setting: […] billion ad calls per day, one ad slot, two candidate ads (AD 1 and AD 2).
– Bid Value: $1 per response
– Estimated Response Rate: AD 1: […] w/ prob […]; AD 2: […] w/ prob ½, […] w/ prob ½
– Estimated Expected Revenue: $1 million
– Standard Deviation of Revenue: AD 1: $1000 (0.1%); AD 2: $0.3 million (30%)
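The response rates on this slide did not survive in the transcript, so the following Python sketch uses hypothetical values chosen only because they reproduce the slide's figures: one billion ad calls, AD 1 with a response rate of 0.001 known exactly, and AD 2 with a response rate of 0.0007 or 0.0013, each with probability ½. It also assumes each ad call yields an independent Bernoulli response given the rate.

```python
import math


def revenue_mean_and_std(k, bid, rates, probs):
    """Mean and standard deviation of revenue when the response rate S
    takes the values in `rates` with probabilities `probs`, and each of
    the k ad calls is an independent Bernoulli(S) response given S."""
    mean_s = sum(p * r for r, p in zip(rates, probs))
    var_s = sum(p * (r - mean_s) ** 2 for r, p in zip(rates, probs))
    mean_s_one_minus_s = sum(p * r * (1.0 - r) for r, p in zip(rates, probs))
    uncertainty = (k * bid) ** 2 * var_s               # from not knowing S
    randomness = k * bid ** 2 * mean_s_one_minus_s     # from Bernoulli noise
    return k * bid * mean_s, math.sqrt(uncertainty + randomness)


K = 1_000_000_000  # assumed number of ad calls
mean1, std1 = revenue_mean_and_std(K, 1.0, [0.001], [1.0])
mean2, std2 = revenue_mean_and_std(K, 1.0, [0.0007, 0.0013], [0.5, 0.5])
print(round(mean1), round(std1))
print(round(mean2), round(std2))
```

Under these assumptions both ads have the same estimated expected revenue, about $1 million, but the standard deviation is about $1000 for AD 1 and about $0.3 million for AD 2.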

How can we combat it?
How much time do we have?
– Long-Term: LEARNING
– Short-Term: RISK SPREADING (MAIN FOCUS)
– ?: Future Work
Again, these solutions are not automatically incentive compatible.

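Model for Variance of Revenue
– m ad calls for one ad slot; candidate ads: ad 1, ad 2, ..., ad n
– Ad calls per ad: k_1, k_2, ..., k_n
– Bid Value: b_1, b_2, ..., b_n
– Response Rate: S_1, S_2, ..., S_n
– Revenue: X_1(S_1), X_2(S_2), ..., X_n(S_n), where X_i(S_i) is made up of the per-ad-call terms X_1i(S_i), X_2i(S_i), ..., X_mi(S_i)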

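Model for Variance (contd.)
The variance of the revenue can be derived as: [equation not preserved in the transcript]
Independent Returns Case: [equation not preserved in the transcript; its two terms are labeled UNCERTAINTY and RANDOMNESS]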
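One way to obtain a decomposition of this kind is the law of total variance. The sketch below is an assumption, not the slide's own derivation. It takes each per-ad-call revenue X_{ji}(S_i) to be b_i times a Bernoulli(S_i) response, independent across ad calls given the response rates, with ad i receiving k_i ad calls.

```latex
% Total revenue (assumed form)
R = \sum_{i=1}^{n} \sum_{j=1}^{k_i} X_{ji}(S_i),
\qquad X_{ji}(S_i) \mid S_i \sim b_i \cdot \mathrm{Bernoulli}(S_i)

% Law of total variance, conditioning on S = (S_1, \dots, S_n)
\operatorname{Var}(R)
  = \underbrace{\operatorname{Var}\!\big(\mathbb{E}[R \mid S]\big)}_{\text{uncertainty}}
  + \underbrace{\mathbb{E}\!\big[\operatorname{Var}(R \mid S)\big]}_{\text{randomness}}

% Independent response rates across ads
\operatorname{Var}(R)
  = \sum_{i=1}^{n} k_i^2 \, b_i^2 \operatorname{Var}(S_i)
  + \sum_{i=1}^{n} k_i \, b_i^2 \, \mathbb{E}\big[S_i (1 - S_i)\big]
```

Under these assumptions the uncertainty term grows with the square of k_i while the randomness term grows only linearly, which is why concentrating many ad calls on one ad makes uncertainty dominate.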

Factors affecting Variance
– ad 1, with response rate S, receives k ad calls
– S has Mean = p and Std. Dev. = d*p
– X(S) is Bernoulli w/ parameter S
Fraction of Variance Due to Uncertainty is [expression not preserved in the transcript]
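The expression for the fraction is missing from the transcript. As a hedged reconstruction, the Python sketch below computes it from the quantities on the slide, assuming the k responses are independent Bernoulli(S) draws given S. The function name is hypothetical. The bid multiplies both variance terms equally, so it cancels out of the fraction.

```python
def uncertainty_fraction(k: float, p: float, d: float) -> float:
    """Fraction of revenue variance due to uncertainty for one ad with
    k ad calls, when S has mean p and standard deviation d * p."""
    var_s = (d * p) ** 2
    uncertainty = k ** 2 * var_s
    # E[S(1 - S)] = E[S] - E[S^2] = p - (var_s + p^2)
    randomness = k * (p - p ** 2 - var_s)
    return uncertainty / (uncertainty + randomness)


# Hypothetical values: p = 0.001, d = 0.3 (a 30% relative error).
for k in (1_000, 1_000_000, 1_000_000_000):
    print(k, uncertainty_fraction(k, 0.001, 0.3))
```

With these illustrative inputs the fraction rises toward 1 as k grows, and it is exactly 0 when d = 0.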

Uncertainty or Randomness?

Bottom Line
– Uncertainty can be really bad
– Real World: Uncertainty dominates
SOLUTION
– Long-Term: LEARNING
– Short-Term: RISK SPREADING

Effect of Learning
– ad 1 receives v learning ad calls and produces u responses
– p_real: Real response rate (unknown)
– Estimate p_real as p = u/v
– ad 1 then receives k 'real' ad calls
Fraction of Variance due to Uncertainty is [expression not preserved in the transcript]
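The slide's expression is not preserved, so the following Python sketch shows one way to quantify the effect under a stated assumption: a uniform prior on the response rate, so that after u responses in v learning ad calls the posterior is Beta(u + 1, v - u + 1). The function name and the numbers are hypothetical.

```python
def learning_uncertainty_fraction(u: int, v: int, k: int) -> float:
    """Fraction of revenue variance due to uncertainty over k 'real' ad
    calls, given u responses in v learning ad calls and a uniform prior."""
    a, b = u + 1, v - u + 1                       # Beta posterior parameters
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    mean_s_one_minus_s = mean - (var + mean ** 2)  # E[S(1 - S)]
    uncertainty = k ** 2 * var
    randomness = k * mean_s_one_minus_s
    return uncertainty / (uncertainty + randomness)


# Hypothetical: the observed rate u/v stays at 0.001 while v grows.
k = 1_000_000
for v in (1_000, 100_000, 10_000_000):
    print(v, learning_uncertainty_fraction(v // 1000, v, k))
```

Under this Beta posterior the fraction simplifies algebraically to k / (k + v + 2), so it depends only on how the number of real ad calls compares with the number of learning ad calls, and it shrinks as v grows.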

Bottom Line
– Uncertainty can be really bad
– Real World: Uncertainty dominates
SOLUTION
– Long-Term: LEARNING
– Short-Term: RISK SPREADING

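Effect of Risk Sharing
Setting: […] billion ad calls per day, two candidate ads (AD 1 and AD 2).
– Bid Value: $1 per response
– Estimated Response Rate: AD 1: […] w/ prob […]; AD 2: […] w/ prob ½, […] w/ prob ½
– Estimated Expected Revenue: $1 million
– Standard Deviation of Revenue: AD 1: $1000 (0.1%); AD 2: $0.3 million (30%)
Variance of Revenue: [value not preserved in the transcript]
New Strategy: Use a billion different ads, each with a response rate distributed i.i.d. to that of AD 2, and show a different one on each ad call.
Variance of revenue = [value not preserved in the transcript]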
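The two variance values are missing from the transcript. The Python sketch below compares the two strategies using hypothetical inputs: one billion ad calls, a $1 bid, and an AD 2 response rate of 0.0007 or 0.0013 with probability ½ each. These values are chosen because they match the 30% standard deviation quoted on the slide, not because they appear in the transcript.

```python
import math

K = 1_000_000_000          # assumed number of ad calls
BID = 1.0                  # $1 per response
RATES = (0.0007, 0.0013)   # hypothetical equally likely response rates

mean_s = sum(RATES) / 2
var_s = sum((r - mean_s) ** 2 for r in RATES) / 2
mean_s_one_minus_s = sum(r * (1 - r) for r in RATES) / 2

# Strategy A: every ad call goes to one ad, so all calls share one
# unknown rate and the uncertainty term scales with K squared.
var_single = (K * BID) ** 2 * var_s + K * BID ** 2 * mean_s_one_minus_s

# Strategy B: each ad call uses a fresh ad with an independent rate, so
# each call is marginally an independent Bernoulli(mean_s) response.
var_spread = K * BID ** 2 * mean_s * (1 - mean_s)

std_single = math.sqrt(var_single)
std_spread = math.sqrt(var_spread)
print(round(std_single), round(std_spread))
```

Under these assumptions spreading the ad calls brings the standard deviation from about $0.3 million back down to about $1000, the same level as an ad whose response rate is known exactly.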

Formalize Risk-Sharing
The goal of sharing risk and bringing the variance down motivates the following optimization problem: [optimization problem not preserved in the transcript]
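Since the optimization problem itself is missing from the transcript, the Python sketch below uses an assumed Markowitz-style stand-in rather than the authors' formulation: maximize expected revenue minus a penalty `lam` times the variance, with independent ads, a continuous relaxation of the ad-call counts, and the constraints that allocations are nonnegative and sum to m. All names and parameters (`mu`, `a`, `c`, `lam`) are hypothetical.

```python
from typing import List, Sequence


def mean_variance_allocation(m: float, mu: Sequence[float], a: Sequence[float],
                             c: Sequence[float], lam: float,
                             iters: int = 200) -> List[float]:
    """Maximize sum(mu_i*k_i) - lam * sum(a_i*k_i**2 + c_i*k_i)
    subject to sum(k_i) = m and k_i >= 0.

    mu_i: expected revenue per ad call, a_i > 0: uncertainty coefficient,
    c_i: randomness coefficient. The KKT conditions give
    k_i = max(0, (mu_i - lam*c_i - nu) / (2*lam*a_i)); the multiplier nu
    is found by bisection so that the allocations sum to m.
    """
    gain = [mu_i - lam * c_i for mu_i, c_i in zip(mu, c)]

    def alloc(nu: float) -> List[float]:
        return [max(0.0, (g - nu) / (2.0 * lam * a_i)) for g, a_i in zip(gain, a)]

    lo = min(gain) - 2.0 * lam * max(a) * m   # total allocation >= m here
    hi = max(gain)                            # total allocation == 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > m:
            lo = mid
        else:
            hi = mid
    return alloc(0.5 * (lo + hi))


# Hypothetical two-ad example: ad 0 looks better, but both are uncertain.
low_risk = mean_variance_allocation(10, [1.0, 0.5], [0.01, 0.01], [0.0, 0.0], 1.0)
high_risk = mean_variance_allocation(10, [1.0, 0.5], [0.1, 0.1], [0.0, 0.0], 1.0)
print(low_risk, high_risk)
```

In this toy example the low-uncertainty case sends every ad call to the better-looking ad, while the high-uncertainty case splits them, which is the risk-sharing behavior the slide is aiming for.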

Simulations
Setup:
– "CPC" ADS: Bid $1; generate response rates from a Normal Distribution, μ = 0.001, σ = […]
– "CPA" ADS: Bid $10; generate response rates from a Normal Distribution, μ = […], σ = […]
Procedure:
– Start with an assumed prior (uniform, approximate or exact)
– All 20 ads are given […] learning ad calls each, responses are counted, corresponding posteriors are obtained using Bayes' Rule
– Method 1 (Portfolio): Compute the optimal portfolio and allocate ad calls accordingly
– Method 2 (Single Winner): Allocate all ad calls to the ad with the highest estimated expected revenue
– Compare Results
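The learning step and Method 2 (Single Winner) can be sketched in Python as follows. Several parameters are missing from the transcript, so the values here are placeholders: a 10/10 split between CPC and CPA ads, the CPA mean, both standard deviations, and the number of learning ad calls are all assumptions. The sketch uses a uniform prior, for which the posterior mean after u responses in v ad calls is (u + 1) / (v + 2). "Efficiency" is taken here to mean the chosen ad's actual expected revenue per ad call divided by the best available, which is also an assumption. Method 1 (Portfolio) is not shown.

```python
import random


def simulate_learning(true_rates, learning_calls, rng):
    """Count responses for each ad over `learning_calls` Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(learning_calls))
            for p in true_rates]


def posterior_mean_uniform_prior(u, v):
    """Mean of the Beta(u + 1, v - u + 1) posterior under a uniform prior."""
    return (u + 1) / (v + 2)


rng = random.Random(0)           # fixed seed for repeatability
N_CPC, N_CPA = 10, 10            # assumed split of the 20 ads
LEARNING_CALLS = 2000            # assumed; not given in the transcript

# Assumed distribution parameters; negative draws are clipped to zero.
cpc_rates = [max(0.0, rng.gauss(0.001, 0.0002)) for _ in range(N_CPC)]
cpa_rates = [max(0.0, rng.gauss(0.0001, 0.00002)) for _ in range(N_CPA)]
true_rates = cpc_rates + cpa_rates
bids = [1.0] * N_CPC + [10.0] * N_CPA

responses = simulate_learning(true_rates, LEARNING_CALLS, rng)
estimated = [b * posterior_mean_uniform_prior(u, LEARNING_CALLS)
             for b, u in zip(bids, responses)]

# Method 2 (Single Winner): pick the highest estimated expected revenue.
winner = max(range(len(bids)), key=lambda i: estimated[i])
actual = [b * p for b, p in zip(bids, true_rates)]
efficiency = actual[winner] / max(actual)
print(winner, efficiency)
```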

Estimated Expected Revenue

Uniform Prior – Actual Expected Revenue

Uniform Prior – Efficiency

Uniform Prior – Allocation by share of ad calls

Uniform Prior – Allocation by actual expected revenue

Exact Prior – Actual Expected Revenue

Exact Prior – Allocation by share of ad calls

Exact Prior – Allocation by actual expected revenue

Approximate Prior – Actual Expected Revenue

Approximate Prior – Allocation by share of ad calls

Approximate Prior – Allocation by actual expected revenue

A Word of Caution – Covariance
– Randomness is usually uncorrelated over different ad calls.
– More often than not, uncertainty is correlated over multiple ads, as their response rates could be estimated through a common learning algorithm.
– Covariance can be estimated from empirical data, using models that are specific to the contributing factors (e.g., specific learning methods used).
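To illustrate why correlated uncertainty matters, the Python sketch below extends the variance calculation with a covariance matrix for the response rates. It assumes responses are independent Bernoulli draws given the rates, so only the uncertainty term picks up cross-ad covariance. The function name and the numbers are hypothetical.

```python
def revenue_variance(k, b, mean_s, cov_s):
    """Variance of total revenue with correlated response-rate uncertainty.

    k, b, mean_s: per-ad ad calls, bids, and mean response rates.
    cov_s: covariance matrix of the response rates (list of lists).
    """
    n = len(k)
    uncertainty = sum(k[i] * b[i] * k[j] * b[j] * cov_s[i][j]
                      for i in range(n) for j in range(n))
    # E[S_i(1 - S_i)] = mean_i - mean_i^2 - Var(S_i)
    randomness = sum(k[i] * b[i] ** 2 * (mean_s[i] - mean_s[i] ** 2 - cov_s[i][i])
                     for i in range(n))
    return uncertainty + randomness


# Two hypothetical ads, each with mean rate 0.001 and std. dev. 0.0003.
k, b, mean_s = [500, 500], [1.0, 1.0], [0.001, 0.001]
v = 0.0003 ** 2
independent = revenue_variance(k, b, mean_s, [[v, 0.0], [0.0, v]])
correlated = revenue_variance(k, b, mean_s, [[v, v], [v, v]])
print(independent, correlated)
```

With perfectly correlated estimates the uncertainty term is as large as if all ad calls had gone to a single ad, so spreading ad calls across ads with correlated errors buys less variance reduction than the independent case suggests.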

Summary
– Actual Revenue differs from Estimated Expected Revenue for two reasons: uncertainty and randomness.
– Uncertainty can be very bad, and dominates randomness in most cases.
– Learning helps reduce uncertainty in the long run, but in the short run, portfolio optimization (risk distribution) is one way to combat uncertainty.
– Simulations show that actual revenue can improve as an important side effect of reducing uncertainty.

Further Directions…
Can we tie together the long-term and short-term solutions?
– Example: Consider the explore-exploit family of learning methods.
– After every explore step, we have better estimates of response rates, but they may still be bad. So the exploit phase could be replaced with the portfolio optimization step!
– Side Effect: Additional exploration in the "exploit" phase.
– Is this an optimal way of mixing the two?
Financial Markets: does it make sense for risk-neutral investors to employ portfolio optimization?
Incentive Compatibility: can we deal with it?

Thank You
Questions?