Probabilistic Methods for Targeted Advertising Max Chickering Microsoft Research.


Outline
- Targeted Mailing: To whom should you send a solicitation?
- Targeted Advertising on the Web: How should you display banner ads to maximize click-through?

Targeted Mailing
Given a population of potential customers:

Person  X_1  X_2  …  X_n
1       0    0    …  red
2       0    3.4  …  blue
…
m       1    7    …  green

Sending an advertisement costs money:
- Postage
- Possible discount

Which potential customers do you solicit?

Motivating Application
- Advertisement: MSN subscription
- Potential customers: people who registered Windows 95
- Known variables: 15 from questionnaire (e.g. gender, RAM size)

Naïve Solutions
- Mail to those customers most likely to subscribe to MSN: can waste money by targeting customers who would subscribe anyway
- Mail to everyone: even worse!

Response Behaviors
Will the potential customer buy the product?

                  Mail  Don’t Mail
Always buyer      Yes   Yes
Persuadable       Yes   No
Anti-persuadable  No    Yes
Never buyer       No    No

We only make money from mailing to the persuadable potential customers.

Expected Profit for a Population
- Population of N potential customers: N_alw, N_per, N_anti, N_nev
- Cost of mailing: c
- Solicited and unsolicited revenue: r
- Expected profit from mailing
- Profit from not mailing
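The two profit expressions named on this slide can be written out from the response-behavior table. The following is a reconstruction under the assumption that every mailed customer costs c and every purchase yields the same revenue r, not a copy of the slide's own formulas:

```latex
\begin{align*}
\text{Profit}_{\text{mail}}    &= (N_{alw} + N_{per})\, r - N c \\
\text{Profit}_{\text{no mail}} &= (N_{alw} + N_{anti})\, r \\
\text{Profit}_{\text{mail}} - \text{Profit}_{\text{no mail}}
                               &= (N_{per} - N_{anti})\, r - N c
\end{align*}
```

Dividing the last line by N gives a per-person quantity of the form -c + [p(buy | mail) - p(buy | no mail)] r, since (N_alw + N_per)/N and (N_alw + N_anti)/N are the purchase rates with and without mailing.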

Lift in Profit From Mailing
Lift = Profit from mailing - Profit from not mailing
For any set of potential customers, we should only mail if the lift is positive.

Learning Expected Lift
- S ∈ {s_0, s_1} (did not subscribe, did subscribe)
- M ∈ {m_0, m_1} (did not mail, did mail)
- Identifiable if S, M known in training data
Lift = -c + [ p(S=s_1 | M=m_1) - p(S=s_1 | M=m_0) ] × r

Controlled Experiment: Identify Profitable Sub-Populations
1. Choose a small sample of the potential customers
2. Randomly divide those customers into a “treatment group” (M = m_1) and a “control group” (M = m_0)
3. Wait a specified period of time, and record S = s_0 or S = s_1 for each

Controlled Experiment

Person  X_1  X_2  …  X_n    M    S
1       0    0    …  red    m_1  …
2       0    3.4  …  blue   m_0  …
…
m       1    7    …  green  m_1  s_1

Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers.

Lift (sub-population corresponding to X_n = blue) = -c + [ p(S=s_1 | M=m_1, X_n=blue) - p(S=s_1 | M=m_0, X_n=blue) ] × r
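As a rough illustration of the sub-population lift formula, the sketch below estimates the two conditional probabilities by simple counting within each segment of a controlled experiment. The function name `estimate_lift`, the record layout, and the sample data are hypothetical; the talk itself uses decision trees rather than raw per-segment counts.

```python
from collections import defaultdict


def estimate_lift(records, c, r):
    """Estimate lift per segment from (segment, mailed, subscribed) records.

    Lift = -c + [p(S=s1 | M=m1, segment) - p(S=s1 | M=m0, segment)] * r
    """
    # counts[segment][mailed] = [number subscribed, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, mailed, subscribed in records:
        cell = counts[segment][mailed]
        cell[0] += int(subscribed)
        cell[1] += 1

    lifts = {}
    for segment, by_mail in counts.items():
        (s1, n1), (s0, n0) = by_mail[True], by_mail[False]
        if n1 == 0 or n0 == 0:
            continue  # lift is not identifiable without both groups
        lifts[segment] = -c + (s1 / n1 - s0 / n0) * r
    return lifts


# Hypothetical experiment: 10 mailed and 10 unmailed customers per segment.
records = (
    [("blue", True, True)] * 4 + [("blue", True, False)] * 6
    + [("blue", False, True)] * 2 + [("blue", False, False)] * 8
    + [("red", True, True)] * 3 + [("red", True, False)] * 7
    + [("red", False, True)] * 3 + [("red", False, False)] * 7
)
lifts = estimate_lift(records, c=0.50, r=9.0)
# Segments with positive lift are the ones worth mailing.
to_mail = sorted(seg for seg, lift in lifts.items() if lift > 0)
```

With these made-up counts, mailing changes the subscription rate only in the "blue" segment, so only that segment ends up in `to_mail`.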

Identify Profitable Sub-Populations
- Partitions of X define sub-populations, and a statistical model for p(S | M, X) defines the lift
- Approach: use decision trees
- Known distinctions in our data: X = {X_1, …, X_n}, S, M

Sub-population           Lift
X_1 > 10, X_4 ≠ 2        Lift 1
X_1 > 10, X_4 = 2        Lift 2
X_1 < 10, X_12 = false   Lift 3
X_1 < 10, X_12 = true    Lift 4

Probabilistic Decision Trees
A tree for p(S | M, X_1, X_2) stores one distribution over S per leaf, e.g. p(S | M=m_0, X_1=1, X_2=2).

Calculating Lift
[Figure: decision tree with splits on X_2, X_1, and M (mailed / not mailed). Leaf distributions shown: p(S=subscribed) = 0.6, 0.5, 0.4, 0.2, 0.7, and 0.3, each with p(S=not subscribed) as the complement.]

Potential customer with {X_1=1, X_2=2}. Assume c = 0.50, r = 9.
- p(S=subscribed | mailed) = 0.4
- p(S=subscribed | not mailed) = 0.2
Lift = -0.50 + (0.4 - 0.2) × 9 = 1.3
Mail to this person!

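Traditional Learning Algorithm
[Figure: greedy tree growth. Each candidate split X_1, X_2, …, X_n is scored on the data (Score_1(Data), Score_2(Data), …, Score_n(Data)); after a split on X_2 is added, the remaining candidates X_1, X_3, …, X_n are scored again below it.]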

Lift-Aware Learning Algorithm
- Traditional learning algorithm: identify a tree that represents p(S | M, X) well
- Lift-aware: would like the tree to be good at modeling the difference p(S=s_1 | M=m_1, X=x) - p(S=s_1 | M=m_0, X=x)

A Heuristic
Only consider decision trees (for S) with the last split on M.
[Figure: candidate trees built from splits on X_1, X_2, …, X_n, each path ending in a split on M, with scores Score_1(Data), Score_2(Data), …, Score_n(Data).]

Experiment: Real-world Dataset
- Product of interest: MSN subscription
- Potential customers: Windows 95 registrants
- Known variables (X): 15 from questionnaire (e.g. gender, RAM size)
- Cost to mail: 42 cents
- Subscription revenue: varied from 1 to 15 dollars
- Data: sample of ~110,000 potential customers (70% train, 30% test)
Compared our algorithm (FORCE) with an unconstrained greedy algorithm (NORMAL) for various revenues.

Results on Test Data: Per-person improvement over Mail-to-All

Conclusions / Future Work
- Marginal improvement over the standard decision-tree algorithm: almost every path in the “standard” trees contained a split on M. We expect a larger difference for other domains.
- Algorithm works for discounted prices:
  - Expected profit from mailing
  - Profit from not mailing

Part II: Targeted Advertising on the Web
Given information about a visitor, how do you choose which advertisement to display?

Goals of Targeted Advertising
- Maximize $$$
- Maximize clicks
- Brand presence

Naïve Targeting Scheme
Step 1: Cluster / segment users (Cluster 1, …, Cluster m)
Possible cluster attributes:
- Current page category
- Pages the user has visited on the site
- Known demographics
- Inferred demographics
- Previous advertisement clicks

Naïve Targeting Scheme
Step 2: Advertiser books ads into clusters
Step 3: Measure click probabilities
Step 4: Show best ad to each cluster
Problems (inventory management):
- Ad quotas
- Cluster overbooking

Advertisement Allocation
x_ij = number of times to show advertisement i to user cluster j

       Cluster 1  Cluster 2  …  Cluster m
Ad 1   x_11       x_12       …  x_1m
Ad 2   x_21       x_22       …  x_2m
…
Ad n   x_n1       x_n2       …  x_nm

Maximize Expected Clicks

       Cluster 1     Cluster 2     …  Cluster m
Ad 1   p_11 × x_11   p_12 × x_12   …  p_1m × x_1m
Ad 2   p_21 × x_21   p_22 × x_22   …  p_2m × x_2m
…
Ad n   p_n1 × x_n1   p_n2 × x_n2   …  p_nm × x_nm

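Inventory-Management Constraints
[Figure: one constraint per advertisement, over the impressions x_i1, …, x_im of Ad i across clusters, and one constraint per cluster, over the impressions x_ij shown to Cluster j across ads.]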

Linear Program
- Find the schedule X that maximizes:
- Subject to:
- Solve using (e.g.) the simplex algorithm
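The objective and constraints are not legible in this transcript. One formulation consistent with the allocation table, the expected-click table, and the quota and capacity symbols q_i and c_j used on the later slides would be the following. Treating both constraint families as equalities is an assumption; the original slide may use inequalities.

```latex
\begin{align*}
\max_{X}\quad & \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij} \\
\text{subject to}\quad
  & \sum_{j=1}^{m} x_{ij} = q_i \quad \text{for each ad } i
    \quad \text{(ad quota)} \\
  & \sum_{i=1}^{n} x_{ij} = c_j \quad \text{for each cluster } j
    \quad \text{(impressions available in the cluster)} \\
  & x_{ij} \ge 0
\end{align*}
```

Both the objective and the constraints are linear in the x_ij, which is what allows the schedule to be found with a standard LP solver.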

A Simple Targeting System
1. Estimate probabilities
2. Find the optimal schedule
3. Serve ads to cluster j via
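The serving rule is cut off in this transcript. One plausible reading, shown here purely as an assumption, is to pick ad i for a visitor in cluster j with probability proportional to the scheduled count x_ij. The function `serve_ad` and the example schedule are hypothetical.

```python
import random


def serve_ad(schedule, cluster, rng=random):
    """Pick an ad index for a visitor in `cluster`.

    schedule[i][j] is x_ij, the number of times ad i is scheduled for
    cluster j. Ads are drawn with probability x_ij / sum_i x_ij
    (an assumed rule, not taken from the slides).
    """
    weights = [row[cluster] for row in schedule]
    if sum(weights) <= 0:
        raise ValueError("no impressions scheduled for this cluster")
    return rng.choices(range(len(schedule)), weights=weights, k=1)[0]


# Hypothetical schedule: rows are ads, columns are clusters.
schedule = [[300, 0], [100, 400]]
rng = random.Random(0)  # seeded only to make the sketch repeatable
picks = [serve_ad(schedule, 0, rng) for _ in range(1000)]
share_ad0 = picks.count(0) / len(picks)  # should be close to 300 / 400
```

Randomized serving of this kind meets the schedule only in expectation; a production system would also need to track delivered impressions against the quotas.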

Sensitivity to Estimates
q_1 = q_2 = c_1 = c_2 = k

Probabilities:
[Table: estimated click probabilities for Ad 1 and Ad 2 in Cluster 1 and Cluster 2]

Optimal schedule:

       Cluster 1  Cluster 2
Ad 1   0          k
Ad 2   k          0
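The sensitivity problem can be reproduced on a toy instance. The sketch below enumerates every integer schedule for two ads and two clusters with q_1 = q_2 = c_1 = c_2 = k, which stands in for the simplex step on a problem this small. The probability values are made up; the slide's own numbers are not legible here.

```python
def best_2x2_schedule(p, k):
    """Return (expected clicks, schedule) for 2 ads x 2 clusters.

    With every row sum and column sum fixed to k, a schedule is fully
    determined by a = x_11, so brute-force enumeration is enough.
    """
    best = None
    for a in range(k + 1):
        x = [[a, k - a], [k - a, a]]
        clicks = sum(p[i][j] * x[i][j] for i in range(2) for j in range(2))
        if best is None or clicks > best[0]:
            best = (clicks, x)
    return best


k = 100
# Two hypothetical estimates that differ only in the fourth decimal place.
p_a = [[0.0101, 0.0100], [0.0100, 0.0101]]
p_b = [[0.0100, 0.0101], [0.0101, 0.0100]]

clicks_a, schedule_a = best_2x2_schedule(p_a, k)
clicks_b, schedule_b = best_2x2_schedule(p_b, k)
```

A change of 0.0001 in the estimates flips the schedule from one all-or-nothing assignment to the opposite one, even though the expected click totals of the two schedules are nearly identical under either estimate.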

Solution: Buckets
q_1 = q_2 = c_1 = c_2 = k

Probabilities:
[Table: estimated click probabilities for Ad 1 and Ad 2 in Cluster 1 and Cluster 2]

Optimal schedule:

       Cluster 1  Cluster 2
Ad 1   a          b
Ad 2   c          d

a + b + c + d = 2k

Secondary (linear) optimization: ads are shown as close to uniformly as possible across all clusters.

Passive Experiment: MSNBC (December 1998)
- Clusters defined by the current page group: Sports, News, Health, Opinion, …
- Manual approach: advertisers buy impressions on page groups

Passive Experiment: MSNBC (December 1998)
- ~20 clusters
- ~500 advertisements
- ~1.6 million impressions / day
Data from day 1:
- Estimate p_ij (average ~4K data points per probability)
- Find optimal schedule (less than 1 minute, no buckets)
Data from day 2:
- Re-estimate p_ij
- Evaluate schedule
Result: 20-30% increase over manual schedule

Active Experiment on MSNBC (May 1999)
Particular advertiser: 5 ads
Data from weekend 1:
- Estimate p_ij (~15K data points per probability)
- Find optimal schedule (less than 1 second using buckets)
- Rearrange advertisements for weekend 2
Data from weekend 2:
- Count the number of clicks and compare to weekend 1

Active Experiment Results
[Figure: advertiser vs. control, Weekend 1 (pre target) and Weekend 2 (post target)]
- 30% increase for the advertiser, negligible increase for others
- Predicted a 20% increase on MSNBC

Extensions
- Problem: increasing total expected clicks across the site may decrease clicks for a particular advertiser
- Solution: add a (linear) constraint that expected clicks cannot decrease
- Passive experiment: MSNBC overall increase still ~20%

Extensions
- Focus of talk: p_ij = expected #clicks from showing ad i to user j
- In general: u_ij = expected utility from showing ad i to user j
- Expected utility of X =
Alternative u_ij choices:
- Weighted probabilities: w_i p_ij
- Probability of purchase
- Increase in brand awareness
- Expected revenue
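The expression for the expected utility of a schedule is missing from this transcript. By analogy with the expected-click tables, where each cell contributes p_ij × x_ij, a natural reading (stated here as an assumption) is:

```latex
\[
  \mathrm{EU}(X) \;=\; \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij}\, x_{ij}
\]
```

Setting u_ij = p_ij recovers the expected-click case, and u_ij = w_i p_ij gives the weighted-probability variant listed on the slide. In each case the quantity stays linear in the x_ij.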

My Home Page

Results on Test Data: Per-person improvement over Mail-to-All
To evaluate a test case given a model:
1. Evaluate the lift given X (ignoring M and S)
2. Recommend Mail if and only if Lift > 0
3. If the recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore the case.
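The evaluation procedure can be sketched as follows. The slide does not spell out the accounting, so this version assumes that r is credited only when the matched test case actually subscribed and that c is charged when the matched case was mailed. The function `evaluate_policy`, the lift lookup, and the test cases are all hypothetical.

```python
def evaluate_policy(test_cases, lift_fn, c, r):
    """Score a mailing policy on held-out experiment data.

    test_cases: iterable of (x, mailed, subscribed).
    lift_fn:    maps x to the model's estimated lift (M and S are ignored).
    Only cases where the recommendation matches what actually happened
    (M) are counted; all other cases are skipped.
    """
    total, used = 0.0, 0
    for x, mailed, subscribed in test_cases:
        recommend_mail = lift_fn(x) > 0
        if recommend_mail != mailed:
            continue  # recommendation disagrees with the logged action
        used += 1
        if subscribed:
            total += r  # assumed: revenue only when the case subscribed
        if mailed:
            total -= c  # assumed: mailing cost charged for mailed cases
    return total, used


# Hypothetical model output and test cases.
segment_lift = {"blue": 1.3, "red": -0.5}
test_cases = [
    ("blue", True, True),    # matches: mailed and subscribed
    ("blue", False, True),   # skipped: model says mail, case was not mailed
    ("red", False, False),   # matches: not mailed, no revenue
    ("red", True, True),     # skipped: model says do not mail
    ("blue", True, False),   # matches: mailed, no subscription
]
total, used = evaluate_policy(test_cases, segment_lift.get, c=0.50, r=9.0)
```

Because the treatment was assigned at random in the experiment, the matched cases form an unbiased sample of what the policy would have earned, which is why the mismatched cases can simply be dropped.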