Bandit’s Paradise: The Next Generation of Test-and-Learn Marketing


Bandit’s Paradise: The Next Generation of Test-and-Learn Marketing Professor Peter Fader The Wharton School, University of Pennsylvania Co-director, Wharton Customer Analytics Initiative Joint work with Eric Schwartz and Eric Bradlow

STARTING POINT: A/B(/C/D/E…) Testing Randomly divide customers into two (or more) groups Expose each group to a different advertisement Measure the outcomes for each group Compare outcomes using confidence intervals
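A back-of-the-envelope version of that last step, sketched in Python (the conversion counts here are made up for illustration):

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% confidence interval for the difference in conversion rates
    between two randomized groups (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical test: 120/10,000 conversions for ad A vs 150/10,000 for ad B.
lo, hi = diff_confidence_interval(120, 10_000, 150, 10_000)
# If the interval excludes 0, we'd crown B the winner at the 5% level;
# here it narrowly includes 0, so we'd have to keep testing.
```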

Multivariate testing (MVT) Multivariate testing is simply testing multiple features of your advertising in the same test You should use a multivariate test when: You want to know the relative effects of the different features You think there may be interactions between the features By making some reasonable assumptions about the interactions, you can work out the effect for each feature with fewer customers and without testing all combinations of features.
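To make the "relative effects" point concrete, here is a toy 2×2 example (hypothetical features and rates): assuming no interaction, each feature's main effect is simply its average lift across the levels of the other feature.

```python
# Hypothetical 2x2 test: headline (H0 vs H1) crossed with image (I0 vs I1).
# Cell values are observed conversion rates (made up for illustration).
rates = {('H0', 'I0'): 0.010, ('H1', 'I0'): 0.014,
         ('H0', 'I1'): 0.012, ('H1', 'I1'): 0.016}

# Main effect of the headline: average lift from H0 -> H1 across images.
headline_effect = ((rates[('H1', 'I0')] - rates[('H0', 'I0')]) +
                   (rates[('H1', 'I1')] - rates[('H0', 'I1')])) / 2

# Main effect of the image: average lift from I0 -> I1 across headlines.
image_effect = ((rates[('H0', 'I1')] - rates[('H0', 'I0')]) +
                (rates[('H1', 'I1')] - rates[('H1', 'I0')])) / 2
```

With more features, the same no-interaction assumption lets a fractional design skip most feature combinations entirely.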

“State of the art” test-and-learn versus adaptive testing. “State of the art”: run A/B, A/B/n, or multivariate testing (MVT), then crown the winner. Adaptive (earn-and-learn): make profit while learning; continuous improvement. Extensions and natural complications: a large set of candidate ads to compare; very rare events (e.g., acquisitions via display ads); batches of decisions (e.g., “chunky” allocations); different contexts (e.g., websites differ). I’m going to use the language of online display advertising, with a focus on customer acquisition, to make things tangible. (Contexts arise because the media has been bought in advance.)

Which Ad WILL Bring in the Most Customers? You might just run an A/B/C test. But are we really sure that we can crown the winner? How (and when) do we know? And there’s clearly an attribute structure here. But this is just a small sample…

How should we allocate impressions across many ads (served on many websites) to acquire more customers? This is the real experiment; MVT could take a long, long time…

Typical analytics for experiments: a snapshot from a well-established online testing service. But it doesn’t directly inform action.

Introducing the multi-armed bandit ($$$). The idea is 50+ years old and has even been applied to advertising. But reality is more complicated.

Broader class of earn-and-learn problems

Field experiment: summary and scope 4 ad concepts and 3 ad sizes

Field experiment: summary and scope. 4 ad concepts and 3 ad sizes; 80+ media placements (including websites, portals, ad networks, and ad exchanges); 500+ million impressions; conversion rates in line with industry standards (between 1 and 10 per million impressions).

Field experiment: time line. (Diagram: Initialize, then cycle through Estimate (E), Act (A), and Value (V) over the time of the experiment.) Schwartz AMA 2012

Managing the multi-armed bandit: (1) static, balanced design (equal allocation); (2) adaptive, “greedy” methods (winner take all); (3) adaptive, randomized (smooth allocation). Agarwal et al. 2008; Auer et al. 2002; Bertsimas and Mersereau 2007; Lai 1987; Scott 2010; Rusmevichientong and Tsitsiklis 2010. Gönül and Shi 1998; Gönül and ter Hofstede 2006; Hauser et al. 2009; Montoya et al. 2011; Simester et al. 2006; Sun et al. 2006. Schwartz AMA 2012

Formalizing the optimization problem: a hierarchical, attribute-based K-armed bandit with J contexts and batching. We need an objective function and a Bellman equation. But this can’t be solved directly…
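In generic form (the notation below is an assumption; the slide's own equations did not survive in the transcript), the objective and the Bellman equation look like:

```latex
% Allocate a batch of N_t impressions x_t = (x_{1,t}, \dots, x_{K,t}) each period
% to maximize expected cumulative conversions over the horizon T:
\max_{\pi} \; \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{k=1}^{K} x_{k,t}\, p_k \right],
\qquad \text{s.t.} \quad \sum_{k=1}^{K} x_{k,t} = N_t .

% Bellman equation over the belief state S_t (the posterior on the p_k):
V(S_t) = \max_{x_t} \; \mathbb{E}\left[\, R(S_t, x_t) + V(S_{t+1}) \,\middle|\, S_t, x_t \,\right].
```

The curse of dimensionality in the belief state is what makes a direct solution infeasible.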

How is the optimization problem solved? For independent actions, no heterogeneity, one-at-a-time decisions … The Gittins Index is the optimal certainty equivalent of an uncertain arm In other words, it perfectly reflects the “exploration bonus” for each arm
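Gittins indices themselves require a dynamic-programming computation, so purely as an illustration of the "exploration bonus" idea, here is the simpler UCB1 index (a related index policy, not what the talk uses):

```python
import math

def ucb1_index(conversions, impressions, total_impressions):
    """Observed mean plus an explicit exploration bonus (UCB1).
    Lightly sampled arms get a large bonus, so they keep being
    tried even when their observed mean looks poor."""
    mean = conversions / impressions
    bonus = math.sqrt(2 * math.log(total_impressions) / impressions)
    return mean + bonus

# Two arms with the same observed mean (10%): the lightly sampled one
# carries a much larger exploration bonus, so its index is higher.
lightly_sampled = ucb1_index(1, 10, 1010)
heavily_sampled = ucb1_index(100, 1000, 1010)
```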

Managing the multi-armed bandit: winner-take-all policies. (Chart: conversion rate (scale masked) and the resulting allocations of impressions across ads A, B, C, and D under the observed-mean (“greedy”) policy versus the Gittins index; the gap between them is the “exploration bonus”.) Why should we give up on A & C? And should we truly/always give up on B?

How is the optimization problem solved? For the standard sequential optimization problem, the Gittins index is the optimal certainty equivalent of an uncertain arm; in other words, it perfectly reflects the “exploration bonus” for each arm. But that’s not our problem… Actions are not independent; they are described by attributes, and attribute structure across actions improves learning. The Gittins index doesn’t account for heterogeneity, and it says nothing about batching.

Managing the multi-armed bandit: randomized probability matching (RPM). Allocate resources to each action in proportion to the probability that it is the best action. (Chart: conversion rate (scale masked) and allocations of impressions across ads A, B, C, and D.) B brought in no conversions in this particular time period, but do we really believe it? No! Let’s focus on the underlying structure which generates the probabilities, not just the observed outcomes. Berry 2004; Chapelle and Li 2012; Granmo 2010; May et al. 2011; Scott 2010; Thompson 1933.

The benefits of randomized probability matching. Explore/exploit is achieved by sampling from the full posterior distribution for each action. RPM is asymptotically optimal in maximizing cumulative reward (i.e., loss shrinks in log time). Attribute structure is easy to accommodate in a standard hierarchical logit model of conversions.
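A minimal RPM sketch for independent Beta-Bernoulli arms (no attribute structure or hierarchy, which the talk's actual model adds on top; the counts are made up):

```python
import random

def rpm_allocation(arms, n_samples=10_000, seed=0):
    """Estimate RPM allocation probabilities by Monte Carlo.
    arms: list of (conversions, impressions) pairs; each arm gets a
    Beta(1 + conv, 1 + imp - conv) posterior, and its allocation is the
    probability that a posterior draw for it beats all the others."""
    rng = random.Random(seed)
    wins = [0] * len(arms)
    for _ in range(n_samples):
        draws = [rng.betavariate(1 + c, 1 + n - c) for c, n in arms]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]

# Four ads; ad B has no conversions yet but still keeps a nonzero share,
# which is exactly the "do we really believe it?" point.
probs = rpm_allocation([(12, 1000), (0, 300), (9, 1000), (15, 1000)])
```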

Field experiment: time line. (Diagram: Initialize, then cycle through Estimate (E), Act (A), and Value (V) over the time of the experiment.) Schwartz AMA 2012

Field experiment: implementation details. Timing: update every 6 days, for 61 days, in 2012. The RPM allocation probabilities are the “rotation weights”: receive data and upload weights directly to Google DoubleClick DART (Dynamic Advertising Reporting and Targeting).
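The slide says the RPM probabilities are pushed to DART as rotation weights; one plausible way to turn probabilities into integer weights (my assumption, not necessarily the experiment's actual scheme) is largest-remainder rounding:

```python
def rotation_weights(probs, total=100):
    """Round allocation probabilities into integer rotation weights
    summing to `total`, using largest-remainder rounding."""
    raw = [p * total for p in probs]
    weights = [int(r) for r in raw]          # truncate toward zero
    leftover = total - sum(weights)
    # Hand the leftover units to the largest fractional parts.
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - weights[i], reverse=True)
    for i in by_remainder[:leftover]:
        weights[i] += 1
    return weights

weights = rotation_weights([0.62, 0.05, 0.13, 0.20])  # -> [62, 5, 13, 20]
```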

Field experiment: basic results. Conversion rate indexed as percent of average; note the significant heterogeneity across websites.

                  Ad A   Ad B   Ad C   Ad D
Tall 160x600       117     88     99    176
Square 300x250     107     72    151    114
Wide 728x90        115     92     66      –
All Sizes          112     80    100    105

Field experiment: adaptive versus not adaptive. Adaptive: experiment with changing weights (RPM). Not adaptive: static, balanced experiment. The adaptive arm delivered an 8% lift: about 2,000 customers were acquired overall, so about 150 were incremental due to RPM.

Simulation study

Total reward distribution

Cumulative conversion rate over time

Proportion of impressions for “best ad”

What should you take away? View interactive marketing problems as adaptive experiments Consider the exploration-exploitation tradeoff when facing uncertainty Use the multi-armed bandit as a logical framework for testing – and randomized probability matching as the way to solve it Test and learn… profitably! Earn and learn!

THANK YOU faderp@wharton.upenn.edu http://www.petefader.com/ Twitter: @faderp http://www.wharton.upenn.edu/wcai/ @whartonCAI