Causal Inference in R
Ana Daglis, Farfetch

[Diagram: Farfetch, its customers and boutiques]

One of the most common questions we face in marketing is how to measure incremental effects. How much incremental revenue did the new pricing strategy drive? What impact did the new feature on the website have? How many incremental conversions were achieved by increasing the commission rate for our affiliates? …

The gold-standard method for estimating causal effects is a randomised experiment (an A/B test).
[Figure: 50% of visitors see Version A (10% conversion); 50% of visitors see Version B (15% conversion).]
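Visitors are assigned at random, so the only systematic difference between the two groups is which version they saw. That means the difference in conversion rates estimates the causal effect of Version B. Below is a minimal base-R sketch of the comparison. The visitor counts (10,000 per group) are hypothetical and were chosen only to reproduce the 10% and 15% rates in the figure.

```r
# Hypothetical counts (not real data): 10,000 visitors per version,
# chosen to match the 10% (A) and 15% (B) conversion rates in the figure
conversions <- c(B = 1500, A = 1000)
visitors    <- c(B = 10000, A = 10000)

rates <- conversions / visitors            # 0.15 for B, 0.10 for A
lift  <- unname(rates["B"] - rates["A"])   # absolute uplift caused by Version B

# Two-sample test for equal proportions (base R); conf.int is for rate B - rate A
test <- prop.test(x = conversions, n = visitors)

cat("Estimated uplift:", lift, "\n")
cat("95% CI for uplift:", round(test$conf.int, 4), "\n")
cat("p-value:", format.pval(test$p.value), "\n")
```

With real data, the counts would come from the experiment's tracking logs rather than being typed in.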

However, A/B tests are often too expensive to run, or cannot be run at all, e.g. for legal reasons.
[Figure: 100% of visitors see Version B (15% conversion).]

Example: financial performance of company A.
[Figure: actual share price of company A over time, with the date the scandal broke marked.]

Approach: estimate what the share price would have been had the scandal not happened.
[Figure: actual and predicted share price, with the date the scandal broke marked.]

By comparing the actual and predicted share price, we can estimate the drop in stock value due to the scandal.
[Figure: actual vs predicted share price; the gap after the scandal broke is the drop in stock value due to the scandal.]
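The comparison itself is simple arithmetic. At each time point after the event, the pointwise effect is the actual value minus the predicted (counterfactual) value. Here is a sketch with hypothetical numbers, which do not come from the slides:

```r
# Hypothetical post-event values (illustrative only, not from the slides)
actual    <- c(95, 90, 88, 87, 85)      # observed share price of company A
predicted <- c(100, 101, 101, 102, 103) # counterfactual: price had the scandal not happened

pointwise_effect <- actual - predicted  # drop in value at each time point
average_effect   <- mean(pointwise_effect)

print(pointwise_effect)  # -5 -11 -13 -15 -18
print(average_effect)    # -12.4
```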

Because the approach is fully Bayesian, we can quantify the uncertainty of our predictions.
[Figure: actual and predicted share price with a 95% credible interval, with the date the scandal broke marked.]
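A credible interval is built from posterior draws, not from a single prediction. The sketch below assumes we already have 1,000 posterior draws of the counterfactual share price. Here they are simulated, hypothetically, as normal noise around a fixed path. The 2.5% and 97.5% quantiles at each time point give the 95% interval.

```r
set.seed(42)
n_draws <- 1000
path    <- c(100, 101, 101, 102, 103)   # hypothetical posterior mean of the counterfactual
actual  <- c(95, 90, 88, 87, 85)        # hypothetical observed share price

# Hypothetical posterior draws: rows = draws, columns = post-event time points.
# In a real analysis these come from the fitted Bayesian model.
draws <- matrix(rnorm(n_draws * length(path), mean = rep(path, each = n_draws), sd = 2),
                nrow = n_draws)

# 95% credible interval of the counterfactual at each time point
pred_ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Effect draws (actual minus each counterfactual draw) and their 95% interval
effect_draws <- sweep(draws, 2, actual, FUN = function(d, a) a - d)
effect_ci    <- apply(effect_draws, 2, quantile, probs = c(0.025, 0.975))

# An effect interval that lies entirely below zero suggests a credible drop
print(round(effect_ci, 2))
```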

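How do we construct the counterfactual estimate? First, the model is trained on the period before the scandal, using the share prices of companies B and C to predict company A's share price. It is then used to predict company A's share price after the scandal.
[Figure: training and prediction periods; actual and predicted share price of company A with a 95% credible interval, alongside the share prices of companies B and C.]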
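Here is a simplified, non-Bayesian sketch of the same train-then-predict idea. It fits ordinary least squares (lm) of company A on companies B and C over the training period, then predicts A over the prediction period. All series are simulated and hypothetical. The actual method replaces lm with a Bayesian structural time series model, and that model is what provides the credible intervals.

```r
set.seed(1)
n_pre  <- 80
n_post <- 20
n      <- n_pre + n_post

# Hypothetical control series: share prices of companies B and C (random walks)
price_b <- 50 + cumsum(rnorm(n))
price_c <- 30 + cumsum(rnorm(n))

# Hypothetical company A tracks B and C; after the event it drops by 10
price_a <- 5 + 0.8 * price_b + 0.5 * price_c + rnorm(n, sd = 0.5)
price_a[(n_pre + 1):n] <- price_a[(n_pre + 1):n] - 10

df   <- data.frame(price_a, price_b, price_c)
pre  <- 1:n_pre
post <- (n_pre + 1):n

# Training: learn the A ~ B + C relationship on the pre-event period only
fit <- lm(price_a ~ price_b + price_c, data = df[pre, ])

# Prediction: counterfactual for A in the post-event period
counterfactual <- predict(fit, newdata = df[post, ])
estimated_drop <- mean(df$price_a[post] - counterfactual)
print(estimated_drop)   # close to the simulated drop of -10
```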

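The Causal Impact methodology is based on a Bayesian structural time series model.

Most general form of the model:
$$y_t = Z_t^T \alpha_t + \varepsilon_t \qquad \text{(observation equation)}$$
$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t \qquad \text{(state equation)}$$

Causal Impact model:
$$y_t = \mu_t + \tau_t + x_t^T \beta + \varepsilon_t$$
$$\mu_{t+1} = \mu_t + \delta_t + \eta_{\mu,t}$$
$$\delta_{t+1} = \delta_t + \eta_{\delta,t}$$
$$\tau_{t+1} = -\sum_{i=0}^{S-2} \tau_{t-i} + \eta_{\tau,t}$$
Here $\mu_t$ is the local level, $\delta_t$ the slope, $\tau_t$ a seasonal component with $S$ seasons, and $x_t$ the vector of control series with coefficients $\beta$.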

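The model has 5 main parameters: the 4 variance terms $\sigma_\varepsilon^2, \sigma_\mu^2, \sigma_\delta^2, \sigma_\tau^2$ and the regression coefficients $\beta$, where
$$\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2), \quad \eta_{\mu,t} \sim \mathcal{N}(0, \sigma_\mu^2), \quad \eta_{\delta,t} \sim \mathcal{N}(0, \sigma_\delta^2), \quad \eta_{\tau,t} \sim \mathcal{N}(0, \sigma_\tau^2)$$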
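To build intuition for the state equations, the sketch below simulates one series from the Causal Impact model. The function name simulate_bsts, the seasonal period S = 7, and all standard deviations are hypothetical choices made for illustration. The initial level, slope and seasonal values are also arbitrary.

```r
# Hypothetical helper: simulate y_t from the local linear trend + seasonal + regression model
simulate_bsts <- function(n, x, beta, S = 7, sd_eps = 1, sd_mu = 0.1,
                          sd_delta = 0.01, sd_tau = 0.1) {
  stopifnot(n >= S, nrow(x) == n, ncol(x) == length(beta))
  mu    <- numeric(n)              # local level, starts at 0
  delta <- numeric(n)              # slope, starts at 0
  tau   <- numeric(n)              # seasonal component
  tau[1:(S - 1)] <- rnorm(S - 1)   # arbitrary initial seasonal effects

  for (t in 1:(n - 1)) {
    # mu_{t+1} = mu_t + delta_t + eta_mu;  delta_{t+1} = delta_t + eta_delta
    mu[t + 1]    <- mu[t] + delta[t] + rnorm(1, 0, sd_mu)
    delta[t + 1] <- delta[t] + rnorm(1, 0, sd_delta)
    # tau_{t+1} = -sum_{i=0}^{S-2} tau_{t-i} + eta_tau, once S - 1 past values exist
    if (t >= S - 1) {
      tau[t + 1] <- -sum(tau[(t - S + 2):t]) + rnorm(1, 0, sd_tau)
    }
  }
  # Observation equation: y_t = mu_t + tau_t + x_t^T beta + eps_t
  y <- mu + tau + drop(x %*% beta) + rnorm(n, 0, sd_eps)
  list(y = y, mu = mu, delta = delta, tau = tau)
}

set.seed(10)
n   <- 140
x   <- matrix(rnorm(n * 2), nrow = n, ncol = 2)   # two hypothetical covariates
sim <- simulate_bsts(n, x, beta = c(1.5, -0.5))
```

If you set sd_tau = 0, any S consecutive seasonal effects sum exactly to zero. That property is what the $-\sum$ in the seasonal equation enforces.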

We impose an inverse-gamma prior on $\sigma_\varepsilon^2$, with parameters $s_\varepsilon$ and $v_\varepsilon$ selected based on the expected goodness of fit:
$$\sigma_\varepsilon^2 \sim \text{Inv-Gamma}(s_\varepsilon, v_\varepsilon)$$

We impose weak priors on $\sigma_\mu^2$, $\sigma_\delta^2$ and $\sigma_\tau^2$, reflecting the assumption that the errors in the state process are small:
$$\sigma_\mu^2,\ \sigma_\delta^2,\ \sigma_\tau^2 \sim \text{Inv-Gamma}\left(1,\ 0.01 \times \operatorname{Var}(y)\right)$$
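Drawing from the prior shows what it implies. The sketch relies on a standard relation: if X follows a Gamma distribution with shape a and rate b, then 1/X follows an inverse gamma with shape a and scale b. We assume the slide's Inv-Gamma(1, 0.01 × Var(y)) uses this shape/scale convention. The response series y is simulated and hypothetical.

```r
set.seed(7)
y <- cumsum(rnorm(200))        # hypothetical response series
b <- 0.01 * var(y)             # scale parameter 0.01 x Var(y), as on the slide

# If X ~ Gamma(shape = a, rate = b), then 1/X ~ Inv-Gamma(shape a, scale b)
sigma2_draws <- 1 / rgamma(1e5, shape = 1, rate = b)

# With shape 1 the inverse gamma has no finite mean, so compare medians instead;
# the exact median is b / log(2) because the Gamma(1, rate b) median is log(2) / b
print(c(empirical = median(sigma2_draws), exact = b / log(2)))

# Typical prior variance relative to Var(y): roughly 0.0144
print(median(sigma2_draws) / var(y))
```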

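We let the model choose an appropriate set of controls by placing a spike-and-slab prior over the coefficients $\beta$. Each indicator $\varrho_j$ equals 1 if covariate $j$ is included in the model ($\beta_j \neq 0$) and 0 otherwise, and $\pi_j$ is its prior inclusion probability:
$$p(\varrho) = \prod_{j=1}^{J} \pi_j^{\varrho_j} (1 - \pi_j)^{1 - \varrho_j}$$
$$\beta_\varrho \mid \sigma_\varepsilon^2 \sim \mathcal{N}\left(0,\ n\,\sigma_\varepsilon^2 \left(X_\varrho^T X_\varrho\right)^{-1}\right)$$
Here $\beta_\varrho$ and $X_\varrho$ are the coefficients and columns of the included covariates.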
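The first line of the prior is a product of independent Bernoulli probabilities, one per candidate control. The small sketch below uses hypothetical inclusion probabilities π = (0.5, 0.2, 0.1). The helper name prior_inclusion is our own choice, not the package's.

```r
# Hypothetical helper: prior probability p(rho) of an inclusion vector rho, where
# rho_j = 1 means covariate j enters the model and p_j is its prior inclusion probability
prior_inclusion <- function(rho, p) {
  stopifnot(length(rho) == length(p), all(rho %in% c(0, 1)))
  prod(p^rho * (1 - p)^(1 - rho))
}

incl_prob <- c(0.5, 0.2, 0.1)   # hypothetical prior inclusion probabilities, J = 3

prior_inclusion(c(1, 0, 0), incl_prob)   # 0.5 * 0.8 * 0.9 = 0.36

# Enumerate all 2^J inclusion vectors; their prior probabilities sum to 1
models <- as.matrix(expand.grid(rep(list(0:1), length(incl_prob))))
probs  <- apply(models, 1, prior_inclusion, p = incl_prob)
sum(probs)   # 1
```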

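The inference can be performed in R with just 6 lines of code. Here data is assumed to be a zoo time series indexed by date, with the response (company A's share price) in the first column and the control series in the remaining columns:

  library(CausalImpact)
  pre.period <- as.Date(c("2011-01-03", "2015-09-14"))   # training period, before the event
  post.period <- as.Date(c("2015-09-21", "2017-03-19"))  # period after the event
  impact <- CausalImpact(data, pre.period, post.period)  # data: response first, then controls
  plot(impact)
  summary(impact)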
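The snippet above assumes that data already exists. CausalImpact expects the response in the first column and the control series in the remaining columns. When the periods are given as dates, the data is typically a zoo object indexed by those dates. The sketch below builds simulated, hypothetical weekly data covering the periods on the slide. The CausalImpact call runs only if the CausalImpact and zoo packages are installed.

```r
set.seed(1)
# Hypothetical weekly data spanning the periods used on the slide
weeks <- seq(as.Date("2011-01-03"), as.Date("2017-03-19"), by = "week")
n  <- length(weeks)
x1 <- 100 + cumsum(rnorm(n))           # hypothetical control series
y  <- 1.2 * x1 + rnorm(n)              # hypothetical response that tracks the control
in_post <- weeks >= as.Date("2015-09-21")
y[in_post] <- y[in_post] + 10          # hypothetical effect of +10 after the event

series <- cbind(y = y, x1 = x1)        # response first, covariates after

pre.period  <- as.Date(c("2011-01-03", "2015-09-14"))
post.period <- c(as.Date("2015-09-21"), max(weeks))   # end at the last simulated week

if (requireNamespace("CausalImpact", quietly = TRUE) &&
    requireNamespace("zoo", quietly = TRUE)) {
  library(CausalImpact)
  data   <- zoo::zoo(series, weeks)    # date-indexed time series
  impact <- CausalImpact(data, pre.period, post.period)
  summary(impact)
}
```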

Results can be plotted and summarised in a table. The cumulative panel only makes sense when the metric is additive, such as clicks or the number of orders, and not when it is a share price. In that case the panel can be omitted with plot(impact, c("original", "pointwise")).

The package can even write a report for you: summary(impact, "report") prints a plain-language description of the results.

Additional considerations: The covariates included in the model must not themselves be affected by the event, and for each covariate it is critical to reason why this holds. The model can be validated by running the Causal Impact analysis on an ‘imaginary event’ placed before the actual event. This should produce no significant effect, and the actual and predicted lines should match reasonably closely before the actual event.
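Below is a minimal sketch of the ‘imaginary event’ check. To keep it runnable in base R, it again swaps ordinary least squares (lm) in for the Bayesian model. We pretend an event happened at a point inside the pre-period and train only on the data before it. We then check that the apparent effect afterwards is close to zero and that most actual values fall inside the 95% prediction interval. All data and the fake event index are hypothetical.

```r
set.seed(3)
n <- 120
x <- 50 + cumsum(rnorm(n))              # hypothetical control series
y <- 2 + 0.9 * x + rnorm(n, sd = 0.5)   # hypothetical response with no real event in this window
df <- data.frame(y = y, x = x)

fake_event <- 90                        # hypothetical 'imaginary event', inside the real pre-period
train <- df[1:(fake_event - 1), ]
after <- df[fake_event:n, ]

fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = after, interval = "prediction", level = 0.95)

placebo_effect <- mean(after$y - pred[, "fit"])                          # should be close to 0
coverage <- mean(after$y >= pred[, "lwr"] & after$y <= pred[, "upr"])   # should be near 0.95

cat("Placebo effect:", round(placebo_effect, 3), " Coverage:", round(coverage, 2), "\n")
```

With CausalImpact itself, the equivalent check is to rerun the analysis with pre.period and post.period both placed before the real event date.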

References
K. H. Brodersen, F. Gallusser, J. Koehler, N. Remy, S. L. Scott (2015). Inferring Causal Impact Using Bayesian Structural Time-Series Models. https://research.google.com/pubs/pub41854.html
S. L. Scott, H. Varian (2013). Predicting the Present with Bayesian Structural Time Series. https://people.ischool.berkeley.edu/~hal/Papers/2013/pred-present-with-bsts.pdf

Thank you!