Published by Judith Holmes. Modified over 9 years ago.
1
Statistics for Social and Behavioral Sciences. Part IV: Causality. Multivariate Regression, Chapter 11. Prof. Amine Ouazad
2
Movie Buzz
Can we predict the success of a movie?
1. Avatar (2009): $760,505,847
2. Titanic (1997): $658,672,302
3. The Avengers (2012): $623,279,547
4. The Dark Knight (2008): $533,316,061
5. Star Wars: Episode I – The Phantom Menace (1999): $474,544,677
3
Data
– Box_mil = First run U.S. box office (Millions of $)
– MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
– Budget = Production budget (Millions of $)
– Starpowr = Index of star power
– Sequel = 1 if movie is a sequel, 0 if not
– Action = 1 if action film, 0 if not
– Comedy = 1 if comedy film, 0 if not
– Animated = 1 if animated film, 0 if not
– Horror = 1 if horror film, 0 if not
– Addict = Trailer views at traileraddict.com
– Cmngsoon = Message board comments at comingsoon.net
– Fandango = Attention at fandango.com
– Cntwait3 = Percentage of Fandango votes that can't wait to see.
4
Statistics Course Outline
Part I. Introduction and Research Design (Week 1)
– Four Steps of “Thinking Like a Statistician”
– Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling
– Biases: Nonresponse bias, Response bias, Sampling bias
Part II. Describing Data (Weeks 2-4)
– Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule
– Bivariate sample statistics: Correlation, Slope
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9)
– Estimating a parameter using sample statistics.
– Confidence Interval at 90%, 95%, 99%
– Testing a hypothesis using the CI method and the t method.
Part IV. Correlation and Causation: Two Groups, Regression Analysis (Weeks 10-14)
– Multivariate regression now!
5
Coming up
– “Comparison of Two Groups”: last week.
– “Univariate Regression Analysis”: last Saturday, Section 9.5.
– “Association and Causality: Multivariate Regression”: last Saturday, Chapter 10; today and tomorrow, Chapter 11.
– “Randomized Experiments and ANOVA”: Wednesday, Chapter 12.
– “Robustness Checks and Wrap Up”: last Thursday.
6
Outline
1. Multivariate regression
2. Interpreting coefficients: Ceteris Paribus
3. Standardized Coefficient
4. Multiple Correlation and R Squared
Next time: Multivariate regression: the F test (Continued)
7
Data: Variables
– y: Box = First run U.S. box office ($)
– x1: MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
– x2: Budget = Production budget ($Mil)
– x3: Starpowr = Index of star power
– x4: Sequel = 1 if movie is a sequel, 0 if not
– x5: Action = 1 if action film, 0 if not
– x6: Comedy = 1 if comedy film, 0 if not
– x7: Animated = 1 if animated film, 0 if not
– x8: Horror = 1 if horror film, 0 if not
– x9: Addict = Trailer views at traileraddict.com
– x10: Cmngsoon = Message board comments at comingsoon.net
– x11: Fandango = Attention at fandango.com
– x12: Cntwait3 = Percentage of Fandango votes that can't wait to see.
8
Multivariate Regression
With variables x1, x2, …, x12, we are trying to get the true impact:
– β1 of variable x1 on y.
– β2 of variable x2 on y.
– …
– β12 of variable xK on y (here K = 12).
True model: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_{12} x_{12} + \varepsilon$
We would get those if we had the population of all possible movies.
9
Multivariate Regression
Instead we estimate b1, b2, …, bK on the sample:
– Minimizing the sum of the squared prediction errors, $\sum_i (y_i - \hat{y}_i)^2$.
With these we can predict the success of a movie: $\hat{y} = a + b_1 x_1 + b_2 x_2 + \ldots + b_K x_K$
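A minimal sketch of the least-squares step, assuming a tiny made-up data set with only two predictors (a budget and a sequel dummy) instead of the twelve movie variables. The helper names `solve`, `ols`, and `predict` and every number below are illustrative assumptions, not taken from the slides. The coefficients that minimize the sum of squared prediction errors are obtained here by solving the normal equations (X'X) b = X'y.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Return [a, b1, ..., bK] minimizing the sum of squared prediction errors."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 for the intercept a
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coefs, x):
    """Predicted y for one observation: a + b1*x1 + ... + bK*xK."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


# Hypothetical data: (budget in $ millions, sequel dummy) -> box office in $ millions.
# Built so that y = 10 + 0.5 * budget + 20 * sequel exactly.
X = [(10, 0), (20, 1), (30, 0), (40, 1), (50, 0)]
y = [15.0, 40.0, 25.0, 50.0, 35.0]

coefs = ols(X, y)
print(coefs)
print(predict(coefs, (60, 1)))
```

Because the toy data lie exactly on a plane, the estimates recover the intercept and slopes used to build them; with real movie data the residuals would not be zero.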
10
Sampling Distribution of b3
We only observe one coefficient estimate b3, because we have only one sample. But across all possible samples, the sampling distribution of b3 is bell-shaped. Hence we can design a test of H0: “β3 = 0”. Under H0, the t statistic $t = b_3 / se(b_3)$, where $se(b_3)$ is the standard error of b3, follows a t distribution with N – (K + 1) degrees of freedom.
11
Hypothesis testing for H0: “β3 = 0”
Reject the null hypothesis at the 95% confidence level if:
– The absolute value of the t statistic is greater than the t score with N – (K+1) degrees of freedom at 95%.
– Equivalently, if the p value is lower than 0.05.
There are as many null hypotheses as there are coefficients to estimate. Here, there are
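The decision rule above can be sketched as follows. The sample size, coefficient, and standard error are hypothetical (the slides do not report them), and the critical value 2.000 is the two-sided 5% t score for 60 degrees of freedom as read from a standard t table; the function names are illustrative.

```python
def degrees_of_freedom(n, k):
    """N - (K + 1): sample size minus the K slopes and the intercept."""
    return n - (k + 1)


def t_statistic(b, se):
    """t statistic for H0: beta = 0, i.e. the estimate divided by its standard error."""
    return b / se


def reject_null(b, se, t_crit):
    """Two-sided test: reject H0 when |t| exceeds the critical t score."""
    return abs(t_statistic(b, se)) > t_crit


# Hypothetical sample: N = 73 movies, K = 12 explanatory variables.
df = degrees_of_freedom(73, 12)

# Two-sided 5% critical value for 60 degrees of freedom (from a t table).
T_CRIT_60 = 2.000

# Hypothetical estimates (coefficient, standard error).
print(df)
print(reject_null(0.9, 0.3, T_CRIT_60))  # |t| = 3 exceeds the critical value
print(reject_null(0.3, 0.3, T_CRIT_60))  # |t| = 1 does not
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; in practice the regression output reports the p value directly.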
13
Ceteris Paribus = “All other things equal”
“All other things equal”, what is the impact of variable x3 on box office outcome in millions of $?
Increase in star power (variable x3), all other things equal: keep x1, x2, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant, and change x3.
[Figure: increase in x3 (Star power)]
14
Ceteris Paribus = “All other things equal”
“All other things equal”, what is the impact of variable x2 on box office outcome in millions of $?
Increase in budget (variable x2), all other things equal: keep x1, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant, and change x2.
[Figure: increase in x2 (Budget) by 1 million $]
16
Reading the coefficients
– An increase in budget by $1 million leads to a rise in box office of $0.144 million, all other things equal.
– An action movie has, on average and all other things equal, a lower box office outcome, by $12 million.
– An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = $0.3215 million increase in box office outcome. We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.
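The arithmetic behind these readings can be sketched as below: holding every other variable fixed, the predicted change in box office is the coefficient times the change in that variable. The three coefficients are the ones quoted on the slide (the action coefficient is the rounded $12 million, entered as negative); the dictionary and function names are illustrative.

```python
# Coefficients quoted on the slide, in $ millions of box office per unit of x.
COEF = {
    "budget": 0.144,    # per $1 million of production budget
    "action": -12.0,    # action film (1) versus not (0), rounded as on the slide
    "cntwait3": 32.15,  # per unit of cntwait3, which ranges from 0 to 1
}


def predicted_change(variable, delta):
    """Ceteris paribus change in predicted box office ($ millions)."""
    return COEF[variable] * delta


print(predicted_change("budget", 1))       # one more $ million of budget
print(predicted_change("action", 1))       # switching the action dummy from 0 to 1
print(predicted_change("cntwait3", 0.01))  # one percentage point = 0.01
```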
17
Which coefficients are statistically significant?
Tick one box per level: at 10%, at 5%, at 1%.
– x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG. ❏ ❏ ❏
– x2 Budget = Production budget ($Mil) ❏ ❏ ❏
– x3 Starpowr = Index of star power ❏ ❏ ❏
– x4 Sequel = 1 if movie is a sequel, 0 if not ❏ ❏ ❏
– x5 Action = 1 if action film, 0 if not ❏ ❏ ❏
– x6 Comedy = 1 if comedy film, 0 if not ❏ ❏ ❏
– x7 Animated = 1 if animated film, 0 if not ❏ ❏ ❏
– x8 Horror = 1 if horror film, 0 if not ❏ ❏ ❏
– x9 Addict = Trailer views at traileraddict.com ❏ ❏ ❏
– x10 Cmngsoon = Message board comments at comingsoon.net ❏ ❏ ❏
– x11 Fandango = Attention at fandango.com ❏ ❏ ❏
– x12 Cntwait3 = Percentage of Fandango votes that can't wait to see. ❏ ❏ ❏
Read the p value! Or compare the t stat to the t score with N – 13 degrees of freedom.
18
With Budget
19
Without Budget
20
Budget and Can’t Wait to See the movie!
Without budget among the variables, the popularity variable cntwait3 has a bigger impact than with budget included.
[Diagram: Budget, Cntwait3, Other variables, Box office (box_mil)]
We know that Budget and Cntwait3 are correlated (an arrow either in one direction or in the other, or both) because including Budget affects the coefficient of Cntwait3.
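A small sketch of why leaving out Budget inflates the Cntwait3 coefficient, assuming made-up data in which box office depends exactly on both variables and the two are positively correlated. All numbers and names are hypothetical. When Budget is omitted, the Cntwait3 slope absorbs the Budget effect in proportion to how strongly Budget moves with Cntwait3.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Slope of the univariate regression of y on x: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


# Hypothetical data: budgets tend to be higher when anticipation is higher.
cntwait3 = [0.1, 0.2, 0.3, 0.4, 0.5]
budget = [10.0, 30.0, 20.0, 50.0, 40.0]

# Coefficients "with Budget", fixed by construction for this illustration.
B_CNTWAIT3, B_BUDGET = 32.0, 0.14
box = [B_CNTWAIT3 * c + B_BUDGET * b for c, b in zip(cntwait3, budget)]

# "Without Budget": regress box office on cntwait3 alone.
short_coef = slope(cntwait3, box)

# How much Budget moves with cntwait3.
delta = slope(cntwait3, budget)

print(short_coef)                     # larger than B_CNTWAIT3
print(B_CNTWAIT3 + B_BUDGET * delta)  # same value: the omitted-variable formula
```

If Budget and Cntwait3 were uncorrelated, `delta` would be zero and dropping Budget would leave the Cntwait3 coefficient unchanged, which is the reasoning the slide runs in reverse.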
22
Standardized Coefficient
We just saw: an increase in budget by $1 million leads to a rise in box office of $0.144 million, all other things equal. But is $1 million big? Is $0.144 million big?
23
Standardized Coefficient
“A 1 standard deviation increase in x2 leads to a …. % standard deviation increase in y.”
– Standard deviation of x2 (budget): 42.9.
– Standard deviation of y (box office outcome): 17.5.
– Coefficient of budget: 0.144.
Fill in the blank.
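One way to work the exercise, assuming the usual definition of a standardized coefficient (the raw coefficient multiplied by the standard deviation of x and divided by the standard deviation of y). The three inputs are the numbers on the slide; the function name is illustrative.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Effect of a 1-SD increase in x, expressed in standard deviations of y."""
    return b * sd_x / sd_y


# Numbers from the slide: coefficient of budget, SD of budget, SD of box office.
beta_budget = standardized_coefficient(0.144, 42.9, 17.5)

# Expressed as a percentage of one standard deviation of y.
print(round(100 * beta_budget, 1))
```

With these inputs the result is roughly 0.35, that is, a one standard deviation increase in budget goes with about a third of a standard deviation increase in box office.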
24
– An increase in budget by $1 million leads to a rise in box office of $0.144 million, all other things equal.
– An action movie has, on average and all other things equal, a lower box office outcome, by $12 million.
– An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = $0.3215 million increase in box office outcome. We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.
26
R Squared
How good are we at predicting the success of a movie?
– The multiple correlation is 1 if we are absolutely correct in our predictions: the prediction error is $e_i = 0$ for every movie.
– The multiple correlation is 0 if we do no better than taking the average: $e_i = y_i - \bar{y}$ for every movie.
27
R squared = ESS/TSS = 13356/18665 = 0.7156, where ESS is the explained sum of squares and TSS is the total sum of squares.
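A short sketch of the R squared calculation. The first part reproduces the ratio from the two sums of squares on the slide; the second part uses a hypothetical five-movie sample to show the two extreme cases (perfect predictions and predicting the average). It relies on the identity ESS/TSS = 1 – RSS/TSS, which holds for a least-squares fit with an intercept; the function and variable names are illustrative.

```python
import math


def r_squared(y, y_hat):
    """1 - RSS/TSS, where RSS sums squared residuals and TSS squared deviations from the mean."""
    y_bar = sum(y) / len(y)
    tss = sum((v - y_bar) ** 2 for v in y)
    rss = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1 - rss / tss


# Sums of squares reported on the slide.
ESS, TSS = 13356.0, 18665.0
r2_slide = ESS / TSS
multiple_correlation = math.sqrt(r2_slide)  # R is the square root of R squared
print(round(r2_slide, 4))

# Hypothetical box office outcomes ($ millions) for the two extreme cases.
y = [15.0, 40.0, 25.0, 50.0, 35.0]
average = sum(y) / len(y)
perfect = r_squared(y, y)                      # every residual is 0
baseline = r_squared(y, [average] * len(y))    # always predict the average
print(perfect, baseline)
```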
28
Wrap up We can use a number of variables to explain a dependent variable. Multiple regression accounts for multiple causes. The coefficients minimize the sum of the squared residuals. Understand the t test and the p value. The coefficients should be understood “all other things equal” or “ceteris paribus”. The standardized coefficients express effects in terms of standard deviations. The R squared between 0 and 100% measures how accurate our predictions are.
29
Coming up: Schedule for next week
Chapter on “Association and Causality”, and “Multivariate Regression”. Make sure you come to sessions and recitations.
– Sunday: Recitation.
– Monday: Multivariate Regression. Evening session, 7.30pm, West Administration 002.
– Tuesday: Multivariate Regression, the F test. Usual class, 12.45pm, usual room.
– Wednesday: Randomized Experiments and ANOVA. Evening session, 7.30pm, West Administration 001.
– Thursday: Wrap up. Usual class, 12.45pm, usual room.