1
Statistics for Social and Behavioral Sciences
Part IV: Causality
Multivariate Regression – R squared, F test (Chapter 11)
Prof. Amine Ouazad
2
Data: Variables
y Box = First-run U.S. box office ($)
x1 MPRating = 1 if the movie is PG-13 or R, 0 if the movie is G or PG
x2 Budget = Production budget ($ million)
x3 Starpowr = Index of star power
x4 Sequel = 1 if the movie is a sequel, 0 if not
x5 Action = 1 if action film, 0 if not
x6 Comedy = 1 if comedy film, 0 if not
x7 Animated = 1 if animated film, 0 if not
x8 Horror = 1 if horror film, 0 if not
x9 Addict = Trailer views at traileraddict.com
x10 Cmngsoon = Message board comments at comingsoon.net
x11 Fandango = Attention at fandango.com
x12 Cntwait3 = Percentage of Fandango votes that can't wait to see the movie
3
Statistics Course Outline
Part I. Introduction and Research Design (Week 1): Four steps of "thinking like a statistician". Study design: simple random sampling, cluster sampling, stratified sampling. Biases: nonresponse bias, response bias, sampling bias.
Part II. Describing Data (Weeks 2-4): Sample statistics: mean, median, SD, variance, percentiles, IQR, empirical rule. Bivariate sample statistics: correlation, slope.
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9): Estimating a parameter using sample statistics. Confidence intervals at 90%, 95%, 99%. Testing a hypothesis using the CI method and the t method.
Part IV. Correlation and Causation: Two Groups, Regression Analysis (Weeks 10-14): Multivariate regression now! R squared, F stat.
4
Coming up
“Comparison of Two Groups” – last week.
“Univariate Regression Analysis” – last Saturday, Section 9.5.
“Association and Causality: Multivariate Regression” – last Saturday, Chapter 10.
Yesterday and today, Chapter 11 – R Squared, F test.
“Randomized Experiments and ANOVA” – Wednesday, Chapter 12.
“Robustness Checks and Wrap Up” – Thursday.
5
Multivariate Regression
Instead we estimate b1, b2, …, bK on the sample, by minimizing the sum of squared prediction errors: the coefficients b0, b1, …, bK minimize Σi (yi – b0 – b1 x1i – … – bK xKi)².
With these we can predict the success of a movie: ŷ = b0 + b1 x1 + b2 x2 + … + bK xK.
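As a concrete illustration (not the course's own code), here is a minimal sketch of how this regression could be estimated in Python with statsmodels. The file name and column names are hypothetical; they only mirror the variable list on slide 2.

import pandas as pd
import statsmodels.api as sm

# Hypothetical data file with one row per movie and the variables from slide 2.
movies = pd.read_csv("movie_buzz.csv")

X = movies[["mprating", "budget", "starpowr", "sequel", "action", "comedy",
            "animated", "horror", "addict", "cmngsoon", "fandango", "cntwait3"]]
X = sm.add_constant(X)      # adds the intercept b0
y = movies["box"]           # first-run U.S. box office

# OLS picks b0, b1, ..., b12 that minimize the sum of squared prediction errors.
fit = sm.OLS(y, X).fit()
print(fit.summary())        # coefficients, t stats, R squared, F stat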
6
Outline
1. Multiple Correlation and R Squared
2. F test
3. Partial correlation
Next time: Multivariate regression: the F test (continued)
7
R Squared
How good are we at predicting the success of a movie?
The R squared is 1 if our predictions are exactly right: ei = 0 for every movie.
The R squared is 0 if we do no better than predicting the average: ŷi = ȳ, so ei = yi – ȳ for every movie.
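For reference, the standard definition behind these two statements (the formula itself is not reproduced on the slide), with ei the prediction error for movie i:

R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}, \qquad e_i = y_i - \hat{y}_i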
8
R squared = ESS/TSS = 13356/18665 = 0.7156 (explained sum of squares over total sum of squares).
9
Graphically…
Each point on this graph is a movie.
What would the graph look like if R² = 1?
10
Properties of the R Squared
The larger the value of the R squared, the better the (x1, …, x12) collectively predict y.
Adding a variable on the right-hand side raises the R squared.
Warning: A high R squared is not a sign that your linear regression measures causal effects. Adding a large number of variables will lead to a high R squared. It merely says that within the sample, your predictions are close to the actual values yi. Ask yourself: is there a reason to think that x1, …, x12 cause y?
11
Compare this… (without the web popularity variables)
12
with this… (with web popularity variables)
13
Adjusted R Squared
What about an R squared that increases only if the variable we add has a high enough t stat?
The adjusted R squared increases only if the absolute value of the added variable's t statistic is greater than 1.
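The standard formula that delivers this property (not shown on the slide) penalizes the R squared for each extra regressor:

\bar{R}^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - (K + 1)}

With this correction, adding a regressor raises the adjusted R squared only when that regressor's t statistic exceeds 1 in absolute value.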
14
Compare this … (without Star Power)
15
… with this (Star Power Added) What happened to the R squared when we added Star power? What happened to the adjusted R squared?
16
Outline
1. Multiple Correlation and R Squared
2. F test
3. Partial correlation
Next time: Multivariate regression: the F test (continued)
17
F test
The t test checks that one particular coefficient, say b3, is statistically significant. But what about all coefficients collectively?
H0: “β1 = β2 = β3 = … = β12 = 0”.
The alternative hypothesis is that at least one βk is nonzero.
Ha: “For at least one k, βk ≠ 0”.
18
F test
F statistic: F = (R²/K) / ((1 – R²)/(N – (K+1))).
Under the null hypothesis, F follows an F distribution with df1 = K and df2 = N – (K+1) degrees of freedom. The F is always positive.
Intuition: for N = ∞, notice how the F stat is a way of comparing the R² to a threshold.
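A minimal sketch of the computation, using the R² of 0.7156 and K = 12 from the earlier slides; the sample size N = 62 is only an assumption for illustration.

from scipy.stats import f

R2, K, N = 0.7156, 12, 62              # N = 62 movies is an assumed value
F = (R2 / K) / ((1 - R2) / (N - (K + 1)))
p_value = f.sf(F, K, N - (K + 1))      # right-tail probability of F(df1, df2)
print(F, p_value)                      # reject H0 if p_value < 0.05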
19
Can we reject the null hypothesis?
H0: “β1 = β2 = β3 = … = β12 = 0”.
Notice the degrees of freedom of the F stat: df1 = ? df2 = ?
20
F test – Intuitions
We could just check whether at least one t statistic is above the t score with df = N – (K+1), but this is time consuming.
It may happen that the p value of the F stat is marginally above 0.05 while the p value of one t stat is marginally below 0.05. Both the F test and the t tests are subject to Type I and Type II errors. Be conservative: trust the least favorable result.
21
Outline
1. Multiple Correlation and R Squared
2. F test
3. Partial correlation
Next time: Multivariate regression: the F test (continued)
22
Partial Correlation between y and x1
Measures the association between y and x1, controlling for all the other variables x2, x3, …, x12.
The partial correlation thus measures the association “all other things equal” or “ceteris paribus” (see previous slides).
r_yx1·x2 is between –1 and +1.
r_yx1·x2 has the same sign as b1.
What if r_x1x2 = 0?
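For the simplest case of a single control variable, the standard formula (not on the slide) is:

r_{y x_1 \cdot x_2} = \frac{r_{y x_1} - r_{y x_2}\, r_{x_1 x_2}}{\sqrt{\left(1 - r_{y x_2}^2\right)\left(1 - r_{x_1 x_2}^2\right)}}

Setting r_x1x2 = 0 in this expression answers the question on the slide for the one-control case: when x1 and x2 are uncorrelated, controlling for x2 does not change the slope on x1.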
23
Correlation vs Partial Correlation
Last time we saw that the budget has an impact on website popularity!
The correlation between y (box_mil) and cntwait3 (percentage that can't wait) is 0.6511.
The partial correlation between y (box_mil) and cntwait3 is 0.3083.
[Output shown: correlations (corr) with box_mil, and partial correlations (pcorr) with box_mil.]
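One way to see what the partial correlation does, sketched in Python under the same hypothetical file and column names as the earlier sketch: regress both variables on the controls and correlate the residuals. On the actual course data this reproduces the 0.3083 reported on the slide.

import numpy as np
import pandas as pd
import statsmodels.api as sm

movies = pd.read_csv("movie_buzz.csv")   # same hypothetical file as before

# Controls: every regressor except the variable of interest (cntwait3).
controls = sm.add_constant(movies.drop(columns=["box", "cntwait3"]))
res_y = sm.OLS(movies["box"], controls).fit().resid       # box office purged of the controls
res_x = sm.OLS(movies["cntwait3"], controls).fit().resid  # cntwait3 purged of the controls
print(np.corrcoef(res_y, res_x)[0, 1])                    # the partial correlation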
24
Congratulations: You now fully understand regression output!
You can use a number of variables to explain a dependent variable. ☞ Multiple regression accounts for multiple causes.
The coefficients minimize the sum of the squared residuals.
Understand the t test and the p value. ☞ The F test tests the null hypothesis that all coefficients (except the constant) are zero.
The coefficients should be understood “all other things equal” or “ceteris paribus”.
The standardized coefficients express effects in terms of standard deviations.
The R squared, between 0 and 100%, measures how accurate our predictions are. ☞ The adjusted R squared corrects the R squared for the addition of variables with small t statistics.
25
Coming up: Coverage for the final ends right after the F test – the chapters on “Association and Causality” and “Multivariate Regression”. Make sure you come to the sessions and the last recitation.
Sunday: Recitation.
Monday: Multivariate Regression – evening session, 7.30pm, West Administration 002.
Tuesday: Multivariate Regression, the F test – usual class, 12.45pm, usual room.
Wednesday: Randomized Experiments and ANOVA – evening session, 7.30pm, West Administration 001.
Thursday: Wrap up – usual class, 12.45pm, usual room.