Statistical Inference and Regression Analysis: GB


Statistical Inference and Regression Analysis: GB.3302.30 Professor William Greene Stern School of Business IOMS Department Department of Economics

Inference and Regression Not Perfect Collinearity

Variance Inflation and Multicollinearity
When variables are highly but not perfectly correlated, least squares becomes difficult to compute accurately, and the variances of the least squares slopes become very large.
Variance inflation factors: for each xk, VIF(k) = 1/[1 - R2(k)], where R2(k) is the R2 from the regression of xk on all of the other x variables in the data matrix.
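The VIF formula above can be sketched directly: regress each column on the others, take 1/(1 - R2). This is a minimal illustration on synthetic data (the variable names and the data are made up, not from the gasoline example).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x k).

    VIF(k) = 1 / (1 - R2(k)), where R2(k) comes from regressing
    column k on the remaining columns (with an intercept).
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

# Two nearly collinear predictors inflate each other's VIF;
# an independent predictor stays near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # almost a copy of x1
x3 = rng.normal(size=200)               # unrelated
print(vif(np.column_stack([x1, x2, x3])))
```

The near-duplicate pair shows very large VIFs, while the independent column's VIF stays close to 1.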

Gasoline Market
Regression Analysis: logG versus logIncome, logPG
The regression equation is
logG = -0.468 + 0.966 logIncome - 0.169 logPG

Predictor     Coef      SE Coef      T      P
Constant   -0.46772    0.08649   -5.41  0.000
logIncome   0.96595    0.07529   12.83  0.000
logPG      -0.16949    0.03865   -4.38  0.000

S = 0.0614287   R-Sq = 93.6%   R-Sq(adj) = 93.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  2.7237  1.3618  360.90  0.000
Residual Error  49  0.1849  0.0038
Total           51  2.9086

R2 = 2.7237/2.9086 = 0.93643

Gasoline Market
Regression Analysis: logG versus logIncome, logPG, ...
The regression equation is
logG = -0.558 + 1.29 logIncome - 0.0280 logPG - 0.156 logPNC + 0.029 logPUC - 0.183 logPPT

Predictor     Coef      SE Coef      T      P
Constant   -0.5579     0.5808    -0.96  0.342
logIncome   1.2861     0.1457     8.83  0.000
logPG      -0.02797    0.04338   -0.64  0.522
logPNC     -0.1558     0.2100    -0.74  0.462
logPUC      0.0285     0.1020     0.28  0.781
logPPT     -0.1828     0.1191    -1.54  0.132

S = 0.0499953   R-Sq = 96.0%   R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       5  2.79360  0.55872  223.53  0.000
Residual Error  46  0.11498  0.00250
Total           51  2.90858

R2 = 2.79360/2.90858 = 0.96047

logPG is no longer statistically significant when the other variables are added to the model.

Evidence of Multicollinearity: Regression of logPG on the other variables gives a very good fit.

Diagnostic Tools
Look for small incremental contributions to R2 when additional predictors are added.
Look for predictor variables that are well explained by the other predictors (large VIFs).
Look for redundant rather than independent sources of "information" among the predictors.
(These diagnostics are all the same thing.)
Collinearity and influential observations can be related. Removing an influential observation can make the collinearity worse or better; the relationship is far too complicated to say anything useful about how the two might interact.

NIST Statistical Reference Data Sets – Accuracy Tests

The Filipelli Problem

VIF for X10: R2 = .99999999999999630, so VIF = 1/(1 - R2) = 2.7294543196184830 x 10^14 (printed by the program in Fortran notation as .27294543196184830D+15)

Other software: Minitab reports the correct answer; Stata drops X10.

Accurate and Inaccurate Computation of Filipelli Results
Accurate computation requires not actually computing (X'X)^-1. We (and others) use the QR method. See text for details.
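The reason the QR route matters can be seen numerically: forming X'X squares the condition number of X, so the normal equations lose roughly twice as many digits as a QR/SVD solver. The sketch below uses a synthetic high-degree polynomial design in the spirit of the Filipelli benchmark; the data are made up, not the NIST values.

```python
import numpy as np

# Ill-conditioned 10th-degree polynomial design (synthetic, not NIST's Filip data).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 82)
true_beta = rng.normal(size=11)
X = np.vander(x, 11, increasing=True)     # columns 1, x, x^2, ..., x^10
y = X @ true_beta                         # noiseless, so the coefficients are recoverable

# Forming X'X squares the condition number of X ...
print(f"cond(X)   = {np.linalg.cond(X):.2e}")
print(f"cond(X'X) = {np.linalg.cond(X.T @ X):.2e}")

b_normal = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations: inaccurate route
b_qr, *_ = np.linalg.lstsq(X, y, rcond=None)   # QR/SVD route used by lstsq

err_normal = np.abs(b_normal - true_beta).max()
err_qr = np.abs(b_qr - true_beta).max()
print(f"max coefficient error, normal equations: {err_normal:.2e}")
print(f"max coefficient error, lstsq (QR/SVD):   {err_qr:.2e}")
```

The lstsq solution recovers the coefficients to many digits, while the normal-equations solution is visibly degraded, which is exactly the failure mode the Filipelli test set is designed to expose.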

Stata Filipelli Results

Even after dropping two (essentially random) columns, the results are correct to only 1 or 2 digits.

Inference and Regression Testing Hypotheses

Testing Hypotheses

Hypothesis Testing: Criteria

The F Statistic has an F Distribution

Nonnormality or Large N
The denominator of F converges to 1; the numerator converges to chi-squared[J]/J. Rely on the law of large numbers for the denominator and the CLT for the numerator: JF converges to chi-squared[J]. Use critical values from the chi-squared distribution.
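A quick Monte Carlo check of this approximation (the choices J = 3, denominator df = 1000, and the sample size are arbitrary illustration values): with a large denominator df, the upper quantiles of J*F essentially match those of chi-squared[J].

```python
import numpy as np

# Simulate J*F with a large denominator df and compare its upper
# quantiles with draws from chi-squared[J].
rng = np.random.default_rng(42)
J, df_denom, n = 3, 1000, 200_000

jf = J * rng.f(J, df_denom, size=n)   # samples of J*F
chi2 = rng.chisquare(J, size=n)       # reference chi-squared[J] draws

# Compare the upper quantiles -- the ones hypothesis tests care about.
for q in (0.90, 0.95, 0.99):
    print(f"{q:.2f}: J*F -> {np.quantile(jf, q):.3f}, "
          f"chi2 -> {np.quantile(chi2, q):.3f}")
```

The 95% quantile of J*F lands near 7.81, the familiar chi-squared[3] critical value, which is why chi-squared critical values can replace F tables when N is large.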

Significance of the Regression - Test of R2 = 0

Table of 95% Critical Values for F

+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=LOGBOX  Mean                 =  16.47993       |
|             Standard deviation   =  .9429722       |
|             Number of observs.   =        62       |
| Residuals   Sum of squares       =  25.36721       |
|             Standard error of e  =  .6984489       |
| Fit         R-squared            =  .5323241       |
|             Adjusted R-squared   =  .4513802       |
+----------------------------------------------------+
|Variable| Coefficient | Standard Error | t-ratio | P[|T|>t] | Mean of X |
|Constant|  11.9602*** |     .91818     |  13.026 |  .0000   |           |
|LOGBUDGT|   .38159**  |     .18711     |   2.039 |  .0465   |  3.71468  |
|STARPOWR|   .01303    |     .01315     |    .991 |  .3263   | 18.0316   |
|SEQUEL  |   .33147    |     .28492     |   1.163 |  .2500   |   .14516  |
|MPRATING|  -.21185    |     .13975     |  -1.516 |  .1356   |  2.96774  |
|ACTION  |  -.81404**  |     .30760     |  -2.646 |  .0107   |   .22581  |
|COMEDY  |   .04048    |     .25367     |    .160 |  .8738   |   .32258  |
|ANIMATED|  -.80183*   |     .40776     |  -1.966 |  .0546   |   .09677  |
|HORROR  |   .47454    |     .38629     |   1.228 |  .2248   |   .09677  |
|PCBUZZ  |   .39704*** |     .08575     |   4.630 |  .0000   |  9.19362  |
+--------+-------------+----------------+---------+----------+-----------+
F = [(.6211405 - .5323241)/3] / [(1 - .6211405)/(62 - 13)] = 3.829; F* = 2.84
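The F computation at the bottom of the slide can be reproduced directly. (Reading the slide's degrees of freedom, .6211405 appears to be the R2 of a larger model with 3 more variables, 13 coefficients in all; that interpretation is inferred, not stated on the slide.)

```python
# F statistic for J restrictions, computed from restricted and unrestricted R2.
def f_stat(r2_u, r2_r, J, n, K):
    """F = [(R2_u - R2_r) / J] / [(1 - R2_u) / (n - K)]."""
    return ((r2_u - r2_r) / J) / ((1 - r2_u) / (n - K))

# Values from the slide: the displayed model has R2 = .5323241; the larger
# model (3 additional variables, K = 13) has R2 = .6211405; n = 62.
F = f_stat(0.6211405, 0.5323241, J=3, n=62, K=13)
print(round(F, 3))
```

Since 3.829 exceeds the 95% critical value F* = 2.84, the hypothesis that the additional coefficients are zero is rejected.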

Inference and Regression A Case Study

Mega Deals for Stars: A Capital Budgeting Computation
Costs and benefits:
Certainty: costs
Uncertainty: benefits
Long term: need for discounting

Baseball Story: A Huge Sports Contract
Alex Rodriguez was hired by the Texas Rangers for something like $25 million per year in 2000.
Costs: the salary, plus and minus some fine tuning of the numbers.
Benefits: more fans in the stands.
How to determine whether the benefits exceed the costs? Use a regression model.

The Texas Deal for Alex Rodriguez
2001 signing bonus: $10M
Year  Salary ($M)
2001    21
2002    21
2003    21
2004    21
2005    25
2006    25
2007    27
2008    27
2009    27
2010    27
Total: $252M ???

The Real Deal
Year  Salary ($M)  Bonus ($M)  Deferral ($M)
2001      21           2         5, to 2011
Deferrals accrue interest of 3% per year.

Costs
Insurance: about 10% of the contract per year
(Taxes: about 40% of the contract)
Some additional costs in revenue sharing payments to the league (anticipated, about 17.5% of marginal benefits; uncertain)
Interest on deferred salary: $150,000 in the first year, well over $1,000,000 in 2010
(Reduction) The $3M it would cost to have a different shortstop (Nomar Garciaparra)

PDV of the Costs
Using an 8% discount factor (the one they used) and accounting for all costs:
Roughly $21M to $28M in each year from 2001 to 2010, then the deferred payments from 2010 to 2020.
Total costs: about $165 million in 2001 (present discounted value).
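The discounting step can be sketched with the salary schedule from the earlier slide. This raw bonus-plus-salary figure will not match the slide's $165M exactly, because the full calculation also nets out insurance, revenue sharing, deferral interest, and the replacement shortstop.

```python
# Minimal present-discounted-value sketch (amounts in $M).
def pdv(payments, rate, first_period=1):
    """Discount a stream of payments back to period 0."""
    return sum(p / (1 + rate) ** t
               for t, p in enumerate(payments, start=first_period))

salaries = [21, 21, 21, 21, 25, 25, 27, 27, 27, 27]   # 2001-2010, from the slide
bonus = 10.0                                          # signing bonus, paid up front
pv = bonus + pdv(salaries, rate=0.08)
print(f"PDV at 8% of bonus + salaries: ${pv:.1f}M")
```

Even this rough figure shows why discounting matters: $262M of nominal payments is worth well under $200M in 2001 dollars.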

Benefits
More fans in the seats:
Gate
Parking
Merchandise
Increased chance at playoffs and World Series
Sponsorships
(Loss to revenue sharing)
Franchise value

How Many New Fans?
Projected 8 more wins per year. What is the relationship between wins and attendance? It is not known precisely; there are many empirical studies (see The Journal of Sports Economics). Use a regression model to find out.

Baseball Data
31 teams, 17 years (fewer years for 6 teams)
Winning percentage: Wins = 162 * percentage
Rank
Average attendance: Attendance = 81 * average
Average team salary
Number of all stars
Manager years of experience
Percent of team that is rookies
Lineup changes
Mean player experience
Dummy variable for change in manager

Baseball Data (Panel Data)

A Dynamic Equation
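The dynamic equation itself appears as an image on the original slide, so the sketch below is entirely hypothetical: a generic form such an attendance equation might take (lagged attendance plus current wins), fitted by OLS to synthetic data. All names and coefficients here are assumptions, not the study's.

```python
import numpy as np

# Hypothetical dynamic attendance equation of the general form
#   Attendance[t] = a + b * Attendance[t-1] + c * Wins[t] + e
# estimated on synthetic data; none of this is from the real study.
rng = np.random.default_rng(7)
T = 500
wins = rng.normal(81, 8, size=T)          # wins per 162-game season
att = np.empty(T)
att[0] = 2.0e6
for t in range(1, T):                     # simulate the dynamic process
    att[t] = 3e5 + 0.7 * att[t - 1] + 4e3 * wins[t] + rng.normal(0, 2e4)

# OLS on the lagged equation.
X = np.column_stack([np.ones(T - 1), att[:-1], wins[1:]])
beta, *_ = np.linalg.lstsq(X, att[1:], rcond=None)
a, b, c = beta
print(f"lag coef = {b:.3f}, wins coef = {c:.0f}")
# In a dynamic equation the long-run effect of one extra win per year
# is c / (1 - b), larger than the one-season effect c.
print(f"long-run fans per win = {c / (1 - b):,.0f}")
```

The long-run multiplier c/(1 - b) is the reason a dynamic specification is interesting here: a sustained improvement in wins raises attendance by more in the long run than in the first season.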

About 220,000 fans

The Regression Model

Marginal Value of One Win

Marginal Value of an A-Rod
8 games * 63,734 fans = 509,878 fans
509,878 fans * ($18 per ticket + $2.50 parking etc. + $1.80 stuff (hats, bobble head dolls, ...))
= $11.3 million per year!
It's not close. (The marginal cost is at least $16.5M / year.)
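A back-of-the-envelope check of the slide's arithmetic (all numbers are the slide's; the fan total 509,878 is the slide's rounding of wins times fans per win):

```python
# Marginal revenue from the new fans vs. the marginal cost of the contract.
new_fans = 509_878                 # slide's total for 8 extra wins
per_fan = 18.00 + 2.50 + 1.80      # ticket + parking + merchandise, $ per fan
revenue = new_fans * per_fan
marginal_cost = 16.5e6             # slide: at least $16.5M per year
print(f"marginal revenue: ${revenue / 1e6:.2f}M per year")
print(f"shortfall vs. marginal cost: ${(marginal_cost - revenue) / 1e6:.2f}M per year")
```

The revenue comes to roughly $11.4M per year against at least $16.5M of cost, which is the slide's point: it's not close.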

The IPN Player
A-Rod and the Yankees: the Iconic Performance Network player
Attendance rose to 4M in 2005, 4.3M in 2007
MVP in 2005 and 2007
Huge growth in the YES network
Seemed certain to break Bonds' HR record (asterisk?)
New deal: $275M over 10 years
The Chicago Cubs' offer included team ownership
Drug problems probably derailed this career path

The Ghosts of Seasons Past: Long Run Implications - The Shadow Cost
The commitment to A-Rod limited the ability of the Texas Rangers to field a great team. The same problem now faces the Yankees. A-Rod is aging and becoming less likely to break the records. His steroid use has tarnished his reputation and reduced the value of his history.
Why do teams do these long term mega deals for baseball players?

Kershaw vs. A-Rod
Shorter term; risk shifting onto the team
Bargaining strength has shifted in favor of the player