Statistical Inference and Regression Analysis: GB.3302.30. Professor William Greene, Stern School of Business, IOMS Department and Department of Economics.



Statistics and Data Analysis Part 7-1 – Regression Diagnostics

5/97 Using the Residuals How do you know the model is “good”? Various diagnostics will be developed over the semester, but the first place to look is the residuals.

6/97 Residuals Can Signal a Flawed Model Standard application: a cost function for the output of a production process. Compare a linear equation to a quadratic model (in logs). (123 American electric utilities)

7/97 Electricity Cost Function

8/97 Candidate Model for Cost Log c = a + b log q + e. In the leftmost region most of the points lie above the regression line, in the middle region most lie below it, and in the rightmost region most lie above it again: the pattern of a curved relationship fitted with a straight line.

9/97 A Better Model? Log Cost = α + β1 log Output + β2 (log Output)² + ε

10/97 Candidate Models for Cost The quadratic equation is the appropriate model: log c = a + b1 log q + b2 (log q)² + e
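The comparison between the two candidate models can be sketched numerically. The utility dataset is not reproduced here, so the snippet below simulates a genuinely quadratic log-cost relationship (the coefficients 0.4 and 0.06 are made up for the illustration): a linear fit leaves curvature behind in the residuals, while the quadratic fit removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
log_q = rng.uniform(0.0, 5.0, 200)                     # log output
log_c = (1.0 + 0.4 * log_q + 0.06 * log_q**2
         + rng.normal(0.0, 0.2, 200))                  # log cost, with curvature

# Candidate 1: log c = a + b log q + e
b1, a1 = np.polyfit(log_q, log_c, 1)
resid_lin = log_c - (a1 + b1 * log_q)

# Candidate 2: log c = a + b1 log q + b2 (log q)^2 + e
c2, b2, a2 = np.polyfit(log_q, log_c, 2)
resid_quad = log_c - (a2 + b2 * log_q + c2 * log_q**2)

# A flawed model leaves structure behind: the linear residuals still
# correlate with (log q)^2, while the quadratic residuals do not.
corr_lin = np.corrcoef(resid_lin, log_q**2)[0, 1]
corr_quad = np.corrcoef(resid_quad, log_q**2)[0, 1]
print(abs(corr_lin) > abs(corr_quad))
```

Plotting resid_lin against log_q would reproduce the above/below/above pattern described on the earlier slide.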

11/97 Missing Variable Included Residuals from the quadratic cost model Residuals from the linear cost model

12/97 Unusual Data Points Outliers have (what appear to be) very large disturbances ε. Examples: wolf weight vs. tail length; the 500 most successful movies.

13/97 Outliers About 99.7% of normally distributed observations will lie within the mean ± 3 standard deviations; we show (a + bx) ± 3s_e below. Titanic is 8.1 standard deviations from the regression! Only 0.86% of the 466 observations lie outside the bounds. (We will refine this later.) These observations might deserve a close look.
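A quick sketch of the ±3·s_e screen with simulated data. The movie data are not bundled with these slides, so a planted 8-unit disturbance stands in for a Titanic-like observation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 466                          # same sample size as the movie data above
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)
y[0] += 8.0                      # plant one extreme disturbance

b, a = np.polyfit(x, y, 1)
e = y - (a + b * x)
s_e = np.sqrt((e**2).sum() / (n - 2))    # standard error of the regression

outside = np.abs(e) > 3.0 * s_e          # points beyond (a + bx) +/- 3 s_e
print(outside.sum(), e[0] / s_e)         # the planted point is far outside
```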

14/97 logPrice = a + b logArea + e Prices paid at auction for Monet paintings vs. surface area (in logs) Not an outlier: Monet chose to paint a small painting. Possibly an outlier: Why was the price so low?

15/97 What to Do About Outliers
(1) Examine the data.
(2) Are they due to measurement error or obvious coding errors? Delete the observations.
(3) Are they just unusual observations? Do nothing.
(4) Generally, resist the temptation to remove outliers, especially if the sample is large. (500 movies is large; 10 wolves is not.)
(5) Ask why you think it is an outlier. Is it really?

16/97 Regression Options

17/97 Diagnostics

18/97 On Removing Outliers Be careful about singling out particular observations this way. The resulting model might be a product of your opinions rather than the real relationship in the data. Removing outliers can create new outliers that were not outliers before, and statistical inferences from the model will be incorrect.

Statistics and Data Analysis Part 7-2 – Statistical Inference

20/97 b As a Statistical Estimator What is the interest in b? It estimates β = dE[y|x]/dx: the effect of a policy variable on the expectation of a variable of interest, the effect of medication dosage on disease response, … many others.

21/97 Application: Health Care Data
German Health Care Usage Data: 27,326 observations on German households.
DOCTOR = 1(number of doctor visits > 0)
HOSPITAL = 1(number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) to 10 (high)
DOCVIS = number of doctor visits in the last three months
HOSPVIS = number of hospital visits in the last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
INCOME = household nominal monthly net income in German marks /
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status

22/97 Regression? Population relationship: Income = α + β·Health + ε, so E[Income | Health] = α + β·Health.

23/97 Distribution of Health

24/97 Distribution of Income

25/97 Average Income | Health (table of the average income and the cell count N_j at each health level)

26/97 b is a statistic It is random because it is a sum of the ε’s. It has a distribution, like any sample statistic.

27/97 Sampling Experiment 500 samples of N = 52 drawn from the 27,326 observations (using a random number generator to simulate N observation numbers from 1 to 27,326). Compute b with each sample. Histogram of the 500 values.
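The sampling experiment is easy to replicate. The health data set itself is not bundled with these slides, so the sketch below draws from a simulated stand-in population of 27,326 "households" with an assumed true slope β = 0.05:

```python
import numpy as np

rng = np.random.default_rng(2)
N_POP, BETA = 27326, 0.05
health = rng.integers(0, 11, N_POP).astype(float)          # HSAT-like, 0-10
income = 0.2 + BETA * health + rng.normal(0.0, 0.15, N_POP)

bs = []
for _ in range(500):                      # 500 samples of N = 52
    idx = rng.integers(0, N_POP, 52)      # simulated observation numbers
    b, a = np.polyfit(health[idx], income[idx], 1)
    bs.append(b)
bs = np.asarray(bs)

# A histogram of bs centers near BETA and looks roughly normal.
print(bs.mean(), bs.std())
```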

28/97

29/97 Conclusions Sampling variability is evident; the distribution seems to center on β and appears to be normally distributed.

30/97 Distribution of the slope estimator, b
Assumptions:
(Model) Regression: yi = α + βxi + εi
(Crucial) Exogenous data: the data x and the noise ε are independent; E[ε|x] = 0 or Cov(ε, x) = 0
(Temporary) Var[ε|x] = σ², not a function of x (homoscedastic)
Results: What are the properties of b?

31/97

32/97 (1) b is unbiased and linear in ε

33/97 (2) b is efficient Gauss–Markov Theorem (similar in spirit to Rao–Blackwell; proof in Greene): the variance of b is smallest among linear unbiased estimators.

34/97 (3) b is consistent

35/97 Consistency: N=52 vs. N=520

36/97 a is unbiased and consistent

37/97 Covariance of a and b

38/97 Inference about β We have derived the expected value and variance of b, a ‘point’ estimator. To form a confidence interval, we need a distribution and a pivotal statistic to use.

39/97 Normality

40/97 Confidence Interval

41/97 Estimating sigma squared

42/97 Usable Confidence Interval Use s instead of σ and the t distribution instead of the normal; the critical t depends on the degrees of freedom. b − t·s < β < b + t·s
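As a sketch with simulated data (the critical value 2.000 is the textbook t(0.975) for 60 degrees of freedom, matching n = 62 below):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 62
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

b, a = np.polyfit(x, y, 1)
e = y - (a + b * x)
s2 = (e**2).sum() / (n - 2)                    # s^2 estimates sigma^2
s_b = np.sqrt(s2 / ((x - x.mean())**2).sum())  # estimated standard error of b
t_crit = 2.000                                 # t(0.975, df = 60), from a t table

lo, hi = b - t_crit * s_b, b + t_crit * s_b    # b - t s < beta < b + t s
print(lo, hi)
```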

43/97 Slope Estimator

44/97 Regression Results
Ordinary least squares regression, LHS = BOX, 62 observations.
(Output table: regression, residual, and total sums of squares; standard error of e (root MSE); R-squared and R-bar squared; model test F[1, 60] with its p-value; and coefficient estimates with standard errors, t ratios, and 95% confidence intervals for Constant** and CNTWAIT3***.)
Note: ***, **, * ⇒ significance at the 1%, 5%, 10% level.

45/97 Hypothesis Test about β The region outside the confidence interval is the rejection region for hypothesis tests about β. For the internet buzz regression, zero lies outside the confidence interval, so the hypothesis that β equals zero is rejected.

Statistics and Data Analysis Part 7-3 – Prediction

47/97 Predicting y Using the Regression Actual y0 is α + βx0 + ε0. The prediction is ŷ0 = a + bx0. The error is y0 − ŷ0 = (α − a) + (β − b)x0 + ε0. The variance of the error is Var[a] + x0²·Var[b] + 2x0·Cov[a, b] + Var[ε0].

48/97 Prediction Variance
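The prediction variance referred to here is the standard textbook expression Var[y0 − ŷ0] = s²(1 + 1/n + (x0 − x̄)²/Σ(xi − x̄)²). A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 52
x = rng.normal(2.0, 1.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, n)

b, a = np.polyfit(x, y, 1)
e = y - (a + b * x)
s2 = (e**2).sum() / (n - 2)
sxx = ((x - x.mean())**2).sum()

def forecast_var(x0):
    # Var[y0 - y0_hat] = s^2 * (1 + 1/n + (x0 - xbar)^2 / Sxx)
    return s2 * (1.0 + 1.0 / n + (x0 - x.mean())**2 / sxx)

v_center = forecast_var(x.mean())     # smallest possible forecast variance
v_far = forecast_var(x.mean() + 5.0)  # grows as x0 leaves the data's center
print(v_center, v_far)
```

This is also the arithmetic behind the extrapolation penalty: the interval is narrowest at x̄ and widens quadratically in (x0 − x̄).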

49/97 Quantum of Solace
Actual Box = $… M; a = −14.36, b = …, N = 62, s_b = …, s² = …; buzz = 0.76, prediction = ….
Mean buzz = …, Σ(buzz − mean)² = …, s_forecast = ….
Confidence interval = prediction ± t·s_forecast = … to ….
(Note: the confidence interval contains the actual value.)

50/97 Forecasting Out of Sample Per Capita Gasoline Consumption vs. Per Capita Income. How do you predict G for 2012? You would first need to predict income for 2012. How should we do that? Regression Analysis: G versus Income. R-Sq = 88.0%, R-Sq(adj) = 87.8%.

51/97 The Extrapolation Penalty The interval is narrowest at x* = x̄, the center of our experience, and widens as we move away from that center to reflect the greater uncertainty: (1) uncertainty about the prediction of x, and (2) uncertainty that the linear relationship will continue to hold as we move farther from the center.

52/97 Normality Normality is necessary for t statistics and confidence intervals. Do the residuals reveal whether the disturbances are normal? There are standard tests and devices.

53/97 Normally Distributed Residuals? Kolmogorov–Smirnov test of F(e) vs. Normal[0.00000, s²]: the K-S test statistic exceeds both the 95% and the 99% critical value, so the normality hypothesis should be rejected.

54/97 Nonnormal Disturbances Appeal to the central limit theorem: use the standard normal instead of t. t is essentially normal if N > 50.

Statistics and Data Analysis Part 7-4 – Multiple Regression

56/97 Box Office and Movie Buzz

57/97 Box Office and Budget

58/97 Budget and Buzz Effects

59/97 An Enduring Art Mystery Why do larger paintings command higher prices? The Persistence of Memory. Salvador Dali, 1931 The Persistence of Statistics. Rice, 2007 Graphics show relative sizes of the two works.

60/97 The Data Note: logs are used in this context. This is common when analyzing financial measurements (e.g., prices) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)

61/97 Monet in Large and Small Log of $price = a + b log surface area + e Sale prices of 328 signed Monet paintings The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

62/97 Monet Regression: There seems to be a regression. Is there a theory?

63/97 How much for the signature? The sample also contains 102 unsigned paintings. Average sale price: signed, $3,364,248; not signed, $1,832,712. The average price of signed Monets is almost twice that of unsigned ones.

64/97 Can we separate the two effects? Average prices: unsigned, small 346,845, large 5,795,000; signed, small 689,422, large 5,556,490. What do the data suggest? (1) The size effect is huge. (2) The signature effect is confined to the small paintings.

65/97 A Multiple Regression Ln Price = a + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e

66/97 Monet Multiple Regression Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed. The regression equation is ln (US$) = b0 + b1 ln (SurfaceArea) + b2 Signed; R-Sq = 46.2%, R-Sq(adj) = 46.0%. Interpretation (to be explored as we develop the topic): (1) the elasticity of price with respect to surface area is very large; (2) the signature multiplies the price by exp(1.2618) (about 3.5), for any given size.
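The signature multiplier can be reproduced in a sketch. The Monet data are not bundled with these slides, so the code simulates prices with a built-in signature coefficient of 1.26 (taken from the slide's interpretation) and recovers it by least squares:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 430                                          # ~328 signed + ~102 unsigned
log_area = rng.normal(7.0, 0.8, n)
signed = (rng.random(n) < 0.76).astype(float)
log_price = (2.0 + 1.3 * log_area + 1.26 * signed
             + rng.normal(0.0, 0.9, n))          # 1.26 from the slide

X = np.column_stack([np.ones(n), log_area, signed])
coef = np.linalg.lstsq(X, log_price, rcond=None)[0]

multiplier = np.exp(coef[2])   # price ratio, signed vs. unsigned, at any size
print(coef[2], multiplier)     # near 1.26 and exp(1.26), about 3.5
```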

67/97 Ceteris Paribus in Theory Demand for gasoline: G = f(price, income). Demand (price) elasticity: eP = % change in G given a % change in P, holding income constant. How do you do that in the real world? The “percentage changes”: how do you change price and hold income constant?

68/97 The Real World Data

69/97 U.S. Gasoline Market,

70/97 Shouldn’t Demand Curves Slope Downward?

71/97 A Thought Experiment The main driver of gasoline consumption is income, not price. Income is growing over time, so we are not holding income constant when we change price! How do we do that?

72/97 How to Hold Income Constant? Multiple regression using price and income. Regression Analysis: G versus GasPrice, Income. The regression equation is G = b0 + b1 GasPrice + b2 Income. It looks like the theory works.
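The omitted-variable story can be sketched directly. When income drives both consumption and price, a simple regression of G on price alone picks up the income trend and "slopes the wrong way"; adding income flips the price coefficient. All numbers below are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(9)
T = 52
income = 10.0 + np.cumsum(rng.normal(0.5, 0.2, T))   # trending upward
price = 0.3 * income + rng.normal(0.0, 0.5, T)       # price rises with income
G = 5.0 + 1.0 * income - 0.8 * price + rng.normal(0.0, 0.3, T)

# Simple regression of G on price: income is omitted but correlated with price
b_simple = np.polyfit(price, G, 1)[0]

# Multiple regression: price effect with income held constant
X = np.column_stack([np.ones(T), price, income])
b_mult = np.linalg.lstsq(X, G, rcond=None)[0][1]

print(b_simple, b_mult)   # b_simple > 0, b_mult < 0: demand slopes down again
```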

Statistics and Data Analysis Linear Multiple Regression Model

74/97 Classical Linear Regression Model The model is y = f(x1, x2, …, xK, β1, β2, …, βK) + ε, a multiple regression model. Important examples: marginal cost in a multiple-output setting; separate age and education effects in an earnings equation. Denote (x1, x2, …, xK) as x; a boldface symbol denotes a vector. Form of the model: E[y|x] is a linear function of x. ‘Dependent’ and ‘independent’ variables. Independent of what? Think in terms of autonomous variation. Can y just ‘change’? What ‘causes’ the change?

75/97 Model Assumptions: Generalities Linearity means linear in the parameters; we’ll return to this issue shortly. Identifiability: it is not possible, within the model, for two different sets of parameters to produce the same value of E[y|x] for all x vectors (it is possible for some x). The conditional expected value of the deviation of an observation from the conditional mean function is zero. The form of the variance of the random variable around the conditional mean is specified. The nature of the process by which x is observed. Assumptions about the specific probability distribution.

76/97 Linearity of the Model f(x1, x2, …, xK, β1, β2, …, βK) = x1β1 + x2β2 + … + xKβK. Notation: x1β1 + x2β2 + … + xKβK = x′β; a boldface letter indicates a column vector. “x” denotes a variable, a function of a variable, or a function of a set of variables. There are K “variables” on the right-hand side of the conditional mean “function.” The first “variable” is usually a constant term. (Wisdom: models should have a constant term unless the theory says they should not.) E[y|x] = β1·1 + β2x2 + … + βKxK (β1·1 is the intercept term).

77/97 Linearity Linearity means linear in the parameters, not in the variables: E[y|x] = β1 f1(…) + β2 f2(…) + … + βK fK(…), where fk(·) may be any function of data. Examples: logs and levels in economics; time trends, and time trends in loglinear models (rates of growth); dummy variables; quadratics, power functions, log-quadratics, trig functions, interactions, and so on.

78/97 Linearity Simple linear model: E[y|x] = x′β. Quadratic model: E[y|x] = α + β1x + β2x². Loglinear model: E[ln y|x] = α + Σk βk ln xk. Semilog: E[y|x] = α + Σk βk ln xk. All are “linear”; an infinite number of variations exist.
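"Linear in the parameters" means ordinary least squares still applies after transforming the data. A sketch fitting the quadratic variation (the coefficients 2.0, 1.5, and −0.2 are made up):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0.5, 4.0, 300)
y = 2.0 + 1.5 * x - 0.2 * x**2 + rng.normal(0.0, 0.1, 300)

# The regressors are functions of x, but the model is linear in the
# parameters, so it is still estimated by linear least squares.
X = np.column_stack([np.ones_like(x), x, x**2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # close to [2.0, 1.5, -0.2]
```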

79/97 Matrix Notation

80/97 Notation Define column vectors of N observations on y and the K x variables. The assumption means that the rank of the matrix X is K. No linear dependencies => FULL COLUMN RANK of the matrix X.

81/97 Uniqueness of the Conditional Mean The conditional mean relationship must hold for any set of N observations, i = 1, …, N. Assume N ≥ K (justified later). E[y1|x] = x1′β, E[y2|x] = x2′β, …, E[yN|x] = xN′β. All N observations at once: E[y|X] = Xβ.

82/97 Uniqueness of E[y|X] Now, suppose there is a γ ≠ β that produces the same expected value: E[y|X] = Xγ = Xβ. Let δ = γ − β. Then Xδ = Xγ − Xβ = 0. Is this possible? X is an N × K matrix (N rows, K columns). What does Xδ = 0 mean? We assume this is not possible. This is the ‘full rank’ assumption; ultimately, it will imply that we can ‘estimate’ β. This requires N ≥ K.

83/97 An Unidentified (But Valid) Theory of Art Appreciation Enhanced Monet area-effect model with height and width effects: Log(Price) = β1 + β2 log Area + β3 log AspectRatio + β4 log Height + β5 Signature + ε (Aspect Ratio = Height/Width)
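This model fails the full-rank assumption because log Height = (log Area + log AspectRatio)/2 exactly, so there is a nonzero δ with Xδ = 0. A sketch with simulated dimensions (the means and spreads are made up):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 328
log_h = rng.normal(3.5, 0.4, n)       # log height (simulated)
log_w = rng.normal(3.7, 0.4, n)       # log width  (simulated)
log_area = log_h + log_w              # Area = Height * Width
log_aspect = log_h - log_w            # Aspect ratio = Height / Width

X = np.column_stack([np.ones(n), log_area, log_aspect, log_h])
print(np.linalg.matrix_rank(X))       # 3, not 4: the columns are dependent

# A nonzero delta with X @ delta = 0, so beta and beta + delta are
# observationally equivalent: log H = (log Area + log Aspect) / 2.
delta = np.array([0.0, 0.5, 0.5, -1.0])
print(np.abs(X @ delta).max())        # zero, up to rounding
```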

84/97 Conditional Homoscedasticity and Nonautocorrelation Disturbances provide no information about each other: Var[εi | X] = σ² and Cov[εi, εj | X] = 0 for i ≠ j.

85/97 Heteroscedasticity Regression of the log of per capita gasoline use on the log of per capita income, the gasoline price, and the number of cars per capita, for 18 OECD countries over 19 years. The standard deviation of the residuals varies by country; countries are ordered by the standard deviation of their 19 residuals.

86/97 Autocorrelation logG=β 1 + β 2 logPg + β 3 logY + β 4 logPnc + β 5 logPuc + ε

87/97 Autocorrelation Results from an Incomplete Model

88/97 Normal Distribution of ε Used to facilitate finite-sample derivations of certain test statistics. Observations are independent. The assumption will be unnecessary; we will use the central limit theorem for the statistical results we need.

The Linear Model y = Xβ + ε, N observations, K columns in X (usually including a column of ones for the intercept). Standard assumptions about X. Standard assumptions about ε|X: E[ε|X] = 0, E[ε] = 0, and Cov[ε, x] = 0. Regression: E[y|X] = Xβ.

Statistics and Data Analysis Least Squares

91/97 Vocabulary Some terms to be used in the discussion: population characteristics and entities vs. sample quantities and analogs; residuals and disturbances; population regression line and sample regression. Objective: learn about the conditional mean function; estimate β and σ². First step: the mechanics of fitting a line to a set of data.

92/97 Least Squares

93/97 Matrix Results

94/97

95/97 Moment Matrices

96/97 Least Squares Normal Equations
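The normal equations can be written and solved in a few lines (simulated data; the coefficient vector is made up). The defining property of the least squares b is that the residuals are orthogonal to every column of X:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.5, -0.3])         # assumed true coefficients
y = X @ beta + rng.normal(0.0, 0.2, N)

# Normal equations: (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b
print(np.abs(X.T @ e).max())   # essentially zero: X'e = 0
```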

97/97 Second Order Conditions

98/97 Does b Minimize e’e?

99/97 Positive Definite Matrix