Some Topics In Multivariate Regression

Some Topics We need to address some small topics that often come up in multivariate regression. I will illustrate them using the Housing example.

Some Topics
1. Confidence intervals
2. Scale of data
3. Functional form
4. Tests of multi-coefficient hypotheses

Wooldridge refs to date: Chapter 1; Chapter 2.1, 2.2, 2.5; Chapter 3.1, 3.2, 3.3; Chapter 4.1, 4.2, 4.3, 4.4

Confidence Intervals (4.3) We can construct an interval within which the true value of the parameter lies.
- We have seen that P(−1.96 ≤ t ≤ 1.96) = 0.95 for large N−K
- More generally:

The interval b ± t_c · se(b) will contain β with (1−α)% confidence.
- t_c is the "critical value" and is determined by the significance level (α) and the degrees of freedom (df = N−K)
- For the case where N−K is large (>100) and α is 5%, t_c = 1.96
This is the same as the set of values of β that could not be rejected if they were null hypotheses:
- The range of possible values consistent with the data
- A way of avoiding some of the ambiguity in the formulation of hypothesis tests
Formally: a procedure which will generate an interval containing the true value (1−α)% of the time in repeated samples.

Level Option
Stata command: regress …, level(95)
Note: in assignments I want you to do it manually

. regress price inc_pc hstock_pc if year<=1997
[Stata output: ANOVA table and coefficient table for inc_pc, hstock_pc and _cons, with 95% confidence intervals; the numeric values were not preserved in the transcript]
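For the manual calculation, the interval can be computed directly from a coefficient and its standard error. A minimal sketch in Python (the coefficient and standard error below are hypothetical, and the large-sample normal critical value stands in for the exact t value):

```python
from statistics import NormalDist

def conf_interval(b, se, alpha=0.05):
    """Large-sample (N - K > 100) confidence interval b +/- tc * se(b)."""
    # For large df the t critical value is close to the normal one (1.96 at 5%)
    tc = NormalDist().inv_cdf(1 - alpha / 2)
    return b - tc * se, b + tc * se

# Hypothetical coefficient estimate and standard error
lo, hi = conf_interval(b=2.5, se=0.8)
# lo is about 0.93, hi about 4.07
```

For small N−K you would replace the normal critical value with the appropriate t-table value, as the slides describe.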

Scale (2.4 & 6.1)
The scale of the data may matter
- i.e. whether we measure house prices in € or €bn, or even £ or $
- Exercise: try this with the housing or consumption examples
Basic model: y_i = b_1 + b_2 x_i + u_i

Change scale of x_i: x_i* = x_i / c
- Estimate: y_i = b_1* + b_2* x_i* + u_i
- b_2* = c·b_2 and se(b_2*) = c·se(b_2)
- The slope coefficient and its se change; all other statistics (t-stats, R², F, etc.) are unchanged.

Change scale of y_i: y_i* = y_i / c
- Estimate: y_i* = b_1* + b_2* x_i + u_i
- b_2* = b_2 / c and b_1* = b_1 / c
- se(b_2*) = se(b_2) / c and se(b_1*) = se(b_1) / c
- t-stats, R², F unchanged
Both X and Y rescaled: y_i* = y_i / c, x_i* = x_i / c
- Estimate: y_i* = b_1* + b_2* x_i* + u_i
- If rescaled by the same amount:
  - b_1* = b_1 / c and se(b_1*) = se(b_1) / c
  - b_2 and se(b_2) unchanged
  - t-stats, R², F unchanged
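These scale results can be checked numerically. A minimal sketch with a simple bivariate regression (the data are hypothetical), showing that dividing x by c multiplies the slope by c:

```python
# Hypothetical data for a simple regression y = b1 + b2*x + u
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def ols_slope(x, y):
    """OLS slope b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

c = 1000.0
b2 = ols_slope(xs, ys)
b2_star = ols_slope([xi / c for xi in xs], ys)  # x rescaled: x* = x/c
# b2_star equals c * b2 (up to floating-point error)
```

The same exercise with the y variable rescaled shows both coefficients divided by c, exactly as the slide states.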

Functional Form (6.2)
Four common functional forms:
- Linear: q_t = α + β p_t + u_t
- Log-log: ln q_t = α + β ln p_t + u_t
- Semilog: q_t = α + β ln p_t + u_t, or ln q_t = α + β p_t + u_t
How to choose?
- Which fits the data best? (cannot compare R² unless y is the same)
- Which is most convenient? (do we want an elasticity, a rate of return?)
- How to trade off the two goals?

Elasticity and Marginal Effects
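The detail of this slide was not preserved, but the heading refers to the standard relationships between the functional forms above and the quantities their slope coefficients deliver; a brief sketch:

```latex
\text{Linear: } q_t = \alpha + \beta p_t
  \;\Rightarrow\; \frac{dq}{dp} = \beta \ \text{(marginal effect)},\qquad
  \varepsilon = \beta\,\frac{p}{q}
\text{Log-log: } \ln q_t = \alpha + \beta \ln p_t
  \;\Rightarrow\; \varepsilon = \frac{d\ln q}{d\ln p} = \beta \ \text{(constant elasticity)}
\text{Semilog: } \ln q_t = \alpha + \beta p_t
  \;\Rightarrow\; \frac{d\ln q}{dp} = \beta \ \approx \ \text{proportional change in } q \text{ per unit change in } p
```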

Two housing models
The level variables: marginal effects

. regress price inc_pc hstock_pc if year<=1997
[Stata output: coefficient table for inc_pc, hstock_pc and _cons; the numeric values were not preserved in the transcript]

Log on log formulation

. regress lprice linc lh if year<=1997
[Stata output: coefficient table for linc, lh and _cons; the numeric values were not preserved in the transcript]

F-tests
Often we will want to test joint hypotheses
- i.e. hypotheses that involve more than one coefficient
- Linear restrictions
Three examples (using the log model):
1. H0: β_H = 0 and β_I = 0    H1: β_H ≠ 0 or β_I ≠ 0
2. H0: β_H = 0 and β_I = 1    H1: β_H ≠ 0 or β_I ≠ 1
3. H0: β_H + β_I = 1          H1: β_H + β_I ≠ 1

1. Test of Joint Significance
Example 1 is given the special name of "test of joint significance".
- Could do K−1 t-tests, one on each of the K−1 variables
- This would not be a joint hypothesis but a series of K−1 individual hypotheses
- The two are not equivalent

Why Joint Hypotheses Matter
- Recall that sampling makes the estimators random variables
- Estimators of different coefficients are correlated random variables: all the coefficients are estimated from the same sample in any one regression
- Making statements about one coefficient implies a statement about another
- Formally: P(b_2 = 0)·P(b_3 = 0) ≠ P(b_2 = b_3 = 0)

So the set of regressions in which both are zero is smaller than the set in which either one is zero. This intuition holds for more general hypotheses.

Testing Joint Significance

So we can reject the null hypothesis if the test statistic is greater than zero. How much greater? Greater than a critical value from the F-distribution tables, which depends on three parameters:
- Significance level
- df1 = K−1
- df2 = N−K
The test is so useful that it is reported by Stata.

Formal Procedure

2. Test Linear Restriction
H0: β_H = 0 and β_I = 1    H1: β_H ≠ 0 or β_I ≠ 1
Could do 2 t-tests
- This would not be a joint hypothesis but a series of 2 individual hypotheses
- The two are not equivalent, for the same reason as before
Look at the formal procedure first and then the intuition
- Similar to, but not the same as, the test of joint significance
- A common mistake on the exam

Formal Procedure

5. Find the critical value:
- df1 = r = the number of restrictions
- df2 = N−K from the unrestricted model
- Significance level: you choose
6. Reject the null if F > critical value
7. State the conclusion:
- We can(not) reject the null hypothesis at the α% significance level

The Housing Example

The Restricted Model
To estimate the restricted model requires us to impose the hypothesis on the model
- i.e. treat the hypothesis as true and re-estimate the model
- This is true for a t-test also, but it is trickier here
The unrestricted model is: lp_t = β_0 + β_I linc_t + β_H lh_t + u_t
Imposing the restrictions (β_I = 1, β_H = 0) gives:
lp_t = β_0 + 1·linc_t + 0·lh_t + u_t
lp_t − linc_t = β_0 + u_t

- The zero restriction just means that the variable drops out
- A restriction that requires a coefficient to equal some other number is more of a problem
- The trick is to bring it over to the LHS of the equation
- We then generate a new variable for the left-hand side and use that to estimate the restricted model

. gen y = lprice - linc
. regress y if year<=1997
[Stata output: regression of y on a constant only; the numeric values were not preserved in the transcript]

Comments
- This may seem like a silly regression: after all, it has no variables on the right-hand side (just the constant)
- The regression is of no interest in itself
- It is merely the regression of the original model with the restriction imposed
- The only thing we care about is the RSS

Intuition of the F-test
- Recall that the RSS is the variation in the Y variable that is not explained by the model
- The F-test compares the size of this unexplained part before and after the restriction is imposed
- If imposing the restriction causes the RSS to rise by a lot, that suggests the restriction is not supported by the data: the model with the restriction explains a lot less of the variation in Y

Intuition cont.
Look at the formula for the test statistic:
- It is basically the % increase in RSS brought about by the restriction, i.e. the % decline in explanatory power
- The degrees of freedom are adjustments for statistical reasons (they ensure the test has an F distribution)
If the decline in explanatory power is large enough, we reject the null. How large? Larger than the critical value.
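The test statistic being described is F = [(RSS_R − RSS_U)/r] / [RSS_U/(N−K)]. A minimal sketch of the calculation (the RSS values below are hypothetical):

```python
def f_statistic(rss_r, rss_u, r, n, k):
    """F test of r linear restrictions.

    rss_r: residual sum of squares of the restricted model
    rss_u: residual sum of squares of the unrestricted model
    r:     number of restrictions
    n, k:  observations and parameters in the unrestricted model
    """
    return ((rss_r - rss_u) / r) / (rss_u / (n - k))

# Hypothetical values: imposing 2 restrictions raises the RSS from 1.5 to 2.4
F = f_statistic(rss_r=2.4, rss_u=1.5, r=2, n=28, k=3)
# Compare F against the critical value from the F(r, n-k) tables
```

Note how the numerator is the rise in RSS per restriction and the denominator scales it by the unrestricted fit, matching the "% decline in explanatory power" intuition above.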

Comments on F
- Almost any test can be formulated as a linear restriction: a very general method
- The t-test is a special case (Exercise: reformulate a t-test as an F-test)
- The test of joint significance is another special case
- Stata: the test command; use it to verify your results
- Related to R²: the F-test can be reformulated in terms of R² (see book)
- Note that RSS_R ≥ RSS_U: a restriction cannot improve the fit of the model; the question is whether the deterioration is large
- F is always non-negative
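The exercise above (the t-test as a special case) can be checked numerically: for a single zero restriction in a simple regression, the F statistic equals the square of the t statistic. A sketch with hypothetical data:

```python
import math

# Hypothetical data for y = b1 + b2*x + u
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 2.1, 2.8, 4.2, 4.9, 6.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b2 = sxy / sxx
b1 = my - b2 * mx

# Unrestricted RSS, and restricted RSS under H0: b2 = 0 (y on a constant only)
rss_u = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
rss_r = sum((y - my) ** 2 for y in ys)

s2 = rss_u / (n - 2)            # error variance estimate
t = b2 / math.sqrt(s2 / sxx)    # t statistic for H0: b2 = 0
F = ((rss_r - rss_u) / 1) / s2  # F statistic for the same single restriction
# F equals t**2 (up to floating-point error)
```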

Recall the Learning Outcomes
1. Expand the regression model to allow for multiple X variables
2. Formalise the hypothesis test procedure using test statistics
3. Look at more general hypothesis tests
   a) Multiple coefficients
   b) Inequality hypotheses
4. Formalise a procedure for using regression for prediction

What's Next? We now have all we need to analyse many questions. The next (quick) topic will be lawyers' fees. But we are still missing two big items:
- A discussion of the theory of why OLS gives good estimators
- A discussion of the circumstances which can lead to OLS giving bad estimators
These will take up most of the rest of the course.