Econ 140 Lecture 18: Multiple Regression Applications III

Slide 1: Multiple Regression Applications III

Slide 2: Dummy variables

Dummy variables let us include qualitative indicators in the regression: e.g. gender, race, regime shifts. So far we have only seen the dummy change the intercept of the regression line. Suppose now we wish to investigate whether the slope changes as well as the intercept. This can be written as a general equation:

W_i = a + b1 Age_i + b2 Married_i + b3 D_i + b4 (D_i × Age_i) + b5 (D_i × Married_i) + e_i

Suppose first we wish to test for the difference between males and females.
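The general equation can be estimated by stacking the dummy and interaction columns into the design matrix. A minimal numpy sketch on synthetic data (the coefficient values and sample are invented for illustration, not taken from the lecture's spreadsheet):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
d = rng.integers(0, 2, n).astype(float)        # dummy, e.g. 1 = female

# Hypothetical DGP: the dummy shifts both the intercept and the slopes
w = (5 + 0.30*age + 2.0*married - 1.0*d + 0.10*(d*age) - 0.5*(d*married)
     + rng.normal(0, 1, n))

# Design matrix: [1, Age, Married, D, D*Age, D*Married]
X = np.column_stack([np.ones(n), age, married, d, d*age, d*married])
b, *_ = np.linalg.lstsq(X, w, rcond=None)
print(np.round(b, 2))  # estimates of (a, b1, b2, b3, b4, b5)
```

With this single regression, b4 and b5 measure how much the Age and Married slopes differ for the D = 1 group.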

Slide 3: Interactive terms

For females and males separately, the model would be:

W_i = a + b1 Age_i + b2 Married_i + e_i

- In doing so we argue that the coefficients would be different for males and females
- We want to think about two sub-sample groups: males and females
- We can test the hypothesis that the intercept and partial slope coefficients differ between these two groups

Slide 4: Interactive terms (2)

To test our hypothesis we estimate the regression equation above (W_i = a + b1 Age_i + b2 Married_i + e_i) for the whole sample and then for the two sub-sample groups. We test whether our estimated coefficients are the same for males and females. Our null hypothesis is:

H0: (a_M, b1_M, b2_M) = (a_F, b1_F, b2_F)

Slide 5: Interactive terms (3)

We have an unrestricted form and a restricted form:
- unrestricted: used when we estimate the sub-sample groups separately
- restricted: used when we estimate the whole sample

What type of statistic will we use to carry out this test? An F-statistic with q = k, the number of parameters in the model, and n = n1 + n2, where n is the complete sample size.

Slide 6: Interactive terms (4)

The sum of squared residuals for the unrestricted form will be: SSR_U = SSR_M + SSR_F

In L17_2.xls:
- the data are sorted according to the dummy variable "female"
- there is a second dummy variable for marital status
- there are 3 estimated regression equations, one each for the total sample, the male sub-sample, and the female sub-sample
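The identity SSR_U = SSR_M + SSR_F and the resulting F-statistic can be checked numerically. A sketch on synthetic data (standing in for the L17_2.xls sample, which is not reproduced here), with q = k and denominator degrees of freedom n1 + n2 − 2k:

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(1)
n = 100
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
female = np.repeat([0.0, 1.0], n // 2)           # data sorted by the dummy
# Hypothetical DGP where the Age slope differs by group
w = 4 + 0.25*age + 1.5*married + 0.08*female*age + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, married])
k = X.shape[1]                                   # 3 parameters: a, b1, b2

ssr_r = ssr(X, w)                                # restricted: whole sample
m, f = female == 0, female == 1
ssr_u = ssr(X[m], w[m]) + ssr(X[f], w[f])        # unrestricted: SSR_M + SSR_F

F = ((ssr_r - ssr_u) / k) / (ssr_u / (n - 2*k))
print(round(F, 2))
```

The separate fits can never do worse than the pooled fit, so SSR_U ≤ SSR_R; the F-statistic asks whether the improvement is larger than chance would allow.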

Slide 7: Interactive terms (5)

The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the test statistic:

F* = [(SSR_R − SSR_U) / k] / [SSR_U / (n1 + n2 − 2k)]

Since F_{0.05, 3, 27} = 2.96 > F*, we cannot reject the null hypothesis that the partial slope coefficients are the same for males and females.

Slide 8: Interactive terms (6)

What if F* > F_{0.05, 3, 27}? How do we read the results?
- There is a difference between the two sub-samples, and therefore we should estimate the wage equations separately
- Or we could interact the dummy variable with the other variables

To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get:

W_i = a + b1 Age_i + b2 Married_i + b3 D_i + b4 (D_i × Age_i) + b5 (D_i × Married_i) + e_i

Slide 9: Interactive terms (7)

Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns.
- One way to see whether the two sub-samples differ is to look at the t-ratios on the interactive terms
- In this example neither t-ratio is statistically significant, so we cannot reject the null hypothesis
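The t-ratios on the interactive terms come from the usual OLS variance formula, Var(b̂) = s²(X'X)⁻¹. A numpy sketch on synthetic data (the columns mirror FEMALE, AGE, MARRIED, but the numbers are invented; here the DGP has no true interaction effects, so the t-ratios should be small):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
female = rng.integers(0, 2, n).astype(float)

# Hypothetical DGP with no interaction effects
w = 4 + 0.25*age + 1.5*married - 0.8*female + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, married, female,
                     female*age, female*married])
b, *_ = np.linalg.lstsq(X, w, rcond=None)

resid = w - X @ b
s2 = resid @ resid / (n - X.shape[1])             # estimate of error variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se
print(np.round(t[4:], 2))                         # t-ratios on D*Age, D*Married
```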

Slide 10: Interactive terms (8)

If we want to estimate the equation for the first sub-sample (males), we take the expectation of the wage equation where the dummy variable for female takes the value of zero:

E(W_i | D_i = 0) = a + b1 Age_i + b2 Married_i

We can do the same for the second sub-sample (females):

E(W_i | D_i = 1) = (a + b3) + (b1 + b4) Age_i + (b2 + b5) Married_i

By using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample.
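The two conditional expectations can be read off the single fitted equation. A small sketch with hypothetical coefficient values showing how the female intercept and slopes combine:

```python
# Hypothetical estimates from the single interactive regression:
# W = a + b1*Age + b2*Married + b3*D + b4*(D*Age) + b5*(D*Married)
a, b1, b2, b3, b4, b5 = 4.0, 0.25, 1.5, -1.0, 0.25, -0.5

# Males (D = 0): E(W | D=0) = a + b1*Age + b2*Married
male = (a, b1, b2)

# Females (D = 1): E(W | D=1) = (a + b3) + (b1 + b4)*Age + (b2 + b5)*Married
female = (a + b3, b1 + b4, b2 + b5)

print(male)    # (4.0, 0.25, 1.5)
print(female)  # (3.0, 0.5, 1.0)
```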

Slide 11: Phillips Curve example

The Phillips curve is an example of a regime shift. Data points from 1950–1970 show a downward-sloping, reciprocal relationship between wage inflation (W) and unemployment (UN).

[Figure: scatter of W against UN, 1950–1970, tracing a downward-sloping reciprocal curve]

Slide 12: Phillips Curve example (2)

But if we look at data points from 1971–1996, we can detect an upward-sloping relationship. ALWAYS graph the data for the two main variables of interest.

[Figure: scatter of W against UN, 1971–1996, showing an upward-sloping pattern]

Slide 13: Phillips Curve example (3)

There seems to be a regime shift between the two periods.
- Note: this is an arbitrary choice of regime shift; it was not dictated by a specific change.

We will use the Chow test (an F-test) to test for this regime shift.
- restricted form: W_t = a + b2 (1/UN_t) + e_t
- unrestricted form: W_t = a + b1 D_t + b2 (1/UN_t) + b3 (D_t × 1/UN_t) + e_t
- D is the dummy variable for the regime shift, equal to 0 for 1950–1970 and 1 for 1971–1996

Slide 14: Phillips Curve example (4)

L17_3.xls estimates the restricted and unrestricted regression equations and calculates the F-statistic for the Chow test. The null hypothesis is:

H0: b1 = b3 = 0

- we are testing whether the dummy variable for the regime shift alters the intercept or the slope coefficient

The F-statistic is (where SSR* indicates the restricted sum of squared residuals, and q = 2):

F* = [(SSR* − SSR_U) / q] / [SSR_U / (n − k)]
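The Chow test for the regime shift can be sketched numerically. Synthetic data stand in for L17_3.xls (the regime-shift pattern is invented for illustration): the restricted model is W = a + b2(1/UN), and the unrestricted model adds the dummy and its interaction:

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(3)
years = np.arange(1950, 1997)
d = (years >= 1971).astype(float)     # 0 for 1950-70, 1 for 1971-96
un = rng.uniform(3, 10, years.size)
# Hypothetical DGP: intercept and reciprocal slope both change in 1971
w = 1 + 8/un + 2*d - d*(6/un) + rng.normal(0, 0.5, years.size)

inv_un = 1/un
Xr = np.column_stack([np.ones_like(w), inv_un])               # restricted
Xu = np.column_stack([np.ones_like(w), d, inv_un, d*inv_un])  # unrestricted

q, n, k = 2, w.size, Xu.shape[1]
F = ((ssr(Xr, w) - ssr(Xu, w)) / q) / (ssr(Xu, w) / (n - k))
print(round(F, 1))                    # a large F rejects H0: no regime shift
```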

Slide 15: Phillips Curve example (5)

The expectation of wage inflation for the first time period (D = 0):

E(W_t | D_t = 0) = a + b2 (1/UN_t)

The expectation of wage inflation for the second time period (D = 1):

E(W_t | D_t = 1) = (a + b1) + (b2 + b3)(1/UN_t)

You can use the spreadsheet data to carry out these calculations.

Slide 16: Relaxing Assumptions

Slide 17: Today's Plan

A review of what we have learned in regression so far, and a look forward to what happens when we relax the assumptions around the regression line. Introduction to new concepts:
- Heteroskedasticity
- Serial correlation (also known as autocorrelation)
- Non-independence of the independent variables

Slide 18: CLRM Revision

- Calculating the linear regression model (using OLS)
- Use of the sum of squared residuals: calculating the variance for the regression line and the mean squared deviation
- Hypothesis tests: t-tests, F-tests, χ² test
- Coefficient of determination (R²) and its adjustment
- Modeling: use of log-linear forms, logs, reciprocals
- Relationship between F and R²
- Imposing linear restrictions: e.g. H0: b2 = b3 = 0 (q = 2); H0: b2 + b3 = 1
- Dummy variables and interactions; Chow test

Slide 19: Relaxing assumptions

What are the assumptions we have used throughout? Two assumptions about the population for the bi-variate case:
1. E(Y|X) = a + bX (the conditional expectation function is linear)
2. V(Y|X) = σ² (conditional variances are constant)

Assumptions concerning the sampling procedure (i = 1..n):
1. Values of X_i (not all equal) are prespecified
2. Y_i is drawn from the subpopulation having X = X_i
3. The Y_i's are independent

The consequences are:
1. E(Y_i) = a + bX_i
2. V(Y_i) = σ²
3. C(Y_h, Y_i) = 0

How can we test whether these assumptions hold? And what can we do if they don't?

Slide 20: Homoskedasticity

We would like our estimates to be BLUE (best linear unbiased estimators). We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias).

Heteroskedasticity is usually found in cross-section data (and longitudinal data). In earlier lectures we saw that the variance of the error term is V(e_i) = σ².
- This is an example of homoskedasticity, where the variance is constant

Slide 21: Homoskedasticity (2)

Homoskedasticity can be illustrated as constant variance around the regression line: at each value X1, X2, X3, the distribution of Y around the line has the same spread.

[Figure: regression line of Y on X with identical error distributions at X1, X2, X3]

Slide 22: Heteroskedasticity

But we don't always have constant variance σ².
- We may have a variance that varies with each observation, V(e_i) = σ_i², or one that varies with X

When there is heteroskedasticity, the variance around the regression line varies with the values of X.

Slide 23: Heteroskedasticity (2)

The non-constant variance around the regression line can be drawn with the spread of Y around the line changing as X moves through X1, X2, X3.

[Figure: regression line of Y on X with error distributions that widen at X1, X2, X3]
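The widening spread in the picture is easy to simulate. A sketch assuming, purely for illustration, that the error standard deviation grows with X (s.d. = 0.2·X):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(1, 10, n)
e = rng.normal(0, 0.2*x)            # error s.d. grows with X: heteroskedastic
y = 2 + 0.5*x + e

# Fit the line, then compare residual spread at small and large X
b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
resid = y - (b[0] + b[1]*x)
lo = resid[x < 4].std()
hi = resid[x > 7].std()
print(round(lo, 2), round(hi, 2))   # residual spread is larger at high X
```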

Slide 24: Serial (auto) correlation

Serial correlation can be found in time series data (and longitudinal data). Under serial correlation we have nonzero covariance terms, C(Y_h, Y_i) ≠ 0:
- Y_i and Y_h are correlated, i.e. each Y_i is not independently drawn
- this results in nonzero covariance terms

Slide 25: Serial (auto) correlation (2)

Example: using time series data, unemployment at time t is related to unemployment in the previous time period t−1. If we have a model with unemployment as the dependent variable Y_t, then:
- Y_t and Y_{t−1} are related
- e_t and e_{t−1} are also related
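Related errors of this kind can be simulated with an AR(1) process, e_t = ρ·e_{t−1} + u_t (the value ρ = 0.8 is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
T, rho = 2000, 0.8
u = rng.normal(0, 1, T)

e = np.zeros(T)
for t in range(1, T):
    e[t] = rho*e[t-1] + u[t]        # each error carries over part of the last

# First-order autocorrelation of the errors
corr = np.corrcoef(e[1:], e[:-1])[0, 1]
print(round(corr, 2))               # near rho, so C(e_t, e_{t-1}) != 0
```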

Slide 26: Non-independence

The non-independence of the independent variables is the third violation of the ordinary least squares assumptions. Remember from the OLS derivation that we minimized the sum of the squared residuals:
- we needed independence between the X variable and the error term
- if not, the values of X are not prespecified
- without independence, the estimates are biased
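The bias can be demonstrated by building X so that it is correlated with the error term; OLS then misses the true slope. A sketch (the correlation structure is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20000
e = rng.normal(0, 1, n)
x = rng.normal(0, 1, n) + 0.8*e     # X is correlated with the error term
y = 1 + 2.0*x + e                   # true slope is 2.0

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(b[1], 2))               # biased upward, well above 2.0
```

No matter how large the sample, the estimate stays biased (here toward roughly 2 + Cov(x, e)/Var(x)); this is why instrumental variables or simultaneous-equations methods are needed.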

Slide 27: Summary

Heteroskedasticity and serial correlation:
- make the estimates inefficient
- therefore make the estimated standard errors incorrect

Non-independence of the independent variables:
- makes the estimates biased
- instrumental variables and simultaneous equations are used to deal with this third type of violation

Starting next lecture we'll take a more in-depth look at the three violations of the CLRM assumptions.

