Multiple Regression Analysis: OLS Asymptotics

1 Multiple Regression Analysis: OLS Asymptotics
So far we focused on properties of OLS that hold for any sample. Properties of OLS that hold for any sample/sample size: expected values/unbiasedness under MLR.1 – MLR.4; variance formulas under MLR.1 – MLR.5; Gauss-Markov Theorem under MLR.1 – MLR.5; exact sampling distributions/tests under MLR.1 – MLR.6. Properties of OLS that hold in large samples: consistency under MLR.1 – MLR.4; asymptotic normality/tests under MLR.1 – MLR.5, without assuming normality of the error term!

2 Multiple Regression Analysis: OLS Asymptotics
Practical consequences: In large samples, the t-distribution is close to the N(0,1) distribution. As a consequence, t-tests are valid in large samples without MLR.6. The same is true for confidence intervals and F-tests. Important: MLR.1 – MLR.5 are still necessary, esp. homoscedasticity. Asymptotic analysis of the OLS sampling errors: in $\widehat{Var}(\hat\beta_j) = \hat\sigma^2 / [SST_j (1 - R_j^2)]$, $\hat\sigma^2$ converges to $\sigma^2$, $SST_j / n$ converges to $Var(x_j)$, and $R_j^2$ converges to a fixed number.

3 Multiple Regression Analysis: OLS Asymptotics
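Asymptotic analysis of the OLS sampling errors (cont.): $\widehat{Var}(\hat\beta_j)$ shrinks with the rate $1/n$, and $se(\hat\beta_j)$ shrinks with the rate $1/\sqrt{n}$. This is why large samples are better. Example: Standard errors in a birth weight equation, compared with the same equation estimated using only the first half of the observations.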
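
The shrinking of standard errors with sample size can be illustrated with a small simulation. This is only a sketch on made-up data (not the birth weight data from the slide); the function name `ols_slope_se` and all numbers are assumptions chosen for illustration. With deliberately non-normal errors, halving the sample should inflate the slope's standard error by roughly the square root of two.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope_se(x, y):
    """Return the usual (homoskedasticity-based) OLS standard error of the slope."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2_hat = resid @ resid / (n - p)        # estimate of the error variance
    cov = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated covariance matrix
    return float(np.sqrt(cov[1, 1]))

n = 20_000
x = rng.normal(size=n)
u = rng.exponential(size=n) - 1.0   # skewed, non-normal errors with mean zero
y = 1.0 + 0.5 * x + u

se_full = ols_slope_se(x, y)
se_half = ols_slope_se(x[: n // 2], y[: n // 2])   # first half of observations only
ratio = se_half / se_full                           # expected to be near sqrt(2)
print(round(ratio, 2))
```

The ratio is not exactly the square root of two in any one sample, because both the error variance estimate and the variation in x differ slightly between the full and the half sample.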

4 Multiple Regression Analysis: Further Issues
Using quadratic functional forms. Example: Wage equation with a concave experience profile. Marginal effect of experience: Δwage/Δexper ≈ .298 − 2(.0061)·exper. The first year of experience increases the wage by some .30$, the second year by .298 − 2(.0061)(1) ≈ .29$, etc.

5 Multiple Regression Analysis: Further Issues
Wage maximum with respect to work experience: the turnaround point lies at exper = .298 / (2 × .0061) ≈ 24.4 years. Does this mean the return to experience becomes negative after 24.4 years? Not necessarily. It depends on how many observations in the sample lie right of the turnaround point. In the given example, these are about 28% of the observations. There may be a specification problem (e.g. omitted variables).
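
The arithmetic behind the marginal effect and the turnaround point can be sketched as follows. The coefficient .0061 on squared experience appears on the slides; the coefficient .298 on experience is an assumption, inferred from the stated first-year effect of about .30$ and the turnaround point of 24.4 years.

```python
b_exper = 0.298        # assumed coefficient on exper (inferred, see text above)
b_exper_sq = -0.0061   # coefficient on exper^2 as given on the slide

def marginal_effect(exper):
    """Approximate wage change from one more year of experience at a given level."""
    return b_exper + 2 * b_exper_sq * exper

# The quadratic peaks where the marginal effect is zero.
turnaround = -b_exper / (2 * b_exper_sq)

print(round(marginal_effect(0), 3), round(marginal_effect(1), 3), round(turnaround, 1))
```

The same formula, minus the linear coefficient divided by twice the quadratic coefficient, gives the turnaround point of any quadratic specification.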

6 Multiple Regression Analysis: Further Issues
Example: Effects of pollution on housing prices. Regressors include nitrogen oxide in air, distance from employment centers, and student/teacher ratio. Does this mean that, at a low number of rooms, more rooms are associated with lower prices?

7 Multiple Regression Analysis: Further Issues
Calculation of the turnaround point. The area left of the turnaround point can be ignored as it concerns only 1% of the observations. The effect is then evaluated for an increase in rooms from 5 to 6 and from 6 to 7.

8 Multiple Regression Analysis: Further Issues
Other possibilities Higher polynomials

9 Multiple Regression Analysis: Further Issues
Models with interaction terms. Interaction effects complicate interpretation of parameters. With an interaction term between the number of bedrooms and square footage, the effect of the number of bedrooms depends on the level of square footage, and the coefficient on bedrooms alone is the effect of the number of bedrooms only for a square footage of zero.
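
A worked equation may make the interpretation concrete. The variable names (price, sqrft, bdrms) and the coefficient labels are assumptions for illustration; the slide's own equation did not survive in the transcript.

```latex
price = \beta_0 + \beta_1\, sqrft + \beta_2\, bdrms + \beta_3\, sqrft \cdot bdrms + u
\quad\Longrightarrow\quad
\frac{\Delta price}{\Delta bdrms} = \beta_2 + \beta_3\, sqrft
```

In this notation, $\beta_2$ alone is the bedroom effect at $sqrft = 0$, which is rarely a value of interest.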

10 Multiple Regression Analysis: Further Issues
Reparametrization of interaction effects: center the interaction at the population means, which may be replaced by sample means. The coefficient on x2 is then the effect of x2 if all variables take on their mean values. Advantages of reparametrization: easy interpretation of all parameters; standard errors for partial effects at the mean values are available; if necessary, the interaction may be centered at other interesting values.
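
A short simulation can confirm what centering does. This is a sketch on made-up data; the helper `ols` and all coefficient values are assumptions. It fits the model once with the raw interaction and once with the interaction centered at sample means, and checks that the centered coefficient on x2 equals the raw partial effect evaluated at the mean of x1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(5.0, 2.0, size=n)
x2 = rng.normal(3.0, 1.0, size=n)
y = 1.0 + 0.4 * x1 + 0.7 * x2 + 0.2 * x1 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)

# Raw interaction: the coefficient on x2 is the effect of x2 at x1 = 0.
b = ols(np.column_stack([ones, x1, x2, x1 * x2]), y)

# Centered interaction: the coefficient on x2 is the effect of x2 at x1 = mean(x1).
m1, m2 = x1.mean(), x2.mean()
d = ols(np.column_stack([ones, x1, x2, (x1 - m1) * (x2 - m2)]), y)

effect_at_mean = b[2] + b[3] * m1
print(np.isclose(d[2], effect_at_mean), np.isclose(d[3], b[3]))
```

Both designs span the same column space, so the fit is identical; only the meaning of the main-effect coefficients changes, and the regression output then reports a standard error for the effect at the mean directly.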

11 Multiple Regression Analysis: Further Issues
More on goodness-of-fit and selection of regressors. General remarks on R-squared: a high R-squared does not imply that there is a causal interpretation; a low R-squared does not preclude precise estimation of partial effects. Adjusted R-squared: what is the ordinary R-squared supposed to measure? $R^2 = 1 - (SSR/n)/(SST/n)$ is an estimate for the population R-squared, $1 - \sigma_u^2/\sigma_y^2$.

12 Multiple Regression Analysis: Further Issues
Adjusted R-squared (cont.). A better estimate, using the correct degrees of freedom in numerator and denominator, would be $\bar{R}^2 = 1 - [SSR/(n-k-1)]/[SST/(n-1)]$. The adjusted R-squared imposes a penalty for adding new regressors. The adjusted R-squared increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value. Relationship between R-squared and adjusted R-squared: $\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$. The adjusted R-squared may even become negative.
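
The relationship between the two measures is easy to compute directly. The function name `adjusted_r2` and the example numbers are hypothetical; n is the number of observations and k the number of regressors excluding the intercept.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from the ordinary R-squared, n observations, k regressors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for positive residual degrees of freedom")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# A weak fit with many regressors relative to n gives a negative value.
print(adjusted_r2(0.01, n=20, k=5))
```

With no regressors besides the intercept the two measures coincide; the penalty grows as k rises relative to n.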

13 Multiple Regression Analysis: Further Issues
Using adjusted R-squared to choose between nonnested models. Models are nonnested if neither model is a special case of the other. A comparison between the R-squared of both models would be unfair to the first model because the first model contains fewer parameters. In the given example, even after adjusting for the difference in degrees of freedom, the quadratic model is preferred.

14 Multiple Regression Analysis: Further Issues
Comparing models with different dependent variables. R-squared or adjusted R-squared must not be used to compare models which differ in their definition of the dependent variable. Example: CEO compensation and firm performance. There is much less variation in log(salary) that needs to be explained than in salary.

15 Multiple Regression Analysis: Further Issues
Controlling for too many factors in regression analysis. In some cases, certain variables should not be held fixed. In a regression of traffic fatalities on state beer taxes (and other factors), one should not directly control for beer consumption. In a regression of family health expenditures on pesticide usage among farmers, one should not control for doctor visits. Different regressions may serve different purposes. In a regression of house prices on house characteristics, one would only include price assessments if the purpose of the regression is to study their validity; otherwise one would not include them.

16 Multiple Regression Analysis: Further Issues
Adding regressors to reduce the error variance. Adding regressors may exacerbate multicollinearity problems. On the other hand, adding regressors reduces the error variance. Variables that are uncorrelated with other regressors should be added because they reduce error variance without increasing multicollinearity. However, such uncorrelated variables may be hard to find. Example: Individual beer consumption and beer prices. Including individual characteristics in a regression of beer consumption on beer prices leads to more precise estimates of the price elasticity.

17 Multiple Regression Analysis: Further Issues
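Predicting y when log(y) is the dependent variable. Under the additional assumption that $u$ is independent of $x_1, \dots, x_k$, $E(y \mid x) = E(\exp(u)) \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$, which yields a prediction for y once $E(\exp(u))$ is estimated.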
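
One common way to turn a log-model fit into a prediction for y is to scale the exponentiated fitted values by the sample average of the exponentiated residuals (often called a smearing estimate). The sketch below uses simulated data and assumes this particular estimator; the slide's own formula is not preserved in the transcript, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
x = rng.uniform(0.0, 2.0, size=n)
u = rng.normal(scale=0.5, size=n)       # error term, independent of x
y = np.exp(1.0 + 0.3 * x + u)

# OLS of log(y) on an intercept and x.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
log_y_hat = X @ beta
resid = np.log(y) - log_y_hat

alpha0_hat = np.mean(np.exp(resid))     # estimate of E[exp(u)]
y_hat_naive = np.exp(log_y_hat)         # ignores E[exp(u)] and predicts too low
y_hat = alpha0_hat * y_hat_naive        # scaled prediction for y

print(round(alpha0_hat, 3))
```

Because OLS residuals average to zero when an intercept is included, the scaling factor is at least one, so simply exponentiating the fitted log values underpredicts y on average.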

18 Multiple Regression Analysis: Further Issues
Comparing R-squared of a logged and an unlogged specification. These are the R-squareds for the predictions of the unlogged salary variable (although the second regression is originally for logged salaries). Both R-squareds can now be directly compared.

19 Multiple Regression Analysis: Qualitative Information
Qualitative information. Examples: gender, race, industry, region, rating grade, … A way to incorporate qualitative information is to use dummy variables. They may appear as the dependent or as independent variables. A single dummy independent variable: the dummy variable is =1 if the person is a woman and =0 if the person is a man; its coefficient is the wage gain/loss if the person is a woman rather than a man (holding other things fixed).

20 Multiple Regression Analysis: Qualitative Information
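Graphical illustration: the dummy variable produces an intercept shift. Alternative interpretation of the coefficient: $\delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ)$, i.e. the difference in mean wage between men and women with the same level of education.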

21 Multiple Regression Analysis: Qualitative Information
Dummy variable trap. A model that includes an intercept together with dummies for both men and women cannot be estimated (perfect collinearity). When using dummy variables, one category always has to be omitted: either men are the base category or women are the base category. Alternatively, one could omit the intercept. Disadvantages: 1) It is more difficult to test for differences between the parameters. 2) The R-squared formula is only valid if the regression contains an intercept.
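
The perfect collinearity can be seen by checking the rank of the design matrix. This is a toy illustration with made-up observations; the array names are hypothetical.

```python
import numpy as np

female = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
male = 1.0 - female
ones = np.ones_like(female)

X_trap = np.column_stack([ones, male, female])   # intercept plus both dummies
X_ok = np.column_stack([ones, female])           # men are the base category
X_noint = np.column_stack([male, female])        # both dummies, no intercept

rank_trap = np.linalg.matrix_rank(X_trap)        # 2, although X_trap has 3 columns
print(rank_trap, np.linalg.matrix_rank(X_ok), np.linalg.matrix_rank(X_noint))
```

Since male + female equals the intercept column in every row, the three-column design has only two linearly independent columns, and the OLS coefficients are not uniquely determined.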

22 Multiple Regression Analysis: Qualitative Information
Estimated wage equation with intercept shift. Holding education, experience, and tenure fixed, women earn 1.81$ less per hour than men. Does that mean that women are discriminated against? Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

23 Multiple Regression Analysis: Qualitative Information
Comparing means of subpopulations described by dummies. Not holding other factors constant, women earn 2.51$ per hour less than men, i.e. the difference between the mean wage of men and that of women is 2.51$. Discussion: it can easily be tested whether the difference in means is significant. The wage difference between men and women is larger if no other things are controlled for; i.e. part of the difference is due to differences in education, experience and tenure between men and women.
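
That a regression on a single dummy reproduces the difference in group means can be verified numerically. The data below are simulated and are not the wage data behind the slide; all names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
female = rng.integers(0, 2, size=n).astype(float)
wage = 7.0 - 2.5 * female + rng.normal(scale=3.0, size=n)   # simulated wages

# Simple regression of wage on an intercept and the female dummy.
X = np.column_stack([np.ones(n), female])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

mean_men = wage[female == 0].mean()
mean_diff = wage[female == 1].mean() - mean_men

print(np.isclose(beta[0], mean_men), np.isclose(beta[1], mean_diff))
```

The intercept equals the mean of the base group and the dummy coefficient equals the difference in means, so the usual t-statistic on the dummy tests whether that difference is significant.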

24 Multiple Regression Analysis: Qualitative Information
Further example: Effects of training grants on hours of training. Dependent variable: hours of training per employee. Regressor of interest: dummy indicating whether the firm received a training grant. This is an example of program evaluation: treatment group (= grant receivers) vs. control group (= no grant). Is the effect of treatment on the outcome of interest causal?

25 Multiple Regression Analysis: Qualitative Information
Using dummy explanatory variables in equations for log(y). Example with a dummy indicating whether the house is of colonial style: as the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4 percent.
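
Reading a dummy coefficient in a log(y) equation as a percentage is an approximation; the exact percentage change is 100·(exp(coefficient) − 1). The sketch below assumes a coefficient of .054 on the colonial dummy, inferred from the 5.4 figure on the slide; the function name is hypothetical.

```python
import math

def exact_percent_change(coef):
    """Exact percentage change in y when a dummy in a log(y) equation goes from 0 to 1."""
    return 100.0 * (math.exp(coef) - 1.0)

colonial = exact_percent_change(0.054)   # assumed coefficient, see text above
print(round(colonial, 2))
```

For small coefficients the approximation and the exact value are close; the gap widens for larger coefficients in absolute value.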

26 Multiple Regression Analysis: Qualitative Information
Using dummy variables for multiple categories: 1) Define membership in each category by a dummy variable. 2) Leave out one category (which becomes the base category). Holding other things fixed, married women earn 19.8% less than single men (= the base category).

27 Multiple Regression Analysis: Qualitative Information
Incorporating ordinal information using dummy variables. Example: City credit ratings and municipal bond interest rates. Dependent variable: municipal bond rate. Regressor: credit rating from 0-4 (0=worst, 4=best). Entering the rating linearly would probably not be appropriate, as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies indicating whether the particular rating applies, e.g. CR1=1 if CR=1 and CR1=0 otherwise. All effects are measured in comparison to the worst rating (= base category).

28 Multiple Regression Analysis: Qualitative Information
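Interactions involving dummy variables: allowing for different slopes. In $\log(wage) = \beta_0 + \delta_0\, female + \beta_1\, educ + \delta_1\, female \cdot educ + u$, the product $female \cdot educ$ is the interaction term; $\beta_0$ = intercept men, $\beta_1$ = slope men, $\beta_0 + \delta_0$ = intercept women, $\beta_1 + \delta_1$ = slope women. Interesting hypotheses: $H_0: \delta_1 = 0$ (the return to education is the same for men and women) and $H_0: \delta_0 = 0, \delta_1 = 0$ (the whole wage equation is the same for men and women).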

29 Multiple Regression Analysis: Qualitative Information
Graphical illustration: interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women.

30 Multiple Regression Analysis: Qualitative Information
Estimated wage equation with interaction term. There is no evidence against the hypothesis that the return to education is the same for men and women. Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: the coefficient on the female dummy is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education).

31 Multiple Regression Analysis: Qualitative Information
Testing for differences in regression functions across groups. Unrestricted model (contains full set of interactions) vs. restricted model (same regression for both groups). Variables: college grade point average, standardized aptitude test score, high school rank percentile, total hours spent in college courses.

32 Multiple Regression Analysis: Qualitative Information
Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to men and women. Estimation of the unrestricted model: tested individually, the hypothesis that the interaction effects are zero cannot be rejected.

33 Multiple Regression Analysis: Qualitative Information
Joint test with F-statistic: the null hypothesis is rejected. Alternative way to compute the F-statistic in the given case: run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions. Run the regression for the restricted model and store its SSR. If the test is computed in this way, it is called the Chow test. Important: the test assumes a constant error variance across groups.
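
The two-regression route to the F-statistic can be sketched in a few lines. The data are simulated and the function names `ssr` and `chow_f` are hypothetical; the statistic uses k + 1 restrictions and n − 2(k + 1) denominator degrees of freedom, where k + 1 is the number of parameters in each group's regression. The last lines cross-check that the sum of the two separate SSRs equals the SSR of the fully interacted model.

```python
import numpy as np

rng = np.random.default_rng(4)

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X, y, group):
    """Chow F-statistic; X must contain an intercept column, group is a boolean mask."""
    n, p = X.shape                       # p = k + 1 parameters per group
    ssr_restricted = ssr(X, y)           # same regression for both groups
    ssr_unrestricted = ssr(X[group], y[group]) + ssr(X[~group], y[~group])
    numerator = (ssr_restricted - ssr_unrestricted) / p
    denominator = ssr_unrestricted / (n - 2 * p)
    return numerator / denominator

n = 300
group = rng.integers(0, 2, size=n).astype(bool)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + group * (0.8 + 0.4 * x) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

f_stat = chow_f(X, y, group)

# Cross-check: the fully interacted model has the same SSR as the two separate fits.
X_interacted = np.column_stack([X, X * group[:, None]])
ssr_interacted = ssr(X_interacted, y)
ssr_separate = ssr(X[group], y[group]) + ssr(X[~group], y[~group])
print(round(f_stat, 2), np.isclose(ssr_interacted, ssr_separate))
```

The statistic would be compared with a critical value of the F distribution with (k + 1, n − 2(k + 1)) degrees of freedom; that step is left out here to keep the sketch dependent on NumPy only.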

34 Multiple Regression Analysis: Qualitative Information
A binary dependent variable: the linear probability model. Linear regression when the dependent variable is binary, i.e. if the dependent variable only takes on the values 1 and 0, gives the linear probability model (LPM). In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y=1.

35 Multiple Regression Analysis: Qualitative Information
Example: Labor force participation of married women. Dependent variable: =1 if in labor force, =0 otherwise. Non-wife income (in thousand dollars per year). If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points. Does not look significant (but see below).

36 Multiple Regression Analysis: Qualitative Information
Example: Labor force participation of married women (cont.). Graph for nwifeinc=50, exper=5, age=30, kidslt6=1, kidsge6=0. The maximum level of education in the sample is educ=17. For the given case, this leads to a predicted probability of being in the labor force of about 50%. The predicted probability is negative at low levels of education, but this is no problem because no woman in the sample has educ < 5.

37 Multiple Regression Analysis: Qualitative Information
Disadvantages of the linear probability model: predicted probabilities may be larger than one or smaller than zero; marginal probability effects are sometimes logically impossible; the linear probability model is necessarily heteroskedastic, because the variance of a Bernoulli variable is $Var(y \mid x) = P(y=1 \mid x)\,[1 - P(y=1 \mid x)]$; heteroskedasticity-consistent standard errors need to be computed. Advantages of the linear probability model: easy estimation and interpretation; estimated effects and predictions are often reasonably good in practice.
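
A minimal sketch of an LPM with heteroskedasticity-consistent standard errors follows. It uses simulated data and the simplest (HC0) sandwich formula, which is one of several robust variants and is chosen here as an assumption; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
x = rng.uniform(-2.0, 2.0, size=n)
p_true = 0.5 + 0.2 * x                       # true P(y=1|x), between 0.1 and 0.9
y = (rng.uniform(size=n) < p_true).astype(float)

# OLS on the binary outcome: the linear probability model.
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
p_hat = X @ beta                             # fitted probabilities
resid = y - p_hat

# Usual OLS standard errors (assume homoskedasticity, which the LPM violates).
sigma2_hat = resid @ resid / (n - X.shape[1])
se_usual = np.sqrt(np.diag(sigma2_hat * XtX_inv))

# HC0 sandwich estimator: (X'X)^-1 X' diag(resid^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Share of fitted probabilities falling outside the unit interval.
share_outside = float(np.mean((p_hat < 0) | (p_hat > 1)))
print(beta.round(3), se_usual.round(4), se_robust.round(4), share_outside)
```

In applied work the two sets of standard errors are often close, but only the robust ones are justified given the built-in heteroskedasticity.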

38 Multiple Regression Analysis: Qualitative Information
More on policy analysis and program evaluation. Example: Effect of job training grants on worker productivity. Dependent variable: percentage of defective items. Regressor of interest: =1 if firm received training grant, =0 otherwise. Treatment group: grant receivers; control group: firms that received no grant. No apparent effect of grant on productivity. Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. It might be the case that firms with less productive workers saw an opportunity to improve productivity and applied first.

39 Multiple Regression Analysis: Qualitative Information
Self-selection into treatment as a source of endogeneity. In the given and in related examples, the treatment status is probably related to other characteristics that also influence the outcome. The reason is that subjects self-select into treatment depending on their individual characteristics and prospects. Experimental evaluation: in experiments, assignment to treatment is random. In this case, causal effects can be inferred using a simple regression, because the dummy indicating whether or not there was treatment is unrelated to other factors affecting the outcome.

40 Multiple Regression Analysis: Qualitative Information
Further example of an endogenous dummy regressor. Are nonwhite customers discriminated against? Variables: dummy indicating whether the loan was approved, race dummy, credit rating. It is important to control for other characteristics that may be important for loan approval (e.g. profession, unemployment). Omitting important characteristics that are correlated with the nonwhite dummy will produce spurious evidence for discrimination.

