Ordinary Least Squares Estimation: A Primer
Project seminar Migration and the Labour Market, meeting of May 24, 2012

The linear regression model
1. A brief introduction to linear regression
2. How to do a regression
3. How to interpret the output in STATA
1. A brief introduction to linear regression models

To find out how one aspect (variable 1) is related to another aspect (variable 2), one can estimate a regression model. Example: what is the impact of work experience on the level of wages?

A regression measures whether and to what extent an exogenous (independent) variable affects an endogenous (dependent) variable.
A regression indicates how strongly and in which direction an independent variable influences a dependent variable. One can distinguish between a positive and a negative correlation, a high and a low correlation, and a significant and an insignificant impact.

Definition of significance: you test at which significance level (e.g. the 1%, 5% or 10% level) you can reject the hypothesis that the variable has zero impact (the so-called “null hypothesis”, H0).
The general multivariate model (with many explanatory variables):

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

$y_i$: dependent (endogenous) variable
$x_{1i}, \dots, x_{ki}$: exogenous (explanatory, independent) variables
$\beta_0$: constant, y-axis intercept (if all x = 0)
$\beta_1, \beta_2, \dots, \beta_k$: regression coefficients, the parameters of the regression
$\varepsilon_i$: residual, disturbance term (should be normally distributed, with an expected value of 0 and constant variance)
In a simple linear regression model there is, besides the constant, only one regression coefficient:

$$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$$

$y_i$: dependent (endogenous) variable
$x_{1i}$: exogenous (explanatory, independent) variable
$\beta_0$: constant, y-axis intercept (if x = 0)
$\beta_1$: regression coefficient, the parameter of the regression
$\varepsilon_i$: residual, disturbance term
Thus, in the simple linear regression model, $y_i$ is explained by the variable $x_{1i}$. In addition there is a constant $\beta_0$, and $x_{1i}$ is weighted by $\beta_1$. $\beta_1$ can be interpreted as the effect of an increase of $x_{1i}$ by one unit on the outcome $y_i$ (e.g. the wage).
In the simple linear regression model, $\varepsilon_i$ is the disturbance term. It indicates the difference between the value predicted by the regression model and “reality”, the observed value. The regression is estimated with the Ordinary Least Squares (OLS) estimator, which minimizes the sum of the squared residuals. Nevertheless, a deviation between the observed and the estimated values remains, since the relationship is stochastic and not (purely) deterministic.
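The least squares idea can be illustrated with a minimal Python sketch. It assumes made-up wage and experience data, and the function name `ols_simple` is purely illustrative; it uses the standard closed-form solution for a regression with one explanatory variable rather than any particular statistics package.

```python
# Simple OLS by hand: wage explained by years of work experience.
# The numbers are made up for illustration only.
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
wage = [11.0, 13.0, 14.0, 17.0, 20.0]


def ols_simple(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx            # slope
    beta0 = y_bar - beta1 * x_bar  # intercept
    return beta0, beta1


beta0, beta1 = ols_simple(experience, wage)

# Residuals: observed value minus the value predicted by the model.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(experience, wage)]

print("intercept:", round(beta0, 3), "slope:", round(beta1, 3))
print("sum of residuals:", round(sum(residuals), 6))
```

With these made-up numbers the slope is 2.2 and the intercept 8.4. The residuals are not zero individually, which is the stochastic deviation described above, but with a constant in the model they sum to zero.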
[Figure: regression line of $y_i$ on $x_{1i}$, with $\beta_0$, $\beta_1$ and $\varepsilon_i$ marked.]
Models with fixed effects (dummy variables): instead of a single constant, you include a separate constant (intercept) term $\alpha_i$ for each group, where $i = 1, 2, \dots, N$ is the group index (e.g. education or experience groups). In practice you include N-1 dummy variables in addition to the constant, which creates a different intercept for each group. The slope parameter, however, remains the same for all groups.
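What the group dummies do can be sketched in a few lines of Python. The data, the group labels ("low", "high") and the function name `fixed_effects` are assumptions made for illustration. The sketch relies on a standard result: OLS with one intercept per group and a common slope yields the same slope as a regression on data from which the group means have been subtracted.

```python
from collections import defaultdict


def fixed_effects(x, y, group):
    """Common slope plus one intercept per group (group-demeaning approach)."""
    by_group = defaultdict(list)
    for xi, yi, gi in zip(x, y, group):
        by_group[gi].append((xi, yi))

    # Group means of x and y.
    means = {}
    for g, obs in by_group.items():
        x_mean = sum(o[0] for o in obs) / len(obs)
        y_mean = sum(o[1] for o in obs) / len(obs)
        means[g] = (x_mean, y_mean)

    # Slope from deviations around the group means.
    s_xy = sum((xi - means[gi][0]) * (yi - means[gi][1])
               for xi, yi, gi in zip(x, y, group))
    s_xx = sum((xi - means[gi][0]) ** 2 for xi, gi in zip(x, group))
    beta = s_xy / s_xx

    # Group-specific intercepts: each group's line passes through its means.
    alpha = {g: m[1] - beta * m[0] for g, m in means.items()}
    return alpha, beta


# Made-up data: two education groups, same slope, different wage levels.
experience = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
wage = [5.0, 7.0, 9.0, 10.0, 12.0, 14.0]
education = ["low", "low", "low", "high", "high", "high"]

alpha, beta = fixed_effects(experience, wage, education)
print("slope:", beta)
print("intercepts:", alpha)
```

In this constructed example both groups share a slope of 2, while the intercept is 3 for the "low" group and 8 for the "high" group.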
2. How to do a regression with STATA

> regress depvar [indepvars] [if] [, options]

After the command you first put the dependent variable (the endogenous variable, i.e. the variable you want to explain), followed by the independent (exogenous) variables.

Example, with sqm as the dependent variable and hhinc as the independent variable:

> regress sqm hhinc
3. How to interpret the output of a regression

The regression output consists of three parts: the fit of the model, the analysis of the coefficients, and the analysis of the variance of the model.
Analysis of the coefficients

$\beta_0$ is the predicted outcome when $x_{1i} = 0$: if there is no income (hhinc = 0), the outcome variable (sqm) would be equal to the value of the coefficient $\beta_0$. $\beta_1$ describes by how much the outcome changes if hhinc increases by one unit. It can be positive or negative (i.e. a positive or a negative correlation).
In addition, the output gives us the standard error, the t-value and the p-value of each coefficient. The standard error is a measure of the precision of the parameter estimate. The t-value is the coefficient divided by the standard error. As a rule of thumb, an absolute t-value of about 2.0 indicates that the coefficient is significantly different from zero at the 5% level, and an absolute t-value of about 2.6 that it is different from zero at the 1% level.
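How the standard error and the t-value of the slope come about can be sketched as follows. This is a minimal Python illustration on made-up data; the function name `slope_inference` is hypothetical, and the formula is the classical one that assumes a constant error variance.

```python
import math


def slope_inference(x, y):
    """Return (beta1, standard error, t-value) for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta0 = y_bar - beta1 * x_bar

    # Residual variance: sum of squared residuals over degrees of freedom
    # (n minus the two estimated parameters).
    ssr = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / (n - 2)

    std_err = math.sqrt(sigma2 / s_xx)  # precision of the slope estimate
    t_value = beta1 / std_err           # coefficient divided by standard error
    return beta1, std_err, t_value


# Made-up data for illustration only.
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
wage = [11.0, 13.0, 14.0, 17.0, 20.0]

b1, se, t = slope_inference(experience, wage)
print("slope:", round(b1, 3), "std. err.:", round(se, 3), "t:", round(t, 2))
```

A small standard error relative to the coefficient gives a large t-value. Note that with only five observations the rule-of-thumb thresholds are too lenient, because the exact critical values of the t distribution are larger in small samples.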
The p-value gives the exact significance level at which the null hypothesis, i.e. that the parameter is equal to zero, can be rejected. A p-value < 0.05 indicates that the coefficient is significant at the 5% level.

Reporting: you report the coefficient and either the standard error or the t-statistic, and indicate the significance level by stars behind the coefficient, e.g. *** for a significance level of 1%, ** for 5%, * for 10%.
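The star convention can be written down as a small helper. The following Python sketch is only an illustration: the function names and the example numbers are hypothetical, and the output layout (standard error in parentheses) is one common choice among several.

```python
def significance_stars(p_value):
    """Map a p-value to the star notation: *** 1%, ** 5%, * 10%."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def format_coefficient(coef, std_err, p_value):
    """Coefficient with stars, followed by the standard error in parentheses."""
    return f"{coef:.3f}{significance_stars(p_value)} ({std_err:.3f})"


# Hypothetical estimates, not taken from any real regression.
print(format_coefficient(2.2, 0.231, 0.002))
print(format_coefficient(0.5, 0.260, 0.060))
print(format_coefficient(0.1, 0.400, 0.800))
```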
Fit of the model

A good measure of the fit of our regression is R-squared ($R^2$), which indicates how well our model explains the “real” values of y.
$R^2 = 1$: perfect fit, our model explains every single value
$R^2 = 0$: no fit, our model is rather useless
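The two boundary cases can be checked with a short Python sketch. It assumes made-up observations and a hypothetical fitted line (8.4 + 2.2 x); the function name `r_squared` is illustrative. R-squared is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def r_squared(y, y_hat):
    """Share of the variation in y that is explained by the predictions."""
    y_bar = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Made-up observations and predictions from a hypothetical fitted line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [11.0, 13.0, 14.0, 17.0, 20.0]
fitted = [8.4 + 2.2 * xi for xi in x]

r2_model = r_squared(y, fitted)
r2_perfect = r_squared(y, y)                     # predictions equal the data
r2_mean_only = r_squared(y, [sum(y) / len(y)] * len(y))  # predict the mean

print(round(r2_model, 3), r2_perfect, r2_mean_only)
```

Predicting every value exactly gives an R-squared of 1, while always predicting the mean of y gives 0. The hypothetical line lies in between, close to 1.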
The adjusted R-squared corrects the R-squared for the number of variables considered. It is a slightly better measure than the R-squared. Report either the R-squared or the adjusted R-squared. The other measures are usually not reported.
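The correction can be written out explicitly. In the usual textbook definition, with $n$ observations and $k$ explanatory variables (not counting the constant), the adjusted R-squared is given below; the symbols $n$, $k$ and $\bar{R}^2$ are notation introduced here for illustration.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

For example, with $R^2 = 0.968$, $n = 5$ and $k = 1$, this gives $\bar{R}^2 = 1 - 0.032 \cdot 4/3 \approx 0.957$. Adding a variable that explains little raises $k$ without raising $R^2$ much, so the adjusted value can fall.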
Analysis of the variance of the model

This part of the output shows the variation explained by the model and the variation left in the residual.
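For an OLS regression that includes a constant, the two sources of variation add up to the total variation of the dependent variable. A sketch of this decomposition, where $\hat{y}_i$ denotes the predicted value and $\bar{y}$ the sample mean (notation assumed here, not taken from the slides):

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{total}}
= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{model}}
+ \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{residual}},
\qquad
R^2 = \frac{\text{model}}{\text{total}}
```

The R-squared is therefore the share of the total variation that the model accounts for.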