Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques

1 Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr

2 Introduction to Regression  Typically, the social scientist deals with multiple, complex webs of interactions between variables. An immediate and appealing extension of simple linear regression is to enlarge the set of explanatory variables.  Multiple regression includes several explanatory variables in the empirical model.

4 To minimize the sum of squared errors OLS chooses the parameters β₀, β₁, …, βₖ so as to minimize the sum of squared errors: min Σᵢ (yᵢ − β₀ − β₁x₁ᵢ − … − βₖxₖᵢ)²

5 Multivariate Least Square Estimator Usually, the multivariate model is written in matrix notation: y = Xβ + u With the following least square solution: β̂ = (X′X)⁻¹X′y
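The matrix solution on this slide can be sketched numerically. A minimal sketch in plain Python (no linear algebra library), assuming a model with an intercept and one regressor, so that X′X is 2×2 and can be inverted by hand; the data are illustrative, chosen so that y = 1 + 2x exactly:

```python
# Sketch of the matrix OLS solution beta_hat = (X'X)^{-1} X'y.
# Illustrative data: y = 1 + 2x exactly, so OLS recovers (1, 2).

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    # Inverse of a 2x2 matrix via the adjugate formula.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols(X, y):
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    beta = matmul(inv2(XtX), Xty)
    return [b[0] for b in beta]

X = [[1, 0], [1, 1], [1, 2]]   # first column of ones: the intercept
y = [1, 3, 5]                  # y = 1 + 2x exactly
print(ols(X, y))               # -> [1.0, 2.0]
```

With more regressors the same formula applies; one would simply replace the hand-coded 2×2 inverse with a general routine.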

6 Assumption OLS 1 Linearity The model is linear in its parameters. It is possible to operate non-linear transformations of the variables (e.g. the log of x), but not of the parameters: OLS cannot estimate a model that is non-linear in its parameters.

7 Assumption OLS 2 Random Sampling The n observations are a random sample of the whole population. There is no selection bias in the sample, so the results pertain to the whole population. All observations are independent from one another (no serial or cross-sectional correlation).

8 Assumption OLS 3 No Perfect Collinearity No independent variable is constant: each has a variance which, together with the variance of the dependent variable, can be used to compute the parameters. There are no exact linear relationships amongst the independent variables.
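Why this assumption matters can be seen directly in the matrix formula: under perfect collinearity X′X is singular, so (X′X)⁻¹ does not exist. A minimal sketch with assumed illustrative data, where one regressor is an exact multiple of another:

```python
# Sketch: perfect collinearity makes X'X singular (determinant zero),
# so the OLS solution (X'X)^{-1} X'y cannot be computed.
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

x1 = [1, 2, 3, 4]
x2 = [2, 4, 6, 8]          # x2 = 2 * x1: an exact linear relationship
XtX = [
    [sum(a * a for a in x1), sum(a * b for a, b in zip(x1, x2))],
    [sum(a * b for a, b in zip(x1, x2)), sum(b * b for b in x2)],
]
print(det2(XtX))           # -> 0, X'X cannot be inverted
```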

9 Assumption OLS 4 Zero Conditional Mean The error term u has an expected value of zero given any values of the independent variables (IVs). In this case, all independent variables are exogenous; otherwise, at least one IV suffers from an endogeneity problem.

10 Sources of endogeneity  Wrong specification of the model  Omitted variable correlated with one RHS variable  Measurement errors in the RHS variables  Mutual causation between LHS and RHS  Simultaneity

11 Assumption OLS 5 Homoskedasticity The variance of the error term, u, conditional on RHS, is the same for all values of RHS. Otherwise we speak of heteroskedasticity.

12 Assumption OLS 6 Normality of error term The error term is independent of all RHS variables and follows a normal distribution with zero mean and constant variance σ².

13 Assumptions OLS OLS1 Linearity OLS2 Random Sampling OLS3 No perfect Collinearity OLS4 Zero Conditional Mean OLS5 Homoskedasticity OLS6 Normality of error term

14 Theorem 1  Under OLS1 – OLS4: Unbiasedness of OLS. The expected value of the estimated parameters equals the true unknown parameter values: E(β̂) = β.

15 Theorem 2  Under OLS1 – OLS5: Variance of the OLS estimate. The variance of the OLS estimator is Var(β̂ⱼ) = σ² / [SSTⱼ(1 − R²ⱼ)], where R²ⱼ is the R-squared from regressing xⱼ on all other independent variables and SSTⱼ is the total sum of squares of xⱼ. But how can we measure σ²?
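The term 1/(1 − R²ⱼ) in Theorem 2 is the variance inflation factor: the more correlated xⱼ is with the other regressors, the larger the variance of its estimate. A minimal sketch, assuming the two-regressor case where R²ⱼ reduces to the squared correlation between the two regressors; the data are illustrative, not from the slides:

```python
# Sketch of the variance inflation term 1/(1 - R2_j) from Theorem 2,
# for two regressors, where R2_j is the squared correlation of x1 and x2.
def r_squared(x, z):
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    vx = sum((a - mx) ** 2 for a in x)
    vz = sum((b - mz) ** 2 for b in z)
    return cov * cov / (vx * vz)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]        # correlated with x1, but not perfectly
r2 = r_squared(x1, x2)
vif = 1 / (1 - r2)          # variance inflation factor
print(round(r2, 3), round(vif, 3))
```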

16 Theorem 3  Under OLS1 – OLS5: the standard error of the regression is defined as σ̂ = √(SSR / (n − k − 1)), where SSR is the sum of squared residuals, n the number of observations and k the number of regressors. This is also called the standard error of the estimate or the root mean squared error (RMSE).
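Theorem 3 answers the question left open by Theorem 2: σ² is estimated from the residuals. A minimal sketch in Python with illustrative data, fitting a one-regressor model (k = 1) by simple OLS and then computing the RMSE:

```python
import math

# Sketch of Theorem 3: sigma_hat = sqrt(SSR / (n - k - 1)),
# with illustrative data and coefficients from simple one-regressor OLS.
x = [0, 1, 2, 3]
y = [1, 2, 4, 5]
n, k = len(x), 1
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)          # sum of squared residuals
rmse = math.sqrt(ssr / (n - k - 1))           # degrees of freedom: n - k - 1
print(round(rmse, 4))                         # -> 0.3162
```

Note the degrees-of-freedom correction n − k − 1: one degree is lost per estimated slope plus one for the intercept.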

17 Standard Error of Each Parameter  Combining Theorems 2 and 3 yields the standard error of each parameter: se(β̂ⱼ) = σ̂ / √(SSTⱼ(1 − R²ⱼ)).

18 Theorem 4  Under assumptions OLS1 – OLS5, the OLS estimators are the best linear unbiased estimators (BLUE) of the true parameters β. Assumptions OLS1 – OLS5 are known as the Gauss-Markov assumptions; the Gauss-Markov theorem stipulates that under OLS1-5, OLS is the best linear estimation method  The estimates are unbiased (OLS1-4)  The estimates have the smallest variance (OLS5)

19 Theorem 5  Under assumptions OLS1 – OLS6, the OLS estimates follow a t distribution: (β̂ⱼ − βⱼ) / se(β̂ⱼ) ~ t(n − k − 1)

20 Extension of Theorem 5: Inference  We can define the confidence interval of β at 95%: β̂ⱼ ± t(α/2) · se(β̂ⱼ). If the 95% CI does not include 0, then βⱼ is significantly different from 0.

21 Student t Test for H 0 : β j =0  We are also in a position to draw inferences on β j  H 0 : β j = 0  H 1 : β j ≠ 0 The test statistic is t = β̂ⱼ / se(β̂ⱼ). Rule of decision Accept H 0 if | t | < t α/2 Reject H 0 if | t | ≥ t α/2
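The decision rule above can be sketched as a small function. The critical value is an assumption here (roughly 2.0 for a 5% two-sided test with large degrees of freedom); the coefficient and standard error are illustrative numbers, not taken from the slides' data:

```python
# Sketch of the t-test decision rule for H0: beta_j = 0.
# t_crit ~ 2.0 is assumed (5% two-sided test, large df).
def t_test(beta_hat, se, t_crit=2.0):
    t = beta_hat / se
    return "reject H0" if abs(t) >= t_crit else "accept H0"

print(t_test(1.4, 0.3))    # |t| ~ 4.67 -> reject H0
print(t_test(0.5, 0.4))    # |t| = 1.25 -> accept H0
```

For an exact critical value at a given α and degrees of freedom, one would look up (or compute) the quantile of the t(n − k − 1) distribution instead of the rule-of-thumb 2.0.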

22 Summary OLS1 Linearity OLS2 Random Sampling OLS3 No perfect Collinearity OLS4 Zero Conditional Mean OLS5 Homoskedasticity OLS6 Normality of error term T1 Unbiasedness T2-T4 BLUE T5 β ~ t

23 The knowledge production function Application 1: Seminal model

24 The knowledge production function Application 2: Changing specification

25 The knowledge production function Application 3: Adding variables

26 The knowledge production function Application 4: Dummy variables


28 [Figure: Patent (lnpatent) plotted against Size (lnasset)]

29 The knowledge production function Application 5: Interacting Variables


31 Application 5: Interacting variables [Figure: Patent (lnpatent) plotted against Size (lnasset)]

