Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques

1 Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr

2 Introduction to Regression  Typically, the social scientist deals with multiple, complex webs of interactions between variables. An immediate and appealing extension of simple linear regression is to enlarge the set of explanatory variables.  Multiple regression includes several explanatory variables in the empirical model.

4 To minimize the sum of squared errors OLS chooses the parameters β₀, β₁, …, βₖ so as to minimize the sum of squared errors: min Σᵢ (yᵢ − β₀ − β₁x₁ᵢ − … − βₖxₖᵢ)²

5 Multivariate Least Square Estimator Usually, the multivariate model is written in matrix notation: y = Xβ + u With the following least square solution: β̂ = (X′X)⁻¹X′y
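The matrix solution on this slide can be sketched numerically. A minimal sketch in plain Python (no linear algebra library), assuming a model with an intercept and one regressor, so that X′X is 2×2 and can be inverted by hand; the data are illustrative, chosen so that y = 1 + 2x exactly:

```python
# Sketch of the matrix OLS solution beta_hat = (X'X)^{-1} X'y.
# Illustrative data: y = 1 + 2x exactly, so OLS recovers (1, 2).

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    # Inverse of a 2x2 matrix via the adjugate formula.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols(X, y):
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    beta = matmul(inv2(XtX), Xty)
    return [b[0] for b in beta]

X = [[1, 0], [1, 1], [1, 2]]   # first column of ones: the intercept
y = [1, 3, 5]                  # y = 1 + 2x exactly
print(ols(X, y))               # -> [1.0, 2.0]
```

With more regressors the same formula applies; one would simply replace the hand-coded 2×2 inverse with a general routine.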

6 Assumption OLS 1 Linearity The model is linear in its parameters. It is possible to operate non-linear transformations of the variables (e.g. the log of x), but not of the parameters: OLS cannot estimate a model that is non-linear in its parameters.

7 Assumption OLS 2 Random Sampling The n observations are a random sample of the whole population. There is no selection bias in the sample, so the results pertain to the whole population. All observations are independent from one another (no serial or cross-sectional correlation).

8 Assumption OLS 3 No Perfect Collinearity No independent variable is constant: each has a variance which, together with the variance of the dependent variable, can be used to compute the parameters. There are no exact linear relationships amongst the independent variables.
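Why this assumption matters can be seen directly in the matrix formula: under perfect collinearity X′X is singular, so (X′X)⁻¹ does not exist. A minimal sketch with assumed illustrative data, where one regressor is an exact multiple of another:

```python
# Sketch: perfect collinearity makes X'X singular (determinant zero),
# so the OLS solution (X'X)^{-1} X'y cannot be computed.
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

x1 = [1, 2, 3, 4]
x2 = [2, 4, 6, 8]          # x2 = 2 * x1: an exact linear relationship
XtX = [
    [sum(a * a for a in x1), sum(a * b for a, b in zip(x1, x2))],
    [sum(a * b for a, b in zip(x1, x2)), sum(b * b for b in x2)],
]
print(det2(XtX))           # -> 0, X'X cannot be inverted
```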

9 Assumption OLS 4 Zero Conditional Mean The error term u has an expected value of zero given any values of the independent variables (IVs). In this case, all independent variables are exogenous; otherwise, at least one IV suffers from an endogeneity problem.

10 Sources of endogeneity  Wrong specification of the model  Omitted variable correlated with one RHS variable  Measurement errors in the RHS variables  Mutual causation between LHS and RHS  Simultaneity

11 Assumption OLS 5 Homoskedasticity The variance of the error term, u, conditional on RHS, is the same for all values of RHS. Otherwise we speak of heteroskedasticity.

12 Assumption OLS 6 Normality of error term The error term is independent of all RHS variables and follows a normal distribution with zero mean and constant variance σ².

13 Assumptions OLS OLS1 Linearity OLS2 Random Sampling OLS3 No perfect Collinearity OLS4 Zero Conditional Mean OLS5 Homoskedasticity OLS6 Normality of error term

14 Theorem 1  Under OLS1 – OLS4: Unbiasedness of OLS. The expected value of the estimated parameters equals the true unknown parameter values: E(β̂) = β.

15 Theorem 2  Under OLS1 – OLS5: Variance of the OLS estimate. The variance of the OLS estimator is Var(β̂ⱼ) = σ² / [SSTⱼ(1 − R²ⱼ)], where R²ⱼ is the R-squared from regressing xⱼ on all other independent variables and SSTⱼ is the total sum of squares of xⱼ. But how can we measure σ²?
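The term 1/(1 − R²ⱼ) in Theorem 2 is the variance inflation factor: the more correlated xⱼ is with the other regressors, the larger the variance of its estimate. A minimal sketch, assuming the two-regressor case where R²ⱼ reduces to the squared correlation between the two regressors; the data are illustrative, not from the slides:

```python
# Sketch of the variance inflation term 1/(1 - R2_j) from Theorem 2,
# for two regressors, where R2_j is the squared correlation of x1 and x2.
def r_squared(x, z):
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    vx = sum((a - mx) ** 2 for a in x)
    vz = sum((b - mz) ** 2 for b in z)
    return cov * cov / (vx * vz)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]        # correlated with x1, but not perfectly
r2 = r_squared(x1, x2)
vif = 1 / (1 - r2)          # variance inflation factor
print(round(r2, 3), round(vif, 3))
```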

16 Theorem 3  Under OLS1 – OLS5: the standard error of the regression is defined as σ̂ = √(SSR / (n − k − 1)), where SSR is the sum of squared residuals, n the number of observations and k the number of regressors. This is also called the standard error of the estimate or the root mean squared error (RMSE).
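Theorem 3 answers the question left open by Theorem 2: σ² is estimated from the residuals. A minimal sketch in Python with illustrative data, fitting a one-regressor model (k = 1) by simple OLS and then computing the RMSE:

```python
import math

# Sketch of Theorem 3: sigma_hat = sqrt(SSR / (n - k - 1)),
# with illustrative data and coefficients from simple one-regressor OLS.
x = [0, 1, 2, 3]
y = [1, 2, 4, 5]
n, k = len(x), 1
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)          # sum of squared residuals
rmse = math.sqrt(ssr / (n - k - 1))           # degrees of freedom: n - k - 1
print(round(rmse, 4))                         # -> 0.3162
```

Note the degrees-of-freedom correction n − k − 1: one degree is lost per estimated slope plus one for the intercept.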

17 Standard Error of Each Parameter  Combining Theorems 2 and 3 yields the standard error of each parameter: se(β̂ⱼ) = σ̂ / √(SSTⱼ(1 − R²ⱼ)).

18 Theorem 4  Under assumptions OLS1 – OLS5, the OLS estimators are the best linear unbiased estimators (BLUE) of the true parameters β. Assumptions OLS1 – OLS5 are known as the Gauss-Markov assumptions; the Gauss-Markov theorem stipulates that under OLS1-5, OLS is the best linear estimation method  The estimates are unbiased (OLS1-4)  The estimates have the smallest variance (OLS5)

19 Theorem 5  Under assumptions OLS1 – OLS6, the OLS estimates follow a t distribution: (β̂ⱼ − βⱼ) / se(β̂ⱼ) ~ t(n − k − 1)

20 Extension of Theorem 5: Inference  We can define the confidence interval of β at 95%: β̂ⱼ ± t(α/2) · se(β̂ⱼ). If the 95% CI does not include 0, then βⱼ is significantly different from 0.

21 Student t Test for H 0 : β j =0  We are also in a position to draw inferences on β j  H 0 : β j = 0  H 1 : β j ≠ 0 The test statistic is t = β̂ⱼ / se(β̂ⱼ). Rule of decision Accept H 0 if | t | < t α/2 Reject H 0 if | t | ≥ t α/2
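The decision rule above can be sketched as a small function. The critical value is an assumption here (roughly 2.0 for a 5% two-sided test with large degrees of freedom); the coefficient and standard error are illustrative numbers, not taken from the slides' data:

```python
# Sketch of the t-test decision rule for H0: beta_j = 0.
# t_crit ~ 2.0 is assumed (5% two-sided test, large df).
def t_test(beta_hat, se, t_crit=2.0):
    t = beta_hat / se
    return "reject H0" if abs(t) >= t_crit else "accept H0"

print(t_test(1.4, 0.3))    # |t| ~ 4.67 -> reject H0
print(t_test(0.5, 0.4))    # |t| = 1.25 -> accept H0
```

For an exact critical value at a given α and degrees of freedom, one would look up (or compute) the quantile of the t(n − k − 1) distribution instead of the rule-of-thumb 2.0.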

22 Summary OLS1 Linearity OLS2 Random Sampling OLS3 No perfect Collinearity OLS4 Zero Conditional Mean OLS5 Homoskedasticity OLS6 Normality of error term T1 Unbiasedness T2-T4 BLUE T5 β ~ t

23 The knowledge production function Application 1: Seminal model

24 The knowledge production function Application 2: Changing specification

25 The knowledge production function Application 3: Adding variables

26 The knowledge production function Application 4: Dummy variables


28 [Figure: Patent (lnpatent) plotted against Size (lnasset)]

29 The knowledge production function Application 5: Interacting Variables


31 Application 5: Interacting variables [Figure: Patent (lnpatent) plotted against Size (lnasset)]

