Chapter 7: The Normality Assumption and Inference with OLS

Econometrics Econ. 504

I. Role of Normality Assumption

[Figure: "Different Distributions"; the slide's images and formulas are not preserved in this transcript.]

It is assumed that the unobserved factors are normally distributed around the population regression function. The form and the variance of this distribution do not depend on any of the explanatory variables.

Since this normality assumption is so crucial, we add it to the CLRM assumptions. It is stronger than any of the previous assumptions, as it already contains them by default (zero conditional mean of u, and homoskedasticity).

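The formula that accompanied this slide is not preserved in the transcript; the standard statement of the assumption (MLR.6 in Wooldridge, on which these slides appear to be based) is:

```latex
u \;\sim\; \mathrm{Normal}(0,\,\sigma^2), \quad \text{independent of } x_1, x_2, \ldots, x_k .
```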

Remember, the normality assumption is not required to perform OLS estimation; it is needed only when you want to construct confidence intervals and/or perform hypothesis tests with the OLS estimates.

II. Estimations under the Normality Assumption

The error term contains the influence of many different forces (random variables) that affect the dependent variable (Y) and are not captured by the independent variables (Xs). The normality of u is then justified by the central limit theorem: a sum of random variables is approximately normally distributed as long as many random variables are present and the influence of any single one of them is small.

In some applications, the assumption of normality for the error term is difficult to justify, for example when Y takes limited or skewed values (e.g., wages, prices); in that case you can use log values to obtain a distribution that is approximately normal. Under normality, OLS is the best unbiased estimator. Also, for the purposes of statistical inference, the assumption of normality can be replaced by a large sample size.

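A minimal sketch of the log-transformation point, using simulated wage-like data (the distribution and its parameters are illustrative assumptions, not the slides' data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
wage = rng.lognormal(mean=2.5, sigma=0.7, size=1000)  # right-skewed, wage-like values

print(stats.skew(wage))          # strongly right-skewed in levels
print(stats.skew(np.log(wage)))  # approximately symmetric (normal) after taking logs
```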

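The formula slides for this section are likewise not preserved. The standard result they present (Wooldridge, Theorem 4.1) is that, under the CLM assumptions, the OLS estimators are normally distributed conditional on the explanatory variables:

```latex
\hat{\beta}_j \;\sim\; \mathrm{Normal}\!\left(\beta_j,\ \mathrm{Var}(\hat{\beta}_j)\right),
\qquad
\frac{\hat{\beta}_j - \beta_j}{\mathrm{sd}(\hat{\beta}_j)} \;\sim\; \mathrm{Normal}(0,1).
```
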
III. OLS Standard Errors & the t-distribution

Because the error variance must be estimated, the appropriate probability distribution becomes the t-distribution instead of the standard normal. Note that the t-distribution is close to the standard normal distribution when n − k − 1 is large.

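The slide's formula can be reconstructed from context: replacing the unknown σ in sd(β̂j) with its estimate σ̂ yields the standard error, and the standardized estimator then follows a t-distribution:

```latex
\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \;\sim\; t_{\,n-k-1},
```

where n − k − 1 is the degrees of freedom (observations minus estimated parameters).
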
Testing the Significance of an Individual Regression Coefficient: Once you estimate a regression and have your OLS estimates, you need to know what conclusions can be drawn from your results. What do the selected variables suggest about your hypothesized relationship? What is the probability that results like yours arose purely by chance? To address these questions, you need to test the individual significance of your regression coefficients.

A regression coefficient is statistically significant (meaning the results did not happen just by chance) if you can provide solid evidence that the true parameter value is not zero. To provide such evidence, you need to show that it is highly unlikely that the (X) variable associated with that coefficient has no effect on your dependent (Y) variable.

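A minimal sketch of testing individual coefficients in Python with statsmodels; the data are simulated, and every name and number below is an illustrative assumption, not the slides' example:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)  # x2 truly has no effect

X = sm.add_constant(np.column_stack([x1, x2]))  # add intercept column
results = sm.OLS(y, X).fit()

print(results.tvalues)  # t-statistics for H0: beta_j = 0 (coefficient / standard error)
print(results.pvalues)  # two-sided p-values from the t distribution with n - k - 1 df
```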

The statistical significance of a coefficient does not determine the importance of the variable or the magnitude of its effect. Keep in mind that statistical significance provides only evidence of a positive or negative effect (in the case of a one-sided test). For magnitude and importance, you need to look at the value of the coefficient itself.

IV. Testing of Significance Approach

You can report the statistical significance of your coefficients (the result of your hypothesis test) with either the test of significance approach or the confidence interval approach. The test of significance approach gives you a test statistic that is used to judge the plausibility of your hypothesis. The confidence interval approach provides a range of plausible values for the parameter.

First: the Confidence Interval Approach. This provides a range (a lower and an upper limit) of values that would contain the true parameter value. If you are testing a hypothesis, the position of your estimated interval relative to the hypothesized value of the parameter determines whether you reject or do not reject the null hypothesis.

[Figure: a confidence interval covering probability 1 − α between the lower and upper limits, flanked by two critical regions of α/2 each.] If the hypothesized value of your parameter of interest falls in a critical region (outside the confidence interval), you reject the null hypothesis; if it falls inside the confidence interval, you fail to reject it.

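A minimal sketch of the confidence-interval computation. The coefficient and standard error are borrowed from the college GPA example later in this chapter; the degrees of freedom are an assumption, since the slide does not state the sample size:

```python
from scipy import stats

beta_hat, se, df = 0.421, 0.094, 137  # df assumed for illustration
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
lower, upper = beta_hat - t_crit * se, beta_hat + t_crit * se

# Reject H0: beta_j = 0 at the 5% level exactly when 0 lies outside [lower, upper].
print(lower, upper)
```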

Testing against one-sided alternatives (greater than zero): Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is too large (i.e., larger than a critical value). Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases. Conclusion: reject if the t-statistic is greater than 1.701.

Example (a wage equation; the slide's estimates are not preserved in the transcript): the critical values are for the 5% and the 1% significance levels (the conventional significance levels). The null hypothesis is rejected because the t-statistic exceeds the critical value: the effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.

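The critical values quoted on these slides (and on the two-sided slide below) can be reproduced with scipy; the degrees of freedom for the one-sided cases come from the slides, while df = 25 for the two-sided case is an assumption, since that slide does not state it:

```python
from scipy import stats

print(stats.t.ppf(0.95, 28))   # upper one-sided, 5% level, 28 df -> about 1.701
print(stats.t.ppf(0.05, 18))   # lower one-sided, 5% level, 18 df -> about -1.734
print(stats.t.ppf(0.975, 25))  # two-sided, 5% level, 25 df (assumed) -> about 2.06
```
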
Testing against one-sided alternatives (less than zero): Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is too small (i.e., smaller than a critical value). Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, this is the point of the t-distribution with 18 degrees of freedom below which 5% of the cases lie. Conclusion: reject if the t-statistic is less than −1.734.

Example:

Math test = 2.274 + 0.00046 Tech-W + 0.048 staff − 0.0002 enroll
            (6.113)  (0.00010)        (0.040)      (0.00022)

(standard errors in parentheses; the slide leaves the t-statistic and degrees of freedom blank). Critical values are for the 5% and the 15% significance levels. The null hypothesis is not rejected because the t-statistic is not smaller than the critical value: one cannot reject the hypothesis that school size has no effect on student performance (not even at the lax 15% significance level).

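The slide leaves the t-statistic blank; it follows from the reported coefficient and standard error by simple division (no additional assumptions):

```python
t_enroll = -0.0002 / 0.00022
print(t_enroll)  # about -0.91: not smaller than the critical value, so H0 is not rejected
```
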
Testing against two-sided alternatives: Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient is too large. Construct the critical values so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, these are the points of the t-distribution such that 5% of the cases lie in the two tails combined. Conclusion: reject if the t-statistic is less than −2.06 or greater than 2.06, i.e., if |t| > 2.06.

Example:

Coll_GPA = 1.39 + 0.421 hsGPA + 0.015 ACT − 0.083 skipped
           (0.33)  (0.094)      (0.011)     (0.026)

(standard errors in parentheses). The effects of hsGPA and skipped are significantly different from zero at the 1% significance level. The effect of ACT is not significantly different from zero, not even at the 10% significance level.

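The t-statistics behind these conclusions follow directly from the reported coefficients and standard errors:

```python
for name, coef, se in [("hsGPA", 0.421, 0.094),
                       ("ACT", 0.015, 0.011),
                       ("skipped", -0.083, 0.026)]:
    print(name, coef / se)  # roughly 4.48, 1.36, and -3.19 respectively
```
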
General Conclusion for Testing: If a regression coefficient is significantly different from zero in a two-sided test, the corresponding variable is said to be "statistically significant." If the number of degrees of freedom is large enough (greater than 120) for the normal approximation to apply, the following rules of thumb hold: |t| > 1.645 means statistically significant at the 10% level; |t| > 1.96, statistically significant at the 5% level; |t| > 2.576, statistically significant at the 1% level.

If a variable is statistically significant, discuss the magnitude of its coefficient to get an idea of its economic or practical importance. The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant: with a small enough standard error, even a tiny effect can be estimated precisely. If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified.

If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may consider dropping it from the regression. If the sample size is small, however, effects might be imprecisely estimated, so the case for dropping insignificant variables is weaker. Also, if on the basis of a test of significance you retain H0, do not say you "accept" H0; it is preferable to say you "do not reject" it.

Computing p-values for t-tests: If the significance level is made smaller and smaller, there will be a point at which the null hypothesis can no longer be rejected. The reason is that, by lowering the significance level, one guards ever more strictly against the error of rejecting a correct H0. The p-value is the smallest level of significance at which the null hypothesis can still be rejected.

In other words, the smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test. A small p-value is evidence against the null hypothesis, because one would reject the null hypothesis even at small significance levels. A large p-value is evidence in favor of the null hypothesis. P-values are more informative than tests at fixed significance levels.

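A minimal sketch of the p-value computation for a two-sided t-test; the t-statistic and degrees of freedom are illustrative assumptions:

```python
from scipy import stats

t_stat, df = 1.36, 137  # e.g., the ACT coefficient above, with df assumed
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided: tail mass beyond |t| in both tails
print(p_value)  # about 0.18, so H0 cannot be rejected at any conventional level
```
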
V. Testing Multiple Linear Restrictions: The F-Test

1. Joint Significance Test (testing exclusion restrictions). Model: the salary of a major league baseball player is regressed on years in the league, average number of games per year, batting average, home runs per year, and runs batted in per year. Test whether the performance measures (batting average, home runs, and runs batted in) have no effect and can be excluded from the regression.

Estimation of the unrestricted model: none of these performance variables is statistically significant when tested individually. Idea: how would the model fit change if these variables were dropped from the regression?

Estimation of the restricted model: the sum of squared residuals necessarily increases, but is the increase statistically significant? The relative increase in the sum of squared residuals when going from H1 to H0 follows an F-distribution (if the null hypothesis H0 is correct), with the number of restrictions as the numerator degrees of freedom.

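The slide's formula is not preserved; the standard F-statistic for testing q exclusion restrictions (as in Wooldridge) is:

```latex
F \;=\; \frac{(\mathit{SSR}_r - \mathit{SSR}_{ur})/q}{\mathit{SSR}_{ur}/(n-k-1)}
\;\sim\; F_{q,\,n-k-1} \quad \text{under } H_0,
```

where SSR_r and SSR_ur are the sums of squared residuals of the restricted and unrestricted models, and q is the number of restrictions.
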
Rejection rule: an F-distributed variable only takes on positive values. This corresponds to the fact that the sum of squared residuals can only increase when one moves from H1 to H0. Choose the critical value so that the null hypothesis is rejected in, for example, 5% of the cases even though it is true.

Test decision in the example: given the number of restrictions to be tested and the degrees of freedom of the unrestricted model, the null hypothesis is overwhelmingly rejected (even at very small significance levels). Discussion: the three variables are "jointly significant"; they were not significant when tested individually; the likely reason is multicollinearity between them.

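A minimal sketch of the F-test computation; the SSR values and degrees of freedom below are hypothetical placeholders, since the slide's numbers are not preserved:

```python
from scipy import stats

def f_test(ssr_r, ssr_ur, q, df_ur):
    """Return the F statistic for q exclusion restrictions and its p-value."""
    f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
    p_value = stats.f.sf(f_stat, q, df_ur)
    return f_stat, p_value

# Hypothetical numbers: 3 restrictions, n - k - 1 = 347 degrees of freedom.
print(f_test(ssr_r=198.3, ssr_ur=183.2, q=3, df_ur=347))  # large F, tiny p-value
```
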
2. Overall Significance Test: The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable; the restricted model is a regression on a constant only. This test of overall significance is reported in most regression packages, and the null hypothesis is usually overwhelmingly rejected.
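The final formula slide is also not preserved; the standard overall-significance F statistic, with the regression on a constant as the restricted model, can be written in terms of R-squared:

```latex
F \;=\; \frac{R^2/k}{(1-R^2)/(n-k-1)} \;\sim\; F_{k,\,n-k-1}
\quad \text{under } H_0:\ \beta_1 = \beta_2 = \cdots = \beta_k = 0 .
```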