Hypothesis Tests and Confidence Intervals in Multiple Regression


Outline
- Hypothesis Testing
- Joint Hypotheses
- Single Restriction Tests
- Test Score Data

Hypothesis Tests and Confidence Intervals for a Single Coefficient
By the CLT, (β̂1 − β1,0)/SE(β̂1) is approximately distributed N(0, 1). Thus hypotheses on β1 can be tested using the usual t-statistic, t = (β̂1 − β1,0)/SE(β̂1), and 95% confidence intervals are constructed as β̂1 ± 1.96·SE(β̂1). So too for β2, ..., βk. Note that β̂1 and β̂2 are generally not independently distributed - so neither are their t-statistics (more on this later).
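As a minimal sketch of this in STATA (the variable names testscr, str, and el_pct from the California data set are assumed here), the stored results _b[] and _se[] reproduce the t-statistic and confidence interval by hand:

regress testscr str el_pct, robust
* t-statistic for H0: the coefficient on str is zero
display _b[str] / _se[str]
* 95% confidence interval for the coefficient on str
display _b[str] - 1.96*_se[str]
display _b[str] + 1.96*_se[str]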

Example: The California class size data
The coefficient on STR in (2) is the effect on TestScore of a unit change in STR, holding constant the percentage of English learners in the district. The coefficient on STR falls by about one-half when PctEL is added. The 95% confidence interval for the coefficient on STR in (2) is −1.10 ± 1.96 × 0.43 = (−1.95, −0.26). The t-statistic for STR is t = −1.10/0.43 = −2.54, so we reject the hypothesis that the coefficient is zero at the 5% significance level.
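As a quick arithmetic check, using plain display statements and the rounded estimate and standard error from the slide:

display -1.10 - 1.96*0.43
display -1.10 + 1.96*0.43

These print roughly −1.94 and −0.26; any small difference from the interval reported above reflects rounding of the coefficient and standard error.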

Tests of Joint Hypotheses
Let Expn = expenditures per pupil and consider the population regression model TestScore_i = β0 + β1·STR_i + β2·Expn_i + β3·PctEL_i + u_i. The null hypothesis that "school resources don't matter," and the alternative that they do, corresponds to
H0: β1 = 0 and β2 = 0 vs. H1: β1 ≠ 0 and/or β2 ≠ 0.

A joint hypothesis specifies a value for two or more coefficients, that is, it imposes a restriction on two or more coefficients. A “common sense” test is to reject if either of the individual t-statistics exceeds 1.96 in absolute value. But this “common sense” approach doesn’t work. The resulting test doesn’t have the right significance level.

Here's why: calculate the probability of incorrectly rejecting the null using the "common sense" test based on the two individual t-statistics. To simplify the calculation, suppose that β̂1 and β̂2 are independently distributed. Let t1 and t2 be the t-statistics. The "common sense" test is: reject if |t1| > 1.96 and/or |t2| > 1.96. What is the probability that this "common sense" test rejects H0 when H0 is actually true? (It should be 5%.)

Probability of incorrectly rejecting the null
Pr(|t1| > 1.96 and/or |t2| > 1.96)
= 1 − Pr(|t1| ≤ 1.96 and |t2| ≤ 1.96)
= 1 − Pr(|t1| ≤ 1.96) × Pr(|t2| ≤ 1.96)   (because t1 and t2 are independent by assumption)
= 1 − (0.95)²
= 0.0975 = 9.75%,

which is not the desired 5%.
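A one-line check of this calculation in STATA, using normal(), the standard normal CDF:

* Pr(reject H0 | H0 true) = 1 - Pr(|t1| <= 1.96)^2 under independence
display 1 - (2*normal(1.96) - 1)^2

which displays approximately 0.0975.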

The size of a test is the actual rejection rate under the null hypothesis. The size of the "common sense" test isn't 5%; its size actually depends on the correlation between t1 and t2 (and thus on the correlation between β̂1 and β̂2). Two solutions: (1) use a different critical value in this procedure - not 1.96 (this is the "Bonferroni method" - see App. 7.1); this is rarely used in practice. (2) Use a different test statistic that tests both β1 and β2 at once: the F-statistic.

The F-statistic
The F-statistic tests all parts of a joint hypothesis at once. Formula for the special case of the joint hypothesis β1 = β1,0 and β2 = β2,0 in a regression with two regressors:
F = (1/2) × (t1² + t2² − 2ρ̂·t1·t2) / (1 − ρ̂²),
where ρ̂ estimates the correlation between t1 and t2. Reject when F is "large."

The F-statistic testing β1 and β2:
F = (1/2) × (t1² + t2² − 2ρ̂·t1·t2) / (1 − ρ̂²)
The F-statistic is large when t1 and/or t2 is large. The F-statistic corrects (in just the right way) for the correlation between t1 and t2.
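A small numerical illustration (these t-values and correlation are hypothetical): suppose t1 = 1.8, t2 = 1.9, and ρ̂ = 0. Neither |t| exceeds 1.96, so the "common sense" test would not reject; but F = ½ × (1.8² + 1.9²) = ½ × (3.24 + 3.61) = 3.43, which exceeds the 5% critical value of 3.00 given below, so the F-test rejects the joint null at the 5% level.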

Large-sample distribution of the F-statistic
Consider the special case that t1 and t2 are independent, so ρ̂ tends to 0. In large samples the formula becomes F = ½(t1² + t2²). Under the null, t1 and t2 have standard normal distributions that, in this special case, are independent. The large-sample distribution of the F-statistic is the distribution of the average of two independently distributed squared standard normal random variables.

The chi-squared distribution with q degrees of freedom (χ²_q) is defined to be the distribution of the sum of q independent squared standard normal random variables. In large samples, the F-statistic is distributed as χ²_q/q. Selected large-sample 5% critical values of χ²_q/q:
q = 1: 3.84
q = 2: 3.00
q = 3: 2.60
q = 4: 2.37
q = 5: 2.21
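These values can be verified in STATA with invchi2(), the inverse chi-squared CDF; for example, for q = 2:

display invchi2(2, 0.95)/2

which displays approximately 3.00.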

Compute the p-value using the F-statistic: p-value = tail probability of the χ²_q/q distribution beyond the F-statistic actually computed.
Implementation in STATA: use the "test" command after the regression.
Example: Test the joint hypothesis that the population coefficients on STR and expenditures per pupil (expn_stu) are both zero, against the alternative that at least one of the population coefficients is nonzero.
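The commands look like this (the variable names testscr, str, expn_stu, and el_pct follow the slide and are assumed to match your data set):

* robust regression, then the joint hypothesis test
regress testscr str expn_stu el_pct, robust
test str expn_stu

test reports the heteroskedasticity-robust F-statistic and its p-value.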

The homoskedasticity-only F-statistic
To compute the homoskedasticity-only F-statistic, either use the previous formulas but with homoskedasticity-only standard errors, or run two regressions: one under the null hypothesis (the "restricted" regression) and one under the alternative hypothesis (the "unrestricted" regression). The second method gives a simple formula.

The "restricted" and "unrestricted" regressions
Example: are the coefficients on STR and Expn zero?
Restricted population regression (that is, under H0): TestScore_i = β0 + β3·PctEL_i + u_i
Unrestricted population regression (under H1): TestScore_i = β0 + β1·STR_i + β2·Expn_i + β3·PctEL_i + u_i
The number of restrictions under H0 is q = 2. The fit will be better (R² will be higher) in the unrestricted regression (why?)

By how much must the R² increase for the coefficients on STR and Expn to be judged statistically significant? Simple formula for the homoskedasticity-only F-statistic:
F = [ (R²_unrestricted − R²_restricted) / q ] / [ (1 − R²_unrestricted) / (n − k_unrestricted − 1) ]
where R²_restricted = the R² for the restricted regression, R²_unrestricted = the R² for the unrestricted regression, q = the number of restrictions under the null, and k_unrestricted = the number of regressors in the unrestricted regression.
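A sketch of the two-regression computation in STATA, using the stored results e(r2) and e(N) (variable names as above; no robust option, since this statistic assumes homoskedastic errors):

* restricted regression: drop str and expn_stu
regress testscr el_pct
scalar r2_r = e(r2)
* unrestricted regression
regress testscr str expn_stu el_pct
scalar r2_u = e(r2)
* homoskedasticity-only F, with q = 2 restrictions and k = 3 regressors
display ((r2_u - r2_r)/2) / ((1 - r2_u)/(e(N) - 3 - 1))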

Example:
Restricted regression: TestScore = 664.7 − 0.671·PctEL, R²_restricted = 0.4149
Unrestricted regression: TestScore = 649.6 − 0.29·STR + 3.87·Expn − 0.656·PctEL, R²_unrestricted = 0.4366
so F = [(0.4366 − 0.4149)/2] / [(1 − 0.4366)/(420 − 3 − 1)] = 8.01

The homoskedasticity-only F-statistic - summary
The homoskedasticity-only F-statistic rejects when adding the two variables increases the R² by "enough" - that is, when adding the two variables improves the fit of the regression by "enough." If the errors are homoskedastic, then the homoskedasticity-only F-statistic has a large-sample distribution that is χ²_q/q. But if the errors are heteroskedastic, the large-sample distribution is a mess and is not χ²_q/q.

Joint confidence sets, ctd.
A 95% joint confidence set for two or more coefficients is the set of coefficient values that cannot be rejected at the 5% significance level by the F-statistic; in large samples it is an ellipse centered at the OLS estimates.

Confidence set based on inverting the F-statistic [figure omitted from the transcript: the 95% confidence ellipse for two coefficients]

Summary: testing joint hypotheses
The "common-sense" approach of rejecting if either of the t-statistics exceeds 1.96 rejects more than 5% of the time under the null (the size exceeds the desired significance level). The heteroskedasticity-robust F-statistic is built into STATA ("test" command); this tests all q restrictions at once. For large n, F is distributed as χ²_q/q. The homoskedasticity-only F-statistic is important historically (and thus in practice), and is intuitively appealing, but it is invalid when there is heteroskedasticity.

Testing Single Restrictions on Multiple Coefficients
Consider the null and alternative hypotheses
H0: β1 = β2 vs. H1: β1 ≠ β2.
This null imposes a single restriction (q = 1) on multiple coefficients - it is not a joint hypothesis with multiple restrictions (compare with β1 = 0 and β2 = 0).

Two methods for testing single restrictions on multiple coefficients:
1. Rearrange ("transform") the regression: rearrange the regressors so that the restriction becomes a restriction on a single coefficient in an equivalent regression.
2. Perform the test directly: some software, including STATA, lets you test restrictions on multiple coefficients directly.

Method 1: Rearrange ("transform") the regression
Start from Y_i = β0 + β1·X1i + β2·X2i + u_i with H0: β1 = β2. Add and subtract β2·X1i:
Y_i = β0 + (β1 − β2)·X1i + β2·(X1i + X2i) + u_i = β0 + γ1·X1i + β2·W_i + u_i,
where γ1 = β1 − β2 and W_i = X1i + X2i.

(a) Original system: Y_i = β0 + β1·X1i + β2·X2i + u_i, with H0: β1 = β2 vs. H1: β1 ≠ β2.
(b) Rearranged ("transformed") system: Y_i = β0 + γ1·X1i + β2·W_i + u_i, with H0: γ1 = 0 vs. H1: γ1 ≠ 0,
so the testing problem is now a simple one: test whether γ1 = 0 in specification (b).
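Applied to the test-score example of the next slide (variable names as used there, and assumed to match your data set), Method 1 looks like this in STATA:

* construct W = X1 + X2, then regress on X1 and W
generate w = str + expn
regress testscore str w pctel, r
* the usual t-statistic on str now tests H0: beta1 = beta2, since its coefficient is gamma1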

Method 2: Perform the test directly
Example: To test H0: β1 = β2 (the coefficients on STR and Expn) using STATA:
regress testscore str expn pctel, r
test str=expn
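As a follow-up, STATA's lincom command reports the estimated difference in the two coefficients directly, with its standard error and confidence interval:

lincom str - expn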

Analysis of the Test Score Data
A general approach to variable selection and "model specification":
1. Specify a "base" or "benchmark" model.
2. Specify a range of plausible alternative models, which include additional candidate variables.
3. Does a candidate variable change the coefficient of interest (β̂1)?

Is a candidate variable statistically significant? Use judgment, not a mechanical recipe. And don't just maximize R².

Digression about measures of fit
A high R² (or adjusted R²) means the regressors explain the variation in Y well; it does not mean you have eliminated omitted variable bias, does not mean the included variables are statistically significant, and does not mean the regressors are a true cause of Y.

Variables we would like to see in the California data set
School characteristics:
- student-teacher ratio
- teacher quality
- computers (non-teaching resources) per student
- measures of curriculum design
Student characteristics:
- English proficiency
- availability of extracurricular enrichment
- home learning environment
- parents' education level

Variables actually in the California class size data set
- student-teacher ratio (STR)
- percent English learners in the district (PctEL)
- percent eligible for subsidized/free lunch
- percent on public income assistance
- average district income

A look at more of the California data [scatterplots omitted from the transcript]

Digression: presentation of regression results in a table
Listing regressions in "equation" form can be cumbersome with many regressors and many regressions. Tables of regression results can present the key information compactly. Information to include:
- variables in the regression (dependent and independent)
- estimated coefficients
- standard errors
- results of F-tests of joint hypotheses
- some measure of fit (adjusted R²)
- number of observations

Summary: Multiple Regression
Multiple regression allows you to estimate the effect on Y of a change in X1, holding X2 constant. If you can measure a variable, you can avoid omitted variable bias from that variable by including it. There is no simple recipe for deciding which variables belong in a regression - you must exercise judgment. One approach is to specify a base model, relying on a priori reasoning, and then explore the sensitivity of the key estimate(s) in alternative specifications.