Chapter 13: Inference in Regression


Chapter 13: Inference in Regression
Discovering Statistics, 2nd Edition, Daniel T. Larose
Lecture PowerPoint Slides

Chapter 13 Overview
13.1 Inference About the Slope of the Regression Line
13.2 Confidence Intervals and Prediction Intervals
13.3 Multiple Regression

The Big Picture Where we are coming from and where we are headed… In the later chapters of Discovering Statistics, we have been studying more advanced methods in statistical inference. Here in Chapter 13, we return to regression analysis, first discussed in Chapter 4. At that time, we learned descriptive methods for regression analysis; now it is time to learn how to perform statistical inference in regression. In the last chapter, we will explore nonparametric statistics.

13.1: Inference About the Slope of the Regression Line
Objectives:
Explain the regression model and the regression model assumptions.
Perform the hypothesis test for the slope β1 of the population regression equation.
Construct confidence intervals for the slope β1.
Use confidence intervals to perform the hypothesis test for the slope β1.

The Regression Model
Recall that the regression line approximates the relationship between two continuous variables and is described by the regression equation ŷ = b1x + b0.
Regression Model
The population regression equation is defined as:
y = β0 + β1x + ε
where β0 is the y-intercept of the population regression line, β1 is the slope, and ε is the error term.
Regression Model Assumptions
1. Zero Mean: The error term ε is a random variable with mean 0.
2. Constant Variance: The variance of ε is the same regardless of the value of x.
3. Independence: The values of ε are independent of each other.
4. Normality: The error term ε is a normal random variable.
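The model and its four error assumptions can be made concrete by simulating data from it. The following is a minimal sketch; the parameter values, sample size, and use of NumPy are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population parameters (illustrative values only).
beta0, beta1, sigma = 2.0, 0.5, 1.0

x = np.linspace(0, 10, 50)

# The errors satisfy the four assumptions by construction:
# normal, mean zero, constant variance sigma^2, independent draws.
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)

# Population regression equation: y = beta0 + beta1*x + eps.
y = beta0 + beta1 * x + eps
```

In practice the parameters β0, β1, and σ are unknown; the rest of the section is about estimating and testing them from a sample.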

Hypothesis Tests for β1
To test whether there is a relationship between x and y, we begin with the hypothesis test to determine whether or not β1 equals 0.
H0: β1 = 0 There is no linear relationship between x and y.
Ha: β1 ≠ 0 There is a linear relationship between x and y.
Test Statistic
t_data = b1 / (s / √Σ(x − x̄)²)
where b1 represents the slope of the regression line, s represents the standard error of the estimate, and Σ(x − x̄)² represents the numerator of the sample variance of the x data.

Hypothesis Tests for β1
H0: β1 = 0 There is no linear relationship between x and y.
Ha: β1 ≠ 0 There is a linear relationship between x and y.
Hypothesis Test for Slope β1
If the conditions for the regression model are met:
Step 1: State the hypotheses.
Step 2: Find the t critical value and the rejection rule.
Step 3: Calculate the test statistic and p-value.
Step 4: State the conclusion and the interpretation.
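The test statistic can be computed directly from the data. A minimal sketch in Python (the data values are made up for illustration, and the use of NumPy/SciPy is an assumption, not the textbook's method):

```python
import numpy as np
from scipy import stats

# Hypothetical data: x = memorization time, y = score (made-up values).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([3, 4, 6, 6, 8, 9, 10, 12, 13, 15], dtype=float)
n = len(x)

# Least-squares slope b1 and intercept b0.
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# Standard error of the estimate s, with n - 2 degrees of freedom.
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Test statistic t_data = b1 / (s / sqrt(sum((x - xbar)^2))).
t_data = b1 / (s / np.sqrt(sxx))

# Two-tailed p-value from the t distribution with n - 2 df.
p_value = 2 * stats.t.sf(abs(t_data), df=n - 2)
```

`scipy.stats.linregress(x, y)` returns the same slope, standard error, and two-tailed p-value in one call.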

Example
Ten subjects were given a set of nonsense words to memorize within a certain amount of time and were later scored on the number of words they could remember. The results are in Table 13.4. Test whether there is a relationship between time and score using level of significance α = 0.01. Note the graphs on page 640, indicating the conditions for the regression model have been met.
H0: β1 = 0 There is no linear relationship between time and score.
Ha: β1 ≠ 0 There is a linear relationship between time and score.
Reject H0 if the p-value is less than α = 0.01.


Example
Since the p-value of approximately 0.000 is less than α = 0.01, we reject H0. There is evidence of a linear relationship between time and score.

Confidence Interval for β1
Confidence Interval for Slope β1
When the regression assumptions are met, a 100(1 − α)% confidence interval for β1 is given by:
b1 ± t_{α/2} · (s / √Σ(x − x̄)²)
where t_{α/2} has n − 2 degrees of freedom.
Margin of Error
The margin of error E for a 100(1 − α)% confidence interval for β1 is given by:
E = t_{α/2} · (s / √Σ(x − x̄)²)
As in earlier sections, we may use a confidence interval for the slope to perform a two-tailed test for β1. If the interval does not contain 0, we reject the null hypothesis.
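As a sketch, the interval and its margin of error can be computed with SciPy (the data are invented for illustration and are not the textbook's example):

```python
import numpy as np
from scipy import stats

# Hypothetical data (made up for illustration).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)
n = len(x)

# res.stderr is exactly s / sqrt(sum((x - xbar)^2)).
res = stats.linregress(x, y)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)   # t with n - 2 df

E = t_crit * res.stderr                          # margin of error
lower, upper = res.slope - E, res.slope + E      # 95% CI for beta1

# If the interval does not contain 0, reject H0: beta1 = 0 at level alpha.
```

For this particular made-up data the interval does contain 0, so the two-tailed test would fail to reject H0 at α = 0.05.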

13.2: Confidence Intervals and Prediction Intervals
Objectives:
Construct confidence intervals for the mean value of y for a given value of x.
Construct prediction intervals for a randomly chosen value of y for a given value of x.

Confidence Interval for the Mean Value of y for a Given x
A 100(1 − α)% confidence interval for the mean response, that is, for the population mean of all values of y given a value of x, may be constructed using the following lower and upper bounds:
ŷ ± t_{α/2} · s · √(1/n + (x* − x̄)²/Σ(x − x̄)²)
where x* represents the given value of the predictor variable and t_{α/2} has n − 2 degrees of freedom. The requirements are that the regression assumptions are met or the sample size is large.

Prediction Interval for an Individual Value of y for a Given x
A 100(1 − α)% prediction interval for a randomly selected value of y given a value of x may be constructed using the following lower and upper bounds:
ŷ ± t_{α/2} · s · √(1 + 1/n + (x* − x̄)²/Σ(x − x̄)²)
where x* represents the given value of the predictor variable. The requirements are that the regression assumptions are met or the sample size is large. Because it must account for the variability of individual responses, the prediction interval is always wider than the corresponding confidence interval.
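The two intervals can be sketched side by side; the only difference is the extra "1 +" under the square root, which is what makes the prediction interval wider. (The data, x*, and NumPy/SciPy usage are illustrative assumptions.)

```python
import numpy as np
from scipy import stats

# Hypothetical data (made up for illustration).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)
n = len(x)

res = stats.linregress(x, y)
# Standard error of the estimate s, with n - 2 degrees of freedom.
s = np.sqrt(np.sum((y - (res.intercept + res.slope * x)) ** 2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

x_star = 3.5                                 # given value of the predictor
y_hat = res.intercept + res.slope * x_star   # point estimate at x_star

# Half-width of the 95% CI for the mean response at x_star:
ci_half = t_crit * s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)
# Half-width of the 95% PI for an individual response at x_star (wider):
pi_half = t_crit * s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / sxx)
```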

13.3: Multiple Regression
Objectives:
Find the multiple regression equation, interpret the multiple regression coefficients, and use the multiple regression equation to make predictions.
Calculate and interpret the adjusted coefficient of determination.
Perform the F test for the overall significance of the multiple regression.
Conduct t tests for the significance of individual predictor variables.
Explain the use and effect of dummy variables in multiple regression.
Apply the strategy for building a multiple regression model.

Multiple Regression
Thus far, we have examined the relationship between the response variable y and a single predictor variable x. In our data-filled world, however, we often encounter situations where we can use more than one x variable to predict the y variable. Multiple regression describes the linear relationship between one response variable y and more than one predictor variable x1, x2, …. The multiple regression equation is an extension of the regression equation:
ŷ = b0 + b1x1 + b2x2 + … + bkxk
where k represents the number of x variables in the equation and b0, b1, …, bk represent the multiple regression coefficients. The interpretation of the regression coefficients is similar to the interpretation of the slope in simple linear regression, except that we add that the other x variables are held constant.
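A minimal sketch of fitting a multiple regression equation with two predictors and using it for prediction (the data are invented and noise-free, and the use of NumPy's least-squares solver is an assumption, not the textbook's method):

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 + 1*x2.
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = np.array([5, 6, 11, 12, 17, 18], dtype=float)

# Design matrix: a column of ones for the intercept b0,
# plus one column per predictor variable.
X = np.column_stack([np.ones_like(x1), x1, x2])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction: y-hat = b0 + b1*x1 + b2*x2 at x1 = 3, x2 = 2.
y_hat = b0 + b1 * 3 + b2 * 2
```

Here b1 is interpreted as the estimated change in y per unit increase in x1, holding x2 constant, and likewise for b2.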

Adjusted Coefficient of Determination We measure the goodness of a regression equation using the coefficient of determination r2 = SSR/SST. In multiple regression, we use the same formula for the coefficient of determination (though the letter r is promoted to a capital R). Multiple Coefficient of Determination R2 The multiple coefficient of determination is given by: R2 = SSR/SST 0 ≤ R2 ≤ 1 where SSR is the sum of squares regression and SST is the total sum of squares. The multiple coefficient of determination represents the proportion of the variability in the response y that is explained by the multiple regression equation.

Adjusted Coefficient of Determination
Unfortunately, when a new x variable is added to the multiple regression equation, the value of R² always increases, even when the variable is not useful for predicting y. So, we need a way to adjust the value of R² as a penalty for having too many unhelpful x variables in the equation.
Adjusted Coefficient of Determination R²adj
The adjusted coefficient of determination is given by:
R²adj = 1 − (1 − R²) · (n − 1)/(n − k − 1)
where n is the number of observations, k is the number of x variables, and R² is the multiple coefficient of determination.
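The penalty is easy to see numerically. In this small sketch the R² values, n, and k are invented to illustrate the effect:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a barely-helpful predictor: R^2 creeps up from 0.800 to 0.805,
# but the adjusted value goes DOWN, flagging the extra variable as unhelpful.
before = adjusted_r2(0.800, n=25, k=2)
after = adjusted_r2(0.805, n=25, k=3)
```

Because the denominator n − k − 1 shrinks as k grows, a new predictor must raise R² by more than the penalty for R²adj to improve.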

F Test for Multiple Regression
The multiple regression model is an extension of the model from Section 13.1, and approximates the relationship between y and the collection of x variables.
Multiple Regression Model
The population multiple regression equation is defined as:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
where β0, β1, β2, …, βk are the parameters of the population regression equation, k is the number of x variables, and ε is the error term that follows a normal distribution with mean 0 and constant variance. The population parameters are unknown, so we must perform inference to learn about them. We begin by asking: Is our multiple regression useful? To answer this, we perform the F test for the overall significance of the multiple regression.

F Test for Multiple Regression
The hypotheses for the F test are:
H0: β1 = β2 = … = βk = 0
Ha: At least one of the β's ≠ 0.
The F test is not valid if there is strong evidence that the regression assumptions have been violated.
F Test for Multiple Regression
If the conditions for the regression model are met:
Step 1: State the hypotheses and the rejection rule.
Step 2: Find the F statistic and the p-value. (Both are located in the ANOVA table of the computer output.)
Step 3: State the conclusion and the interpretation.
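A sketch of how the F statistic is assembled from the ANOVA sums of squares (the numbers below are invented; scipy.stats supplies the F distribution):

```python
from scipy import stats

# Hypothetical ANOVA quantities: n = 20 observations, k = 3 predictors.
n, k = 20, 3
SSR = 900.0   # sum of squares regression
SSE = 400.0   # sum of squares error

MSR = SSR / k             # mean square regression, df = k
MSE = SSE / (n - k - 1)   # mean square error, df = n - k - 1
F = MSR / MSE
p_value = stats.f.sf(F, dfn=k, dfd=n - k - 1)

# A small p-value leads us to reject H0: at least one beta differs from 0,
# so the multiple regression is useful.
```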

t Test for Individual Predictor Variables
To determine whether a particular x variable has a significant linear relationship with the response variable y, we perform the t test that was used in Section 13.1 to test for the significance of that x variable.
t Test for Individual Predictor Variables
One may perform as many t tests as there are predictor variables in the model, which is k. If the conditions for the regression model are met:
Step 1: For each hypothesis test, state the hypotheses and the rejection rule.
Step 2: For each hypothesis test, find the t statistic and the p-value.
Step 3: For each hypothesis test, state the conclusion and the interpretation.

Dummy Variables
It is possible to include binomial (two-category) categorical variables in multiple regression by using a "dummy variable." A dummy variable is a predictor variable used to recode a binomial categorical variable in regression by taking the value 0 or 1. Including the dummy variable in the multiple regression equation results in two different regression equations, one for each value of the categorical variable. These two regression equations have the same slope but different y-intercepts.
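A sketch of the recoding and of the two resulting parallel lines (the category names and data values are invented for illustration):

```python
import numpy as np

# Hypothetical binary categorical predictor, recoded as d (urban=1, rural=0).
location = np.array(["urban", "rural", "urban", "rural", "urban", "rural"])
d = (location == "urban").astype(float)

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([7, 5, 9, 7, 11, 9], dtype=float)   # generated as y = 3 + x + 3*d

# Fit y-hat = b0 + b1*x + b2*d by least squares.
X = np.column_stack([np.ones_like(x), x, d])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

# Two regression equations sharing the same slope b1:
#   rural (d = 0): y-hat = b0 + b1*x
#   urban (d = 1): y-hat = (b0 + b2) + b1*x
```

The coefficient b2 on the dummy variable is the vertical distance between the two parallel lines.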

Building a Multiple Regression Model
Strategy for Building a Multiple Regression Model
Step 1: The F Test. Construct the multiple regression equation using all relevant predictor variables. Apply the F test in order to make sure that a linear relationship exists between the response y and at least one of the predictor variables.
Step 2: The t Tests. Perform the t tests for the individual predictors. If at least one of the predictors is not significant, then eliminate the x variable with the largest p-value from the model. Repeat until all remaining predictors are significant.
Step 3: Verify the Assumptions. For your final model, verify the regression assumptions.
Step 4: Report and Interpret Your Final Model. Provide the multiple regression equation, interpret the multiple regression coefficients, and report and interpret the standard error of the estimate and the adjusted coefficient of determination.
