
Slide 1: McGraw-Hill/Irwin. Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 4th edition, by David P. Doane and Lori E. Seward. Prepared by Lloyd R. Jaisingh.

Slide 2: Chapter 12, Simple Regression. Chapter Contents:
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems

Slide 3: Chapter 12, Simple Regression. Chapter Learning Objectives:
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
LO12-10: Test residuals for violations of regression assumptions.
LO12-11: Identify unusual residuals and high-leverage observations.

Slide 4: 12.1 Visual Displays and Correlation Analysis (LO12-1). Visual Displays. Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot displays each observed data pair (xᵢ, yᵢ) as a dot on an X/Y grid and indicates visually the strength of the relationship between the two variables. [Figure: Sample Scatter Plot]
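A minimal sketch of such a scatter plot in Python with matplotlib; the data values and axis labels below are invented for illustration, not taken from the textbook:

import matplotlib.pyplot as plt

# Hypothetical bivariate sample (illustrative values only)
x = [2, 4, 5, 7, 8, 10]
y = [30, 45, 50, 62, 70, 85]

plt.scatter(x, y)                 # one dot per observed pair (x_i, y_i)
plt.xlabel("X (predictor)")
plt.ylabel("Y (response)")
plt.title("Sample Scatter Plot")
plt.show()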

Slide 5: 12.1 Visual Displays and Correlation Analysis. LO12-1: Calculate and test a correlation coefficient for significance. Correlation Coefficient. The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

It always satisfies −1 ≤ r ≤ +1, and r = 0 indicates no linear relationship.
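As a sketch, r can be computed directly from that definition (numpy's built-in corrcoef gives the same value); the data are the same illustrative values as above:

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)

# r = cross-deviation sum / sqrt(product of squared-deviation sums)
sxx = np.sum((x - x.mean())**2)
syy = np.sum((y - y.mean())**2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
r = sxy / np.sqrt(sxx * syy)

print(round(r, 4), round(np.corrcoef(x, y)[0, 1], 4))   # both give the same value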

Slide 6: 12.1 Visual Displays and Correlation Analysis (LO12-1). [Figure: Scatter Plots Showing Various Correlation Values; panels: Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, Strong Negative Correlation, No Correlation, Nonlinear Relation]

Slide 7: 12.1 Visual Displays and Correlation Analysis (LO12-1). Steps in Testing Whether ρ = 0 (Tests for Significance). Note: r is an estimate of the population correlation coefficient ρ (rho).
Step 1: State the hypotheses. Determine whether you are using a one- or two-tailed test and the level of significance α. For a two-tailed test, H₀: ρ = 0 versus H₁: ρ ≠ 0.
Step 2: Specify the decision rule. For degrees of freedom d.f. = n − 2, look up the critical value t_α in Appendix D.
Step 3: Calculate the test statistic: t_calc = r√(n − 2) / √(1 − r²).
Step 4: Make the decision. If the sample correlation coefficient r exceeds the critical value r_α, reject H₀. If using the t statistic method, reject H₀ if t_calc > t_α or if the p-value ≤ α.
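The four steps can be mirrored in a few lines of Python; this sketch assumes a two-tailed test at α = .05 and example values of r and n that are not from the text:

import numpy as np
from scipy import stats

r, n, alpha = 0.60, 25, 0.05                          # assumed example values

t_calc = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)       # Step 3: test statistic
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)         # Step 2: two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_calc), df=n - 2)

print(round(t_calc, 3), round(t_crit, 3), round(p_value, 4))
# Step 4: reject H0: rho = 0 if |t_calc| > t_crit or p_value <= alpha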

Slide 8: 12.1 Visual Displays and Correlation Analysis (LO12-1). Critical Value for the Correlation Coefficient (Tests for Significance). Equivalently, you can calculate the critical value for the correlation coefficient itself: r_α = t_α / √(t_α² + n − 2). This method gives a benchmark for the correlation coefficient. However, it provides no p-value and is inflexible if you change your mind about α.
Quick Rule for Significance: a quick test for significance of a correlation at α = .05 is |r| > 2/√n.
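A sketch comparing the exact benchmark for r with the quick rule, under an assumed sample size and significance level:

import numpy as np
from scipy import stats

n, alpha = 25, 0.05                                   # assumed n and alpha
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
r_crit = t_crit / np.sqrt(t_crit**2 + n - 2)          # critical value for |r|

quick = 2 / np.sqrt(n)                                # quick rule benchmark at alpha = .05
print(round(r_crit, 3), round(quick, 3))              # the two benchmarks are close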

Slide 9: 12.2 Simple Regression. LO12-2: Interpret the slope and intercept of a regression equation. What is Simple Regression? Simple regression analyzes the relationship between two variables. It specifies one dependent (response) variable and one independent (predictor) variable. The hypothesized relationship here is linear.

Slide 10: 12.2 Simple Regression (LO12-2). Models and Parameters. The assumed model for a linear relationship is y = β₀ + β₁x + ε, and the relationship holds for all pairs (xᵢ, yᵢ). The error term ε is not observable and is assumed to be normally distributed with mean 0 and standard deviation σ. The unknown parameters are β₀ (intercept) and β₁ (slope). The fitted model used to predict the expected value of Y for a given value of X is ŷ = b₀ + b₁x. The fitted coefficients are b₀ (the estimated intercept) and b₁ (the estimated slope).
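A small simulation can make the distinction between the unknown parameters and the fitted coefficients concrete; the "true" parameter values below are assumed purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 5.0, 2.0, 1.5         # assumed true parameters (illustrative)

x = np.linspace(1, 10, 30)
eps = rng.normal(0, sigma, size=x.size)     # unobservable error term
y = beta0 + beta1 * x + eps                 # the assumed linear model

# The fitted coefficients b0, b1 estimate beta0, beta1 from the sample
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
print(round(b0, 2), round(b1, 2))           # close to, but not exactly, 5.0 and 2.0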

Slide 11: 12.3 Regression Terminology. LO12-4: Fit a simple regression on an Excel scatter plot. A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x₁, x₂, ..., xₙ and the dependent variable y₁, y₂, ..., yₙ into separate columns and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.

Slide 12: 12.4 Ordinary Least Squares (OLS) Formulas. Slope and Intercept. The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the residuals are small: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄.
Coefficient of Determination (Assessing the Fit). R² is a measure of relative fit based on a comparison of SSR (regression sum of squares) and SST (total sum of squares): R² = SSR/SST = 1 − SSE/SST. One can use technology to compute it. Often expressed as a percent, R² = 1 (i.e., 100%) indicates a perfect fit. In a bivariate regression, R² = r².
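A sketch of the OLS formulas and R² computed by hand with numpy, using the same illustrative data as before; in practice Excel, Minitab, or a statistics library would report these values:

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)

# OLS estimates: b1 = S_xy / S_xx,  b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat)**2)            # unexplained (error) sum of squares
sst = np.sum((y - y.mean())**2)         # total sum of squares
r_squared = 1 - sse / sst               # equals SSR/SST and, in bivariate regression, r**2

print(round(b0, 3), round(b1, 3), round(r_squared, 4))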

Slide 13: 12.5 Tests for Significance. LO12-5: Calculate and interpret confidence intervals for regression coefficients. Confidence Intervals for Slope and Intercept. The confidence intervals for the true slope and intercept are b₁ ± t_{α/2} s(b₁) and b₀ ± t_{α/2} s(b₀), where s(b₁) and s(b₀) are the standard errors of the estimated coefficients and d.f. = n − 2. Note: One can use Excel, Minitab, MegaStat, or other technologies to compute these intervals and do hypothesis tests relating to linear regression.
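A sketch of 95% intervals computed with the usual standard-error expressions for simple regression (data are the same illustrative values):

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)
n = len(x)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
s_e = np.sqrt(np.sum(resid**2) / (n - 2))            # standard error of the estimate
s_b1 = s_e / np.sqrt(sxx)                            # standard error of the slope
s_b0 = s_e * np.sqrt(1/n + x.mean()**2 / sxx)        # standard error of the intercept

t_crit = stats.t.ppf(0.975, df=n - 2)                # 95% two-tailed, d.f. = n - 2
print("slope CI:", b1 - t_crit * s_b1, b1 + t_crit * s_b1)
print("intercept CI:", b0 - t_crit * s_b0, b0 + t_crit * s_b0)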

Slide 14: 12.5 Tests for Significance. LO12-6: Test hypotheses about the slope and intercept by using t tests. Hypothesis Tests. If β₁ = 0, then X cannot influence Y and the regression model collapses to a constant β₀ plus random error. The hypotheses to be tested are H₀: β₁ = 0 versus H₁: β₁ ≠ 0, using the test statistic t_calc = b₁ / s(b₁) with d.f. = n − 2. Reject H₀ if t_calc > t_α or if the p-value ≤ α.
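A sketch of the slope t test, reusing the slope standard error from the previous slide (illustrative data):

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)
n = len(x)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s_b1 = np.sqrt(np.sum(resid**2) / (n - 2)) / np.sqrt(sxx)

t_calc = b1 / s_b1                                   # tests H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_calc), df=n - 2)
print(round(t_calc, 3), round(p_value, 4))           # reject H0 if p_value <= alpha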

Slide 15: 12.6 Analysis of Variance: Overall Fit. LO12-8: Interpret the standard error, R², ANOVA table, and F test. F Test for Overall Fit. To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares: F_calc = MSR / MSE = (SSR/1) / (SSE/(n − 2)). Reject H₀ of no significant relationship if F_calc > F_α or if the p-value ≤ α.
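A sketch of the overall F test built from the same sums of squares (illustrative data):

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean())**2)      # explained sum of squares
sse = np.sum((y - y_hat)**2)             # unexplained sum of squares

f_calc = (ssr / 1) / (sse / (n - 2))     # MSR / MSE; one predictor gives 1 numerator d.f.
p_value = stats.f.sf(f_calc, 1, n - 2)
print(round(f_calc, 2), round(p_value, 4))   # in simple regression, F equals the slope t**2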

Slide 16: 12.7 Confidence and Prediction Intervals for Y. LO12-9: Distinguish between confidence and prediction intervals for Y. How to Construct an Interval Estimate for Y. A confidence interval estimates the conditional mean of Y at a given x, while a prediction interval estimates an individual Y value. Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
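A sketch of both interval half-widths at an assumed x value, using the standard simple-regression expressions (illustrative data; x0 is an assumed point of interest):

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)
n = len(x)
x0 = 6.0                                  # assumed X value at which to estimate Y

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s_e = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)

y_hat = b0 + b1 * x0
half_ci = t_crit * s_e * np.sqrt(1/n + (x0 - x.mean())**2 / sxx)      # conditional mean of Y
half_pi = t_crit * s_e * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / sxx)  # individual Y (wider)
print(round(y_hat, 2), round(half_ci, 2), round(half_pi, 2))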

Slide 17: 12.8 Residual Tests. LO12-10: Test residuals for violations of regression assumptions. Three Important Assumptions:
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Note: One can use the appropriate technology (Minitab, Excel, etc.) to test for violations of these assumptions.
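One possible sketch of such residual checks in Python; the specific tests chosen here (Shapiro-Wilk, a manually computed Durbin-Watson statistic) are illustrative stand-ins for whatever the software reports, and the data are the same invented values:

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                      # residuals

# 1. Normality: Shapiro-Wilk test (an alternative to a histogram or normal probability plot)
print(stats.shapiro(e))
# 2. Constant variance: plot e against the fitted values and look for a funnel shape (not shown)
# 3. Independence: Durbin-Watson statistic; values near 2 suggest no autocorrelation
dw = np.sum(np.diff(e)**2) / np.sum(e**2)
print(round(dw, 2))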

Slide 18: 12.9 Unusual Observations. LO12-11: Identify unusual residuals and high-leverage observations. Standardized Residuals. One can use Excel, Minitab, MegaStat, or other technologies to compute standardized residuals. If the absolute value of any standardized residual is at least 2, it is classified as unusual.
Leverage and Influence. A high leverage statistic indicates that the observation is far from the mean of X. These observations are influential because they are at the "end of the lever." The leverage for observation i is denoted hᵢ. A leverage that exceeds 3/n is unusual.
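A sketch of both screens by hand; note that dividing each residual by s_e is one common standardization, and software may instead use a leverage-adjusted (studentized) form (illustrative data):

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)
n = len(x)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)
s_e = np.sqrt(np.sum(e**2) / (n - 2))

std_resid = e / s_e                            # simple standardization of each residual
leverage = 1/n + (x - x.mean())**2 / sxx       # h_i for simple regression

print(np.where(np.abs(std_resid) >= 2)[0])     # indices of unusual residuals
print(np.where(leverage > 3/n)[0])             # indices of high-leverage observations (3/n rule)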

Slide 19: 12.10 Other Regression Problems. Outliers. Outliers may be caused by:
- an error in recording the data
- impossible data
- an observation that has been influenced by an unspecified "lurking" variable that should have been controlled but wasn't.
To fix the problem:
- delete the observation(s)
- delete the data
- formulate a multiple regression model that includes the lurking variable.

Slide 20: 12.10 Other Regression Problems. Model Misspecification. If a relevant predictor has been omitted, the model is misspecified; use multiple regression instead of bivariate regression.
Ill-Conditioned Data. Well-conditioned data values are of the same general order of magnitude. Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates. Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.

Slide 21: 12.10 Other Regression Problems. Spurious Correlation. In a spurious correlation, two variables appear related because of the way they are defined. This problem is called the size effect or problem of totals.
Model Form and Variable Transforms. Sometimes a nonlinear model is a better fit than a linear model. Excel offers many model forms. Variables may be transformed (e.g., with logarithmic or exponential functions) to provide a better fit. Log transformations reduce heteroscedasticity. Nonlinear models may be difficult to interpret.
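As a small sketch of one such transform, a log-log form can be fitted with the same OLS formulas applied to the transformed variables (illustrative data; the log-log choice is just one of the model forms mentioned above):

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([30, 45, 50, 62, 70, 85], dtype=float)

# Fit ln(y) = b0 + b1*ln(x) instead of the linear form
lx, ly = np.log(x), np.log(y)
b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean())**2)
b0 = ly.mean() - b1 * lx.mean()
print(round(b0, 3), round(b1, 3))   # b1 is now interpreted as an elasticity, not a slope in original units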

