Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.

Slides:



Advertisements
Similar presentations
Chapter 4 Describing the Relation Between Two Variables
Advertisements

 Coefficient of Determination Section 4.3 Alan Craig
Chapter 4 Describing the Relation Between Two Variables 4.3 Diagnostics on the Least-squares Regression Line.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Correlation and Regression
Chapter 4 The Relation between Two Variables
Chapter Describing the Relation between Two Variables © 2010 Pearson Prentice Hall. All rights reserved 3 4.
Lesson Diagnostics on the Least- Squares Regression Line.
Chapter 4 Describing the Relation Between Two Variables
© 2010 Pearson Prentice Hall. All rights reserved Scatterplots and Correlation Coefficient.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
LINEAR REGRESSION: Evaluating Regression Models Overview Assumptions for Linear Regression Evaluating a Regression Model.
LINEAR REGRESSION: Evaluating Regression Models. Overview Assumptions for Linear Regression Evaluating a Regression Model.
Math 227 Elementary Statistics Math 227 Elementary Statistics Sullivan, 4 th ed.
Copyright © 2014, 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.
CHAPTER 3 Describing Relationships
Correlation and Regression Analysis
Chapter 7 Forecasting with Simple Regression
Simple Linear Regression Analysis
Correlation & Regression
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
Chapter 11 Simple Regression
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Describing the Relation Between Two Variables
1 Chapter 3: Examining Relationships 3.1Scatterplots 3.2Correlation 3.3Least-Squares Regression.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Correlation & Regression
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.
Examining Relationships in Quantitative Research
Chapter 4 Describing the Relation Between Two Variables 4.1 Scatter Diagrams; Correlation.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
Chapter 3-Examining Relationships Scatterplots and Correlation Least-squares Regression.
SWBAT: Calculate and interpret the residual plot for a line of regression Do Now: Do heavier cars really use more gasoline? In the following data set,
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.3 Predicting the Outcome.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
CHAPTER 3 Describing Relationships
Chapter Describing the Relation between Two Variables © 2010 Pearson Prentice Hall. All rights reserved 3 4.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Simple Linear Regression Analysis Chapter 13.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.
Copyright © 2014, 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables
Chapter Describing the Relation between Two Variables © 2010 Pearson Prentice Hall. All rights reserved 3 4.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4.
Describing Relationships. Least-Squares Regression  A method for finding a line that summarizes the relationship between two variables Only in a specific.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-1 Overview Overview 10-2 Correlation 10-3 Regression-3 Regression.
1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points.
CHAPTER 3 Describing Relationships
Inference for Least Squares Lines
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Describing the Relation between Two Variables
STATISTICS INFORMED DECISIONS USING DATA
Describing the Relation between Two Variables
3 9 Chapter Describing the Relation between Two Variables
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
3 4 Chapter Describing the Relation between Two Variables
3 4 Chapter Describing the Relation between Two Variables
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
3 4 Chapter Describing the Relation between Two Variables
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
STATISTICS INFORMED DECISIONS USING DATA
CHAPTER 3 Describing Relationships
Presentation transcript:

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Describing the Relation between Two Variables 4

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Diagnostics on the Least-squares Regression Line 4.3

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Objectives 1.Compute and interpret the coefficient of determination 2.Perform residual analysis on a regression model 3.Identify influential observations 4-3

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-4 Objective 1 Compute and Interpret the Coefficient of Determination

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-5 The coefficient of determination, R 2, measures the proportion of total variation in the response variable that is explained by the least-squares regression line. The coefficient of determination is a number between 0 and 1, inclusive. That is, 0 < R 2 < 1. If R 2 = 0 the line has no explanatory value If R 2 = 1 means the line explains 100% of the variation in the response variable.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-6 The data to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the predictor variable, x, and time (in minutes) to drill five feet is the response variable, y. Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-7

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-8 Regression Analysis The regression equation is Time = Depth Sample Statistics Mean Standard Deviation Depth Time Correlation Between Depth and Time: 0.773

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-9 Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best “guess”?

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best “guess”? ANSWER: The mean time to drill an additional 5 feet: 6.99 minutes

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Now suppose that we are asked to predict the time to drill an additional 5 feet if the current depth of the drill is 160 feet? ANSWER: Our “guess” increased from 6.99 minutes to 7.39 minutes based on the knowledge that drill depth is positively associated with drill time.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-12

Copyright © 2013, 2010 and 2007 Pearson Education, Inc The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to

Copyright © 2013, 2010 and 2007 Pearson Education, Inc The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to

Copyright © 2013, 2010 and 2007 Pearson Education, Inc The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Total Deviation Unexplained Deviation Explained Deviation +=

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Total Deviation Unexplained Deviation Explained Deviation +=

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Total Variation = Unexplained Variation + Explained Variation 1 = Unexplained VariationExplained Variation Unexplained Variation Explained Variation Total Variation + = 1 – R 2 =

Copyright © 2013, 2010 and 2007 Pearson Education, Inc To determine R 2 for the linear regression model simply square the value of the linear correlation coefficient. Squaring the linear correlation coefficient to obtain the coefficient of determination works only for the least-squares linear regression model The method does not work in general.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc EXAMPLE Determining the Coefficient of Determination Find and interpret the coefficient of determination for the drilling data. Because the linear correlation coefficient, r, is 0.773, we have that R 2 = = = 59.75%. So, 59.75% of the variability in drilling time is explained by the least-squares regression line.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Draw a scatter diagram for each of these data sets. For each data set, the variance of y is

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Data Set A Data Set B Data Set C Data Set A:99.99% of the variability in y is explained by the least-squares regression line Data Set B:94.7% of the variability in y is explained by the least-squares regression line Data Set C:9.4% of the variability in y is explained by the least-squares regression line

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Objective 2 Perform Residual Analysis on a Regression Model

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Residuals play an important role in determining the adequacy of the linear model. In fact, residuals can be used for the following purposes: To determine whether a linear model is appropriate to describe the relation between the predictor and response variables. To determine whether the variance of the residuals is constant. To check for outliers.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc If a plot of the residuals against the predictor variable shows a discernable pattern, such as a curve, then the response and predictor variable may not be linearly related.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-26

Copyright © 2013, 2010 and 2007 Pearson Education, Inc A chemist has a gram sample of a radioactive material. She records the amount of radioactive material remaining in the sample every day for a week and obtains the following data. DayWeight (in grams) EXAMPLE Is a Linear Model Appropriate?

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Linear correlation coefficient: – 0.994

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-29

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Linear model not appropriate

Copyright © 2013, 2010 and 2007 Pearson Education, Inc If a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then a strict requirement of the linear model is violated. This requirement is called constant error variance. The statistical term for constant error variance is homoscedasticity.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-32

Copyright © 2013, 2010 and 2007 Pearson Education, Inc A plot of residuals against the explanatory variable may also reveal outliers. These values will be easy to identify because the residual will lie far from the rest of the plot.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-34

Copyright © 2013, 2010 and 2007 Pearson Education, Inc EXAMPLE Residual Analysis Draw a residual plot of the drilling time data. Comment on the appropriateness of the linear least-squares regression model.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-36

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Boxplot of Residuals for the Drilling Data

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Objective 3 Identify Influential Observations

Copyright © 2013, 2010 and 2007 Pearson Education, Inc An influential observation is an observation that significantly affects the least-squares regression line’s slope and/or y-intercept, or the value of the correlation coefficient.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Explanatory, x Influential observations typically exist when the point is an outlier relative to the values of the explanatory variable. So, Case 3 is likely influential.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc Suppose an additional data point is added to the drilling data. At a depth of 300 feet, it took minutes to drill 5 feet. Is this point influential? EXAMPLE Influential Observations

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-42

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 4-43

Copyright © 2013, 2010 and 2007 Pearson Education, Inc With influential Without influential

Copyright © 2013, 2010 and 2007 Pearson Education, Inc As with outliers, influential observations should be removed only if there is justification to do so. When an influential observation occurs in a data set and its removal is not warranted, there are two courses of action: (1) Collect more data so that additional points near the influential observation are obtained, or (2) Use techniques that reduce the influence of the influential observation (such as a transformation or different method of estimation - e.g. minimize absolute deviations).