Chapter 5 Residuals, Residual Plots, Coefficient of determination, & Influential points.

Residuals (error): the vertical deviation between an observation and the LSRL. The sum of the residuals from the LSRL is always zero. residual (error) = observed y - predicted y, that is, residual = y - ŷ.

Residual plot: a scatterplot of the (x, residual) pairs. Residuals can also be graphed against quantities other than x (for example, ŷ). Purpose: to tell whether a linear association exists between the x and y variables.

Consider a population of adult women. Let's examine the relationship between their height and weight.
[Scatterplot: Height vs. Weight]

Suppose we now take a random sample from our population of women.
[Scatterplot: Height vs. Weight, with residuals shown]

If the points in the residual plot show no pattern (random scatter around zero), then the association is linear.

[Figure: example residual plots, panels labeled "Linear" and "Not linear"]

[Data table: Age, Range of Motion]
One measure of the success of knee surgery is the post-surgical range of motion of the knee joint following a knee dislocation. Is there a linear relationship between age and range of motion? Graph the data and find the LSRL: Predicted range of motion = a + b(age).

Using Predicted range of motion = a + b(age), find the predicted y's (ŷ). Then find the residuals (y - ŷ).

Sketch a residual plot.
[Residual plot: Residuals vs. x]
Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion.

Plot the residuals against the ŷ's (y-hats). How does this residual plot compare to the previous one?
[Residual plot: Residuals vs. ŷ]

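Residual plots show the same pattern whether the residuals are plotted against x or against ŷ. Because ŷ = a + bx is a linear function of x, only the horizontal scale changes (and the plot is mirrored left to right if the slope is negative).
[Residual plot: Residuals vs. x]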

Coefficient of determination (r²): gives the approximate proportion of the variation in y that can be attributed to a linear relationship between x and y. r² remains the same no matter which variable is labeled x.

Interpretation of r²: Approximately r²% of the variation in y can be explained by the LSRL of x and y.

How well does age predict the range of motion after knee surgery? Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of age and range of motion.

Let's examine r². Suppose you were going to predict a future y but you didn't know the x-value. Your best guess would be the overall mean of the existing y's. SSy = sum of the squared residuals (errors) using the mean of y: SSy = Σ(y - ȳ)².

Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (ŷ). SSE = sum of the squared residuals (errors) using the LSRL: SSE = Σ(y - ŷ)².

By what percent did the sum of the squared errors go down when you went from just an "overall mean" model to the "regression on x" model?
r² = (SSy - SSE) / SSy
This is r²: the proportion of the variation in the y-values that is explained by the x-values.

Computer-generated regression analysis of knee surgery data:

Predictor   Coef   Stdev   T   P
Constant
Age

s = 10.42   R-sq = 30.6%   R-sq(adj) = 23.7%

What is the equation of the LSRL? Find the slope and y-intercept (in the Coef column, Constant is the y-intercept and Age is the slope). What are the correlation coefficient and the coefficient of determination? NEVER use adjusted r²! Be sure to convert r² to a decimal before taking the square root, and give r the same sign as the slope.
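A minimal sketch of recovering r from the reported R-sq. The function name `r_from_r_squared` is hypothetical. The slope's sign is passed in as an argument because the Coef values are not reproduced here. Note that it uses R-sq = 30.6%, not the adjusted value.

```python
import math

def r_from_r_squared(r_sq_percent, slope):
    """Return r given r^2 as a percent; r takes the sign of the LSRL slope."""
    r_sq = r_sq_percent / 100            # convert percent to a decimal first
    return math.copysign(math.sqrt(r_sq), slope)

# R-sq = 30.6% from the output above. The slope values here are
# placeholders for the sign of the Age coefficient, not real data.
print(round(r_from_r_squared(30.6, slope=-1), 4))
print(round(r_from_r_squared(30.6, slope=+1), 4))
```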

Outlier: in a regression setting, an outlier is a data point with a large residual.

Influential point: a point that influences where the LSRL is located. If it is removed, the slope of the LSRL changes significantly. An influential point usually has a small residual (or a residual of 0), because it pulls the line toward itself.

[Data table: Racket resonance (Hz), Acceleration (m/sec/sec)]
One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact. Sketch a scatterplot of these data. Calculate the LSRL and the correlation coefficient. Does there appear to be an influential point? If so, remove it, then calculate the new LSRL and correlation coefficient.

(189, 30) could be influential. Remove it and recalculate the LSRL.
Predicted acceleration = a + b(resonance)   r = -0.775   r² = 60.1%

(189, 30) was influential, since removing it moved the LSRL.
Predicted acceleration = a + b(resonance)   r = -0.174   r² = 3%
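A short arithmetic check of the two reported fits. Squaring each reported r reproduces the reported r² values and shows how much explained variation disappears once (189, 30) is removed. Only the r values stated above are used.

```python
# r values reported for the racket data
r_first = -0.774 - 0.001      # -0.775, the first fit
r_after_removal = -0.174      # after removing (189, 30)

for label, r in [("first fit", r_first), ("after removing (189, 30)", r_after_removal)]:
    r_sq = r ** 2             # coefficient of determination as a decimal
    print(f"{label}: r = {r:+.3f}, r^2 = {r_sq:.1%}")
```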

Which of these measures are resistant: the LSRL, the correlation coefficient, the coefficient of determination? NONE: all are affected by outliers.

[Data table: Year, Tuition; entries include 2002 and $6459]
Find the correlation coefficient and describe the relationship. r = 0.9861. There is a strong, positive, linear relationship between tuition and year at the UofA.
Find the LSRL: Predicted tuition = a + b(year).
Interpret the slope: for each 1-year increase, UA tuition goes up by an average of b dollars.
Find the coefficient of determination and interpret it in the context of the problem. r² = 97.2%: 97.2% of the variation in tuition can be explained by the linear relationship between tuition and year at the UofA.

[Data table: Year, Tuition]
Make residual plots of (x, residuals) and (ŷ, residuals). Sketch and compare.
[Residual plot: Residuals vs. x]
A linear model is not the best model: there is a definite curved pattern in the residual plot!