Residuals, Residual Plots, & Influential points

Slides:



Advertisements
Similar presentations
Linear Regression (LSRL)
Advertisements

2nd Day: Bear Example Length (in) Weight (lb)
CHAPTER 3 Describing Relationships
AP STATISTICS LESSON 3 – 3 LEAST – SQUARES REGRESSION.
Chapter 5 Residuals, Residual Plots, & Influential points.
Chapter 5 Residuals, Residual Plots, Coefficient of determination, & Influential points.
Verbal SAT vs Math SAT V: mean=596.3 st.dev=99.5 M: mean=612.2 st.dev=96.1 r = Write the equation of the LSRL Interpret the slope of this line Interpret.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
CHAPTER 5 Regression BPS - 5TH ED.CHAPTER 5 1. PREDICTION VIA REGRESSION LINE NUMBER OF NEW BIRDS AND PERCENT RETURNING BPS - 5TH ED.CHAPTER 5 2.
Chapter 3-Examining Relationships Scatterplots and Correlation Least-squares Regression.
CHAPTER 3 Describing Relationships
Unit 4 Lesson 3 (5.3) Summarizing Bivariate Data 5.3: LSRL.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 3: Describing Relationships Section 3.2 Least-Squares Regression.
Unit 3 Correlation. Homework Assignment For the A: 1, 5, 7,11, 13, , 21, , 35, 37, 39, 41, 43, 45, 47 – 51, 55, 58, 59, 61, 63, 65, 69,
1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points.
Residuals, Residual Plots, & Influential points. Residuals (error) - The vertical deviation between the observations & the LSRL always zerothe sum of.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Unit 4 LSRL.
LSRL.
Least Squares Regression Line.
Statistics 101 Chapter 3 Section 3.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
A little VOCAB.
Warm-up: This table shows a person’s reported income and years of education for 10 participants. The correlation is .79. State the meaning of this correlation.
Unit 4 Lesson 4 (5.4) Summarizing Bivariate Data
Chapter 5 Lesson 5.3 Summarizing Bivariate Data
Chapter 5 LSRL.
LSRL Least Squares Regression Line
The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several.
Chapter 4 Correlation.
Chapter 3.2 LSRL.
Describe the association’s Form, Direction, and Strength
Regression and Residual Plots
Residuals, Residual Plots, and Influential points
AP Stats: 3.3 Least-Squares Regression Line
Least-Squares Regression
Least Squares Regression Line LSRL Chapter 7-continued
residual = observed y – predicted y residual = y - ŷ
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Least-Squares Regression
Chapter 2 Looking at Data— Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Influential points.
Chapter 5 LSRL.
Chapter 5 LSRL.
Chapter 5 LSRL.
Chapter 3: Describing Relationships
Least-Squares Regression
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Section 3.2: Least Squares Regressions
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
A medical researcher wishes to determine how the dosage (in mg) of a drug affects the heart rate of the patient. Find the correlation coefficient & interpret.
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Homework: PG. 204 #30, 31 pg. 212 #35,36 30.) a. Reading scores are predicted to increase by for each one-point increase in IQ. For x=90: 45.98;
CHAPTER 3 Describing Relationships
Presentation transcript:

Residuals, Residual Plots, & Influential points Chapter 5 Residuals, Residual Plots, & Influential points

Residuals (error) - The vertical deviation between the observations & the LSRL the sum of the residuals is always zero error = observed - expected

Residual plot A scatterplot of the (x, residual) pairs. Residuals can be graphed against other statistics besides x Purpose is to tell if a linear association exist between the x & y variables If no pattern exists between the points in the residual plot, then the association is linear.

Linear Not linear

Residuals x Age Range of Motion 35 154 24 142 40 137 31 133 28 122 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age & range of motion? Sketch a residual plot. x Residuals Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion

Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Plot the residuals against the y-hats. How does this residual plot compare to the previous one? Residuals

Residual plots are the same no matter if plotted against x or y-hat. Residuals Residuals Residual plots are the same no matter if plotted against x or y-hat.

Coefficient of determination- gives the proportion of variation in y that can be attributed to an approximate linear relationship between x & y remains the same no matter which variable is labeled x

Sum of the squared residuals (errors) using the mean of y. Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Let’s examine r2. Suppose you were going to predict a future y but you didn’t know the x-value. Your best guess would be the overall mean of the existing y’s. Sum of the squared residuals (errors) using the mean of y. SSy = 1564.917

Sum of the squared residuals (errors) using the LSRL. Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (y-hat). Sum of the squared residuals (errors) using the LSRL. SSy = 1085.735

Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 SSy = 1564.917 SSy = 1085.735 By what percent did the sum of the squared error go down when you went from just an “overall mean” model to the “regression on x” model? This is r2 – the amount of the variation in the y-values that is explained by the x-values.

How well does age predict the range of motion after knee surgery? Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 How well does age predict the range of motion after knee surgery? Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of age and range of motion.

Interpretation of r2 Approximately r2% of the variation in y can be explained by the LSRL of x & y.

Be sure to convert r2 to decimal before taking the square root! Computer-generated regression analysis of knee surgery data: Predictor Coef Stdev T P Constant 107.58 11.12 9.67 0.000 Age 0.8710 0.4146 2.10 0.062 s = 10.42 R-sq = 30.6% R-sq(adj) = 23.7% Be sure to convert r2 to decimal before taking the square root! NEVER use adjusted r2! What is the equation of the LSRL? Find the slope & y-intercept. What are the correlation coefficient and the coefficient of determination?

Outlier – In a regression setting, an outlier is a data point with a large residual

Influential point- A point that influences where the LSRL is located If removed, it will significantly change the slope of the LSRL

Racket Resonance Acceleration (Hz) (m/sec/sec) 1 105 36.0 2 106 35.0 3 110 34.5 4 111 36.8 5 112 37.0 6 113 34.0 7 113 34.2 8 114 33.8 9 114 35.0 10 119 35.0 11 120 33.6 12 121 34.2 13 126 36.2 14 189 30.0 One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact. Sketch a scatterplot of these data. Calculate the LSRL & correlation coefficient. Does there appear to be an influential point? If so, remove it and then calculate the new LSRL & correlation coefficient.

(189,30) could be influential. Remove & recalculate LSRL

(189,30) was influential since it moved the LSRL

Which of these measures are resistant? LSRL Correlation coefficient Coefficient of determination NONE – all are affected by outliers