Residuals, Residual Plots, and Influential points

Slides:



Advertisements
Similar presentations
Linear Regression (LSRL)
Advertisements

Lesson Diagnostics on the Least- Squares Regression Line.
2nd Day: Bear Example Length (in) Weight (lb)
Scatter Diagrams and Linear Correlation
CHAPTER 3 Describing Relationships
Haroon Alam, Mitchell Sanders, Chuck McAllister- Ashley, and Arjun Patel.
Regression, Residuals, and Coefficient of Determination Section 3.2.
C HAPTER 3: E XAMINING R ELATIONSHIPS. S ECTION 3.3: L EAST -S QUARES R EGRESSION Correlation measures the strength and direction of the linear relationship.
AP Statistics Chapter 8 & 9 Day 3
3.3 Least-Squares Regression.  Calculate the least squares regression line  Predict data using your LSRL  Determine and interpret the coefficient of.
Chapter 5 Residuals, Residual Plots, & Influential points.
Chapter 5 Residuals, Residual Plots, Coefficient of determination, & Influential points.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
CHAPTER 5 Regression BPS - 5TH ED.CHAPTER 5 1. PREDICTION VIA REGRESSION LINE NUMBER OF NEW BIRDS AND PERCENT RETURNING BPS - 5TH ED.CHAPTER 5 2.
3.2 Least-Squares Regression Objectives SWBAT: INTERPRET the slope and y intercept of a least-squares regression line. USE the least-squares regression.
Linear Regression Day 1 – (pg )
^ y = a + bx Stats Chapter 5 - Least Squares Regression
CHAPTER 3 Describing Relationships
1.5 Linear Models Warm-up Page 41 #53 How are linear models created to represent real-world situations?
Chapters 8 Linear Regression. Correlation and Regression Correlation = linear relationship between two variables. Summarize relationship with line. Called.
Unit 3 Correlation. Homework Assignment For the A: 1, 5, 7,11, 13, , 21, , 35, 37, 39, 41, 43, 45, 47 – 51, 55, 58, 59, 61, 63, 65, 69,
1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points.
Residuals, Residual Plots, & Influential points. Residuals (error) - The vertical deviation between the observations & the LSRL always zerothe sum of.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Unit 4 LSRL.
LSRL.
Least Squares Regression Line.
Sections Review.
Statistics 101 Chapter 3 Section 3.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
A little VOCAB.
Unit 4 Lesson 4 (5.4) Summarizing Bivariate Data
Chapter 5 Lesson 5.3 Summarizing Bivariate Data
Chapter 5 LSRL.
LSRL Least Squares Regression Line
The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several.
Chapter 3.2 LSRL.
Suppose the maximum number of hours of study among students in your sample is 6. If you used the equation to predict the test score of a student who studied.
Regression and Residual Plots
Examining Relationships
AP Stats: 3.3 Least-Squares Regression Line
Least-Squares Regression
Least Squares Regression Line LSRL Chapter 7-continued
Chapter 8 Linear Regression
residual = observed y – predicted y residual = y - ŷ
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
^ y = a + bx Stats Chapter 5 - Least Squares Regression
CHAPTER 3 Describing Relationships
Residuals, Residual Plots, & Influential points
Least Squares Regression
Influential points.
Chapter 5 LSRL.
Chapter 5 LSRL.
Chapter 5 LSRL.
Least-Squares Regression
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Section 3.2: Least Squares Regressions
CHAPTER 3 Describing Relationships
A medical researcher wishes to determine how the dosage (in mg) of a drug affects the heart rate of the patient. Find the correlation coefficient & interpret.
Correlation/Regression - part 2
CHAPTER 3 Describing Relationships
Presentation transcript:

Residuals, Residual Plots, and Influential points Chapter 5 Residuals, Residual Plots, and Influential points

Residuals Vertical distance between data & LSRL Sum of residuals is always zero Residual = observed – expected

Residual Plot Scatterplot of (x, residual) points Purpose: See if x and y have a linear relationship No pattern in the residual plot  linear relationship

Linear Not Linear

Residuals Get the regression equation on the home screen (Stat  Calc  8) Stat  Edit Highlight "L3" List (2nd  Stat) , choose 7: RESID Bottom of screen says "L3 = LRESID"  Hit Enter    Residual Plot: With residuals in L3, make a scatterplottwith:   Xlist: L1  Ylist: L3 

Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age & range of motion? Sketch a residual plot. x Residuals No pattern in the residual plot, so there is a linear relationship between age and range of motion.

Sketch a residual plot using the y-hat values on the horizontal axis. Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Sketch a residual plot using the y-hat values on the horizontal axis. Residuals

Residual plots are the same plotted against x or y-hat. Residuals Residuals Residual plots are the same plotted against x or y-hat.

Coefficient of Determination Proportion of variation in y that can be explained by the linear relationship between x & y Remains the same no matter which variable is called x

Sum of the squared residuals (errors) using the mean of y Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Let’s examine r2. Suppose we want to predict a y-value, but we don't have an x-value. Best guess: the mean of the y’s we have. Sum of the squared residuals (errors) using the mean of y SSEy = 1564.917

Sum of the squared residuals (errors) using the LSRL Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 Now suppose we want to predict a y-value and we DO have an x-value  we can use y-hat on the LSRL for that x-value. Sum of the squared residuals (errors) using the LSRL SSEy = 1085.735

Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 SSEy = 1564.917 SSEy = 1085.735 How much does the sum of the squared error go down when we go from the "mean model" to the "regression model"? This is r2 – the amount of the variation in the y-values that is explained by the x-values

Age Range of Motion 35 154 24 142 40 137 31 133 28 122 25 126 26 135 16 135 14 108 20 120 21 127 30 122 How well does age predict the range of motion after knee surgery? Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the LSRL of age and range of motion.

Interpretation of r2 Approximately r2% of the variation in y can be explained by the LSRL of x & y.

What is the equation of the LSRL? Find the slope & y-intercept. Computer-generated regression analysis of knee surgery data: Predictor Coef Stdev T P Constant 107.58 11.12 9.67 0.000 Age 0.8710 0.4146 2.10 0.062 s = 10.42 R-sq = 30.6% R-sq(adj) = 23.7% NEVER use adjusted r2! What are the correlation coefficient and the coefficient of determination? What is the equation of the LSRL? Find the slope & y-intercept.

Outlier In regression, an outlier is a point with a large residual

Influential Point Influences where the LSRL is located If removed, slope of LSRL changes significantly

Racket Resonance Acceleration (Hz) (m/sec/sec) 1 105 36.0 2 106 35.0 3 110 34.5 4 111 36.8 5 112 37.0 6 113 34.0 7 113 34.2 8 114 33.8 9 114 35.0 10 119 35.0 11 120 33.6 12 121 34.2 13 126 36.2 14 189 30.0 One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact. Sketch a scatterplot of these data. Calculate the LSRL & correlation coefficient. Does there appear to be an influential point? If so, remove it and calculate the new LSRL & correlation coefficient.

(189, 30) could be influential. Remove & recalculate LSRL.

(189, 30) was influential since it moved the LSRL

Which of these measures are resistant? LSRL Correlation coefficient (r) Coefficient of determination (r2) NONE – all are affected by outliers