
Outliers and Influential Points
Imagine a scatter plot of the heights and weights of adult men. Would it show a positive or a negative association? Should there be a correlation?

Vocabulary Review
Remember:
- Association refers to the visual pattern in a scatter plot (not necessarily linear).
- Correlation, r, is a numerical measure of the direction (positive or negative) and strength of a linear association. It always satisfies -1 ≤ r ≤ 1: the sign of r gives the direction, and |r|, which runs from 0 to 1, gives the strength.
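A short Python sketch of the correlation calculation, using r = (1/(n − 1)) Σ z_x z_y with sample standard deviations. The heights and weights are made-up values for illustration, not real measurements, and the helper names are our own.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson correlation: r = (1 / (n - 1)) * sum of z_x * z_y."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical heights (inches) and weights (lbs), invented for illustration.
heights = [66, 68, 69, 70, 71, 72, 74]
weights = [150, 160, 158, 170, 172, 180, 190]

r = correlation(heights, weights)
print(f"r = {r:.3f}")  # positive and close to 1: a strong positive linear association

# Negating y flips the direction (sign of r) but not the strength (|r|).
print(f"r with y negated = {correlation(heights, [-w for w in weights]):.3f}")
```

The sign of each product z_x z_y records whether a point is on the "same side" of both means, which is why r captures direction; dividing by the standard deviations is what keeps r between -1 and 1.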

Shaquille O'Neal
As a general rule, the taller a person is, the greater his weight (on average). Now add Shaquille O'Neal (7'1", 325 lbs) to the plot. He's an outlier in height and an outlier in weight, but not an outlier with respect to the bivariate relationship; rather, he's an example of it. His point probably lies close to the regression line determined by the others, so the line looks much the same with or without him. His extreme height gives him high leverage, but because he follows the pattern, he doesn't pull the line anywhere new. That means he's not influential.
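To see how an on-pattern point like Shaq's can have high leverage without being influential, here is a hedged Python sketch. It fits the least-squares line to made-up height/weight data, then adds a hypothetical point far out in height that is placed exactly on the fitted line. The data and function names are illustrative assumptions; leverage uses the simple-regression formula h = 1/n + (x − x-bar)² / Σ(x − x-bar)².

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx  # the line always passes through (x-bar, y-bar)
    return a, b


def leverage(xs, i):
    """Leverage of point i in simple linear regression: 1/n + (x_i - x-bar)^2 / Sxx."""
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return 1 / len(xs) + (xs[i] - mx) ** 2 / sxx


# Hypothetical data (inches, lbs), invented for illustration.
heights = [66, 68, 69, 70, 71, 72, 74]
weights = [150, 160, 158, 170, 172, 180, 190]
a, b = least_squares(heights, weights)

# A "Shaq-like" hypothetical point: extreme in x, but exactly on the fitted line.
x_new = 85
y_new = a + b * x_new
xs2, ys2 = heights + [x_new], weights + [y_new]
a2, b2 = least_squares(xs2, ys2)

print(f"without: y-hat = {a:.2f} + {b:.3f}x")
print(f"with:    y-hat = {a2:.2f} + {b2:.3f}x")  # same line as before
print(f"leverage of new point: {leverage(xs2, len(xs2) - 1):.3f}")  # high, yet not influential
```

Because the added point already has a zero residual on the original line, no other line can achieve a smaller sum of squared residuals, so refitting returns the same line. Real data won't sit exactly on the line, but a point close to it behaves similarly.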

Sumo Wrestler
Instead, imagine adding a sumo wrestler of average height but very high weight. The sumo wrestler is an outlier because his point doesn't fit the pattern: it lies far from the line, which shows up as a large residual. Because his point sits directly above the mean point (x-bar, y-bar), through which the least-squares line always passes, his residual is the same no matter what slope the line has. That means the other points determine the slope, and the sumo wrestler isn't influential on it; he only shifts the line up a little. {Recall the correlation formula: for his point x − x-bar = 0, so his product term contributes nothing to the sum. He does increase s_y, though, so r gets weaker even though the slope b = r·(s_y/s_x) stays the same.}
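The sumo-wrestler case can be checked numerically. This hypothetical Python sketch adds a made-up point at exactly the mean height of invented data, with a very large weight, and compares the slope, intercept, and r before and after. It illustrates the argument under those assumptions; it is not real data.

```python
import math


def fit_and_r(xs, ys):
    """Return (a, b, r): least-squares intercept, slope, and correlation."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)  # equivalent to the z-score formula for r
    return a, b, r


# Hypothetical data (inches, lbs); the mean height here is exactly 70 inches.
heights = [66, 68, 69, 70, 71, 72, 74]
weights = [150, 160, 158, 170, 172, 180, 190]
a, b, r = fit_and_r(heights, weights)

# A "sumo-like" hypothetical point: average height, very heavy.
x_bar = sum(heights) / len(heights)
a2, b2, r2 = fit_and_r(heights + [x_bar], weights + [300])

print(f"slope:     {b:.3f} -> {b2:.3f}")  # unchanged: x - x-bar = 0 for the new point
print(f"intercept: {a:.2f} -> {a2:.2f}")  # the line shifts up
print(f"r:         {r:.3f} -> {r2:.3f}")  # r weakens because the spread in y grows
```

The slope survives because the new point adds zero to both Σ(x − x-bar)(y − y-bar) and Σ(x − x-bar)², while the extra spread in y lowers r.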

Manute Bol
Finally, imagine adding Manute Bol, perhaps the skinniest center in NBA history at 7'6" and only 200 pounds. He is very tall but surprisingly light, so he doesn't fit the pattern: he's an outlier. He'd be very far from the line determined by the rest of the data (a potentially huge residual), but if we include him in the regression, the line tilts down toward his point in order to minimize the sum of squared residuals. Because his height is far from the mean, he has high leverage, and that ability to change the line's slope a lot makes Manute an influential point.
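A similar hypothetical Python sketch for the Manute Bol case: a made-up point that is very tall (90 inches) but light (200 lbs) is added to invented data, and we compare the slope, the point's leverage, and its residual before and after refitting. The data and helper names are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def leverage(xs, i):
    """Leverage of point i in simple linear regression: 1/n + (x_i - x-bar)^2 / Sxx."""
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return 1 / len(xs) + (xs[i] - mx) ** 2 / sxx


# Hypothetical data (inches, lbs), invented for illustration.
heights = [66, 68, 69, 70, 71, 72, 74]
weights = [150, 160, 158, 170, 172, 180, 190]
a, b = least_squares(heights, weights)

# A "Manute-like" hypothetical point: very tall but light.
x_new, y_new = 90, 200
xs2, ys2 = heights + [x_new], weights + [y_new]
a2, b2 = least_squares(xs2, ys2)
h = leverage(xs2, len(xs2) - 1)

resid_old = y_new - (a + b * x_new)    # residual from the line fitted without him
resid_new = y_new - (a2 + b2 * x_new)  # residual after he pulls the line toward himself

print(f"slope without: {b:.3f}, with: {b2:.3f}")  # slope drops sharply
print(f"leverage of new point: {h:.3f} (average leverage is 2/n = {2 / len(xs2):.3f})")
print(f"residual: {resid_old:.1f} from old line, {resid_new:.1f} from refitted line")
```

Refitting with and without the suspect point is a direct way to judge influence. Note that the refitted line moves so far toward the influential point that its own residual shrinks, so a small residual alone does not prove a point is harmless.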