Regression Wisdom.  Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can’t take it for granted.)

Slides:



Advertisements
Similar presentations
Statistical Methods Lecture 28
Advertisements

Chapter 9. * No regression analysis is complete without a display of the residuals to check the linear model is reasonable. * The residuals are what is.
 Objective: To identify influential points in scatterplots and make sense of bivariate relationships.
Correlation and Linear Regression
Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.
Extrapolation: Reaching Beyond the Data
Regression Wisdom Chapter 9.
CHAPTER 8: LINEAR REGRESSION
Chapter 17 Understanding Residuals © 2010 Pearson Education 1.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.4 Cautions in Analyzing.
Regression Wisdom.
Chapter 9: Regression Wisdom
Getting to Know Your Scatterplot and Residuals
Chapter 9 Regression Wisdom
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 1 Chapter 8 Regression Wisdom.
Class 7: Thurs., Sep. 30. Outliers and Influential Observations Outlier: Any really unusual observation. Outlier in the X direction (called high leverage.
Chapter 9: Regression Alexander Swan & Rafey Alvi.
Chapter 5 Regression. Chapter 51 u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We.
2.4: Cautions about Regression and Correlation. Cautions: Regression & Correlation Correlation measures only linear association. Extrapolation often produces.
The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.
Copyright © 2010 Pearson Education, Inc. Chapter 9 Regression Wisdom.
Chapter 9 Regression Wisdom
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 9 Regression Wisdom.
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 9 Regression Wisdom.
Linear Regression Chapter 8.
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
Chapters 8 & 9 Linear Regression & Regression Wisdom.
Chapter 14 Inference for Regression © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Chapter 3.3 Cautions about Correlations and Regression Wisdom.
WARM-UP Do the work on the slip of paper (handout)
Copyright © 2010 Pearson Education, Inc. Chapter 9 Regression Wisdom.
AP STATISTICS LESSON 4 – 2 ( DAY 1 ) Cautions About Correlation and Regression.
Slide 9-1 Copyright © 2004 Pearson Education, Inc.
Chapter 2 Examining Relationships.  Response variable measures outcome of a study (dependent variable)  Explanatory variable explains or influences.
Copyright © 2010 Pearson Education, Inc. Slide The lengths of individual shellfish in a population of 10,000 shellfish are approximately normally.
Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 8- 1.
Chapter 9 Regression Wisdom math2200. Sifting residuals for groups Residuals: ‘left over’ after the model How to examine residuals? –Residual plot: residuals.
Stats Project Chapter 9 By: Rebecca Scott and Sarah Musgrave.
Chapter 9 Regression Wisdom
Regression Wisdom Chapter 9. Getting the “Bends” Linear regression only works for linear models. (That sounds obvious, but when you fit a regression,
Regression Wisdom. Getting the “Bends”  Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can’t.
Residuals, Influential Points, and Outliers
Individual observations need to be checked to see if they are: –outliers; or –influential observations Outliers are defined as observations that differ.
Chapter 9 Regression Wisdom. Getting the “Bends” Linear regression only works for data with a linear association. Curved relationships may not be evident.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 9 Regression Wisdom.
Regression Wisdom Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Do Now Examine the two scatterplots and determine the best course of action for other countries to help increase the life expectancies in the countries.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 3: Describing Relationships Section 3.2 Least-Squares Regression.
Statistics 9 Regression Wisdom. Getting the “Bends” Linear regression only works for linear models. (That sounds obvious, but when you fit a regression,
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 1 Chapter 8 Regression Wisdom.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 9 Regression Wisdom.
AP Statistics.  Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can’t take it for granted.)
Unit 4 Lesson 4 (5.4) Summarizing Bivariate Data
Chapter 5 Lesson 5.3 Summarizing Bivariate Data
Chapter 9 Regression Wisdom Copyright © 2010 Pearson Education, Inc.
Cautions about Correlation and Regression
Regression Wisdom Chapter 9.
Chapter 8 Regression Wisdom.
a= b= WARM - UP Variable Coef StDev T P
1. Describe the Form and Direction of the Scatterplot.
Residuals, Influential Points, and Outliers
Chapter 3.2 Regression Wisdom.
Chapter 9 Regression Wisdom.
Chapter 9 Regression Wisdom.
Presentation transcript:

Regression Wisdom

 Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can’t take it for granted.)  A curved relationship between two variables might not be apparent when looking at a scatterplot alone, but will be more obvious in a plot of the residuals. ◦ Remember, we want to see “nothing” in a plot of the residuals.

 Here’s an important unstated condition for fitting models: All the data must come from the same group.  The figure shows regression lines fit to calories and sugar for each of the three cereal shelves in a supermarket :

 We cannot assume that a linear relationship in the data exists far beyond the range of the data.  Once we venture into new x territory, such a prediction is called an extrapolation.  Extrapolation is always dangerous. But, when the x-variable in the model is Time, extrapolation becomes an attempt to peer into the future.

 Outlying points can strongly influence a regression. Even a single point far from the body of the data can dominate the analysis.  The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…

 The red line shows the effects that one unusual point can have on a regression:

 A data point can also be unusual if its x-value is far from the mean of the x-values. Such points are said to have high leverage.  A point with high leverage has the potential to change the regression line.  We say that a point is influential if omitting it from the analysis gives a very different model.

The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow! Points with high leverage pull the line close to them, so they often have small residuals.

 The following scatterplot shows that the average life expectancy for a country is related to the number of doctors per person in that country:

 This new scatterplot shows that the average life expectancy for a country is related to the number of televisions per person in that country:

 Since televisions are cheaper than doctors, send TVs to countries with low life expectancies in order to extend lifetimes. Right?  How about considering a lurking variable? That makes more sense… ◦ Countries with higher standards of living have both longer life expectancies and more doctors (and TVs!). ◦ If higher living standards cause changes in these other variables, improving living standards might be expected to prolong lives and increase the numbers of doctors and TVs.

 No matter how strong the association, no matter how large the R 2 value, no matter how straight the line, there is no way to conclude from a regression alone that one variable causes the other. ◦ There’s always the possibility that some third variable is driving both of the variables you have observed.  With observational data, as opposed to data from a designed experiment, there is no way to be sure that a lurking variable is not the cause of any apparent association.

 There are many ways in which a data set may be unsuitable for a regression analysis: ◦ Watch out for subsets in the data. ◦ Examine the residuals to re-check the Straight Enough Condition. ◦ The Outlier Condition means two things:  Points with large residuals or high leverage (especially both) can influence the regression model significantly.  Even a good regression doesn’t mean we should believe the model completely: ◦ Extrapolation far from the mean can lead to silly and useless predictions. ◦ An R 2 value near 100% doesn’t indicate that there is a causal relationship between x and y.  Watch out for lurking variables.

 Page 244 – 249  Problem # 1, 5, 11(part b,c), 15, 19, 21.