Chapter 9 Regression Wisdom

Presentation transcript:

* Subsets
* Extrapolation
* Outliers, Leverage, and Influence Points
* Lurking Variables

Subsets
* The data should be homogeneous (of the same or a similar kind or nature).
* If the data are made up of two or more groups that have been thrown together, it is usually best to fit a different linear model to each group.
* Residual plots can help find subsets in the data.
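
A minimal sketch of fitting one line per subgroup, using made-up numbers (not the cereal data). The helper names `fit_line` and `fit_by_group` and the group labels "A" and "B" are assumptions for illustration only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_by_group(rows):
    """Fit a separate least-squares line to each group label."""
    groups = {}
    for x, y, label in rows:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}

# Hypothetical data: (x, y, group). Two groups thrown together.
rows = [
    (1, 12, "A"), (2, 14, "A"), (3, 16, "A"), (4, 18, "A"),
    (1, 2, "B"), (2, 3, "B"), (3, 4, "B"), (4, 5, "B"),
]

pooled = fit_line([r[0] for r in rows], [r[1] for r in rows])
by_group = fit_by_group(rows)

print("pooled (slope, intercept):", pooled)
for label, model in by_group.items():
    print(label, "(slope, intercept):", model)
```

In this toy data the pooled line describes neither group well, while each group's own line fits it exactly.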

Cereal – without subgroups

Cereal – with subgroups

Extrapolation
* Although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model.
* Such extrapolation may pretend to see into the future, but the predictions should not be trusted.
* Example: data on the number of women in elected positions in Massachusetts were collected from 1945 – 2000. We should NOT use the model to predict how many women will hold office in 2015.
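
One simple guard, sketched here, is to check whether a new x lies outside the range of x-values used to fit the model. The function name `is_extrapolation` is hypothetical, and the five-year spacing of the years is an assumption. A plain range check is a simplification of "far from the data".

```python
def is_extrapolation(x_new, xs):
    """True when x_new lies outside the range of x-values used to fit the model."""
    return x_new < min(xs) or x_new > max(xs)

# Assumed x-values: every fifth year from 1945 through 2000.
years = list(range(1945, 2001, 5))

for year in (1980, 2015):
    status = "extrapolation, do not trust" if is_extrapolation(year, years) else "within the data"
    print(year, status)
```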

Homework
a: From 1900 – 1940 there is a linear pattern; from 1940 – 1970 the data curve upward; from 1970 – 2000 there is a strong linear pattern.
b: Relatively strong from 1970 – 2000.
c: No, not on the whole graph. If we look only at 1970 – 2000, then yes, there would be a high correlation.
d: No. It's not straight enough.

Homework
a: Plot the data from 1955 – 1995. The scatterplot has a slight curve. Check the residual plot!! The residual plot has a pattern to it, so this is not a good place to use a linear model. If you did find an equation, the predicted value would be 25.3 years.
b: Not too much. The data are not straight enough to use a linear regression.
c: 50 years is too far from the data to make a prediction.

Homework
a: Knowing only the R² value is not enough to justify using a linear regression. We need to check a residual plot and the 3 conditions (straight enough, quantitative variables, and outliers).
b: No, a linear model might not even fit.

Homework
a: For every degree the temperature rises, the cost goes down $2.13.
b: The cost when the temperature is 0°F.
c: Too high; the residual is negative, showing the model overestimates the cost.
d: cost = $111.70
e: actual = $106.70
f: No, the residual plot has a curve to it. The data are probably not linear.
g: No, there would be no change. The relationship does not depend on the units.

Outlier
* Any data point that stands away from the others
* In regression, outliers can be extraordinary in two ways:
  * having a large residual
  * having high leverage

Remember
* Linear models do not fit values with large residuals well.
* Large residuals always need a second look.

Leverage
* Data points whose x-values are far from the mean of x are said to exert leverage on a linear model.
* High-leverage points pull the line close to them:
  * they can have a large effect on the line
  * with high enough leverage, they can completely determine the slope and the y-intercept
  * their residuals can be deceptively small

Leverage Points
* A linear regression line goes through the point (x̄, ȳ).
* Think of this point as the fulcrum of a lever.
* The farther away a point is from the fulcrum, the more leverage it has.
* High leverage gives a point the potential to change the regression line.
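
The lever idea can be put into numbers. The sketch below uses the standard textbook leverage formula for a one-predictor regression, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². That formula is not given in these slides, and the x-values and the name `leverages` are made up for illustration.

```python
from statistics import mean

def leverages(xs):
    """Leverage of each point in a one-predictor regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2)."""
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

# Hypothetical x-values; the last one sits far from the mean of x.
xs = [1, 2, 3, 4, 20]
h = leverages(xs)

for x, h_i in zip(xs, h):
    print(f"x = {x:2d}  leverage = {h_i:.3f}")
```

Leverage depends only on the x-values, which is why a point can have high leverage and still sit right on the line.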

High Leverage Points
* How to decide whether the point will change the regression model:
  * Find the regression model with and without the leverage point.
  * The point is influential if there is a big change in the model.

Influence
* Depends on both leverage and residual
* A high-leverage point whose y-value is on the line: the point is NOT influential
* A moderate-leverage point with a very high residual: the point is influential
* YOU HAVE TO CHECK THE MODELS!!
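
Checking the models can be sketched as fitting the line twice, with and without the point in question. The data are made up, and `fit_line` and `compare_with_and_without` are hypothetical helper names. Both test points have the same high-leverage x-value, so only the residual differs between the two cases.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def compare_with_and_without(xs, ys, index):
    """Fit the model with every point, then again with the point at `index` left out."""
    with_point = fit_line(xs, ys)
    without_point = fit_line(xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:])
    return with_point, without_point

# Hypothetical base data lying exactly on y = 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

# Case 1: far-out x-value, y-value on the line.
on_line = compare_with_and_without(base_x + [15], base_y + [30], 5)
# Case 2: same far-out x-value, y-value well off the line.
off_line = compare_with_and_without(base_x + [15], base_y + [5], 5)

print("on the line   with / without:", on_line)
print("off the line  with / without:", off_line)
```

In case 1 the slope is the same with and without the point, so it is not influential. In case 2 the slope changes sharply, so it is.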

Unusual Points
* Unusual points can sometimes tell us more about a model or data than any other point.
* A model based on 1 point is unlikely to be helpful for understanding the rest of the data.
* Looking at 1 point against the rest of the data is the best way to understand the point.

Warning!
* Do NOT throw away points!!!!
* Take out unusual points to look at the model without them.
* Throwing them away can give us a false sense of how accurate the model is.
* Look for the unusual points in the scatterplot; they can hide in the residual plots.

Checking In
* Each of these scatterplots shows an unusual point. For each, tell whether the point is a high-leverage point, would have a large residual, or is influential.

Causation
* No matter how strong the association…
* No matter how large the R² value…
* No matter how straight the line is…
* you can NOT conclude from the regression alone that one variable CAUSES another

Lurking Variable
* A concern only for observational data, as opposed to data from a designed experiment
* We cannot be sure that a lurking variable is not the cause of a strong or weak association

Life Expectancy
* The relationship between life expectancy (years) and the availability of doctors (measured as √(doctors/person)) for the countries of the world

Life Expectancy
* The relationship between life expectancy (years) and the availability of TVs (measured as √(TVs/person)) for the countries of the world

Warning!! Summarized Data
* Summarized data can give a false sense of how good an association is.
* Means vary less than individual values.
* Weight (lb) against height (in) for a sample of men: R² = 41.5%
* Mean weight (lb) against height (in): R² = 80.1%
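
A small sketch of why summarizing inflates R². The heights and weights below are invented for illustration and are not the sample behind the percentages on the slide. The helper name `r_squared` is also an assumption.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared correlation between xs and ys (R^2 for a one-predictor regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Made-up scatter (lb) around an assumed trend of 4 lb per inch of height.
offsets = {
    64: [-25, -5, 10, 20],
    66: [-20, -10, 5, 30],
    68: [-30, 0, 10, 15],
    70: [-15, -10, 5, 25],
    72: [-20, -5, 0, 20],
}

individual_x, individual_y = [], []
mean_x, mean_y = [], []
for height, offs in offsets.items():
    weights = [4 * height - 110 + o for o in offs]
    individual_x += [height] * len(weights)
    individual_y += weights
    mean_x.append(height)
    mean_y.append(mean(weights))  # one summarized value per height

r2_individuals = r_squared(individual_x, individual_y)
r2_means = r_squared(mean_x, mean_y)
print(f"R^2, individual weights: {r2_individuals:.1%}")
print(f"R^2, mean weight per height: {r2_means:.1%}")
```

Averaging within each height removes most of the person-to-person scatter, so the means hug the line much more closely than the individuals do.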

Homework # 10
* slope = -0.1: for every 1 mph increase in speed, your mpg decreases by 0.1
* y-int = 32: the y-intercept would be your mpg at 0 mph
* the residuals are negative, so the model is overestimating mpg
* 27 mpg
* predicted = 27.5 mpg; 27.5 mpg + 1 (residual) = 28.5 mpg
* strong but not linear
* no, the residual plot shows the data are not linear

Homework # 11 a
1. high leverage, low residual
2. no, not influential to the slope
3. correlation would decrease
4. the slope would stay about the same because the point is on the line

Homework # 11 b
1. high leverage, small residual (remember the point is pulling the line toward it)
2. yes, influential
3. correlation would weaken and become less negative
4. the slope would increase toward 0

Homework # 11 c
1. some leverage, high residual
2. slightly influential
3. correlation would increase because scatter would decrease
4. slope would increase

Homework # 11 d
1. low/no leverage, high residual
2. not influential
3. correlation would become stronger
4. slope would increase

Homework # 15
* stronger; the point has high leverage and is influential, so it's pulling the line toward it
* slope and correlation would both increase
* you could take the humans out. Now your data are for non-human mammals.
* moderately strong
* for every year an animal is expected to live, it has to live 15.5 days in its mother before being born
* 270.4 days

Homework # 16
* hippos would make the association stronger because it is farther from the pattern
* increase
* no, there must be a good reason to take out points
* yes, the slope changed from 15.5 to 11.6. That is a big difference.

Homework # 19
* No! There is a high-leverage point.
* Comparing the models with the point and without the point, there is a large change in R² and the slope.

Homework # 20
* only 7% of the variation in time is accounted for by the regression on year
* we can’t say with such a bad regression
* probably not, the point doesn’t have much leverage
* 15.9% is better; it appears that swimmers are taking 14 minutes off their time each year

Homework # 22
2 subgroups:
* 1965 – 1985: linear and positive
* 1994 – 1998: linear and flat (horizontal)

Homework # 23
a) the graph is clearly nonlinear; however, from about 1972 on there appears to be a positive linear relationship
b) In 2010, CPI = $218.60

Homework # 24
a) not including Costa Rica, the data have a strong negative linear association
b) Costa Rica has 25 babies/woman. It has to be a mistake, because it is impossible.
c) r = -.814 and R² = 66.4% without Costa Rica
d) (plots with Costa Rica and without Costa Rica)
e) the model with Costa Rica is not appropriate; the residual plot has some pattern. Without Costa Rica the residual plot has an even amount of scatter with no pattern.
f) slope: the life expectancy goes down 4.36 years for every baby a woman has. The y-intercept says a woman with no children should live to be 86.8 years old.
g) there could be a lurking variable also affecting life expectancy