Week 5 Lecture 2 Chapter 8. Regression Wisdom.


Percentage of Men Smokers (18–24 years of age) from 1965 through 2009
The Centers for Disease Control and Prevention (CDC) tracks cigarette smoking in the US. How has the percentage of people who smoke changed since the danger became clear during the last half of the 20th century?

Percentage of Men Smokers (18–24 years of age) from 1965 through 2009
The scatterplot shows the percentage of smokers among men 18–24 years of age, as estimated by surveys, from 1965 through 2009. The percentage decreased dramatically between 1965 and 1990, but the trend has not been consistent since then: the association between smoking percentage and year is very strong from 1965 to 1990 and erratic after 1990. Because the relationship is not straight, a linear model is not appropriate for this trend.
The regression equation is: Male smoking % = 986.99552 - 0.47919438 Year
R-sq = 0.7047499 (70.47%)
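To see what the fitted line implies, the equation above can be evaluated at the two ends of the study period. This is a minimal Python sketch: the coefficients are copied from the regression output on this slide, while the helper name `predicted_male_smoking_pct` is hypothetical and only used for illustration.

```python
import math

# Coefficients copied from the regression output on this slide.
INTERCEPT = 986.99552
SLOPE = -0.47919438   # percentage points per year
R_SQUARED = 0.7047499


def predicted_male_smoking_pct(year):
    """Fitted smoking percentage for a given year (hypothetical helper)."""
    return INTERCEPT + SLOPE * year


# Fitted values at the start and end of the study period.
start = predicted_male_smoking_pct(1965)
end = predicted_male_smoking_pct(2009)

# The slope is negative, so the correlation is the negative root of R-sq.
r = -math.sqrt(R_SQUARED)

print(f"Fitted 1965: {start:.2f}%")
print(f"Fitted 2009: {end:.2f}%")
print(f"r = {r:.4f}")
```

These are values predicted by the line, not observed survey percentages. Since the relationship is not straight, they are only a rough summary of the trend.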

Checking the Assumptions of the Regression Model
The residuals are approximately normally distributed.

Checking the Assumptions of the Regression Model
Plot: Residuals vs. Predictor Variable (Year)
The nonlinearity is more prominent in this plot. The residuals are not randomly scattered around the zero line and are not evenly spread out; instead, they follow a curved pattern. The linear regression model is therefore not appropriate.

Checking the Assumptions of the Regression Model
No regression analysis is complete without a display of the residuals to check that the linear model is reasonable. Residuals often reveal subtleties that were not clear from a plot of the original data (e.g. a scatterplot of y vs. x). Sometimes they reveal violations of the regression conditions that require our attention. It is good to look at both a histogram of the residuals (or a histogram of the standardized residuals, or a normal QQ plot of the residuals) and a scatterplot of the residuals vs. the predictor variable.
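The residual check described above can be sketched in a few lines of Python. The data here are synthetic and deliberately curved (they are not the smoking data), and NumPy's `polyfit` stands in for whatever software produced the fits on these slides.

```python
import numpy as np

# Synthetic, illustrative data (not the smoking data): a curved relationship.
x = np.arange(0, 11, dtype=float)   # 0, 1, ..., 10
y = x ** 2                          # deliberately nonlinear

# Least-squares line: np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# With an intercept in the model, least-squares residuals sum to zero.
print("Sum of residuals:", round(float(residuals.sum()), 6))

# A curved pattern shows up as runs of same-signed residuals:
# positive at both ends, negative in the middle.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:4.1f}  residual = {ri:7.2f}")
```

A linear fit to these points can still have a high R-sq, which is why the residuals need to be examined separately.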

Percentage of Both Men and Women Smokers (18–24 years of age) from 1965 through 2009
The Centers for Disease Control and Prevention (CDC) tracks cigarette smoking in the US. How have the percentages of men and women who smoke changed since the danger became clear during the last half of the 20th century?

Scatterplot for Men and Women Smokers (18–24 years of age) from 1965 through 2009
Smoking rates for both men and women in the US have decreased significantly over the time period from 1965 to 2009. Smoking rates are generally lower for women than for men. The trend in the smoking rates for women seems a bit straighter than the trend for men. The apparent curvature in the scatterplot for the men could possibly be due to just a few points, and not an indication of a serious violation of the linearity condition.

Scatterplot for Men and Women Smokers (18–24 years of age) from 1965 through 2009
StatCrunch Command: Graph > Scatter Plot
X-variable: Year
Y-variable: Smoking %
Group by: Sex
Grouping Options: Color points by group
Overlay polynomial order: 1
Group properties: Color scheme: Alternate – 7 colors
Click Compute

Men and Women Smokers (18–24 years of age) from 1965 through 2009
Graph on the left: Not taking group into account
Graph on the right: Identified by group (male or female)

Men and Women Smokers (18–24 years of age) from 1965 through 2009
Not taking group into account:
Smoking % = 953.31052 - 0.46382114 Year
Sample size: 34
R (correlation coefficient) = -0.80476796
R-sq = 0.64765148

Analysis of Residual Points
The residuals appear to fall into two groups.

Analysis of Residual Points
An examination of residuals often leads us to discover groups of observations that are different from the rest. A histogram of the residuals might show multiple modes. When we discover that there is more than one group in a regression, we may decide to analyze the groups separately, using a different model for each group.
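The idea of spotting groups in the residuals and then fitting each group separately can be illustrated as follows. This is a sketch on synthetic data with two made-up groups labeled "A" and "B" (not the men and women smoking data); the group labels, intercepts, and slope are assumptions chosen for the example.

```python
import numpy as np

# Synthetic, illustrative data: two groups with the same slope
# but different intercepts (not the smoking data).
x = np.tile(np.arange(10, dtype=float), 2)
group = np.array(["A"] * 10 + ["B"] * 10)
y = np.where(group == "A", 20.0, 10.0) - 0.5 * x

# Pooled fit that ignores the grouping.
slope, intercept = np.polyfit(x, y, 1)
pooled_residuals = y - (intercept + slope * x)

# Residuals split cleanly by group: a sign of two subpopulations.
mean_resid_a = pooled_residuals[group == "A"].mean()
mean_resid_b = pooled_residuals[group == "B"].mean()
print("Mean pooled residual, group A:", round(float(mean_resid_a), 3))
print("Mean pooled residual, group B:", round(float(mean_resid_b), 3))

# Separate fits, one model per group.
fits = {}
for g in ("A", "B"):
    mask = group == g
    fits[g] = np.polyfit(x[mask], y[mask], 1)  # [slope, intercept]
    print(f"Group {g}: slope = {fits[g][0]:.3f}, intercept = {fits[g][1]:.3f}")
```

In the pooled fit, every residual in one group is positive and every residual in the other is negative, which is the two-mode pattern a histogram of residuals would show.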

Outliers
Any point that stands away from the others can be called an outlier and deserves your special attention. Outlying points can strongly influence a regression. Even a single point far from the body of the data can dominate the analysis.

High Leverage Points
A data point that has an x-value far from the mean of the x-values is called a high leverage point.
Examples:

Influential Observations
A data point is influential if omitting it from the analysis gives a very different model.
Example: Relationship between murder rate and poverty rate for 51 observations (the 50 states plus DC)
Note: DC lies far from the rest of the data and departs from the overall pattern in a different direction.
Dependent Variable: Murder Rate
Independent Variable: Poverty Rate
Murder Rate = -3.6792483 + 0.68731484 Poverty Rate
Sample size: 51
R (correlation coefficient) = 0.4735608
R-sq = 0.22425983
Estimate of error standard deviation: 3.9143851

Omitting the Observation for DC
Example: Relationship between murder rate and poverty rate for the 50 states (excluding DC)
Dependent Variable: Murder Rate
Independent Variable: Poverty Rate
Murder Rate = -0.65671571 + 0.41331907 Poverty Rate
Sample size: 50
R (correlation coefficient) = 0.53936435
R-sq = 0.29091391
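How different the two models are can be checked directly from the reported coefficients. In the sketch below, the intercepts and slopes are copied from the two regression outputs (with and without DC); the `predict` helper and the poverty rates 10, 15, and 20 are hypothetical choices made only for illustration.

```python
# Coefficients copied from the two regression outputs.
WITH_DC = {"intercept": -3.6792483, "slope": 0.68731484}      # 51 observations
WITHOUT_DC = {"intercept": -0.65671571, "slope": 0.41331907}  # 50 observations


def predict(model, poverty_rate):
    """Fitted murder rate for a given poverty rate (hypothetical helper)."""
    return model["intercept"] + model["slope"] * poverty_rate


# Relative change in slope caused by dropping a single observation.
slope_change = (WITHOUT_DC["slope"] - WITH_DC["slope"]) / WITH_DC["slope"]
print(f"Slope change after omitting DC: {slope_change:.1%}")

# Compare fitted values at a few illustrative poverty rates
# (these rates are assumptions chosen for the example).
for rate in (10.0, 15.0, 20.0):
    a = predict(WITH_DC, rate)
    b = predict(WITHOUT_DC, rate)
    print(f"poverty {rate:4.1f}: with DC {a:5.2f}, without DC {b:5.2f}")
```

Removing one observation out of 51 changes the slope by roughly 40 percent, which is what makes DC an influential observation.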

High Leverage Point BUT Not An Influential Observation
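A case like this can be reproduced with synthetic data: a point far out in x that sits on the same line as the rest has high leverage but barely changes the fit. The sketch below uses made-up data and the standard leverage formula for simple linear regression, h_i = 1/n + (x_i - mean)^2 / Sxx; the `leverage` function name is hypothetical.

```python
import numpy as np

# Synthetic, illustrative data: ten points scattered around y = 2 + 3x,
# plus one point far out in x that sits exactly on that same line.
x_body = np.arange(1, 11, dtype=float)
noise = np.array([0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.4, -0.4, 0.1, -0.1])
y_body = 2.0 + 3.0 * x_body + noise

x_far = 30.0
y_far = 2.0 + 3.0 * x_far   # consistent with the overall pattern

x_all = np.append(x_body, x_far)
y_all = np.append(y_body, y_far)


def leverage(x):
    """Leverage h_i for simple linear regression: 1/n + (x_i - mean)^2 / Sxx."""
    dev = x - x.mean()
    return 1.0 / len(x) + dev ** 2 / np.sum(dev ** 2)


h = leverage(x_all)
slope_with, intercept_with = np.polyfit(x_all, y_all, 1)
slope_without, intercept_without = np.polyfit(x_body, y_body, 1)

print("Leverage of the far point:", round(float(h[-1]), 3))
print("Largest leverage among the rest:", round(float(h[:-1].max()), 3))
print("Slope with the far point:   ", round(float(slope_with), 3))
print("Slope without the far point:", round(float(slope_without), 3))
```

The far point has by far the largest leverage, yet the slope is almost the same with and without it, so it is not influential.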

Restricted-range Problem
When one of the variables is restricted (you only look at some of the values), the correlation can be surprisingly low.
We will visit an example from the web, from David Lane: http://davidmlane.com/hyperstat/A68809.html
The demo video is found here: http://onlinestatbook.com/2/describing_bivariate_data/restriction_demo.html
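The effect of restricting the range can also be seen in a small simulation. This is a sketch on synthetic data, not the example from David Lane's site; the seed, sample size, noise level, and the band 4 to 6 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

# Synthetic, illustrative data with a clear linear association.
x = rng.uniform(0.0, 10.0, size=1000)
y = x + rng.normal(0.0, 2.0, size=1000)

# Correlation over the full range of x.
r_full = np.corrcoef(x, y)[0, 1]

# Correlation when only a narrow band of x values is observed.
mask = (x >= 4.0) & (x <= 6.0)
r_restricted = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"Full range:       r = {r_full:.2f} (n = {x.size})")
print(f"Restricted range: r = {r_restricted:.2f} (n = {mask.sum()})")
```

The underlying relationship is identical in both cases. Only the spread of x changes, and with it the share of the variation in y that x can account for.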

Working with Summary Statistics
The first graph shows that there appears to be a strong, positive, linear association between weight (in pounds) and height (in inches) for men. The second graph shows that if, instead of data on individuals, we only had the mean weight for each height value, we would see an even stronger association, because the points are less scattered. Means vary less than individual values, so a plot of means can give a false impression of how well a line summarizes the data, and we risk overestimating or underestimating.
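The same effect appears when the correlation is computed once on individuals and once on per-height means. The sketch below uses synthetic heights and weights; the line `-180 + 5 * height`, the noise level, and the group sizes are assumptions made for the example, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the sketch is reproducible

# Synthetic, illustrative data: 100 men at each height from 64 to 76 inches.
heights = np.repeat(np.arange(64, 77), 100).astype(float)
weights = -180.0 + 5.0 * heights + rng.normal(0.0, 20.0, size=heights.size)

# Correlation using individuals.
r_individuals = np.corrcoef(heights, weights)[0, 1]

# Correlation using only the mean weight at each height.
unique_heights = np.unique(heights)
mean_weights = np.array([weights[heights == h].mean() for h in unique_heights])
r_means = np.corrcoef(unique_heights, mean_weights)[0, 1]

print(f"Individuals: r = {r_individuals:.2f}")
print(f"Means:       r = {r_means:.2f}")
```

Averaging removes most of the person-to-person variation, so the correlation of the means says little about how well height predicts the weight of any one individual.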