Week 5 Lecture 1, Chapter 7: Linear Regression (Residuals and R², the Variation Accounted For)

Residuals A residual is the difference between an observed value of the response and the value predicted by the regression line. That is, residual = observed y − predicted y = y − ŷ. We denote a residual with the lower-case letter e: e = y − ŷ. Some residuals are negative and some are positive, and some are very close to zero (or exactly zero, when there is no error of prediction). The mean (and the sum) of the residuals is always zero. A residual is positive when y > ŷ, which means the line underestimated the observed value; it is negative when y < ŷ, which means the line overestimated the observed value; and it is zero when y = ŷ (no error of prediction).
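The arithmetic in this slide can be sketched in a few lines of Python (the lecture itself uses StatCrunch; the data below are hypothetical, not the lecture's marijuana/drug table):

```python
# Sketch: residuals e = y - y_hat for a least-squares line (hypothetical data).
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept.
xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

y_hat = [a + b * xi for xi in x]                      # predicted values
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]   # observed - predicted

# For a least-squares fit the residuals always sum (and average) to zero.
total = sum(residuals)
```

That the residuals sum to exactly zero is a useful sanity check on any least-squares fit.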

Example A survey conducted in the United States and 10 countries of Western Europe determined the percentage of teenagers who had used marijuana and other drugs. The results are summarized in the table. We saw (based on the scatterplot) that the regression model was appropriate. The regression equation (fitted line) was: ŷ = −3.068 + 0.615x

For the USA, the percent of marijuana use was x = 34 and the percent of other-drug use was y = 24. The predicted percent of other-drug use is: ŷ = −3.068 + 0.615(34) = 17.84. The residual (observed − predicted) is: 24 − 17.84 = 6.16 (%). The residual is positive because y > ŷ, which means the line underestimated the percent of teens who use other drugs.
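The USA calculation above is easy to verify directly; a minimal sketch using the slide's fitted coefficients:

```python
# Reproducing the slide's USA numbers with the fitted line
# y_hat = -3.068 + 0.615 x  (x = marijuana %, y = other-drug %).
x_usa, y_usa = 34, 24

y_hat_usa = -3.068 + 0.615 * x_usa   # predicted other-drug percent
residual_usa = y_usa - y_hat_usa     # observed minus predicted

# y_hat_usa is 17.842 (the slide rounds to 17.84); the residual is 6.158
# (rounded to 6.16), positive because the observed value exceeds the prediction.
```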

In StatCrunch The table below shows the residual values for the data, computed in StatCrunch. StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%; Y-variable: Other drug%; Save: Residuals; click Compute.

Residual Plots Residual plots help us assess the regression model assumptions: we check the assumptions about the random errors using the residuals. There are three assumptions to check:
1. The residuals are approximately normally distributed. Check a normal quantile (QQ) plot or a histogram of the residuals (or of the standardized residuals).
2. The residuals have mean zero. The residual points should scatter randomly around the zero line (the mean of the residuals) in a plot of residuals versus the predictor or the fitted values. This is a scatterplot, except that here we do not want to see any obvious pattern.
3. The residuals have constant variance. The residual points should be evenly spread out around the zero line in the same plot, with no pattern such as fanning (an increasing dispersion as the fitted values increase).
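The plot-based checks for assumptions 2 and 3 can be approximated numerically; a sketch on hypothetical, roughly linear data (assumption 1, normality, is best judged from the QQ plot or histogram itself):

```python
# Numeric spot-checks that mirror what the residual plots show.
# Hypothetical data: a straight line plus small alternating noise.
from statistics import mean, pstdev

x = list(range(1, 11))
noise = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.1, -0.2]
y = [2 * xi + 1 + d for xi, d in zip(x, noise)]

xb, yb = mean(x), mean(y)
b = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / sum((u - xb) ** 2 for u in x)
a = yb - b * xb
res = [v - (a + b * u) for u, v in zip(x, y)]

# Assumption 2 (mean zero): holds exactly for any least-squares fit.
mean_res = mean(res)
# Assumption 3 (constant variance): the spread of the residuals over the
# larger x-values should be comparable to the spread over the smaller ones.
spread_ratio = pstdev(res[5:]) / pstdev(res[:5])
```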

Checking the Assumptions in StatCrunch StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%; Y-variable: Other drug%; Graphs: Histogram of residuals, QQ plot of residuals, Residuals vs X-values; click Compute.

Histogram of Residuals

Checking the Assumption of Normality There is no major departure from the straight line in the QQ plot; therefore, the assumption that the residuals are normally distributed is met.

Checking the Assumptions of Mean Zero and Constant Variance Assumption #2: The residual points scatter randomly around the zero horizontal line (mean zero for the residuals), with no major pattern; therefore, the assumption that the residuals have mean zero is met. Assumption #3: The residual points are evenly spread out around the zero line; therefore, the assumption that the residuals have constant variance is met.

Regression Model Is Correct: All Three Assumptions Are Met.

Example of Regression Model NOT Correct The curvature pattern in the residual plot suggests the need for a higher-order model or a transformation.
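A sketch of the curvature situation, using hypothetical data that is exactly quadratic (so a straight line is the wrong model):

```python
# Sketch: curvature in the residual plot, fixed by a higher-order model.
# Hypothetical data: y = x^2 exactly.
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    xb, yb = mean(xs), mean(ys)
    b = sum((u - xb) * (v - yb) for u, v in zip(xs, ys)) / \
        sum((u - xb) ** 2 for u in xs)
    return yb - b * xb, b

x = list(range(1, 9))
y = [xi ** 2 for xi in x]

# Straight-line fit: the residuals trace a U shape
# (positive at both ends, negative in the middle).
a, b = fit_line(x, y)
lin_res = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Regressing y on x^2 instead (a higher-order term) removes the pattern:
# the residuals vanish.
a2, b2 = fit_line([xi ** 2 for xi in x], y)
quad_res = [yi - (a2 + b2 * xi ** 2) for xi, yi in zip(x, y)]
```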

Example of Regression Model NOT Correct There is a trend in dispersion: an increasing dispersion as the fitted values increase. In this case a transformation of the response may help; for example, taking the log or the square root.
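A sketch of the fanning situation on hypothetical data with multiplicative error, showing how taking logs stabilizes the spread:

```python
# Sketch: a log transformation can stabilize increasing spread (fanning).
# Hypothetical data: y = x times an alternating factor of 1.1 or 0.9,
# so the raw deviations grow in proportion to x.
from math import log
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    xb, yb = mean(xs), mean(ys)
    b = sum((u - xb) * (v - yb) for u, v in zip(xs, ys)) / \
        sum((u - xb) ** 2 for u in xs)
    return yb - b * xb, b

def residuals(xs, ys):
    a, b = fit_line(xs, ys)
    return [v - (a + b * u) for u, v in zip(xs, ys)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi * (1.1 if xi % 2 else 0.9) for xi in x]

raw = residuals(x, y)
logged = residuals([log(u) for u in x], [log(v) for v in y])

# Spread of the last four residuals relative to the first four:
raw_ratio = pstdev(raw[4:]) / pstdev(raw[:4])        # well above 1: fanning
log_ratio = pstdev(logged[4:]) / pstdev(logged[:4])  # close to 1: stabilized
```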

R², the Coefficient of Determination The square of the correlation, r², is the fraction of the variation in the values of the response (y) that is explained by the least-squares regression of y on x (the explanatory variable). In our example: r = 0.93, so r² = (0.93)² ≈ 0.87. 1 − R² is the proportion of the variability in y not explained by the linear relationship with x (left in the residuals). In our example: 1 − R² = 1 − 0.87 = 0.13. Interpretation: about 87% of the variation in the percent of teens who used other drugs (other than marijuana) is explained by the linear regression on the percent of teens who used marijuana. These numbers can be checked against (in fact, read directly from) the StatCrunch output.
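The two routes to R², squaring the correlation and computing one minus the unexplained fraction of variation, always agree; a sketch on hypothetical data:

```python
# Sketch: two equivalent ways to get R^2 (hypothetical data, not the
# slide's marijuana/drug table).
from math import sqrt
from statistics import mean

x = [2, 4, 6, 8, 10]
y = [1.5, 3.0, 4.0, 6.5, 7.0]

xb, yb = mean(x), mean(y)
sxy = sum((u - xb) * (v - yb) for u, v in zip(x, y))
sxx = sum((u - xb) ** 2 for u in x)
syy = sum((v - yb) ** 2 for v in y)

r = sxy / sqrt(sxx * syy)       # correlation
b1 = sxy / sxx                  # least-squares slope
b0 = yb - b1 * xb
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r2_from_r = r ** 2              # route 1: square the correlation
r2_from_sse = 1 - sse / syy     # route 2: 1 minus the unexplained fraction
```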

Finding r from R² Recall the association between adult smoker percentage and ACT scores. The regression function was: Adult Smokers % = 45.348999 − 1.2272345 × ACT, with R² = 0.20. What is r (the estimate of the correlation)? r has the magnitude √R² and the sign of the slope: r = (sign of slope) × √R² = −√0.20 ≈ −0.45.
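The rule on this slide, expressed in code (the slope and R² values are the ones given above):

```python
# Sketch: r has the magnitude sqrt(R^2) and the sign of the slope.
from math import sqrt

r_squared = 0.20
slope = -1.2272345   # slope of Adult Smokers % on ACT, from the slide

r = (1 if slope > 0 else -1) * sqrt(r_squared)
# r is about -0.45, matching the slide.
```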

Steps in Doing Regression
1. Start with a scatterplot. If the scatterplot does not look like a straight-line relationship, stop.
2. Otherwise, calculate the correlation and the intercept and slope of the regression line.
3. Check whether the regression is OK by looking at plots of the residuals against anything relevant.
4. If it is not OK, do not use the regression; we cannot say that the explanatory variable is a useful predictor.
Our aim: a regression for which the line is OK, confirmed by looking at the scatterplot and the residual plots.
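The fit-then-check part of this workflow can be sketched as follows (hypothetical data; the scatterplot and residual-plot inspections themselves are done by eye):

```python
# Minimal sketch of the workflow: fit the line, then compute the residuals
# that the residual plots would display.
from statistics import mean

def least_squares(xs, ys):
    """Return the intercept and slope of the least-squares line."""
    xb, yb = mean(xs), mean(ys)
    b = sum((u - xb) * (v - yb) for u, v in zip(xs, ys)) / \
        sum((u - xb) ** 2 for u in xs)
    return yb - b * xb, b

# Hypothetical, roughly linear data (step 1: the scatterplot looks straight).
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

# Step 2: fit the line.
b0, b1 = least_squares(x, y)

# Step 3: compute the residuals; if they look patternless when plotted
# against x or the fitted values, the line is OK to use (step 4).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```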