Warm-up: This table shows a person’s reported income and years of education for 10 participants. The correlation is .79. State the meaning of this correlation.


Warm-up: This table shows a person’s reported income and years of education for 10 participants. The correlation is .79. State the meaning of this correlation in the context of this problem.

Chapter 3.2: Linear Regression - R² and Residuals

Least-Squares Regression Line
The LSRL is a model used to represent a set of quantitative data. Suppose you find the vertical distance from each data point to the linear model, square those distances, and add them up. The result is called the sum of the squares of the residuals. The Least-Squares Regression Line (LSRL) is the line that minimizes this sum. The equation of the LSRL is $\hat{y} = b_0 + b_1 x$.

$x$ represents the explanatory variable (actual data). $\hat{y}$ represents the predicted y-value. $b_0$ represents the y-intercept. $b_1$ represents the slope.
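To make the "minimizes this sum" idea concrete, here is a minimal Python sketch. It computes the sum of squared residuals for any candidate line and then finds the least-squares line with the standard closed-form formulas. The data values and the function names `sum_sq_residuals` and `lsrl` are invented for illustration and do not come from the slides.

```python
def sum_sq_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances from each point to y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def lsrl(xs, ys):
    """Return (b0, b1), the intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, made up for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = lsrl(xs, ys)
best = sum_sq_residuals(xs, ys, b0, b1)

# Any other line should give a larger sum of squared residuals.
shifted = sum_sq_residuals(xs, ys, b0 + 0.1, b1)
tilted = sum_sq_residuals(xs, ys, b0, b1 + 0.1)
print(b0, b1, best, shifted, tilted)
```

Nudging the intercept or the slope away from the fitted values increases the sum, which is what "least squares" means.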

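Given a set of data, you can calculate the LSRL (without using your calculator!). Knowing the correlation makes this task even easier. Use the following formulas: $b_1 = r \frac{S_y}{S_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$. Also, note that the LSRL always passes through the point $(\bar{x}, \bar{y})$.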
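As a rough illustration of calculating the LSRL from summary statistics, the sketch below uses the standard formulas slope = r (Sy/Sx) and intercept = y-bar - slope × x-bar. The function name `lsrl_from_summary` and the summary statistics are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (b0, b1) for the LSRL given means, standard deviations, and r."""
    b1 = r * s_y / s_x          # slope = r (Sy/Sx)
    b0 = y_bar - b1 * x_bar     # intercept = y-bar - slope * x-bar
    return b0, b1


# Hypothetical summary statistics, not taken from the slides.
b0, b1 = lsrl_from_summary(x_bar=10.0, s_x=2.0, y_bar=50.0, s_y=8.0, r=0.5)

# Plugging x-bar into the line returns y-bar, so the line
# passes through the point (x-bar, y-bar).
prediction_at_mean = b0 + b1 * 10.0
print(b0, b1, prediction_at_mean)
```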

Exercise 1: The correlation (r) between the number of wins by American League baseball teams and the average attendance at their home games for the 2006 season is 0.696.
a) What would you predict about the average attendance for a team that is 2 standard deviations above average in wins? The predicted average attendance is (0.696)(2) = 1.392 standard deviations above average.

b) If a team is 1 standard deviation below average in attendance, what would you predict about the number of games the team has won? The predicted number of wins is (0.696)(1) = 0.696 standard deviations below the average wins.
Exercise 2: Find the LSRL given the summary statistics – Tale of 2 Regressions WKS
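The predictions in Exercise 1 follow from a consequence of the slope formula: when both variables are measured in standard deviations from their means, the predicted response is r times the explanatory value. A short sketch of that rule, with `predicted_z` as a hypothetical helper name:

```python
def predicted_z(r, z_explanatory):
    """Predicted standardized response: r times the standardized explanatory value."""
    return r * z_explanatory


r = 0.696  # correlation between wins and average attendance (from Exercise 1)

# a) Team is 2 standard deviations above average in wins.
z_attendance = predicted_z(r, 2)

# b) Team is 1 standard deviation below average in attendance.
#    The correlation is the same whichever variable is explanatory.
z_wins = predicted_z(r, -1)

print(z_attendance, z_wins)
```

Because r is between -1 and 1, the prediction is never further from the mean (in standard deviations) than the explanatory value was.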

Coefficient of Determination
The coefficient of determination, also called R², is the square of the r-value (correlation). The R² value tells how much of the variation in the response variable is accounted for by the linear regression model. For example, if R² = 1, then 100% of the variability in the response variable is accounted for by the linear model. In other words, the relationship between the two variables is perfectly linear. If R² = 0.95, we can conclude that 95% of the variability in the response variable is accounted for by the linear relationship with the explanatory variable.

Understanding r-squared: a single point simplification Al Coons Buckingham Browne & Nichols School Cambridge, MA Adapted to Peck, Olsen, Devore by Lee Kucera Capistrano Valley High School Mission Viejo, CA

[Figure: plot of y comparing the error in the SSTo model (total sum of squares, based on distances from y-bar) with the error eliminated by the y-hat model (linear equation).]
Proportion of error eliminated by the y-hat model: $r^2 = \dfrac{\text{error eliminated by the } \hat{y} \text{ model}}{\text{error in the SSTo model}}$
r² = proportion of variability accounted for by the given model.
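The "proportion of error eliminated" idea can be checked numerically. The sketch below, using made-up data, computes the total sum of squares around y-bar, the sum of squared residuals around the LSRL, and the proportion eliminated, then compares it with the square of the correlation. All names and data values here are assumptions for illustration.

```python
import math


def fit_and_r_squared(xs, ys):
    """Return (r, proportion of error eliminated by the LSRL)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # error in the y-bar (SSTo) model
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Error that remains after using the y-hat model.
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)
    proportion_eliminated = (syy - ss_resid) / syy
    return r, proportion_eliminated


# Hypothetical data, made up for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, r_squared = fit_and_r_squared(xs, ys)
print(r, r_squared)
```

For this data the total error is 6 and the remaining error is 2.4, so the line eliminates 3.6 / 6 = 60% of the error, which matches r².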

1. Given the following set of data, find the equation of the LSRL, then find and interpret both the correlation and the coefficient of determination. Jet Ski Fatalities (1987-1996)

a. LSRL: fatal = -34.648 + 6.03 (year) (use meaningful variables in your equation rather than x and y, and use proper statistical notation!)
b. Correlation (r-value): 0.938. A correlation of 0.938 indicates that there is a strong, positive, linear relationship between year and number of fatalities.
c. Coefficient of determination (R²): 0.880. An R² value of 0.880 indicates that 88% of the variability in number of fatalities is accounted for by the linear relationship with the year.
d. Give the meaning of the slope. For each additional year, the model predicts that the number of fatalities increases by 6.03, on average.

NOTE: Go over the meaning of slope in the context of the problem. Also explain the formula for the slope, slope = r (Sy/Sx), by showing the Understanding r ppt.

Understanding the Formula slope = r (Sy/Sx) Al Coons Buckingham Browne & Nichols School Cambridge, MA al_coons@bbns.org

[Figure: two panels, each with Sy = 8.6 and Sx = 2.2; one has r = 1 and the other has r = .21.]
When r = 1: slope = r (Sy/Sx) = 1(8.6/2.2) ≈ 3.9
When r = .21: slope = r (Sy/Sx) = .21(8.6/2.2) ≈ .82
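The two slope calculations on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `slope` is an assumption.

```python
def slope(r, s_y, s_x):
    """Slope of the LSRL from the formula slope = r (Sy/Sx)."""
    return r * s_y / s_x


s_y, s_x = 8.6, 2.2  # standard deviations shown on the slide

slope_perfect = slope(1.0, s_y, s_x)   # r = 1
slope_weak = slope(0.21, s_y, s_x)     # r = .21

print(round(slope_perfect, 1), round(slope_weak, 2))
```

With the same spreads in x and y, a weaker correlation gives a flatter line: the slope shrinks in proportion to r.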

A study of class attendance and grades earned among first-year students at a state university showed that in general students who attended a higher percent of their classes earned higher grades. Class attendance explained 16% of the variation in grades among the students. What is the numerical correlation between percent of classes attended and grades earned?
Since R² = 0.16, r is either 0.4 or -0.4. The association is positive (higher attendance goes with higher grades), so r = 0.4.
* Check if r > 0 or r < 0.
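Recovering r from R² takes a square root plus a sign check, since squaring discards the sign. A minimal sketch, with the function name and the `positive_association` flag as illustrative assumptions:

```python
import math


def correlation_from_r_squared(r_squared, positive_association):
    """Return r given R^2 and the direction of the association."""
    magnitude = math.sqrt(r_squared)
    return magnitude if positive_association else -magnitude


# Attendance explained 16% of the variation in grades, and students who
# attended more classes earned higher grades (positive association).
r = correlation_from_r_squared(0.16, positive_association=True)
print(r)
```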

Residual Plots
A residual is the difference between the observed y-value and the predicted y-value for a given x-value.
residual = observed y - predicted y = $y - \hat{y}$

The sum of the squares of the residuals (SSR) is used to determine the Least-Squares Regression Line for a given set of data. A residual plot is a scatterplot that graphs the residuals on the vertical axis and the values of the explanatory variable on the horizontal axis; each data point is plotted as $(x, y - \hat{y})$.

The residual plot gives a visual representation of the amount of error in the model. The closer the residuals are to zero, the smaller the error and the more accurate the model. The LSRL is a good model if the residual plot shows random scatter relatively close to the horizontal axis (zero). The horizontal axis represents the LSRL.

Points in the residual plot that lie directly on the horizontal axis lie directly on the LSRL.
Points in the residual plot that lie above the horizontal axis lie above the LSRL, so the model gives an underestimate at those points. Positive residuals represent underestimates.
Points in the residual plot that lie below the horizontal axis lie below the LSRL, so the model gives an overestimate at those points. Negative residuals represent overestimates.
The LSRL is not a good model if the residual plot shows a pattern.
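The sign rules above can be sketched in Python. The example computes residuals (observed minus predicted) for a given line and labels each one. The data and the line y-hat = 1 + 2x are hypothetical, picked so the arithmetic is exact; the line is not claimed to be the LSRL for these points.

```python
def residuals(xs, ys, b0, b1):
    """Residual = observed y - predicted y, for the line y-hat = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def describe(residual, tol=1e-9):
    """Label a residual according to its sign."""
    if residual > tol:
        return "underestimate"   # point lies above the line
    if residual < -tol:
        return "overestimate"    # point lies below the line
    return "on the line"


# Hypothetical data and model, made up for this sketch.
xs = [1, 2, 3, 4]
ys = [3.5, 4.0, 7.0, 9.5]
res = residuals(xs, ys, b0=1.0, b1=2.0)
labels = [describe(e) for e in res]

# The residual plot would graph these (x, residual) pairs.
plot_points = list(zip(xs, res))
print(plot_points, labels)
```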

3. Construct a well-labeled residual plot using the data on jet ski fatalities from #1. What can you conclude about the appropriateness of the linear model based on the residual plot?

Since the residual plot does not show any distinct pattern, the linear model is appropriate for the original set of data. That is, number of fatalities can be predicted based on the year using the following linear equation: fatal = -34.648 + 6.03 (year)