
Chapter 5 Lesson 5.2 Summarizing Bivariate Data 5.2: LSRL.


1 Chapter 5 Lesson 5.2 Summarizing Bivariate Data 5.2: LSRL

2 What is the objective of regression analysis? The x-variable is the independent, or explanatory, variable. The y-variable is the dependent, or response, variable. We will use values of x to predict values of y.

3 Scatterplots frequently exhibit a linear pattern. When this is the case, it makes sense to summarize the relationship between the variables by finding a line that is as close as possible to the points in the plot. This is done by calculating the line of best fit, or Least Squares Regression Line (LSRL). The LSRL is the line that minimizes the sum of the squares of the deviations from the line.
The LSRL is ŷ = a + bx
- ŷ (y-hat) means the predicted y. Be sure to put the hat on the y.
- b is the slope: the predicted amount by which y increases when x increases by 1 unit.
- a is the y-intercept: the predicted height of the line when x = 0. In some situations, the y-intercept has no meaning.
Let's explore what this means...
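The "sum of the squares of the deviations" can be written out explicitly. The block below is a standard textbook statement of the criterion and of the closed-form slope and intercept that minimize it; the index i, the count n, and the name SS are notation introduced here, not taken from the slides.

```latex
% Criterion minimized by the LSRL (SS is a label assumed here)
\[
\mathrm{SS}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
\]
% Values of b and a that minimize SS
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```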

4 Suppose we have a data set that consists of the observations (0,0), (3,10) and (6,2). Let's just fit a line to the data by drawing a line through what appears to be the middle of the points: y = 0.5x + 4. Now find the vertical distance from each point to the line.
(0,0): y = 0.5(0) + 4 = 4, so 0 - 4 = -4
(3,10): y = 0.5(3) + 4 = 5.5, so 10 - 5.5 = 4.5
(6,2): y = 0.5(6) + 4 = 7, so 2 - 7 = -5
Find the sum of the squares of these deviations (aka residuals).
Sum of the squares = (-4)² + (4.5)² + (-5)² = 61.25
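The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names `residuals` and `sum_of_squares` are made up for illustration.

```python
# Data and the hand-drawn line from the slide: y = 0.5x + 4
points = [(0, 0), (3, 10), (6, 2)]

def residuals(points, intercept, slope):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (intercept + slope * x) for x, y in points]

def sum_of_squares(points, intercept, slope):
    """Sum of the squared vertical deviations from the line."""
    return sum(r ** 2 for r in residuals(points, intercept, slope))

guess_residuals = residuals(points, 4, 0.5)
guess_ss = sum_of_squares(points, 4, 0.5)
print(guess_residuals)  # [-4.0, 4.5, -5.0]
print(guess_ss)         # 61.25
```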

5 (0,0) (3,10) (6,2) Find the residuals from the line: -3, 6, -3. Find the sum of the squares of the deviations from the line. Sum of the squares = (-3)² + 6² + (-3)² = 54. The line that minimizes the sum of the squares of the deviations from the line is the LSRL.
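The transcript does not preserve the equation of the line drawn on this slide. As a sketch, the least-squares line can be computed from the three points with the usual closed-form formulas and compared with the eyeballed line y = 0.5x + 4 from the previous slide. The helper names `least_squares` and `sum_sq` are hypothetical.

```python
def least_squares(points):
    """Return (intercept, slope) of the least-squares line for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_sq(points, intercept, slope):
    """Sum of squared vertical deviations of the points from the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

points = [(0, 0), (3, 10), (6, 2)]
a, b = least_squares(points)
ss_lsrl = sum_sq(points, a, b)      # least-squares line
ss_guess = sum_sq(points, 4, 0.5)   # eyeballed line from the previous slide

print(round(a, 4), round(b, 4))     # intercept and slope of the fitted line
print(round(ss_lsrl, 4), ss_guess)  # fitted line has the smaller sum of squares
```

Under these formulas the fitted line comes out as ŷ = 3 + (1/3)x, whose sum of squares, 54, matches the value on the slide and is smaller than 61.25.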

6 Researchers are studying pomegranate's antioxidant properties to see if it might be helpful in the treatment of cancer. In one study, mice were injected with cancer cells and randomly assigned to one of three groups: plain water, water supplemented with 0.1% pomegranate fruit extract (PFE), and water supplemented with 0.2% PFE. The average tumor volume for mice in each group was recorded for several points in time.
x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume (in mm³)
x: 11  15  19  23  27
y: 150 270 450 580 740
Sketch a scatterplot for this data set.

7 Pomegranate study continued
x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume
x: 11  15  19  23  27
y: 150 270 450 580 740
Calculate the LSRL and the correlation coefficient. Interpret the slope and the correlation coefficient in context. Remember that an interpretation is stating the definition in context.
Slope: The predicted volume of the tumor increases by 37.25 mm³ for each additional day after injection.
Correlation coefficient: There is a strong, positive, linear relationship between the average tumor volume and the number of days since injection.
Does the intercept have meaning in this context? Why or why not?
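The slide asks for the LSRL and the correlation coefficient, but the transcript keeps only the slope. A minimal Python sketch of the calculation follows, using the standard formulas; the function name `fit_and_correlate` is an assumption for illustration, and a TI calculator's LinReg output should agree with it.

```python
import math

days = [11, 15, 19, 23, 27]           # x: days after injection
volume = [150, 270, 450, 580, 740]    # y: average tumor volume (mm^3)

def fit_and_correlate(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation coefficient
    return intercept, slope, r

a, b, r = fit_and_correlate(days, volume)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}")
# y-hat = -269.75 + 37.25x, r = 0.9985
```

A value of r this close to 1 is what supports the "strong, positive, linear" description.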

8 Pomegranate study continued
x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume
x: 11  15  19  23  27
y: 150 270 450 580 740
Predict the average volume of the tumor for 20 days after injection: ŷ = -269.75 + 37.25(20) = 475.25 mm³.
Predict the average volume of the tumor for 5 days after injection: ŷ = -269.75 + 37.25(5) = -83.5 mm³. Can volume be negative?
This is the danger of extrapolation. The least-squares line should not be used to make predictions for y using x-values outside the range in the data set. Why? It is unknown whether the pattern observed in the scatterplot continues outside the range of x-values.
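One way to make the extrapolation warning concrete is to have the prediction carry a flag saying whether x lies inside the observed range. The sketch below uses the intercept and slope reported in the Minitab output on slide 12; the function name `predict_volume` and the tuple it returns are illustrative choices, not part of the lesson.

```python
DAYS = [11, 15, 19, 23, 27]          # observed x-values
INTERCEPT, SLOPE = -269.75, 37.25    # from the Minitab output on slide 12

def predict_volume(day):
    """Return (predicted volume, is_extrapolation) for a given day."""
    is_extrapolation = day < min(DAYS) or day > max(DAYS)
    return INTERCEPT + SLOPE * day, is_extrapolation

for day in (20, 5):
    volume, flagged = predict_volume(day)
    note = "extrapolation, do not trust" if flagged else "within data range"
    print(f"day {day}: {volume} mm^3 ({note})")
# day 20: 475.25 mm^3 (within data range)
# day 5: -83.5 mm^3 (extrapolation, do not trust)
```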

9 Extrapolation (cont.) A regression of mean age at first marriage for men vs. year, fit to the first four decades of the 20th century, does not hold for later years.

10 Pomegranate study continued
x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume
x: 11  15  19  23  27
y: 150 270 450 580 740
Suppose we want to know how many days after injection of cancer cells the average tumor size would be 500 mm³. Is this the appropriate regression line to answer this question? No. The slope of the line for predicting x is not 1/b, and the intercepts are almost always different. The regression line of y on x should not be used to predict x, because it is not the line that minimizes the sum of the squared deviations in the x direction. The appropriate line is the regression of x on y, fit with days as the response and volume as the predictor.
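The transcript does not preserve the x-on-y equation shown on this slide, so the sketch below recomputes it by swapping the roles of the two variables. It then compares the resulting prediction with what you would get by algebraically inverting the y-on-x line. The helper name `least_squares` and the variable names are assumptions for illustration.

```python
days = [11, 15, 19, 23, 27]
volume = [150, 270, 450, 580, 740]

def least_squares(xs, ys):
    """Fit ys on xs and return (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

a_yx, b_yx = least_squares(days, volume)   # predicts volume from days
a_xy, b_xy = least_squares(volume, days)   # predicts days from volume

target = 500  # mm^3
days_from_x_on_y = a_xy + b_xy * target            # appropriate line
days_from_inverting = (target - a_yx) / b_yx       # solving the y-on-x line for x

print(round(b_xy, 5), round(1 / b_yx, 5))          # slopes are not reciprocals
print(round(days_from_x_on_y, 3), round(days_from_inverting, 3))
```

The two answers differ only slightly for this data set because the correlation is very close to 1; with weaker correlation the gap between the two lines grows.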

11 Pomegranate study continued
x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume
x: 11  15  19  23  27
y: 150 270 450 580 740
Find the mean of the x-values (x̄) and the mean of the y-values (ȳ). Plot the point of averages (x̄, ȳ) on the scatterplot.
x̄ = 19 and ȳ = 438
Will the point of averages always be on the regression line?
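A quick numerical check of the closing question, using the intercept and slope from the Minitab output on slide 12. This is only a sketch for this one data set. The general reason the answer is yes is that the least-squares intercept is computed as a = ȳ - b·x̄, so plugging x̄ into the line always returns ȳ.

```python
days = [11, 15, 19, 23, 27]
volume = [150, 270, 450, 580, 740]
intercept, slope = -269.75, 37.25    # from the Minitab output on slide 12

x_bar = sum(days) / len(days)        # mean of the x-values
y_bar = sum(volume) / len(volume)    # mean of the y-values
predicted_at_x_bar = intercept + slope * x_bar

print(x_bar, y_bar)                  # 19.0 438.0
print(predicted_at_x_bar)            # 438.0, so (x-bar, y-bar) is on the line
```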

12 x = number of days after injection of cancer cells in mice assigned to plain water and y = average tumor volume
x: 11  15  19  23  27
y: 150 270 450 580 740
Minitab, a statistical software package, was used to fit the least-squares regression line. Part of the resulting output is shown below.
The regression equation is
Predicted volume = -269.75 + 37.25 days
Predictor   Coef      SE Coef     T           P
Constant    -269.75   23.421412   -11.51724   0.0014
Days        37.25     1.181454    31.52895    0.000
The Coef value in the Constant row is the intercept, and the Coef value in the Days row is the slope. We will discuss what the other numbers mean in Chapter 13.

13 Homework p. 243: #5.15-5.19, 5.22, 5.23 (Graph the scatterplots on your TI. You don't need to show them on paper.)

