The Least Squares Regression Line (LSRL) is the linear equation that minimizes the sum of the squared vertical distances from the data points to the line, Σ(y - ŷ)². We also call the LSRL the regression line or the line of best fit. Its equation has the form ŷ = b0 + b1x.

To find the LSRL from summary statistics alone (without the individual data points), use the following formulas:

b1 = r(sy / sx)
where b1 is the slope, r is the correlation coefficient, sy is the standard deviation of the y variable, and sx is the standard deviation of the x variable.

b0 = ȳ - b1x̄
where b0 is the y-intercept, ȳ is the mean of the y variable, and x̄ is the mean of the x variable.

Example 1: Colleges use SAT scores in the admissions process because they believe these scores provide some insight into how a high school student will perform at the college level. Suppose the entering freshmen at a certain college have a mean combined SAT score of 1222 with a standard deviation of 83. In the first semester these students attained a mean GPA of 2.66 with a standard deviation of 0.56. A scatterplot showed the association to be reasonably linear, and the correlation between SAT score and GPA was 0.47.
a. Identify the explanatory and response variables. X: SAT score. Y: GPA. SAT score is the explanatory variable because colleges use SAT scores to predict GPA.
b. Write the equation of the regression line to predict the GPA of a freshman given an SAT score.
c. Predict the GPA of a freshman who scored a combined 1400.
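The two formulas can be checked numerically against Example 1. The following is a minimal sketch, not part of the original notes; the function names `lsrl_from_summary` and `predict` are made up for illustration, and the only inputs are the summary statistics stated in the example.

```python
def lsrl_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (b0, b1) for the line y-hat = b0 + b1*x, using summary statistics only."""
    b1 = r * sd_y / sd_x           # slope: b1 = r * (sy / sx)
    b0 = mean_y - b1 * mean_x      # intercept: b0 = y-bar - b1 * x-bar
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the regression line at x."""
    return b0 + b1 * x


# Example 1: x = SAT score, y = first-semester GPA
b0, b1 = lsrl_from_summary(r=0.47, mean_x=1222, sd_x=83, mean_y=2.66, sd_y=0.56)
gpa_1400 = predict(b0, b1, 1400)

print(f"slope b1     = {b1:.5f}")
print(f"intercept b0 = {b0:.3f}")
print(f"predicted GPA at SAT 1400 = {gpa_1400:.2f}")
```

Under these inputs the line comes out to roughly ŷ = -1.215 + 0.00317x, and the predicted GPA for a combined score of 1400 is about 3.22.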
Example 2: According to the article “First-Year Academic Success...” (1999), there is a mild correlation (r = .55) between high school GPA and college GPA. The high school GPAs have a mean of 3.7 and a standard deviation of 0.47. The college GPAs have a mean of 2.86 with a standard deviation of 0.85.
a. What is the explanatory variable? X: high school GPA
b. What is the LSRL?
c. Billy Bob’s high school GPA is 3.2. What could we expect of him in college?
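Parts b and c of Example 2 follow from the same slope and intercept formulas. The sketch below is an illustration rather than the original worked solution; the variable names are assumptions, and the numbers are the ones stated in the example. It also checks a known property of the LSRL: the line passes through the point (x̄, ȳ).

```python
# Summary statistics from Example 2 (x = high school GPA, y = college GPA)
r = 0.55
mean_x, sd_x = 3.7, 0.47
mean_y, sd_y = 2.86, 0.85

# Slope and intercept from the summary-statistic formulas
b1 = r * sd_y / sd_x
b0 = mean_y - b1 * mean_x

# Part c: predicted college GPA for a high school GPA of 3.2
predicted = b0 + b1 * 3.2

# Sanity check: the LSRL always passes through (x-bar, y-bar)
at_mean = b0 + b1 * mean_x

print(f"LSRL: y-hat = {b0:.3f} + {b1:.3f} x")
print(f"predicted college GPA at 3.2 = {predicted:.2f}")
print(f"line evaluated at x-bar = {at_mean:.2f}")
```

With these inputs the line is roughly ŷ = -0.820 + 0.995x, so a student with a 3.2 high school GPA would be predicted to earn about a 2.36 in college.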
Example 3: A manufacturer of dish detergent believes the height of soapsuds (in mm) in the dishpan depends on the amount of detergent (in grams) used. A study of the suds’ heights for a new dish detergent was conducted. Seven pans of water were prepared. All pans were of the same size and type and contained the same amount of water. The temperature of the water was the same for each pan. An amount of dish detergent was assigned at random to each pan, and that amount of detergent was added to the pan. Then the water in the dishpan was agitated for a set amount of time, and the height of the resulting suds was measured. The computer output from fitting a least squares regression line to the data is shown below.

[Computer output: regression table with the y-intercept, slope, and r² labeled]

a) Write the equation of the fitted regression line. Define any variables used in this equation.
ŷ = -2.679 + 9.5x, where x = amount of detergent used (grams) and y = height of soapsuds (mm).
b) Interpret the slope and the y-intercept in the context of this problem.
For each additional gram of detergent, the predicted height of the soapsuds increases by 9.5 mm on average. If 0 grams of detergent are used, the predicted height of the soapsuds is -2.679 mm, which is impossible.
c) Find and interpret the correlation coefficient.
r = +√.969 ≈ +.9844. There is a strong positive linear relationship between detergent amount and height of soapsuds. Both +.9844 and -.9844 give .969 when squared, so be careful with the sign of r. We know r must be positive because the slope is positive.
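The sign rule in part c can be written as a short helper. This is a sketch under the assumption that only r² and the slope are read from the computer output; the function name `correlation_from_r_squared` is hypothetical, and the negative slope in the second call is an illustrative value, not one taken from any example.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign of r from the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Example 3: r^2 = .969 and the slope is positive (9.5), so r is positive
r_suds = correlation_from_r_squared(0.969, 9.5)
print(f"r = {r_suds:.4f}")

# Same r^2 with a hypothetical negative slope gives a negative r
r_negative = correlation_from_r_squared(0.969, -1.0)
print(f"r = {r_negative:.4f}")
```

The magnitude is the same in both calls; only the direction of the relationship changes, which is why the slope has to be checked before reporting r.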
Example 4: The Earth’s Moon has many impact craters that were created when the inner solar system was subjected to heavy bombardment of small celestial bodies. Scientists studied 11 impact craters on the Moon to determine whether there was any relationship between the age of the craters (based on radioactive dating of lunar rocks) and the impact rate (as deduced from the density of the craters).
a) Write the equation of the fitted regression line. Define any variables used in this equation.
b) Find and interpret the correlation coefficient.
There is a strong negative linear relationship between age and impact rate. Here r is negative because the slope is negative.