Section 13.7 Linear Correlation and Regression
What You Will Learn: Linear Correlation, Scatter Diagram, Linear Regression, Least Squares Line
Linear Correlation Linear correlation is used to determine whether there is a linear relationship between two quantities and, if so, how strong the relationship is.
Linear Correlation Coefficient The linear correlation coefficient, r, is a unitless measure that describes the strength of the linear relationship between two variables. If r is positive, then as one variable increases, the other tends to increase. If r is negative, then as one variable increases, the other tends to decrease. The value of r always lies between –1 and 1, inclusive.
Scatter Diagrams A visual aid used with correlation is the scatter diagram, a plot of points (bivariate data). The independent variable, x, generally is a quantity that can be controlled. The dependent variable, y, is the other variable.
Scatter Diagrams The value of r is a measure of how far a set of points varies from a straight line. The greater the spread, the weaker the correlation and the closer the r value is to 0. The smaller the spread, the stronger the correlation and the closer the r value is to 1 or –1.
Correlation
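Linear Correlation Coefficient The formula to calculate the correlation coefficient (r) is as follows.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
Here n is the number of (x, y) data pairs.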
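As a minimal sketch, the calculation can be written in Python using the standard computational form of r, which is built from the sums of x, y, xy, x squared, and y squared. The function name correlation_coefficient and the sample data are hypothetical, chosen only for illustration; they are not the Egan Electronics data.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Return r for paired data using the sums-based computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_coefficient(xs, ys)
print(f"r = {r:.3f}")
```

Keeping the five sums as separate variables mirrors the columns of a hand calculation, so each intermediate value can be checked against a worked table.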
Example 1: Number of Absences Versus Number of Defective Parts Egan Electronics provided the following daily records about the number of assembly line workers absent and the number of defective parts produced for 6 days. Determine the correlation coefficient between the number of workers absent and the number of defective parts produced.
Example 1: Number of Absences Versus Number of Defective Parts Solution Here’s the scatter diagram.
Example 1: Number of Absences Versus Number of Defective Parts Solution Find r.
Example 1: Number of Absences Versus Number of Defective Parts Solution Since the maximum possible value for r is 1.00, a correlation coefficient of 0.922 indicates a strong positive correlation. This result suggests that, in general, the more assembly line workers are absent, the more defective parts are produced.
Linear Regression Linear regression is the process of determining the linear relationship between two variables.
Linear Regression The line of best fit (regression line or the least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points (on a scatter diagram) is a minimum.
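The Line of Best Fit The equation of the line of best fit is y = mx + b, where the slope m and the y-intercept b are
$$ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad b = \frac{\sum y - m\sum x}{n} $$
Here n is the number of (x, y) data pairs.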
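A minimal sketch of the calculation follows. It assumes the standard least squares formulas: the slope m is computed from the sums of x, y, xy, and x squared, and the y-intercept b then follows from m. The function name line_of_best_fit and the data are hypothetical, used only for illustration.

```python
def line_of_best_fit(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")

    m = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    b = (sum_y - m * sum_x) / n                     # y-intercept, found from m
    return m, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = line_of_best_fit(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

The slope is computed first because the intercept formula depends on it, which matches the order of the worked solution: find m, then find b.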
Example 3: The Line of Best Fit a) Use the data in Example 1 to find the equation of the line of best fit that relates the number of workers absent on an assembly line and the number of defective parts produced. b) Graph the equation of the line of best fit on a scatter diagram that illustrates the set of bivariate points.
Example 3: The Line of Best Fit Solution From Example 1, we know that
Example 3: The Line of Best Fit Solution Now, find the y-intercept, b.
Example 3: The Line of Best Fit Solution The equation of the line of best fit is y = mx + b, which gives y = 3.23x + 8.52.
Example 3: The Line of Best Fit Solution To graph y = 3.23x + 8.52, plot at least two points and draw the graph.
x    y
2    14.98
4    21.44
6    27.90
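The plotted points can be reproduced by substituting each x value into y = 3.23x + 8.52. The sketch below uses the slope and intercept from the solution; the function name predict is a hypothetical helper introduced here.

```python
def predict(x, m=3.23, b=8.52):
    """Evaluate the line of best fit y = m*x + b from the solution."""
    return m * x + b

# x values used for the graph; y values are rounded to two decimal places.
points = [(x, round(predict(x), 2)) for x in (2, 4, 6)]
for x, y in points:
    print(f"x = {x}, y = {y:.2f}")
```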