Chapter 10 Correlation and Regression
SCATTER DIAGRAMS AND LINEAR CORRELATION
Note: Studies of correlation and regression of two variables usually begin with a graph of paired data values (x, y). We call this graph a scatter diagram.
Scatter Plot/Scatter Diagram: A graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. We call x the explanatory variable and y the response variable.
Why do we use a scatter plot? We use a scatter plot to see whether there appears to be a linear relationship between the x and y values.
Example: Phosphorus is a chemical used in many household and industrial cleaning compounds. Unfortunately, phosphorus tends to find its way into surface water, where it can kill fish, plants, and other wetland creatures. Phosphorus reduction programs are required by law and are monitored by the EPA. A random sample of eight sites in a California wetlands study gave the following information about phosphorus reduction in drainage water. Here x is a random variable that represents phosphorus concentration and y is a random variable that represents total phosphorus concentration. Graph the scatter plot, then comment on the relationship between x and y.
x y
Group Work: Here is the safety report for different divisions.
a) Make a scatter plot.
b) Does a line fit the data reasonably well?
c) Draw a line that “fits best.”
Division   x (hours in safety training)   y (accidents)
Scatter Diagrams and Their Correlation
Correlation coefficient r
How do you compute the sample correlation coefficient r?
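A common computation formula for r uses only the sums of the data: r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)). The Python sketch below applies that formula. The function name sample_correlation and the five data pairs are made up for illustration; they are not the data from the examples in these notes.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r from the sums-based computation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(sample_correlation(xs, ys), 4))  # 0.7746
```

A value of r near 1 or −1 suggests a strong linear relationship; a value near 0 suggests little or no linear relationship.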
Example: Calculate r.
x y
Group Work: Calculate r.
x y
Note:
TI 83/TI 84
a) Enter the data into two columns.
b) Use Stat Plot and choose the first plot type.
c) Use option 9:ZoomStat under Zoom.
d) Open CATALOG, find DiagnosticOn, and press ENTER twice.
e) Press STAT, choose CALC, then option 8:LinReg(a+bx).
Homework Practice Pg 503 #1-16 even
LINEAR REGRESSION AND THE COEFFICIENT OF DETERMINATION
Least-squares criterion: The sum of the squares of the vertical distances from the data points (x, y) to the line is made as small as possible.
What does it mean? It means that, of all possible lines, the least-squares line is the one with the smallest total error, where the total error is the sum of the squares of the vertical distances from the data points to the line. We use the least-squares criterion to find the best-fitting linear equation.
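The slope and intercept that satisfy the least-squares criterion are commonly computed as b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄. The sketch below computes them in Python and checks that nudging the slope makes the sum of squared vertical distances larger. The function names and the data are hypothetical, chosen only to illustrate the idea.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6

# A line with a slightly different slope has a larger sum of squared errors
print(sum_squared_errors(xs, ys, a, b) < sum_squared_errors(xs, ys, a, b + 0.1))  # True
```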
Example: Find the least-squares line y = a + bx.
x y
Sketch the scatter plot and the least-squares line.
Remember
Using the Least-Squares Line for Prediction
Group Work
a) Find the least-squares line.
b) Predict y when x = 51.
x = Super soldier serum, y = Captain America
Coefficient of determination
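The coefficient of determination r² can be read as the fraction of the total variation in y that is explained by the least-squares line: r² = Σ(ŷ − ȳ)² / Σ(y − ȳ)². The sketch below computes it that way in Python. The function name and data are hypothetical, and the slope is computed from deviations about the means, which is algebraically equivalent to the sums-based formula.

```python
def coefficient_of_determination(xs, ys):
    """r^2 as explained variation divided by total variation."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    explained = sum((a + b * x - y_bar) ** 2 for x in xs)  # variation of y-hat about y-bar
    total = sum((y - y_bar) ** 2 for y in ys)              # variation of y about y-bar
    return explained / total

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(coefficient_of_determination(xs, ys), 2))  # 0.6
```

Here about 60% of the variation in the hypothetical y values is explained by the line; the remaining 40% is unexplained.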
Homework Practice Pg 520 #1-18 eoo
INFERENCES FOR CORRELATION AND REGRESSION
Sample Statistic to Population Parameter
Sample Statistic   Population Parameter
Important Note:
Example: Learn more, earn more! We have probably all heard this platitude. The question is whether or not there is some truth in the statement. Do college graduates have an improved chance at a better income? Is there a trend in the general population? Consider the following variables: x = percentage of the population 25 or older with at least four years of college, and y = percentage growth in per capita income over the past seven years. A random sample of six communities in Ohio gave the following information. Use a 1% level of significance.
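A test of this kind is usually carried out by converting the sample correlation r to a Student's t value, t = r·√(n − 2) / √(1 − r²), with d.f. = n − 2, under the null hypothesis ρ = 0. The sketch below shows that conversion in Python. The function name is hypothetical, and the value r = 0.9 is an assumed number for illustration, not the correlation of the Ohio sample.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, degrees_of_freedom) for testing H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Assumed values for illustration: r = 0.9 from n = 6 data pairs
t, df = correlation_t_statistic(0.9, 6)
print(round(t, 3), df)  # 4.129 4
```

From a Student's t table, the one-tailed critical value for d.f. = 4 at the 1% level is 3.747, so this assumed t value would lead to rejecting ρ = 0 in favor of ρ > 0.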
Answer
Group Work x y
Standard Error of Estimate
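The standard error of estimate S_e measures the spread of the data points about the least-squares line and is commonly computed as S_e = √(Σ(y − ŷ)² / (n − 2)). A minimal Python sketch follows; the function name and the data are hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """S_e = sqrt(sum of squared residuals / (n - 2)) for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    return math.sqrt(sse / (n - 2))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(standard_error_of_estimate(xs, ys), 4))  # 0.8944
```

The division by n − 2 means at least three data pairs are needed.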
TI 83/TI 84
How to Find a Confidence Interval for a Predicted y from the Least-Squares Line
Confidence Interval (continued)
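A standard form of this interval is ŷ − E < y < ŷ + E, where ŷ = a + bx is the predicted value and E = t_c · S_e · √(1 + 1/n + (x − x̄)² / Σ(x − x̄)²), with t_c taken from a Student's t table using d.f. = n − 2. The Python sketch below follows that form. The function name and data are hypothetical, and t_c is passed in by the caller rather than looked up.

```python
import math

def prediction_interval(xs, ys, x_new, t_c):
    """Confidence interval for a predicted y at x_new; t_c comes from a t table."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    s_e = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = a + b * x_new
    margin = t_c * s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ss_x)
    return y_hat - margin, y_hat + margin

# Hypothetical data; t_c = 3.182 is the 95% table value for d.f. = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
low, high = prediction_interval(xs, ys, x_new=3, t_c=3.182)
print(round(low, 2), round(high, 2))  # 0.88 7.12
```

The interval is narrowest when x is at the mean x̄ and widens as x moves away from it, because of the (x − x̄)² term.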
Example:
x y
The slope β is important because it measures the rate at which y changes per unit change in x.
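Inference about the slope is usually based on the sample slope b and its standard error S_e / √(Σ(x − x̄)²): the test statistic for a zero population slope is t = b / (S_e / √(Σ(x − x̄)²)) with d.f. = n − 2, and a confidence interval for the slope is b ± t_c · S_e / √(Σ(x − x̄)²). The sketch below computes both in Python. The function name and data are hypothetical, and t_c is supplied from a t table.

```python
import math

def slope_inference(xs, ys, t_c):
    """Return (t statistic for a zero slope, confidence interval for the slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    s_e = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    se_b = s_e / math.sqrt(ss_x)  # standard error of the slope
    t = b / se_b
    return t, (b - t_c * se_b, b + t_c * se_b)

# Hypothetical data; t_c = 3.182 is the 95% table value for d.f. = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t, (low, high) = slope_inference(xs, ys, t_c=3.182)
print(round(t, 3))                   # 2.121
print(round(low, 2), round(high, 2))  # -0.3 1.5
```

For this hypothetical sample the interval contains 0, so the data would not rule out a zero slope at that confidence level.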
Example: X Y
Group Work x Y
Homework Practice Pg 543 #1-12 odd