1
3.1: Scatterplots & Correlation
Knight’s Charge P168 P174 p176
4
Section 3.1 Scatterplots and Correlation
After this section, you should be able to…
IDENTIFY explanatory and response variables
CONSTRUCT scatterplots to display relationships
INTERPRET scatterplots
MEASURE linear association using correlation
INTERPRET correlation
5
Explanatory & Response Variables
Explanatory Variables (Independent Variables): car weight, number of cigarettes smoked, number of hours studied
Response Variables (Dependent Variables): accident death rate, life expectancy, SAT scores
6
Scatterplots A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.
7
Scatterplots Decide which variable should go on each axis.
Remember, the eXplanatory variable goes on the X- axis! Label and scale your axes. Plot individual data values.
8
Scatterplots Make a scatterplot of the relationship between body weight and pack weight. Body weight is our eXplanatory variable.
Body weight (lb): 120 187 109 103 131 165 158 116
Backpack weight (lb): 26 30 24 29 35 31 28
9
Constructing a Scatterplot:
Enter the x values into list 1 and the y values into list 2.
Go to Stat Plot, turn the plot on, and pick the first graph type.
Set the window appropriately.
Graph.
10
Constructing a Scatterplot
11
Describing Scatterplots
As in any graph of data, look for the overall pattern and for striking departures from that pattern. You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship. An important kind of departure is an outlier, an individual value that falls outside the overall pattern of the relationship. Also look for clustering.
12
Words That Describe…
Direction (slope): Positive or negative
Form: Linear, quadratic, cubic, exponential, curved, non-linear, etc.
Strength: Strong, weak, somewhat strong, very weak, moderately strong, etc.
13
More on Strength… Strength refers to how tightly grouped the points are in a particular pattern. Later on we will describe strength using “correlation”.
14
Describe this Scatterplot
15
Describe this Scatterplot
16
Describe this Scatterplot
17
Interpreting a Scatterplot
Interpret… tell what the data suggest in real-world terms. Example: The data suggest that the more hours a student studied for Mrs. Betts’s AP Stats test, the higher the grade the student earned. There is a positive relationship between hours studied and grade earned.
18
Describe and interpret the scatterplot below
Describe and interpret the scatterplot below. The y-axis refers to backpack weight in pounds and the x-axis refers to body weight in pounds.
19
Describe and interpret the scatterplot below
Sample Answer: There is a moderately strong, positive, linear relationship between body weight and pack weight. There is one possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members. It appears that lighter students are carrying lighter backpacks.
20
Describe and interpret the scatterplot below
Describe and interpret the scatterplot below. The y-axis refers to a school’s mean SAT math score. The x-axis refers to the percentage of students at a school taking the SAT.
21
Describe and interpret the scatterplot below
Sample Answer: There is a moderately strong, negative, curved relationship between the percent of students in a state who take the SAT and the mean SAT math score. Further, there are two distinct clusters of states and at least one possible outlier that falls outside the overall pattern.
22
What is Correlation? A mathematical value that describes the strength of a linear relationship between two quantitative variables. Correlation values are between -1 and 1. Correlation is abbreviated: r The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.
23
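What does “r” tell us?! Correlation describes the direction and strength of the linear relationship between x and y. It is r2, not r, that gives the percent of variation in y ‘explained’ by x. Notice that the formula multiplies each individual's z-score for x by its z-score for y, adds up these products, and divides by n - 1.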
24
Scatterplots and Correlation
25
What does “r” mean?
R Value: Strength
-1: Perfectly linear; negative
-0.75: Strong negative relationship
-0.50: Moderately strong negative relationship
-0.25: Weak negative relationship
0: Nonexistent
0.25: Weak positive relationship
0.50: Moderately strong positive relationship
0.75: Strong positive relationship
1: Perfectly linear; positive
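As a rough sketch, the benchmark table can be turned into a lookup that labels any r by its nearest row. The function name `describe_r` and the nearest-row rule are assumptions made for illustration; the slides give only the benchmark values, not cutoffs between them. The four inputs are the values from the practice question that follows.

```python
def describe_r(r):
    """Label a correlation by the nearest benchmark row of the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    sign = "positive" if r > 0 else "negative"
    if abs(r) == 1:
        return f"perfectly linear; {sign}"
    # Benchmarks short of +/-1. Picking the nearest one is an assumed rule.
    labels = {0.0: "nonexistent", 0.25: "weak", 0.50: "moderately strong", 0.75: "strong"}
    nearest = min(labels, key=lambda b: abs(abs(r) - b))
    if nearest == 0.0:
        return "nonexistent"
    return f"{labels[nearest]} {sign} relationship"

for r in (0.235, -0.456, 0.975, -0.784):
    print(r, "->", describe_r(r))
```

Labels near the boundary between two rows are a judgment call, so treat the output as a starting point for a description rather than a fixed answer.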
26
How strong is the correlation? Is it positive or negative?
0.235 -0.456 0.975 -0.784
27
Describe and interpret the scatterplot below
Describe and interpret the scatterplot below. Be sure to estimate the correlation.
28
Sample Answer: As the number of boats registered in Florida increases so does the number of manatees killed by boats. This relationship is evidenced in the scatterplot by a strong, positive linear relationship. The estimated correlation is approximately r =0.85. **Answers between would be acceptable.
29
Describe and interpret the scatterplot below
Describe and interpret the scatterplot below. Be sure to estimate the correlation.
30
Sample Answer: As the number of predicted storms increases, so does the number of observed storms, but the relationship is weak. The relationship evidenced in the scatterplot is a fairly weak positive linear relationship. The estimated correlation is approximately r = 0.25. **Answers between 0.15 and 0.45 would be acceptable.
31
Estimate the Correlation Coefficient
32
Estimate the Correlation Coefficient
33
Calculate Correlation: (and r2)
Enter x values in list 1 and y values in list 2.
Press Catalog and turn Diagnostics on.
Press Stat, Calc, #8, then L1, L2, Enter.
This gives you the slope, y-intercept, r, and r2.
Correlation should be 0.79.
34
Facts about Correlation
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
Correlation is not resistant. r is strongly affected by a few outlying observations.
Correlation makes no distinction between explanatory and response variables.
r does not change when we change the units of measurement of x, y, or both.
r does not change when we add or subtract a constant to either x, y, or both.
The correlation r itself has no unit of measurement.
35
R: Ignores distinctions between X & Y
36
R: Highly Affected By Outliers
37
Why?! Since r is calculated using standardized values (z-scores), the correlation value will not change if the units of measure are changed (feet to inches, etc.) Adding a constant to either x or y or both will not change the correlation because neither the standard deviation nor distance from the mean will be impacted.
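These invariance facts can be checked numerically. The sketch below uses the body-weight and pack-weight data from the earlier slide; the transcript lists only seven pack weights, so the third value (26) is an assumption. The helper name `correlation` is also illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

r = correlation(body, pack)
r_kg = correlation([x * 0.4536 for x in body], pack)  # change units: lb -> kg
r_shift = correlation(body, [y + 10 for y in pack])   # add a constant to y
r_swap = correlation(pack, body)                      # swap the roles of x and y

print(round(r, 4), round(r_kg, 4), round(r_shift, 4), round(r_swap, 4))
```

All four values should agree up to floating-point rounding, because each change leaves every z-score (or the set of z-score products) unchanged.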
38
Correlation Formula: Suppose that we have data on variables x and y for n individuals. The values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on. The means and standard deviations of the two variables are x-bar and sx for the x-values and y-bar and sy for the y-values. The correlation r between x and y is:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
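A step-by-step sketch of this calculation: standardize each variable, multiply the paired z-scores, add the products, and divide by n - 1. It uses the body-weight and pack-weight data from the earlier slide. The transcript lists only seven pack weights, so the third value (26) is an assumption, chosen because it gives a correlation close to the 0.79 quoted in the calculator slide.

```python
from statistics import mean, stdev

# Body weights are from the slides; the third pack weight (26) is an assumed value.
x = [120, 187, 109, 103, 131, 165, 158, 116]
y = [26, 30, 26, 24, 29, 35, 31, 28]
n = len(x)

x_bar, s_x = mean(x), stdev(x)  # stdev uses the n - 1 denominator
y_bar, s_y = mean(y), stdev(y)

z_x = [(xi - x_bar) / s_x for xi in x]  # z-scores for x
z_y = [(yi - y_bar) / s_y for yi in y]  # z-scores for y
products = [zx * zy for zx, zy in zip(z_x, z_y)]

r = sum(products) / (n - 1)
print(f"r = {r:.3f}")
```

Points where both z-scores have the same sign contribute positive products, which is why an upward-sloping cloud yields a positive r.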
39
Answers 3.1, 3.5
40
Answers 3.7, 3.11
41
Answers 3.15, 3.17,3.21
42
3.2: Least Squares Regressions
43
Section 3.2 Least-Squares Regression
After this section, you should be able to…
INTERPRET a regression line
CALCULATE the equation of the least-squares regression line
CALCULATE residuals
CONSTRUCT and INTERPRET residual plots
DETERMINE how well a line fits observed data
INTERPRET computer regression output
44
Regression Lines A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
45
Regression Lines Regression lines are used to conduct analysis.
Colleges use students’ SAT scores and GPAs to predict college success. Professional sports teams use players’ vital stats (40-yard dash, height, weight) to predict success. The Federal Reserve uses economic data (GDP, unemployment, etc.) to predict future economic trends. Macy’s uses shipping, sales, and inventory data to predict future sales.
46
Regression Line Equation
Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form: ŷ = ax + b In this equation, ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x. a is the slope, the amount by which y is predicted to change when x increases by one unit. b is the y intercept, the predicted value of y when x = 0.
47
Regression Line Equation
48
Format of Regression Lines
49
Interpreting Linear Regression
Y-intercept: A student weighing zero pounds is predicted to have a backpack weight of 16.3 pounds (no practical interpretation). Slope: For each additional pound that the student weighs, it is predicted that their backpack will weigh an additional pounds more, on average.
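For readers without the calculator, here is a minimal sketch of fitting the least-squares line ŷ = ax + b in Python. The helper name `least_squares` is illustrative. The data are the body and pack weights from Section 3.1; the transcript lists only seven pack weights, so the third value (26) is an assumption, chosen because the fitted intercept then rounds to the 16.3 quoted above.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y-hat = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar  # the line passes through (x-bar, y-bar)

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

slope, intercept = least_squares(body, pack)
print(f"y-hat = {slope:.4f}x + {intercept:.1f}")
```

The slope is in pounds of pack weight per pound of body weight, which is why it is interpreted "for each additional pound that the student weighs".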
50
Interpreting Linear Regression
51
Interpreting Linear Regression
54
Residuals residual = y - ŷ
A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,
residual = observed y – predicted y
residual = y - ŷ
Positive residuals lie above the line; negative residuals lie below the line.
55
How to Calculate the Residual
Calculate the predicted value, by plugging in x to the LSRE. Determine the observed/actual value. Subtract.
56
Calculate the Residual
If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual? If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?
57
Calculate the Residual
1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual? Predicted: ŷ = (170) = Observed: 35 Residual: = pounds The student’s backpack weighs pounds more than predicted.
58
Calculate the Residual
2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual? Predicted: ŷ = (105) = Observed: 24 Residual: 24 – = The student’s backpack weighs pounds less than predicted
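The three steps (predict, observe, subtract) can be sketched in code. The line here is refit from the Section 3.1 data rather than copied from the slide, and the third pack weight (26) is an assumed value because the transcript lists only seven, so the numbers may differ slightly from the slide's own. The helper names are illustrative.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def residual(x, y_observed, slope, intercept):
    """residual = observed y - predicted y."""
    return y_observed - (slope * x + intercept)

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
slope, intercept = least_squares(body, pack)

res_1 = residual(170, 35, slope, intercept)  # student 1: 170 lb, 35 lb pack
res_2 = residual(105, 24, slope, intercept)  # student 2: 105 lb, 24 lb pack
print(f"student 1 residual: {res_1:+.2f} lb")
print(f"student 2 residual: {res_2:+.2f} lb")
```

Under these assumptions the first residual is positive (the pack is heavier than predicted) and the second is negative (lighter than predicted), matching the direction of the answers above.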
59
Residual Plots A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.
60
Interpreting Residual Plots
A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.
The residual plot should show no obvious patterns.
The residuals should be relatively small in size.
A valid residual plot should look like the “night sky” with approximately equal amounts of positive and negative residuals.
A pattern in the residuals means a linear model is not appropriate.
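The points of a residual plot are simply (x, residual) pairs. The sketch below lists them for the Section 3.1 pack-weight data, refitting the line in code; the third pack weight (26) is an assumed value because the transcript lists only seven. Plotting these pairs with any graphing tool gives the residual plot.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
slope, intercept = least_squares(body, pack)

# One residual per individual: observed y minus predicted y.
residuals = [y - (slope * x + intercept) for x, y in zip(body, pack)]

for x, res in sorted(zip(body, residuals)):
    print(f"{x:>4} lb   residual {res:+.2f}")
print("positive:", sum(r > 0 for r in residuals),
      "negative:", sum(r < 0 for r in residuals))
```

For a least-squares line the residuals always sum to zero (up to rounding), so the thing to look for is a pattern in how they are arranged along x, not their total.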
61
Should You Use LSRL? 1. 2.
62
Interpreting Computer Regression Output
Be sure you can locate: the slope, the y intercept and determine the equation of the LSRL.
63
r2: Coefficient of Determination
r2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset. In this example, r2 equals 60.6%. 60.6% of the variation in pack weight is explained by the linear relationship with body weight. General template: (Insert r2)% of the variation in y is explained by the linear relationship with x.
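One way to see what r2 measures is to compare the squared prediction errors from the LSRL (SSE) with the squared errors from always guessing the mean of y (SST), so that r2 = 1 - SSE/SST. The sketch below applies this to the Section 3.1 pack-weight data with the line refit in code. The third pack weight (26) is an assumed value, so the result need not match the percentage quoted on the slide.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
slope, intercept = least_squares(body, pack)

y_bar = mean(pack)
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(body, pack))  # errors using the line
sst = sum((y - y_bar) ** 2 for y in pack)                                 # errors using the mean

r_squared = 1 - sse / sst
print(f"r2 = {r_squared:.1%} of the variation in pack weight is explained by body weight")
```

With this assumed data the result is the square of the correlation of about 0.79 found in Section 3.1.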
64
Interpret r2 Interpret in a sentence (how much variation is accounted for?) r2 = 0.875, x= hours studied, y= SAT score r2 = 0.523, x= hours slept, y= alertness score
65
Interpret r2 Answers: 87.5% of the variation in SAT score is explained by the linear relationship with the number of hours studied. 52.3% of the variation in alertness score is explained by the linear relationship with the number of hours slept.
66
S: Standard Deviation of the Residuals
1. Identify and interpret the standard deviation of the residuals.
67
S: Standard Deviation of the Residuals
Answer: S = 0.740 Interpretation: When using the least-squares regression line, predictions of fat gain are typically off by about 0.740 kilograms.
68
S: Standard Deviation of the Residuals
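If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by
$$ s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$
S represents the typical size of a prediction error (residual); it is never negative. The sign of an individual residual tells the direction of the error: Positive = the line UNDER predicts, Negative = the line OVER predicts.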
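A minimal sketch of computing s: square the residuals, add them, divide by n - 2, and take the square root. It uses the Section 3.1 pack-weight data with the line refit in code; the third pack weight (26) is an assumed value because the transcript lists only seven.

```python
from math import sqrt
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
slope, intercept = least_squares(body, pack)

residuals = [y - (slope * x + intercept) for x, y in zip(body, pack)]
n = len(residuals)

# Divide by n - 2 because two quantities (slope and intercept) were estimated.
s = sqrt(sum(res ** 2 for res in residuals) / (n - 2))
print(f"s = {s:.2f} lb: predicted pack weights are typically off by about this much")
```

Because the residuals are squared before averaging, s carries no sign and says nothing about whether the line over- or under-predicts.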
69
Self Check Quiz! The data is a random sample of 10 trains comparing number of cars on the train and fuel consumption in pounds of coal.
1. What is the regression equation? Be sure to define all variables.
2. What is r2 telling you?
3. Define and interpret the slope in context. Does it have a practical interpretation?
4. Define and interpret the y-intercept in context.
5. What is s telling you?
70
1. ŷ = x
ŷ = predicted fuel consumption in pounds of coal
x = number of rail cars
2. % of the variation in fuel consumption is explained by the linear relationship with the number of rail cars.
3. Slope = With each additional car, the fuel consumption is predicted to increase by pounds of coal, on average. This makes practical sense.
4. Y-intercept = When there are no cars attached to the train, the predicted fuel consumption is pounds of coal. This has no practical interpretation because there is always at least one car, the engine.
5. S = When using the least-squares regression line, the predicted fuel consumption is typically off by about pounds of coal.
71
Extrapolation We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. Exercise caution in making predictions outside the observed values of x. Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
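A simple safeguard is to check whether a new x value falls inside the range of x values used to fit the line before trusting the prediction. The sketch below does this for the Section 3.1 pack-weight data. The helper names are illustrative, and the third pack weight (26) is an assumed value because the transcript lists only seven.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def predict(x, slope, intercept, xs):
    """Return (y_hat, is_extrapolation); the flag is True when x is outside the observed range."""
    y_hat = slope * x + intercept
    return y_hat, not (min(xs) <= x <= max(xs))

# Body weights are from the slides; the third pack weight (26) is an assumed value.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
slope, intercept = least_squares(body, pack)

for weight in (150, 300):
    y_hat, risky = predict(weight, slope, intercept, body)
    note = "extrapolation, treat with caution" if risky else "within observed x range"
    print(f"{weight} lb -> predicted pack {y_hat:.1f} lb ({note})")
```

The observed body weights run from 103 to 187 pounds, so a prediction at 300 pounds relies on the line continuing far beyond any data that supports it.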
72
Outliers and Influential Points
An outlier is an observation that lies outside the overall pattern of the other observations. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. Note: Not all influential points are outliers, nor are all outliers influential points.
73
Outliers and Influential Points
The left graph is perfectly linear. In the right graph, the last value was changed from (5, 5) to (5, 8)…clearly influential, because it changed the graph significantly. However, the residual is very small.
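The effect can be reproduced numerically. The slide's graphs are not in the transcript, so the sketch assumes the perfectly linear data are the five points (1, 1) through (5, 5); only the change from (5, 5) to (5, 8) comes from the text.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

xs = [1, 2, 3, 4, 5]
ys_left = [1, 2, 3, 4, 5]   # assumed perfectly linear data
ys_right = [1, 2, 3, 4, 8]  # last point moved from (5, 5) to (5, 8)

slope_l, int_l = least_squares(xs, ys_left)
slope_r, int_r = least_squares(xs, ys_right)

# Residual of the moved point relative to the refitted line.
res_moved = 8 - (slope_r * 5 + int_r)

print(f"left:  y-hat = {slope_l:.1f}x + {int_l:.1f}")
print(f"right: y-hat = {slope_r:.1f}x + {int_r:.1f}")
print(f"residual of (5, 8): {res_moved:.1f}")
```

The moved point pulls the line toward itself, so its residual ends up much smaller than the 3-unit change in y. That is why an influential point can go unnoticed if you look only at the size of its residual.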
74
Correlation and Regression Limitations
The distinction between explanatory and response variables is important in regression.
75
Correlation and Regression Limitations
Correlation and regression lines describe only linear relationships. NO!!!
76
Correlation and Regression Limitations
Correlation and least-squares regression lines are not resistant.
77
Correlation and Regression Wisdom
An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. Association Does Not Imply Causation A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why?
78
Answers 27-32, 37,39,47 27. A E D B B D
79
Answer #47