1
Two-Variable Data Analysis
AP Statistics Two-Variable Data Analysis
2
Key Ideas: Scatterplots; Lines of Best Fit; The Correlation Coefficient; Least Squares Regression Line; Coefficient of Determination; Residuals; Outliers and Influential Points; Transformations to Achieve Linearity
3
Two-Variable Data When looking at two-variable data, we are primarily interested in whether or not two variables have a linear relationship and how changes in one variable can predict changes in the other variable. Two-variable data is also called bivariate data.
4
Response, Explanatory A response variable measures an outcome of a study. An explanatory variable helps explain or influences changes in a response variable. Exercises: page 173 problems 3.1 – 3.4
5
Example
STUDENT   HOURS STUDIED   SCORE ON EXAM
A         0.5             65
B         2.5             80
C         3.0             77
D         1.5             60
E         1.25            68
F         0.75            70
G         4.0             83
H         2.25            85
I         (missing)       (missing)
J         6.0             96
K         3.25            84
L         (missing)       (missing)
M         0.0             51
N         1.75            63
O         2.0             71
6
Example, cont. A teacher wanted to know if additional studying resulted in higher grades. In other words, does studying have an effect on test performance? Draw a scatterplot – putting one variable on the horizontal axis and the other on the vertical axis. If we have an explanatory variable, it should go on the horizontal axis and the response variable on the vertical axis.
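As a rough illustration of the axis convention described above, here is a minimal Python sketch that draws a character-grid scatterplot with hours studied (explanatory) on the horizontal axis and exam score (response) on the vertical axis. The function name `text_scatterplot` and the grid size are assumptions made for this sketch, and students I and L are left out because their values are missing in this copy of the table.

```python
# Complete (hours studied, exam score) pairs from the example table.
# Students I and L are omitted because their values are missing here.
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 6.0, 3.25, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 96, 84, 51, 63, 71]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return a rough character-grid scatterplot (hypothetical helper).

    x runs left to right, y runs bottom to top.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


plot = text_scatterplot(hours, scores)
print(plot)
```

The points rise from the lower left to the upper right, which is the pattern to look for when asking whether more studying goes with higher scores.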
7
Interpreting a Scatterplot
Direction, form, and strength of the relationship; striking deviations; outliers
8
Association? If the variable on the vertical axis tends to increase as the variable on the horizontal axis increases, we say that the two variables are positively associated. If one of them decreases as the other increases, we say they are negatively associated.
9
Calculator Tip To draw a scatterplot on your calculator, enter the data in two lists (L1 and L2). Go to STAT PLOT and choose the scatterplot icon. Enter L1 for Xlist and L2 for Ylist. Do ZOOM: ZoomStat. See Technology Toolbox on page 183
10
Exercises Page 179, problems 3.5 – 3.10
11
Correlation We are primarily interested in determining the extent to which two variables are linearly associated. The first statistic we use to assess a linear relationship is the Pearson product-moment correlation, or more simply, the correlation coefficient, denoted by the letter r. The correlation coefficient measures the strength of the linear relationship between two variables and indicates the direction of that relationship.
12
Formula If we have a sample of size n of paired data, say (x, y), and assuming that we have computed summary statistics for x and y (means and standard deviations), the correlation coefficient r is defined as follows:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
Find r for the previous example.
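One way to carry out "find r for the previous example" is sketched below in Python. It assumes the usual definition of r, the sum of the products of the standardized x and y values divided by n - 1, and it uses only the 13 students whose values are complete in this copy of the table (I and L are missing). The function name `correlation` is chosen for this sketch.

```python
from statistics import mean, stdev

# Complete (hours studied, exam score) pairs; students I and L are missing.
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 6.0, 3.25, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 96, 84, 51, 63, 71]


def correlation(xs, ys):
    """Correlation coefficient from means and sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(hours, scores)
print(f"r = {r:.3f}")
```

Because two students are excluded, the value printed here may differ slightly from the one obtained with the full class data.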
13
Properties of r
If r is positive, the variables are positively associated. If r is negative, the variables are negatively associated.
If r = 0, there is no linear association that would allow us to predict y from x. It doesn't mean that there is no relationship, just not a linear one.
It doesn't matter which variable you call x and which one you call y.
r doesn't depend on the units of measurement.
r is not resistant to extreme values because it is based on the mean.
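Three of these properties can be checked numerically. The sketch below, in Python, uses the complete rows of the study-hours example (students I and L are missing) and a helper named `correlation` written for this illustration. The extra point (10 hours, score 20) is hypothetical and is not part of the original data; it is added only to show how a single extreme value can change r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient from standardized values (helper for this sketch)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 6.0, 3.25, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 96, 84, 51, 63, 71]

r = correlation(hours, scores)

# Swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(scores, hours)

# Changing units (hours to minutes) leaves r unchanged.
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, scores)

# One extreme point (hypothetical, not in the source data) can move r a lot.
r_with_outlier = correlation(hours + [10.0], scores + [20])

print(r, r_swapped, r_minutes, r_with_outlier)
```

The first three values agree up to rounding error, while the single added point is enough to flip the sign of r for this data set.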
14
Guidelines There are no hard and fast rules about how strong a relationship is based on the numerical value of r.
VALUE OF r                             STRENGTH OF RELATIONSHIP
-1 < r < -0.8 or 0.8 < r < 1           strong
-0.8 < r < -0.5 or 0.5 < r < 0.8       moderate
-0.5 < r < 0.5                         weak
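These guidelines can be written as a small lookup, sketched here in Python. The function name `strength_of_relationship` is made up for this example. The table uses strict inequalities, so where the exact cutoffs 0.5 and 0.8 fall is an assumption: this sketch puts them in the weaker category.

```python
def strength_of_relationship(r):
    """Label a correlation coefficient using the rough guideline table.

    Assumption: values exactly at a cutoff (|r| = 0.5 or 0.8) get the weaker
    label, since the table itself does not say where they belong.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # strength depends on magnitude; the sign gives direction
    if size > 0.8:
        return "strong"
    if size > 0.5:
        return "moderate"
    return "weak"


for value in (0.9, -0.95, 0.6, -0.3):
    print(value, strength_of_relationship(value))
```

The labels are a convenience only; as the slide says, there are no hard and fast rules.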
15
Calculator Tip To find r on your calculator, you will first need to turn "Diagnostic On". You can find this in CATALOG. Choose it and press ENTER twice. Enter values in L1 and L2. Go to STAT: CALC, choose LinReg(a + bx), and press ENTER. Then enter L1, L2 and press ENTER; r appears in the output.
16
Correlation and Causation
Association does not imply causation! Just because two things seem to go together does not mean that one caused the other. Some third variable, called a lurking variable, may be influencing them both.
17
YEAR   Number of Methodist Ministers in New England   Number of barrels of Cuban rum imported to Boston
1860   63                                             8,376
1865   48                                             6,406
1870   53                                             7,005
1875   64                                             8,486
1880   72                                             9,595
1885   80                                             10,643
1890   85                                             11,265
1895   76                                             10,071
1900   (missing)                                      10,547
1905   83                                             11,008
1910   105                                            13,885
1915   140                                            18,559
1920   175                                            23,024
1925   183                                            24,185
1930   192                                            25,434
1935   221                                            29,238
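To see how strong this association is, the correlation can be computed for the rows with complete data. The Python sketch below leaves out 1900, whose minister count is missing in this copy, and reads the entries "10.071" and "29.238" as 10,071 and 29,238 (an assumption about a typo). The helper name `correlation` is chosen for this sketch.

```python
from statistics import mean, stdev

# Rows with both values present; 1900 is omitted (minister count missing).
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 83, 105, 140, 175, 183, 192, 221]
# "10.071" and "29.238" in the source are read as 10,071 and 29,238 (assumption).
rum_barrels = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 11008,
               13885, 18559, 23024, 24185, 25434, 29238]


def correlation(xs, ys):
    """Correlation coefficient from standardized values (helper for this sketch)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


r = correlation(ministers, rum_barrels)
print(f"r = {r:.4f}")
```

The result is very close to 1, yet nobody would argue that ministers cause rum imports or the reverse. Both series grew over the same 75 years, so a lurking variable tied to time is a more plausible explanation.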
18
Line of Best Fit Once we have determined that two variables have a strong linear relationship, we can find a line of best fit so that we can predict values. This is called linear regression. In this situation, it matters which variable we call x and which one we call y. The line we are looking for is called the least squares regression line.
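A minimal Python sketch of the least squares regression line for the study-hours example follows. It assumes the standard formulas for the line y-hat = a + bx, namely slope b = r(s_y / s_x) and intercept a = y-bar - b(x-bar), and it uses only the 13 students with complete data in this copy (I and L are missing). The function name `least_squares_line` and the 3.5-hour input are made up for illustration.

```python
from statistics import mean, stdev

# Complete (hours studied, exam score) pairs; students I and L are missing.
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 6.0, 3.25, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 96, 84, 51, 63, 71]


def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x (helper for this sketch)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = cross / ((len(xs) - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b


a, b = least_squares_line(hours, scores)
predicted = a + b * 3.5  # predicted score for a hypothetical 3.5 hours of study
print(f"score-hat = {a:.2f} + {b:.2f} * hours; prediction at 3.5 h: {predicted:.1f}")

# Swapping x and y gives a different line, so the choice of x and y matters here.
a_swapped, b_swapped = least_squares_line(scores, hours)
print(f"hours-hat = {a_swapped:.2f} + {b_swapped:.4f} * score")
```

Unlike r, the regression line changes when the roles of the variables are swapped, because it minimizes vertical distances from the line only.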
19
Example
STUDENT   HOURS STUDIED   SCORE ON EXAM
A         0.5             65
B         2.5             80
C         3.0             77
D         1.5             60
E         1.25            68
F         0.75            70
G         4.0             83
H         2.25            85
I         (missing)       (missing)
J         6.0             96
K         3.25            84
L         (missing)       (missing)
M         0.0             51
N         1.75            63
O         2.0             71