1.4 Data in 2 Variables Definitions
5.3 Data in 2 Variables: Visualizing Trends
  When data is collected over a long period of time, it may show trends
  Trends allow you to make predictions about future events
  Trends can be over time, or over change in some other variable (e.g. mass)
  One effective way to visualize trends: a scatterplot
    – Shows the joint distribution of 2 variables
Scatterplots
  Independent variable: a variable whose values are arbitrarily chosen
  Dependent variable: a variable whose values depend on the independent variable
Scatterplots can help determine if there is a relationship in the data
    – Is there a pattern in the data?
  There is a relationship: as x increases, y increases
  There is no relationship: as x increases, y stays pretty much the same
We can show the relationship using a line of best fit
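One common way to choose a line of best fit is the least-squares method, which picks the slope and intercept that make the sum of squared vertical distances from the points to the line as small as possible. The sketch below assumes that method (the later slides report a "sum of squares", which suggests it, but the slides do not say so explicitly). The function names and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
ss = sum_of_squares(xs, ys, slope, intercept)
print(f"y = {slope:.2f}x + {intercept:.2f}, sum of squares = {ss:.3f}")
```

A smaller sum of squares means the points sit closer to the line.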
If the data is nonlinear, we use a curve of best fit
Correlation
  A measure of the strength of the apparent relationship between two variables
  Look at the upward/downward/horizontal trend
    – Positive/negative/no correlation
  Look at how closely the points fit the curve of best fit
    – Strong/moderate/weak correlation
  Note: trend and fit are unrelated
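The slides do not name a numerical measure, but for linear relationships a common choice is the Pearson correlation coefficient r. Its sign gives the trend (positive or negative) and its size gives the fit (how tightly the points cluster around a line). The sketch below assumes that measure. The cutoffs 0.67 and 0.33 used to label strength are illustrative assumptions, not values from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong=0.67, moderate=0.33):
    """Label a correlation by strength and direction (cutoffs are assumed)."""
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical data: y falls steadily as x rises
r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r, describe(r))
```

Direction and strength are read off separately here, which mirrors the note that trend and fit are unrelated.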
Classifying Linear Relationships
  Strong positive correlation: positive slope, points tightly clustered around the line of best fit
  Strong negative correlation: negative slope, points tightly clustered around the line of best fit
  No correlation: zero slope, points randomly scattered
  Moderate positive correlation
  Weak positive correlation
  Weak negative correlation
Warning!!!
  Correlation does not necessarily mean causation
  Just because there is a relationship between A and B does not mean A causes B
    – More on this next day
Using trends for predictions
  Use the equation of the line of best fit
  Extrapolation
    – Estimation of a value outside the known data set
  Interpolation
    – Estimation of a value between two known values
[Figure: line of best fit y = mx + b, with the regions of extrapolation and interpolation labelled]
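The difference between the two kinds of estimate can be sketched in a few lines: substitute x into y = mx + b, then check whether x falls inside the range of the known data. The function name `predict` and the numbers are hypothetical.

```python
def predict(x, slope, intercept, known_xs):
    """Estimate y from the line y = mx + b and say which kind of estimate it is."""
    y = slope * x + intercept
    if min(known_xs) <= x <= max(known_xs):
        kind = "interpolation"  # x lies within the range of the known data
    else:
        kind = "extrapolation"  # x lies outside the range of the known data
    return y, kind


# Hypothetical line y = 2x + 1 fitted to data with x from 1 to 4
known_xs = [1, 2, 3, 4]
print(predict(2.5, 2, 1, known_xs))
print(predict(10, 2, 1, known_xs))
```

Extrapolated values deserve more caution, since nothing in the data confirms that the trend continues beyond the observed range.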
Go to “Go For the Gold!”
Go for the Gold! Line of Best Fit: Men
  Mensdistance = 0.016 × Year – 24.04
  Sum of squares =
  Slope is 0.016: change in distance (in metres) over time (in years)
    – Every year, the winning distance should increase by 1.6 cm
  Y-intercept is –24.04
    – In year zero, they jumped backwards!?
    – Meaningless
Go for the Gold! Line of Best Fit: Women
  Womensdistance = 0.021 × Year – 35
  Sum of squares =
  Slope is 0.021; y-intercept is –35
    – Every year, the winning women’s distance should increase by 2.1 cm
    – Y-intercept is meaningless for this case
  In 2008, the winning men’s distance should be 8.85 m and the women’s distance should be 7.17 m (actual distances: 8.34 m and 7.04 m)
  In 2012?
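The 2012 question can be explored by substituting the year into each line. The sketch below uses only the rounded slopes and y-intercepts quoted on the slides. With those values the women's 2008 estimate comes out at about 7.17 m, matching the slide, while the men's comes out near 8.09 m rather than 8.85 m. One possible explanation, which is an assumption, is that the slide's prediction was computed from unrounded coefficients. The names `MEN`, `WOMEN` and `predicted_distance` are made up for illustration.

```python
# (slope in m per year, y-intercept in m), rounded values as quoted on the slides
MEN = (0.016, -24.04)
WOMEN = (0.021, -35.0)


def predicted_distance(year, line):
    """Winning distance in metres predicted by distance = slope * year + intercept."""
    slope, intercept = line
    return slope * year + intercept


for year in (2008, 2012):
    men = predicted_distance(year, MEN)
    women = predicted_distance(year, WOMEN)
    print(f"{year}: men {men:.2f} m, women {women:.2f} m")
```

Both years lie beyond the data used to fit the lines, so these are extrapolations.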