Examining Bivariate Data Unit 3 – Statistics
Some Vocabulary
Response (aka Dependent) Variable
–Measures an outcome of a study
Explanatory (aka Independent) Variable
–Attempts to explain the observed outcomes
Scatterplot
–Shows the relationship between two quantitative variables measured on the same individuals
Scatterplots
Examining
–Look for the overall pattern and any deviations from it
–Describe the pattern by its form, strength, and direction
Drawing
–Scale the vertical and horizontal axes uniformly
–Label both axes
–Adopt a scale that uses the entire available grid
Categorical Variables
–Use a different color or plotting symbol for each category of a categorical variable
–Example: Make a scatterplot of the following data, with columns GPA, SAT, and GENDER (GENDER values: M, F, F, F, M, M, F, M)
Algebra Review
Equation of a line
–y = a + bx
–Graph the intercept first, then use the slope to find other points
Y-intercept
–Value of y when x = 0 (a in the equation)
Slope
–Change in y divided by change in x (b in the equation)
Correlation
Measures the direction and strength of the linear relationship between two quantitative variables
Is basically the average of the products of the z-scores of x and y for every point (dividing by n - 1 rather than n)
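The z-score description of correlation can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's method: the function name `correlation` and the data values are made up for the example, and it assumes the usual sample formula (sum of z-score products divided by n - 1).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of the products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Made-up illustration data (not from the slides)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = correlation(hours, score)
print(round(r, 3))
```

Each point contributes a positive product when x and y are on the same side of their means and a negative product when they are on opposite sides, which is why r picks up direction as well as strength.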
Facts About Correlation
Makes no distinction between explanatory and response variables
Requires that both variables be quantitative
Sign of r matches the sign of the slope (positive or negative)
r ranges from -1 to 1 inclusive, where -1 and 1 indicate a perfect linear relationship and 0 indicates no linear relationship
Only measures the strength of linear relationships
Is not resistant (outliers can change it substantially)
Find r on the calculator by running LinReg
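Two of these facts are easy to check numerically: swapping x and y leaves r unchanged, and a single extreme point can change r a great deal. The sketch below uses made-up data and a hand-written `correlation` helper (both are assumptions for illustration only).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: average product of z-scores (dividing by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = correlation(x, y)
r_yx = correlation(y, x)   # roles of the variables swapped

# Add one unusual point far out in the x direction
r_out = correlation(x + [20], y + [0])

print(round(r_xy, 3), round(r_yx, 3), round(r_out, 3))
```

With this data the single added point is enough to flip the sign of r, which is what "not resistant" means in practice.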
Correlation does not imply causation.
Regression
Least-Squares Regression Line (LSRL) of y on x
–Makes the sum of the squares of the vertical distances of the data points from the line as small as possible
–The line should be as close as possible to the points in the vertical direction
Facts about Least-Squares Regression
Distinguishes between explanatory and response variables
LSRL always passes through the point $(\bar{x}, \bar{y})$
Practice: Find the equation of the line passing through the point (3, 4) with a slope of -2.
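One way to work the practice problem is to solve y = a + bx for the intercept using the known point. A minimal sketch follows; the helper name `intercept_from_point` is made up for this example.

```python
def intercept_from_point(x0, y0, slope):
    """Return a in y = a + b*x for the line through (x0, y0) with slope b."""
    return y0 - slope * x0

b = -2
a = intercept_from_point(3, 4, b)
print(f"y = {a} + ({b})x")  # y = 10 + (-2)x
```

The same step gives the LSRL intercept once the slope is known, if the point used is the one the LSRL always passes through.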
LSRL
Equation of the LSRL: $\hat{y} = a + bx$
Slope: $b = r \frac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$
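The LSRL can be computed from summary statistics alone. The sketch below assumes the standard textbook formulas, slope b = r(s_y / s_x) and intercept a = ȳ - b·x̄; the function name `lsrl` and the data are made up for illustration.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation as the average product of z-scores (dividing by n - 1)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (len(xs) - 1)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

# Made-up illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
print(round(a, 3), round(b, 3))
```

Because the intercept is defined as ȳ - b·x̄, plugging x̄ into the line returns ȳ, so the fitted line passes through the point of means.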
Coefficient of determination – $r^2$
The fraction or percent of the variation in the values of y that is explained by the least-squares regression of y on x
Measures the contribution of x in predicting y
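"Fraction of variation explained" can be checked directly: compare the squared residuals left over after fitting the line with the total squared deviations of y from its mean. The sketch below uses made-up data and the standard closed-form least-squares slope (an assumption for illustration), and confirms that the result matches the square of r.

```python
from statistics import mean

# Made-up illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y

b = s_xy / s_xx            # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept

# Variation left unexplained by the line (sum of squared residuals)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / s_yy
r = s_xy / (s_xx * s_yy) ** 0.5
print(round(r_squared, 3), round(r ** 2, 3))
```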
Residuals
Residual = observed y - predicted y, or $y - \hat{y}$
Positive values show that the data point lies above the LSRL; negative values show that it lies below
The sum of the least-squares residuals is always zero (just like deviations from the mean!)
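A short numerical check of the residual facts, using made-up data and the standard closed-form least-squares fit (both are assumptions for illustration):

```python
from statistics import mean

# Made-up illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, res in zip(x, residuals):
    side = "above" if res > 0 else "below"
    print(f"x = {xi}: residual {res:+.2f} ({side} the line)")

print("sum of residuals:", round(sum(residuals), 10))
```

The sum comes out to zero only up to floating-point rounding, which is why the example rounds before printing.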
Residual Plots
A scatterplot of the regression residuals against the explanatory variable
Helps us assess the fit of a regression line
Want a random scatter with no clear pattern
Watch for individual points with large residuals or that are extreme in the x direction
Outliers vs. Influential Observations
Outlier
–An observation that lies outside the overall pattern of the other observations
Influential observation
–Removing this point would markedly change the result of the calculation
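To see what "influential" means, one can fit the line with and without a suspect point and compare. The sketch below uses made-up data in which the last point is extreme in the x direction; the helper `fit_line` is a hypothetical name, and it uses the standard closed-form least-squares formulas.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Made-up illustration data; the last point is far out in the x direction
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 5, 4, 5, 0]

a_all, b_all = fit_line(x, y)
a_without, b_without = fit_line(x[:-1], y[:-1])  # drop the suspect point

print("slope with the point:   ", round(b_all, 3))
print("slope without the point:", round(b_without, 3))
```

Here removing one point changes the slope from negative to positive, so the point is influential. A point can be an outlier in y without having this much pull on the line.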