1
Scatter Diagrams and Linear Correlation
Chapters 1-3: single-variable data
Examples of two variables: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer
Scatter diagram: (x, y) data plotted as individual points
x – explanatory variable (independent)
y – response variable (dependent)
Evaluate scatterplot data, y vs. x values: shows the relationship between 2 quantitative variables measured on the same individual
3
Scatter Diagrams and Linear Correlation
Look at the overall pattern
Any striking deviations (outliers)?
Describe by
a) form (linear or curved)
b) direction: positively associated (+ slope) or negatively associated (– slope)
c) strength: how closely do the points follow the form?
Examples: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer
4
Degrees of correlation
5
Scatter Diagrams and Linear Correlation
Tips for drawing a scatterplot
Scale the axes: the intervals along each axis must be uniform; the scale can be different for each axis
Label both axes
Adopt a scale that uses the entire grid (do not compress the plot into one corner of the grid)
6
Scatter Diagrams and Linear Correlation
Correlation coefficient (r)
Assesses the strength and direction of the linear relationship between x and y
Unitless
-1 ≤ r ≤ 1
r = -1 or r = 1: perfect correlation (all points exactly on the line)
The closer r is to 1 or -1, the better a line describes the relationship (better fit of the data)
r > 0: positive association between x and y
r < 0: negative association between x and y
x and y are interchangeable in calculating r
r does not change if either (or both) variables have unit changes (inches to cm, or °F to °C)
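The last two properties can be checked numerically. The sketch below assumes the usual sample formula for r (products of standardized values, divided by n - 1); the data and the helper name `corr` are made up for illustration and are not from the slides.

```python
from statistics import mean, stdev


def corr(xs, ys):
    """Sample correlation: sum of products of standardized values / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data (not from the slides).
length_in = [60, 62, 65, 68, 71, 74]   # lengths in inches
temp_f = [50, 55, 53, 61, 64, 70]      # temperatures in °F

length_cm = [2.54 * v for v in length_in]     # inches -> cm
temp_c = [(v - 32) * 5 / 9 for v in temp_f]   # °F -> °C

r_original = corr(length_in, temp_f)
r_converted = corr(length_cm, temp_c)   # same data, different units
r_swapped = corr(temp_f, length_in)     # x and y exchanged

print(round(r_original, 6), round(r_converted, 6), round(r_swapped, 6))
```

All three printed values should agree, since standardizing removes the units and the product of standardized values does not depend on which variable is called x.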
7
Linear and non-linear correlations
8
Scatter Diagrams and Linear Correlation
r = [1/(n - 1)] Σ [(x - x̄)/sx] [(y - ȳ)/sy]
Using TI-83: ex p.129 (number of police vs. muggings)
Cautions:
Association does not imply causation
Lurking variables may play a role
r is only good for linear models
Correlation between averages is higher than correlation between individual points
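A minimal sketch of the calculation, assuming sx and sy are sample standard deviations (n - 1 divisor). The two small data sets are made up for illustration; the second one shows why r is only good for linear models: y depends on x exactly (y = x²), yet r comes out at about 0.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = [1/(n - 1)] * sum of (x - x_bar)/sx * (y - y_bar)/sy."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sx, sy = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up data with a roughly linear pattern.
r_linear = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Made-up data with a perfect but curved relationship, y = x**2.
curved_x = [-2, -1, 0, 1, 2]
r_curved = correlation(curved_x, [x ** 2 for x in curved_x])

print(round(r_linear, 4), round(r_curved, 4))
```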
9
Scatter Diagrams and Linear Correlation
Facts
No distinction between the x and y variables: the value of r is unaffected by switching x and y
Both x and y must be quantitative
Only good for linear relationships
Not resistant to outliers
Correlation (r) is not a complete description of 2-variable data; the x and y standard deviations and means should be included
HW: p131 2,4,6,8 a,b,c, 10 a,b,c, 12 a,b,c
For "c", use a calculator to compute r
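To illustrate "not resistant to outliers", the sketch below computes r for a small made-up data set, then again after adding one extreme point. The data and the point (10, 0) are assumptions chosen for illustration only.

```python
from statistics import mean, stdev


def corr(xs, ys):
    """Sample correlation using standardized values and the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


# Made-up data with a moderately strong positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_without = corr(xs, ys)

# Same data plus a single extreme point at (10, 0).
r_with = corr(xs + [10], ys + [0])

print(round(r_without, 3), round(r_with, 3))
```

Here one added point is enough to turn a clearly positive r into a negative one.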
10
4.2 Least Squares Regression
Method for finding a line (best fit) that summarizes the relationship between 2 variables, x (explanatory) and y (response)
Use the line to predict the value of y for a given x
Must have a specific response variable y and explanatory variable x (cannot switch them, as with r)
11
4.2 Least Squares Regression
Least Squares Regression Line (LSRL)
Minimizes the sum of squared errors (in the y-values)
Error = observed value - predicted value
Σ(y - ŷ)² (y is the actual value, ŷ is the predicted value; ŷ is called "y hat")
Line of y on x that makes the sum of the squared vertical distances from the data points to the fitted line as small as possible
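The "as small as possible" idea can be sketched numerically. The example below uses made-up data and computes the least-squares slope with the standard equivalent form b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is an assumption here rather than a formula from the slides. It then compares Σ(y - ŷ)² for that line against a few nearby lines.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of (observed - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope and intercept for these data.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

best = sum_squared_errors(xs, ys, a, b)

# Nudge the intercept and slope; every nudged line should do worse.
others = [
    sum_squared_errors(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(round(best, 4), round(min(others), 4))
```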
12
4.2 Least Squares Regression
LSRL equation: ŷ = a + bx
ŷ – predicted value of y
Slope: b = r(sy/sx)
y-intercept: a = ȳ - b x̄
x̄ and ȳ are the means of the x and y data, respectively; the point (x̄, ȳ) is on the LSRL
sx and sy are the standard deviations of the x and y data
r – correlation
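A minimal sketch of these formulas in code, using made-up data. The names `predict`, `x_bar`, and `y_bar` are chosen for illustration; sx and sy are taken to be sample standard deviations.

```python
from statistics import mean, stdev

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)

# Correlation from standardized values (n - 1 divisor).
r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
        for x, y in zip(xs, ys)) / (len(xs) - 1)

b = r * sy / sx          # slope
a = y_bar - b * x_bar    # y-intercept


def predict(x):
    """Predicted value y-hat = a + b*x."""
    return a + b * x


print(round(a, 4), round(b, 4))
print(round(predict(x_bar), 4), y_bar)  # the line passes through (x_bar, y_bar)
```

For these data the slope is 0.6 and the intercept is 2.2, so ŷ = 2.2 + 0.6x.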
13
4.2 Least Squares Regression
TI-83: enter the data into L1, L2 (x, y)
Use STAT CALC, select #8: LinReg(a+bx) to get the required best-fit line
Slope: important for interpreting the data; the rate of change of y for each one-unit increase in x
Intercept: may not be practically important for the problem
14
4.2 Least Squares Regression
Plot the LSRL: using the formula ŷ = a + bx, find 2 points on the line, (x1, ŷ1) and (x2, ŷ2); make sure x1 and x2 are near opposite ends of the data
Influential observations and outliers
Influential – extreme in the x-direction; removing an influential point will affect the LSRL significantly
Outliers – extreme in the y-direction; does not significantly change the LSRL
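The difference between the two kinds of extreme point can be sketched with made-up data. The helper `fit` and both added points are assumptions for illustration: one point is extreme in y but sits at the mean of x, the other is extreme in x.

```python
def fit(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b


# Made-up base data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_base, b_base = fit(xs, ys)

# Outlier: extreme in the y-direction, x at the middle of the data.
a_out, b_out = fit(xs + [3], ys + [10])

# Influential point: extreme in the x-direction.
a_inf, b_inf = fit(xs + [15], ys + [2])

print("base:       ", round(a_base, 3), round(b_base, 3))
print("y-outlier:  ", round(a_out, 3), round(b_out, 3))
print("influential:", round(a_inf, 3), round(b_inf, 3))
```

In this example the y-outlier shifts the line upward but leaves the slope at 0.6, while the point far out in x pulls the slope from positive to negative.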
15
Coefficient of Determination
r² – coefficient of determination
r – describes the strength and direction of a straight-line relationship
r² – fraction of the variation in the values of y that is explained by the LSRL of y on x
r = 1, r² = 1: perfect correlation; 100% of the variation is explained by the LSRL
r = 0.7, r² = 0.49: about 49% of the variation in y is explained by the LSRL
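A short check of the "fraction of variation explained" reading, on made-up data. It assumes the usual decomposition: total variation is Σ(y - ȳ)², unexplained variation is Σ(y - ŷ)², and the explained fraction is 1 minus their ratio.

```python
from statistics import mean, stdev

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)

r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
        for x, y in zip(xs, ys)) / (len(xs) - 1)

# Least-squares line y-hat = a + b*x.
b = r * sy / sx
a = y_bar - b * x_bar

total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - unexplained / total_variation

print(round(r ** 2, 4), round(explained_fraction, 4))
```

Both printed values are 0.6 for these data: the LSRL explains 60% of the variation in y.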
16
Residuals
Residual – difference between the observed value and the predicted value
Residual = y - ŷ
Mean of the least-squares residuals = 0
Residual plot – scatterplot of the regression residuals against the explanatory variable (x)
Useful in assessing the fit of the regression line, i.e., do we have a straight line?
Uniform scatter: linear relationship
Curved pattern: relationship is not linear
Spread that increases (or decreases) with x: prediction of y will be less accurate for larger (or smaller) x
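A minimal sketch of computing residuals, using made-up data and a least-squares line fitted in the same block. The (x, residual) pairs it prints are what a residual plot would display.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(x, round(res, 3))

print("mean residual:", round(sum(residuals) / n, 6))
```

The residuals here are -0.8, 0.6, 1.0, -0.6, -0.2, and their mean is 0 up to rounding.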