Since When is it Standard to Be Deviant?
Passing the “So What?” Test
Mean Age = 32.4 (So What?)
Ever meet the average person?
Measures of dispersion provide an idea of what the mean may mean for the rest of us.
Calculating the Range
Age: 12, 17, 20, 30, 31, 34, 36, 42, 60
Range = 60 – 12 = 48
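The same calculation as a minimal Python sketch, using the ages listed on the slide (the variable names are illustrative only):

```python
# Ages as listed on the slide.
ages = [12, 17, 20, 30, 31, 34, 36, 42, 60]

# Range = maximum - minimum.
age_range = max(ages) - min(ages)
print(age_range)  # 48
```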
At Home with a Range
- One of the easiest calculations in statistics: Maximum – Minimum
- Very close to useless
- Not very helpful in getting a good sense of dispersion
- Far too general to set the groundwork for more advanced statistics
Variance: Please Disperse!
To properly make sense of the degree of dispersion of data, we need to be able to take the average “distance” of each data point from the measure of central tendency. The variance (s²) is almost exactly that!
s² = Σ(X – M)² / (n – 1)
Let’s examine this in theory a bit.
The Centroid and Data Points
Practical Experience: Calculating Variance by Hand
Part 1: Sum of Squares

X          X – M      (X – M)²
10         -6.875     47.2656
12         -4.875     23.7656
13         -3.875     15.0156
15         -1.875      3.5156
18          1.125      1.2656
20          3.125      9.7656
22          5.125     26.2656
25          8.125     66.0156
Σ = 135    Σ = 0.00   Σ = 192.8748

M = 16.875
Note: Σ(X – M)² is also referred to as the Sum of Squares (SS).
Calculating Variance
Part 2: Plugging in the Numbers
Remember the equation: s² = Σ(X – M)² / (n – 1) (or s² = SS / (n – 1))
s² = 192.8748 / (8 – 1)
s² = 192.8748 / 7
s² = 27.5535
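The two-part hand calculation can be checked with a short Python sketch. It follows the same steps (mean, deviations, sum of squares, divide by n – 1) on the eight scores from the slides. The variable names are illustrative, and `statistics.variance` from the standard library is used only as a cross-check.

```python
import statistics

# Scores from the hand-calculation slides.
scores = [10, 12, 13, 15, 18, 20, 22, 25]
n = len(scores)

# Part 1: mean, deviations from the mean, and sum of squares (SS).
mean = sum(scores) / n                            # 16.875
deviations = [x - mean for x in scores]           # these sum to 0
sum_of_squares = sum(d ** 2 for d in deviations)  # 192.875

# Part 2: sample variance divides SS by n - 1.
variance = sum_of_squares / (n - 1)
print(round(variance, 4))  # 27.5536

# Cross-check: the standard library's sample variance also uses n - 1.
assert abs(variance - statistics.variance(scores)) < 1e-9
```

The slide's 27.5535 differs in the last digit because it sums squared deviations that were already rounded to four decimals (192.8748 rather than 192.875).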
Why n-1?
Remember, this is a sample, not a population. The true variance of the population is probably going to be a bit larger than that of the sample we’ve collected. By dividing by a slightly smaller number (n-1), we raise the variance to more accurately reflect the population.
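A small simulation can make the n-1 argument concrete. The sketch below is an illustration under stated assumptions, not part of the original slides: it draws many small samples from a hypothetical normal population (mean 100, SD 15, so true variance 225) and averages the variance computed with n and with n-1 in the denominator. The population, sample size, seed, and number of trials are all arbitrary choices.

```python
import random

def variance(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    m = sum(sample) / len(sample)
    ss = sum((x - m) ** 2 for x in sample)
    return ss / denominator

random.seed(1)                      # arbitrary seed so repeated runs match
population_sd = 15.0                # hypothetical population SD
true_variance = population_sd ** 2  # 225.0
n = 5                               # small samples show the bias most clearly
trials = 20000

total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(100, population_sd) for _ in range(n)]
    total_n += variance(sample, n)
    total_n_minus_1 += variance(sample, n - 1)

mean_var_n = total_n / trials
mean_var_n_minus_1 = total_n_minus_1 / trials

print("true variance:        ", true_variance)
print("average using n:      ", round(mean_var_n, 1))
print("average using n - 1:  ", round(mean_var_n_minus_1, 1))
```

On average, dividing by n lands noticeably below the true variance, while dividing by n-1 lands close to it.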
So What? Now we know that the variance of the scores on this inventory is 27.5535. What does this tell us? Does the number 27.5535 have much of a connection to the original data set? Yes, but sadly, not intuitively. If you have a bit of experience with variances, you could tell that it’s a moderate to large variance for that data set. If not, the number is simply too big to be very useful.
The Standard Deviation in One Easy Step
- Is a crucial part of most statistics
- Provides us with a sense of dispersion in a number that’s easier to understand
SD (or s) = √s²
s = √27.5535
s = 5.25
No, Seriously!
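As a quick check, the one step is a single square root in Python. This sketch takes the variance value from the slide and, as a cross-check, also computes the sample standard deviation directly from the eight scores with the standard library's `statistics.stdev`.

```python
import math
import statistics

# Variance from the previous slide.
variance = 27.5535
sd = math.sqrt(variance)
print(round(sd, 2))  # 5.25

# Cross-check from the raw scores (statistics.stdev uses n - 1).
scores = [10, 12, 13, 15, 18, 20, 22, 25]
sd_from_scores = statistics.stdev(scores)
print(round(sd_from_scores, 2))  # 5.25
```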
Back to So What?
Now we know that a typical score falls about 5.25 points away from the mean.
Try this:
Correlation Does Not Imply Cause
- The degree to which two phenomena co-occur
- We do this all the time intuitively
- Tooth decay and getting kicked out of 3rd grade health class
Not Useless
- Provides ideas
- Can lead to tentative conclusions that might be tested in more rigorous settings
- Useful when experimental research is not feasible or unethical
Pearson's Product Moment Correlation
- Galton the mathematically illiterate
- Scale: -1.0, 0.0, +1.0
- Magnitude (0.03 vs 0.98)
- Valence (-0.68 vs 0.68)
- Singularity (+1.0 or -1.0)
Assumptions
- Normality
- Linearity
- Interval level data (usually)
Correlation Matrix
A sample of 234 nursing students completed a survey indicating (among other things) their approach to learning and the nature of their control beliefs.
From Cantwell, R.H. (1997). Cognitive dispositions and student nurses' appraisals of their learning environment. Issues in Educational Research, 7(1), 19-36.
Pearson's Product Moment Correlation Computational Equation This is the “official” equation for the Pearson’s r. It’s not as unwieldy as it looks, but it can be simplified with a simple algebraic manipulation to this one…
Let's Compute a Correlation!
Though not the only way to hand-calculate Pearson’s r, it’s the easiest equation available.
Note that the denominator is always going to be a positive number, whereas the numerator (which reflects the covariance of X and Y) may be either positive or negative. This allows the numerator to dictate the valence of the coefficient.
IQ (X)    GPA (Y)    X²         Y²       XY
110       1.0        12,100      1.00    110.0
112       1.6        12,544      2.56    179.2
118       1.2        13,924      1.44    141.6
119       2.1        14,161      4.41    249.9
122       2.6        14,884      6.76    317.2
125       1.8        15,625      3.24    225.0
127       2.6        16,129      6.76    330.2
130       2.0        16,900      4.00    260.0
132       3.2        17,424     10.24    422.4
134       2.6        17,956      6.76    348.4
136       3.0        18,496      9.00    408.0
138       3.6        19,044     12.96    496.8
Σ 1503    27.3       189,187    69.13    3488.7
Now For Some Calculations
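The calculation can be sketched in Python from the column totals in the table. The equation itself did not survive on these slides, so this assumes the standard raw-score form of Pearson's r: r = (nΣXY – ΣXΣY) / √[(nΣX² – (ΣX)²)(nΣY² – (ΣY)²)]. The twelve GPA values are the ones implied by the Y² and XY columns of the table, and the variable names are illustrative.

```python
import math

# IQ and GPA pairs from the table (GPA values as implied by the Y² and XY columns).
iq = [110, 112, 118, 119, 122, 125, 127, 130, 132, 134, 136, 138]
gpa = [1.0, 1.6, 1.2, 2.1, 2.6, 1.8, 2.6, 2.0, 3.2, 2.6, 3.0, 3.6]
n = len(iq)

# Column totals, matching the bottom row of the table.
sum_x = sum(iq)                               # 1503
sum_y = sum(gpa)                              # about 27.3
sum_x2 = sum(x * x for x in iq)               # 189187
sum_y2 = sum(y * y for y in gpa)              # about 69.13
sum_xy = sum(x * y for x, y in zip(iq, gpa))  # about 3488.7

# Assumed raw-score (computational) form of Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))
```

The numerator comes out positive here, so the coefficient is positive: higher IQ goes with higher GPA in this sample.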
Coefficient of Determination
- Degree to which the variance on one variable is explained by the other
- r² = 0.855² ≈ 0.73
- 73% of the variance in GPA can be accounted for by IQ
- OR 73% of the variance in IQ can be accounted for by GPA
- Remember, correlation does not imply cause
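In code, the coefficient of determination is just the square of r. A minimal sketch using the r value reported on the slide:

```python
# Pearson's r as reported on the slide.
r = 0.855

# Coefficient of determination: proportion of shared variance.
r_squared = r ** 2
print(f"{r_squared:.0%} of the variance is shared")  # 73% of the variance is shared
```

Squaring removes the sign, so r = -0.855 would give the same proportion of shared variance.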
Getting Our Hands Dirty
- Import the provided data set and calculate the range, variance and standard deviation for each variable
- Construct a correlation matrix using all variables and interpret the results
- Calculate the coefficient of determination for at least three of the correlation coefficients in the matrix
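One way to approach the exercise in Python is sketched below. The provided data set is not reproduced here, so the sketch reuses the IQ and GPA values from the earlier table and adds an `absences` column that is entirely hypothetical. The helper `pearson_r` is an illustrative name, written with the deviation-score form of the formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# IQ and GPA come from the earlier table; "absences" is made-up example data.
data = {
    "iq": [110, 112, 118, 119, 122, 125, 127, 130, 132, 134, 136, 138],
    "gpa": [1.0, 1.6, 1.2, 2.1, 2.6, 1.8, 2.6, 2.0, 3.2, 2.6, 3.0, 3.6],
    "absences": [9, 8, 9, 6, 5, 7, 4, 6, 2, 5, 3, 1],  # hypothetical
}
names = list(data)

# Correlation matrix: every variable paired with every variable.
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print the matrix, one row per variable.
print(" " * 10 + "".join(name.rjust(10) for name in names))
for a in names:
    print(a.ljust(10) + "".join(f"{matrix[a][b]:+10.2f}" for b in names))

# Coefficient of determination for one pair.
print(f"r squared (iq, gpa): {matrix['iq']['gpa'] ** 2:.3f}")
```

Each variable correlates perfectly with itself, so the diagonal is +1.00, and the matrix is symmetric, so only one triangle needs interpreting.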
Which of the following describes a negative correlation (AKA an inverse relationship)?
a. When high scores on one variable tend to be associated with high scores on another variable
b. When high scores on one variable tend to be associated with low scores on another variable
c. When scores on one variable appear to be completely unrelated to the scores on another variable
d. Both b and c are correct
Answer: b. When high scores on one variable tend to be associated with low scores on another variable
Which of the following terms best describes the average of squared deviations from the mean?
a. Standard deviation
b. Variance
c. Mode
d. Standardized distribution
Answer: b. Variance
Questions? Thoughts?