Introductory Statistics for Laboratorians Dealing with High-Throughput Data Sets
Centers for Disease Control
Problem 7: Dispersion
Prepare two line graphs, one for males and one for females, using the data presented below. Put both line graphs on the same axes.
Table: Attitudes on Race Relations (Males: score X and frequency f; Females: score X and frequency f)
How can we quantify the difference between the men and the women in this problem? Compute the mean (average) for the men. Compute the mean (average) for the women.
What are the highest and lowest scores for the men? What are the highest and lowest scores for the women? Count the score values from the lowest to the highest, inclusive; this count is called the range of the scores. In this case the range doesn't help us describe the difference between the males and the females. We need better measures of dispersion.
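Because the Problem 7 table pairs each score X with a frequency f, the mean is the total of f·X divided by the total of f. The minimal Python sketch below computes that mean and the inclusive range count described above. The `males` and `females` tables are hypothetical placeholders (the slide's actual values are not reproduced here), chosen so both groups share the same mean and range; substitute the real table. Note that many texts instead define the range as highest − lowest, without the +1.

```python
# Mean and range for a frequency table of (score X, frequency f) pairs.
# NOTE: the tables below are HYPOTHETICAL placeholders, not the slide's data.

def grouped_mean(table):
    """Mean = total of f*X divided by total of f, for a list of (X, f) pairs."""
    total_f = sum(f for _, f in table)
    total_fx = sum(x * f for x, f in table)
    return total_fx / total_f


def inclusive_range(table):
    """Count of score values from lowest to highest, inclusive
    (highest - lowest + 1), using only scores that actually occur (f > 0)."""
    observed = [x for x, f in table if f > 0]
    return max(observed) - min(observed) + 1


# Hypothetical data: same mean and same range, very different spread.
males = [(1, 1), (2, 1), (3, 2), (4, 4), (5, 4), (6, 2), (7, 1), (8, 1)]
females = [(1, 4), (2, 3), (3, 1), (4, 0), (5, 0), (6, 1), (7, 3), (8, 4)]

for name, table in (("Males", males), ("Females", females)):
    print(name, grouped_mean(table), inclusive_range(table))
# Males 4.5 8
# Females 4.5 8
```

Since the two hypothetical groups have the same mean and the same range, neither number separates them, even though the men cluster in the middle and the women sit at the extremes.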
Problem 8: Dispersion
For the following data:
– What are the highest and lowest scores?
– What is the range? (Count the score values from the lowest to the highest, inclusive.)
– What is the mean (average)?
– How far is each person from the mean? (Fill in the column. Always subtract the mean from the score.)
Data table
Subject | Score (X) | Distance from mean, x = (Score − Mean) | Squared distance from mean (x²)
Fred | 0 | |
George | 1 | |
Harry | 2 | |
Jerry | 4 | |
Larry | 5 | |
Jennifer | 6 | |
Jan | 7 | |
Joan | 8 | |
Jessica | 8 | |
Juana | 9 | |
N = ___   Total = ___   Mean = ___   Total deviation = ___   Sum of squares = ___
Compute the “Sum of Squared Deviations from the Mean” (SS) for this data set (also called a sample). Compute the variance of the sample. Compute the standard deviation of the sample.
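The Problem 8 worksheet steps translate directly into a few lines of Python. This minimal sketch follows the slides' own definitions (variance = SS/N) and can be run to check a hand-filled table; the variable names are illustrative, not part of the original exercise.

```python
import math

# Problem 8 scores, Fred through Juana, in table order.
scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]

n = len(scores)                               # N
mean = sum(scores) / n                        # Mean = Total / N
deviations = [x - mean for x in scores]       # x = Score - Mean (sign preserved)
squared = [d ** 2 for d in deviations]        # squared distance from the mean
ss = sum(squared)                             # Sum of squares (SS)
variance = ss / n                             # Variance = SS / N, as defined on these slides
sd = math.sqrt(variance)                      # Standard deviation = square root of variance

print(f"N = {n}, Total = {sum(scores)}, Mean = {mean}")
print(f"Total deviation = {sum(deviations)}, SS = {ss}")
print(f"Variance = {variance}, Standard deviation = {sd}")
```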
Dispersion Definitions
– The range is the count of score values from the smallest to the largest, inclusive (largest − smallest + 1).
– Deviation score = Score − Mean
  – Always subtract the mean from the score.
  – Always preserve the sign (positive or negative).
  – The deviation scores always total zero.
– Sum of squares (SS) = total of the squared deviation scores.
– Variance = SS/N
– Standard deviation = square root of the variance.
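The rule that deviation scores always total zero is exact arithmetic, so a check is cleanest with exact numbers. The sketch below uses Python's standard `fractions` module so the result does not depend on floating-point rounding; the scores are made-up values chosen so that the mean (10/3) is a repeating decimal, and `deviation_scores` is a hypothetical helper name.

```python
from fractions import Fraction


def deviation_scores(scores):
    """Return each Score - Mean as an exact Fraction, preserving the sign."""
    mean = Fraction(sum(scores), len(scores))
    return [Fraction(x) - mean for x in scores]


# Illustrative (hypothetical) scores; the mean is 10/3.
scores = [1, 2, 7]
devs = deviation_scores(scores)
print(devs)       # [Fraction(-7, 3), Fraction(-4, 3), Fraction(11, 3)]
print(sum(devs))  # 0
```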
Standard Deviation
Surely there is an easier way to measure dispersion than all this squaring and taking of square roots. It turns out that the standard deviation marks a special point on a normal curve: one standard deviation from the mean, the second derivative of the curve is exactly zero (an inflection point). If you were skiing down the slope from the peak, it would get steeper and steeper, then start to flatten out; the place where it stops getting steeper is one standard deviation from the mean. This geometric meaning is one reason the standard deviation is the preferred measure of dispersion.
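To check the skiing picture numerically, the sketch below evaluates the second derivative of the normal density at half, one, and one and a half standard deviations above the mean. It uses the standard analytic formula f''(x) = f(x)·((x − μ)²/σ⁴ − 1/σ²) and borrows the N(64.5, 2.5) curve from a later slide; the function names are hypothetical. The sign flips from negative (the downhill slope is still getting steeper) to positive (the slope is flattening) at one standard deviation.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_pdf_second_derivative(x, mu, sigma):
    """Analytic second derivative: f(x) * ((x - mu)**2 / sigma**4 - 1 / sigma**2)."""
    return normal_pdf(x, mu, sigma) * ((x - mu) ** 2 / sigma ** 4 - 1 / sigma ** 2)


mu, sigma = 64.5, 2.5  # example curve N(64.5, 2.5)
for k in (0.5, 1.0, 1.5):
    x = mu + k * sigma
    value = normal_pdf_second_derivative(x, mu, sigma)
    print(f"{k} SD above the mean: f''(x) = {value:+.5f}")
```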
Problem 9
Given the following collection of scores: 2, 3, 5, 6, 6, 8
– Calculate the range of the scores.
– Calculate the sum of squares.
– Calculate the variance.
– Calculate the standard deviation.
Data table
Subject | X | Deviation score (x) | Squared deviation (x²)
Fran | 2 | |
Frank | 3 | |
Frangelica | 5 | |
Fonz | 6 | |
Frieda | 6 | |
Fabiano | 8 | |
N = ___   Total = ___   Mean = ___   SS = ___
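Python's standard `statistics` module can cross-check the Problem 9 answers. In that module, `pvariance` and `pstdev` divide SS by N, matching the definitions on these slides, while `variance` and `stdev` divide by N − 1, a convention many texts use when a sample is used to estimate a population's variance. The sketch prints both the inclusive range count used here and the more common highest − lowest.

```python
import statistics

# Problem 9 scores: Fran, Frank, Frangelica, Fonz, Frieda, Fabiano.
scores = [2, 3, 5, 6, 6, 8]

mean = statistics.fmean(scores)
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

print("Range, inclusive count:", max(scores) - min(scores) + 1)
print("Range, highest - lowest:", max(scores) - min(scores))
print("SS:", ss)
print("Variance, SS/N:", statistics.pvariance(scores))
print("Standard deviation, sqrt(SS/N):", statistics.pstdev(scores))
# For comparison only: statistics.variance divides SS by N - 1 instead.
print("SS/(N - 1):", statistics.variance(scores))
```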
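Normal distributions
Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma), written N(μ, σ):
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
where e = … is the base of the natural logarithm and π = pi = … .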
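A density curve encloses a total area of 1. The sketch below codes the normal density formula directly and estimates that area with the trapezoid rule over μ ± 10σ, where essentially all of the area lies. The integration range and step count are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x: exp(-((x - mu)/sigma)**2 / 2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(mu, sigma, width=10.0, steps=10_000):
    """Trapezoid-rule estimate of the area under N(mu, sigma) over mu +/- width*sigma."""
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h


print(area_under_curve(15, 3))  # very close to 1.0
```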
A family of density curves
In the first figure, the means differ (μ = 10, 15, and 20) while the standard deviations are the same (σ = 3).
In the second figure, the means are the same (μ = 15) while the standard deviations differ (σ = 2, 4, and 6).
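The two figures can be summarized numerically: changing μ slides the curve along the axis without changing its shape, while a larger σ spreads the curve out and lowers its peak. This sketch locates each curve's peak on a grid of x values from 0 to 40; the grid spacing and helper names are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


grid = [i / 100 for i in range(0, 4001)]  # x from 0 to 40 in steps of 0.01


def peak(mu, sigma):
    """Return (x, height) of the highest grid point on N(mu, sigma)."""
    x = max(grid, key=lambda v: normal_pdf(v, mu, sigma))
    return x, normal_pdf(x, mu, sigma)


# First figure: different means, same sigma: same height, shifted peak.
for mu in (10, 15, 20):
    x, height = peak(mu, 3)
    print(f"N({mu}, 3): peak at x = {x}, height {height:.4f}")

# Second figure: same mean, different sigma: wider curves have lower peaks.
for sigma in (2, 4, 6):
    x, height = peak(15, sigma)
    print(f"N(15, {sigma}): peak at x = {x}, height {height:.4f}")
```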
Example: mean μ = 64.5 and standard deviation σ = 2.5, so N(μ, σ) = N(64.5, 2.5).
All normal curves N(μ, σ) share the same properties.
Reminder: μ (mu) is the mean of the idealized curve, while X̄ is the mean of a sample. σ (sigma) is the standard deviation of the idealized curve, while s is the standard deviation of a sample.
– About 68% of all observations are within 1 standard deviation (σ) of the mean (μ).
– About 95% of all observations are within 2σ of the mean μ.
– Almost all (99.7%) observations are within 3σ of the mean.
– The inflection points of the curve lie at μ − σ and μ + σ.
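The 68%, 95%, and 99.7% figures can be computed rather than memorized: for any normal curve, the share of observations within k standard deviations of the mean is erf(k/√2). The sketch uses Python's `math.erf`; the helper names are hypothetical. The last line confirms that the example curve N(64.5, 2.5) gives the same share between 62.0 and 67.0 as the general formula gives for k = 1.

```python
import math


def fraction_within(k):
    """Share of any normal distribution within k standard deviations of its mean:
    P(|X - mu| < k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973

# Same share for the example curve N(64.5, 2.5), between 62.0 and 67.0.
mu, sigma = 64.5, 2.5
print(round(normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma), 4))  # 0.6827
```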
Definitions: Statistical Symbols
In an actual sample
– Scores are represented by X
– Mean = X̄
– Deviation score: x = X − X̄
– Standard deviation = s
– Variance = s²
In a theoretical distribution (density curve)
– Mean = μ
– Standard deviation = σ
– Variance = σ²
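To see the sample symbols next to the theoretical ones, the sketch below draws a simulated sample of 10,000 scores X from N(64.5, 2.5) using Python's `random` module with a fixed seed, then computes X̄, the deviation scores x, and s. The simulation is purely illustrative (no real data), and s is computed with SS/N to match these slides. X̄ and s land close to, but not exactly on, μ and σ.

```python
import random
import statistics

MU, SIGMA = 64.5, 2.5       # theoretical distribution N(mu, sigma)
rng = random.Random(7)      # fixed seed so the run is repeatable
sample = [rng.gauss(MU, SIGMA) for _ in range(10_000)]  # simulated scores X

x_bar = statistics.fmean(sample)            # sample mean, X-bar
deviations = [x - x_bar for x in sample]    # deviation scores x = X - X-bar
s = statistics.pstdev(sample)               # sample standard deviation s (SS/N)

print(f"mu = {MU}, X-bar = {x_bar:.3f}")
print(f"sigma = {SIGMA}, s = {s:.3f}")
print(f"sigma^2 = {SIGMA ** 2}, s^2 = {s ** 2:.3f}")
```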