VARIABILITY Distributions Measuring dispersion


1 VARIABILITY Distributions Measuring dispersion
Variance and standard deviation

2 Review: Distribution
An arrangement of cases according to their score or value on one or more variables.
[Table: each case (Case no.) listed with Age and Height (continuous variables) and M/F (categorical variable)]
Summary statistics: mean age = 24; mean height = 67; % male = 39; % female = 61
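As a minimal sketch of how the summary statistics on this slide are produced, the Python below computes a mean for each continuous variable and a percentage for each category of the categorical variable. The five cases are made up for illustration (they are not the slide's data), and the helper name `summarize` is hypothetical.

```python
from statistics import mean

# Hypothetical cases for illustration only: (case_no, age, height, sex)
cases = [
    (1, 23, 68, "M"),
    (2, 22, 64, "F"),
    (3, 25, 71, "M"),
    (4, 24, 65, "F"),
    (5, 21, 62, "F"),
]

def summarize(cases):
    """Means for the continuous variables, percentages for the categorical one."""
    n = len(cases)
    ages = [age for _, age, _, _ in cases]
    heights = [height for _, _, height, _ in cases]
    sexes = [sex for _, _, _, sex in cases]
    return {
        "mean_age": mean(ages),               # continuous: a mean makes sense
        "mean_height": mean(heights),
        "pct_m": 100 * sexes.count("M") / n,  # categorical: report percentages
        "pct_f": 100 * sexes.count("F") / n,
    }

print(summarize(cases))
```

A mean of the M/F codes would be meaningless, which is why the categorical variable is summarized by percentages instead.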

3 Dispersion
How do cases “disperse” (arrange themselves) around the mean?
[Figure: officers' scores plotted around the mean]

4 Three statistics that measure dispersion
Each measures how cases “disperse” (arrange themselves) around the mean.
Average deviation: $\frac{\sum |x - \bar{x}|}{n}$
The average distance between the mean and the values (scores) for each case. Uses absolute distances (no + or −). Affected by extreme scores. We'll never use it in class.
Variance (s²): a sample's cumulative dispersion, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
We always divide by n − 1 rather than n (our sample sizes are always small).
Standard deviation (s): a standardized form of variance, comparable between samples, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The square root of the variance (again using n − 1). Expresses dispersion in units of equal size for that particular distribution. Less affected by extreme scores than the variance, because taking the square root shrinks large squared values.
[Figure: officers' scores around the mean (mean = 2.3)]
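The three formulas above translate directly into code. This is a minimal Python sketch, assuming a plain list of scores; the function names and the five cynicism scores are hypothetical, chosen only for illustration.

```python
from math import sqrt

def average_deviation(scores):
    """Average absolute distance between each score and the mean: sum(|x - mean|) / n."""
    n = len(scores)
    m = sum(scores) / n
    return sum(abs(x - m) for x in scores) / n

def sample_variance(scores):
    """Sample variance s^2: sum of squared deviations divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)

def sample_sd(scores):
    """Standard deviation s: the square root of the sample variance."""
    return sqrt(sample_variance(scores))

# Hypothetical cynicism scores (1-5) for five officers
scores = [1, 2, 2, 3, 4]
print(average_deviation(scores), sample_variance(scores), sample_sd(scores))
```

For these scores the mean is 2.4, the average deviation is 0.88, s² = 5.2 / 4 = 1.3 and s ≈ 1.14. Python's `statistics.variance` and `statistics.stdev` also divide by n − 1, so they make a convenient cross-check; `statistics.pvariance` divides by n instead.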

5 Variability exercise
Random sample of patrol officers, each scored 1-5 on a cynicism scale
Sample 1 (n=10), worksheet columns: Officer, Score, Mean, Diff., Sq.
Sum of squares, then:
Variance (sum of squares / (n − 1)): s² = .99
Standard deviation (square root of variance): s = .99
[Graph of the scores. This is not an acceptable graph; it's only to illustrate dispersion]

6 Another random sample of patrol officers, each scored 1-5 on a cynicism scale
Sample 2 (n=10)
Officer   Score   Mean   Diff.   Sq.
1         …       ___    ___     ___
2         1       ___    ___     ___
3         1       ___    ___     ___
4         2       ___    ___     ___
5         3       ___    ___     ___
6         3       ___    ___     ___
7         3       ___    ___     ___
8         3       ___    ___     ___
9         4       ___    ___     ___
10        2       ___    ___     ___
Sum of squares  ____
Variance s²  ____
Standard deviation s  ____
Compute ...
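The worksheet procedure (find the mean, subtract it from each score, square each difference, sum the squares, divide by n − 1, take the square root) can be sketched in Python as below. The ten scores are hypothetical, not the officers' scores from either sample, and `variability_worksheet` is an illustrative name.

```python
from math import sqrt

def variability_worksheet(scores):
    """Print the Officer/Score/Mean/Diff./Sq. table and return (s2, s)."""
    n = len(scores)
    m = sum(scores) / n
    total = 0.0  # running sum of squared differences
    print(f"{'Officer':>7} {'Score':>5} {'Mean':>5} {'Diff.':>6} {'Sq.':>6}")
    for officer, x in enumerate(scores, start=1):
        diff = x - m    # distance of this score from the mean
        sq = diff ** 2  # squaring removes the + or - sign
        total += sq
        print(f"{officer:>7} {x:>5} {m:>5.2f} {diff:>6.2f} {sq:>6.2f}")
    s2 = total / (n - 1)  # variance: sum of squares / (n - 1)
    s = sqrt(s2)          # standard deviation: square root of variance
    print(f"Sum of squares: {total:.2f}")
    print(f"Variance s2 = sum / (n - 1) = {s2:.2f}")
    print(f"Standard deviation s = sqrt(s2) = {s:.2f}")
    return s2, s

# Hypothetical cynicism scores (1-5) for 10 officers; NOT the slides' data
scores = [2, 3, 1, 4, 3, 2, 5, 3, 2, 3]
s2, s = variability_worksheet(scores)
```

Here the mean is 2.8 and the sum of squares is 11.60, so s² = 11.60 / 9 ≈ 1.29 and s ≈ 1.14. Printing every intermediate column mirrors what must be shown on paper.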

7 Two random samples of patrol officers, each scored 1-5 on a cynicism scale
Sample 1 (n=10): Variance (sum of squares / (n − 1)) s² = .99; Standard deviation (square root of variance) s = .99
Sample 2 (n=10): Variance (sum of squares / (n − 1)) s²; Standard deviation (square root of variance) s = .97
[Graphs of both samples. These are not acceptable graphs; they're only used here to illustrate how the scores disperse around the mean]

8 VARIABILITY Shape of distributions Flat, peaked, normal

9 “Flat” distributions
Dispersion (aka “variability”): how scores or values arrange themselves around the mean.
When scores are more dispersed (i.e., variability is greater), a distribution's shape gets flatter: the distance between most scores and the mean is greater, and many scores are at a considerable distance from the mean. The mean loses value as a “summary statistic”.
[Figure: distribution of arrests; the mean (3.65) is a poor descriptor]

10 “Peaked” and “normal” distributions
Peaked: if most scores cluster about a certain value, the shape of the distribution is called “peaked”.
Normal: if the clustering of scores is around the mean, the distribution is called “normal”.
In social science research it turns out that scores or values for many variables are normally or near-normally distributed. This allows the mean to be used to describe the underlying datasets. That's why means are called a “summary statistic”: they can “summarize” the values of samples or populations.
[Figures: distributions of arrests. A peaked distribution (but not “normal”), where the mean is not a good descriptor; and a peaked and “normal” distribution, where the mean is a good descriptor]

11 Characteristics of normal distributions
Unimodal and symmetrical: the shapes on both sides of the mean are identical.
68.26 percent of the area “under” the curve (meaning 68.26 percent of the cases) falls within one “standard deviation” (±1 SD) of the mean.
The fact that a distribution is “normal” or “near-normal” does NOT imply that the mean has any particular value. All it implies is that scores distribute themselves around the mean “normally”. Means depend on the data; in this distribution the mean could be any value.
By definition, the standard deviation score (z-score) that corresponds to the mean of a normal distribution, whatever the mean might be, is zero (z = 0).
[Figure: normal curve centered on the mean (whatever it is), where the standard deviation score is always 0]
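A short sketch, using only Python's `math.erf`, of where the 68.26 percent figure comes from: for a normal distribution, the share of cases within ±k standard deviations of the mean is erf(k/√2). The exact value for k = 1 is about 0.6827; printed normal tables give 0.3413 on each side of the mean, and 2 × 0.3413 = 0.6826 is the figure on the slide. The helper names `share_within` and `z_score` are hypothetical.

```python
from math import erf, sqrt

def share_within(k):
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return erf(k / sqrt(2))

def z_score(x, mean, sd):
    """Standard deviation score: distance of x from the mean, in SD units."""
    return (x - mean) / sd

print(f"{share_within(1):.4f}")  # about 0.6827 of cases within 1 SD
print(f"{share_within(2):.4f}")  # about 0.9545 of cases within 2 SD
# The z-score at the mean is 0, whatever the mean happens to be
print(z_score(50, 50, 10), z_score(3.7, 3.7, 0.9))
```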

12 How well do means represent (summarize) a sample?
13 officers scored on numbers of tickets written in one week:
Officer A: 1 ticket
Officers B & C: 2 tickets each
Officers D & E: 3 tickets each
Officers F & G: 4 tickets each
Officers H & I: 5 tickets each
Officer J: 6 tickets
Officers K & L: 7 tickets each
Officer M: 9 tickets
Mean = 4.46, SD = 2.33
If the variable “no. of tickets” were “normally” distributed, most cases would fall inside a bell-shaped curve. Here they don't.
In a normal distribution about 68% of cases would fall within 1 SD of the mean: 13 × .68 ≈ 9 cases. But here only 7 cases (Officers D through J) fall within 1 SD (4.46 ± 2.33, i.e., between 2.13 and 6.79 tickets), while nearly as many (6) don't.
Scores are very dispersed, making the distribution mostly flat. So here the mean is NOT a good shortcut for describing how officers performed.
[Figure: frequency of number of tickets for officers A to M, with −1 SD, the mean and +1 SD marked]

13 13 officers scored on numbers of tickets written in one week
Officer A: 1 ticket
Officer B: 2 tickets
Officer C: 3 tickets
Officers D, E, F: 4 tickets each
Officers G, H, I: 5 tickets each
Officers J & K: 6 tickets each
Officer L: 7 tickets
Officer M: 9 tickets
Mean = 4.69, SD = 2.1
Here most cases do fall inside the bell-shaped curve: the variable “no. of tickets” seems near-normally distributed. 9 of 13 cases (Officers C through K) fall within 1 SD of the mean (4.69 ± 2.1, i.e., between about 2.6 and 6.8 tickets).
The distribution is near-normal because most officers wrote close to the same number of tickets; the cases “cluster” around the mean. So, for this sample the mean is a decent summary statistic, a good shortcut for describing officer performance.
[Figure: frequency of number of tickets for officers A to M, with −1 SD, the mean and +1 SD marked]
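The two ticket distributions can be checked with a short Python sketch. The officers' scores are transcribed from slides 12 and 13; `describe` is a hypothetical helper that computes the mean, the sample SD (dividing by n − 1, as in class) and lists the officers whose scores fall within ±1 SD of the mean. It also prints how many officers a normal distribution would place inside that band (0.6826 × 13).

```python
from math import sqrt

# Tickets written in one week, transcribed from slides 12 and 13
sample_1 = {"A": 1, "B": 2, "C": 2, "D": 3, "E": 3, "F": 4, "G": 4,
            "H": 5, "I": 5, "J": 6, "K": 7, "L": 7, "M": 9}
sample_2 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 4, "F": 4, "G": 5,
            "H": 5, "I": 5, "J": 6, "K": 6, "L": 7, "M": 9}

def describe(tickets):
    """Return mean, sample SD (n - 1), and officers within 1 SD of the mean."""
    scores = list(tickets.values())
    n = len(scores)
    m = sum(scores) / n
    s = sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))
    inside = [name for name, x in tickets.items() if m - s <= x <= m + s]
    return m, s, inside

for label, data in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    m, s, inside = describe(data)
    expected = 0.6826 * len(data)  # officers expected within 1 SD if normal
    print(f"{label}: mean={m:.2f} SD={s:.2f} "
          f"within 1 SD: {len(inside)} ({', '.join(inside)}); "
          f"normal would give ~{expected:.0f}")
```

It reports 7 officers (D through J) inside the band for the first sample and 9 (C through K) for the second, against about 9 expected under a normal distribution, which is the contrast the two slides draw.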

14 Going beyond description…
When variables are normally or near-normally distributed, the mean, variance and standard deviation can help describe datasets. But they are also useful in explaining why things change; that is, in testing hypotheses.
Example: you want to test the hypothesis that college-educated cops are more effective: college → greater effectiveness.
Independent variable: college (Y/N)
Dependent variable: effectiveness (scale 1-5)
You go to the XYZ police dept., draw two samples of patrol officers (one of college grads, the other of non-college grads) and test each officer for effectiveness. On a scale of 1 (ineffective) to 5 (highly effective), this is how they scored:
10 college grads (mean 3.7)
10 non-college grads (mean 2.8)
The difference between means is in the hypothesized direction. But does that “prove” that college grads are more effective? To determine whether the difference in means is “statistically significant,” meaning large enough that it is unlikely to be due to chance, we need to know each sample's variance. Don't worry, we'll cover this later!
[Figure: “Are college-educated cops more effective?” Scores for college grads and non-college grads]

15 Exam information
You must bring a regular, non-scientific calculator with no functions beyond a square root key.
You will be asked to apply concepts including research question, hypothesis and variables to the “college education and police job performance” article.
You will be given data and asked to create graph(s) depicting the distribution of a single variable.
You will compute basic statistics, including mean, median, mode and standard deviation. All computations must be shown on the answer sheet.
You will be given the formula for variance (s²). You must use and display the procedure described in the slides and practiced in class for manually calculating variance (s²) and its square root, known as standard deviation (s).
This is a relatively brief exam. You will have one hour to complete it. We will then take a break and move on to the next topic.

