
VARIABILITY

Case no.  Age  Height  M/F
1         23   68      M
2         22   64      F
3         23   69      F
4         25   71      M
5         27   64      F
6         22   72      M
7         24   65      F
8         23   66      M
9         23   66      F
10        25   68      F
11        21   68      M
12        21   62      F
13        24   71      M
14        27   66      F
15        21   62      F
16        25   56      F
17        22   71      M
18        22   70      M
19        25   66      F
20        26   60      F
21             52      F
22        31   70      F
23        24   71      M
24        31   61      F
25        23   72      M
26        27   71      F
27        25   71      M
28        26   64      F
29        22   66      F
30        29   69      M
31        24   67      F

Summary statistics: mean age = 24; mean height = 67; 39% M, 61% F

Review: Distribution
An arrangement of cases according to their score or value on one or more variables
[Figures: distribution of a categorical variable; distribution of a continuous variable]

Dispersion and the mean
Dispersion: How scores or values arrange themselves around the mean
If most scores cluster about the mean the shape of the distribution is peaked
– This is the so-called “normal” distribution
– In social science the scores or values for many variables are normally or near-normally distributed
– This allows use of the mean to describe the dataset (that’s why it’s called a “summary statistic”)
When scores are more dispersed a distribution’s shape is flatter
– Distance between most scores and the mean is greater
– Many scores are at a considerable distance from the mean
– The mean loses value as a summary statistic
[Figure: two distributions of arrests. Normal distribution: mean 3.0, a good descriptor. “Flat” distribution: mean 3.65, a poor descriptor.]

Normal distributions
Characteristics:
– Unimodal and symmetrical: shapes on both sides of the mean are identical
– 68.26 percent of the area “under” the curve (meaning 68.26 percent of the cases) falls within one “standard deviation” (+/- 1 SD) of the mean
– NOTE: The fact that a distribution is “normal” or “near-normal” does NOT imply that the mean is of any particular value. All it implies is that scores distribute themselves around the mean “normally”.
Means depend on the data. In this distribution the mean could be any value.
By definition, the standard deviation score (the distance from the mean, measured in standard deviations) that corresponds with the mean of a normal distribution, whatever that mean might be, is zero.
[Figure: normal curve. Mean (whatever it is); standard deviation score (always 0 at the mean).]

Measuring dispersion
Average deviation
$$\frac{\sum |x - \bar{x}|}{n}$$
– Average distance between the mean and the values (scores) for each case
– Uses absolute distances (no + or -)
– Affected by extreme scores
Variance (s²): A sample’s cumulative dispersion
$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$
(use n-1 in place of n for small samples)
Standard deviation (s): A standardized form of variance, comparable between samples
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
(use n-1 in place of n for small samples)
– Square root of the variance
– Expresses dispersion in units of equal size for that particular distribution
– Less affected by extreme scores than the variance
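The three measures can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names and the four example scores are made up for the demonstration.

```python
from math import sqrt


def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def average_deviation(scores):
    """Average absolute distance between each score and the mean."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


def variance(scores, small_sample=True):
    """Sum of squared deviations divided by n-1 (small samples) or n."""
    m = mean(scores)
    sum_of_squares = sum((x - m) ** 2 for x in scores)
    divisor = len(scores) - 1 if small_sample else len(scores)
    return sum_of_squares / divisor


def standard_deviation(scores, small_sample=True):
    """Square root of the variance."""
    return sqrt(variance(scores, small_sample))


# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
scores = [2, 4, 4, 6]
print(average_deviation(scores))             # 1.0
print(variance(scores, small_sample=False))  # 2.0
print(round(standard_deviation(scores), 2))  # 1.63
```

The `small_sample` flag switches between the n and n-1 divisors mentioned in the slide.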

How well do means represent (summarize) a sample?
13 officers scored on numbers of tickets written in one week
Officer A: 1 ticket
Officers B & C: 2 tickets each
Officers D & E: 3 tickets each
Officers F & G: 4 tickets each
Officers H & I: 5 tickets each
Officer J: 6 tickets
Officers K & L: 7 tickets each
Officer M: 9 tickets
Mean = 4.46, SD = 2.33
In a normal distribution about 68% of cases fall within 1 SD of the mean. .68 X 13 cases = 8.84, or about 9 cases
But here only 7 cases (Officers D-J) fall within 1 SD of the mean, that is, between 2.13 and 6.79 tickets. Six officers wrote very few or very many tickets, making the distribution considerably more dispersed than “normal.”
So, for this sample, the mean does NOT seem to be a good summary statistic. It is NOT a good shortcut for describing how officers in this sample performed.
If the variable “no. of tickets” were “normally” distributed, most cases would fall inside the bell-shaped curve. Here they don’t.
[Figure: frequency of officers A-M by number of tickets, with a bell-shaped curve marked at -1 SD (2.13), mean (4.46) and +1 SD (6.79).]

13 officers scored on numbers of tickets written in one week
Officer A: 1 ticket
Officer B: 2 tickets
Officer C: 3 tickets
Officers D, E, F: 4 tickets each
Officers G, H, I: 5 tickets each
Officers J & K: 6 tickets each
Officer L: 7 tickets
Officer M: 9 tickets
Mean = 4.69, SD = 2.1
In a normal distribution about 68 percent of the cases fall within 1 SD of the mean. .68 X 13 = 8.84, or about 9 cases
Here, 9 of the 13 cases (Officers C-K) do fall within 1 SD of the mean, that is, between 2.59 and 6.79 tickets. The distribution is near-normal because most officers wrote close to the same number of tickets, so the cases “clustered” around the mean.
So, for this sample the mean is a good summary statistic: a good shortcut for describing officer performance
If the variable “no. of tickets” were “normally” distributed, most cases would fall inside the bell-shaped curve. Here they do!
[Figure: frequency of officers A-M by number of tickets, with a bell-shaped curve marked at -1 SD (2.59), mean (4.69) and +1 SD (6.79).]
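The check applied to both ticket samples can be sketched in Python as follows. The ticket counts are the ones listed in the slides; the variable and function names are made up for this illustration, and the 68 percent benchmark is the rounded figure for a normal distribution.

```python
from statistics import mean, stdev

# Tickets written in one week by officers A through M, as listed in the slides.
dispersed_sample = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 9]
clustered_sample = [1, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 9]


def within_one_sd(scores):
    """Return (mean, SD, count of scores within 1 SD of the mean)."""
    m = mean(scores)
    s = stdev(scores)  # sample SD, which divides by n-1
    count = sum(1 for x in scores if m - s <= x <= m + s)
    return m, s, count


for label, scores in [("dispersed", dispersed_sample),
                      ("clustered", clustered_sample)]:
    m, s, count = within_one_sd(scores)
    expected = 0.68 * len(scores)  # share expected in a normal distribution
    print(f"{label}: mean={m:.2f} SD={s:.2f} "
          f"within 1 SD={count} (normal would give about {expected:.1f})")
```

The first sample puts 7 of 13 cases within 1 SD of its mean and the second puts 9 of 13, against roughly 9 expected under a normal distribution.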

Going beyond description…
As we’ve seen, when variables are normally or near-normally distributed, the mean, variance and standard deviation can help describe datasets
But they are also useful in explaining why things change; that is, in testing hypotheses
For example, assume that patrol officers in the XYZ police dept. were tested for effectiveness, and that on a scale of 1 (least eff.) to 5 (most eff.) their mean score was 3.2, distributed about normally
You want to use XYZ P.D. to test the hypothesis that college-educated cops are more effective: college → greater effectiveness
– Independent variable: college (Y/N)
– Dependent variable: effectiveness (scale 1-5)
You draw two officer samples (we’ll cover this later in the term) and compare their mean effectiveness scores
– 10 college grads (mean 3.7)
– 10 non-college (mean 2.8)
On its face, the difference between means is in the hypothesized direction: college grads seem more effective. But that’s not the end of it. Each group’s variance would then be used to determine whether the difference in scores is “statistically significant.” Don’t worry - we’ll cover this later!
[Figure: Are college-educated cops more effective? Effectiveness scores of college grads and non-college grads.]

Variability exercise
Random sample of patrol officers, each scored 1-5 on a cynicism scale

Sample 1 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        3      2.9     .1    .01
2        3      2.9     .1    .01
3        3      2.9     .1    .01
4        3      2.9     .1    .01
5        3      2.9     .1    .01
6        3      2.9     .1    .01
7        3      2.9     .1    .01
8        1      2.9   -1.9   3.61
9        2      2.9    -.9    .81
10       5      2.9    2.1   4.41
Sum of squares: 8.90
Variance (sum of squares / n-1): s² = .99
Standard deviation (sq. root of variance): s = .99
[Figure: scores plotted around the mean. This is not an acceptable graph; it’s only to illustrate dispersion.]

Another random sample of patrol officers, each scored 1-5 on a cynicism scale

Sample 2 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        2      ___   ___    ___
2        1      ___   ___    ___
3        1      ___   ___    ___
4        2      ___   ___    ___
5        3      ___   ___    ___
6        3      ___   ___    ___
7        3      ___   ___    ___
8        3      ___   ___    ___
9        4      ___   ___    ___
10       2      ___   ___    ___
Sum of squares: ____
Variance: s² = ____
Standard deviation: s = ____
Compute...

Two random samples of patrol officers, each scored 1-5 on a cynicism scale

         Sample 1 (n=10, mean 2.9)    Sample 2 (n=10, mean 2.4)
Officer  Score  Diff.  Sq.            Score  Diff.  Sq.
1        3        .1    .01           2       -.4    .16
2        3        .1    .01           1      -1.4   1.96
3        3        .1    .01           1      -1.4   1.96
4        3        .1    .01           2       -.4    .16
5        3        .1    .01           3        .6    .36
6        3        .1    .01           3        .6    .36
7        3        .1    .01           3        .6    .36
8        1      -1.9   3.61           3        .6    .36
9        2       -.9    .81           4       1.6   2.56
10       5       2.1   4.41           2       -.4    .16
Sum of squares:          8.90                       8.40
Variance (sum of squares / n-1), s²:  .99            .93
Standard deviation (sq. root of variance), s:  .99   .97
[Figures: scores in each sample plotted around the mean. These are not acceptable graphs; they’re only used here to illustrate how the scores disperse around the mean.]
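The manual procedure for both samples (mean, differences, squared differences, variance, standard deviation) can be checked with a short Python sketch. The scores come from the slides; the `worksheet` function and its dictionary keys are hypothetical names used only for this illustration.

```python
from math import sqrt

# Cynicism scores (1-5) for the two samples shown in the slides.
sample_1 = [3, 3, 3, 3, 3, 3, 3, 1, 2, 5]
sample_2 = [2, 1, 1, 2, 3, 3, 3, 3, 4, 2]


def worksheet(scores):
    """Follow the manual procedure: mean, differences, squares, s2, s."""
    n = len(scores)
    mean = sum(scores) / n
    diffs = [x - mean for x in scores]
    squares = [d ** 2 for d in diffs]
    sum_of_squares = sum(squares)
    variance = sum_of_squares / (n - 1)  # n-1 because the sample is small
    return {
        "mean": mean,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "sd": sqrt(variance),
    }


for name, scores in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    result = worksheet(scores)
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Rounded to two decimals, the results agree with the worksheet figures: 8.90, .99 and .99 for Sample 1, and 8.40, .93 and .97 for Sample 2.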

z-score (a “standard” score)
If the distribution of a variable (e.g., number of arrests) is approximately normal, we can estimate where any score would fall in relation to the mean.
We first convert the sample score into a z-score using the sample standard deviation
[Figure: normal curve with the horizontal axis marked in z-scores from -3 to +3.]
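The conversion uses the z formula given with the arrests assignment later in these slides. As a worked example with figures from that exercise (Jay's 0 arrests, sample mean 3, sample standard deviation 1.49):

```latex
z = \frac{x - \bar{x}}{s} = \frac{0 - 3}{1.49} \approx -2.01
```

A z-score of -2.01 places the case about two standard deviations below the mean.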

We then look up the z-score in a z-table. It gives the proportion of cases in the distribution…
– Between a case and the mean
– Beyond the case, away from the mean (left for negative z’s, right for positive z’s)
Z-scores can be used to identify the percentile bracket into which a case falls (e.g., bottom ten percent)
Since z-scores are standardized like percentages, they can be used to compare samples
The z-table indicates the proportion of the area under the curve (the proportion of scores) between the mean and any z-score, and the proportion of the area beyond that score (to the left or right)
In a normal distribution 95 percent of all z-scores fall between +/- 1.96
In a normal distribution 5 percent of all z-scores fall beyond +/- 1.96; these are the rare/unusual cases
[Figure: proportion of area “under the curve” where cases lie. 100 percent of cases in all; 95 percent between z = -1.96 and z = +1.96 (.475 on each side of the mean); 2½ percent (.025) in each tail.]
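The two proportions a z-table reports can also be computed directly from the normal curve. The sketch below is an illustration rather than the procedure used in class: it uses Python's standard `math.erf`, and `z_table` is a made-up function name.

```python
from math import erf, sqrt


def z_table(z):
    """Return (proportion between mean and z, proportion beyond z)."""
    cumulative = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # area to the left of |z|
    between = cumulative - 0.5
    beyond = 1 - cumulative
    return between, beyond


between, beyond = z_table(1.96)
print(f"between mean and z: {between:.4f}")      # 0.4750
print(f"beyond z:           {beyond:.4f}")       # 0.0250
print(f"within +/- 1.96:    {2 * between:.2f}")  # 0.95
```

Because the curve is symmetrical, the function takes the absolute value of z, so a negative z-score returns the same two proportions as its positive counterpart.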

Variability exercise
Sample of twenty officers drawn from the Anywhere police department, each measured for number of arrests
Unit of analysis: officers
Case: one officer
Variable: number of arrests
Number of arrests is presumably normally distributed in the population of officers, meaning the whole police department. That is, most officers make about the same number of arrests; a few make fewer, and a few make more.
[Figure: frequency (1 to 6 officers) by number of arrests (0 to 6).]

Officer #  Arrests  Mean  Diff.  Diff. Squared  Z-score
1          2
2          4
3          5
4          3
5          1
6          3
7          2
8 (Jay)    0
9          3
10         4
11         5
12         3
13         2
14         1
15         4
16         6
17         3
18         4
19         2
20         3
Sum of squared differences: ____
Variance (sum of squares/n-1): ____
Standard deviation (sq. root of variance): ____

Assignment
1. Compute the sample standard deviation
2. Obtain the z-score for 0, 1, 2, 3, 4, 5 and 6 arrests
$$z = \frac{x - \bar{x}}{s}$$
NOTE: There are only seven values: 0, 1, 2, 3, 4, 5, 6. You only need to compute their statistics once.

Ofcr #   Arr  Mean  Diff.  Diff. Sq
1        2    3     -1     1
2        4    3      1     1
3        5    3      2     4
4        3    3      0     0
5        1    3     -2     4
6        3    3      0     0
7        2    3     -1     1
8 (Jay)  0    3     -3     9
9        3    3      0     0
10       4    3      1     1
11       5    3      2     4
12       3    3      0     0
13       2    3     -1     1
14       1    3     -2     4
15       4    3      1     1
16       6    3      3     9
17       3    3      0     0
18       4    3      1     1
19       2    3     -1     1
20       3    3      0     0
Sum of squared differences: 42
Variance (sum of squares/n-1): 2.21
Standard deviation (sq. root): 1.49

No. of arrests  Calculate   z      Prop. between mean and z  Prop. beyond z
0 (Jay)         0-3/1.49   -2.01   48% (.4778)               2% (.0222)
1               1-3/1.49   -1.34   41% (.4099)               9% (.0901)
2               2-3/1.49    -.67   25% (.2486)               25% (.2514)
3               3-3/1.49       0   0                         50% (.50)
4               4-3/1.49    +.67   25% (.2486)               25% (.2514)
5               5-3/1.49   +1.34   41% (.4099)               9% (.0901)
6 (Dudley)      6-3/1.49   +2.01   48% (.4778)               2% (.0222)
Jay’s score falls in the bottom two percent of a normal distribution
Dudley’s score falls in the top two percent of a normal distribution
[Figure: number of officers (1 to 6) by number of arrests (0 to 6), with the matching z-scores from -2 to +2 along the horizontal axis.]
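The whole exercise can be reproduced with a short Python sketch. The arrest counts are those from the worksheet; the function names are made up for this illustration. The standard deviation and each z-score are rounded to two decimals before the next step, as they are in the manual calculation with a z-table.

```python
from math import erf, sqrt
from statistics import mean, stdev

# Arrests for the twenty officers in the exercise (officer 8 is Jay).
arrests = [2, 4, 5, 3, 1, 3, 2, 0, 3, 4, 5, 3, 2, 1, 4, 6, 3, 4, 2, 3]

sample_mean = mean(arrests)
sample_sd = round(stdev(arrests), 2)  # rounded to 1.49, as in the slides


def z_score(x):
    """Standard score, rounded to two decimals like a z-table entry."""
    return round((x - sample_mean) / sample_sd, 2)


def areas(z):
    """Return (proportion between mean and z, proportion beyond z)."""
    cumulative = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return cumulative - 0.5, 1 - cumulative


for x in range(7):
    z = z_score(x)
    between, beyond = areas(z)
    print(f"{x} arrests: z={z:+.2f} between={between:.4f} beyond={beyond:.4f}")
```

Skipping the intermediate rounding would shift the z-scores slightly, so the printed proportions would no longer match the table entries to four decimals.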

Exam information
You must bring a z-table and a regular, non-scientific calculator with no functions beyond a square root key.
You need to understand the concept of a distribution. You will be given data and asked to create graph(s) depicting the distribution of a single variable.
You will compute basic statistics, including mean, median, mode, standard deviation and z-score. All computations must be shown on the answer sheet.
You will be given the formulas for variance (s²) and z. You must use and display the procedure described in the slides and practiced in class for manually calculating variance (s²) and standard deviation (s).
You will use the z-table to calculate where cases from a given sample would fall in a normal distribution.
This is a relatively brief exam. You will have one hour to complete it. We will then take a break and move on to the next topic.
