Chapter 4 Describing Data
What is Average?
  Central tendency: mean, median, mode
  Effect of outliers
  Outlier: a value that is much higher or much lower than almost all other values
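Mean, Median, Mode
  Mean
    How common: most familiar “average”
    Existence: always exists
    Takes every value into account: yes
    Affected by outliers: yes
    Advantages: commonly understood; widely used in statistics
  Median
    How common: common
    Existence: always exists
    Takes every value into account: no (aside from counting)
    Affected by outliers: no
    Advantages: good when there are outliers
  Mode
    How common: sometimes used
    Existence: there may be 0, 1 or more modes
    Takes every value into account: no
    Affected by outliers: no
    Advantages: suitable for nominal-level data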
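A minimal sketch of how a single outlier moves each measure, using Python's standard statistics module. The data values are made up for illustration and do not come from the chapter.

```python
from statistics import mean, median, multimode

# Hypothetical data set (not from the chapter): five typical values.
values = [30, 32, 32, 35, 36]
# The same data with one outlier appended.
with_outlier = values + [400]

def summarize(data):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(data), median(data), multimode(data)

before = summarize(values)
after = summarize(with_outlier)

print("without outlier:", before)
print("with outlier:   ", after)
```

In this sketch the mean jumps from 33 to about 94, the median moves only from 32 to 33.5, and the mode stays at 32.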
Confusion about “Average”
  Example 5 – Which mean?
  All 100 first-year students at a small college take three courses. Two courses are taught in large lectures, with 100 students in a single class. The third course is taught in 10 classes of 10 students each. Students and administrators get into an argument about whether classes are too large. The students claim that the mean size of their core courses is 70. The administrators claim that the mean class size is only 25. Which claim is right?
Example 5 – Which mean? (cont.)
  Students: mean size of the classes in which each student is personally enrolled
  Each student is taking 2 classes with an enrollment of 100 each and one class with an enrollment of 10, so the mean size of each student’s classes is:
  (100 + 100 + 10) / 3 = 70
Example 5 – Which mean? (cont.)
  Administrators: mean enrollment over all classes
  There are 2 classes with 100 students each and 10 classes with 10 students each, for a total enrollment of 300 in 12 classes:
  300 / 12 = 25
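Both figures in Example 5 can be checked with a short sketch. The variable names are illustrative. The last calculation, which weights each class by its own enrollment, is an added observation rather than something stated in the example.

```python
# Class sizes from Example 5: two lectures of 100 and ten sections of 10.
class_sizes = [100, 100] + [10] * 10

# Administrators' view: average over the 12 classes.
mean_per_class = sum(class_sizes) / len(class_sizes)

# Students' view: each student sits in both lectures and one small section.
one_student_classes = [100, 100, 10]
mean_per_student = sum(one_student_classes) / len(one_student_classes)

# Same student figure, obtained by weighting each class by its enrollment.
enrollment_weighted = sum(s * s for s in class_sizes) / sum(class_sizes)

print(mean_per_class)       # 25.0
print(mean_per_student)     # 70.0
print(enrollment_weighted)  # 70.0
```

Both sides computed a correct mean; they averaged over different things (classes versus student experiences).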
Weighted Mean
  Accounts for variations in the relative importance of data values.
  Each data value is assigned a weight, and the weighted mean is:
  weighted mean = sum of (each data value × its weight) / sum of all weights
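A minimal sketch of the weighted mean, taken here as the sum of each value times its weight, divided by the sum of the weights. The function name and the GPA figures are hypothetical and are not the data of Example 6.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical GPA data (not from the chapter):
# grade points earned in three courses, weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.25
```

With equal weights the function reduces to the ordinary mean.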
Weighted mean (cont.)
  Example 6: GPA
  Example 7: Stock Voting
    Stockholder   Shares owned   Vote
    A             225            Y
    B             170
    C             275
    D             500            N
    E             90
Exercise Ex29-40 pp155-156
Shapes of Distribution
  Number of modes: single-peaked (unimodal), bimodal, trimodal
  Symmetry or skewness
    Symmetric: the two halves are mirror images
    Left-skewed: mean < median
    Right-skewed: mean > median
  Variation: spread of the data
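The mean and median comparison above can be turned into a rough check of skew direction. This is only a rule-of-thumb sketch: the function name and data sets are hypothetical, and equal mean and median do not by themselves prove that a distribution is symmetric.

```python
from statistics import mean, median

def skew_direction(data):
    """Guess the skew direction by comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "no skew indicated"

# Hypothetical data sets (not from the chapter).
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [1, 17, 18, 18, 19, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for data in (long_right_tail, long_left_tail, balanced):
    print(mean(data), median(data), skew_direction(data))
```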
Exercise Ex15-23 p163
Measures of Variation
  Variation, spread, deviation
  How do the data values differ from one another?
  Quantitative measurement
Measures of Variation (cont.)
  Range
  Difference between the highest and lowest data values
  Affected by extreme values
  Uses only 2 data values and ignores all other variation within the data set
Measures of Variation (cont.)
  Quartiles and the Five-Number Summary
  Quartiles: first (25%), second (50%), third (75%)
  Five-Number Summary
    Lowest value
    Lower quartile (first quartile)
    Median (second quartile)
    Upper quartile (third quartile)
    Highest value
Measures of Variation (cont.)
  Percentile
  Percentiles divide a data set into 100 parts
  The nth percentile of a data set is a value with at least n% of the data at or below it
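A minimal sketch of one way to find a percentile. It reads the definition as "the smallest data value with at least n% of the data at or below it", which is an assumption; software packages often interpolate between values instead. The function name and scores are hypothetical.

```python
import math

def percentile_value(data, n):
    """Smallest data value with at least n% of the data at or below it."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < n <= 100:
        raise ValueError("n must be greater than 0 and at most 100")
    ordered = sorted(data)
    # Number of values that must lie at or below the answer.
    k = math.ceil(n * len(ordered) / 100)
    return ordered[k - 1]

# Hypothetical test scores (not from the chapter).
scores = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]

print(percentile_value(scores, 25))  # 62
print(percentile_value(scores, 50))  # 70
print(percentile_value(scores, 90))  # 90
```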
Measures of Variation (cont.)
  Standard Deviation (SD)
  An indicator of spread (dispersion, variation)
  SD = 0: all data values are the same, i.e. no variation
  SD > 0: the further the SD is from 0, the greater the spread of the data values
Measures of Variation (cont.)
  Advantages of SD
  A single value describes the spread
  Every data value is used in calculating the SD, so it is representative
  Easy to understand and to compare
  Used in statistical analysis
  Rough estimate of SD: range / 4
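A minimal sketch comparing the standard deviation with the range / 4 estimate, using Python's standard statistics module. The slides do not say whether the population or the sample formula is intended, so both are shown. The data are hypothetical.

```python
from statistics import pstdev, stdev

# Hypothetical data (not from the chapter).
identical = [5, 5, 5, 5]
varied = [2, 4, 4, 4, 5, 5, 7, 9]

# All values equal: the SD is zero.
sd_identical = pstdev(identical)

# Population SD (divide by n) and sample SD (divide by n - 1).
sd_population = pstdev(varied)
sd_sample = stdev(varied)

# Rough estimate from the slide: range / 4.
sd_rough = (max(varied) - min(varied)) / 4

print(sd_identical)
print(sd_population)
print(sd_sample)
print(sd_rough)
```

For this data set the rough estimate (1.75) lands near both versions of the SD (2.0 and about 2.14), which is all the range / 4 rule promises.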
Statistical Paradoxes
  Better in each case, but worse overall
  Basketball shots
    First half
      Player    Baskets   Attempts   Percent
      Henry     4         10         40%
      Michael   1         4          25%
    Second half
      Player    Baskets   Attempts   Percent
      Henry     3         4          75%
      Michael   7         10         70%
    Overall
      Player    Baskets   Attempts   Percent
      Henry     7         14         50%
      Michael   8         14         57.1%
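The basketball paradox can be checked with a short sketch. Attempt counts that are not printed in the table are inferred here from the stated baskets and percentages (for example, 3 baskets at 75% implies 4 attempts). The variable and function names are illustrative.

```python
# (baskets, attempts) per half. Attempt counts missing from the table
# are inferred from the stated percentages, e.g. 3 / 0.75 = 4.
shots = {
    "Henry":   {"first": (4, 10), "second": (3, 4)},
    "Michael": {"first": (1, 4),  "second": (7, 10)},
}

def pct(baskets, attempts):
    """Shooting percentage."""
    return 100 * baskets / attempts

def overall(player):
    """Percentage over both halves combined."""
    baskets = sum(b for b, _ in shots[player].values())
    attempts = sum(a for _, a in shots[player].values())
    return pct(baskets, attempts)

for half in ("first", "second"):
    henry = pct(*shots["Henry"][half])
    michael = pct(*shots["Michael"][half])
    print(half, "half:", henry, "vs", michael)

print("overall:", overall("Henry"), "vs", round(overall("Michael"), 1))
```

Henry shoots better in each half, yet Michael is better overall, because most of Henry's attempts came in his weaker half while most of Michael's came in his stronger half.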
Exercise Ex 5-8 p174
Chapter Review Exercises 3 a-f p186
Focus on Economics Are the Rich Getting Richer? pp191-193