Seven (plus or minus two) Clusters, A Monte Carlo Study Larry Hoyle, Policy Research Institute, The University of Kansas.


1 Seven (plus or minus two) Clusters, A Monte Carlo Study Larry Hoyle, Policy Research Institute, The University of Kansas

2 1972 Kansas Statistical Abstract

3 Shading by Overprinting

4 Shading by Line Spacing

5 Line Shading Detail

6 What did they have in common? Neither method is “continuous”, so both methods required grouping or classes. Overprinting: a fixed number of combinations, characters on a fixed grid. Line spacing: an integer number of lines in the polygon, and the lines are relatively coarse.

7 How to Group for Shading: equal intervals; equal numbers (quantiles); by clusters; don’t group (unclassed)

8 Population Density – 7 Equal Intervals: 100 counties fall into the bottom class

9 Population Density – Equal Numbers: 15 counties in each class, a very different picture
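To make the contrast between slides 8 and 9 concrete, here is a minimal Python sketch of the two classing rules. The function names and the small right-skewed data list are hypothetical illustrations, not the Kansas county densities. Ties in the rank-based rule are broken by input order, so equal values can land in different classes.

```python
from collections import Counter


def equal_interval_classes(values, k):
    """Assign each value to one of k classes of equal width spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    classes = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value computes to index k, so clamp it into the top class.
        classes.append(min(idx, k - 1))
    return classes


def quantile_classes(values, k):
    """Assign values to k classes with (as nearly as possible) equal counts, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * k // n
    return classes


def class_counts(classes, k):
    """Number of observations in each class 0..k-1 (empty classes count as 0)."""
    counts = Counter(classes)
    return [counts.get(c, 0) for c in range(k)]


if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 4, 5, 6, 8, 100]  # hypothetical, one large outlier
    print(class_counts(equal_interval_classes(data, 3), 3))  # [9, 0, 1]
    print(class_counts(quantile_classes(data, 3), 3))        # [4, 3, 3]
```

A single large value stretches the equal-width intervals, so almost every observation falls into the bottom class. This is the same effect as the 100 counties on slide 8. The rank-based rule spreads the observations nearly evenly instead.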

10 Population Density – Cluster Means: group around the 7 values that “best” represent the data

11 Population Density – Unclassed: no classes, just shade in proportion to value
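Unclassed shading needs only a mapping from value to tone. Below is a minimal sketch. It assumes non-negative values (as with population density) and an output device with `levels` discrete tone steps, where 0 means no ink. Both the parameter name and the linear scale are assumptions for illustration.

```python
def unclassed_shades(values, levels=256):
    """Map each non-negative value to a tone step 0..levels-1, proportional to value / max."""
    vmax = max(values)
    if vmax <= 0:
        return [0 for _ in values]  # nothing to shade
    return [round((levels - 1) * v / vmax) for v in values]


if __name__ == "__main__":
    print(unclassed_shades([0, 25, 100]))  # [0, 64, 255]
```

On a line printer or a line-shading plotter, the number of achievable steps is small, so the output is effectively grouped after all (slide 6).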

12 Clustering: tries for the “best” grouping; each member of a cluster can be represented by the mean of its group

13 Proc Fastclus: you specify the number of clusters; it minimizes the within-cluster sum of squared distances (i.e., the within-cluster variance); inspired by k-means (MacQueen) and the leader algorithm (Hartigan)
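PROC FASTCLUS is a SAS procedure. The following is not its implementation but a plain Lloyd-style k-means sketch in Python for one-dimensional data. It shows the objective the slide names, the within-cluster sum of squares. The seeding rule (values at evenly spaced ranks) and the function names are assumptions made for a deterministic illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd-style k-means on 1-D data; returns (centers, labels).

    A sketch only: seeds are the values at evenly spaced ranks of the sorted
    data, which is an assumption and not how PROC FASTCLUS chooses seeds.
    """
    xs = sorted(values)
    n = len(xs)
    centers = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, labels


def within_ss(values, centers, labels):
    """Within-cluster sum of squared distances, the quantity k-means tries to minimize."""
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical example data
    centers, labels = kmeans_1d(data, 3)
    print(centers, labels, within_ss(data, centers, labels))
```

Lloyd iterations only reach a local minimum. On messier data, different seeds can give different groupings.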

14 Example clustering - data

15 4 clusters [plot of the data by cluster; axes x (0 to 90) and y] R-squared = .9912

16 4 clusters: correlation between the data and the cluster means = .9956, R-squared = .9912
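The two numbers on slide 16 are consistent: .9956² ≈ .9912. When each observation is replaced by its cluster mean, R-squared is 1 − (within-cluster SS / total SS). That ratio equals the squared correlation between the data and the fitted means, because the residuals are uncorrelated with the group means. A minimal sketch that checks this identity follows. The function names and the example data are hypothetical.

```python
import math


def r_squared(values, labels):
    """Share of total variance explained by replacing each value with its cluster mean."""
    n = len(values)
    grand = sum(values) / n
    sums, counts = {}, {}
    for v, lab in zip(values, labels):
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    fitted = [means[lab] for lab in labels]
    sst = sum((v - grand) ** 2 for v in values)               # total SS
    ssw = sum((v - f) ** 2 for v, f in zip(values, fitted))   # within-cluster SS
    return 1.0 - ssw / sst, fitted


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical data
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]        # hypothetical cluster labels
    r2, fitted = r_squared(values, labels)
    print(r2, correlation(values, fitted) ** 2)  # the two agree up to rounding
```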

17 3 clusters [plot of the data by cluster; axes x (0 to 90) and y] R-squared = .9609

18 How many clusters is enough?

19 Plot R-squared by number of clusters: sample of 300 observations, uniform distribution, 11 cluster analyses

20 What happens if there really aren’t any clusters? Let’s try 500 samples
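The experiment on the following slides can be sketched as a small Monte Carlo loop. Draw a sample, cluster it at several values of k, and keep the worst (minimum) R-squared seen for each k. This is a sketch under stated assumptions, not the study's SAS code:

* Clustering is done with Lloyd iterations rather than PROC FASTCLUS.
* The 11 values of k default to 2 through 12, a guess.
* The distributions are the standard uniform, standard normal, and unit-rate exponential.
* The seed and all function names are hypothetical.

```python
import bisect
import random


def cluster_r2(values, k, max_iter=100):
    """R-squared of a 1-D Lloyd k-means clustering of `values` into (at most) k groups."""
    xs = sorted(values)
    n = len(xs)
    # Seed centers at evenly spaced ranks (an assumption, not FASTCLUS seeding).
    centers = sorted({xs[(2 * j + 1) * n // (2 * k)] for j in range(k)})
    for _ in range(max_iter):
        # In 1-D, nearest-center assignment splits the sorted data at the
        # midpoints between neighbouring centers.
        cuts = [bisect.bisect_right(xs, (a + b) / 2) for a, b in zip(centers, centers[1:])]
        bounds = [0] + cuts + [n]
        groups = [xs[lo:hi] for lo, hi in zip(bounds, bounds[1:]) if hi > lo]
        new_centers = [sum(g) / len(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    grand = sum(xs) / n
    sst = sum((x - grand) ** 2 for x in xs)
    ssw = sum((x - c) ** 2 for g, c in zip(groups, new_centers) for x in g)
    return 1.0 - ssw / sst


DRAWS = {
    "uniform": lambda rng: rng.random(),
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}


def monte_carlo(dist, n_obs=300, n_samples=500, ks=range(2, 13), seed=1):
    """Minimum R-squared over n_samples random samples, for each number of clusters k."""
    rng = random.Random(seed)
    draw = DRAWS[dist]
    worst = {k: 1.0 for k in ks}
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n_obs)]
        for k in ks:
            worst[k] = min(worst[k], cluster_r2(sample, k))
    return worst


if __name__ == "__main__":
    for dist in DRAWS:
        print(dist, monte_carlo(dist, n_obs=300, n_samples=20))
```

The full design (500 samples × 11 clusterings, at 300 and 1000 observations) runs slowly in pure Python. A smaller `n_samples` is enough to see the shape of the curves.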

21 Uniform, 300 obs. per sample: 500 samples, 11 clusterings each

22 Uniform, 1000 obs. per sample: 500 samples, 11 clusterings each

23 Normal, 300 obs. per sample: 500 samples, 11 clusterings each

24 Normal, 1000 obs. per sample: 500 samples, 11 clusterings each

25 Exponential, 300 obs. per sample: 500 samples, 11 clusterings each

26 Distribution of worst sample

27 Exponential, 1000 obs. per sample: 500 samples, 11 clusterings each

28 So What’s with 7 ± 2?

29 Uniform, 7 ± 2: 500 samples, 11 clusterings each

30 Normal, 7 ± 2: 500 samples, 11 clusterings each

31 Exponential, 7 ± 2: 500 samples, 11 clusterings each

32 Minimum R-squared by sample size and distribution: at least 95% of the variance explained in all cases
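As a rough reference point for the uniform rows, and not a substitute for the simulated minima, the population R-squared has a closed form. Take a continuous uniform distribution of width w split into k equal-width groups. A uniform on an interval of width w/k has variance (w/k)²/12, and equal-width groups are the k-means optimum for a uniform distribution:

```latex
\sigma^2_{\text{total}} = \frac{w^2}{12}, \qquad
\sigma^2_{\text{within}} = \frac{(w/k)^2}{12}, \qquad
R^2 = 1 - \frac{\sigma^2_{\text{within}}}{\sigma^2_{\text{total}}} = 1 - \frac{1}{k^2}
```

This gives 1 − 1/25 = .960 for k = 5, 1 − 1/49 ≈ .980 for k = 7, and 1 − 1/81 ≈ .988 for k = 9. Finite samples scatter around these values, so the worst of 500 samples can fall somewhat below them.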

33 Histograms: equal intervals, number of observations in each interval

34 Needle Plot of Cluster Means

35 Bar chart needs more bars

36 The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. George A. Miller, The Psychological Review, 1956, vol. 63, pp. 81-97

37 Limits on Categories for Absolute Judgments: pitch 6, loudness 5, visual position 9, size of a square 5, hue 8. Name the colors in this slide.
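Miller expressed these limits as channel capacity in bits. Assuming the categories are equally likely, choosing one of n categories carries log2(n) bits. The short sketch below converts the slide's category counts. The dictionary and function names are hypothetical.

```python
import math

# Number of categories per stimulus dimension, as listed on the slide.
CATEGORIES = {
    "pitch": 6,
    "loudness": 5,
    "visual position": 9,
    "size of a square": 5,
    "hue": 8,
}


def bits(categories):
    """Information, in bits, in one choice among `categories` equally likely alternatives."""
    return math.log2(categories)


if __name__ == "__main__":
    for name, n in CATEGORIES.items():
        print(f"{name}: {n} categories = {bits(n):.2f} bits")
```

Seven categories correspond to about 2.8 bits. The range of 5 to 9 categories spans roughly 2.3 to 3.2 bits.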

38 “And finally, what about the magical number seven?” George A. Miller

39 Miller – Quote 1: “What about the seven wonders of the world, seven seas, seven deadly sins, seven daughters of Atlas in the Pleiades, seven ages of man, seven levels of hell, seven primary colors, seven notes of the musical scale, seven days of the week”

40 Miller – Quote 2: “What about the seven-point rating scale, seven categories for absolute judgment, seven objects in the span of attention, seven digits in the span of immediate memory”

41 “…Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it.” Miller – Quote 3

42 Miller - close “But I suspect that it is only a pernicious, Pythagorean coincidence.”

43 Coincidence or Nature’s Parsimony? Does our capacity match what’s needed for 95% of the variance? 95%? Hmmmm… confidence intervals, an A, 19 fingers and toes, 970,000 web pages
Larry Hoyle, Policy Research Institute, University of Kansas, LarryHoyle@ku.edu

