Clustering based on book chapter Cluster Analysis in Multivariate Analysis by Hair, Anderson, Tatham, and Black.


1 Clustering based on book chapter Cluster Analysis in Multivariate Analysis by Hair, Anderson, Tatham, and Black

2 Cluster analysis
What? We group objects based on the characteristics they possess
also called numerical taxonomy or typology construction
often atheoretical: no statistical basis, lots of heuristics

3 Intuitive basis
Ach, ja: the Gestalt laws (of perceptual grouping)

4 Clustering methods: nonhierarchical, hierarchical, fuzzy
vector quantization
hierarchical: agglomerative, divisive
fuzzy
probabilistic
mixture models?

5 Objectives
exploratory/confirmatory
taxonomy description (e.g. biology)
data simplification (e.g. segmentation)

6 Select the variables
abracadabra, explicit theories, past research, suppositions, hopes, deadlines

7 Research design
detect and remove outliers
choose a similarity measure: Hölder (p-)norm (usually Euclidean), Mahalanobis, correlation
standardize the data: by variable or within case
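The effect of standardization on a distance measure can be sketched in a few lines. This is only an illustration under stated assumptions: the data (income and age for three cases) and all function names are hypothetical, only the standard library is used, and the Mahalanobis distance is left out to keep the sketch short.

```python
import math

def standardize_by_variable(rows):
    """Z-score each column (variable) so all variables share a common scale."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1 in the denominator).
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance (the p = 2 case of the p-norm)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation of two profiles: compares pattern, not magnitude."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)

# Hypothetical cases: [income in euros, age in years].
data = [[20000.0, 25.0], [21000.0, 60.0], [40000.0, 26.0]]
z = standardize_by_variable(data)

raw_ratio = euclidean(data[0], data[2]) / euclidean(data[0], data[1])
std_ratio = euclidean(z[0], z[2]) / euclidean(z[0], z[1])
print(raw_ratio, std_ratio)
```

On the raw data the income variable dominates the Euclidean distance simply because of its units; after standardizing by variable, the age difference counts about as much as the income difference.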

8 Similarity measures

9 Research design
representativeness of the sample (cf. outliers)
multicollinearity?

10 How is this done?
abracadabra, explicit theories, past research, suppositions, hopes, deadlines

11 Clustering procedure
single linkage
complete linkage
average linkage
centroid method
Ward’s method

12 Single linkage
joins the clusters with the smallest distance between their closest members
easily produces snake-like (chained) clusters even where none exist

13 Complete linkage
joins the clusters with the smallest distance between their farthest members
eliminates the snake formation; otherwise a big question mark

14 Average linkage
joins the clusters with the smallest average distance between their members
not so sensitive to outliers
tends to form clusters with small within-cluster variation
biased toward clusters with approximately the same variance, etc.
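The three linkage rules differ only in how they turn pairwise object distances into a cluster-to-cluster distance. The following is a naive sketch, not an efficient implementation; the function names and the one-dimensional "chain" data are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters given as lists of point indices."""
    d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # closest pair of members
        return min(d)
    if linkage == "complete":    # farthest pair of members
        return max(d)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(d) / len(d)
    raise ValueError("unknown linkage: " + linkage)

def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical chain of points with slowly growing gaps.
chain = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,)]
single = agglomerate(chain, "single")
complete = agglomerate(chain, "complete")
print([round(m[2], 2) for m in single])
print([round(m[2], 2) for m in complete])
```

With single linkage every merge simply appends the next point to one growing "snake", and all merge distances stay small. Complete linkage on the same data forms compact groups first and only joins them at a much larger distance.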

15 Centroid method

16 Centroid method
the most outlier-robust
confusing situations: intercentroid distances may become smaller than the distances at which earlier pairs were joined, which messes up the dendrogram
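A three-point example is enough to reproduce such a reversal. The coordinates below are hypothetical, chosen so that the triangle is nearly equilateral.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

# Hypothetical near-equilateral triangle.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# a and b are the closest pair, so they are joined first.
first_merge = euclidean(a, b)
# The new cluster is represented by its centroid (1, 0), which lies closer to c.
second_merge = euclidean(centroid([a, b]), c)

inversion = second_merge < first_merge
print(first_merge, second_merge, inversion)
```

The second merge happens at a smaller distance than the first, so the dendrogram would have to draw a later join below an earlier one.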

17 Ward’s method
distance between two clusters is the increase in the within-cluster sum of squares that joining them would cause
tends to combine clusters with a small number of objects
biased toward clusters with an approximately equal number of objects
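The size bias follows from the usual closed form of the Ward criterion, in which the squared distance between centroids is weighted by n_a * n_b / (n_a + n_b). The sketch below checks that closed form against a direct sum-of-squares computation; the data and function names are hypothetical.

```python
def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_distance(a, b):
    """Increase in total within-cluster SSE caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

# Two pairs of clusters whose centroids are the same distance (3 units) apart.
small_a, small_b = [(0.0, 0.0)], [(3.0, 0.0)]
big_a = [(0.0, -1.0), (0.0, 1.0)] * 5   # 10 objects, centroid (0, 0)
big_b = [(3.0, -1.0), (3.0, 1.0)] * 5   # 10 objects, centroid (3, 0)

small_cost = ward_distance(small_a, small_b)
big_cost = ward_distance(big_a, big_b)
direct = sse(big_a + big_b) - sse(big_a) - sse(big_b)
print(small_cost, big_cost, direct)
```

For the same centroid separation, joining the two single-object clusters costs far less than joining the two ten-object clusters, so small clusters get combined first and cluster sizes tend to even out.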

18 Nonhierarchical
heuristic methods: sequential threshold / parallel threshold
objective-function based: VQs, e.g., the K-means procedure
complexity: hierarchical O(N^2), K-means O(KN)
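A minimal sketch of the K-means procedure (in the common Lloyd-style form, which is an assumption about the variant meant here) shows where the O(KN) comes from: each pass computes one distance per object and centroid. The data, the starting centroids, and the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: K distance evaluations for each of the N objects.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean(members)
    return labels, centroids

# Hypothetical data with two obvious groups and hand-picked starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

The result depends on the starting centroids; with a poor start the procedure can settle in a different local optimum of the objective function.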

19 How many clusters?
an open question
practical limits (it would be nice to have 3-6 clusters)
dendrogram based (look for a large increase in cluster distances)
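The dendrogram-based heuristic can be sketched as "stop just before the largest jump in merge distance". This is one simple reading of the rule, not a definitive stopping criterion; the function name and the merge distances are hypothetical, and at least two merges are assumed.

```python
def suggest_n_clusters(merge_distances):
    """Number of clusters left just before the largest jump in merge distance."""
    n_objects = len(merge_distances) + 1
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    # Index i means the jump between merge i and merge i + 1 (0-based).
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # After merges 0..i have been made, n_objects - (i + 1) clusters remain.
    return n_objects - (i + 1)

# Hypothetical merge distances from an agglomerative run on 7 objects.
heights = [0.5, 0.6, 0.8, 0.9, 4.0, 4.5]
k = suggest_n_clusters(heights)
print(k)
```

Here the fifth merge is much more expensive than the fourth, so the sketch suggests keeping the three clusters that exist before it.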

20 Validation
exogenous variables
indexes, e.g. the Davies-Bouldin measure
[Figure: objects labeled with exogenous variables: age 15, age 20, age 14; "likes Donald Duck" vs. "doesn’t like DD"]
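As an illustration of an index-based check, the Davies-Bouldin measure compares within-cluster scatter with the separation between centroids; lower values indicate more compact, better separated clusters. The sketch below uses the common Euclidean form of the index, and the two small partitions are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = [sum(euclidean(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical partitions: same cluster shapes, different separation.
well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_together = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]

db_far = davies_bouldin(well_separated)
db_near = davies_bouldin(close_together)
print(db_far, db_near)
```

The index needs at least two clusters with distinct centroids; it says nothing about whether the clusters are meaningful, which is what the exogenous variables are for.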

21 Key issues similarity or dissimilarity measure
...and data standardization

