Clustering: based on the book chapter "Cluster Analysis" in Multivariate Data Analysis by Hair, Anderson, Tatham, and Black.

Cluster analysis: what? We group objects based on the characteristics they possess. Also called numerical taxonomy or typology construction. Often atheoretical: no statistical basis, lots of heuristics.

Intuitive basis: ah, yes, the Gestalt laws.

Clustering methods: nonhierarchical; hierarchical; fuzzy.
Nonhierarchical: vector quantization. Hierarchical: agglomerative, divisive. Fuzzy: probabilistic mixture models?

Objectives: exploratory/confirmatory; taxonomy description (e.g. biology);
data simplification (e.g. segmentation).

Select the variables: abracadabra, explicit theories, past research, suppositions, hopes, deadlines, ...

Research design: detect and remove outliers; choose a similarity measure:
Hölder (L_p) norm (usually Euclidean), Mahalanobis distance, correlation; standardize the data: by variable or within case.
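
The distance and standardization choices above can be made concrete with a small sketch. It assumes the L_p (Minkowski) form of the norm and z-scoring by variable; the function names and the age/income data are hypothetical, chosen only for illustration.

```python
import math

def minkowski(x, y, p=2):
    """L_p distance between two equal-length vectors (p=2: Euclidean, p=1: city-block)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def standardize_by_variable(rows):
    """Z-score each column (variable) across all cases."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant column gets divisor 1.0
    # so that we never divide by zero.
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical cases: (age in years, income in euros).
data = [[15, 1000.0], [20, 3000.0], [14, 2000.0]]
z = standardize_by_variable(data)

# Without standardization the income column dominates the distance.
raw_distance = minkowski(data[0], data[1])
scaled_distance = minkowski(z[0], z[1])
```

Comparing raw_distance with scaled_distance shows why standardization is listed as part of the research design: on the raw data the variable with the largest numeric range decides nearly everything.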

Similarity measures

Research design: representativeness of the sample (cf. outliers);
multicollinearity?

How is this done? Abracadabra, explicit theories, past research, suppositions, hopes, deadlines, ...

Clustering procedure: single linkage; complete linkage; average linkage;
centroid method; Ward’s method.

Single linkage: easily produces snake-like (chained) clusters even where none exist.

Complete linkage: eliminates the snake formation; otherwise a big question mark.

Average linkage: joins the clusters with the smallest average distance between their members;
not so sensitive to outliers; tends to form clusters with small within-cluster variation; biased toward clusters with approximately the same variance, etc.
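
Single, complete, and average linkage differ only in how they summarize the pairwise distances between the members of two clusters. A minimal sketch, assuming Euclidean distance; the two clusters are hypothetical points on a line.

```python
import math
from itertools import product

def single_link(a, b):
    """Distance between the closest pair of members (one from each cluster)."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Distance between the farthest pair of members."""
    return max(math.dist(p, q) for p, q in product(a, b))

def average_link(a, b):
    """Mean of all between-cluster pairwise distances."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

# Hypothetical clusters.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (5.0, 0.0)]

d_single = single_link(left, right)      # closest pair: (1,0) and (3,0)
d_complete = complete_link(left, right)  # farthest pair: (0,0) and (5,0)
d_average = average_link(left, right)    # mean of 3, 5, 2, 4
```

Because single linkage looks only at the closest pair, one stray point between two groups is enough to bridge them, which is where the snake-like chains come from.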

Centroid method

Centroid method: the most robust to outliers;
confusing situations: intercentroid distances may become smaller than the distances between already joined pairs, which messes up the dendrogram.
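
The confusing situation with the centroid method can be reproduced with three points. This is a sketch with hypothetical coordinates: a nearly equilateral triangle, where the second merge happens at a smaller distance than the first.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

# Hypothetical points forming a nearly equilateral triangle.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# a and b are the closest pair (distance 2.0; a-c and b-c are about 2.06),
# so they are joined first.
first_merge = math.dist(a, b)

# The centroid of {a, b} is (1.0, 0.0), which is only 1.8 away from c.
second_merge = math.dist(centroid([a, b]), c)

inversion = second_merge < first_merge
```

Here inversion is true: the later merge sits lower than the earlier one, so the dendrogram branches would have to cross.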

Ward’s method: the distance between two clusters is the increase in the within-cluster sum of squares that merging them would cause;
tends to combine clusters with a small number of objects; biased toward clusters with approximately equal numbers of objects.
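
In its usual formulation, Ward's merge cost is the increase in the within-cluster sum of squares, which equals n_a * n_b / (n_a + n_b) times the squared distance between the two centroids. The sketch below checks that identity on hypothetical data; the helper names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    m = centroid(points)
    return sum(sum((x - y) ** 2 for x, y in zip(p, m)) for p in points)

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares caused by merging a and b."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

# Hypothetical clusters.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 0.0)]

shortcut = ward_cost(a, b)
direct = sse(a + b) - sse(a) - sse(b)
```

The factor n_a * n_b / (n_a + n_b) is small when either cluster is small, which is one way to see why the method prefers to merge clusters with few objects.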

Nonhierarchical: heuristic methods: sequential threshold / parallel threshold; objective-function based: VQs, e.g. the K-means procedure. Complexity: hierarchical O(N^2), K-means O(KN).
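
The slides name the K-means procedure without specifying a variant. The sketch below assumes the standard Lloyd iteration with caller-supplied initial centers; the data and starting centers are hypothetical.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (K distance evaluations)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(points, initial_centers, iterations=20):
    """Lloyd-style K-means; returns (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        # Assignment step: N points times K centers, hence O(KN) per pass.
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group;
        # a center whose group is empty stays where it is.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[k]
            for k, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, initial_centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers, so in practice the procedure is usually run from several starting points.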

How many clusters? An open question;
practical limits (it would be nice to have 3-6 clusters); dendrogram based (a large increase in cluster distances).
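
The dendrogram-based rule can be sketched as "stop just before the largest jump in merge distance". The function name and the merge distances below are hypothetical, and the rule is a heuristic, not a test.

```python
def suggest_cluster_count(merge_distances):
    """Suggest a cluster count from the distances of successive merges.

    merge_distances[i] is the distance at which merge i happened; N objects
    give N - 1 merges. At least two merges are needed to measure a jump.
    """
    n_objects = len(merge_distances) + 1
    jumps = [
        later - earlier
        for earlier, later in zip(merge_distances, merge_distances[1:])
    ]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # The largest jump lies between merge i and merge i + 1, so stop after
    # i + 1 merges; each merge reduces the cluster count by one.
    return n_objects - (i + 1)

# Hypothetical merge distances for 6 objects.
heights = [1.0, 1.2, 1.5, 6.0, 6.5]
k = suggest_cluster_count(heights)
```

With these heights the big jump is from 1.5 to 6.0, after three merges, so the suggestion is three clusters.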

Validation: exogenous variables; indexes, e.g. the Davies-Bouldin measure.
[Figure: validation with exogenous variables. Labels: age 15, age 20, age 14; likes Donald Duck, doesn't like DD.]
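
As an illustration of an index-based check, here is a sketch of the Davies-Bouldin measure in its common form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate more compact, better separated clusters. The example partitions are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from the members to their own centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical partitions: same scatter, different separation.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]

db_good = davies_bouldin(well_separated)
db_poor = davies_bouldin(close_together)
```

The index needs at least two clusters with distinct centroids; comparing it across candidate cluster counts is one way to use it alongside the dendrogram.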

Key issues: similarity or dissimilarity measure
...and data standardization.
