Published by Rachel Green. Modified over 10 years ago.
1
Different types of data, e.g.:
- Continuous data: height
- Categorical data
  - ordered (ordinal): growth rate (very slow, slow, medium, fast, very fast)
  - not ordered (nominal): fruit colour (yellow, green, purple, red, orange)
- Binary data: fruit / no fruit
2
Similarity matrix
We define a similarity between every pair of units, analogous to the correlation between continuous variables. (Equivalently, we can work with a dissimilarity or distance matrix.)
The similarity between two units can be constructed as the average of their similarities on each individual variable (a weighted average can also be used).
This provides a way of combining different types of variables in a single measure.
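A minimal sketch of this averaging approach, modelled on Gower's general similarity coefficient (the slide does not name a specific coefficient, so this choice is an assumption). Per variable, a continuous value contributes 1 - |x - y| / range and a categorical or binary value contributes 1 for a match and 0 otherwise; the similarity between two units is the (optionally weighted) average. The function names `mixed_similarity` and `similarity_matrix` and the plant records are hypothetical.

```python
def mixed_similarity(a, b, kinds, ranges, weights=None):
    """Weighted average of per-variable similarities between units a and b.

    kinds[k] is "continuous" or "categorical" (binary variables are treated as
    categorical, i.e. simple matching); ranges[k] is the range of continuous
    variable k over the whole data set and is ignored for other kinds.
    """
    if weights is None:
        weights = [1.0] * len(kinds)
    total = 0.0
    weight_sum = 0.0
    for x, y, kind, r, w in zip(a, b, kinds, ranges, weights):
        if kind == "continuous":
            # Scale the absolute difference by the variable's range so s lies in [0, 1].
            s = 1.0 - abs(x - y) / r if r > 0 else 1.0
        else:
            # Categorical or binary: 1 for a match, 0 otherwise.
            s = 1.0 if x == y else 0.0
        total += w * s
        weight_sum += w
    return total / weight_sum


def similarity_matrix(units, kinds, ranges, weights=None):
    """Full unit-by-unit similarity matrix built from mixed_similarity."""
    return [[mixed_similarity(u, v, kinds, ranges, weights) for v in units] for u in units]


# Hypothetical plants: (height in cm, growth rate, fruit colour, fruit present 1/0).
plants = [
    (120.0, "slow", "red", 1),
    (80.0, "fast", "red", 1),
    (200.0, "slow", "green", 0),
]
kinds = ["continuous", "categorical", "categorical", "categorical"]
ranges = [200.0 - 80.0, None, None, None]
S = similarity_matrix(plants, kinds, ranges)
print(round(S[0][1], 3), round(S[0][2], 3))  # 0.667 0.333
```

For simplicity the ordered growth-rate variable is treated as unordered here; an ordinal variable could instead be compared through its ranks, scaled like a continuous variable.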
3
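Distance metrics (relevant for continuous variables)
- Euclidean: the straight-line distance between A and B
- City block or Manhattan: the sum of the distances between A and B along each axis
(also many other variations)
[Figure: the distance between points A and B under each metric]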
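A small sketch contrasting the two metrics on a pair of hypothetical points A and B; the helper names `euclidean` and `manhattan` are illustrative (Python 3.8+ also provides `math.dist` for the Euclidean case).

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


A = (0.0, 0.0)
B = (3.0, 4.0)
print(euclidean(A, B))  # 5.0
print(manhattan(A, B))  # 7.0
```

The city-block distance is never smaller than the Euclidean one. Because both metrics add up raw differences, variables measured in different units are usually standardised before either is applied.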
4
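Similarity coefficients for binary data
- Simple matching: the proportion of variables on which the two units agree, counting both 0-0 and 1-1 matches.
- Jaccard: counts only 1-1 matches, and leaves variables on which both units are 0 out of the comparison altogether.
(and many other variants)
Simple matching can be extended to categorical data: the proportion of variables on which the two units fall in the same category.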
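The two binary coefficients differ only in how they treat 0-0 agreements, which a short sketch makes concrete. The function names and example vectors are hypothetical, and returning NaN when both vectors are all zeros (where Jaccard is undefined) is a convention assumed here.

```python
import math


def agreement_counts(a, b):
    """Counts for two 0/1 vectors: (1-1 matches, 0-0 matches, mismatches)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return n11, n00, mismatches


def simple_matching(a, b):
    # Both 0-0 and 1-1 agreements count as matches.
    n11, n00, mismatches = agreement_counts(a, b)
    return (n11 + n00) / (n11 + n00 + mismatches)


def jaccard(a, b):
    # Only 1-1 agreements count; 0-0 pairs are also left out of the denominator.
    n11, _, mismatches = agreement_counts(a, b)
    denominator = n11 + mismatches
    # Undefined when both vectors are all zeros; returned as NaN in this sketch.
    return n11 / denominator if denominator else math.nan


def simple_matching_categorical(a, b):
    # Extension to categorical variables: proportion of variables with the same category.
    return sum(x == y for x, y in zip(a, b)) / len(a)


a = [1, 1, 0, 0, 0, 1]
b = [1, 0, 0, 0, 1, 1]
print(simple_matching(a, b))  # 4 matches out of 6 variables
print(jaccard(a, b))          # 0.5
```

Here the two 0-0 agreements raise simple matching to 4/6 but are ignored by Jaccard (2/4). This is why Jaccard is often preferred when a shared absence (e.g. no fruit on either plant) says little about how alike two units are.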
5
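Clustering methods
- Hierarchical
  - Divisive: start with all units in one group and split it successively; splits can be monothetic (based on one variable at a time) or polythetic (based on all variables together).
  - Agglomerative: start with every unit separate and repeatedly join the most similar units or clusters (classical cluster analysis).
- Non-hierarchical
  - k-means clustering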
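k-means is only named on the slide; a minimal sketch of the standard alternating procedure (Lloyd's algorithm) is below. Assumptions: squared Euclidean distance, and the first k points used as initial centres, whereas practical implementations use random or k-means++ starts and keep the best of several runs. The function names and the data are hypothetical.

```python
def nearest(p, centres):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centres[j])))


def kmeans(points, k, max_iter=100):
    # Assumption: the first k points serve as the initial centres.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centres


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 9.5), (8.5, 9.0)]
labels, centres = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 0, 1, 1]
```

Unlike the hierarchical methods, k-means needs the number of clusters k fixed in advance and produces a single partition rather than a tree.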
6
Agglomerative hierarchical methods
- Single linkage or nearest neighbour: equivalent to finding the minimum spanning tree (the shortest tree that connects all points); prone to chaining.
- Complete linkage or furthest neighbour: gives compact clusters of approximately equal size (and makes compact groups even when none exist).
- Average linkage methods: intermediate between single and complete linkage.
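A naive sketch of agglomerative clustering that differs only in how the distance between two clusters is defined: the minimum (single), maximum (complete) or mean (average) of the point-to-point distances. It assumes Euclidean distance and simply rescans every pair of clusters at each step, which is fine for illustration but slow for real data. The function name `agglomerative` and the points are hypothetical.

```python
import math


def agglomerative(points, linkage="single"):
    """Return the merge history as (cluster_a, cluster_b, merge_distance) tuples."""
    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":    # nearest neighbour
            return min(pair_dists)
        if linkage == "complete":  # furthest neighbour
            return max(pair_dists)
        if linkage == "average":   # mean of all between-cluster distances
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unknown linkage: {linkage}")

    # Start with every unit in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen linkage.
        a, b = min(
            ((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
            key=lambda pair: cluster_distance(*pair),
        )
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


points = [(0.0,), (1.0,), (2.0,), (6.0,)]
for method in ("single", "average", "complete"):
    print(method, [height for _, _, height in agglomerative(points, method)])
# single [1.0, 1.0, 4.0]
# average [1.0, 1.5, 5.0]
# complete [1.0, 2.0, 6.0]
```

The single-linkage merge heights (1, 1, 4) are exactly the edge lengths of the minimum spanning tree of these points, and the average-linkage heights fall between the single and complete ones.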