Different types of data
Continuous data, e.g. height
Categorical data
  ordered (ordinal), e.g. growth rate: very slow, slow, medium, fast, very fast
  not ordered (nominal), e.g. fruit colour: yellow, green, purple, red, orange
Binary data, e.g. fruit / no fruit

Similarity matrix
We define a similarity between units, analogous to the correlation between continuous variables. (A dissimilarity or distance matrix can be used instead.)
A similarity can be constructed as an average of the similarities between the units on each variable; a weighted average can also be used.
This provides a way of combining different types of variables.
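The averaging idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example plants, and the choice of scoring a continuous variable as 1 minus the range-scaled absolute difference are all hypothetical and are not taken from the slides.

```python
def variable_similarity(a, b, kind, value_range=None):
    """Similarity in [0, 1] between two units on one variable."""
    if kind == "continuous":
        # Assumption: scale the absolute difference by the variable's range.
        return 1.0 - abs(a - b) / value_range
    # Categorical or binary: 1 if the values match, 0 otherwise.
    return 1.0 if a == b else 0.0


def mixed_similarity(unit_a, unit_b, kinds, ranges, weights=None):
    """Weighted average of the per-variable similarities."""
    if weights is None:
        weights = [1.0] * len(kinds)  # plain (unweighted) average
    total = 0.0
    for a, b, kind, rng, w in zip(unit_a, unit_b, kinds, ranges, weights):
        total += w * variable_similarity(a, b, kind, rng)
    return total / sum(weights)


# Hypothetical variables: height (cm), fruit colour, fruit present (1) or not (0)
kinds = ["continuous", "categorical", "binary"]
ranges = [100.0, None, None]  # assumed height range of 100 cm
plant_1 = (150.0, "red", 1)
plant_2 = (175.0, "red", 0)

s = mixed_similarity(plant_1, plant_2, kinds, ranges)
print(round(s, 2))  # (0.75 + 1 + 0) / 3, about 0.58
```

Computing this value for every pair of units fills in the similarity matrix; passing `weights` gives the weighted-average variant.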

Distance metrics
Relevant for continuous variables:
  Euclidean
  city block or Manhattan
(also many other variations)
[Figure: two diagrams, each showing points A and B]
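As a small illustration of the two metrics named on this slide, the following sketch computes both for a pair of points. The points `A` and `B` are made-up coordinates, not values from the slides.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City block distance: sum of the absolute differences on each variable."""
    return sum(abs(x - y) for x, y in zip(a, b))


A = (0.0, 0.0)
B = (3.0, 4.0)

print(euclidean(A, B))  # 5.0
print(manhattan(A, B))  # 7.0
```

The city block distance is never smaller than the Euclidean distance, because it follows the axes rather than the direct line between the points.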

Similarity coefficients for binary data
  simple matching: count a match if both units are 0 or both units are 1
  Jaccard: count a match only if both units are 1
(and many other variants)
Simple matching can be extended to categorical data.
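The difference between the two coefficients is easiest to see on a small example. The sketch below is illustrative only: the vectors `u` and `v` are invented, and returning 0.0 when neither unit has any 1s is an assumed convention (the Jaccard coefficient is undefined in that case).

```python
def simple_matching(a, b):
    """Proportion of variables on which the two units agree (0-0 or 1-1)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Proportion of 1-1 agreements among variables where at least one unit is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: treat the all-zero case as similarity 0.0.
    return both / either if either else 0.0


u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 0, 1]

print(simple_matching(u, v))  # 4 agreements out of 5 variables
print(jaccard(u, v))          # 2 joint presences out of 3 variables with any 1
```

Because `simple_matching` only tests equality, it works unchanged on categorical codes such as colour names, which is the extension mentioned on the slide.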

Clustering methods
hierarchical
  divisive: put everything together and split (monothetic / polythetic)
  agglomerative: keep everything separate and join the most similar points (classical cluster analysis)
non-hierarchical
  k-means clustering
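The slide only names k-means, so the following is a minimal sketch of the usual alternating scheme (assign each point to its nearest centre, then move each centre to the mean of its group), not a description of any particular implementation. Using the first `k` points as starting centres and the example data are assumptions made to keep the sketch deterministic.

```python
import math


def k_means(points, k, n_iter=20):
    """Return (centres, groups) after at most n_iter assignment/update rounds."""
    centres = [points[i] for i in range(k)]  # assumption: first k points as starts
    groups = []
    for _ in range(n_iter):
        # Assignment step: put each point in the group of its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        new_centres = []
        for c in range(k):
            if groups[c]:
                dim = len(groups[c][0])
                new_centres.append(tuple(
                    sum(p[d] for p in groups[c]) / len(groups[c])
                    for d in range(dim)
                ))
            else:
                new_centres.append(centres[c])  # keep the centre of an empty group
        if new_centres == centres:
            break  # assignments can no longer change
        centres = new_centres
    return centres, groups


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centres, groups = k_means(points, k=2)
print(groups)
```

Unlike the hierarchical methods, the number of groups `k` has to be chosen in advance, and the result can depend on the starting centres.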

Agglomerative hierarchical
Single linkage or nearest neighbour
  finds the minimum spanning tree: the shortest tree that connects all points
  prone to chaining
Complete linkage or furthest neighbour
  gives compact clusters of approximately equal size (makes compact groups even when none exist)
Average linkage methods
  between single and complete linkage
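The three linkage rules differ only in how the distance between two clusters is defined. The sketch below is a deliberately naive illustration (it recomputes all pairwise distances at every step); the function name `agglomerate`, the stopping rule of a fixed number of clusters, and the example points are assumptions, not part of the slides.

```python
import math


def agglomerate(points, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Clusters are returned as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # start with every point separate

    def cluster_distance(c1, c2):
        d = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(d)        # nearest neighbour
        if linkage == "complete":
            return max(d)        # furthest neighbour
        return sum(d) / len(d)   # average linkage

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = cluster_distance(clusters[i], clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the most similar pair
        del clusters[j]
    return clusters


points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
print(agglomerate(points, 2, linkage="single"))    # [[0, 1, 2, 3], [4, 5]]
print(agglomerate(points, 3, linkage="complete"))  # [[0, 1], [2, 3], [4, 5]]
```

With single linkage the evenly spaced points 0 to 3 are absorbed one at a time into a growing chain, while complete linkage first pairs them into compact groups of equal size.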
