Clustering Algorithms
by Timofey Shulepov
Clustering – main features
Clustering is a data mining technique.
Definition: the partitioning of objects into sets such that objects in the same set share common traits, while objects in different sets do not.
Usage:
- Statistical Data Analysis
- Machine Learning
- Data Mining
- Pattern Recognition
- Image Analysis
- Bioinformatics
Types of clustering
- Hierarchical: finds new clusters using previously found ones.
- Partitional: finds all clusters at once.
- Self-Organizing Maps.
- Hybrids (incremental).
Concept of distance measure
A distance measure determines how the similarity of two elements is calculated. Similarity is expressed in terms of a distance function. Distance functions vary significantly for interval-scaled, categorical, and other variable types. Examples of distance functions: Euclidean distance, Manhattan distance, etc.
Distance functions, in more detail
- Euclidean distance – a.k.a. "as the crow flies", or 2-norm distance. The most commonly used and usually implied distance measure: the straight-line (ruler) distance between two points.
- Manhattan distance – a.k.a. "taxicab" or 1-norm distance: the distance traveled when going from A to B along a street grid, via intersections.
- Maximum norm – the largest single coordinate-wise difference between two points (the infinity-norm, also called Chebyshev distance).
- Mahalanobis distance – similar to Euclidean, but it accounts for correlations within the data set and is scale-invariant. (Garcia)
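As a small illustration (not part of the original slides), the three simplest distance functions can be written in a few lines of Python; the points a and b are arbitrary example data:

```python
import math

def euclidean(a, b):
    # 2-norm: straight-line ("as the crow flies") distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # 1-norm: sum of coordinate-wise differences ("taxicab" distance)
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum_norm(a, b):
    # infinity-norm: the largest single coordinate-wise difference
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b))     # 5.0
print(manhattan(a, b))     # 7.0
print(maximum_norm(a, b))  # 4.0
```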
Hierarchical Clustering
Result: given the input set S, the goal is to produce a hierarchy (dendrogram) in which nodes represent subsets of S, reflecting the structure found in S.
Can be agglomerative or divisive:
- Agglomerative – "bottom-up": begin with each element as a separate cluster and merge upward.
- Divisive – "top-down": begin with one large set and divide it into smaller sets.
Agglomerative Hierarchical Clustering
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = S1, S2, S3, ..., Sn.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {Si, Sj}, the cheapest pair to merge.
3. Remove Si and Sj from L.
4. Merge Si and Sj to create a new internal node Sij in T, which will be the parent of Si and Sj in the result tree, and add Sij to L.
5. Repeat from (2) until only one set remains.
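A minimal sketch of these steps in Python, assuming Euclidean distance and single-linkage (closest pair of members) as the merging cost; to keep it short, it tracks only the flat cluster sets at each level rather than building the tree T explicitly:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(ci, cj):
    # Merging cost: distance between the two closest members (an assumption;
    # complete-linkage or average-linkage costs work the same way).
    return min(euclidean(p, q) for p in ci for q in cj)

def agglomerate(points):
    # Step 1: every instance starts as its own singleton cluster.
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Step 2: find the cheapest pair of clusters to merge.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Steps 3-4: remove Si and Sj, replace them with their union.
        sj = clusters.pop(j)
        si = clusters.pop(i)
        clusters.append(si + sj)
        print(clusters)  # each iteration shows one level of the hierarchy
    return clusters[0]

agglomerate([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)])
```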
K-Clustering
Result: given the input set S and a fixed integer k, a partition of S into k subsets must be returned.
K-means clustering is the most common partitioning algorithm.
K-clustering algorithm, cont'd
1. Select k initial cluster centroids: c1, c2, c3, ..., ck.
2. Assign each instance x in S to the cluster whose centroid is nearest to x.
3. For each cluster, re-compute its centroid based on the elements it contains.
4. Go to (2) until convergence is achieved. (Garcia)
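A minimal k-means sketch following the four steps above; initializing the centroids with k random instances and using Euclidean distance are standard choices assumed here, not prescribed by the slides:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=100):
    # Step 1: pick k initial centroids (here: k random instances).
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Step 2: assign each instance to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: euclidean(x, centroids[i]))
            clusters[nearest].append(x)
        # Step 3: recompute each centroid as the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Step 4: stop when the centroids no longer move (convergence).
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```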
Self-Organizing Maps
Definition: a group of several connected nodes mapped into a k-dimensional space following some specific geometric topology (grids, rings, lines, ...). The nodes are initially placed at random and iteratively adjusted according to the distribution of the examples (inputs) in the k-dimensional space. (Garcia)
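A very small sketch of this idea in Python, using a line topology: each input pulls its best-matching node toward it, dragging the node's grid neighbours along. The node count, learning rate, and neighbourhood radius are illustrative assumptions (real SOMs usually decay both over time):

```python
import random

def train_som(inputs, n_nodes=10, epochs=50, lr=0.3, radius=1):
    dim = len(inputs[0])
    # Nodes start at random positions in the input space.
    nodes = [[random.random() for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in inputs:
            # Find the best-matching node (nearest to the input).
            best = min(range(n_nodes),
                       key=lambda i: sum((nodes[i][d] - x[d]) ** 2
                                         for d in range(dim)))
            # Adjust the winner and its neighbours on the line topology.
            for i in range(max(0, best - radius), min(n_nodes, best + radius + 1)):
                for d in range(dim):
                    nodes[i][d] += lr * (x[d] - nodes[i][d])
    return nodes

nodes = train_som([(0.1, 0.2), (0.8, 0.9), (0.15, 0.25)])
```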
Annotated Bibliography
Wikipedia: http://en.wikipedia.org/wiki/Data_clustering#Types_of_clustering
Enrique Blanco Garcia: http://genome.imim.es/~eblanco/seminars/docs/clustering/index_types.html#hierarchy