Clustering Algorithms, by Timofey Shulepov

1 Clustering Algorithms, by Timofey Shulepov

2 Clustering - main features
- Clustering: a data mining technique
- Def.: Grouping of objects into sets (clusters) such that objects within a set share common traits, while objects in different sets do not.
- Usage:
  - Statistical Data Analysis
  - Machine Learning
  - Data Mining
  - Pattern Recognition
  - Image Analysis
  - Bioinformatics

3 Types of clustering
- Hierarchical
  - Finding new clusters using previously found ones
- Partitional
  - Finding all clusters at once
- Self-Organizing Maps
- Hybrids (incremental)

4 Concept of distance measure
- Distance measure: determines how the similarity of two elements is calculated.
- Similarity is expressed in terms of a distance function: the smaller the distance, the more similar the elements.
- Distance functions vary significantly for interval-scaled, categorical, and other variable types.
- Examples of distance functions: Euclidean distance, Manhattan distance, etc.

5 Distance functions, in more detail
- Euclidean distance: aka “as the crow flies”, or 2-norm distance. The most commonly used one, and the distance usually implied by default (a ruler laid between two dots).
- Manhattan distance: aka “taxicab” or 1-norm distance. Going from A to B along a street grid, one intersection at a time; the sum of the absolute coordinate differences.
- Maximum norm: aka Chebyshev or infinity-norm distance. The largest absolute difference along any single coordinate.
- Mahalanobis distance: similar to Euclidean, but it takes the covariance of the data set into account, and is scale-invariant.
- Source: Garcia
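As a minimal sketch, the Euclidean, Manhattan, and maximum-norm distances can be written in a few lines of Python. The function names and the sample points are illustrative choices, not taken from the slides; Mahalanobis distance is left out because it also needs the covariance matrix of the data set.

```python
import math

def euclidean(a, b):
    """2-norm: straight-line ("as the crow flies") distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """1-norm: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum_norm(a, b):
    """Infinity-norm: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical pair of points, chosen so the three results differ.
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))     # 5.0
print(manhattan(p, q))     # 7.0
print(maximum_norm(p, q))  # 4.0
```

The same pair of points is 5, 7, or 4 units apart depending on the measure, which is why the choice of distance function affects which elements end up clustered together.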

6 Hierarchical Clustering
- Result: Given the input set S, the goal is to produce a hierarchy (dendrogram) in which nodes represent subsets of S, reflecting the structure found in S.
- Can be agglomerative or divisive
  - Agglomerative (“bottom-up”): begin with each element as a separate cluster, and merge clusters into successively larger ones.
  - Divisive (“top-down”): begin with one large set, and divide it into successively smaller sets.

7 Agglomerative Hierarchical Clustering
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of the result tree T): L = S1, S2, S3, ..., Sn.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {Si, Sj}, which will be the cheapest pair to merge.
3. Remove Si & Sj from L.
4. Merge Si & Sj to create a new internal node Sij in T, which becomes the parent of Si & Sj in the result tree, and add Sij to L.
5. Repeat from (2) until only one set remains.
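The steps above can be sketched in Python as follows. The slides do not say which merging cost function to use, so this sketch assumes single linkage (the smallest Euclidean distance between any two members); `single_link`, `agglomerate`, and the nested-tuple representation of T are hypothetical names and choices made for illustration.

```python
import math
from itertools import combinations

def single_link(a, b):
    """Assumed merging cost: smallest point-to-point Euclidean distance."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, cost=single_link):
    # Step 1: every instance starts as a singleton cluster (a leaf of T).
    # Each entry of L pairs the cluster's members with its subtree.
    L = [([p], p) for p in points]
    while len(L) > 1:
        # Step 2: find the cheapest pair of clusters to merge (i < j).
        i, j = min(combinations(range(len(L)), 2),
                   key=lambda ij: cost(L[ij[0]][0], L[ij[1]][0]))
        # Step 3: remove Si and Sj from L (larger index first).
        members_j, tree_j = L.pop(j)
        members_i, tree_i = L.pop(i)
        # Step 4: create the internal node Sij and put it back into L.
        L.append((members_i + members_j, (tree_i, tree_j)))
    # Step 5: one cluster remains; its subtree is the result tree T.
    return L[0][1]

# Hypothetical sample: two tight pairs that are far from each other.
tree = agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)])
print(tree)  # (((0, 0), (0, 1)), ((5, 5), (5, 7)))
```

This naive version recomputes every pairwise cost on each pass, which is fine for a handful of points but slow for large inputs; other linkage criteria (complete, average) can be swapped in through the `cost` parameter.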

8 K-Clustering
- K-clustering algorithm
- Result: Given the input set S and a fixed integer k, a partition of S into k subsets must be returned.
- K-means clustering is the most common partitioning algorithm.

9 K-clustering algo cont'd
1. Select k initial cluster centroids, c1, c2, c3, ..., ck.
2. Assign each instance x in S to the cluster whose centroid is nearest to x.
3. For each cluster, re-compute its centroid from the elements it now contains.
4. Repeat from (2) until convergence is achieved, i.e. until the assignments no longer change.
- Source: Garcia
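A minimal sketch of this loop in Python, assuming Euclidean distance and the mean as the centroid (the usual k-means choice). The function name `k_means`, the `max_iter` safety cap, and the sample data are illustrative; the initial centroids are passed in by the caller, although picking them at random from S is also common.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Cluster tuples of numbers around the given initial centroids (step 1)."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each instance to the nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: converged once no instance changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, assignment

# Hypothetical data: two well-separated groups and two rough initial guesses.
pts = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
cents, labels = k_means(pts, [(0.0, 0.0), (10.0, 10.0)])
print(cents)   # [(1.0, 1.5), (8.5, 8.0)]
print(labels)  # [0, 0, 1, 1]
```

The result depends on the initial centroids, so in practice the procedure is often run several times from different starting points.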

10 Self-Organizing Maps
- Def.: A group of several connected nodes mapped into a k-dimensional space following some specific geometrical topology (grids, rings, lines, ...). The nodes are initially placed at random and are then iteratively adjusted according to the distribution of the examples (input) along the k-dimensional space.
- Source: Garcia
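The definition above can be illustrated with a small sketch that uses a line topology. The slides give no update rule, so everything specific here is an assumption: the winner-plus-immediate-neighbours update, the linearly decaying step size, and the names `train_som`, `lr0`, and `seed`.

```python
import math
import random

def train_som(samples, n_nodes=5, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    # Line topology: node i is connected to nodes i-1 and i+1.
    # Node positions in the input space start at random.
    nodes = [tuple(rng.random() for _ in range(dim)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)  # assumed linear decay of the step size
        for x in samples:
            # Find the node currently closest to this example.
            best = min(range(n_nodes), key=lambda i: math.dist(nodes[i], x))
            # Pull the winner toward the example, and its two neighbours
            # on the line half as strongly.
            for i, strength in ((best, 1.0), (best - 1, 0.5), (best + 1, 0.5)):
                if 0 <= i < n_nodes:
                    nodes[i] = tuple(w + lr * strength * (xi - w)
                                     for w, xi in zip(nodes[i], x))
    return nodes

# Hypothetical 2-dimensional examples lying roughly along a diagonal.
data = [(0.1, 0.1), (0.3, 0.35), (0.5, 0.5), (0.7, 0.65), (0.9, 0.9)]
for node in train_som(data):
    print(node)
```

Because neighbouring nodes are dragged along with the winner, nodes that are adjacent on the line tend to end up near each other in the input space, which is what lets the map preserve the topology of the data.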

11 Annotated Bibliography
- Wikipedia: pes_of_clustering
- Enrique Blanco Garcia: g/index_types.html#hierarchy
