Intro. ANN & Fuzzy Systems Lecture 21 Clustering (2)

Outline
– Similarity (Distance) Measures
– Distortion Criteria
– Scattering Criterion
– Hierarchical Clustering and other clustering methods

Distance Measure
What does it mean to be "similar"? Common distance (similarity) measures between feature vectors x and y:
– Norm: d(x,y) = ||x - y||_p (p = 2 gives the Euclidean distance)
– Mahalanobis distance: d(x,y) = (x - y)^T S_xy^{-1} (x - y)
– Angle: d(x,y) = x^T y / (|x| |y|)
For binary and symbolic features (x, y contain 0s and 1s only):
– Tanimoto coefficient: s(x,y) = x^T y / (x^T x + y^T y - x^T y)
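Below is a minimal NumPy sketch (not from the slides) of these four measures; the function names and the precomputed inverse covariance matrix S_inv are illustrative assumptions.

```python
import numpy as np

def norm_distance(x, y, p=2):
    """L_p norm distance ||x - y||_p; p = 2 gives the Euclidean distance."""
    return np.linalg.norm(x - y, ord=p)

def mahalanobis_distance(x, y, S_inv):
    """d(x,y) = (x - y)^T S^{-1} (x - y), with the inverse covariance S_inv given."""
    d = x - y
    return float(d @ S_inv @ d)

def angle_similarity(x, y):
    """Cosine of the angle between x and y: x^T y / (|x| |y|)."""
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def tanimoto_coefficient(x, y):
    """Tanimoto coefficient for binary (0/1) feature vectors."""
    xy = float(x @ y)
    return xy / (float(x @ x) + float(y @ y) - xy)
```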

Clustering Criteria
Is the current clustering assignment good enough? The most popular criterion is the mean-square error distortion
D = Σ_{i=1}^{c} Σ_k I(x_k, i) ||x_k - m_i||^2,
where I(x_k, i) = 1 if x_k is assigned to cluster i and 0 otherwise, and m_i is the mean of cluster i. Other distortion measures (e.g., based on other norms of x_k - m_i) can also be used.
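A minimal sketch of the mean-square error distortion for a given assignment; the array layout (rows of X are samples, labels[k] holds the cluster index of x_k) and the helper name are assumptions.

```python
import numpy as np

def mse_distortion(X, labels, centers):
    """Sum of squared distances of each sample to its assigned cluster mean."""
    total = 0.0
    for i, m_i in enumerate(centers):
        members = X[labels == i]              # samples assigned to cluster i
        total += np.sum((members - m_i) ** 2)
    return total
```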

Scatter Matrices
Scatter matrices are defined in the context of analysis of variance in statistics and are used in linear discriminant analysis. However, they can also be used to gauge the fitness of a particular clustering assignment.
– Mean vector of the i-th cluster: m_i = (1/N_i) Σ_k I(x_k, i) x_k
– Total mean vector: m = (1/N) Σ_k x_k
– Scatter matrix of the i-th cluster: S_i = Σ_k I(x_k, i) (x_k - m_i)(x_k - m_i)^T
– Within-cluster scatter matrix: S_W = Σ_{i=1}^{c} S_i
– Between-cluster scatter matrix: S_B = Σ_{i=1}^{c} N_i (m_i - m)(m_i - m)^T
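A minimal sketch that computes S_W and S_B for a labeled data set, following the definitions above; the function name and the samples-in-rows layout of X are assumptions.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B) for the clustering given by `labels`."""
    m = X.mean(axis=0)                        # total mean vector
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for i in np.unique(labels):
        Xi = X[labels == i]
        m_i = Xi.mean(axis=0)                 # mean vector of cluster i
        D = Xi - m_i
        S_W += D.T @ D                        # scatter matrix of cluster i
        diff = (m_i - m).reshape(-1, 1)
        S_B += Xi.shape[0] * (diff @ diff.T)  # between-cluster contribution
    return S_W, S_B
```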

Scattering Criteria
Total scatter matrix: S_T = S_W + S_B = Σ_k (x_k - m)(x_k - m)^T.
Note that the total scatter matrix is independent of the assignment I(x_k, i), but S_W and S_B both depend on I(x_k, i)!
Desired clustering properties:
– S_W small
– S_B large
How do we gauge whether S_W is small or S_B is large? There are several ways. One is tr S_W (the trace of S_W): if S_W = Φ Λ Φ^T is the eigenvalue decomposition of S_W, then tr S_W = Σ_i λ_i, the sum of its eigenvalues.
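A minimal sketch of the trace criterion, reusing the hypothetical scatter_matrices helper above; a smaller tr(S_W) indicates tighter clusters.

```python
import numpy as np

def trace_criterion(S_W):
    """tr(S_W) equals the sum of the eigenvalues of the (symmetric) matrix S_W."""
    eigvals = np.linalg.eigvalsh(S_W)
    return float(np.sum(eigvals))   # same value as np.trace(S_W)
```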

Cluster Separation Measure (CSM)
Similar to the scattering criteria. For two clusters i and j with mean vectors m_i, m_j and spreads σ_i, σ_j:
csm(i, j) = |m_i - m_j| / (σ_i + σ_j)
The larger its value, the more separable the two clusters. The measure assumes the underlying data distribution is Gaussian.
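A minimal sketch of this measure for two clusters of samples; defining the spread σ as the root-mean-square distance of a cluster's samples from its mean is an assumption, since the slide does not spell it out.

```python
import numpy as np

def cluster_separation(Xi, Xj):
    """csm = |m_i - m_j| / (sigma_i + sigma_j); larger means more separable."""
    m_i, m_j = Xi.mean(axis=0), Xj.mean(axis=0)
    sigma_i = np.sqrt(np.mean(np.sum((Xi - m_i) ** 2, axis=1)))  # RMS spread of cluster i
    sigma_j = np.sqrt(np.mean(np.sum((Xj - m_j) ** 2, axis=1)))  # RMS spread of cluster j
    return float(np.linalg.norm(m_i - m_j) / (sigma_i + sigma_j))
```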

Hierarchical Clustering
Merge method: initially, each x_k is its own cluster. During each iteration, the nearest pair of distinct clusters is merged, until the number of clusters is reduced to 1.
How to measure the distance between two clusters (a sketch of the merge method follows this list):
– d_min(C(i), C(j)) = min d(x,y), x ∈ C(i), y ∈ C(j); leads to the minimum spanning tree
– d_max(C(i), C(j)) = max d(x,y), x ∈ C(i), y ∈ C(j)
– d_avg(C(i), C(j)) = (1 / (N_i N_j)) Σ_{x ∈ C(i)} Σ_{y ∈ C(j)} d(x,y)
– d_mean(C(i), C(j)) = |m_i - m_j|
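A minimal, deliberately unoptimized sketch of the merge method with the d_min (single-linkage) cluster distance; the brute-force O(n^3) search and the function name are illustrative choices, not part of the slides.

```python
import numpy as np

def agglomerative_dmin(X, n_clusters):
    """Merge the nearest pair of clusters (by d_min) until n_clusters remain."""
    clusters = [[k] for k in range(len(X))]       # each sample starts as its own cluster
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # d_min: smallest pairwise distance between the two clusters
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])           # merge the nearest pair
        del clusters[b]
    return clusters                               # lists of sample indices
```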

Hierarchical Clustering (II)
Split method: initially, there is only one cluster. Iteratively, a cluster is split into two or more clusters, until the total number of clusters reaches a predefined goal.
The scattering criterion can be used to decide how to split a given cluster into two or more clusters. Another way is to perform an m-way clustering, using, say, the k-means algorithm to split a cluster into m smaller clusters.
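A minimal sketch of the split method using repeated 2-means splits; choosing the cluster with the largest within-cluster scatter to split next, and using scikit-learn's KMeans, are assumptions rather than part of the slide.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(X, n_clusters):
    """Split clusters with 2-means until n_clusters remain (each cluster chosen
    for splitting is assumed to contain at least two samples)."""
    clusters = [np.arange(len(X))]                  # start with a single cluster
    while len(clusters) < n_clusters:
        # split the cluster with the largest total squared deviation from its mean
        scatters = [np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters]
        idx = clusters.pop(int(np.argmax(scatters)))
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters                                 # lists of sample indices
```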