Chapter 9 UNSUPERVISED LEARNING: Clustering Part 2 Cios / Pedrycz / Swiniarski / Kurgan

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 2 SOM Clustering
Some key observed features in the operation of human associative memory:
- information is retrieved/recalled from the memory on the basis of some measure of similarity relating to a key pattern
- memory is able to store and recall representations as structured sequences
- recall of information from memory is dynamic and similar to time-continuous physical systems


5 Self-Organizing Feature Maps
In data analysis it is fundamental to:
- capture the topology and probability distribution of pattern vectors
- map pattern vectors from the original high-D space onto the lower-D new feature space (compressed)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 6 Self-Organizing Feature Maps
Data compression requires selection of features that best represent data for a specific purpose, e.g., better visual inspection of the data's structure. Most attractive from the human point of view are visualizations in 2D or 3D. The major difficulty is faithful projection/mapping of data to ensure preservation of the topology present in the original feature space.

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 7 Self-Organizing Feature Maps
Topology-preserving mapping should have these properties:
- similar patterns in the original feature space must also be similar in the reduced feature space, according to some similarity criteria
- similarity in the original and the reduced spaces should be of a "continuous nature", i.e., the density of patterns in the reduced feature space should correspond to that in the original space

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 8 Self-Organizing Feature Maps
Several methods were developed for 2D topology-preserving mapping:
- linear projections, such as eigenvectors
- nonlinear projections, such as Sammon's projection
- nonlinear projections, such as SOM neural networks

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 9 Sammon’s Projection

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 10 Sammon's Projection
Performs a non-linear projection, typically onto a 2D plane
Disadvantages:
- it is computationally heavy
- it cannot be used to project new points (points that were not used during training) onto the output plane
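The projection formula on slide 9 appeared as an image; Sammon's stress, the objective the projection minimizes, is commonly written as follows (d*_ij is the distance between patterns i and j in the original space, d_ij the distance between their images in the output plane; notation assumed, not taken from the slides):

E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}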

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 11 SOM: Principle
(figure: mapping from the high-dim space onto a low-dim space (2-dim, 3-dim))

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 12 Self-Organizing Feature Maps
- Developed by Kohonen in 1982
- SOM is an unsupervised-learning, topology-preserving projection algorithm
- It uses a feedforward topology
- It is a scaling method projecting data from a high-D input space into a lower-D output space
- Similar vectors in the input space are projected onto nearby neurons on the 2D map

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 13 Self-Organizing Feature Maps
- The feature map is a layer in which the neurons organize themselves according to the input values
- Each neuron of the input layer is connected to each neuron of the 2D topology/map
- The weights associated with the inputs are used to propagate them to the map neurons

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 14 SOM: Topology and Learning Rule
- The neurons in a certain area around the winning neuron are also influenced
- SOM reflects the ability of biological neurons to perform global ordering based on local interactions

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 15 SOM: Topology and Learning Rule
One iteration of SOM learning:
1. Present a randomly selected input vector x to all neurons
2. Select the winning neuron, i.e., the one whose weight vector is closest to the input vector according to the chosen similarity measure
3. Adjust the weights of the jth winning neuron and the weights of neighboring neurons (defined by some neighborhood function)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 16 SOM: Topology and Learning Rule
The jth winning neuron is selected as the one having the minimal distance value:
The competitive (winner-takes-all / Kohonen) learning rule is used for adjusting the weights:
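The two formulas on this slide appeared as images; a standard form, consistent with steps 6-9 of the algorithm later in the deck (α(t) denotes the decreasing learning rate and N_j(t) the winning neighborhood; notation assumed), is:

j = \arg\min_{k=1,\dots,M} \; \| x - w_k(t) \|

w_p(t+1) = w_p(t) + \alpha(t)\,\bigl(x - w_p(t)\bigr) \;\; \text{for } p \in N_j(t), \qquad w_p(t+1) = w_p(t) \;\; \text{otherwise}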

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 17 SOM: Topology and Learning Rule
Kohonen also proposed a dot-product similarity for selecting the winning neuron:
and the learning rule:
where N_j(t) is the winning neuron's neighborhood and α(t) > 0 is the decreasing learning function. This formula assures automatic normalization of the weights to the length of one.
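These two formulas were also shown as images; Kohonen's dot-product variant is usually written as (assumed form):

j = \arg\max_{k=1,\dots,M} \; x^{T} w_k(t)

w_p(t+1) = \frac{w_p(t) + \alpha(t)\, x}{\bigl\| w_p(t) + \alpha(t)\, x \bigr\|}, \quad p \in N_j(t)

The division by the norm is what produces the automatic weight normalization to length one mentioned above.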

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 19 Neighborhood Kernel
- The neighborhood kernel (of the winning neuron) is a non-increasing function of time and distance
- It defines the region of influence that the input has on the SOM

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 20 Neighborhood Kernel
- The geometric set of neighborhood neurons must shrink as the iteration steps (time) increase
- Convergence of the learning process requires that the radius decrease with learning time/iteration
- This causes global ordering through local interactions and local weight adjustments

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 22 Neighborhood Kernel
With the neighborhood function fixed, the neighborhood kernel learning function can be defined as:
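The formula on this slide was an image; for a fixed (bubble) neighborhood N_j(t) the kernel is typically written as (assumed form):

h_{jp}(t) = \begin{cases} \alpha(t), & p \in N_j(t) \\ 0, & \text{otherwise} \end{cases}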

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 23 Neighborhood Kernel
Another frequently used neighborhood kernel is a Gaussian function with a radius decreasing with time:
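The Gaussian kernel shown as an image here is usually written as (r_j and r_p are the positions of neurons j and p on the 2D map grid, σ(t) the decreasing radius; notation assumed):

h_{jp}(t) = \alpha(t)\, \exp\!\left( -\,\frac{\| r_j - r_p \|^{2}}{2\,\sigma^{2}(t)} \right)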

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 24 SOM: Algorithm
Conditions for successful learning of the SOM network:
- the training data set must be large, since self-organization relies on the statistical properties of the data
- proper selection of the neighborhood kernel function assures that only the weights of the winning neuron and its neighborhood neurons are locally adjusted
- the radius of the winning neighborhood, as well as the learning rate function, must monotonically decrease with time
- the amount of weight adjustment for neurons in the winning neighborhood depends on how close they are to the winner

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 25 SOM: Algorithm
Given: the 2D network topology consisting of M neurons; a training data set of L n-D input vectors; the number of iterations T; the neighborhood kernel function; the learning rate function
1. Set the learning rate to the maximum of the learning rate function
2. Set the iteration step t = 0
3. Randomly select initial values of the weights
4. Randomly select an input pattern and present it to the network
5. Compute the current learning rate for step t using the given learning rate function

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 26 SOM: Algorithm
6. Compute the Euclidean distances ||x_i - w_k(t)||, k = 1, ..., M
7. Select the jth winning neuron: ||x_i - w_j(t)|| = min_k ||x_i - w_k(t)||, k = 1, ..., M
8. Define the winning neighborhood around the winning neuron using the neighborhood kernel
9. Adjust the weights of the neurons: w_p(t+1) = w_p(t) + α(t)(x_i - w_p(t)), p ∈ N_j(t)
10. Increase t = t+1; if t > T stop, otherwise go to step 4
Result: Trained SOM network
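A minimal NumPy sketch of steps 1-10 above, assuming a rectangular map with a Gaussian neighborhood and linearly decreasing learning rate and radius (all parameter names and schedules are illustrative, not taken from the slides):

import numpy as np

def train_som(X, rows, cols, T, alpha0=0.5, sigma0=None, seed=0):
    """Train a 2D SOM on data X of shape (L, n) following steps 1-10 above."""
    rng = np.random.default_rng(seed)
    L, n = X.shape
    # positions of the M = rows*cols map neurons on the 2D grid
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = rng.random((rows * cols, n))              # step 3: random initial weights
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    for t in range(T):                            # steps 2 and 10: iteration counter
        x = X[rng.integers(L)]                    # step 4: random input pattern
        alpha = alpha0 * (1.0 - t / T)            # step 5: current learning rate
        d = np.linalg.norm(x - W, axis=1)         # step 6: Euclidean distances
        j = int(np.argmin(d))                     # step 7: winning neuron
        sigma = sigma0 * (1.0 - t / T) + 1e-3     # shrinking neighborhood radius
        h = np.exp(-np.sum((grid - grid[j]) ** 2, axis=1) / (2 * sigma ** 2))  # step 8
        W += (alpha * h)[:, None] * (x - W)       # step 9: adjust weights
    return W, grid

New inputs can then be mapped to the grid position of their nearest weight vector.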

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 27 from Kohonen

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 28 SOM: Interpretation
Given: a trained SOM network consisting of a 2D array of M neurons, each neuron receiving an input via its weight vector w; a small training data set consisting of pairs (x_i, c_i), i = 1, 2, ..., L, where c_i is the class label
1. Set i = 1
2. Present input pattern x_i to the network
3. Calculate the distances or the dot products
4. Locate the spatial position of the winning neuron and assign label c_i to that neuron
5. Increase i = i+1 and continue while i <= L
Result: Calibrated SOM network
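A short sketch of the calibration procedure above; when several labeled samples win the same neuron, a majority vote is used here, which is one common resolution but is not stated on the slide:

import numpy as np

def calibrate_som(W, X_lab, y_lab):
    """Label each winning neuron with the class of the calibration samples it wins."""
    votes = {}
    for x, c in zip(X_lab, y_lab):
        j = int(np.argmin(np.linalg.norm(x - W, axis=1)))  # winning neuron for x (steps 3-4)
        votes.setdefault(j, []).append(c)
    # majority vote per neuron (assumption; the slide simply assigns label c to the winner)
    return {j: max(set(cs), key=cs.count) for j, cs in votes.items()}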

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 29 SOM: Issues
In practice, we do not optimize the SOM design; instead, we pay attention to these factors:
- Initialization of the weights: often they are normalized and can be set equal to the first several input vectors. If they are not initialized to some input vectors, then they should be grouped in one quadrant of the unit circle so that they can unfold to map the distribution of the input data.
- Starting value of the learning rate and its decreasing schedule
- Structure of the neighborhood kernel and its decreasing schedule

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 30-33 SOM: Example (figures)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 34 Self-Organizing Feature Map (SOM)
Visualizes the structure of high-dimensional data by mapping it onto a low-dimensional (typically two-dimensional) grid of linear neurons

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 35 Clustering and Vector Quantization

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 36 Cluster validity
- Using different clustering methods might result in different partitions of X at each value of c
- Which clusterings are valid?
- It is plausible to expect "good" clusters at more than one value of c (2 ≤ c < n)
- How many clusters exist in the data?

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 37 Cluster validity
Some notation:
- X = {x_1, x_2, ..., x_n} is the data set
- U = [u_ik]: c-partitions of X are sets of (c x n) values u_ik that can be conveniently arrayed as a c x n matrix
- There are three sets of partition matrices

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 38 Cluster Validity
Classification of validity measures:
- Direct measures: Davies-Bouldin index, Dunn's index
- Indirect measures for fuzzy clusters: degree of separation, partition coefficient, and partition entropy
- Xie-Beni index

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 39 Cluster validity
- The cluster error is associated with any U ∈ M_c; it is the number of vectors in X that are mislabeled by U
- E(U) is an absolute measure of cluster validity when X is labeled, and is undefined when X is not labeled

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 40 Cluster validity
- Process X at c = 2, 3, ..., n-1 and record the optimal values of some criterion as a function of c
- The most valid clustering is taken as an extremum of this function (or of some derivative of it)
- Problem: many criterion functions usually have multiple local stationary points at fixed c, and global extrema are not necessarily the "best" c-partitions of the data

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 41 Cluster validity
- A more formal approach is to pose the validity question in the framework of statistical hypothesis testing
- The major difficulty is that the sampling distribution is not known
- Nonetheless, goodness-of-fit statistics such as the chi-square and Kolmogorov-Smirnov tests have been used

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 42 Cluster validity
- The global minimum of J_W may suggest the "wrong" 2-clusters
- Example from Bezdek: n = 29 data vectors {x_k} ⊂ R^2; the "correct" 2-partition of X is shown on the left (figure)
- For a hard 2-partition, the global minimum is hardly an attractive solution

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 43 Cluster validity
- Basic question: what constitutes a "good" cluster? What is a "cluster" anyhow?
- The difficulty is that the data X, and every partition of X, are separated by the algorithm generating the partition matrix U (and defining "clusters" in the process)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 44 Cluster validity
- Many of the algorithms ignore this fundamental difficulty and are thus heuristic in nature
- Some heuristic methods measure the amount of fuzziness in U and presume the least fuzzy partitions to be the most valid

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 45 Degree of Separation - Bezdek
The degree of separation between fuzzy sets u_1 and u_2 is the scalar:
and its generalization from 2 to c clusters is:

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 46 Degree of Separation - Bezdek
Example: c = 2 and two different fuzzy 2-partitions of X:
Z(U;2) = Z(V;2) = 0.50
U and V are very different, so Z does not distinguish between the two partitions.

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 47 Partition Coefficient - Bezdek
U is a fuzzy c-partition of n data points. The partition coefficient F of U is the scalar:
The value of F(U;c) depends on all (c x n) elements of U, in contrast to Z(U;c), which depends on just one.
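The formula on this slide appeared as an image; Bezdek's partition coefficient is standardly defined as:

F(U;c) = \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{2}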

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 48 Partition Coefficient - Bezdek
Example: values of F on the U and V partitions of X: F(U;2) = , F(V;2) =
The value of F gives an accurate indication of the partition, for both the most uncertain and the most certain states.

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 49 Partition Coefficient - Bezdek from Bezdek

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 50 Partition Coefficient - Bezdek
Values of F(U;c) for c = 2, 3, 4, 5, 6 with the norms N_E, N_D, and N_M (table of values shown on the slide)
F first identifies a primary structure at c* = 2, and then the secondary structure at c = 3

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 51 Partition Entropy - Bezdek
Let {A_i | 1 ≤ i ≤ c} denote a c-partition of the events of any sample space connected with an experiment, and let p^T = (p_1, p_2, ..., p_c) denote a probability vector associated with the {A_i}.
- The pair ({A_i}, p) is called a finite probability scheme for the experiment
- The ith component of p is the probability of event A_i
- c is called the length of the scheme
Note: here c does NOT indicate the number of clusters

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 52 Partition Entropy - Bezdek
Our aim is to find a measure h(p) of the amount of uncertainty associated with each state:
- h(p) should be maximal for p = (1/c, ..., 1/c)
- h(p) should be minimal for p = (0, 1, 0, ...) (any statistically certain partition)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 53 Partition Entropy - Bezdek
The entropy of the scheme is defined as:
with the convention that p_i log_a(p_i) = 0 whenever p_i = 0
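The entropy formula shown here as an image has the standard form:

h(p) = -\sum_{i=1}^{c} p_i \log_a(p_i)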

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 54 Partition Entropy - Bezdek
The partition entropy of any fuzzy c-partition U ∈ M_fc of X, where |X| = n, is, for 1 ≤ c ≤ n:
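The partition entropy formula, shown as an image, is standardly written as:

H(U;c) = -\frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log_a(u_{ik})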

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 55 Partition Entropy - Bezdek
Let U ∈ M_fc be a fuzzy c-partition of n data points. Then for 1 ≤ c ≤ n and a ∈ (1, ∞):
- 0 ≤ H(U;c) ≤ log_a(c)
- H(U;c) = 0 ⇔ U ∈ M_co is hard
- H(U;c) = log_a(c) ⇔ U = [1/c]

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 56 Partition Entropy - Bezdek
Example: entropy for U and V:
H(U;2) = 49 log_e(2)/51 ≈ 0.666
H(V;2) = log_e(2)/51 ≈ 0.0136
U is a very uncertain partition

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 57 Partition Entropy - Bezdek
We assume that minimization of H(U;c) corresponds to maximizing the amount of information about structural memberships that an algorithm extracted from the data X.
H(U;c) is used for cluster validity as follows: for each c = 2, 3, ..., n-1, consider a finite set of "optimal" U's ⊂ M_fc

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 58 Partition Entropy - Bezdek
Normalized partition entropy:
Reason: the variable ranges make interpretation of the values of V_pc and V_pe difficult, since they are not referenced to a fixed scale. For example:

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 59 Partition Entropy - Bezdek Normalized partition entropy:
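The normalization shown on these slides was an image; one common normalization, which references V_pe to the fixed scale [0, 1] by dividing by its maximum log_a(c) (an assumed form, not confirmed by the slides), is:

V_{pe}'(U;c) = \frac{H(U;c)}{\log_a(c)}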

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 60 Cluster Validity
Comments on the partition coefficient and partition entropy:
- V_pc is maximized (and V_pe minimized) on every crisp c-partition of X
- At the other extreme, V_pc takes its unique minimum (and V_pe its unique maximum) at the centroid U = [1/c], the "fuzziest" partition you can get, since it assigns every point in X to all c classes with equal membership values 1/c

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 61 Cluster Validity
Comments on the partition coefficient and partition entropy:
- V_pc and V_pe essentially measure how far U is from being crisp by measuring the fuzziness in the rows of U
- All these two indices really measure is fuzziness relative to partitions that yield other values of the indices
- There are roughly c^n/c! crisp matrices in M_hcn, and on all of them V_pc is constantly 1

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 62 from Bezdek

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 63 Prototypes for FEATURE SELECTION (from Bezdek)
(table: for each symptom, the feature centers v_1j (Hernia) and v_2j (Gallstones) and their absolute differences)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 64 Cluster Errors for FEATURE SELECTION (from Bezdek)
Symptoms used | Cluster error E(U)
{1-11} | 23
{3} | 23
{3, 8} | 23
{3, 9} | 36
{3, 8, 9} | 36

Cluster Validity Example © 2007 Cios / Pedrycz / Swiniarski / Kurgan 65

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 66 Cluster Validity: Divisive Coefficient (DC)
(figure: divisive hierarchy of {a, b, c, d, e} splitting into {a, b} and {c, d, e}, then {d, e}, down to the singletons a, b, c, d, e)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 67 Cluster Validity: Divisive Coefficient
For each object i, let d(i) denote the diameter of the last cluster to which it belongs (before being split off as a single object), divided by the diameter of the whole data set. The divisive coefficient is given by DC = (1/n) Σ_{i=1}^{n} d(i).

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 68 Cluster Validity
c - the number of clusters being included

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 69 Cluster Validity: Divisive Coefficient (DC)
(figure: the same divisive hierarchy annotated with levels l_0 ... l_8)
L_1 = (1 - ((10-5)/10)) * 3 = 1.5

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 70-73 Cluster Validity: Divisive Coefficient (DC) (the dendrogram figure with levels l_0 ... l_8, repeated on slides 70-73)

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 74 Cluster Validity
- How to assess the quality of clusters?
- How many clusters should be distinguished in the data?

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 75 Cluster Validity: Davies-Bouldin index
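The index itself appeared as an image; the usual definition, with s_i the within-cluster scatter of cluster i and d(v_i, v_j) the distance between cluster prototypes (notation assumed), is:

DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} \frac{s_i + s_j}{d(v_i, v_j)}

Lower values indicate more compact, better-separated clusters.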

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 76 Cluster Validity: Dunn separation index
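The Dunn index, also shown as an image, is commonly written as (δ(C_i, C_j) is the distance between clusters and Δ(C_k) the diameter of cluster C_k; notation assumed):

D = \min_{1 \le i \le c} \; \min_{j \neq i} \; \frac{\delta(C_i, C_j)}{\max_{1 \le k \le c} \Delta(C_k)}

Higher values indicate better partitions.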

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 77 Cluster Validity: Xie-Beni index (the goal is to achieve the lowest value of "r")
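The Xie-Beni index (apparently denoted "r" on the slide) is commonly written as (u_ik are the fuzzy memberships, v_i the cluster prototypes, m the fuzzifier; notation assumed):

XB = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \| x_k - v_i \|^{2}}{n \cdot \min_{i \neq j} \| v_i - v_j \|^{2}}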

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 78 Cluster Validity: Fuzzy Clustering
- Partition coefficient
- Partition entropy
- Sugeno-Takagi

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 79 Random Sampling Two-phase clustering: (a) Random sampling and (b) Clustering of prototypes

© 2007 Cios / Pedrycz / Swiniarski / Kurgan 80 References