Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Clustering: A neural network approach
K.-L. Du, Neural Networks, Vol. 23, No. 1, 2009, pp.
Presenter: Wei-Shen Tai, 2010/1/26
Outline
Introduction
Competitive learning: SOM
Problems and solutions
- Under-utilization problem
- Shared clusters: rival-penalized competitive learning
- Winner-take-most rule relaxes the WTA: soft competitive learning
- Outliers: robust clustering
Cluster validity
Computer simulations
Summary
Comments
Motivation
Competitive learning based clustering methods: differences in their training processes determine the final clustering results of each method.
The paper discusses several effective solutions to problems that arise in competitive learning.
Objective
A comprehensive overview of competitive learning based clustering methods, together with associated topics such as the under-utilization problem, fuzzy clustering, and robust clustering.
Vector quantization (VQ)
A classical method for approximating a continuous probability density function (PDF) using a finite number of prototypes: a set of feature vectors x is represented by a finite set of prototype vectors.
A simple training algorithm for vector quantization is:
1. Pick a sample point at random.
2. Move the nearest prototype (quantization vector) towards this sample point by a small fraction of the distance.
3. Repeat.
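A minimal sketch of this online VQ loop in Python (the learning rate eta, the step count, and initializing prototypes from random data points are illustrative assumptions, not from the slides):

```python
import numpy as np

def train_vq(X, K, eta=0.05, n_steps=10000, seed=0):
    """Online vector quantization: repeatedly pick a random sample and move
    the nearest prototype a small fraction of the way towards it."""
    rng = np.random.default_rng(seed)
    # initialize the K prototypes from randomly chosen data points
    prototypes = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_steps):
        x = X[rng.integers(len(X))]                                 # 1. random sample
        j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))  # nearest prototype
        prototypes[j] += eta * (x - prototypes[j])                  # 2. move it closer
    return prototypes                                               # 3. repeat until done
```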
Competitive learning
It can be implemented with a two-layer (J–K) neural network, where the output layer is called the competition layer.
Objective function: it is usually derived by minimizing the mean squared error (MSE) functional.
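One standard way to write this MSE objective (the notation x_t for the N training vectors and c_k for the K prototypes is assumed here, not taken from the slide):

$$E = \frac{1}{N}\sum_{t=1}^{N} \min_{1 \le k \le K} \lVert x_t - c_k \rVert^2$$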
Kohonen network (SOM)
It is not based on the minimization of any known objective function.
Major problems include forced termination, unguaranteed convergence, a non-optimized procedure, and output that often depends on the order in which the data are presented.
There are proofs of convergence for the one-dimensional SOM based on Markov chain analysis, but no general proof of convergence is available for the multi-dimensional SOM.
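A minimal sketch of one SOM update step on a 1-D chain of neurons (the Gaussian neighborhood and the exponential decay schedules are common choices assumed here for illustration):

```python
import numpy as np

def som_step(weights, x, t, n_steps, sigma0=2.0, eta0=0.5):
    """One SOM update on a 1-D chain of prototypes (weights is a float
    array of shape (K, d); decay schedules are illustrative)."""
    K = len(weights)
    frac = t / n_steps
    eta = eta0 * np.exp(-3.0 * frac)            # decaying learning rate
    sigma = sigma0 * np.exp(-3.0 * frac)        # shrinking neighborhood width
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    idx = np.arange(K)
    h = np.exp(-((idx - winner) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
    weights += eta * h[:, None] * (x - weights)             # pull neighbors toward x
    return weights
```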
Learning vector quantization (LVQ)
LVQ algorithms define near-optimal decision borders between classes.
Unsupervised LVQ: essentially the simple competitive learning (SCL) based VQ.
Supervised LVQ: based on the known classification of the feature vectors, and can be treated as a supervised version of the SOM. This algorithm tends to reduce the point density of the prototypes c_i around the Bayesian decision surfaces.
(Network structure in the figure: input layer, competitive layer, linear output layer with target classes.)
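A hedged sketch of the basic supervised LVQ1 update rule (learning rate and variable names are illustrative):

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, y, eta=0.05):
    """One LVQ1 update: move the nearest prototype towards the labelled
    sample (x, y) if its class matches, otherwise push it away."""
    j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    sign = 1.0 if proto_labels[j] == y else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])
    return prototypes
```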
Mountain clustering
A simple and effective method for estimating the number of clusters and the initial locations of the cluster centers.
The potential of each grid point is calculated from the density of the surrounding data points:
1. The grid point with the highest potential is selected as the first cluster center, and the potential values of all other grid points are then reduced according to their distance to the first cluster center.
2. The next cluster center is located at the grid point with the highest remaining potential.
3. This process is repeated until the remaining potential of every grid point falls below a threshold.
Subtractive clustering
A modified mountain clustering that uses the data points themselves, rather than grid points, as candidate cluster centers; this reduces the number of candidate points to N, the number of data points.
The potential measure of each data point x_i is defined as a function of its Euclidean distances to all other data points.
It requires only one pass over the training data, and the number of clusters does not need to be pre-specified. A sketch of the procedure follows.
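A rough sketch of subtractive clustering, which also illustrates the mountain-method potential idea with data points as the candidate centers (the radii ra and rb, the acceptance threshold, and the exponential potential form are illustrative assumptions):

```python
import numpy as np

def subtractive_clustering(X, ra=1.0, accept_ratio=0.15):
    """Subtractive clustering sketch: every data point is a candidate center;
    each accepted center subtracts potential from its neighborhood."""
    rb = 1.5 * ra
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    P = np.exp(-4.0 * d2 / ra ** 2).sum(axis=1)            # potential of each point
    P1 = P.max()                                            # potential of the first center
    centers = []
    while True:
        i = int(np.argmax(P))
        if P[i] < accept_ratio * P1:                        # stop when potential is low
            break
        centers.append(X[i])
        # subtract a penalty around the new center so nearby points lose potential
        P -= P[i] * np.exp(-4.0 * d2[i] / rb ** 2)
    return np.array(centers)
```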
Neural gas (NG)
A topology-preserving network that can be treated as an extension of the C-means. It has a fixed number of processing units, K, with no lateral connections.
1. At step t, rank the prototype vectors according to their distance to the input x_t.
2. Update the prototypes by c_k ← c_k + ε · exp(−r_k/λ) · (x_t − c_k), where r_k is the distance rank of prototype c_k and ε, λ are the learning rate and the neighborhood decay.
A data-optimal topological ordering is achieved by using neighborhood ranking within the input space at each training step. Unlike the SOM, which uses predefined static neighborhood relations, the NG determines a dynamic neighborhood relation as learning proceeds.
The NG is an efficient and reliable clustering algorithm that is not sensitive to the neuron initialization.
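A sketch of one neural-gas step implementing the rank-based update above (eps and lam are assumed constants here; in practice both are annealed over the course of training):

```python
import numpy as np

def neural_gas_step(prototypes, x, eps=0.05, lam=2.0):
    """One neural-gas update: rank prototypes by distance to x, then move
    every prototype towards x with a weight that decays with its rank."""
    d = np.linalg.norm(prototypes - x, axis=1)
    ranks = np.argsort(np.argsort(d))             # rank 0 = closest prototype
    h = np.exp(-ranks / lam)                      # neighborhood function on ranks
    prototypes += eps * h[:, None] * (x - prototypes)
    return prototypes
```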
Adaptive resonance theory (ART)
The theory leads to a series of real-time unsupervised network models for clustering, pattern recognition, and associative memory.
1. At the training stage, the stored prototype of a category is adapted when an input pattern is sufficiently similar to the prototype.
2. When novelty is detected (the input is dissimilar to all existing prototypes), ART adaptively and autonomously creates a new category with the input pattern as its prototype.
ART can adapt without forgetting past training, and thus overcomes the so-called stability-plasticity dilemma. Like the incremental C-means, the ART model family is sensitive to the order of presentation of the input patterns.
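A greatly simplified, hypothetical sketch in the spirit of ART's match/novelty behaviour; it is not the full ART-1 dynamics, and the similarity measure and vigilance scale are assumptions:

```python
import numpy as np

def art_like_clustering(X, vigilance=0.75, beta=0.5):
    """Simplified ART-style incremental clustering sketch: adapt the nearest
    prototype only if it is similar enough, otherwise create a new category."""
    prototypes = [X[0].astype(float)]
    for x in X[1:]:
        sims = [1.0 / (1.0 + np.linalg.norm(p - x)) for p in prototypes]
        j = int(np.argmax(sims))
        if sims[j] >= vigilance:                  # resonance: adapt the winner
            prototypes[j] += beta * (x - prototypes[j])
        else:                                     # novelty: create a new category
            prototypes.append(x.astype(float))
    return np.array(prototypes)
```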
Lateral inhibition of ART
ART typically consists of a comparison field (F1) and a recognition field (F2) composed of neurons.
Each recognition field neuron outputs a negative signal to each of the other recognition field neurons and inhibits their output accordingly.
Supervised clustering
When output patterns are used in clustering, this leads to supervised clustering. The locations of the cluster centers are determined by both the input pattern spread and the output pattern deviations.
Examples of supervised clustering include the LVQ family, the ARTMAP family, the conditional FCM, the supervised C-means, and the C-means plus k-NN based clustering.
Clustering using non-Euclidean distance measures
Mahalanobis distance: can be used to look for hyper-ellipsoid shaped clusters.
Specialized distances: applied in methods that detect circles and hyperspherical shells, such as extensions of the C-means and FCM algorithms.
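A small helper showing the Mahalanobis distance of a point from a cluster described by its mean and covariance (variable names are illustrative):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from a cluster with the given mean and
    covariance; points inside the cluster's ellipsoid score low."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

Replacing the Euclidean distance with this measure in a competitive or C-means style update is what allows the hyper-ellipsoidal cluster shapes mentioned above.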
Hierarchical clustering
It consists of a sequence of partitions in a hierarchical structure, which can be represented as a clustering tree called a dendrogram.
Hierarchical clustering takes the form of either an agglomerative (bottom-up) or a divisive (top-down) technique. The agglomerative form starts from N clusters, each containing one data point; a series of nested mergings is performed until all the data points are grouped into one cluster (see the sketch below).
Density-based clustering
Groups the objects of a data set into clusters based on density conditions. It can handle outliers and discover clusters of arbitrary shape.
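A naive sketch of the bottom-up merging described above, stopped at K clusters; single linkage is an assumed choice made only for this example:

```python
import numpy as np

def agglomerative(X, K):
    """Naive single-linkage agglomerative clustering: start with N singleton
    clusters and repeatedly merge the closest pair until K clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > K:
        best, pair = np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)      # merge the two closest clusters
    return clusters                          # each cluster is a list of point indices
```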
Constructive clustering techniques
Conventional partitional clustering algorithms: selecting an appropriate value of K is difficult without prior knowledge of the input data. A self-creating mechanism in the competitive learning process can adaptively determine the natural number of clusters.
The self-creating and organizing neural network (SCONN): for a new input, the winning node is updated if it is active; otherwise a new node is created from the winning node.
Growing neural gas (GNG): capable of generating and removing neurons and lateral connections dynamically. It achieves robustness against noise and performs perfect topology-preserving mapping.
Miscellaneous clustering methods
Expectation-maximization (EM)
Represents each cluster by a probability distribution, typically a Gaussian. It is derived by maximizing the log-likelihood of the probability density function of the mixture model, and can be treated as a fuzzy clustering technique.
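A minimal EM sketch for a Gaussian mixture, restricted to spherical (isotropic) components purely to keep the example short; that restriction is an assumption, not part of the survey:

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for a spherical Gaussian mixture: the E-step computes soft
    (fuzzy-like) responsibilities, the M-step re-estimates the parameters."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    means = X[rng.choice(N, K, replace=False)].astype(float)
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means and per-component variances
        Nk = r.sum(axis=0)
        pi = Nk / N
        means = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * Nk)
    return means, var, pi
```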
Under-utilization problem
An initialization problem: some prototypes, called dead units, may never win the competition.
Solutions (a sketch of the first strategy follows below):
1. Conscience strategy: the frequent winner receives a "bad conscience" by adding a penalty term to its distance from the input signal.
2. Distortion measure.
3. Multiplicatively biased competitive learning (MBCL): the winner's bias is increased after each win, and units with a lower bias are favored when determining the winner, so under-used units eventually get a chance to win.
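A sketch of the conscience strategy from item 1 above; the bias constant gamma and the win-frequency update rate are illustrative assumptions:

```python
import numpy as np

def conscience_step(prototypes, p, x, eta=0.05, gamma=10.0):
    """Conscience-style competitive learning: each unit's distance is biased
    by its running win frequency p (start p as zeros), so frequent winners
    are handicapped and dead units eventually get to win."""
    K = len(prototypes)
    d = np.linalg.norm(prototypes - x, axis=1)
    bias = gamma * (1.0 / K - p)                  # bonus for rarely winning units
    j = int(np.argmin(d - bias))                  # biased competition
    p += 0.01 * ((np.arange(K) == j) - p)         # update win frequencies
    prototypes[j] += eta * (x - prototypes[j])    # move only the winner
    return prototypes, p
```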
Shared clusters
The problem is addressed in the rival-penalized competitive learning (RPCL) algorithm.
Rival-penalizing force: for each input, the second-place winner, called the rival, is also updated by a smaller learning rate along the opposite direction.
RPCL may suffer from an over-penalization or under-penalization problem. The STAR C-means has a mechanism similar to the RPCL, but penalizes the rivals in an implicit way.
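A sketch of one RPCL step: the winner is attracted to the input while the rival is repelled with a much smaller (de-learning) rate; the two rates are illustrative values:

```python
import numpy as np

def rpcl_step(prototypes, x, eta_win=0.05, eta_rival=0.005):
    """Rival-penalized competitive learning: move the winner towards x and
    push the second-place unit (the rival) slightly away from x."""
    d = np.linalg.norm(prototypes - x, axis=1)
    order = np.argsort(d)
    winner, rival = order[0], order[1]
    prototypes[winner] += eta_win * (x - prototypes[winner])
    prototypes[rival] -= eta_rival * (x - prototypes[rival])
    return prototypes
```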
Winner-take-most rule
It relaxes the WTA rule by allowing more than one neuron to win, each to a certain degree.
Soft competitive learning: examples include the SOM (Kohonen, 1989), the NG (Martinetz et al., 1993), and the GNG (Fritzke, 1995a).
Maximum-entropy clustering applies a soft-max (Boltzmann) weighting, so that every prototype is updated in proportion to exp(−β‖x − c_k‖²).
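A sketch of a winner-take-most update in the maximum-entropy style, where every prototype moves towards the input with a softmax weight (beta and eta are assumed constants; in annealing schemes beta is gradually increased):

```python
import numpy as np

def soft_competitive_step(prototypes, x, eta=0.05, beta=5.0):
    """Winner-take-most update: all prototypes move towards x, weighted by a
    softmax over negative squared distances (large beta approaches WTA)."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)
    w = np.exp(-beta * d2)
    w /= w.sum()                                  # soft memberships sum to one
    prototypes += eta * w[:, None] * (x - prototypes)
    return prototypes
```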
Outliers
Outliers in a data set affect the result of clustering.
Solutions
1. Noise clustering: all outliers are collected into a separate, amorphous noise cluster. If a noisy point is far away from all the K clusters, it is attracted to the noise cluster (a membership sketch follows below).
2. Possibilistic C-means (PCM): PCM can find the natural clusters in the data set. When K is smaller than the number of actual clusters, only K good clusters are found and the remaining data points are treated as outliers; when K is larger than the number of actual clusters, all the actual clusters can still be found but some clusters will coincide.
In noise clustering there is only one noise cluster, while in the PCM there are effectively K noise clusters.
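A sketch of the noise-clustering membership computation referenced in item 1: an extra cluster sitting at a constant distance delta from every point absorbs far-away outliers (delta and the fuzzifier m are assumed parameters):

```python
import numpy as np

def noise_clustering_memberships(X, centers, delta, m=2.0):
    """Fuzzy memberships with one extra "noise" cluster at constant distance
    delta; points far from all K centers get most of their membership there."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # N x K distances
    d2 = np.hstack([d2, np.full((len(X), 1), delta ** 2)])      # append noise cluster
    d2 = np.maximum(d2, 1e-12)                                  # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)                 # rows sum to one
```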
Cluster validity
Measures based on maximal compactness and maximal separation of the clusters.
Measures based on minimal hyper-volume and maximal density of the clusters: a good partitioning of the data usually leads to a small total hyper-volume and a large average density of the clusters.
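As one concrete example of a compactness-and-separation measure, a sketch of a crisp analogue of the Xie-Beni index is given below (smaller values indicate a better partition); it is only one of the many validity indices in this family:

```python
import numpy as np

def compactness_separation_index(X, centers, labels):
    """Total within-cluster scatter divided by N times the minimum squared
    distance between cluster centers (a crisp Xie-Beni-like index)."""
    within = sum(np.sum((X[labels == k] - c) ** 2)
                 for k, c in enumerate(centers))
    sep = min(np.sum((a - b) ** 2)
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return within / (len(X) * sep)
```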
Computer simulations
Summary
This paper provided a state-of-the-art survey of, and introduction to, neural network based clustering.
Comments
Advantage: this paper reviewed state-of-the-art clustering methods based on competitive learning. Several effective solutions to the problems that occur in these clustering methods were discussed as well.
Drawback: only a few performance indices for cluster validity were introduced in the paper.
Application: clustering methods.