《智能信息处理》 (Intelligent Information Processing) course, Lecture 4: Fuzzy Information Processing Techniques (4): Principles of Fuzzy Clustering. October 17, 2008 (Friday, periods 3 and 4, 理教 Room 110)


1 《智能信息处理》 (Intelligent Information Processing) course, Lecture 4: Fuzzy Information Processing Techniques (4): Principles of Fuzzy Clustering. October 17, 2008 (Friday, periods 3 and 4, 理教 Room 110)

2 Fuzzy Clustering  What’s clustering?  Some concepts  Clustering Algorithms  K-means method  Fuzzy C-means (FCM) clustering method  Hierarchical Clustering Algorithms  Mixture of Gaussians  Homework

3 What’s clustering ?  Clustering can be considered the most important unsupervised learning problem, it deals with finding a structure in a collection of unlabeled data.  Definition of clustering The process of organizing objects into groups whose members are similar in some way.  A cluster is a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.

4 a graphical example of clustering

5  It is easy to identify the 4 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case, geometric distance). This is called distance-based clustering.  Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if the cluster defines a concept common to all of them.

6 Vehicle Example

7 Vehicle Clusters [Figure: vehicles plotted in the feature space of weight (kg) vs. top speed (km/h), forming three clusters: sports cars, medium market cars, and lorries]

8 Terminology [Figure: the same vehicle plot (weight vs. top speed) annotated with the basic terms: object (data point), feature, feature space, cluster, and label]

9 The Goals of Clustering  To determine the intrinsic grouping in a set of unlabeled data. How to decide what constitutes a good clustering?

10 The Goals of Clustering ( 2 )  It can be shown that there is no absolute "best" criterion which would be independent of the final aim of the clustering.  Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.

11 The Goals of Clustering ( 3 )  For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding "natural clusters" and describing their unknown properties ("natural" data types), in finding useful and suitable groupings ("useful" data classes), or in finding unusual data objects (outlier detection).

12 Rich Applications of Clustering  Pattern Recognition  Spatial Data Analysis  Create thematic maps in GIS by clustering feature spaces  Detect spatial clusters or for other spatial mining tasks  Image Processing  Economic Science (especially market research)  WWW  Document classification  Cluster Weblog data to discover groups of similar access patterns

13 Examples of Clustering Applications  Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs  Land use: Identification of areas of similar land use in an earth observation database  Insurance: Identifying groups of motor insurance policy holders with a high average claim cost  City planning: Identifying groups of houses according to their house type, value, and geographical location  Earthquake studies: Observed earthquake epicenters should be clustered along continental faults

14 What is Cluster Analysis?  Cluster: a collection of data objects  Similar to one another within the same cluster  Dissimilar to the objects in other clusters  Cluster analysis  Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters  Unsupervised learning: no predefined classes  Typical applications  As a stand-alone tool to get insight into data distribution  As a preprocessing step for other algorithms

15 Requirements of a clustering algorithm  scalability;  dealing with different types of attributes;  discovering clusters with arbitrary shape;  minimal requirements for domain knowledge to determine input parameters;  ability to deal with noise and outliers;  insensitivity to the order of input records;  ability to handle high-dimensional data;  interpretability and usability.

16 Quality: What Is Good Clustering?  A good clustering method will produce high quality clusters with  high intra-class similarity  low inter-class similarity  The quality of a clustering result depends on both the similarity measure used by the method and its implementation  The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns

17 Problems  current clustering techniques do not address all the requirements adequately (and concurrently);  dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity;  the effectiveness of the method depends on the definition of "distance" (for distance-based clustering);  if an obvious distance measure doesn't exist we must "define" it, which is not always easy, especially in multi-dimensional spaces;  the result of the clustering algorithm (which in many cases can be arbitrary itself) can be interpreted in different ways.

18 Clustering Algorithms Clustering algorithms may be classified as listed below:  Exclusive Clustering  Overlapping Clustering  Hierarchical Clustering  Probabilistic Clustering

19 Exclusive Clustering  Data are grouped in an exclusive way, so that if a certain datum belongs to a definite cluster then it cannot be included in another cluster.  A simple example is shown in the figure below, where the separation of points is achieved by a straight line on a two-dimensional plane.

20 [Figure: points on a two-dimensional plane separated into exclusive clusters by a straight line]

21 Overlapping clustering  Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.  In this case, each data point is associated with an appropriate membership value.

22 Hierarchical Clustering  A hierarchical clustering algorithm is based on the union of the two nearest clusters. The starting condition is obtained by treating every datum as a cluster of its own; after a few iterations the algorithm reaches the desired final clusters.

23 Probabilistic Clustering  Probabilistic clustering uses a completely probabilistic approach for clustering the data at hand.

24 Four most used clustering algorithms  K-means  Fuzzy C-means  Hierarchical clustering  Mixture of Gaussians

25 Distance Measure  An important component of a clustering algorithm is the distance measure between data points.  If the components of the data instance vectors are all in the same physical units then it is possible that the simple Euclidean distance metric is sufficient to successfully group similar data instances.  However, even in this case the Euclidean distance can sometimes be misleading.
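As a small illustration of how the choice of units can mislead the Euclidean distance, consider the hypothetical vehicle features below; all numbers and variable names are invented for this example, not taken from the lecture.

```python
import numpy as np

# Hypothetical vehicles described by (top speed [km/h], weight).
a_kg = np.array([180.0, 1200.0])   # weight in kilograms
b_kg = np.array([240.0, 1300.0])
c_kg = np.array([185.0, 9000.0])   # a lorry-like object

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

# With weight in kilograms the weight axis dominates:
# a looks much closer to b than to c.
print(euclidean(a_kg, b_kg), euclidean(a_kg, c_kg))

# Rescaling weight to tonnes makes the speed axis dominate instead:
# a now looks much closer to c than to b, even though the data is the same.
to_tonnes = np.array([1.0, 1e-3])
print(euclidean(a_kg * to_tonnes, b_kg * to_tonnes),
      euclidean(a_kg * to_tonnes, c_kg * to_tonnes))
```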

26 [Figure: an example in which the Euclidean distance is misleading because the features are measured on very different scales]

27 K-Means Clustering  K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.  The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori.  The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; the better choice is to place them as far away from each other as possible.  The next step is to take each point of the data set and associate it with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point we re-calculate k new centroids as barycenters of the clusters resulting from the previous step, and then re-assign each data point to the nearest new centroid. This loop is repeated; the k centroids change their location step by step until they no longer move.  Finally, the algorithm aims at minimizing an objective function, in this case a squared error function (given on the next slide).

28 K-Means Clustering ( 2 )  The objective function is the squared error  J = \sum_{j=1}^{k} \sum_{i=1}^{n} || x_i^{(j)} - c_j ||^2,  where || x_i^{(j)} - c_j ||^2 is a distance measure between a data point x_i^{(j)} and the cluster centre c_j; J is an indicator of the distance of the n data points from their respective cluster centres.

29 Partitioning Algorithms: Basic Concept  Partitioning method: construct a partition of a database D of n objects into a set of k clusters such that the sum of squared distances to the cluster representatives is minimized  Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion  Global optimum: exhaustively enumerate all partitions  Heuristic methods: k-means and k-medoids algorithms  k-means (MacQueen '67): each cluster is represented by the center of the cluster  k-medoids (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster

30 The K-Means Clustering Method  Given k, the k-means algorithm is implemented in four steps:  1. Partition the objects into k nonempty subsets  2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster)  3. Assign each object to the cluster with the nearest seed point  4. Go back to Step 2; stop when no new assignments are made
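The four steps above can be transcribed almost directly into code. The following is a minimal NumPy sketch, not an optimized or definitive implementation: it assumes Euclidean distance and simply picks k random objects as the initial seed points, and all names are ours.

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k objects as the initial cluster centers (one simple choice).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 2: recompute each center as the mean (barycenter) of its cluster.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop when the centers (and hence the assignments) stop changing.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

A call like `k_means(X, 2)` on a two-dimensional data set corresponds to the K = 2 example on the next slide.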

31 The K-Means Clustering Method  Example with K = 2: arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; then reassign and repeat until the assignments no longer change.

32 Comments on the K-Means Method  Strength: relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n. For comparison: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k))  Comment: often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms  Weaknesses  Applicable only when the mean is defined; what about categorical data?  Need to specify k, the number of clusters, in advance  Unable to handle noisy data and outliers  Not suitable for discovering clusters with non-convex shapes

33 Fuzzy C-Means Clustering  Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. It is based on minimization of the following objective function:  J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^m || x_i - c_j ||^2,  1 \le m < \infty,  where m is any real number greater than 1, u_ij is the degree of membership of x_i in cluster j, x_i is the i-th of the d-dimensional measured data, c_j is the d-dimensional center of the cluster, and ||*|| is any norm expressing the similarity between measured data and the center.

34  Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the membership u_ij and the cluster centers c_j updated by  u_{ij} = 1 / \sum_{k=1}^{C} ( ||x_i - c_j|| / ||x_i - c_k|| )^{2/(m-1)}  and  c_j = ( \sum_{i=1}^{N} u_{ij}^m x_i ) / ( \sum_{i=1}^{N} u_{ij}^m ).  This iteration stops when  max_{ij} | u_{ij}^{(k+1)} - u_{ij}^{(k)} | < \epsilon,  where \epsilon is a termination criterion between 0 and 1 and k is the iteration step; the procedure converges to a local minimum (or a saddle point) of J_m.

35 FCM’s Steps 1.Initialize U=[u ij ] matrix, U (0) 2.At k-step: calculate the centers vectors C (k) =[c j ] with U (k) 3.Update U (k), U (k+1) 4.If || U (k+1) - U (k) ||< then STOP; otherwise return to step 2.

36 Remarks  As already mentioned, data are bound to each cluster by means of a membership function, which represents the fuzzy behavior of this algorithm. To do that, we simply have to build an appropriate matrix named U whose entries are numbers between 0 and 1, representing the degree of membership between data and centers of clusters.

37 A 1-D example [Figure: a one-dimensional data set whose points form two visible concentrations, i.e. two clusters]

38 The matrix U  Now, instead of using a graphical representation, we introduce a matrix U whose entries are taken from the membership functions: (a) a crisp partition, in which each entry is either 0 or 1, and (b) a fuzzy partition, in which each entry lies between 0 and 1.  The number of rows and columns depends on how many data points and clusters we are considering; more exactly, we have C = 2 columns (C = 2 clusters) and N rows.
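For concreteness, with C = 2 clusters and N = 4 data points, a crisp matrix of type (a) and a fuzzy matrix of type (b) could look as follows; the values are invented for illustration.

```python
import numpy as np

# (a) crisp partition: each datum belongs entirely to one of the two clusters.
U_crisp = np.array([[1, 0],
                    [1, 0],
                    [0, 1],
                    [0, 1]])

# (b) fuzzy partition: each row still sums to 1, but a datum can belong
#     to both clusters with different degrees of membership.
U_fuzzy = np.array([[0.9, 0.1],
                    [0.7, 0.3],
                    [0.2, 0.8],
                    [0.1, 0.9]])

assert np.allclose(U_fuzzy.sum(axis=1), 1.0)
```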

39 Other properties  Each entry of U lies between 0 and 1, and for every data point the memberships over all clusters sum to 1: \sum_{j=1}^{C} u_{ij} = 1 for each i. In the crisp case, exactly one entry per row equals 1 and the others are 0.

40 A 1-D application of the FCM Figures below show the membership value for each datum and for each cluster.

41 In the simulation, we have used a fuzziness coefficient m = 2 and we have also imposed that the algorithm terminate when the change in the membership matrix between two iterations is smaller than \epsilon = 0.3. The picture shows the initial condition, where the fuzzy distribution depends on the particular position of the clusters; no step has been performed yet, so the clusters are not identified very well. Now we can run the algorithm until the stop condition is verified. The figure below shows the final condition reached at the 8th step with m = 2 and \epsilon = 0.3:

42 Is it possible to do better? Certainly: we could use a higher accuracy, but we would also have to pay with a bigger computational effort. In the figure below we can see a better result obtained with the same initial conditions and \epsilon = 0.01, but it needed 37 steps!
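As a rough illustration of this accuracy/effort trade-off, the two runs described on these slides correspond, in the `fuzzy_c_means` sketch given earlier, to the same call with different tolerances; the data set below is invented for illustration.

```python
import numpy as np

# Two made-up 1-D clusters around 0 and 8 (illustrative data only).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 50),
                    rng.normal(8.0, 1.0, 50)]).reshape(-1, 1)

U_coarse, _ = fuzzy_c_means(X, C=2, m=2.0, eps=0.3)   # looser tolerance: fewer iterations
U_fine, _   = fuzzy_c_means(X, C=2, m=2.0, eps=0.01)  # tighter tolerance: more iterations, sharper memberships
```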

43 Hierarchical Clustering Algorithms  Given a set of N items to be clustered and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering (defined by S.C. Johnson in 1967) is this: 1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain. 2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one cluster less. 3. Compute distances (similarities) between the new cluster and each of the old clusters. 4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

44 Algorithm Steps 1. Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0. 2. Find the least dissimilar pair of clusters in the current clustering, say pair (r), (s), according to d[(r),(s)] = min d[(i),(j)], where the minimum is over all pairs of clusters in the current clustering. 3. Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to L(m) = d[(r),(s)]. 4. Update the proximity matrix D by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The proximity between the new cluster, denoted (r,s), and an old cluster (k) is defined as d[(k),(r,s)] = min{ d[(k),(r)], d[(k),(s)] }. 5. If all objects are in one cluster, stop. Otherwise, go to step 2.
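These steps translate into a short single-linkage sketch (the min rule of step 4). It is an illustrative transcription with names of our own choosing, not an optimized implementation; it takes a full N*N distance matrix such as the Italian-cities example below, where a usage example follows the tables.

```python
import numpy as np

def single_linkage(D, labels):
    """Minimal agglomerative (single-linkage) sketch.

    D is a symmetric (N, N) distance matrix, labels a list of item names.
    Returns the merges as a list of (level L(m), name of the new cluster).
    """
    D = np.asarray(D, dtype=float).copy()
    clusters = list(labels)
    merges = []
    while len(clusters) > 1:
        # Step 2: find the least dissimilar pair of clusters (r), (s).
        n = len(clusters)
        iu = np.triu_indices(n, k=1)
        flat = np.argmin(D[iu])
        r, s = iu[0][flat], iu[1][flat]
        level = D[r, s]
        # Step 3: merge (r) and (s) into one cluster at level L(m) = d[(r),(s)].
        new_name = clusters[r] + "/" + clusters[s]
        merges.append((level, new_name))
        # Step 4: the proximity between the new cluster and an old cluster (k)
        # is min{ d[(k),(r)], d[(k),(s)] } (single linkage).
        new_row = np.minimum(D[r], D[s])
        keep = [i for i in range(n) if i not in (r, s)]
        D = np.vstack([D[keep][:, keep], new_row[keep][None, :]])
        D = np.hstack([D, np.append(new_row[keep], 0.0)[:, None]])
        clusters = [clusters[i] for i in keep] + [new_name]
    return merges
```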

45 agglomerative / divisive  This kind of hierarchical clustering is called agglomerative because it merges clusters iteratively.  There is also a divisive hierarchical clustering which does the reverse by starting with all objects in one cluster and subdividing them into smaller pieces.

46 Example  a hierarchical clustering of distances in kilometers between some Italian cities

47 Input distance matrix
        BA    FI    MI    NA    RM    TO
BA       0   662   877   255   412   996
FI     662     0   295   468   268   400
MI     877   295     0   754   564   138
NA     255   468   754     0   219   869
RM     412   268   564   219     0   669
TO     996   400   138   869   669     0

48 MI and TO merged into MI/TO (the closest pair, at distance 138)
        BA    FI   MI/TO   NA    RM
BA       0   662    877   255   412
FI     662     0    295   468   268
MI/TO  877   295      0   754   564
NA     255   468    754     0   219
RM     412   268    564   219     0

49 Merge NA and RM into a new NA/RM cluster (at distance 219)
        BA    FI   MI/TO  NA/RM
BA       0   662    877    255
FI     662     0    295    268
MI/TO  877   295      0    564
NA/RM  255   268    564      0

50 Continuing in the same way, BA joins NA/RM (at 255) and then FI joins BA/NA/RM (at 268), leaving only two clusters:
             BA/FI/NA/RM   MI/TO
BA/FI/NA/RM        0        295
MI/TO            295          0
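If the `single_linkage` sketch from above is applied to this distance matrix, the same sequence of merges comes out; the expected output noted in the comment follows from the matrix values.

```python
import numpy as np

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([[  0, 662, 877, 255, 412, 996],
              [662,   0, 295, 468, 268, 400],
              [877, 295,   0, 754, 564, 138],
              [255, 468, 754,   0, 219, 869],
              [412, 268, 564, 219,   0, 669],
              [996, 400, 138, 869, 669,   0]])

for level, cluster in single_linkage(D, cities):
    print(level, cluster)
# Expected merges: MI/TO at 138, NA/RM at 219, BA/NA/RM at 255,
# FI/BA/NA/RM at 268, and the final merge of all cities at 295.
```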

51 Hierarchical tree [Figure: dendrogram summarizing the sequence of merges above]

52 Clustering as a Mixture of Gaussians  This is a model-based approach, which consists in using certain models for clusters and attempting to optimize the fit between the data and the model.  Each cluster can be mathematically represented by a parametric distribution, like a Gaussian (continuous) or a Poisson (discrete). The entire data set is therefore modeled by a mixture of these distributions. An individual distribution used to model a specific cluster is often referred to as a component distribution.

53 A mixture model with high likelihood tends to have the following traits:  component distributions have high “peaks” (data in one cluster are tight);  the mixture model “covers” the data well (dominant patterns in the data are captured by component distributions). Main advantages of model-based clustering:  well-studied statistical inference techniques available;  flexibility in choosing the component distribution;  obtain a density estimation for each cluster;  a “soft” classification is available.

54 Mixture of Gaussians

55 The algorithm works in the following way: it chooses a component (a Gaussian) at random with probability P(\omega_i), and it samples a point from that Gaussian, N(\mu_i, \sigma^2). Let's suppose we have the sample x_1, x_2, ..., x_N. We can obtain the likelihood of the sample, P(x | \omega_i, \mu_1, ..., \mu_k). What we really want to maximise is P(x | \mu_1, ..., \mu_k), the probability of a datum given the centres of the Gaussians.

56 P(x | \mu) = \sum_i P(x | \omega_i, \mu) P(\omega_i) is the basis for writing the likelihood function:  L(\mu) = \prod_{n=1}^{N} P(x_n | \mu).  Now we should maximise the likelihood function by solving \partial L / \partial \mu_i = 0, but this would be too difficult. That's why we use a simplified algorithm called EM (Expectation-Maximization).
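As a rough illustration of the E and M steps, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is a generic, textbook-style sketch under our own naming and initialization choices, not the exact derivation above.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=50, seed=0):
    """Minimal EM sketch for a 1-D Gaussian mixture with k components."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)    # initial means
    sigma = np.full(k, x.std())                  # initial standard deviations
    pi = np.full(k, 1.0 / k)                     # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each datum,
        # r[n, i] proportional to pi_i * N(x_n | mu_i, sigma_i^2).
        dens = (pi / (np.sqrt(2 * np.pi) * sigma)
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma
```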

57 References  Tariq Rashid: "Clustering"  Osmar R. Zaïane: "Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering"  Pier Luca Lanzi: "Ingegneria della Conoscenza e Sistemi Esperti - Lezione 2: Apprendimento non supervisionato"  J. C. Dunn (1973): "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics, 3  J. C. Bezdek (1981): "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York  Hans-Joachim Mucha and Hizir Sofyan: "Nonhierarchical Clustering"  A. P. Dempster, N. M. Laird, and D. B. Rubin (1977): "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, vol. 39, no. 1: 1-38  Jia Li: "Data Mining - Clustering by Mixture Models"

58 Homework 1. Why do we need cluster analysis? What is it useful for? 2. List some application areas of clustering algorithms and briefly explain each. 3. Implement the FCM algorithm and use it on a 2-D data clustering problem; report the experimental results.

59 Thank you!