Fuzzy C-Means Clustering

Presentation transcript:

Fuzzy C-Means Clustering Prepared by: Châu Vĩnh Tuân - 50802429, Phạm Nguyên Trình - 50802353

What is clustering? Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.

Where has clustering long played an important role? Clustering for Understanding: Biology; Information Retrieval; Climate; Psychology and Medicine; Business. Clustering for Utility: Summarization; Compression; Efficiently Finding Nearest Neighbors.

Different Types of Clusterings: Hierarchical versus Partitional; Exclusive versus Overlapping versus Fuzzy; Complete versus Partial.

Hierarchical versus Partitional [Figure panels: Traditional, Non-Traditional]

Exclusive versus Overlapping versus Fuzzy. Exclusive versus Overlapping (non-exclusive): in non-exclusive clusterings, points may belong to multiple clusters, which can represent multiple classes or ‘border’ points. Fuzzy: in fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1, and the weights of each point must sum to 1. Probabilistic clustering has similar characteristics.
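The two fuzzy constraints on this slide (weights between 0 and 1, summing to 1 for each point) can be checked mechanically. The sketch below is only an illustration: the function names `is_fuzzy_partition` and `harden` and the sample weights are made up for this example, and taking the largest weight per point is just one simple way to recover an exclusive clustering from a fuzzy one.

```python
def is_fuzzy_partition(weights, tol=1e-9):
    """Return True if every row holds weights in [0, 1] that sum to 1."""
    return all(
        all(0.0 <= w <= 1.0 for w in row) and abs(sum(row) - 1.0) <= tol
        for row in weights
    )


def harden(weights):
    """Pick, for each point, the index of the cluster with the largest weight.

    Ties go to the lowest cluster index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


# Hypothetical weights: one row per point, one column per cluster.
weights = [
    [0.9, 0.1],  # clearly in cluster 0
    [0.5, 0.5],  # a 'border' point shared equally
    [0.2, 0.8],  # mostly in cluster 1
]

valid = is_fuzzy_partition(weights)
labels = harden(weights)
```

The border point keeps its shared membership in the fuzzy view and only loses it when the clustering is forced to be exclusive.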

Complete versus Partial. Complete: all data must be clustered. Partial: only some useful data is clustered.

Different Types of Clusters Well-Separated Prototype-Based Graph-Based Density-Based Shared-Property (Conceptual Clusters)

Some important algorithms. We preview the following three simple but important techniques to introduce many of the concepts involved in cluster analysis. K-means. This is a prototype-based, partitional clustering technique that attempts to find a user-specified number of clusters (K), which are represented by their centroids. Agglomerative Hierarchical Clustering. This clustering approach refers to a collection of closely related clustering techniques that produce a hierarchical clustering by starting with each point as a singleton cluster and then repeatedly merging the two closest clusters until a single, all-encompassing cluster remains. Some of these techniques have a natural interpretation in terms of graph-based clustering, while others have an interpretation in terms of a prototype-based approach. DBSCAN. This is a density-based clustering algorithm that produces a partitional clustering, in which the number of clusters is automatically determined by the algorithm. Points in low-density regions are classified as noise and omitted; thus, DBSCAN does not produce a complete clustering.

Fuzzy Logic. Fuzzy logic is a form of many-valued logic. Fuzzy logic variables may have a truth value that ranges in degree between 0 and 1.

Fuzzy Set. Fuzzy sets are sets whose elements have degrees of membership. A fuzzy set is a pair (A, m) where A is a set and m : A → [0, 1]. For each x ∈ A, m(x) is called the grade of membership of x in (A, m). For a finite set A = {x1, ..., xn}, the fuzzy set (A, m) is often denoted by {m(x1)/x1, ..., m(xn)/xn}. m(x) = 0: x is not included in (A, m). m(x) = 1: x is fully included in (A, m).
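As a small illustration of the definition, a finite fuzzy set (A, m) can be modelled as a mapping from each element to its grade of membership. This is a sketch only: the helper names (`make_fuzzy_set`, `notation`, `fully_included`, `not_included`) and the sample grades are hypothetical and chosen for this example.

```python
def make_fuzzy_set(grades):
    """Build a finite fuzzy set (A, m) from a dict: element -> grade m(x)."""
    for x, g in grades.items():
        if not 0.0 <= g <= 1.0:
            raise ValueError(f"grade for {x!r} must lie in [0, 1], got {g}")
    return dict(grades)


def notation(fuzzy_set):
    """Render the set in the {m(x1)/x1, ..., m(xn)/xn} notation."""
    return "{" + ", ".join(f"{g}/{x}" for x, g in fuzzy_set.items()) + "}"


def fully_included(fuzzy_set):
    """Elements with m(x) = 1."""
    return [x for x, g in fuzzy_set.items() if g == 1.0]


def not_included(fuzzy_set):
    """Elements with m(x) = 0."""
    return [x for x, g in fuzzy_set.items() if g == 0.0]


# Hypothetical grades for three elements x1, x2, x3.
example = make_fuzzy_set({"x1": 0.0, "x2": 0.4, "x3": 1.0})
text = notation(example)
```

Here x2 is the interesting case: with grade 0.4 it is neither fully included nor excluded, which an ordinary set cannot express.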

Fuzzy C-Means Clustering. Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. It is frequently used in pattern recognition.

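Fuzzy C-Means Clustering. FCM is based on minimization of the following objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \qquad m > 1$$
where N is the number of data points and C is the number of clusters; m is any real number greater than 1; u_ij is the degree of membership of x_i in cluster j; x_i is the i-th d-dimensional measured data point; c_j is the d-dimensional center of cluster j; and ||*|| is any norm expressing the similarity between a measured data point and a center.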
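To make the objective concrete, the sketch below evaluates it for a tiny data set. It assumes the standard FCM objective (the sum over all points i and clusters j of u_ij^m times the squared distance between x_i and c_j) and takes the norm to be Euclidean, which is only one of the norms the slide allows. The function name `fcm_objective` and the sample values are illustrative.

```python
import math


def fcm_objective(points, centers, U, m=2.0):
    """Sum over points i and clusters j of u_ij^m * ||x_i - c_j||^2.

    Assumes the Euclidean norm; U[i][j] is the membership of point i
    in cluster j.
    """
    total = 0.0
    for i, x in enumerate(points):
        for j, c in enumerate(centers):
            total += (U[i][j] ** m) * math.dist(x, c) ** 2
    return total


# Two points that sit exactly on two centers.
points = [(0.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]

crisp = [[1.0, 0.0], [0.0, 1.0]]  # each point fully in its own cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]  # each point split evenly

j_crisp = fcm_objective(points, centers, crisp)  # 0.0
j_fuzzy = fcm_objective(points, centers, fuzzy)  # 8.0
```

With m = 2, the evenly split memberships cost 0.25 * 16 = 4 per point for the far center, so the objective rises from 0 to 8. Minimizing J_m therefore pushes membership toward nearby centers.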

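FCM algorithm. The algorithm is composed of the following steps. Step 1: initialize the membership matrix U = [u_ij], U(0). Step 2: at step k, calculate the center vectors C(k) = [c_j] from U(k):
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$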

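FCM algorithm (continued). Step 3: update U(k) to U(k+1):
$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$
Step 4: if ||U(k+1) - U(k)|| < ε, where the norm is taken as max_ij |u_ij(k+1) - u_ij(k)| and ε is a small termination threshold, then STOP; otherwise return to step 2.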
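The steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. It assumes the standard FCM updates (centers as means weighted by u_ij^m, memberships from ratios of distances raised to the power 2/(m-1)), the Euclidean norm, and a random initial membership matrix. The function names, the `max_iter` safeguard, the seed, and the handling of a point that coincides with a center are all choices made for this example.

```python
import math
import random


def update_centers(points, U, n_clusters, m):
    """Step 2: each center is the mean of the points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        w = [U[i][j] ** m for i in range(len(points))]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[d] for wi, p in zip(w, points)) / total
            for d in range(dim)
        ))
    return centers


def update_memberships(points, centers, m):
    """Step 3: recompute u_ij from the distances to every center."""
    exponent = 2.0 / (m - 1.0)
    U = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # Assumption: a point lying on a center belongs fully to it.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                   for dj in dists]
        U.append(row)
    return U


def fuzzy_c_means(points, n_clusters, m=2.0, eps=1e-5, max_iter=300, seed=0):
    rng = random.Random(seed)
    # Step 1: random U(0) with each row normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        s = sum(row)
        U.append([w / s for w in row])
    for _ in range(max_iter):
        centers = update_centers(points, U, n_clusters, m)
        new_U = update_memberships(points, centers, m)
        # Step 4: largest absolute change in any membership value.
        change = max(abs(a - b)
                     for new_row, old_row in zip(new_U, U)
                     for a, b in zip(new_row, old_row))
        U = new_U
        if change < eps:
            break
    return centers, U


# Two well-separated groups of three points each (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, U = fuzzy_c_means(points, n_clusters=2)
labels = [row.index(max(row)) for row in U]
```

Which group ends up as cluster 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.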

FCM advantages. FCM works well on overlapping data sets, where it gives comparatively better results than the k-means algorithm. Unlike k-means, where each data point must belong exclusively to one cluster center, FCM assigns each data point a membership in every cluster center, so a data point may belong to more than one cluster.

FCM disadvantages. The number of clusters must be specified a priori. A lower value of ε gives a better result, but at the expense of more iterations. Euclidean distance measures can weight underlying factors unequally.

FCM demo http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html