PARTITIONAL CLUSTERING

PARTITIONAL CLUSTERING Deniz ÜSTÜN

CONTENTS What is clustering? What is partitional clustering? Algorithms used in partitional clustering

What is Clustering? Clustering is the process of grouping objects that are similar to one another, that is, organizing data into groups (clusters). Clustering techniques are unsupervised methods: they do not rely on predefined class labels.

What is Partitional Clustering? Partitional clustering algorithms divide a set of objects into clusters of similar objects, and they are successful at finding center-based clusters. Given the parameter k, a partitional algorithm divides n objects into k clusters. Partitional clustering techniques start from a randomly chosen clustering and then iteratively optimize it according to some quality criterion.

Algorithms Used in Partitional Clustering: K-MEANS ALGORITHM, K-MEDOIDS ALGORITHM, FUZZY C-MEANS ALGORITHM

K-MEANS ALGORITHM The K-MEANS algorithm, introduced by J.B. MacQueen in 1967 (MacQueen, 1967), is one of the simplest unsupervised learning algorithms for solving the clustering problem. K-MEANS assigns each data point to exactly one cluster; it is therefore a hard (crisp) clustering algorithm. Suppose we are given N samples in an n-dimensional space.

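K-MEANS ALGORITHM Suppose these samples are partitioned into K clusters {C1, C2, …, CK}, where cluster Ck contains nk samples. The mean vector Mk of cluster Ck is given by (Kantardzic, 2003):

$$M_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ik}$$

where x_ik is the i-th sample belonging to Ck. The square error of Ck is given by:

$$e_k^2 = \sum_{i=1}^{n_k} \lVert x_{ik} - M_k \rVert^2$$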
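
To make the definitions of M_k and e_k^2 concrete, here is a minimal pure-Python sketch that computes them for a given partition, together with their sum over all clusters. The function names and the toy partition are hypothetical illustrations, not part of the original slides; points are assumed to be tuples of floats and the distance is squared Euclidean.

```python
def cluster_mean(points):
    """M_k: component-wise mean of the n_k samples in cluster C_k."""
    n_k = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n_k for d in range(dim))

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_cluster_error(points):
    """e_k^2: sum of squared distances from each sample of C_k to M_k."""
    m_k = cluster_mean(points)
    return sum(squared_distance(p, m_k) for p in points)

def total_square_error(partition):
    """Sum of e_k^2 over all clusters of the partition."""
    return sum(within_cluster_error(pts) for pts in partition.values())

# Hypothetical toy partition, for illustration only.
partition = {
    "C1": [(1.0, 1.0), (3.0, 3.0)],
    "C2": [(8.0, 8.0), (10.0, 8.0)],
}
print(cluster_mean(partition["C1"]))          # (2.0, 2.0)
print(within_cluster_error(partition["C1"]))  # 4.0
print(total_square_error(partition))          # 6.0
```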

K-MEANS ALGORITHM The square error of a cluster Ck is also called the within-cluster variation. The square error of the whole clustering is the sum of the within-cluster variations:

$$E_K^2 = \sum_{k=1}^{K} e_k^2$$

The aim of the square-error method is to find, for a given K, the partition into K clusters that minimizes $E_K^2$.
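
A minimal sketch of how the squared error E_K^2 is driven down in practice. It assumes the batch (Lloyd-style) variant of K-MEANS, which alternates an assignment step and a mean-update step; MacQueen's original 1967 procedure instead updates a mean incrementally each time a point is assigned. The names k_means, max_iter and seed are illustrative choices, and the initial means are drawn at random from the data, in line with the random starting clustering described for partitional methods.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, k, max_iter=100, seed=0):
    """Batch k-means sketch: returns (centers, labels, E_K^2)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # k samples chosen at random as initial means
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins exactly one cluster, the nearest mean.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:      # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each mean M_k from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # an empty cluster keeps its previous mean
                centers[j] = mean(members)
    error = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, error

pts = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (8.0, 8.0), (8.5, 8.0), (8.0, 8.5)]
centers, labels, error = k_means(pts, k=2)
print(round(error, 4))  # 0.6667
```

Because the result depends on the random initial means, the procedure can end in a local minimum of E_K^2 on less clearly separated data; rerunning with different seeds and keeping the lowest error is a common remedy.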

K-MEANS ALGORITHM EXAMPLE
Observation   Variable 1   Variable 2   Cluster membership
X1            3            2            C1
X2                                      C2
X3            7            8

K-MEANS ALGORITHM EXAMPLE

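K-MEANS ALGORITHM EXAMPLE Distances of each observation to the cluster means M1 and M2, and the resulting cluster membership:
X1: d(M1) = 2.82, d(M2) = 1.41, cluster C2
X2: 3.60
X3: 7.07, cluster C1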

K-MEANS ALGORITHM EXAMPLE
Observation   Variable 1   Variable 2   Cluster membership
X1            3            2            C2
X2
X3            7            8            C1

K-MEANS ALGORITHM EXAMPLE

K-MEANS ALGORITHM EXAMPLE-1 Distances of each observation to the cluster means M1 and M2, and the resulting cluster membership:
X1: d(M1) = 7.21, d(M2) = 0.7, cluster C2
X2: 7.07
X3: 7.10, cluster C1

K-MEANS ALGORITHM EXAMPLE-2
Dataset     Number of samples   Number of features   Number of classes
Synthetic   1200                2                    4

K-MEANS ALGORITHM EXAMPLE-2

K-MEDOIDS ALGORITHM The aim of the K-MEDOIDS algorithm is to find K representative objects (Kaufman and Rousseeuw, 1987). In K-MEDOIDS, each cluster is represented by one of its own objects, the medoid, which is the most centrally located object of the cluster. Whereas K-MEANS represents each cluster by the mean of its objects, K-MEDOIDS represents it by this central object.
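
The difference between a mean and a medoid shows up clearly when a cluster contains an outlier. This small sketch assumes Euclidean distance and takes the medoid to be the member with the smallest total distance to all members; the helper names and the sample cluster are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def medoid(points):
    """Member of the cluster with the smallest total Euclidean distance to all members."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (20, 20)]  # hypothetical cluster with one outlier
print(mean(cluster))    # (5.2, 5.2)
print(medoid(cluster))  # (2, 2)
```

The mean is dragged toward the outlier to a location that no actual object occupies, while the medoid stays on one of the four tightly grouped objects.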

K-MEDOIDS ALGORITHM EXAMPLE-1

K-MEDOIDS ALGORITHM EXAMPLE-1 Randomly Select K Medoids

K-MEDOIDS ALGORITHM EXAMPLE-1 Allocate Each Point to the Closest Medoid

K-MEDOIDS ALGORITHM EXAMPLE-1 Determine the New Medoid of Each Cluster

K-MEDOIDS ALGORITHM EXAMPLE-1 Allocate Each Point to the Closest Medoid

K-MEDOIDS ALGORITHM EXAMPLE-1 Stop Process
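
The steps of EXAMPLE-1 (random selection of medoids, allocation to the closest medoid, a new medoid per cluster, repeat until nothing changes) can be sketched as follows. This is the simple alternating variant of K-MEDOIDS, assumed here for illustration; it is not the swap-based search of Kaufman and Rousseeuw's PAM. The init_medoids parameter is a hypothetical addition that fixes the starting medoids for reproducibility; Euclidean distance is assumed.

```python
import math
import random

def medoid(cluster):
    """Member with the smallest total Euclidean distance to the other members."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))

def nearest(p, medoids):
    """Index of the medoid closest to point p."""
    return min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))

def k_medoids(points, k, init_medoids=None, max_iter=100, seed=0):
    # Step 1: select k medoids at random (or use the supplied starting medoids).
    medoids = list(init_medoids) if init_medoids else random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Step 2: allocate each point to its closest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, medoids)].append(p)
        # Step 3: determine the new medoid of each cluster (keep the old one if empty).
        new_medoids = [medoid(c) if c else m for c, m in zip(clusters, medoids)]
        # Step 4: stop once no medoid changes.
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [nearest(p, medoids) for p in points]
    return medoids, labels

pts = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (8.0, 8.0), (8.5, 8.0), (8.0, 8.5)]
medoids, labels = k_medoids(pts, 2, init_medoids=[(1.0, 1.0), (1.5, 1.0)])
print(medoids)  # [(1.0, 1.0), (8.0, 8.0)]
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Here both starting medoids lie in the same group, yet one iteration moves the second medoid to the other group. As with K-MEANS, the outcome can depend on the starting medoids, so running from several random starts is a reasonable precaution.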

K-MEDOIDS ALGORITHM EXAMPLE-2
Dataset     Number of samples   Number of features   Number of classes
Synthetic   2000                2                    3

K-MEDOIDS ALGORITHM EXAMPLE-2

FUZZY C-MEANS ALGORITHM The FUZZY C-MEANS algorithm is the best-known and most widely used fuzzy clustering method. It was introduced by Dunn in 1973 and improved by Bezdek in 1981 [Höppner et al., 2000]. FUZZY C-MEANS allows an object to belong to two or more clusters. The membership values of a data point across all clusters sum to one, and the membership is highest for the cluster to which the object most strongly belongs. The algorithm is based on the least-squares method [Höppner et al., 2000].

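FUZZY C-MEANS ALGORITHM The algorithm starts from a randomly initialized membership matrix U, whose entry $u_{ij}$ is the membership of object $x_i$ in cluster j, and then computes the center vectors [Höppner et al., 2000]:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}$$

where N is the number of objects and m > 1 is the fuzzifier (weighting exponent).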
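
A minimal sketch of the initialization and center-update step, assuming the standard FUZZY C-MEANS center formula with fuzzifier m = 2 (a common default, not a value given on the slides). The function names are hypothetical. With a crisp (0/1) membership matrix the weighted centers reduce to ordinary cluster means, which makes a convenient sanity check.

```python
import random

def random_membership_matrix(n, c, seed=0):
    """n x c matrix of random memberships; each row is normalized to sum to one."""
    rng = random.Random(seed)
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    return u

def fcm_centers(points, u, m=2.0):
    """Center of cluster j: mean of all points weighted by u_ij ** m."""
    n, dim, c = len(points), len(points[0]), len(u[0])
    centers = []
    for j in range(c):
        w = [u[i][j] ** m for i in range(n)]
        w_sum = sum(w)
        centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / w_sum
                             for d in range(dim)))
    return centers

pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
u0 = random_membership_matrix(len(pts), 2)
print(fcm_centers(pts, u0))     # fuzzy centers from a random start
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(fcm_centers(pts, crisp))  # [(1.0, 0.0), (10.0, 10.0)]
```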

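FUZZY C-MEANS ALGORITHM Using the computed center vectors, the membership matrix U is updated as:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}$$

where C is the number of clusters and m > 1 is the fuzzifier. The new membership matrix $U_{new}$ is compared with the old membership matrix $U_{old}$, and the process repeats until the difference between them is smaller than ε.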
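
Putting the center and membership updates together gives the full iteration. This is a sketch under stated assumptions: Euclidean distance, m = 2 by default, ε measured as the largest absolute change of any entry of U between rounds, and a point that lies exactly on a center receiving all of its membership from the cluster(s) at zero distance. The names fcm, eps, max_iter and seed are illustrative.

```python
import math
import random

def fcm(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-Means sketch: returns (centers, U), U[i][j] = membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Random initial membership matrix; each row is normalized to sum to one.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / w_sum
                                 for d in range(dim)))

        # Membership update from the distances to the new centers.
        u_new = []
        for p in points:
            dists = [math.dist(p, cj) for cj in centers]
            if 0.0 in dists:
                # The point sits exactly on one or more centers: share membership among them.
                hits = [d == 0.0 for d in dists]
                u_new.append([1.0 / sum(hits) if h else 0.0 for h in hits])
            else:
                u_new.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                              for dj in dists])

        # Stop when the largest change of any membership value is below eps.
        change = max(abs(a - b) for row_new, row_old in zip(u_new, u)
                     for a, b in zip(row_new, row_old))
        u = u_new
        if change < eps:
            break
    return centers, u

pts = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (8.0, 8.0), (8.5, 8.0), (8.0, 8.5)]
centers, u = fcm(pts, c=2)
```

On two well-separated groups like these, each point ends up with a membership close to one in its own group's cluster and a small but non-zero membership in the other, which is the soft assignment that distinguishes FUZZY C-MEANS from K-MEANS.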

FUZZY C-MEANS ALGORITHM EXAMPLE
Dataset     Number of samples   Number of features   Number of classes
Synthetic   2000                2                    3

FUZZY C-MEANS ALGORITHM EXAMPLE

Results In these experiments, K-MEDOIDS gave the best results compared with K-MEANS and FUZZY C-MEANS. However, K-MEDOIDS is suitable only for small datasets. K-MEANS is the most suitable algorithm in terms of running time. In FUZZY C-MEANS, an object can belong to more than one cluster, whereas in the other two algorithms each object belongs to exactly one cluster.

References
[MacQueen, 1967] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Proc. 5th Symp. Math. Statist. and Probability, 281–297, (1967).
[Kantardzic, 2003] M. Kantardzic, "Data Mining: Concepts, Models, Methods, and Algorithms", Wiley, (2003).
[Kaufman and Rousseeuw, 1987] L. Kaufman, P. J. Rousseeuw, "Clustering by Means of Medoids", in Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge (ed.), North-Holland, 405–416, (1987).
[Kaufman and Rousseeuw, 1990] L. Kaufman, P. J. Rousseeuw, "Finding Groups in Data: An Introduction to Cluster Analysis", John Wiley and Sons, (1990).
[Höppner et al., 2000] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, "Fuzzy Cluster Analysis", John Wiley & Sons, Chichester, (2000).
[Işık and Çamurcu, 2007] M. Işık, A. Y. Çamurcu, "K-MEANS, K-MEDOIDS ve Bulanık C-MEANS Algoritmalarının Uygulamalı olarak Performanslarının Tespiti" [An applied performance evaluation of the K-MEANS, K-MEDOIDS and Fuzzy C-MEANS algorithms], İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, No. 11, 31–45, (2007).