INITIALISATION OF K-MEANS

A COMPLEMENTARY SQUARE-ERROR CLUSTERING CRITERION AND INITIALISATION OF K-MEANS
BORIS MIRKIN
Data Analysis & Machine Intelligence, National Research University Higher School of Economics, Moscow RF
Computer Science & Information Systems, Birkbeck University of London, UK
Support of the NRU HSE Academic Fund Program (grant 16-01-0085, a subsidy granted to the NRU HSE by the RF Government for the implementation of the Global Competitiveness Program in 2016-2017) is acknowledged.
BigData and DM: Paris, 7-8 September 2017

CONTENTS
1. Batch k-means algorithm and criterion
2. Data scatter decomposition and the complementary clustering criterion (CCC)
3. A review of k-means initialization
4. Extracting anomalous clusters one-by-one
5. Experiments with ik-means clustering
6. Ward agglomeration (WA) preceded by ik-means
7. Affinity Propagation with CCC
8. PAC = AP-CCC + WA: capturing the right number of clusters

[Figure: one K-Means iteration over centers c_k and cluster sets S_k (k=1,…,K) — (a) Initialize; (b) Assign entities to the nearest center; (c) Cluster update; (d) Center update.]

K-MEANS
Cluster k: center c_k and set S_k (k=1,…,K).
0. Initialize: specify K, the number of clusters, and initial centers c_k (k=1,…,K).
1. Cluster update: update the sets S_k (k=1,…,K) using the Minimum distance rule.
2. Center update: update the centers c_k (k=1,…,K) as the means of the S_k.
3. Halt condition: if the new centers coincide with the previous ones, stop; else go to 1.
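A minimal NumPy sketch of this batch K-Means loop (the function name batch_kmeans and the fallback for a cluster that empties out are my own choices, not prescribed by the slides):

```python
import numpy as np

def batch_kmeans(Y, centers, max_iter=100):
    """Batch K-Means: alternate the Minimum distance rule and the
    center update until the centers stop changing."""
    c = centers.copy()
    for _ in range(max_iter):
        # Cluster update: assign every entity to its nearest center
        d2 = ((Y[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Center update: means of the updated sets S_k (keep old center if S_k empties)
        new_c = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else c[k]
                          for k in range(len(c))])
        # Halt condition: new centers coincide with the previous ones
        if np.allclose(new_c, c):
            break
        c = new_c
    return labels, c
```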

K-MEANS CRITERION
Find a partition S and centers c to minimize
W(S,c) = \sum_{k=1}^{K} \sum_{i \in S_k} d(y_i, c_k)
Criterion: the sum of distances between entities and the centers of their clusters.
Distance (squared Euclidean):
X = [1, 2, -2], Y = [1, -1, -1]
X - Y = [1-1, 2-(-1), -2-(-1)] = [0, 3, -1]
d(X,Y) = <X - Y, X - Y> = 0^2 + 3^2 + (-1)^2 = 10
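The same arithmetic in NumPy, just to make the worked example executable:

```python
import numpy as np

X = np.array([1, 2, -2])
Y = np.array([1, -1, -1])
d = ((X - Y) ** 2).sum()   # <X - Y, X - Y> = 0^2 + 3^2 + (-1)^2
print(d)                   # 10
```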

PYTHAGOREAN DECOMPOSITION
K-Means criterion:
W(S,c) = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v=1}^{V} (y_{iv} - c_{kv})^2
Decomposition:
W(S,c) = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v=1}^{V} (y_{iv}^2 - 2 y_{iv} c_{kv} + c_{kv}^2) = \sum_{i} \sum_{v} y_{iv}^2 - \sum_{k=1}^{K} N_k <c_k, c_k> = T - D(S,c)
so that T = D(S,c) + W(S,c):
Data_Scatter = "Explained" + "Unexplained"

K-Means minimizes
W(S,c) = \sum_{k=1}^{K} \sum_{i \in S_k} d(y_i, c_k)
Complementary criterion: maximize
D(S,c) = \sum_{k=1}^{K} N_k <c_k, c_k>
over S and c, where N_k is the number of entities in S_k.
Data scatter T = \sum_i \sum_v y_{iv}^2 = W(S,c) + D(S,c).
<c_k, c_k> is the squared Euclidean distance between 0 and c_k.
The data scatter is constant while partitioning.
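A quick NumPy check of the decomposition T = W(S,c) + D(S,c) on random data and an arbitrary partition (the data and partition here are invented purely to verify the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 4))
Y = Y - Y.mean(axis=0)           # pre-center: the grand mean becomes 0
labels = np.arange(50) % 3       # an arbitrary partition into K = 3 clusters

T = (Y ** 2).sum()               # data scatter
W = sum(((Y[labels == k] - Y[labels == k].mean(axis=0)) ** 2).sum()
        for k in range(3))       # within-cluster squared error W(S, c)
D = sum((labels == k).sum() * (Y[labels == k].mean(axis=0) ** 2).sum()
        for k in range(3))       # "explained" part: sum of N_k * <c_k, c_k>
print(np.isclose(T, W + D))      # True: T = W(S,c) + D(S,c)
```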

COMPLEMENTARY CRITERION
Maximize
D(S,c) = \sum_{k=1}^{K} N_k <c_k, c_k>
Pre-center the data: 0 is the grand mean.
<c_k, c_k> is the squared Euclidean distance from 0 to c_k.
Look for anomalous and populated clusters, further away from the origin!

DETERMINING THE NUMBER OF CLUSTERS: TWO WAYS
A) Extract clusters one by one: find an anomalous cluster, remove it, and repeat (first, second, …).
B) Determine objects that are both most distant and representative, in parallel; then run k-means from them.

ANOMALOUS CLUSTER
0. Initial center c is the object farthest away from 0.
1. Cluster update: if d(y_i, c) < d(y_i, 0), assign y_i to S.
2. Centroid update: c' is the within-S mean.
3. If c' differs from c, go to 1 with c set to c'; otherwise, halt.
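A minimal sketch of extracting one Anomalous cluster as just described, assuming the data matrix is already centered so that 0 is the grand mean (the helper name anomalous_cluster is mine):

```python
import numpy as np

def anomalous_cluster(Y):
    """One Anomalous cluster over pre-centered data (0 = grand mean):
    start from the entity farthest from 0, then alternate the assignment
    rule d(y_i, c) < d(y_i, 0) with the centroid update until c stabilizes."""
    c = Y[((Y ** 2).sum(axis=1)).argmax()]                 # farthest entity from 0
    while True:
        in_S = ((Y - c) ** 2).sum(axis=1) < (Y ** 2).sum(axis=1)
        new_c = Y[in_S].mean(axis=0)                       # within-S mean
        if np.allclose(new_c, c):
            return in_S, c
        c = new_c
```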

FINDING AN ANOMALOUS CLUSTER (MIRKIN 1998, CHIANG & MIRKIN 2010)
Anomalous cluster S with center c: maximize |S| <c, c>.
0 is the reference point (the grand mean). Build the anomalous cluster S with its center c.

Anomalous Cluster is (almost) K-Means up to:
(i) the number of clusters is K = 2: the "anomalous" one and the "main body" of entities around 0;
(ii) the center of the "main body" cluster is forcibly always at 0;
(iii) natural initialization: c0 is at the entity which is farthest away from 0.

IK-MEANS
1. Pre-center the data matrix to the grand mean; set a threshold t (= 1 by default).
2. Find an Anomalous cluster and store its center and size.
3. Remove the Anomalous cluster from the data set. Halt if the set becomes empty; else go to 2.
4. Initialize k-means with the centers of those anomalous clusters whose size is at least t.
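The slide's procedure, sketched by chaining the anomalous_cluster and batch_kmeans helpers from the earlier sketches (an illustration under those assumptions, not the authors' reference implementation):

```python
import numpy as np

def ik_means(Y, t=1):
    """ik-means sketch: peel off Anomalous clusters one by one from the
    grand-mean-centered data, then run K-Means from the centers of the
    anomalous clusters whose size is at least t."""
    Y = Y - Y.mean(axis=0)                 # 1. pre-center to the grand mean
    remaining = np.arange(len(Y))
    centers, sizes = [], []
    while remaining.size > 0:              # 2-3. extract and remove clusters
        in_S, c = anomalous_cluster(Y[remaining])
        centers.append(c)
        sizes.append(in_S.sum())
        remaining = remaining[~in_S]
    keep = np.array([c for c, n in zip(centers, sizes) if n >= t])
    return batch_kmeans(Y, keep)           # 4. K-Means from the surviving centers
```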

CLUSTERING EXPERIMENTS (CHIANG & MIRKIN 2010)
iK-Means is superior in cluster recovery (Chiang, Mirkin, Journal of Classification, 2010) over:
Method                                    Acronym
Calinski and Harabasz index               CH
Hartigan rule                             HK
Gap statistic                             GS
Jump statistic                            JS
Silhouette width                          SW
Consensus distribution area               CD
Average distance between partitions       DD
Square error iK-Means                     LS
Absolute error iK-Means                   LM

ACCELERATING WARD CLUSTERING
Agglomerative Ward clustering:
1. Start with the trivial partition S = {{1}, {2}, …, {N}}.
2. Given S, form S(k,l) by merging the clusters S_k and S_l that minimize w(k,l) = W(S(k,l), c(k,l)) - W(S,c).
3. Stop when S = {1, 2, …, N}.

min (k,l) = W(S(k,l),c(k,l)) – W(S,c) ACCELERATING WARD CLUSTERING (AMORIM, MAKARENKOV, MIRKIN, 2016) A-Ward clustering: 1. 2. Start with S resulting from ik-means at t=1 Given S, form S(k,l) by merging Sk and Sl min (k,l) = W(S(k,l),c(k,l)) – W(S,c) Stop when S={1,2,…, N} Experimental result: 3. A-Ward always gives a larger K; is about 10-15 times faster than Ward at similar cluster recovery 16 BigData and DM: Paris 7-8 September 2017

PARALLEL CLUSTER CENTERS, 1
Affinity Propagation (Frey & Dueck, 2008):
1. Similarity field: s(i,j) = -d^2(i,j); preferences r(i,i) = const; a(i,i) = 0. Here r is the "responsibility" and a the "availability".
2. Exchange process (damped message passing):
r(i,j) <- (r(i,j) + s(i,j) - max_{l != j} [a(i,l) + s(i,l)]) / 2
a(i,j) <- (a(i,j) + min[0, r(j,j) + \sum_{l != i,j} max(0, r(l,j))]) / 2
3. Choose the entities i with maximal responsibility r(i,i) as cluster centers.

PARALLEL CLUSTER CENTERS, 2
Affinity Propagation at the Complementary criterion (APC):
1. Similarity field: s(i,j) = <y_i, y_j>; preferences r(i,i) = <y_i, y_i>; a(i,i) = 0.
2. Exchange process (as above):
r(i,j) <- (r(i,j) + s(i,j) - max_{l != j} [a(i,l) + s(i,l)]) / 2
a(i,j) <- (a(i,j) + min[0, r(j,j) + \sum_{l != i,j} max(0, r(l,j))]) / 2
3. Choose the entities i with responsibility r(i,i) greater than the mean.

PARALLEL CLUSTER CENTERS, 3
Affinity Propagation at the Complementary criterion (APC):
1. Similarity field: s(i,j) = <y_i, y_j>; preferences r(i,i) = <y_i, y_i> ("responsibility").
Why? Because
D(S,c) = \sum_{k=1}^{K} (1 / N_k) \sum_{i,j \in S_k} <y_i, y_j>
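A sketch of this CCC-driven setup using scikit-learn's AffinityPropagation on a precomputed similarity matrix as a stand-in for the exchange process written out above; the inner-product similarities and the preference r(i,i) = <y_i, y_i> follow the slide, while everything else (damping, convergence settings) is left at the library defaults:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def ap_complementary(Y):
    """Affinity Propagation driven by the complementary criterion:
    similarities s(i,j) = <y_i, y_j> on grand-mean-centered data,
    with each point's preference set to <y_i, y_i>."""
    Y = Y - Y.mean(axis=0)                       # pre-center to the grand mean
    S = Y @ Y.T                                  # s(i, j) = <y_i, y_j>
    ap = AffinityPropagation(affinity="precomputed",
                             preference=np.diag(S), random_state=0)
    labels = ap.fit_predict(S)
    return labels, ap.cluster_centers_indices_   # exemplars = chosen centers
```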

PARALLEL ANOMALOUS CLUSTERS (PAC) (MIRKIN, TOKMAKOV, AMORIM, MAKARENKOV, IN PROGRESS)
1. Affinity Propagation at the Complementary criterion (APC), followed by K-Means.
2. A-Ward until a stop-condition based on w(k,l) = W(S(k,l), c(k,l)) - W(S,c).
3. K-Means reiterated.

PARALLEL ANOMALOUS CLUSTERS:
[Figure: generated Gaussian K* clusters at spread factors sf = 0.75, 0.50, 0.25, plotted on the first two principal axes u1, u2.]
Experimental computation: at K* = 7, 15, 25 and sf = 0.75, 0.50, PAC finds K* exactly; at sf = 0.25, PAC finds K* exactly for K* = 15, 25.

CONCLUSION
The complementary criterion (CCC) has a natural meaning: big anomalous clusters!
Two ways to advance:
Sequential:
- ik-means, with good cluster recovery
- A-Ward, an effective agglomerative method
Parallel:
- CCC-based Affinity Propagation
- PAC, a k-means method definitively capturing K