Clustering Methods: Part 2d
Swap-based algorithms
Pasi Fränti
Speech & Image Processing Unit, School of Computing
University of Eastern Finland, Joensuu, FINLAND

Part I: Random Swap algorithm
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.

Pseudo code of Random Swap
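A minimal Python sketch of the algorithm, assuming a squared-error cost function; the helper names (nearest, mse, kmeans) and the parameter defaults are illustrative, not taken from the original:

import numpy as np

def nearest(X, C):
    # index of the nearest prototype for every data vector
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)

def mse(X, C, P):
    # mean squared error of partition P with prototypes C
    return ((X - C[P]) ** 2).sum() / len(X)

def kmeans(X, C, P, iters):
    # a few K-means iterations used only for fine-tuning
    for _ in range(iters):
        P = nearest(X, C)
        for k in range(len(C)):
            if (P == k).any():
                C[k] = X[P == k].mean(0)
    return C, P

def random_swap(X, M, T=5000, seed=0):
    # trial-and-error prototype swaps, each fine-tuned by K-means;
    # a swap is accepted only if it improves the cost
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), M, replace=False)].astype(float)
    P = nearest(X, C)
    best = mse(X, C, P)
    for _ in range(T):
        C_new = C.copy()
        C_new[rng.integers(M)] = X[rng.integers(len(X))]   # the swap
        C_new, P_new = kmeans(X, C_new, nearest(X, C_new), 2)
        cost = mse(X, C_new, P_new)
        if cost < best:
            C, P, best = C_new, P_new, cost
    return C, P

For simplicity the sketch repartitions all vectors after each swap; the actual algorithm uses the cheaper local repartition shown later.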

Demonstration of the algorithm

Centroid swap

Local repartition

Fine-tuning by K-means 1st iteration

Fine-tuning by K-means 2nd iteration

Fine-tuning by K-means 3rd iteration

Fine-tuning by K-means 16th iteration

Fine-tuning by K-means 17th iteration

Fine-tuning by K-means 18th iteration

Fine-tuning by K-means 19th iteration

Fine-tuning by K-means Final result after 25 iterations

Implementation of the swap
1. Random swap: replace a randomly chosen prototype by a randomly chosen data vector.
2. Re-partition vectors from the old cluster: assign each of them to its nearest remaining prototype.
3. Create new cluster: attract every vector that is closer to the new prototype than to its current one.
The three steps are sketched below.
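A minimal sketch of the three steps, following the conventions of the sketch above (in-place update of C and P); the local repartition in steps 2 and 3 touches only the affected vectors, which is what keeps a swap cheap:

import numpy as np

def swap_and_local_repartition(X, C, P, rng):
    # Step 1: random swap, replace a random prototype with a random vector
    j = rng.integers(len(C))
    orphans = np.where(P == j)[0]            # vectors of the removed cluster
    C[j] = X[rng.integers(len(X))]

    # Step 2: re-partition vectors from the old cluster to their nearest
    # prototype (the new C[j] is also a candidate)
    d = ((X[orphans, None, :] - C[None, :, :]) ** 2).sum(-1)
    P[orphans] = d.argmin(1)

    # Step 3: create the new cluster by attracting every vector that is
    # now closer to the new prototype than to its current one
    d_new = ((X - C[j]) ** 2).sum(-1)
    d_cur = ((X - C[P]) ** 2).sum(-1)
    P[d_new < d_cur] = j
    return C, P

Step 2 costs O(M · N/M) = O(N) on average and step 3 a further O(N), matching the per-step analysis in Part II.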

Random swap as local search: study neighbor solutions

Random swap as local search: select one neighbor and move

Role of K-means: fine-tune the solution by hill climbing!

Role of K-means: consider only local optima!

Role of swap: reduce the search space to the effective part

Chain reaction by K-means after swap

Independence of initialization: results for T = 5000 iterations (figure shows the initial, worst, and best solutions).

Part II: Efficiency of Random Swap

Probability of good swap
Select a proper centroid for removal:
– There are M clusters in total: p_removal = 1/M.
Select a proper new location:
– There are N choices: p_add = 1/N.
– Only M are significantly different: p_add = 1/M.
In total:
– M² significantly different swaps.
– Probability of each different swap: p_swap = 1/M².
– Open question: how many of these are good?

Number of neighbors
Open question: what is the size of the neighborhood (α)?
– Voronoi neighbors
– Neighbors by distance

Observed number of neighbors (data set S2)

Average number of neighbors

Expected number of iterations
Probability of not finding a good swap after T iterations, where p is the probability of a good swap per iteration: q = (1 − p)^T.
Estimated number of iterations needed for a given failure probability q: T(q) = ln q / ln(1 − p).
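A worked example, assuming the per-iteration success probability p ≈ (α/M)² suggested by the neighborhood analysis above; the values M = 15, α = 4 and q = 0.01 are illustrative only:

p ≈ (α/M)² = (4/15)² ≈ 0.071
T(0.01) = ln 0.01 / ln(1 − 0.071) ≈ 63 iterations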

Estimated number of iterations depending on T (data sets S1–S4).
Observed = number of iterations needed in practice.
Estimated = estimate of the number of iterations needed for a given q.

Probability of success (p) depending on T

Probability of failure (q) depending on T

Observed probabilities depending on dimensionality

Bounds for the number of iterations
Upper limit (using ln(1 − p) ≤ −p and p ≈ (α/M)²): T(q) ≤ (M/α)² · ln(1/q).
Lower limit similarly; resulting in: T(q) = Θ((M/α)² · ln(1/q)).

Multiple swaps (w)
Probability of performing fewer than w (successful) swaps in T iterations follows the binomial distribution:
q = Σ_{i=0}^{w−1} C(T, i) · p^i · (1 − p)^(T−i).
Expected number of iterations: roughly w times the single-swap estimate, T(q, w) ≈ w · T(q).

Number of swaps needed: example from image quantization

Efficiency of the random swap
Total time to find correct clustering: time per iteration × number of iterations.
Time complexity of a single step:
– Swap: O(1)
– Remove cluster: 2M · N/M = O(N)
– Add cluster: 2N = O(N)
– Centroids: 2 · (2N/M) + 2α + 2 = O(N/M)
– (Fast) K-means iteration: 4αN = O(αN) *
* See Fast K-means for analysis.

Time complexity and the observed number of steps

Time spent by K-means iterations

Effect of K-means iterations

Total time complexity
Time complexity of a single step (t): t = O(αN).
Number of iterations needed (T): T = O((M/α)² · ln(1/q)).
Total time: T × t = O((M²/α) · N · ln(1/q)).

Time complexity: conclusions
1. Logarithmic dependency on q.
2. Linear dependency on N.
3. Quadratic dependency on M (with a large number of clusters, it can be too slow).
4. Inverse dependency on α (worst case α = 2): the higher the dimensionality and the cluster overlap, the faster the method.

Time-distortion performance

References
Random swap algorithm:
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
Efficiency of Random swap algorithm:
P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.

Part III: Example when 4 swaps needed

1st swap: MSE = 4.2 × 10⁹ → 3.4 × 10⁹

2nd swap: MSE = 3.1 × 10⁹ → 3.0 × 10⁹

3rd swap: MSE = 2.3 × 10⁹ → 2.1 × 10⁹

4th swap: MSE = 1.9 × 10⁹ → 1.7 × 10⁹

Final result: MSE = 1.3 × 10⁹

Part IV: Deterministic Swap

Deterministic swap: costs for the swap; from where, and to where?

Cluster removal
– Merge two existing clusters [Frigui 1997, Kaukoranta 1998], following the spirit of agglomerative clustering.
– Local optimization: remove the prototype whose removal increases the cost function value least [Fritzke 1997, Likas 2003, Fränti 2006] (see the sketch below).
– Smart swap: find the two nearest prototypes and remove one of them randomly [Chen 2010].
– Pairwise swap: locate a pair of inconsistent prototypes in two solutions [Zhao 2012].
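A sketch of the local-optimization removal cost, computed for every prototype (the vectors of a removed prototype are served by their second-nearest prototype instead); names follow the earlier sketches and are not from the cited papers:

import numpy as np

def removal_costs(X, C, P):
    # increase in total squared error caused by removing each prototype
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # N x M distances
    costs = np.empty(len(C))
    for j in range(len(C)):
        members = P == j
        d_rest = np.delete(d[members], j, axis=1)        # prototype j removed
        costs[j] = (d_rest.min(1) - d[members, j]).sum()
    return costs

# local optimization removes the prototype that hurts least:
# j_remove = removal_costs(X, C, P).argmin()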

Cluster addition
1. Select an existing cluster:
– Depending on strategy: 1..M choices.
– Each choice takes O(N) time to test.
2. Select a location within this cluster:
– Add a new prototype.
– Consider only existing points.

Select the cluster
Cluster with the biggest MSE:
– Intuitive heuristic [Fritzke 1997, Chen 2010].
– Computationally demanding.
Local optimization:
– Try all clusters for the addition [Likas et al. 2003].
– Computationally demanding: O(NM) to O(N²).

Select the location
1. Current prototype + ε [Fritzke 1997].
2. Furthest vector [Fränti et al. 1997] (see the sketch below).
3. Any other split heuristic [Fränti et al. 1997].
4. Random location.
5. Every possible location [Likas et al. 2003].
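For heuristic 2, a sketch that also picks the cluster by the biggest-MSE rule from the previous slide (one combination among those listed; the function name is illustrative):

import numpy as np

def furthest_vector_location(X, C, P):
    # choose the cluster with the largest total squared error and return
    # its vector that lies furthest from the cluster prototype
    err = ((X - C[P]) ** 2).sum(-1)                      # per-vector error
    worst = np.bincount(P, weights=err, minlength=len(C)).argmax()
    members = np.where(P == worst)[0]
    return X[members[err[members].argmax()]]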

Complexity of swaps

Furthest point in cluster (figure: the prototype removed, the cluster where it is added, and the furthest point selected).

Smart swap
Initialization: O(MN)
Swap iteration:
– Finding nearest pair: O(M²)
– Calculating distortion: O(N)
– Sorting clusters: O(M · log M)
– Evaluation of result: O(N)
– Repartition and fine-tuning: O(αN)
Total: O(MN + M² + I · αN)
Expected number of iterations: < 2M
Estimated total time: O(2M²N)

(Figure: the nearest pair of prototypes, and the cluster with the largest distortion.)

Smart swap pseudo code

SmartSwap(X, M) → C, P
  C ← InitializeCentroids(X);
  P ← PartitionDataset(X, C);
  MaxOrder ← log2(M);
  order ← 1;
  WHILE order < MaxOrder
    (c_i, c_j) ← FindNearestPair(C);
    S ← SortClustersByDistortion(P, C);
    c_swap ← RandomSelect(c_i, c_j);
    c_location ← s_order;
    C_new ← Swap(c_swap, c_location);
    P_new ← LocalRepartition(P, C_new);
    KmeansIteration(P_new, C_new);
    IF f(C_new) < f(C) THEN
      order ← 1;
      C ← C_new;
    ELSE
      order ← order + 1;
  KmeansIteration(P, C);

Pairwise swap
– Paired prototypes: nearest neighbors of each other across the two solutions.
– Unpaired prototypes: the nearest neighbor in the other set is further away than in the same set → subject to swap.
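A sketch of detecting the unpaired prototypes between two solutions via mutual nearest neighbors; this pairing rule is my reading of the slide, not necessarily the exact criterion of [Zhao 2012]:

import numpy as np

def unpaired_prototypes(C_a, C_b):
    # prototypes of C_a and C_b are paired when they are mutual nearest
    # neighbors across the two sets; the rest are swap candidates
    d = ((C_a[:, None, :] - C_b[None, :, :]) ** 2).sum(-1)
    nn_ab = d.argmin(1)          # nearest prototype in C_b for each of C_a
    nn_ba = d.argmin(0)          # nearest prototype in C_a for each of C_b
    mutual = nn_ba[nn_ab] == np.arange(len(C_a))
    return np.where(~mutual)[0]  # indices of unpaired prototypes in C_a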

Combinations of random and deterministic swap

Variant  Removal                       Addition
RR       Random                        Random
RD       Random                        Deterministic
DR       Deterministic                 Random
DD       Deterministic                 Deterministic
D2R      Deterministic + data update   Random
D2D      Deterministic + data update   Deterministic

Summary of the time complexities

             Random removal   Deterministic removal
             RR      RD       DR      DD      D2R     D2D
Removal      O(1)    O(1)     O(MN)   O(MN)   O(αN)   O(αN)
Addition     O(1)    O(N)     O(1)    O(N)    O(1)    O(N)
Repartition  O(N)    O(N)     O(N)    O(N)    O(N)    O(N)
K-means      O(αN)   O(αN)    O(MN)   O(MN)   O(αN)   O(αN)

Profiles of the processing time

Test data sets

Birch data sets: Birch1, Birch2, Birch3

Experiments: Bridge (curves: RD, DD, DR, Random Swap)

Experiments Bridge

Experiments: Birch2 (curves: Random Swap, DD, DR, RD)

Experiments Miss America

Quality comparisons (MSE) with a 10-second time constraint
(Table: MSE of the RD-variant, Random Swap, Repeated K-means and Repeated Random on Bridge, House, Miss America, Europe (×10⁷), Birch1 (×10⁸) and Birch2 (×10⁶).)
Average speed-up from RR to RD: 18:1, 4:1, 6:1, 5:1, 4:1 and 2:1 across the data sets.

Literature
1. P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
2. P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
3. P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
4. P. Fränti, M. Tuononen and O. Virmajoki, "Deterministic and randomized local search algorithms for clustering", IEEE Int. Conf. on Multimedia and Expo (ICME'08), Hannover, Germany, June 2008.
5. P. Fränti and O. Virmajoki, "On the efficiency of swap-based clustering", Int. Conf. on Adaptive and Natural Computing Algorithms (ICANNGA'09), Kuopio, Finland, LNCS 5495, April 2009.

Literature
6. J. Chen, Q. Zhao and P. Fränti, "Smart swap for more efficient clustering", Int. Conf. Green Circuits and Systems (ICGCS'10), Shanghai, China, June 2010.
7. B. Fritzke, "The LBG-U method for vector quantization – an improvement over LBG inspired from neural networks", Neural Processing Letters, 5 (1), 1997.
8. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), May 2006.
9. T. Kaukoranta, P. Fränti and O. Nevalainen, "Iterative split-and-merge algorithm for VQ codebook generation", Optical Engineering, 37 (10), October 1998.
10. H. Frigui and R. Krishnapuram, "Clustering by competitive agglomeration", Pattern Recognition, 30 (7), July 1997.

Literature
11. A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 2003.
12. PAM (Kaufman and Rousseeuw, 1987).
13. CLARA (Kaufman and Rousseeuw, 1990).
14. CLARANS: A Clustering Algorithm based on Randomized Search (Ng and Han, 1994).
15. R.T. Ng and J. Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, 14 (5), September/October 2002.