How to cluster data: Algorithm review. Extra material for DAA++, 18.2.2016. Prof. Pasi Fränti, Speech & Image Processing Unit, School of Computing, University of Eastern Finland.

Presentation transcript:

How to cluster data Algorithm review Extra material for DAA Prof. Pasi Fränti Speech & Image Processing Unit School of Computing University of Eastern Finland Joensuu, FINLAND

University of Eastern Finland Joensuu Joki = a river Joen = of a river Suu = mouth Joensuu = mouth of a river

Research topics. Voice biometric: Gaussian mixture models, speaker recognition, voice activity detection. Clustering methods: clustering algorithms, clustering validity, graph clustering. Location-based applications: mobile data collection, route reduction and compression, photo collections and social networks, location-aware services & search engine. Image processing & compression: lossless compression and data reduction, image denoising, ultrasonic, medical and HDR imaging.

Research achievements. Voice biometric: NIST SRE submission ranked #2 in four categories; top-1 most downloaded publication in Speech Communication Oct-Dec 2009; results used in forensics. Clustering methods: state-of-the-art algorithms, 4 PhD degrees, 5 top publications. Location-based applications: results used by companies in Finland. Image processing & compression: state-of-the-art algorithms in niche areas, 6 PhD degrees, 8 top publications.

Application example 1: Color reconstruction (image with compression artifacts vs. image with original colors).

Application example 2: Speaker modeling for voice biometrics. Training data from the speakers (Matti, Mikko, Tomi) goes through feature extraction and clustering to produce speaker models; features of an unknown sample are then matched against the models (best match: Matti).

Speaker modeling: speech data and the result of clustering.

Application example 3: Image segmentation. Image with 4 color clusters; normalized color plot according to the red and green components.

Application example 4: Quantization. Approximation of a continuous range of values (or a very large set of possible discrete values) by a small set of discrete symbols or integer values; the figure compares the original and the quantized signal.

Color quantization of images: color image → RGB samples → clustering.

Application example 5 Clustering of spatial data

Clustered locations of users

Clustering of photos Timeline clustering

Clustering GPS trajectories Mobile users, taxi routes, fleet management

Conclusions from clusters Cluster 1: Office Cluster 2: Home

Part I: Clustering problem

Definitions and data. Set of N data points: X = {x_1, x_2, …, x_N}. Set of M cluster prototypes (centroids): C = {c_1, c_2, …, c_M}. Partition of the data: P = {p_1, p_2, …, p_N}, where p_i is the index of the cluster that x_i belongs to.

K-means algorithm. X = data set, C = cluster centroids, P = partition.
K-Means(X, C) → (C, P)
REPEAT
    C_prev ← C
    FOR all i ∈ [1, N] DO p_i ← FindNearest(x_i, C)    (optimal partition)
    FOR all j ∈ [1, M] DO c_j ← average of the x_i for which p_i = j    (optimal centroids)
UNTIL C = C_prev
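A minimal NumPy sketch of the loop above (variable names and the convergence test are illustrative, not from the slides):

import numpy as np

def kmeans(X, C, max_iter=100):
    """Minimal K-means. X: (N, d) data array, C: (M, d) initial centroids."""
    C = C.astype(float).copy()
    for _ in range(max_iter):
        C_prev = C.copy()
        # Optimal partition: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (N, M)
        P = dists.argmin(axis=1)
        # Optimal centroids: average of the points assigned to each cluster.
        for j in range(len(C)):
            members = X[P == j]
            if len(members):
                C[j] = members.mean(axis=0)
        if np.allclose(C, C_prev):
            break
    return C, P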

Distance and cost function Euclidean distance of data vectors: Mean square error:
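The formulas themselves were images on the slide; in the notation above they are the standard definitions (some of the related papers additionally normalize the MSE by the dimensionality d):

\[
d(x_i, c_j) = \sqrt{\sum_{k=1}^{d} (x_{ik} - c_{jk})^2},
\qquad
\mathrm{MSE}(C, P) = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - c_{p_i} \rVert^2 .
\]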

Clustering result as partition. Cluster prototypes and the partition of the data, illustrated by a Voronoi diagram and by convex hulls.

Duality of partition and centroids: the centroids serve as prototypes of the partition, and the partition is obtained from the centroids by nearest-prototype mapping.

Challenges in clustering: incorrect cluster allocation (one or more clusters missing, too many clusters) and incorrect number of clusters.

How to solve? Solve the clustering (algorithmic problem): given input data X of N data vectors and the number of clusters M, find the clusters; the result is given as a set of prototypes or as a partition. Solve the number of clusters (mathematical problem): define an appropriate cluster validity function f, repeat the clustering algorithm for several values of M, and select the best result according to f. Solve the problem efficiently (computer science problem).

Part II: Clustering algorithms

Algorithm 1: Split. P. Fränti, T. Kaukoranta and O. Nevalainen, "On the splitting method for vector quantization codebook generation", Optical Engineering, 36 (11), November 1997.

Divisive approach. Motivation: efficiency of the divide-and-conquer approach; hierarchy of clusters as a result; useful when solving the number of clusters. Challenges: design problem 1, what cluster to split; design problem 2, how to split; sub-optimal local optimization at best.

Split-based (divisive) clustering

Select cluster to be split. Heuristic choices: the cluster with the highest variance (MSE), or the cluster with the most skewed distribution (3rd moment). Locally optimal choice: tentatively split all clusters and select the one that decreases MSE most; use this! Complexity of the choice: the heuristics take time to compute their measure, whereas the optimal choice takes only twice (2×) more time, because the measures can be stored and only the two new clusters created at each step need to be recalculated.

Selection example Biggest MSE… … but dividing this decreases MSE more

Selection example Only two new values need to be calculated

How to split. Centroid methods: heuristic 1, replace the centroid c by c−ε and c+ε; heuristic 2, take the two furthest vectors; heuristic 3, take two random vectors. Partition according to the principal axis: calculate the principal axis, select a dividing point along the axis, divide by a hyperplane, and calculate the centroids of the two sub-clusters.

Splitting along principal axis (pseudo code). Step 1: Calculate the principal axis. Step 2: Select a dividing point. Step 3: Divide the points by a hyperplane. Step 4: Calculate the centroids of the new clusters.

Example of dividing: dividing hyperplane and principal axis.

Optimal dividing point (pseudo code of Step 2). Step 2.1: Calculate projections on the principal axis. Step 2.2: Sort the vectors according to the projection. Step 2.3: FOR each vector x_i DO: divide using x_i as the dividing point; calculate the distortions D1 and D2 of the two subsets. Step 2.4: Choose the point minimizing D1 + D2.
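A rough NumPy sketch of the whole split (steps and names are illustrative; it recomputes D1 and D2 from scratch for every candidate dividing point rather than using the O(1) updates discussed on the next slide):

import numpy as np

def split_cluster(X):
    """Split one cluster (rows of X) along its principal axis."""
    # Step 1: principal axis = eigenvector of the covariance with the largest eigenvalue.
    mean = X.mean(axis=0)
    cov = np.cov((X - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, -1]
    # Steps 2.1-2.2: project onto the axis and sort.
    proj = (X - mean) @ axis
    order = np.argsort(proj)
    # Steps 2.3-2.4: try every dividing point, keep the one minimizing D1 + D2.
    best_cost, best_split = np.inf, 1
    for i in range(1, len(X)):
        left, right = X[order[:i]], X[order[i:]]
        d1 = ((left - left.mean(axis=0)) ** 2).sum()
        d2 = ((right - right.mean(axis=0)) ** 2).sum()
        if d1 + d2 < best_cost:
            best_cost, best_split = d1 + d2, i
    # Steps 3-4: divide by the (implicit) hyperplane and compute the new centroids.
    left, right = X[order[:best_split]], X[order[best_split:]]
    return (left, left.mean(axis=0)), (right, right.mean(axis=0))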

Finding the dividing point. Calculating the error for the next dividing point and updating the centroids can be done in O(1) time!
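One way to reconstruct the missing formulas (an assumption; the slide images did not survive): when the dividing point advances by one vector x, x moves from subset D2 to D1, the centroids update incrementally, and the distortions can be maintained from running sums:

\[
c_1' = \frac{n_1 c_1 + x}{n_1 + 1},
\qquad
c_2' = \frac{n_2 c_2 - x}{n_2 - 1},
\qquad
\mathrm{SSE}(S) = \sum_{y \in S} \lVert y \rVert^2 - n_S \lVert c_S \rVert^2 .
\]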

Sub-optimality of the split

Example of splitting process (dividing hyperplane and principal axis shown): 2 clusters, 3 clusters.

Example of splitting process: 4 clusters, 5 clusters.

Example of splitting process: 6 clusters, 7 clusters.

Example of splitting process: 8 clusters, 9 clusters.

Example of splitting process: 10 clusters, 11 clusters.

Example of splitting process: 12 clusters, 13 clusters.

Example of splitting process: 14 clusters, 15 clusters.

K-means refinement. Result directly after split: MSE = 1.94. Result after re-partition: MSE = 1.39. Result after K-means: MSE = 1.33.

Time complexity. Number of processed vectors, assuming that the clusters are always split into two equal halves: Assuming an unequal split into sizes n_max and n_min:

Time complexity. Number of vectors processed: At each step, sorting the vectors is the bottleneck:

Algorithm 2: Pairwise Nearest Neighbor. P. Fränti, T. Kaukoranta, D-F. Shen and K-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), May 2000.

Agglomerative clustering. Single link: minimize the distance of the nearest vectors. Complete link: minimize the distance of the two furthest vectors. Ward's method: minimize the mean square error; in vector quantization, known as the pairwise nearest neighbor (PNN) method.
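Written out (standard formulations, not taken from the slide images), the first two criteria for clusters A and B are:

\[
d_{\mathrm{single}}(A,B) = \min_{a \in A,\, b \in B} \lVert a - b \rVert,
\qquad
d_{\mathrm{complete}}(A,B) = \max_{a \in A,\, b \in B} \lVert a - b \rVert .
\]

Ward's merge cost is given on the next slide.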

PNN algorithm [Ward 1963: Journal of American Statistical Association] Merge cost: Local optimization strategy: Nearest neighbor search is needed: (1) finding the cluster pair to be merged (2) updating of NN pointers
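The merge cost was an image on the slide; in the standard Ward/PNN form, merging clusters a and b with sizes n_a, n_b and centroids c_a, c_b increases the total squared error by:

\[
d_{a,b} = \frac{n_a\, n_b}{n_a + n_b}\, \lVert c_a - c_b \rVert^2,
\qquad
c_{a+b} = \frac{n_a c_a + n_b c_b}{n_a + n_b} .
\]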

Pseudo code
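The pseudocode itself did not survive the transcript; the following is a straightforward O(N³) sketch of the exact PNN using the merge cost above (illustrative only, not the memory-efficient variant of the cited paper):

import numpy as np

def pnn(X, M):
    """Merge N singleton clusters down to M using the Ward/PNN merge cost."""
    clusters = [(x.astype(float), 1) for x in X]   # (centroid, size) pairs

    def merge_cost(a, b):
        (ca, na), (cb, nb) = a, b
        return na * nb / (na + nb) * np.sum((ca - cb) ** 2)

    while len(clusters) > M:
        # Find the pair whose merging increases the total squared error least.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        clusters[j] = ((ni * ci + nj * cj) / (ni + nj), ni + nj)
        del clusters[i]
    return [c for c, _ in clusters]

The distance-matrix and NN-pointer speed-ups discussed on the following slides replace the full pairwise search inside the while-loop.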

Overall example of the process: M = 5000 → 4999 → 4998 → … → 50 → … → 16 → 15 clusters.

Detailed example of the process

Example - 25 clusters. MSE ≈ 1.01×10⁹

Example - 24 clusters. MSE ≈ 1.03×10⁹

Example - 23 clusters. MSE ≈ 1.06×10⁹

Example - 22 clusters. MSE ≈ 1.09×10⁹

Example - 21 clusters. MSE ≈ 1.12×10⁹

Example - 20 clusters. MSE ≈ 1.16×10⁹

Example - 19 clusters. MSE ≈ 1.19×10⁹

Example - 18 clusters. MSE ≈ 1.23×10⁹

Example - 17 clusters. MSE ≈ 1.26×10⁹

Example - 16 clusters. MSE ≈ 1.30×10⁹

Example - 15 clusters. MSE ≈ 1.34×10⁹

Example of distance calculations

Storing distance matrix. Maintain the distance matrix and update rows only for the changed cluster. The number of distance calculations reduces from O(N²) to O(N) per step. The search of the minimum pair still requires O(N²) time, so the total is still O(N³). The matrix also requires O(N²) memory.

Heap structure for fast search [Kurita 1991: Pattern Recognition]. The search reduces from O(N) to O(log N); in total O(N² log N).

Maintain nearest neighbor (NN) pointers [Fränti et al., 2000: IEEE Trans. Image Processing]. Time complexity reduces from O(N³) to Ω(τN²).

Processing time comparison With NN pointers

Combining PNN and K-means. (Figure: number of clusters as a function of processing; standard PNN reduces the number of clusters directly from N to M, while the combined variant first reduces to an intermediate size M₀ and applies PNN together with K-means from there.)

Further improvements.
P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph", IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), November.
P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), May.
T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), November.
O. Virmajoki, P. Fränti and T. Kaukoranta, "Practical methods for speeding-up the pairwise nearest neighbor method", Optical Engineering, 40 (11), November.

Algorithm 3: Random Swap. P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.

Random swap algorithm (RS)

Demonstration of the algorithm

Centroid swap

Local repartition

Fine-tuning by K-means 1st iteration

Fine-tuning by K-means 2nd iteration

Fine-tuning by K-means 3rd iteration

Fine-tuning by K-means 16th iteration

Fine-tuning by K-means 17th iteration

Fine-tuning by K-means 18th iteration

Fine-tuning by K-means 19th iteration

Fine-tuning by K-means Final result after 25 iterations

Implementation of the swap 1. Random swap: 2. Re-partition vectors from old cluster: 3. Create new cluster:
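A compact sketch of the whole algorithm (illustrative; it reuses the kmeans sketch given earlier, approximates the local repartition by a couple of full K-means iterations, and accepts a swap only if the MSE improves):

import numpy as np

def mse(X, C, P):
    """Mean squared error of the clustering (C, P) for data X."""
    return np.mean(np.sum((X - C[P]) ** 2, axis=1))

def random_swap(X, M, T=5000, kmeans_iters=2):
    rng = np.random.default_rng()
    C = X[rng.choice(len(X), size=M, replace=False)].astype(float)
    C, P = kmeans(X, C, kmeans_iters)
    best = mse(X, C, P)
    for _ in range(T):
        C_trial = C.copy()
        # 1. Random swap: replace a randomly chosen centroid by a random data point.
        C_trial[rng.integers(M)] = X[rng.integers(len(X))]
        # 2.-3. Re-partition and fine-tune by (a few iterations of) K-means.
        C_trial, P_trial = kmeans(X, C_trial, kmeans_iters)
        trial = mse(X, C_trial, P_trial)
        if trial < best:   # keep the swap only if it improves the result
            C, P, best = C_trial, P_trial, trial
    return C, P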

Independence of initialization. Results for T = 5000 iterations on the Bridge data set: initial, worst, and best solutions shown.

Probability of good swap. Select a proper centroid for removal: there are M clusters in total, so p_removal = 1/M. Select a proper new location: there are N choices, p_add = 1/N, but only M of them are significantly different, so p_add = 1/M. In total: M² significantly different swaps, and the probability of each is p_swap = 1/M². Open question: how many of these are good?

Expected number of iterations. Probability of not finding a good swap, and the estimated number of iterations:
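A reconstruction of the missing formulas under the assumption that each iteration independently finds a good swap with probability p:

\[
q = (1 - p)^{T}
\qquad\Longrightarrow\qquad
T = \frac{\ln q}{\ln (1 - p)} \approx \frac{\ln (1/q)}{p} \quad \text{for small } p .
\]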

Estimated number of iterations depending on T, for data sets S1-S4. Observed = number of iterations needed in practice. Estimated = estimated number of iterations needed for the given q value.

Probability of success (p) depending on T

Probability of failure (q) depending on T

Observed probabilities depending on dimensionality

Bounds for the number of iterations Upper limit: Lower limit similarly; resulting in:

Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations:
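One plausible reconstruction of the missing formulas, treating the number of successful swaps in T iterations as binomially distributed with success probability p:

\[
\Pr(\text{swaps} < w) = \sum_{i=0}^{w-1} \binom{T}{i} p^{i} (1 - p)^{T - i},
\qquad
E[T] \approx \frac{w}{p} .
\]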

Efficiency of the random swap. Total time to find the correct clustering = time per iteration × number of iterations. Time complexity of a single step: swap O(1); remove cluster 2M · N/M = O(N); add cluster 2N = O(N); centroids 2·(2N/M) + 2 + 2 = O(N/M); (fast) K-means iteration 4αN = O(αN) (see Fast K-means for the analysis).

Observed K-means iterations

K-means iterations

Time complexity and the observed number of steps

Total time complexity. Time complexity of a single step (t): t = O(αN). Number of iterations needed (T): as estimated above. Total time: T · t.

Time complexity: conclusions. Logarithmic dependency on q. Linear dependency on N. Quadratic dependency on M (with a large number of clusters it can be too slow, and a faster variant might be needed). Inverse dependency on α (worst case α = 2); the higher the dimensionality, the faster the method.

References.
Random swap algorithm:
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August.
Pseudo code:
Efficiency of random swap algorithm:
P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.

Part III: Efficient solution

Stopping criterion? Both the divisive and the agglomerative approach end up in a local minimum.

Strategies for efficient search using random swap. Brute force: solve the clustering for all possible numbers of clusters. Stepwise: as in brute force, but start from the previous solution and iterate less. Criterion-guided search: integrate the validity measure directly into the cost function.

Brute force search strategy: search separately for each number of clusters (100 % of the work).

Stepwise search strategy: for each number of clusters, start from the previous result.

Criterion-guided search: integrate the number of clusters with the cost function (3-6 % of the work).

Conclusions. Define the problem: a cost function f measures the goodness of clusters, or alternatively the (dis)similarity between two objects. Solve the problem: select the best algorithm for minimizing f. Homework. Number of clusters: Q. Zhao and P. Fränti, "WB-index: a sum-of-squares based index for cluster validity", Data & Knowledge Engineering, 92: 77-89. Validation: P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), Sept.

Thank you Time for questions!