Agglomerative clustering (AC)

Agglomerative clustering (AC) Clustering algorithms: Part 2c Pasi Fränti 25.3.2014 Speech & Image Processing Unit, School of Computing, University of Eastern Finland, Joensuu, FINLAND

Agglomerative clustering: categorization by cost function
Single link: minimize the distance between the nearest vectors of the two clusters.
Complete link: minimize the distance between the two furthest vectors.
Ward's method: minimize the mean square error. In vector quantization this is known as the pairwise nearest neighbor (PNN) method. We focus on this.
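The three cost functions can be sketched as follows; the function names and the use of NumPy are illustrative choices, not code from the slides:

```python
import numpy as np

def single_link(A, B):
    """Single link: distance between the two nearest vectors of the clusters."""
    return min(np.linalg.norm(a - b) for a in A for b in B)

def complete_link(A, B):
    """Complete link: distance between the two furthest vectors of the clusters."""
    return max(np.linalg.norm(a - b) for a in A for b in B)

def ward_cost(A, B):
    """Ward's method: increase in total squared error caused by merging A and B."""
    na, nb = len(A), len(B)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    return na * nb / (na + nb) * np.sum((ca - cb) ** 2)
```

Note that Ward's cost depends only on cluster sizes and centroids, which is what makes the fast PNN variants below possible.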

Pseudo code

PNN(X, M) → C, P
  FOR i ← 1 TO N DO p[i] ← i; c[i] ← x[i];   // O(N)
  REPEAT                                     // N times
    (a, b) ← FindSmallestMergeCost();        // O(N²)
    MergeClusters(a, b);                     // O(N)
    m ← m - 1;
  UNTIL m = M;
T(N) = O(N³)
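A minimal Python sketch of the trivial O(N³) algorithm above, using Ward's merge cost (the name `pnn` and the structure are illustrative, not the authors' implementation):

```python
import numpy as np

def pnn(X, M):
    """Trivial PNN: start with every vector as its own cluster and
    repeatedly merge the pair with the smallest Ward merge cost.
    O(N^2) pairs scanned on each of ~N merge steps -> O(N^3) total."""
    clusters = [[i] for i in range(len(X))]          # index lists, one per vector
    while len(clusters) > M:
        best, pair = float("inf"), None
        for a in range(len(clusters)):               # scan all cluster pairs
            for b in range(a + 1, len(clusters)):
                ca = X[clusters[a]].mean(axis=0)
                cb = X[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        clusters[a] += clusters[b]                   # merge b into a
        del clusters[b]
    return clusters
```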

Ward’s method [Ward 1963: Journal of the American Statistical Association]
Merge cost (increase in total squared error): d(S_a, S_b) = (n_a·n_b)/(n_a + n_b) · ||c_a - c_b||²
Local optimization strategy: nearest neighbor search
- Find the cluster pair to be merged
- Update the NN pointers

Example of distance calculations

Example of the overall process
[Figure: merging from M=50 clusters down to M=16 and M=15.]

Detailed example of the process

Example: MSE development while merging from 25 down to 15 clusters

Clusters    MSE
25          1.01·10⁹
24          1.03·10⁹
23          1.06·10⁹
22          1.09·10⁹
21          1.12·10⁹
20          1.16·10⁹
19          1.19·10⁹
18          1.23·10⁹
17          1.26·10⁹
16          1.30·10⁹
15          1.34·10⁹

Storing the distance matrix
Maintain the distance matrix and update rows only for the changed cluster.
The number of distance calculations reduces from O(N²) to O(N) per step.
The search for the minimum pair still requires O(N²) time → still O(N³) in total.
It also requires O(N²) memory.
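The idea can be sketched as follows: a stored cost matrix where, after a merge, only the row and column of the surviving cluster are recomputed (O(N) per step), while the minimum search remains O(N²). This is a simplified illustration with invented names, not the authors' code:

```python
import numpy as np

def merge_cost(ca, na, cb, nb):
    """Ward merge cost between two clusters given centroids and sizes."""
    return na * nb / (na + nb) * np.sum((ca - cb) ** 2)

def pnn_matrix(X, M):
    """PNN with a stored cost matrix: after merging (a, b) only entries
    touching a are recomputed; the minimum search is still O(N^2)."""
    cents = [x.astype(float) for x in X]             # cluster centroids
    sizes = [1] * len(X)
    n = len(X)
    D = np.full((n, n), np.inf)                      # upper-triangular cost matrix
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = merge_cost(cents[i], 1, cents[j], 1)
    alive = list(range(n))
    while len(alive) > M:
        # find the cheapest pair among the live rows: O(N^2)
        a, b = min(((i, j) for i in alive for j in alive if i < j),
                   key=lambda p: D[p])
        # merge b into a: combined centroid and size
        cents[a] = (sizes[a] * cents[a] + sizes[b] * cents[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive.remove(b)
        # update only the entries touching a: O(N)
        for i in alive:
            if i != a:
                lo, hi = min(i, a), max(i, a)
                D[lo, hi] = merge_cost(cents[lo], sizes[lo], cents[hi], sizes[hi])
    return [(cents[i], sizes[i]) for i in alive]
```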

Heap structure for fast search [Kurita 1991: Pattern Recognition] The search reduces from O(N²) to O(log N). In total: O(N² log N).
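One way to sketch the heap-based search in Python is lazy deletion with version counters: all pairwise costs go into a binary heap, and stale entries (referring to already-merged clusters) are skipped when popped. This is an illustrative approximation of the heap idea, not Kurita's exact algorithm:

```python
import heapq
import numpy as np

def pnn_heap(X, M):
    """PNN with all pairwise merge costs in a heap. Version counters
    mark entries stale when a cluster changes; stale pops are skipped."""
    cents = [x.astype(float) for x in X]
    sizes = [1] * len(X)
    ver = [0] * len(X)                       # bumped whenever a cluster changes
    alive = set(range(len(X)))

    def cost(i, j):
        return sizes[i] * sizes[j] / (sizes[i] + sizes[j]) \
               * np.sum((cents[i] - cents[j]) ** 2)

    heap = [(cost(i, j), i, j, 0, 0) for i in alive for j in alive if i < j]
    heapq.heapify(heap)
    while len(alive) > M:
        c, a, b, va, vb = heapq.heappop(heap)
        if a not in alive or b not in alive or ver[a] != va or ver[b] != vb:
            continue                         # stale entry: skip
        # merge b into a
        cents[a] = (sizes[a] * cents[a] + sizes[b] * cents[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive.discard(b)
        ver[a] += 1
        for i in alive:                      # push fresh costs for the merged cluster
            if i != a:
                heapq.heappush(heap, (cost(a, i), a, i, ver[a], ver[i]))
    return sorted(alive)
```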

Store nearest neighbor (NN) pointers [Fränti et al., 2000: IEEE Trans. Image Processing] Time complexity reduces from O(N³) to Ω(N²).

Pseudo code PNN(X, M) → C, P
  FOR i ← 1 TO N DO                          // O(N)
    p[i] ← i; c[i] ← x[i];
    NN[i] ← FindNearestCluster(i);           // O(N²) in total
  REPEAT                                     // N times
    a ← SmallestMergeCost(NN);               // O(N)
    b ← NN[a];
    MergeClusters(C, P, NN, a, b);
    UpdatePointers(C, NN);                   // O(N)
  UNTIL m = M;
http://cs.uef.fi/pages/franti/research/pnn.txt
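An illustrative Python version of the NN-pointer variant (simplified; `pnn_nn` and its helpers are invented names, not the published implementation). Each cluster caches its cheapest merge partner, so the best pair is found in O(N); after a merge, only clusters whose pointer referred to one of the merged clusters are re-searched:

```python
import numpy as np

def pnn_nn(X, M):
    """PNN with nearest-neighbour pointers."""
    cents = [x.astype(float) for x in X]
    sizes = [1] * len(X)
    alive = set(range(len(X)))

    def cost(i, j):
        return sizes[i] * sizes[j] / (sizes[i] + sizes[j]) \
               * np.sum((cents[i] - cents[j]) ** 2)

    def nearest(i):
        return min((j for j in alive if j != i), key=lambda j: cost(i, j))

    nn = {i: nearest(i) for i in alive}
    while len(alive) > M:
        a = min(alive, key=lambda i: cost(i, nn[i]))   # O(N) scan of cached pairs
        b = nn[a]
        cents[a] = (sizes[a] * cents[a] + sizes[b] * cents[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive.discard(b)
        del nn[b]
        if len(alive) > 1:
            # only pointers that referred to a merged cluster are re-resolved
            for i in list(alive):
                if i == a or nn[i] in (a, b):
                    nn[i] = nearest(i)
    return sorted(alive)
```

Re-resolving only stale pointers is exact for Ward's cost because of the monotony property discussed below: merging can only increase a cluster's cost to the others, so pointers aimed elsewhere stay valid.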

Example with NN pointers [Virmajoki 2004: Pairwise Nearest Neighbor Method Revisited ]

Example Step 1

Example Step 2

Example Step 3

Example Step 4

Example Final

Time complexities of the variants

Number of neighbors (τ)

Processing time comparison with NN pointers

Algorithm: Lazy-PNN T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), 1862-1868, November 1999

Monotony property of merge cost [Kaukoranta et al., Optical Engineering, 1999] Merge cost values are monotonically increasing: if d(S_a, S_b) ≤ d(S_a, S_c) and d(S_a, S_b) ≤ d(S_b, S_c), then d(S_a, S_b) ≤ d(S_{a+b}, S_c).
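The property can be checked numerically for Ward's merge cost (the helper `ward` is an invented name; the zero-violation outcome follows from the reducibility of Ward's linkage):

```python
import numpy as np

def ward(A, B):
    """Ward merge cost between two clusters of points."""
    na, nb = len(A), len(B)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    return na * nb / (na + nb) * np.sum((ca - cb) ** 2)

rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    A, B, C = (rng.normal(size=(rng.integers(2, 6), 2)) for _ in range(3))
    dab, dac, dbc = ward(A, B), ward(A, C), ward(B, C)
    if dab <= dac and dab <= dbc:            # (A, B) is the cheapest pair
        merged = np.vstack([A, B])
        if ward(merged, C) < dab - 1e-9:     # cost after merging never drops
            violations += 1
print(violations)  # prints 0: the monotony property holds
```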

Lazy variant of the PNN: store the merge costs in a heap and update a merge cost value only when it appears at the top of the heap. Processing time reduces by about 35%.

Method           Ref.  Time complexity      Additional data structure  Space complexity
Trivial PNN      [10]  O(d·N³)              -                          O(N)
Distance matrix  [6]   O(d·N² + N³)         Distance matrix            O(N²)
Kurita's method  [5]   O(d·N² + N²·log N)   Dist. matrix + heap        O(N²)
τ-PNN            [1]   O(τ·d·N²)            NN-table                   O(N)
Lazy-PNN         [4]   O(τ·d·N²)            NN-table + heap            O(N)

Combining PNN and K-means

Algorithm: Iterative shrinking P. Fränti and O. Virmajoki “Iterative shrinking method for clustering problems“ Pattern Recognition, 39 (5), 761-765, May 2006.

Agglomerative clustering based on merging

Agglomeration based on cluster removal [Fränti and Virmajoki, Pattern Recognition, 2006]

Merge versus removal

Pseudo code of iterative shrinking (IS)

Cluster removal in practice Find secondary cluster: Calculate removal cost for every vector:

Partition updates

Complexity analysis Number of vectors per cluster: If we iterate until M=1: Adding the processing time per vector:

Algorithm: PNN with kNN-graph P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph". IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006

Agglomerative clustering with kNN graph

Example of 2NN graph

Example of 4NN graph
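A sketch of agglomerative clustering restricted to a kNN graph: brute-force graph construction, merge candidates limited to graph neighbours, and neighbour lists united after each merge. This is a simplification of the published method (names are illustrative, and a real implementation would build the graph approximately for large N):

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force kNN graph: for each vector, the indices of its k
    nearest neighbours."""
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(D, np.inf)
    return {i: set(np.argsort(D[i])[:k]) for i in range(len(X))}

def pnn_graph(X, M, k=4):
    """PNN where merge candidates are restricted to graph neighbours."""
    g = knn_graph(X, k)
    for i, nbrs in list(g.items()):          # make the graph undirected
        for j in nbrs:
            g[j].add(i)
    cents = {i: X[i].astype(float) for i in g}
    sizes = {i: 1 for i in g}

    def cost(i, j):
        return sizes[i] * sizes[j] / (sizes[i] + sizes[j]) \
               * np.sum((cents[i] - cents[j]) ** 2)

    while len(g) > M:
        # cheapest pair among graph edges only
        a, b = min(((i, j) for i in g for j in g[i] if i < j),
                   key=lambda p: cost(*p))
        cents[a] = (sizes[a] * cents[a] + sizes[b] * cents[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        g[a] = (g[a] | g[b]) - {a, b}        # union of the neighbour lists
        for j in g.pop(b):                   # redirect b's neighbours to a
            if j != a:
                g[j].discard(b)
                g[j].add(a)
    return sorted(g)
```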

Graph using double linked lists

Merging a and b

Effect on calculations
[Table: number of calculation steps per stage (find pair, merge, remove last, find neighbors, update costs), theoretical and observed, for the single-link and double-link graph variants. Theoretical totals: O(N²) for τ-PNN versus O(kN²) and O(N·log N) for the graph variants.]

Processing time as function of k (number of neighbors in graph)

Time-distortion comparison
[Figure: τ-PNN (229 s); Trivial-PNN (>9999 s); Graph-PNN (1), graph created by MSP, MSE = 5.36; Graph-PNN (2), graph created by divide-and-conquer.]

Conclusions
Simple to implement, good clustering quality.
The straightforward algorithm is slow: O(N³).
A fast exact (yet simple) algorithm exists: O(τN²).
Beyond this, O(τ·N·log N) complexity is possible, but it requires a complicated graph data structure and compromises the exactness of the merge.

Literature
P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), 773-777, May 2000.
P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph", IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006.
P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.
T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), 1862-1868, November 1999.
T. Kurita, "An efficient agglomerative clustering algorithm using a heap", Pattern Recognition, 24 (3), 205-209, 1991.

Literature
J. Shanbehzadeh and P.O. Ogunbona, "On the computational complexity of the LBG and PNN algorithms", IEEE Trans. on Image Processing, 6 (4), 614-616, April 1997.
O. Virmajoki, P. Fränti and T. Kaukoranta, "Practical methods for speeding-up the pairwise nearest neighbor method", Optical Engineering, 40 (11), 2495-2504, November 2001.
O. Virmajoki and P. Fränti, "Fast pairwise nearest neighbor based algorithm for multilevel thresholding", Journal of Electronic Imaging, 12 (4), 648-659, October 2003.
O. Virmajoki, Pairwise Nearest Neighbor Method Revisited, PhD thesis, Computer Science, University of Joensuu, 2004.
J.H. Ward, "Hierarchical grouping to optimize an objective function", Journal of the American Statistical Association, 58, 236-244, 1963.