Published by Leon Dawson. Modified over 9 years ago.
1
Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems
Pasi Fränti and Olli Virmajoki
University of Joensuu, Department of Computer Science, Finland
To be presented at: Data Mining 2003
2
Problem setup
Given N data vectors X = {x_1, x_2, …, x_N}, partition the data set into M clusters.
1. Clustering: find the location of the clusters.
2. Vector quantization: approximate the original data by a set of code vectors.
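The two views of the problem can be tied together with a small sketch. It assumes squared Euclidean distance and an MSE normalized by N only (the slides name MSE as the objective but do not give its normalization); the function names are illustrative, not taken from the presentation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_partition(X, C):
    """Map each data vector in X to the index of its nearest code vector in C."""
    return [min(range(len(C)), key=lambda j: sq_dist(x, C[j])) for x in X]


def mse(X, C, P):
    """Mean squared error of partition P under codebook C (assumed: divided by N)."""
    return sum(sq_dist(x, C[p]) for x, p in zip(X, P)) / len(X)


# Toy data: N = 4 vectors, M = 2 code vectors.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 1.0), (10.0, 1.0)]
P = nearest_partition(X, C)
error = mse(X, C, P)
```

Here `C` plays the role of the cluster locations (clustering view) and of the code vectors (vector quantization view), while `P` is the partition.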
3
Agglomerative clustering
PNN: Pairwise Nearest Neighbor method
  - Merges two clusters
  - Preserves hierarchy of clusters
IS: Iterative shrinking method
  - Removes one cluster
  - Repartitions data vectors in the removed cluster
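A single PNN merge step might look like the sketch below. It assumes the standard merge cost n_a·n_b/(n_a+n_b)·||c_a − c_b||² (the increase in squared error caused by merging clusters a and b) and an exhaustive search over all pairs; the slide itself does not spell out the cost function, and the helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def merge_cost(na, ca, nb, cb):
    """Increase in total squared error if clusters a and b are merged."""
    return na * nb / (na + nb) * sq_dist(ca, cb)


def pnn_step(sizes, centroids):
    """Merge the cheapest pair of clusters in place; return the merged indices."""
    m = len(sizes)
    _, a, b = min(
        (merge_cost(sizes[a], centroids[a], sizes[b], centroids[b]), a, b)
        for a in range(m) for b in range(a + 1, m)
    )
    na, nb = sizes[a], sizes[b]
    # The merged centroid is the size-weighted mean of the two centroids.
    centroids[a] = tuple((na * x + nb * y) / (na + nb)
                         for x, y in zip(centroids[a], centroids[b]))
    sizes[a] = na + nb
    del sizes[b], centroids[b]
    return a, b


sizes = [1, 1, 2]
centroids = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
merged = pnn_step(sizes, centroids)
```

Because every step joins two whole clusters, the sequence of merges forms a hierarchy. IS gives that property up by letting the vectors of a removed cluster go to different neighbours.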
4
Iterative Shrinking
5
Iterative Shrinking algorithm (IS)
6
Local optimization of the IS
Finding secondary cluster: [formula not preserved in transcript]
Removal cost of single vector: [formula not preserved in transcript]
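The formulas on this slide are not available in the transcript, so the following is only a sketch under stated assumptions. It assumes that the secondary cluster of a vector x in cluster a is the cluster q ≠ a minimizing n_q/(n_q+1)·||x − c_q||², and that the removal cost of x is that quantity minus ||x − c_a||². All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def secondary_cluster(x, a, sizes, centroids):
    """Assumed rule: cheapest cluster other than a to which x could be moved."""
    others = [j for j in range(len(centroids)) if j != a]
    return min(others,
               key=lambda j: sizes[j] / (sizes[j] + 1) * sq_dist(x, centroids[j]))


def vector_removal_cost(x, a, sizes, centroids):
    """Assumed cost of moving x out of cluster a into its secondary cluster."""
    q = secondary_cluster(x, a, sizes, centroids)
    gain = sizes[q] / (sizes[q] + 1) * sq_dist(x, centroids[q])
    return gain - sq_dist(x, centroids[a])


def cluster_removal_cost(a, X, P, sizes, centroids):
    """Sum of the removal costs of all vectors currently assigned to cluster a."""
    return sum(vector_removal_cost(x, a, sizes, centroids)
               for x, p in zip(X, P) if p == a)


X = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
P = [0, 0, 1, 2, 2]
centroids = [(1.0, 0.0), (4.0, 0.0), (11.0, 0.0)]
sizes = [2, 1, 2]
costs = [cluster_removal_cost(a, X, P, sizes, centroids) for a in range(3)]
cheapest = min(range(3), key=lambda a: costs[a])  # cluster to remove next
```

The sketch treats each vector independently and does not update cluster sizes while vectors are being reassigned, so the summed cost is an estimate rather than the exact change in distortion.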
7
Generalization to the case of an unknown number of clusters
Measure the variance-ratio F-test for every intermediate clustering from M = 1..N.
Select the clustering with the minimum F-ratio as the final clustering.
No additional computing, except the calculation of the F-ratio.
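The slide does not define the F-ratio, so the sketch below rests on an assumption: it takes the within-cluster variance divided by the between-cluster variance, each normalized by its degrees of freedom, so that a smaller value indicates a better clustering. The exact definition used by the authors may differ, and the function name is illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def f_ratio(X, P, C):
    """Assumed F-ratio: (SSW / (N - M)) / (SSB / (M - 1)); requires 1 < M < N."""
    N, M = len(X), len(C)
    dim = len(X[0])
    mean = tuple(sum(x[k] for x in X) / N for k in range(dim))
    sizes = [P.count(j) for j in range(M)]
    ssw = sum(sq_dist(x, C[p]) for x, p in zip(X, P))          # within clusters
    ssb = sum(n * sq_dist(c, mean) for n, c in zip(sizes, C))  # between clusters
    return (ssw / (N - M)) / (ssb / (M - 1))


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two candidate clusterings of the same data, with M = 2 and M = 3.
f2 = f_ratio(X, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
f3 = f_ratio(X, [0, 1, 2, 2], [(0.0, 0.0), (0.0, 2.0), (10.0, 1.0)])
best_m = 2 if f2 < f3 else 3
```

In an IS or PNN run the intermediate clusterings already exist as a by-product of shrinking, which is why only the ratio itself has to be computed for each M.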
8
Example for Data set 3
9
Example for Data set 4
10
Genetic algorithm
Generate S initial solutions.
REPEAT T times
    Select best solutions to survive.
    Generate new solutions by crossover.
    Fine-tune solutions.
END-REPEAT
Output the best solution found.
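The loop above can be sketched as a generic skeleton. The four callables (`init`, `cost`, `crossover`, `fine_tune`) are hypothetical placeholders, and keeping the survivors in the next population is an assumption of this sketch rather than something the slide states. The toy problem at the end minimizes a one-dimensional quadratic and has nothing to do with clustering; it only exercises the loop.

```python
import random


def genetic_algorithm(init, cost, crossover, fine_tune,
                      S=8, T=20, survivors=4, rng=None):
    """Generic GA loop: S solutions, T generations, elitist selection."""
    rng = rng or random.Random(0)
    population = [init(rng) for _ in range(S)]           # generate S initial solutions
    for _ in range(T):                                   # REPEAT T times
        population.sort(key=cost)
        best = population[:survivors]                    # select best solutions to survive
        children = []
        while len(best) + len(children) < S:
            a, b = rng.sample(best, 2)                   # pick two distinct parents
            children.append(fine_tune(crossover(a, b, rng)))  # crossover, then fine-tune
        population = best + children                     # assumption: survivors are kept
    return min(population, key=cost)                     # output the best solution found


# Toy usage: minimize (x - 3)^2 over real numbers.
target = 3.0
toy_cost = lambda x: (x - target) ** 2
result = genetic_algorithm(
    init=lambda rng: rng.uniform(-10.0, 10.0),
    cost=toy_cost,
    crossover=lambda a, b, rng: (a + b) / 2.0,
    fine_tune=lambda x: x + 0.1 * (target - x),
)
```

For clustering, a solution would be a codebook together with its partition, and `cost` would be the MSE (or the F-ratio when M is unknown).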
11
Illustration of crossover [figure: solution + solution = crossover result]
12
GAIS algorithm
13
Effect of crossover
14
Convergence of GA with F-ratio
15
Image datasets
Bridge (256×256): d = 16, N = 4096, M = 256
Miss America (360×288): d = 16, N = 6480, M = 256
House (256×256): d = 3, N = 34112*, M = 256
16
Synthetic data sets
Data set S1: d = 2, N = 5000, M = 15
Data set S2: d = 2, N = 5000, M = 15
Data set S3: d = 2, N = 5000, M = 15
Data set S4: d = 2, N = 5000, M = 15
17
Comparison with image data [results table not preserved; annotations: Popular methods; Previous GA; NEW!; Simplest of the good ones]
18
Comparison with synthetic data [results table not preserved; annotations: Most separable clusters; Most overlapping between clusters]
19
What does it cost? (Bridge)
Random: ~0 s
K-means: 8 s
SOM: 6 minutes
GA-PNN: 13 minutes
GAIS (short): ~1 hour
GAIS (long): ~3 days
20
Conclusions
Slower but better clustering algorithm.
Best known clustering algorithm in minimizing MSE.
Thank you!