Genetic algorithms (GA) for clustering
Pasi Fränti
Clustering Methods: Part 2e
Speech and Image Processing Unit, School of Computing, University of Eastern Finland
General structure
Genetic Algorithm:
  Generate S initial solutions
  REPEAT Z iterations
    Select best solutions
    Create new solutions by crossover
    Mutate solutions
  END-REPEAT
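The outline above can be sketched as a generic loop. This is a minimal sketch, not the author's implementation: the callback names (init, fitness, crossover, mutate) and the n_best parameter are assumptions made for illustration, pop_size and iterations play the roles of S and Z, and a lower fitness value is assumed to mean a better solution.

```python
import random

def genetic_algorithm(init, fitness, crossover, mutate,
                      pop_size=10, iterations=20, n_best=4, rng=None):
    """Generic GA loop; lower fitness value means a better solution."""
    rng = rng if rng is not None else random.Random(0)
    # Generate S initial solutions (pop_size here), best first.
    population = sorted((init(rng) for _ in range(pop_size)), key=fitness)
    for _ in range(iterations):                      # REPEAT Z iterations
        best = population[:n_best]                   # select best solutions
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(best, 2)               # two distinct parents
            child = crossover(a, b, rng)             # create new solution
            children.append(mutate(child, rng))      # mutate it
        # Assumed survivor rule: keep the best of the elite and the children.
        population = sorted(best + children, key=fitness)[:pop_size]
    return population[0]
```

Because the selected parents are carried over into the next generation, the best solution can never get worse from one iteration to the next in this sketch.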
Components of GA
– Representation of solution
– Selection method
– Crossover method (most critical!)
– Mutation
Representation of solution
Partition (P):
– Optimal centroids can be calculated from P.
– Only local changes can be made.
Codebook (C):
– Optimal partition can be calculated from C.
– Calculating P takes O(NM) time (N data points, M clusters), which is slow.
Combined (C, P):
– Both data structures are needed anyway.
– Computationally more efficient.
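The two conversions between the representations can be sketched as follows. This is an illustrative sketch assuming squared Euclidean distance and points stored as tuples. The function names are hypothetical, and keeping the old centroid for an empty cluster is one possible convention, not something the slides specify.

```python
def optimal_partition(data, codebook):
    """Map every point to the index of its nearest centroid (N*M distance evaluations)."""
    def sqdist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return [min(range(len(codebook)), key=lambda j: sqdist(x, codebook[j]))
            for x in data]

def optimal_centroids(data, partition, old_codebook):
    """Replace every centroid by the mean of the points assigned to it."""
    new_codebook = []
    for j, old in enumerate(old_codebook):
        members = [x for x, p in zip(data, partition) if p == j]
        if members:
            new_codebook.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_codebook.append(tuple(old))   # assumption: empty cluster keeps its centroid
    return new_codebook
```

Alternating the two functions for a couple of rounds corresponds to the K-means fine-tuning applied after crossover.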
Selection method
Purpose: to select which solutions will be used in crossover for generating new solutions.
Main principle: good solutions should be used rather than weak solutions.
Two main strategies:
– Roulette wheel selection
– Elitist selection
The exact implementation is not so important.
Roulette wheel selection
Select two candidate solutions for the crossover randomly.
The probability of a solution being selected is weighted according to its distortion, so that solutions with lower distortion are more likely to be chosen.
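A possible sketch of roulette wheel selection is given below. The exact weighting formula is not recoverable from the slide text, so the weight 1 / (1 + distortion) used here is an assumption chosen only because it favours low-distortion solutions. The function name is hypothetical.

```python
import random

def roulette_select_pair(distortions, rng=random):
    """Pick two distinct solution indices; lower distortion gives a higher chance."""
    # Assumed weighting (not taken from the slides): w_i = 1 / (1 + distortion_i).
    weights = [1.0 / (1.0 + d) for d in distortions]
    indices = list(range(len(weights)))
    first = rng.choices(indices, weights=weights, k=1)[0]
    # Draw the second parent from the remaining solutions with the same weights.
    rest = [i for i in indices if i != first]
    second = rng.choices(rest, weights=[weights[i] for i in rest], k=1)[0]
    return first, second
```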
Elitist selection
Elitist approach using zigzag scanning among the best solutions.
Main principle: select all possible pairs among the best candidates.
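The main principle can be sketched in a few lines. This sketch only enumerates all pairs among the best candidates and does not reproduce the zigzag scanning order. The function and parameter names are assumptions.

```python
from itertools import combinations

def elitist_pairs(population, distortion, n_best):
    """Return every pair of parents among the n_best lowest-distortion solutions."""
    best = sorted(population, key=distortion)[:n_best]
    return list(combinations(best, 2))   # n_best * (n_best - 1) / 2 pairs
```

For example, with n_best = 3 the three best solutions yield three parent pairs.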
Crossover methods
Different variants for crossover:
– Random crossover
– Centroid distance
– Pairwise crossover
– Largest partitions
– PNN
Local fine-tuning:
– All methods give a new allocation of the centroids.
– Local fine-tuning must be done by K-means.
– Two iterations of K-means are enough.
Random crossover
Select M/2 centroids randomly from each of the two parents.
[Figure: Solution 1 + Solution 2]
Random crossover: example (M = 4)
How to create a new solution? Pick M/2 randomly chosen cluster centroids from each of the two parents in turn.
How many solutions are there? 36 possibilities for creating a new solution.
What is the probability of selecting a good one? Not high: some are good but need K-means, and most are bad. See the rough statistics.

Some possibilities:
Parent A    Parent B    Rating
c2, c4      c1, c4      Optimal
c1, c2      c3, c4      Good (K-means)
c2, c3                  Bad

Rough statistics: Optimal: 1, Good: 7, Bad: 28

[Figure: parent solution A and parent solution B, each with centroids c1 to c4. Legend: data point, centroid; M = number of clusters.]
[Figure: parent solution A and parent solution B with centroids c1 to c4, and three child solutions: optimal, good, bad.]
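Random crossover is simple enough to sketch directly. The sketch below assumes an even M and codebooks stored as lists of tuples. The function name is hypothetical. The count of 36 follows from choosing 2 of 4 centroids independently from each parent.

```python
import random
from math import comb

def random_crossover(parent_a, parent_b, rng=random):
    """Take M/2 randomly chosen centroids from each parent (M assumed even)."""
    half = len(parent_a) // 2
    return rng.sample(parent_a, half) + rng.sample(parent_b, half)

# Number of distinct children for M = 4: C(4, 2) choices from each parent.
n_children = comb(4, 2) * comb(4, 2)   # 6 * 6 = 36
```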
Centroid distance crossover
[Pan, McInnes, Jack, 1995: Electronics Letters] [Scheunders, 1997: Pattern Recognition Letters]
1) For each centroid, calculate its distance to the center point of the entire data set.
2) Sort the centroids according to the distance.
3) Divide into two sets: central vectors (M/2 closest) and distant vectors (M/2 furthest).
4) Take central vectors from one codebook and distant vectors from the other.
Centroid distance crossover: example (M = 4)
C_ed denotes the centroid of the entire data set.
1) Distances d(c_i, C_ed):
   A: d(c4, C_ed) < d(c2, C_ed) < d(c1, C_ed) < d(c3, C_ed)
   B: d(c1, C_ed) < d(c3, C_ed) < d(c2, C_ed) < d(c4, C_ed)
2) Sort centroids according to the distance:
   A: c4, c2, c1, c3
   B: c1, c3, c2, c4
3) Divide into two sets (M = 4):
   A: central vectors: c4, c2; distant vectors: c1, c3
   B: central vectors: c1, c3; distant vectors: c2, c4
New solution:
Variant (a): take central vectors from parent solution A and distant vectors from parent solution B, OR
Variant (b): take distant vectors from parent solution A and central vectors from parent solution B.
[Figure: parent solutions A and B with centroids c1 to c4 and C_ed. Legend: data point, centroid, centroid of entire dataset; M = number of clusters.]
Child, variant (a): central vectors from parent solution A and distant vectors from parent solution B.
Child, variant (b): distant vectors from parent solution A and central vectors from parent solution B.
[Figure: the two child solutions with centroids c1 to c4 and C_ed. Legend: data point, centroid, centroid of entire dataset; M = number of clusters.]
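The steps of the centroid distance crossover can be sketched as below. This is a minimal sketch assuming squared Euclidean distance, an even M, and a precomputed data set centroid passed in as data_center. The function and parameter names are hypothetical.

```python
def centroid_distance_crossover(parent_a, parent_b, data_center, variant="a"):
    """Combine the central half of one codebook with the distant half of the other."""
    def dist2(c):
        # Squared distance to the centroid of the entire data set.
        return sum((ci - mi) ** 2 for ci, mi in zip(c, data_center))
    half = len(parent_a) // 2
    a_sorted = sorted(parent_a, key=dist2)    # closest first
    b_sorted = sorted(parent_b, key=dist2)
    if variant == "a":
        return a_sorted[:half] + b_sorted[half:]   # central from A, distant from B
    return b_sorted[:half] + a_sorted[half:]       # central from B, distant from A
```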
Pairwise crossover
[Fränti et al, 1997: Computer Journal]
Greedy approach:
– For each centroid, find its nearest centroid in the other parent solution that is not yet used.
– From each pair, select one of the two centroids randomly.
Small improvement:
– There is no reason to consider the parents as separate solutions.
– Take the union of all centroids.
– Make the pairing independent of parent.
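The greedy approach can be sketched as follows. The sketch covers only the basic variant that pairs centroids between the two parents, not the union-based improvement. It assumes squared Euclidean distance and processes parent A's centroids in their stored order. The function name is hypothetical.

```python
import random

def pairwise_crossover(parent_a, parent_b, rng=random):
    """Pair each centroid of A with its nearest unused centroid of B, keep one of each pair."""
    def dist2(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    unused = list(parent_b)
    child = []
    for a in parent_a:
        b = min(unused, key=lambda c: dist2(a, c))   # nearest partner not yet used
        unused.remove(b)
        child.append(rng.choice([a, b]))             # select one of the two randomly
    return child
```

Because the pairing is greedy, the result can depend on the order in which the centroids of parent A are visited.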
Pairwise crossover example: initial parent solutions
[Figure: two parent solutions, MSE = 8.79 × 10^9 and MSE = 11.92 × 10^9]
Pairwise crossover example: pairing between parent solutions
[Figure: MSE = 7.34 × 10^9]
Pairwise crossover example: pairing without restrictions
[Figure: MSE = 4.76 × 10^9]
Largest partitions
[Fränti et al, 1997: Computer Journal]
– Select the centroids that represent the largest clusters.
– Selection is done in a greedy manner.
(illustration to appear later)
PNN crossover for GA
[Fränti et al, 1997: The Computer Journal]
[Figure labels: Initial 1, Initial 2, Union, Combined, PNN, After PNN]
The PNN crossover method (1) [Fränti, 2000: Pattern Recognition Letters]
The PNN crossover method (2)
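A simplified sketch of the idea behind the PNN crossover is given below: take the union of the two parent codebooks, then merge clusters pairwise until M remain. The details are assumptions rather than the published method. The sketch repartitions the data over the union, drops empty clusters, replaces centroids by cluster means, and uses the standard PNN merge cost n_a n_b / (n_a + n_b) · ||c_a − c_b||². All names are hypothetical.

```python
def pnn_crossover(parent_a, parent_b, data, m):
    """Union of two codebooks reduced to m centroids by pairwise nearest neighbor merging."""
    def dist2(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

    def merge_cost(p, q):
        (n_p, c_p), (n_q, c_q) = p, q
        return n_p * n_q / (n_p + n_q) * dist2(c_p, c_q)

    union = list(parent_a) + list(parent_b)
    # Partition the data over the union and accumulate cluster sizes and sums.
    sizes = [0] * len(union)
    sums = [[0.0] * len(data[0]) for _ in union]
    for x in data:
        j = min(range(len(union)), key=lambda k: dist2(x, union[k]))
        sizes[j] += 1
        sums[j] = [s + xi for s, xi in zip(sums[j], x)]
    # Keep non-empty clusters as (size, mean) pairs.
    clusters = [(n, tuple(s / n for s in tot)) for n, tot in zip(sizes, sums) if n > 0]
    # PNN: repeatedly merge the pair with the smallest merge cost.
    while len(clusters) > m:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        (n_i, c_i), (n_j, c_j) = clusters[i], clusters[j]
        merged = (n_i + n_j,
                  tuple((n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(c_i, c_j)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [c for _, c in clusters]
```

If fewer than m clusters are non-empty after the repartition, this sketch returns fewer than m centroids. The exhaustive pair search is kept for readability and is not efficient for large codebooks.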
Importance of K-means (Random crossover)
[Figure: curves labelled Best and Worst; data set: Bridge]
Effect of crossover method (with k-means iterations)
[Figure: data set: Bridge]
Effect of crossover method (with k-means iterations)
[Figure: binary data (Bridge2)]
Mutations
– Purpose is to implement small random changes to the solutions.
– Happens with a small probability.
– Sensible approach: change the location of one centroid by a random swap!
– The role of mutations is to simulate local search.
– If mutations are needed, the crossover method is not very good.
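A random swap mutation can be sketched as below. The sketch assumes that a swap means replacing one randomly chosen centroid with a randomly chosen data point. The mutation probability of 0.01 and the function names are illustrative assumptions, not values from the slides.

```python
import random

def random_swap(codebook, data, rng=random):
    """Replace one randomly chosen centroid with a randomly chosen data point."""
    mutated = list(codebook)                      # leave the input codebook untouched
    mutated[rng.randrange(len(mutated))] = tuple(rng.choice(data))
    return mutated

def maybe_mutate(codebook, data, probability=0.01, rng=random):
    """Apply the random swap only with a small probability."""
    if rng.random() < probability:
        return random_swap(codebook, data, rng)
    return list(codebook)
```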
Effect of k-means and mutations
– Mutations alone are better than random crossover!
– K-means improves the result but is not vital.
Pseudo code of GAIS [Virmajoki & Fränti, 2006: Pattern Recognition]
PNN vs. IS crossovers Further improvement of about 1%
Optimized GAIS variants
GAIS short (optimized for speed):
- Create new generations only as long as the best solution keeps improving (T=*).
- Use a small population size (Z=10).
- Apply two iterations of k-means (G=2).
GAIS long (optimized for quality):
- Create a large number of generations (T=100).
- Use a large population size (Z=100).
- Iterate k-means relatively long (G=10).
Comparison of algorithms
Variation of the result
Time vs. quality comparison Bridge
Conclusions
– Best clustering obtained by GA.
– Crossover method most important.
– Mutations not needed.
References
1. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), May.
2. P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, January.
3. P. Fränti, J. Kivijärvi, T. Kaukoranta and O. Nevalainen, "Genetic algorithms for large scale clustering problems", The Computer Journal, 40 (9).
4. J. Kivijärvi, P. Fränti and O. Nevalainen, "Self-adaptive genetic algorithm for clustering", Journal of Heuristics, 9 (2).
5. J.S. Pan, F.R. McInnes and M.A. Jack, "VQ codebook design using genetic algorithms", Electronics Letters, 31, August.
6. P. Scheunders, "A genetic Lloyd-Max quantization algorithm", Pattern Recognition Letters, 17, 1996.