Academic Report De-Shuang Huang Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences Department of Automation, University of Science and Technology of China March 2006
Part I : A Multi-Sub-Swarm PSO Algorithm for Multimodal Function Optimization Part II : A Brief Introduction to ICL
Outline
1. Particle Swarm Optimization (PSO)
2. Niche Techniques and Development
3. Novel Adaptive Sequential Niche Technique
4. Multi-Sub-Swarm PSO Algorithm
5. Conclusions
1. Particle Swarm Optimization
The Particle Swarm Optimization (PSO) algorithm was developed in 1995 by James Kennedy and Russell Eberhart. It was inspired by the social behavior of bird flocking and fish schooling, and applies the concept of social interaction to problem solving.
1.1 PSO Algorithm
v_i(k+1) = w * v_i(k) + c1 * rand1() * (pbest_i - x_i(k)) + c2 * rand2() * (gbest - x_i(k))   (1)
x_i(k+1) = x_i(k) + v_i(k+1)   (2)
where w is the inertia weight; c1 and c2 are positive constants, referred to as the cognitive and social parameters; and rand1() and rand2() are random numbers uniformly distributed in [0, 1].
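The update rules (1)-(2) can be sketched in Python. This is only an illustrative sketch; the parameter defaults are values commonly used in the literature, not taken from the slides.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.729, c1=1.49, c2=1.49):
    """One velocity/position update for a single particle (Eqs. (1)-(2))."""
    new_v = [w * vi
             + c1 * random.random() * (pb - xi)   # cognitive part
             + c2 * random.random() * (gb - xi)   # social part
             for vi, xi, pb, gb in zip(v, x, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Each call draws fresh random numbers, so repeated steps trace a stochastic trajectory pulled toward both the personal best and the global best.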
1.2 Current Research on PSO
Research on PSO can generally be categorized into five areas:
- Algorithms (e.g., binary PSO algorithms)
- Topology (designing different types of neighborhood structures for PSO)
- Parameters (the inertia weight w, the constriction coefficient, and the impact of these parameters on PSO performance)
- Hybrid PSO algorithms (combining PSO with other techniques)
- Applications (constrained optimization, multiobjective optimization, neural network training, etc.)
Swarm Topology
Two general types of neighborhood structures have been investigated: gbest and lbest (Eberhart, Simpson, and Dobbins, 1996).
The global (gbest) model converges fast but may converge to a local optimum, while the local (lbest) model converges more slowly but has a better chance of finding good solutions (Kennedy 1999; Kennedy, Eberhart and Shi 2001). Many researchers have worked on improving performance by designing and implementing different types of neighborhood structures in PSO:
- Kennedy and Mendes tested PSO with different neighborhoods
- Mendes and Kennedy proposed a fully informed particle swarm optimization algorithm
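The lbest model can be sketched by a ring neighborhood: each particle follows the best particle among its nearest neighbors on a ring. This is an illustrative sketch; the half-width parameter `k` is an assumed name, and a large enough `k` recovers the gbest model.

```python
def ring_lbest(fitness, k=1):
    """For particle i, return the index of the fittest particle among its
    ring neighbors {i-k, ..., i+k} (lbest model).  With k large enough to
    cover the whole swarm, this degenerates to the gbest model."""
    n = len(fitness)
    best = []
    for i in range(n):
        neigh = [(i + d) % n for d in range(-k, k + 1)]
        best.append(max(neigh, key=lambda j: fitness[j]))
    return best
```

With k=1 information spreads slowly around the ring, which is exactly what gives the lbest model its slower but more exploratory behavior.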
Parameters
The velocity change of a PSO particle consists of three parts: the "social" part, the "cognitive" part, and the momentum part. A PSO with a well-selected parameter set can perform very well. Shi and Eberhart (1998, 1999) introduced a linearly decreasing inertia weight into PSO, and later designed fuzzy systems to change the inertia weight nonlinearly (Shi and Eberhart 2001).
The constriction coefficient was developed by Clerc in the hope of ensuring PSO convergence (Clerc 1999; Clerc and Kennedy 2002):
v(k+1) = chi * [ v(k) + c1 * r1 * (pbest - x(k)) + c2 * r2 * (gbest - x(k)) ]   (3)
chi = 2 / | 2 - phi - sqrt(phi^2 - 4*phi) |   (4)
where phi = c1 + c2, phi > 4.
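Equation (4) is simple to compute directly; a minimal sketch, using the common choice c1 = c2 = 2.05 (so phi = 4.1) as a default:

```python
import math

def constriction(c1=2.05, c2=2.05):
    """Clerc's constriction coefficient chi (Eq. (4)); requires phi = c1 + c2 > 4."""
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

With c1 = c2 = 2.05 this yields chi of roughly 0.729, which multiplied through Eq. (3) gives the effective coefficients near 1.49 that are often used with the inertia-weight form.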
Hybrid PSO Algorithms
Some evolutionary computation techniques have been merged into the PSO algorithm:
- Applying a selection operation to PSO (Angeline, 1998)
- Applying a crossover operation to PSO (Løvbjerg, Rasmussen and Krink, 2001)
- Applying a mutation operation to PSO (Miranda and Fonseca, 2002; Løvbjerg and Krink, 2002)
Other evolutionary operations have also been incorporated:
- PSO, a GA, or a hill-climbing search can each be applied to a different sub-population of individuals (Krink and Løvbjerg, 2002)
- Differential evolution (DE) was combined with PSO (Hendtlass, 2001)
Non-evolutionary techniques have also been incorporated into PSO:
- A Cooperative Particle Swarm Optimizer (CPSO) was developed by Van den Bergh and Engelbrecht (2004)
- The population of particles is divided into subpopulations that breed within themselves, so that the diversity of the population is increased (Løvbjerg, Rasmussen and Krink, 2001)
- Deflation and stretching techniques (Parsopoulos and Vrahatis, 2004)
Applications
Constrained optimization problems:
- A straightforward approach converts the constrained optimization problem into an unconstrained one (Parsopoulos and Vrahatis, 2002)
- Preserve feasible solutions and repair the infeasible ones (Hu and Eberhart, 2002)
- Hybrid algorithms that usually employ some information decoding strategy (Ray and Liew, 2001)
Multiobjective optimization problems (MOPs):
- Convert a MOP into a single-objective optimization problem using weights (Parsopoulos and Vrahatis, 2002)
- Record a set of better-performing particles and move towards particles randomly selected from this set, instead of the neighborhood best of the original PSO, to maintain population diversity and therefore a good distribution along the Pareto front (Ray and Liew, 2002; Coello Coello and Lechuga, 2002)
- Optimize one objective at a time (Hu and Eberhart, 2002)
Evolving the weights and structures of neural networks:
- Evolve neural networks (Eberhart and Shi, 1998; Kennedy, Eberhart and Shi, 2001)
- Human tumor analysis (Eberhart and Hu, 1999)
- Leaf shape matching (J. X. Du and D. S. Huang, 2005)
- A hybrid PSO-backpropagation algorithm for feedforward neural network training (J. R. Zhang, D. S. Huang, et al., 2005)
2. Niche Techniques
The definition of a niche: the term is borrowed from ecology. Horn's definition: a form of cooperation around finite, limited resources, resulting in a lack of competition between such areas and causing the formation of a species for each niche. The goal of a niche technique is to find multiple solutions to an optimization problem: each resource can be considered a niche, and each subpopulation exploiting a niche can be considered a species.
Aim: find all optima (global and/or local) of the objective function
Motivation:
- Provide the decision maker not with a single optimal solution but with a set of good solutions
- Find all locally optimal solutions
Applications:
- Systems design
- DNA sequence analysis
- Multimodal function optimization
Ordinary optimization techniques
- Aim: find a global optimum
- Evolutionary approach: the population concentrates on the global optimum (a single powerful species)
- Premature convergence: bad
Niche optimization techniques
- Aim: find all (global/local) optima
- Evolutionary approach: different species are formed, each of which identifies an optimum
- Premature convergence: not so bad
2.1 The Origin of Niche Techniques
Preselection (Cavicchio, 1970)
- A modification to the replacement step of a classical GA
- Not all generated offspring are chosen for the new population: only offspring with higher fitness than their parents replace their parents in the next generation
- Like other traditional GAs, this technique does not keep stable species or subpopulations for many generations, and it converges to only one optimum
2.2 Crowding (De Jong, 1975)
The offspring replaces a similar individual in the population. An offspring is inserted into the population as follows:
- First, a group of CF (the crowding factor, i.e., the size of the group) individuals is selected at random from the population
- Second, the bit string of the offspring chromosome is compared with those of the CF individuals in the group using the Hamming distance or another measure
- The group member most similar to the offspring is replaced by the offspring
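The two-step insertion above can be sketched as follows, using the Hamming distance on bit strings as the slide suggests; `cf` is the crowding factor, and the helper names are illustrative:

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def crowding_insert(population, offspring, distance, cf=2):
    """De Jong's crowding: pick CF individuals at random and replace the
    one most similar to the offspring (modifies the population in place)."""
    group = random.sample(range(len(population)), cf)
    victim = min(group, key=lambda i: distance(population[i], offspring))
    population[victim] = offspring
```

Because only a small random group is inspected, the most similar individual in the whole population may be missed; this is exactly the "replacement error" criticized later in the slides.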
2.3 Sharing (Goldberg and Richardson, 1987)
In sharing, the fitness value of an individual is adjusted according to the number of individuals in its neighborhood, or niche. The new fitness is calculated as
f'_i = f_i / m_i   (5)
where the niche count m_i counts the individuals in a given neighborhood:
m_i = sum_j sh(d_ij)   (6)
The sharing function sh assumes a value between 0 and 1 for the distance d_ij between any two individuals i and j in the population, e.g., sh(d) = 1 - (d/sigma_share)^alpha for d < sigma_share, and 0 otherwise.
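Equations (5)-(6) translate directly into code. A minimal sketch, assuming the triangular sharing function given above; `sigma` and `alpha` defaults are illustrative:

```python
def shared_fitness(population, fitness, dist, sigma=0.1, alpha=1.0):
    """Goldberg-Richardson fitness sharing (Eqs. (5)-(6)): divide each raw
    fitness f_i by the niche count m_i = sum_j sh(d_ij), where
    sh(d) = 1 - (d/sigma)^alpha for d < sigma and 0 otherwise."""
    shared = []
    for i, fi in enumerate(fitness):
        m = sum(max(0.0, 1.0 - (dist(population[i], q) / sigma) ** alpha)
                for q in population)
        shared.append(fi / m)   # m >= 1 because each individual counts itself
    return shared
```

The double loop over the population is what gives sharing its O(n^2) cost per generation, the complexity flaw noted on the next slide.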
2.4 The Flaws of Crowding and Sharing
Crowding technique:
- The algorithm is likely to lose some of the optima because of replacement errors
Sharing technique:
- The computational complexity is on the order of O(n^2)
- It depends on prior knowledge of the multimodal problem (a single niche radius must be specified in advance)
2.5 Improvements to the Crowding Technique
Deterministic Crowding (Mahfoud, 1992)
- Individuals are paired to mate at random
- Each of the two offspring is paired with one of the parents; this pairing is not done randomly, but so that each offspring is paired with its most similar parent
- Each offspring is then compared with its paired parent: the individual with the higher fitness stays in the population and the other is eliminated
Restricted Tournament Selection (Harik, 1995)
- Both parents are selected at random from the population
- An offspring is inserted by choosing a group of individuals from the population at random with replacement; the individual in the group most similar to the offspring is selected
- The offspring replaces the chosen individual if its fitness value is higher; otherwise, the offspring is eliminated
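The insertion rule can be sketched as follows; the window size and function names are illustrative assumptions, not taken from the slides:

```python
import random

def rts_insert(population, fitness, offspring, distance, window=4):
    """Harik's restricted tournament selection: draw `window` individuals
    with replacement, find the one most similar to the offspring, and let
    the offspring replace it only if the offspring is fitter."""
    group = [random.randrange(len(population)) for _ in range(window)]
    rival = min(group, key=lambda i: distance(population[i], offspring))
    if fitness(offspring) > fitness(population[rival]):
        population[rival] = offspring
```

Unlike plain crowding, the offspring can lose the tournament and be discarded, which protects established niche members from being overwritten by weaker newcomers.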
2.6 Improvements to the Sharing Technique
Cluster analysis (Yin and Germay, 1993)
- Reduces the complexity of sharing to O(n)
- After each generation, an adaptive clustering algorithm dynamically groups the individuals in the population into a number of clusters
- These clusters are then used to compute the sharing value and update the fitness of each individual in the population
2.7 Niche Techniques Based on Other Algorithms
Multipopulation Differential Evolution Algorithm (Zaharie, 2004)
- Multiple subpopulations are carefully initialized
- A multi-resolution approach is then used to avoid specifying a niche radius
Niching Particle Swarm Optimization (Brits, 2002)
- The algorithm uses only the cognitive model to train a main swarm
- When a particle's fitness shows very little change over a small number of iterations, the algorithm creates a sub-swarm around that particle in a small area; the sub-swarms can then be trained to locate multiple solutions
2.8 Sequential and Parallel Niche Techniques
Sequential niche (SN) technique (Beasley, 1993)
- Iterative application of a GA
- At each iteration an optimum is identified
- The fitness function is then modified based on the optima already found
Parallel niche techniques
- Divide the population into communicating subpopulations that evolve in parallel
- Each subpopulation corresponds to a species whose aim is to populate a niche in the fitness landscape and identify an optimum
2.9 Disadvantages of the Sequential Niche (SN) Technique
Some disadvantages of sequential niche techniques (Mahfoud, 1995):
- Parallel niche techniques are faster than the SN technique
- As the SN technique accumulates optima, the remaining optima become increasingly difficult to locate
- The SN technique is likely to locate the same solutions repeatedly
- Parallel niche techniques can easily be implemented on parallel machines, but SN techniques cannot
2.10 Defects of Current Niche Techniques
The drawbacks of these niche methods (including their improved versions) cannot be completely avoided:
- Replacement errors (crowding techniques)
- Dependence on prior knowledge (sharing techniques)
The reason: the lack of an effective niche identification technique (NIT). What would happen if there were an effective niche identification technique? Some of these problems could be easily solved.
2.11 Niche Identification Techniques (NITs)
Analyze the topological structure of a multimodal function to identify a niche:
- Hill-valley function (Ursem, 1999): this function takes multiple samples between any two points of the search space; if the fitness of any interior sample is smaller than the lower fitness of the two points, the function judges the two points to belong to different niches
- Niche identification techniques (Lin et al., 2002)
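A minimal 1-D sketch of the hill-valley test, assuming maximization; the interior sample points are an illustrative choice, not prescribed by the slides:

```python
def hill_valley(f, a, b, samples=(0.25, 0.5, 0.75)):
    """Ursem's hill-valley test (1-D, maximization): a and b are judged to
    lie in different niches if some interior sample of the segment between
    them is less fit than the worse of the two endpoints."""
    floor = min(f(a), f(b))
    for t in samples:
        if f(a + t * (b - a)) < floor:
            return True    # a valley separates a and b: different niches
    return False           # same niche, as far as these samples can tell
```

Each call costs one function evaluation per sample, which is exactly the "extra function evaluations" problem discussed on the next slide; too few samples can also miss a narrow valley, producing the false judgments mentioned there.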
2.12 Defects of NITs
Key defects: these NITs usually need many extra function evaluations, and false judgments may occur. Moreover, these NITs cannot be directly applied in the two popular niche techniques mentioned above:
- Those niche techniques use a single large population to explore the search space
- If a NIT is employed, the total number of function evaluations will increase dramatically
- In general, an excessive number of function evaluations is unacceptable
Open Problems
- How to find a more effective and efficient NIT (the next research objective)
- How to decrease the extra function evaluations when using existing NITs? Divide a large population into multiple sub-populations, and employ new methods that can effectively reduce the function evaluations
3. Novel Adaptive SN Technique
A dilemma for current niche techniques: on the one hand, a niche technique must sacrifice some global search ability by confining individuals to exploring only their own niches (the sharing technique enforces a fixed niche radius; the crowding technique replaces only the most similar individual). On the other hand, a niche technique also needs some global exploring information to guide every individual in exploring the whole search space.
Exploring Information Exchange
If the population size is sufficiently large, exploring-information exchange is not necessary. For difficult problems, however, good communication of exploring information across the whole population is very important.
The Unique Advantages of SN
Traditional parallel niche techniques seem to have inadequate exploring-information exchange among the population. By contrast, the SN technique has a good exploring-information exchange mechanism: one sub-population cannot explore a space already searched by another sub-population. From this point of view, the SN technique has its own unique advantages.
3.1 Novel Adaptive SN Technique
The basic idea of the adaptive SN technique:
- Use multiple sub-swarms to detect optimal solutions sequentially
- To encourage a new sub-swarm to fly to a new place in the search space, the algorithm modifies the raw fitness function
- The hill-valley function is used to determine how to change the fitness of a particle in the currently running sub-swarm
- A sequential dynamic niche-radius update algorithm is used to decrease the extra function evaluations
3.2 Sequential Dynamic Niche Radius Algorithm
1. Create and initialize a sub-swarm PSO with a large niche radius
2. Train the sub-swarm until convergence
3. Repeat:
4.   Create and initialize a new sub-swarm with a large niche radius
5.   For every particle in the new sub-swarm:
6.     If the distance between the particle and the best particle of a previously launched sub-swarm is smaller than that niche radius:
7.       Use the hill-valley function to judge whether or not they belong to the same niche
8.       If they do not belong to the same niche, update (shrink) the niche radius; otherwise, modify the fitness of the particle
9.   Train the sub-swarm
10. Until all sub-swarms have converged
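The fitness-modification and radius-update branches of the loop might be sketched as follows. This is a hypothetical sketch: the helper names, the flat penalty constant, and the shrink-to-observed-distance rule are illustrative assumptions, not taken from the slides.

```python
def derate(raw, x, found_peaks, radius, dist, same_niche, penalty=1e6):
    """Fitness seen by the current sub-swarm: a particle inside the current
    niche radius of an already-found peak, which the hill-valley judgment
    (`same_niche`) confirms shares that niche, has its fitness derated so
    the sub-swarm is driven toward unexplored regions."""
    fit = raw(x)
    for peak in found_peaks:
        if dist(x, peak) < radius and same_niche(x, peak):
            fit -= penalty   # treat this region as already explored
    return fit

def shrink_radius(radius, d):
    """If two points closer than the radius turn out to lie in different
    niches, the radius was too large; shrink it to the observed separation."""
    return min(radius, d)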
3.3 Experimental Results
Figure: the final niche radius on Shekel's Foxholes function, where 25 sub-swarms are used.
The adaptive SN PSO algorithm can find the optima of several multimodal test functions without any prior knowledge. However, as Mahfoud pointed out, the running speed of a sequential technique is slow, so we then implemented a parallel niche PSO algorithm. (Neurocomputing, accepted)
4. Multi-Sub-Swarm PSO Algorithm
Multiple sub-swarms run simultaneously. Different sub-swarms can compete with each other: the winner of a competition continues to explore its original region, while the loser is obliged to explore another region. To prevent two sub-swarms from detecting the same optimum, a particle intruding into another sub-swarm's niche is penalized.
4.1 MSSPSO Algorithm
1. Create and initialize N PSO sub-swarms with a large niche radius
2. If the best particles of two different sub-swarms are located in the same niche, compare their fitness; the sub-swarm with the smaller fitness is re-initialized
3. For every particle near a memorized position X of another sub-swarm, decrease the fitness of the particle
4. Train every sub-swarm and update the best particle of each sub-swarm
5. Update and compensate the niche radius
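The competition in step 2 might look like the following hypothetical sketch, where `same_niche` stands in for the niche test (e.g., the hill-valley function) and `reinit` supplies a fresh random sub-swarm; both names are illustrative:

```python
def compete(swarms, bests, best_fits, same_niche, reinit):
    """MSSPSO competition (a sketch of step 2): when the best particles of
    two sub-swarms occupy the same niche, the sub-swarm whose best fitness
    is smaller loses and is re-initialized elsewhere."""
    n = len(swarms)
    for i in range(n):
        for j in range(i + 1, n):
            if same_niche(bests[i], bests[j]):
                loser = i if best_fits[i] < best_fits[j] else j
                swarms[loser] = reinit()
```

Re-initializing the loser, rather than merging the two swarms, keeps the number of sub-swarms constant while forcing exploration of new regions.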
4.2 Experimental Functions
Four multimodal benchmark test functions, F1-F4, were used (Eqs. (7)-(10)).
4.3 Performance Criteria Maximum Peak Ratio: The sum of the fitness values of the local optima identified by the niche technique divided by the sum of the fitness values of the actual optima of a multimodal problem (Miller and Shaw, 1996)
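The maximum peak ratio can be stated directly in code; a minimal sketch, where the fitness values at the located and the actual peaks are passed in as lists:

```python
def maximum_peak_ratio(found, actual):
    """Maximum peak ratio (Miller and Shaw, 1996): the sum of the fitness
    values at the optima the niche technique located, divided by the sum
    at the actual optima.  It equals 1 only when every peak is matched."""
    return sum(found) / sum(actual)
```

A ratio below 1 indicates that some peaks were missed or located only approximately.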
Chi-Square-Like Performance Criterion: the chi-square-like criterion measures the deviation between the population distribution and an ideal, proportionally populated distribution (Deb and Goldberg, 1989)
Number of Fitness Function Evaluations
In most real-world applications, the fitness function is computationally expensive to evaluate, so a niche technique must keep the number of function evaluations within a certain range.
Adaptive Ability
Most niche techniques are very sensitive to some of their parameters, and an inappropriate parameter setting can degrade performance. Adaptive ability is therefore another important measure for a niche technique.
Maximum Peak Ratio
Test function    Maximum Peak Ratio
F1               1
F2               1
F3               -
F4               -
Chi-square-like deviation results. (IEEE TEC, to be submitted)
Number of Function Evaluations (NFE)
Test function    Population size    No. of sub-swarms    Ordinary NFE
F1               -                  -                    -
F2               -                  -                    -
F3               -                  -                    -
F4               -                  -                    -
5. Conclusions
- The proposed method closely imitates a natural ecosystem, in which different sub-populations can compete with each other
- The proposed method constructs a dynamic niche radius algorithm (DNRA), which can greatly reduce the extra function evaluations
- The proposed method integrates the sequential technique with the parallel one
- The proposed method performs well
- The proposed method has strong adaptive ability
Future Work:
- How to choose a more efficient and effective niche identification technique
- A constructive method must be built to cover niches of any shape
- How to apply the proposed method to hard real-world multimodal problems
The End