CAGE: A Tool for Parallel Genetic Programming Applications
Gianluigi Folino


Outline
- Introduction
  - What is GP
  - How GP works
  - Interesting results
- Parallel GP
  - Why Parallel GP
  - Parallel models for evolutionary algorithms
  - Implementation of CAGE (cellular model)
  - Convergence analysis
  - Scalability analysis
- Future Works (some ideas)
  - Data Mining and Classification
  - Grid Computing
  - Information Retrieval

Problem Solving and GP
Genetic Programming is a general approach to problem solving: a heuristic that searches for a global optimum in a search space. It is a weak method, incorporating little knowledge about the problem to be solved, and it mimics the process of natural evolution to let complex structures (solutions) emerge. In a population of computer programs (candidate solutions), only the best-fit programs survive evolution.

Individual representation and search space
The user chooses the functions and the terminals necessary to solve a problem. The search space consists of all the possible programs that can be generated recursively from the chosen functions and terminals. A computer program (an individual) is represented as a parse tree, e.g.:
if (time > 10) ris = …; else ris = …;

How GP works
Genetic programming uses four steps to solve problems:
I. Generate an initial population of random compositions of the functions and terminals of the problem (computer programs).
II. Execute each program in the population and assign it a fitness value according to how well it solves the problem.
III. Create a new population of computer programs by applying genetic operators (mutation, crossover, etc.) to selected trees (the best-fit trees are selected with higher probability).
IV. The best computer program that appeared in any generation, the best-so-far solution, is designated as the result of genetic programming.

Flow Chart of GP [flow-chart figure]

Crossover example [figure: subtree crossover between the parse trees of expressions such as x + y + 3 and x / 2y]

Mutation example [figure: a subtree of x + y + 3 is replaced by a new randomly generated subtree, giving x + y + y * (x / 2)]

Preparatory steps
- Determine the representation scheme:
  - set of terminals (e.g. {x, y, z})
  - set of functions (e.g. {=, +, -, *, /})
- Determine the fitness measure.
- Determine the parameters:
  - population size, number of generations
  - number of atoms in a program
  - probability of crossover, mutation, reproduction
- Determine the criterion for terminating a run (maximum number of generations, or an exact solution found).

GP Best Results
- Creation of four different algorithms for the transmembrane segment identification problem for proteins.
- Automatic decomposition of the problem of synthesizing a crossover filter.
- Synthesis of 60 and 96 decibel amplifiers.
- Creation of a soccer-playing program that ranked in the middle of a field of 34 human-written programs in the RoboCup 1998 competition.

GP Best Results and atypical fitness evaluation [figures: Art and GP; Board Games and GP]

Why Parallel GP
- The search for solutions is implicitly parallel.
- Hard problems require large populations.
- Time and memory requirements of GP (scalability).
- Locality in the selection operator can help maintain diversity (convergence).

Models of Parallel GP (Global)
[figure: one master process connected to several slaves]
- No distribution of the population.
- Master: selection, crossover, mutation.
- Slaves: fitness evaluation.
- Convergence is the same as sequential GP.

Models of Parallel GP (Island and Cellular) [figures: island model; cellular model]

CAGE
- CAGE (CellulAr GEnetic programming tool) is a parallel tool for the development of genetic programs (GP).
- It implements the cellular model on a general-purpose distributed-memory parallel computer.
- CAGE is written in C, using the MPI libraries for communication between processors.
- It can also run on a PC with the Linux operating system.

CAGE Implementation
[figure: grid of individuals partitioned among Processor 0, Processor 1 and Processor 2; each cell holds a single individual]
The population is arranged in a two-dimensional grid, where each point represents a program tree. CAGE uses a one-dimensional domain decomposition along the x direction.

CAGE Implementation
For each element in the grid:
- Mutation and the other unary operators are applied to the current tree.
- Crossover chooses as second parent the best tree among the neighbours (Moore neighbourhood).
- A replacement policy is applied: the chosen individual is put in the new population in the same position as the old one.
There are three replacement policies (applied to the result of crossover):
- Greedy
- Direct
- Probabilistic (simulated annealing)

Convergence analysis
CAGE was tested on some standard problems: Symbolic Regression, Discovery of Trigonometric Identities, Symbolic Integration, Even-5 Parity, Artificial Ant and Royal Tree. We averaged the tests over 20 runs and used a population of 3200 individuals (1600 for Symbolic Regression and Integration). [table: parameters used in the experiments; the replacement method was Greedy for CAGE and selection was fitness-proportionate for canonical GP]

Symbolic Regression
Symbolic regression consists in searching for a non-trivial mathematical expression that, given a set of values x_i for the independent variable, always assumes the corresponding values y_i for the dependent variable. The target function for our experiments is: x^4 + x^3 + x^2 + x. A sample of 20 points with the x_i in the range [-1, 1] was chosen to compute the fitness.

Symbolic Regression [figures: CAGE vs canonical GP; different population sizes]

Symbolic Integration
Symbolic integration consists in searching for a symbolic mathematical expression that is the integral of a given curve. The target function for our experiments was: cos x + 2x + 1. A sample of 50 points with x_i in the range [0, 2π] was chosen to compute the fitness.

Symbolic Integration [figures: CAGE vs canonical GP; different population sizes]

Even-4 and Even-5 Parity
In the Even-4 and Even-5 Parity problems we want to obtain a Boolean function that receives 4 (respectively 5) Boolean variables and returns true only if an even number of variables is true. The fitness cases are the 2^4 (2^5) combinations of the variables. The fitness is the Hamming distance between the goal function and the solution found, summed over all fitness cases.

Even-4 Parity [figures: CAGE vs canonical GP; different population sizes]

Even-5 Parity [figures: CAGE vs canonical GP; different population sizes]

Ant (Santa Fe Trail)
The ant problem consists in finding the best strategy for an ant that wants to eat all the food contained in a 32x32 matrix. We used the Santa Fe trail, containing 89 pieces of food. The fitness is the number of pieces not eaten within a fixed number of moves.

Ant (Santa Fe Trail) [figures: CAGE vs canonical GP; different population sizes]

Royal Tree
The Royal Tree problem is composed of a series of functions a, b, c, ... with increasing arity. The fitness is the score of the root. Each function computes its score by summing the weighted scores of its children; if a child is not a perfect tree, its score is multiplied by a penalty factor. The problem has a unique solution; we stopped at the level-e tree (326 nodes and a score of …).

Royal Tree [figures: CAGE vs canonical GP; different population sizes]

Related Work
- No approaches using the grid (cellular) model, and only a few using the island model, can be found in the literature.
- A 1,000-Pentium Beowulf-style cluster computer for genetic programming.
- Niwa and Iba describe a parallel island model realised on a MIMD supercomputer and show experimental results for three different topologies: ring, one-way torus and two-way torus (the best).
- Punch discusses the conflicting results obtained with multiple populations on the Ant and Royal Tree problems.
- We ran CAGE with the same parameters as these two island models in order to compare convergence.

Niwa and Iba (cos 2x)
[figures: CAGE; Niwa and Iba]
We obtained a fitness value of 0.1 at the 20th generation, compared with the 62nd generation for Niwa (ring topology).

Niwa and Iba (Even-4 Parity)
[figures: CAGE; Niwa and Iba]
At the 100th generation Niwa reports a fitness of 1.1, while our approach is very close to 0.

Convergence Analysis (fitness diffusion) [figure]

Scalability (Isoefficiency metric)

Scalability results [figure]

Classification (Preliminary results)
- Genetic programming is suitable for data classification.
- Good capacity to generalise.
- The dimension of the solutions is smaller than See5's.
- Needs large populations for real datasets.
- Bagging and boosting to partition datasets.

Decision Trees and GP
- Nodes → attributes → functions
- Arcs → attribute values → arity of the functions
- Leaves → classes → terminals

Grid and Parallel Asynchronous GP
- Using the grid for supercomputing (idle cycles, etc.)
- Computational grids need applications.
- Problems: different computational power, … (drawbacks of classical parallel algorithms: need for large bandwidth, synchronism, etc.)
- Parallel asynchronous cellular GP

Information Retrieval
Query expansion and a specific-domain search engine.
Problem: how do we combine the words? Answer: use GP to add keywords with operators (AND, NOT, OR, NEAR).
Alternative: specify the query using natural language and refine it with operators.