Combinatorial Optimization on the Computational Grid: Experiments on Grid5000
  Nouredine Melab, member of the Grid5000 steering committee
  Laboratoire d’Informatique Fondamentale de Lille
  Parallel Cooperative Optimization Research Group, INRIA DOLPHIN Project
Combinatorial optimization problems
  High-dimensional and complex optimization problems arise in many areas of industrial concern
  [Formulas: mono-objective and multi-objective problem statements, with constraints]
  Parallel hybrid optimization methods provide effective solutions efficiently, but they remain insufficient for large problems
  Hence the need for large-scale parallelism (Grid computing)
A taxonomy of optimization methods
  Exact algorithms (optimal solutions for small problem instances)
    Branch and X
    Dynamic Programming
    A*
  Heuristics (near-optimal solutions for large problem instances)
    Specific Heuristics
    Meta-heuristics
      Single Solution: Local Search, Simulated Annealing, Tabu Search
      Population of solutions: Evolutionary Algorithms, Scatter, Swarm search
Design and implementation of Grid-based algorithms for solving challenging problems in combinatorial optimization
  Meta-heuristics (near-optimal): parallel hybrid design, implementation
  Exact algorithms: parallel design, implementation
  Cooperation
  Protein Structure Prediction (supported by ANR-GRID DOCK)
  Flow-Shop scheduling problem (supported by ACI-GRID DOC-G)
  Combinatorial Optimization on the Computational Grid, Experiments on Grid5000 (supported by ANR-GRID CHOC)
Meta-heuristics: Parallel models and hybridization mechanisms
  Parallel models
    They improve efficiency and effectiveness
    Population-based meta-heuristics: island model, parallel evaluation of the population, parallel evaluation of a single solution
    Single solution-based meta-heuristics: multi-start model, parallel exploration of the neighborhood, parallel evaluation of a single solution
  Hybridization mechanisms
    They combine different methods for better robustness and effectiveness, but are CPU-time intensive
  N. Melab, E-G. Talbi, S. Cahon, E. Alba and G. Luque. Parallel Meta-heuristics: Algorithms and Frameworks. Chapter 6 in “Parallel Combinatorial Optimization”, Wiley Series on Parallel and Distributed Computing, ISBN: , Nov 2006.
“Gridification” of parallel hybrid meta-heuristics
  Major properties of computational grids
    Multi-administrative domain, heterogeneity, dynamic availability of resources, large scale
  Major adaptations of the different models and mechanisms
    Asynchronous design and implementation
    Granularity management and load balancing
    Checkpointing-based fault tolerance (a memory for each model)
    Adaptation of the parameters of each model (e.g. migration topology for the island model)
  N. Melab, S. Cahon and E-G. Talbi. Grid Computing for Parallel Bioinspired Algorithms. Journal of Parallel and Distributed Computing (JPDC), Elsevier Science, Vol.66(8), Pages , 2006.
Our contributions
  Evolving Objects framework (EO): European project (Geneura Team, INRIA, LIACS)
  Multi-Objective EO (MOEO) for the design of multi-objective evolutionary algorithms
  Moving Objects (MO) for the design of local search algorithms
  ParadisEO (PARAllel and DIStributed Evolving Objects) for parallel hybrid metaheuristics
  Transparent use of
    Message passing (MPI, PVM): clusters, networks of workstations
    Multi-programming (PThreads): shared-memory multi-processors (SMP)
    Parallel distributed computing: clusters of SMPs (CLUMPS)
    Grid computing: Condor-MW and Globus (MPICH-G2)
  [Diagram: EO, MO and MOEO layered on top of PVM, PThreads, MPI (LAM, CH), Condor-MW and Globus]
  S. Cahon, N. Melab and E-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, Elsevier Science, Vol.10(3), pages , May
ParadisEO-G4: ParadisEO on Globus 4
  Design and implementation
    Gridification of the parallel models and hybridization mechanisms provided in ParadisEO
    MPICH-G2 as the communication library
  Deployment on the computational Grid (Grid5000)
    Building of a system image for Globus 4 including MPICH-G2
    Virtual Globus Grid on Grid5000 for the Grid-based deployment of the parallel hybrid meta-heuristics provided in ParadisEO
Protein Structure Prediction on the Grid: Modelling
  The problem consists in finding the ground-state (stable tertiary) conformation of a protein from its primary structure, a sequence of amino-acids (residues)
  Modelled as a bi-objective optimization problem
    Candidate solutions: molecular conformations (geometries), encoded as vectors of torsion angles
    Goal: molecular conformations with lower free energies (bonded atoms and non-bonded atoms)
Protein Structure Prediction on the Grid: Complexity and landscape analysis
  Complexity
    For a molecule of 40 residues with 10 conformations per residue, … conformations are obtained
    On average, … years are required at … conformations explored per second!
  Landscape analysis
    Multi-modal landscape
    Need for parallel hybrid (global and local) meta-heuristics and Grid computing
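The size of the search space follows directly from the figures on the slide: 10 conformations for each of 40 residues gives 10^40 candidate conformations. The sketch below makes that arithmetic explicit. The exploration rate is a hypothetical value chosen only for illustration, since the rate on the slide is not recoverable, and the 365-day year is likewise an assumption.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def search_space_size(residues, conformations_per_residue):
    """Number of conformations when every residue chooses independently."""
    return conformations_per_residue ** residues


def years_to_enumerate(size, rate_per_second):
    """Years needed to visit every conformation at a constant rate."""
    return size / (rate_per_second * SECONDS_PER_YEAR)


size = search_space_size(residues=40, conformations_per_residue=10)
hypothetical_rate = 10 ** 14  # conformations per second, illustrative only
years = years_to_enumerate(size, hypothetical_rate)
print(f"{size:.0e} conformations, about {years:.1e} years at the assumed rate")
```

Even at such an optimistic rate, exhaustive enumeration is out of reach, which is the argument the slide makes for meta-heuristics combined with Grid computing.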
Parallel asynchronous hierarchical hybrid meta-heuristic
  Cooperative GAs (Island model)
  High-level co-evolutionary hybridization
  Parallel evaluation of the population
  Multi-start model
  [Figure: Genetic Algorithm population; Local Search turns an individual (∂1, ∂2, …, ∂n) into an optimized individual (∂'1, ∂'2, …, ∂'n)]
  A-A. Tantar, N. Melab, E-G. Talbi, O. Dragos and B. Parain. A Parallel Hybrid Genetic Algorithm for Protein Structure Prediction on the Computational Grid. FGCS, Elsevier Science, Vol.23(3), ,
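The hybridization idea, a genetic algorithm whose individuals are refined by local search, can be sketched sequentially as follows. This is a minimal illustration under stated assumptions, not the ParadisEO implementation: the `energy` function is a toy multi-modal stand-in for a real force field, all names and parameters are hypothetical, and the parallel parts (island model, parallel evaluation, multi-start local searches) are omitted.

```python
import math
import random


def energy(angles):
    """Toy multi-modal objective (Rastrigin-like), NOT a molecular force field."""
    return sum(a * a - 10.0 * math.cos(2.0 * math.pi * a) + 10.0 for a in angles)


def local_search(angles, rng, steps=50, step_size=0.1):
    """Simple hill climbing: keep a random perturbation only if it improves."""
    best, best_e = list(angles), energy(angles)
    for _ in range(steps):
        cand = [a + rng.uniform(-step_size, step_size) for a in best]
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best


def hybrid_ga(n_angles=4, pop_size=20, generations=30, seed=0):
    """Elitist GA in which every child is refined by local search."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(n_angles)] for _ in range(pop_size)]
    initial_best = min(energy(ind) for ind in pop)
    for _ in range(generations):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_angles)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(n_angles)
            child[i] += rng.gauss(0.0, 0.5)       # mutation of one angle
            children.append(local_search(child, rng))
        pop = parents + children
    best = min(pop, key=energy)
    return initial_best, energy(best), best


initial_best, final_best, best = hybrid_ga()
print(f"best energy: {initial_best:.3f} -> {final_best:.3f}")
```

In a grid setting, the calls to `local_search` and `energy` are the natural units to distribute, since each one is independent of the others within a generation.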
Preliminary experimental results on Grid5000
  Implementation with ParadisEO-G4
  Protein: Tryptophan-cage from Protein Data Bank (PDB - 1L2Y)
  Grid5000: 7 sites, avg. 800 CPUs
  Execution time: 1h; cumulative time: 1 month
  Average quality improvement: 62%
Interconnection Grid5000-DAS
  Benefits
    More resources for dealing with very large proteins with grid-based meta-heuristics
    New scientific challenge: scalability of ParadisEO-G
  Requirements
    Need for a virtual Globus grid between Grid5000 and DAS
      Common certification authority?
    Extend the default run time of jobs on DAS
      Deploying the virtual Globus grid takes about 10 minutes
      That leaves only 5 minutes for the combinatorial optimization process on DAS!
Parallel models for exact optimization (B&B inspired)
  B&B = exploration + bounding of tree nodes
  Parallel models
    Parallel multi-parametric model
    Parallel exploration of the search tree
    Parallel evaluation of the bounds
    Parallel evaluation of a single bound/solution
  Parallel exploration of the search tree
    Massive parallelism needing a computational grid
    Gridification is required
The proposed approach: objectives
  Efficient work distribution during the exploration
    Need for low-cost communication of work units
  Efficient checkpointing-based fault tolerance
    Search for an exact solution in a volatile environment
    Low-cost communication and storage of work units
  Efficient termination detection
    May be implicit
Principles of the approach
  The approach uses a special coding …
    Node number
    Work unit (collection of nodes) = an interval
    [Figure: intervals [0,2] and [3,5], which together form [0,5]]
  The approach is Dispatcher-Worker, based on the work stealing paradigm
    Dispatcher: maintains a pool of work units (intervals) and the global solution found so far
    Worker: performs B&B on a given interval and updates the global solution
  Work distribution and check-pointing
    Communication of intervals (two numbers)
    Two efficient operators: folding and unfolding of intervals
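The interval idea can be illustrated with a small sketch. The slides do not spell out the coding or the folding and unfolding operators, so everything below is an assumption made for illustration: leaves of a permutation tree are numbered in lexicographic order (factorial number system), `rank` and `unrank` are hypothetical stand-ins that convert between a permutation and its number, and the dispatcher splits the largest interval in half on each work request. The worker here enumerates its interval exhaustively; a real B&B worker would prune with bounds.

```python
from math import factorial


def unrank(number, n):
    """Leaf number -> permutation of range(n), lexicographic order."""
    items = list(range(n))
    perm = []
    for i in range(n - 1, -1, -1):
        idx, number = divmod(number, factorial(i))
        perm.append(items.pop(idx))
    return perm


def rank(perm):
    """Permutation -> leaf number (inverse of unrank)."""
    items = list(range(len(perm)))
    number = 0
    for i, p in enumerate(perm):
        idx = items.index(p)
        number += idx * factorial(len(perm) - 1 - i)
        items.pop(idx)
    return number


class Dispatcher:
    """Keeps a pool of closed intervals [a, b] of leaf numbers."""

    def __init__(self, n):
        self.pool = [(0, factorial(n) - 1)]

    def request_work(self):
        """Split the largest interval; keep the left half, hand out the right."""
        if not self.pool:
            return None  # empty pool: nothing left to distribute
        i = max(range(len(self.pool)),
                key=lambda k: self.pool[k][1] - self.pool[k][0])
        a, b = self.pool[i]
        if a == b:
            return self.pool.pop(i)
        mid = (a + b) // 2
        self.pool[i] = (a, mid)
        return (mid + 1, b)


def explore(interval, n, cost):
    """Worker stand-in: exhaustive scan of the interval (no pruning)."""
    a, b = interval
    best = None
    for k in range(a, b + 1):
        perm = unrank(k, n)
        c = cost(perm)
        if best is None or c < best[0]:
            best = (c, perm)
    return best


dispatcher = Dispatcher(3)            # 3! = 6 leaves, interval [0, 5]
first = dispatcher.request_work()     # right half goes to the first worker
print(first, dispatcher.pool)
```

With this coding a work unit or a checkpoint is just two integers, whatever the number of tree nodes it covers, which is what keeps communication and storage cheap.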
The Flow Shop Scheduling Problem
  N jobs to be scheduled on M machines
  A machine cannot be assigned to two jobs (colors) simultaneously
  Jobs (colors) must be scheduled in the same order on all machines
  One objective must be minimized: Cmax, the makespan (total completion time)
  [Figure: Gantt chart of 4 jobs on 3 machines M1, M2, M3]
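The makespan of a given job order follows from the two constraints above: a job starts on a machine once that machine is free and the job has finished on the previous machine. The sketch below computes Cmax for a permutation and, for a tiny instance, finds the best order by brute force. The processing times are made up for illustration (4 jobs on 3 machines, like the figure), and the function names are hypothetical.

```python
from itertools import permutations


def makespan(perm, p):
    """Cmax of a permutation flow shop; p[j][m] is the time of job j on machine m."""
    n_machines = len(p[0])
    completion = [0] * n_machines  # completion time of the last job on each machine
    for j in perm:
        completion[0] += p[j][0]
        for m in range(1, n_machines):
            # Wait for both the machine and the job's previous operation.
            completion[m] = max(completion[m], completion[m - 1]) + p[j][m]
    return completion[-1]


def best_schedule(p):
    """Exhaustive search, only viable for very small instances."""
    return min((makespan(perm, p), perm) for perm in permutations(range(len(p))))


# Illustrative processing times: 4 jobs (rows) on 3 machines (columns).
times = [[3, 2, 4],
         [2, 4, 1],
         [4, 1, 3],
         [1, 3, 2]]

print(makespan([0, 1, 2, 3], times))  # -> 15
print(best_schedule(times))
```

Brute force visits N! permutations, which is why instances with tens of jobs call for Branch and Bound with good bounds and large-scale parallelism.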
A grid of more than 2000 processors
  [Figure: network diagram. Labels: network of the campus of Université de Lille1 (123 FIL (Lille1), 170 IUT A); Grid5000 node at Lille; RENATER; other sites of Grid'5000; Grid'5000 front-end with IP forwarding and NAT; dispatcher on a computation node]
Experimental results
  Standard Taillard's benchmark: Ta jobs on 20 machines
  Best known solution: 3681, Ruiz & Stutzle, 2004
  Exact solution: 3679, Mezmaz, Melab & Talbi, 2006
  Running wall clock time: 25 days 46 min
  CPU time on a single processor: 22 years 185 days 16 hours
  Avg. num. of exploited processors: 328
  Maximum number of exploited processors:
  Parallel efficiency: 97 %
  Bordeaux (88), Orsay (360), Sophia (190), Lille (98), Toulouse (112), Rennes (456), Univ. Lille1 (304)
  M. Mezmaz, N. Melab, E-G. Talbi. A Grid-enabled Branch and Bound Algorithm for Solving Challenging Combinatorial Optimization Problems. Research Report, INRIA 5945, July 2006
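As a quick consistency check on the reported times, the ratio of single-processor CPU time to wall clock time can be computed directly. The sketch assumes 365-day years, a convention the slide does not state. Under that assumption the ratio comes out at about 328, in line with the reported average number of exploited processors.

```python
from datetime import timedelta

# Reported figures; the 365-day year is an assumption.
cpu_time = timedelta(days=22 * 365 + 185, hours=16)
wall_clock = timedelta(days=25, minutes=46)

ratio = cpu_time / wall_clock  # timedelta / timedelta yields a float
print(f"CPU time / wall clock time = {ratio:.1f}")
```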
Interconnection Grid5000-DAS
  Benefits
    More resources for solving larger problem instances efficiently and optimally with grid-based combinatorial optimization
    New scientific challenge: scalability (limits and solutions)
      The dispatcher has never crashed on Grid5000 (up to 2500 processors)
  Requirements
    Avoiding the special configuration of the front-end, to allow transparent inter-grid communications between the dispatcher and the workers
      Viewing DAS as a Grid5000 site and vice versa?
    Best-effort reservation mode in DAS
      Long-running problems
      Using the nodes as long as they are not requested for reservation