
1. Robust Asynchronous Optimization for Volunteer Computing Grids
Travis Desell, Malik Magdon-Ismail, Boleslaw Szymanski, Carlos Varela, Heidi Newberg, Nathan Cole
Department of Computer Science; Department of Physics, Applied Physics and Astronomy
Rensselaer Polytechnic Institute
E-Science 2009, December 12, 2009, Oxford, UK

2. Overview
- Introduction: motivation; driving scientific application
- Asynchronous genetic search: why asynchronous?; methodology; recombination
- Particle swarm optimization
- Generic optimization framework: approach; architecture
- Results: convergence rates; re-computation rates
- Conclusions & future work
- Questions?

3. Motivation
- Scientists need easily accessible distributed optimization tools
- Distribution is essential for scientific computing:
  - Scientific models are becoming increasingly complex
  - Rates of data acquisition far exceed increases in computing power
- Traditional optimization strategies are not well suited to large-scale computing: they lack scalability and fault tolerance

4. Astro-Informatics
What is the structure and origin of the Milky Way galaxy?
- Observing from inside the Milky Way provides 3D data: the Sloan Digital Sky Survey has collected over 10 TB of data.
- We can determine the Milky Way's structure, which is not possible for other galaxies.
- Very expensive: evaluating a single model of the Milky Way with a single set of parameters can take hours or days on a typical high-end computer.
- Models determine where different star streams are in the Milky Way, which helps us better understand its structure and how it was formed.

5. Computed Paths of the Sagittarius Stream

6. Separation of Concerns
A generic optimization framework separates three concerns behind simple, generic "plug-and-play" interfaces:
- Distributed computing
- Optimization
- Scientific modeling

7. Two Distribution Strategies
Asynchronous evaluations (grids & Internet):
- Results may not be reported, or may be reported late
- No processor dependencies
- Faults can be ignored
Single parallel evaluation (supercomputers & grids):
- Always uses the most evolved population
- Can use traditional methods
- Faults require recalculation
- Grids require load balancing

8. Asynchronous Architecture
[Diagram: search routines (genetic search, particle swarm optimisation, ...) feed a distributed evaluation framework; evaluators (1..N) running on BOINC (Internet) or SALSA/Java (RPI Grid) send work requests and return results; scientific models supply evaluator creation, data initialisation, integral function/composition, and likelihood function/composition; initial parameters flow in, optimised parameters flow out.]

9. GMLE Architecture (Parallel-Asynchronous)
[Diagram: search routines communicate over MPI with workers (1..Z); each worker distributes parameters to its evaluators (1..N, 1..M), combines their results, and exchanges work requests and results through a communication layer: BOINC over HTTP, grid over TCP/IP, supercomputer over MPI.]

10. Issues with Traditional Optimization
- Traditional global optimization techniques are evolutionary, but iterative and dependent on previous steps: the current population is used to generate the next population
- Dependencies and iterations limit scalability and hurt performance
- With volatile hosts, what if an individual in the next generation is lost? Redundancy is expensive
- Scalability is limited by population size

11. Asynchronous Optimization Strategy
Use an asynchronous methodology:
- No dependencies on unknown results
- No iterations; a continuously updated population
- N individuals are generated randomly for the initial population
- Work requests are fulfilled by applying recombination operators to the population
- The population is updated with reported results
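The strategy above can be sketched as a small population server. This is an illustrative sketch, not the authors' code: the class and method names are invented, the recombination operator is passed in, and lower fitness is assumed to be better.

```python
import random

class AsyncPopulation:
    """Sketch of an asynchronous search: work requests never block on
    outstanding results, and results update the population on arrival."""

    def __init__(self, size, bounds, recombine):
        self.recombine = recombine  # operator: list of members -> new parameters
        # N individuals generated randomly for the initial population,
        # with unknown (infinite) fitness until evaluated
        self.population = [
            ([random.uniform(lo, hi) for lo, hi in bounds], float("inf"))
            for _ in range(size)
        ]

    def request_work(self):
        # Fulfil a work request by recombining current members; no waiting
        return self.recombine([p for p, _ in self.population])

    def report_result(self, params, fitness):
        # Insert the result only if it improves on the worst member
        worst = max(range(len(self.population)),
                    key=lambda i: self.population[i][1])
        if fitness < self.population[worst][1]:
            self.population[worst] = (params, fitness)
```

Because workers only ever call `request_work` and `report_result`, a lost or late result simply never updates the population; nothing else has to wait for it.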

12. Asynchronous Search Strategy
[Diagram: a population of parameter sets (1..n) with fitnesses (1..n) feeds a work queue of unevaluated parameter sets (1..m); members are generated from the population when the queue is low; workers request work, receive unevaluated parameter sets, and report results, which update the population.]

13. Asynchronous Genetic Search Operators (1)
- Average: a simple operator for continuous problems; the generated parameters are the average of two randomly selected parents
- Mutation: takes a parent and generates a mutation by randomly selecting a parameter and mutating it
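The two operators above are simple to state in code. A minimal sketch (function names are illustrative; mutation here redraws the chosen parameter uniformly within its bounds, one reasonable reading of "mutating it"):

```python
import random

def average_crossover(parent_a, parent_b):
    # Generated parameters are the element-wise average of two parents
    return [(a + b) / 2.0 for a, b in zip(parent_a, parent_b)]

def mutate(parent, bounds):
    # Randomly select one parameter and redraw it within its bounds
    child = list(parent)
    i = random.randrange(len(child))
    lo, hi = bounds[i]
    child[i] = random.uniform(lo, hi)
    return child
```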

14. Asynchronous Genetic Search Operators (2)
Double shot: two parents generate three children:
- The average of the parents
- A point outside the less fit parent, equidistant from that parent and the average
- A point outside the more fit parent, equidistant from that parent and the average
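The geometry of the double-shot operator reduces to simple arithmetic: a point outside a parent, as far from the parent as the average is, is `2 * parent - average`. A sketch (the function name and argument order are illustrative):

```python
def double_shot(better, worse):
    # Two parents generate three children
    avg = [(a + b) / 2.0 for a, b in zip(better, worse)]
    # Reflect the average through each parent: the child lies outside the
    # parent, equidistant from the parent and the average
    out_better = [2 * b - m for b, m in zip(better, avg)]
    out_worse = [2 * w - m for w, m in zip(worse, avg)]
    return avg, out_better, out_worse
```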

15. Asynchronous Genetic Search Operators (3)
Probabilistic simplex: N parents generate one or more children. Points are placed randomly along the line through the worst parent and the centroid (average) of the remaining parents.
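A sketch of the probabilistic simplex operator under stated assumptions: lower fitness is better, and the random placement range `[lo, hi]` along the worst-to-centroid line is an invented default, not the authors' setting.

```python
import random

def probabilistic_simplex(parents, fitnesses, lo=-1.5, hi=1.5):
    # Identify the worst parent (highest fitness, assuming minimization)
    worst_i = max(range(len(parents)), key=lambda i: fitnesses[i])
    worst = parents[worst_i]
    rest = [p for i, p in enumerate(parents) if i != worst_i]
    # Centroid (average) of the remaining parents
    centroid = [sum(vals) / len(rest) for vals in zip(*rest)]
    # Child at a random point along the line through worst and centroid:
    # r = 0 gives the worst parent, r = 1 gives the centroid
    r = random.uniform(lo, hi)
    return [w + r * (c - w) for w, c in zip(worst, centroid)]
```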

16. Particle Swarm Optimization
Particles 'fly' around the search space. They move according to their previous velocity and are pulled toward the globally best found position and their locally best found position.
Analogies:
- Cognitive intelligence (local best knowledge)
- Social intelligence (global best knowledge)

17. Particle Swarm Optimization
PSO:
v_i(t+1) = w * v_i(t) + c1 * r1 * (l_i - p_i(t)) + c2 * r2 * (g - p_i(t))
p_i(t+1) = p_i(t) + v_i(t+1)
where:
- w, c1, c2 are constants
- r1, r2 are random floats between 0 and 1
- v_i(t) is the velocity of particle i at iteration t
- p_i(t) is the position of particle i at iteration t
- l_i is the best position found by particle i
- g is the global best position found by all particles
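The two update equations above translate directly into code. A minimal sketch of one particle's step (the function name and default constants are illustrative; fresh random r1, r2 are drawn per dimension):

```python
import random

def pso_step(v, p, local_best, global_best, w=0.8, c1=2.0, c2=2.0):
    # v_i(t+1) = w*v_i(t) + c1*r1*(l_i - p_i(t)) + c2*r2*(g - p_i(t))
    new_v = [
        w * vi
        + c1 * random.random() * (li - pi)
        + c2 * random.random() * (gi - pi)
        for vi, pi, li, gi in zip(v, p, local_best, global_best)
    ]
    # p_i(t+1) = p_i(t) + v_i(t+1)
    new_p = [pi + vi for pi, vi in zip(p, new_v)]
    return new_v, new_p
```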

18. Asynchronous PSO
Generating new positions does not necessarily require the fitness of the previous position:
1. Generate new particle positions to fill the work queue
2. Update local and global bests as results are reported
PSO: if a result improves a particle's local best, update the local best and set the particle's position and velocity from the result.

19. Particle Swarm Optimization (Example)
[Figure: a particle's previous position p_i(t-1), current position p_i(t), velocity v_i(t), local best, and global best; the components w * v_i(t), c1 * (l_i - p_i(t)), and c2 * (g - p_i(t)) span the region of possible new positions.]

20. Particle Swarm Optimization (Example)
[Figure: the particle finds a new local best position and the global best position; its velocity and possible new positions update accordingly.]

21. Particle Swarm Optimization (Example)
[Figure: another particle finds the global best position, changing the c2 * (g - p_i(t)) pull and the region of possible new positions.]

22. Asynchronous PSO
[Diagram: a population of individuals (1..n) with fitnesses (1..n) feeds a queue of unevaluated individuals (1..n); individuals are generated from the population in round-robin order when the queue is low; workers performing fitness evaluation request work and report results, and the local and global bests are updated when a new individual has better fitness.]

23. Computing Environment: MilkyWay@home
http://milkyway.cs.rpi.edu
- BOINC (like Einstein@home, SETI@home, etc.)
- Over 50,000 users, 80,000 CPUs, and 600 teams from 99 countries
- Second largest BOINC computation (among hundreds)
- About 500 teraflops
- Donate your idle computer time to help perform our calculations

24. MilkyWay@Home: Growth of Power

25. Computing Environments: BOINC
MilkyWay@Home: http://milkyway.cs.rpi.edu
- Multiple asynchronous workers: approximately 10,000 - 30,000 volunteered computers engaged at a time, using the asynchronous architecture
- Asynchronous evaluation: volunteered computers can queue up to 20 pending individuals; the population is updated when results are reported; individuals may be reported slowly or not at all

26. Handling of Work Units by the BOINC Server

27. User Participation
Users do more than volunteer computing resources (citizen science):
- Open-source code gives users access to the MilkyWay@Home application
- Users have submitted many bug reports, fixes, and performance enhancements
- One user even created an ATI GPU capable version of the MilkyWay@Home application
- Forums provide opportunities for users to learn about astronomy and computer science

28. Malicious/Incorrect Result Verification
- With open-source application code, users can compile their own compiler-optimized versions, and many do. However, there is also the possibility of users returning malicious results.
- BOINC traditionally uses redundancy on every result to verify correctness. This requires at least 2 results for every work unit!
- Asynchronous search does not require all work units to be verified, only those which improve the population.
- We reduce redundancy by comparing a result against the current partial results.
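The key observation above is that only results which would improve the population can affect the search, so only those need re-checking. A sketch of that idea under stated assumptions: the function and its arguments are invented, lower fitness is better, and verification here is a single re-evaluation rather than the rate-based scheme measured on the following slides.

```python
def handle_result(params, fitness, population, evaluate, tol=1e-6):
    """population: list of (params, fitness) pairs, lower fitness better.
    evaluate: trusted re-computation of the fitness function."""
    worst_i = max(range(len(population)), key=lambda i: population[i][1])
    if fitness >= population[worst_i][1]:
        # Cannot improve the population, so it needs no verification at all
        return False
    # Verify only improving results by re-computing their fitness
    if abs(evaluate(params) - fitness) > tol:
        return False  # re-computation disagrees: discard as unreliable
    # Verified improvement: replace the worst member
    population[worst_i] = (params, fitness)
    return True
```

Non-improving results, which are the vast majority, are accepted or ignored without any redundant computation.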

29. Limiting Redundancy (Genetic Search)
- 60% verification found the best solutions
- Increased verification reduces reliability
- Reliability and convergence by number of parents seem dependent on the verification rate

30. Limiting Redundancy (PSO)
- 30% verification found the best solutions
- Increased verification reduces reliability, though not as dramatically as with AGS
- Lower inertia weights give better results

31. Optimization Method Comparison
- APSO found better solutions than AGS
- APSO needed lower verification rates and was less affected by different verification rates

32. Conclusions
- Asynchronous search is effective in large-scale computing environments
- Fault tolerant without expensive redundancy
- Asynchronous evaluation on heterogeneous environments increases diversity
- BOINC converges almost as fast as the BlueGene, while offering more availability and computational power
- Even computers with slow result report rates are useful
- Particle swarm and simplex-genetic hybrid methods provide significant improvements in convergence

33. Future Work
Optimization:
- Use report times to determine how to generate individuals
- Simulate asynchrony for benchmarks
- Automate selection of parameters
Distributed computing:
- Parallel asynchronous workers
- Handle malicious "volunteers"
- Continued collaboration (http://www.nasa.gov)

34. Questions?

