Genetic Algorithms Using MapReduce. Fei Teng, Doga Tuncay. 12/5/2011
Outline: – OneMax problem – Hadoop genetic algorithm – Twister genetic algorithm – Performance discussion – References
OneMax problem: Tries to maximize the number of ones in a bitstring. Formally, it can be described as finding a string $\vec{x} = (x_1, x_2, \ldots, x_L)$ with $x_i \in \{0, 1\}$ that maximizes $F(\vec{x}) = \sum_{i=1}^{L} x_i$.
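The OneMax fitness is simple enough to state directly in code. This is a minimal sketch: the helper names `onemax` and `random_individual` are hypothetical, and the bitstring is represented as a Python list of 0/1 integers.

```python
import random

def onemax(bits):
    """OneMax fitness F(x): the number of ones in the bitstring."""
    return sum(bits)

def random_individual(length, rng):
    """Draw a uniformly random bitstring of the given length."""
    return [rng.randint(0, 1) for _ in range(length)]

rng = random.Random(42)
x = random_individual(16, rng)
print(x, onemax(x))       # a random string scores somewhere in 0..16
print(onemax([1] * 16))   # → 16, the optimum for L = 16 (the all-ones string)
```

Because every bit contributes independently to $F$, the optimum is always the all-ones string, which makes OneMax a convenient benchmark for checking that a parallel GA converges.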
Hadoop genetic algorithm: Making Hadoop support iterative MapReduce – Start a new job for each iteration – Store each iteration's output in HDFS – Override interfaces to create a customized value type – Map input key-value pair – Reduce input key-value pair
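The iteration pattern on this slide (one job per generation, with each generation's output persisted between jobs) can be sketched as a local Python simulation. This is not Hadoop code: a temporary directory stands in for HDFS, and the names `run_job` and `hadoop_style_driver`, the truncation-selection "reduce" step, and the 1/L mutation rate are all assumptions made for illustration.

```python
import json
import os
import random
import tempfile

def run_job(in_path, out_path, rng):
    """One hypothetical 'job' = one GA generation: read the population from
    in_path, write the next generation to out_path."""
    with open(in_path) as f:
        population = json.load(f)
    # Map: pair each individual with its OneMax fitness.
    scored = [(sum(ind), ind) for ind in population]
    # Reduce (assumed, simplified): keep the fitter half unchanged...
    scored.sort(key=lambda p: p[0], reverse=True)
    survivors = [ind for _, ind in scored[: len(scored) // 2]]
    # ...and refill the population with bit-flip mutants of the survivors.
    children = []
    for i in range(len(population) - len(survivors)):
        parent = survivors[i % len(survivors)]
        children.append([bit ^ (rng.random() < 1.0 / len(parent)) for bit in parent])
    with open(out_path, "w") as f:
        json.dump(survivors + children, f)

def hadoop_style_driver(pop_size=32, length=16, max_iters=100, seed=1):
    """Client loop: launch a fresh job per generation until the optimum appears."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as hdfs_dir:  # local stand-in for HDFS
        path = os.path.join(hdfs_dir, "gen-0.json")
        with open(path, "w") as f:
            json.dump([[rng.randint(0, 1) for _ in range(length)]
                       for _ in range(pop_size)], f)
        best = 0
        for it in range(1, max_iters + 1):
            out_path = os.path.join(hdfs_dir, f"gen-{it}.json")
            run_job(path, out_path, rng)  # a brand-new job for this generation
            path = out_path
            # The client reads the persisted output to test for convergence.
            with open(path) as f:
                best = max(sum(ind) for ind in json.load(f))
            if best == length:
                return it, best
        return max_iters, best

print(hadoop_style_driver())
```

The file write and re-read on every generation mirrors the slide's point that each iteration's output goes through HDFS; that round trip is the overhead an iterative runtime avoids.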
HGA dataflow [figure: client, mappers, reducers and HDFS; initial population and subpopulations]
Twister genetic algorithm: Twister supports iterative semantics natively – No file system or hard-disk I/O involved – Use a combiner to store the next-generation population – Override interfaces to create a new value type – Map output key-value pair – Reduce output key-value pair
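The in-memory map, reduce, and combine cycle described on this slide can be sketched in plain Python. No real Twister API is used here: `map_task`, `reduce_task`, `combine`, and `twister_style_ga`, the round-robin split into subpopulations, and the tournament-plus-mutation reduce step are hypothetical choices for illustration. The point is that the population stays in memory between iterations, and the combiner hands the next generation straight back to the driver.

```python
import random

def map_task(subpop):
    """Map: score every individual of one subpopulation with OneMax."""
    return [(sum(ind), ind) for ind in subpop]

def reduce_task(scored, rng, mutation_rate):
    """Reduce (assumed operators): binary tournament selection, then per-bit mutation."""
    children = []
    for _ in range(len(scored)):
        a, b = rng.sample(scored, 2)
        winner = max(a, b)[1]  # higher fitness wins; ties broken by comparing bitstrings
        children.append([bit ^ (rng.random() < mutation_rate) for bit in winner])
    return children

def combine(reduce_outputs):
    """Combiner: merge all reducer outputs into one next-generation population."""
    return [ind for part in reduce_outputs for ind in part]

def twister_style_ga(num_subpops=4, subpop_size=8, length=16, max_iters=200, seed=7):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(num_subpops * subpop_size)]
    best = 0
    for it in range(1, max_iters + 1):
        # Re-split the population round-robin into subpopulations (dynamic data).
        subpops = [population[i::num_subpops] for i in range(num_subpops)]
        # One map and one reduce per subpopulation; no file I/O between iterations.
        reduce_outputs = [reduce_task(map_task(sp), rng, 1.0 / length) for sp in subpops]
        population = combine(reduce_outputs)
        best = max(sum(ind) for ind in population)
        if best == length:
            return it, best
    return max_iters, best

print(twister_style_ga())
```

Here `length` and the operator settings play the role of static data that would only need to be distributed once, while the population is the dynamic data that flows through every iteration.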
Twister workflow [figure: Twister driver, subpopulations, Map, Reducer, Combiner, intermediate results, new subpopulations]
Hadoop/Twister performance: Testing configuration – FutureGrid, 8 nodes × 8 cores, CPU: 2.93 GHz, memory: 24 GB – Input size: 5120 genes – Gene length: 2 KB – Both converge to the optimal point
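A quick calculation puts this test configuration in perspective. It assumes "2 KB" means 2 KiB per gene and that genes are split evenly across all cores; neither assumption is stated on the slide.

```python
nodes, cores_per_node = 8, 8
genes = 5120                 # input size: 5120 genes (individuals)
gene_bytes = 2 * 1024        # gene length: 2 KB, read as 2 KiB per gene (assumption)

total_cores = nodes * cores_per_node       # 64 cores
population_bytes = genes * gene_bytes      # 10,485,760 bytes, i.e. 10 MiB
genes_per_core = genes // total_cores      # 80, assuming an even split (assumption)

print(total_cores, population_bytes, genes_per_core)  # 64 10485760 80
```

The whole population is therefore only about 10 MiB, small enough that per-iteration overheads such as job startup and HDFS I/O can matter more than raw data volume.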
TGA performance test: The reducer dominates performance – because the mappers simply count the number of ones in each gene and emit the counts. Testing environment – Quarry cluster – Ten nodes, 16 GB memory, CPU: 2.33 GHz × 8 cores
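The imbalance between mapper and reducer work can be illustrated with a small sketch. The mapper matches the slide (count the ones and emit). The slide does not say what the reducer does, so the tournament selection and uniform crossover below are assumed operators. The names `mapper` and `reducer` and the string encoding of genes are also hypothetical.

```python
import random

def mapper(gene_id, gene):
    """Map: one pass over the gene to count its ones; emits (gene_id, (count, gene))."""
    return gene_id, (gene.count("1"), gene)

def reducer(scored, rng, tournament_size=2):
    """Reduce (assumed operators; the slide does not name them): tournament
    selection plus uniform crossover, building the whole next generation."""
    def tournament():
        # Pick tournament_size candidates at random; the highest count wins
        # (ties broken by string order).
        return max(rng.sample(scored, tournament_size))[1]

    next_gen = []
    while len(next_gen) < len(scored):
        p1, p2 = tournament(), tournament()
        # Uniform crossover: each position is copied from a randomly chosen parent.
        child = "".join(rng.choice(pair) for pair in zip(p1, p2))
        next_gen.append(child)
    return next_gen

rng = random.Random(3)
population = ["".join(rng.choice("01") for _ in range(32)) for _ in range(10)]
scored = [mapper(i, g)[1] for i, g in enumerate(population)]
next_gen = reducer(scored, rng)
print(max(c for c, _ in scored), max(g.count("1") for g in next_gen))
```

In this sketch each mapper does a single pass over one gene. The reducer, by contrast, sees every scored gene, runs all tournaments, and builds every child, so it carries most of the per-generation work.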
TGA performance results
TGA performance results (cont’d)
Discussion
Aspect | Hadoop GA | Twister GA
Performance | Low for GA | High for GA
Programmability | Straightforward because HDFS is available; hard to make mistakes | Requires a clear understanding of which data is static and how the dynamic data flows
Iterative support | No | Yes
Scalability | Good according to [2] | Good
Configuration and testing | Many parameters to set; supports unit tests | Easy to deploy, but testing is mainly based on "printf"
Administration | Administered and monitored via a web browser | Mainly by checking the daemon/driver output
References
[1] Chao Jin, Christian Vecchiola, Rajkumar Buyya. MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms.
[2] Abhishek Verma, Xavier Llora, David E. Goldberg. Scaling Simple and Compact Genetic Algorithms using MapReduce.
[3]
[4] Di-Wei Huang, Jimmy Lin. Scaling Populations of a Genetic Algorithm for Job Shop Scheduling Problems using MapReduce.
Thank you. Questions?