
1 PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS
Amanda Peters, MIT 18.337, 5/13/2009

2 Outline
Motivation
Model
GPU Implementation
Blue Gene Implementation
Hardware
Results
Future Work

3 Motivation
Why does cooperation evolve?
Examples: total war vs. limited war, quorum-sensing bacteria, pathogens.
Goal of the project: create a computational model to test the role of behavioral strategies and related variables.

4 Model
Focus on finding evolutionarily stable strategies.
Five strategies: Mouse, Hawk, Bully, Retaliator, Prober-Retaliator.
Payoffs: win +60; seriously injured -100; small injuries -2 each; emerge from short game uninjured +20.

5 Why parallelize it?
Reduce computational time.
Enable trials of more strategies.
Enable analysis of different variables' roles.
Introduce more actions to the action space.

6 CUDA Implementation
Embarrassingly parallel code.
Distribute rounds of the game to different threads.
Only the payoff array is in global memory; copy it back for post-processing.

7 Sample Code

__global__ void gameGPU(int player1, int player2, float* d_payoff1,
                        float* d_payoff2, float* rand_si, int max_rounds)
{
    // Thread index
    const int tid = blockDim.x * blockIdx.x + threadIdx.x;
    // Total number of threads in grid
    const int THREAD_N = blockDim.x * gridDim.x;
    int max_moves = 500;
    for (int round = tid; round < max_rounds; round += THREAD_N) {
        play_round(player1, player2, d_payoff1[round], d_payoff2[round],
                   rand_si[round], max_moves);
    }
}

8 Blue Gene Implementation

9 System Overview

10 Design Fundamentals
Low-power PPC440 processing core.
System-on-a-chip ASIC technology.
Dense packaging.
Ducted, air-cooled, 25 kW racks.
Standard proven components for reliability and cost.


12 Blue Gene/L
Chip: 2 processors, 2.8/5.6 GF/s, 4 MB.
Compute card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB.
Node card: 16 compute cards, 0-2 I/O cards (32 chips, 4x4x2), 90/180 GF/s, 16 GB.
Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB.
System: 180/360 TF/s, 32 TB (for the original 64-rack system).

13 Blue Gene/P
Chip: 4 processors, 13.6 GF/s, 8 MB EDRAM.
Compute card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 (or 4.0) GB DDR.
Node card: 32 compute cards, 0-1 I/O cards (32 chips, 4x4x2), 435 GF/s, 64 GB.
Rack: 32 node cards, 14 TF/s, 2 TB.
System: cabled 8x8x16, 1 PF/s, 144 TB.
Key differences from BG/L: 4 cores per chip, speed bump, 72 racks (+8).

14 BG System Overview: Integrated System
Lightweight kernel on compute nodes.
Linux on I/O nodes handling syscalls.
Optimized MPI library for high-speed messaging.
Control system on Service Node with private control network.
Compilers and job launch on Front End Nodes.

15 Blue Gene/L Interconnection Networks
3-dimensional torus: interconnects all compute nodes (65,536); virtual cut-through hardware routing; 1.4 Gb/s on all 12 node links (2.1 GB/s per node); communications backbone for computations; 0.7/1.4 TB/s bisection bandwidth, 67 TB/s total bandwidth.
Global collective network: interconnects all compute and I/O nodes (1024); one-to-all broadcast functionality; reduction operations functionality; 2.8 Gb/s of bandwidth per link; tree-traversal latency 2.5 µs; ~23 TB/s total binary tree bandwidth (64k machine).
Low-latency global barrier and interrupt: round-trip latency 1.3 µs.
Ethernet: incorporated into every node ASIC; active in the I/O nodes (1:64); carries all external communication (file I/O, control, user interaction, etc.).
Control network: boot, monitoring, and diagnostics.

16 C/MPI Implementation of Code
Static partitioning of work units: work_unit = number_rounds / partition_size.
Each node gets a chunk of the data.
Loops that in serial iterate over the length of the game are split up so each node handles specific rounds.
A 'bookkeeping node' uses MPI collectives to coalesce the data.

17 Pseudo Code

foreach species:
    gamePlay(var1…);
    MPI_Reduce(var1…);
    if (rank == 0) Calculate_averages();
    if (rank == 0) Print_game_results;

18 Results

19 Game Dynamics
Evolutionarily stable strategies: Retaliator, ~Prober-Retaliator.
Result: 'Limited War' is a stable and dominant strategy given individual selection.

20 CUDA Implementation 97% time reduction

21 CUDA Implementation

22 Blue Gene Implementation 99% time reduction

23 Blue Gene Implementation

24 Future Directions
Investigate more behavioral strategies.
Increase the action space.
CUDA implementation: data management.
Blue Gene implementation: examine superlinearity, test larger problem sizes, optimize single-node performance.

