Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms Reporter: Jilin Zhang Authors:Changjun Hu, Yali.


1 Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms. Reporter: Jilin Zhang. Authors: Changjun Hu, Yali Liu, and Jianjiang Li. Information Engineering School, University of Science and Technology Beijing, Beijing, P.R. China

2 Outline 1 Motivation 2 Related Works
3 Spatial Decomposition Coloring (SDC) Approach 4 Short-Range Forces Calculations of EAM using SDC method 5 Experiments and Discussion 6 Conclusion and Future Directions

3 1 Motivation The process of molecular dynamics simulations
Fig. The process of molecular dynamics simulations: set init_state, calculate forces, calculate new positions of atoms.
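The loop in the figure can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the semi-implicit Euler update, and the two-particle spring system are all assumptions chosen to keep the example small.

```python
def run_md(positions, velocities, force_fn, dt, n_steps, mass=1.0):
    """Repeat the two stages of the figure: calculate forces, then new positions."""
    for _ in range(n_steps):
        forces = force_fn(positions)              # calculate forces
        for i, f in enumerate(forces):
            velocities[i] += dt * f / mass        # velocity update from the force
            positions[i] += dt * velocities[i]    # calculate new position of atom i
    return positions, velocities

def spring_force(x, k=1.0, rest=1.0):
    """Hypothetical force: two 1-D particles joined by a spring of rest length 1."""
    stretch = (x[1] - x[0]) - rest
    return [k * stretch, -k * stretch]

# set init_state: the spring starts stretched, both particles at rest
pos, vel = run_md([0.0, 1.5], [0.0, 0.0], spring_force, dt=0.01, n_steps=100)
```

In a real MD code the force stage dominates the run time, which is why the rest of the talk concentrates on it.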

4 1 Motivation The intensive computation in MD simulations occurs in the short-range force calculation procedure. The neighbor-list method greatly reduces this computation: each atom interacts only with atoms in its neighbor region. Newton's third law can halve the force computations, but it introduces reduction operations on irregular arrays. Fig. 2 Code of force calculations.
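A small sketch of why Newton's third law leads to a reduction on an irregular array. The 1-D coordinates, the linear `pair_force`, and all function names are hypothetical; only the access pattern matters.

```python
def build_half_neighbor_list(x, cutoff):
    """For each atom i, list the neighbors j > i within the cutoff (each pair once)."""
    n = len(x)
    return [[j for j in range(i + 1, n) if abs(x[j] - x[i]) < cutoff]
            for i in range(n)]

def pair_force(xi, xj):
    """Hypothetical pair force on atom i due to atom j (linear repulsion)."""
    return xi - xj

def compute_forces(x, neighbors):
    f = [0.0] * len(x)
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            fij = pair_force(x[i], x[j])
            f[i] += fij      # force on atom i
            f[j] -= fij      # Newton's third law: equal and opposite force on atom j
    return f

x = [0.0, 0.4, 0.9, 2.5]
neighbors = build_half_neighbor_list(x, cutoff=1.0)
forces = compute_forces(x, neighbors)
```

The index `j` comes from the neighbor list, so two different iterations of the outer loop may write to the same `f[j]`. That is the data dependence that the parallel methods discussed next have to handle.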

5 2 Related Works --- parallel reduction operations on irregular arrays
Some types of solutions: enclosing the reduction operation in a critical section; privatizing the reduction array; using a redundant computations strategy.

6 2 Related Works --- parallel reduction operations on irregular arrays
Enclosing the reduction operation in a critical section: create a critical section in the inner loop. Advantage: parallelization is straightforward and easy to implement. Disadvantage: the critical region, atomic operation, or lock in the inner loop causes a high synchronization cost.
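The critical-section approach might look like the sketch below. It uses Python threads and a `threading.Lock` as a stand-in for an OpenMP critical region; the function name, the round-robin work split, and the toy data are assumptions. CPython threads will not show a real speedup here, so the sketch only illustrates the structure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def forces_with_critical_section(x, neighbors, pair_force, n_threads=4):
    f = [0.0] * len(x)
    lock = threading.Lock()              # plays the role of the critical section

    def work(atom_ids):
        for i in atom_ids:
            for j in neighbors[i]:
                fij = pair_force(x[i], x[j])
                with lock:               # entered once per pair, inside the inner loop
                    f[i] += fij
                    f[j] -= fij

    chunks = [range(t, len(x), n_threads) for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, chunks))     # wait for all threads to finish
    return f

x = [0.0, 0.4, 0.9, 2.5]
neighbors = [[1, 2], [2], [], []]        # half neighbor list: each pair stored once
forces = forces_with_critical_section(x, neighbors, lambda xi, xj: xi - xj)
```

The lock is acquired for every pair interaction, which is the synchronization cost the slide refers to.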

7 2 Related Works --- parallel reduction operations on irregular arrays
Privatizing the reduction array: each thread accumulates into its own private array and then updates the shared array in a critical region according to the values of its private array. Advantage: it reduces the number of entries into the critical region and so reduces the synchronization cost. Disadvantages: the high memory overhead of the private arrays limits the number of particles allowed in simulations, and the arrays compete for cache space and decrease program speed.
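A sketch of array privatization under the same assumptions as before (Python threads standing in for OpenMP threads, hypothetical names and toy data). Each worker allocates a full-size private copy, which is where the memory overhead comes from.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def forces_with_private_arrays(x, neighbors, pair_force, n_threads=4):
    n = len(x)
    shared = [0.0] * n
    lock = threading.Lock()

    def work(atom_ids):
        private = [0.0] * n               # one full-size copy per thread
        for i in atom_ids:
            for j in neighbors[i]:
                fij = pair_force(x[i], x[j])
                private[i] += fij         # no synchronization needed here
                private[j] -= fij
        with lock:                        # critical region entered once per thread
            for k in range(n):
                shared[k] += private[k]

    chunks = [range(t, n, n_threads) for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, chunks))
    return shared

x = [0.0, 0.4, 0.9, 2.5]
neighbors = [[1, 2], [2], [], []]         # half neighbor list
forces = forces_with_private_arrays(x, neighbors, lambda xi, xj: xi - xj)
```

Memory use grows as the number of threads times the number of atoms, and the final merge is still serialized.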

8 2 Related Works --- parallel reduction operations on irregular arrays
The redundant computations strategy does not use Newton's third law, so each pair interaction has to be calculated twice. Advantage: high parallelizability, since the data dependence between loop iterations has been removed. Disadvantages: the computation is doubled and the neighbor list requires more memory space.
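A sketch of the redundant computations strategy with a full neighbor list. The names, the 1-D coordinates, and the linear pair force are hypothetical.

```python
def build_full_neighbor_list(x, cutoff):
    """Every atom lists all of its neighbors, so each pair appears twice."""
    n = len(x)
    return [[j for j in range(n) if j != i and abs(x[j] - x[i]) < cutoff]
            for i in range(n)]

def forces_redundant(x, full_neighbors, pair_force):
    f = [0.0] * len(x)
    for i, neigh in enumerate(full_neighbors):   # iterations are independent
        for j in neigh:
            f[i] += pair_force(x[i], x[j])       # iteration i writes only f[i]
    return f

x = [0.0, 0.4, 0.9, 2.5]
full_neighbors = build_full_neighbor_list(x, cutoff=1.0)
forces = forces_redundant(x, full_neighbors, lambda xi, xj: xi - xj)
```

Because iteration `i` writes only `f[i]`, the outer loop can be split across threads without any synchronization. The price is that the list holds six entries for three pairs and `pair_force` is evaluated twice per pair.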

9 3 Spatial Decomposition Coloring (SDC) Approach
Spatial Decomposition (SD) method: on distributed-memory multiprocessors involving several hundred processors, the programmer must change all array declarations and all loop bounds, and explicitly code the periodic transfer of boundary data between processors. By contrast, it is simple to implement SD in OpenMP.

10 3 Spatial Decomposition Coloring (SDC) Approach
The SD method places a restriction on parallelism in OpenMP: synchronization is required to ensure that multiple threads do not attempt to update the same atom simultaneously. Fig. 3 SD method.

11 3 Spatial Decomposition Coloring (SDC) Approach
SDC method. The SDC method consists of the following steps. Step 1): Split domain. Step 2): Coloring subdomains. Step 3): Parallel computing.

12 3 Spatial Decomposition Coloring (SDC) Approach
SDC method. The SDC method consists of the following steps. Step 1): Split domain. Split the spatial domain into subdomains. The length of a subdomain must be longer than the diameter (presumably that of an atom's neighbor region). The number of subdomains in each decomposed dimension should be even.
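The two constraints of Step 1 can be sketched for a single dimension. The helper names are hypothetical, and the sketch assumes that "diameter" means twice the cutoff radius of the neighbor region, which the slide does not state explicitly.

```python
def split_dimension(box_length, cutoff):
    """Return (count, length) of the subdomains along one decomposed dimension."""
    diameter = 2.0 * cutoff                      # assumption: diameter = 2 * cutoff radius
    count = int(box_length / diameter)
    while count > 0 and box_length / count <= diameter:
        count -= 1                               # length must be strictly longer than the diameter
    count -= count % 2                           # round down to an even number of subdomains
    if count < 2:
        raise ValueError("box too small to split into an even number of subdomains")
    return count, box_length / count

def subdomain_index(coordinate, box_length, count):
    """Index of the subdomain containing a coordinate in [0, box_length)."""
    return min(int(coordinate / box_length * count), count - 1)

count, length = split_dimension(box_length=20.0, cutoff=1.5)
```

With a box of length 20 and a cutoff of 1.5 this picks six subdomains of length 20/6, each longer than the diameter of 3.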

13 3 Spatial Decomposition Coloring (SDC) Approach
SDC method. The SDC method consists of the following steps. Step 2): Coloring subdomains. The number of subdomains with each color must be equal. Each subdomain is surrounded only by subdomains with different colors.
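One coloring that satisfies both conditions is a parity coloring, sketched below. This is an assumed scheme for illustration and not necessarily the one the authors use; it relies on the even subdomain count from Step 1 so that the pattern also holds across periodic boundaries.

```python
from collections import Counter
from itertools import product

def color_subdomains(counts):
    """Color each subdomain by the parity of its index in every decomposed dimension.

    counts: number of subdomains along each decomposed dimension (all even).
    """
    if any(c % 2 for c in counts):
        raise ValueError("each decomposed dimension needs an even number of subdomains")
    colors = {}
    for cell in product(*(range(c) for c in counts)):
        colors[cell] = sum((idx % 2) << d for d, idx in enumerate(cell))
    return colors

def adjacent_subdomains(cell, counts):
    """Subdomains touching `cell` (faces, edges, corners) under periodic boundaries."""
    out = set()
    for offset in product((-1, 0, 1), repeat=len(counts)):
        if any(offset):
            out.add(tuple((i + o) % c for i, o, c in zip(cell, offset, counts)))
    out.discard(cell)
    return out

counts = (4, 6)                        # a two-dimensional decomposition
colors = color_subdomains(counts)
per_color = Counter(colors.values())
```

Under this scheme a one-, two-, or three-dimensional decomposition uses 2, 4, or 8 colors respectively.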

14 3 Spatial Decomposition Coloring (SDC) Approach
SDC method. The SDC method consists of the following steps. Step 3): Parallel computing. Calculations of forces on subdomains with the same color can run in parallel. A barrier is needed to wait for all threads to complete the computation on this color. Calculations on subdomains with different colors must run serially.
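Step 3 might be organized as in the following sketch. It uses a Python thread pool in place of OpenMP threads; the function name, the non-periodic 1-D toy system, and the linear pair force are assumptions, and CPython threads will not give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def sdc_forces(x, neighbors, cell_of_atom, color_of_cell, pair_force, n_threads=4):
    """Accumulate pair forces color by color; same-colored subdomains run concurrently."""
    f = [0.0] * len(x)
    atoms_in_cell = {}
    for i, cell in enumerate(cell_of_atom):
        atoms_in_cell.setdefault(cell, []).append(i)

    def work(cell):
        for i in atoms_in_cell.get(cell, []):
            for j in neighbors[i]:
                fij = pair_force(x[i], x[j])
                f[i] += fij      # no lock: same-colored subdomains touch disjoint atoms
                f[j] -= fij

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for color in sorted(set(color_of_cell.values())):    # colors run one after another
            cells = [c for c, col in color_of_cell.items() if col == color]
            list(pool.map(work, cells))                       # returns when all are done: the barrier
    return f

# Toy 1-D system (non-periodic for brevity): box of length 8, four subdomains of length 2.
cutoff = 0.9                                                  # diameter 1.8 < subdomain length 2
x = [0.5, 1.2, 1.9, 2.6, 4.1, 4.8, 6.3, 7.0]
neighbors = [[j for j in range(i + 1, len(x)) if abs(x[j] - x[i]) < cutoff]
             for i in range(len(x))]
cell_of_atom = [int(xi // 2.0) for xi in x]
color_of_cell = {0: 0, 1: 1, 2: 0, 3: 1}
forces = sdc_forces(x, neighbors, cell_of_atom, color_of_cell, lambda xi, xj: xi - xj)
```

Synchronization happens once per color rather than once per pair, so the inner loop stays free of locks while Newton's third law is still used.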

15 3 Spatial Decomposition Coloring (SDC) Approach
SDC method. Advantages: the neighbor list is usually not updated at every time step, so the cost of the SDC method is very low. A higher-dimensional decomposition creates more subdomains. The method is scalable and suitable for multi-core and many-core architectures. Disadvantage: as with the Spatial Decomposition method, load imbalance can occur, so the method should be used when the simulation system has uniform density.

16 4 Short-Range Forces Calculations of EAM using SDC method
EAM method: the short-range force calculation is the intensive computation. It has three computational phases, and the most time-consuming parts are phases 1 and 3. Fig. 4 Short-range forces in EAM method.

17 4 Short-Range Forces Calculations of EAM using SDC method
The parallel procedure of short-range force calculations using the SDC method: 1) Run electron density computations using the SDC method. 2) Calculate the embedding function values and their derivatives in parallel. 3) Run force calculations using the SDC method.
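The three phases can be sketched serially for a toy 1-D system. The functional forms `phi`, `pair`, and `embed` are hypothetical placeholders rather than a real Fe potential; the structure follows the standard EAM energy, a sum of embedding terms F(rho_i) plus pair terms V(r_ij).

```python
import math

def phi(r):      return math.exp(-r)                 # hypothetical density contribution
def dphi(r):     return -math.exp(-r)
def pair(r):     return math.exp(-2.0 * r)           # hypothetical pair potential
def dpair(r):    return -2.0 * math.exp(-2.0 * r)
def embed(rho):  return -math.sqrt(rho)              # hypothetical embedding function
def dembed(rho): return -0.5 / math.sqrt(rho)

def electron_density(x, half_neighbors):
    rho = [0.0] * len(x)
    for i, neigh in enumerate(half_neighbors):       # phase 1: irregular reduction on rho
        for j in neigh:
            d = phi(abs(x[i] - x[j]))
            rho[i] += d
            rho[j] += d
    return rho

def eam_energy(x, half_neighbors):
    rho = electron_density(x, half_neighbors)
    pairs = sum(pair(abs(x[i] - x[j]))
                for i, neigh in enumerate(half_neighbors) for j in neigh)
    return sum(embed(r) for r in rho) + pairs

def eam_forces(x, half_neighbors):
    rho = electron_density(x, half_neighbors)        # phase 1
    d_embed = [dembed(r) for r in rho]               # phase 2: independent per atom
    f = [0.0] * len(x)
    for i, neigh in enumerate(half_neighbors):       # phase 3: irregular reduction on f
        for j in neigh:
            dx = x[i] - x[j]
            r = abs(dx)
            fij = -((d_embed[i] + d_embed[j]) * dphi(r) + dpair(r)) * dx / r
            f[i] += fij
            f[j] -= fij
    return f

x = [0.0, 1.0, 2.1]
half_neighbors = [[1, 2], [2], []]                   # every atom has at least one neighbor
forces = eam_forces(x, half_neighbors)
```

Phases 1 and 3 both scatter into `rho[j]` or `f[j]`, so they are the ones that need the SDC treatment. Phase 2 touches only per-atom data and parallelizes directly.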

18 4 Short-Range Forces Calculations of EAM using SDC method
Force calculations based on the SDC method. L1: computations on subdomains with different colors. L2: computations on subdomains with the same color. L3: deals with all atoms that constitute a subdomain. L4: deals with the neighbors of an atom. Fig. 5 Force calculations using SDC.

19 5 Experiments and Discussion
Experimental environment: four Intel Xeon(R) quad-core E7320 (L2 cache 4MB) processors, 16 GB memory. The OS is Fedora release 9. The compiler is gcc. Experimental cases: observe micro-deformation behaviors of pure Fe metal material (the cases came from the XMD program), under periodic boundary conditions. Initial state: body-centered cubic (bcc) lattice arrangement. Test cases: Small-scale case (1): ,000 atoms. Medium-scale case (2): ,302 atoms. Large-scale case (3): 1,062,882 atoms. Large-scale case (4): 3,456,000 atoms.

20 Table: The speedups of Spatial Decomposition Coloring (SDC) methods
Small case (1) on 2~16 cores; Medium case (2) on 2~16 cores (core counts: 2 3 4 8 12 16)
SDC (one-dim): 1.71 2.46 3.07 4.17 1.84 2.64 3.37 6.24 6.33
SDC (two-dim): 1.70 4.74 5.90 6.43 2.65 3.39 6.20 8.89 10.90
SDC (three-dim): 1.66 2.40 2.99 4.61 5.74 6.30 1.82 3.36 6.16 8.76 10.78
Large case (3) on 2~16 cores; Large case (4) on 2~16 cores
1.86 2.76 3.67 6.82 9.76 9.59 1.88 2.79 3.66 9.97 9.82 1.87 2.78 3.64 6.74 9.73 12.31 2.80 3.65 6.77 9.84 12.42 2.75 6.64 9.65 12.29 12.34

21 5 Experiments and Discussion
Scalability of our SDC method: the performance of the multi-dimensional SDC method improves as the number of cores and the number of atoms increase. Performance of the SDC methods: the two-dimensional SDC method achieves the highest efficiency. The two-dimensional decomposition strives to make subdomains with small surface area and large volume, which results in better cache locality than the one-dimensional decomposition strategy. The three-dimensional SDC method slightly degrades performance because of the greater overhead of fork-join threads and scheduling.

22 Fig The speedup of two-dimensional Spatial Decomposition Coloring (SDC) method, Critical Section (CS) method, Share Array Privatization (SAP) method and Redundant Computations (RC) method.

23 5 Experiments and Discussion
The SDC method achieves a nearly linear speedup and a higher speedup than the other methods. The reason for the nearly linear speedup is that the low synchronization cost of implicit barriers in our method can be amortized over a large amount of computation. The CS method achieves the lowest efficiency, because it encloses reduction operations on the irregular array in a critical section. The performance of the SAP method degrades as the number of executing cores increases (memory overhead plus synchronization overhead). RC vs. SDC: the RC method does nearly twice the computation of the SDC method for the short-range force calculations, so the efficiency of the RC method is lower than that of the SDC method.

24 Conclusion and Future Directions
A scalable spatial decomposition coloring (SDC) method: it solves a class of short-range force calculation problems on shared-memory multi-core platforms. It is scalable not only to large simulation systems but also to many-core architectures. Future directions: to study the SDC method on NUMA memory architectures; to implement the SDC method using MPI+OpenMP on multi-core clusters.

25 Thank You !

