Presentation is loading. Please wait.

Presentation is loading. Please wait.

One-sided Communication Implementation in FMO Method J. Maki, Y. Inadomi, T. Takami, R. Susukita †, H. Honda, J. Ooba, T. Kobayashi, R. Nogita, K. Inoue.

Similar presentations


Presentation on theme: "One-sided Communication Implementation in FMO Method J. Maki, Y. Inadomi, T. Takami, R. Susukita †, H. Honda, J. Ooba, T. Kobayashi, R. Nogita, K. Inoue."— Presentation transcript:

1 One-sided Communication Implementation in FMO Method J. Maki, Y. Inadomi, T. Takami, R. Susukita †, H. Honda, J. Ooba, T. Kobayashi, R. Nogita, K. Inoue and M. Aoyagi Computing and Communications Center, Kyushu University † Fukuoka Industry, Science & Technology Foundation

2 Overview of FMO method (1) FMO(Fragment Molecular Orbital method) has been developed by Kitaura (AIST) and co-workers to calculate electronic states of a macromolecule such as a protein and nucleotide The target molecule is divided into fragments (monomers) ab initio molecular orbital (MO) calculation is carried out on each fragment and fragment pair (dimer). Usually executed with high parallel efficiency (>90%) The method can execute all-electron calculation on a protein molecule with 10,000 atoms.

3 Example of fragmentation

4 Hamiltonian monomer calculation Overview of FMO method (2) environmental electrostatic potential ( potential from other monomers) monomer calculation depends on other monomers electron density electron density of monomer J J J J iterated until all electron densities unchanged (SCC procedure)

5 Overview of FMO method (3) Schrødinger equation basis functions MO coefficients density matrix elements RHF (Restricted Hartree-Fock) SCF method Molecular Orbital (MO) electron density number of basis functions

6 Hamiltonian dimer calculation Total energy Overview of FMO method (4) environmental electrostatic potential Schrødinger equation not all dimers are calculated by SCF

7 dimer-es approximation For distant dimers, SCF calculation are not carried out Energy is calculated by electrostatic approximation monomer SCF dimer ES (non-SCF) dimer

8 SCC procedure convergence check of SCC procedure already converged not yet converged electronic structure calculation for monomer initial density calculation electronic structure calculation for SCF-dimer ES dimer calculation total energy calculation For all monomers For all SCF-dimers For all separated dimers Flow chart of FMO calculation

9 SCC procedure convergence check of SCC procedure already converged not yet converged electronic structure calculation for monomer initial density calculation electronic structure calculation for SCF-dimer ES-dimer calculation total energy calculation update monomer density matrices refer monomer matrices needed in electronic structure calculation refer two monomer matrices needed in dimer-ES calculation refer monomer matrices needed in electronic structure calculation update monomer density matrices For all monomers For all SCF-dimers For all separated dimers Update all monomer density matrices Refer some monomer density matrices Density requirements and update in FMO calculation

10 potential from the electrons of fragment K 4-center two electron integral calculations for all environment monomers are prohibitive matrix element of (needed in SCF calculation) involves 4 center tow electron integrals order of calculation costs number of basis functions Approximation of calculation (1)

11 esp-aoc approximation Approximation of calculation (2) overlapmatrix of fragment K Mulliken AO population of ( )

12 Approximation of calculation (3) esp-ptc approximation Mulliken atomic charge of the nucleus A point charge approx. of number of environment monomers for which 4-center two electron integral calculations are carried out

13 Hypothetical petascale computing environment Number of fragments: 10,000-100,000 ~1,000)(the current largest Number of CPUs: 1PFlops = 100,000 CPU (current one CPU peak performance ~10GFlops)

14 Memory requirement in FMO R IJ 1,000350MB4MB 5,0001.7GB95MB 10,0003.4GB380MB 50,00017GB9.3GB 100,00034GB37GB Number of fragments It is difficult for each process to store all necessary data Memory per one CPU core> 10 GB × The size of one density matrix ≒ 350KB R IJ Interfragment distance data ≒ / 2

15 OSC implementation in OpenFMO (1) node00 01 node01 23 node02 45 node03 67 node04 89 Group00Group01Group02 In its execution, only the process of rank 0 is used as a server process for dynamic load balancing. All the other processes are worker processes and are divided into groups and the two-level parallelization is used as the other implementations.

16 OSC implementation in OpenFMO (2) node00 01 node01 23 node02 45 node03 67 node04 89 Group00Group01Group02 memory window created by MPI_Win density matrix data MPI_Get

17 Estimation of communication cost (1) Assumptions 1.The sizes of all density matrices are the same.All groups have the same number of worker processes. 2.The time to put or get one density matrix is equal to the average point-to-point communication time. 3.MPI_Bcast implements a binomial tree algorithm so that the time to broadcast one density matrix over processes is obtained by. 4.And for OSC scheme, dynamic load balancing works well and the delay due to competing put or get requests is ignorable. Therefore all groups execute the same number of monomer and dimer jobs and this means that all groups have the same

18 SCF dimers SCC iterations neighboring monomers (average) 17 20 10,000-100,000 96,000 32 7 12 Estimation of communication cost (2) monomers with which a monomer forms a SCF dimer point-to-point communication time to transfer one density matrix

19 number of SCF dimers total number of dimers total number of get requests Estimation of communication cost (3) number of monomer get requests number of SCFdimer get requests OSC scheme number of ES dimers number of ES dimer get requests

20 Estimation of communication cost (4) Bcast scheme total cost per one group cost of MPI_Bcast cost for one get requests total cost of put requests total cost per one group OSC scheme

21 Estimation of communication cost (5)

22 Simulation using skelton code (1) FMO calculation was simulated using skelton code –quantum calculation parts are removed Skelton code accumulates estimated time of : –molecular integral calculation –Fock matrix builging, SCF procedure Time of each communication are estimated and accumulated –MPI_Send, MPI_recv, MPI_get, MPI_put estimated from measurements –MPI_Bcast, MPI_Allreduce estimated from point to point communication time by assuming binomial tree, butterfly algorithm

23 Simulation using skelton code (2) integral calculation time estimation formula integrals required for environmental potential 1 electron integrals required for SCF Fock builging time estimation A =1.12072E-05 B =6.03297E-08 2 electron (3center) A =5.28555E-05 B =2.60718E-07 A =7.57123E-05 B =9.33117E-07 A =5.84313E-05 B =1.12603E-06 2 electron (4center) A =0.00050 B =3.26399E-06

24 simulation using skelton codes (3) OSC cost < Bcast cost > 512with Aquaporin protein 6-31G* basis set PDBID 2F2B, =492, 14,492 atoms > 32)( =4, 8, 16, 32, 64 =16 Linux cluster in RIKEN Super Combined Cluster (RSCC) was used

25 Summary OSC of MPI-2 standard has been implemented in a new FMO code to reduce the memory requirement per process. These results show OSC scheme is prospective for the petascale computing environment. Evaluation of communication costs shows that OSC scheme has an advantage over the Bcast scheme for =10,000-100,000, =96,000. by broadcast unnecessary. OSC implementation also makes the exchange of Simulations using skelton code also show that OSC scheme has an advantage in communication cost with large.


Download ppt "One-sided Communication Implementation in FMO Method J. Maki, Y. Inadomi, T. Takami, R. Susukita †, H. Honda, J. Ooba, T. Kobayashi, R. Nogita, K. Inoue."

Similar presentations


Ads by Google