One-sided Communication Implementation in FMO Method


One-sided Communication Implementation in FMO Method
J. Maki, Y. Inadomi, T. Takami, R. Susukita†, H. Honda, J. Ooba, T. Kobayashi, R. Nogita, K. Inoue and M. Aoyagi
Computing and Communications Center, Kyushu University
† Fukuoka Industry, Science & Technology Foundation

Overview of FMO method (1)
The FMO (Fragment Molecular Orbital) method has been developed by Kitaura (AIST) and co-workers to calculate the electronic states of macromolecules such as proteins and nucleotides. The target molecule is divided into fragments (monomers), and an ab initio molecular orbital (MO) calculation is carried out on each fragment (monomer) and each fragment pair (dimer). The method is usually executed with high parallel efficiency (>90%) and can perform an all-electron calculation on a protein molecule with 10,000 atoms.

Example of fragmentation

Overview of FMO method (2): monomer calculation
The Hamiltonian of each monomer contains an environmental electrostatic potential (the potential from all the other monomers), so the monomer calculation depends on the electron densities of the other monomers. The monomer calculations are therefore iterated, each one using the current electron densities of the other monomers J, until all electron densities are unchanged (the SCC procedure).
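The equations on this slide did not survive the transcript; as a hedged sketch in standard FMO notation (my reconstruction, not necessarily the authors' exact symbols), the monomer Fock operator carries an environmental electrostatic potential (ESP) built from the nuclei and electron densities of the other monomers:

\tilde{F}^{I} = F^{I} + \sum_{J \neq I} u^{J}, \qquad
u^{J}_{\mu\nu} = \sum_{A \in J} \left\langle \mu \left| \frac{-Z_A}{|\mathbf{r}-\mathbf{A}|} \right| \nu \right\rangle
               + \sum_{\lambda\sigma \in J} D^{J}_{\lambda\sigma}\,(\mu\nu|\lambda\sigma)

Because u^J depends on the density matrix D^J of every other monomer, the monomer SCF calculations are repeated until all D^J stop changing.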

Overview of FMO method (3): RHF SCF method
For each fragment the Schrödinger equation is solved by the RHF (Restricted Hartree-Fock) SCF method: each molecular orbital (MO) is expanded in basis functions with MO coefficients, the density matrix elements are built from the occupied MO coefficients, and the electron density is obtained from the density matrix and the basis functions. The cost is governed by the number of basis functions.
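The lost slide equations presumably correspond to the standard RHF relations; with N basis functions \chi_\mu, MO coefficients C, and n_occ doubly occupied MOs, they read (a standard-textbook sketch, not the authors' exact notation):

\phi_i(\mathbf{r}) = \sum_{\mu=1}^{N} C_{\mu i}\,\chi_\mu(\mathbf{r}), \qquad
D_{\mu\nu} = 2\sum_{i=1}^{n_{\mathrm{occ}}} C_{\mu i} C_{\nu i}, \qquad
\rho(\mathbf{r}) = \sum_{\mu\nu} D_{\mu\nu}\,\chi_\mu(\mathbf{r})\chi_\nu(\mathbf{r})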

Overview of FMO method (4): dimer calculation
The dimer Hamiltonian also contains the environmental electrostatic potential, and the dimer Schrödinger equation is solved in the same way as for the monomers. The total energy is then assembled from the monomer and dimer energies. Not all dimers are calculated by SCF.
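The total-energy expression the slide most likely showed is the standard two-body FMO (FMO2) form, given here as an assumption based on the published method, with E_I the monomer and E_IJ the dimer energies:

E = \sum_{I} E_{I} + \sum_{I > J} \left( E_{IJ} - E_{I} - E_{J} \right)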

dimer-es approximation
For distant dimers, the SCF calculation is not carried out; the pair energy is instead calculated by an electrostatic approximation. Fragment pairs are thus classified as SCF dimers (close pairs) or ES (non-SCF) dimers (distant pairs), in addition to the monomer SCF calculations.

Flow chart of FMO calculation
1. Initial density calculation
2. SCC procedure: electronic structure calculation for each monomer (for all monomers), followed by a convergence check of the SCC procedure; if not yet converged, repeat the monomer calculations, and if already converged, proceed
3. Electronic structure calculation for each SCF dimer (for all SCF dimers)
4. ES dimer calculation (for all separated dimers)
5. Total energy calculation
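Purely as an illustration of this control flow (every function name below is a hypothetical stub, not the OpenFMO API), a compilable C sketch:

#include <stdbool.h>

/* Hypothetical stubs standing in for the real FMO kernels (illustration only). */
static void calc_initial_densities(int nfrag) { (void)nfrag; }
static void calc_monomer_scf(int i)           { (void)i; }
static void calc_scf_dimer(int i, int j)      { (void)i; (void)j; }
static void calc_es_dimer(int i, int j)       { (void)i; (void)j; }
static void calc_total_energy(void)           { }
static bool scc_converged(void)               { return true; }
static bool is_scf_dimer(int i, int j)        { return i - j == 1; } /* toy distance criterion */

/* Driver-level sketch of the flow chart above. */
void fmo_driver(int nfrag)
{
    calc_initial_densities(nfrag);          /* 1. initial density calculation        */

    do {                                    /* 2. SCC procedure                      */
        for (int i = 0; i < nfrag; i++)
            calc_monomer_scf(i);            /*    monomer electronic structure       */
    } while (!scc_converged());             /*    convergence check of SCC           */

    for (int i = 0; i < nfrag; i++)         /* 3.-4. loop over all dimers (i > j)    */
        for (int j = 0; j < i; j++) {
            if (is_scf_dimer(i, j))
                calc_scf_dimer(i, j);       /* 3. SCF-dimer electronic structure     */
            else
                calc_es_dimer(i, j);        /* 4. ES-dimer (non-SCF) calculation     */
        }

    calc_total_energy();                    /* 5. total energy calculation           */
}

int main(void) { fmo_driver(4); return 0; }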

Density requirements and update in FMO calculation
- Initial density calculation: update all monomer density matrices.
- Monomer electronic structure calculation (for all monomers, inside the SCC loop): refer to the monomer density matrices needed in the electronic structure calculation; update the monomer density matrices as each monomer is solved; then check convergence of the SCC procedure.
- SCF-dimer electronic structure calculation (for all SCF dimers): refer to the monomer density matrices needed in the electronic structure calculation.
- ES-dimer calculation (for all separated dimers): refer to the two monomer density matrices needed in the dimer-ES calculation.
- Total energy calculation.
In short, the monomer steps update all monomer density matrices, while every step refers to some subset of the monomer density matrices.

Approximation of the environmental potential calculation (1)
The matrix element of the potential from the electrons of fragment K (needed in the SCF calculation) involves 4-center two-electron integrals, and the calculation cost grows rapidly with the number of basis functions. Carrying out the 4-center two-electron integral calculations for all environment monomers is therefore prohibitive.

Approximation of the environmental potential calculation (2): esp-aoc approximation
The two-electron part of the potential from fragment K is approximated using the Mulliken AO populations of fragment K, obtained from its density matrix and overlap matrix.

Approximation of the environmental potential calculation (3): esp-ptc approximation
The potential from fragment K is further approximated by point charges, using the Mulliken atomic charge of each nucleus A in K. With these approximations, the number of environment monomers for which 4-center two-electron integral calculations are actually carried out is greatly reduced.
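The formulas on these three slides were not preserved; the following is a sketch of the standard FMO ESP hierarchy they describe, written in the usual notation of the published method, so signs and symbols may differ from the original slides:

u^{K}_{\mu\nu} = \sum_{A \in K} \left\langle \mu \left| \frac{-Z_A}{|\mathbf{r}-\mathbf{A}|} \right| \nu \right\rangle
               + \sum_{\lambda\sigma \in K} D^{K}_{\lambda\sigma}\,(\mu\nu|\lambda\sigma)   (exact; 4-center two-electron integrals)

\sum_{\lambda\sigma \in K} D^{K}_{\lambda\sigma}\,(\mu\nu|\lambda\sigma)
  \approx \sum_{\lambda \in K} (\mu\nu|\lambda\lambda)\,\big(\mathbf{D}^{K}\mathbf{S}^{K}\big)_{\lambda\lambda}   (esp-aoc: Mulliken AO populations)

u^{K}_{\mu\nu} \approx -\sum_{A \in K} Q_A \left\langle \mu \left| \frac{1}{|\mathbf{r}-\mathbf{A}|} \right| \nu \right\rangle   (esp-ptc: Mulliken atomic point charges Q_A)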

Hypothetical petascale computing environment
Number of fragments: 10,000 to 100,000 (the current largest calculations handle ~1,000)
Number of CPUs: 100,000, since 1 PFlops divided by the current single-CPU peak performance of ~10 GFlops gives 100,000 CPUs

Memory requirement in FMO

Number of fragments   All monomer density matrices   Interfragment distances R_IJ
1,000                 350 MB                         4 MB
5,000                 1.7 GB                         95 MB
10,000                3.4 GB                         380 MB
50,000                17 GB                          9.3 GB
100,000               34 GB                          37 GB

The size of one density matrix is roughly 350 KB, and the interfragment distance data R_IJ has about N_f^2 / 2 entries. Requiring more than 10 GB of memory per CPU core is not acceptable, so it is difficult for each process to store all the necessary data.
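For example, the last row of the table follows directly from the stated sizes (assuming 8-byte distance entries, which reproduces the quoted values):

100{,}000 \times 350\ \mathrm{KB} \approx 34\ \mathrm{GB}, \qquad
\frac{100{,}000 \times (100{,}000 - 1)}{2} \times 8\ \mathrm{B} \approx 37\ \mathrm{GB}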

OSC implementation in OpenFMO (1)
(Figure: ranks 0 to 9 placed two per node on node00 to node04; the worker ranks are divided into Group00, Group01 and Group02.)
In this implementation, only the process of rank 0 is used as a server process for dynamic load balancing. All the other processes are worker processes; they are divided into groups, and the same two-level parallelization is used as in the other FMO implementations.
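A minimal sketch of such a rank-0 task server is shown below (illustrative only: the message tags, payload and per-process task handout are assumptions; the actual OpenFMO protocol, which distributes jobs to worker groups, is not shown in the transcript):

#include <mpi.h>

#define TAG_REQUEST 1
#define TAG_TASK    2
#define NO_MORE_TASKS (-1)

/* Dynamic load balancing: rank 0 hands out job indices on demand;
   every other rank works on whatever index it receives. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int ntasks = 100;                   /* e.g. number of monomer jobs */

    if (rank == 0) {                          /* server process */
        int next = 0, done = 0, dummy;
        MPI_Status st;
        while (done < nprocs - 1) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            int task = (next < ntasks) ? next++ : NO_MORE_TASKS;
            if (task == NO_MORE_TASKS) done++;
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
        }
    } else {                                  /* worker process */
        int dummy = 0, task;
        for (;;) {
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (task == NO_MORE_TASKS) break;
            /* ... run the monomer or dimer job with index 'task' here ... */
        }
    }

    MPI_Finalize();
    return 0;
}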

OSC implementation in OpenFMO (2)
(Figure: the same node/group layout; the monomer density matrix data are kept in a memory window created by MPI_Win, and a worker fetches the remote density matrices it needs with MPI_Get.)
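A hedged sketch of this access pattern with MPI-2 one-sided communication follows; the block distribution, matrix size and helper function are assumptions for illustration, not the actual OpenFMO data layout:

#include <mpi.h>
#include <stdlib.h>

#define NBASIS 100                       /* assumed (padded) fragment size     */
#define MAT_LEN (NBASIS * NBASIS)        /* doubles per monomer density matrix */

/* Fetch the density matrix of fragment 'frag' from the rank that owns it.
   Ownership is assumed to be a simple block distribution for illustration. */
static void get_density(MPI_Win win, int frag, int frags_per_rank, double *buf)
{
    int owner = frag / frags_per_rank;
    MPI_Aint disp = (MPI_Aint)(frag % frags_per_rank) * MAT_LEN;

    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Get(buf, MAT_LEN, MPI_DOUBLE, owner, disp, MAT_LEN, MPI_DOUBLE, win);
    MPI_Win_unlock(owner, win);          /* completes the transfer */
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int frags_per_rank = 4;        /* assumed even distribution */
    double *local = calloc((size_t)frags_per_rank * MAT_LEN, sizeof(double));

    /* Expose the locally stored density matrices as a memory window. */
    MPI_Win win;
    MPI_Win_create(local, (MPI_Aint)frags_per_rank * MAT_LEN * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Example: every rank fetches the density matrix of fragment 0. */
    double *remote = malloc(MAT_LEN * sizeof(double));
    get_density(win, 0, frags_per_rank, remote);

    MPI_Win_free(&win);
    free(local);
    free(remote);
    MPI_Finalize();
    return 0;
}

Using MPI_LOCK_SHARED allows many readers to fetch the same monomer density concurrently, which suits the read-mostly access pattern of the dimer calculations.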

Estimation of communication cost (1)
Assumptions:
1. The sizes of all density matrices are the same, and all groups have the same number of worker processes.
2. The time to put or get one density matrix is equal to the average point-to-point communication time t.
3. MPI_Bcast implements a binomial tree algorithm, so the time to broadcast one density matrix over P processes is \lceil \log_2 P \rceil \, t.
4. For the OSC scheme, dynamic load balancing works well and the delay due to competing put or get requests is negligible; therefore all groups execute the same number of monomer and dimer jobs, and all groups have the same communication cost.

Estimation of communication cost (2)
Parameters of the model: the number of monomers, the number of SCF dimers, the number of SCC iterations, the average number of neighboring monomers (i.e., monomers with which a given monomer forms an SCF dimer), and the point-to-point communication time to transfer one density matrix.

Estimation of communication cost (3): OSC scheme
The total number of get requests is the sum of the monomer get requests, the SCF-dimer get requests and the ES-dimer get requests. The total number of dimers is the number of SCF dimers plus the number of ES dimers.
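In symbols (a sketch with obvious notation; the per-job request counts on the original slide are not recoverable):

N_{\mathrm{dimer}}^{\mathrm{total}} = N_{\mathrm{SCF}} + N_{\mathrm{ES}} = \frac{N_f (N_f - 1)}{2}, \qquad
N_{\mathrm{get}}^{\mathrm{total}} = N_{\mathrm{get}}^{\mathrm{monomer}} + N_{\mathrm{get}}^{\mathrm{SCF\text{-}dimer}} + N_{\mathrm{get}}^{\mathrm{ES\text{-}dimer}}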

Estimation of communication cost (4)
Bcast scheme: the total cost per group is determined by the cost of the MPI_Bcast operations.
OSC scheme: the total cost per group is the cost of the get requests (the cost of one get request times the number of requests) plus the total cost of the put requests.
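Under the assumptions of slide (1), the comparison presumably has the following overall form (a sketch; the exact per-group prefactors on the original slide are not recoverable, t is the point-to-point time and P the number of processes reached by the broadcast):

T_{\mathrm{Bcast}} \sim N_{\mathrm{bcast}} \, \lceil \log_2 P \rceil \, t, \qquad
T_{\mathrm{OSC}} \sim \left( N_{\mathrm{get}} + N_{\mathrm{put}} \right) t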

Estimation of communication cost (5)

Simulation using skeleton code (1)
The FMO calculation was simulated using a skeleton code from which the quantum calculation parts are removed. The skeleton code accumulates the estimated time of:
- molecular integral calculation
- Fock matrix building and the SCF procedure
The time of each communication is also estimated and accumulated:
- MPI_Send, MPI_Recv, MPI_Get and MPI_Put times are estimated from measurements
- MPI_Bcast and MPI_Allreduce times are estimated from the point-to-point communication time by assuming binomial tree and butterfly algorithms, respectively
A conceptual sketch of this bookkeeping is given below.
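The sketch below shows the idea only; the function names, cost models and coefficient values are placeholders, not the skeleton code's actual fitted formulas. Instead of executing kernels, the code adds model estimates to per-category timers:

#include <stdio.h>

/* Accumulated simulated times, by category (illustrative only). */
static double t_integrals, t_fock_scf, t_comm;

/* Hypothetical linear cost models; the coefficients are placeholders,
   not the fitted A and B values from the presentation. */
static double integral_time(long n)  { return 1.0e-5 * n + 1.0e-8; }
static double p2p_time(long nbytes)  { return 1.0e-6 + 1.0e-9 * nbytes; }

/* Binomial-tree broadcast estimate: ceil(log2 P) point-to-point steps. */
static double bcast_time(int nprocs, long nbytes)
{
    int steps = 0;
    for (int p = 1; p < nprocs; p <<= 1) steps++;
    return steps * p2p_time(nbytes);
}

int main(void)
{
    /* "Run" one fake monomer job: no real work, only time bookkeeping. */
    t_integrals += integral_time(100000);         /* molecular integrals         */
    t_fock_scf  += 20 * integral_time(100000);    /* Fock build over SCF cycles  */
    t_comm      += bcast_time(1024, 350 * 1024);  /* broadcast one 350 KB matrix */

    printf("integrals %.3e s, Fock/SCF %.3e s, comm %.3e s\n",
           t_integrals, t_fock_scf, t_comm);
    return 0;
}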

Simulation using skeleton code (2)
Integral calculation time estimation formula: coefficient pairs (A, B) are fitted separately for the integrals required for the environmental potential and for the 1-electron and 2-electron integrals required for SCF, together with the Fock building time estimation:
- 1-electron: A = …E-05, B = …E-08
- 2-electron (3-center): A = …E-05, B = …E-07; A = …E-05, B = …E-07; A = …E-05, B = …E-06
- 2-electron (4-center): A = …, B = …E-06

Simulation using skeleton code (3)
Test case: aquaporin protein (PDB ID 2F2B), 14,492 atoms, 492 fragments, 6-31G* basis set. The Linux cluster in the RIKEN Super Combined Cluster (RSCC) was used, with 4, 8, 16, 32 and 64 groups of 16 worker processes. The OSC communication cost falls below the Bcast cost for more than 512 processes, i.e., for more than 32 groups.

Summary
- One-sided communication (OSC) of the MPI-2 standard has been implemented in a new FMO code to reduce the memory requirement per process. The OSC implementation also makes the exchange of monomer density matrices by broadcast unnecessary.
- Evaluation of communication costs shows that the OSC scheme has an advantage over the Bcast scheme for 10,000 to 100,000 fragments on about 96,000 CPUs.
- Simulations using the skeleton code also show that the OSC scheme has an advantage in communication cost for large numbers of processes.
- These results show that the OSC scheme is promising for the petascale computing environment.