PuReMD: Purdue Reactive Molecular Dynamics Software
Ananth Grama
Center for Science of Information
Computational Science and Engineering, Department of Computer Science
ayg@cs.purdue.edu

ReaxFF vs. Classical MD

ReaxFF is classical MD in spirit:
- basis: atoms (electrons & nuclei treated together)
- parameterized interactions: fitted to DFT (ab initio) calculations
- atomic movements: Newtonian mechanics

Many interactions are common to both:
- bonds
- valence angles (a.k.a. 3-body interactions)
- dihedrals (a.k.a. 4-body interactions)
- hydrogen bonds
- van der Waals and Coulomb interactions

ReaxFF vs. Classical MD

Static vs. dynamic bonds:
- reactivity: bonds are formed and broken during the simulation
- consequently, multi-body interactions are dynamic and bonded interactions have more complex formulations
- additional bonded interactions:
  - multi-body: lone pair & over-/under-coordination terms, due to imperfect coordination
  - 3-body: coalition & penalty terms, for special cases such as double bonds
  - 4-body: conjugation terms, for special cases

Static vs. dynamic charges:
- charge equilibration: a large sparse NxN linear system (N = number of atoms), updated at every step!

Shorter time steps (femtoseconds for classical MD vs. tenths of femtoseconds for ReaxFF).

Result: a more complex and more expensive force field.

Generation of Neighbors List

[Figure: atoms are binned into a 3D neighbors grid, and the atom list is then reordered by grid cell.]

Neighbors List Optimizations

- The size of the neighbor-grid cell is important: with rnbrs the neighbors cut-off distance, set the cell size to rnbrs/2 (see the cell-list sketch below).
- Reordering the atom list by cell reduces look-ups significantly; with reordering, many cells need not be checked at all.
- Atom-to-cell distance test: when looking for the neighbors of the atoms in a box, if the distance d from an atom to a cell satisfies d > rnbrs, there is no need to look inside that cell.
- Verlet lists enable delayed re-neighboring.
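
To make the cell-size choice concrete, here is a minimal cell-list sketch in C. It is not PuReMD code: the box size, cut-off value, and all names are illustrative assumptions. The key property is that with a cell side of rnbrs/2, every neighbor of an atom lies within the surrounding 5x5x5 block of cells.

```c
/* cell_list.c -- minimal cell-list neighbor search (illustrative sketch) */
#include <stdio.h>
#include <stdlib.h>

#define RNBRS 4.0           /* rnbrs: neighbor cut-off (hypothetical value) */
#define CELL  (RNBRS / 2.0) /* cell side = rnbrs/2, as on the slide         */
#define BOX   16.0          /* cubic box length (hypothetical)              */
#define NC    8             /* cells per dimension = BOX / CELL             */

typedef struct { double x, y, z; } atom_t;

static int clampc(int c) { return c < 0 ? 0 : (c >= NC ? NC - 1 : c); }
static int cidx(int cx, int cy, int cz) { return (cx * NC + cy) * NC + cz; }

int main(void) {
    enum { N = 1000 };
    static atom_t a[N];
    static int head[NC * NC * NC], next[N];
    for (int c = 0; c < NC * NC * NC; c++) head[c] = -1;

    srand(42);              /* random coordinates stand in for real input */
    for (int i = 0; i < N; i++) {
        a[i].x = BOX * rand() / RAND_MAX;
        a[i].y = BOX * rand() / RAND_MAX;
        a[i].z = BOX * rand() / RAND_MAX;
        int c = cidx(clampc((int)(a[i].x / CELL)),
                     clampc((int)(a[i].y / CELL)),
                     clampc((int)(a[i].z / CELL)));
        next[i] = head[c];  /* bin atom i: one linked list per cell */
        head[c] = i;
    }

    /* With cell side rnbrs/2, every neighbor of an atom lies within the
     * surrounding 5x5x5 block of cells (+/-2 in each dimension).        */
    long pairs = 0;
    for (int i = 0; i < N; i++) {
        int cx = clampc((int)(a[i].x / CELL));
        int cy = clampc((int)(a[i].y / CELL));
        int cz = clampc((int)(a[i].z / CELL));
        for (int dx = -2; dx <= 2; dx++)
        for (int dy = -2; dy <= 2; dy++)
        for (int dz = -2; dz <= 2; dz++) {
            int nx = cx + dx, ny = cy + dy, nz = cz + dz;
            if (nx < 0 || ny < 0 || nz < 0 || nx >= NC || ny >= NC || nz >= NC)
                continue;   /* no periodic wrap in this sketch */
            for (int j = head[cidx(nx, ny, nz)]; j >= 0; j = next[j]) {
                if (j <= i) continue;             /* count each pair once */
                double ddx = a[i].x - a[j].x, ddy = a[i].y - a[j].y,
                       ddz = a[i].z - a[j].z;
                if (ddx*ddx + ddy*ddy + ddz*ddz <= RNBRS * RNBRS) pairs++;
            }
        }
    }
    printf("pairs within cut-off: %ld\n", pairs);
    return 0;
}
```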

Elimination of Bond Order Derivative Lists

- BOij: the bond order between atoms i and j, at the heart of all bonded interactions.
- dBOij/drk: the derivative of BOij with respect to the position of atom k; non-zero for every k sharing a bond with i or j. Storing these derivatives makes force computations costly and requires a large amount of memory.
- Solution: eliminate the dBO list by delaying the computation of the dBO terms (see the sketch below):
  - accumulate the force-related coefficients of each interaction into CdBOij
  - compute dBOij/drk once at the end of the step and apply fk += CdBOij x dBOij/drk
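
A structural sketch of the delayed evaluation, with hypothetical helpers (`dBO_wrt`, `atoms_sharing_bond`) standing in for the real bond-order machinery; this is not PuReMD's actual data layout:

```c
/* delayed bond-order derivative evaluation -- structural sketch */
#include <stddef.h>

typedef struct { double x, y, z; } vec3;

typedef struct {
    int    i, j;    /* atoms forming the bond               */
    double BO;      /* bond order BOij                      */
    double CdBO;    /* accumulated coefficient of dBOij/drk */
} bond_t;

/* During the bonded phase, each interaction only adds its coefficient;
 * no derivative is computed or stored.                                 */
static void add_bonded_term(bond_t *b, double dE_dBO)
{
    b->CdBO += dE_dBO;
}

/* hypothetical helpers standing in for the real bond-order machinery */
extern vec3 dBO_wrt(const bond_t *b, int k);          /* dBOij/drk */
extern int  atoms_sharing_bond(const bond_t *b, int *ks, int max_ks);

/* At the end of the step, evaluate each derivative exactly once and
 * apply the accumulated coefficient: fk += CdBOij * dBOij/drk.        */
static void apply_dBO_terms(bond_t *bonds, size_t nbonds, vec3 *f)
{
    for (size_t b = 0; b < nbonds; b++) {
        int ks[32];
        int nk = atoms_sharing_bond(&bonds[b], ks, 32);
        for (int m = 0; m < nk; m++) {
            vec3 d = dBO_wrt(&bonds[b], ks[m]);
            f[ks[m]].x += bonds[b].CdBO * d.x;
            f[ks[m]].y += bonds[b].CdBO * d.y;
            f[ks[m]].z += bonds[b].CdBO * d.z;
        }
    }
}
```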

Look-up Tables

- Non-bonded force computations are expensive: a large number of interactions (rnonb ~ 10 A vs. rbond ~ 4-5 A), but simple, pair-wise forms.
- Reduce their cost with look-up tables: cubic spline interpolation for non-bonded energies and forces (see the sketch below).
- Experiment with a 6540-atom bulk water system: significant performance gain, high accuracy.
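
For illustration, a minimal tabulated pair interaction with cubic interpolation between samples. The Catmull-Rom spline form and the LJ-like filler potential are assumptions for the sketch, not PuReMD's actual spline construction:

```c
/* tabulated pair interaction with cubic interpolation -- illustrative */
#include <stdio.h>
#include <math.h>

#define R_MIN  0.5
#define R_MAX 10.0          /* rnonb ~ 10 A, per the slide */
#define NPTS  1024

static double Y[NPTS];      /* sampled energies */
static double H;            /* table spacing    */

/* hypothetical expensive pair form to be tabulated (LJ-like filler) */
static double pair_energy(double r)
{
    double s6 = pow(1.0 / r, 6.0);
    return 4.0 * (s6 * s6 - s6);
}

static void build_table(void)
{
    H = (R_MAX - R_MIN) / (NPTS - 1);
    for (int k = 0; k < NPTS; k++)
        Y[k] = pair_energy(R_MIN + k * H);
}

/* cubic (Catmull-Rom) interpolation between samples: a table fetch and a
 * few multiply-adds replace the expensive closed form for every pair    */
static double table_energy(double r)
{
    double u = (r - R_MIN) / H;
    int k = (int)u;
    if (k < 1) k = 1;
    if (k > NPTS - 3) k = NPTS - 3;
    double t  = u - k;
    double m0 = 0.5 * (Y[k + 1] - Y[k - 1]);   /* slope at sample k   */
    double m1 = 0.5 * (Y[k + 2] - Y[k]);       /* slope at sample k+1 */
    return (2*t*t*t - 3*t*t + 1) * Y[k] + (t*t*t - 2*t*t + t) * m0
         + (-2*t*t*t + 3*t*t) * Y[k + 1] + (t*t*t - t*t) * m1;
}

int main(void)
{
    build_table();
    printf("exact %.6f  interpolated %.6f\n",
           pair_energy(3.3), table_energy(3.3));
    return 0;
}
```

One table build amortizes over millions of pair evaluations, which is where the reported performance gain comes from.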

Linear Solver for Charge Equilibration

- QEq method: used for equilibrating charges; an approximation for distributing partial charges.
- Solving with Lagrange multipliers yields sparse NxN linear systems (N = number of atoms, H the QEq coefficient matrix); see the reconstruction below.
- s and t are fictitious charges used to determine the actual charges qi.
- Direct methods are too expensive, so iterative (Krylov subspace) methods are used.
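
The equations themselves did not survive the transcript; reconstructed from the published PuReMD formulation, the Lagrange-multiplier solution amounts to two sparse systems and a correction:

```latex
% two N x N sparse systems, solved iteratively at every step
H\mathbf{s} = -\boldsymbol{\chi}, \qquad H\mathbf{t} = -\mathbf{1}
% actual charges from the fictitious ones, enforcing charge neutrality
q_i = s_i - \frac{\sum_k s_k}{\sum_k t_k}\, t_i
```

where chi collects the atomic electronegativities and H combines atomic hardness (diagonal) with shielded Coulomb terms (off-diagonal).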

Basic Solvers for QEq

Sample systems:
- bulk water: 6540 atoms (liquid)
- lipid bilayer: 56,800 atoms (biological system)
- PETN crystal: 48,256 atoms (solid)

Solvers: CG and GMRES
- H has a heavy diagonal, so diagonal pre-conditioning is used
- the environment evolves slowly, so the initial guess is extrapolated from previous solutions (see the sketch below)

Performance is nevertheless poor.
[Table: iteration counts (equal to the number of matrix-vector multiplications), actual running times in seconds, and fraction of total computation time, at a tolerance of 10^-6, which is only fairly satisfactory; running times suffer from cache effects, and results are much worse at a 10^-10 tolerance.]
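
The extrapolation trick is worth spelling out. A hedged sketch, with linear extrapolation shown; the order actually used in the solver may differ:

```c
/* initial-guess extrapolation for the QEq solve -- illustrative sketch.
 * Atoms move little between steps, so the new solution lies close to the
 * previous ones; a good start vector cuts Krylov iterations.            */
void extrapolate_guess(const double *s_prev,   /* solution at step k-1 */
                       const double *s_prev2,  /* solution at step k-2 */
                       double *s0, int n)      /* start vector, step k */
{
    for (int i = 0; i < n; i++)
        s0[i] = 2.0 * s_prev[i] - s_prev2[i];
}
```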

ILU-based preconditioning

- ILU-based pre-conditioners: no fill-in, 10^-2 drop tolerance.
- Effective when considering only the solve time; no fill-in + threshold scales nicely with system size.
- But the ILU factorization itself is expensive, and cache effects are still evident (bulk water and bilayer systems).

system / solver               precond. (s)   solve (s)   total (s)
bulk water / GMRES+ILU        0.50           0.04        0.54
bulk water / GMRES+diagonal   ~0             0.11        0.11

ILU-based preconditioning

Observation: the ILU factorization cost can be amortized (sketched below).
- The simulation environment changes slowly, so pre-conditioners are re-usable across steps.
- PETN crystal (solid): re-usable for 1000s of steps!
- Bulk water (liquid): re-usable for 10-100s of steps!
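
In sketch form; the fixed refresh interval and all names here are assumptions, and the production criterion for re-factorizing may well be adaptive:

```c
/* amortizing the ILU factorization across MD steps -- illustrative sketch */
typedef struct sparse_s  sparse_t;    /* opaque sparse matrix  */
typedef struct precond_s precond_t;   /* opaque preconditioner */

extern void ilu_factorize(const sparse_t *H, precond_t *M);  /* expensive */
extern int  pcg_solve(const sparse_t *H, const precond_t *M,
                      const double *b, double *x);           /* per step  */

void qeq_step(const sparse_t *H, precond_t *M, const double *b, double *x,
              long step, long refresh)
{
    /* refresh ~ thousands of steps for solids, tens to hundreds for
     * liquids (per the slide); in between, reuse the stale factors   */
    if (step % refresh == 0)
        ilu_factorize(H, M);
    pcg_solve(H, M, b, x);
}
```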

Memory Management

- Compact data structures.
- Dynamic and adaptive lists: allocated initially after a size estimation, then monitored at every step and re-allocated only if necessary.
- Result: low memory foot-print, linear scaling with system size.

Storage layouts (from the slide's figure): the neighbors list, QEq matrix, and 3-body interaction lists store atom n-1's data immediately followed by atom n's data in CSR format; the bonds and hbonds lists use a modified CSR in which extra space is reserved after each atom's data (see the sketch below).
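
A minimal sketch of the modified-CSR idea; the field names and headroom policy are illustrative, not PuReMD's actual structures. Each atom's row gets reserved slack so a few new bonds can be appended without re-allocating the whole list:

```c
/* modified CSR with per-row slack -- illustrative sketch */
#include <stdlib.h>

typedef struct {
    int *start;  /* start[i]: first slot of atom i's entries              */
    int *end;    /* end[i]: one past atom i's last used slot              */
    int *col;    /* entry storage; start[i+1] - end[i] slots are reserved */
    int  n;      /* number of atoms (rows)                                */
} mcsr_t;

/* allocate from estimated per-row counts, plus reserved headroom */
static mcsr_t mcsr_alloc(int n, const int *est, double headroom)
{
    mcsr_t m;
    m.n     = n;
    m.start = malloc((n + 1) * sizeof *m.start);
    m.end   = malloc(n * sizeof *m.end);
    m.start[0] = 0;
    for (int i = 0; i < n; i++) {
        m.end[i]       = m.start[i];              /* each row starts empty */
        m.start[i + 1] = m.start[i] + (int)(est[i] * headroom) + 1;
    }
    m.col = malloc(m.start[n] * sizeof *m.col);
    return m;
}

/* append entry j to atom i's row; returns 0 when the slack runs out,
 * signalling the caller to re-estimate sizes and re-allocate          */
static int mcsr_append(mcsr_t *m, int i, int j)
{
    if (m->end[i] == m->start[i + 1]) return 0;
    m->col[m->end[i]++] = j;
    return 1;
}
```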

Validation: Hexane (C6H14) Structure Comparison

[Figure: structural comparison for hexane; excellent agreement.]

Comparison to MD Methods

Comparisons using hexane systems of various sizes:
- ab initio MD: CPMD
- classical MD: GROMACS
- ReaxFF: SerialReax

Comparison with LAMMPS-Reax

Compared: time per time-step, QEq solver performance, and memory foot-print.
- The two codes use different QEq formulations but obtain similar results.
- LAMMPS uses CG with no preconditioner.

Parallel Realization: PuReMD

- Built on the SerialReax platform: excellent per-timestep running time, linear-scaling memory footprint.
- Extends its capabilities to large systems and longer time-scales.
- Scalable algorithms and techniques; demonstrated scaling to over 9K cores.

Reference: Parallel Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques, Aktulga et al., 2013.

Parallelization: Outer-Shell

Candidate outer-shell schemes: full shell, half shell, midpoint shell, tower-plate shell.
We choose the full shell: dynamic bonding requires it, despite the communication overhead.

Parallelization: Boundary Interactions

rshell = max(3 x rbond_cut, rhbond_cut, rnonb_cut)

Parallelization: Messaging

Parallelization: Messaging Performance

Performance comparison: PuReMD with direct vs. staged messaging.

PuReMD-GPU Design

- Initialization: neighbor list, bond list, hydrogen-bond list, and the coefficients of the QEq matrix.
- Bonded interactions: bond-order, bond-energy, lone-pair, three-body, four-body, and hydrogen-bond terms.
- Non-bonded interactions: charge equilibration, Coulomb and van der Waals forces.

Neighbor List Generation

[Figure: each cell is processed against its neighboring cells and their atoms to generate the neighbor list on the GPU.]

Other Data Structures

- The bond list, hydrogen-bond list, and QEq coefficients are all generated by iterating over the neighbor list; all of these lists are redundant on GPUs.
- A first design used three kernels, one per list, each redundantly traversing the neighbor list.
- The improved design uses a single kernel that generates all list entries for each atom pair (see the sketch below).
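
A hedged CUDA sketch of the fused pass; the names, list layouts, and the shielded-Coulomb form are illustrative, not PuReMD-GPU's actual kernels:

```cuda
/* fused list-generation pass -- illustrative sketch, not PuReMD-GPU code */

/* hypothetical shielded-Coulomb form for the QEq matrix entries */
__device__ float shielded_coulomb(float d)
{
    const float gamma = 0.8f;              /* illustrative shielding param */
    return 1.0f / cbrtf(d * d * d + 1.0f / (gamma * gamma * gamma));
}

/* One thread per atom: a single traversal of the neighbor list emits the
 * bond-list, hydrogen-bond-list, and QEq entries for every pair, instead
 * of three kernels each re-reading the neighbor list. Shown here as a
 * counting pass; a second pass would fill the actual entries.            */
__global__ void generate_lists(int n, const int *nbr_start,
                               const int *nbr_end, const float *nbr_dist,
                               float r_bond, float r_hbond,
                               int *bond_cnt, int *hbond_cnt, float *qeq_coef)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int p = nbr_start[i]; p < nbr_end[i]; p++) {
        float d = nbr_dist[p];
        if (d <= r_bond)  bond_cnt[i]++;   /* would-be bond-list entry   */
        if (d <= r_hbond) hbond_cnt[i]++;  /* would-be H-bond entry      */
        qeq_coef[p] = shielded_coulomb(d); /* QEq coefficient for pair p */
    }
}
```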

Bonded and Non-bonded Interactions

- Bonded interactions are computed with one thread per atom: the number of bonds per atom is typically low (< 10), and the bond list is iterated to compute force/energy terms.
- Three-body and four-body interactions are expensive because of the number of bonds involved (a dedicated three-body list is built).
- Hydrogen bonds use multiple threads per atom, since the number of hydrogen-bond partners per atom is usually in the 100s (see the kernel sketch below).
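
To contrast the two mappings, a hedged CUDA sketch. Kernel names and list layouts are illustrative, and the shared-memory reduction is the Fermi-era idiom matching the hardware described later in the talk:

```cuda
/* thread-to-work mappings for bonded terms -- illustrative sketch */

/* One thread per atom: adequate when each atom has few bonds (< 10). */
__global__ void bonded_forces(int n, const int *bond_start,
                              const int *bond_end, const float *bond_term,
                              float *energy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float e = 0.0f;
    for (int p = bond_start[i]; p < bond_end[i]; p++)
        e += bond_term[p];          /* per-bond force/energy contribution */
    energy[i] = e;
}

/* Multiple threads per atom: hydrogen-bond lists run into the 100s, so
 * K threads stride over atom i's entries and reduce their partial sums.
 * Launch as hbond_forces<<<n, K>>>(...).                                */
#define K 32
__global__ void hbond_forces(int n, const int *hb_start, const int *hb_end,
                             const float *hb_term, float *energy)
{
    int i    = blockIdx.x;          /* one block of K threads per atom    */
    int lane = threadIdx.x;
    if (i >= n) return;             /* uniform per block: safe to return  */

    float e = 0.0f;
    for (int p = hb_start[i] + lane; p < hb_end[i]; p += K)
        e += hb_term[p];

    __shared__ float sm[K];         /* block-level reduction of partials  */
    sm[lane] = e;
    for (int off = K / 2; off > 0; off >>= 1) {
        __syncthreads();
        if (lane < off) sm[lane] += sm[lane + off];
    }
    if (lane == 0) energy[i] = sm[0];
}
```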

Nonbonded Interactions

Charge equilibration:
- SpMV in CSR format, one warp (32 threads) per row (see the sketch below)
- GMRES implemented using the CUBLAS library

Van der Waals and Coulomb interactions:
- iterate over the neighbor lists
- multiple-threads-per-atom kernel
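
Warp-per-row SpMV is a standard CUDA idiom; here is a sketch under the assumption of a plain CSR matrix (the QEq matrix H is symmetric, so the real code may store only half of it):

```cuda
/* warp-per-row CSR SpMV -- illustrative sketch of the scheme on the slide */
#define WARP    32
#define THREADS 128   /* assumed block size; sm[] below must match */

__global__ void spmv_csr_warp(int nrows, const int *row_ptr, const int *col,
                              const double *val, const double *x, double *y)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / WARP;    /* one 32-thread warp handles one matrix row */
    int lane = tid % WARP;
    if (row >= nrows) return;

    /* lanes stride across the row's non-zeros, giving coalesced loads */
    double sum = 0.0;
    for (int p = row_ptr[row] + lane; p < row_ptr[row + 1]; p += WARP)
        sum += val[p] * x[col[p]];

    /* intra-warp reduction in volatile shared memory (Fermi-era idiom;
     * a modern kernel would use warp shuffles instead)                 */
    __shared__ volatile double sm[THREADS];
    sm[threadIdx.x] = sum;
    for (int off = WARP / 2; off > 0; off >>= 1)
        if (lane < off) sm[threadIdx.x] += sm[threadIdx.x + off];
    if (lane == 0) y[row] = sm[threadIdx.x];
}
```

Each row's dot product is spread over 32 lanes, so long rows get coalesced memory access while short rows still finish in one pass.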

PG-PuReMD

- Multi-node, multi-GPU implementation.
- Problem decomposition: 3D domain decomposition with wrap-around links for periodic boundaries (a torus).
- Outer-shell selection: PG-PuReMD uses the full shell, with rshell = max(3 x rbond, rhbond, rnonb).

Experimental Results

- LBNL GPU cluster: 50 nodes interconnected with a QDR InfiniBand switch.
- GPU node: 2 Intel Xeon 5530 2.4 GHz quad-core Nehalem processors with QPI (8 cores/node), 24 GB RAM, and Tesla C2050 Fermi GPUs with 3 GB global memory.
- Single-CPU baseline: Intel Xeon E5606, 2.13 GHz, 24 GB RAM.

Experimental Setup

- CUDA 5.0 development environment with "-arch=sm_20 -funroll-loops -O3" compiler options.
- GCC 4.6 compiler with MPICH2/OpenMPI; CUDA object files are linked against OpenMPI to produce the final executable.
- 10^-6 QEq tolerance, 0.25 fs time step, NVE ensemble.

Weak Scaling Results

[Chart slides; the plotted weak-scaling data does not survive in the transcript.]

Conclusions

PuReMD is available for public download at https://www.cs.purdue.edu/puremd

Thank You!