Algorithms, Numerical Techniques, and Software for Atomistic Modeling.


Molecular Dynamics Methods vs. ReaxFF

[Figure: map of simulation methods over time scales (ps to ms) and length scales (nm to mm); ab-initio methods cover the smallest scales, classical MD methods and ReaxFF the intermediate ones, and statistical & continuum methods the largest.]

ReaxFF vs. Classical MD

ReaxFF is classical MD in spirit:
– basis: atoms (electrons & nuclei)
– parameterized interactions: from DFT (ab-initio)
– atom movements: Newtonian mechanics

Many interactions are common to both (two of these terms are sketched below):
– bonds
– valence angles (a.k.a. 3-body interactions)
– dihedrals (a.k.a. 4-body interactions)
– hydrogen bonds
– van der Waals, Coulomb
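As a point of reference for the classical-MD side, here is a minimal sketch of two of the common terms listed above, a harmonic bond and a Lennard-Jones + Coulomb non-bonded pair. The parameter values are purely illustrative, and these are the simple classical forms, not the bond-order-dependent forms ReaxFF actually uses.

```c
/* Minimal sketch of two classical force-field terms: a harmonic bond and a
 * Lennard-Jones + Coulomb non-bonded pair.  Parameter values are illustrative
 * only; ReaxFF uses more complex, bond-order-dependent functional forms. */
#include <math.h>
#include <stdio.h>

/* E_bond = 0.5 * k * (r - r0)^2 */
static double harmonic_bond(double r, double k, double r0) {
    double dr = r - r0;
    return 0.5 * k * dr * dr;
}

/* E_nonb = 4*eps*((sigma/r)^12 - (sigma/r)^6) + C * qi*qj / r */
static double nonbonded_pair(double r, double eps, double sigma,
                             double qi, double qj, double coul_const) {
    double sr6 = pow(sigma / r, 6.0);
    return 4.0 * eps * (sr6 * sr6 - sr6) + coul_const * qi * qj / r;
}

int main(void) {
    /* hypothetical O-H bond and O...O pair, kcal/mol-style units */
    printf("bond energy: %g\n", harmonic_bond(1.05, 450.0, 0.96));
    printf("pair energy: %g\n",
           nonbonded_pair(3.2, 0.15, 3.17, -0.8, -0.8, 332.06));
    return 0;
}
```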

ReaxFF vs. Classical MD

Static vs. dynamic bonds
– reactivity: bonds are formed and broken during the simulation
– consequently: dynamic 3-body & 4-body interactions
– complex formulations of bonded interactions
– additional bonded interactions:
  – multi-body: lone pair & over-/under-coordination, due to imperfect coordination
  – 3-body: coalition & penalty, for special cases like double bonds
  – 4-body: conjugation, for special cases

Static vs. dynamic charges
– large sparse linear system (N×N), where N = number of atoms
– updated at every step!

Shorter time-steps (tenths of femtoseconds for ReaxFF vs. femtoseconds for classical MD)

Fully atomistic

Result: a more complex and expensive force field

ReaxFF realizations: State of the Art

Original Fortran Reax code
– reference code, widely distributed and used
– experimental: sequential, slow
– static interaction lists: large memory footprint

LAMMPS-REAX (Sandia)
– parallel engine: LAMMPS
– kernels: from the original Fortran code
– local optimizations only
– static interaction lists: large memory footprint

Parallel Fortran Reax code from USC
– good parallel scalability
– poor per-timestep running time
– accuracy issues
– no public-domain release

Sequential Realization: SerialReax

Excellent per-timestep running time
– efficient generation of neighbor lists
– elimination of bond-order derivative lists
– cubic spline interpolation for non-bonded interactions (a sketch follows this slide)
– highly optimized linear solver for charge equilibration

Linear-scaling memory footprint
– fully dynamic and adaptive interaction lists

Related publication:
Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, S. A. Pandit, A. C. T. van Duin, A. Y. Grama
SIAM Journal on Scientific Computing (to appear)
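The cubic-spline interpolation mentioned above replaces repeated evaluation of expensive non-bonded functions with a table lookup. Below is a minimal stand-in sketch, assuming a simple Lennard-Jones potential and piecewise-cubic Hermite interpolation from tabulated values and derivatives; SerialReax's actual table construction and lookup differ in detail.

```c
/* Sketch: tabulate a non-bonded pair energy on a uniform grid and evaluate it
 * with piecewise-cubic (Hermite) interpolation.  A simplified stand-in for the
 * cubic-spline tables used for non-bonded interactions in SerialReax. */
#include <math.h>
#include <stdio.h>

#define NPTS 512
static const double r_min = 0.5, r_max = 10.0;
static double f_tab[NPTS], d_tab[NPTS];   /* tabulated energy and derivative */

/* Lennard-Jones as the example potential (illustrative parameters). */
static double lj(double r)  { double s6 = pow(3.4 / r, 6); return 4.0 * 0.24 * (s6 * s6 - s6); }
static double dlj(double r) { double s6 = pow(3.4 / r, 6); return 4.0 * 0.24 * (-12.0 * s6 * s6 + 6.0 * s6) / r; }

static void build_table(void) {
    double h = (r_max - r_min) / (NPTS - 1);
    for (int i = 0; i < NPTS; ++i) {
        double r = r_min + i * h;
        f_tab[i] = lj(r);
        d_tab[i] = dlj(r);
    }
}

/* Cubic Hermite interpolation on the enclosing grid interval. */
static double eval_table(double r) {
    double h = (r_max - r_min) / (NPTS - 1);
    int i = (int)((r - r_min) / h);
    if (i < 0) i = 0;
    if (i > NPTS - 2) i = NPTS - 2;
    double t = (r - (r_min + i * h)) / h;
    double h00 = (1 + 2 * t) * (1 - t) * (1 - t), h10 = t * (1 - t) * (1 - t);
    double h01 = t * t * (3 - 2 * t),             h11 = t * t * (t - 1);
    return h00 * f_tab[i] + h10 * h * d_tab[i] + h01 * f_tab[i + 1] + h11 * h * d_tab[i + 1];
}

int main(void) {
    build_table();
    printf("exact %.6f  interpolated %.6f\n", lj(3.9), eval_table(3.9));
    return 0;
}
```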

SerialReax Components

Inputs: system geometry, control parameters, force-field parameters.
Outputs: trajectory snapshots, system status updates, program log file.

Initialization
– read input data, initialize data structures

Neighbor Generation
– 3D-grid based O(n) neighbor generation (a cell-list sketch follows this slide)
– several optimizations for improved performance

Init Forces
– initialize the QEq coefficient matrix
– compute uncorrected bond orders
– generate hydrogen-bond lists

QEq
– large sparse linear system
– PGMRES(50) or PCG; ILUT-based preconditioners give good performance

Compute Bonds
– corrections are applied after all uncorrected bonds are computed

Bonded Interactions
1. bonds
2. lone pairs
3. over-/under-coordination
4. valence angles
5. hydrogen bonds
6. dihedrals / torsions

van der Waals & Electrostatics
– single pass over the far-neighbor list after charges are updated
– interpolation with cubic splines for a nice speed-up

Evolve the System
– F_total = F_nonbonded + F_bonded
– update x & v with velocity Verlet
– NVE, NVT and NPT ensembles

Reallocate
– fully dynamic and adaptive memory management: efficient use of resources, large systems on a single processor
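To illustrate the 3D-grid based O(n) neighbor generation stage, here is a minimal cell-list sketch. The box handling is simplified (orthogonal, non-periodic), and all names and parameters are illustrative rather than SerialReax's actual data structures: atoms are binned into cells at least as wide as the cutoff, so each atom only tests atoms in its own and adjacent cells.

```c
/* Sketch of 3D-grid (cell list) neighbor generation: binning atoms into cells
 * of side >= the cutoff makes the total pair search O(n) instead of O(n^2). */
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x, y, z; } vec3;

int main(void) {
    const int n = 1000;
    const double box = 20.0, rcut = 3.0;
    const int nc = (int)(box / rcut);          /* cells per dimension */
    const double cell = box / nc;              /* cell side >= rcut    */

    vec3 *pos  = malloc(n * sizeof *pos);
    int  *head = malloc(nc * nc * nc * sizeof *head);  /* first atom in each cell */
    int  *next = malloc(n * sizeof *next);              /* linked list of atoms    */

    srand(42);                                 /* random positions for the demo */
    for (int i = 0; i < n; ++i) {
        pos[i].x = box * rand() / (double)RAND_MAX;
        pos[i].y = box * rand() / (double)RAND_MAX;
        pos[i].z = box * rand() / (double)RAND_MAX;
    }

    for (int c = 0; c < nc * nc * nc; ++c) head[c] = -1;
    for (int i = 0; i < n; ++i) {              /* bin atoms into cells */
        int cx = (int)(pos[i].x / cell), cy = (int)(pos[i].y / cell), cz = (int)(pos[i].z / cell);
        if (cx == nc) cx--;
        if (cy == nc) cy--;
        if (cz == nc) cz--;
        int c = (cx * nc + cy) * nc + cz;
        next[i] = head[c];
        head[c] = i;
    }

    long pairs = 0;                            /* count neighbor pairs within rcut */
    for (int i = 0; i < n; ++i) {
        int cx = (int)(pos[i].x / cell), cy = (int)(pos[i].y / cell), cz = (int)(pos[i].z / cell);
        if (cx == nc) cx--;
        if (cy == nc) cy--;
        if (cz == nc) cz--;
        for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dz = -1; dz <= 1; ++dz) {
            int ex = cx + dx, ey = cy + dy, ez = cz + dz;
            if (ex < 0 || ex >= nc || ey < 0 || ey >= nc || ez < 0 || ez >= nc) continue;
            for (int j = head[(ex * nc + ey) * nc + ez]; j != -1; j = next[j]) {
                if (j <= i) continue;          /* count each pair once */
                double ddx = pos[i].x - pos[j].x, ddy = pos[i].y - pos[j].y, ddz = pos[i].z - pos[j].z;
                if (ddx * ddx + ddy * ddy + ddz * ddz <= rcut * rcut) pairs++;
            }
        }
    }
    printf("neighbor pairs within %.1f: %ld\n", rcut, pairs);
    free(pos); free(head); free(next);
    return 0;
}
```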

Basic Solvers for QEq

Sample systems
– bulk water: 6,540 atoms (liquid)
– lipid bilayer system: 56,800 atoms (biological system)
– PETN crystal: 48,256 atoms (solid)

Solvers: CG and GMRES
– H has a heavy diagonal: diagonal preconditioning
– slowly evolving environment: extrapolation from previous solutions (a PCG sketch follows this slide)

Poor performance: even at a tolerance level that is fairly satisfactory, and much worse at tighter tolerance levels; cache effects are more pronounced here.

[Table: for each system/solver combination, the number of iterations (= number of matrix-vector multiplications), the actual running time in seconds, and the fraction of total computation time.]
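A minimal sketch of the diagonally preconditioned CG solve with the initial guess linearly extrapolated from the two previous steps' solutions, as described above. A tiny dense symmetric positive-definite matrix stands in for the large sparse QEq matrix H, and all values are illustrative.

```c
/* Sketch of a QEq-style solve: Jacobi (diagonal) preconditioned CG on an SPD
 * system, with the initial guess extrapolated from previous steps' solutions.
 * A small dense matrix stands in for the large sparse QEq matrix H. */
#include <math.h>
#include <stdio.h>
#include <string.h>

#define N 4

static void matvec(double A[N][N], const double *x, double *y) {
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j) y[i] += A[i][j] * x[j];
    }
}

static double dot(const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

/* Diagonally preconditioned CG; x holds the initial guess on entry. */
static int pcg(double A[N][N], const double *b, double *x, double tol) {
    double r[N], z[N], p[N], Ap[N];
    matvec(A, x, Ap);
    for (int i = 0; i < N; ++i) r[i] = b[i] - Ap[i];
    for (int i = 0; i < N; ++i) z[i] = r[i] / A[i][i];   /* M = diag(A) */
    memcpy(p, z, sizeof p);
    double rz = dot(r, z);
    for (int it = 0; it < 200; ++it) {
        if (sqrt(dot(r, r)) < tol) return it;
        matvec(A, p, Ap);
        double alpha = rz / dot(p, Ap);
        for (int i = 0; i < N; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (int i = 0; i < N; ++i) z[i] = r[i] / A[i][i];
        double rz_new = dot(r, z);
        for (int i = 0; i < N; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    return -1;  /* not converged */
}

int main(void) {
    /* hypothetical SPD "charge matrix" with a heavy diagonal */
    double H[N][N] = {{10, 1, 0, 1}, {1, 12, 2, 0}, {0, 2, 11, 1}, {1, 0, 1, 9}};
    double b[N] = {-1.0, -0.8, -1.2, -0.9};
    double x_prev[N] = {0}, x_prev2[N] = {0}, x[N];
    for (int i = 0; i < N; ++i)                 /* linear extrapolation of the guess */
        x[i] = 2.0 * x_prev[i] - x_prev2[i];
    int iters = pcg(H, b, x, 1e-10);
    printf("converged in %d iterations, x[0] = %.6f\n", iters, x[0]);
    return 0;
}
```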

ILU-based preconditioning

ILU-based preconditioners: no fill-in, drop tolerance
– effective (considering only the solve time)
– no fill-in + threshold: nice scaling with system size
– the ILU factorization itself is expensive

Test systems: bulk water and the bilayer system; cache effects are still evident.

[Table: for each system/solver combination (e.g., bulk water with GMRES+ILU vs. GMRES+diagonal), the time to compute the preconditioner, the solve time (s), and the total time (s).]

ILU-based preconditioning

Observation: the ILU factorization cost can be amortized
– a slowly changing simulation environment allows re-usable preconditioners
– PETN crystal (solid): re-usable for 1000s of steps!
– bulk water (liquid): re-usable for far fewer steps

Memory Management

Compact data structures

Dynamic and adaptive lists (the allocation policy is sketched after this slide)
– initially: allocate after estimation
– at every step: monitor & re-allocate only if necessary

Low memory footprint, linear scaling with system size

Data layouts:
– neighbor list, QEq matrix, 3-body interactions: CSR format, with atom n's data immediately following atom n-1's
– bond list, hydrogen-bond list: modified CSR format, with space reserved after each atom's data (for atom n-1, for atom n, ...)
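The estimate-then-monitor allocation policy above can be sketched roughly as follows; the structure layout, growth factors, and function names are hypothetical, not PuReMD's actual code.

```c
/* Sketch of the estimate-then-monitor reallocation pattern: a contiguous
 * interaction list is sized from an initial estimate plus a safety margin,
 * monitored every step, and regrown only when usage approaches capacity. */
#include <stdlib.h>

typedef struct {
    int *entries;      /* flat storage for all atoms' interactions  */
    int *start, *end;  /* per-atom [start, end) ranges into entries */
    int  capacity;     /* total slots currently allocated           */
} int_list_t;

/* Allocate from an estimated count with a safety margin. */
static int list_init(int_list_t *l, int n_atoms, int estimated_total) {
    l->capacity = (int)(estimated_total * 1.2) + n_atoms;   /* ~20% headroom */
    l->entries  = malloc(l->capacity * sizeof *l->entries);
    l->start    = malloc(n_atoms * sizeof *l->start);
    l->end      = malloc(n_atoms * sizeof *l->end);
    return l->entries && l->start && l->end;
}

/* Called after each step: regrow only if usage approached capacity.
 * Returns 0 only if a needed reallocation failed. */
static int list_monitor(int_list_t *l, int used_total) {
    if (used_total < (int)(0.9 * l->capacity)) return 1;     /* enough headroom */
    int new_cap = (int)(used_total * 1.2);
    int *grown  = realloc(l->entries, new_cap * sizeof *grown);
    if (!grown) return 0;
    l->entries  = grown;
    l->capacity = new_cap;
    return 1;
}

int main(void) {
    int_list_t bonds;
    if (!list_init(&bonds, 1000, 4000)) return 1;
    /* ... fill the list during a step, tracking the total slots used ... */
    if (!list_monitor(&bonds, 4700)) return 1;   /* regrow if near capacity */
    free(bonds.entries); free(bonds.start); free(bonds.end);
    return 0;
}
```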

Validation

Hexane (C6H14) structure comparison: excellent agreement!

Comparison to LAMMPS-Reax

– time per time-step comparison
– QEq solver performance (different QEq formulations give similar results; LAMMPS uses CG with no preconditioner)
– memory footprint

Parallel Realization: PuReMD

Built on the SerialReax platform
– excellent per-timestep running time
– linear-scaling memory footprint

Extends its capabilities to large systems and longer time-scales
– scalable algorithms and techniques
– demonstrated scaling to over 3K cores

Related publication:
Parallel Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama
Parallel Computing (to appear)

Parallelization: Outer-Shell

[Figure: the four outer-shell choices for the spatial decomposition — full shell, half shell, midpoint shell, and tower-plate shell — each importing a boundary region of thickness r (r/2 for the midpoint shell) around a sub-domain of side b.]

Parallelization: Outer-Shell

[Figure repeated: full shell, half shell, midpoint shell, and tower-plate shell.]

Choice: the full shell, because of dynamic bonding, despite the communication overhead.

Parallelization: Boundary Interactions

r_shell = max(3 × r_bond_cut, r_hbond_cut, r_nonb_cut)

Parallelization: Messaging

Parallelization: Messaging Performance

Performance comparison: PuReMD with direct vs. staged messaging (the staged pattern is sketched below).
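The comparison above contrasts sending boundary data directly to every neighboring process with staging the exchange one dimension at a time. The sketch below shows a staged, dimension-by-dimension exchange on an MPI Cartesian grid, with a single double standing in for the real boundary buffers; in a real staged scheme, data received in earlier stages is forwarded in later ones (indicated here only by a comment), so corner and edge data still arrives with 6 messages per process instead of 26.

```c
/* Sketch of staged (dimension-by-dimension) boundary exchange on an MPI
 * Cartesian process grid.  Buffer contents are purely illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1}, nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);          /* factor nprocs into a 3D grid */

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);
    MPI_Comm_rank(cart, &rank);

    double send = (double)rank, recv_lo, recv_hi;
    for (int d = 0; d < 3; ++d) {              /* stage d: exchange along dimension d */
        int lo, hi;
        MPI_Cart_shift(cart, d, 1, &lo, &hi);
        /* send "up" / receive from "down", then the reverse */
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, hi, 0,
                     &recv_lo, 1, MPI_DOUBLE, lo, 0, cart, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, lo, 1,
                     &recv_hi, 1, MPI_DOUBLE, hi, 1, cart, MPI_STATUS_IGNORE);
        /* in a real staged exchange, data received here would be appended to
           the outgoing buffers of the next stage */
        send += recv_lo + recv_hi;
    }
    printf("rank %d finished staged exchange\n", rank);
    MPI_Finalize();
    return 0;
}
```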

PuReMD: Performance and Scalability

– weak scaling test
– strong scaling test
– comparison to LAMMPS-REAX

Platform: Hera cluster at LLNL
– 4 AMD Opterons per node
– 800 batch nodes
– 127 TFLOP/s
– 32 GB memory per node
– InfiniBand interconnect
– ranked 42nd on the TOP500 list as of Nov 2009

PuReMD: Weak Scaling

Bulk water: 6,540 atoms in a 40×40×40 Å³ box per core.

PuReMD: Strong Scaling

Bulk water: atoms in an 80×80×80 Å³ box.

Applications: Si/Ge/Si nanoscale bars

Motivation
– Si/Ge/Si nanobars: ideal for MOSFETs
– as produced: biaxial strain; desirable: uniaxial
– design & production: understanding the strain behavior is important

[Figure: a Si/Ge/Si bar of width W and height H, with periodic boundary conditions along the [010] (longitudinal) direction, [100] transverse, and [001] vertical.]

Related publication:
Strain relaxation in Si/Ge/Si nanoscale bars from molecular dynamics simulations
Y. Park, H. M. Aktulga, A. Y. Grama, A. Strachan
Journal of Applied Physics 106, 1 (2009)

Applications: Si/Ge/Si nanoscale bars

Key result: when the Ge section is roughly square shaped, it is under almost uniaxial strain!

[Figure: average transverse Ge strain vs. bar width W (nm), and the average strains for Si & Ge in each dimension, together with a simple strain model derived from the MD results.]

Applications: Water-Silica Interface

Motivation
– a-SiO2: widely used in nano-electronic devices
– also used in devices for in-vivo screening
– understanding its interaction with water is critical for the reliability of these devices

Related publication:
A Reactive Simulation of the Silica-Water Interface
J. C. Fogarty, H. M. Aktulga, A. van Duin, A. Y. Grama, S. A. Pandit
Journal of Chemical Physics 132, (2010)

Applications: Water-Silica Interface

[Figure: snapshot of the water-silica interface; labeled species include Si, O, OH, and H.]

Key result: silica surface hydroxylation, as evidenced by experiments, is observed.

Proposed reaction: H2O + 2Si + O → 2SiOH

Applications: Oxidative Damage in Lipid Bilayers

Motivation
– modeling reactive processes in biological systems
– ROS → oxidative stress → cancer & aging

System preparation
– 200 POPC lipids + 10,000 water molecules, and the same system with 1% H2O2 mixed in
– mass spectrograph: a lipid molecule weighs 760 u

Key result: oxidative damage observed
– in pure water: 40% damaged
– in 1% peroxide: 75% damaged

Software Releases

PuReMD is currently in limited release to about 10 collaborating groups, including Goddard (Caltech), Buehler (MIT), Strachan (Purdue), Pandit (USF), van Duin (PSU), and others.

Integrated as a fix into LAMMPS; currently in use by over 25 research groups.

Ongoing Efforts

Improvements to PuReMD / LAMMPS
– force field optimization
– performance improvements
– scalability enhancements

Future Directions

Extending the scaling envelope significantly requires radically redesigned algorithms that account for:
– significantly higher communication/synchronization costs
– fault tolerance

Future Directions

Fault-Oblivious Algorithms
– critical technology for platforms beyond petascale
– the MTBF of components necessitates fault tolerance
– hardware fault tolerance can be expensive
– software fault tolerance through checkpointing is infeasible because of I/O bandwidth
– application checkpointing is also constrained by I/O bandwidth

Future Directions

A radically different class of Fault-Oblivious Algorithms:
– concept: partition and specify a problem in n parts in such a way that if any m (< n) of these partitions run to completion, the complete solution can be recovered
– the compute counterpart of erasure-coded storage (see the sketch below)
– an immensely useful concept at scale, but with profound challenges
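To make the "any m of n parts" idea concrete, the sketch below applies the storage-style erasure coding that the slide draws an analogy to: k values are encoded into n > k shares as evaluations of a polynomial, and any k surviving shares reconstruct the data by solving a small Vandermonde system. This illustrates the analogy only; it is not itself a fault-oblivious solver.

```c
/* Conceptual sketch of the erasure-coding analogy: k data values are encoded
 * into n shares (evaluations of a degree-(k-1) polynomial), and any k
 * surviving shares suffice to reconstruct the data. */
#include <math.h>
#include <stdio.h>

#define K 3
#define NSHARES 5

/* Solve the K x K system A y = b by Gaussian elimination with partial
 * pivoting (sufficient for this tiny example). */
static void solve(double A[K][K], double b[K], double y[K]) {
    for (int col = 0; col < K; ++col) {
        int piv = col;
        for (int r = col + 1; r < K; ++r)
            if (fabs(A[r][col]) > fabs(A[piv][col])) piv = r;
        for (int c = 0; c < K; ++c) { double t = A[col][c]; A[col][c] = A[piv][c]; A[piv][c] = t; }
        { double t = b[col]; b[col] = b[piv]; b[piv] = t; }
        for (int r = col + 1; r < K; ++r) {
            double f = A[r][col] / A[col][col];
            for (int c = col; c < K; ++c) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    for (int r = K - 1; r >= 0; --r) {
        y[r] = b[r];
        for (int c = r + 1; c < K; ++c) y[r] -= A[r][c] * y[c];
        y[r] /= A[r][r];
    }
}

int main(void) {
    double data[K] = {3.0, -1.5, 2.0};              /* the "problem" to protect    */
    double shares[NSHARES];                          /* n redundant partitions      */
    for (int j = 0; j < NSHARES; ++j) {              /* share j = p(x_j), x_j = j+1 */
        double x = j + 1, pw = 1.0;
        shares[j] = 0.0;
        for (int i = 0; i < K; ++i) { shares[j] += data[i] * pw; pw *= x; }
    }

    /* Pretend shares 1 and 3 were lost; recover from shares {0, 2, 4}. */
    int survivors[K] = {0, 2, 4};
    double V[K][K], rhs[K], recovered[K];
    for (int r = 0; r < K; ++r) {
        double x = survivors[r] + 1, pw = 1.0;
        for (int c = 0; c < K; ++c) { V[r][c] = pw; pw *= x; }
        rhs[r] = shares[survivors[r]];
    }
    solve(V, rhs, recovered);
    printf("recovered: %g %g %g\n", recovered[0], recovered[1], recovered[2]);
    return 0;
}
```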

Future Directions

Fault-Oblivious Algorithms:
– Can we derive fault-oblivious algorithms that are efficient for a sufficiently large class of problems?
  Initial proof of concept for linear system solvers and eigenvalue problems.
– How are these algorithms specified?
  Programming languages / compilation techniques.
– What are suitable runtime systems?
  Conventional threads / messaging APIs do not work!