Long-Time Molecular Dynamics Simulations through Parallelization of the Time Domain
Ashok Srinivasan, Florida State University

Presentation transcript:

Long-Time Molecular Dynamics Simulations through Parallelization of the Time Domain
Ashok Srinivasan, Florida State University
Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: N. Chandra, L. Ji, H. Nymeyer, and Y. Yu

Outline
Background
– Limitations of Conventional Parallelization
Time Parallelization
– Other Time Parallelization Approaches
– Data-Driven Time Parallelization
Application to Nano-Mechanics
Application to AFM Simulation of Proteins
Conclusions and Future Work
– Scaled to orders of magnitude more processors than conventional parallelization

Background
Molecular dynamics
– In each time step, the forces of atoms on each other are modeled using some potential
– After the forces are computed, positions are updated
– Repeat for the desired number of time steps
Time step size ~ 10^-15 seconds (1 fs), due to physical and numerical considerations
– The desired time range is much larger
– A million time steps are required just to reach 10^-9 s
– ~ 500 hours of computing for ~ 40K atoms using GROMACS
MD uses unrealistically large pulling speeds
– 1 to 10 m/s, instead of experimental speeds of 10^-5 m/s or below
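To make the per-step structure concrete, here is a minimal sketch of one force computation and position/velocity update, assuming a toy Lennard-Jones system and the velocity Verlet integrator (an illustration only, not the GROMACS code used in the talk):

```python
# Minimal sketch (not the GROMACS implementation) of one MD time step: compute forces
# from a pair potential, then update positions and velocities with velocity Verlet.
# Units, parameters, and the all-pairs force loop are illustrative only.
import numpy as np

def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces (O(N^2); real codes use neighbor lists)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = pos[i] - pos[j]
            r2 = np.dot(r, r)
            inv_r6 = (sigma**2 / r2) ** 3
            # force from V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6), directed along r
            f = 24.0 * epsilon * (2.0 * inv_r6**2 - inv_r6) / r2 * r
            forces[i] += f
            forces[j] -= f
    return forces

def velocity_verlet_step(pos, vel, forces, mass, dt):
    """Advance positions and velocities by one time step (dt ~ 1 fs in real MD)."""
    vel_half = vel + 0.5 * dt * forces / mass
    pos_new = pos + dt * vel_half
    forces_new = lj_forces(pos_new)
    vel_new = vel_half + 0.5 * dt * forces_new / mass
    return pos_new, vel_new, forces_new

# toy system: 8 particles on a small cubic lattice
pos = np.array([[i, j, k] for i in range(2) for j in range(2) for k in range(2)],
               dtype=float) * 1.2
vel = np.zeros((8, 3))
forces = lj_forces(pos)
for _ in range(10):                      # repeat for the desired number of time steps
    pos, vel, forces = velocity_verlet_step(pos, vel, forces, mass=1.0, dt=1e-3)
print("kinetic energy after 10 steps:", 0.5 * np.sum(vel**2))
```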

Limitations of Conventional Parallelization
Conventional parallelization decomposes the state space across processors
– It is effective for a large state space
– It is not effective when the computational effort arises from a large number of time steps, or when the granularity becomes very fine due to a large number of processors

Limitations of Conventional Parallelization
Results on scalable codes
– They do not scale efficiently beyond ~ 10 ms/iteration
If we want to simulate to a ms
– Time step of 1 fs ⇒ 10^12 iterations ⇒ ~ 10^10 s ≈ 300 years at 10 ms/iteration
– Even if we scaled to 10 μs per iteration: ~ 4 months of computing time
Results shown for: NAMD, 327K-atom ATPase with PME, Blue Gene (IPDPS 2006); NAMD, 92K-atom ApoA1 with PME, Blue Gene (IPDPS 2006); IBM Blue Matter, 43K-atom Rhodopsin, Blue Gene (Tech Report 2005); Desmond, 92K-atom ApoA1 (SC 2006)
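Spelled out, the arithmetic behind these estimates (a restatement of the slide's numbers, not new data) is:

```latex
\[
\frac{10^{-3}\,\mathrm{s\ of\ simulated\ time}}{10^{-15}\,\mathrm{s\ per\ step}} = 10^{12}\ \mathrm{steps},
\qquad
10^{12} \times 10\,\mathrm{ms} = 10^{10}\,\mathrm{s} \approx 300\ \mathrm{years},
\qquad
10^{12} \times 10\,\mu\mathrm{s} = 10^{7}\,\mathrm{s} \approx 4\ \mathrm{months}.
\]
```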

Time Parallelization
Other time parallelization approaches
– Dynamic Iterations / Waveform Relaxation: slow convergence
– Parareal Method: related to shooting methods; not shown effective in realistic settings
Data-Driven Time-Parallelization
– Use the availability of prior data
– Determine a relationship between current simulations and prior ones to parallelize the time domain

Other Time Parallelization Approaches
Special case: Picard iterations
– Example: dy/dt = y, y(0) = 1 becomes dy_{n+1}/dt = y_n(t), with y_0(t) = 1
In general
– dy/dt = f(y, t), y(0) = y_0 becomes dy_{n+1}/dt = g(y_n, y_{n+1}, t), with y_0(t) = y_0 and g(u, u, t) = f(u, t)
– g(y_n, y_{n+1}, t) = f(y_n, t): Picard
– g(y_n, y_{n+1}, t) = f(y_{n+1}, t): converges in 1 iteration
– Jacobi, Gauss-Seidel, and SOR versions of g are defined
Many improvements
– Example: DIRM combines the above with reduced order modeling
[Figure: waveform relaxation variants; exact solution and iterates N = 1, 2, 3, 4]
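A self-contained sketch of the Picard special case above, whose slow convergence is the drawback noted on the previous slide (the grid size and number of sweeps are arbitrary choices):

```python
# Minimal sketch of Picard (waveform relaxation) iterations for dy/dt = y, y(0) = 1,
# i.e. y_{n+1}(t) = 1 + integral_0^t y_n(s) ds, using a trapezoidal quadrature.
import numpy as np

t = np.linspace(0.0, 2.0, 201)          # time grid on [0, 2]
dt = t[1] - t[0]
y = np.ones_like(t)                      # initial guess y_0(t) = 1

for n in range(6):                       # a few waveform relaxation sweeps
    f = y                                # f(y, t) = y for this example
    # cumulative trapezoidal integral of f from 0 to t
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)))
    y = 1.0 + integral                   # y_{n+1}(t) = y(0) + integral of f(y_n, s)
    print(f"iteration {n+1}: max error vs exp(t) = {np.max(np.abs(y - np.exp(t))):.3e}")
```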

Parareal approach
Based on an “approximate-verify-correct” sequence
– An example of shooting methods for time-parallelization
– Not shown to be effective in realistic situations
[Figure: initial prediction, initial computed result, correction, second prediction]
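For reference, a toy sketch of the parareal correction sequence on a scalar ODE (an illustration of the general method under assumed settings, not the setup studied in the talk):

```python
# Sketch of parareal for dy/dt = -y, y(0) = 1, on [0, T] with N coarse intervals.
# G is a cheap coarse propagator (one Euler step); F is the expensive fine propagator
# (many small Euler steps). Step counts are arbitrary choices.
import numpy as np

def f(y):
    return -y

def G(y, dt):                      # coarse propagator: single forward Euler step
    return y + dt * f(y)

def F(y, dt, substeps=100):        # fine propagator: many small Euler steps
    h = dt / substeps
    for _ in range(substeps):
        y = y + h * f(y)
    return y

T, N = 2.0, 10
dt = T / N
U = np.zeros(N + 1)
U[0] = 1.0
for n in range(N):                 # serial coarse sweep: the initial prediction
    U[n + 1] = G(U[n], dt)

for k in range(4):                 # correction iterations (F calls run in parallel in practice)
    F_old = np.array([F(U[n], dt) for n in range(N)])   # fine solves from current starts
    G_old = np.array([G(U[n], dt) for n in range(N)])
    for n in range(N):             # sequential correction sweep:
        U[n + 1] = G(U[n], dt) + F_old[n] - G_old[n]    # U_{n+1} = G(new) + F(old) - G(old)
    print(f"iteration {k+1}: error at T = {abs(U[-1] - np.exp(-T)):.2e}")
```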

Data-Driven Time Parallelization
Each processor simulates a different time interval
The initial state is obtained by prediction, using prior data (except for processor 0)
Verify whether the prediction for the end state is close to that computed by MD
Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results
If the time interval is sufficiently large, then the communication overhead is small
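In outline, the control flow of this scheme looks like the following sketch, where the MD propagator, the database-based predictor, and the equivalence test are hypothetical stand-in stubs; only the accept-up-to-first-mismatch logic mirrors the slide:

```python
def run_md_interval(state, steps):           # stand-in for the fine MD simulation
    return state + 0.01 * steps              # placeholder dynamics

def predict_start_state(interval, database): # stand-in for the prior-data predictor
    return database.get(interval, 0.0)

def equivalent(a, b, tol=0.05):              # stand-in for the fluctuation-based test
    return abs(a - b) <= tol

def time_parallel_step(state0, n_proc, steps_per_interval, database):
    # Each processor simulates one interval, starting from a predicted state
    starts = [state0] + [predict_start_state(i, database) for i in range(1, n_proc)]
    ends = [run_md_interval(s, steps_per_interval) for s in starts]  # parallel in practice
    # Accept intervals until a predicted start disagrees with the computed previous end
    accepted = n_proc
    for i in range(1, n_proc):
        if not equivalent(starts[i], ends[i - 1]):
            accepted = i
            break
    return ends[accepted - 1], accepted       # restart from the last verified state

database = {i: 0.01 * 1000 * i for i in range(1, 8)}   # toy "prior results"
state, accepted = time_parallel_step(0.0, 8, 1000, database)
print(f"accepted {accepted} intervals; new verified state = {state}")
```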

Problems with multiple time-scales
Fine-scale computations (such as MD) are more accurate, but more time consuming
– Much of the detail at the finer scale is unimportant, but some of it matters
– Use the coarse-scale response of a similar prior simulation to predict the future states of the current one
[Figure: a simple schematic of multiple time scales]

Verification of prediction
Definition of equivalence of two states
– Atoms vibrate around their mean positions
– Consider states equivalent if differences are within the normal range of fluctuations
[Figure: mean position and displacement from the mean; differences between trajectories that differ only in the random number sequence, in the AFM simulation of Titin]

Application to Nano-Mechanics
Carbon Nanotube Tensile Test (1000-atom CNT, 300 K)
– Pull the CNT
– Determine the stress-strain response and yield strain (when the CNT starts breaking) using MD
[Figure: stress-strain response; blue: exact, 450 K; red: 200 processors]
Experiments
1. CNT identical to prior results, but different strain-rate
2. CNT identical to prior results, but different strain-rate and temperature
3. CNT differs in size from the prior result, and simulated with a different strain-rate

Dimensionality Reduction
The movement of atoms in a 1000-atom CNT can be considered the motion of a point in 3000-dimensional space
Find a lower-dimensional subspace close to which the points lie
We use proper orthogonal decomposition (POD)
– Find a low-dimensional affine subspace
– Motion may, however, be complex in this subspace
Use results for different strain rates
– Velocity = 10 m/s, 5 m/s, and 1 m/s
– At five different time points
[U, S, V] = svd(Shifted Data)
– Shifted Data = U*S*V^T
– States of the CNT are expressed as μ + c_1 u_1 + c_2 u_2
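A compact NumPy sketch of this construction (illustrative only; the snapshot data here is a random placeholder, and the database layout is an assumption):

```python
# Snapshots of atom coordinates are stacked as columns, the mean is subtracted, and the
# SVD supplies dominant modes u_1, u_2 so each state is approximated as mu + c1*u1 + c2*u2.
import numpy as np

n_dof, n_snapshots = 3000, 15             # 1000 atoms x 3 coordinates; 3 velocities x 5 times
snapshots = np.random.rand(n_dof, n_snapshots)   # placeholder for stored CNT states

mu = snapshots.mean(axis=1, keepdims=True)       # mean state (the affine shift)
shifted = snapshots - mu                         # "Shifted Data" on the slide
U, S, Vt = np.linalg.svd(shifted, full_matrices=False)   # Shifted Data = U * S * V^T

u1, u2 = U[:, 0], U[:, 1]                        # dominant spatial modes
c = U[:, :2].T @ shifted                         # c[0, j], c[1, j]: coefficients of snapshot j
reconstruction = mu + U[:, :2] @ c               # approx. mu + c1*u1 + c2*u2 per snapshot
print("captured energy fraction:", (S[:2] ** 2).sum() / (S ** 2).sum())
```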

Basis Vectors from POD
CNT of length ≈ 100 Å with 1000 atoms at 300 K
[Figure: u_1 (blue) and u_2 (red) for z; u_1 (green) for x is not “significant”; blue: z; green, red: x, y]

Prediction when v is the only parameter
Static predictor
– Independently predict the change in each coordinate
– Use precomputed results for 40 different time points each, for three different velocities
– To predict for a (t; v) not in the database: determine coefficients for nearby v at nearby strains, fit a linear surface, and interpolate/extrapolate to get coefficients c_1 and c_2 for (t; v)
– Get the state as μ + c_1 u_1 + c_2 u_2
Dynamic prediction
– Correct the above coefficients by determining the error between the previously predicted and computed states
[Figure: green: 10 m/s, red: 5 m/s, blue: 1 m/s, magenta: 0.1 m/s, black: 0.1 m/s through direct prediction]
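A sketch of the interpolation/extrapolation step in the static predictor, assuming a hypothetical database of coefficients at nearby (t, v) points (all numbers below are placeholders):

```python
# Fit a linear surface c1(t, v) = p0 + p1*t + p2*v by least squares over nearby database
# entries, then evaluate it at the target (t, v), e.g. extrapolating to a lower velocity.
import numpy as np

# nearby database entries: (time point, pulling velocity, coefficient c1)
t_db = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
v_db = np.array([1.0, 5.0, 1.0, 5.0, 1.0, 5.0])
c1_db = np.array([0.11, 0.14, 0.23, 0.27, 0.36, 0.40])

A = np.column_stack([np.ones_like(t_db), t_db, v_db])    # design matrix for the plane
p, *_ = np.linalg.lstsq(A, c1_db, rcond=None)            # least-squares plane fit

t_target, v_target = 2.5, 0.1                            # target point outside the database
c1_pred = p[0] + p[1] * t_target + p[2] * v_target
print(f"predicted c1 at (t={t_target}, v={v_target} m/s): {c1_pred:.3f}")
```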

Verification: Error Thresholds
Consider states equivalent if the differences in position, potential energy, and temperature are within the normal range of fluctuations
– Max displacement: 0.2 Å
– Mean displacement: 0.08 Å
– Potential energy fluctuation: 0.35%
– Temperature fluctuation: 12.5 K
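A direct translation of these thresholds into a check (the function name, array layout, and the use of a relative potential-energy difference are assumptions):

```python
import numpy as np

def states_equivalent(pos_pred, pos_md, epot_pred, epot_md, temp_pred, temp_md):
    """True if predicted and MD-computed states differ by no more than normal fluctuations.
    Positions in Angstroms, temperatures in Kelvin."""
    disp = np.linalg.norm(pos_pred - pos_md, axis=1)      # per-atom displacement
    return (disp.max() <= 0.2 and
            disp.mean() <= 0.08 and
            abs(epot_pred - epot_md) / abs(epot_md) <= 0.0035 and
            abs(temp_pred - temp_md) <= 12.5)

# toy usage with random placeholder coordinates for 1000 atoms
rng = np.random.default_rng(0)
pos = rng.random((1000, 3)) * 100.0
print(states_equivalent(pos, pos + 0.01, -1.0e4, -1.001e4, 300.0, 305.0))
```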

Stress-strain response at 0.1 m/s
[Figure: blue: exact result; green: direct prediction with interpolation/extrapolation (points close to yield involve extrapolation in velocity and strain); red: time-parallel results]

Speedup
CNT with 1000 atoms, on a Xeon/Myrinet cluster
[Figure: red line: ideal speedup; blue: v = 0.1 m/s; green: a different predictor, v = 1 m/s, using v = 10 m/s]

Temperature and velocity vary
Use 1000-atom CNT results
– Temperatures: 300 K, 600 K, 900 K, 1200 K
– Velocities: 1 m/s, 5 m/s, 10 m/s
Dynamically choose the closest prior simulation for prediction
[Figures: speedup at 450 K, 2 m/s (dotted: linear); stress-strain; blue: exact, 450 K; red: 200 processors]

CNTs of varying sizes
Use a 1000-atom CNT, 10 m/s, 300 K result
– Parallelize 1200-, 1600-, and 2000-atom CNT runs
– Observe that the dominant mode is approximately a linear function of the initial z-coordinate
– Normalize coordinates to be in [0, 1]
– z_{t+Δt} = z_t + z'_{t+Δt} Δt; predict z'
[Figures: speedup for 1200-, 1600-, and 2000-atom CNTs (dotted: linear); stress-strain; blue: exact, 2000 atoms, 1 m/s; red: 200 processors]

Predict change in coordinates
Express x' in terms of basis functions
– Example: x'_{t+Δt} = a_{0,t+Δt} + a_{1,t+Δt} x_t
– a_{0,t+Δt} and a_{1,t+Δt} are unknown
– Express the changes, y, for the base (old) simulation similarly, in terms of coefficients b, and perform a least-squares fit
Predict a_{i,t+Δt} as b_{i,t+Δt} + R_{t+Δt}, where R_{t+Δt} = (1-α) R_t + α (a_{i,t} - b_{i,t})
– Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of previous differences
We use α = 0.5
– Gives more weight to the latest results
– Does not let random fluctuations affect the predictor too much
The velocity is estimated from the latest accurately known result
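A sketch of this corrector, assuming a simple least-squares fit per verified interval and synthetic placeholder data (an illustration, not the authors' code):

```python
# Fit x'_{t+dt} = a0 + a1 * x_t for the base and current runs, then predict the current
# coefficients as the base coefficients plus an exponentially smoothed correction R,
# with R_{t+dt} = (1 - alpha) * R_t + alpha * (a_t - b_t).
import numpy as np

def fit_coeffs(x_t, x_change):
    """Least-squares fit of x' = a0 + a1 * x_t over all coordinates."""
    A = np.column_stack([np.ones_like(x_t), x_t])
    coeffs, *_ = np.linalg.lstsq(A, x_change, rcond=None)
    return coeffs                                 # array [a0, a1]

alpha = 0.5                                       # smoothing weight from the slide
R = np.zeros(2)                                   # running correction for [a0, a1]

rng = np.random.default_rng(1)
x_t = rng.random(1000)                            # placeholder normalized coordinates

for step in range(5):                             # placeholder "verified" time intervals
    base_change = 0.020 * x_t + 0.001 + 0.0005 * rng.standard_normal(1000)
    curr_change = 0.025 * x_t + 0.001 + 0.0005 * rng.standard_normal(1000)
    b = fit_coeffs(x_t, base_change)              # coefficients of the base simulation
    a = fit_coeffs(x_t, curr_change)              # coefficients of the current simulation
    R = (1.0 - alpha) * R + alpha * (a - b)       # update the smoothed difference
    a_pred = b + R                                # prediction for the next interval
    print(f"interval {step}: predicted [a0, a1] = {np.round(a_pred, 4)}")
```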

Application to AFM Simulation of Proteins
Example system: Muscle protein - Titin
– Around 40K atoms, mostly water
– Na+ and Cl- added for charge neutrality
– NVT conditions, Langevin thermostat, 400 K
– Force constant on the springs: 400 kJ/(mol·nm²)
– GROMACS used for MD simulations

Prediction
Use prior results with higher velocity
– Trajectories with different random number sequences
– Predict based on the prior result closest to the current states
Two options: use only the last verified state, or use several recent verified states
Fit parameters to the log-Weibull distribution: f(x) = (1/b) e^{(a-x)/b - e^{(a-x)/b}}
[Figure: fitted distribution; Location: a = …, Scale: b = …]
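The log-Weibull distribution is the Gumbel distribution, and the density above matches scipy.stats.gumbel_r with loc = a and scale = b, so the fit can be sketched as follows (synthetic data standing in for the actual measurements):

```python
# Fit the log-Weibull (Gumbel) density (1/b) * exp((a-x)/b - exp((a-x)/b)) to sample data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = stats.gumbel_r.rvs(loc=1.5, scale=0.4, size=5000, random_state=rng)

a_fit, b_fit = stats.gumbel_r.fit(sample)     # maximum-likelihood estimates of (a, b)
print(f"location a = {a_fit:.3f}, scale b = {b_fit:.3f}")
```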

Speedup
[Figure 1: speedup; green: spatial parallelization on Xeon/Myrinet at NCSA; blue: spatial on Opteron/GigE; red: time parallelization, at NCSA]
[Figure 2: speedup with combined space (8-way) and time parallelization; green: conventional parallelization]
One time interval is 10K time steps, ≈ 5 hours of sequential time
The parallel overheads, excluding prediction errors, are relatively insignificant
The above results use the last verified state to choose the prior run
Using several verified states parallelized almost perfectly on 32 processors with time parallelization alone

Validation
[Figure: spatially parallel; time parallel; mean (spatial), time parallel; experimental data]

Typical Differences
[Figure, RMSD: solid: between exact and time-parallel runs; dashed: between conventional runs using different random number sequences]
[Figure, Force: dashed: time-parallel runs; solid: conventional runs]

Conclusions and Future Work
Conclusions
– Data-driven time parallelization promises substantial improvements in scalability, especially when combined with conventional parallelization
– More effective in hard-matter simulations: obtained a granularity of 13.5 μs per iteration in one simulation
– Promising for soft-matter simulations too
Future Work
– Better prediction
– Satisfy detailed balance