Grad OS Course Project – Kevin Kastner, Xueheng Hu


Performance and Scaling Effects of MD Simulations using NAMD 2.7 and 2.8
Grad OS Course Project: Kevin Kastner and Xueheng Hu

Introduction

Molecular Dynamics (MD)
- MD is extremely computationally intensive, primarily due to the sheer size of the system
- A large system simulation can potentially take thousands of years on a modern desktop
NAMD – parallelized simulation tool for MD; the most recent release is 2.8

Our course project investigates the performance attributes of molecular dynamics simulations. Molecular dynamics is a virtual simulation that depicts the movements of individual atoms and simple molecules in a given system. MD simulations are usually very computationally intensive, primarily due to the sheer size of the systems being simulated; a large system simulation could potentially take thousands of years to complete on a modern desktop. The simulation tool we are using is NAMD, whose most recent release is 2.8.

GPCR Simulation Example

Here are some videos of what a molecular dynamics simulation does. (Start left video) On the left is a G-protein coupled receptor, or GPCR, in a 10 ns simulation; that is, it shows the amount of movement the actual protein would undergo in only 10 ns of real time. (Start right video) On the right is the same protein as on the left, except this one also shows all of the surrounding water and lipid atoms that are being calculated in the MD simulation. Note also that the protein contains more atoms than are shown. Since all of these atoms are considered in every MD calculation, you can see why it is so computationally intensive.

Summary of Work Completed

Performance comparison: NAMD 2.7 vs 2.8
- Tested three different systems with each version, comparing their efficiency
- How the size/complexity of a system affects the performance of NAMD
NAMD scaling analysis
Force field comparison

Our work contains two main parts. First, we compared the performance of the newest version of NAMD, 2.8, with the previous version, 2.7, on three systems of differing size. Second, we explored how the size and complexity of a system affect the performance of NAMD.
- We performed a scaling analysis of both versions of NAMD, finding the optimal number of cores as well as the peak performance for systems of varying size.
- We also performed a force field comparison by running one of the systems in NAMD with AMBER in addition to running it with CHARMM.

Performance Metrics

Performance efficiency
Performance efficiency per core
Normalized performance efficiency per core (x: core set; base: 12)

Performance efficiency: the simulated time (the actual movement of the protein) divided by the corresponding wall-clock time used to complete the simulation of that movement. Performance efficiency per core: the efficiency of a specific core set divided by the efficiency of the base core set. Normalized performance efficiency per core: the performance ratio divided by the core-set ratio.
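Written out from the definitions in the notes above (a sketch; the symbol names are ours: E for efficiency, x for the core-set size, 12 for Kraken's base set):

```latex
E(x) = \frac{t_{\mathrm{simulated}}}{t_{\mathrm{wall}}(x)}, \qquad
E_{\mathrm{core}}(x) = \frac{E(x)}{E(12)}, \qquad
\hat{E}_{\mathrm{core}}(x) = \frac{E_{\mathrm{core}}(x)}{x / 12}
```

Under these definitions a perfectly scaling run would hold the normalized per-core value at 1 for every core set; the per-core charts below show how far communication overhead pulls the measured runs below that ideal.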

Simulation Systems

(a) Octopamine receptor, a GPCR – 56824 atoms
(b) DHFR-TS fusion protein – 82026 atoms
(c) Ubiquitin – 7051 atoms

These are the three systems that we ran.
a. The first is an octopamine receptor protein in a lipid membrane, solvated with water and ions, containing about 57000 atoms.
b. The next is a dihydrofolate reductase-thymidylate synthase fusion protein (DHFR-TS), solvated in water, containing about 80000 atoms.
c. The last is ubiquitin, solvated in water, containing about 7000 atoms.
The simulation systems shown here were tested on the Kraken high-performance computing cluster, which was discussed by Dr. Timothy Stitt in one of our guest lectures. We tested the simulation efficiency of the two NAMD versions with varying core counts, which I will refer to as core sets. We did 5 runs for each core set, for each version of NAMD, on all three systems, using the CHARMM force field.
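For concreteness, the sweep described above could be driven by a script along these lines. This is a hypothetical sketch, not the project's actual harness: the config file names, binary names, and batch details are assumptions for illustration (Kraken's PBS did use `-l size=` with 12-core increments, and NAMD jobs on Cray systems launch via `aprun`).

```python
# Hypothetical sketch of the benchmark sweep: 5 runs per core set, per NAMD
# version, per system, submitted to Kraken's PBS batch system. File names,
# binary names, and walltime are assumptions, not taken from the project.
import subprocess

SYSTEMS = ["octopamine_57k", "dhfr_ts_80k", "ubiquitin_7k"]  # hypothetical config names
VERSIONS = ["2.7", "2.8"]
CORE_SETS = [12, 96, 120, 192, 300, 396, 504, 1008]          # core sets cited in the results
RUNS = 5

for system in SYSTEMS:
    for version in VERSIONS:
        for cores in CORE_SETS:
            for run in range(1, RUNS + 1):
                tag = f"{system}_v{version}_c{cores}_r{run}"
                script = f"""#!/bin/bash
#PBS -N {tag}
#PBS -l size={cores},walltime=02:00:00
cd $PBS_O_WORKDIR
aprun -n {cores} namd2-{version} {system}.conf > {tag}.log
"""
                with open(f"{tag}.pbs", "w") as f:
                    f.write(script)
                subprocess.run(["qsub", f"{tag}.pbs"], check=True)  # submit to PBS
```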

Results - 57000 Atoms

(Chart: 57000 Atom Efficiency)

Here are the performance results for the octopamine receptor system containing about 57000 atoms. Shown are the average performance efficiencies over all 5 runs, with the corresponding standard deviations displayed as error bars. As can be seen, NAMD 2.7 does the same as or better than version 2.8 up to and including 300 cores, supporting our earlier results. Unexpectedly, however, for the 396-core and higher sets NAMD 2.8 did much better than NAMD 2.7. Furthermore, NAMD 2.7's efficiency begins to decline at approximately the same point at which NAMD 2.8's efficiency shows its most drastic increase (apart from the initial ramp-up, of course). We are as yet uncertain why this occurs. This graph also gives an approximate optimal number of cores for our 57000 atom system on each version of NAMD: around 300 for 2.7 and around 504 for 2.8.
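The post-processing behind this chart (mean over the 5 runs, standard deviation as error bars) might look like the following sketch; the arrays here are placeholders, not the project's measured data.

```python
# Sketch: average the 5 runs per core set and plot mean efficiency with
# standard-deviation error bars. Placeholder data, hypothetical shapes.
import numpy as np
import matplotlib.pyplot as plt

core_sets = np.array([12, 96, 120, 192, 300, 396, 504, 1008])

# runs[label] has shape (len(core_sets), 5): one efficiency value per run.
runs = {
    "NAMD 2.7": np.zeros((len(core_sets), 5)),  # placeholder data
    "NAMD 2.8": np.zeros((len(core_sets), 5)),  # placeholder data
}

for label, data in runs.items():
    mean = data.mean(axis=1)
    std = data.std(axis=1, ddof=1)  # sample standard deviation over 5 runs
    plt.errorbar(core_sets, mean, yerr=std, marker="o", capsize=3, label=label)

plt.xlabel("Cores")
plt.ylabel("Performance efficiency")
plt.title("57000 Atom Efficiency")
plt.legend()
plt.show()
```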

Results - 57000 Atoms

(Chart: 57000 Atom Efficiency per Core)

This chart shows the average estimated efficiency of each core in each core set, compared to our baseline of 12 cores. Note that 12 cores is used as the base instead of 1 because it is the smallest allocation Kraken allows. As expected, the efficiency of each core decreases as the number of cores increases, due to the growing amount of communication needed between the cores to complete the task. In agreement with the previous chart, NAMD 2.7 makes better use of each core up to and including 300 cores, while NAMD 2.8 outperforms 2.7 in the 396 through 1008 core sets.
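The per-core numbers in this chart follow directly from the raw efficiencies via the definitions on the Performance Metrics slide; a minimal sketch, with the 12-core base being Kraken's minimum allocation:

```python
# Per-core and normalized per-core efficiency relative to the 12-core base
# set, per the Performance Metrics slide. Inputs are placeholders.
def efficiency_per_core(eff, eff_base):
    """Efficiency of a core set relative to the base (12-core) set."""
    return eff / eff_base

def normalized_efficiency_per_core(eff, eff_base, cores, base=12):
    """Per-core ratio divided by the core-set ratio; 1.0 means perfect scaling."""
    return (eff / eff_base) / (cores / base)

# Example: a 300-core run that is 20x as efficient as the 12-core base
# achieves 20 / (300/12) = 0.8, i.e. 80% of perfect scaling.
print(normalized_efficiency_per_core(20.0, 1.0, 300))  # 0.8
```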

Results - 80000 Atoms

(Chart: 80000 Atom Efficiency)

Here are the performance results for the DHFR-TS system containing about 80000 atoms, the largest system in our test set. In agreement with our previous results, NAMD 2.7 outperforms at the lower core sets and NAMD 2.8 at the higher sets; in this case the dividing line appears to lie between 192 and 300 cores. Once again, at the higher core sets NAMD 2.8 outperforms 2.7 by a larger margin than NAMD 2.7 ever outperforms 2.8: as the chart shows, 2.7's advantage at lower core sets is barely noticeable, while 2.8's is readily apparent at higher core sets. Also worth noting is the large performance spike for NAMD 2.8 at the 1008 core set.

Results - 80000 Atoms

(Chart: 80000 Atom Efficiency per Core)

This chart shows the estimated percentage efficiency per core for the 80000 atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. This graph also demonstrates the efficiency reversal between the two NAMD versions, though here it appears at the 504 core set. This is because NAMD 2.8 actually outperforms 2.7 slightly at the base set, so even though the two versions have equivalent efficiencies for the 192 and 300 core sets, NAMD 2.8 appears less efficient per core there.

Results - 7000 Atoms

(Chart: 7000 Atom Efficiency)

Here are the performance results for the ubiquitin system containing about 7000 atoms, the smallest system in our test set. This system did not follow the trend of the other two: NAMD 2.8 did better than or equal to 2.7 for nearly every core set. Even more interesting, NAMD 2.8 outperforms 2.7 the most in the 96 through 192 core sets, rather than at the higher core sets as seen previously. Also note that tests with more than 2016 cores were performed, but their efficiencies were less than or equal to those shown at 2016 cores, so they were left out for better visibility of the lower-end core sets.

Results - 7000 Atoms

(Chart: 7000 Atom Efficiency per Core)

This chart shows the estimated percentage efficiency per core for the 7000 atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. The chart also demonstrates the large per-core efficiency advantage of NAMD 2.8 over NAMD 2.7 in the 96, 120, and 192 core sets.

Results – NAMD Scaling Analysis

(Charts: Optimal Number of Cores; Peak Performance)

We mentioned earlier the goal of finding an optimal number of cores for each version of NAMD. These graphs were generated by taking the peak performance values for each system and plotting system size against either the number of cores (left graph) or the efficiency (right graph). Note that even though the 7000 atom system had a higher efficiency at the 2016 core set for both versions of NAMD, the peaks found in the lower core sets were chosen because they have better per-core efficiency and would waste fewer of the allocated Kraken service units. For NAMD 2.7 an apparent optimal number of cores was found, at around 300. However, we were not able to find an optimal number of cores for NAMD 2.8, due to the sudden jump in optimal cores for our largest system. Peak performances were found and are shown: as expected, the smallest system performed best, and there is a general decrease as system size increases, though NAMD 2.8 shows a slight increase in performance from the 57000 atom system to the 80000 atom system. These results indicate that NAMD 2.8 was optimized for using larger core sets (left graph) on larger systems (right graph).
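The peak values behind these two charts could be extracted along these lines (a sketch; the notes' preference for lower-core peaks with better per-core efficiency appears as an explicit tie-break, a detail of our own choosing):

```python
# Pick the peak-performance point for one system/version: the core set with
# the highest mean efficiency, preferring fewer cores on (near-)ties so that
# Kraken service units are not wasted. Data shapes are placeholders.
def peak_performance(core_sets, mean_eff, tolerance=0.0):
    """Return (optimal_cores, peak_efficiency)."""
    best_eff = max(mean_eff)
    # Among core sets within `tolerance` of the best efficiency, take the
    # smallest core count (e.g. the 7000-atom case, where 2016 cores barely
    # beat a much smaller set).
    candidates = [(c, e) for c, e in zip(core_sets, mean_eff)
                  if e >= best_eff - tolerance]
    return min(candidates, key=lambda ce: ce[0])
```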

Results - Force Field Comparison

(Charts: NAMD 2.7 – 57000 atoms; NAMD 2.8 – 57000 atoms)

Force field comparisons were also done to see whether the differing parameterizations of the force fields had any influence on NAMD's efficiency. The AMBER force field was used on the 57000 atom system, and the performances for both versions of NAMD are displayed here. For both versions, performance was nearly identical across the two force fields, though there was a slight decrease in AMBER's performance at the 504 core set in NAMD 2.8. Despite this decrease, it appears that the choice of force field has little bearing on the efficiency of either version of NAMD.
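In NAMD the force-field switch amounts to a few input-file keywords; a minimal sketch of the two alternatives, assuming standard CHARMM- and AMBER-format input files (all file names here are hypothetical):

```
# CHARMM run (file names hypothetical):
structure        octopamine.psf
coordinates      octopamine.pdb
paraTypeCharmm   on
parameters       par_all27_prot_lipid.prm

# AMBER run (replaces the four lines above):
amber            on
parmfile         octopamine.prmtop
ambercoor        octopamine.inpcrd
```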

Summary of Results

Performance difference
- 57000 and 80000 atom systems: NAMD 2.8 was optimized for performance on larger core sets
- 7000 atom system: odd results, with two possible explanations: the performance optimization only works for larger simulation systems, or the performance of either version will start to increase again given enough cores, and the efficiencies may reverse once more
NAMD scaling analysis
- Optimal number of cores
- Peak performance
Force field comparison: CHARMM vs AMBER

So in summary, our project had three main parts. First, the performance difference (read from slide). Next, the NAMD scaling analysis:
- We found an apparent optimal number of cores for NAMD 2.7, but could not find one for NAMD 2.8 due to a sudden jump in the 80000 atom system.
- We found the peak performance trend for both versions of NAMD, and the surprising result that NAMD 2.8 may be optimized for larger systems.
Finally, a force field comparison of CHARMM vs AMBER was done for the 57000 atom system. Despite the differing parameters defined by each force field, the efficiencies of the two force fields were nearly identical.

Future Work

More test cases to obtain empirical data for performance boundaries
Deeper analysis of performance differences
- System calls
- Network communications (we need to find out which tools are available on Kraken)

Potential future work would include more test cases to better determine performance scaling and boundaries: more core sets for each of the systems already tested, as well as new systems. We could also find tools that run on Kraken and do a deeper analysis of what is going on in each core. Aspects that would be especially useful to study are the system calls made by the cores and the network communications going on between them. Other possible tests include a CPU vs GPU comparison, as well as testing systems of equal size but differing complexity. (Also look at statistical tools that examine MPI.)

Questions? (Start video) So with that, are there any questions?