1
Performance and Scaling Effects of MD Simulations using NAMD 2.7 and 2.8
Grad OS Course Project
Kevin Kastner, Xueheng Hu
2
Introduction: Molecular Dynamics (MD)
MD is extremely computationally intensive, primarily due to the sheer size of the simulated system. A large-system simulation can potentially take thousands of years on a modern desktop. NAMD is a parallelized simulation tool for MD; the most recent release is 2.8.
Our course project is mainly about investigating the performance characteristics of molecular dynamics simulations. Molecular dynamics is a virtual simulation that depicts the movements of individual atoms and simple molecules in a given system. MD simulations are usually very computationally intensive, primarily due to the sheer size of the systems being simulated. Large-system simulations could potentially take thousands of years to complete on a modern desktop. The simulation tool that we are using is NAMD, and the most recent release is 2.8.
3
GPCR Simulation Example
Here are some videos of what a molecular dynamics simulation does. (Start left video) On the left is a G-Protein Coupled Receptor, or GPCR, protein in a 10 ns simulation; that is, it shows the amount of movement the actual protein would make in only 10 ns of real time. (Start right video) On the right is the same protein as on the left, except this one also shows all of the surrounding water and lipid atoms that are also being calculated in the MD simulation. Note that there are additional atoms in the protein that are not shown. Since all of these atoms are considered in the MD calculations, you can see why the simulation is quite computationally intensive.
4
Summary of Work Completed
Performance Comparison: NAMD 2.7 vs 2.8. Tested three different systems using each version, comparing the efficiency of each, and examined how the size and complexity of the systems affect the performance of NAMD. NAMD Scaling Analysis. Force Field Comparison.
Our work contains two main parts. First, we compared the performance of the newest version of NAMD, 2.8, and the previous version, 2.7, on three systems of differing size. Second, we explored how the size and complexity of the systems affect the performance of NAMD. We performed a scaling analysis of both versions of NAMD, finding the optimum number of cores as well as the peak performance for systems of varying size. We also performed a force field comparison by running one of the systems in NAMD with the AMBER force field in addition to running it with CHARMM.
5
Performance Metrics
Performance Efficiency; Performance Efficiency per Core; Normalized Performance Efficiency per Core (x: core set; base: 12 cores).
Performance Efficiency: the time corresponding to the actual movement of the protein divided by the wall-clock time used to complete the simulation of that same amount of movement.
Performance Efficiency per Core: the efficiency of a specific core set divided by the efficiency of the base core set.
Normalized Performance Efficiency per Core: the performance ratio divided by the core-set ratio.
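Written out as formulas, this is a sketch of our reading of these definitions; the symbols E, t_sim, t_wall, and N are our own notation, with x a core set and the 12-core set as the base:

```latex
E(x) = \frac{t_{\mathrm{sim}}}{t_{\mathrm{wall}}(x)}, \qquad
E_{\mathrm{core}}(x) = \frac{E(x)}{E(\mathrm{base})}, \qquad
E_{\mathrm{norm}}(x) = \frac{E(x)/E(\mathrm{base})}{N(x)/N(\mathrm{base})}, \qquad
N(\mathrm{base}) = 12
```

Here t_sim is the simulated (movement) time, t_wall(x) is the wall-clock time needed to complete it on core set x, and N(x) is the number of cores in the set.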
6
Simulation Systems
(a) Octopamine Receptor, a GPCR: 56,824 atoms
(b) DHFR-TS Fusion Protein: 82,026 atoms
(c) Ubiquitin: 7,051 atoms
These are the three systems that we ran. (a) The first is an octopamine receptor protein in a lipid membrane, solvated with water and ions, and containing 56,824 atoms. (b) The next is a dihydrofolate reductase-thymidylate synthase fusion protein (DHFR-TS), solvated in water and containing 82,026 atoms. (c) The last is ubiquitin, solvated in water and containing 7,051 atoms. The simulation systems shown here were tested on the Kraken high-performance computing cluster, which was discussed by Dr. Timothy Stitt in one of our guest lectures. We tested the simulation efficiency of the two NAMD versions with varying core counts, which I will refer to as core sets. We did 5 runs for each core set, for each version of NAMD, on the three systems, using the CHARMM force field.
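As a rough sketch of how the numbers behind the following plots can be aggregated, here is a minimal Python example. The wall-clock times below are hypothetical placeholders, not our measured data, and the dictionary layout is our own; the script simply averages the 5 runs per core set and derives the per-core and normalized metrics against the 12-core baseline.

```python
import statistics

BASE_CORES = 12   # smallest core set that can be requested on Kraken
SIM_NS = 1.0      # hypothetical simulated span (ns) covered by every run

# Hypothetical wall-clock times in hours for the 5 runs at each core set.
wallclock_hours = {
    12:  [10.1, 10.3, 10.0, 10.2, 10.4],
    96:  [1.55, 1.60, 1.52, 1.58, 1.57],
    300: [0.62, 0.60, 0.63, 0.61, 0.64],
}

def efficiency(hours):
    """Simulated time divided by wall-clock time for the same amount of movement."""
    return SIM_NS / (hours * 3600.0 * 1e9)  # ns simulated per ns of wall clock

results = {}
for cores, runs in sorted(wallclock_hours.items()):
    eff = [efficiency(h) for h in runs]
    results[cores] = (statistics.mean(eff), statistics.stdev(eff))  # mean and error bar

base_eff = results[BASE_CORES][0]
for cores, (mean_eff, std_eff) in results.items():
    per_core = mean_eff / base_eff                 # efficiency relative to the 12-core base
    normalized = per_core / (cores / BASE_CORES)   # normalized per-core efficiency
    print(f"{cores:4d} cores: eff={mean_eff:.3e} +/- {std_eff:.1e}  "
          f"per-core={per_core:.2f}  normalized={normalized:.2f}")
```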
7
Results - 57000 Atoms 57000 Atom Efficiency
Here are the performance results for the octopamine receptor system, containing about 57,000 atoms. Shown here are the average performance efficiencies over all 5 runs, with the corresponding standard deviation displayed as error bars. As can be seen, NAMD 2.7 does the same as or better than version 2.8 up to and including 300 cores, supporting our earlier results. However, an unexpected result appears: for the 396-core and larger core sets, NAMD 2.8 does much better than NAMD 2.7. Furthermore, NAMD 2.7's efficiency begins to decline at approximately the same point at which NAMD 2.8's efficiency has its most drastic increase, with the exception of the beginning, of course. We are as yet uncertain why this occurs. This graph also gives us an approximate optimal number of cores for the 57,000-atom system on each version of NAMD, with 2.7's optimum being around 300 cores and 2.8's being around 504.
8
57000 Atom Efficiency per Core
Results - 57,000 Atoms. This chart shows the average estimated efficiency of each core in each core set, compared to our baseline metric of 12 cores. Note that 12 cores is used instead of 1 because it is the smallest number of cores that can be requested on Kraken. As expected, the efficiency of each core decreases as the number of cores increases, due to the increasing amount of communication needed between the cores to complete the task. In agreement with the previous chart, NAMD 2.7 makes better use of each core in each core set up to and including 300 cores. However, NAMD 2.8 outperforms 2.7 in the 396 through 1008 core sets, as was also shown in the previous chart.
9
Results - 80000 Atoms 80000 Atom Efficiency
Here are the performance results for the DHFR-TS system, containing about 82,000 atoms, making it the largest system in our test set. In agreement with our previous results, NAMD 2.7 outperforms 2.8 for the lower core sets, and NAMD 2.8 outperforms 2.7 at the higher sets (in this case the dividing line appears to fall between the 192 and 300 core sets). Once again, at the higher core sets NAMD 2.8 outperforms 2.7 by a larger margin than NAMD 2.7 ever outperforms 2.8: as the figure shows, NAMD 2.7's advantage at the lower core sets is barely noticeable, while 2.8's advantage is readily apparent at the higher core sets. Also worth noting is the large performance spike that appears at the 1008 core set for NAMD 2.8.
10
80000 Atom Efficiency per Core
Results - 80,000 Atoms. This chart shows the estimated percentage efficiency per core for the 80,000-atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. This graph also demonstrates the efficiency reversal between the two NAMD versions, though here it appears to occur at the 504 core set. This is because NAMD 2.8 actually outperforms 2.7 slightly at the base set, so even though the two versions have equivalent efficiencies for the 192 and 300 core sets, NAMD 2.8 appears less efficient there.
11
Results - 7000 Atoms 7000 Atom Efficiency
Here are the performance results for the ubiquitin system, containing about 7,000 atoms, making it the smallest system in our test set. This system did not follow the trend of the other two, as NAMD 2.8 performed as well as or better than 2.7 for nearly every core set. Even more interesting, NAMD 2.8 outperforms 2.7 the most in the 96 through 192 core sets, rather than at the higher core sets as had been seen previously. Also note that tests with more than 2016 cores were performed, but they had efficiencies less than or equal to what is shown at the 2016 core set, so they were left out for better visibility of the lower-end core sets.
12
7000 Atom Efficiency per Core
Results - 7,000 Atoms. Shown here is the estimated percentage efficiency per core for the 7,000-atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. This chart also demonstrates the large per-core efficiency increase for NAMD 2.8 over NAMD 2.7 in the 96, 120, and 192 core sets.
13
Results – NAMD Scaling Analysis
Optimal Number of Cores; Peak Performance. We mentioned earlier the possibility of finding an optimum number of cores for each version of NAMD. These graphs were generated by taking the peak performance values for each system and plotting the size of the systems against either the number of cores (left graph) or the efficiency (right graph). Please note that even though the 7,000-atom system had a higher efficiency in the 2016 core set for both versions of NAMD, the peak values found in the lower core sets were chosen, as they have a better per-core efficiency and would not be as wasteful of the Kraken service units being allocated. For NAMD 2.7, an apparent optimum number of cores was found, at around 300 cores. However, we were not able to find an optimum number of cores for NAMD 2.8, due to the sudden jump in optimum cores for our largest system. Optimum peak performances do appear to have been found and are shown. As expected, the smallest system performed the best, and there appears to be a general decrease as the system size increases, though there appears to be a slight increase in performance in NAMD 2.8 from the 57,000-atom system to the 80,000-atom system. These results indicate that NAMD 2.8 was optimized for using larger core sets (left graph) on larger systems (right graph).
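The peak-performance points plotted here amount to an argmax over core sets for each system. Here is a minimal sketch, assuming the averaged efficiencies have already been collected in a nested dictionary; the structure and names are our own, not the project's actual data files.

```python
def peak_points(mean_efficiency):
    """For each system size, return (optimal core set, peak efficiency).

    mean_efficiency: {atoms: {cores: mean efficiency}} -- a placeholder
    structure, not the measured data behind the slides.
    """
    peaks = {}
    for atoms, per_core_set in mean_efficiency.items():
        optimal_cores = max(per_core_set, key=per_core_set.get)  # core set with the best efficiency
        peaks[atoms] = (optimal_cores, per_core_set[optimal_cores])
    return peaks

# Example usage: peak_points({7051: {...}, 56824: {...}, 82026: {...}}) yields the
# points for the "optimal number of cores" and "peak performance" plots.
```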
14
Results - Force Field Comparison
NAMD 2.7 – atoms; NAMD 2.8 – atoms. Force field comparisons were also done to see whether the differing parameterizations used by the force fields had any influence on NAMD's efficiency. The AMBER force field was used on one of the systems, and the performances are displayed here for both versions of NAMD. For both versions, performance was nearly identical for the two force fields, though there was a slight decrease in AMBER's performance at the 504 core set in NAMD 2.8. Despite this decrease, it appears that the choice of force field has little bearing on the efficiency of either version of NAMD.
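For reference, the force-field choice enters NAMD through a handful of configuration keywords; below is a minimal sketch of the two variants. The keyword names (paraTypeCharmm, parameters, amber, parmfile, ambercoor) follow the NAMD user guide, but the file names are placeholders and the snippet should be checked against the actual input decks used here.

```python
def forcefield_lines(forcefield: str) -> str:
    """Return the force-field-specific portion of a NAMD configuration file.

    File names below are hypothetical placeholders for this project's inputs.
    """
    if forcefield == "charmm":
        return "\n".join([
            "structure       system.psf",                 # CHARMM-style PSF topology
            "coordinates     system.pdb",
            "paraTypeCharmm  on",
            "parameters      par_all27_prot_lipid.prm",   # CHARMM parameter file
        ])
    if forcefield == "amber":
        return "\n".join([
            "amber           on",                         # read AMBER prmtop/inpcrd inputs
            "parmfile        system.prmtop",
            "ambercoor       system.inpcrd",
        ])
    raise ValueError(f"unknown force field: {forcefield}")

print(forcefield_lines("amber"))
```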
15
Summary of Results
Performance Difference: for the 57,000- and 80,000-atom systems, NAMD 2.8 appears to have been optimized for performance when using larger core sets. The 7,000-atom system gave odd results, with two possible explanations: the performance optimization only works for larger simulation systems, or the performance of either version will start to increase again if given enough cores, and the efficiencies may potentially reverse once again.
NAMD Scaling Analysis: Optimal Number of Cores; Peak Performance.
Force Field Comparison: CHARMM vs AMBER.
So in summary, our project had three main parts. First, the performance difference (read from the slide). Next, for the NAMD scaling analysis, we found an apparent optimum number of cores for NAMD 2.7, but could not find one for NAMD 2.8 due to the sudden jump in our largest (80,000-atom) system. We also found the peak performance trend in both versions of NAMD, and the surprising result that NAMD 2.8 may be optimized for larger systems. Finally, a force field comparison of CHARMM vs AMBER was done for one of the systems. Despite the differing parameters defined by each force field, the efficiencies with the two force fields were nearly identical.
16
Future Work: More test cases to obtain empirical data for performance boundaries. Deeper analysis of performance differences: system calls and network communications (we need to find out which tools are available on Kraken). Potential future work that could come from this project would be to include more test cases to better determine performance scaling and boundaries. This includes more core sets for each of the systems already tested, as well as new systems to test. Something else that could be done would be to find tools that run on Kraken and perform a deeper analysis of what is going on in each of the cores. Aspects that would potentially be very useful to study are the system calls being made by the cores and the network communication going on between them. Other tests that could be done include a CPU vs GPU comparison, as well as testing systems of equal size but differing complexity. (Look into statistical tools that analyze MPI.)
17
Questions? (Start video) So with that, are there any questions?