Preliminary CPMD Benchmarks on Ranger, Pople, and Abe
TG AUS Materials Science Project
Matt McKenzie, LONI

What is CPMD?
Car-Parrinello Molecular Dynamics
▫ A parallelized plane-wave / pseudopotential implementation of Density Functional Theory
▫ Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
▫ Large systems, ~500 atoms
  ▪ Scales with the number of electrons, NOT the number of atoms

Key Points in Optimizing CPMD
▫ The developers have already done a lot of optimization work
▫ The Intel compiler is used in this study
▫ BLAS/LAPACK (see the sketch below)
  ▪ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
  ▪ Some level 2 (matrix-vector ops)
▫ Integrated optimized FFT library
  ▪ Compiler flag: -DFFT_DEFAULT
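For readers less familiar with the BLAS levels named above, here is a toy NumPy sketch of what each level corresponds to. It only illustrates the operation types; it is not a representation of CPMD's actual calls, and the array sizes are arbitrary.

```python
import numpy as np

n = 512
x, y = np.random.rand(n), np.random.rand(n)
A, B = np.random.rand(n, n), np.random.rand(n, n)

v = 2.0 * x + y   # BLAS level 1: vector-vector work (AXPY-like)
w = A @ x         # BLAS level 2: matrix-vector product (GEMV-like)
C = A @ B         # BLAS level 3: matrix-matrix product (GEMM-like),
                  # the level where optimized libraries pay off most
```

Level 3 operations reuse data already in cache, which is why a well-tuned BLAS/LAPACK matters so much for a code like CPMD.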

Benchmarking CPMD is difficult because…
▫ The nature of the modeled chemical system
  ▪ Solids, liquids, and interfaces require different parameters, stressing memory along the way
  ▪ Cell volume and number of electrons
▫ The choice of pseudopotential (psp)
  ▪ Norm-conserving, 'soft', or with non-linear core correction (more memory)
▫ The type of simulation conducted
  ▪ CPMD, BOMD, path integral, simulated annealing, etc.
  ▪ CPMD is a robust code
▫ Results are very specific to the chemical system
  ▪ Any one CPMD simulation cannot easily be compared to another
  ▪ However, THERE ARE TRENDS
▫ FOCUS: timing of a simple wave function optimization
  ▪ This is a common ab initio calculation

Probing Memory Limitations
For any ab initio calculation:
▫ Accuracy is proportional to the number of basis functions used
▫ These are stored in matrices, requiring increased RAM
▫ The energy cutoff determines the size of the plane-wave basis set:
  N_PW = (1 / 2π²) · Ω · E_cut^(3/2)
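To make the scaling concrete, the sketch below evaluates this relation in Python. The unit convention (Ω in bohr³, E_cut in Rydberg) and the example cell volume are assumptions for illustration only; the point is that the basis size, and hence memory, grows as E_cut^(3/2).

```python
import math

def n_plane_waves(omega_bohr3, ecut_ry):
    """Plane-wave basis size from the slide's relation:
    N_PW = (1 / 2*pi^2) * Omega * E_cut^(3/2).
    Units (bohr^3 and Rydberg) are an assumed convention."""
    return omega_bohr3 * ecut_ry ** 1.5 / (2.0 * math.pi ** 2)

omega = 8.6e3  # bohr^3 -- hypothetical volume, roughly a ~63-atom Si supercell

n50 = n_plane_waves(omega, 50.0)
n70 = n_plane_waves(omega, 70.0)
print(f"N_PW(50 Ryd) ~ {n50:,.0f}")
print(f"N_PW(70 Ryd) ~ {n70:,.0f}")

# The ratio depends only on the cutoffs: (70/50)^1.5 ~ 1.66, consistent with
# the ~134,000 -> ~222,000 growth quoted on the next slide.
print(f"ratio 70/50 = {n70 / n50:.2f}")
```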

Model Accuracy & Memory Overview
[Figure from the CPMD user's manual: a pseudopotential's convergence behavior with respect to basis-set size (cutoff)]
▫ NOTE: the choice of psp is important, i.e. a 'softer' psp = lower cutoff = loss of transferability
▫ VASP specializes in soft psp's; CPMD works with any psp

Memory Comparison
Ψ optimization, 63 Si atoms, SGS psp (a well-known CPMD benchmarking model)

  E_cut = 50 Ryd:  N_PW ≈ 134,000, memory = 1.0 GB
  E_cut = 70 Ryd:  N_PW ≈ 222,000, memory = 1.8 GB

Results can be reported either by:
▫ Wall time = (number of steps × iteration time per step) + network overhead (illustrated below)
  ▪ Typical results / interpretations, nothing new here
▫ Iteration time = the fundamental unit, used throughout any given CPMD calculation
  ▪ It neglects the network, yet results are comparable
  ▪ Note: CPMD runs well on a few nodes connected with gigabit Ethernet

Two important factors affect CPMD performance:
▫ MEMORY BANDWIDTH
▫ FLOATING-POINT performance
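To make the two reporting conventions concrete, here is a minimal sketch of the wall-time decomposition defined on this slide. The step count, per-iteration time, and overhead below are hypothetical placeholders, not measured values from these benchmarks.

```python
def wall_time(n_steps, iteration_time_s, network_overhead_s=0.0):
    """Wall time = (n steps x iteration time per step) + network overhead.
    Reporting the iteration time alone neglects the network term."""
    return n_steps * iteration_time_s + network_overhead_s

# Hypothetical example: 100 optimization steps at 4.2 s/step plus 30 s of overhead.
n_steps, t_iter, t_overhead = 100, 4.2, 30.0
print(f"iteration time: {t_iter:.1f} s/step")
print(f"wall time     : {wall_time(n_steps, t_iter, t_overhead):.0f} s")
```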

Pople, Abe, Ranger CPMD Benchmarks

Results I
▫ All calculations ran for no longer than 2 hours
▫ Ranger is not the preferred machine for CPMD
▫ CPMD scales well between 8 and 96 cores
  ▪ This is a common CPMD trend
▫ CPMD is known to scale super-linearly above ~1,000 processors
  ▪ Will look into this
  ▪ The chemical system would have to change, as this smaller simulation is unlikely to scale that way

Results II
▫ Pople and Abe gave the best performance
▫ IF a system requires more than 96 processors, Abe would be a slightly better choice
▫ Given the difficulties in benchmarking CPMD (psp, volume, system phase, simulation protocol), this benchmark is not a good representation of all possible uses of CPMD
  ▪ Only one part of the code was explored
▫ How each machine performs when taxed with additional memory requirements is a better indicator of CPMD's performance
  ▪ To increase the accuracy of the model, increase E_cut

Percent Difference between 70 and 50 Ryd
%Diff = [(t_70 - t_50) / t_50] × 100
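A minimal sketch of this metric in Python; the two timings are hypothetical placeholders, and only the formula itself comes from the slide.

```python
def percent_diff(t50, t70):
    """%Diff = [(t70 - t50) / t50] * 100, as defined on the slide."""
    return (t70 - t50) / t50 * 100.0

# Hypothetical per-iteration times (seconds) at the two cutoffs.
t50, t70 = 4.2, 6.9
print(f"%Diff = {percent_diff(t50, t70):.1f}%")  # ~64% slower at 70 Ryd
```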

Conclusions
RANGER
▫ Re-ran the Ranger calculations
▫ Lower performance may be linked to using the Intel compiler on AMD chips
  ▪ The PGI compiler could show an improvement
  ▪ Nothing over 5% is expected: Ranger would still be the slowest
  ▪ Wanted to use the same compiler/math libraries on all machines
ABE
▫ Possible super-linear scaling: t(Abe, 256 procs) < t(others, 256 procs)
▫ Memory-size effects hinder performance below 96 procs
POPLE
▫ The best system for wave function optimization
▫ Shows a (relatively) stable, modest slowdown as the memory requirement is increased; it is the recommended system

Future Work
▫ Half-node benchmarking
▫ Profiling tools
▫ Test the MD part of CPMD
  ▪ Force calculations involving the non-local parts of the psp will increase memory
  ▪ Extensive level 3 BLAS and some level 2
  ▪ Many FFT all-to-all calls; now the network plays a role
  ▪ Memory > 2 GB
  ▪ A new variable to monitor: the fictitious electron mass
▫ Changing the model
  ▪ Metallic system (many electrons; change of psp and E_cut)
  ▪ Check super-linear scaling