HPCMP Benchmarking and Performance Analysis


1 HPCMP Benchmarking and Performance Analysis
Mark Cowan, USACE ERDC ITL, in support of the DoD HPCMP
Tuesday, April 17, 2012

2 What is the HPCMP?
Initiated in 1992
Congressional mandate to modernize DoD's HPC capabilities
Assembled from a collection of HPC departments across Army, Air Force, and Navy labs and test centers

3 What is the HPCMP?
FOCUS: solve military and security problems using HPC hardware and software
Assess technical and management risks: performance, time, available resources, cost, schedule
Supports DoD objectives through research, development, test and evaluation

4 Where we benchmark

5 Migrate to a 2-year acquisition cycle
Why the radical change?
Entice more vendors into the competition
Vendor feedback → remove or alleviate disincentives
Review the entirety of the TI acquisition process
Line-by-line justification of the benchmarking rules document
Address both HPC community and vendor concerns
Comprehensive reevaluation of how we benchmark: analyze the codes, justify the test cases

6 Migrate to a 2-year acquisition cycle
Dangers?
Time the milestones poorly on the calendar and miss out on the release of cutting-edge technology
A difficult problem: how do you schedule activities to maximize the likelihood of hitting publicly available products months in advance, while being blind to the intricacies of chip fabrication schedules and unforeseen recalls?

7 Codes considered for TI-11/12
ABAQUS, ABINIT, ACES, ADCIRC, ADH, ALE3D, ALEGRA, AMR, AVUS, CFD++, CFDSHIP-IOWA, COAMPS, COBALT, CP2K, CPMD, CTH, ETA, FDTD, FLAPW, FLUENT, GAMESS, GASP, GAUSSIAN, HYCOM, ICEPIC, LAMMPS, LS-DYNA, MATLAB, OOCORE, OVERFLOW, SHAMRC, SIERRA, STAR CCM+, VASP, WRF, XPATCH

8 TI-11/12 benchmarking applications
ADCIRC – Coastal Circulation and Storm Surge model; 100% Fortran, MPI; uses the METIS library (C); 205K LOC
ALEGRA – Hydrodynamics and solid dynamics plus magnetic field and thermal transport; 96% C, 4% Fortran, MPI; 978K LOC
AVUS (Cobalt-60) – Turbulent flow CFD code; Fortran, MPI; 29K LOC
CTH – Shock physics code; ~58% Fortran / ~42% C, MPI; 900K LOC
GAMESS – Quantum chemistry code; Fortran, MPI; 330K LOC
HYCOM – Ocean circulation modeling code; Fortran, MPI; 31K LOC
ICEPIC – Particle-in-cell magnetohydrodynamics code; C, MPI; 350K LOC
LAMMPS – Molecular dynamics code; C++, MPI; 45K LOC
(The slide color-codes each application as Predicted or Benchmarked.)

9 Components of testing packages
Applications tested on representative input sets:

CODE | CASE | Distinguished Core Count | Time (sec) on DIAMOND | Core Counts
ADCIRC | baroclinic | 1024 | 8959 | 512, 768, 1024, 1280, 1536, 1792, 2048
ADCIRC | hurricane | 1280 | 2082 |
ALEGRA | obliqueImp | 1536 | 1640 | 1024, 1280, 1536, 1792, 2048
ALEGRA | explWire | 256 | 944 | 256, 384, 512, 768, 1024
AVUS | waverider | | 941 | 384, 512, 768, 1024, 1536
AVUS | turret-td | | 1332 | 768, 1024, 1280, 1536, 2048
CTH | fixed-grid | | 3399 |
CTH | amr | | 2535 |
GAMESS | DFT-grad | | 4701 | 128, 192, 256, 384, 512
GAMESS | MP2-grad | 512 | 2536 | 128, 256, 512, 768, 1024
GAMESS | CC-energy | | 3658 | 512, 768, 1024, 1536, 2048
HYCOM | lrg | 1353 | 3020 | 1001, 1353, 1516, 1770, 2045
ICEPIC | magnetron | 384 | 2559 | 256, 384, 512, 768, 1024
ICEPIC | gyrotron | 2048 | 3639 | 1536, 1792, 2048, 2304, 2560
LAMMPS | Au | | 3182 | 128, 256, 384, 512, 1024, 1280, 1536, 2048

10 Some components of HPC procurement cycle

11 Some components of HPC procurement cycle
Acquire new versions of codes
Port codes to various machines
Acquire test cases
Develop or acquire accuracy checks
Test codes, get times to compare
Assemble package for vendors

12 Some components of HPC procurement cycle
Run codes with test cases on installed DSRC machines
Optimize! How fast can we go?

13 Some components of HPC procurement cycle
We review the vendor submittal:
  Anything suspicious?
  How do vendor times compare to ours?
  How did vendors optimize?
  How risky is the vendor's proposal?
Present our results

14 Components of testing packages continued
Timers measure the elapsed running times
Accuracy checks ensure validity of output files; this often requires determining acceptable error bounds
(A minimal timing-and-accuracy sketch follows.)
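To make this concrete, below is a minimal sketch (not the HPCMP test harness; the kernel, reference value, and tolerance are illustrative placeholders) of timing a run with MPI_Wtime and applying a relative-error accuracy check:

/* Minimal sketch: time a benchmark kernel with MPI_Wtime and validate one
 * output value against a reference within an assumed relative-error bound. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double t0 = MPI_Wtime();
    /* ... run the benchmark kernel here ... */
    double result = 3.14159;           /* stand-in for a checked output value */
    double t1 = MPI_Wtime();

    const double reference = 3.14160;  /* hypothetical reference value */
    const double tol = 1.0e-4;         /* hypothetical acceptable relative error */
    double rel_err = fabs(result - reference) / fabs(reference);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("elapsed time: %.3f s\n", t1 - t0);
        printf("accuracy check: relative error %.2e (%s)\n",
               rel_err, rel_err <= tol ? "PASS" : "FAIL");
    }

    MPI_Finalize();
    return rel_err <= tol ? EXIT_SUCCESS : EXIT_FAILURE;
}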

15 How the test packages are used
Run all test cases on 5 different DSRC machines to acquire times
Debug test packages
Quantify variation across and within machines (a small sketch of the within-machine statistics follows)
Compare times to proposed systems
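As a simplified illustration of quantifying within-machine variation, the sketch below computes the mean, sample standard deviation, and coefficient of variation of a handful of repeated wall-clock times; the numbers are made up, not measured data:

/* Run-to-run variation on one machine: mean, sample standard deviation,
 * and coefficient of variation of repeated wall-clock times (illustrative). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double times[] = { 2080.0, 2101.5, 2075.2, 2110.8, 2090.3 };  /* seconds, made up */
    int n = (int)(sizeof times / sizeof times[0]);

    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += times[i];
    double mean = sum / n;

    double ss = 0.0;
    for (int i = 0; i < n; i++) ss += (times[i] - mean) * (times[i] - mean);
    double stddev = sqrt(ss / (n - 1));   /* sample standard deviation */

    printf("mean %.1f s, stddev %.1f s, CV %.2f%%\n",
           mean, stddev, 100.0 * stddev / mean);
    return 0;
}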

16 Machine attributes
Architectures used in the study:

DSRC | Name | Make | Model | Chip Set | Processor Speed (GHz) | Interconnect | Number of Cores | Cores per Node | Operating System
ERDC | Diamond | SGI | Altix ICE | Intel Xeon QC | 2.8 | DDR4 InfiniBand | 15360 | 8 | SUSE Linux
MHPCC | Mana | Dell | PowerEdge M610 | | | DDR InfiniBand | 9216 | | Linux
NAVY | DaVinci | IBM | Power6 | IBM P6 DC | 4.7 | DDR InfiniBand | 4800 | 32 | AIX
| Einstein | Cray | XT5 | Opteron QC | 2.3 | SeaStar2+ | 12736 | | CNL
| Garnet | Cray | XE6 | AMD Opteron 64-bit | 2.4 | Cray Gemini | 20224 | 16 | CLE

17 RESULTS! Graphs of runtimes

18 Risk Assessment: Major Areas Assessed
Compliance assessment
  Ability to follow benchmark rules
  Number of test case results provided
  Results within accuracy criteria
Assessment of risk in meeting proposed times in acceptance tests
  Differences between benchmarked and proposed system (processor, interconnect, and I/O system differences)
  Quality of estimation procedure (quality of explanation and soundness of the estimation procedure)
  Aggressiveness of final estimate (comparison with measured benchmark-system times and with predicted times)
Assessment of likelihood of users and/or developers using proposed code modifications
  Acceptability of proposed code modifications

19 Benchmarking website URL:

20 Benchmarking website continued
Narrative of website purpose, codes tested
Heatmap of systems best suited for applications

21 Benchmarking website continued
Brief description of application
Brief description of test cases

22 Benchmarking website continued
An example of how we made the heatmap for allocation choices

23 Benchmarking website continued
Got a question? Want to suggest an improvement? Contact us.

24 Performance Team Members
Mark Cowan – ERDC – Chair
Larry Davis – HPCMPO
Lloyd Slonaker – AFRL
Tim Sell – AFRL
Laura Brown – ERDC
Mahbubur Rashid – ERDC
Christine Cuicchi – NAVO
Matt Grismer – AFRL
Jerry Boatz – AFRL

25 Performance Team Advisors
William Ward – HPCMPO
Steve Finn – DTRA
Carrie Leach – ERDC
Paul Bennett – ERDC
Tom Oppe – ERDC
Henry Newman – Instrumental
Michael Laurenzano – SDSC
Bronis de Supinski – LLNL
Joseph Swartz – LM
Allan Snavely – SDSC
Laura Carrington – SDSC
Robert Pennington – NSF
Nick Wright – NERSC
James Ianni – ARL

26 Questions?

27 Contact me…
Mark Cowan
USACE ERDC ITL, Computational Analysis Branch
3909 Halls Ferry Road, Building 8000, Room 1255
Vicksburg, MS 39180
(601)

28 ADDENDA

29 AVUS: Code description
CFD code, formerly COBALT_60
Simulates 3-D turbulent viscous flow over irregular geometries
Grid-based; reads a large grid file
AVUS: 29K lines of Fortran 90 code
Uses ParMETIS: 12K lines of C code
Parallelism via MPI, no OpenMP
Runs on Cray XT, IBM Power, SGI Altix, and Linux clusters

30 CTH: Code description
CTA: CSM (Computational Structural Mechanics)
Shock physics
A two-step, 2nd-order accurate Eulerian algorithm is used to solve the mass, momentum, and energy conservation equations (written out below)
An explicit approach that does not require solving a linear system
Has both static and adaptive mesh capabilities
Parallelism via MPI
900K LOC, 58% FORTRAN and 42% C
Uses NetCDF, supplied with the distribution
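For reference, the mass, momentum, and energy conservation equations the slide refers to are the standard compressible (Euler) conservation laws; this is a textbook statement, not taken from CTH documentation:

\begin{align}
  \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{u}) &= 0, \\
  \frac{\partial (\rho\,\mathbf{u})}{\partial t}
    + \nabla\cdot(\rho\,\mathbf{u}\otimes\mathbf{u}) + \nabla p &= 0, \\
  \frac{\partial E}{\partial t} + \nabla\cdot\bigl[(E + p)\,\mathbf{u}\bigr] &= 0,
\end{align}

where \rho is density, \mathbf{u} velocity, p pressure, and E total energy per unit volume; an equation of state closes the system.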

31 GAMESS: Code description
CTA: CCM (Computational Chemistry, Biology, and Materials Science)
Ab initio quantum chemistry
Computes many energy integrals with molecular data in the form of atom positions and electron orbitals
Communication depends on platform: LAPI, sockets, SHMEM, MPI
Code composition: 99% FORTRAN, 1% C

32 HYCOM: Code description
CTA: Climate/Weather/Ocean Modeling and Simulation (CWO)
A primitive-equation ocean general circulation model
Communication is MPI (MPI-2 is available)
100% FORTRAN
Version

33 HYCOM: MPI-2 details
HYCOM may be run with MPI or MPI-2
MPI-2 is MPI with additional features such as parallel I/O, dynamic process management, and remote memory operations
HYCOM utilizes the parallel I/O feature (a generic sketch follows)
Parallel I/O times have been required starting with TI-10
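For readers unfamiliar with MPI-2 parallel I/O, the sketch below shows the general idea (a generic collective write to a single shared file; this is not HYCOM's actual I/O code, and the file name and block size are illustrative):

/* Generic MPI-2 parallel I/O sketch: every rank writes its block of a
 * distributed array to one shared file with a collective call. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nlocal = 1024;                       /* illustrative block size */
    double *block = malloc(nlocal * sizeof *block);
    for (int i = 0; i < nlocal; i++) block[i] = rank + 0.001 * i;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "field.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at an offset determined by its rank. */
    MPI_Offset offset = (MPI_Offset)rank * nlocal * (MPI_Offset)sizeof(double);
    MPI_File_write_at_all(fh, offset, block, nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(block);
    MPI_Finalize();
    return 0;
}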

34 ICEPIC: Code description
CTA: Computational Electromagnetics and Acoustics (CEA)
Particle-in-cell plasma physics code
Ions and electrons move under the influence of electromagnetic fields
Particles are updated in a grid-free manner; they are grouped in cells which are periodically adjusted to preserve load balance
Fields are calculated on a structured, static grid and its dual grid according to Maxwell's equations
Can simulate plasmas contained in complex geometries
Used in electromagnetic device design
~350K lines of code, C++ and C
(A textbook particle-push sketch follows.)
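The particle update itself is easiest to see in a textbook form; the sketch below advances one charged particle under the Lorentz force with a simple explicit step (this is illustrative physics, not ICEPIC's actual algorithm, and the fields are assumed already interpolated to the particle position):

/* Textbook particle push: advance one particle under F = q (E + v x B). */
#include <stdio.h>

typedef struct { double x[3], v[3], q, m; } Particle;

static void push(Particle *p, const double E[3], const double B[3], double dt)
{
    double F[3] = {
        p->q * (E[0] + p->v[1] * B[2] - p->v[2] * B[1]),
        p->q * (E[1] + p->v[2] * B[0] - p->v[0] * B[2]),
        p->q * (E[2] + p->v[0] * B[1] - p->v[1] * B[0]),
    };
    for (int i = 0; i < 3; i++) {
        p->v[i] += dt * F[i] / p->m;   /* update velocity first */
        p->x[i] += dt * p->v[i];       /* then position */
    }
}

int main(void)
{
    Particle e = { {0, 0, 0}, {1.0e5, 0, 0}, -1.602e-19, 9.109e-31 };  /* an electron */
    double E[3] = {0, 1.0e3, 0}, B[3] = {0, 0, 0.1};                   /* illustrative fields */
    for (int step = 0; step < 5; step++)
        push(&e, E, B, 1.0e-12);
    printf("position after 5 steps: (%e, %e, %e) m\n", e.x[0], e.x[1], e.x[2]);
    return 0;
}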

35 LAMMPS: Code description
CTA: CCM (Computational Chemistry, Biology, and Materials Science)
Classical molecular dynamics code that models particles in a liquid, solid, or gaseous state
Calculates atomic velocities, positions, system energy, and temperature
After equilibration: surface tension, radial pressure, and phase change
Post-processing: pair-correlation function and diffusion coefficients
All actions occur within a box (usually orthogonal)
Distributed-memory message-passing parallelism (MPI)
Highly portable C++
Libraries needed: MPI and a single-processor FFT
(A minimal MD step sketch follows.)
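As a flavor of what a molecular dynamics step looks like (a toy 1-D harmonic chain in reduced units, not LAMMPS code or any of its potentials), the sketch below performs velocity-Verlet integration and reports the kinetic energy and an instantaneous temperature:

/* Toy MD: velocity-Verlet steps for a 1-D harmonic nearest-neighbor chain. */
#include <math.h>
#include <stdio.h>

#define N 8

static void compute_forces(const double x[N], double f[N])
{
    const double k = 1.0, r0 = 1.0;                /* spring constant, rest length */
    for (int i = 0; i < N; i++) f[i] = 0.0;
    for (int i = 0; i < N - 1; i++) {
        double stretch = (x[i + 1] - x[i]) - r0;
        f[i]     += k * stretch;
        f[i + 1] -= k * stretch;
    }
}

int main(void)
{
    double x[N], v[N], f[N], dt = 0.005, m = 1.0;
    for (int i = 0; i < N; i++) { x[i] = 1.05 * i; v[i] = 0.0; }
    compute_forces(x, f);

    for (int step = 0; step < 1000; step++) {
        /* velocity Verlet: half-kick, drift, recompute forces, half-kick */
        for (int i = 0; i < N; i++) v[i] += 0.5 * dt * f[i] / m;
        for (int i = 0; i < N; i++) x[i] += dt * v[i];
        compute_forces(x, f);
        for (int i = 0; i < N; i++) v[i] += 0.5 * dt * f[i] / m;
    }

    double ke = 0.0;
    for (int i = 0; i < N; i++) ke += 0.5 * m * v[i] * v[i];
    printf("kinetic energy %.4f, temperature %.4f (reduced units, kB = 1)\n",
           ke, 2.0 * ke / N);
    return 0;
}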

36 ADCIRC: Code description
ADCIRC – Coastal Circulation and Storm Surge Model
Solves time-dependent, free-surface circulation and transport problems in 2 and 3 dimensions
Uses the finite element method in space, which permits highly flexible, unstructured grids
Typical ADCIRC applications have included:
  modeling tides and wind-driven circulation,
  analysis of hurricane storm surge and flooding,
  dredging feasibility and material disposal studies,
  larval transport studies, and
  near-shore marine operations

37 "BASE" ALEGRA: Code description
ALE code (Arbitrary Lagrangian-Eulerian): provides flexibility, accuracy, and reduced numerical dissipation over a pure Eulerian code; modern remeshing technology allows for robust mesh smoothing and control
Hydrodynamics and solid dynamics
Models large distortions and strong shock propagation in multiple materials
Finite element code; descendant of PRONTO; uses some CTH Eulerian technology
Energy deposition and explosive burn models
Geometry: 2D/3D Cartesian, 2D cylindrical
Material models in ALEGRA: equations of state, elastic-plastic models, fracture models
(Figure: pressure and temperature during formation of a jet from a shaped charge)

38 “BASE” ALEGRA: Code description

39 ALEGRA_MHD: Code description
All hydrodynamics/solid dynamics modules of "base" ALEGRA PLUS magnetic field and thermal transport effects
Lorentz forces, Joule heating, thermal transport, and simple models for radiating excess energy
2D and 3D versions
2D modeling with the magnetic flux density vector components in or out of the plane, with the corresponding current density out of or in the plane, respectively
3D uses a magnetic diffusion solution based on edge and face elements, which maintains the discrete divergence-free property of the flux during the magnetic solve and the constrained-transport remap stage
Lumped-element coupled circuit equations
Magnetic and thermal conduction
Advanced models for thermal and electrical conductivity
Emission model radiates excess energy when the medium is optically thin, while accounting for reabsorption

