HPCMP Benchmarking Update
Cray Henry, April 2008
Department of Defense High Performance Computing Modernization Program

Outline
- Context – HPCMP
- Initial motivation from 2003
- Process review
- Results

DoD HPC Modernization Program

HPCMP Serves a Large, Diverse DoD User Community
- 519 projects and 4,086 users at approximately 130 sites
- Requirements categorized in 10 Computational Technology Areas (CTAs)
- FY08 non-real-time requirements of 1,108 Habu-equivalents (see the sketch below)
- 156 users are self-characterized as "Other"
- Users by CTA:
  – Computational Structural Mechanics: 437
  – Electronics, Networking, and Systems/C4I: 114
  – Computational Chemistry, Biology & Materials Science: 408
  – Computational Electromagnetics & Acoustics: 337
  – Computational Fluid Dynamics: 1,572
  – Environmental Quality Modeling & Simulation: 147
  – Signal/Image Processing: 353
  – Integrated Modeling & Test Environments: 139
  – Climate/Weather/Ocean Modeling & Simulation: 241
  – Forces Modeling & Simulation: 182
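
To make the capability unit concrete, here is a minimal, heavily hedged sketch of the idea behind a "Habu-equivalent": capability expressed relative to the DoD standard benchmark system by comparing benchmark times. The times, the 1.0 rating of the standard system, and the single-test-case view are illustrative assumptions, not the program's actual calibration procedure.

```c
/* Hedged sketch: capability in "Habu-equivalents" as a ratio of benchmark
 * times against the DoD standard system. All numbers are illustrative. */
#include <stdio.h>

int main(void) {
    double time_on_standard_system = 3600.0; /* seconds for one test case (assumed) */
    double time_on_target_system   = 900.0;  /* same test case on the target system */
    double standard_system_rating  = 1.0;    /* the standard system defines one unit */

    /* The target completes the workload 4x faster, so under this illustrative
       definition it provides 4 Habu-equivalents of capability for this case. */
    double habu_equivalents = standard_system_rating *
                              time_on_standard_system / time_on_target_system;
    printf("capability: %.1f Habu-equivalents\n", habu_equivalents);
    return 0;
}
```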

Benchmarks Have REAL Impact
- In 2003 we started to describe our benchmarking approach
- Today benchmarks are even more important

2003 Benchmark Focus
- Focused on application benchmarks
- Recognized that application benchmarks were not enough

2003 Challenge – Move to Synthetic Benchmarks
- Five years later we have made progress, but not enough to fully transition to synthetics
- The benchmarks have supported over $300M in purchases so far

Comparison of HPCMP System Capabilities by Fiscal Year, through FY 2008
[Chart: Habu-equivalents per processor for HPCMP systems]

What Has Changed Since 2003
- (TI-08) Introduction of performance modeling and predictions
  – Primary emphasis still on application benchmarks
  – Performance modeling now used to predict some application performance
  – Performance predictions and measured benchmark results compared for HPCMP systems used in TI-08 to assess accuracy
- (TI-08) Met one-on-one with vendors to review performance predictions for each vendor's individual systems

Overview of TI-XX Acquisition Process
- Determine requirements, usage, and allocations
- Choose application benchmarks, test cases, and weights
- Vendors provide measured and projected times on offered systems
- Measure benchmark times on the DoD standard system
- Measure benchmark times on existing DoD systems
- Determine performance for each offered system per application test case
- Determine performance for each existing system per application test case
- Determine performance for each offered system
- Collect usability and past-performance information on offered systems
- Use an optimizer to determine price/performance for each offered system and combination of systems, drawing on center facility requirements, vendor pricing, and life-cycle costs for offered systems (a simplified sketch follows)
- Make a collective acquisition decision
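
The sketch below illustrates only the flavor of the scoring step: weighting per-test-case performance into a single number for an offered system and dividing by cost. The test-case count, performance values, weights, and cost are hypothetical, and the program's actual optimizer considers system combinations, facility constraints, and other factors not shown here.

```c
/* Hedged sketch: weighted performance score and price/performance for one
 * offered system. All values are illustrative placeholders. */
#include <stdio.h>

#define NUM_TEST_CASES 4

int main(void) {
    /* Hypothetical per-test-case performance (relative to the standard system) */
    double performance[NUM_TEST_CASES] = { 1.8, 2.4, 0.9, 3.1 };
    /* Hypothetical workload weights for the test cases (sum to 1.0) */
    double weight[NUM_TEST_CASES] = { 0.30, 0.25, 0.25, 0.20 };
    double life_cycle_cost_musd = 12.5;  /* hypothetical life-cycle cost, $M */

    double overall = 0.0;
    for (int i = 0; i < NUM_TEST_CASES; i++)
        overall += weight[i] * performance[i];

    printf("weighted performance: %.2f\n", overall);
    printf("performance per $M:   %.3f\n", overall / life_cycle_cost_musd);
    return 0;
}
```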

TI-09 Application Benchmarks
- AMR – gas dynamics code (C++/Fortran, MPI, 40,000 SLOC)
- AVUS (Cobalt-60) – turbulent flow CFD code (Fortran, MPI, 19,000 SLOC)
- CTH – shock physics code (~43% Fortran / ~57% C, MPI, 436,000 SLOC)
- GAMESS – quantum chemistry code (Fortran, MPI, 330,000 SLOC)
- HYCOM – ocean circulation modeling code (Fortran, MPI, 31,000 SLOC)
- ICEPIC – particle-in-cell magnetohydrodynamics code (C, MPI, 60,000 SLOC)
- LAMMPS – molecular dynamics code (C++, MPI, 45,400 SLOC)
(In the original slide, codes shown in red were predicted and codes shown in black were benchmarked.)

Predicting Code Performance for TI-08 and TI-09
(The next 12 charts were provided by the Performance Modeling and Characterization Group at the San Diego Supercomputer Center.)

Prediction Framework – Processor and Communications Models
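
As a rough illustration of how separate processor and communications models can be combined into one predicted runtime, the sketch below convolves a hypothetical application signature (operation and message counts) with a hypothetical machine profile (sustained rates and latencies). The structure and every number are assumptions for illustration, not the actual PMaC framework interfaces or calibrated rates.

```c
/* Hedged sketch: predicted time = processor-model time + communications-model
 * time, from an assumed application signature and machine profile. */
#include <stdio.h>

int main(void) {
    /* Application signature (hypothetical per-core totals for one run) */
    double flops     = 4.0e13;  /* floating-point operations        */
    double mem_bytes = 6.0e13;  /* bytes moved to/from main memory  */
    double msg_bytes = 2.0e11;  /* bytes sent over the interconnect */
    double msg_count = 5.0e6;   /* number of messages               */

    /* Machine profile (hypothetical sustained rates) */
    double flop_rate   = 8.0e9;   /* flop/s per core                   */
    double mem_bw      = 2.5e9;   /* bytes/s per core, MultiMAPS-style */
    double net_bw      = 1.0e9;   /* bytes/s per task                  */
    double net_latency = 5.0e-6;  /* seconds per message               */

    double compute_time = flops / flop_rate + mem_bytes / mem_bw;
    double comm_time    = msg_count * net_latency + msg_bytes / net_bw;

    printf("predicted time: %.0f s (compute %.0f s, communication %.0f s)\n",
           compute_time + comm_time, compute_time, comm_time);
    return 0;
}
```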

Memory Subsystem Is Key in Predicting Performance

Red Shift – Memory Subsystem Bottleneck

Predicted Compute Time Per Core – HYCOM

MultiMAPS System Profile
- One curve per stride pattern
  – Plateaus correspond to data fitting in cache
  – Drops correspond to data split between cache levels
- MultiMAPS has been ported to C and will be included in the HPC Challenge benchmarks
[Sample MultiMAPS output: memory bandwidth (MB/s) versus working set size (8-byte words); a toy probe in the same spirit is sketched below]
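
The probe below is a minimal toy in the same spirit: it times strided sweeps over working sets of increasing size and reports apparent bandwidth, so plateaus and drops appear as data fits in, or spills out of, each cache level. It is an illustrative sketch only, not the MultiMAPS source; the buffer sizes, strides, and repetition counts are arbitrary choices.

```c
/* Hedged sketch of a MultiMAPS-style probe: apparent bandwidth for strided
 * sweeps over growing working sets. Illustrative only. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    const size_t strides[] = { 1, 2, 4 };  /* in 8-byte words */

    for (size_t ws = (size_t)1 << 12; ws <= (size_t)1 << 23; ws <<= 1) {
        double *buf = malloc(ws * sizeof(double));
        if (!buf) return 1;
        for (size_t i = 0; i < ws; i++) buf[i] = (double)i;

        for (size_t s = 0; s < sizeof strides / sizeof strides[0]; s++) {
            size_t stride = strides[s];
            volatile double sum = 0.0;
            const int reps = 20;

            double t0 = seconds();
            for (int r = 0; r < reps; r++)
                for (size_t i = 0; i < ws; i += stride)
                    sum += buf[i];              /* touch one word per stride */
            double t1 = seconds();

            double bytes = (double)reps * (double)(ws / stride) * sizeof(double);
            printf("ws=%zu words  stride=%zu  %.0f MB/s\n",
                   ws, stride, bytes / (t1 - t0) / 1e6);
        }
        free(buf);
    }
    return 0;
}
```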

Modeling the Effects of Multicore
[Chart: 4-core Woodcrest node, with the L2 cache shared between cores]
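
One hedged way to picture the shared-L2 effect is a toy model in which cores contending for the same L2 each see the cache as if it were proportionally smaller, so their per-core bandwidth curve drops to the next memory level sooner. The cache sizes and bandwidth plateaus below are invented for illustration and are not measured Woodcrest numbers.

```c
/* Hedged toy model: per-core bandwidth when an L2 cache is shared. */
#include <stdio.h>

/* Illustrative single-core bandwidth plateaus (MB/s) by working-set size. */
static double bw_single(double ws_bytes) {
    if (ws_bytes <= 32.0 * 1024)        return 20000.0;  /* fits in L1  */
    if (ws_bytes <= 4.0 * 1024 * 1024)  return 9000.0;   /* fits in L2  */
    return 2500.0;                                       /* main memory */
}

/* Cores sharing the L2 are modeled as each seeing a cache c times smaller,
   i.e. the curve is evaluated at c times the per-core working set. */
static double bw_shared(double ws_bytes, int cores_sharing) {
    return bw_single(ws_bytes * cores_sharing);
}

int main(void) {
    double ws = 3.0 * 1024 * 1024;  /* 3 MB per-core working set */
    printf("1 core using L2:    %.0f MB/s per core\n", bw_shared(ws, 1));
    printf("2 cores sharing L2: %.0f MB/s per core\n", bw_shared(ws, 2));
    return 0;
}
```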

Performance Sensitivity of LAMMPS LRG to 2x Improvements

Performance Sensitivity of OVERFLOW2 STD to 2x Improvements

Performance Sensitivity of OVERFLOW2 LRG to 2x Improvements

Main Memory and L1 Cache Have Most Effect on Runtime
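
The sensitivity charts above ask, in effect, how much the whole application speeds up if one subsystem becomes twice as fast. The sketch below shows that arithmetic on an invented runtime breakdown: halve one component at a time and report the overall speedup. The component times are placeholders, not measured OVERFLOW2 or LAMMPS data.

```c
/* Hedged sketch: whole-application speedup from a 2x improvement in one
 * subsystem at a time, given an assumed runtime breakdown. */
#include <stdio.h>

int main(void) {
    const char *name[] = { "L1 cache", "L2 cache", "main memory",
                           "floating point", "interconnect" };
    double      part[] = { 120.0, 45.0, 210.0, 60.0, 35.0 };  /* seconds */
    const int   n = 5;

    double base = 0.0;
    for (int i = 0; i < n; i++) base += part[i];

    for (int i = 0; i < n; i++) {
        double improved = base - part[i] / 2.0;  /* that subsystem runs 2x faster */
        printf("2x %-14s -> overall speedup %.3f\n", name[i], base / improved);
    }
    return 0;
}
```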

Differences Between Predicted and Measured Benchmark Times (Unsigned)

System                   | AMR Std | AMR Lg | ICEPIC Std | ICEPIC Lg | LAMMPS Lg | OVERFLOW2 Std | OVERFLOW2 Lg | WRF Std | WRF Lg | Overall
ASC HP Opteron Cluster   | 16.6%   | 6.3%   | -          | -         | 2.9%      | 8.0%          | 43.0%        |         |        |
ASC SGI Altix            | 14.1%   | 3.4%   | 22.1%      | 15.6%     | 7.5%      | 4.1%          | 10.0%        | 24.3%   | 16.5%  | 13.1%
MHPCC Dell Xeon Cluster  | 20.7%   | 14.7%  | 6.7%       | 4.2%      | 8.1%      | 23.3%         |              |         |        |
NAVO IBM P5+             | 11.7%   | -      | -          | -         | 9.6%      | 3.0%          | 1.8%         | 7.8%    | 16.4%  | 8.4%
Overall                  |         |        |            |           |           |               |              |         |        | 12.4%

Note: Average uncertainties of measured benchmark times on loaded HPCMP systems are approximately 5%.
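
For context, "unsigned" here means the absolute relative gap between predicted and measured times. The sketch below shows that computation and its average over a few test cases; the sample times are invented, not values from the table.

```c
/* Hedged sketch: unsigned (absolute) relative difference between predicted
 * and measured benchmark times, and its average. Sample times are invented. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double predicted[] = { 410.0, 1280.0, 95.0 };  /* hypothetical seconds */
    double measured[]  = { 455.0, 1210.0, 99.0 };
    const int n = 3;

    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double err = fabs(predicted[i] - measured[i]) / measured[i];
        printf("test case %d: %.1f%%\n", i + 1, 100.0 * err);
        sum += err;
    }
    printf("average unsigned difference: %.1f%%\n", 100.0 * sum / n);
    return 0;
}
```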

What’s Next?
- More focus on signature analysis
- Continue to evolve application benchmarks to accurately represent the HPCMP computational workload
- Increase profiling and performance modeling to better understand application performance
- Use performance predictions to supplement application benchmark measurements and to guide vendors in designing more efficient systems