Are there more appropriate domain-specific performance metrics for science and engineering HPC applications than the canonical "percent of peak"?

Presentation transcript:

Are there more appropriate domain-specific performance metrics for science and engineering HPC applications than the canonical "percent of peak" or "parallel efficiency and scalability"? If so, what are they? Are these metrics driving for weak or strong scaling, or both?
– Value of HPL: Marketing.
– Percentage of peak flops: Bad metric.
– Percentage of peak bandwidth: Better.
– Practical metric: 8X performance improvement over the previous system.
Michael A. Heroux, Sandia National Laboratories
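The bandwidth metric singled out above can be made concrete. A minimal sketch, with illustrative numbers not taken from the talk: estimate the minimum byte traffic of a CSR sparse matrix-vector multiply, then express achieved bandwidth as a fraction of the machine's peak.

```cpp
// Minimum bytes moved by one CSR sparse matrix-vector multiply
// (double-precision values, 32-bit indices, ideal caching of the
// input vector). Illustrative model, not from the talk.
double spmv_bytes(long nrows, long nnz) {
    return 8.0 * nnz          // matrix values
         + 4.0 * nnz          // column indices
         + 4.0 * (nrows + 1)  // row pointers
         + 8.0 * nrows        // read x (assumes perfect reuse)
         + 8.0 * nrows;       // write y
}

// Achieved bandwidth as a percentage of the machine's peak.
double percent_of_peak_bw(double bytes, double seconds, double peak_gb_per_s) {
    double achieved_gb_per_s = bytes / seconds / 1e9;
    return 100.0 * achieved_gb_per_s / peak_gb_per_s;
}
```

With, say, one million rows, 27 nonzeros per row, 3 ms per multiply, and a 160 GB/s peak, this model gives roughly 115 GB/s achieved, about 72% of peak — a far more meaningful number for a memory-bound kernel than its single-digit percent of peak flops.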

Similar to the HPC "disruptive technologies" (memory, power, etc.) thought to be needed in many H/W roadmaps over the next decade to reach exascale, are there looming computational challenges (models, algorithms, S/W) whose resolution will be game-changing or required over the next decade?
Advanced Modeling and Simulation Capabilities: Stability, Uncertainty and Optimization
– Promise: a manyfold increase in parallelism (or more).
– Prerequisite: a high-fidelity "forward" solve.
  – Computing families of solutions to similar problems.
  – Differences in results must be meaningful.
– SPDEs; transient optimization.
[Figure: block structure of the transient optimization system across time steps t0 … tn — lower block bi-diagonal and block tri-diagonal matrices — compared with the size of a single forward problem.]

Advanced Capabilities: Derived Requirements
Large-scale problems present collections of related subproblems. Common theme: increased efficiency.
– Linear solvers: Krylov methods for multiple RHS and related systems.
– Preconditioners: preconditioners for related systems.
– Data structures/communication: substantial graph data reuse.
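The "related subproblems" theme can be sketched with a toy setup-once, solve-many pattern: factor a fixed tridiagonal matrix once (Thomas algorithm), then solve cheaply for each new right-hand side. This is only a stand-in for the Krylov and preconditioner reuse described above, not code from Trilinos.

```cpp
#include <vector>

// Precomputed elimination data for a constant-coefficient tridiagonal
// matrix with sub-diagonal a, diagonal b, super-diagonal c.
struct TriFactor {
    std::vector<double> c_prime, diag;
    double a;
};

// One-time setup cost, amortized over all later solves.
TriFactor setup(int n, double a, double b, double c) {
    TriFactor f{std::vector<double>(n), std::vector<double>(n), a};
    f.diag[0] = b;
    for (int i = 1; i < n; ++i) {
        f.c_prime[i - 1] = c / f.diag[i - 1];
        f.diag[i] = b - a * f.c_prime[i - 1];
    }
    return f;
}

// Each additional right-hand side costs only O(n) work.
std::vector<double> solve(const TriFactor& f, std::vector<double> rhs) {
    int n = static_cast<int>(rhs.size());
    for (int i = 1; i < n; ++i)                 // forward elimination
        rhs[i] -= f.a / f.diag[i - 1] * rhs[i - 1];
    std::vector<double> x(n);
    x[n - 1] = rhs[n - 1] / f.diag[n - 1];
    for (int i = n - 2; i >= 0; --i)            // back substitution
        x[i] = rhs[i] / f.diag[i] - f.c_prime[i] * x[i + 1];
    return x;
}
```

The same division of labor — expensive shared setup, cheap per-instance solves — is what block Krylov methods and reusable preconditioners exploit at scale.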

What is the role of local (node-based) floating-point accelerators (e.g., Cell, GPUs, etc.) for key science and engineering applications in the next 3-5 years? Is there unexploited or unrealized concurrency in the applications you are familiar with? If so, what and where is it?
– A single GTX285 ≈ a dual-socket quad-core node.
  – Sparse, unstructured, implicit.
  – GPU data: device-resident (not realistic).
– Unexploited parallelism? Yes, a lot. Two-level parallelism is required to extract it all.
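The second, node-level layer of the two-level parallelism mentioned above can be illustrated with a dot product: a threaded reduction that each MPI rank would run locally before the global reduction. A hypothetical sketch, not the code behind the GTX285 comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Node-level (second-level) parallel dot product: partition the local
// range across threads, accumulate per-thread partial sums, then
// combine. In a flat-MPI code this concurrency goes unexploited.
double node_dot(const double* x, const double* y, std::size_t n) {
    unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            std::size_t lo = n * t / nthreads;
            std::size_t hi = n * (t + 1) / nthreads;
            for (std::size_t i = lo; i < hi; ++i)
                partial[t] += x[i] * y[i];   // each thread owns one slot
        });
    }
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

An MPI+manycore code would call something like this per rank and finish with an `MPI_Allreduce` across ranks — the outer level of the two.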

Should applications continue with current programming models and paradigms?
– Fortran, C, C++: Yes.
– PGAS: Not needed in my app space. Might use if MPI-compatible and commercially available.
– Flat MPI: Yes, for a while.
– MPI+Manycore: Required for the future, useful now. We are already moving in this direction.
  – Trilinos/Kokkos: Trilinos compute node package.
  – A generic Node object defines:
    – Memory structures for parallel buffers.
    – Parallel computation routines (e.g., parallel_for, parallel_reduce).
  – This is an API, not yet-another-ad-hoc parallel programming model. (We don't need any more!)
  – Needed because there is no portable user-level node programming model (my #1 need).
[Diagram: Kokkos::Node, with implementations Kokkos::SerialNode, Kokkos::CUDANode, Kokkos::TBBNode, …]
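The generic Node idea above can be sketched as a compile-time policy: algorithms are written once against parallel_for / parallel_reduce, and a node type supplies the implementation (serial here; a TBB or CUDA node would plug into the same interface). Names and signatures are illustrative, not the actual Kokkos API.

```cpp
#include <cstddef>

// Hypothetical serial node: one possible implementation of the
// parallel_for / parallel_reduce interface a Node type must provide.
struct SerialNode {
    template <class Body>
    static void parallel_for(std::size_t n, Body body) {
        for (std::size_t i = 0; i < n; ++i) body(i);
    }
    template <class Body>
    static double parallel_reduce(std::size_t n, Body body) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += body(i);
        return sum;
    }
};

// User code is node-agnostic: written once, run on any Node type.
template <class Node>
void axpy(std::size_t n, double a, const double* x, double* y) {
    Node::parallel_for(n, [=](std::size_t i) { y[i] += a * x[i]; });
}

template <class Node>
double dot(std::size_t n, const double* x, const double* y) {
    return Node::parallel_reduce(n, [=](std::size_t i) { return x[i] * y[i]; });
}
```

Swapping the node type (e.g., a hypothetical `TBBNode` or `CUDANode` with the same two static templates) retargets `axpy` and `dot` without touching the algorithm — the point of an API rather than yet another programming model.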