Presented by
Performance Evaluation and Analysis Consortium (PEAC) End Station
Patrick H. Worley
Computational Earth Sciences Group
Computer Science and Mathematics Division

Overview
The PEAC End Station provides the performance evaluation and performance tool developer communities access to the Leadership Computing Facility (LCF) systems.
Consortium goals
- System evaluation: Evaluate the performance of LCF systems using standard and custom micro-, kernel, and application benchmarks
- Performance tools: Port performance tools to LCF systems and make them available to National Center for Computational Sciences (NCCS) users; further develop the tools to take into account the scale and unique features of LCF systems
- Performance modeling: Validate the effectiveness of performance modeling methodologies; modify methodologies as necessary to improve their utility for predicting resource requirements for production runs on LCF systems

Overview (continued)
Consortium goals (continued)
- Application analysis and optimization: Analyze performance and help optimize current and candidate LCF application codes
- Performance and application community support: Provide access to other performance researchers who are interested in contributing to the performance evaluation of the LCF systems or in porting complementary performance tools of use to the NCCS user community; provide access to application developers who wish to evaluate the performance of their codes on LCF systems
All of this must be accomplished while adhering to the “Golden Rules” of the performance community:
- Low visibility (no production runs!)
- Open and fair evaluations
- Timely reporting of results

Status as of 8/28/07
32 active users, 39 active projects:
- 13 application performance analysis and optimization
- 8 system evaluation
- 8 tool development
- 6 infrastructure development
- 4 application modeling
Consuming:
- XT4: 1,168,000 processor hours (exceeding the 1,000,000 processor-hour allocation)
Contributing to:
- 1 refereed journal paper
- 1 invited journal paper
- 6 refereed proceedings papers
- 10 proceedings papers
- 2 book chapters
- Numerous oral presentations

System evaluation
- LBNL: Memory, interprocess communication, and I/O benchmarks (a simple example follows below); APEX-MAP system characterization benchmark; Lattice-Boltzmann kernels and mini-applications; application benchmarks from Astrophysics (Cactus), Fluid Dynamics (ELBM3D), High Energy Physics (BeamBeam3D, MILC), Fusion (GTC), Materials Science (PARATEC), and AMR Gas Dynamics (HyperCLaw)
- ORNL: Computation, memory, interprocess communication, and I/O benchmarks; application benchmarks from Astrophysics (Chimera), Climate (CAM, CLM, FMS, POP), Combustion (S3D), Fusion (AORSA, GTC, GYRO, XGC), and Molecular Dynamics (NAMD)
- SDSC: Subsystem probes for the system characterization needed for convolution-based performance modeling
- Purdue Univ.: Computation, memory, and interprocess communication benchmarks; application benchmarks from Chemistry (GAMESS), High Energy Physics (MILC), Seismic Processing (SEISMIC), and Weather (WRF)
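As a point of reference for the benchmark categories above, the following is a minimal sketch of a STREAM-style triad memory-bandwidth test in C. It is illustrative only: the array size, repetition count, and timing approach are placeholder choices made here, and it is not one of the LBNL, ORNL, SDSC, or Purdue benchmark suites.

```c
/* Hedged sketch of a memory-bandwidth microbenchmark: a STREAM-style triad,
   a[i] = b[i] + s*c[i].  Array size and repetition count are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (1L << 24)   /* elements per array (placeholder, ~128 MiB each) */
#define NTRIES 10           /* repetitions; report the best time */

static double elapsed(struct timespec t0, struct timespec t1)
{
    return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    double best = 1e30, s = 3.0;

    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (int t = 0; t < NTRIES; t++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];          /* triad kernel */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = elapsed(t0, t1);
        if (dt < best) best = dt;
    }

    /* 24 bytes of traffic per element: read b and c, write a. */
    printf("best triad bandwidth: %.2f GB/s (a[0] = %g)\n",
           24.0 * (double)N / best / 1e9, a[0]);

    free(a); free(b); free(c);
    return 0;
}
```

Production memory benchmarks additionally control for compiler optimization, thread placement, and NUMA effects, which this sketch ignores.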

Performance tools
- HPCToolkit: Tool suite for profile-based performance analysis
- Modeling assertions: Performance model specification and verification framework
- mpiP: MPI profiling infrastructure
- PAPI: Performance data collection infrastructure (see the sketch below)
- Scalasca: Scalable trace collection and analysis tool
- SvPablo: Performance analysis system
- TAU: Performance analysis system
- MRNet: Scalable performance tool infrastructure
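Several of the tools listed above build on low-level measurement libraries such as PAPI. The sketch below outlines how an application region might be instrumented with PAPI preset counters; it assumes the PAPI_TOT_CYC and PAPI_FP_OPS presets are available on the target system (availability varies by platform), omits most error handling, and is not taken from any of the listed projects.

```c
/* Hedged sketch of PAPI preset-counter instrumentation (illustrative only;
   presets such as PAPI_FP_OPS are not available on every platform, and
   most error handling is omitted for brevity). */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }

    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles */
    PAPI_add_event(eventset, PAPI_FP_OPS);    /* floating-point operations */

    PAPI_start(eventset);
    /* ... region of interest: the application kernel would run here ... */
    PAPI_stop(eventset, counts);

    printf("cycles = %lld, fp ops = %lld\n", counts[0], counts[1]);
    return EXIT_SUCCESS;
}
```

Higher-level tools such as TAU and HPCToolkit typically drive this kind of counter access on the user's behalf rather than requiring manual instrumentation.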

Application performance analysis and optimization
- Chombo: AMR gas dynamics model
- DeCart: Nuclear code
- FACETS: Framework application for core-edge transport simulation
- GADGET: Computational cosmology
- GTC_s: Shaped-plasma version of the GTC gyrokinetic turbulence code
- NEWTRNX: Neutron transport code
- PDNS3D/SBLI: Ab initio aeroacoustic simulations of jet and airfoil flows
- PFLOTRAN: Subsurface flow model
- PNEWT: Combustion code

Application code scaling, optimization, and/or performance evaluation
- POLCOMS: Coastal ocean model
- S3D: Combustion model
- TDCC-9d: Nuclear code
- Lattice-Boltzmann applications

System infrastructure
- cafc: Co-array Fortran compiler for distributed-memory systems
- GASNet: Runtime networking layer for UPC and Titanium compilers
- PETSc: Toolset for the numerical solution of PDEs
- PVFS/Portals: PVFS file system implementation on the native Portals interface
- UPC: Extension of C designed for high-performance computing on large-scale parallel systems
- Reduction-based communication library

Performance modeling
- PMAC: Genetic algorithm-based modeling of memory-bound computations
- ORNL: NAS parallel benchmarks; HYCOM ocean code
- Texas A&M Univ.: GTC fusion code
- Univ. of Wisconsin: Reusable analytic model for wavefront algorithms, applied to NPB-LU, SWEEP3D, and Chimaera; LogGP model for MPI communication on the XT4 (see the sketch below)
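For context, LogGP (as referenced above) models point-to-point message cost in terms of network latency L, per-message send/receive overhead o, and per-byte gap G (together with the per-message gap g and processor count P, which the expression below does not use). The following sketch encodes the standard LogGP cost expression with placeholder parameter values; it is illustrative only and does not reproduce the Wisconsin model or measured XT4 parameters.

```c
/* Hedged sketch of the standard LogGP point-to-point cost expression.
   The parameter values below are placeholders, not measured XT4 numbers. */
#include <stdio.h>

/* Predicted one-way time for a k-byte message under LogGP:
   sender overhead + per-byte gap for the remaining bytes + wire latency
   + receiver overhead. */
static double loggp_time(double L, double o, double G, long k)
{
    return o + (k - 1) * G + L + o;
}

int main(void)
{
    /* Placeholder parameters (seconds and seconds/byte); real use would
       fit L, o, and G to microbenchmark measurements on the target system. */
    double L = 5.0e-6, o = 1.0e-6, G = 5.0e-10;

    for (long k = 8; k <= (1L << 20); k *= 16)
        printf("k = %7ld bytes  predicted time = %.3e s\n",
               k, loggp_time(L, o, G, k));
    return 0;
}
```

In practice the parameters are fit to ping-pong and message-rate microbenchmarks on the target system before the model is used for prediction.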

Subsystem evaluations
- I/O performance characterization (LBL)
- Dual- vs. single-core performance evaluation using APEX-MAP (LBL)
- Identifying performance anomalies (ANL)
- MPI performance characterization (ORNL): see the measurement sketch below
[Table: ratio of time for all processes sending in a halo update to time for a single sender, for 4-neighbor and 8-neighbor periodic exchanges on BG/L, BG/L (VN mode), XT, and XT4 (SN mode)]
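To make the ratio above concrete, the sketch below times a 4-neighbor periodic halo exchange with MPI when every rank sends versus when only rank 0 sends, and reports the ratio of the two. It is a minimal illustration rather than the ORNL benchmark itself; the message size, iteration count, and 2-D decomposition are placeholder choices.

```c
/* Hedged sketch of a halo-update contention measurement: time a 4-neighbor
   periodic halo exchange when every rank sends, versus when only rank 0
   sends.  Sizes and counts are placeholders, not benchmark settings. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBYTES (64 * 1024)   /* bytes per halo message (placeholder) */
#define NITER  100           /* timed repetitions (placeholder) */

/* One halo exchange: this rank sends iff 'send' is set; it posts a receive
   for neighbor i only if that neighbor is sending (nbr_sends[i]). */
static void halo_exchange(MPI_Comm cart, const int nbr[4], int send,
                          const int nbr_sends[4], char *sbuf, char *rbuf)
{
    MPI_Request req[8];
    int n = 0;
    for (int i = 0; i < 4; i++) {
        if (nbr_sends[i])
            MPI_Irecv(rbuf + i * NBYTES, NBYTES, MPI_CHAR, nbr[i], 0, cart,
                      &req[n++]);
        if (send)
            MPI_Isend(sbuf + i * NBYTES, NBYTES, MPI_CHAR, nbr[i], 0, cart,
                      &req[n++]);
    }
    MPI_Waitall(n, req, MPI_STATUSES_IGNORE);
}

/* Return the maximum per-exchange time across ranks (valid on rank 0). */
static double time_exchange(MPI_Comm cart, const int nbr[4], int send,
                            const int nbr_sends[4], char *sbuf, char *rbuf)
{
    double tmax = 0.0;
    MPI_Barrier(cart);
    double t0 = MPI_Wtime();
    for (int it = 0; it < NITER; it++)
        halo_exchange(cart, nbr, send, nbr_sends, sbuf, rbuf);
    double t = MPI_Wtime() - t0;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, cart);
    return tmax / NITER;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Periodic 2-D process grid; each rank has 4 neighbors. */
    int dims[2] = {0, 0}, periods[2] = {1, 1};
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
    MPI_Comm_rank(cart, &rank);

    int nbr[4];
    MPI_Cart_shift(cart, 0, 1, &nbr[0], &nbr[1]);   /* -x and +x neighbors */
    MPI_Cart_shift(cart, 1, 1, &nbr[2], &nbr[3]);   /* -y and +y neighbors */

    char *sbuf = malloc(4 * NBYTES), *rbuf = malloc(4 * NBYTES);
    memset(sbuf, 1, 4 * NBYTES);

    /* Case 1: all ranks send to their neighbors simultaneously. */
    int all_sends[4] = {1, 1, 1, 1};
    double t_all = time_exchange(cart, nbr, 1, all_sends, sbuf, rbuf);

    /* Case 2: only rank 0 sends; ranks receive only from neighbors that
       are rank 0. */
    int solo_sends[4];
    for (int i = 0; i < 4; i++)
        solo_sends[i] = (nbr[i] == 0);
    double t_one = time_exchange(cart, nbr, rank == 0, solo_sends, sbuf, rbuf);

    if (rank == 0)
        printf("all senders: %.3e s  single sender: %.3e s  ratio: %.2f\n",
               t_all, t_one, t_all / t_one);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}
```

A ratio near 1 suggests that concurrent halo traffic sees little contention, while larger ratios point to shared network or injection-bandwidth bottlenecks.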

Application analyses and benchmarks
- Performance sensitivities (SDSC)
- Scalability optimizations (ORNL)
- Porting and optimizing new applications (RENCI/NCSA): processing of genomes into domain maps needs improved load balancing that takes into account the scale-free nature of the graphs

Tool development
- SvPablo source-code-correlated performance analysis (RENCI)
- SCALASCA trace-based performance analysis (FZ-Jülich, UTenn)
- mpiP callsite profiling (LLNL/ORNL)

Co-principal investigators
- David Bailey, Lawrence Berkeley National Laboratory
- Leonid Oliker, Lawrence Berkeley National Laboratory
- William Gropp, Argonne National Laboratory
- Jeffrey Vetter, Oak Ridge National Laboratory
- Patrick Worley (PI), Oak Ridge National Laboratory
- Bronis de Supinski, Lawrence Livermore National Laboratory
- Allan Snavely, University of California–San Diego
- Daniel Reed, University of North Carolina
- Jeffrey Hollingsworth, University of Maryland
- John Mellor-Crummey, Rice University
- Kathy Yelick, University of California–Berkeley
- Jack Dongarra, University of Tennessee
- Barton Miller, University of Wisconsin
- Allen Malony, University of Oregon

Contact
Patrick H. Worley
Computational Earth Sciences Group
Computer Science and Mathematics Division
(865)

Barbara Helland
DOE Program Manager
Office of Advanced Scientific Computing Research
DOE Office of Science