Presentation transcript:

Performance Engineering Research Institute (PERI)
Presented by Patrick H. Worley, Computational Earth Sciences Group, Computer Science and Mathematics Division

Slide 2: Performance engineering: Enabling petascale science

Petascale computing is about delivering performance to scientists.

Maximizing performance is getting harder:
- Systems are more complicated: O(100K) processors, multi-core with SIMD extensions.
- Scientific software is more complicated: multi-disciplinary and multi-scale.

PERI addresses this challenge in three ways:
- Model and predict application performance.
- Assist SciDAC scientific code projects with performance analysis and tuning.
- Investigate novel strategies for automatic performance tuning.

[Images: Cray XT4 at ORNL; BeamBeam3D accelerator modeling; POP model of El Niño; IBM BlueGene at LLNL]

Slide 3: SciDAC-1 Performance Evaluation Research Center (PERC), 2001–2006

Initial goals:
- Develop performance-related tools and methodologies for benchmarking, analysis, modeling, and optimization.

Second phase:
- In the last two years, added emphasis on optimizing the performance of SciDAC applications, including the Community Climate System Model, the Plasma Microturbulence Project (GYRO, GS2), and the Omega3P accelerator model.

Slide 4: Some lessons learned

- Performance portability is critical:
  - Codes outlive computing systems.
  - Scientists can't publish that they ported and optimized code.
- Most computational scientists are not interested in performance tools:
  - They want performance experts to work with them.
  - Such experts are not "scalable": they are a limited resource and introduce yet another bottleneck in optimizing code.

Slide 5: SciDAC-2 Performance Engineering Research Institute (PERI)

- Providing near-term impact on the performance optimization of SciDAC applications.
- Providing guidance in automatic tuning.
- Pursuing the long-term research goal of improving performance portability.
- Evaluating architectures and algorithms in preparation for the move to petascale.
- Relieving scientific programmers of the performance optimization burden.
- Informing long-term automated tuning efforts.

Slide 6: Engaging SciDAC software developers

Application engagement, application liaisons, and tiger teams:
- Work directly with DOE computational scientists.
- Ensure successful performance porting of scientific software.
- Focus PERI research on real problems.
- Build long-term personal relationships between PERI researchers and scientific code teams.
- Focus on DOE's highest priorities: SciDAC-2, INCITE, and JOULE.

[Figure: Community Atmosphere Model performance evolution on an IBM p690 cluster, T42L26 benchmark. Simulation years per day vs. number of processors for successive configurations: V2.0 with original (2001) settings; V2.0, load balanced, MPI-only; V2.0, load balanced, MPI/OpenMP; load balanced, MPI/OpenMP with improved dynamics and land; load balanced, MPI/OpenMP with improved dynamics, land, and physics. Themes: optimizing arithmetic kernels; maximizing scientific throughput.]
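The MPI/OpenMP configurations in the figure above refer to hybrid parallelization: MPI ranks are distributed across nodes while OpenMP threads share work within each rank. The following is a minimal, generic sketch of that pattern, not CAM code; the problem size N and the kernel body are illustrative placeholders.

```c
/* Minimal hybrid MPI/OpenMP sketch (illustrative only, not CAM code).
 * Compile with, e.g.:  mpicc -fopenmp hybrid.c -o hybrid              */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000L  /* illustrative problem size */

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Block-decompose the iteration space across MPI ranks. */
    long lo = (long)rank * N / nranks;
    long hi = (long)(rank + 1) * N / nranks;

    double local = 0.0;

    /* Threads within a rank share that rank's block of iterations. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++) {
        double x = (double)i / N;
        local += x * x;          /* stand-in for real physics work */
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads=%d sum=%f\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```

In the CAM results above, this kind of hybridization was combined with load balancing and algorithmic improvements to the dynamics, land, and physics components.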

Slide 7: FY 2007 application engagement activities

Application survey:
- Collect and maintain data on SciDAC-2 and DOE INCITE code characteristics and performance requirements.
- Use the data to determine efficient allocation of PERI engagement resources and to provide direction for PERI research.
- Provide DOE with data on the SciDAC-2 code portfolio.

Application liaisons:
- Active engagement (identifying and addressing significant performance issues) with five SciDAC-2 projects and one INCITE project, drawn from accelerator, fusion, materials, groundwater, and nanoscience.
- Passive engagement (tracking performance needs and providing advice as requested) with an additional eight SciDAC-2 projects.

Tiger teams:
- Working with the S3D (combustion) and GTC (fusion) code teams to achieve 2007 JOULE report computer performance goals.
- Tiger Team members are drawn from across the PERI collaboration, currently involving six of the ten PERI institutions.

Slide 8: Performance modeling

Modeling is critical for automation of tuning:
- Guidance to the developer: new algorithms, systems, etc.
- Need to know where to focus effort: where are the bottlenecks?
- Need to know when we are done: how fast should we expect to go?
- Predictions for new systems.

Modeling efforts also contribute to procurements and other activities beyond PERI automatic tuning.

Recent improvements:
- Reduced human/system cost:
  - Genetic algorithms now "learn" application response to system parameters (see the sketch below).
  - Application tracing sped up and storage requirements reduced by three orders of magnitude.
- Greater accuracy: S3D (combustion), AVUS (CFD), HYCOM (ocean), and Overflow (CFD) codes modeled within 10% average error.
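As a rough illustration of how a genetic algorithm can "learn" an application's response to tuning parameters, the sketch below evolves a small population of (tile size, unroll factor) pairs against a synthetic cost function. In practice the fitness would come from measured or modeled application runtime; the cost function, parameter ranges, and GA settings here are hypothetical stand-ins.

```c
/* Toy genetic algorithm over two tuning parameters (tile size, unroll
 * factor).  The fitness function is a synthetic stand-in; a real tuner
 * would use measured or modeled application runtime instead.           */
#include <stdio.h>
#include <stdlib.h>

#define POP   16     /* population size       */
#define GENS  30     /* number of generations */

typedef struct { int tile; int unroll; double cost; } Individual;

/* Hypothetical cost model: pretend the optimum is tile=64, unroll=4. */
static double fitness(int tile, int unroll)
{
    double dt = (tile - 64) / 64.0;
    double du = (unroll - 4) / 4.0;
    return 1.0 + dt * dt + du * du;   /* lower is better */
}

static int rand_range(int lo, int hi)   /* inclusive range */
{
    return lo + rand() % (hi - lo + 1);
}

static int cmp_cost(const void *a, const void *b)
{
    double d = ((const Individual *)a)->cost - ((const Individual *)b)->cost;
    return (d > 0) - (d < 0);
}

int main(void)
{
    Individual pop[POP];
    srand(12345);

    /* Random initial population. */
    for (int i = 0; i < POP; i++) {
        pop[i].tile   = rand_range(8, 256);
        pop[i].unroll = rand_range(1, 16);
        pop[i].cost   = fitness(pop[i].tile, pop[i].unroll);
    }

    for (int g = 0; g < GENS; g++) {
        /* Selection: sort by cost and keep the better half. */
        qsort(pop, POP, sizeof(Individual), cmp_cost);

        /* Crossover + mutation: refill the worse half from fit parents. */
        for (int i = POP / 2; i < POP; i++) {
            const Individual *p1 = &pop[rand_range(0, POP / 2 - 1)];
            const Individual *p2 = &pop[rand_range(0, POP / 2 - 1)];
            pop[i].tile   = p1->tile;
            pop[i].unroll = p2->unroll;
            if (rand() % 4 == 0) pop[i].tile   = rand_range(8, 256);
            if (rand() % 4 == 0) pop[i].unroll = rand_range(1, 16);
            pop[i].cost = fitness(pop[i].tile, pop[i].unroll);
        }
    }

    qsort(pop, POP, sizeof(Individual), cmp_cost);
    printf("best: tile=%d unroll=%d cost=%.3f\n",
           pop[0].tile, pop[0].unroll, pop[0].cost);
    return 0;
}
```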

Slide 9: Automatic performance tuning of scientific code

Long-term goals for PERI:
- Obtain hand-tuned performance from automatically generated code for scientific applications:
  - General loop nests.
  - Key application kernels.
- Reduce the performance portability challenge facing computational scientists:
  - Adapt quickly to new architectures.
  - Integrate compiler-based and empirical search tools into a framework accessible to application developers.
  - Runtime adaptation of performance-critical parameters (see the sketch below).
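To give a feel for runtime adaptation of a performance-critical parameter, here is a toy sketch: during a long run, the code periodically re-times its current block-size setting against an alternative and keeps whichever is faster. The kernel, candidate values, and adaptation interval are illustrative placeholders, not part of any PERI tool.

```c
/* Toy runtime adaptation: every ADAPT_EVERY iterations, time the current
 * block size against a randomly chosen alternative and keep the faster
 * setting.  Entirely illustrative.                                      */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)
#define ITERS 200
#define ADAPT_EVERY 20

static double data[N];

/* Stand-in kernel whose cost may depend on a block-size parameter. */
static double sweep(int block)
{
    double s = 0.0;
    for (int lo = 0; lo < N; lo += block)
        for (int i = lo; i < lo + block && i < N; i++)
            s += data[i] * 1.000001;
    return s;
}

static double timed_sweep(int block, double *out)
{
    clock_t t0 = clock();
    *out = sweep(block);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    int candidates[] = { 256, 1024, 4096, 16384 };
    int current = candidates[0];
    double sink = 0.0;

    for (int i = 0; i < N; i++) data[i] = 1.0;
    srand(42);

    for (int it = 0; it < ITERS; it++) {
        double out;
        double t = timed_sweep(current, &out);
        sink += out;

        if (it % ADAPT_EVERY == 0) {
            /* Try one alternative setting and adopt it if it is faster. */
            int trial = candidates[rand() % 4];
            double tt = timed_sweep(trial, &out);
            sink += out;
            if (tt < t) {
                current = trial;
                printf("iter %3d: switching block size to %d\n", it, current);
            }
        }
    }
    printf("final block size %d, checksum %.1f\n", current, sink);
    return 0;
}
```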

Slide 10: Automatic tuning flowchart

1. Triage: where to focus effort.
2. Semantic analysis: traditional compiler analysis.
3. Transformation: code restructuring.
4. Code generation: domain-specific code.
5. Code selection: modeling and empirical search.
6. Assembly: choose the best components.
7. Training runs: performance data for feedback.
8. Runtime adaptation: optimize long-running jobs.

[Flowchart: a pipeline from source code through triage, analysis, transformations, code generation, code selection, application assembly, training runs, and production execution with runtime adaptation, backed by a persistent database. Additional inputs include guidance (measurements, models, hardware information, sample input, annotations, assertions), runtime performance data, domain-specific code generation, and external software.]
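To make steps 5 and 6 concrete, here is a minimal sketch of code selection by empirical search: several variants of the same kernel (a dot product with and without manual unrolling) are timed on a sample input, and the fastest is chosen. A PERI-style tuner would generate the variants automatically and persist the results; the kernel and variants here are illustrative.

```c
/* Minimal empirical code selection: time several variants of the same
 * kernel on a sample input and pick the fastest (illustrative only).   */
#include <stdio.h>
#include <time.h>

#define N (1 << 20)
static double x[N], y[N];

static double dot_plain(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

static double dot_unroll4(const double *a, const double *b, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++) s0 += a[i] * b[i];
    return s0 + s1 + s2 + s3;
}

typedef double (*kernel_fn)(const double *, const double *, int);

int main(void)
{
    kernel_fn variants[] = { dot_plain, dot_unroll4 };
    const char *names[]  = { "plain", "unroll4" };
    int nvar = 2, best = 0;
    double best_time = 1e30;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* "Training run": time each variant and remember the fastest. */
    for (int v = 0; v < nvar; v++) {
        clock_t t0 = clock();
        volatile double s = variants[v](x, y, N);   /* keep result live */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%-8s %.6f s (result %.1f)\n", names[v], secs, s);
        if (secs < best_time) { best_time = secs; best = v; }
    }
    printf("selected variant: %s\n", names[best]);
    return 0;
}
```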

Slide 11: Model-guided empirical optimization

[Diagram: In phase 1, the application code and an architecture specification are processed by analysis/models and transformation modules to perform code variant generation, producing a set of parameterized code variants plus constraints on unbound parameters. In phase 2, these variants, together with a representative input data set, drive a search engine with performance monitoring support in the execution environment, producing the optimized code.]
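A small concrete analogue of the two phases, under illustrative assumptions: phase 1 yields a parameterized variant (a tiled matrix multiply whose tile size T is left unbound, subject to a constraint such as 1 <= T <= N), and phase 2 searches over candidate values of T on a representative input, timing each. The kernel, matrix size, and candidate list below are placeholders, not output of any PERI tool.

```c
/* Phase 1 analogue: a parameterized code variant (tiled matmul with tile
 * size T unbound, constraint 1 <= T <= N).  Phase 2 analogue: empirical
 * search over candidate T values on a representative input.            */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N 512                      /* illustrative matrix size */
static double A[N][N], B[N][N], C[N][N];

/* Parameterized variant: tile size T is the unbound parameter. */
static void matmul_tiled(int T)
{
    memset(C, 0, sizeof(C));
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T && i < N; i++)
                    for (int k = kk; k < kk + T && k < N; k++) {
                        double aik = A[i][k];
                        for (int j = jj; j < jj + T && j < N; j++)
                            C[i][j] += aik * B[k][j];
                    }
}

int main(void)
{
    /* Representative input data set (synthetic here). */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 0.5; }

    /* Candidate tile sizes satisfying the constraint 1 <= T <= N. */
    int candidates[] = { 8, 16, 32, 64, 128 };
    int best_T = candidates[0];
    double best_time = 1e30;

    for (int c = 0; c < 5; c++) {
        int T = candidates[c];
        clock_t t0 = clock();
        matmul_tiled(T);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("T=%-4d %.3f s\n", T, secs);
        if (secs < best_time) { best_time = secs; best_T = T; }
    }
    printf("optimized variant uses tile size T=%d\n", best_T);
    return 0;
}
```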

Slide 12: The team

- Lawrence Berkeley National Laboratory: David Bailey, Daniel Gunter, Katherine Yelick
- Argonne National Laboratory: Paul Hovland, Dinesh Kaushik, Boyana Norris
- Oak Ridge National Laboratory: Sadaf Alam, G. Mahinthakumar, Philip Roth, Jeffrey Vetter, Patrick Worley
- Lawrence Livermore National Laboratory: Bronis de Supinski, Daniel Quinlan
- University of California, San Diego: Allan Snavely, Laura Carrington
- Rice University: John Mellor-Crummey
- University of Southern California: Jacqueline Chame, Mary Hall, Robert Lucas (P.I.)
- University of North Carolina: Rob Fowler, Daniel Reed, Ying Zhang
- University of Tennessee: Jack Dongarra, Shirley Moore, Daniel Terpstra
- University of Maryland: Jeffrey Hollingsworth

Slide 13: Contacts

Patrick H. Worley
Computational Earth Sciences Group
Computer Science and Mathematics Division
(865)

Fred Johnson
DOE Program Manager
Office of Advanced Scientific Computing Research
DOE Office of Science