Breakout Group: Debugging
David E. Skinner and Wolfgang E. Nagel
IESP Workshop 3, October, Tsukuba, Japan


Exascale Debugging

Debugging: finding problems in the execution of code.

Identifying and dealing with sources of:
– incorrectness (application and architecture)
– application failure (deadlock, hang, segfault)
– critical application bottlenecks (standstill, performance cliff)

Exascale issues
– Concurrency expense of debugging
– Scalability of debugger methodologies (data and interfaces)
– Concurrency scaling of the frequency of errors/failures
– Heterogeneity and lightweight OS

2009/05/21

Exascale Trends relevant to debugging

To which broad exascale trends is debugging related?
– Concurrency ✓
– Reliability ✓
– Power Costs
– Heterogeneity in a node ✓
– I/O and memory: ratios and breakthroughs

What’s different about exascale debugging?

– The assumption that many things may/will go wrong at the same time will require triage, filtering, and clustering of faults and problems
– Focus on multi-level debugging, communicating details of faults between software layers
– Synthesis of fault information into understanding in the context of application and architecture
– Simulation of concurrency when possible
– Excision of buggy code snippets to run at lower concurrencies
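The clustering of faults mentioned on this slide can be illustrated with a small sketch. It assumes each process reports its state as a call stack (a list of function names) and groups ranks whose stacks are identical, so that a handful of equivalence classes can be inspected instead of every process. The function name `cluster_by_state` and the sample data are hypothetical and not taken from the presentation.

```python
from collections import defaultdict


def cluster_by_state(states):
    """Group ranks whose reported call stacks are identical.

    `states` maps a rank (int) to a call stack (list of function names).
    Returns (stack, ranks) pairs, smallest cluster first, because a
    small group of outliers is often the first thing to triage.
    """
    clusters = defaultdict(list)
    for rank, stack in states.items():
        clusters[tuple(stack)].append(rank)
    return sorted(clusters.items(), key=lambda kv: (len(kv[1]), kv[0]))


# Hypothetical snapshot of four ranks; rank 2 is somewhere unusual.
states = {
    0: ["main", "solve", "MPI_Allreduce"],
    1: ["main", "solve", "MPI_Allreduce"],
    2: ["main", "solve", "write_checkpoint"],
    3: ["main", "solve", "MPI_Allreduce"],
}

for stack, ranks in cluster_by_state(states):
    print(len(ranks), "rank(s)", ranks, "in", " > ".join(stack))
```

Exact matching of whole stacks is the simplest possible grouping key; a real tool would likely need a more scalable, hierarchical aggregation than collecting every state in one place.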

Debugging

Priority Research Direction (use one slide for each)

Key challenges
– Vertical integration of debug and performance information across software layers
– Layered contexts of debugging (just MPI, just I/O, or framework/application defined)
– Scalable clustering of application process states and contexts
– Filter/search within debugger
– Automatically triggered debugging
– Basic challenge of concurrency (hard & $$)
– Interoperability with compiler, library, runtime, OS and I/O
– Debugging without stopping (resilient analysis of victim processes)
– More eyes on debug information besides the person running the debugger
– Multi-layered debug histories become available/useful to system-wide monitoring
– Debugging meets performance analysis
– Debugging informs system software
– Lowering overhead and barriers to debugging at large scale
– Debuggers begin to communicate user level metrics, debugging becomes more meaningful
– Greater certainty in scientific validity of exascale’s computational results. Trust.

Summary of research direction
Potential impact on software component
Potential impact on usability, capability, and breadth of community
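One way to picture "automatically triggered debugging" together with "debugging without stopping" is a monitor that watches per-rank progress and collects debug information only for ranks that appear stalled, leaving the rest of the job running. The sketch below assumes each rank periodically records a heartbeat timestamp; `stalled_ranks`, `maybe_trigger`, and the `collect` callback are hypothetical names for illustration, not an interface described in the presentation.

```python
def stalled_ranks(last_heartbeat, now, timeout):
    """Return the ranks whose last heartbeat is older than `timeout` seconds."""
    return sorted(r for r, t in last_heartbeat.items() if now - t > timeout)


def maybe_trigger(last_heartbeat, now, timeout, collect):
    """Call `collect(ranks)` for stalled ranks only; healthy ranks are untouched.

    `collect` stands in for whatever gathers debug state (for example a
    stack snapshot) from the affected processes.
    """
    stalled = stalled_ranks(last_heartbeat, now, timeout)
    if stalled:
        collect(stalled)
    return stalled


# Hypothetical heartbeat times in seconds; rank 2 has been silent for a while.
heartbeats = {0: 100.0, 1: 99.5, 2: 40.0}
captured = []
triggered = maybe_trigger(heartbeats, now=101.0, timeout=30.0,
                          collect=captured.append)
print("triggered for ranks:", triggered)
```

A fixed timeout is the crudest possible trigger; it is used here only to keep the sketch short.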

Roadmap for exascale debugging

[Figure: roadmap graph. Recovered labels: Planning & Workshops; 1e5 cores; 1e6 cores; Breakthroughs needed for 1e6 core production debug; Near-production exascale; Scale of debugging.]

4.x Debugging narrative

– Technology drivers
– Alternative R&D strategies
– Recommended research agenda
– Crosscutting considerations

Roadmap sections on debugging tools

– Technology drivers for Debugging
– Alternative R&D strategies for Debugging
– Recommended research agenda for Debugging
+ Identify cross-cutting considerations and connections (compilers, resiliency and performance)
+ Identify key regional interests, expertise, and resources

State of the art

– Debuggers scale to 10K procs
– Vendors are developing solutions for new debugging contexts (memory, communication, etc.)
– Some progress in clustering and data aggregation