A New Parallel Debugger for Franklin: DDT
Katie Antypas, User Services Group
NERSC User Group Meeting, September 17, 2007



Outline
Parallel debugger usage at NERSC
Comparison of Totalview and Allinea DDT
Selecting a parallel debugger for NERSC: Allinea DDT
  – Functionality
  – License model and price
Current status
  – Acceptance testing
  – User availability

Motivation
Parallel debuggers are valuable yet expensive tools for HPC centers, so we surveyed actual debugger usage at NERSC on Seaborg and Bassi to see whether resources can be better optimized.

Totalview Usage on Seaborg and Bassi
[Chart: Number of times users have run Totalview on Seaborg in the past year. Axis: number of times. Annotation: 27 users ran Totalview fewer than 5 times.]
[Chart: Number of times users have run Totalview on Bassi in the past 18 months. Axis: number of times. Annotation: 23 users ran Totalview between 10 and 25 times.]

Totalview Usage
Very roughly 15-20% of active users have run Totalview
Functionality requested is basic
  – Find the cause of crashes and code hangs
  – Examine variables across processors
  – Users typically aren't using Totalview for analysis
Users are running at lower concurrencies than we expected
  – Many users debug codes locally and run in production mode at NERSC
  – In many codes an error at 512 processors can be detected at 32 processors
  – Totalview runs interactively, and users must wait longer for more nodes
  – Debuggers can run slowly at processors
Rarely were all licenses checked out

Another Debugger in the Market: Allinea Software's DDT
DDT (Distributed Debugging Tool)
  – Some HPC customers
      Lawrence Livermore National Lab (LLNL)
      Texas Advanced Computing Center (TACC)
      Barcelona Supercomputing Center (BSC)
      Leibniz Computing Center (LRZ)
      HPC Center Stuttgart (HLRS)
      CEA, IPGP, ONERA - France
      CINECA, CASPUR - Italy
      AWE, RAL - UK
Spring 2007: tested DDT on NERSC platforms
  – Low learning curve for Totalview users
  – Basic debugging functionality worked as expected
  – Found some bugs, all on AIX
  – Responsive developers
  – Viable alternative to Totalview
Created an RFP to get the best response from vendors

Weighing the Debuggers...

Totalview
  – Established company and technology with large market share
  – Totalview debugger ported to most platforms and tested on many codes
  – Full-featured parallel debugger with advanced features such as debugging with multiple executables, GAS languages, sophisticated analysis tools
  – Inflexible license server model
  – Expensive

Allinea DDT
  – Younger company, established market in Europe but smaller American presence
  – Basic parallel debugging functionality
  – Linux is the most strongly supported operating system (increasing support for AIX)
  – Responsive developers
  – Flexible license model
  – Lower price

DDT Licensing Model and Price
Flexible model
  – 1024 processors
  – Can be divided any way
      One 1024 processor job
      Two 512 processor jobs
      One 512, one 256, four 64 processor jobs
Significantly cheaper than Totalview
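
The arithmetic behind the flexible model can be sketched in a few lines of Python. This is only an illustration of how a 1024-processor pool can be split among concurrent debugging jobs; the function name `fits_in_pool` is hypothetical, and the sketch does not describe how DDT's license server actually enforces the limit.

```python
from typing import Iterable

LICENSE_POOL = 1024  # processor count stated on the slide


def fits_in_pool(job_sizes: Iterable[int], pool: int = LICENSE_POOL) -> bool:
    """Return True if the concurrent debug jobs together need no more
    processors than the pool provides (hypothetical helper)."""
    sizes = list(job_sizes)
    if any(size <= 0 for size in sizes):
        raise ValueError("job sizes must be positive")
    return sum(sizes) <= pool


# The three example splits listed on the slide.
examples = {
    "one 1024": [1024],
    "two 512": [512, 512],
    "one 512, one 256, four 64": [512, 256, 64, 64, 64, 64],
}

for label, sizes in examples.items():
    print(f"{label}: {sum(sizes)} processors, fits={fits_in_pool(sizes)}")
```

Each of the three splits adds up to exactly 1024 processors, so all of them fit; adding one more job to any of them would exceed the pool.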

DDT Functionality
Parallel debugger
  – Support for MPI, OpenMP, pthreads
  – Fortran, C, C++
Typical serial debugging features
  – Set breakpoints and watches, step through program, dive into arrays, evaluate expressions, analyze core files
Parallel debugging features
  – Step through processors
  – View variables across processors
  – Grouping processors
  – Parallel Stack View
Other features
  – Memory debugging
  – Visualization tools

User Interface

Parallel Stack View
Allows the user to see the position of each processor in the code in the same window
Essentially groups processors by location in code -- the only reasonable strategy at high concurrencies
Can easily find a stray processor
Can create sub-groups of processors
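
The grouping idea can be illustrated with a short Python sketch. It is not DDT's implementation: the stack traces are made-up data, the application function names `solve` and `exchange_halo` are hypothetical, and the helpers `group_by_stack` and `stray_ranks` exist only for this example. It shows why grouping ranks by call stack makes a stray processor stand out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]  # call stack, outermost frame first


def group_by_stack(stacks: Dict[int, Stack]) -> Dict[Stack, List[int]]:
    """Group MPI ranks that share an identical call stack."""
    groups: Dict[Stack, List[int]] = defaultdict(list)
    for rank in sorted(stacks):
        groups[stacks[rank]].append(rank)
    return dict(groups)


def stray_ranks(groups: Dict[Stack, List[int]], max_group_size: int = 1) -> List[int]:
    """Return ranks sitting in groups of at most max_group_size members."""
    return sorted(
        rank
        for ranks in groups.values()
        if len(ranks) <= max_group_size
        for rank in ranks
    )


# Hypothetical 8-rank job: rank 5 is stuck in a receive
# while every other rank waits at a barrier.
stacks = {rank: ("main", "solve", "MPI_Barrier") for rank in range(8)}
stacks[5] = ("main", "exchange_halo", "MPI_Recv")

groups = group_by_stack(stacks)
for stack, ranks in groups.items():
    print(" > ".join(stack), "ranks", ranks)
print("stray:", stray_ranks(groups))
```

With thousands of ranks the number of distinct stacks usually stays small, so a view organized this way remains readable where a per-process listing would not.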

Current Status
Acceptance testing DDT on Franklin
  – Running 5-6 codes with DDT at various concurrencies
  – Testing MPI, OpenMP, Fortran, C, C++, mixed-mode applications
Demo on Thursday
Available for users to try
Please let us know if you have any problems
Excited to have DDT on Franklin and think it is good for the HPC community to have options in parallel debugging

Questions?