Debugging and Validation Tools on Parallel Systems, 2012 | Bernd Mohr, Institute for Advanced Simulation (IAS), Jülich (member of the Helmholtz Association)


Debugging and Validation Tools on Parallel Systems
2012 | Bernd Mohr, Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)
Member of the Helmholtz Association

Content
- Debugging of Parallel Programs
- TotalView (Rogue Wave)
- DDT (Allinea)
- Marmot
- STAT

Avoiding Bugs
Careful design and coding is the best way to avoid bugs!
- Almost impossible to recover from a bad initial design without starting again from scratch
Clear source code structure & comments
- More complex code requires more comprehensive comments
  - Straightforward code generally requires no additional comments
- Good comments in source code help maintainability
- Poor or invalid comments indicate danger
Regular testing is the best way to catch bugs early!
- Assertions verify expected consistency
- A verbose/logging mode can help follow execution trail
- Unit tests validate distinct functionality or operations
- Bugs hide in the code/cases that aren't tested!
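The three testing aids named on this slide (assertions, a verbose mode, unit tests) can be illustrated with a minimal C sketch. The function `mean`, the test `test_mean`, and the `DEBUG_VERBOSE` environment variable are hypothetical names chosen for this example; they do not come from the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical verbose switch: tracing is on when the environment
 * variable DEBUG_VERBOSE is set (the name is an assumption). */
static int verbose_enabled(void)
{
    return getenv("DEBUG_VERBOSE") != NULL;
}

/* Hypothetical helper: arithmetic mean of n values, n > 0. */
double mean(const double *values, int n)
{
    /* Assertions state and verify what the function expects. */
    assert(values != NULL);
    assert(n > 0);

    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
        /* Verbose mode: leave an execution trail on stderr. */
        if (verbose_enabled())
            fprintf(stderr, "mean: i=%d value=%g sum=%g\n", i, values[i], sum);
    }
    return sum / n;
}

/* A tiny unit test for one distinct operation.
 * Returns the number of failed checks. */
int test_mean(void)
{
    int failures = 0;
    const double one[] = { 4.0 };
    const double many[] = { 1.0, 2.0, 3.0, 6.0 };

    if (mean(one, 1) != 4.0)
        failures++;
    if (mean(many, 4) != 3.0)
        failures++;
    return failures;
}
```

Calling `test_mean()` from a test driver and checking for a zero return covers the normal cases; compiling with `-DNDEBUG` removes the assertions, so they cost nothing in production builds.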

Serial Bugs
Also manifest in parallel applications
- but often reproducible with single process or thread
Compiler (or lint) warnings indicate uncertainties
- often symptomatic of unsafe/unportable code
Compilers can automatically insert run-time checks
- use of uninitialized variables
- null-pointer dereferencing
- indices out of bounds
- floating-point exceptions
- … check your compiler manual for details
Specific tools for memory/heap errors
- including leaks, use after free, corruption
- e.g., memcheck/valgrind, Insure++, Purify
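To make the serial error classes above concrete, here is a small C sketch of two defensive helpers. Both `copy_string` and `checked_get` are hypothetical names invented for illustration; the comments mark where the typical bug would sit and what kind of tool would be expected to flag it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
 * The caller must free() the result; forgetting to do so is a leak
 * of the kind a heap checker reports. */
char *copy_string(const char *src)
{
    if (src == NULL)
        return NULL;              /* avoids a null-pointer dereference */

    size_t len = strlen(src);
    /* The classic bug is malloc(len): one byte short for the
     * terminating '\0', so the copy writes past the end of the block. */
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);
    return dst;
}

/* Hypothetical helper: bounds-checked read of array[index].
 * Returns 0 and stores the element in *out, or -1 if the
 * index is out of range. */
int checked_get(const int *array, size_t n, size_t index, int *out)
{
    assert(out != NULL);          /* programming error, not a run-time case */
    if (array == NULL || index >= n)
        return -1;
    *out = array[index];
    return 0;
}
```

A manual check like `checked_get` does by hand what compiler-inserted bounds checking does automatically; which compiler flags enable such checks differs between compilers, so the compiler manual remains the reference.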

Parallel Bugs
Multi-threading: race conditions, deadlocks, etc.
- e.g., Intel Thread Analysis Tool, Oracle SS Thread Analyzer
- additional run-time checks of OpenMP/POSIX lock usage
  - often limited scalability
  - often report false positives (which can be ignored/filtered)
  - sometimes have false negatives (errors missed due to timing)
Message-passing: incorrect/inconsistent arguments, datatype matching, resource/buffer usage, deadlocks, etc.
- e.g., Intel MPI Checker, Marmot, Umpire, MUST
- additional run-time checks (local & global)
  - often limited scalability
  - extra moderator processes can change execution behaviour
  - sometimes can miss potential deadlocks (due to timing)
- identifies unsafe/non-compliant MPI usage (portability bugs)
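As an illustration of the multi-threading case, the following C sketch shows a shared counter protected by a POSIX mutex. The names `count_in_parallel`, `worker`, and `MAX_THREADS` are assumptions made for this example. Removing the lock/unlock pair turns the increment into an unsynchronized read-modify-write, which is the kind of race condition a thread analysis tool is meant to report. On many systems this needs to be compiled with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* arbitrary limit for this sketch */

struct counter {
    long value;
    pthread_mutex_t lock;
};

struct worker_arg {
    struct counter *counter;
    long increments;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    assert(arg != NULL);
    for (long i = 0; i < arg->increments; i++) {
        /* Without the lock, value++ races with the other threads
         * and updates can be lost, differently on each run. */
        pthread_mutex_lock(&arg->counter->lock);
        arg->counter->value++;
        pthread_mutex_unlock(&arg->counter->lock);
    }
    return NULL;
}

/* Runs nthreads workers that each add `increments` to a shared counter.
 * Returns the final count, or -1 on invalid arguments or thread failure. */
long count_in_parallel(int nthreads, long increments)
{
    if (nthreads < 1 || nthreads > MAX_THREADS || increments < 0)
        return -1;

    struct counter c;
    c.value = 0;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    struct worker_arg arg = { &c, increments };
    pthread_t threads[MAX_THREADS];
    int started = 0;

    for (; started < nthreads; started++)
        if (pthread_create(&threads[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

With the mutex in place the result is deterministic: every run of `count_in_parallel(4, 10000)` yields 40000. Without it, the result may vary from run to run, which is why such bugs can go unnoticed when the timing happens to be favourable.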

Parallel Debuggers
Multiple instances of serial debuggers: e.g., dbx, gdb, idb
- manually attach to processes of interest in separate windows
- type examination/control commands in each window
Lightweight parallel debuggers: e.g., Guard, STAT
- produce condensed aggregated reports of where MPI processes have failed and their state at that point
- allow considerable scalability and low overhead
Full-featured parallel debuggers: e.g., DDT, TotalView
- provide complete control of parallel executions
  - individual processes/threads or (dynamic) groups thereof
- comprehensive examination of state at breakpoints
  - individually or collectively
- can attach to specific processes
- recently have demonstrated significant scalability

Parallel Debugging
Complicated by multiple processes & threads
- which need to be managed and monitored as they execute
- and which may execute differently each time due to inherent non-determinism
  - making it difficult to reproduce consistently
To make debugging easier, try to reproduce the buggy execution with as few threads and processes as possible
- serial executions are (more) deterministic
- debugging takes time and consumes resources
  - debug runs will be slower than otherwise
  - single-stepping line-by-line can be very slow
  - deadlocks (and livelocks) will never terminate!

Making Debugging Easier
Debugging without symbols is hard!
- Compile & link with "-g" to include symbolic information
  - good compilers won't disable optimization, though they may not be able to produce complete symbolic information
Debugging optimized code is even harder!
- Optimized code may bear little resemblance to the source
  - instructions will be added/removed/substituted/rearranged
- Use the lowest optimization level which reproduces the bug
Sometimes the compiler/optimizer itself is buggy!
Debugging compilers or MPI libraries is no fun at all!
- Try reproducing bug with a different compiler or MPI
  - including older/newer versions
- Just because a bug isn't reproducible with another compiler doesn't guarantee that the bug is not in your source code!


TotalView Parallel Debugger
UNIX Symbolic Debugger for C, C++, f77, f90, PGI HPF, assembler programs
"Standard" debugger
Special, non-traditional features
- Multi-process and multi-threaded
- C++ support (templates, inheritance, inline functions)
- F90 support (user types, pointers, modules)
- 1D + 2D Array Data visualization
- Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads)
- Scripting and batch debugging
- Memory Debugging
- Reverse Debugging with ReplayEngine

TotalView: Startup
1. Select Toolbar "Parallel"
2. Select MPI for your system
3. Select desired number of tasks

TotalView: Main Window
- Toolbar for common options
- Local variables for selected stack frame
- Source code window
- Break points
- Stack trace

TotalView: Non-standard Features
- Message queue graph
- Data visualization
- Call graph

TotalView: Batch Debugging

    % mpicc -g hello-mpi.c -o hm

In interactive session or batch script:

    % tvscript -mpi -np 4 -create_actionpoint 'main#10=>print myrank' ./hm
    % less hm-.slog

    ******************************************
    * TotalView Debugger Script Log File
    *
    * Date: _17:11:33
    * Target: ./hm
    * Actionpoint/Action Directives:
    * 10 => print myrank
    ******************************************
    Running target hm
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Print
    !
    ! Process: ./hm (Debugger Process ID: 1)
    ! Thread: Debugger ID: 1.1
    ! Rank: 0
    ! Time Stamp: :11:35
    ! Triggered from event: actionpoint
    ! Results:
    ! myrank = 0x (0)
    !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Same output for other 3 ranks...

DDT Parallel Debugger
UNIX Graphical Debugger for C, C++, f77, f90 programs
Modern, easy-to-use debugger
Special, non-traditional features
- Multi-process and multi-threaded
- 1D + 2D Array Data visualization
- Support for MPI parallel debugging (automatic attach, message queues)
- Support for OpenMP (Version 2.x and later)
- Job submission from within debugger

DDT: JUROPA Startup
- Select desired number of tasks
- Check: MPI, otherwise "Change"

DDT: JUROPA MPI Setup
- Under "System" choose MPI
- Under "Job Submission" enter specifics of your batch system

DDT: Main Window
- Process controls
- Process groups
- Source code
- Variables
- Expression evaluator
- Stack trace

DDT: Non-standard Features
- Multi-Dimensional Array Viewer
- Memory Usage
- Message queue graph

Marmot
MPI correctness and portability checker
Marmot reports
- Errors: violations of the MPI standard
- Warnings: unusual behavior or possible problems
- Notes: harmless but remarkable behavior
- Also: deadlock detection
Usage
- Compile with marmotcc, marmotcxx, marmotf90
- Run your application with one additional process
- See report as plain text file, HTML, or as cube report

Marmot HTML Output Example

STAT: Aggregating Stack Traces for Debugging
Existing debuggers don't scale
- Inherent limits in the approaches
- Need for new, scalable methodologies
Need to pre-analyze and reduce data
- Fast tools to gather state
- Help select nodes to run conventional debuggers on
- Scalable tool: STAT
Stack Trace Analysis Tool
- Goal: Identify equivalence classes
- Hierarchical and distributed aggregation of stack traces from all tasks
- Stack trace merge <1s from 200K+ cores
(Project by LLNL, UW, UNM)
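The idea of identifying equivalence classes can be sketched in C as follows. This is only a flat, serial illustration under simplifying assumptions: each task's stack trace is given as one string, and two tasks are equivalent when their strings are identical. It is not how STAT is implemented (the slide describes hierarchical, distributed aggregation); the names `trace_class`, `group_traces`, and `MAX_CLASSES` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASSES 64   /* arbitrary limit for this sketch */

struct trace_class {
    const char *trace;   /* representative call path of the class */
    int count;           /* number of tasks sharing this call path */
    int first_rank;      /* lowest rank in the class: a candidate to
                            inspect with a conventional debugger */
};

/* Groups traces[0..ntasks-1] (indexed by rank) into classes of
 * identical call paths. Returns the number of classes found,
 * or -1 if more than MAX_CLASSES distinct paths occur. */
int group_traces(const char *traces[], int ntasks,
                 struct trace_class classes[MAX_CLASSES])
{
    assert(traces != NULL && classes != NULL);

    int nclasses = 0;
    for (int rank = 0; rank < ntasks; rank++) {
        /* Linear search for an existing class with the same path. */
        int c = 0;
        while (c < nclasses && strcmp(classes[c].trace, traces[rank]) != 0)
            c++;

        if (c == nclasses) {              /* first task with this path */
            if (nclasses == MAX_CLASSES)
                return -1;
            classes[c].trace = traces[rank];
            classes[c].count = 0;
            classes[c].first_rank = rank;
            nclasses++;
        }
        classes[c].count++;
    }
    return nclasses;
}
```

If, say, three of four tasks are waiting in the same barrier and one is stuck in a receive, the result is two classes, and the single-member class points at the task worth attaching a full debugger to. That reduction from many tasks to a few representatives is what lets the approach help select nodes for conventional debugging.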

Distinguishing Behavior with Stack Traces

3D-Trace Space/Time Analysis

Scalable Representation
288 Nodes / 10 Snapshots