Member of the Helmholtz Association. Debugging and Validation Tools on Parallel Systems, 2012 | Bernd Mohr, Institute for Advanced Simulation (IAS), Jülich.

1 Debugging and Validation Tools on Parallel Systems
2012 | Bernd Mohr
Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)
Member of the Helmholtz Association

2 CONTENT
  - Debugging of Parallel Programs
  - TotalView (Rogue Wave)
  - DDT (Allinea)
  - Marmot
  - STAT

3 Avoiding Bugs
Careful design and coding is the best way to avoid bugs!
  - Almost impossible to recover from a bad initial design without starting again from scratch
Clear source code structure & comments
  - More complex code requires more comprehensive comments
      - Straightforward code generally requires no additional comments
  - Good comments in source code help maintainability
  - Poor or invalid comments indicate danger
Regular testing is the best way to catch bugs early!
  - Assertions verify expected consistency
  - A verbose/logging mode can help follow execution trail
  - Unit tests validate distinct functionality or operations
  - Bugs hide in the code/cases that aren't tested!
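The three testing aids named on this slide (assertions, a verbose/logging mode, unit tests) can be combined in a few lines. The following is a minimal sketch in Python rather than the C or Fortran of a typical HPC code. The function `block_range` and the test class are hypothetical names invented for this illustration; the function splits an index range across ranks, a common source of off-by-one bugs.

```python
import logging
import unittest

log = logging.getLogger("partition")


def block_range(n_items, n_ranks, rank):
    """Return the half-open index range [start, stop) owned by `rank`."""
    # Assertions verify expected consistency of the inputs.
    assert n_items >= 0, "item count must be non-negative"
    assert n_ranks > 0, "need at least one rank"
    assert 0 <= rank < n_ranks, "rank out of range"

    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)

    # Verbose mode: this line is emitted only when the log level is DEBUG.
    log.debug("rank %d of %d owns [%d, %d)", rank, n_ranks, start, stop)

    # Assertion on the result: the range must lie inside the index space.
    assert 0 <= start <= stop <= n_items
    return start, stop


class BlockRangeTest(unittest.TestCase):
    def test_ranges_tile_the_index_space(self):
        # Cover the uneven case and the "more ranks than items" case,
        # since bugs hide in the cases that are not tested.
        for n_items, n_ranks in [(10, 3), (8, 4), (2, 5), (0, 2)]:
            expected_start = 0
            for rank in range(n_ranks):
                start, stop = block_range(n_items, n_ranks, rank)
                self.assertEqual(start, expected_start)
                expected_start = stop
            self.assertEqual(expected_start, n_items)

    def test_invalid_rank_is_rejected(self):
        with self.assertRaises(AssertionError):
            block_range(10, 3, 3)
```

Running `python -m unittest` on a file containing this code executes the tests, and calling `logging.basicConfig(level=logging.DEBUG)` beforehand switches on the execution trail.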

4 Serial Bugs
Also manifest in parallel applications
  - but often reproducible with single process or thread
Compiler (or lint) warnings indicate uncertainties
  - often symptomatic of unsafe/unportable code
Compilers can automatically insert run-time checks
  - use of uninitialized variables
  - null-pointer dereferencing
  - indices out of bounds
  - floating-point exceptions
  - … check your compiler manual for details
Specific tools for memory/heap errors
  - including leaks, use after free, corruption
  - e.g., memcheck/valgrind, Insure++, Purify

5 Parallel Bugs
Multi-threading: race conditions, deadlocks, etc.
  - e.g., Intel Thread Analysis Tool, Oracle SS Thread Analyzer
  - additional run-time checks of OpenMP/POSIX lock usage
      - often limited scalability
      - often report false positives (which can be ignored/filtered)
      - sometimes have false negatives (errors missed due to timing)
Message-passing: incorrect/inconsistent arguments, datatype matching, resource/buffer usage, deadlocks, etc.
  - e.g., Intel MPI Checker, Marmot, Umpire, MUST
  - additional run-time checks (local & global)
      - often limited scalability
      - extra moderator processes can change execution behaviour
      - sometimes can miss potential deadlocks (due to timing)
  - identifies unsafe/non-compliant MPI usage (portability bugs)
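A race condition of the kind listed under multi-threading can be illustrated with a shared counter. The sketch below uses Python threads purely for illustration; `Counter` and `run_workers` are hypothetical names. It does not show how any of the tools named on the slide work internally.

```python
import threading


class Counter:
    """Shared counter with an unprotected and a lock-protected update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add_unsafe(self, amount):
        # Read-modify-write without protection: two threads may read the
        # same old value, and one of the two updates is then lost.
        current = self.value
        self.value = current + amount

    def add(self, amount):
        # Holding the lock makes the read-modify-write sequence atomic.
        with self._lock:
            self.value += amount


def run_workers(n_threads, n_increments, use_lock=True):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()
    update = counter.add if use_lock else counter.add_unsafe

    def work():
        for _ in range(n_increments):
            update(1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With `use_lock=True` the result always equals `n_threads * n_increments`. With `use_lock=False` the total may come out too small, but not necessarily on every run, which is the timing dependence behind the false negatives mentioned above.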

6 Parallel Debuggers
Multiple instances of serial debuggers: e.g., dbx, gdb, idb
  - manually attach to processes of interest in separate windows
  - type examination/control commands in each window
Lightweight parallel debuggers: e.g., Guard, STAT
  - produce condensed aggregated reports of where MPI processes have failed and their state at that point
  - allow considerable scalability and low overhead
Full-featured parallel debuggers: e.g., DDT, TotalView
  - provide complete control of parallel executions
      - individual processes/threads or (dynamic) groups thereof
  - comprehensive examination of state at breakpoints
      - individually or collectively
  - can attach to specific processes
  - recently have demonstrated significant scalability

7 Parallel Debugging
Complicated by multiple processes & threads
  - which need to be managed and monitored as they execute
  - and which may execute differently each time due to inherent non-determinism
      - making it difficult to reproduce consistently
To make debugging easier, try to reproduce the buggy execution with as few threads and processes as possible
  - serial executions are (more) deterministic
  - debugging takes time and consumes resources
      - debug runs will be slower than otherwise
      - single-stepping line-by-line can be very slow
      - deadlocks (and livelocks) will never terminate!

8 Making Debugging Easier
Debugging without symbols is hard!
  - Compile & link with "-g" to include symbolic information
      - good compilers won't disable optimization, though they may not be able to produce complete symbolic information
Debugging optimized code is even harder!
  - Optimized code may bear little resemblance to the source
      - instructions will be added/removed/substituted/rearranged
  - Use the lowest optimization level which reproduces the bug
Sometimes the compiler/optimizer itself is buggy!
Debugging compilers or MPI libraries is no fun at all!
  - Try reproducing the bug with a different compiler or MPI
      - including older/newer versions
  - Just because a bug isn't reproducible with another compiler doesn't guarantee that the bug is not in your source code!

9 CONTENT
  - Debugging of Parallel Programs
  - TotalView (Rogue Wave)
  - DDT (Allinea)
  - Marmot
  - STAT

10 TotalView Parallel Debugger
UNIX Symbolic Debugger for C, C++, f77, f90, PGI HPF, assembler programs
"Standard" debugger
Special, non-traditional features
  - Multi-process and multi-threaded
  - C++ support (templates, inheritance, inline functions)
  - F90 support (user types, pointers, modules)
  - 1D + 2D Array Data visualization
  - Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads)
  - Scripting and batch debugging
  - Memory Debugging
  - Reverse Debugging with ReplayEngine
http://www.totalviewtech.com

11 TotalView: Startup
  1. Select Toolbar "Parallel"
  2. Select MPI for your system
  3. Select desired number of tasks

12 TotalView: Main Window
  - Toolbar for common options
  - Local variables for selected stack frame
  - Source code window
  - Break points
  - Stack trace

13 TotalView: Non-standard Features
  - Message queue graph
  - Data visualization
  - Call graph

14 TotalView: Batch Debugging
Compile with debug information:
    % mpicc -g hello-mpi.c -o hm
In interactive session or batch script:
    % tvscript -mpi -np 4 -create_actionpoint 'main#10=>print myrank' ./hm
    % less hm-.slog
Log file contents:
    ******************************************
    * TotalView Debugger Script Log File
    *
    * Date: 11-26-2009_17:11:33
    * Target: ./hm
    * Actionpoint/Action Directives:
    * 10 => print myrank
    ******************************************
    Running target hm
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Print
    !
    ! Process: ./hm (Debugger Process ID: 1)
    ! Thread: Debugger ID: 1.1
    ! Rank: 0
    ! Time Stamp: 11-26-2009 17:11:35
    ! Triggered from event: actionpoint
    ! Results:
    ! myrank = 0x00000000 (0)
    !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Same output for other 3 ranks...

15 DDT Parallel Debugger
UNIX Graphical Debugger for C, C++, f77, f90 programs
Modern, easy-to-use debugger
Special, non-traditional features
  - Multi-process and multi-threaded
  - 1D + 2D Array Data visualization
  - Support for MPI parallel debugging (automatic attach, message queues)
  - Support for OpenMP (Version 2.x and later)
  - Job submission from within debugger
http://www.allinea.com

16 DDT: JUROPA Startup
  - Select desired number of tasks
  - Check: MPI, otherwise "Change"

17 DDT: JUROPA MPI Setup
  - Under "System" choose MPI
  - Under "Job Submission" enter specifics of your batch system

18 DDT: Main Window
  - Process controls
  - Process groups
  - Source code
  - Variables
  - Expression evaluator
  - Stack trace

19 DDT: Non-standard Features
  - Multi-Dimensional Array Viewer
  - Memory Usage
  - Message queue graph

20 Marmot
MPI correctness and portability checker
http://www.hlrs.de/organization/av/amt/projects/marmot/
Marmot reports
  - Errors: violations of the MPI standard
  - Warnings: unusual behavior or possible problems
  - Notes: harmless but remarkable behavior
  - Also: deadlock detection
Usage
  - Compile with marmotcc, marmotcxx, marmotf90
  - Run your application with one additional process
  - See report as plain text file, HTML, or as cube report
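The slide does not say how Marmot detects deadlocks, so the following is not a description of its mechanism. It is a generic sketch of one textbook approach: build a wait-for graph and look for a cycle. It assumes, for simplicity, that every blocked rank waits on exactly one other rank (for example, the source of a blocking receive), which leaves out wildcard receives and collectives. The function name `find_deadlock_cycle` is hypothetical.

```python
def find_deadlock_cycle(waits_for):
    """Return a list of ranks that form a wait cycle, or None.

    `waits_for` maps each blocked rank to the single rank it is waiting
    on. Ranks that are not blocked do not appear as keys.
    """
    for start in waits_for:
        path = []        # ranks visited on the current chain, in order
        position = {}    # rank -> index in `path`
        rank = start
        # Follow the chain of waits until it reaches a rank that is not
        # blocked, or revisits a rank already on the current chain.
        while rank in waits_for and rank not in position:
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        if rank in position:
            # The chain closed on itself: everything from the first
            # visit of `rank` onwards is the cycle.
            return path[position[rank]:]
    return None
```

For example, two ranks that each post a blocking receive from the other before sending give `waits_for = {0: 1, 1: 0}`, and the function reports the cycle `[0, 1]`. A chain that ends at a running rank, such as `{0: 1, 1: 2}`, yields `None`.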

21 Marmot HTML Output Example

22 STAT: Aggregating Stack Traces for Debugging
Existing debuggers don't scale
  - Inherent limits in the approaches
  - Need for new, scalable methodologies
Need to pre-analyze and reduce data
  - Fast tools to gather state
  - Help select nodes to run conventional debuggers on
  - Scalable tool: STAT
Stack Trace Analysis Tool
  - Goal: Identify equivalence classes
  - Hierarchical and distributed aggregation of stack traces from all tasks
  - Stack trace merge <1s from 200K+ cores
(Project by LLNL, UW, UNM)
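The idea of grouping tasks into equivalence classes by their stack traces can be shown in a single process. The sketch below is not STAT's implementation: it leaves out the hierarchical and distributed aggregation that gives the tool its scalability. The function names and the sample frame names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks whose call stacks are identical.

    `traces` maps rank -> list of function names, outermost frame first.
    Returns a dict mapping each distinct stack (as a tuple) to the sorted
    list of ranks that share it.
    """
    classes = defaultdict(list)
    for rank, frames in traces.items():
        classes[tuple(frames)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in classes.items()}


def merge_prefix_tree(traces):
    """Merge all stacks into one call-prefix tree.

    Each node is a dict {"ranks": [...], "children": {name: node}}; the
    "ranks" list holds every rank whose stack passes through that frame.
    """
    root = {"ranks": sorted(traces), "children": {}}
    for rank in sorted(traces):
        node = root
        for frame in traces[rank]:
            node = node["children"].setdefault(
                frame, {"ranks": [], "children": {}}
            )
            node["ranks"].append(rank)
    return root


# Hypothetical snapshot: three ranks wait in a barrier, one still computes.
sample = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "compute"],
    2: ["main", "solve", "MPI_Barrier"],
    3: ["main", "solve", "MPI_Barrier"],
}
classes = equivalence_classes(sample)
tree = merge_prefix_tree(sample)
```

In this sample the four ranks fall into two classes, so a conventional debugger would only need to be attached to one representative of each, for instance rank 1 as the outlier.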

23 Distinguishing Behavior with Stack Traces

24 3D-Trace Space/Time Analysis

25 Scalable Representation
288 Nodes / 10 Snapshots

