Published by Adela Gilmore. Modified over 9 years ago.
1
Debugging and Validation Tools on Parallel Systems
2012 | Bernd Mohr
Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)
Member of the Helmholtz Association
2
CONTENT
- Debugging of Parallel Programs
- TotalView (Rogue Wave)
- DDT (Allinea)
- Marmot
- STAT
3
Avoiding Bugs
- Careful design and coding is the best way to avoid bugs!
  - It is almost impossible to recover from a bad initial design without starting again from scratch
  - Clear source code structure & comments
    - More complex code requires more comprehensive comments
    - Straightforward code generally requires no additional comments
    - Good comments in source code help maintainability
    - Poor or invalid comments indicate danger
- Regular testing is the best way to catch bugs early!
  - Assertions verify expected consistency
  - A verbose/logging mode can help follow the execution trail
  - Unit tests validate distinct functionality or operations
  - Bugs hide in the code/cases that aren't tested!
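The three testing aids named on this slide (assertions, a verbose/logging mode, and unit tests) can be sketched together in a few lines of C. This is only an illustration under stated assumptions: the function `sum_array`, the `TRACE` macro, and the environment variable `DEBUG_TRACE` are hypothetical names chosen for the example and do not come from the original slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Verbose/logging mode: switched on at run time by setting the
 * (hypothetical) environment variable DEBUG_TRACE to any value. */
int trace_enabled(void)
{
    return getenv("DEBUG_TRACE") != NULL;
}

/* Print a message with file and line to stderr when tracing is enabled. */
#define TRACE(...)                                            \
    do {                                                      \
        if (trace_enabled()) {                                \
            fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__);  \
            fprintf(stderr, __VA_ARGS__);                     \
            fputc('\n', stderr);                              \
        }                                                     \
    } while (0)

/* Sum of the first n elements of a. The assertions state what the
 * function expects from its caller and abort if that is violated. */
long sum_array(const int *a, int n)
{
    long total = 0;
    assert(a != NULL);
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    TRACE("sum_array(n=%d) -> %ld", n, total);
    return total;
}

/* A tiny unit test for sum_array: returns the number of failed checks. */
int test_sum_array(void)
{
    const int data[] = {1, 2, 3, 4};
    int failures = 0;
    if (sum_array(data, 4) != 10) failures++;
    if (sum_array(data, 1) != 1) failures++;
    if (sum_array(data, 0) != 0) failures++;  /* empty range is an edge case */
    return failures;
}
```

Calling `test_sum_array()` from a small test driver and checking that it returns 0 exercises the normal and the empty-range case; running the same driver with `DEBUG_TRACE` set prints one trace line per call to stderr.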
4
Serial Bugs
- Also manifest in parallel applications, but are often reproducible with a single process or thread
- Compiler (or lint) warnings indicate uncertainties
  - often symptomatic of unsafe/unportable code
- Compilers can automatically insert run-time checks
  - use of uninitialized variables
  - null-pointer dereferencing
  - indices out of bounds
  - floating-point exceptions
  - …
  - check your compiler manual for details
- Specific tools for memory/heap errors
  - including leaks, use after free, corruption
  - e.g., memcheck/valgrind, Insure++, Purify
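To make concrete what a run-time check for null pointers and out-of-bounds indices amounts to, here is a hand-written sketch of such a check in C. Compilers and memory tools insert or perform comparable checks automatically; the helper `checked_get` and its return codes are hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <stdio.h>

/* Result codes of the (hypothetical) checked accessor. */
enum { CHECK_OK = 0, CHECK_NULL = 1, CHECK_RANGE = 2 };

/* Read a[i] into *out, but only after verifying that the pointers are
 * valid and that the index lies within [0, len). */
int checked_get(const double *a, size_t len, long i, double *out)
{
    if (a == NULL || out == NULL) {
        fprintf(stderr, "checked_get: null pointer\n");
        return CHECK_NULL;
    }
    if (i < 0 || (size_t)i >= len) {
        fprintf(stderr, "checked_get: index %ld out of bounds [0, %zu)\n",
                i, len);
        return CHECK_RANGE;
    }
    *out = a[i];
    return CHECK_OK;
}
```

An access such as `checked_get(v, 3, 3, &x)` on a three-element array is reported and rejected instead of silently reading past the end, which is the kind of error that otherwise tends to surface far from its cause.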
5
Parallel Bugs
- Multi-threading: race conditions, deadlocks, etc.
  - e.g., Intel Thread Analysis Tool, Oracle Solaris Studio Thread Analyzer
  - additional run-time checks of OpenMP/POSIX lock usage
  - often limited scalability
  - often report false positives (which can be ignored/filtered)
  - sometimes have false negatives (errors missed due to timing)
- Message-passing: incorrect/inconsistent arguments, datatype matching, resource/buffer usage, deadlocks, etc.
  - e.g., Intel MPI Checker, Marmot, Umpire, MUST
  - additional run-time checks (local & global)
  - often limited scalability
  - extra moderator processes can change execution behaviour
  - sometimes can miss potential deadlocks (due to timing)
  - identifies unsafe/non-compliant MPI usage (portability bugs)
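As a minimal illustration of the multi-threading case, the following C sketch uses POSIX threads to increment a shared counter. The mutex makes the update safe; removing the lock and unlock calls turns the increment into exactly the kind of race condition that thread analysis tools look for. The names `shared_t`, `worker`, and `run_counter` are hypothetical, and the sketch assumes a system with POSIX threads.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* State shared by all worker threads. */
typedef struct {
    long counter;          /* shared data, protected by lock */
    long increments;       /* work per thread (read-only while running) */
    pthread_mutex_t lock;
} shared_t;

/* Each worker adds 1 to the shared counter 'increments' times. */
void *worker(void *arg)
{
    shared_t *s = arg;
    for (long i = 0; i < s->increments; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;      /* unprotected, this read-modify-write would race */
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Run nthreads workers and return the final counter, or -1 on error. */
long run_counter(int nthreads, long increments)
{
    pthread_t tid[MAX_THREADS];
    shared_t s;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    s.counter = 0;
    s.increments = increments;
    if (pthread_mutex_init(&s.lock, NULL) != 0) return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &s) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    pthread_mutex_destroy(&s.lock);
    return (started == nthreads) ? s.counter : -1;
}
```

With the mutex in place, `run_counter(4, 10000)` always yields 40000. Without it the result may still happen to be correct on some runs, which is why such bugs are hard to reproduce and why tools can miss them due to timing. Many compilers need the `-pthread` option when building this code.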
6
Parallel Debuggers
- Multiple instances of serial debuggers: e.g., dbx, gdb, idb
  - manually attach to processes of interest in separate windows
  - type examination/control commands in each window
- Lightweight parallel debuggers: e.g., Guard, STAT
  - produce condensed aggregated reports of where MPI processes have failed and their state at that point
  - allow considerable scalability and low overhead
- Full-featured parallel debuggers: e.g., DDT, TotalView
  - provide complete control of parallel executions
    - individual processes/threads or (dynamic) groups thereof
  - comprehensive examination of state at breakpoints
    - individually or collectively
  - can attach to specific processes
  - recently have demonstrated significant scalability
7
Parallel Debugging
- Complicated by multiple processes & threads
  - which need to be managed and monitored as they execute
  - and which may execute differently each time due to inherent non-determinism
    - making it difficult to reproduce consistently
- To make debugging easier, try to reproduce the buggy execution with as few threads and processes as possible
  - serial executions are (more) deterministic
- Debugging takes time and consumes resources
  - debug runs will be slower than otherwise
  - single-stepping line-by-line can be very slow
  - deadlocks (and livelocks) will never terminate!
8
Making Debugging Easier
- Debugging without symbols is hard!
  - Compile & link with "-g" to include symbolic information
  - good compilers won't disable optimization, though they may not be able to produce complete symbolic information
- Debugging optimized code is even harder!
  - Optimized code may bear little resemblance to the source
    - instructions will be added/removed/substituted/rearranged
  - Use the lowest optimization level which reproduces the bug
- Sometimes the compiler/optimizer itself is buggy!
  - Debugging compilers or MPI libraries is no fun at all!
  - Try reproducing the bug with a different compiler or MPI
    - including older/newer versions
  - Just because a bug isn't reproducible with another compiler doesn't guarantee that the bug is not in your source code!
9
CONTENT
- Debugging of Parallel Programs
- TotalView (Rogue Wave)
- DDT (Allinea)
- Marmot
- STAT
10
TotalView Parallel Debugger
- UNIX symbolic debugger for C, C++, f77, f90, PGI HPF, and assembler programs
- "Standard" debugger
- Special, non-traditional features
  - Multi-process and multi-threaded
  - C++ support (templates, inheritance, inline functions)
  - F90 support (user types, pointers, modules)
  - 1D + 2D array data visualization
  - Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads)
  - Scripting and batch debugging
  - Memory debugging
  - Reverse debugging with ReplayEngine
- http://www.totalviewtech.com
11
TotalView: Startup
1. Select Toolbar "Parallel"
2. Select MPI for your system
3. Select desired number of tasks
12
TotalView: Main Window
- Toolbar for common options
- Local variables for selected stack frame
- Source code window
- Break points
- Stack trace
13
TotalView: Non-standard Features
- Message queue graph
- Data visualization
- Call graph
14
TotalView: Batch Debugging

Compile with debugging information:

    % mpicc -g hello-mpi.c -o hm

In an interactive session or batch script:

    % tvscript -mpi -np 4 -create_actionpoint 'main#10=>print myrank' ./hm
    % less hm-.slog

Resulting script log file:

    ******************************************
    * TotalView Debugger Script Log File
    *
    * Date: 11-26-2009_17:11:33
    * Target: ./hm
    * Actionpoint/Action Directives:
    *   10 => print myrank
    ******************************************
    Running target hm
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Print
    !
    ! Process: ./hm (Debugger Process ID: 1)
    ! Thread: Debugger ID: 1.1
    ! Rank: 0
    ! Time Stamp: 11-26-2009 17:11:35
    ! Triggered from event: actionpoint
    ! Results:
    !   myrank = 0x00000000 (0)
    !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Same output for the other 3 ranks...
15
DDT Parallel Debugger
- UNIX graphical debugger for C, C++, f77, f90 programs
- Modern, easy-to-use debugger
- Special, non-traditional features
  - Multi-process and multi-threaded
  - 1D + 2D array data visualization
  - Support for MPI parallel debugging (automatic attach, message queues)
  - Support for OpenMP (Version 2.x and later)
  - Job submission from within debugger
- http://www.allinea.com
16
DDT: JUROPA Startup
- Select desired number of tasks
- Check that the MPI setting is correct; otherwise select "Change"
17
DDT: JUROPA MPI Setup
- Under "System" choose MPI
- Under "Job Submission" enter specifics of your batch system
18
DDT: Main Window
- Process controls
- Process groups
- Source code
- Variables
- Expression evaluator
- Stack trace
19
DDT: Non-standard Features
- Multi-Dimensional Array Viewer
- Memory Usage
- Message queue graph
20
Marmot
- MPI correctness and portability checker
- http://www.hlrs.de/organization/av/amt/projects/marmot/
- Marmot reports
  - Errors: violations of the MPI standard
  - Warnings: unusual behavior or possible problems
  - Notes: harmless but remarkable behavior
  - Also: deadlock detection
- Usage
  - Compile with marmotcc, marmotcxx, marmotf90
  - Run your application with one additional process
  - See report as plain text file, HTML, or as cube report
21
Marmot HTML Output Example
22
STAT: Aggregating Stack Traces for Debugging
- Existing debuggers don't scale
  - Inherent limits in the approaches
  - Need for new, scalable methodologies
- Need to pre-analyze and reduce data
  - Fast tools to gather state
  - Help select nodes to run conventional debuggers on
- Scalable tool: STAT
  - Stack Trace Analysis Tool
  - Goal: Identify equivalence classes
  - Hierarchical and distributed aggregation of stack traces from all tasks
  - Stack trace merge <1s from 200K+ cores
- (Project by LLNL, UW, UNM)
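The idea of identifying equivalence classes can be sketched in C as grouping tasks whose stack traces are identical, so that only one representative per class needs closer inspection. This is a flat, single-process illustration and not STAT's implementation, which according to the slide aggregates hierarchically and in a distributed fashion. The type `trace_class_t`, the function `group_traces`, and the trace string format are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CLASSES 64

/* One equivalence class: all tasks that share the same stack trace. */
typedef struct {
    const char *trace;   /* call path, e.g. "main>solve>MPI_Barrier" */
    int count;           /* number of tasks in this class */
    int first_rank;      /* lowest rank that showed this trace */
} trace_class_t;

/* Group traces[0..ntasks-1] (one string per task, indexed by rank) into
 * classes of identical traces. Returns the number of classes found, or
 * -1 if more than MAX_CLASSES distinct traces occur. */
int group_traces(const char *const traces[], int ntasks,
                 trace_class_t classes[])
{
    int nclasses = 0;
    for (int rank = 0; rank < ntasks; rank++) {
        int c;
        /* Look for an existing class with the same trace. */
        for (c = 0; c < nclasses; c++) {
            if (strcmp(classes[c].trace, traces[rank]) == 0) break;
        }
        if (c == nclasses) {             /* not seen before: open a class */
            if (nclasses == MAX_CLASSES) return -1;
            classes[c].trace = traces[rank];
            classes[c].count = 0;
            classes[c].first_rank = rank;
            nclasses++;
        }
        classes[c].count++;
    }
    return nclasses;
}
```

If three of four tasks wait in a barrier while one is blocked in a receive, the function reports two classes, and the class with a single member points at the rank worth attaching a conventional debugger to.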
23
Distinguishing Behavior with Stack Traces
24
3D-Trace Space/Time Analysis
[Figure; labels: Appl … …]
25
Scalable Representation
288 Nodes / 10 Snapshots