Presentation is loading. Please wait.

Presentation is loading. Please wait.

Debugging Kate Hedstrom August 2006. Overview Think before coding Common mistakes Defensive programming Core files Interactive debugging Other tips Parallel.

Similar presentations


Presentation on theme: "Debugging Kate Hedstrom August 2006. Overview Think before coding Common mistakes Defensive programming Core files Interactive debugging Other tips Parallel."— Presentation transcript:

1 Debugging Kate Hedstrom August 2006

2 Overview Think before coding Common mistakes Defensive programming Core files Interactive debugging Other tips Parallel bug story Demo

3 Before Programming Think about the structure of the program you are writing What are the data structures? Careful planning can lead to programs that are: –Easier to debug –Easier to understand later when modifications prove necessary There’s a whole industry around tools for program design

4 Various Problems Failure to compile –First compiler message is valid, rest could be due to confusion caused by first error Failure to link –Missing routines –Missing libraries Failure to run Runs but gives the wrong answer

5 Some Common Mistakes Number and type of arguments Misspelled variables Uninitialized variables Failure to match up do/if and the end do/if Index out of range Array size too small Parallel bugs

6 Defensive programming Let the compiler help you find problems Implicit none (Fortran) Use modules or interface blocks to let the compiler check the argument count/type for you (Fortran 90) Check error codes on function calls Write useful comments!

7 Messages Assert (C/C++) #include assert(g == 9.8); Fortran example: GET_2DFLD - unable to find requested variable: In input file: /wrkdir/kate/…. ERROR: Abnormal termination: NetCDF INPUT REASON: No error

8 if (.not. Got_var) then write (stdout,10) trim(Vname(1,ifield)), trim(ncfile) exit_flag = 2 return end if status = nf_open(trim(ncfile), nf_nowrite, ncid) if (status.ne. nf_noerr) then write (stdout, 20) trim(ncfile) exit_flag = 3 ioerror = status return end if 10 format(/, ‘GET_2DFLD - unable to find …) 20 format(/, ‘GET_2DFLD - unable to open NetCDF file:‘ a)

9 C example: If (init_graph(graph) == OKAY) { while ((count < MAX_EDGES) && !ferror(source) && !feof(source)) { if (fgets(line, MAX_LEN, source) != NULL){ linenum++; if (sscanf(line, “%d %d: %d\n” …) ==3) { : } else { fprintf(stderr, “%s[%s()] Error: sscanf couldn’t parse line #%d\n”, progname, proc, linenum); fprintf(stderr, “line = \”%s\”\n”, line); return(-2); }

10 Modular Programming and Testing Write programs in components or modules Test them individually There are “test harnesses” for creating and managing tests –Many gnu programs can be tested with “make check” after the “make”

11 Other tips Check cpp labels: –ifdefs –ifnames Bounds checking (-C) Floating point trap (-qflttrap=enable:invalid:imprecise on IBM) Try another compiler - and write portable code There is no shame in using print statements

12 Core files Contain a binary dump of your program as it crashed Can extract a stack trace from it If you recompile -g, might have enough info in the core to solve your problem Check your limits - you might be truncating your cores

13 Causes of Core Files Not enough memory Segmentation violation –Not enough stack space –Wrong number of function arguments Floating point error if not using IEEE standard I/O error

14 More on Core Files Running “file” on it will tell you the executable name: % file core Core: AIX core file fulldump 64-bit, ncra I prefer dbx to totalview on core files: % dbx ncra core (dbx) where abort() at 0x9000… nco_exit(??), line 28 in nco_ctl.c main(argc = 47, argv = 0x0fff….), line548 in ncra.c

15 Interactive Debuggers Totalview: –Is on both Cray and IBM –Has a gui –Works for parallel programs –Is worth learning –Isn’t my favorite debugger Text based debuggers: –dbx, gdb, etc –Some have had gui wrappers (xxgdb, for instance)

16 Debugger Uses Finding bugs Help to understand the code –Watch variables change –Watch the flow control –Perl debugger helped me learn Perl

17 Debugger Features Set breakpoints Execute: –run/go –step –next View variables Works for each process/thread Debug the serial version first!

18 Tips for Totalview on IBM Use -qfullpath as well as -g compiler option When your program reads from standard input, invoke as: totalview roms < roms.in Doesn’t work right with -q64 or - qflttrap on IBM

19 dbx/gdb Commands help - list of commands where - stack trace, call trace print - give the value of an expression break/stop in/stop at - set a breakpoint run - start execution until first breakpoint cont - continue to next breakpoint step - step into function next - execute next command list - list source code for next ten lines quit - how to get out

20 Debugger Caveats Debuggers have bugs too Developers code in C/C++, don’t focus on Fortran If you don’t know where the problem is, you can spend an awful lot of time in the debugger

21 Miscellaneous Any parallel program should be compared to the serial version Did you overflow your quota? Can the processor see the filesystem? Try recompiling after “make clean” Are you solving the equations you think you’re solving?

22 Compiler Bugs More common than you might think Again, try other compilers Try turning off optimization I once had a situation where adding a print statement made the problem go away Auto-parallel compilers are especially buggy

23 Parallel Bug Story It’s always a good idea to compare the serial and parallel runs I can plot the difference field between the two outputs I can create a differences file with ncdiff (part of NCO)

24 Differences after a Day

25 Differences after one step - in a part of the domain without ice

26 What’s up? A variable was not being initialized properly - “if” statement without an “else” Both serial and parallel values are random junk Fixing this did not fix the one-day plot

27 Differences after a few steps - guess where the tile boundaries are

28 What was That? The ocean code does a check for water colder than the local freezing point It then forms ice and tells the ice model about the new ice It adjusts the local temperature and salinity to account for the ice growth (warmer and saltier) It failed to then update the salinity and temperature ghost points

29 More… Plotting the differences in surface temperature after one step failed to show this The change was very small and the single precision plotting code couldn’t catch it Differences did show up in timestep two of the ice variables Running ncdiff on the first step, then asking for the min/max values in temperature showed a problem

30 Debugging I didn’t then know how to use totalview in parallel (fixed!) I don’t have good luck with totalview and 64-bit code Enclosing print statements inside if statements prevents each process from printing, possibly trying to print out-of- range values Find i,j value of the worst point from the diff file, print just that point - many fields

31 Last Word In my field, it is the problems that blow up right away that are the easiest to fix. You can see things go bad in the debugger, perhaps in the very first timestep. The problems that blow up after days and days of cpu time are more challenging and might require a complete rewrite of the model.


Download ppt "Debugging Kate Hedstrom August 2006. Overview Think before coding Common mistakes Defensive programming Core files Interactive debugging Other tips Parallel."

Similar presentations


Ads by Google