RPROM , 2002 Lassi A. Tuura, Northeastern University

Using Valgrind

Lassi A. Tuura
Northeastern University, Boston

September 2, 2002 Lassi A. Tuura, Northeastern University

Valgrind Introduction

• Valgrind is a tool for checking memory access errors
  ◦ Use of uninitialised memory
  ◦ Reading or writing memory after it has been freed
  ◦ Reading or writing off the end of a dynamically allocated block
  ◦ Reading or writing inappropriate areas on the stack
  ◦ Memory leaks (pointer to dynamically allocated memory lost forever)
  ◦ Passing uninitialised and/or unaddressable memory to system calls
  ◦ Mismatched use of malloc/new/new[] vs. free/delete/delete[]
  ◦ Some misuses of POSIX pthreads
• Easy to use: just run, no instrumentation required
    $ valgrind ls -laF
• Powerful and accurate
  ◦ Good at trapping errors, by design
  ◦ Low false alarm rate: I have yet to see one
  ◦ Gives precise error location, down to source line

Valgrind Introduction...

• Relatively modest processing requirements
  ◦ Runs 20-50x slower: a noticeable but not earth-shattering slowdown
    - Graphical applications like IGUANA are sluggish but remain usable
    - Unlike other tools, which can take days to run instrumented code!
  ◦ Uses 2.5x+ memory
    - 9 bits for every byte of memory to track validity and addressability
    - Plus other overhead (tracking memory block owners, code cache, …)
  ◦ In other words
    - There is little reason not to use it: use it early, use it often
    - OK on a properly equipped development machine (e.g. lxcmsc1, lxcmsc2)
• Additional goodies
  ◦ Finds memory leaks, with exact allocation traces
  ◦ Performance measurement using cachegrind

How Does It Work?

• Valgrind is not a toy! It works on huge systems such as OpenOffice, KDE, Mozilla, ORCA, OSCAR
• Tracks only two kinds of basic memory access errors
  ◦ Is the address valid?
    - Is it outside currently allocated memory, e.g. memory already freed?
    - Catches uses of illegitimate memory addresses
  ◦ Is the value in memory initialised?
    - Tracks the validity of every memory bit
    - Catches uses of uninitialised values
    - Flags attempts to compute values that would depend on uninitialised values (copying data is ok, only attempts to compute values are bad)
• Enough to discover all sorts of memory-management nasties!
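
The two checks above can be pictured with a toy "shadow memory" model. This is only a sketch for illustration and not Valgrind's actual implementation: the class name ShadowMemory and its methods are hypothetical, addresses are plain indices into a small simulated memory, and a store is assumed to always write fully defined data. It keeps one addressability bit plus eight validity bits per data byte, the 9 bits per byte mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy shadow memory (hypothetical, NOT Valgrind's implementation):
// one addressability bit and eight validity bits per simulated data byte.
class ShadowMemory {
public:
    explicit ShadowMemory(std::size_t size)
        : addressable_(size, false), valid_bits_(size, 0x00) {}

    // malloc/new: the block becomes addressable, its contents are undefined.
    void on_alloc(std::size_t start, std::size_t len) {
        assert(start + len <= addressable_.size());
        for (std::size_t i = start; i < start + len; ++i) {
            addressable_[i] = true;
            valid_bits_[i] = 0x00;
        }
    }

    // free/delete: the block is no longer addressable.
    void on_free(std::size_t start, std::size_t len) {
        assert(start + len <= addressable_.size());
        for (std::size_t i = start; i < start + len; ++i)
            addressable_[i] = false;
    }

    // Store of defined data: returns false for an "invalid write",
    // otherwise marks all eight bits of every written byte as valid.
    bool on_store(std::size_t start, std::size_t len) {
        if (!is_addressable(start, len)) return false;
        for (std::size_t i = start; i < start + len; ++i)
            valid_bits_[i] = 0xFF;
        return true;
    }

    // Check 1: is every byte of the range inside a live block?
    bool is_addressable(std::size_t start, std::size_t len) const {
        if (start + len > addressable_.size()) return false;
        for (std::size_t i = start; i < start + len; ++i)
            if (!addressable_[i]) return false;
        return true;
    }

    // Check 2: is every bit of the range initialised?
    bool is_defined(std::size_t start, std::size_t len) const {
        if (!is_addressable(start, len)) return false;
        for (std::size_t i = start; i < start + len; ++i)
            if (valid_bits_[i] != 0xFF) return false;
        return true;
    }

private:
    std::vector<bool> addressable_;          // 1 bit per byte
    std::vector<std::uint8_t> valid_bits_;   // 8 bits per byte
};
```

In this model a freshly allocated block is addressable but not defined, a store just past its end is rejected, and after on_free the same range is no longer addressable.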

How Does It Work?

• Valgrind runs your program on a synthetic CPU
  ◦ It virtualises the real CPU and runs your program on the synthetic one
  ◦ An attempt to execute code causes it to be instrumented
    - Dynamic just-in-time instrumentation for small sections of the code at a time (cached, sort of like just-in-time compilation in Java)
    - Intercepts and checks every memory access the program makes
    - Intercepts and checks memory regions passed to system calls
• Special mode for cache profiling
  ◦ Does no error checking, but…
  ◦ Instruments for the cache architecture
    - Keeps track of the D1, I1 and L2 caches
    - Determines memory references that miss the caches
  ◦ Produces a cachegrind.out file which can be used to annotate source code
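
To give a feel for what "determining memory references that miss the caches" involves, here is a toy direct-mapped cache model. It is a sketch under stated assumptions, not how cachegrind is implemented: the names ToyCache and count_misses are hypothetical, and the geometry (128 lines of 64 bytes, direct-mapped) is chosen only for illustration and does not describe any real D1, I1 or L2 cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache (hypothetical; geometry is illustrative only).
class ToyCache {
public:
    ToyCache(std::size_t line_bytes, std::size_t num_lines)
        : line_bytes_(line_bytes),
          tags_(num_lines, 0),
          valid_(num_lines, false),
          misses_(0) {
        assert(line_bytes > 0 && num_lines > 0);
    }

    // Simulate one memory reference; returns true if it missed.
    bool access(std::uint64_t address) {
        const std::uint64_t line = address / line_bytes_;
        const std::size_t slot = static_cast<std::size_t>(line % tags_.size());
        if (valid_[slot] && tags_[slot] == line) return false;  // hit
        valid_[slot] = true;   // miss: load the line, evicting the old one
        tags_[slot] = line;
        ++misses_;
        return true;
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t line_bytes_;
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
    std::size_t misses_;
};

// Count misses for `count` references spaced `stride` bytes apart.
std::size_t count_misses(std::size_t stride, std::size_t count) {
    ToyCache cache(64, 128);
    for (std::size_t i = 0; i < count; ++i)
        cache.access(static_cast<std::uint64_t>(i) * stride);
    return cache.misses();
}
```

With this model, 256 consecutive byte references touch only four 64-byte lines, whereas 256 references 64 bytes apart each land on a new line. A profiler attributes such miss counts back to source lines.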

Getting Started

• Valgrind is part of CERN CMS environment
  ◦ Installed for RH 6.x and 7.x Linux in /afs/cern.ch/cms/system
  ◦ Automatically in your $PATH if you are in CMS group (zh)
• Just prefix the normal command with “valgrind”
    cd /afs/cern.ch/cms/Releases/OSCAR/OSCAR_1_3_2/src
    eval `scram runtime -csh`
    valgrind iguana
• Use a host equipped for development
  ◦ Needs a linux box with enough memory and CPU power
    - Most new boxes qualify, 1+ GHz and 512+ MB is “enough”
  ◦ CMS provides lxcmsc1, lxcmsc2 (RH 6.x) and lxcmsd1 (RH 7.x)
  ◦ CERN provides lxplus (RH 6.x), lxplus7 (RH 7.x)

Valgrind Output

• Valgrind output looks like this
    ==3949== valgrind-1.0.1, a memory error detector for x86 GNU/Linux.
    ==3949== Copyright (C) , and GNU GPL’d, by Julian Seward.
    ==3949== Estimated CPU clock rate is 601 MHz
    ==3949== For more details, rerun with: -v
    Program output follows
• Each error produces similarly prefixed output, with
  ◦ The type of the error (e.g. “Invalid write of size 4”)
  ◦ A stack trace indicating where the error occurred
  ◦ Information about the memory location
    - E.g. a stack trace of where it was allocated, or who last free’d it, etc.
• At the end, you’ll see:
    ==3949== ERROR SUMMARY: 67 errors from 10 contexts (suppressed: 30 from 2)
    ==3949== malloc/free: in use at exit: bytes in 3858 blocks.
    ==3949== malloc/free: allocs, frees, bytes allocated.
    ==3949== For a detailed leak analysis, rerun with: --leak-check=yes
    ==3949== For counts of detected errors, rerun with: -v
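
If you want to act on the summary automatically, for example to refuse a release tag when errors are reported, the ERROR SUMMARY line can be picked apart with a few lines of code. The sketch below assumes the line format shown above for valgrind-1.0.1; other versions may print it differently. The names ErrorSummary and parse_error_summary are hypothetical helpers, not part of Valgrind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper type: counts taken from the ERROR SUMMARY line.
struct ErrorSummary {
    int errors = -1;
    int contexts = -1;
};

// Parses a line such as
//   ==3949== ERROR SUMMARY: 67 errors from 10 contexts (suppressed: 30 from 2)
// Returns false if the line is not an ERROR SUMMARY line in that format.
bool parse_error_summary(const std::string& line, ErrorSummary& out) {
    const std::string marker = "ERROR SUMMARY:";
    const std::size_t pos = line.find(marker);
    if (pos == std::string::npos) return false;

    int errors = 0;
    int contexts = 0;
    const char* rest = line.c_str() + pos + marker.size();
    if (std::sscanf(rest, " %d errors from %d contexts", &errors, &contexts) != 2)
        return false;

    out.errors = errors;
    out.contexts = contexts;
    return true;
}
```

A wrapper script or test driver could feed each line of the Valgrind log to parse_error_summary and fail the run when errors is non-zero.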

Error Output

• Example
    ==3949== Invalid write of size 4
    ==3949==    at 0x40460BEA: IgApplication::run(int, char **) (…/IgApplication.cc:148)
    ==3949==    by 0x80496FA: main (…/Iguana.cpp:19)
    ==3949==    by […]
    ==3949== Address 0x410BC1D4 is 0 bytes after a block of size 4 alloc’d
    ==3949==    at 0x : __builtin_vec_new (…/vg_clientfuncs.c:156)
    ==3949==    by 0x404608A6: IgApplication::run(int, char **) (…/IgApplication.cc:103)
    ==3949==    by 0x80496FA: main (…/Iguana.cpp:19)
    ==3949==    by […]
• Which means:
  ◦ The error type is “Invalid write” (not a valid address) “of size 4” (four bytes, thus either a pointer, an int or a float)
  ◦ The location of the error is in IgApplication.cc, line 148 (in method IgApplication::run(int, char **)); this is followed by a stack trace, by default four levels deep, but you can change the depth with a command-line option
  ◦ The memory accessed was four bytes at 0x410BC1D4, which was 0 bytes after a block of 4 bytes that was allocated by IgApplication.cc, line 103 (= off-by-one error at the end of the memory block)
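
The kind of code that produces such a report can be sketched as follows. This is a hypothetical reconstruction of the pattern only: the real IgApplication code is not shown in these slides, the function name make_squares is invented, and a 4-byte int is assumed. The faulty loop bound is left in a comment so that the code as written is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical example: allocates n ints with new[] and fills them.
std::unique_ptr<int[]> make_squares(std::size_t n) {
    std::unique_ptr<int[]> block(new int[n]);  // n * sizeof(int) bytes

    // Faulty variant (do not use):
    //     for (std::size_t i = 0; i <= n; ++i)
    // Its last iteration writes block[n], one element past the end.
    // With n == 1 and a 4-byte int this is a write of size 4 at an
    // address "0 bytes after a block of size 4".
    for (std::size_t i = 0; i < n; ++i)
        block[i] = static_cast<int>(i * i);

    return block;
}
```

The "alloc'd" trace in the report points at the new[] line, and the "Invalid write" trace points at the assignment inside the loop.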

Valgrind Errors #1

• Invalid read/write of size X
  ◦ The memory address is not valid
    - It is not part of program image, stack or dynamically allocated memory
    - Usually either an access past the end of an existing block
        The address detail says X bytes after a block of size Y, plus stack trace of where that block of memory was allocated
    - … or an access to already freed memory
        The address detail tells where the memory was freed
    - … or something completely bogus (beyond current stack top, …)
        Returning pointers to locals out of scope or using other garbage
  ◦ The “of size X” tells how many bytes the machine instruction was trying to access; it may help to deduce what type of data was accessed
    - Size 1 usually implies char (or byte) data
    - Size 2 usually implies short
    - Size 8 usually implies double
    - Size 4 could be an int, float or a pointer
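
As an illustration of the "pointers to locals out of scope" case, here is a small sketch. The function names are hypothetical. The faulty variant is shown only in a comment, and whether a particular access through such a pointer is reported depends on where the stack top is at the time of the access.

```cpp
#include <cassert>
#include <string>

// Faulty variant (do not use): returns the address of a local array
// whose lifetime ends when the function returns.
//
//     const char* make_label_bad() {
//         char buffer[16] = "run";
//         return buffer;   // dangling as soon as we return
//     }

// Safer variant: return the data by value so the caller owns a copy.
std::string make_label(int run) {
    return "run-" + std::to_string(run);
}
```

Returning by value removes the dangling pointer altogether, so there is nothing left for the tool to flag.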

Valgrind Errors #2

• Conditional jump or move depends on uninitialised value(s)
  ◦ An attempt to use a value that derives from uninitialised memory
  ◦ Valgrind permits the program to copy junk (= uninitialised data) to its heart’s content; only using uninitialised values in computations causes errors
    - Locating the origin of an uninitialised value may be non-trivial
    - However: most causes are relatively easy to determine
        Uninitialised member variables
        Uninitialised local variables
        Contents of dynamically allocated memory not cleared before use
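
A sketch of the most common cause, an uninitialised member variable, and its fix. The Track type and count_selected function are hypothetical. Without the member initialisers, copying Track objects around would go unreported, and the report would only appear at the `if`, which is the conditional jump that depends on the value.

```cpp
#include <cassert>

// Hypothetical type. The default member initialisers are the fix:
// without them, a default-constructed Track has junk in both members.
struct Track {
    double pt = 0.0;
    bool selected = false;
};

// The branch on `selected` is where an uninitialised member would be
// reported, possibly far away from where the Track was created.
int count_selected(const Track* tracks, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (tracks[i].selected)
            ++count;
    }
    return count;
}
```

Because the error is reported at the point of use, the stack trace identifies count_selected, and the missing initialisation has to be traced back from there.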

Valgrind Errors #3

• Illegal free
  ◦ An attempt to free memory that isn’t dynamically allocated
  ◦ Valgrind keeps track of every dynamically allocated block
    - It knows exactly which arguments to free, delete are valid
  ◦ Typical reasons include
    - Attempting to double delete objects
    - Attempting to delete objects not on the heap (= statically or stack-allocated objects)
    - Attempting to free an object that wasn’t allocated (e.g. an interior pointer to another object, or a totally garbage pointer value)
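
Double deletes usually come from two pieces of code that both believe they own the same object. One way to rule this out, sketched here with hypothetical names (Hit, make_hit, consume) and using std::unique_ptr from current C++, is to make ownership explicit so that exactly one owner exists at any time.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type.
struct Hit {
    int channel;
};

// Faulty pattern (do not use): two raw owners of one object.
//     Hit* a = new Hit{5};
//     Hit* b = a;
//     delete a;
//     delete b;   // second delete of the same block: illegal free

// The factory hands out the single owner.
std::unique_ptr<Hit> make_hit(int channel) {
    return std::unique_ptr<Hit>(new Hit{channel});
}

// Taking the unique_ptr by value transfers ownership; the Hit is
// deleted exactly once, when `hit` goes out of scope here.
int consume(std::unique_ptr<Hit> hit) {
    return hit->channel;
}
```

After `consume(std::move(h))` the caller's pointer is null, so an accidental second delete cannot happen.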

Valgrind Errors #4

• Mismatched free() / delete / delete []
  ◦ An attempt to free memory in a way incompatible with its allocation
    - Memory allocated with malloc() must be freed with free()
    - Memory allocated with new must be freed with delete
    - Memory allocated with new[] must be freed with delete[]
  ◦ Valgrind knows for each memory block how it was allocated and produces an error when you use an incompatible deallocation method
    - In general, if you encounter new[]/delete mismatches, the correct fix is to remove manual allocation completely and to use std::vector or some other appropriate container!
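
A minimal sketch of the recommended fix: replace the manual new[] with std::vector, so that there is no delete left to get wrong. The function names make_weights and sum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Faulty pattern (do not use):
//     double* weights = new double[n];
//     ...
//     delete weights;     // mismatch: must be delete[] weights

// With std::vector the storage is released automatically and correctly.
std::vector<double> make_weights(std::size_t n, double value) {
    return std::vector<double>(n, value);
}

double sum(const std::vector<double>& weights) {
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i)
        total += weights[i];
    return total;
}
```

The container also carries its own size, which removes a second common source of errors: passing the wrong length alongside a raw pointer.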

Valgrind Errors #5

• Syscall param write(buf) contains uninitialised or unaddressable byte(s)
  ◦ Similar to the previous errors, but here the program passes invalid or uninitialised memory to a system call instead of accessing it itself
  ◦ Valgrind inspects memory parameters at system call boundaries and checks them as if the program were making the accesses itself
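
A frequent source of this report is writing a whole struct to a file descriptor when some of its bytes, typically compiler-inserted padding, were never set. The sketch below uses the POSIX write() call; the Record type and write_record function are hypothetical, and the presence of padding after `tag` is an assumption about the usual struct layout.

```cpp
#include <cassert>
#include <cstring>
#include <unistd.h>

// Hypothetical record. On common ABIs the compiler inserts padding
// bytes between `tag` and `value`.
struct Record {
    char tag;
    int value;
};

// Writes one record to fd; returns true if all bytes were written.
bool write_record(int fd, char tag, int value) {
    Record rec;
    // Without this memset the padding bytes stay uninitialised and
    // write() would be handed a buffer with undefined bytes in it.
    std::memset(&rec, 0, sizeof rec);
    rec.tag = tag;
    rec.value = value;
    return ::write(fd, &rec, sizeof rec) == static_cast<ssize_t>(sizeof rec);
}
```

Clearing the struct first is the simplest remedy. Writing the members individually is an alternative when the on-disk layout must not depend on the compiler's padding.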

Valgrind Errors #6

• Valgrind suppresses errors after a certain limit
  ◦ Valgrind reports only unique errors. After a certain number of errors is reached, it becomes more conservative and begins to suppress errors. After even more errors, it stops reporting errors completely.
  ◦ Both limits are controlled by command-line options; if you want to keep looking for errors, use those options to let valgrind continue (if the errors are in code you can’t modify, try using suppression files to squelch errors from those modules)
  ◦ If you see
      More than 300 errors detected. I’m not reporting any more. Final error counts may be inaccurate. Go fix your program!
    …do as it says!

Tips

• Build the code you analyse with debugging information
  ◦ Not required, but produces more accurate information for stack traces
    - With GCC, debugging information does not affect code generation!
  ◦ Rebuild the modules in your work area with
      scram build CXXUSERFLAGS="-g"
  ◦ Optionally: build without optimisation to get even more accurate stack traces (avoids having to wade through 15 levels of inlining)
    - Remove the “CXXFLAGS += -O2” from config/compilers.mk
• Test early and often
  ◦ Make valgrind part of your standard development and debugging routine
  ◦ Suggestions to subsystem coordinators
    - Run valgrind regularly, especially before submitting tags for releases
    - Don’t submit new code to releases if valgrind reports errors
        Don’t use ORCA as a garbage generator: “dd if=/dev/urandom of=blah.ntpl” is faster and incurs much less general frustration

Limitations

• The tool makes very few errors!
  ◦ Errors that are understood but that we can do nothing about (e.g. in system libraries), and potential false alarms, can be filtered out with the suppression mechanism
• Most important issue: valgrind is a dynamic checking tool
  ◦ The checks made depend on the code covered
  ◦ Thus: the more tests, the better
    - Provide as many tests as possible, from large programs down to unit tests
    - First write some tests, then improve the ones you have so that they cover more
    - You can use OVAL tests with valgrind and vice versa
    - Make sure test programs cover pathological cases, not just the easy paths through the code that everybody checks for anyway
  ◦ Add more tests! They pay off hugely, and more so the longer we run!

Links

• Valgrind home page
• Julian Seward’s presentation on valgrind
• Valgrind at CERN CMS
  ◦ Installed in /afs/cern.ch/cms/system
    - bin/valgrind, cachegrind : front-end executables
    - bin/vg_annotate : tool for presenting cachegrind results
    - lib/valgrind : valgrind’s own files
• What’s this error I don’t understand?
  ◦ Read the user guide, it is good!
  ◦ Send a mail: CCS developers will answer
  ◦ Report bugs through http://cmsdoc.cern.ch/cgi-cmc/BugsRS/bug_reporting_system.cgi