Tools and Techniques for Higher Reliability Software
FOSDEM 2013 – Ada Developer Room
Philippe Waroquiers, Eurocontrol/DNM
3 February 2013
Eurocontrol
- European Organisation for the Safety of Air Navigation
- International organisation, 39 member states
- Multiple activities/directorates/…
  - Participates in/supports big European projects
  - Central Route Charges Office
  - Directorate Network Management
  - …
- More info:
Directorate Network Management
- Air Traffic Management “Network Management” = services of general interest for European aviation
- European air route design
- Flight plan processing
- Flow management
- Scarce resources management
  - Radio frequencies
  - SSR codes
- Crisis management (remember the 2010 volcanic ash crisis)
- …
Flight plan processing, Flow management, …
- Flight plan processing over the whole of Europe (IFPS)
  - Aircraft Operators send flight plans to IFPS
  - Flight plans are verified, corrected if needed, and redistributed to airspace control centres, airports and Aircraft Operators
- Flow Management (ETFMS): balancing demand and capacity
  - Safety: avoid Air Traffic Control overload
  - Efficiency: best use of ATC capacity, minimise delays
- ENV: “airspace data management system”
- Data warehouse
- User interfaces for external users
- Web portal
- …
[Figure: D Trajectory, alternate routes]
[Figure: Vertical Trajectory]
[Figure: Differences between radar plots and plan]
[Figure: Recomputed with radar plots]
[Figure: Macroscopic view of Europe]
ETFMS & IFPS
- Sophisticated systems: around 2 million SLOC of Ada
- Reliability requirements
  - If IFPS is down: no flight plan processing in Europe!
  - If ETFMS is down: passengers will sleep in aerodromes!
  - Duplicated hardware, duplicated sites, contingency systems, …
- Performance requirements
  - ETFMS handles 3 million messages per day
  - A message sometimes implies complex processing (e.g. recomputing a flight route)
- Safety requirements
  - Various obligations about people, procedures and systems
  - E.g. Software Assurance Level (SWAL)
  - Safety audits
Better to have no critical bugs in critical systems …
- Use of uninitialised data
- Memory leaks
- Dangling pointers
- Buffer overflows
- Race conditions
- Performance issues
- Memory use problems
- …
But how to avoid/find/eliminate such bugs?
- People (qualifications, training, …)
- Procedures (code review, coding standards, …)
- Testing (unit testing, integration testing, user acceptance tests, shadow operations, security audits, …)
- But also TOOLS
  - The Ada language is a main asset to avoid many such bugs
    - Thanks to early detection at compilation time
    - Thanks to run-time checks showing bugs during early testing
  - Valgrind is a main asset to find and eliminate the remaining bugs
What is Valgrind?
- Valgrind = framework to build runtime analysis tools + a set of tools
  - Framework: about 400 KSLOC
  - Tools: between 3 and 22 KSLOC each
- Tools:
  - Memcheck
  - Callgrind
  - Helgrind
  - DRD
  - Massif
  - exp-sgcheck
  - …
Use of uninitialised data (1)
- Memcheck --undef-value-errors=yes (default value)
  - Reports an error if the use of an undefined value would change the behaviour of the program
- Ada language: pragma Normalize_Scalars
  - All scalars that are not explicitly assigned are automatically given a value (an invalid one if possible)
  - Run-time checks will detect the use of an invalid value
- GNAT: pragma Initialize_Scalars
  - More flexible version of Normalize_Scalars
  - The initial scalar value can be controlled
  - Flexibility about which run-time checks are done, and when
Use of uninitialised data (2)
- Memcheck detects a bug even if there is no invalid value
- Initialize_Scalars
  - Detects a bug only if the type has an invalid value, i.e. a representable value outside the range of the type
  - Otherwise, runs with different initial values can expose the use of uninitialised data
- Initialize_Scalars is faster than memcheck
  - -O0 + all checks on + Initialize_Scalars is only 2x slower than -O2 + standard Ada Reference Manual checks (these checks detect the most horrible/random behaviour)
- At Eurocontrol:
  - Day-to-day development is done with Initialize_Scalars
  - Some “shadow operational” testing periods run with Initialize_Scalars
  - Week-end builds are validated with memcheck
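The invalid-default idea behind Normalize_Scalars and Initialize_Scalars can be imitated by hand in a language without such pragmas. The following is only a minimal C sketch of the principle, not of how GNAT implements it: a range-constrained scalar gets an out-of-range default, and every use goes through a validity check. All names (`month_t`, `MONTH_INVALID`, `month_is_valid`, `month_days`) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range-constrained scalar: valid values are 1..12. */
typedef int month_t;

enum { MONTH_FIRST = 1, MONTH_LAST = 12 };

/* Out-of-range default (an assumption of this sketch). It plays the role
   of the invalid value given to scalars that were never assigned. */
enum { MONTH_INVALID = -1 };

/* Run-time validity check, loosely comparable to an Ada range check. */
bool month_is_valid(month_t m)
{
    return m >= MONTH_FIRST && m <= MONTH_LAST;
}

/* Days in month m of a non-leap year, or -1 when m is not a valid month. */
int month_days(month_t m)
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    /* The table must cover exactly the valid range. */
    assert(sizeof days / sizeof days[0] == MONTH_LAST - MONTH_FIRST + 1);

    if (!month_is_valid(m))
        return -1;              /* use of a never-assigned value is caught here */
    return days[m - MONTH_FIRST];
}
```

A variable declared as `month_t m = MONTH_INVALID;` and used before assignment is then rejected by `month_days`. As the slide notes, this only works when the type has a spare value outside its range; a scalar that uses every bit pattern has no such sentinel, which is where memcheck still helps.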
Memory leaks
- Avoid by using Ada constructs
  - Often, Ada constructs make it possible to avoid using the heap
    - E.g. record discriminants, OO types without heap, arrays, …
  - Otherwise, manage the heap in a “safe” way: controlled types, storage pools
    - Not always possible (CPU, memory)
- Detect with gcc/gnat debug pools (GNAT.Debug_Pools)
  - “pre-processing” + recompile
- Detect with memcheck --leak-check=full
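To make the "debug pool" idea concrete, here is a deliberately tiny C sketch of an allocator wrapper that counts live blocks. It is an illustration under stated assumptions, not the GNAT.Debug_Pools implementation, which does considerably more. The names `tracked_alloc`, `tracked_free` and `tracked_live_blocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks obtained from tracked_alloc and not yet freed. */
static size_t live_blocks = 0;

/* Allocate size bytes and count the block as live on success. */
void *tracked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Free a block obtained from tracked_alloc; NULL is ignored, as with free. */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_blocks > 0);    /* more frees than allocations is a bug */
    live_blocks--;
    free(p);
}

/* Blocks still live; a non-zero value at program exit indicates a leak. */
size_t tracked_live_blocks(void)
{
    return live_blocks;
}
```

Like a debug pool, this requires the program to be changed and recompiled so that allocations go through the wrapper. Memcheck needs neither, and its leak report also gives the allocation stack trace of each leaked block, which a plain counter cannot provide.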
Dangling pointers
- Avoid by using Ada constructs: same as for avoiding memory leaks
- Detect with gcc/gnat debug pools
- Detect with memcheck
- Detect with gcc “address sanitizer” option
  - New functionality, will be in gcc 4.8
  - Need to recompile
  - Not (yet) tried at Eurocontrol
Buffer overflows (1)
- Ada arrays are first class citizens
  - ‘range, ‘first, ‘last, … avoid buffer overflows
  - Arrays always carry their bounds
- Detect with Ada: the standard mandates array index verification
  - All array overflows are detected before damage is done
  - A buffer overflow results in a run-time exception => no “random behaviour”
- Very small overhead. Measured on a representative program (compiled with optimisation): less than 2% for all standard Ada Reference Manual checks (the buffer overflow checks are a part of these Ada RM checks).
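What "arrays always carry their bounds" buys can be sketched in C, where it has to be done by hand. This is a minimal illustration only: the type `int_array` and the functions `int_array_make`, `int_array_get` and `int_array_set` are hypothetical, and an out-of-range index is reported through a return value rather than an exception as in Ada.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array view that carries its bounds, like an Ada array. */
typedef struct {
    int *data;   /* storage for indices first..last      */
    int  first;  /* lowest valid index  (Ada: 'First)    */
    int  last;   /* highest valid index (Ada: 'Last)     */
} int_array;

/* Wrap caller-owned storage holding (last - first + 1) elements. */
int_array int_array_make(int *storage, int first, int last)
{
    assert(storage != NULL && first <= last);
    int_array a = { storage, first, last };
    return a;
}

/* Checked read: returns false and leaves *out untouched when index
   is outside first..last. */
bool int_array_get(const int_array *a, int index, int *out)
{
    if (index < a->first || index > a->last)
        return false;
    *out = a->data[index - a->first];
    return true;
}

/* Checked write: returns false and writes nothing when index is
   outside first..last. */
bool int_array_set(int_array *a, int index, int value)
{
    if (index < a->first || index > a->last)
        return false;
    a->data[index - a->first] = value;
    return true;
}
```

The check costs two comparisons per access, which is consistent with the small overhead reported on the slide. The difference from Ada is that here nothing forces callers to go through the checked functions, whereas the Ada compiler inserts the check on every indexing operation.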
Buffer overflows (2)
- Detect (not needed with Ada) with Memcheck
  - Detects (most) buffer overflows in heap allocated blocks
  - No detection in global or stack memory, or “inside” a struct
- Detect (not needed with Ada) with exp-sgcheck
  - Experimental tool detecting stack and global overruns
  - No detection “inside” a struct
- Detect (not needed with Ada) with gcc “address sanitizer” option
  - Will be in gcc 4.8, not (yet) tried at Eurocontrol
  - No detection “inside” a struct
- Only the Ada run-time checks detect all buffer overflows
  - E.g. “inside” record (struct) components
Race conditions
- Avoid by using Ada constructs
  - Ada tasks (threads) are first class citizens
  - Many constructs help to avoid race conditions
    - Rendez-vous, protected objects, …
  - Ada multi-tasking constructs are easy to use and at a higher abstraction level (or at least easier to use than pthreads)
    - E.g. protected objects
- Detect by using helgrind (or drd)
  - Helgrind used very successfully at Eurocontrol
- Detect by using gcc “thread sanitizer” option
  - New functionality, will be in gcc 4.8
  - Need to recompile
  - Not (yet) tried at Eurocontrol
Performance issues
- Callgrind: where is my CPU time spent?
  - It can measure a lot more
  - E.g. memory cache misses, using a cache simulator
- Callgrind is the main tool used at Eurocontrol to tune performance
- Kcachegrind: amazing visualisation tool for callgrind output
[Figures: Kcachegrind]
Memory use analysis
- Memcheck
  - Reports “delta memory” usage between two memory scans
  - Reports can be triggered from the program or from the shell
- Massif
  - Shows the evolution of memory use over time
  - Produces reports at regular intervals or on request
- exp-dhat
  - Shows whether heap allocated memory is “accessed” a lot
  - E.g. can report memory that is allocated and then not used anymore
- Memcheck and Massif are used at Eurocontrol
Feedback from Valgrind use at Eurocontrol
- Very easy to use
  - No re-compilation, no re-linking, works with closed source libs, …
- Many powerful/advanced functionalities
- But, depending on the tool: times slower 2.. xxx+ more memory
- Eurocontrol applications are big/heavy
  - Encountered very high memory and CPU use by Valgrind
  - => several optimisations/additional functionalities added to Valgrind
Valgrind NEWS
- One or two new major releases per year
  - New platforms, support for new instructions, …
  - New functionalities, new tools, …
  - Optimisation in CPU or memory, …
  - Bug fixes
- Easy to get and compile new versions
  - Get the last released version on
  - Next (unreleased) version:
      svn co svn://svn.valgrind.org/valgrind/trunk valgrind
      cd valgrind
      ./autogen.sh
      ./configure --prefix=...
      make
      make install
Valgrind NEWS
- Current release :
- Next release under development :
- We will discuss NEWS from recent releases and from the next release
- Functionality not yet released is shown in orange (will be in 3.9.0)
Valgrind NEWS: platforms
- Started on Linux/x86
- Now available on
  - Linux/x86, amd64, ppc32, ppc64, arm, s390, mips32
  - Android/arm, x86
  - MacOS/x86, amd64
- Support for new instructions
  - E.g. SSE, AVX, AES
  - E.g. ppc Decimal Floating Point instructions
- Support for new distributions and glibc versions
Valgrind NEWS: improved leak functionality
- Limitations so far
  - A memcheck leak suppression suppresses all leak kinds
    - E.g. an entry aimed at suppressing “possible leak” also suppresses “definite leak”
  - Dangling pointer errors only report the “freed at” stack trace
- Improvements
  - A suppression optionally indicates the kinds of leaks to suppress
  - Command line arguments to control output and/or exit code
    - --show-leak-kinds=kind1,kind2,…
    - --errors-for-leak-kinds=kind1,kind2,…
  - --keep-stacktraces=alloc|free|alloc-and-free|alloc-then-free|none
    - Can report more stack traces in a dangling pointer error
    - Or can optimise memory by recording fewer or no stack traces (e.g. if not interested in some error kinds)
  - --merge-recursive-frames=<number>
    - Useful to limit the number of recorded stack traces by merging recursive calls
Valgrind NEWS: gdb server (1)
- The gdb server makes a program running under Valgrind fully debuggable
  - Connect with GDB to the Valgrind gdb server
  - GDB can then
    - Insert breakpoints, (unlimited) watchpoints, …
    - Examine the list of threads/tasks
    - Examine the value of variables
    - Continue/interrupt execution
    - …
- The Valgrind gdb server provides “monitor commands”
  - They trigger Valgrind functionalities from GDB (or from the shell command line)
  - E.g. for memcheck: leak search, checking definedness, …
Valgrind NEWS: gdb server (2)
- memcheck monitor commands
  - get_vbits <addr> [<len>]
    - Returns validity bits for <len> (or 1) bytes at <addr>
    - Bit values: 0 = valid, 1 = invalid, __ = unaddressable byte
    - Example: get_vbits 0x8049c78 10
  - make_memory [noaccess|undefined|defined|Definedifaddressable] <addr> [<len>]
    - Marks <len> (or 1) bytes at <addr> with the given accessibility
  - check_memory [addressable|defined] <addr> [<len>]
    - Checks that <len> (or 1) bytes at <addr> have the given accessibility and outputs a description of <addr>
Valgrind NEWS: gdb server (3)
- memcheck monitor commands
  - leak_check [full*|summary] [kinds kind1,kind2,...|reachable|possibleleak*|definiteleak] [increased*|changed|any] [unlimited*|limited <max_loss_records_output>]
    - * = defaults
    - where kind is one of: definite indirect possible reachable all none
    - Examples:
        leak_check
        leak_check summary any
        leak_check full kinds indirect,possible
        leak_check full reachable any limited 100
  - block_list <loss_record_nr>
    - After a leak search, shows the list of blocks of <loss_record_nr>
  - who_points_at <addr> [<len>]
    - Shows places pointing inside <len> (default 1) bytes at <addr>
    - With len 1, only shows "start pointers" pointing exactly to <addr>; with len > 1, also shows "interior pointers"
Valgrind NEWS: tune red zones size
- Red zone = protection zone before/after a malloc-ed block
  - Allows detecting buffer over/under-flows
  - If too small: less chance to detect a bug
  - If too big: uses too much memory
- Command line options to increase/decrease the size
  - --redzone-size=<number>
    - Size for client (application) malloc’ed blocks
  - --core-redzone-size=<number>
    - Size for Valgrind internal malloc’ed blocks
- No buffer overflows with Ada => use a minimal red zone
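The red zone trade-off can be illustrated with a small C sketch. Note the simplification: memcheck marks its red zones as unaddressable and reports the faulty access when it happens, whereas this sketch fills the zones with a known byte and checks them afterwards. The names `rz_alloc`, `rz_intact`, `rz_free` and the sizes used are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout of one block (sizes are assumptions of this sketch):
   [payload size | front red zone] [payload] [back red zone]
   The prefix is 32 bytes so the payload keeps malloc's alignment. */
#define PREFIX_SIZE  ((size_t)32)
#define REDZONE_SIZE ((size_t)16)
#define REDZONE_BYTE 0xA5

/* Allocate size payload bytes surrounded by filled red zones. */
void *rz_alloc(size_t size)
{
    if (size > SIZE_MAX - PREFIX_SIZE - REDZONE_SIZE)
        return NULL;
    unsigned char *raw = malloc(PREFIX_SIZE + size + REDZONE_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    memset(raw + sizeof size, REDZONE_BYTE, PREFIX_SIZE - sizeof size);
    memset(raw + PREFIX_SIZE + size, REDZONE_BYTE, REDZONE_SIZE);
    return raw + PREFIX_SIZE;
}

/* True when neither red zone of block p has been overwritten. */
bool rz_intact(const void *p)
{
    assert(p != NULL);
    const unsigned char *raw = (const unsigned char *)p - PREFIX_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof size);
    for (size_t i = sizeof size; i < PREFIX_SIZE; i++)
        if (raw[i] != REDZONE_BYTE)
            return false;       /* under-flow into the front red zone */
    for (size_t i = 0; i < REDZONE_SIZE; i++)
        if (raw[PREFIX_SIZE + size + i] != REDZONE_BYTE)
            return false;       /* over-flow into the back red zone */
    return true;
}

/* Release a block obtained from rz_alloc; NULL is ignored. */
void rz_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - PREFIX_SIZE);
}
```

The sketch makes the slide's trade-off visible: a write that lands beyond the 16-byte back zone goes unnoticed, while enlarging the zones adds a fixed memory cost to every block.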
Valgrind NEWS: support for other malloc libs
- Command line option --soname-synonyms=… adds support for non-libc malloc libraries or statically linked libs
  - --soname-synonyms=somalloc=*tcmalloc*
    - Support for all variants of tcmalloc shared libraries
  - --soname-synonyms=somalloc=NONE
    - Support for a statically linked malloc library
Valgrind bad NEWS: failure to develop, help needed
- Valgrind serialises thread execution
  - In other words, on a multi-core machine, Valgrind can only use one core
- A trial was done to make a “really” multi-threaded Valgrind
  - Many race conditions found (with Valgrind on Valgrind)
  - Some have been fixed
  - The “none” tool makes reasonable use of multiple cores
- Biggest (unsolved) blocking problem:
  - The Memcheck “VA bits” data structure is used for each memory access
  - Using locks to protect it is way too slow
  - Even using one atomic instruction is too slow
  - => ????
- Ideas/help welcome …
Reliable Software: other tools/approaches/…
- AdaControl: Ada coding rule checker
  - Developed initially for Eurocontrol; open source
  - Routinely used at Eurocontrol
- Static code analysers
  - CodePeer (AdaCore)
- Program provers
  - SPARK: annotated subset of Ada
- Ada 2012 contracts
- …
Conclusion: Reliable Software
- Reliable software is obtained by combining various techniques and tools
  - Use a safe language, i.e. Ada
  - Complement this with tools
- Valgrind is a main tool used at Eurocontrol
  - Use it, you will like it
Questions?