Using Likely Invariants for Automated Software Fault Localization
Swarup Kumar Sahoo, John Criswell, Chase Geigle, Vikram Adve
Department of Computer Science, University of Illinois at Urbana-Champaign

Motivation
- Software bugs cost about $59.5 billion annually, roughly 0.6% of the GDP [NIST report]
- Windows 2000: 35M LOC, 63,000 known bugs at release time, about 2 per 1,000 lines [Mary Jo Foley]
- In 2006, a Mozilla developer admitted that almost 300 bugs appear in their Bugzilla every day [Anvik et al.]

Motivation
- The cost of fault localization increases exponentially over the software life cycle
- Bug-fix cost is very high during the operational/maintenance phase
- It is important to fix bugs quickly, before the application ships
(Source: Barry Boehm, "Equity Keynote Address", 2007, and Stefan Priebsch, "Advanced OOP and Design Pattern", Codeworks 2009)

Motivation - 3
- Debugging: the process of eliminating a software failure
- Automatic fault localization
  - Automatically identify the root cause responsible for a failure
- Automatic fault localization can reduce development cost and time
Debugging steps: Reproduce failure -> Locate and understand root cause -> Fix root cause

Goal
- Need an automated system to detect the root cause of bugs
- It should be efficient and scalable, and report few false positives
[Figure labels: Program (.C), Faulty Input, Bug Localization Tool]

Contributions
- Novel diagnosis mechanism for automated bug localization
  - Likely invariants trained on auto-generated inputs "close to" the failing input
  - First to combine an invariant-based approach with dynamic slicing in software
  - Two novel heuristics for reducing false positives
- Evaluated on 8 bugs in Squid, Apache, and MySQL
  - The tool reports only 5 to 17 program expressions as candidate root causes

Intuition
Two runs … Compare invariants

Diagnosis with Invariants
- We use likely range invariants to find potential root causes
  - A range invariant is the range of values computed by a program instruction in correct runs
  - Violated invariants give us a set of candidate root causes
- Invariants are placed on load values, store values, and return values

Value Type   Example Instruction         Example Invariant
Return       return int %weekday         0 <= %weekday <= 6
Load         %value = load int* %p       0 <= %value
Store        store int %val, int* %q     100 <= %val <= 100

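The idea of a likely range invariant can be sketched in a few lines. This is a minimal illustration, not the tool's implementation (which instruments LLVM load, store, and return instructions). The function names, the site labels, and all values in the traces are invented for the example.

```python
from typing import Dict, Iterable, List, Tuple

# A trace is a sequence of (instrumentation site, observed value) pairs.
Trace = Iterable[Tuple[str, int]]


def train_ranges(good_traces: Iterable[Trace]) -> Dict[str, Tuple[int, int]]:
    """Record the min and max value seen at each site across passing runs."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for trace in good_traces:
        for site, value in trace:
            lo, hi = ranges.get(site, (value, value))
            ranges[site] = (min(lo, value), max(hi, value))
    return ranges


def failed_invariants(ranges: Dict[str, Tuple[int, int]], bad_trace: Trace) -> List[str]:
    """Return the sites whose value in the failing run leaves the trained range."""
    failed: List[str] = []
    for site, value in bad_trace:
        if site in ranges:
            lo, hi = ranges[site]
            if not lo <= value <= hi and site not in failed:
                failed.append(site)
    return failed


# Hypothetical traces: two passing runs and one failing run.
good_runs = [
    [("week_day:return", 0), ("day_nr:return", 730000)],
    [("week_day:return", 6), ("day_nr:return", 730400)],
]
bad_run = [("day_nr:return", 0), ("week_day:return", -1)]

ranges = train_ranges(good_runs)
candidates = failed_invariants(ranges, bad_run)
```

Training on only a few runs keeps each range narrow, which is why the failing run violates many invariants and later filtering is needed.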
Key Insights
- Tighter likely invariants:
  - A compact (not precise) way to summarize and compare memory states
  - Efficiently isolate the initial candidate root causes to a small set
- Novel way to generate invariants
  - Automatically generate good inputs "close to" the failing input
  - Train the invariants on a few close good inputs
    - Much tighter and more relevant invariants
    - Missed root causes are less likely (though there may be many candidates)
- A sequence of novel filtering techniques to reduce false positives

Diagnosis Tool Architecture
[Figure: pipeline diagram]
- Inputs: program, failure-inducing (bad) input, optional input specification
- Stages: Generate Inputs -> Generate Invariants -> Test with Bad Input -> False-positive Filters
- Intermediate data: good inputs, bad inputs, instrumented program, failed invariants
- False-positive filters: Backward Slicing, Dependence Filtering, Multi-faulty Input Filter

Source Code Example

    long day_nr ( uint year, uint month, uint day ) {
      if ( year == 0 && … )
        return ( 0 ) ;
      delsum = 365 * year + … ;
      if ( month <= 2 )
        year-- ;
      else
        delsum -= ( month*4 + … ;
      temp = ( year / … ;
      return ( delsum + year/4 * temp ) ;
    }
    int week_day ( long daynr, bool first_weekday ) {
      return ( daynr + … ) % 7 ;
    }
    bool make_date ( … ) {
      weekday = week_day ( day_nr ( year, month, day ), 0 ) ;
      str->append ( …names [ weekday ] …. ;
    }

Failing MySQL input: SELECT DATE (" ",'%W %d %M %Y') as a

Source Code Example: Invariant Failures
This slide repeats the code of the Source Code Example and marks the invariant failures, which are the candidate root causes.

Insights for Training Invariants
- Use training inputs very "similar" to the failing input
  - Captures the key relevant differences between 2 similar runs
- Use very few "similar" good inputs to train invariants
  - Much tighter and more relevant invariants
  - Less likely to miss root causes
  - Though possibly many false-positive candidates

Training Input Construction: 2 Approaches
- Deletion-based, specification-independent approach
  - A variation of the ddmin algorithm [Zeller et al.]
  - Apply character-level rewriting/deletion
- Replacement-based, specification-dependent approach
  - The user gives an input specification
    - Tokens in the input grammar and alternative tokens for each token
  - Create variations for each input token depending on its type
  - Create inputs by varying 1 token at a time
  - Can be implemented by modifying the application's built-in parsers
  - Can be automated for a given input specification

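The deletion-based approach can be illustrated with a simplified sketch. This is not the ddmin algorithm itself and not the tool's variation of it: it only deletes one contiguous chunk of characters at a time, with chunk sizes halving, and keeps the variants that pass. The function name and the `passes` oracle (which would run the program and report whether the input is handled correctly) are hypothetical.

```python
from typing import Callable, List


def deletion_variants(failing_input: str,
                      passes: Callable[[str], bool],
                      max_variants: int = 10) -> List[str]:
    """Collect passing inputs made by deleting one chunk of the failing input."""
    found: List[str] = []
    chunk = max(1, len(failing_input) // 2)
    while chunk >= 1:
        for start in range(0, len(failing_input), chunk):
            # Remove the characters in [start, start + chunk).
            variant = failing_input[:start] + failing_input[start + chunk:]
            if variant and variant not in found and passes(variant):
                found.append(variant)
        chunk //= 2
    # Prefer variants closest to the failing input (fewest deleted characters).
    found.sort(key=lambda v: len(failing_input) - len(v))
    return found[:max_variants]


# Toy oracle for illustration only: the "program" fails when it sees "00".
good_inputs = deletion_variants("d=00;", lambda s: "00" not in s)
```

Sorting by the number of deleted characters keeps the good inputs that differ least from the failing one, which is the property the training step relies on.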
Replacement-Based Spec-Dependent Approach: Example

    SELECT NAME_CONST('flag',1) * MAX(a) FROM t1;
    SELECT NAME_CONST('flag',2) * MAX(a) FROM t1;
    SELECT NAME_CONST('flag',3) * MAX(a) FROM t1;
    SELECT NAME_CONST('flag',5) * MAX(a) FROM t1;
    SELECT NAME_CONST('flag',9) * MAX(a) FROM t1;
    SELECT NAME_CONST('flag',1) * SUM(a) FROM t1;
    SELECT NAME_CONST('flag',1) * MIN(a) FROM t1;
    SELECT NAME_CONST('flag',1) * AVG(a) FROM t1;
    SELECT NAME_CONST('flag',1) * STD(a) FROM t1;

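The queries on this slide vary one token at a time. A minimal sketch of that generation step follows, assuming the input has already been split into tokens by hand and that the specification is a plain mapping from a token to its alternatives. Both the tokenization and the `spec` format are assumptions made for illustration; the slides do not describe the tool's actual specification format.

```python
from typing import Dict, List


def one_token_variants(tokens: List[str],
                       alternatives: Dict[str, List[str]]) -> List[str]:
    """Build inputs that differ from the original in exactly one token."""
    variants: List[str] = []
    for i, token in enumerate(tokens):
        for alt in alternatives.get(token, []):
            if alt != token:
                variants.append("".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants


# Hand-made tokenization of the first query on the slide.
tokens = ["SELECT NAME_CONST('flag',", "1", ") * ", "MAX", "(a) FROM t1;"]
# Hypothetical specification: alternatives for the integer and the aggregate.
spec = {"1": ["2", "3", "5", "9"], "MAX": ["SUM", "MIN", "AVG", "STD"]}

variants = one_token_variants(tokens, spec)
```

With this input the sketch yields the eight variants listed on the slide after the first query.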
Selecting Candidate Root Causes
- Invariant generation:
  - Select a set of "close" good inputs based on edit distance
  - Run the instrumented program on the good inputs to generate invariants
- Candidate root cause selection:
  - Execute the program, with the invariants inserted, on the bad input
  - The failed invariants provide a set of candidate root causes

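Selecting "close" good inputs by edit distance might look like the sketch below. The slides do not say which edit distance is used, so character-level Levenshtein distance is an assumption here, and the function names and the parameter `k` are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def closest_good_inputs(bad: str, good: List[str], k: int) -> List[str]:
    """Return the k good inputs nearest to the failing input."""
    return sorted(good, key=lambda g: edit_distance(bad, g))[:k]


bad_input = "SELECT MAX(a) FROM t1;"
good_pool = ["SELECT MIN(a) FROM t1;", "SELECT 1;", "SELECT MAX(b) FROM t1;"]
training_set = closest_good_inputs(bad_input, good_pool, k=2)
```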
Filtering Techniques
- Dynamic Backward Slicing
  - Remove candidates that do not affect the symptom
- Dependence Filtering
  - Leverages the DFS traversal of the dynamic dependence graph (DDG) done during slicing
  - Discard a dependent failed invariant if there is no intervening passing invariant
  - Processing time is linear in the number of edges in the DDG
- Multiple Faulty Input Filter
  - Take the intersection of the root causes found for similar failing inputs

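The first and last filters can be sketched as graph reachability and set intersection. This is an illustrative simplification: the dependence graph is a plain dictionary mapping each node to the nodes it depends on, and every node name below is invented. The real tool works on a dynamic dependence graph of LLVM instructions.

```python
from typing import Dict, Iterable, List, Set


def backward_slice(deps: Dict[str, List[str]], symptom: str) -> Set[str]:
    """Return every node the symptom transitively depends on (plus itself)."""
    seen: Set[str] = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return seen


def slice_filter(candidates: Iterable[str],
                 deps: Dict[str, List[str]],
                 symptom: str) -> Set[str]:
    """Keep only the candidates that can affect the symptom."""
    return set(candidates) & backward_slice(deps, symptom)


def multi_input_filter(per_input: Iterable[Iterable[str]]) -> Set[str]:
    """Keep only the candidates reported for every similar failing input."""
    sets = [set(s) for s in per_input]
    return set.intersection(*sets) if sets else set()


# Hypothetical graph: "crash" depends on "c", which depends on "a".
deps = {"crash": ["c"], "c": ["a"], "d": ["a"]}
run1 = slice_filter({"a", "c", "d"}, deps, "crash")  # "d" cannot reach the crash
run2 = {"a", "e"}                                    # candidates from a second failing input
final = multi_input_filter([run1, run2])
```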
Dependence Filtering
[Figure: dependence chain leading to the crash symptom. Node labels: Inv Pass, Inv Fail, Inv Pass, Crash Symptom. Annotations: "A possible root cause", "Probably not a root cause".]

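Dependence Filtering
[Figure: dependence chain leading to the crash symptom. Node labels: Inv Pass, Inv Fail, Inv Pass, Inv Fail, Crash Symptom. Annotations: both failed invariants are marked "A possible root cause".]
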
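The rule illustrated by the two figures can be sketched as follows. The interpretation is an assumption: a failed invariant is discarded when some backward dependence path reaches another failed invariant without first crossing a passing invariant. The sketch runs one search per failed invariant, so it does not reproduce the single linear-time DFS pass mentioned in the slides. All node names are hypothetical.

```python
from typing import Dict, List, Set


def dependence_filter(deps: Dict[str, List[str]],
                      failed: Set[str],
                      passed: Set[str]) -> Set[str]:
    """Keep failed invariants that do not depend directly on another failure."""
    kept: Set[str] = set()
    for inv in failed:
        stack = list(deps.get(inv, []))
        seen: Set[str] = set()
        depends_on_failure = False
        while stack and not depends_on_failure:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in failed:
                # Reached an earlier failure with no passing invariant between.
                depends_on_failure = True
            elif node not in passed:
                # Nodes without an invariant are transparent: keep walking back.
                stack.extend(deps.get(node, []))
            # A passing invariant blocks the path, so it is not explored further.
        if not depends_on_failure:
            kept.add(inv)
    return kept


# Chain p1 -> f1 -> f2: f2 depends directly on the failed f1, so f2 is dropped.
first = dependence_filter({"f2": ["f1"], "f1": ["p1"]}, {"f1", "f2"}, {"p1"})
# Chain f1 -> p1 -> f2: a passing invariant intervenes, so both are kept.
second = dependence_filter({"f2": ["p1"], "p1": ["f1"]}, {"f1", "f2"}, {"p1"})
```

The intuition is that a failure fed directly by another failure is likely a propagated error, while a failure that appears after a passing check may be an independent fault.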
Source Code Example: Dependence Filtering
This slide repeats the code of the Source Code Example and labels the failed invariants as either "Probably not a root cause" or "A possible root cause".

Methodology: Characteristics of 8 Bugs
- Software bugs in 3 large server applications: Squid, Apache, and MySQL
- 12 bugs overall; the 4 missing-code bugs were not considered
- LLVM-2.6 (LLVM-gcc) was used to compile the programs and run our passes

Bug-Name           Symptom            Distance: Dyn #LLVM inst / Static #LOC / Static #Functions (digits run together)
Squid-len          Buffer Overflow    1262
SQL-int-overflow   Buffer Overflow    1873
SQL-convert        Incorrect Output
SQL-aggregate      Buffer Overflow    442
SQL-precision      Incorrect Output
SQL-mul-overflow   Incorrect Output
SQL-dataloss       Incorrect Output
Apache-overflow    Buffer Overflow

- The slice from root cause to symptom spans several functions
- This distance is high for incorrect-output bugs

Experimental Results: Bug Localization Effectiveness
[Table and chart: columns Bug-Name, #Invs, #Failed-Invs, #Final Candidate Root Causes; rows Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow.]
- The tool reports only 5 to 17 expressions as candidate root causes

Experimental Results: Effectiveness of Filters
[Table: columns Bug-Name, #Failed-Invs, Slice; rows Squid-len, SQL-int-overflow (9536), SQL-convert (9327), SQL-aggregate, SQL-precision, SQL-mul-overflow (8313), SQL-dataloss, Apache-overflow.]
% Reduction: Slice 80%

Experimental Results: Effectiveness of Filters
[Table: columns Bug-Name, #Failed-Invs, Slice, Dependence-filter; rows Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow.]
% Reduction: Slice 80%, Dependence-filter 58%

Experimental Results: Effectiveness of Filters
[Table: columns Bug-Name, #Failed-Invs, Slice, Dependence-filter, Multiple-faulty-inputs; rows Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow.]
% Reduction: Slice 80%, Dependence-filter 58%, Multiple-faulty-inputs 23%
- Slicing removes 80% of the false positives, dependence filtering 58%, and the multiple-faulty-input filter 23%

Experimental Results
- The initial candidate set contains hundreds of failed invariants
- Slicing removes nearly 80% of the false positives
- Dependence filtering removes 58% of the remaining false positives
- The final candidate set includes the root cause for 7 out of 8 bugs
  - The last filter removes the true root cause for the SQL-int-overflow bug
- The tool reports only 5 to 17 expressions as candidate root causes

Comparison With Tarantula and Ochiai
- These statistical techniques rank each statement using a suspiciousness formula
- They performed better for the SQL-int-overflow and SQL-precision bugs
- They are good for bugs where control flow diverges from the good runs
- Our approach is better than Tarantula/Ochiai in 6 out of 8 bugs

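For reference, the two baselines score a statement from how many failing and passing runs execute it. The sketch below uses the standard formulations of Tarantula and Ochiai from the literature; the slides do not spell the formulas out, and the function and parameter names are chosen here for illustration.

```python
import math


def tarantula(failed_s: int, passed_s: int,
              total_failed: int, total_passed: int) -> float:
    """Tarantula suspiciousness of a statement s.

    failed_s / passed_s: failing / passing runs that execute s.
    """
    fail_ratio = failed_s / total_failed if total_failed else 0.0
    pass_ratio = passed_s / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(failed_s: int, passed_s: int, total_failed: int) -> float:
    """Ochiai suspiciousness of a statement s."""
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom else 0.0
```

Both scores depend only on which runs execute a statement, which fits the slide's observation that these techniques do well when control flow diverges from the good runs.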
Summary
- The tool narrows the diagnosis to only 5 to 17 candidate root causes
- It generates invariants using auto-generated similar inputs
- Very few "similar" good inputs are used, to get tighter invariants
- Novel filters reduce false positives
Questions?