Using Likely Invariants For Automated Software Fault Localization
Swarup Kumar Sahoo, John Criswell, Chase Geigle, Vikram Adve
Department of Computer Science, University of Illinois at Urbana-Champaign

Motivation - 1
Software bugs cost ~$59.5 billion annually (about 0.6% of the GDP) [NIST Report]
Windows 2000: 35M LOC, 63,000 known bugs at release time, about 2 per 1000 lines [Mary Jo Foley]
In 2006, a Mozilla developer admitted that almost 300 bugs appear in their Bugzilla every day [Anvik et al.]

Motivation - 2
Cost of fault localization increases exponentially over the life cycle
Bug-fix cost is very high during the operational/maintenance phase
Important to fix bugs quickly, before the application ships
(Source: Barry Boehm, "Equity Keynote Address", 2007, and Stefan Priebsch, Advanced OOP and Design Pattern, Codeworks 2009)

Motivation - 3
Debugging: the process of eliminating a software failure
Automatic fault localization
– Automatically identify the root cause responsible for a failure
Automatic fault localization can reduce development cost and time
Debugging steps: Reproduce failure → Locate and understand root cause → Fix root cause

Goal
Need an automated system to detect the root cause of bugs
→ Efficient, scalable, and reporting few false positives
[Diagram: a C program and a faulty input are fed to the bug localization tool]

Contributions
Novel diagnosis mechanism for automated bug localization
– Likely invariants using auto-generated inputs "close to" the failing input
– First to combine the invariants approach with dynamic slicing in software
– Two novel heuristics for reducing false positives
Used 8 bugs in Squid, Apache, MySQL for evaluation
– Tool provides only 5 to 17 program expressions as candidate root causes

Outline
Motivation and Contributions
Problems With Existing Work
Bug Diagnosis Framework Details
Experimental Results
Future Work and Summary

Definitions
Locations of the root cause of a software bug:
– For experiments, all modified statements in the bug patches
False positives:
– All candidate root causes except the locations of true root causes
Likely program invariants:
– Program properties observed to hold in some set of successful runs
– Unlike sound invariants, may not hold for all possible executions

Problems With Existing Work - 1
Delta Debugging: a smart approach, but it doesn't scale [Zeller et al.]
– Comparing entire memory states of 2 runs is very expensive
Insight: likely invariants allow efficient comparison of runs

Existing Invariants Approaches - Problems [Dimitrov et al., Pytlik et al.]
Test inputs for training may not always be available
Coverage of test inputs is often low for training [Mockus et al.]
No solution to make likely invariants narrower or tighter
→ May miss the root cause

Key Insights for Improvement
Tighter likely invariants:
– Compact (not precise) way to summarize and compare memory states
– Efficiently isolates initial candidates of root causes to a small set
Novel way to generate invariants
– Automatically generated good inputs "close to" the failing input
– Few close good inputs to train invariants
→ Much tighter and more relevant invariants
→ Missed root causes less likely (though possibly many candidates)
Sequence of novel filtering techniques to reduce false positives

Diagnosis Tool Architecture
[Figure: tool pipeline]
Inputs: program, failure-inducing (bad) input, optional input specification
Generate Inputs → good inputs and bad inputs
Generate Invariants → instrumented program
Test with Bad Input → failed invariants
False-positive filters: Backward Slicing, Dependence Filtering, Multi-faulty Input Filter

Source Code Example - 1
Failing MySQL input: SELECT DATE ("0000-01-01",'%W %d %M %Y') as a
1 bool make_date ( uint year, … ) {
2   ……
3   if (month <= 2)
4     year--;
5   …
6   daynr = ………. ;
7   weekday = ( daynr + … ;
8   str->append ( … names [ weekday ] …. ;
9   ... }
Unsigned year becomes a large value instead of -1
Weekday turns negative → buffer overflow
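The wrap-around noted on this slide can be reproduced with modular arithmetic. This is a minimal sketch, assuming `uint` is 32 bits wide (the slide does not state the width); `decrement_uint32` is a hypothetical helper for illustration, not MySQL code.

```python
UINT32_MODULUS = 2 ** 32  # assumption: `uint` is a 32-bit unsigned integer


def decrement_uint32(value: int) -> int:
    """Decrement with C-style unsigned wrap-around."""
    return (value - 1) % UINT32_MODULUS


year = 0                       # year parsed from "0000-01-01"
year = decrement_uint32(year)  # models `year--` when month <= 2
print(year)                    # 4294967295 rather than -1
```

A value of this size then flows into the day-number computation, which is how the later weekday index can end up outside the expected 0 to 6 range.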

Source Code Example – 2
 1 long day_nr ( uint year, uint month, uint day ) {
 2   if ( year == 0 && … )
 3     return ( 0 ) ;
 4   delsum = 365 * year + … ;
 5   if (month <= 2)
 6     year--;
 7   else delsum -= (month*4 + … ;
 8   temp = ( year / 100 + … ;
 9   return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr, bool first_weekday ) {
12   return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14   ...
15   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
16   str->append ( …names [ weekday ] …. ;
17   ... }

Diagnosis with Invariants
We use likely range invariants to find potential root causes
– Efficient comparison of 2 runs to quickly isolate differing behavior
Range of values computed by program instructions in correct runs
– Violated invariants give us a set of candidate root cause(s)
Invariants on load values, store values, return values
Value Type   Example Instruction        Example Invariant
Return       return int %weekday        0 <= %weekday <= 6
Load         %value = load int* %p      0 <= %value
Store        store int %val, int* %q    100 <= %val <= 100
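The idea of training range invariants on passing runs and checking them on the failing run can be sketched as follows. This is an illustration only, not the tool's LLVM instrumentation: the site labels (`"ret week_day"`, `"ret day_nr"`), the trace format, and all numeric values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RangeInvariant:
    """Likely range invariant: min and max observed in passing runs."""
    low: int
    high: int

    def holds(self, value: int) -> bool:
        return self.low <= value <= self.high


def train_invariants(good_runs):
    """Build one range per instrumented site from traces of good runs.

    Each run is a list of (site, value) pairs, where `site` is a
    hypothetical label for a load, store, or return instruction.
    """
    invariants = {}
    for run in good_runs:
        for site, value in run:
            inv = invariants.get(site)
            if inv is None:
                invariants[site] = RangeInvariant(value, value)
            else:
                inv.low = min(inv.low, value)
                inv.high = max(inv.high, value)
    return invariants


def failed_invariants(invariants, bad_run):
    """Return sites whose value in the bad run is outside the trained range."""
    failed = []
    for site, value in bad_run:
        inv = invariants.get(site)
        if inv is not None and not inv.holds(value) and site not in failed:
            failed.append(site)
    return failed


# Made-up traces: two passing runs and one failing run.
good_runs = [
    [("ret day_nr", 730000), ("ret week_day", 0)],
    [("ret day_nr", 733000), ("ret week_day", 6)],
]
bad_run = [("ret day_nr", 1500000000), ("ret week_day", -3)]

invariants = train_invariants(good_runs)
print(failed_invariants(invariants, bad_run))
```

With only a few similar good runs the ranges stay narrow, so more sites fail on the bad input. The filtering stages then have to prune that larger candidate set.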

Source Code Example – Invariant Failures
 1 long day_nr ( uint year, uint month, uint day ) {
 2   if ( year == 0 && … )
 3     return ( 0 ) ;
 4   delsum = 365 * year + … ;
 5   if (month <= 2)
 6     year--;
 7   else delsum -= (month*4 + … ;
 8   temp = ( year / 100 + … ;
 9   return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr, bool first_weekday ) {
12   return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14   ...
15   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
16   str->append ( …names [ weekday ] …. ;
17   ... }
Invariant failures – candidate root causes

Insights for Training Invariants
Use training inputs very "similar" to the failing input
→ Capture the key relevant differences between 2 similar runs
Very few "similar" good inputs to train invariants
→ Much tighter and more relevant invariants
→ Less likely to miss root causes
→ Though possibly many false-positive candidates

Training Input Construction – 2 Approaches
Deletion-based specification-independent approach
– A variation of the ddmin algorithm [Zeller et al.]
– Apply character-level rewriting/deletion
Replacement-based specification-dependent approach
– User gives an input specification
→ Tokens in the input grammar and alternative tokens for each token
– Create variations for each input token depending upon its type
– Create inputs by using variations of 1 token at a time
– Possible to implement this by modifying built-in parsers in the application
Can be automated for given input specifications

Replacement-Based Spec-Dependent Approach – Example
SELECT NAME_CONST('flag',1) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',2) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',3) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',5) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',9) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',1) * SUM(a) FROM t1;
SELECT NAME_CONST('flag',1) * MIN(a) FROM t1;
SELECT NAME_CONST('flag',1) * AVG(a) FROM t1;
SELECT NAME_CONST('flag',1) * STD(a) FROM t1;
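The one-token-at-a-time replacement shown in this example can be sketched as below. The token split and the `alternatives` table stand in for the user-supplied input specification and are assumptions made for illustration; a real specification would come from the input grammar.

```python
def replacement_variants(tokens, alternatives):
    """Yield inputs that differ from `tokens` in exactly one token.

    `alternatives` maps a token to its replacement tokens; it plays the
    role of the input specification (hypothetical format).
    """
    for index, token in enumerate(tokens):
        for replacement in alternatives.get(token, []):
            variant = list(tokens)
            variant[index] = replacement
            yield "".join(variant)


# Hypothetical tokenization of the first query on the slide.
tokens = ["SELECT NAME_CONST('flag',", "1", ") * ", "MAX", "(a) FROM t1;"]
alternatives = {
    "1": ["2", "3", "5", "9"],
    "MAX": ["SUM", "MIN", "AVG", "STD"],
}

for query in replacement_variants(tokens, alternatives):
    print(query)
```

Each generated query would still need to be run and classified as good or bad before being used for training.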

Replacement-Based Spec-Dependent Approach – String Example
ABCDEFGH
BBCDEFGH ACCDEFGH BCCDEFGH ABDEEFGH BCDEEFGH ABCDFGHI BCDEFGHI

Selecting Candidate Root Causes
Invariant generation:
– Select a set of "close" good inputs based on edit distance
– Run the instrumented program on good inputs to generate invariants
Candidate root cause selection:
– Execute the program with the inserted invariants for the bad input
– Failed invariants provide a set of candidate root causes
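Selecting "close" good inputs by edit distance could look like the sketch below. The slide does not say which edit distance is used or how many inputs are kept; Levenshtein distance and the `count` parameter are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def closest_good_inputs(bad_input, good_inputs, count):
    """Return the `count` good inputs nearest to the failing input."""
    return sorted(good_inputs,
                  key=lambda s: edit_distance(bad_input, s))[:count]


good_inputs = ["ZZZZZZZZ", "BCDEFGHI", "BBCDEFGH"]
print(closest_good_inputs("ABCDEFGH", good_inputs, 2))
```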

Filtering Techniques
Dynamic Backward Slicing
– Remove candidates that do not affect the symptom
Dependence Filtering
– Leverages DFS traversal of the DDG during slicing
– Discard a dependent failed invariant if there is no intervening passing invariant
– Processing time is linear in the number of edges in the DDG
Multiple Faulty Input Filter
– Intersection of root causes for similar failing inputs

Dependence Filtering - 1
[Figure: dependence chain ending at the crash symptom, with nodes labeled "Inv Pass" and "Inv Fail"; annotations: "Probably not a root cause" and "A possible root cause"]

Dependence Filtering - 2
[Figure: dependence chain ending at the crash symptom, with nodes labeled "Inv Pass" and "Inv Fail"; both failed invariants are annotated "A possible root cause"]

Source Code Example – Dependence Filtering
 1 long day_nr ( uint year, uint month, uint day ) {
 2   if ( year == 0 && … )
 3     return ( 0 ) ;
 4   delsum = 365 * year + … ;
 5   if (month <= 2)
 6     year--;
 7   else delsum -= (month*4 + … ;
 8   temp = ( year / 100 + … ;
 9   return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr, bool first_weekday ) {
12   return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14   ...
15   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
16   str->append ( …names [ weekday ] …. ;
17   ... }
Annotations: "Probably not a root cause", "A possible root cause"

Evaluation Questions
How effective is our overall bug localization framework?
How effective are our filtering techniques?

Methodology - Characteristics of 8 Bugs
8 software bugs in 3 large server applications: Squid, Apache and MySQL
Overall 12 bugs; 4 missing-code bugs not considered
LLVM-2.6 (LLVM-gcc) used to compile the programs and run our passes
Slices from root cause to symptom span several functions. This distance is high for incorrect-output bugs.
Bug-Name           Symptom            Distance (Dyn #LLVM inst) / (Static #LOC) / (Static #Functions)
Squid-len          Buffer Overflow    1262
SQL-int-overflow   Buffer Overflow    1873
SQL-convert        Incorrect Output   86 2710
SQL-aggregate      Buffer Overflow    442
SQL-precision      Incorrect Output   124 4117
SQL-mul-overflow   Incorrect Output   114 3617
SQL-dataloss       Incorrect Output   429 165
Apache-overflow    Buffer Overflow    3278062

Experimental Results – Bug Loc Effectiveness
Bug-Name           #Invs   #Failed-Invs   #Final Candidate Root Causes
Squid-len          3358    357            9
SQL-int-overflow   5917    95             16
SQL-convert        5942    93             6
SQL-aggregate      6847    156            8
SQL-precision      4566    130            17
SQL-mul-overflow   4652    83             5
SQL-dataloss       5836    153            11
Apache-overflow    2295    120            6
Tool provides only 5 to 17 expressions as candidate root causes

Experimental Results – Effectiveness of Filters
Bug-Name           Failed-Invs   Slice   Dependence-filter   Multiple-faulty-inputs
Squid-len          357           30      9                   9
SQL-int-overflow   95            36      16                  12
SQL-convert        93            27      9                   6
SQL-aggregate      156           44      14                  8
SQL-precision      130           34      18                  17
SQL-mul-overflow   83            13      7                   5
SQL-dataloss       153           35      17                  11
Apache-overflow    120           12      6                   6
% Reduction                      80%     58%                 23%
Slicing removes 80%, dependence filtering 58%, and the multiple-faulty-inputs filter 23% of false positives
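The reduction percentages can be checked approximately from the per-bug candidate counts reported in this deck. The slides do not state how the percentages were computed; summing the counts over all eight bugs is an assumption, and it yields values close to the reported 80%, 58% and 23%.

```python
# Candidate counts per bug, in the order the bugs are listed in the deck.
failed_invs = [357, 95, 93, 156, 130, 83, 153, 120]
after_slice = [30, 36, 27, 44, 34, 13, 35, 12]
after_dep_filter = [9, 16, 9, 14, 18, 7, 17, 6]
after_multi_input = [9, 12, 6, 8, 17, 5, 11, 6]


def reduction_percent(before, after):
    """Aggregate reduction in candidates from one stage to the next."""
    return 100.0 * (1 - sum(after) / sum(before))


stages = [
    ("Slicing", failed_invs, after_slice),
    ("Dependence filter", after_slice, after_dep_filter),
    ("Multiple faulty inputs", after_dep_filter, after_multi_input),
]
for name, before, after in stages:
    print(f"{name}: {reduction_percent(before, after):.1f}%")
```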

Experimental Results – Effectiveness of Filters (continued)
Final step includes the root cause for 7 out of 8 bugs
For the remaining bug, the root cause is missed in the last step but is present after the dependence filtering step

Source Code Example – Output to Programmer
 1 long day_nr ( uint year, uint month, uint day ) {
 2   if ( year == 0 && … )
 3     return ( 0 ) ;
 4   delsum = 365 * year + … ;
 5   if (month <= 2)
 6     year--;
 7   else delsum -= (month*4 + … ;
 8   temp = ( year / 100 + … ;
 9   return ( delsum + year/4 * temp ) ;
10 }
11 int week_day ( long daynr, bool first_weekday ) {
12   return ( daynr + … ) % 7 ; }
13 bool make_date ( … ) {
14   ...
15   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
16   str->append ( …names [ weekday ] …. ;
17   ... }
Output unit: maximal, pure sub-expression
Bug-Name           #Candidate Root Causes   LOC in Src-expr-tree
Squid-len          9                        49
SQL-int-overflow   16                       48
SQL-convert        6                        64
SQL-aggregate      8                        28
SQL-precision      17                       89
SQL-mul-overflow   5                        26
SQL-dataloss       11                       152
Apache-overflow    6                        171

Comparison With Tarantula and Ochiai
Statistical techniques rank each statement based on a formula
They performed better for the SQL-int-overflow and SQL-precision bugs
Good for bugs where control flow diverges from good runs
Our approach is better than Tarantula/Ochiai in 6 out of 8 bugs

Future Work
More kinds of invariants
More applications and more classes of bugs
Evaluation by outside users for unfixed bugs (clang, chrome)
More systematic analysis techniques to reduce false positives
Make input generation more robust

Summary
Tool narrows the root cause down to only 5 to 17 candidates
Generates invariants using auto-generated similar inputs
Very few "similar" good inputs to get tighter invariants
Novel filters to reduce false positives
Questions?

Experimental Results – Input Sensitivity
SQL-int-overflow, SQL-convert and Squid-len bugs are input sensitive
Squid-len may not find the root cause with general test inputs
– Our approach is more likely to identify the root cause of the bug
– An input field (a username) with many special characters causes the failure
– A username in the training input that is larger than the one in the faulty input → will miss the root cause
Failure-inducing input: ‘ftp://usernam\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ :pass@ftp.apoc.com’
Training input: ‘ftp://Narayanan_Venkatasubramaniam_2012:pass@ftp.apoc.com’

Comparison With Tarantula and Ochiai
Bug                Tarantula / Ochiai              Our Approach
Squid-len          [6 - 5266] / [2 - 5255]         49
SQL-int-overflow   [34 - 34]                       48
SQL-convert        [55 - 9409]                     64
SQL-aggregate      [776 - 9847] / [694 - 9762]     28
SQL-precision      [1 - 19] / [1 - 16]             89
SQL-mul-overflow   [5680 - 6878] / [5678 - 6876]   26
SQL-dataloss       [8 - 6060]                      152
Apache-overflow    [28 - 5372] / [1 - 5357]        171
Statistical techniques rank each statement based on a formula
– How many good/bad inputs execute a statement and how many do not
– Statements with higher rank are more likely root causes
– Tied statements are given a range as rank
Good for bugs where control flow diverges from good runs
Our approach is better than Tarantula/Ochiai in 6 out of 8 bugs
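The slide only refers to "a formula". The sketch below uses the suspiciousness scores as commonly published for Tarantula and Ochiai, computed from how many passing and failing runs execute a statement. The function and parameter names are chosen for this example.

```python
import math


def tarantula(failed_s, passed_s, total_failed, total_passed):
    """Tarantula score: share of failing runs relative to all runs covering s."""
    fail_ratio = failed_s / total_failed if total_failed else 0.0
    pass_ratio = passed_s / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0
    return fail_ratio / (fail_ratio + pass_ratio)


def ochiai(failed_s, passed_s, total_failed):
    """Ochiai score: failed(s) / sqrt(total_failed * (failed(s) + passed(s)))."""
    denominator = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denominator if denominator else 0.0


# A statement executed by the single failing run and by 3 of 4 passing runs.
print(tarantula(failed_s=1, passed_s=3, total_failed=1, total_passed=4))
print(ochiai(failed_s=1, passed_s=3, total_failed=1))
```

A statement executed by every run, passing or failing, gets a middling score under both formulas, which matches the slide's remark that these techniques work best when control flow diverges from the good runs.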

Problems With Existing Work - 1
Earlier delta-debugging/SAT-based solutions don't scale well [Zeller et al., Jose et al.]
Improvements to Delta Debugging handle larger applications [Sumner et al.]
– Uses execution and memory indexing
– But still misses root causes in around 55% of cause-effect chains
One approach uses hardware support [Dimitrov et al.]
– Combines invariants with a dynamic forward data dependence slice
Statistical techniques like Tarantula and Ochiai [Jones et al.]
– Give many false positives (10% of statements may need to be examined)
– Can miss root causes if control flow does not diverge

Questions
1. Automated, but requires a lot of manual effort?
– But it can be mostly automated… (Grammar + Symptom + )
2. Why do we need localization when auto-bug-patching exists?
– Localization is itself needed for auto-patching
3. Only 8 bugs, how can you generalize?
– Randomly selected 12 bugs, chose 8 bugs…. Comparable #bugs with related work… Need more bugs to confirm/generalize
4. Did you compare with general test inputs (except last step)?
– No, … We tried some small experiments…. Future work… requires lots of bugs….

Motivation - 2
Cost of fault localization and bug fixing increases with time and phases of development.
It is important to fix bugs quickly.

Intuition
Two runs …. Compare invariants

Source Code Example – 2
 1 long day_nr ( uint year, uint month, uint day ) {
 2   ……
 3   if (month <= 2)
 4     year--;
 5   else delsum -= month*4 + … ;
 6   temp = ( year / 100 + … ;
 7   return ( delsum + year/4 * temp ) ;
 8 }
 9 int week_day ( long daynr, bool first_weekday ) {
10   return ( daynr + … ) % 7 ;
11 }
12 bool make_date ( … ) {
13   ...
14   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
15   str->append ( …names [ weekday ] …. ;
16   ... }
Invariant failures – candidate root causes

Diagnosis Tool Architecture - 1
Input to the proposed tool
– Failing input: SELECT DATE ("0000-01-01",'%W %d %M %Y') as a
– Source code
– Optional input specification: input tokens and their replacements

Diagnosis Tool Architecture - 2
Output to the programmer
– The statements with violated invariants for each filtering step
  2 invariants violated at line 15: return of calc_daynr and calc_weekday
  Only the invariant on calc_daynr remains after the last filter
– The maximal pure, local sub-expression rooted at each invariant candidate
  Only tracks invariants on load values, store values, return values.
  Bug may be anywhere in the maximal sub-expression rooted at a load, store or return that does not itself contain any load, store or return.
  Red lines 3, 5, 6, 8, 9 in the code form the expression tree for the return of calc_daynr

Diagnosis Tool Architecture - 3
Programmer will first analyze candidates in the last filtering step
– For each candidate location, its local sub-expression tree will be analyzed
If the root cause is not found among the candidates in the last filter
– Candidate locations from the previous step will be analyzed, and so on.

Deletion-Based Grammar-Independent Approach
A variation of the ddmin algorithm [Zeller et al.]
Divide the initial input into n subsets
– Test each subset and its complement for a good input
Recursively increase the subset size
select CAST('2006-08-10 10:11:12.012345' AS DATETIME);
select CAST('006-08-10 10:11:12.012345' AS DATETIME);
select CAST('206-08-10 10:11:12.012345' AS DATETIME);
select CAST('20-08-10 10:11:12.012345' AS DATETIME);
select CAST('2006-08-10 10:1:12.12345' AS DATETIME);
select CAST('2006-08-10 10:11:12.0345' AS DATETIME);
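A simplified sketch of deletion-based input generation is shown below. It is not the ddmin algorithm itself and not the authors' variation: it splits the input into fixed-size chunks, tests each chunk and its complement with a caller-supplied oracle, and doubles the number of chunks each round. The oracle `is_good`, the doubling schedule, and the toy input are all assumptions.

```python
def chunk_and_complement(text, granularity):
    """Yield each chunk of `text` and the text with that chunk deleted."""
    size = max(1, len(text) // granularity)
    for start in range(0, len(text), size):
        yield text[start:start + size]         # the subset
        yield text[:start] + text[start + size:]  # its complement


def generate_good_inputs(bad_input, is_good):
    """Collect inputs derived from `bad_input` that the oracle accepts.

    `is_good` is a hypothetical oracle that runs the program on a
    candidate and reports whether it executes successfully.
    """
    good = []
    granularity = 2
    while granularity <= len(bad_input):
        for candidate in chunk_and_complement(bad_input, granularity):
            if candidate and candidate not in good and is_good(candidate):
                good.append(candidate)
        granularity *= 2
    return good


# Toy oracle: an input fails whenever it contains the character "X".
result = generate_good_inputs("abXcd", lambda s: "X" not in s)
print(result)
```

Complements of small chunks differ from the failing input by only a few characters, so they are the candidates most likely to serve as "close" training inputs.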

Dynamic Backwards Slicing
Find the precise set of instructions that influence a given instruction
Build a backward slice from the failure symptom
– Remove invariants that do not affect the symptom by taking the intersection
Implemented the NPwC [Zhang et al.] algorithm in 2 phases (Giri):
– Handles both data-flow and control-flow dependences
– At runtime record:
  Trace of memory addresses accessed
  Basic blocks traversed
  Function calls/returns
– Build a dynamic program dependence graph using the trace and SSA form
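Once a dynamic dependence graph exists, the slicing filter amounts to backward reachability from the symptom followed by an intersection with the failed invariants. The sketch below assumes the graph is already available as a dictionary; it does not reproduce the NPwC algorithm or the trace recording, and the node names are illustrative.

```python
def backward_slice(dependences, symptom):
    """Return every node the symptom transitively depends on.

    `dependences` maps a dynamic instruction to the instructions it
    depends on (data or control dependences).
    """
    visited = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(dependences.get(node, ()))
    return visited


def filter_by_slice(failed_invariants, dependences, symptom):
    """Keep only failed invariants that lie in the symptom's backward slice."""
    in_slice = backward_slice(dependences, symptom)
    return [site for site in failed_invariants if site in in_slice]


# Illustrative graph loosely following the running example.
dependences = {
    "crash": ["names[weekday]"],
    "names[weekday]": ["ret week_day"],
    "ret week_day": ["ret day_nr"],
    "ret day_nr": ["year--"],
    "unrelated store": ["year--"],
}
failed = ["ret day_nr", "ret week_day", "unrelated store"]
print(filter_by_slice(failed, dependences, "crash"))
```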

Multiple Faulty Inputs Filtering
Root cause is likely to be the same for failures with the same symptom
– Assume it must be present in the candidate set of all such bad inputs
Construct a few similar bad inputs
– Use the same input construction algorithms
– But select the inputs which produce the same failure symptoms
Repeat the previous 3 steps:
– Invariants, slicing and dependence filtering steps
Take the intersection of all root causes
– Final set of candidate root cause locations
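The final intersection step can be sketched as a plain set intersection over the candidate sets obtained for each similar failing input. The candidate labels are hypothetical.

```python
def intersect_candidates(candidate_sets):
    """Return candidates that appear for every similar failing input."""
    sets = [set(candidates) for candidates in candidate_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Candidates after slicing and dependence filtering, one list per bad input.
per_input_candidates = [
    ["ret day_nr", "store A", "load B"],
    ["ret day_nr", "load B"],
    ["ret day_nr", "store C"],
]
print(intersect_candidates(per_input_candidates))
```

Because the filter assumes the root cause appears for every similar bad input, a true root cause that is absent from even one candidate set is dropped. The results slides report one bug where the last filter removes the true root cause.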

Experimental Results - 3
SQL-int-overflow and SQL-convert bugs are input sensitive
– Last step filters out the root cause of the SQL-int-overflow bug for auto-generated inputs
– Last filter removes the root cause for SQL-convert when using manual inputs

Source Code Example - 2
 1 long day_nr ( uint year, uint month, uint day ) {
 2   long delsum ; int temp ;
 3   if ( year == 0 && month == 0 && day == 0)
 4     return ( 0 ) ; /* Skip errors */
 5   delsum = ( long ) (365 * year + 31 * (month - 1)+day ) ;
 6   if (month <= 2) year--;
 7   else delsum -= ( long ) (month*4+23)/10;
 8   temp = ( int ) ( ( year /100+1) * 3)/4;
 9   return ( delsum+( int ) year/4 * temp ) ;
10 }
11 int week_day ( long daynr, bool first_weekday ) {
12   return ( ( daynr + 5L + … ) % 7 ) ; }
13 bool make_date ( … ) {
14   ...
15   weekday = week_day ( day_nr (year, month, day ), 0 ) ;
16   str->append ( … names [ weekday ] …. ;
17   ... }

Dependence Filtering
Leverages DFS traversal of the DDG during slicing
– Computes the number of intervening passed invariants between two failed invariants
– Discard the dependent failed invariant if there is no intervening passing invariant
Processing time is linear in the number of edges in the DDG
Example: return of day_nr and return of week_day both fail; return of week_day is not a root cause
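The discard rule can be sketched as a search backward from each failed invariant that stops at passing invariants. This version runs one traversal per candidate for clarity, so it does not achieve the linear bound stated on the slide, which relies on folding the computation into the slicing DFS. The graph encoding and node names are assumptions.

```python
def dependence_filter(failed, passed, dependences):
    """Drop failed invariants that inherit their failure from another one.

    A failed invariant is discarded when some dependence path leads from
    it back to another failed invariant without crossing a passing one.
    `dependences` maps a node to the nodes it depends on.
    """
    failed_set = set(failed)
    passed_set = set(passed)

    def inherits_failure(site):
        stack = list(dependences.get(site, ()))
        visited = set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            if node in failed_set:
                return True   # reached a failure with no pass in between
            if node in passed_set:
                continue      # an intervening passing invariant blocks this path
            stack.extend(dependences.get(node, ()))
        return False

    return [site for site in failed if not inherits_failure(site)]


# Running example: week_day's return value is computed from day_nr's.
deps = {"ret week_day": ["ret day_nr"]}
print(dependence_filter(["ret day_nr", "ret week_day"], [], deps))

# With a passing invariant in between, both failures stay candidates.
deps_with_pass = {"B": ["P"], "P": ["A"]}
print(dependence_filter(["A", "B"], ["P"], deps_with_pass))
```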

Experimental Results – Effectiveness of Filters
Bug-Name           Failed-Invs   Slice   Dependence-filter   Multiple-faulty-inputs   Src-expr-tree
Squid-len          357           30      9                   9                        49
SQL-int-overflow   95            36      16                  12                       48
SQL-convert        93            27      9                   6                        64
SQL-aggregate      156           44      14                  8                        28
SQL-precision      130           34      18                  17                       89
SQL-mul-overflow   83            13      7                   5                        26
SQL-dataloss       153           35      17                  11                       152
Apache-overflow    120           12      6                   6                        171
% Reduction                      80%     58%                 23%
Slicing removes 80% and dependence filtering 58% of false positives
Final step includes the root cause for 7 out of 8 bugs

Experimental Results
Hundreds of failed invariants as the candidate set
Slicing removes nearly 80% of false positives
Dependence filtering removes 58% of the remaining false positives
Final step includes the root cause for 7 out of 8 bugs
– Last filter removes the true root cause for the SQL-int-overflow bug
Tool provides only 5 to 17 expressions as candidate root causes
