Using Likely Invariants For Automated Software Fault Localization. Swarup Kumar Sahoo, John Criswell, Chase Geigle, Vikram Adve. Department of Computer Science.

Presentation transcript:

Using Likely Invariants For Automated Software Fault Localization
Swarup Kumar Sahoo, John Criswell, Chase Geigle, Vikram Adve
Department of Computer Science, University of Illinois at Urbana-Champaign

Motivation
Software bugs cost ~$59.5 billion annually, about 0.6% of the GDP [NIST report]
Windows 2000: 35M LOC and 63,000 known bugs at release time, about 2 per 1000 lines [Mary Jo Foley]
In 2006, a Mozilla developer admitted that almost 300 bugs appear in their Bugzilla every day [Anvik et al.]

Motivation
Cost of fault localization increases exponentially as the software life cycle progresses
Bug-fix cost is very high during the operational/maintenance phase
Important to fix bugs quickly, before the application ships
(Source: Barry Boehm, “Equity Keynote Address”, 2007, and Stefan Priebsch in Advanced OOP and Design Pattern, Codeworks 2009)

Motivation - 3
Debugging: the process of eliminating a software failure
Automatic fault localization: automatically identify the root cause responsible for a failure
Automatic fault localization can reduce development cost and time
Debugging steps (figure): reproduce failure, locate and understand root cause, fix root cause

Goal
Need an automated system to detect the root cause of bugs
→ Efficient, scalable, and reports few false positives
Figure labels: Bug Localization Tool, .C Program, Faulty Input

Contributions
Novel diagnosis mechanism for automated bug localization
– Likely invariants trained on auto-generated inputs "close to" the failing input
– First to combine an invariants-based approach with dynamic slicing in software
– Two novel heuristics for reducing false positives
Evaluated on 8 bugs in Squid, Apache, and MySQL
– Tool reports only 5 to 17 program expressions as candidate root causes

Intuition
Two runs …. Compare invariants

Diagnosis with Invariants
We use likely range invariants to find potential root causes
Range invariants capture the range of values computed by program instructions in correct runs
– Violated invariants give us a set of candidate root cause(s)
Invariants are placed on load values, store values, and return values
Value Type   Example Instruction         Example Invariant
Return       return int %weekday         0 <= %weekday <= 6
Load         %value = load int* %p       0 <= %value
Store        store int %val, int* %q     100 <= %val <= 100
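The range-invariant idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' LLVM-based implementation: the site labels (such as "week_day.return") and all observed values are made up, and each run is assumed to be available as a list of (site, value) observations.

```python
def train_range_invariants(good_runs):
    """Learn a [min, max] range per instrumented site from passing runs.

    Each run is a list of (site, value) pairs; `site` is a hypothetical
    label for a load, store, or return instruction.
    """
    ranges = {}
    for run in good_runs:
        for site, value in run:
            lo, hi = ranges.get(site, (value, value))
            ranges[site] = (min(lo, value), max(hi, value))
    return ranges


def failed_invariants(ranges, bad_run):
    """Return the sites whose value in the failing run leaves the trained range."""
    failed = []
    for site, value in bad_run:
        if site in ranges:
            lo, hi = ranges[site]
            if not (lo <= value <= hi) and site not in failed:
                failed.append(site)
    return failed


# Made-up observations for illustration only.
good_runs = [
    [("day_nr.return", 730000), ("week_day.return", 0)],
    [("day_nr.return", 730500), ("week_day.return", 6)],
]
bad_run = [("day_nr.return", 0), ("week_day.return", -1)]

ranges = train_range_invariants(good_runs)
candidates = failed_invariants(ranges, bad_run)
print(ranges)
print(candidates)
```

Every site in `candidates` is only a candidate root cause; the filters described on later slides are what narrow the set.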

Key Insights
Tighter likely invariants:
– Compact (not precise) way to summarize and compare memory states
– Efficiently isolate the initial candidate root causes to a small set
Novel way to generate invariants:
– Automatically generated good inputs "close to" the failing input
– A few close good inputs are used to train invariants
  → Much tighter and more relevant invariants
  → Missed root causes less likely (though possibly many candidates)
Sequence of novel filtering techniques to reduce false positives

Diagnosis Tool Architecture (figure)
Inputs: program, failure-inducing (bad) input, optional input specification
Stages: Generate Inputs, Generate Invariants, Test with Bad Input, False-Positive Filters
Intermediate data: good inputs, bad inputs, instrumented program, failed invariants
False-positive filters: Backward Slicing, Dependence Filtering, Multi-faulty Input Filter

Source Code Example
Failing MySQL input: SELECT DATE (” ”,’%W %d %M %Y’) as a
    long day_nr ( uint year, uint month, uint day ) {
      if ( year == 0 && … )
        return ( 0 ) ;
      delsum = 365 * year + … ;
      if ( month <= 2 )
        year-- ;
      else
        delsum -= ( month*4 + … ;
      temp = ( year / … ;
      return ( delsum + year/4 * temp ) ;
    }
    int week_day ( long daynr, bool first_weekday ) {
      return ( daynr + … ) % 7 ;
    }
    bool make_date ( … ) {
      weekday = week_day ( day_nr ( year, month, day ), 0 ) ;
      str->append ( …names [ weekday ] …. ;
    }

Source Code Example – Invariant Failures
(Same code as on the Source Code Example slide.) The invariant failures mark the candidate root causes.

Insights for Training Invariants
Use training inputs very “similar” to the failing input
  → Captures the key relevant differences between 2 similar runs
Very few “similar” good inputs are used to train invariants
  → Much tighter and more relevant invariants
  → Less likely to miss root causes
  → Though possibly many false-positive candidates

Training Input Construction – 2 Approaches
Deletion-based, specification-independent approach
– A variation of the ddmin algorithm [Zeller et al.]
– Applies character-level rewriting/deletion
Replacement-based, specification-dependent approach
– User gives an input specification
  → Tokens in the input grammar and alternative tokens for each token
– Create variations for each input token depending upon its type
– Create inputs by using variations of 1 token at a time
– Possible to implement by modifying the application's built-in parsers
Can be automated for given input specifications
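A rough sketch of the deletion-based idea follows. It is a deliberate simplification and not ddmin itself: it deletes one fixed-size chunk of characters at a time and sorts the resulting inputs into good and bad using an oracle. The oracle `toy_fails` and the input string are assumptions standing in for actually running the server on each input.

```python
def deletion_variants(failing_input, chunk_len=1):
    """Yield distinct inputs obtained by deleting one chunk of chunk_len characters."""
    seen = set()
    for start in range(len(failing_input) - chunk_len + 1):
        variant = failing_input[:start] + failing_input[start + chunk_len:]
        if variant not in seen:
            seen.add(variant)
            yield variant


def split_good_bad(variants, fails):
    """Partition variants with a failure oracle: fails(input) -> bool."""
    good, bad = [], []
    for variant in variants:
        (bad if fails(variant) else good).append(variant)
    return good, bad


def toy_fails(text):
    # Toy oracle (assumption): the "program" fails whenever the input contains "00".
    return "00" in text


good, bad = split_good_bad(deletion_variants("a=100"), toy_fails)
print(good)
print(bad)
```

The good inputs are the ones that would be used to train invariants; the bad ones are further failing inputs close to the original.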

Replacement-Based Spec-Dependent Approach – Example
SELECT NAME_CONST('flag',1) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',2) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',3) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',5) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',9) * MAX(a) FROM t1;
SELECT NAME_CONST('flag',1) * SUM(a) FROM t1;
SELECT NAME_CONST('flag',1) * MIN(a) FROM t1;
SELECT NAME_CONST('flag',1) * AVG(a) FROM t1;
SELECT NAME_CONST('flag',1) * STD(a) FROM t1;
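The queries on this slide follow a "vary one token at a time" pattern, which can be sketched as below. The tokenization and the `alternatives` table are hypothetical stand-ins for a user-supplied input specification; the tool's real specification format is not shown in the slides.

```python
def replacement_variants(tokens, alternatives):
    """Vary one token at a time, keeping every other token fixed."""
    variants = []
    for i, token in enumerate(tokens):
        for alt in alternatives.get(token, []):
            variants.append(tokens[:i] + [alt] + tokens[i + 1:])
    return variants


# Hypothetical tokenization of the first query on the slide.
tokens = ["SELECT NAME_CONST('flag',", "1", ") * ", "MAX", "(a) FROM t1;"]
# Hypothetical specification: alternative tokens for each replaceable token.
alternatives = {
    "1": ["2", "3", "5", "9"],
    "MAX": ["SUM", "MIN", "AVG", "STD"],
}

queries = ["".join(variant) for variant in replacement_variants(tokens, alternatives)]
for query in queries:
    print(query)
```

Because only one token changes per generated input, each query stays close to the original in edit distance.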

Selecting Candidate Root Causes
Invariant generation:
– Select a set of “close” good inputs based on edit distance
– Run the instrumented program on the good inputs to generate invariants
Candidate root cause selection:
– Execute the program with the inserted invariants on the bad input
– Failed invariants provide a set of candidate root causes
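Selecting "close" good inputs by edit distance might look like the sketch below. It assumes plain character-level Levenshtein distance and a fixed cutoff `k`; the slides do not say which edit-distance variant or selection threshold the tool uses, and the sample inputs are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_good_inputs(failing_input, good_inputs, k):
    """Return the k good inputs nearest to the failing input."""
    return sorted(good_inputs, key=lambda g: edit_distance(failing_input, g))[:k]


# Made-up inputs for illustration.
nearest = closest_good_inputs("MAX(a)", ["COUNT(b)", "SUM(a)", "MIN(a)"], k=2)
print(nearest)
```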

Filtering Techniques
Dynamic Backward Slicing
– Removes candidates that do not affect the symptom
Dependence Filtering
– Leverages the DFS traversal of the dynamic dependence graph (DDG) performed during slicing
– Discards a dependent failed invariant if there is no intervening passing invariant
– Processing time is linear in the number of edges in the DDG
Multiple Faulty Input Filter
– Takes the intersection of root causes for similar failing inputs
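The first and last filters on this slide reduce to graph reachability and set intersection. The sketch below assumes the dynamic dependence graph is given as a dictionary mapping each node to the nodes it depends on; the node names and that representation are hypothetical.

```python
def backward_slice(deps, symptom):
    """Collect every node the symptom transitively depends on (iterative DFS)."""
    seen, stack = set(), [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return seen


def slice_filter(failed, deps, symptom):
    """Keep only the failed invariants that lie in the symptom's backward slice."""
    in_slice = backward_slice(deps, symptom)
    return [f for f in failed if f in in_slice]


def multi_input_filter(candidate_sets):
    """Intersect the candidate sets obtained from several similar failing inputs."""
    return set.intersection(*map(set, candidate_sets))


# Hypothetical graph: the crash depends on b, b depends on a; c is unrelated to the crash.
deps = {"crash": ["b"], "b": ["a"], "c": ["a"]}
remaining = slice_filter(["a", "c"], deps, "crash")
common = multi_input_filter([["a", "d"], ["a", "e"]])
print(remaining, common)
```

Each node and edge is visited at most once, so the traversal cost grows linearly with the size of the graph.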

Dependence Filtering (figure)
Nodes on the dependence chain to the crash symptom: Inv Pass, Inv Fail, Inv Pass
Annotations: "Probably not a root cause", "A possible root cause"

Dependence Filtering (figure)
Nodes on the dependence chain to the crash symptom: Inv Pass, Inv Fail, Inv Pass, Inv Fail
Annotations: "A possible root cause" on both failed invariants
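One reading of the dependence-filtering rule is sketched below: a failed invariant is dropped when it depends on another failed invariant with no passing invariant on the path between them, since its bad value was probably just propagated. This is an interpretation of the slides, not the authors' code; the graph representation (node mapped to the nodes it depends on) and the node names are assumptions.

```python
def dependence_filter(failed, passed, deps):
    """Keep a failed invariant unless it depends on another failed invariant
    with no passing invariant on the path between them."""
    failed, passed = set(failed), set(passed)
    kept = []
    for f in sorted(failed):
        stack, seen, propagated = list(deps.get(f, [])), set(), False
        while stack and not propagated:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in failed:
                propagated = True      # reached an upstream failure unblocked
            elif node not in passed:
                stack.extend(deps.get(node, []))
            # A passing invariant blocks the path: do not search beyond it.
        if not propagated:
            kept.append(f)
    return kept


# Case 1: f2 depends directly on f1, so f2 is probably not a root cause.
direct = dependence_filter(["f1", "f2"], ["p0"], {"f2": ["f1"], "f1": ["p0"]})
# Case 2: a passing invariant p1 sits between f2 and f1, so both are kept.
blocked = dependence_filter(["f1", "f2"], ["p1"], {"f2": ["p1"], "p1": ["f1"]})
print(direct, blocked)
```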

Source Code Example – Dependence Filtering
(Same code as on the Source Code Example slide.) Failed invariants are annotated as "Probably not a root cause" or "A possible root cause".

Methodology - Characteristics of 8 Bugs
Software bugs in 3 large server applications: Squid, Apache, and MySQL
12 bugs overall; 4 missing-code bugs not considered
LLVM-2.6 (LLVM-gcc) used to compile the programs and run our passes
The slice from root cause to symptom spans several functions; this distance is high for incorrect-output bugs
Table columns: Bug-Name, Symptom, Distance (Dyn #LLVM inst), Distance (Static #LOC), Distance (Static #Functions)
Bug-Name           Symptom            Distance digits (columns fused in transcript)
Squid-len          Buffer Overflow    1262
SQL-int-overflow   Buffer Overflow    1873
SQL-convert        Incorrect Output
SQL-aggregate      Buffer Overflow    442
SQL-precision      Incorrect Output
SQL-mul-overflow   Incorrect Output
SQL-dataloss       Incorrect Output
Apache-overflow    Buffer Overflow

Experimental Results – Bug Localization Effectiveness
Table columns: Bug-Name, #Invs, #Failed-Invs, #Final Candidate Root Causes
Bugs: Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow
Figure labels: #Final Candidate Root Causes, #Failed-Invs
Tool provides only 5 to 17 expressions as candidate root causes

Experimental Results – Effectiveness of Filters
Table columns: Bug-Name, #Failed-Invs, Slice
Bug-Name           #Failed-Invs and Slice digits (columns fused in transcript)
Squid-len
SQL-int-overflow   9536
SQL-convert        9327
SQL-aggregate
SQL-precision
SQL-mul-overflow   8313
SQL-dataloss
Apache-overflow
% Reduction: 80% (Slice)

Experimental Results – Effectiveness of Filters
Table columns: Bug-Name, Failed-Invs, Slice, Dependence-filter
Bugs: Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow
% Reduction: 80% (Slice), 58% (Dependence-filter)

Experimental Results – Effectiveness of Filters
Table columns: Bug-Name, Failed-Invs, Slice, Dependence-filter, Multiple-faulty-inputs
Bugs: Squid-len, SQL-int-overflow, SQL-convert, SQL-aggregate, SQL-precision, SQL-mul-overflow, SQL-dataloss, Apache-overflow
% Reduction: 80% (Slice), 58% (Dependence-filter), 23% (Multiple-faulty-inputs)
Slicing removes 80% of false positives, dependence filtering 58%, and the multiple-faulty-inputs filter 23%

Experimental Results
Hundreds of failed invariants form the initial candidate set
Slicing removes nearly 80% of false positives
Dependence filtering removes 58% of the remaining false positives
The final candidate set includes the root cause for 7 out of 8 bugs
– The last filter removes the true root cause for the SQL-int-overflow bug
Tool reports only 5 to 17 expressions as candidate root causes

Comparison With Tarantula and Ochiai
Statistical techniques rank each statement using a suspiciousness formula
Tarantula/Ochiai performed better for the SQL-int-overflow and SQL-precision bugs
Statistical techniques are good for bugs where control flow diverges from good runs
Our approach does better than Tarantula/Ochiai on 6 out of 8 bugs
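For reference, the two statistical baselines score a statement from how many failing and passing runs execute it. The sketch below uses the standard published Tarantula and Ochiai formulas; the coverage counts in the example are made up, and the slides do not describe how the authors collected coverage for their comparison.

```python
import math


def tarantula(failed_s, passed_s, total_failed, total_passed):
    """Tarantula suspiciousness of a statement.

    failed_s / passed_s: failing / passing runs that execute the statement.
    """
    fail_ratio = failed_s / total_failed if total_failed else 0.0
    pass_ratio = passed_s / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(failed_s, passed_s, total_failed):
    """Ochiai suspiciousness of a statement."""
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom else 0.0


# Made-up coverage: 1 failing run and 4 passing runs in total.
t_score = tarantula(failed_s=1, passed_s=1, total_failed=1, total_passed=4)
o_score = ochiai(failed_s=1, passed_s=3, total_failed=1)
print(t_score, o_score)
```

Both scores depend only on which runs execute a statement, which fits the slide's remark that these techniques work well when control flow diverges from the good runs.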

Summary
Tool narrows the diagnosis to only 5 to 17 candidate root causes
Generates invariants using auto-generated similar inputs
Uses very few “similar” good inputs to get tighter invariants
Novel filters to reduce false positives
Questions?