Classifying Software Faults to Improve Fault Detection Effectiveness
Technical Briefing
NASA OSMA Software Assurance Symposium, September 9-11, 2008
Allen P. Nikora, JPL/Caltech

Contingency Software in Autonomous Systems
Objective: 1) enhance diagnostic techniques to identify failures; 2) provide software contingencies to mitigate failures; 3) perform tool-based verification of contingency software; and 4) apply results to ARP (and MSL) to pave the way to more resilient, adaptive unmanned systems.
Before software contingencies, the contingency plan for the autonomous rotorcraft is to terminate flight. This is good because the rotorcraft does not fly away and crash somewhere (such as Highway 101) when it runs out of fuel, but challenging because flight termination means "full down collective and throttle idle setting," which can result in a hard landing. Our software contingencies support mission completion or improved flight termination (landing, in some cases).
Summary: Mitigate failures via software contingencies, resulting in safer, more reliable autonomous vehicles in space and in FAA national airspace.
Uniqueness: 1) Adding intelligent diagnostic capabilities without introducing non-determinism. 2) Responding to anomalous situations currently beyond the scope of the nominal fault protection. 3) Contingency planning using the SAFE (Software Adjusts Failed Equipment) approach.

This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.
Agenda
- Problem/Approach
- Relevance to NASA
- Accomplishments and/or Tech Transfer Potential
- Related Work
- Technology Readiness Level
- Data Availability
- Impediments to Research or Application
- Next Steps
Problem/Approach
- All software systems contain faults.
- Different types of faults exhibit different failure behavior and require different identification techniques; some faults are easier to find than others.
- The likelihood of detecting and removing software faults during development and testing, as well as the possible strategies for dealing with residual faults during mission operations, depend on the fault type.
- Goals are to:
  - Determine the relative frequencies of specific types of faults and identify trends in those frequencies.
  - Develop effective techniques for identifying and removing faults, or for masking their effects.
  - Develop guidelines, based on the analysis of faults and failures, for applying the techniques in the context of current and future missions.
Problem/Approach (cont’d)
What must be done?
- Analyze software failure data (test and operations) from historical and current JPL and NASA missions, and classify the underlying software faults.
- Further classify the faults by criticality (e.g., non-critical, significant mission impact, mission critical) and by detection phase.
- Perform statistical analysis (a minimal sketch follows this list):
  - Proportions of faults in each category.
  - Conditional frequencies (e.g., percentage of critical faults among aging-related bugs; percentage of aging-related bugs among the critical faults).
  - Trends in conditional frequencies (within and across missions).
- Determine criteria for further classifying faults (e.g., for aging-related bugs: faults causing round-off errors, faults causing memory leaks, etc.) to identify classes of faults with high criticality and low detectability.
- For highly critical faults that are difficult to detect prior to release, develop techniques for:
  - Identifying the component(s) most likely to contain these types of faults.
  - Improving the detectability of the faults with model-based verification or static analysis tools, as well as during testing.
  - Masking the faults via fault tolerance (e.g., software rejuvenation for aging-related faults). Such techniques must be able to accurately distinguish between behavioral changes resulting from normal changes in the system's operating-environment input space and those brought about by aging-related faults.
- Develop guidelines for implementing the techniques in the context of current and future missions.
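As a rough illustration of the statistical analysis step above, here is a minimal sketch in Python/pandas (not the study's actual tooling; the column names and toy fault records are hypothetical) computing the category proportions and the two conditional frequencies named in the example:

```python
import pandas as pd

# Hypothetical classified fault records (mission, fault class, criticality).
faults = pd.DataFrame({
    "mission":    ["A", "A", "A", "B", "B", "B"],
    "fault_type": ["BOH", "ARB", "NAM", "BOH", "BOH", "ARB"],
    "critical":   [False, True, False, True, False, True],
})

# Proportion of faults in each category.
proportions = faults["fault_type"].value_counts(normalize=True)

# Percentage of critical faults among the aging-related bugs.
arb = faults[faults["fault_type"] == "ARB"]
p_critical_given_arb = arb["critical"].mean()

# Percentage of aging-related bugs among the critical faults.
critical = faults[faults["critical"]]
p_arb_given_critical = (critical["fault_type"] == "ARB").mean()

print(proportions)
print(f"P(critical | ARB) = {p_critical_given_arb:.2f}")
print(f"P(ARB | critical) = {p_arb_given_critical:.2f}")
```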
Relevance to NASA
- Different types of faults have different effects; choose fault identification/mitigation strategies based on the types of failures encountered in the system being developed.
- Bohrbugs:
  - Deterministically cause failures.
  - Easiest to find during testing.
  - Fault tolerance of the operational system can mainly be achieved with design diversity.
- Mandelbugs:
  - Difficult to find, isolate, and correct during testing.
  - Re-execution of an operation that failed because of a Mandelbug will generally not result in another failure.
  - Fault tolerance can be achieved by simple retries or by more sophisticated approaches such as checkpointing and recovery-oriented computing.
- Aging-related bugs:
  - The tendency to cause a failure increases with system run-time.
  - Proactive measures that clean the internal system state (software rejuvenation) and thus reduce the failure rate are useful (see the sketch after this list).
  - Aging can be a significant threat to NASA software systems (e.g., continuously operating planetary-exploration spacecraft flight control systems), since aging-related faults are often difficult to find during development.
- Related work:
  - Rejuvenation has been implemented in many different kinds of software systems, including telecommunications systems, transaction processing systems, and cluster servers.
  - Various types of software systems, such as web servers and military systems, have been found to age.
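Software rejuvenation, the proactive measure mentioned above, amounts to periodically restarting a component to flush accumulated internal error states before the aging-related failure rate climbs. A minimal sketch, assuming a hypothetical long-running worker and restart period (a real system would also checkpoint and restore state across restarts):

```python
import multiprocessing as mp
import time

REJUVENATION_INTERVAL_S = 3600.0  # assumed rejuvenation period, not from the source

def worker_loop():
    # Placeholder for a long-running task that may suffer from aging
    # (e.g., slow memory leaks or accumulating round-off error).
    while True:
        time.sleep(1.0)

def run_with_rejuvenation():
    while True:
        proc = mp.Process(target=worker_loop)
        proc.start()
        proc.join(timeout=REJUVENATION_INTERVAL_S)
        if proc.is_alive():
            # Proactive restart: clean internal state before a failure occurs.
            proc.terminate()
            proc.join()

if __name__ == "__main__":
    run_with_rejuvenation()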
Accomplishments and/or Tech Transfer Potential
- Collected over 40,000 failure records from the JPL problem reporting system:
  - Operational failures and failures observed during system test and ATLO operations.
  - All failures (software and non-software).
  - Over two dozen projects represented: planetary exploration, Earth orbiters, instruments.
- Continued analysis of software failures:
  - Classified flight software failures for 18 projects.
  - Classification of ground software failures for the same 18 missions in progress.
- Completed statistical analysis of flight software failure data.
- Started applying machine learning/data mining techniques to improve classification accuracy:
  - Software vs. non-software failures.
  - Types of software failures.
  - Supervised vs. unsupervised learning.
Related Work
- Working with the Software Quality Initiative project at JPL to analyze software failure reports across multiple projects. Work performed includes:
  - Applying software reliability modeling techniques to multiple sets of failure data for a current flight project to determine …
  - Investigating the use of text mining/machine learning techniques for discriminating between ground software anomalies encountered during test and those encountered during operations. Both types of anomalies are reported in the same problem reporting system, and operations vs. test anomalies are not consistently distinguished.
Potential Applications
- Improved software reliability and software development practices for NASA software systems:
  - Identify/develop the most appropriate techniques for detecting the most common types of defects.
  - Identify/develop appropriate techniques for preventing the introduction of the most common types of defects.
  - Identify/develop fault mitigation techniques for the most common types of defects.
- Applicable to mission- and safety-critical software systems:
  - Human-rated systems.
  - Robotic planetary exploration systems.
  - Critical ground support systems (e.g., planning and sequencing, navigation, engineering analysis).
Technology Readiness Level
- Current: 3 (defect classification)
- Target Level: 3, "Analytical and experimental critical function and/or characteristic proof-of-concept achieved in a laboratory environment"
  - Limited functionality
  - Small representative datasets
Data Availability
- Data collected from the JPL Problem Reporting system:
  - Failure reports from test and operations for current and historical missions going back to Voyager.
  - Over 40,000 failure reports: software and non-software, flight vs. ground software.
  - Detailed descriptions of the observed problem, the analysis and verification, and the corrective action in the PR database.
- Additional problem reports available for DSN software.
- Software-related failures are tagged by a "Cause" field; however, analysis to date indicates that the "Cause" field is not reliable:
  - For a limited sample of problem reports, all problems identified as "Software" are software-related, but some problems identified as non-software are also software-related.
  - For a subset of problem reports analyzed by Nelson Green et al., analysis indicates that relying on the "Cause" field to identify software anomalies may undercount them by a factor of 4-6 (a toy illustration of how such a factor is estimated follows this list).
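To make the undercount concrete, here is a toy illustration of how such a correction factor could be estimated from a manually analyzed sample; the counts below are invented for illustration, not the Nelson Green et al. numbers:

```python
# Hypothetical counts from a manually analyzed sample of problem reports.
tagged_software = 120      # reports whose "Cause" field says "Software"
actually_software = 560    # reports the manual analysis found to be software-related

undercount_factor = actually_software / tagged_software
print(f'"Cause" field undercounts software anomalies by ~{undercount_factor:.1f}x')
```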
Impediments to Research or Application
- More software failures may have occurred than are documented in the problem reporting database:
  - Obtaining more accurate counts of the different types of failures will require detailed analysis of a statistically significant subset of the database.
  - Currently applying text mining/machine learning techniques to identify software failures and to classify the identified software failures by type.
- A potential issue is the computation time required to complete an experimental learning run:
  - Example: applying 34 WEKA machine learners to the Nelson Green data set to distinguish between software and non-software anomalies took approximately 3 weeks per data set.
  - Experiments for the remaining time will have to be designed to make the most effective use of currently available results, so as to minimize the computation time required.
Next steps
- Complete analysis of failures:
  - Complete analysis of ground software ISAs by the end of September 2008.
  - Complete statistical analyses for all failures to identify trends: proportions of software failures; proportions of Bohrbugs vs. Mandelbugs vs. aging-related bugs.
- Complete experiments with machine learning/data mining; identify the most appropriate failure-data representations and learning models to distinguish between:
  - Software and non-software failures: find additional software failures in the problem reporting system and classify them, which can improve the accuracy of software failure type classification.
  - Different types of software failures.
- Based on analyses of proportions and trends in the failure data, identify/develop appropriate fault prevention/mitigation strategies (e.g., software rejuvenation).
- Other software improvement/defect analysis tasks and organizations at JPL have expressed interest in collaborating with this effort:
  - JPL Software Product and Process Assurance Group
  - JPL Software Quality Improvement project
Technical Details
Fault Classifications
Classification Scheme: The following definitions of software fault types are based on [Grottke05a, Grottke05b]; a minimal code encoding of the taxonomy follows the reference list.
- Mandelbug := A fault whose activation and/or error propagation are complex, where "complexity" can be caused either by interactions of the software application with its system-internal environment (hardware, operating system, other applications) or by a time lag between the fault activation and the occurrence of a failure. Typically, a Mandelbug is difficult to isolate, and/or the failures it causes are not systematically reproducible. (Sometimes, Mandelbugs are incorrectly referred to as Heisenbugs.)
- Bohrbug := A fault that is easily isolated and that manifests consistently under a well-defined set of conditions, because its activation and error propagation lack "complexity." The complementary antonym of Mandelbug.
- Aging-related bug := A fault that leads to the accumulation of internal error states, resulting in an increased failure rate and/or degraded performance. A sub-type of Mandelbug.

According to these definitions, the classes of Bohrbugs, aging-related bugs, and non-aging-related Mandelbugs partition the space of all software faults.

References:
[Grottke05a] M. Grottke and K. S. Trivedi, "Software faults, software aging and software rejuvenation," Journal of the Reliability Engineering Association of Japan 27(7):425-438, 2005.
[Grottke05b] M. Grottke and K. S. Trivedi, "A classification of software faults," Supplemental Proc. Sixteenth International Symposium on Software Reliability Engineering, 2005.
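A minimal sketch encoding this taxonomy as code, with illustrative names only; it captures the three-way partition and the fact that aging-related bugs are a sub-type of Mandelbug:

```python
from enum import Enum

class FaultClass(Enum):
    BOHRBUG = "BOH"              # simple activation and error propagation
    NON_AGING_MANDELBUG = "NAM"  # complex activation/propagation, not aging-related
    AGING_RELATED_BUG = "ARB"    # failure tendency grows with system run-time

def is_mandelbug(fault: FaultClass) -> bool:
    # Aging-related bugs are a sub-type of Mandelbug; together with
    # non-aging-related Mandelbugs they are the complement of Bohrbugs,
    # so the three classes partition the space of all software faults.
    return fault in (FaultClass.NON_AGING_MANDELBUG, FaultClass.AGING_RELATED_BUG)
```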
Mission Characteristics Summary
[Table: for each mission, launch order, normalized duration, and fault type proportions (BOH = Bohrbugs, NAM = non-aging-related Mandelbugs, ARB = aging-related bugs, UNK = unknown); the individual rows did not survive extraction. Average proportions: BOH 0.510, NAM 0.382, ARB 0.070, UNK 0.038. Standard deviations: 0.303, 0.285, 0.129, 0.120.]
Analysis Results
[Chart: fault type proportions for the eight projects with the largest number of unique faults.]
Analysis Results (cont’d)
[Charts: proportion of Bohrbugs for the four earlier missions; proportion of non-aging-related Mandelbugs for the four earlier missions.]
Analysis Results (cont’d)
[Charts: proportion of Bohrbugs for missions 3 and 9, and for missions 6 and 14, each with a 95% confidence interval based on the four earlier missions.]
Analysis Results (cont’d)
[Charts: proportion of non-aging-related Mandelbugs for missions 3 and 9, and for missions 6 and 14, each with a 95% confidence interval based on the four earlier missions. A sketch of this comparison appears below.]
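A minimal sketch of the comparison shown in these charts, assuming the interval is a standard t-based 95% confidence interval around the mean proportion of the four earlier missions; the slides do not state the exact interval construction, and all proportions below are placeholders:

```python
import statistics
from scipy import stats

# Hypothetical fault-type proportions for the four earlier missions.
earlier = [0.58, 0.47, 0.63, 0.52]
mean = statistics.mean(earlier)
sd = statistics.stdev(earlier)    # sample standard deviation
n = len(earlier)

t = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
half_width = t * sd / n ** 0.5
lo, hi = mean - half_width, mean + half_width

# Compare later missions' proportions against the interval (placeholder values).
for mission, p in {"mission 3": 0.80, "mission 9": 0.52}.items():
    print(f"{mission}: p={p:.3f}, 95% CI=({lo:.3f}, {hi:.3f}), within: {lo <= p <= hi}")
```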
Machine Learning/Text Mining Results
- Discriminate between FSW, GSW, procedural, and "other" anomalies:
  - Use standard text-mining techniques to convert the natural-language text in the problem description, problem verification, and corrective action fields of an anomaly report to vectors.
  - Training set built from anomaly reports analyzed by Nelson Green et al.
  - Three representations used to date: word counts, word frequencies, and TFxIDF. These representations will also be tried with text that includes part-of-speech (POS) information, which can be obtained with publicly available tools.
  - Apply the machine learners implemented in the WEKA machine learning environment; 34 machine learners applied.
  - 10-fold cross-validation used to build the learning models; "leave-1-out" cross-validation not used because of the computing time required.
  - "Best" learner found for pd (probability of detection), pf (probability of false alarm), accuracy, precision, and F-measure; also the "best" learner based on the distance of the ROC curve from the ideal point (0, 1). A sketch of this pipeline follows this list.
- Discriminate between Bohrbugs, Mandelbugs, and aging-related bugs:
  - Training set based on the results of classifying flight software anomalies.
  - Developing a separate training set for ground software anomalies.
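A minimal sketch of this pipeline in scikit-learn rather than WEKA (the tiny corpus and labels are invented for illustration): TF-IDF vectors built from report text, 10-fold cross-validation, and scoring by pd, pf, and the distance of the resulting ROC point from the ideal point (0, 1):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

reports = [  # hypothetical problem-description texts
    "telemetry task crashed after memory allocation failure",
    "operator skipped step in uplink procedure",
    "sequence engine rejected valid command block",
    "connector pin bent during integration",
] * 10
labels = [1, 0, 1, 0] * 10  # 1 = software anomaly, 0 = other

# Convert report text to TF-IDF vectors.
vectors = TfidfVectorizer().fit_transform(reports)

# 10-fold cross-validated predictions from one learner.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
pred = cross_val_predict(LogisticRegression(), vectors, labels, cv=cv)

tn, fp, fn, tp = confusion_matrix(labels, pred).ravel()
pd_ = tp / (tp + fn)  # probability of detection (recall)
pf = fp / (fp + tn)   # probability of false alarm
roc_dist = ((0 - pf) ** 2 + (1 - pd_) ** 2) ** 0.5  # distance from ideal (0, 1)
print(f"pd={pd_:.2f}, pf={pf:.2f}, ROC distance from (0,1)={roc_dist:.2f}")
```

Ranking each of the candidate learners by a metric such as this ROC distance mirrors the "best learner" selection described above.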
Machine Learning/Text Mining Results
[Chart: flight software failures vs. all other failures.]
Machine Learning/Text Mining Results
[Chart: ground software failures vs. all other failures.]
Machine Learning/Text Mining Results
[Chart: flight and ground software failures vs. all other failures.]
Machine Learning/Text Mining Results
[Chart: procedural/process errors vs. all other failures.]