Classifying Software Faults to Improve Fault Detection Effectiveness


Classifying Software Faults to Improve Fault Detection Effectiveness
Technical Briefing
NASA OSMA Software Assurance Symposium, September 9-11, 2008
Allen P. Nikora, JPL/Caltech

This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program, led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.

Agenda
- Problem/Approach
- Relevance to NASA
- Accomplishments and/or Tech Transfer Potential
- Related Work
- Technology Readiness Level
- Data Availability
- Impediments to Research or Application
- Next Steps

Problem/Approach
All software systems contain faults. Different types of faults exhibit different failure behavior and require different identification techniques; some faults are easier to find than others. The likelihood of detecting and removing software faults during development and testing, as well as the possible strategies for dealing with residual faults during mission operations, depend on the fault type.
Goals are to:
- Determine the relative frequencies of specific types of faults and identify trends in those frequencies.
- Develop effective techniques for identifying and removing faults, or for masking their effects.
- Develop guidelines, based on the analysis of faults and failures, for applying the techniques in the context of current and future missions.

Problem/Approach (cont'd)
What must be done?
- Analyze software failure data (test and operations) from historical and current JPL and NASA missions and classify the underlying software faults. Further classify the faults by criticality (e.g., non-critical, significant mission impact, mission critical) and by detection phase.
- Perform statistical analysis (see the sketch below):
  - Proportions of faults in each category.
  - Conditional frequencies (e.g., percentage of critical faults among aging-related bugs; percentage of aging-related bugs among the critical faults).
  - Trends in conditional frequencies (within and across missions).
- Determine criteria for further classifying faults (e.g., for the aging-related bugs: faults causing round-off errors, faults causing memory leaks, etc.) to identify classes of faults with high criticality and low detectability.
- For highly critical faults that are difficult to detect prior to release, develop techniques for:
  - Identifying the component(s) most likely to contain these types of faults.
  - Improving the detectability of the faults with model-based verification or static analysis tools, as well as during testing.
  - Masking the faults via fault tolerance (e.g., software rejuvenation for aging-related faults). Such techniques must be able to accurately distinguish between behavioral changes resulting from normal changes in the system's operating environment and input space and those brought about by aging-related faults.
- Develop guidelines for implementing these techniques in the context of current and future missions.
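As a concrete illustration of the statistical analysis step above, the following minimal Python sketch computes per-category proportions and one conditional frequency from classified fault records. The record format and sample values are hypothetical, not drawn from the JPL data.

```python
# Minimal sketch: each classified problem report is reduced to a
# (fault_type, criticality) pair. The sample records are invented.
from collections import Counter

records = [
    ("bohrbug", "non-critical"),
    ("mandelbug", "mission-critical"),
    ("aging-related", "mission-critical"),
    ("bohrbug", "significant"),
    ("aging-related", "non-critical"),
]

type_counts = Counter(fault_type for fault_type, _ in records)
total = len(records)

# Proportion of faults in each category
for fault_type, n in type_counts.items():
    print(f"P({fault_type}) = {n / total:.2f}")

# Conditional frequency, e.g. P(mission-critical | aging-related)
aging = [crit for ftype, crit in records if ftype == "aging-related"]
p_crit_given_aging = aging.count("mission-critical") / len(aging)
print(f"P(critical | aging-related) = {p_crit_given_aging:.2f}")
```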

Relevance to NASA
Different types of faults have different types of effects; fault identification and mitigation strategies should be chosen based on the types of failures encountered in the system being developed.
- Bohrbugs: deterministically cause failures and are the easiest to find during testing. Fault tolerance in the operational system can mainly be achieved with design diversity.
- Mandelbugs: difficult to find, isolate, and correct during testing. Re-execution of an operation that failed because of a Mandelbug will generally not result in another failure, so fault tolerance can be achieved by simple retries or by more sophisticated approaches such as checkpointing and recovery-oriented computing.
- Aging-related bugs: the tendency to cause a failure increases with system run time. Proactive measures that clean the internal system state (software rejuvenation) and thus reduce the failure rate are useful (see the sketch below). Aging can be a significant threat to NASA software systems (e.g., continuously operating planetary-exploration spacecraft flight control systems), since aging-related faults are often difficult to find during development.
Related work: rejuvenation has been implemented in many different kinds of software systems, including telecommunications systems, transaction processing systems, and cluster servers. Various types of software systems, such as web servers and military systems, have been found to age.
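The rejuvenation strategy mentioned above can be illustrated with a short sketch: a supervisor proactively restarts a long-running worker on a fixed schedule, clearing accumulated internal error states. The interval, the worker, and the restart mechanism are illustrative assumptions, not the approach used on any flight system.

```python
# A minimal sketch of software rejuvenation, assuming a long-running
# worker whose internal error states (e.g., leaked memory) accumulate
# over time. Illustration of the concept only.
import time
import multiprocessing

REJUVENATION_INTERVAL_S = 3600  # assumed: proactively restart hourly

def worker_main():
    # Stand-in for a real workload that would slowly accumulate
    # internal error states if left running indefinitely.
    while True:
        time.sleep(1)

def run_with_rejuvenation():
    while True:
        proc = multiprocessing.Process(target=worker_main)
        proc.start()
        proc.join(timeout=REJUVENATION_INTERVAL_S)
        if proc.is_alive():
            # Rejuvenation: terminate and restart from a clean internal
            # state before aging raises the failure rate.
            proc.terminate()
            proc.join()

if __name__ == "__main__":
    run_with_rejuvenation()
```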

Accomplishments and/or Tech Transfer Potential
- Collected over 40,000 failure records from the JPL problem reporting system:
  - Operational failures and failures observed during system test and ATLO (Assembly, Test, and Launch Operations).
  - All failures (software and non-software).
  - Over two dozen projects represented: planetary exploration, Earth orbiters, instruments.
- Continued analysis of software failures:
  - Classified flight software failures for 18 projects; classification of ground software failures for the same 18 missions is in progress.
  - Completed statistical analysis of flight software failure data.
- Started applying machine learning/data mining techniques to improve classification accuracy:
  - Software vs. non-software failures.
  - Types of software failures.
  - Supervised vs. unsupervised learning.

Related Work
Working with the Software Quality Initiative project at JPL to analyze software failure reports across multiple projects. Work performed includes:
- Applying software reliability modeling techniques to multiple sets of failure data for a current flight project.
- Investigating the use of text mining/machine learning techniques to discriminate between ground software anomalies encountered during test and those encountered during operations. Both types of anomalies are reported using the same problem reporting system, and operations vs. test anomalies are not consistently distinguished.

Potential Applications
Improved software reliability and software development practices for NASA software systems:
- Identify/develop the most appropriate techniques for identifying the most common types of defects.
- Identify/develop appropriate techniques for preventing the introduction of the most common types of defects.
- Identify/develop fault mitigation techniques for the most common types of defects.
Applicable to mission- and safety-critical software systems:
- Human-rated systems.
- Robotic planetary exploration systems.
- Critical ground support systems (e.g., planning and sequencing, navigation, engineering analysis).

Technology Readiness Level
Current: 3 (defect classification).
Target: Level 3, "Analytical and experimental critical function and/or characteristic proof-of-concept achieved in a laboratory environment."
- Limited functionality
- Small representative datasets

Data Availability
Data collected from the JPL Problem Reporting system:
- Failure reports during test and operations for current and historical missions, going back to Voyager.
- Over 40,000 failure reports: software and non-software, flight vs. ground software.
- Detailed descriptions of the observed problem, the analysis and verification, and the corrective action are in the PR database.
- Additional problem reports are available for DSN software.
Software-related failures are tagged by a "Cause" field; however, analysis to date indicates that the "Cause" field is not reliable:
- For a limited sample of problem reports, all problems identified as "Software" are software-related, but some problems identified as non-software are also software-related.
- For a subset of problem reports analyzed by Nelson Green et al., analysis indicates that relying on the "Cause" field to identify software anomalies may undercount them by a factor of 4-6.

Impediments to Research or Application
More software failures may have occurred than are documented in the problem reporting database. Obtaining more accurate counts of the different types of failures will require detailed analysis of a statistically significant subset of the database.
Currently applying text mining/machine learning techniques to:
- Identify software failures.
- Classify identified software failures by type.
A potential issue is the computation time required to complete an experimental learning run. For example, applying 34 WEKA machine learners to the Nelson Green data set to distinguish between software and non-software anomalies took approximately 3 weeks per data set. The remaining experiments will have to be designed to make the most effective use of currently available results so as to minimize the computation time required. (An illustrative analog of one such experiment is sketched below.)
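For context, the following is a rough scikit-learn analog of one such experiment (the briefing itself used WEKA, not this code): free-text anomaly reports are vectorized and a single learner is scored with 10-fold cross-validation. The toy reports and labels are invented.

```python
# Illustrative analog of one text-classification experiment: vectorize
# the free-text fields of a problem report and score one learner with
# 10-fold cross-validation. Reports and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reports = [
    "flight software rebooted after command sequence error",
    "ground antenna pointing motor failed during pass",
    "memory leak in telemetry process after long uptime",
    "procedural error: operator skipped verification step",
] * 5  # repeated so 10-fold CV has enough samples per class
labels = ["sw", "non-sw", "sw", "non-sw"] * 5

pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(pipeline, reports, labels, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.2f}")
```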

Next Steps
- Complete analysis of failures: complete analysis of ground software ISAs (Incident/Surprise/Anomaly reports) by end of September 2008.
- Complete statistical analyses for all failures to identify trends: proportions of software failures; proportions of Bohrbugs vs. Mandelbugs vs. aging-related bugs.
- Complete experiments with machine learning/data mining; identify the most appropriate failure data representations and learning models to distinguish between:
  - Software and non-software failures: find additional software failures in the problem reporting system and classify them, which can improve the accuracy of software failure type classification.
  - Different types of software failures.
- Based on analyses of proportions and trends in the failure data, identify/develop appropriate fault prevention/mitigation strategies (e.g., software rejuvenation).
- Other software improvement/defect analysis tasks and organizations at JPL have expressed interest in collaborating with this effort: the JPL Software Product and Process Assurance Group and the JPL Software Quality Improvement project.

Technical Details

Fault Classifications
Classification scheme: the following definitions of software fault types are based on [Grottke05a, Grottke05b]:
- Mandelbug := a fault whose activation and/or error propagation are complex, where "complexity" can be caused either by interactions of the software application with its system-internal environment (hardware, operating system, other applications) or by a time lag between the fault activation and the occurrence of a failure. Typically, a Mandelbug is difficult to isolate, and/or the failures it causes are not systematically reproducible. (Sometimes, Mandelbugs are incorrectly referred to as Heisenbugs.)
- Bohrbug := a fault that is easily isolated and that manifests consistently under a well-defined set of conditions, because its activation and error propagation lack "complexity." Complementary antonym of Mandelbug.
- Aging-related bug := a fault that leads to the accumulation of internal error states, resulting in an increased failure rate and/or degraded performance. Sub-type of Mandelbug.
According to these definitions, the classes of Bohrbugs, aging-related bugs, and non-aging-related Mandelbugs partition the space of all software faults (see the sketch below).
References:
[Grottke05a] M. Grottke and K. S. Trivedi, "Software Faults, Software Aging and Software Rejuvenation," Journal of the Reliability Engineering Association of Japan, 27(7):425-438, 2005.
[Grottke05b] M. Grottke and K. S. Trivedi, "A Classification of Software Faults," Supplemental Proc. 16th International Symposium on Software Reliability Engineering, 2005, pp. 4.19-4.20.
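Because the three classes partition the fault space, the assignment can be expressed as a simple two-question decision, as in the sketch below. The boolean attributes are assumptions introduced for illustration; in practice the classification is a human judgment based on the report text.

```python
# Sketch encoding the Grottke/Trivedi partition: every software fault is
# exactly one of Bohrbug, aging-related bug, or non-aging-related
# Mandelbug. The per-fault boolean attributes are illustrative.
from enum import Enum

class FaultClass(Enum):
    BOHRBUG = "BOH"
    AGING_RELATED = "ARB"
    NON_AGING_MANDELBUG = "NAM"

def classify(complex_activation: bool, accumulates_error_states: bool) -> FaultClass:
    if not complex_activation:
        return FaultClass.BOHRBUG           # easily isolated, reproducible
    if accumulates_error_states:
        return FaultClass.AGING_RELATED     # sub-type of Mandelbug
    return FaultClass.NON_AGING_MANDELBUG

print(classify(complex_activation=True, accumulates_error_states=True))
```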

Mission Characteristics Summary
[Table: for each of 18 missions, the launch order, normalized duration, and fault type proportions (BOH = Bohrbugs, NAM = non-aging-related Mandelbugs, ARB = aging-related bugs, UNK = unknown). Only the summary rows are recoverable:]

Fault type           BOH    NAM    ARB    UNK
Average proportion   0.510  0.382  0.070  0.038
Standard deviation   0.303  0.285  0.129  0.120

Analysis Results
[Figure: fault type proportions for the eight projects with the largest number of unique faults.]

Analysis Results (cont'd)
[Figures: proportion of Bohrbugs for the four earlier missions; proportion of non-aging-related Mandelbugs for the four earlier missions.]

Analysis Results (cont'd)
[Figures: proportion of Bohrbugs for missions 3 and 9, and for missions 6 and 14, each with a 95% confidence interval based on the four earlier missions.]

Analysis Results (cont'd)
[Figures: proportion of non-aging-related Mandelbugs for missions 3 and 9, and for missions 6 and 14, each with a 95% confidence interval based on the four earlier missions.]
A sketch of how such a confidence interval can be computed follows.
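The confidence intervals referenced in these plots can be obtained from the classified fault counts. A plausible computation, assuming the standard normal approximation to a binomial proportion (the briefing does not state the exact method used), is sketched below with invented counts.

```python
# Sketch: 95% confidence interval for a fault-type proportion via the
# normal approximation to the binomial. The counts are hypothetical.
import math

def proportion_ci(k: int, n: int, z: float = 1.96):
    """Normal-approximation CI for a proportion of k successes in n trials."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# e.g., 51 Bohrbugs among 100 classified faults pooled from four missions
low, high = proportion_ci(51, 100)
print(f"Bohrbug proportion 95% CI: [{low:.3f}, {high:.3f}]")
```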

Machine Learning/Text Mining Results
Discriminate between FSW (flight software), GSW (ground software), procedural, and "other" anomalies:
- Use standard text-mining techniques to convert the natural-language text in the problem description, problem verification, and corrective action fields of an anomaly report to vectors.
- The training set was built from anomaly reports analyzed by Nelson Green et al.
- Three representations used to date: word counts, word frequencies, and TFxIDF. These representations will also be tried with text that includes part-of-speech (POS) information; POS information can be obtained with publicly available tools.
- Apply the machine learners implemented in the WEKA machine learning environment: 34 machine learners applied.
- 10-fold cross-validation was used to build the learning models; "leave-one-out" cross-validation was not used because of the computing time required.
- The "best" learner was found for pd (probability of detection), pf (probability of false alarm), accuracy, precision, and F-measure, and also based on the distance of the ROC curve from the ideal point (0, 1). (A sketch of these metrics follows.)
Discriminate between Bohrbugs, Mandelbugs, and aging-related bugs:
- The training set is based on the results of classifying flight software anomalies.
- A separate training set for ground software anomalies is being developed.
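A short sketch of the selection metrics named above, computed from a hypothetical binary confusion matrix: pd and pf are the detection and false-alarm probabilities, and the ROC criterion is the Euclidean distance of the operating point (pf, pd) from the ideal point (0, 1), where smaller is better.

```python
# Learner-selection metrics from a binary confusion matrix. The counts
# are hypothetical, not taken from the briefing's results.
import math

tp, fn, fp, tn = 40, 10, 5, 45  # hypothetical confusion-matrix counts

pd = tp / (tp + fn)             # probability of detection (recall)
pf = fp / (fp + tn)             # probability of false alarm
precision = tp / (tp + fp)
f_measure = 2 * precision * pd / (precision + pd)
roc_distance = math.sqrt(pf**2 + (1 - pd)**2)  # distance from ideal (0, 1)

print(f"pd={pd:.2f} pf={pf:.2f} precision={precision:.2f} "
      f"F={f_measure:.2f} ROC distance={roc_distance:.2f}")
```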

Machine Learning/Text Mining Results
[Figure: flight software failures vs. all other failures.]

Machine Learning/Text Mining Results
[Figure: ground software failures vs. all other failures.]

Machine Learning/Text Mining Results
[Figure: flight and ground software failures vs. all other failures.]

Machine Learning/Text Mining Results
[Figure: procedural/process errors vs. all other failures.]