An Experimental Evaluation on Reliability Features of N-Version Programming Xia Cai, Michael R. Lyu and Mladen A. Vouk ISSRE’2005.


An Experimental Evaluation on Reliability Features of N-Version Programming Xia Cai, Michael R. Lyu and Mladen A. Vouk ISSRE’2005

2 Outline  Introduction  Motivation  Experimental evaluation Fault analysis Failure probability Fault density Reliability improvement  Discussions  Conclusion and future work

3 Introduction  N-version programming is one of the main techniques for software fault tolerance  It has been adopted in some mission-critical applications  Yet its effectiveness is still an open question: What is the reliability enhancement? Is fault correlation between multiple versions a big issue that affects the final reliability?
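To make the scheme concrete, here is a minimal sketch (not from the paper) of the basic N-version arrangement: N independently developed versions run on the same input, and a voter adjudicates their outputs by majority. The three version functions below are hypothetical stand-ins, one of them deliberately faulty.

```python
# Minimal N-version programming sketch: run all versions, take the majority output.
from collections import Counter

def nvp_vote(versions, x):
    """Run every version on input x and return the majority output, if one exists."""
    outputs = [v(x) for v in versions]
    winner, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return winner
    raise RuntimeError("no majority: versions disagree")

# Three hypothetical versions of the same function; v3 contains a fault.
v1 = lambda x: x * x
v2 = lambda x: x ** 2
v3 = lambda x: x + x          # faulty version

print(nvp_vote([v1, v2, v3], 5))   # the two correct versions outvote the faulty one
```

Voting only masks the fault as long as failures are not correlated, which is exactly the open question the talk addresses.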

4 Introduction (cont’d)  Empirical and theoretical investigations have been conducted based on experiments, modeling, and evaluations: Avizienis and Chen (1977), Knight and Leveson (1986), Kelly and Avizienis (1983), Avizienis, Lyu and Schuetz (1988), Eckhardt et al. (1991), Lyu and He (1993); Eckhardt and Lee (1985), Littlewood and Miller (1989), Popov et al. (2003); Belli and Jedrzejowicz (1990), Littlewood et al. (2001), Teng and Pham (2002)  No conclusive estimate can be made because of differences in the size, population, complexity and comparability of these experiments

5 Research questions  What is the reliability improvement of NVP?  Is fault correlation a big issue that will affect the final reliability?  What kind of empirical data can be comparable with previous investigations?

6 Motivation  To address the reliability and fault correlation issues in NVP  To conduct a comparable experiment with previous empirical studies  To investigate the “variant” and “invariant” features in NVP

7 Experimental background  Some features about the experiment Complexity Large population Well-defined Statistical failure and fault records  Previous empirical studies UCLA Six-Language project NASA 4-University project Knight and Leveson’s experiment Lyu-He study

8 Experimental setup  RSDIMU avionics application  34 program versions, each developed by a team of 4 students  Comprehensive testing exercised: Acceptance testing: 800 functional test cases and 400 random test cases; Operational testing: 100,000 random test cases  Failures and faults collected and studied  Qualitative as well as quantitative comparisons with the NASA 4-University project performed

9 Experimental description  Geometry  Data flow diagram

10 Comparisons between the two projects  Qualitative comparisons General features Fault analysis in development phase & operational test  Quantitative comparisons Failure probability Fault density Reliability improvement

11 General features comparison

12 Faults in development phase

13 Distribution of related faults

14 Fault analysis in development phase  Common related faults  Display module (easiest part)  Calculation in wrong frame of reference  Initialization problems  Missing certain scaling computation  Faults in NASA project only  Division by zero  Incorrect conversion factor  Wrong coordinate system problem

15 Fault analysis in development phase (cont’)  Both cause and effect of some related faults remain the same  Related faults occurred in both easy and difficult subdomains  Some common problems, e.g., initialization problem, exist for different programming languages  The most fault-prone module is the easiest part of the application

16 Faults in operational test

17 Faults in operational test (cont’d)  These faults are all related to the same module, i.e., the sensor failure detection and isolation problem  Fault pair (34.2 & 22.1): 25 coincident failures  Fault pair (34.3 & 29.1): 32 coincident failures  Yet these two pairs are quite different in nature  Version 34 shows the lowest quality: Poor program logic and design organization; Hard coding  The overall performance of NVP derived from our data would be even better if the data from version 34 were ignored

18 Input/Output domain classification  Normal operations are classified as: S_{i,j} = { i sensors previously failed and j of the remaining sensors fail | i = 0, 1, 2; j = 0, 1 }  Exceptional operations: S_others
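The partition above can be sketched as a small classifier, assuming a test case is described by the number of previously failed sensors and the number of newly failing sensors (the label encoding below is my own, not the paper's):

```python
# Classify a test case into the S_{i,j} subdomains or the exceptional class.
def classify(prev_failed, new_failures):
    """Return the subdomain label for a test case."""
    if prev_failed in (0, 1, 2) and new_failures in (0, 1):
        return f"S{prev_failed},{new_failures}"
    return "S_others"   # exceptional operations

print(classify(0, 1))  # a normal operation: no prior failures, one new failure
print(classify(3, 0))  # too many prior failures: exceptional operations
```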

19 Failures in operational test  States S_{0,0}, S_{1,0} and S_{2,0} are more reliable than states S_{0,1}, S_{1,1}, S_{2,1}  The exceptional state reveals most of the failures  The failure probability in S_{0,1} is the highest  The programs exhibit high reliability on average

20 Coincident failures  Two or more versions fail at the same test case, whether or not their outputs are identical  The percentage of coincident failures versus total failures is low: Version 22: 25/618 = 4%; Version 29: 32/2760 = 1.2%; Version 34: (25+32)/1351 = 4.2%
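The percentages can be reproduced directly from the counts on the slide (here the (25+32)/1351 entry is taken to belong to version 34, which appears in both fault pairs):

```python
# Coincident failures versus total failures, per version, from the slide's counts.
coincident = {"22": 25, "29": 32, "34": 25 + 32}
total = {"22": 618, "29": 2760, "34": 1351}

pct = {v: round(coincident[v] / total[v] * 100, 1) for v in coincident}
for v, p in pct.items():
    print(f"Version {v}: {coincident[v]}/{total[v]} = {p}%")   # 4.0%, 1.2%, 4.2%
```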

21 Fault density  Six faults identified in 4 out of 34 versions  The size of these versions varies from 1455 to 4512 source lines of code  Average fault density: one fault per 10,000 lines  This is close to the industry standard for high-quality software systems
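As a back-of-the-envelope check, assuming the density is averaged over all 34 versions and an average version size of roughly 3,000 SLOC (the slide gives only the 1455-4512 range for the four faulty versions; the average is my assumption), the figure comes out on the order of the slide's one fault per 10,000 lines:

```python
# Rough consistency check of "one fault per 10,000 lines".
n_versions, avg_sloc, faults = 34, 3000, 6    # avg_sloc is an assumed value

density_per_10k = faults / (n_versions * avg_sloc) * 10_000
print(f"~{density_per_10k:.1f} faults per 10,000 lines")
```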

22 Failure bounds for 2-version system  Lower and upper bounds for the coincident failure probability under the Popov et al. model  DP1: normal test cases without sensor failures dominate all test cases  DP3: test cases evenly distributed over all subdomains  DP2: between DP1 & DP3
[Table: lower and upper bounds under DP1, DP2 and DP3 for version pairs (22,34) and (29,34), with averages for our project (1.25*…) and the NASA project (2.32*…); the exponents were lost in transcription]
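A hedged sketch of how such bounds arise, in the spirit of the Popov et al. subdomain model: within each subdomain the joint failure probability of two versions is bracketed by the Fréchet bounds on their individual failure probabilities, and the system-level bound is the usage-weighted sum. The subdomain probabilities and per-version failure rates below are made-up illustration values, not the paper's data.

```python
# Subdomain-based bounds on the coincident failure probability of a 2-version system.
def coincident_bounds(subdomains):
    """subdomains: list of (p, f1, f2) = (usage prob., failure prob. of version 1 and 2)."""
    lower = sum(p * max(0.0, f1 + f2 - 1.0) for p, f1, f2 in subdomains)
    upper = sum(p * min(f1, f2) for p, f1, f2 in subdomains)
    return lower, upper

demo = [(0.9, 0.001, 0.002),   # easy subdomain, dominant under a DP1-like profile
        (0.1, 0.05, 0.04)]     # hard subdomain
lo, hi = coincident_bounds(demo)
print(f"lower={lo:.6f}, upper={hi:.6f}")
```

Shifting probability mass between easy and hard subdomains (as DP1, DP2 and DP3 do) moves both bounds, which is why the demand profile matters.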

23 Quantitative comparison in operational test  NASA 4-university project: 7 out of 20 versions passed the operational testing  Coincident failures were found among 2 to 8 versions  5 out of 7 faults were not observed in our project

24 Observations  The difference in fault number and fault density is not significant  In the NASA project: The number of failures and coincident failures is much higher; Although there are coincident failures in 2- to 8-version combinations, the reliability improvement for a 3-version system is still 80~330 times  In our project: The average failure rate is 50 times better; The reliability improvement for a 3-version system is 30~60 times

25 Invariants  Reliable program versions with low failure probability  Similar number of faults and fault density  Distinguishable reliability improvement for NVP, with 10^2 to 10^4 times enhancement  Related faults observed in both difficult and easy parts of the application

26 Variants  Compared with the NASA project, our project has: Some faults not observed; Fewer failures and fewer coincident failures; Only 2-version coincident failures (rather than 2- to 8-version failures)  The overall reliability improvement is an order of magnitude larger

27 Discussions  The improvement of our project may be attributed to: stable specification; better programming training; experience in NVP experiments; cleaner development protocol; different programming languages & platforms

28 Discussions (cont’d)  Hard-to-detect faults are triggered only by rare input subdomains  New testing strategies are needed to detect such faults: Code coverage? Domain analysis?

29 Conclusion  An empirical investigation is performed to evaluate reliability features through a comprehensive comparison of two NVP projects  NVP can provide distinguishable improvement in final reliability, according to our empirical study  The small number of coincident failures provides supporting evidence for NVP  Possible attributes that may affect the reliability improvement are discussed

30 Future Work  Apply more intensive testing to both Pascal and C programs  Conduct cross-comparisons of the program versions developed in different programming languages  Investigate the reliability enhancement of NVP based on the combined set of program versions

Thank you ! Q & A