An Empirical Study on Reliability Modeling for Diverse Software Systems Xia Cai and Michael R. Lyu Dept. of Computer Science & Engineering The Chinese.

Presentation transcript:

An Empirical Study on Reliability Modeling for Diverse Software Systems Xia Cai and Michael R. Lyu Dept. of Computer Science & Engineering The Chinese University of Hong Kong

Slide 2: Outline
- Introduction
- Objectives and previous work
- Analyses and investigations on reliability models for diverse software systems
  - Reliability bounds model by Popov, Strigini, et al.
  - System reliability model by Dugan and Lyu
- Discussion
- Conclusion

Slide 3: Introduction
- Design diversity is one of the two main techniques for software fault tolerance
- The rationale of this approach is the expectation that software programs built differently will fail differently
- Reliability models attempt to estimate the probability of coincident failures in multiple versions
- Empirical data are in high demand for evaluating and cross-validating the usefulness and effectiveness of these models

Slide 4: Reliability models for design diversity
Conceptual models:
- Eckhardt and Lee (1985)
  - Variation of difficulty on demand space
  - Positive correlations between version failures
- Littlewood and Miller (1989)
  - Forced design diversity
  - Possibility of negative correlations
Structural models:
- Dugan and Lyu (1995)
  - Markov reward model
- Tomek and Trivedi (1995)
  - Stochastic reward net
In between:
- Popov, Strigini et al. (2003)
  - Subdomains on demand space
  - Upper/lower bounds for failure probability

Slide 5: Our objectives
- To study reliability and fault correlation issues in design diversity by means of mutation testing
- To investigate and compare the prediction performance of different existing reliability models for design diversity

Slide 6: Our previous work
- Motivated by the lack of empirical data, we conducted the RSDIMU project in the year
- It took more than 100 students 12 weeks to develop 34 program versions
- 1200 test cases were executed on these program versions
- 426 mutants were generated, each by injecting a single fault identified in the testing phase
- A number of analyses and evaluations were conducted in our previous work

Slide 7: Outline
- Introduction
- Objectives and previous work
- Analyses and investigations on reliability models for diverse software systems
  - Reliability bounds model by Popov, Strigini, et al. (PS model)
  - System reliability model by Dugan and Lyu (DL model)
- Discussion
- Conclusion

Slide 8: PS Model
- Proposed by P. T. Popov, L. Strigini, J. May and S. Kuball (2003)
- Target: give the upper and "likely" lower bounds for the probability of coincident failures
- Assumptions: knowledge is given on disjoint subdomains S_i of the demand space, i.e.,
  1) the probability P(S_i) of a random demand being drawn from S_i;
  2) the probabilities of failure on demand (pfds) of A and B for demands from S_i, P_{A|S_i} and P_{B|S_i}.

Slide 9: PS Model (cont'd)
- Alternative estimates for the probability of failure on demand (pfd) of a 1-out-of-2 system

Slide 10: PS Model (cont'd)
- Upper bound of system pfd
- "Likely" lower bound of system pfd, under the assumption of conditional independence
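The bound formulas themselves did not survive in this transcript. As a hedged sketch of how such subdomain-based bounds are commonly written (an assumption, not a reproduction of the slide), the upper bound sums P(S_i) * min(P_{A|S_i}, P_{B|S_i}) over the subdomains, and the "likely" lower bound sums P(S_i) * P_{A|S_i} * P_{B|S_i}, which is the system pfd if the two versions fail independently within each subdomain. The function name `ps_bounds` and all numbers below are hypothetical.

```python
from typing import Sequence, Tuple


def ps_bounds(p_sub: Sequence[float],
              pfd_a: Sequence[float],
              pfd_b: Sequence[float]) -> Tuple[float, float]:
    """Return (likely_lower, upper) bounds on the pfd of a 1-out-of-2 system.

    p_sub[i] : probability that a demand falls in subdomain S_i
    pfd_a[i] : pfd of version A on demands from S_i
    pfd_b[i] : pfd of version B on demands from S_i
    """
    if not (len(p_sub) == len(pfd_a) == len(pfd_b)):
        raise ValueError("all inputs must have one entry per subdomain")
    if abs(sum(p_sub) - 1.0) > 1e-9:
        raise ValueError("subdomain probabilities must sum to 1")

    # Upper bound: in each subdomain both versions fail together at most
    # as often as the more reliable of the two fails.
    upper = sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))
    # "Likely" lower bound: conditional independence inside each subdomain.
    lower = sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))
    return lower, upper


# Hypothetical demand profile with three subdomains (illustrative values only).
profile = [0.5, 0.3, 0.2]
pfd_a = [0.01, 0.02, 0.0]
pfd_b = [0.02, 0.01, 0.05]
lower, upper = ps_bounds(profile, pfd_a, pfd_b)
print(f"likely lower bound = {lower:.6f}, upper bound = {upper:.6f}")
```

Comparing `upper` with min(P_A, P_B) and `lower` with P_A * P_B, where P_A and P_B are the marginal pfds, shows how much the subdomain knowledge tightens the estimate.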

Slide 11: Experimental setup
- Mutants are treated as program versions in our experiment
- The 1200 test cases are divided into seven categories by the system status
- The first 800 test cases (manually designed for functionality testing) are used as the qualification test, and the other 400 test cases (randomly generated) as the operational test

Slide 12: [Flow diagram. Labels: programs that passed the qualification test; information on subdomains; failure data and demand profile (hypothetical, real); faults in operational test; upper bounds; lower bounds; analysis]

Slide 13: Estimation method
- Since no failure was observed in some subdomains, we adopt confidence bounds rather than point estimates in our experiment
- One-sided confidence bounds (Bayesian bounds) are computed for the probabilities of failure
- 90% confidence upper bounds as well as lower bounds on the pfds of mutants in subdomains were estimated under all demand profiles
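The slides do not say which prior was used for the Bayesian bounds. As a minimal sketch, assuming a uniform Beta(1, 1) prior on the pfd, observing k failures in n demands gives a Beta(k + 1, n - k + 1) posterior; the 90% one-sided upper bound is its 0.90 quantile and the 90% lower bound is its 0.10 quantile. The function names and the demand counts are hypothetical, and the quantile is found by plain bisection so that only the standard library is needed.

```python
from math import comb


def posterior_cdf(x: float, failures: int, demands: int) -> float:
    """CDF at x of the Beta(failures + 1, demands - failures + 1) posterior.

    For integer parameters the Beta CDF equals a binomial tail:
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a), computed here as a complement.
    """
    n = demands + 1
    below = sum(comb(n, j) * x**j * (1.0 - x) ** (n - j)
                for j in range(failures + 1))
    return 1.0 - below


def bayes_bound(failures: int, demands: int,
                confidence: float = 0.90, upper: bool = True) -> float:
    """One-sided Bayesian bound on a pfd under the assumed uniform prior."""
    if not 0 <= failures <= demands:
        raise ValueError("need 0 <= failures <= demands")
    target = confidence if upper else 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection on the monotone posterior CDF
        mid = (lo + hi) / 2.0
        if posterior_cdf(mid, failures, demands) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A subdomain with no observed failures still gets a non-zero upper bound.
ub_zero = bayes_bound(failures=0, demands=100)
lb_zero = bayes_bound(failures=0, demands=100, upper=False)
ub_two = bayes_bound(failures=2, demands=100)
print(ub_zero, lb_zero, ub_two)
```

With zero failures the upper bound has the closed form 1 - (1 - confidence)^(1/(n + 1)) under this prior, which is a convenient sanity check on the bisection.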

Slide 14: Bayesian bounds under DP4
- 90% confidence upper bounds on pfds in subdomains
- 90% confidence lower bounds on pfds in subdomains

Slide 15: Upper bounds
- Upper bounds on the joint pfds under all demand profiles

Slide 16: Lower bounds
- "Likely" lower bounds on the joint pfds under all demand profiles

Slide 17: Analysis on upper/lower bounds
Mutant pair (117, 305):
- Failure features: no correlation observed
- Performance comparison: fail differently
- Covariance in failures: positive (DP1), negative (others)
- Upper bounds: smaller than min(P_A, P_B)
- Lower bounds: larger than P_A*P_B in DP1
Mutant pair (215, 382):
- Failure features: correlation observed
- Performance comparison: mutant 382 performs worse in all subdomains
- Covariance in failures: always positive
- Upper bounds: equal to P_215
- Lower bounds: larger in all DPs
Mutant pair (382, 403):
- Failure features: correlation observed
- Performance comparison: perform differently
- Covariance in failures: positive (DP1&2), negative (DP3&4)
- Upper bounds: smaller than min(P_A, P_B)
- Lower bounds: larger in DP1&2

Slide 18: Discussion
- With our data, the confidence bounds in the PS model are tighter than P_A*P_B and min(P_A, P_B) under most circumstances, except when:
  - One program performs worse than the other in all subdomains
  - Negative covariance holds between the failure probabilities of the two programs
- Difficulties and limitations of the PS model
  - How to divide the demand space into disjoint subdomains
  - The need for thorough knowledge of the probability of each subdomain and the performance of all versions in it
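The two exceptions can be illustrated with a small worked example. Under conditional independence within subdomains, the sum of P(S_i) * P_{A|S_i} * P_{B|S_i} equals P_A * P_B plus the covariance of the two versions' subdomain pfds across the demand profile, so a negative covariance pushes the "likely" lower bound below P_A * P_B; and when one version is worse in every subdomain, the sum of P(S_i) * min(...) collapses to the marginal pfd of the better version. The sketch below uses these standard identities with hypothetical numbers; it is not the authors' computation.

```python
def marginal(p_sub, pfd):
    """Marginal pfd of one version over the whole demand profile."""
    return sum(p * q for p, q in zip(p_sub, pfd))


def covariance(p_sub, pfd_a, pfd_b):
    """Covariance of the two versions' subdomain pfds under the profile."""
    pa, pb = marginal(p_sub, pfd_a), marginal(p_sub, pfd_b)
    return sum(p * (a - pa) * (b - pb)
               for p, a, b in zip(p_sub, pfd_a, pfd_b))


def bounds(p_sub, pfd_a, pfd_b):
    """(likely_lower, upper) bounds in the subdomain form assumed here."""
    lower = sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))
    upper = sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))
    return lower, upper


profile = [0.5, 0.5]  # hypothetical two-subdomain demand profile

# Case 1: version B is worse than A in every subdomain.
a1, b1 = [0.01, 0.02], [0.02, 0.04]
lower1, upper1 = bounds(profile, a1, b1)
# The upper bound equals P_A, so it is no tighter than min(P_A, P_B).
print(upper1, marginal(profile, a1))

# Case 2: the versions fail in different subdomains (negative covariance).
a2, b2 = [0.04, 0.0], [0.0, 0.04]
lower2, upper2 = bounds(profile, a2, b2)
cov2 = covariance(profile, a2, b2)
# The lower bound equals P_A * P_B + cov, which falls below P_A * P_B.
print(lower2, marginal(profile, a2) * marginal(profile, b2), cov2)
```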

Slide 19: DL Model
- Proposed by Dugan and Lyu (1995)
- 3-level reliability model
  - A Markov model detailing the system structure
  - Two fault trees presenting the causes of failures in the initial configuration and the reconfigured state
- Assumptions
  - Unrelated faults: different erroneous results
  - Related faults: similar erroneous results

Slide 20: DL Model
- Example: reliability model of DRB

Slide 21: DL Model (cont'd)
- Fault tree models for 2-, 3-, and 4-version systems

Slide 22: Results of DL model with our project data
- The new experimental data are applied to verify the effectiveness and consistency of the DL model
- Six mutants with various failure characteristics are employed in the operational test

Slide 23: Results of DL model with our project data
- Failure characteristics for 2-, 3-, and 4-version configurations

Slide 24: Results of DL model with our project data
- Summary of parameter values
  - Prob. of related faults between two versions
  - Prob. of unrelated faults
  - Prob. of related faults in all versions
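To make the role of these three parameters concrete, here is a simplified sketch of a software-only fault tree for a 3-version majority-voting system. It is an illustration under stated assumptions, not the fault trees from the slides (which are not recoverable from this transcript): the top event is taken to occur if at least two versions activate unrelated faults, if any of the three version pairs activates a related fault, if a fault related across all versions activates, or if the decider fails, and all basic events are assumed independent. The names `p_v`, `p_rv`, `p_rall`, and `p_d` are hypothetical.

```python
def nvp3_failure_probability(p_v: float, p_rv: float, p_rall: float,
                             p_d: float = 0.0) -> float:
    """Per-execution failure probability of a 3-version majority voter.

    p_v    : prob. that a single version activates an unrelated fault
    p_rv   : prob. that a given pair of versions activates a related fault
    p_rall : prob. that a fault related across all versions activates
    p_d    : prob. that the decider (voter) fails; 0 by default
    """
    for name, p in (("p_v", p_v), ("p_rv", p_rv),
                    ("p_rall", p_rall), ("p_d", p_d)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # At least two of the three versions fail through unrelated faults.
    two_or_more_unrelated = 3 * p_v**2 * (1 - p_v) + p_v**3
    # OR gate over independent events: 1 minus the product of survivals.
    no_failure = ((1 - two_or_more_unrelated)
                  * (1 - p_rv) ** 3      # three distinct version pairs
                  * (1 - p_rall)
                  * (1 - p_d))
    return 1.0 - no_failure


# Illustrative parameter values only; they are not taken from the slides.
example = nvp3_failure_probability(p_v=0.1, p_rv=0.01, p_rall=0.001)
print(f"per-execution failure probability = {example:.6f}")
```

Even with small values, the related-fault terms can contribute as much to the result as the unrelated-fault term, which is why the split between related and unrelated faults matters for the predictions.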

Slide 25: Results of DL model with our project data
- Predicted reliability by different configurations

Slide 26: Results of DL model with our project data
- Predicted safety by different configurations

Slide 27: Discussion
- Comparing our project with the former project, the reliability and safety performance of DRB, NVP and NSCP shows that the DL model is consistent with our experimental data
- The discrepancy in the first thousands of hours may indicate dependence on operational domains
- The simplified classification of related and unrelated faults needs to be improved by including real-life scenarios
- To achieve more accurate results, information about the correlation between successive executions should be included

Slide 28: Comparison of PS & DL Model
Assumptions:
- PS model: the whole demand space can be partitioned into disjoint subdomains; knowledge on subdomains should be given
- DL model: the faults among program versions can be classified into unrelated faults and related faults
Prerequisite:
- PS model: 1. probability of subdomains; 2. failure probabilities of programs on subdomains
- DL model: 1. number of unrelated and related faults among versions; 2. probability of hardware and decider failure
Target system:
- PS model: specific 1-out-of-2 system configurations
- DL model: all multi-version system combinations
Measurement objective:
- PS model: upper and lower bounds for failure probability
- DL model: average failure probability
Experimental results:
- PS model: gives tighter bounds under most circumstances, yet whether they are tight enough needs further investigation
- DL model: the prediction results agree well with observation, yet may deviate for a specific system

Slide 29: Conclusion
- Mutants are employed to investigate the prediction performance of two reliability models
- Advantages, limitations and performance of the PS and DL models are compared
- With our data, the confidence bounds in the PS model are tighter than P_A*P_B and min(P_A, P_B) under most circumstances

Slide 30: Conclusion
- With our data, the PS approach helps analyze the behavior of the versions in each subdomain and reveals the features of fault correlation among diverse programs
- Our analyses with the DL model of the reliability and safety features of DRB, NVP and NSCP are consistent with the original experiment, although there are crossovers in the first thousands of hours in the reliability curves

Slide 31: Future work
- More test cases should be employed for cross-validation of the prediction accuracy of the PS model and the DL model
- Other existing reliability models can be applied for further comparisons with our experimental data

Q & A
Thank you!