On Effective Testing of Health Care Simulation Software Christian Murphy, M.S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup.

Presentation transcript:

On Effective Testing of Health Care Simulation Software
Christian Murphy, M.S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Lee Osterweil
University of Pennsylvania; Loyola University Maryland; Columbia University; University of Massachusetts Amherst

2 / 27 Overview
- Simulation software is used widely in the field of health care.
- Simulators must not only model the real world accurately but also be free of software defects.
- Simulation software is particularly hard to test because there is often no “test oracle”.
- Our research shows that defects can be detected by checking whether known properties of the software are violated.

3 / 27 Outline
- Motivating examples
- Overview of testing approach
- Study #1: Demonstrating feasibility
- Study #2: Measuring effectiveness
- Future work & conclusion

4 / 27 Flow of Patients through ED Raunak et al., “Simulating patient flow through an emergency department using process-driven discrete event simulation”, SEHC’09

5 / 27 Glycemic Control (Insulin Pump) King et al., “Prototyping closed loop physiologic control with the Medical Device Coordination Framework”, SEHC’10

6 / 27 Problem Statement
- Partial oracles may exist for a limited subset of the input domain in simulation software.
- Obvious errors (e.g., crashes) can be detected with certain inputs or testing techniques.
- However, in the general case it is difficult to detect subtle computational defects in simulators without test oracles.

7 / 27 What do I mean by “defect”?
- Deviation of the implementation from the specification
- Violation of a sound property of the software
- “Discrete localized” calculation errors
  - Off-by-one
  - Incorrect sentinel values for loops
  - Wrong comparison or mathematical operator
- Misinterpretation of specification
  - Parts of input domain not handled
  - Incorrect assumptions made about input

8 / 27 Research Goals
- Identify an approach for testing simulation software that is effective even without a test oracle
  - Reliably detect defects
  - Increase confidence that the software works
- Demonstrate feasibility of the approach
- Measure the effectiveness of the approach


10 / 27 Observation
- Many programs without oracles have properties such that certain changes to the input yield predictable changes to the output.
- We can detect defects in these programs by looking for any violations of these “metamorphic properties”.
- This is known as “metamorphic testing”.
  - T.Y. Chen et al., HKUST Tech Report, 1998

11 / 27 Metamorphic Testing
- Diagram: the initial test case x is run through f to produce f(x). A transformation function t, based on the metamorphic properties of f, turns x into the new test case t(x), which is run through f to produce f(t(x)).
- If the new test case output f(t(x)) is as expected, it is not necessarily correct.
- However, if f(t(x)) is not as expected, then f(x) or f(t(x)), or both, is wrong.
- f(x) and f(t(x)) are “pseudo-oracles”.
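The relationship between x, t(x), f(x) and f(t(x)) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the helper names (violates, reflect, same_output, buggy_sin) are hypothetical, and the property used, sin(pi - x) = sin(x), is a standard textbook example rather than one of the simulator properties.

```python
import math

def violates(f, transform, relation, x):
    """Return True when the metamorphic relation fails for input x."""
    original = f(x)                # f(x): output of the initial test case
    follow_up = f(transform(x))    # f(t(x)): output of the new test case
    return not relation(original, follow_up)

def reflect(x):
    # Transformation t: sine should be unchanged under x -> pi - x.
    return math.pi - x

def same_output(a, b):
    # Expected relation between f(x) and f(t(x)), with a float tolerance.
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

def buggy_sin(x):
    # Hypothetical defective implementation: a truncated Taylor series.
    return x - x**3 / 6

inputs = [0.1, 0.5, 1.0, 2.0]
print([violates(math.sin, reflect, same_output, x) for x in inputs])
print([violates(buggy_sin, reflect, same_output, x) for x in inputs])
```

Note that no expected value of sin(x) is needed anywhere: the check only compares the two outputs against each other, which is what makes the approach usable without a test oracle.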

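12 / 27 Metamorphic Testing Example
Consider a function std_dev that determines the standard deviation of a set of numbers.
- Initial input: a, b, c, d, e, f; output s
- New test case #1 (elements permuted): c, e, b, a, f, d; is the output s?
- New test case #2 (2 added to each element): a+2, b+2, c+2, d+2, e+2, f+2; is the output s?
- New test case #3 (each element doubled): 2a, 2b, 2c, 2d, 2e, 2f; is the output 2s?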
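The three follow-up test cases for the standard deviation example could be checked along the following lines. This is a minimal sketch: the function names, the sample data, and the seeded defect in buggy_std_dev are all made up for illustration, and Python's statistics.pstdev stands in for the function under test.

```python
import math
import random
import statistics

def check_std_dev_properties(std_dev, data, shift=2.0, scale=2.0):
    """Return the names of the metamorphic properties that std_dev violates."""
    s = std_dev(data)
    shuffled = list(data)
    random.Random(0).shuffle(shuffled)
    # Each follow-up test case is paired with the output predicted from s.
    follow_ups = {
        "permute": (shuffled, s),
        "add constant": ([v + shift for v in data], s),
        "multiply": ([v * scale for v in data], abs(scale) * s),
    }
    return [name for name, (new_input, expected) in follow_ups.items()
            if not math.isclose(std_dev(new_input), expected, rel_tol=1e-9)]

def buggy_std_dev(data):
    # Hypothetical defect: subtracts the mean instead of the squared mean.
    mean = sum(data) / len(data)
    return math.sqrt(sum(v * v for v in data) / len(data) - mean)

data = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
print(check_std_dev_properties(statistics.pstdev, data))
print(check_std_dev_properties(buggy_std_dev, data))
```

The seeded defect is invisible to the permutation check, since reordering the input does not change either sum, so it is the shift and scale properties that expose it.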

13 / 27 Related Work
- Verification of simulation models
  - O. Balci, 1997 Winter Simulation Conf.
  - R. Sargent, 2005 Winter Simulation Conf.
- Applying metamorphic testing to applications without test oracles
  - T.Y. Chen et al., Info. and Soft. Tech., 2002


15 / 27 Feasibility Study
- Goal: Demonstrate that metamorphic testing is feasible for testing simulation software
- We first identify metamorphic properties in the applications of interest
  - JSim: discrete event simulator (patients in ED)
  - GCS: glycemic control simulator (insulin pump)
- We then apply metamorphic testing and look for defects

16 / 27 Metamorphic Properties
- JSim: Flow of patients through ED
  - Increasing the number of resources (e.g., beds) should not increase average patient length of stay
  - Increasing the number of resources should not decrease other resources’ utilization rates
  - Multiplying the time necessary for each step by a positive constant c should increase the overall time by a factor of c
- GCS: glycemic control system (insulin pump)
  - A patient who weighs more should get more insulin
  - A patient who produces more endogenous glucose should get more insulin
  - The modeled insulin absorption rate should vary inversely with the insulin distribution volume
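To make the first and third JSim properties concrete, here is a toy model written for this transcript. It is not JSim and makes strong simplifying assumptions: patients are served first come, first served, each needs exactly one bed for a fixed time, and the scaling check multiplies arrival times as well as service times by c. All names and numbers are hypothetical.

```python
import heapq
import math

def average_length_of_stay(arrivals, service_times, num_beds):
    """Toy first-come-first-served model; arrivals must be sorted."""
    free_at = [0.0] * num_beds       # time at which each bed becomes free
    heapq.heapify(free_at)
    total = 0.0
    for arrival, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)   # wait if no bed is free yet
        departure = start + service
        heapq.heappush(free_at, departure)
        total += departure - arrival          # this patient's length of stay
    return total / len(arrivals)

arrivals = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
service_times = [5.0, 5.0, 5.0, 5.0, 2.0, 2.0]

# Property: more beds should not increase average length of stay.
los_by_beds = [average_length_of_stay(arrivals, service_times, n)
               for n in (1, 2, 3)]
more_beds_ok = all(a >= b for a, b in zip(los_by_beds, los_by_beds[1:]))

# Property: scaling every time by c should scale the result by c.
c = 3.0
base = average_length_of_stay(arrivals, service_times, 2)
scaled = average_length_of_stay([c * t for t in arrivals],
                                [c * t for t in service_times], 2)
scaling_ok = math.isclose(scaled, c * base, rel_tol=1e-9)

print(more_beds_ok, scaling_ok)
```

In a real simulator either check failing would call for investigation: the implementation may be defective, or the property may not actually hold for the model.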

17 / 27 JSim Findings

18 / 27 Unexpected JSim Findings
[Two tables of patient records with columns ID, Arrival Time, Departure Time, and Length of Stay, comparing average LOS with 1 nurse against average LOS with 2 nurses; the values are not preserved in the transcript.]


20 / 27 Measuring Effectiveness
- Goal: Estimate the effectiveness of metamorphic testing at detecting defects in simulators
- We first systematically seed the software with defects
- We then measure the number that are detected

21 / 27 Methodology
- Mutation testing was used to seed defects into each application
  - Reverse comparison operators
  - Change math operators
  - Introduce off-by-one errors
- For each program, we created multiple versions, each with exactly one mutation
- We ignored mutants that yielded outputs that were obviously wrong, caused crashes, etc.
- Effectiveness is determined by measuring what percentage of the mutants were “killed”
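The measurement itself can be sketched as follows. This is an illustration only: the function under test (a simple average), the three hand-written mutants, and the properties are all hypothetical, and the step of discarding mutants that crash or give obviously wrong output is not modeled.

```python
import math

def average(values):
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)

def mutant_off_by_one(values):
    total = 0.0
    for i in range(len(values) - 1):      # mutation: loop stops one early
        total += values[i]
    return total / len(values)

def mutant_math_operator(values):
    total = 0.0
    for i in range(len(values)):
        total -= values[i]                # mutation: += changed to -=
    return total / len(values)

def mutant_divisor(values):
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / (len(values) - 1)      # mutation: off-by-one divisor

# Each property is (input transformation, predicted output from f(x)).
REVERSE = (lambda xs: xs[::-1], lambda out: out)
SCALE = (lambda xs: [2.0 * x for x in xs], lambda out: 2.0 * out)
SHIFT = (lambda xs: [x + 5.0 for x in xs], lambda out: out + 5.0)

def is_killed(f, properties, data):
    """A mutant is killed when any metamorphic property is violated."""
    out = f(data)
    return any(not math.isclose(f(t(data)), predict(out), rel_tol=1e-9)
               for t, predict in properties)

def effectiveness(mutants, properties, data):
    killed = sum(is_killed(m, properties, data) for m in mutants)
    return 100.0 * killed / len(mutants)

data = [1.0, 2.0, 3.0, 4.0, 10.0]
mutants = [mutant_off_by_one, mutant_math_operator, mutant_divisor]
print(effectiveness(mutants, [REVERSE, SCALE], data))
print(effectiveness(mutants, [REVERSE, SCALE, SHIFT], data))
```

In this toy setup the score depends heavily on which properties are checked: two of the mutants satisfy the reverse and scale properties and are only exposed once the shift property is added.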

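22 / 27 Results
[Table with columns JSim, GCS Control, and GCS Patient and rows Mutants generated, Usable mutants, Mutants detected, and Effectiveness; only the Effectiveness row is preserved in the transcript.]
Application      JSim    GCS Control    GCS Patient
Effectiveness    100%    24.4%          68.4%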

23 / 27 Analysis: JSim
- “Statistical metamorphic testing” is useful for killing mutants related to non-deterministic event timing
- If the timing range is [A, B] and the observed mean is μ, then the mean μ’ for the range [10A, 10B] should be around 10μ
- Because of the mutant, the upper bound of the range is actually off by one: [A, B-1]
- Over many executions, the observed mean μ’ shows a statistically significant difference from the expected mean 10μ
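A statistical check of this kind might look like the sketch below. It rests on assumptions that are not in the slides: event durations are drawn uniformly from the range, the seeded defect shortens the upper bound by one in both runs, and a simple two-sample z statistic with an arbitrary threshold of 4 stands in for whatever significance test the authors used. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

def sample_durations(low, high, n, rng, mutant=False):
    # Seeded defect: the upper bound is off by one, giving [low, high - 1].
    upper = high - 1 if mutant else high
    return [rng.uniform(low, upper) for _ in range(n)]

def z_statistic(base, scaled, factor):
    """Compare the mean of the scaled run with factor * mean of the base run."""
    expected = [factor * v for v in base]
    diff = statistics.fmean(scaled) - statistics.fmean(expected)
    std_err = math.sqrt(statistics.variance(scaled) / len(scaled)
                        + statistics.variance(expected) / len(expected))
    return diff / std_err

def property_violated(mutant, a=10.0, b=20.0, n=20000, seed=1, threshold=4.0):
    rng = random.Random(seed)
    base = sample_durations(a, b, n, rng, mutant)              # range [A, B]
    scaled = sample_durations(10 * a, 10 * b, n, rng, mutant)  # range [10A, 10B]
    return abs(z_statistic(base, scaled, 10)) > threshold

print(property_violated(mutant=False))
print(property_violated(mutant=True))
```

With these numbers the mutant shifts the scaled mean by about 4.5 relative to 10μ, which is far too small to see in a single run but stands out clearly over 20,000 samples.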

24 / 27 Analysis: GCS
- Metamorphic testing was not as effective in the control algorithm (rules for delivering insulin)
- Rules are usually of the form “if patient blood sugar is x then adjust infusion rate by y”
- Single mutants did not have much effect on the overall insulin delivered
- These mutants may be detected by more “straightforward” software testing approaches


26 / 27 Future Work
- Formalizing the process of identifying metamorphic properties for simulators
- Consider the use of metamorphic testing for validation
  - If a property is violated, does that mean there is a defect, or is the property simply unsound?
  - If the property is unsound, is this simulator appropriate for the task it is meant to model?

27 / 27 Conclusion
- We have demonstrated that metamorphic testing is an effective technique for testing simulation software
- It can increase confidence in the implementation
- It also helps increase understanding of how the software behaves

On Effective Testing of Health Care Simulation Software Christian Murphy, University of Pennsylvania M.S. Raunak, Loyola University Maryland