Testing Challenges for Next-Generation CPS Software

Testing Challenges for Next-Generation CPS Software
Mike Whalen, University of Minnesota

Heterogeneity: low-level concerns (hardware failures, system resets, etc.) versus high-level concerns (command and control or human interfaces will tend to focus on higher-level goals).
Tests of data storage on the Mars Rover were split into two parts. Tests for low-level flash storage were produced by random testing and model checking, with hardware simulators, fault injection, very high coverage, and a complex spec.
The presentation doesn't ever talk about the environment for "closed loop" tests.
The presentation doesn't ever talk about concurrency -- this is certainly a distinguishing factor for CPS. Also, it didn't really talk about IoT, where you have cloud and device interactions.
Wish I could have talked about more stuff!

Acknowledgements
Rockwell Collins: Steven Miller, Darren Cofer, Lucas Wagner, Andrew Gacek, John Backes
University of Minnesota: Mats P. E. Heimdahl, Sanjai Rayadurgam, Matt Staats, Ajitha Rajan, Gregory Gay
Funding Sponsors: NASA, Air Force Research Labs, DARPA

Who Am I?
My main aim is to reduce verification and validation (V&V) cost and increase rigor.
I applied automated V&V techniques to industrial systems at Rockwell Collins for 6 ½ years: proofs, bounded analyses, static analysis, automated testing, and combining several kinds of assurance artifacts. I'm interested in requirements as they pertain to V&V.
Main research thrusts in testing:
- Factors in testing: how do we make testing experiments fair and repeatable?
- Test metrics: what are reasonable metrics for testing safety-critical systems? What does it mean for a metric to be reasonable?

Software Size
Graphic: Andrea Busnelli

The Future of Software Engineering
December 2010 (Twin-SPIN). Slide courtesy Lockheed Martin, Inc.

Software Connectivity

Networked Vehicles
Currently: Bluetooth and OnStar
Adaptive Cruise Control, Platooning, Traffic Routing, Emergency Response, Adaptive traffic lights
What could possibly go wrong?
Image courtesy of energyclub.stanford.edu

Attacks on Embedded Systems
Poland tram hack: a 14-year-old derails four trams and forces emergency stops
Stuxnet
FBI vs. iPhone
Miller: remote car hack

Hypotheses

CPS testers are facing enormous challenges of scale and scrutiny:
- Substantially larger code bases
- Increased attention from attackers
Thorough use of automation is necessary to increase rigor for CPS verification:
- Requires understanding of factors in testing
Common coverage metrics are not as well-suited for CPS as for general-purpose software:
- Structure of programs and oracles is important for automated testing!
Creating intelligent / adaptive systems will make the testing problem harder:
- Use of "deep learning" for critical functionality
- We have little knowledge of how to systematically white-box test deep-learning generated code such as neural nets
(Should I talk about differences between GP software and embedded software here?)

Testing Process
[Diagram: test inputs from the test suite are executed on the model/program, which implements the specification; the resulting program path is evaluated by an oracle as correct/incorrect and assessed against a test coverage metric; additional tests are created as needed.]

Testing Artifacts
J. Gourlay. A mathematical framework for the investigation of testing. TSE, 1983
Staats, Whalen, and Heimdahl, Programs, Tests, and Oracles: The Foundations of Testing Revisited. ICSE 2011

Testing Artifacts – In Practice
Argument here about two things:
1. Embedded programs often have different characteristics than general-purpose software
2. The choice of oracle is very important and tends to be less accurate for embedded systems

Staats' Framework

Theory in Practice

Complete Adequacy Criteria
I.e., is your testing regimen adequate, given the program structure, specification, oracle, and test suite?

Complete Adequacy Criteria

Fault Finding Effectiveness, 100% Branch Coverage
Model           Output-Only Oracle    Maximum Oracle
DWM_1           55%                   83%
DWM_2           14%
Latctl_Batch    33%                   89%
Vertmax_Batch   32%                   85%

Complete Adequacy Criteria
Gay, Staats, Whalen, and Heimdahl, The Risks of Coverage-Directed Test Case Generation, FASE 2012, TSE 2015.

MC/DC Effectiveness (DWM_2, Vertmax_Batch, DWM_3)
Code structure has a large effect! Choice of oracle has a large effect!

Goals for a “Good” Test Metric

Effective at finding faults:
- Better than random testing for suites of the same size
- Better than other metrics (this often requires accounting for the oracle)
Robust to changes in program structure
Reasonable in terms of the number of required tests and the cost of coverage analysis

Inozemtseva and Holmes, Coverage Is Not Strongly Correlated with Test Suite Effectiveness, ICSE 2014
Zhang and Mesbah, Assertions Are Strongly Correlated with Test Suite Effectiveness, FSE 2015

Another way to look at MC/DC

Masking MC/DC can be expressed using a substitution operator (written here as P[e_n <- v]), which means: for program P, the computed value for the nth instance of expression e is replaced by value v. For MC/DC, given decision D, for each condition c in D we want a pair of test cases ti and tj that ensure c is observable in D (i.e., not masked) for both its true and its false values.
Problem 1: any masking after the decision is not accounted for.
Problem 2: we can rewrite programs to make decisions large or small (and MC/DC easy or hard to satisfy!)
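A minimal Java sketch of Problem 1, with made-up names: the tests in main achieve MC/DC over the decision (a && b), yet every test takes the later override branch, so no fault in that decision could ever be observed at the returned value.

public class MaskingExample {
    static int command(boolean a, boolean b, boolean override, int x) {
        int out;
        boolean d = a && b;                    // decision whose conditions MC/DC exercises
        if (d) out = x + 1; else out = x - 1;
        if (override) out = 0;                 // masking after the decision: when override is true,
                                               // the effect of a and b never reaches the output
        return out;
    }

    public static void main(String[] args) {
        // These tests satisfy MC/DC over (a && b), but always with override = true,
        // so the decision's effect is masked at the returned value.
        System.out.println(command(true,  true,  true, 5));
        System.out.println(command(false, true,  true, 5));
        System.out.println(command(true,  false, true, 5));
    }
}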

Reachability and Observability

Examining Observability With Model Counting and Symbolic Evaluation

int test(int x, int y) {
    int z;
    if (y == x*10) S0; else S1;
    if (x > 3 && y > 10) S2;
    S3;
    return z;
}

Symbolic execution tree, with path conditions in brackets:
[ true ] test(X,Y)
  [ Y=X*10 ] S0
    [ X>3 & 10<Y=X*10 ] S2
    [ Y=X*10 & !(X>3 & Y>10) ] S3
  [ Y!=X*10 ] S1
    [ X>3 & 10<Y!=X*10 ] S2
    [ Y!=X*10 & !(X>3 & Y>10) ] S3

Test(1,10) reaches S0,S3
Test(0,1) reaches S1,S3
Test(4,11) reaches S1,S2,S3

Work by: Willem Visser, Matt Dwyer, Jaco Geldenhuys, Corina Pasareanu, Antonio Filieri, Tevfik Bultan. ISSTA '12, ICSE '13, PLDI '14, SPIN '15, CAV '15

Probabilistic Symbolic Execution

int test(int x, int y: 0..99) {
    int z;
    if (y == x*10) S0; else z = 10;
    if (x > 3 && y > 10) z = 8;
    S3;
    return z;
}

Counting solutions to each path condition over the 0..99 input domain (10^4 inputs in total):
[ true ]                          10^4
[ Y=X*10 ]                        10
[ Y!=X*10 ]                       9990
[ X>3 & 10<Y=X*10 ]               6
[ Y=X*10 & !(X>3 & Y>10) ]        4
[ X>3 & 10<Y!=X*10 ]              8538
[ Y!=X*10 & !(X>3 & Y>10) ]       1452

The statement z = 10 gets visited in 99.9% of tests, but it only affects the outcome in about 14% of tests.
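A brute-force Java sketch (plain enumeration over the 0..99 domain, not the symbolic model counting used in the cited work) that reproduces the two percentages on this slide: how often z = 10 is executed versus how often its value survives to the returned output.

public class CountObservability {
    public static void main(String[] args) {
        int executed = 0, observedAtOutput = 0, total = 0;
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                total++;
                int z = -1;                        // sentinel for "never assigned"
                boolean ranElse = false;
                if (y == x * 10) { /* S0 */ } else { z = 10; ranElse = true; }
                if (x > 3 && y > 10) z = 8;
                if (ranElse) executed++;                     // the statement z = 10 was executed
                if (ranElse && z == 10) observedAtOutput++;  // ...and its value reaches the return
            }
        }
        System.out.printf("z = 10 executed:        %d / %d%n", executed, total);
        System.out.printf("z = 10 visible at exit: %d / %d%n", observedAtOutput, total);
    }
}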

Probabilistic SE

Annotating the same tree with reachability and observability:
[ Y=X*10 ]  10 of 10^4   -- hard to reach
[ Y!=X*10 ] 9990 of 10^4 -- easy to reach
Leaf counts 6 / 4 / 8538 / 1452 -- the large-count paths are easy to observe; the effect of z = 10 is (somewhat) hard to observe.

Now: suppose we put assertions at different points in the code. What are we doing? We are increasing the observability of the code.

Location, Location, Location
How hard is it to kill a mutant? Spoiler alert: not hard at all. Location matters more than the choice of mutation operator ("more important than chicken or bull").

Just, Jalali, Inozemtseva, Ernst, Holmes, and Fraser. Are mutants a valid substitute for real faults in software testing? FSE 2014
Yao, Harman, Jia. A study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence. ICSE 2014
W. Visser, What makes killing a mutant hard? ASE 2016

In the initial results, they saw something interesting.

What did they find?

public static int classify(int i, int j, int k) {
    if ((i <= 0) || (j <= 0) || (k <= 0)) return 4;
    int type = 0;
    if (i == j) type = type + 1;
    if (i == k) type = type + 2;
    if (j == k) type = type + 3;
    if (type == 0) {
        if ((i + j <= k) || (j + k <= i) || (i + k <= j)) type = 4;
        else type = 1;
        return type;
    }
    if (type > 3) type = 3;
    else if ((type == 1) && (i + j > k)) type = 2;
    else if ((type == 2) && (i + k > j)) type = 2;
    else if ((type == 3) && (j + k > i)) type = 2;
    else type = 4;
    return type;
}

There is a stubborn barrier in this code: beyond it, almost all mutations are stubborn (killed by < 1% of inputs).

Why? Only 3% of inputs pass the highlighted point in classify() (the code shown above), and almost all mutations beyond it are stubborn (killed by < 1% of inputs).
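A small Java sketch of how stubbornness can be measured by brute force for one hypothetical mutant of classify(): the relational operator in the type == 1 branch is changed from > to >=, and the kill rate over a bounded input domain is counted. The chosen mutant, the domain bound, and the completed return at the end of classify() are assumptions for illustration, not taken from the cited study.

public class StubbornMutantDemo {

    static int classify(int i, int j, int k, boolean mutant) {
        if ((i <= 0) || (j <= 0) || (k <= 0)) return 4;
        int type = 0;
        if (i == j) type = type + 1;
        if (i == k) type = type + 2;
        if (j == k) type = type + 3;
        if (type == 0) {
            if ((i + j <= k) || (j + k <= i) || (i + k <= j)) type = 4;
            else type = 1;
            return type;
        }
        if (type > 3) type = 3;
        else if ((type == 1) && (mutant ? (i + j >= k) : (i + j > k))) type = 2;  // seeded mutation
        else if ((type == 2) && (i + k > j)) type = 2;
        else if ((type == 3) && (j + k > i)) type = 2;
        else type = 4;
        return type;
    }

    public static void main(String[] args) {
        int n = 50, total = 0, killed = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                for (int k = 1; k <= n; k++) {
                    total++;
                    if (classify(i, j, k, false) != classify(i, j, k, true)) killed++;
                }
            }
        }
        // Only inputs of the form (i, i, 2i) distinguish this mutant, so the kill rate is tiny.
        System.out.printf("Killing inputs: %d of %d (%.3f%%)%n", killed, total, 100.0 * killed / total);
    }
}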

A (Very) Small Experiment on Operator Mutations

Column headings for both tables: Programs, Muts, Stubborn (< 0.1%), Really (< 0.1%), Always (100%), Easy (> 33%).

Arithmetic operators (triangle calculator: "general purpose" software; TCAS: "embedded" software):
TRI-YHJ 5 4
TRI-V1 19 1 8 18
TRI-V2 7
TCAS 38 9 28

Relational operators (three versions of a triangle program and the Siemens TCAS example):
TRI-YHJ 40 5 24
TRI-V1 85 6 3 4 61
TRI-V2 55 38
TCAS 185 32 12 46

Why is observability an issue for embedded systems?

Often long tests are required to expose faults from earlier computations:
- Rate limiters
- Hysteresis / de-bounce
- Feedback bounds
- System modes
Physical systems can impede observability: we cannot observe all outputs, or cannot observe them accurately.
Fault tolerance logic can impede observability: richer oracle data than the system outputs is required.
The structure of programs can impede observability: graphical dataflow notations (Simulink / SCADE) put conditional blocks at the end of computation flows rather than at the beginning.
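Following up on the hysteresis / de-bounce point: a minimal Java sketch (the component and the seeded fault are made up for illustration) showing why a fault in state carried across steps cannot be observed by short tests.

public class DebounceAlarm {
    private final int debounceSteps;
    private int count = 0;

    DebounceAlarm(int debounceSteps) { this.debounceSteps = debounceSteps; }

    // The alarm output only trips after the input exceeds the threshold
    // for debounceSteps consecutive steps.
    boolean step(double input, double threshold) {
        count = (input > threshold) ? count + 1 : 0;
        return count >= debounceSteps;
    }

    public static void main(String[] args) {
        DebounceAlarm correct = new DebounceAlarm(10);
        DebounceAlarm faulty  = new DebounceAlarm(12);   // seeded fault: wrong de-bounce constant
        for (int step = 1; step <= 15; step++) {
            boolean a = correct.step(5.0, 1.0);
            boolean b = faulty.step(5.0, 1.0);
            if (a != b) {
                // No test shorter than this many steps can distinguish the two versions.
                System.out.println("Fault first observable at step " + step);
                break;
            }
        }
    }
}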

Observable MC/DC

Idea: lift observation from decisions to programs.
- Explicitly account for the oracle
- Strength should be unaffected by simple program transformations (e.g., inlining)
As with MC/DC, given decision D, for each condition c in D we want a pair of test cases ti and tj that ensure c is observable for both true and false values -- now with observability extended from the decision to the program and its oracle.

Whalen, Gay, You, Staats, and Heimdahl. Observable Modified Condition / Decision Coverage. ICSE 2013
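A hedged Java sketch (hypothetical names) of why a simple, semantics-preserving rewrite can change how demanding MC/DC is, while OMC/DC is meant to be insensitive to it. The claim about version B assumes a coverage tool that treats only branch predicates as decisions; tools that also instrument Boolean assignments would still require MC/DC over a && b.

public class InliningExample {
    // Version A: the branch predicate has two conditions, so MC/DC must show that
    // a and b each independently affect the decision outcome.
    static int cmdA(boolean a, boolean b, int x) {
        if (a && b) return x + 1;
        return x;
    }

    // Version B: the same logic, split through a temporary. If the tool only treats
    // branch predicates as decisions, the if now has the single condition d, and
    // d == true / d == false suffice -- even though a and b still drive the output.
    // OMC/DC instead asks that the effect of a and of b be observable at the output,
    // so this rewrite does not weaken the obligation.
    static int cmdB(boolean a, boolean b, int x) {
        boolean d = a && b;
        if (d) return x + 1;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(cmdA(true, false, 3) == cmdB(true, false, 3));  // same behavior
    }
}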

Research questions (case study: DWM1):
-> Effectiveness of fault finding, especially for the output-only oracle
-> Robustness to inlining
-> Test suite size
-> Effect of oracle

The same research questions for the DWM2, Latctl, Vertmax, and Microwave case studies:
-> Effectiveness of fault finding, especially for the output-only oracle
-> Robustness to inlining
-> Test suite size
-> Effect of oracle

Adoption in SCADE and MathWorks Tools
SCADE: a generalization to all variables, called Stream Coverage.
Also in discussions with The MathWorks on these ideas; currently they support an inlining solution for MC/DC.

Testing Code for Complex Mathematics

Testing Complex Mathematics

Metrics describing branching logic often miss errors in complex mathematics. Errors often exist in parts of the "numerical space" rather than portions of the CFG:
- Overflow / underflow
- Loss of precision
- Divide by zero
- Oscillation
- Transients
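A small Java illustration (not from the slides) of an error that lives in the numeric space rather than the CFG: a branch-free midpoint computation that any test covers completely, but that only fails when lo + hi overflows.

public class MidpointOverflow {
    static int midpointNaive(int lo, int hi) { return (lo + hi) / 2; }        // overflows for large inputs
    static int midpointSafe(int lo, int hi)  { return lo + (hi - lo) / 2; }   // does not

    public static void main(String[] args) {
        // 100% statement/branch coverage with a trivial test, and the two versions agree:
        System.out.println(midpointNaive(10, 20) == midpointSafe(10, 20));     // true
        // The fault only appears in a corner of the numeric input space:
        System.out.println(midpointNaive(2_000_000_000, 2_100_000_000));       // negative result!
        System.out.println(midpointSafe(2_000_000_000, 2_100_000_000));        // 2050000000
    }
}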

Matinnejad, Nejati & Briand: Metrics for Complex Mathematics
Use multi-objective search-based testing to maximize the diversity of output vectors (in terms of distance and the number of numerical features) and to maximize failure features in a test suite.

Output Diversity -- Vector-Based

[Figure: two output signals plotted against time]

Matinnejad, Nejati, and Briand, Automated Test Suite Generation for Time-Continuous Simulink Models. ICSE 2016
Matinnejad, Nejati, Briand, Bruckmann, Poull, Search-based automated testing of continuous controllers: Framework, tool support, and case studies. I&ST 57 (2015)
Matinnejad, Nejati, Briand, Bruckmann: Effective test suites for mixed discrete-continuous stateflow controllers. FSE 2015
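A rough Java sketch of one way to score vector-based output diversity, assuming diversity is measured as normalized Euclidean distance between sampled output signals; the scoring function and example signals are illustrative, not the cited tool's implementation.

import java.util.*;

public class OutputDiversity {
    // Normalized Euclidean distance between two equally sampled output signals.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum / a.length);
    }

    // Diversity gain of a candidate: its minimum distance to every signal already in the suite.
    static double diversityGain(List<double[]> suiteOutputs, double[] candidate) {
        double min = Double.POSITIVE_INFINITY;
        for (double[] existing : suiteOutputs) min = Math.min(min, distance(existing, candidate));
        return min;
    }

    public static void main(String[] args) {
        List<double[]> suite = new ArrayList<>();
        suite.add(new double[]{0, 0, 0, 0});
        double[] flat = {0, 0.1, 0, 0.1};
        double[] ramp = {0, 1, 2, 3};
        System.out.println("flat gain: " + diversityGain(suite, flat));  // small
        System.out.println("ramp gain: " + diversityGain(suite, ramp));  // larger -> preferred
    }
}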

Failure-based Test Generation
Maximize the likelihood that specific failure patterns (instability, discontinuity) are present in the output signals.

Search-Based Test Generation Procedure
Initial test suite -> slightly modify each test input -> evaluate and keep changes using output-based heuristics.
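A schematic Java sketch of the loop this slide implies: start from an initial test, slightly perturb its inputs, and keep the perturbed test when an output-based heuristic improves. The fitness function, step size, and random seed are placeholders, not the published procedure.

import java.util.*;
import java.util.function.Function;

public class SearchBasedTestGen {
    static double[] search(double[] initial, Function<double[], Double> outputHeuristic, int iterations) {
        Random rnd = new Random(42);
        double[] best = initial.clone();
        double bestScore = outputHeuristic.apply(best);
        for (int it = 0; it < iterations; it++) {
            double[] candidate = best.clone();
            int i = rnd.nextInt(candidate.length);
            candidate[i] += rnd.nextGaussian() * 0.1;        // "slightly modify" one input value
            double score = outputHeuristic.apply(candidate);
            if (score > bestScore) { best = candidate; bestScore = score; }   // keep improvements
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy output-based heuristic: prefer test inputs whose (imagined) output deviates far from zero.
        double[] tuned = search(new double[]{0.0, 0.0},
                                in -> Math.abs(in[0]) + Math.abs(in[1]), 200);
        System.out.println(Arrays.toString(tuned));
    }
}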

Output Diversity: Comparison with Random
Seeded faults into mathematical software with few branches; measure deviation from expected values.
Metrics: Of is failure diversity, Ov is variable diversity, q is the size of the test suite, Q0 is the maximum deviation, and FR is the % of faults revealed.

Output Diversity: Comparison with SLDV
Caveats:
- Not much branching logic in the models (MC/DC's strength)
- MC/DC is not very good at catching relational or arithmetic faults
- SLDV is not designed for non-linear arithmetic and continuous time
However, this demonstrates the need for new kinds of metrics and generation tools.

Example: Neural Nets
Build a network of "neurons" that map from inputs to outputs.
- Each node performs a summation and has a threshold to "fire"
- Each connection has a weight, which can be positive or negative
As of 2017, neural networks typically have a few thousand to a few million units and millions of connections. Neural nets are trained rather than programmed.

Image: Glosser.ca - Own work, Derivative of File:Artificial neural network.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24913461

Machine Learning
Use cases:
- (Self-) diagnosis
- Predictive maintenance
- Condition monitoring
- Anomaly detection / event detection
- Image analysis in production
- Pattern recognition
Increasingly proposed for use in safety-critical applications: road following, adaptive control.

Neural Net Code Structure (in MATLAB)

function [y1] = simulateStandaloneNet(x1)
  % Input 1
  x1_step1_xoffset = 0;
  x1_step1_gain = 0.200475452649894;
  x1_step1_ymin = -1;
  % Layer 1
  b1 = [6.0358701949520981;2.725693924978148;0.58426771719145909;-5.1615078566382975];
  IW1_1 = [-14.001919491063946;4.90641117353245;-15.228280764533135;-5.264207948688032];
  % Layer 2
  b2 = -0.75620725148640833;
  LW2_1 = [0.5484626432316061 -0.43580234386123884 -0.085111261420612969 -1.1367922825337915];
  % Output 1
  y1_step1_ymin = -1;
  y1_step1_gain = 0.2;
  y1_step1_xoffset = 0;

  % ===== SIMULATION ========
  % Dimensions
  Q = size(x1,2); % samples
  % Input 1
  xp1 = mapminmax_apply(x1,x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
  % Layer 1
  a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*xp1);
  % Layer 2
  a2 = repmat(b2,1,Q) + LW2_1*a1;
  % Output 1
  y1 = mapminmax_reverse(a2,y1_step1_gain,y1_step1_xoffset,y1_step1_ymin);
end

…continued

% ===== MODULE FUNCTIONS ========
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x, settings_gain, settings_xoffset, settings_ymin)
  y = bsxfun(@minus,x,settings_xoffset);
  y = bsxfun(@times,y,settings_gain);
  y = bsxfun(@plus,y,settings_ymin);
end

% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n)
  a = 2 ./ (1 + exp(-2*n)) - 1;
end

% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y, settings_gain, settings_xoffset, settings_ymin)
  x = bsxfun(@minus,y,settings_ymin);
  x = bsxfun(@rdivide,x,settings_gain);
  x = bsxfun(@plus,x,settings_xoffset);
end

Code observations: no branches! No relational operators!

So, how do we test this?

Black-box reliability testing?
- How do we determine the input distributions?
- How do we gain sufficient confidence for safety-critical use? (Ricky W. Butler, George B. Finelli: The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software)
Mutation testing?
- What do we mutate?
- What is our expectation as to the output effect?
A specialized testing regime?
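One hedged sketch (in Java, with made-up weights and an arbitrary tolerance) of what a mutation-style probe of a trained network might look like: perturb a single weight and count how often the output deviates noticeably over a sampled input range. This illustrates the open question above; it is not an established technique from the slides.

public class WeightMutationProbe {
    static double tansig(double n) { return 2.0 / (1.0 + Math.exp(-2.0 * n)) - 1.0; }

    // Tiny 1-input, 2-hidden-unit, 1-output network (weights are illustrative only).
    static double net(double x, double[] w1, double[] b1, double[] w2, double b2) {
        double out = b2;
        for (int i = 0; i < w1.length; i++) out += w2[i] * tansig(w1[i] * x + b1[i]);
        return out;
    }

    public static void main(String[] args) {
        double[] w1 = {-1.4, 0.9}, b1 = {0.6, -0.5}, w2 = {0.55, -0.43};
        double b2 = -0.75;
        double[] w1mut = {-1.4 * 1.05, 0.9};          // "mutant": first weight perturbed by 5%

        int deviating = 0, samples = 1000;
        for (int i = 0; i < samples; i++) {
            double x = -5.0 + 10.0 * i / (samples - 1);
            double diff = Math.abs(net(x, w1, b1, w2, b2) - net(x, w1mut, b1, w2, b2));
            if (diff > 1e-3) deviating++;             // tolerance is an arbitrary choice
        }
        System.out.printf("Outputs deviating beyond tolerance: %d / %d%n", deviating, samples);
    }
}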

To Recap
CPS systems are getting enormous, and the character of CPS systems is different from the problems in common benchmark suites! Understanding the factors in CPS systems that influence testing is key to effective testing. Observability is important but more difficult in CPS systems. Testing complex mathematical code needs more research, and research will be necessary to help gain confidence in safety-critical "deep learning"-generated code.

Citations
J. Gourlay. A mathematical framework for the investigation of testing. TSE, 1983
Staats, Whalen, and Heimdahl. Programs, Tests, and Oracles: The Foundations of Testing Revisited. ICSE 2011
Gay, Staats, Whalen, and Heimdahl. The Risks of Coverage-Directed Test Case Generation. FASE 2012, TSE 2015
Inozemtseva and Holmes. Coverage Is Not Strongly Correlated with Test Suite Effectiveness. ICSE 2014
Zhang and Mesbah. Assertions Are Strongly Correlated with Test Suite Effectiveness. FSE 2015
Just, Jalali, Inozemtseva, Ernst, Holmes, and Fraser. Are mutants a valid substitute for real faults in software testing? FSE 2014
Yao, Harman, Jia. A study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence. ICSE 2014
W. Visser. What makes killing a mutant hard? ASE 2016
Whalen, Gay, You, Staats, and Heimdahl. Observable Modified Condition / Decision Coverage. ICSE 2013
Matinnejad, Nejati, and Briand. Automated Test Suite Generation for Time-Continuous Simulink Models. ICSE 2016
Matinnejad, Nejati, Briand, Bruckmann, Poull. Search-based automated testing of continuous controllers: Framework, tool support, and case studies. I&ST 57 (2015)
Matinnejad, Nejati, Briand, Bruckmann. Effective test suites for mixed discrete-continuous stateflow controllers. FSE 2015
Machine Learning for Cyber Physical Systems. 2015, 2016, 2017. Springer
Ricky W. Butler, George B. Finelli. The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software. TSE 1993

Basic Idea of ML
Parameters are things like the number and size of hidden layers, the weight initialization scheme, etc.
Some examples of ML techniques: linear models, kernel methods like Gaussian processes, support vector machines.

Image from: https://www.kth.se/polopoly_fs/1.578616!/RH_MachineLearning_presentation.pdf

Differences between CPS software and GP Testing Benchmarks
Some ad-hoc observations (standard GP testing benchmarks: (say) defects4j or Java datatypes):

Types: GP benchmarks have rich data types throughout; CPS software usually has simple, non-recursive data.
Minimal test depth to failure: in GP benchmarks, short sequences of operations or single inputs can trigger behavior; in CPS software, long input sequences are often required -- for controllers, often hundreds of input "steps" are necessary to trigger erroneous behavior.
Statement observability: in GP benchmarks it is straightforward and can be assisted with mocks and stubs; in CPS software it is often poor -- the effects of an executed line of code are often (1) masked by other logic, or (2) delayed by other logic for a non-trivial amount of time until visible at the output. Worse with HIL testing.
Complexity of numerics: GP benchmarks are usually not complex (Apache.math is an exception, but most library routines are small); CPS numerics are usually long and complex.
Timing: usually ignored for GP benchmarks; usually important for CPS!