Assurance, Confidence and Software Safety
Dr. Richard Hawkins
5th May 2009
Background to the problem
[Diagram: safety/hazard analysis of a system and its h/w and s/w elements yields safety requirements plus integrity requirements]
■ For s/w it is very difficult to demonstrate safety requirements to an integrity better than ~10⁻³
■ For s/w we need a different approach, since an integrity cannot be demonstrated directly
Software Safety Arguments
■ Traditionally the safety of the software aspects of systems was demonstrated using a prescriptive approach
  o Process defined in a standard
  o Process varied according to risk or criticality of s/w failure
■ Move towards a goal-based approach
  o RA does not prescribe a method
  o Responsibility of the developer to demonstrate safety to the RA
    – Production of a software safety argument
■ Why move to a goal-based approach?
  o Does controlling the process necessarily control the hazardous contribution? Perhaps… but the link is implicit
  o The developer will always know best what to do to demonstrate the system is safe
  o Increased flexibility allows easier adoption of new techniques and technologies
The challenge
■ But creating compelling software safety arguments is difficult and costly
■ Interviewed a number of stakeholders to find out the key challenges
  o Difficult to determine what activities must be undertaken to support the software safety case
    – Particularly, how to ensure what you do is appropriate for the level of risk. When have you done enough?
  o How can the sufficiency of the safety argument be judged?
  o How can you determine up front what you will need to do?
    – So that you can manage the activities
Addressing risk
■ The safety argument produced must be commensurate with the system risk
■ A goal-based approach explicitly addresses risk reduction
■ But we still need to determine whether the mitigations put in place are sufficient for the given risk
■ There have been some attempts to define the evidence required for different risks
  o However, the relationship between the evidence generated and the risk reduction achieved is unclear
  o So justifying sufficiency remains very difficult
Assurance
■ Why is it so difficult?
■ Safety arguments are very rarely deductive
  o In a deductive argument, if we know the premises to be true, then we also believe the conclusion with certainty
■ Mostly we have inductive arguments
  o We cannot demonstrate the conclusion of the argument with certainty, only with probability
■ This probability represents the level of confidence we have in the truth of the claim
■ The term assurance is often used in safety arguments
  o The assurance of a claim is the justifiable confidence in the truth of that claim
Assurance
■ Lots of factors affect assurance…
  o Assumptions made
  o Inductive gap
    – How strongly do sub-claims or evidence give reason to believe the claim is true?
  o Trustworthiness of evidence
    – What is the quality of the evidence? How well does it meet its intent?
  o Visibility of system and environment information
  o Etc.
■ These subjectivities are always there
  o Present in a prescriptive approach, but left implicit
■ By reasoning explicitly about them, sufficiency is easier to justify
Compelling Software Safety Arguments
■ Necessary to demonstrate that sufficient assurance has been achieved
■ Must consider assurance throughout the development of the software safety argument
■ Guidance on this is split into two parts
  o Software safety argument pattern catalogue
  o Assurance based argument development method
Software Safety Argument Pattern Catalogue
■ Captures good practice for compelling software safety arguments
  o Based upon
    – Existing patterns
    – Current practice for software safety arguments
■ Developed to provide flexibility
  o Instantiable for a wide range of systems
    – Diverse development processes
    – Different hazards and safety requirements
    – Etc.
■ To be sufficiently compelling, the correct implementation decisions must be made
  o Patterns must be instantiated within the framework of the assurance based argument development method (more later…)
Software Safety Argument Pattern Catalogue
■ Patterns currently provided in the pattern catalogue…
■ High-level software safety argument pattern
  o High-level structure for a software safety argument
■ Software contribution safety argument pattern
  o Argues that the contributions made by software to system hazards are acceptably managed
■ DSSR identification software safety argument pattern
  o Argues that derived software safety requirements (DSSRs) from one 'tier' are captured at the next
■ Hazardous contribution software safety argument pattern
  o Considers additional hazardous contributions at each tier
■ Strategy justification software safety argument pattern
  o Argues that the adopted strategy is acceptable
Software Contribution Safety Argument Pattern
■ Must consider all the ways in which errors introduced into the software could lead to the software contribution
■ Different development processes are used on different projects
  o But there are always various 'tiers' of design
■ At each tier the requirements of the higher level must be addressed
  o DSSRs from the previous tier must be adequately addressed
  o Additional hazardous contributions that may be introduced at each tier must be considered
■ Instantiation decisions made here will have a large impact on assurance
Software Contribution Safety Argument Pattern
DSSR Identification Sw Safety Argument Pattern
■ DSSRs from a previous tier of development must be adequately captured at the next tier of development
  o Design mitigations
  o Allocate and decompose DSSRs
  o Define additional DSSRs
■ Don't necessarily need to instantiate for every tier, but…
  o Violates traceability requirements
  o Increases uncertainty
  o Must be able to justify that this is acceptable
DSSR Identification Sw Safety Argument Pattern
Hazardous Contribution Sw Safety Argument Pattern
■ Potentially hazardous failures could be introduced at each tier
  o Must identify hazardous software failure modes (HSFMs) at that tier
    – FHA
    – HAZOP
    – Etc.
  o Must address each identified HSFM
    – Definition of further DSSRs
■ Don't necessarily need to instantiate for every tier, but…
  o Violates traceability requirements
  o Increases uncertainty
  o Must be able to justify that this is acceptable
Hazardous Contribution Sw Safety Argument Pattern
Strategy Justification Sw Safety Argument Pattern
■ Argues that the strategy adopted is acceptable from an assurance point of view
  o Justifies that the implementation decisions made are appropriate
■ The confidence achieved in the claim is acceptable
  o Provides explicit justification
  o Based on an ACARP assessment
■ Can be used to justify any strategy for which justification may be required to convince a reader
■ The pattern is used in the context of the strategy to which it relates
■ Will look at this in detail after the ACARP discussion…
Assurance Based Argument Development Method
■ Even if using patterns for guidance, how can we be sure the argument is sufficiently compelling?
■ Must explicitly consider assurance throughout argument development
■ At every step in constructing the argument it is inevitable that information will be lost
  o Defining the safety claims
  o Deciding on a strategy (argument approach)
  o Identifying assumptions and context
  o Providing evidence
■ Losing information increases uncertainty, which affects assurance
  o Assurance deficits
■ To construct compelling arguments we must understand where assurance deficits come from
S/w safety argument development method
■ There is an existing safety argument development method (the six-step method)
  o This can be used to develop software safety arguments, but…
    – Assurance is not explicitly considered
    – Potential for assurance deficits
Extended 6-step method
■ To extend the 6-step method
  o Considered how assurance deficits may occur at each step
  o Use this to inform decisions about how to construct the argument
■ Perform deviation analysis on each of the steps
  o Guidewords: No or None – More – Less – As well as – Part of – Other than – Reverse
■ Apply and interpret the guidewords for each step
■ Consider each deviation's effect on assurance
  o What information is being lost?
  o How would that information affect assurance?
  o Is it worth knowing that information?
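The guideword-driven deviation analysis above is essentially a cross-product of argument-construction steps and guidewords. A minimal sketch, assuming illustrative step names and prompt wording (the guidewords are those listed on the slide; everything else is an assumption for demonstration):

```python
# Illustrative sketch: HAZOP-style deviation analysis over the steps of
# argument construction. The step names and prompt text are assumptions;
# the guidewords are those given on the slide.

GUIDEWORDS = ["No or None", "More", "Less", "As well as",
              "Part of", "Other than", "Reverse"]

STEPS = [
    "Define the safety claims",
    "Decide on a strategy",
    "Identify assumptions and context",
    "Provide evidence",
]

def deviation_prompts(steps=STEPS, guidewords=GUIDEWORDS):
    """Generate one deviation prompt per (step, guideword) pair.

    Each prompt asks what information could be lost at that step,
    and hence where an assurance deficit could arise."""
    prompts = []
    for step in steps:
        for gw in guidewords:
            prompts.append(
                f"[{gw}] applied to '{step}': what information is lost, "
                f"how does it affect assurance, and is it worth knowing?"
            )
    return prompts

prompts = deviation_prompts()
print(len(prompts))  # 4 steps x 7 guidewords = 28 prompts
```

Interpreting each prompt (and recording the answer) is what surfaces the potential assurance deficits at each step.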
Consideration of Assurance During Argument Construction
ACARP
■ It is possible to increase assurance in a claim by gaining more relevant information, i.e. by addressing an assurance deficit
  o But is it cost-effective to do so?
  o Diminishing returns?
  o How do we know when we've increased confidence sufficiently?
■ DS Issue 4 Part 2 Annex B states…
  o "B1.1 – The goal of risk management as defined by this standard is to show that safety risks can be tolerated and are at levels that are ALARP"
  o "B3.2 – [For systems containing complex electronic elements] …much of the effort only improves confidence that requirements have been met. In applying ALARP, the confidence achieved should be proportionate to the risk."
■ This leads us to a consideration of ACARP (As Confident As Reasonably Practicable)
Assurance Deficits
■ The developer must be able to justify that confidence in the truth of the claim is ACARP
■ Ensuring sufficient confidence is achieved requires that all assurance deficits are acceptably managed
■ Potential assurance deficits may be identified from
  o The assurance based argument development method
  o The patterns
■ For all identified assurance deficits
  o Consider whether they are acceptable
  o Attempt to address the deficit
  o Justify any residual assurance deficit
Impact Assessment
■ To determine whether an identified assurance deficit should be addressed
  o Must consider the impact of the assurance deficit
■ What is the effect on the supported claim of not having the information?
  o What is still assured and what isn't?
■ How bad would it be if the claim were undermined in this way?
■ Important to consider this in terms of risk
  o Only through considering risk can we know how bad it is
    – Importance to overall system safety
ACARP Assessment
■ Can use ACARP to categorise the impact of assurance deficits
  o Intolerable
    – The potential impact on the claim of the assurance deficit cannot be justified under any circumstances
  o ACARP
    – The assurance deficit is tolerable only if the cost of taking measures to address it is grossly disproportionate to the benefit
    – The greater the impact of the assurance deficit, the more, proportionately, system developers are expected to spend to address it
  o Broadly acceptable
    – The impact of the assurance deficit is negligible; further increases in confidence need not be sought
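The three-region categorisation above can be sketched as a simple mapping from a (normalised) impact score to a region. The numeric thresholds below are invented for demonstration; the slide defines only the three qualitative regions:

```python
# Illustrative sketch of the three-region ACARP categorisation of an
# assurance deficit's impact. The thresholds are assumptions; the ACARP
# framework itself is qualitative.

INTOLERABLE_THRESHOLD = 0.8  # assumed: impact above this cannot be justified
NEGLIGIBLE_THRESHOLD = 0.1   # assumed: impact below this is broadly acceptable

def categorise_impact(impact):
    """Map a normalised (0..1) impact score to an ACARP region."""
    if impact >= INTOLERABLE_THRESHOLD:
        return "intolerable"         # must be addressed, whatever the cost
    if impact <= NEGLIGIBLE_THRESHOLD:
        return "broadly acceptable"  # no further confidence need be sought
    return "ACARP"                   # address unless cost is grossly disproportionate

print(categorise_impact(0.05))  # broadly acceptable
print(categorise_impact(0.5))   # ACARP
print(categorise_impact(0.9))   # intolerable
```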
Justifying Sufficiency
■ Addressing an assurance deficit requires 'buying' more information relevant to the safety claim
  o Is it worth spending the money to get that information?
■ Demonstrating ACARP requires that both the cost and the impact of addressing assurance deficits be determined
  o To judge whether the cost is grossly disproportionate to the impact
■ In theory, a formal cost-benefit analysis could be performed based on quantitative assessment of
  o The cost of the available options
  o The costs associated with the potential impact
■ In most cases for ACARP, a qualitative approach is more appropriate
  o But this relies on providing explicit justification of why residual assurance deficits are acceptable
    – Justification based on a (qualitative) ACARP assessment
    – Where appropriate, provide an argument
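The formal cost-benefit analysis mentioned above amounts to a gross disproportion test: a measure addressing a deficit should be taken unless its cost is grossly disproportionate to the benefit. A minimal sketch, with an assumed disproportion factor and invented figures (the slides give no quantitative values):

```python
# Illustrative sketch of the gross disproportion test behind ACARP.
# The disproportion_factor and the example figures are assumptions.

def reasonably_practicable(cost, benefit, disproportion_factor=3.0):
    """Return True if the measure should be taken: take it unless
    cost exceeds disproportion_factor * benefit.

    A larger disproportion_factor would apply to higher-impact deficits,
    mirroring 'the greater the impact ... the more developers are
    expected to spend'."""
    return cost <= disproportion_factor * benefit

# Assumed example figures (arbitrary units):
print(reasonably_practicable(cost=20, benefit=10))  # True: 20 <= 30
print(reasonably_practicable(cost=50, benefit=10))  # False: grossly disproportionate
```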
Unit Testing Example
■ Identified that DSSRx from the low level (LL) design must be decomposed into two separate DSSRs at code level, for two different modules
  o DSSRx is a DSSR from the low level design
■ We are assuming in this example that no additional code level hazardous failure modes were identified (unrealistic, but simpler)
■ Since there is verification at other levels, and traceability, will this be sufficient?
Unit Testing Example
■ We do have some confidence in the truth of the claim 'DSSRy addressed by code module A'
  o Provided by the trustworthiness argument…
Unit Testing Example
■ This provides some confidence that the code of module A is free from errors
■ To consider whether the confidence is sufficient, we must consider what additional information could be provided relevant to the claim
  o What is the potential assurance deficit here?
■ We could provide information about any errors that were made in implementing the module…
  o Unit testing could provide information about this
■ What is the impact of this information?
  o What is the effect of not doing unit testing on Goal:DSSRyADDCode?
  o How bad would it be if there were no way of knowing about errors introduced during implementation?
Unit Testing Example
■ Without unit testing there is…
  o No mechanism for identifying any errors introduced at the code level
  o No way of determining whether such errors could affect the achievement of DSSRy
■ The effect on risk can be determined by considering the potential hazardous effect of unidentified errors
  o The potential hazardous effect is that the safety requirement (DSSRy, DSSRx, and upwards) is not met in operation
■ The impact is reflected by the risk at system level
  o This defines the relative importance of the affected safety requirements to system safety
■ The impact will also depend upon the relative assurance that the module A code is free from errors
Unit Testing Example
■ Not performing unit testing contributes to the DSSR not being met…
  o if and only if there is an error which unit testing would have identified and which leads to the DSSR failure
■ The more confidence we have that there are no errors in code module A
  o The less likely it is that there is an error which leads to DSSR failure
  o So the impact of not doing unit testing is reduced
■ Assurance in this other aspect of the support for Goal:DSSRyADDCode can thus reduce the impact of the assurance deficit
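The reasoning above can be made concrete: the impact of skipping unit testing scales with the probability that the code contains an error which testing would have caught and which leads to DSSR failure. A sketch with invented probabilities (nothing on the slides is quantitative):

```python
# Illustrative sketch of the slide's reasoning. All probabilities and the
# severity weight below are invented for demonstration.

def residual_impact(p_error, p_caught_by_testing, p_leads_to_failure, severity):
    """Impact of skipping unit testing = severity weighted by the chance
    of a testing-detectable, failure-causing error going undetected."""
    return severity * p_error * p_caught_by_testing * p_leads_to_failure

# High assurance the code is error-free (low p_error) -> low impact:
low = residual_impact(p_error=0.01, p_caught_by_testing=0.7,
                      p_leads_to_failure=0.5, severity=1.0)

# Low assurance (high p_error) -> much higher impact:
high = residual_impact(p_error=0.5, p_caught_by_testing=0.7,
                       p_leads_to_failure=0.5, severity=1.0)

print(low < high)  # True: trustworthiness assurance reduces the deficit's impact
```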
Unit Testing Example
■ If DSSRy has low importance to system safety (low risk)
■ And we have high assurance that the code is error free
  o Based on the argument of the trustworthiness of the code
■ Then it could potentially be justified that…
  o The high cost of unit testing is grossly disproportionate to the benefit gained
  o Since the impact of addressing the assurance deficit is low
■ Where the impact is determined to be higher, such a justification may not be possible
  o Unit testing would then be considered reasonably practicable
■ Other strategies could increase confidence further by providing further information relating to
  o The presence of errors in the code
  o The lack of errors in the code (trustworthiness)
Unit Testing Example
■ Other methods could provide similar information to unit testing
  o E.g. static code analysis
■ Important to consider what additional information is provided relevant to the claim
  o Static analysis will only increase confidence further if it provides information that unit testing does not
■ Must consider
  o Weaknesses or limitations of unit testing
  o The nature of the claim
    – Different DSSRs may require different support, e.g. timing vs omission
    – Some may require a combination of techniques to provide the required information
Unit Testing Example
■ Could also gain additional assurance in the trustworthiness of the code
  o Provide more information about the rigour of the processes used
■ This provides the opportunity to perform trade-offs between
  o The cost of increasing confidence in the lack of errors vs
  o The cost of increasing confidence in the identification of errors
■ Where will the most benefit be gained?
■ However, where the impact of the assurance deficit is high, such trade-offs are unlikely to be justifiable
Unit Testing Example
■ The impact of providing unit testing depends upon the effect of the information it provides in support of the claim
■ But it also depends on its trustworthiness
  o Is the test team independent of the development team?
  o Were the processes for generating, executing and analysing test cases
    – Systematic and thorough?
    – Implemented with rigour?
  o Etc.
■ It is possible to provide a trustworthiness argument for unit testing as well
Software Systems Engineering Initiative
5th May 2009