1 Copyright © 2003 M. E. Kabay. All rights reserved. Critical Systems Specification IS301 – Software Engineering Lecture #18 – M. E. Kabay, PhD, CISSP Dept of Computer Information Systems Norwich University
2 Acknowledgement: All of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site, used with Sommerville’s permission as extended by him for all non-commercial educational use. Copyright in Kabay’s name applies solely to appearance and minor changes in Sommerville’s work or to original materials, and is used solely to prevent commercial exploitation of this material.
3 Topics: software reliability specification; safety specification; security specification.
4 Dependable Systems Specification: processes and techniques for developing a specification for system availability, reliability, safety and security.
5 Functional and Non-Functional Requirements. System functional requirements define error checking, recovery facilities and features, and protection against system failures. Non-functional requirements define the required reliability and availability of the system.
6 System Reliability Specification. Hardware reliability: P{hardware component failing}? Time to repair component? Software reliability: P{incorrect output}? Software can continue operation after an error, whereas a hardware failure often causes stoppage. Operator reliability: P{operator error}?
7 What Happens When All Components Must Work? Consider a system with two components A and B whose operation depends on both of them, where $P\{\text{failure of A}\} = P_A$ and $P\{\text{failure of B}\} = P_B$. Two rules are used: $P\{\text{not } X\} = 1 - P\{X\}$ and, for independent events, $P\{X \text{ and } Y\} = P\{X\}\,P\{Y\}$. Then $P\{\text{A will not fail}\} = 1 - P_A$, $P\{\text{B will not fail}\} = 1 - P_B$, and $P\{\text{A and B will both not fail}\} = (1 - P_A)(1 - P_B)$. The system fails if at least one component fails, so $P\{\text{system failure}\} = 1 - (1 - P_A)(1 - P_B)$.
8 General Principles: If there are a number of elements $i$ with probability of failure $P_i$, and all of them have to work for the system to work, then the probability of system failure $P_S$ is $P_S = 1 - \prod_i (1 - P_i)$. Therefore, as the number of components (all of which need to function) increases, the probability of system failure increases.
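The rule for components that must all work can be checked numerically. The following is a minimal sketch, assuming the component failures are independent; the function name and the failure probabilities are illustrative and do not come from the slides.

```python
from math import prod


def series_failure_probability(failure_probs):
    """Return P_S = 1 - prod(1 - P_i) for components that must ALL work.

    Assumes the component failures are independent of one another.
    """
    return 1.0 - prod(1.0 - p for p in failure_probs)


# Illustrative values: two components, then ten identical components.
two_components = series_failure_probability([0.01, 0.02])
ten_components = series_failure_probability([0.01] * 10)
```

With these made-up values, the ten-component system is more likely to fail than the two-component one, which is the trend the slide describes.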
9 Component Replication: If components with failure probability $P$ are replicated $n$ times so that the system works as long as any one of the components works, then the probability of system failure is $P_S = P\{\text{all will fail}\} = P^n$. If the system fails when any of the $n$ components fails, then the probability of system failure is $P_S = P\{\text{at least 1 will fail}\} = P\{\text{not all will work}\} = 1 - (1 - P)^n$.
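The contrast between the two replication cases is easier to see with numbers. This sketch assumes independent, identical components; the function names and the values P = 0.1 and n = 3 are hypothetical.

```python
def replicated_failure_probability(p, n):
    """System works if ANY of n replicas works: P_S = p ** n."""
    return p ** n


def all_required_failure_probability(p, n):
    """System fails if ANY of n components fails: P_S = 1 - (1 - p) ** n."""
    return 1.0 - (1.0 - p) ** n


# Illustrative values only.
redundant = replicated_failure_probability(0.1, 3)       # about 0.001
all_needed = all_required_failure_probability(0.1, 3)    # about 0.271
```

Under these assumptions, replication drives the failure probability well below that of a single component, while requiring every component to work pushes it well above.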
10 Examples of Functional Reliability Requirements: A predefined range for all values input by the operator shall be defined, and the system shall check that all operator inputs fall within this predefined range. The system shall check all disks for bad blocks when it is initialized. The system must use N-version programming to implement the braking control system. The system must be implemented in a safe subset of Ada and checked using static analysis.
11 Non-Functional Reliability Specification: The required level of system reliability should be expressed quantitatively. Reliability is a dynamic system attribute, so reliability specifications related to source code are meaningless: “No more than N faults/1000 lines” is a BAD specification, useful only for post-delivery process analysis, i.e. when trying to assess the quality of development techniques. An appropriate reliability metric should be chosen to specify overall system reliability.
12 Reliability Metrics: Reliability metrics are units of measurement of system reliability. Reliability is measured by counting the number of operational failures and relating these to the demands made on the system and to the time the system has been operational. A long-term measurement program is required to assess the reliability of critical systems.
13 Reliability Metrics [slide content not reproduced]
14 Probability of Failure on Demand (POFOD): The probability that the system will fail when a service request is made. Useful when demands for service are intermittent and relatively infrequent. Appropriate for protection systems where services are demanded occasionally and where there are serious consequences if the service is not delivered. Relevant for many safety-critical systems with exception management components, e.g. an emergency shutdown system in a chemical plant.
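As a rough illustration, POFOD can be estimated from a log of service demands as the fraction of demands that failed. The function name and the counts below are hypothetical; a real estimate for a protection system would need far more demands than this.

```python
def estimate_pofod(failed_demands, total_demands):
    """Estimate probability of failure on demand as failed / total demands."""
    if total_demands <= 0:
        raise ValueError("total_demands must be positive")
    return failed_demands / total_demands


# Hypothetical test log: 1000 shutdown demands, 2 of which were not serviced.
shutdown_pofod = estimate_pofod(2, 1000)
```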
15 Rate of Occurrence of Failures (ROCOF): Reflects the rate of occurrence of failures in the system. A ROCOF of 0.002 means 2 failures are likely in each 1000 operational time units, e.g. 2 failures per 1000 hours of operation. Relevant for operating systems and transaction processing systems, where the system has to process a large number of similar requests that are relatively frequent, e.g. a credit card processing system or an airline booking system.
16 Mean Time to Failure (MTTF): A measure of the time between observed failures of the system; it is the reciprocal of ROCOF for stable systems. An MTTF of 500 means that the mean time between failures is 500 time units. Relevant for systems with long transactions, i.e. where system processing takes a long time; MTTF should be longer than the transaction length. Examples: computer-aided design systems, where a designer will work on a design for several hours, and word processor systems.
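The reciprocal relationship between ROCOF and MTTF can be sketched from the same two observations: a failure count and the operational time over which it was collected. The function names are illustrative, and the figures reuse the "2 failures per 1000 hours" example.

```python
def rocof(failures, operational_time):
    """Rate of occurrence of failures: failures per unit of operational time."""
    return failures / operational_time


def mttf(failures, operational_time):
    """Mean time to failure: operational time per observed failure."""
    return operational_time / failures


# 2 failures observed in 1000 hours of operation.
rate = rocof(2, 1000)        # failures per hour
mean_time = mttf(2, 1000)    # hours between failures
```

For a stable system the product of the two values is 1, which is what "reciprocal" means here.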
17 Availability: A measure of the fraction of time the system is available for use; it takes repair and restart time into account. An availability of 0.998 means the software is available for 998 out of 1000 time units. Relevant for non-stop, continuously running systems, e.g. telephone switching systems and railway signaling systems.
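Availability as a fraction of time can be computed directly from uptime and downtime, where downtime includes repair and restart. This is a minimal sketch; the function and parameter names are assumptions for illustration.

```python
def availability(uptime, downtime):
    """Fraction of total time the system was available for use."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total


# 998 time units in service, 2 time units spent on repair and restart.
observed = availability(998, 2)
```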
18 Failure Consequences: Reliability measurements do NOT take the consequences of failure into account. Transient faults may have no real consequences; other faults may cause data loss, corruption, or loss of system service. Identify different failure classes and use different metrics for each of these. The reliability specification must be structured.
19 Failure Consequences: When specifying reliability, it is not just the number of system failures that matters but also the consequences of these failures. Failures that have serious consequences are clearly more damaging than those where repair and recovery are straightforward. In some cases, therefore, different reliability specifications may be defined for different types of failure.
20 Failure Classification [slide content not reproduced]
21 Steps to Reliability Specification: For each sub-system, analyze the consequences of possible system failures. From the system failure analysis, partition failures into appropriate classes. For each failure class identified, set out the reliability using an appropriate metric; different metrics may be used for different reliability requirements. Identify functional reliability requirements to reduce the chances of critical failures.
22 Bank Auto-Teller System. Expected usage statistics: each machine in the network is used 300 times a day; the lifetime of a software release is 2 years; each machine handles about 220,000 transactions over 2 years. Total throughput: the bank has 1,000 ATMs, giving ~300,000 database transactions per day and ~110M transactions per year.
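The rounded usage figures follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year with every machine in use every day (an assumption, since the slide does not state it).

```python
USES_PER_MACHINE_PER_DAY = 300
RELEASE_LIFETIME_YEARS = 2
ATM_COUNT = 1000
DAYS_PER_YEAR = 365  # assumption: machines are used every day of the year

# Transactions handled by one machine over the life of a software release.
per_machine_lifetime = (
    USES_PER_MACHINE_PER_DAY * DAYS_PER_YEAR * RELEASE_LIFETIME_YEARS
)

# Database transactions across the whole network.
network_per_day = USES_PER_MACHINE_PER_DAY * ATM_COUNT
network_per_year = network_per_day * DAYS_PER_YEAR
```

The exact values are 219,000, 300,000 and 109,500,000, which the slide rounds to 220,000, 300,000 and 110M.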
23 Bank ATM (cont’d). Types of failure: single-machine failures affect an individual ATM; network failures affect groups of ATMs and lower throughput; central database failures potentially affect the entire network.
24 Examples of Reliability Spec.
Failure Class | Example | Reliability Metric
Permanent, non-corrupting | System fails to operate w/ any card input. SW must be restarted to correct failure. | ROCOF: 1 occurrence/1,000 days
Transient, non-corrupting | Mag stripe data cannot be read on undamaged card that is input. | POFOD: 1 in 1,000 transactions
Transient, corrupting | Pattern of transactions across network causes DB corruption. | Unquantifiable! Should never happen in lifetime of system
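The two quantified entries (ROCOF of 1 occurrence per 1,000 days and POFOD of 1 in 1,000 transactions) can be treated as upper bounds on an observed failure rate. The sketch below is one hypothetical way to check field data against such a bound; the function name and the observed counts are invented for illustration.

```python
def meets_rate_requirement(observed_failures, exposure, max_rate):
    """True if failures per unit of exposure do not exceed the specified rate.

    `exposure` is days of operation for ROCOF or transactions for POFOD.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return observed_failures / exposure <= max_rate


ROCOF_LIMIT = 1 / 1000   # permanent, non-corrupting: 1 occurrence per 1,000 days
POFOD_LIMIT = 1 / 1000   # transient, non-corrupting: 1 in 1,000 transactions

# Hypothetical observations.
rocof_ok = meets_rate_requirement(1, 2000, ROCOF_LIMIT)    # 1 restart in 2,000 days
pofod_ok = meets_rate_requirement(30, 10_000, POFOD_LIMIT)  # 30 misreads in 10,000 uses
```

The "transient, corrupting" class has no numeric bound in the specification, so it cannot be checked this way.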
25 Specification Validation: It is impossible to validate very high reliability specifications empirically. E.g., in the ATM example, “no database corruptions” means a POFOD of less than 1 in 220 million. If a transaction takes 1 second, then simulating one day’s ATM transactions on a single system would take 300,000 seconds = 3.5 days. Testing a single run of 110M transactions would take 3.5 years. It would take longer than the system’s lifetime (2 years) to test it for reliability.
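The test-duration estimates can be reproduced with a few lines of arithmetic. This sketch assumes, as the slide does, one second per transaction executed serially on a single system; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def run_duration_days(transactions, seconds_per_transaction=1.0):
    """Days needed to execute the transactions one after another."""
    return transactions * seconds_per_transaction / SECONDS_PER_DAY


one_day_of_traffic = run_duration_days(300_000)                        # about 3.5 days
one_year_of_traffic = run_duration_days(110_000_000) / DAYS_PER_YEAR   # about 3.5 years
lifetime_traffic = run_duration_days(220_000_000) / DAYS_PER_YEAR      # about 7 years
```

Even a single pass over one year of traffic exceeds the 2-year release lifetime, and demonstrating a POFOD below 1 in 220 million would need at least the full lifetime volume.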
27 Safety Specification: The safety requirements of a system should be separately specified and based on an analysis of possible hazards and risks. Safety requirements usually apply to the system as a whole rather than to individual sub-systems; in systems engineering terms, the safety of a system is an emergent property.
28 Safety Life-Cycle [slide content not reproduced]
29 Safety Processes. Hazard and risk analysis: assess the hazards and the risks of damage associated with the system. Safety requirements specification: specify a set of safety requirements which apply to the system. Designation of safety-critical systems: identify the sub-systems whose incorrect operation may compromise system safety; ideally, these should be as small a part as possible of the whole system. Safety validation: check the overall system safety.
30 Hazard and Risk Analysis [slide content not reproduced]
31 Hazard Analysis Stages. Hazard identification: identify potential hazards which may arise. Risk analysis and hazard classification: assess the risk associated with each hazard. Hazard decomposition: decompose hazards to discover their potential root causes. Risk reduction assessment: define how each hazard must be taken into account when the system is designed.
32 Fault-Tree Analysis: A method of hazard analysis that starts with an identified fault and works backward to the causes of the fault. It can be used at all stages of hazard analysis, from preliminary analysis through to detailed SW checking. It is a top-down hazard analysis method and may be combined with bottom-up methods, which start with system failures and lead to hazards.
33 Fault-Tree Analysis: Identify the hazard. Identify potential causes of the hazard; there are usually several alternative causes, which are linked on the fault tree with ‘or’ or ‘and’ symbols. Continue the process until root causes are identified. Consider the following example: how data might be lost in a system where a backup process is running.
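A fault tree built from ‘and’ and ‘or’ gates can be represented as a small nested structure and evaluated against a set of root causes. The sketch below is illustrative only: the tree shape and the event names are hypothetical and are not taken from the slide's data-loss example.

```python
def occurs(node, active_causes):
    """Return True if the event at `node` occurs given the active root causes.

    A node is either a root-cause name (str) or a (gate, children) pair,
    where gate is "and" or "or".
    """
    if isinstance(node, str):
        return node in active_causes
    gate, children = node
    results = [occurs(child, active_causes) for child in children]
    if gate == "and":
        return all(results)
    if gate == "or":
        return any(results)
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical tree: data is lost only if the primary copy fails AND
# the backup is unusable for at least one reason.
DATA_LOST = ("and", [
    "primary_disk_fails",
    ("or", ["backup_not_run", "backup_media_corrupt"]),
])

disk_only = occurs(DATA_LOST, {"primary_disk_fails"})
disk_and_no_backup = occurs(DATA_LOST, {"primary_disk_fails", "backup_not_run"})
```

In this made-up tree a disk failure alone does not produce the hazard, but a disk failure combined with a missed backup does. Working through such combinations is how the analysis identifies which root causes need attention.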
34 Fault Tree Example [slide content not reproduced]
35 Risk Assessment: Assesses hazard severity, hazard probability and accident probability. The outcome of risk assessment is a statement of acceptability. Intolerable: must never arise or result in an accident. As low as reasonably practical (ALARP): must minimize the possibility of the hazard given cost and schedule constraints. Acceptable: the consequences of the hazard are acceptable and no extra costs should be incurred to reduce hazard probability.
36 Levels of Risk [figure; surviving labels: “As low as reasonably practical”, “RISKS”, “COSTS”]
37 Risk Acceptability: The acceptability of a risk is determined by human, social and political considerations. In most societies, the boundaries between the regions are pushed upwards with time; i.e., society is increasingly less willing to accept risk. For example, the costs of cleaning up pollution may be less than the costs of preventing it, but pollution may not be socially acceptable. Risk assessment is often highly subjective: we often lack hard data on real probabilities, and whether risks are identified as probable, unlikely, etc. depends on who is making the assessment.
38 Why Do We Lack Firm Risk Probabilities and Costs? Failure of observation (don’t notice). Failure of reporting (don’t tell anyone). Variability of systems (can’t pool data). Difficulty of classifying incidents (can’t compare problems). Difficulty of measuring costs (don’t know all repercussions).
39 Risk Reduction: The system should be specified so that hazards do not arise or result in an accident. Hazard avoidance: design so that the hazard can never arise during correct system operation. Hazard detection and removal: design so that hazards are detected and neutralized before they result in an accident. Damage limitation or mitigation: design so that the consequences of an accident are minimized or at least reduced.
40 Specifying Forbidden Behavior: Examples. The system shall not allow users to modify access permissions on any files they have not created (security). The system shall not allow reverse thrust mode to be selected when the aircraft is in flight (safety). The system shall not allow simultaneous activation of more than three alarm signals (safety).
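In software, a "shall not" requirement often becomes a guard that rejects the forbidden request before any state changes. The following sketch illustrates the reverse-thrust and alarm examples; every name in it is hypothetical, and a real avionics or alarm system would involve far more than a single check.

```python
MAX_ACTIVE_ALARMS = 3  # from the requirement: no more than three at once


class ForbiddenActionError(Exception):
    """Raised when a request would violate a 'shall not' requirement."""


def select_reverse_thrust(in_flight):
    """Refuse reverse thrust while the aircraft is in flight."""
    if in_flight:
        raise ForbiddenActionError("reverse thrust must not be selected in flight")
    return "reverse thrust selected"


def activate_alarm(active_alarms, alarm):
    """Return the new set of active alarms, refusing a fourth simultaneous one."""
    if alarm not in active_alarms and len(active_alarms) >= MAX_ACTIVE_ALARMS:
        raise ForbiddenActionError("no more than three alarms may be active")
    return active_alarms | {alarm}


on_ground = select_reverse_thrust(in_flight=False)
three_alarms = activate_alarm({"fire", "pressure"}, "smoke")
```

Raising an exception rather than silently ignoring the request makes the violation visible to the caller, which is one possible design choice among several.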
42 Security Specification. Similarities to safety specification: it is not possible to specify security requirements quantitatively, and the requirements are often ‘shall not’ rather than ‘shall’ requirements. Differences: there is no well-defined notion of a security life cycle for security management; threats are generic rather than system-specific hazards; security technology (encryption, etc.) is mature, but there are problems in transferring it into general use (corporate culture).
43 Security Specification Process [slide content not reproduced]
44 Stages in Security Specification (1). Asset identification and evaluation: the assets (data and programs) are identified, along with their required degree of protection, which depends on criticality and sensitivity. Threat analysis and risk assessment: possible threats are identified and the associated risks estimated. Threat assignment: identified threats are related to assets so that, for each identified asset, there is a list of associated threats.
45 Stages in Security Specification (2). Technology analysis: identify available security technologies and assess their applicability against the identified threats. Security requirements specification: policy, procedure, technology.
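The threat assignment stage produces, for each asset, a list of associated threats. A minimal sketch of that bookkeeping follows; the function name, assets and threats are all invented for illustration.

```python
from collections import defaultdict


def assign_threats(threat_asset_pairs):
    """Group (threat, asset) pairs into a per-asset list of threats."""
    threats_by_asset = defaultdict(list)
    for threat, asset in threat_asset_pairs:
        threats_by_asset[asset].append(threat)
    return dict(threats_by_asset)


# Hypothetical output of threat analysis for a small system.
identified = [
    ("unauthorized disclosure", "customer database"),
    ("tampering", "customer database"),
    ("unauthorized modification", "billing program"),
]
threats_for = assign_threats(identified)
```

The resulting mapping is the input to technology analysis, where each listed threat is matched against available security technologies.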
46 HOMEWORK: Apply the full Read-Recite-Review phases of SQ3R to Chapter 17 of Sommerville’s text. For next class (Tuesday), apply the Survey-Question phases to Chapter 18 on Critical Systems Development. For Thursday 30 Nov 2003, REQUIRED: hand in responses to Exercises 17.1 (2 points), 17.2 (6), 17.3 (4), 17.4 (4), 17.5 (2), 17.6 (6) and 17.7 (6) = 30 points total. OPTIONAL by 6 Nov: 17.8 and/or 17.9 for 3 extra points each.
47 DISCUSSION