Published by Andra White. Modified over 9 years ago.
Secure Systems Research Group - FAU
A survey of dependability patterns
Ingrid Buckley and Eduardo B. Fernandez
Dept. of Computer Science and Engineering
Florida Atlantic University, Boca Raton, FL, USA
January 18, 2007

Introduction
Dependability is the property of a system that allows one to rely on its service.
Dependability is of utmost importance for critical systems in business and in critical infrastructures such as hospitals, airports, and a country's electricity grid.
Dependability comprises several aspects:
- Fault tolerance
- Safety
- Availability
- Reliability

Introduction (cont'd)
- Fault tolerance, as it relates to systems, software, and hardware, is the ability to remain operable in the presence of faults.
- Safety is the prevention of catastrophic effects on the environment or on the users of the system.
- Availability is the ability of a system to perform its functions when needed.
- Reliability measures how well the system conforms to its specification.
We use the Unified Modeling Language (UML) to represent fault tolerance patterns.

Objectives
- Classify software and hardware fault tolerance patterns according to their objectives.
- Analyze and evaluate the classified fault tolerance patterns.
- Determine how to improve upon existing patterns.
- Design new fault tolerance patterns for unsupported areas within critical systems.

Background
- A pattern is an encapsulated solution to a recurrent problem in a given context that can be tailored to fit different situations.
- A fault is a defect in the state of a component or in the design of a system; a fault is the cause of an error.
- An error is a defective value in the state of a system; an error is the manifestation of a fault.
- A system failure occurs when the system's behavior deviates from its specification; a failure is the manifestation of an error.
- The System Development Life Cycle (SDLC) is the entire process of formal, logical steps taken to develop software.

Fault Tolerance
- A system that can mask the effects of a fault and continue operating correctly is said to be fault tolerant.
- Fault tolerance requires redundancy and diversity, which are directly linked to the reliability of a system and support its availability.
- Diversity here means having different versions of a function or system that all provide the same functionality.
- Integrating hardware and software fault tolerance, so as to cope with the various kinds of faults that can appear in a software system, is a good foundation for achieving a fault-tolerant system.
- Several fault tolerance patterns have already been written; they support different levels of the system architecture.
- Our aim is to focus on hardware and software fault tolerance patterns.
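
The combination of redundancy and diversity described above can be illustrated with a small sketch. It assumes three independently written versions of the same function and a simple majority vote that masks one faulty version; the names `sqrt_math`, `sqrt_pow`, `sqrt_faulty`, and `vote` are hypothetical and are not taken from the surveyed patterns.

```python
import math
from collections import Counter


def sqrt_math(x):
    # Version 1: library square root.
    return round(math.sqrt(x), 6)


def sqrt_pow(x):
    # Version 2: same functionality, different implementation.
    return round(x ** 0.5, 6)


def sqrt_faulty(x):
    # Version 3: deliberately defective, to stand in for a design fault.
    return round(x / 2, 6)


def vote(versions, x):
    """Run every version and return the result a strict majority agrees on."""
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        # Assumption: with no majority the fault cannot be masked.
        raise RuntimeError("no majority among versions")
    return value
```

With these three versions, `vote([sqrt_math, sqrt_pow, sqrt_faulty], 9.0)` returns the value the two correct versions agree on, so the defective version's output never reaches the caller.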

Fault Tolerance (cont'd)
Fault tolerance patterns are a fairly new area for critical systems. The need for them has grown with the need to protect systems against failures, whether accidental or caused intentionally by attackers.
Because attacks on different types of systems are so diverse, effective fault tolerance techniques are needed to mitigate faults that may lead to a failure in a critical system.
To prevent failures, the following is required:
- Detection: detect the occurrence of errors.
- Diagnosis: locate the unit or component where the error has occurred.
- Masking: mask errors so that the system does not malfunction if a fault occurs.
- Containment: confine or delimit the effects of the error.
- Recovery: reconfigure the system to remove the faulty unit and erase the effects of the error.

Hardware Fault Tolerance Patterns
Hardware fault tolerance applies hardware replication to enhance system availability and reliability in the presence of hardware faults.
Hardware fault tolerance patterns:
- Watchdog: The Watchdog pattern primarily protects against time-based faults by raising an alarm whenever liveness messages are not received within a given time frame.
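
A minimal sketch of the Watchdog idea, assuming the monitored component sends a liveness message by calling `stroke()` and the monitor polls `expired()` to decide whether to raise an alarm. The class name, method names, and the injectable `clock` parameter are illustrative assumptions, not part of the pattern description.

```python
import time


class Watchdog:
    """Flags a time-based fault when no liveness message arrives in time."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout        # allowed silence, in seconds
        self.clock = clock            # injectable so the sketch can be tested
        self.last_stroke = clock()    # time of the most recent liveness message

    def stroke(self):
        # Called by the monitored component to signal that it is alive.
        self.last_stroke = self.clock()

    def expired(self):
        # True when the component has been silent for longer than the timeout.
        return self.clock() - self.last_stroke > self.timeout
```

A real deployment would typically run the check on a separate timer or piece of hardware, so that a fault in the monitored component cannot also stop the watchdog.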

Hardware Fault Tolerance Patterns (cont'd)
- Fail-Stop Processor: The Fail-Stop Processor pattern mainly aims at transforming errors that would lead to Byzantine (complex) failures into crash failures. It is based on redundancy and on comparing the output of all replicas to reach an agreement.
- Acknowledgement: The Acknowledgement pattern detects crash failures. It is based on acknowledging the reception of input within a given time interval.
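
The replica comparison behind the Fail-Stop Processor pattern might look like the following sketch. It assumes that replicas are plain callables and that any disagreement makes the processor halt permanently instead of emitting a possibly wrong output; `FailStopProcessor` and `ProcessorHalted` are hypothetical names.

```python
class ProcessorHalted(Exception):
    """Raised when the fail-stop processor has stopped."""


class FailStopProcessor:
    def __init__(self, replicas):
        self.replicas = list(replicas)  # redundant copies of the same system
        self.halted = False

    def process(self, value):
        if self.halted:
            raise ProcessorHalted("processor has stopped")
        # Every replica processes the same input.
        outputs = [replica(value) for replica in self.replicas]
        # Assumption: agreement means all outputs are equal.
        if any(output != outputs[0] for output in outputs[1:]):
            self.halted = True  # stop rather than deliver a wrong result
            raise ProcessorHalted("replica outputs disagree")
        return outputs[0]
```

Note that this only detects the error and stops; tolerating it requires combining the processor with a recovery mechanism.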

Software Fault Tolerance Patterns
Software fault tolerance applies software redundancy, by means of diversity of design, to tolerate software faults introduced in the design, programming, or maintenance phases of the software development cycle.
Software fault tolerance patterns:
- Roll Forward: The Roll Forward pattern is a failure recovery pattern that detects and recovers from a fault by monitoring two replicas for errors.
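
A rough sketch of roll-forward recovery with two replicas. It assumes that an error shows up as an exception, that the active replica copies its new state to the standby after every successful input, and that a failed replica is simply discarded. The `Replica` and `RollForward` classes are hypothetical illustrations, not the pattern's published structure.

```python
import copy


class Replica:
    def __init__(self, step):
        self.step = step   # function (state, value) -> new state
        self.state = 0

    def process(self, value):
        # The state is only replaced if the step completes without error.
        self.state = self.step(self.state, value)
        return self.state


class RollForward:
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def process(self, value):
        try:
            result = self.active.process(value)
        except Exception:
            if self.standby is None:
                raise  # no unaffected replica left
            # Discard the failed replica; the standby carries on from the
            # last copied state and handles this and all later inputs.
            self.active, self.standby = self.standby, None
            return self.active.process(value)
        if self.standby is not None:
            # Error-free path: copy the new state before accepting new input.
            self.standby.state = copy.deepcopy(self.active.state)
        return result
```

The state copy on every error-free input is where the overhead in normal operation comes from; recovery itself only swaps the replicas.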

Software Fault Tolerance Patterns (cont'd)
- Input Guard: The Input Guard pattern stops erroneous input from propagating an error into a component. A guard is placed at every access point of the component to check the validity of the input.
- Fault Container: The Fault Container pattern provides the same benefits as the combination of the Input Guard and Output Guard patterns, because it prevents an error from propagating into or out of a given component.
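
The two guards can be sketched as wrappers around a component. This is a minimal illustration that assumes the component is a single callable, that validity is expressed as a predicate, and that invalid data is rejected with an exception; `GuardError`, `input_guard`, and `fault_container` are hypothetical names.

```python
class GuardError(Exception):
    """Raised when a guard rejects data at the component boundary."""


def input_guard(is_valid_input):
    """Wrap a component so that invalid input never reaches it."""
    def decorate(component):
        def guarded(value):
            if not is_valid_input(value):
                raise GuardError(f"input rejected: {value!r}")
            return component(value)
        return guarded
    return decorate


def fault_container(is_valid_input, is_valid_output):
    """Combine an input guard with an output check on the same component."""
    def decorate(component):
        guarded = input_guard(is_valid_input)(component)

        def contained(value):
            result = guarded(value)
            if not is_valid_output(result):
                # Invalid output is stopped from leaving the component.
                raise GuardError(f"output rejected: {result!r}")
            return result
        return contained
    return decorate
```

For example, decorating a function with `@input_guard(lambda x: isinstance(x, int) and x >= 0)` rejects negative or non-integer arguments before the function body runs. Neither guard can catch an erroneous value that still satisfies the predicate.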

Hardware/Software Fault Tolerance Pattern
The Software Redundancy pattern deals with hardware, software, and environmental faults at the same time.

Patterns diagram for the fault tolerance domain

Analysis of Patterns

Pattern: Watchdog
  Advantage: Can be used to improve deadlock detection when strokes are keyed or carry data that identifies which computational step issued them.
  Disadvantage: Does not check that the internal computation is correct.

Pattern: Acknowledgement
  Advantage: The design complexity introduced by the pattern is very low. It does not introduce any space overhead.
  Disadvantage: An error in the monitored system is detected only after some input has been issued to it. The timeout must be set based on the time it takes for the input to reach the monitored system plus the time it takes for the acknowledgement to reach the monitoring system.

Pattern: Fail-Stop Processor
  Advantage: Introduces low time overhead, since the processors run in parallel. The processors are replicas of the original system to which the pattern is applied, without any additional functionality. In practice this means they can be replicas of a legacy system that cannot be changed internally, as it would have to be if the processors needed additional functionality.
  Disadvantage: Does not provide a means to tolerate faults in a system; rather, it provides a means to detect errors. It introduces a relatively high space overhead, proportional to the number of simultaneous errors it can deal with.
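
The timeout rule in the table above (time for the input to reach the monitored system plus time for the acknowledgement to come back) can be written down directly. The sketch below assumes the acknowledgement arrives on a `queue.Queue`; the function names and the optional `margin` parameter are illustrative assumptions.

```python
import queue


def ack_timeout(input_transit, ack_transit, margin=0.0):
    """Timeout = time for the input to arrive + time for the ack to return.

    `margin` is an assumed safety allowance, not part of the stated rule.
    """
    return input_transit + ack_transit + margin


def send_with_ack(send, ack_queue, timeout):
    """Issue the input, then wait for an acknowledgement.

    Returns True if an acknowledgement arrived within `timeout` seconds,
    False otherwise (treated here as a detected crash failure).
    """
    send()
    try:
        ack_queue.get(timeout=timeout)
        return True
    except queue.Empty:
        return False
```

Because the check is driven by `send()`, a crashed component goes unnoticed until the next input is issued, which is the detection delay noted in the table.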

Analysis of Patterns (cont'd)

Pattern: Roll Forward
  Advantage: The time overhead imposed by this pattern is low when errors occur: the failed replica is discarded, and the unaffected replica processes the subsequent inputs.
  Disadvantage: The time overhead imposed by this pattern in the absence of errors is high: before the replica is able to receive and process new input, it must copy its new state to the other replica.

Pattern: Input Guard
  Advantage: It stops erroneous input that does not conform to the specification of the guarded component from contaminating that component. The pattern can be implemented in various ways, each with different benefits with respect to the time or space overhead introduced by the guard.
  Disadvantage: It cannot prevent the propagation of errors that do conform to the specification of the guarded component. It has significant time and space overhead.

Pattern: Fault Container
  Advantage: It stops errors, expressed as input or output content or timing that does not conform to a component's specification, from entering or exiting that component. The undefined behavior of the container in the presence of errors allows it to be combined with error detection and error masking patterns.
  Disadvantage: It cannot prevent the propagation of errors that do conform to the specification of the contained component. Unless combined with error detection and system recovery mechanisms, this pattern results in send- or receive-omission failures (i.e., failure to send the output or receive the input of the contained component).

Conclusion
- Based on our analysis, there is a need to improve upon current fault tolerance patterns.
- New fault tolerance patterns are necessary to provide dependability in distributed systems, because many of the existing fault tolerance patterns are very similar and do not provide comprehensive support for errors that can lead to failure.

Future Work
- Researching safety, availability, and reliability patterns.
- Defining areas of need where current fault tolerance patterns are lacking or require improvement.
- Designing new fault tolerance patterns.

Recommendations and Questions
Feedback: