Budapest University of Technology and Economics, Department of Measurement and Information Systems. Design Patterns for Fault-Tolerant Systems. Supplementary slides for the Autonomous.


Budapest University of Technology and Economics, Department of Measurement and Information Systems. Design Patterns for Fault-Tolerant Systems. Supplementary slides for the course Autonomous and Fault-Tolerant Information Systems. Kocsis Imre

Review: Singleton

Review: Facade

Review: Observer

Architectural pattern language

Units of Mitigation
• How can you keep the whole system from being unavailable when an error occurs?
• “Design the system into parts that will contain both any errors and the error recovery. Choose the divisions that make sense for your system. Design the rest of the system around these parts that represent the basic units of error mitigation.”

Correcting Audits
• Faulty data causes errors.
• “Detect and correct data errors as soon as possible. Check related data for errors, correct and record the occurrence of the error.”
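A minimal sketch of a correcting audit, assuming a hypothetical `Order` record whose stored `total` is redundant with its `items`. The record type, the `AuditLog` class, and the correction rule (recompute the total from the items) are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    # Hypothetical record: `total` duplicates information held in `items`.
    items: list
    total: float


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)


def audit_order(order: Order, log: AuditLog) -> bool:
    """Check the stored total against the related data (the items).

    On a mismatch, correct the total and record the occurrence.
    Returns True if a correction was made.
    """
    expected = sum(order.items)
    if order.total == expected:
        return False
    log.entries.append(f"total {order.total} corrected to {expected}")
    order.total = expected
    return True


order = Order(items=[10.0, 5.0], total=12.0)
log = AuditLog()
corrected = audit_order(order, log)  # detects and repairs the bad total
```

The audit relies on redundancy in the data: without a second source for the same fact there is nothing to check the stored value against.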

Redundancy
• How can we reduce the amount of time between error detection and the resumption of normal operation after error recovery?
• “Provide redundant capabilities that support quick activation to enable error processing to continue in parallel with normal execution.”
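One possible reading of this pattern is an active/standby pair. The sketch below assumes hypothetical `Replica` and `RedundantService` classes; the failed unit is swapped out so that its repair could proceed away from the request path.

```python
class Replica:
    """Hypothetical processing unit that may be healthy or failed."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return f"{self.name} handled {request}"


class RedundantService:
    """Routes requests to the active replica and fails over on error."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Quick activation: the standby resumes normal processing
            # immediately; the failed unit becomes the (broken) standby
            # and can be repaired in parallel.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


service = RedundantService(Replica("A", healthy=False), Replica("B"))
result = service.handle("r1")
```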

Minimize Human Intervention
• How can we prevent people from doing the wrong things and causing errors?
• “Design the system in a way that it is able to process and resolve errors automatically, before they become failures. This speeds error recovery and reduces the risk of procedural errors.”

Maximize Human Participation
• Should the system ignore people totally? That will reduce procedural errors.
• “Know the user and their availability. Design the system to enable knowledgeable operating personnel to participate. […] Provide appropriate Maintenance Interfaces and Fault Observer capabilities […]”

Maintenance Interface
• Should maintenance and application requests be intermingled on the application input and output channels?
• “Provide a separate interface to the system for the (almost) exclusive use of maintenance interactions.”

Someone in Charge
• Anything can go wrong, even during error processing. When this happens the system might stop doing the error processing in addition to not doing the normal processing.
• “All fault tolerance related activities have some component of the system that is clearly in charge and has the ability to determine correct completion and the responsibility to take action if it does not complete correctly.”

Escalation
• What does the system do when its attempt to process an error in a component is not achieving the correct effect?
• “When recovery or mitigation is failing, escalate the action to the next more drastic action.”
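Escalation can be sketched as an ordered ladder of recovery actions. The action names and their outcomes below are hypothetical; a real system would plug in its own recovery steps.

```python
def recover(actions):
    """Run recovery actions from mildest to most drastic.

    `actions` is an ordered list of (name, callable) pairs; each callable
    returns True if it restored correct operation. Returns the names of
    the actions attempted, in order; the last one is the action that
    succeeded, if any did.
    """
    attempted = []
    for name, action in actions:
        attempted.append(name)
        if action():
            break  # recovery succeeded, no further escalation needed
    return attempted


# Hypothetical ladder: retry fails, restart fails, failover works.
ladder = [
    ("retry request", lambda: False),
    ("restart component", lambda: False),
    ("fail over to standby", lambda: True),
    ("reboot system", lambda: True),
]
attempted = recover(ladder)
```

The most drastic step ("reboot system") is never reached here, because the escalation stops at the first action that takes effect.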

Detection patterns

Fault Correlation
• What fault is activating?
• “Look at the unique signature of the error to sort it into the fault category for which error processing steps are known.”
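A small sketch of signature-based correlation, assuming that the exception type serves as the error signature. The category names and the signature table are illustrative assumptions.

```python
# Hypothetical signature table: (predicate on the error, fault category).
SIGNATURES = [
    (lambda e: isinstance(e, TimeoutError), "network"),
    (lambda e: isinstance(e, MemoryError), "resource-exhaustion"),
    (lambda e: isinstance(e, ValueError), "bad-data"),
]


def correlate(error):
    """Return the fault category whose signature matches the error.

    Errors with no known signature fall into "unknown", for which no
    error processing steps are defined yet.
    """
    for matches, category in SIGNATURES:
        if matches(error):
            return category
    return "unknown"


category = correlate(TimeoutError("no reply from peer"))
```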

Error Containment Barrier
• What is the first thing that the system must do when it detects an error?
• “Isolate the error to a unit of mitigation. Stop the error flow with a barrier, quarantine and initiate either error recovery or error mitigation.”

System Monitor
• How does one part of a system keep track that another part is alive and functioning?
• “Create a Monitor to study system behavior, or the behavior of specific parts of the system to make sure that they continue operating correctly. When the watched components stop, the monitor should report the occurrence to the Fault Observer and initiate corrective actions.”
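One common way to realize a monitor is with heartbeats; this is an assumption, since the slide does not name a mechanism. In the sketch below the `report` callback stands in for the Fault Observer, and time is passed in explicitly so that the example stays deterministic.

```python
class SystemMonitor:
    """Heartbeat-based monitor (illustrative sketch)."""

    def __init__(self, timeout, report):
        self.timeout = timeout  # max allowed silence, in seconds
        self.report = report    # stand-in for the Fault Observer
        self.last_seen = {}     # component name -> time of last heartbeat

    def heartbeat(self, component, now):
        self.last_seen[component] = now

    def check(self, now):
        """Report every component that has been silent for too long."""
        stopped = [
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        ]
        for name in stopped:
            self.report(name)
        return stopped


reports = []
monitor = SystemMonitor(timeout=5.0, report=reports.append)
monitor.heartbeat("db", now=0.0)
monitor.heartbeat("web", now=8.0)
stopped = monitor.check(now=10.0)  # "db" has been silent for 10 seconds
```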


Existing Metrics
• How to measure the severity of an overload without contributing to the overload?
• “Use pre-existing indicators already tied to the resource as an indicator of the system’s overload condition.”
• Note: this holds not only for performance!
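As an illustration, the fill level of a bounded work queue is an indicator the system already maintains, so reading it adds almost no load. The sketch below uses Python's standard `queue.Queue`; the `overload_level` helper is a hypothetical name.

```python
import queue


def overload_level(work_queue):
    """Return the queue fill ratio in [0, 1] as an overload indicator.

    Uses only bookkeeping the queue already does (qsize and maxsize),
    so the measurement does not add work of its own. Unbounded queues
    (maxsize <= 0) have no capacity to compare against and yield 0.0.
    """
    if work_queue.maxsize <= 0:
        return 0.0
    return work_queue.qsize() / work_queue.maxsize


work_queue = queue.Queue(maxsize=4)
for job in ("a", "b", "c"):
    work_queue.put(job)
level = overload_level(work_queue)
```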


Routine Maintenance
• How can we keep preventable errors from occurring?
• “Perform routine, preventive maintenance on the system.”


Routine Exercises
• How do you know that Redundant elements that will be called into service by a Failover in case of an error or failure will actually work?
• “Routinely exercise, or execute the system components that will be required in an error situation. This will identify latent faults.”
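A sketch of a routine exercise, assuming each standby component is a callable that accepts a probe request. The component names and the probe are hypothetical; in practice the function would be run on a schedule.

```python
def exercise(standbys, probe):
    """Send a probe request to every standby component.

    Returns the names of the components that failed the probe, i.e. those
    with latent faults that a real failover would otherwise have exposed.
    """
    faulty = []
    for name, component in standbys.items():
        try:
            component(probe)
        except Exception:
            faulty.append(name)
    return faulty


def healthy_standby(request):
    return f"ok: {request}"


def broken_standby(request):
    # Hypothetical latent fault, invisible until the component is used.
    raise ConnectionError("stale configuration")


latent = exercise(
    {"standby-1": healthy_standby, "standby-2": broken_standby},
    probe="ping",
)
```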


Recovery patterns

Quarantine
• How can the system prevent errors from spreading?
• “Establish a barrier around the element that prevents it from both contributing to the useful work and also prevents it from propagating its error into other parts of the system.”
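The barrier can be sketched as a wrapper that refuses to forward calls once the element has been isolated. The class names and the faulty element below are illustrative assumptions.

```python
class QuarantinedError(Exception):
    """Raised when a quarantined element is asked to do work."""


class Quarantine:
    """Barrier around an element (any callable doing useful work)."""

    def __init__(self, element):
        self._element = element
        self.isolated = False

    def isolate(self):
        self.isolated = True

    def call(self, *args):
        if self.isolated:
            # The element neither contributes work nor spreads its error.
            raise QuarantinedError("element is quarantined")
        return self._element(*args)


def faulty_doubler(x):
    return x * 2 + 1  # hypothetical faulty element: result is off by one


guarded = Quarantine(faulty_doubler)
before = guarded.call(2)  # the wrong result still propagates
guarded.isolate()         # from now on every call is blocked
```

After `isolate()`, callers receive an explicit `QuarantinedError` instead of a silently wrong result, so they can route the work elsewhere.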
