Application of Fault Injection to Globus Grid Middleware
Nik Looker & Jie Xu, University of Leeds, Leeds, LS2 9JT, UK
Tianyu Wo & Jinpeng Huai, Beihang University, Beijing, PRC
School of Computing, Faculty of Engineering
A Historical Perspective
Dependability & Security
To understand dependability, it is important to understand the three main concepts it relies on:
Attributes: measurements of how dependable and secure a system is
Threats: things that may affect the dependability and security of a system
Means: ways of increasing the dependability and security of a system
Attributes
Availability: the probability that a service is present and ready for use
Reliability: the capability of maintaining the service and service quality
Safety: the absence of catastrophic consequences
Confidentiality: information is accessible only to those authorised to use it
Integrity: the absence of improper system alterations
Maintainability: the ability to undergo modifications and repairs
Threats
Fault: a defect in a system
Error: a discrepancy between the behaviour of a system and its specified behaviour within the system boundary, i.e. the system enters an unspecified state
Failure: an instance in time when a system displays behaviour that is contrary to its specification at the system boundary
Fault-Error-Failure Chains
As a general rule:
A fault, when activated, can lead to an error
An error is an invalid state
An invalid state generated by an error may lead to either another error or a failure
A generated error can be treated as another fault
A failure is an observable deviation from the specified behaviour at the system boundary
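The chain can be illustrated with a toy component. This is only a sketch: the `Counter` class, its off-by-one defect, and the `run` helper are hypothetical names invented for illustration and do not come from the tools discussed in these slides. The specification assumed here is that the counter value never exceeds `limit`.

```python
class Counter:
    """Toy component with a dormant fault: the limit check is off by one."""

    def __init__(self, limit):
        self.limit = limit
        self.value = 0

    def increment(self):
        # Fault: the check should be `>=`. The defect stays dormant
        # until the counter is incremented while value == limit.
        if self.value > self.limit:
            raise OverflowError("limit reached")
        self.value += 1

    def in_error(self):
        # Error: the internal state violates the specification.
        return self.value > self.limit

    def read(self):
        # System boundary: the value a user of the service observes.
        # Returning an out-of-spec value here is the failure.
        return self.value


def run(steps, limit=3):
    """Drive a fresh counter through `steps` increments (hypothetical helper)."""
    counter = Counter(limit)
    for _ in range(steps):
        counter.increment()
    return counter
```

With `limit=3`, three increments leave the fault dormant. A fourth increment activates it and puts the counter into an erroneous state, which becomes a failure only once `read()` delivers the invalid value across the system boundary.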
Means
Dependability means are ways of breaking fault-error-failure chains.
Four main classifications:
Fault Prevention
Fault Removal
Fault Forecasting
Fault Tolerance
Fault Injection
MTBF (mean time between failures) may be very large
Attempt to speed up this process by injecting faults
Cause the execution of seldom-used control pathways within a system
Either:
  a failure may occur,
  the system's fault tolerance mechanism will handle the fault,
  or the failure will go undetected and uncorrected :-(
Network Level Fault Injection: corrupt, drop, or reorder packets
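The three network-level fault types can be sketched on an in-memory list of packets. This is a minimal illustration under stated assumptions, not the mechanism of any particular tool: `corrupt` and `inject` are hypothetical helpers, packets are plain `bytes` objects, and a real injector would sit in the network stack rather than operate on a list.

```python
import random


def corrupt(packet: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit in the packet."""
    if not packet:
        return packet
    data = bytearray(packet)
    index = rng.randrange(len(data))
    data[index] ^= 1 << rng.randrange(8)
    return bytes(data)


def inject(packets, fault, rng):
    """Apply one fault ('corrupt', 'drop' or 'reorder') to a copy of the list."""
    packets = list(packets)
    if not packets:
        return packets
    i = rng.randrange(len(packets))
    if fault == "corrupt":
        packets[i] = corrupt(packets[i], rng)
    elif fault == "drop":
        del packets[i]
    elif fault == "reorder":
        if len(packets) > 1:
            # Swap the chosen packet with its successor (wrapping around).
            j = (i + 1) % len(packets)
            packets[i], packets[j] = packets[j], packets[i]
    else:
        raise ValueError(f"unknown fault type: {fault}")
    return packets


# Usage: a seeded generator makes an injection campaign repeatable.
rng = random.Random(0)
stream = [b"aa", b"bb", b"cc"]
dropped = inject(stream, "drop", rng)
reordered = inject(stream, "reorder", rng)
corrupted = inject(stream, "corrupt", rng)
```

Because the injector sees only opaque bytes, it cannot target a specific field of a higher-level message. A corrupted packet is also likely to be rejected by checksums before the middleware ever sees it.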
Network Level Fault Injection
Modified Network Level Fault Injection
The modified approach allows a fault injector to intercept an entire middleware message, so the message can be decoded and specific parts of it modified.
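A message-level injection of this kind might look like the sketch below. It assumes the intercepted message is a SOAP envelope that is already available as a string. The `setTemperature` operation, its `value` parameter, and the `inject_parameter_fault` helper are hypothetical and are not taken from WS-FIT or Grid-FIT.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical intercepted message: one RPC call with a single parameter.
MESSAGE = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
    "<soap:Body>"
    "<setTemperature><value>21.5</value></setTemperature>"
    "</soap:Body>"
    "</soap:Envelope>"
)


def inject_parameter_fault(message: str, tag: str, mutate) -> str:
    """Decode a whole message, rewrite every element named `tag`, re-encode it."""
    root = ET.fromstring(message)
    for element in root.iter(tag):
        element.text = mutate(element.text)
    return ET.tostring(root, encoding="unicode")


# Scale the parameter by 100 while leaving the envelope well formed.
faulty = inject_parameter_fault(
    MESSAGE, "value", lambda text: str(float(text) * 100)
)
```

Unlike a random bit flip at packet level, the modified message remains syntactically valid, so it reaches the service logic and exercises its handling of out-of-range input.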
Grid-FIT
Injecting Faults in a Production Environment
System Model
Extended Fault Model
Extended Failure Model
Failure Detection
Application to Globus
Initial experiments were based around Web Services
This resulted in the WS-FIT tool (Web Service - Fault Injection Technology)
The ultimate aim was to apply this method to Grids
This has resulted in the Grid-FIT tool
Modifications and initial experiments have been conducted:
Modified hooks to work with Globus
Adapted FIT decoding to the Globus message structure
Repeated an earlier set of experiments, rewritten for Globus 4
Test Case
Results
Future Work
Apply Grid-FIT to complex systems
CoLaB: short for Collaboration of Leeds and Beihang, a joint laboratory founded by Beihang University, PRC, and the University of Leeds, UK. The primary mission of CoLaB is research in Software and Security, each linked through a common objective: to support the needs of the next generation of Internet computing.
CROWN: short for China Research and Development environment Over Wide-area Network, a grid test bed to facilitate scientific activities in different disciplines.
We are currently working on integrating Grid-FIT with CROWN
This will give Grid-FIT a large test bed to refine its method and models
This will give CROWN a native dependability assessment method
Part of the integration will be to integrate Grid-FIT as an Eclipse plug-in
Demonstrations & Workshop
Demonstrations
Venue: White Rose Grid Stall
Wednesday 20th September, 13:45 – 14:30
Thursday 21st September, 10: :45
CROWN: Tianyu Wo
FT-Grid: Paul
Grid-FIT: Nik
Mini-Workshop on UK-China e-Science Collaborations
Venue: Conference Room 1
Wednesday 20th September, 17: :00