Fault Tolerance in Distributed Systems Gökay Burak AKKUŞ Cmpe516 – Fault Tolerant Computing.



Distributed Systems
• Main focus is on service-based systems
  - Web Services
  - Grid Computing
  - ...

Service Orientation
• Services are written in diverse programming languages
• They run on diverse platforms
• They span organisational boundaries
• Service-Oriented Architectures (SOA)
  - Web Services
  - Grid Computing
• SOA is an architectural model that emphasises interoperability and location transparency
• An SOA is a collection of services; each service can be considered a resource that is either provided or consumed

Dependability
• Dependability is a collective term that encompasses
  - Reliability
  - Performance
  - Maintainability
  - Security
• Reliability is the part of dependability concerned with the probability that a given system will behave according to its requirements

SOAs
• SOAs support the development and integration of complex systems by representing software functionality as discoverable services on a network
• A traditional way to increase the dependability of distributed systems is to use fault tolerance techniques

• The approach of design diversity
  - Multi-Version Design (MVD)
• Uses the availability of multiple functionally-equivalent services

Comparison
• Single-version system
• Traditional MVD system
• Provenance-aware MVD system

CMF
• Common-mode failure (CMF): if one of the shared services fails, the failure may propagate back to the calling services
• A CMF occurs when independent or non-independent faults lead to similar errors between versions of an MVD system

• Such failures are a "worst case" scenario in a fault-tolerant system, because they may pass through the system undetected
• It is often safer to return no result and alert an operator and/or place the system in a safe state than to allow an undetected error to occur

CMF by failure of a shared service
• Reduces the confidence that can be placed in the results of design-diversity-based fault tolerance schemes
• Provenance is introduced as a solution to this problem

Provenance
• The provenance of a piece of data is the documentation of the process that led to that data
• Provenance can be used to verify a process, to reproduce a process, and to provide context for a piece of result data

Provenance in the context of SOAs
• Interaction provenance: for some data, the documentation of the interactions between actors that led to the data
• Actor provenance: for some data, documentation that can only be provided by a particular actor, pertaining to the process that led to the data
• In a workflow-based SOA, interaction provenance provides a record of the invocations of all the services used in a given workflow, including the input and output data of each invoked service

Usage of provenance
• By analysing interaction provenance, patterns in workflow execution can be detected
• Knowing whether a common service was invoked by several other services in a workflow lets a fault tolerance algorithm check whether faults in the workflow stem from the misbehaviour of a single service
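
A minimal Python sketch of this analysis, assuming each interaction provenance record has already been reduced to a (channel, invoked service) pair. The record layout, the service names, and the functions `invoking_channel_counts` and `shared_services` are hypothetical and are not part of any real provenance API.

```python
from collections import defaultdict


def invoking_channel_counts(records):
    """Map each service to the number of distinct channels that invoked it."""
    channels_by_service = defaultdict(set)
    for channel, service in records:
        channels_by_service[service].add(channel)
    return {service: len(channels) for service, channels in channels_by_service.items()}


def shared_services(records):
    """Services invoked by more than one channel: candidate sources of common-mode failure."""
    counts = invoking_channel_counts(records)
    return sorted(service for service, count in counts.items() if count > 1)


# Hypothetical provenance: two import duty channels both rely on exchange rate service ER3.
records = [
    ("ID1", "ER1"), ("ID1", "TL1"),
    ("ID2", "ER3"), ("ID2", "TL2"),
    ("ID3", "ER3"), ("ID3", "TL3"),
]
print(shared_services(records))
```

In this made-up workflow only ER3 is reported, so a fault in ER3 could corrupt two channels at once.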

• Provenance provides a picture of a system's current and past operational state, which can be used to isolate and detect faults
• A scheme is proposed that performs voting on the results of functionally-equivalent services in order to mask faults of the fault model (next slide)

PReServ
• Provenance Recording for Services
• A Java-based Web Services implementation of the Provenance Recording Protocol
• Makes an SOA provenance-aware using three components:
  - A provenance store that stores provenance and allows it to be queried
  - A client-side library for communicating with the provenance store
  - A handler for the Apache Axis Web Service container that automatically records interaction provenance for Axis-based services and clients by recording incoming and outgoing SOAP messages in a specified provenance store

MVD system
• A service i invokes k services in its workflow
• A counter Ck stores the number of times a service k is invoked by MVD channel workflows in the system
• If i produces a result that agrees with the consensus result, then Sk is increased by one for every service k in i's workflow; otherwise Sk is set to 0
• The weighting of each service k is then calculated as: (formula not preserved in the transcript)
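
The counter bookkeeping on this slide can be sketched in Python as follows. This is an illustration under assumptions: the class `ServiceHistory` and its method names are hypothetical, and because the weighting formula itself is missing from the transcript, the sketch takes the weighting function as a caller-supplied parameter instead of guessing it.

```python
from collections import defaultdict


class ServiceHistory:
    """Tracks C_k (invocation count) and S_k (agreement counter) for each service k."""

    def __init__(self):
        self.invocations = defaultdict(int)  # C_k
        self.agreements = defaultdict(int)   # S_k

    def record(self, workflow_services, agreed_with_consensus):
        """Update the counters for every service in one channel's workflow."""
        for k in workflow_services:
            self.invocations[k] += 1
            if agreed_with_consensus:
                self.agreements[k] += 1
            else:
                self.agreements[k] = 0  # reset on disagreement, as the slide states

    def weightings(self, weight_fn):
        """Apply a caller-supplied weight_fn(S_k, C_k); the source formula is not known here."""
        return {k: weight_fn(self.agreements[k], c) for k, c in self.invocations.items()}


history = ServiceHistory()
history.record(["ER1", "TL1"], agreed_with_consensus=True)
history.record(["ER3", "TL1"], agreed_with_consensus=False)
history.record(["ER1", "TL1"], agreed_with_consensus=True)

# The ratio below is an arbitrary stand-in chosen for the demo, NOT the formula from the slides.
print(history.weightings(lambda s, c: s / c))
```

Keeping the weighting function pluggable means the bookkeeping stays valid whichever formula the original scheme used.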

Voting
• The FT-Grid system is used for voting
• Results are eliminated from the vote based on their weightings
• User-defined values are also fed into the voting process

• If a service k1 has a degree of 1, then only one MVD channel invokes that service
• If k1 has a degree of 2, then two MVD channels invoke it
• The weightings Sk are then biased according to user-defined settings
• Example: if a user specifies a bias of 0.95 for a service with a degree of 2, then the final weighting of a service i with a degree of 2 is Wi = Si * 0.95
• If any service within a given channel falls below a user-defined minimum weighting, that channel is discarded from the voting process
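
A short Python sketch of the biasing and discard steps above. The function names, the sample weightings, and the minimum of 0.6 are hypothetical; treating a degree with no user-defined bias as a bias of 1.0 is also an assumption of this sketch.

```python
def final_weighting(history_weighting, degree, bias_by_degree):
    """W_i = S_i * bias, where the bias is chosen by the service's degree."""
    # Assumption: degrees without a user-defined bias are left unbiased (factor 1.0).
    return history_weighting * bias_by_degree.get(degree, 1.0)


def channels_to_vote(channels, weightings, degrees, bias_by_degree, minimum):
    """Keep only channels in which every service meets the minimum weighting."""
    kept = []
    for channel, services in channels.items():
        if all(
            final_weighting(weightings[s], degrees[s], bias_by_degree) >= minimum
            for s in services
        ):
            kept.append(channel)
    return kept


# Hypothetical data: ER3 is shared by two channels (degree 2) and has a weak history.
channels = {"ID1": ["ER1", "TL1"], "ID2": ["ER3", "TL2"], "ID3": ["ER3", "TL3"]}
weightings = {"ER1": 0.9, "ER3": 0.6, "TL1": 0.9, "TL2": 0.9, "TL3": 0.9}
degrees = {"ER1": 1, "ER3": 2, "TL1": 1, "TL2": 1, "TL3": 1}

print(channels_to_vote(channels, weightings, degrees, {2: 0.95}, minimum=0.6))
```

With these numbers the bias pushes ER3 just below the minimum, so both channels that depend on it are dropped before voting.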

Experiments
• A total of 12 web services were developed and spread across 5 machines
• Apache Tomcat/Axis is used as the hosting environment
• Each service has provenance functionality and is registered with a UDDI server
• 5 "Import Duty" services developed
• 4 "Exchange Rate" services developed
• 3 "Tax Lookup" services developed

• A design defect and/or malicious attack is simulated by perturbing code in two of the exchange rate services, ER3 and ER4
• Their probabilities of failure (in this case, returning an incorrect value) are 0.33 and 0.5 respectively
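
One way such a perturbation could be simulated is sketched below in Python. The wrapper `make_faulty`, the stand-in `exchange_rate` function, and the choice of adding 1.0 to corrupt a result are all assumptions made for illustration; the slides do not say how the original code was perturbed.

```python
import random


def make_faulty(service, failure_probability, rng):
    """Wrap a service so it returns an incorrect value with the given probability."""
    def perturbed(*args):
        correct = service(*args)
        if rng.random() < failure_probability:
            return correct + 1.0  # arbitrary wrong value, chosen only for the demo
        return correct
    return perturbed


def exchange_rate(amount):
    """Hypothetical fault-free exchange rate service."""
    return amount * 1.25


rng = random.Random(0)  # seeded so the run is repeatable
er3 = make_faulty(exchange_rate, 0.33, rng)
er4 = make_faulty(exchange_rate, 0.5, rng)

trials = 10_000
er3_rate = sum(er3(100.0) != exchange_rate(100.0) for _ in range(trials)) / trials
er4_rate = sum(er4(100.0) != exchange_rate(100.0) for _ in range(trials)) / trials
print(f"ER3 observed failure rate: {er3_rate:.3f}")
print(f"ER4 observed failure rate: {er4_rate:.3f}")
```

Over many calls the observed failure rates should settle near the configured probabilities.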

Applied Experiments
• Experiment 1
  - Execute a single-version client-side application that invokes a random import duty service, passing it a randomly generated set of parameters
  - The application then compares the result it receives against the fault-free local import duty service, and logs whether or not a correct answer has been returned

• Experiment 2
  - Execute a client-side MVD application with no provenance capability
  - The application invokes all 5 import duty services and waits for the first three results to be returned
  - The application discards the results of any import duty service whose weighting falls below a user-defined value, and performs consensus voting on the remaining results
  - If no consensus is reached, or fewer than three channels remain to vote on, the client waits for an additional MVD channel to return results, checks the channel's weighting to see whether it should be discarded, and then votes accordingly
  - This continues until either consensus is reached or all 5 channels have been invoked
  - The results are then compared against the fault-free reference, as in Experiment 1

• Experiment 3
  - Execute an MVD client-side application with provenance capability
  - The client invokes all 5 import duty services and waits for the first three results to be returned
  - It analyses the provenance records of these channels and discards the results of any channel that includes a service falling below a minimum, user-defined weighting
  - If no consensus is reached, or fewer than three channels remain to vote on, the MVD application waits for an additional channel to return results, checks whether this channel should be discarded, and then votes accordingly
  - This continues until either consensus is reached or all 5 channels have been invoked
  - Results from the voter are then compared against the local fault-free import duty service
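
The wait-discard-vote loop shared by Experiments 2 and 3 can be sketched in Python as below. Several points are assumptions of the sketch and not taken from the slides: consensus is taken to mean a strict majority of the remaining votes, results are compared with exact equality, and `None` is returned when all channels are exhausted without agreement. The function names and sample values are hypothetical.

```python
from collections import Counter


def consensus(values):
    """Return the value held by a strict majority of the votes, or None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def incremental_vote(results_in_arrival_order, should_discard, initial=3):
    """Vote once `initial` results have arrived, adding one channel at a time if needed."""
    votes = []
    for received, (channel, value) in enumerate(results_in_arrival_order, start=1):
        if not should_discard(channel):
            votes.append(value)
        # Only vote when enough channels have returned and enough survive the discard step.
        if received >= initial and len(votes) >= initial:
            agreed = consensus(votes)
            if agreed is not None:
                return agreed
    return None  # no consensus after every channel has returned


# Hypothetical run: ID2 and ID3 share a faulty service and return the same wrong value.
results = [("ID1", 10.0), ("ID2", 12.0), ("ID3", 12.0), ("ID4", 10.0), ("ID5", 10.0)]

print(incremental_vote(results, lambda channel: False))
print(incremental_vote(results, lambda channel: channel in {"ID2", "ID3"}))
```

In this made-up run, voting without a discard rule accepts the shared wrong value 12.0 as soon as three results are in, which is a common-mode failure; discarding the two channels that share the faulty service delays the vote until three independent results agree on 10.0.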

Experimental Results
• Each experiment iterates 1000 times
• Each experiment is repeated three times
• Test system
  - Apache Tomcat
  - Web Services implemented using Apache Axis 1.1
  - 5 dual 3 GHz Xeon processor machines
  - Fedora Core 2 Linux

Generation of Weightings
• A history-based weighting scheme is used
• A client application similar to the provenance-aware MVD scheme is run
• History weightings are based on the consensus results of 1000 invocations of all five import duty services
• No logging or verification of results

• The weightings of ER3 and ER4 show significant deviations
• This is due to the faults injected into ER3 and ER4
• Minimum acceptable weightings are set based on these results

Experiment 1: Single-version system with no provenance capability
• 1000 tests on a random import duty service
• 164 incorrect results
• 16.4% undetected incorrect results
• Time for UDDI query of import duty service: ms
• Total time until a result: 3895 ms

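Experiment 2: Traditional MVD system with no provenance capability
• Common-mode failures are frequent
• Each channel has approximately the same weighting value, as there is no provenance data
• So unreliable channels are not discarded from voting
• Total time until a result: 4842 ms
• About 1 s (947 ms) longer than the 3895 ms of the single-version system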

Experiment 3: MVD system with provenance capability
• Not a single common-mode failure occurs
• Timing: approximately the same as in Experiment 2

Conclusion
• Solutions for providing dependability in service-oriented architectures are needed
• Approach: extend the concept of design-diversity-based fault tolerance schemes (such as multi-version design) to the service-oriented paradigm
• Leverage the benefits of SOAs in order to produce cheaper MVD systems than has traditionally been the case
• Problem: without knowledge of the workflow of the services that form channels within the MVD system, multiple channels may depend on the same service
• This leads to an increased incidence of common-mode failure

Conclusion
• Provenance is proposed as a technique for analysing a service's workflow
• An initial scheme is detailed that uses provenance to calculate weightings of channels within an MVD system based on their workflow
• A system is implemented to demonstrate the effectiveness of the scheme
• Three different client applications are used to test the approach
  - Single-version system: fails on 16.4% of test iterations
  - Traditional MVD fault tolerance: fails on 7.6% of test iterations
  - Provenance-aware MVD scheme: failure rate of 0.6%
• The provenance-aware scheme is more dependable, with no common-mode failures occurring and negligible performance overhead

Finally
• This paper
  - Details the potential for provenance data to be used during the voting process of an MVD scheme
  - Implements an initial proof of concept for the approach
• Future work will include
  - Investigating how to obtain QoS indicators from the metadata of each service in an MVD channel's workflow (facilitated through actor provenance) and applying these to the weighting algorithm
  - Investigating the relationship between shared components and common-mode failure in more detail (to tune the voting scheme more finely)

References
• A Provenance-Aware Weighted Fault Tolerance Scheme for Service Based Applications, 2005
• FT-Grid: A Fault-Tolerance System for e-Science, 2005

Questions?