1 Reliable Web Services by Fault Tolerant Techniques: Methodology, Experiment, Modeling and Evaluation Term Presentation Presented by Pat Chan 3 May 2006
2 Outline Introduction Problem Statement Methodologies for Web Service Reliability New Reliable Web Service Paradigm Road Map for Experiment Experimental Results and Discussion Conclusion
3 Introduction Service-oriented computing is becoming a reality. Web services are a promising technology on the Internet, offering the benefits of interoperability, reusability, and adaptability. Reliability is an important issue. The existing Web service model needs to be extended to assure survivability and reliability. We propose experimental settings and offer a roadmap to dependable Web services.
4 Reliability "a measure of the success with which the system conforms to some authoritative specification" Guaranteed delivery Duplicate elimination Ordering Crash tolerance State synchronization
5 What are Web Services? Self-contained, modular applications built on deployed network infrastructure including XML and HTTP. Use open standards for description (WSDL), discovery (UDDI) and invocation (SOAP).
6 Web Services Internet UDDI WSDL HTTP/SOAP WSDL
7 Web Services Architecture [Figure: layered architecture diagram. Labels: SOAP; HTTP/SMTP; XML; TCP/IP; Directory; Inspection; Building Block Modules; Inter-Application Protocols; Referral; Routing; Security; License; Eventing; Transactions; Reliable Messaging; The Internet; Description]
8 Web Services Benefits of WS Service-oriented Highly accessible Open specification Easy integration Simplicity Dynamic Standard Web Services Build common infrastructure reducing the barriers of business integration with lower costs and faster speed.
9 Problems of Web Services Transaction: atomicity is not provided. Security: insecure Internet transportation. Reliability: the Internet is inherently unreliable. No single underlying transport protocol addresses all the reliability issues.
10 Problem Statement Fault-tolerant techniques: replication and diversity. Replication is an efficient way of providing reliable systems through time or space redundancy. It increases the availability of distributed systems: key components are re-executed or replicated to protect against hardware malfunctions or transient system faults. Another efficient technique is design diversity: independently designing software systems or services with different programming teams defends against permanent software design faults. We focus on the analysis of replication techniques when applied to Web services. A generic Web service system with spatial as well as temporal replication is proposed and investigated.
11 Methodologies for Reliable Web services -- Redundancy Spatial redundancy. Static redundancy: all replicas are active at the same time and voting takes place to obtain a correct result. Dynamic redundancy: one replica is active at a time while the others are kept in an active or standby state. Temporal redundancy: redundancy in time.
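As a minimal sketch of static redundancy, the following Python function takes the replies of all active replicas and returns the value that a strict majority agrees on. The function name `majority_vote` and the use of `None` for a missing reply are assumptions made for illustration; they are not part of the presented system.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(replies: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the reply given by a strict majority of replicas, else None.

    A reply of None stands for a replica that failed to answer (assumption).
    """
    answered = [r for r in replies if r is not None]
    if not answered:
        return None
    value, count = Counter(answered).most_common(1)[0]
    # Require more than half of ALL replicas, not just those that answered.
    return value if count > len(replies) / 2 else None


if __name__ == "__main__":
    print(majority_vote([42, 42, 41]))    # two of three replicas agree
    print(majority_vote([42, None, 41]))  # no majority among three replicas
```

Counting against the total number of replicas, rather than only those that answered, is one possible design choice: it prevents a single surviving replica from being treated as a majority.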
12 Methodologies for Reliable Web services -- Diversity Protect redundant systems against common-mode failures With different designs and implementations, common failure modes will probably cause different error effects. N-version programming, recovery blocks…
13 Failure Response Stages of Web Services Fault confinement Fault detection Diagnosis Fail-over Reconfiguration Recovery Restart Repair Reintegration
14 [Figure: failure response stages. Labels: Fault Confinement; Fault Detection; Failover; Diagnosis; Online; Offline; Reconfiguration; Recovery; Restart; Repair; Reintegration]
15 Proposed Paradigm
[Figure labels: Replication Manager (Web service selection algorithm, WatchDog); UDDI Registry; WSDL; three replicas, each labeled Web Service, IIS, Application, Database; Client; Port; Application; Database]
1. Create Web services.
2. Select the primary Web service (PWS).
3. Register.
4. Look up.
5. Get WSDL.
6. Invoke the Web service.
7. Keep checking the availability of the PWS.
8. If the PWS has failed, reselect the PWS.
9. Update the WSDL.
16 Work Flow of the Replication Manager [Flowchart: the RM sends a message to the Web service. Get reply: the RM keeps polling. Do not get reply: reselect a primary Web service, then map the new address to the WSDL. All services failed: system fail.]
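The replication manager work flow can be sketched as a single polling step. This is an illustrative sketch only: the class name, the `is_alive` probe, and the "first replica that answers" selection rule are assumptions, since the slides do not specify the selection algorithm or how the WSDL is updated.

```python
from typing import Callable, List, Optional


class ReplicationManager:
    """Hypothetical watchdog that keeps one replica selected as primary."""

    def __init__(self, replicas: List[str], is_alive: Callable[[str], bool]):
        self.replicas = replicas
        self.is_alive = is_alive  # probe supplied by the caller (assumption)
        self.primary: Optional[str] = replicas[0] if replicas else None

    def check_once(self) -> Optional[str]:
        """Run one polling step and return the current primary address.

        Returns None when every replica has failed (system failure).
        """
        if self.primary is not None and self.is_alive(self.primary):
            return self.primary  # reply received: nothing to do
        # No reply: reselect the first replica that answers the probe.
        for address in self.replicas:
            if self.is_alive(address):
                self.primary = address  # stands in for updating the WSDL
                return address
        self.primary = None  # all services failed
        return None


if __name__ == "__main__":
    down = {"ws1"}
    rm = ReplicationManager(["ws1", "ws2", "ws3"], lambda a: a not in down)
    print(rm.check_once())
```

In a running system `check_once` would be called on a timer at the polling interval; the timer is left out so the step itself stays easy to test.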
17 Road Map for Experiment Research Redundancy in time Redundancy in space Sequentially Parallel Majority voting using N modular redundancy Diversified version of different services
18 Experiments A series of experiments is designed and performed to evaluate the reliability of the Web service under three configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication. We also perform retry or failover when the Web service is down.
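To make the hybrid of retry and failover concrete, here is a minimal client-side sketch. The function name, the `call` parameter, and the use of `ConnectionError` to signal a failed invocation are assumptions for illustration; the slides do not describe the client code used in the experiments.

```python
from typing import Callable, Sequence


def invoke_with_retry_and_failover(
    call: Callable[[str], str],
    replicas: Sequence[str],
    retries_per_replica: int = 1,
) -> str:
    """Try each replica in order; retry a failed call before failing over."""
    last_error: Exception = RuntimeError("no replicas configured")
    for address in replicas:                      # failover: redundancy in space
        for _ in range(1 + retries_per_replica):  # retry: redundancy in time
            try:
                return call(address)
            except ConnectionError as exc:
                last_error = exc
    # Every replica failed on every attempt.
    raise last_error
```

Setting `retries_per_replica=0` gives failover only, and passing a single replica gives retry only, so the same sketch covers the three recovery strategies compared in the experiments.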
19 Summary of the Experiments (entries are experiment numbers)
                              None   Retry/Reboot   Failover   Both (hybrid)
Single service, no retry      0      --             --         --
Single service with retry     --     1              --         --
Single service with reboot    --     2              --         --
Spatial replication           --     --             3          4
20 Parameters of the Experiments
Parameter: Current setting/metric
Request frequency: 1 req/min
Polling frequency: 5 ms
Number of replicas: 5
Client timeout period for retry: 10 s
Failure rate λ: # failures/hour
Load (profile of the program): % or load function
Reboot time: 10 min
Failover time: 1 s
21 Experimental Results Experiments over a 360-hour period (43200 reqs). [Table: columns Normal, Resource Problem, Entry Point Failure, Network Level Fault Injection; one row per experiment (Exp); cell values not preserved.] Retry: 11.97% to 4.93%. Reboot: 11.97% to 6.44%. Failover: 11.97% to 3.56%. Retry and Failover: 11.97% to 2.59%.
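The four before/after pairs on this slide can be compared as relative reductions. The sketch below only does the arithmetic on the quoted figures; the helper name `relative_reduction` is hypothetical, and nothing is assumed about the metric beyond what the slide states.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# Before/after percentages as quoted on the slide.
results = {
    "Retry": (11.97, 4.93),
    "Reboot": (11.97, 6.44),
    "Failover": (11.97, 3.56),
    "Retry and Failover": (11.97, 2.59),
}

for technique, (before, after) in results.items():
    print(f"{technique}: {relative_reduction(before, after):.1%} lower")
```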
22 Number of Failures When the Server Is in a Normal Situation
23 Number of Failures When the Server Is Busy
24 Number of Failures When the Server Reboots Periodically
25 Network Level Fault Injection
26 Reliability of the System Over Time
27 Reliability Model [Figure: two state-transition diagrams, (a) and (b). State labels: S, S-1, S-2, S-j, S-j-1, S-n, P1, P2, F. Transition labels: λ*, λN, λ1, λ2, μ*c2, μ1C2, μ2C2, (1-c1)μ*, (1-c1)μ1, (1-c1)μ2, (1-c2)μ1, (1-c2)μ2]
28 Reliability Model
ID    Description                                        Value
λn    Network failure rate                               0.02
λ*    Web service failure rate                           0.025
λ1    Resource problem rate                              0.142
λ2    Entry point failure rate                           0.150
μ*    Web service repair rate                            0.286
μ1    Resource problem repair rate                       0.979
μ2    Entry point failure repair rate                    0.979
C1    Probability that the RM responds on time           0.9
C2    Probability that the server reboots successfully   0.9
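As a rough illustration of how failure and repair rates combine, the sketch below computes the steady-state availability of a deliberately simplified two-state (up/down) Markov model using only λ* and μ*. This is not the multi-state model of the presentation: it ignores the coverage factors C1 and C2 and the other failure classes, and the function name is hypothetical.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Availability of a two-state up/down Markov model: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


# Values taken from the parameter table; both rates share the same time unit.
LAMBDA_STAR = 0.025  # Web service failure rate
MU_STAR = 0.286      # Web service repair rate

if __name__ == "__main__":
    print(round(steady_state_availability(LAMBDA_STAR, MU_STAR), 4))
```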
29 Reliability with Different Failure Rates [Figure: SHARPE plot; labels: Failure rate, Reliability]
30 Conclusion Surveyed replication and design diversity techniques for reliable services. Proposed a hybrid approach to improving the availability of Web services. Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system. Developed the Reliability Model for the proposed system.