Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department.

Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Miroslaw Malek Department of Computer Science and Engineering Humboldt University Berlin, Germany Presented by Pat Chan

Outline Introduction to Web Services Problem Statement
Methodologies for Web Service Reliability New Reliable Web Service Paradigm Optimal Parameters Experimental Results and Discussion Conclusion Introduction Problem Statement Methodologies for Web Service Reliability New Reliable Web Service Paradigm Road Map for Experiment Experimental Results and Discussion Conclusion Future Directions

Introduction Service-oriented computing is becoming a reality.
The problems of service dependability, security and timeliness are becoming critical. We propose experimental settings and offer a roadmap to dependable Web services. With service-oriented computing becoming a reality, there is an increasing demand for dependability. Service-oriented Architectures (SOA) are based on a simple model of roles. Every service may assume one or more roles such as being a service provider, a broker or a user (requestor). This model simplifies interoperability as only standard communication protocols and simple broker-request architectures are needed to facilitate exchange (trade) of services. Not surprisingly, the use of services, especially Web services, became a common practice. The expectations are that services will dominate software industry within the next five years. As services begin to pervade all aspects of life, the problems of service dependability, security and timeliness are becoming critical and appropriate solutions need to be found. Several fault tolerance approaches have been proposed for Web services in the literature [1, 2, 3, 4], but the field still requires appropriate solid theory, appropriate models, effective design paradigms, a practical implementation, and an in-depth experimentation for building highly-dependable Web services [5, 6]. In this paper, we propose such experimental settings and offer a roadmap to dependable Web services. We compose a list of parameters that are closely related to evaluating quality of service from the dependability perspective. We focus on service availability and timeliness and consider them a cornerstone of our approach. Security, which can also be viewed as a part of dependability, is beyond the scope of this paper.

Problem Statement Fault-tolerant techniques
Replication Diversity Replication is one of the efficient ways for providing reliable systems by time or space redundancy. Increasing the availability of distributed systems Key components are re-executed or replicated Protect against hardware malfunctions or transient system faults Another efficient technique is design diversity Employ independently designed software systems or services with different programming teams, Defend against permanent software design faults. We focus on the analysis of the replication techniques when applied to Web services. A generic Web service system with spatial as well as temporal replication is proposed and investigated. There are many fault-tolerant techniques that can be applied to the Web services including replication and diversity. Replication is one of the efficient ways for providing reliable systems by time or space redundancy. Redundancy has long been used as a means of increasing the availability of distributed systems, with key components being re-executed (replication in time) or replicated (replication in space) to protect against hardware malfunctions or transient system faults. Another efficient technique is design diversity. By independently designing software systems or services with different programming teams, diversity provides an ultimate resort in defending against permanent software design faults. In this paper, we focus on the analysis of the replication techniques when applied to Web services. We will analyze the performance and the availability of the Web services with using spatial and temporal redundancy and study the tradeoffs between them. A generic Web service system with spatial as well as temporal replication is proposed and investigated.

Proposed Paradigm Web Service Client UDDI IIS Replication Manager
RR Algorithm / Voting WatchDog UDDI Registry WSDL Web Service IIS Application Database Client Port Register Look up Get WSDL Invoke web service Keep check the availability of all the web. If Web service failed, update the list of availability of Web services Update the WSDL

Proposed Paradigm Round Robin Parallel N-Version Majority result
Web Service IIS Application Database Web Service IIS Application Database Web Service IIS Application Database Web Service IIS Application Database Web Service IIS Application Database Adv: Fully use the resource Majority result Voting Client

Experiments A series of experiments are designed and performed for evaluating the reliability of the Web service, Spatial replication 1 Reboot Retry

Testing system Best Route Finding.
Provide traveling suggestions for users. Starting point and destination. The system needs to provide the best route and the price for the users.

System Architecture

Experimental Setup Examine the computation to communication ratio
Examine the request frequency to limit the load of the server to 75% Fix the following parameters Computation to communication ratio (e.g 10:1) Request frequency

Experimental Setup Communication time: Computation time 143:14 (10:1)
Request frequency 1 request per min Load 78.5% Timeout period of retry 1 min Timeout for Web service in RM 1s (web service specific) Polling frequency 10 requests per min Number of replicas 5 Max number of retries Round-robin rate 1 s

Experiment Parameters
Fault mode Temporary (fault probability: 0.01) Permanent (fault probability: 0.001) Experiment time 5 days (7200 requests) Measure: Number of failures Average response time (ms) Failure definition: 5 retries are allowed. If there is still no correct result from the Web service after 5 retries, it is considered as a failure.

Experimental Result with Round-robin (failures / response time in ms)
Experiments Single server Single server with retry reboot (continues no response for 3 requests) server with retry and Spatial Replication RR Hybrid approach RR+Retry RR+ Reboot All round spatial + Retry (5 times) + reboots Normal case 0 / 183 0 / 193 0 / 190 0 / 187 0 / 188 0 / 195 Temp 705 / 190 0 / 223 723 / 231 0 / 238 711 / 187 0 / 233 726 / 188 0 / 231 Perm 6144 / -- 6337 / -- 1064 / -- 5 / 2578 5637 / -- 5532 / -- 152 / 187 0 / 191

Experimental Result with N-Version (failures / response time in ms)
Experiments Single server Single server with retry reboot (continues no response for 3 requests) server with retry and Spatial Replication Voting Hybrid approach Voting+ Retry Voting + Reboot All round spatial + (5 times) + reboots Normal case 0 / 318 0 / 320 0 / 315 0 / 319 0 / 322 0 / 321 Temp 429 / 321 0 / 356 423 / 364 0 / 325 0 / 324 Perm 3861 / -- 3864 / -- 614 / -- 3 / 4027 1544 / 323 1546 / 324 63 / 324 0 / 323

Varying the parameters
Number of tries Timeout period for retry in single server Timeout period for retry in our paradigm Polling frequency Number of replicas Load of server

Number of tries Number of tries Number of failures in Temp
failures in Perm 95 76 1 2 3 4 5

Timeout period for retry in single server
Number of failures in Temp Perm 95 7265 2 7156 5 7314 6 6890 7 189 8 82 9 11 10 12 14 16 18

Timeout period for retry in single server
# of failure If the timeout period for retry is too short, the server does not reboot on time to provide the service again. Timeout period

Timeout period for retry in our paradigm
(s) Number of failures in Temp Perm 2 81 5 10 20 As the timeout period for retry is too short, the replica cannot reboot on time and cause the number of failures increases. Also, another cause of the failure is that the replication manager cannot response on time to switch the primary web service. There are 5 tries, and the reboot time for a server is around 50 second,

Polling frequency Polling frequency (number of requests per min)
Number of failures in Temp in Perm 7124 1 811 2 30 5 12 10 15 213 254 20 1124 1023 If the polling frequency is low, the replication manager cannot response on time and the request will still send to the failed server and cause the failures in the system. When the polling frequency increases, the situation improves. The replication manager can response on time and reduce the number of failures

Polling frequency # of failure Polling frequency

Number of Replicas Number of replicas Number of failures in Temp
Number of failures in Perm No replica 91 8152 2 356 3 4

Load of Web Server Load of the web server (%) Number of failures in
Temp Perm 70 75 80 2 3 85 10 14 90 512 528 95 3214 3125 100 8792 8845 110 8997 8994

Summary of Parameters Number of tries = 2
Timeout period for retry in single server = 10s Timeout period for retry in our paradigm = 5s Polling frequency = 10 request per min Number of replicas = 3 Load of server = < 75%

Petri-Net (Four identical replicas)

Petri-Net (N-version Web service with voting)

Reliability Model (a) (b) P1 λ1 μ1c2 S-j P2 μ2c2 λ2 S S-n F λN μ*c2
λ* S-1 S-2 (1-c1)μ1 (1-c1)μ2 (1-c2)μ1 (1-c2)μ2 C1 = P( RM response on time ) C2 = P( reboot success ) We develop the reliability model of the proposed Web service paradigm with Markov chain model. The model is shown in Figure 8. In Figure 8(a), the state s represents the normal execution state of the system with n Web service replicas. In the event of an error, the primary Web service failed, the system will either go into the other states, i.e s-j which represents the system with n-j working replicas remaining, the replication manager response on time, or it will go to the failure state F with conditional probability c1. λ* denotes this error rate at which recovery cannot complete in this state and c1 represents the probability that the replication manager response on time to switch to another Web service. When the failed Web service is repaired, the system will back to the previous state, s-j+1. μ* denotes the rate at which successful recovery is performed in this state, c2 represents the probability that the failed Web service server reboot successfully. If the Web service is failed, it will go to another Web service. When all Web services are failed, the system enters the F state. λn is the network failure rate. s-1 to s-n represents the working state of the n Web service replicas and the reliability model of each Web service is shown in Figure 8(b). There are two types of failures are simulated in our experiments, they are P1 recourses problem (server busy) and P2 entry point failure (server reboot). If failure occurred in the Web service, either the Web service can be repaired with μ1 or μ2 repair rate with conditional probability c1 or the error cannot be recovered, it will go to the next state s-j-1 with one less Web service replica available. If the replication manager cannot response on time, it will go to the failure state. From the graph, two formulas can be obtained:

Reliability Model ID Description Value λn Network failure rate 0.02 λ*
Web service failure rate 0.025 λ1 Resource problem rate 0.142 λ2 Entry point failure rate 0.150 μ* Web service repair rate 0.286 μ1 Resource problem repair rate 0.979 μ2 Entry point failure repair rate C1 Probability that the RM response on time 0.9 C2 Probability that the server reboot successfully

Outcome (SHARPE)

Conclusion Surveyed replication and design diversity techniques for reliable services. Proposed a hybrid approach to improving the availability of Web services. Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system. Optimal parameters are obtained. In the paper, we surveyed replication and design diversity techniques for reliable services and proposed a hybrid approach to improving the availability of Web services. Furthermore, we carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system. From the experiments, we conclude that both temporal and spatial redundancy is important to the availability improvement of the Web service. In the future, we plan to test the proposed scheme with wider variety of fault injection scenarios. Moreover, we will also evaluate the schemes with design diversity techniques.

Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department.

Similar presentations

Presentation on theme: "Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department.

Similar presentations

Presentation on theme: "Reliable Web Services: Methodology, Experiment and Modeling International Conference on Web Services (ICWS 2007) Pat. P. W. Chan, Michael R. Lyu Department."— Presentation transcript:

Similar presentations

About project

Feedback