Published by Shanna Barrett. Modified over 9 years ago.
1
ATIF MEHMOOD MALIK KASHIF SIDDIQUE Improving dependability of Cloud Computing with Fault Tolerance and High Availability
2
Dependability. In systems engineering, dependability is a measure of a system’s availability, reliability, and maintainability. It is the ability of a system to deliver services that can justifiably be trusted. It is often considered the third axis of system quality.
3
Dependability ontology
4
Dependability challenges in cloud computing: lack of trust in shared virtualized infrastructures; management of a cloud computing service by a single provider or vendor, which is in effect a single point of failure; proprietary APIs; increased complexity from virtualization; higher resource utilization; common-mode outages; multiple administrative domains; legal and privacy implications.
5
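Threats to dependability: faults, errors, and failures. A fault is a defect or anomaly that can cause a system to deviate from its expected behavior; the incorrect state it produces is an error, and a failure occurs when the delivered service deviates from what is expected. Faults may arise from hardware failure, software bugs, user error, and network problems.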
6
Fault Tolerance. The ability of a system to continue providing services to its users when some of its components fail. Faults can be introduced at the application level, the virtual machine level, or the physical resource level.
7
Fault Tolerance. Application fault tolerance: application health is continuously monitored by special software components called sensors. A sensor may trigger specific procedures to start repairing an application that is malfunctioning. Example: VMware App HA.
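The sensor idea on this slide can be illustrated with a minimal sketch. It is not how VMware App HA is implemented; the class name `HealthSensor`, the `probe` and `repair` callbacks, and the `max_failures` threshold are all assumptions made for illustration.

```python
from typing import Callable


class HealthSensor:
    """Polls an application health probe and triggers repair after repeated failures."""

    def __init__(self, probe: Callable[[], bool], repair: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe                # returns True when the application is healthy
        self.repair = repair              # hypothetical repair procedure, e.g. a restart
        self.max_failures = max_failures  # assumed threshold, not taken from the slides
        self.consecutive_failures = 0

    def poll_once(self) -> bool:
        """Run one health check; return True if a repair was triggered."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False               # a crashing probe counts as unhealthy
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.repair()
            self.consecutive_failures = 0
            return True
        return False
```

Requiring several consecutive failed checks before repairing is one way to avoid restarting an application because of a single slow response; a real monitor would call `poll_once` on a timer.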
8
Fault Tolerance. Virtual machine fault tolerance: virtual machine failures can be detected by both the customer and the service provider. Customers can detect a virtual machine failure by monitoring its state with sensors deployed in the cloud. The cloud service provider can offer VM fault tolerance by installing a single sensor on each physical server that monitors all virtual machines hosted on that server.
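A provider-side sensor of this kind could be sketched as a heartbeat table kept per physical server. The sketch below assumes each VM reports a heartbeat and that a VM is considered failed once it has been silent longer than a timeout; `HostSensor`, `heartbeat`, and `failed_vms` are hypothetical names, and timestamps are passed in explicitly so the logic stays deterministic.

```python
from typing import Dict, List


class HostSensor:
    """One sensor per physical server; tracks the last heartbeat of every hosted VM."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s               # assumed silence limit in seconds
        self.last_seen: Dict[str, float] = {}    # vm_id -> time of last heartbeat

    def heartbeat(self, vm_id: str, now: float) -> None:
        """Record that vm_id was alive at time `now`."""
        self.last_seen[vm_id] = now

    def failed_vms(self, now: float) -> List[str]:
        """Return the VMs whose last heartbeat is older than the timeout."""
        return sorted(vm for vm, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```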
9
Fault Tolerance. Physical machine fault tolerance: the cloud service provider can implement it by monitoring the state of physical servers and, in case of hardware failure, resuming all affected virtual machines on a new server.
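As a rough sketch of the restart step, the function below decides where the virtual machines of a failed server could be resumed. The placement policy (always pick the surviving host with the most free slots) and the names `plan_failover`, `placement`, and `capacity` are assumptions for illustration; the slides do not prescribe a policy.

```python
from typing import Dict, List


def plan_failover(placement: Dict[str, List[str]], capacity: Dict[str, int],
                  failed_host: str) -> Dict[str, str]:
    """Return {vm: new_host} for every VM that was running on failed_host.

    Raises RuntimeError if the surviving hosts do not have enough free slots.
    """
    # Free slots on every host except the failed one.
    free = {host: capacity[host] - len(vms)
            for host, vms in placement.items() if host != failed_host}
    plan: Dict[str, str] = {}
    for vm in placement.get(failed_host, []):
        target = max(free, key=lambda h: free[h], default=None)
        if target is None or free[target] <= 0:
            raise RuntimeError(f"no capacity to restart {vm}")
        plan[vm] = target
        free[target] -= 1
    return plan
```

Spreading restarted VMs across the least loaded hosts limits the extra load any single survivor takes on; it also makes visible that physical machine fault tolerance requires spare capacity to be reserved in advance.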
10
Fault Tolerance Techniques. Reactive fault tolerance: when a failure occurs, these techniques reduce its effect on application execution. Proactive fault tolerance: these techniques predict faults and proactively replace suspected components with working ones.
11
Reactive Fault Tolerance: checkpointing, replication, job migration, SGuard, retry, task resubmission, user-defined exception handling, rescue workflow.
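Two of the listed techniques, checkpointing and retry, can be combined in a small sketch. It assumes a job that processes items one by one and a checkpoint that records how far the job has progressed; the in-memory dictionary stands in for durable storage, and `run_with_checkpoints` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List


def run_with_checkpoints(items: List[int], step: Callable[[int], int],
                         checkpoint: Dict[str, int], max_retries: int = 3) -> int:
    """Sum step(x) over items, saving progress after every item.

    checkpoint holds 'next' (index of the next item) and 'total' (sum so far).
    A failed step is retried from the last checkpoint, not from the beginning.
    """
    attempts = 0
    while checkpoint["next"] < len(items):
        i = checkpoint["next"]
        try:
            value = step(items[i])
        except Exception:
            attempts += 1
            if attempts > max_retries:
                raise                 # give up: the fault does not look transient
            continue                  # retry the same item
        checkpoint["total"] += value
        checkpoint["next"] = i + 1    # checkpoint: this item is done
        attempts = 0
    return checkpoint["total"]
```

Because the checkpoint is passed in by the caller, a restarted process could hand the same saved checkpoint back to the function and resume where the failed run stopped.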
12
Proactive Fault Tolerance: software rejuvenation, self-healing, pre-emptive migration.
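Pre-emptive migration depends on some prediction that a host is about to fail. The sketch below uses a deliberately simple stand-in for such a predictor: a host is flagged when the mean of its most recent error counts exceeds a threshold. The metric, the window size, the threshold, and the name `hosts_to_evacuate` are all assumptions; real predictors are considerably more sophisticated.

```python
from typing import Dict, List


def hosts_to_evacuate(error_counts: Dict[str, List[int]], window: int = 3,
                      threshold: float = 5.0) -> List[str]:
    """Flag hosts whose mean error count over the last `window` samples exceeds threshold."""
    flagged = []
    for host, samples in error_counts.items():
        recent = samples[-window:]
        if recent and sum(recent) / len(recent) > threshold:
            flagged.append(host)      # candidates for migrating VMs away
    return sorted(flagged)
```

A scheduler could migrate the virtual machines off every flagged host before it actually fails, which is the difference from the reactive techniques that act only after a failure.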
13
Tools for implementing fault tolerance. HAProxy: an open source high availability and load balancing solution for TCP and HTTP based applications; the de facto standard open source load balancer. ASSURE (Automatic Software Self-healing Using REscue points): uses rescue points to detect, tolerate, and recover from software faults.
14
Tools for implementing fault tolerance. SHelp: an upgraded version of ASSURE. It assigns weighted values to rescue points and uses error virtualization techniques so that applications bypass the faulty path.
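The rescue point idea can be illustrated by loose analogy in Python. This is not how ASSURE or SHelp work internally; it only shows the principle of rolling state back to a known point and turning an unexpected fault into an error return that the caller already knows how to handle. The decorator `rescue_point` and the function `split_budget` are hypothetical.

```python
import copy
from functools import wraps
from typing import Any, Callable


def rescue_point(error_value: Any) -> Callable:
    """On an unexpected exception, restore `state` and return error_value instead."""
    def decorate(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(state: dict, *args, **kwargs):
            snapshot = copy.deepcopy(state)   # state as it was at the rescue point
            try:
                return func(state, *args, **kwargs)
            except Exception:
                state.clear()
                state.update(snapshot)        # roll back partial updates in place
                return error_value            # map the fault to an anticipated error
        return wrapper
    return decorate


@rescue_point(error_value=-1)
def split_budget(state: dict, parts: int) -> int:
    state["requests"] += 1
    state["share"] = state["budget"] // parts   # raises ZeroDivisionError if parts == 0
    return 0
```

Calling `split_budget(state, 0)` returns the error code and leaves `state` exactly as it was before the call, so the application continues on its normal error-handling path rather than crashing.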
15
Tools for implementing fault tolerance
16
High Availability. Can be achieved by having redundant failover servers. Can be achieved at the application level, the infrastructure level, and the data center level.
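The benefit of redundancy can be estimated with a standard formula: if each server is available with probability a and failures are independent, at least one of n redundant servers is available with probability 1 - (1 - a)^n. The independence assumption is important and is not taken from the slides; common-mode outages, listed earlier as a challenge, violate it. The function name `parallel_availability` is hypothetical.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant servers, assuming independent failures."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas
```

Under these assumptions, two servers that are each available 99% of the time give a combined availability of about 99.99%.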
17
Types of Virtual Machine High Availability. Load sharing: both replicas are active, and service requests are distributed equally between them. Updated dedicated hot standby: two identical virtual machines execute on two different physical servers, and both are fully synchronized with state information; VMware Fault Tolerance is an example.
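Load sharing between two active replicas can be sketched as round-robin routing that skips a replica once it has been marked as failed. The class `LoadSharingPair` and its methods are hypothetical names; in practice a load balancer in front of the replicas would play this role.

```python
from typing import List


class LoadSharingPair:
    """Distributes requests evenly over active replicas and skips failed ones."""

    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)
        self.healthy = set(replicas)
        self._next = 0                       # index of the next replica to try

    def mark_failed(self, replica: str) -> None:
        self.healthy.discard(replica)

    def route(self) -> str:
        """Return the replica that should serve the next request."""
        for _ in range(len(self.replicas)):
            candidate = self.replicas[self._next]
            self._next = (self._next + 1) % len(self.replicas)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy replica available")
```

While both replicas are healthy, requests alternate between them; after one fails, the survivor receives all requests, so each replica needs enough headroom to carry the full load.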
18
Types of Virtual Machine High Availability. Not dedicated hot standby: a standby VM runs in parallel with the active VM, but the standby is not fully synchronized. VMware HA and Symantec’s Veritas Cluster Server are examples.
19
Types of Virtual Machine High Availability. Shared hot standby: uses a checkpointing mechanism to update the standby replica and requires fewer resources for the standby replica. Cold standby: the standby replica is powered off and resides on storage media; it is brought into service when the active VM fails. Useful where availability requirements are low.
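The checkpoint-based update of a standby replica can be sketched as follows. The sketch assumes the active VM's state is a simple dictionary and that a checkpoint is shipped to the standby after a fixed number of updates; `CheckpointedStandby`, `checkpoint_every`, and `fail_over` are hypothetical names.

```python
import copy
from typing import Dict


class CheckpointedStandby:
    """Active state is copied to the standby every `checkpoint_every` updates."""

    def __init__(self, checkpoint_every: int) -> None:
        self.checkpoint_every = checkpoint_every
        self.active_state: Dict[str, int] = {}
        self.standby_state: Dict[str, int] = {}   # last checkpoint held by the standby
        self._since_checkpoint = 0

    def update(self, key: str, value: int) -> None:
        """Apply an update on the active VM and checkpoint when due."""
        self.active_state[key] = value
        self._since_checkpoint += 1
        if self._since_checkpoint >= self.checkpoint_every:
            self.standby_state = copy.deepcopy(self.active_state)
            self._since_checkpoint = 0

    def fail_over(self) -> Dict[str, int]:
        """Promote the standby: service resumes from the last checkpoint."""
        self.active_state = copy.deepcopy(self.standby_state)
        self._since_checkpoint = 0
        return self.active_state
```

Updates made after the last checkpoint are lost on failover. That is the trade-off against a fully synchronized hot standby: less synchronization traffic and fewer standby resources in exchange for a window of lost state.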
20
Conclusion. Dependability is one of the major challenges in cloud computing. Adoption of cloud computing can be increased by addressing the dependability challenges.