ATIF MEHMOOD MALIK KASHIF SIDDIQUE Improving dependability of Cloud Computing with Fault Tolerance and High Availability.

Dependability
- In systems engineering, dependability is a measure of a system’s availability, reliability, and maintainability
- It is the ability of a system to deliver services that can justifiably be trusted
- It is often considered the third axis of system quality

Dependability ontology

Dependability challenges in cloud computing
- Lack of trust in shared virtualized infrastructures
- Management of a cloud computing service by a single provider or vendor is in effect a single point of failure
- APIs are proprietary
- Virtualization increases complexity
- Higher resource utilization
- Common-mode outages
- Multiple administrative domains
- Legal and privacy implications

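Threats to dependability
- Faults, errors, and failures
- A fault is the underlying cause of an error; an error is an incorrect system state that can lead to a failure, which is a deviation of the delivered service from its expected behavior
- Faults may arise from hardware failures, software bugs, user errors, and network problems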

Fault Tolerance
- Ability of a system to continue providing services to its users when some of its components fail
- Faults can be introduced at:
  - Application level
  - Virtual machine level
  - Physical resource level

Fault Tolerance
- Application fault tolerance:
  - Application health is continuously monitored by special software components called sensors
  - A sensor may trigger specific procedures to start repairing a malfunctioning application
  - Example: VMware App HA
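The sensor idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `AppSensor`, `FakeApp`, and the `max_failures` threshold are hypothetical names invented here, and the sketch does not reflect how VMware App HA is actually implemented.

```python
from typing import Callable


class AppSensor:
    """Hypothetical sensor: polls a health check and triggers a repair procedure."""

    def __init__(self, health_check: Callable[[], bool],
                 repair: Callable[[], None], max_failures: int = 3) -> None:
        self.health_check = health_check
        self.repair = repair
        self.max_failures = max_failures      # assumed threshold, not from the slides
        self.consecutive_failures = 0

    def poll(self) -> None:
        """Run one monitoring cycle; repair after too many failed checks in a row."""
        if self.health_check():
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.repair()
            self.consecutive_failures = 0


class FakeApp:
    """Stand-in application used only to exercise the sensor."""

    def __init__(self) -> None:
        self.healthy = True
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.healthy

    def restart(self) -> None:
        self.healthy = True
        self.restarts += 1
```

Requiring several consecutive failed checks before repairing is one possible design choice; it avoids restarting an application because of a single transient glitch.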

Fault Tolerance
- Virtual machine fault tolerance:
  - Virtual machine failures can be detected by both the customer and the service provider
  - Customers can detect a virtual machine failure by monitoring its state with sensors deployed in the cloud
  - The cloud service provider can provide VM fault tolerance by installing a single sensor per physical server that monitors all virtual machines hosted on that server
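One common way for a per-server sensor to watch all hosted virtual machines is a heartbeat timeout. The slides do not say which detection mechanism is used, so the following is a minimal sketch under that assumption; `find_failed_vms` and its parameters are hypothetical.

```python
from typing import Dict, List


def find_failed_vms(last_heartbeat: Dict[str, float], now: float,
                    timeout: float) -> List[str]:
    """Return the VMs whose last heartbeat is older than `timeout` seconds.

    `last_heartbeat` maps a VM name to the timestamp of its latest heartbeat,
    as recorded by the (assumed) single sensor running on the physical server.
    """
    return sorted(vm for vm, seen in last_heartbeat.items() if now - seen > timeout)
```

A customer-side sensor could apply the same check to the subset of virtual machines it owns.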

Fault Tolerance
- Physical machine fault tolerance:
  - Can be implemented by the cloud service provider by monitoring the state of physical servers and, in case of hardware failure, resuming all affected virtual machines on a new server
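The "resume all virtual machines on a new server" step amounts to recomputing a placement. The sketch below assumes a simple least-loaded policy, which the slides do not specify; `fail_over` and the host and VM names are hypothetical.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's VMs moved to surviving hosts.

    Each displaced VM goes to the surviving host that currently runs the
    fewest VMs (an assumed policy chosen only for illustration).
    """
    survivors = {host: list(vms) for host, vms in placement.items()
                 if host != failed_host}
    if not survivors:
        raise RuntimeError("no surviving host available")
    for vm in placement.get(failed_host, []):
        target = min(survivors, key=lambda host: len(survivors[host]))
        survivors[target].append(vm)
    return survivors
```

A real provider would also have to consider capacity, shared storage, and network reachability before restarting a VM elsewhere; none of that is modeled here.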

Fault Tolerance Techniques
- Reactive fault tolerance
  - These techniques reduce the effect of a failure on application execution after the failure has occurred
- Proactive fault tolerance
  - These techniques predict faults and proactively replace the suspected components with working ones

Reactive Fault Tolerance
- Checkpointing
- Replication
- Job migration
- SGuard
- Retry
- Task resubmission
- User-defined exception handling
- Rescue workflow
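Two of the listed techniques, checkpointing and retry, combine naturally: a job saves its progress as it goes, and a resubmitted attempt resumes from the last saved point instead of starting over. The sketch below is illustrative only; `run_with_checkpoints`, `retry`, and the `fail_at` crash-injection parameter are hypothetical helpers, not part of any tool named in the slides.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, checkpoint_path, fail_at=None):
    """Sum `items`, saving progress after each one; resume from a checkpoint if present."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")   # injected fault for the demo
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temporary file and rename it, so a crash during the
        # write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


def retry(task, attempts=3):
    """Call `task` up to `attempts` times and return its first successful result."""
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as exc:   # broad on purpose: any failure triggers a retry
            last_error = exc
    raise last_error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        crashes = iter([2, None])   # first attempt crashes at index 2, second does not
        print(retry(lambda: run_with_checkpoints([1, 2, 3, 4], path,
                                                 fail_at=next(crashes))))
```

In the usage example the second attempt picks up at index 2, so the first two items are not processed again.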

Proactive Fault Tolerance
- Software rejuvenation
- Self-healing
- Pre-emptive migration
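Proactive techniques depend on some form of prediction. As a rough illustration, the sketch below extrapolates memory usage linearly and recommends rejuvenation (or, equally, pre-emptive migration) before a limit is reached. The predictor, the function name `should_rejuvenate`, and its parameters are assumptions made for this example; the slides do not describe a specific prediction method.

```python
from typing import List


def should_rejuvenate(memory_samples: List[float], limit: float, horizon: int) -> bool:
    """Predict whether memory use will reach `limit` within `horizon` sampling intervals.

    Uses the average growth per interval across the sample window and
    extrapolates it linearly from the latest sample.
    """
    if len(memory_samples) < 2:
        return False   # not enough data to estimate a trend
    growth = (memory_samples[-1] - memory_samples[0]) / (len(memory_samples) - 1)
    predicted = memory_samples[-1] + growth * horizon
    return predicted >= limit
```

For example, with samples of 100, 110, and 120 MB and a 200 MB limit, the trend is 10 MB per interval, so the limit is predicted to be reached 8 intervals ahead but not 5.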

Tools for implementing fault tolerance
- HAProxy:
  - Open source high availability and load balancing solution for TCP- and HTTP-based applications
  - De facto standard open source load balancer
- ASSURE:
  - Automatic Software Self-healing Using REscue points
  - Uses rescue points to detect, tolerate, and recover from software faults

Tools for implementing fault tolerance
- SHelp:
  - Extended version of ASSURE
  - Assigns weighted values to rescue points and uses error virtualization techniques so that applications bypass the faulty path
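The rescue-point and error-virtualization idea can be illustrated at a very small scale: when an unanticipated fault occurs, state is rolled back to the rescue point and the fault is turned into an error value the caller already knows how to handle. The Python decorator below is a toy analogy built on that assumption. It is not how ASSURE or SHelp are implemented, the names `rescue_point` and `handle_request` are hypothetical, and SHelp's rescue-point weights are not modeled.

```python
import copy
import functools


def rescue_point(error_value):
    """Decorator: on an unexpected exception, roll back state and return a known error value."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(state, *args, **kwargs):
            snapshot = copy.deepcopy(state)   # checkpoint taken at the rescue point
            try:
                return func(state, *args, **kwargs)
            except Exception:
                # Undo partial updates, then report an error the caller anticipates.
                state.clear()
                state.update(snapshot)
                return error_value
        return wrapper
    return decorate


@rescue_point(error_value=-1)
def handle_request(state, divisor):
    """Toy request handler with a latent bug: it crashes when `divisor` is 0."""
    state["requests"] += 1                  # partial update made before the fault
    state["last_result"] = 100 // divisor   # raises ZeroDivisionError for divisor == 0
    return 0
```

After a faulty call the state dictionary is unchanged and the caller sees `-1`, so the application continues along its existing error-handling path rather than crashing.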

High Availability
- Can be achieved by having redundant failover servers
- Can be achieved at the application level, infrastructure level, and data center level
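The benefit of redundant failover servers can be estimated with the standard formula for components in parallel: if each replica is available with probability A and failures are independent, at least one of n replicas is available with probability 1 - (1 - A)^n. The independence assumption is a strong one (common-mode outages violate it), and the function names below are chosen only for this sketch.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant servers, assuming independent failures."""
    return 1.0 - (1.0 - single) ** replicas


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24
```

Under these assumptions, a single server with 99% availability is down about 87.6 hours per year, while two such servers in a failover pair reach 99.99%, or roughly 0.88 hours per year.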

Types of Virtual Machine High Availability
- Load sharing
  - Both replicas are active
  - Service requests are distributed equally between them
- Updated dedicated hot standby
  - Two identical virtual machines execute on two different physical servers
  - Both virtual machines are fully synchronized with state information
  - VMware Fault Tolerance is an example
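Load sharing between two active replicas can be sketched as round-robin distribution that skips any replica marked unhealthy. Round-robin is an assumption here (the slides only say requests are distributed equally), and `distribute` and the replica names are hypothetical.

```python
from typing import Dict, List


def distribute(requests: List[str], replicas: List[str],
               healthy: Dict[str, bool]) -> Dict[str, List[str]]:
    """Assign requests round-robin to the replicas currently marked healthy."""
    live = [replica for replica in replicas if healthy.get(replica, False)]
    if not live:
        raise RuntimeError("no healthy replica available")
    assignment: Dict[str, List[str]] = {replica: [] for replica in live}
    for index, request in enumerate(requests):
        assignment[live[index % len(live)]].append(request)
    return assignment
```

When one replica fails, the surviving replica receives every request, which is why each replica in a load-sharing pair needs enough headroom to carry the full load.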

Types of Virtual Machine High Availability
- Not dedicated hot standby
  - Standby VM runs in parallel with the active VM
  - Standby is not fully synchronized
  - VMware HA and Symantec’s Veritas Cluster Server are examples

Types of Virtual Machine High Availability
- Shared hot standby
  - Uses a checkpointing mechanism to update the standby replica
  - Requires fewer resources for the standby replica
- Cold standby
  - Standby replica is powered off and resides on storage media
  - Brought into service when the active VM fails
  - Useful for situations where availability requirements are low
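The checkpoint-based standby described above trades resources for freshness: the standby only knows the state as of the last checkpoint, so updates made after it are lost on failover. The class below is a minimal sketch of that trade-off; `ActiveStandbyPair`, its single counter state, and the fixed checkpoint interval are assumptions made for illustration.

```python
import copy


class ActiveStandbyPair:
    """Toy model of an active VM whose state is checkpointed to a standby replica."""

    def __init__(self, checkpoint_every: int) -> None:
        self.active_state = {"counter": 0}
        self.standby_state = {"counter": 0}
        self.checkpoint_every = checkpoint_every   # updates between checkpoints
        self.updates_since_checkpoint = 0

    def apply_update(self) -> None:
        """Apply one state change on the active VM and checkpoint when due."""
        self.active_state["counter"] += 1
        self.updates_since_checkpoint += 1
        if self.updates_since_checkpoint == self.checkpoint_every:
            self.standby_state = copy.deepcopy(self.active_state)
            self.updates_since_checkpoint = 0

    def fail_over(self) -> int:
        """Promote the standby after an active-VM failure; return the lost update count."""
        lost = self.updates_since_checkpoint
        self.active_state = copy.deepcopy(self.standby_state)
        self.updates_since_checkpoint = 0
        return lost
```

With a checkpoint every 3 updates, a failure after 7 updates loses 1 update. A fully synchronized hot standby corresponds to checkpointing on every update, and a cold standby corresponds to restoring from whatever image was last written to storage.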

Conclusion
- Dependability is one of the major challenges in cloud computing
- Adoption of cloud computing can be increased by addressing the dependability challenges
