Presented by Rukmini and Diksha Chauhan, Virginia Tech, 2nd May 2007. Movement-Based Checkpointing and Logging for Recovery in Mobile Computing Systems.

Movement-Based Checkpointing and Logging for Recovery in Mobile Computing Systems, by Sapna E. George, Ing-Ray Chen and Ying Jin. Presented by Rukmini and Diksha Chauhan, Virginia Tech, 2nd May 2007.

Agenda: Related work; Mobile computing system; Proposed movement-based checkpointing and logging; Recovery schemes; Performance analysis; Conclusion.

Properties of Mobile Computing. Inherent properties: host mobility, disconnections, limited wireless bandwidth, battery life, storage, hardware failure, software failure. Motivation: propose an efficient failure recovery scheme.

Distributed Systems. Fault-tolerance schemes: logging, checkpointing, rollback recovery. Definitions: domino effect (inter-process dependencies leading to cascading rollbacks); asynchronous recovery.

Related Work. Acharya et al. [1] describe a distributed uncoordinated checkpointing scheme in which multiple MHs can arrive at a globally consistent checkpoint without coordination messages. The paper does not describe how failure recovery is achieved, nor does it address the management of recovery information in the face of MH movement.

Underlying Model

Basic Definitions. Mobile Host (MH): the mobile node. Mobile Support Systems (MSS): infrastructure machines on a high-speed static wired network.

Basic Definitions (contd.). Cell; local MSS. Communication between MH and MSS: constraints. Communication between MHs: two one-hop wireless transmissions plus an arbitrary number of hops across the wired network.

Basic Definitions: Handoff. A handoff occurs instantaneously. Process: the MH crosses a cell boundary, or the MH voluntarily disconnects from the network (at MSS1) to conserve power and reconnects (at MSS2) at a later time. The MH sends the ID of MSS1 to the new MSS2, initiating the handoff procedures.

Processes and States. Three states: normal execution (application-related computation; sending or receiving messages), logging (save), and recovery. Write events: a message received from another MH or a server; user input or local computation.

Movement-Based Checkpointing & Message Logging. The MH checkpoints after a certain number of host migrations across cells rather than periodically. Recovery scheme: combines independent checkpointing and optimistic message logging, enabling asynchronous recovery of an MH upon failure. Application recovery mechanisms aim to optimize the recovery cost (failure-free operational cost), the recovery time, and the storage requirements for recovery-related information.

Movement-Based Checkpointing & Message Logging (contd.). The scheme uses distance (or the number of handoffs) as the parameter that triggers information consolidation. When the MH crosses a distance threshold from the location of the latest checkpoint, the recovery information is collected and transferred to the MH's local MSS. Recovery protocol: it proactively controls the number of checkpoints and logs through the movement-based checkpointing strategy, so the additional overhead of unnecessary checkpoints and log consolidation during failure-free operation is avoided.

Checkpointing & Message Logging. The movement threshold M is a function of the user's mobility rate, the failure rate and the log arrival rate, which lets the scheme adapt to user and application behaviour.

Movement-based Checkpointing and Logging. Variables stored at the MH: cp_seq stores the sequence number of the latest checkpoint; cp_loc stores the ID of the MSS that has recorded the latest checkpoint (MSScp, the latest checkpointing MSS); handoff_counter is set to 0 at a checkpoint; log_set stores the IDs of the MSSs holding logs (MSSlogs). Activities: checkpointing and logging.
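The MH-side bookkeeping on this slide can be sketched as a small Python class. This is a minimal sketch, not the authors' implementation: the class name, method names and the rule that log_set is cleared when a checkpoint is taken are assumptions made for illustration; only cp_seq, cp_loc, log_set and the handoff counter come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class MobileHost:
    """Hypothetical MH-side state for movement-based checkpointing."""
    threshold_m: int            # movement threshold M (handoffs per checkpoint)
    current_mss: str            # ID of the MSS serving the current cell
    cp_seq: int = 0             # sequence number of the latest checkpoint
    cp_loc: str = ""            # ID of the MSS holding the latest checkpoint
    handoff_counter: int = 0    # handoffs since the latest checkpoint
    log_set: list = field(default_factory=list)  # IDs of MSSs holding logs

    def take_checkpoint(self):
        # Record the checkpoint at the current MSS and reset local variables.
        # Clearing log_set here is an assumption of this sketch.
        self.cp_seq += 1
        self.cp_loc = self.current_mss
        self.handoff_counter = 0
        self.log_set = []

    def log_event(self):
        # A write event is logged at the current MSS; remember which MSS has it.
        if self.current_mss not in self.log_set:
            self.log_set.append(self.current_mss)

    def handoff(self, new_mss):
        # Move to a new cell; checkpoint once M handoffs have accumulated.
        self.current_mss = new_mss
        self.handoff_counter += 1
        if self.handoff_counter >= self.threshold_m:
            self.take_checkpoint()
            return True
        return False
```

With threshold_m=3, the third call to handoff() returns True and leaves cp_loc pointing at the MSS of the newly entered cell.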

Independent Recovery. Independent: without coordination with other hosts. Recovery process: (1) the MH sends cp_seq, cp_loc and log_set to the MSS; (2) the MSS initiates (requests) data collection; (3) the MSS compiles the logs into a list ordered by time. Checkpointing: once recovery is completed successfully, a checkpoint of the current state is taken and sent to the MSS, and the local variables are reset.
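The recovery steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the layout of mss_storage, the use of (timestamp, event) tuples for log entries and the modelling of process state as a list of applied events are all hypothetical and not taken from the paper.

```python
def recover(cp_seq, cp_loc, log_set, mss_storage):
    """Rebuild an MH's state from its latest checkpoint plus logged events.

    mss_storage is a hypothetical dict of the form
    {mss_id: {"checkpoints": {seq: state}, "logs": [(timestamp, event), ...]}}.
    """
    # Fetch the latest checkpoint from the MSS that recorded it (MSScp).
    state = list(mss_storage[cp_loc]["checkpoints"][cp_seq])

    # Collect the logs from every MSS named in log_set (MSSlogs).
    logs = []
    for mss_id in log_set:
        logs.extend(mss_storage[mss_id]["logs"])

    # Compile the logs into a single list ordered by time, then replay them.
    for _, event in sorted(logs, key=lambda entry: entry[0]):
        state.append(event)
    return state
```

No other host takes part in the call, which is the sense in which the recovery is independent.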

Storage Management at MSS. MH's disk: unstable and limited. MSS's disk: stable storage, considerably large. When storage at an MSS is depleted: 1. temporarily halt and perform garbage collection; 2. use alternative storage; 3. delete outdated recovery information.

SPN Model Parameters

SPN Model

Transition Firing Rate: Checkpoint Rate of MH. During checkpointing: (a) the MH takes a snapshot of its current state; (b) the MH sends the checkpoint to the current MSS through the wireless channel; (c) the MSS stores it in its stable storage. The rate expression (not preserved in this transcript) uses Tckp_w, the time required to transmit a checkpoint through the wireless link.

Transition Firing Rate: Recovery Rate of MH, i.e., the inverse of the recovery time. The recovery time includes: (a) the time to send recovery information requests to the MSSs storing the latest checkpoint and all logs since the latest checkpoint; (b) the time to transmit the latest checkpoint from the MSS where it is stored (MSScp) to the MSS in which the MH has recovered (MSSrec) through the wired network, and then through the wireless channel to the MH;

Transition Firing Rate (contd.): (c) the time to transmit all the logs from the respective MSSs where they are located (MSSlogs) to MSSrec through the wired network, and then through the wireless channel to the MH; and (d) the time to roll back to the last checkpoint and apply all the logs at the MH.

Variables. One variable represents the number of MSSs storing logs; its value is at most the number of handoffs before failure, i.e. i. A second variable represents the average hop count between MSScp and MSSrec.

Recovery Time. Time spent on recovery requests: [equation not preserved in transcript]. Time spent on transmitting the latest checkpoint to the MH: [equation not preserved in transcript].

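Recovery Time (contd.). Time spent to transmit the logs to the MH: [equation not preserved in transcript], where n is the number of log entries since the last checkpoint. Time spent to roll back to the last checkpoint and apply the logs: [equation not preserved in transcript]. Total recovery time after i movements: [equation not preserved in transcript].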

Recovery Cost per Failure. The SPN model's underlying Markov model has 2M+1 states. The average recovery time per failure is given by: [equation not preserved in transcript]. The total failure-free operation cost (the time spent on checkpointing and logging before failure) is given by: [equation not preserved in transcript], expressed in terms of the number of checkpoints before failure and the number of log entries before failure.

Recovery Cost per Failure (contd.). The total cost of recovery per failure is the weighted sum of the average recovery time and the total time spent on checkpointing and logging per failure: [equation not preserved in transcript], where w1 and w2 are the weights associated with the recovery time and the failure-free operation cost. This paper uses w1 = w2 = 0.5 to account for the situation where the total cost is equally proportional to the average recovery time and the failure-free operation cost.
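The weighted sum described on this slide, and the search for the threshold that minimizes it, can be sketched in a few lines. The function and argument names are illustrative assumptions, and the sketch takes the two time components as inputs rather than deriving them from the SPN model.

```python
def recovery_cost_per_failure(avg_recovery_time, failure_free_cost,
                              w1=0.5, w2=0.5):
    """Weighted sum of average recovery time and failure-free operation cost.

    The defaults w1 = w2 = 0.5 are the weights quoted on the slide.
    """
    return w1 * avg_recovery_time + w2 * failure_free_cost


def best_threshold(costs_by_m):
    """Return the movement threshold M with the lowest cost.

    costs_by_m is a hypothetical dict mapping each candidate M to its
    recovery cost per failure.
    """
    return min(costs_by_m, key=costs_by_m.get)
```

For example, with made-up component times of 2.0 s and 4.0 s, recovery_cost_per_failure(2.0, 4.0) gives 3.0 under the equal weights.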

Recovery Probability The recovery probability is defined as the probability that recovery time is less than or equal to T

Results and Analysis. The SPN model was implemented and analyzed using the SPNP software. The following parameter values were kept constant in all runs: size of a log entry, 50 B; size of a checkpoint, 2000 B; bandwidth of the wired network, 2 Mbps; ratio of wireless to wired bandwidth (r), 0.1; Telog, [value missing in transcript] s; Tlog_w, 0.002 s; Tckp_w, 0.08 s. Model parameters such as the mobility rate, log arrival rate, failure rate and movement threshold were varied across runs.
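The wireless transmission times quoted on this slide are consistent with the stated sizes and bandwidths, as the following check suggests. It assumes that 1 Mbps means 10^6 bits per second and that transmission time is simply size divided by bandwidth; the constant and function names are illustrative.

```python
LOG_ENTRY_BYTES = 50               # size of a log entry
CHECKPOINT_BYTES = 2000            # size of a checkpoint
WIRED_BANDWIDTH_BPS = 2_000_000    # 2 Mbps, assuming 1 Mbps = 10**6 bit/s
WIRELESS_TO_WIRED_RATIO = 0.1      # r


def transmit_time(size_bytes, bandwidth_bps):
    """Seconds needed to push size_bytes through a link of bandwidth_bps."""
    return size_bytes * 8 / bandwidth_bps


wireless_bps = WIRED_BANDWIDTH_BPS * WIRELESS_TO_WIRED_RATIO  # 200 kbps
t_log_w = transmit_time(LOG_ENTRY_BYTES, wireless_bps)        # about 0.002 s
t_ckp_w = transmit_time(CHECKPOINT_BYTES, wireless_bps)       # about 0.08 s
```

Both results match the Tlog_w and Tckp_w values listed among the constant parameters.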

Results and Analysis contd… Recovery Probability vs. Recovery Time.

Results and Analysis contd… Recovery Probability vs. Log Arrival Rate.

Results and Analysis contd… Recovery Probability vs. Failure Rate.

Results and Analysis contd… Recovery Probability vs. Movement Threshold.

Results and Analysis contd… Recovery Time vs. Movement Threshold.

Results and Analysis contd… Determining Optimal Movement Threshold that minimizes Recovery Cost Per Failure.

Applicability. The results can be applied as follows: (1) build a table statically, before runtime, covering the possible parameter values of the mobility rate and failure rate of the MH and the log arrival rate of the mobile applications; (2) list the optimal M value that minimizes the recovery cost per failure for each parameter set; (3) select the optimal M dynamically at runtime, based on the measured rates, to minimize the recovery cost per failure.
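The table-driven selection described on this slide might look like the sketch below. Everything specific in it is an assumption for illustration: the table entries are invented placeholders rather than results from the paper, and matching a measured rate triple to the nearest table entry by log-scale distance is one possible choice that the slides do not specify.

```python
import math

# Hypothetical precomputed table:
# (mobility rate, failure rate, log arrival rate) -> optimal M.
# The numbers are placeholders, not values from the paper.
OPTIMAL_M_TABLE = {
    (0.01, 0.0001, 0.1): 4,
    (0.1, 0.0001, 0.1): 8,
    (0.01, 0.001, 1.0): 2,
}


def select_threshold(mobility_rate, failure_rate, log_arrival_rate,
                     table=OPTIMAL_M_TABLE):
    """Pick M from the table entry closest to the measured (positive) rates."""
    measured = (mobility_rate, failure_rate, log_arrival_rate)

    def distance(key):
        # Compare on a log scale because the rates span orders of magnitude.
        return sum((math.log10(k) - math.log10(m)) ** 2
                   for k, m in zip(key, measured))

    return table[min(table, key=distance)]
```

At runtime the MH would call select_threshold with its currently measured rates and use the returned value as its movement threshold.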

Summary. A movement-based checkpointing and logging scheme was implemented that checkpoints after M movements (mobility handoffs), in contrast to current approaches where checkpoints are taken periodically. A performance model based on stochastic Petri nets was developed to identify the optimal M, given the failure, mobility and log arrival rates, that minimizes the cost of recovery per failure. The results of the performance analysis and the sensitivity of recoverability to the various parameters were shown.

Future Work To analyze and compare the proposed approach to existing approaches, in terms of the gain achieved over the use of constant periodic checkpointing. To extend the proposed work to MIPv6 environments.

QUESTIONS ??