Replication & Fault Tolerance CONARD JAMES B. FARAON


ISIS Project & JGroups
Replication & Fault Tolerance
CONARD JAMES B. FARAON

OVERVIEW
  BACKGROUND
  GOALS
  ARCHITECTURES
  FEATURES
  DRAWBACKS
  SUMMARY

ISIS Project Background
  Kenneth Birman, Robert Cooper, Keith Marzullo (1980s), Cornell University.
  Research: scalability, security, management.
  Commercialized distributed systems:
    Stock exchange
    Air traffic control automation
    Factory automation
  Toolkit (written in C, Lisp, etc.): Virtual Synchrony Model

ISIS Protocols
  Multicast protocols:
    FBCAST: FIFO-ordered per sender, otherwise unordered
    CBCAST: causally ordered
    ABCAST: totally ordered
    GBCAST: sync-ordered, used for managing group membership
  Flush protocols:
    A message is unstable if some receiver has it but (perhaps) others do not.
    Unstable messages are echoed all-to-all to the processes that have not received a copy.
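To make causal ordering concrete, the sketch below holds back a multicast until every message it causally depends on has been delivered, using vector timestamps. It is a simplified Python model rather than ISIS code; the class `CausalReceiver` and its methods are names invented for this example.

```python
class CausalReceiver:
    """Hypothetical receiver that delivers multicasts in causal order."""

    def __init__(self, n_processes):
        self.delivered = [0] * n_processes  # messages delivered so far, per sender
        self.pending = []                   # received but not yet deliverable
        self.log = []                       # payloads in delivery order

    def _deliverable(self, sender, ts):
        # Must be the next message from this sender...
        if ts[sender] != self.delivered[sender] + 1:
            return False
        # ...and everything the sender had seen must already be delivered here.
        return all(ts[k] <= self.delivered[k]
                   for k in range(len(ts)) if k != sender)

    def receive(self, sender, ts, payload):
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:                     # retry held-back messages
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.delivered[s] += 1
                    self.log.append(p)
                    progress = True


r = CausalReceiver(3)
# Process 1 sent "reply" after seeing process 0's "question",
# but the reply arrives first and has to wait.
r.receive(1, [1, 1, 0], "reply")
held_back = list(r.log)
r.receive(0, [1, 0, 0], "question")
```

After the second call, `r.log` lists the question before the reply even though they arrived in the opposite order.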

ISIS Sample Code

JGroups Background
  Java toolkit for reliable and flexible messaging.
  Inspired by the ISIS research.
  Developed by Bela Ban.
  Current stable release: May 10, 2017.
  Consists of the following:
    Channels
    Building Blocks
    Flexible Protocol Stack

Virtual Synchrony (IPC)
  Ordered and reliable multicast.
  “Send to all members or to none.”
  Process groups: programs in a network organize themselves into groups.
  Fault tolerance.
  Replication using remote message passing.

Goal: Virtually Synchronous Toolkits for the following:
  Atomic multicast
  Process groups & group communication
  Deciding how to respond to a request
  Concurrency
  Synchronization
  Replicated data
  Failure tolerance (detecting and reacting)
  Dynamic configuration
  Stable storage
  Recovery
  Transactions
  Protection
  Consistency

Process Groups
  Peer group
  Client/Server group
  Diffusion group
  Hierarchical group

JGroups Architecture

Building Blocks
  Provide a higher level of abstraction.
  The application goes through a Building Block rather than using the Channel directly.
  Save development time (a library for developers).
  Mostly for fault tolerance and replication.
  Perform cluster-wide execution.
  Perform cluster-wide locking.

Flexible Protocol Stack
  Approximately 70 protocols available.
  NAKACK
  Ordering:
    FIFO
    SEQUENCER
    CAUSAL
  GMS
  FAILURE_DETECTOR
  STATE_TRANSFER
  FRAGMENTATION
  SECURITY
  FLOW CONTROL
  ETC.
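The idea of a layered stack can be sketched as follows: each layer transforms messages on the way down (sending) and undoes that on the way up (receiving). This is a toy Python model, not the JGroups implementation; `Numbering`, `Fragmenter` and `Stack` are hypothetical stand-ins for an ordering layer, a fragmentation layer and the stack itself, and the sketch assumes no frames are lost.

```python
class Numbering:
    """Stand-in for an ordering layer: tags each payload with a sequence number."""

    def __init__(self):
        self.next_seq = 0

    def down(self, payloads):
        tagged = []
        for p in payloads:
            tagged.append((self.next_seq, p))
            self.next_seq += 1
        return tagged

    def up(self, tagged):
        # Restore sending order, then strip the sequence numbers.
        return [p for _, p in sorted(tagged)]


class Fragmenter:
    """Stand-in for a fragmentation layer: splits payloads into small chunks."""

    def __init__(self, max_size):
        self.max_size = max_size

    def down(self, tagged):
        frames = []
        for seq, p in tagged:
            chunks = [p[i:i + self.max_size]
                      for i in range(0, len(p), self.max_size)] or [""]
            for idx, chunk in enumerate(chunks):
                frames.append((seq, idx, chunk))
        return frames

    def up(self, frames):
        # Group chunks by message, then reassemble in chunk order.
        parts = {}
        for seq, idx, chunk in frames:
            parts.setdefault(seq, {})[idx] = chunk
        return [(seq, "".join(c[i] for i in sorted(c)))
                for seq, c in parts.items()]


class Stack:
    def __init__(self, layers):
        self.layers = layers                 # application side first

    def send(self, payloads):
        data = payloads
        for layer in self.layers:            # top to bottom
            data = layer.down(data)
        return data

    def receive(self, frames):
        data = frames
        for layer in reversed(self.layers):  # bottom to top
            data = layer.up(data)
        return data


stack = Stack([Numbering(), Fragmenter(max_size=4)])
frames = stack.send(["hello world", "hi"])
# Simulate the network reordering frames in transit.
received = stack.receive(list(reversed(frames)))
```

Swapping, adding or removing a layer changes the guarantees without touching application code, which is the point of making the stack configurable.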

JGroups Sample Code

Client-Server Group

Protocol Defined in XML

Replicated HashMap
  State is shared using a HashMap.
  State is shared through serialization.
  Create several instances of the HashMap.
  Modifications are propagated.
  Listeners are notified of updates.
  Read-only requests are invoked locally.
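A minimal sketch of this behaviour, assuming an in-process list stands in for the cluster and `pickle` stands in for the wire format. `ReplicatedMap` and its methods are hypothetical names for this example, not the JGroups ReplicatedHashMap API.

```python
import pickle


class ReplicatedMap:
    """Hypothetical replica; 'group' is a plain list standing in for the cluster."""

    def __init__(self, group):
        self.data = {}
        self.listeners = []
        self.group = group
        group.append(self)

    def put(self, key, value):
        # Modifications are serialized and propagated to every replica,
        # including the local one.
        wire = pickle.dumps((key, value))
        for replica in self.group:
            replica._on_update(wire)

    def get(self, key):
        # Read-only requests are served from the local copy.
        return self.data.get(key)

    def _on_update(self, wire):
        key, value = pickle.loads(wire)
        self.data[key] = value
        for listener in self.listeners:
            listener(key, value)


group = []
a, b = ReplicatedMap(group), ReplicatedMap(group)
seen = []
b.listeners.append(lambda k, v: seen.append((k, v)))
a.put("price", 42)
```

A write on `a` becomes visible through `b.get("price")`, and the listener registered on `b` is called once with the new entry.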

Replication
  Process groups share copies of the data.
  State transfer is used to initialize a joining member.
  Replication uses remote message passing or serialization.
  Updates are delivered as events.
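State transfer on join can be sketched like this: a newcomer first receives a snapshot from an existing member, then sees the same updates as everyone else. The `Member` and `Group` classes are hypothetical, and the sketch assumes no updates arrive while the snapshot is being copied.

```python
import copy


class Member:
    def __init__(self, name):
        self.name = name
        self.state = {}


class Group:
    def __init__(self):
        self.members = []

    def join(self, member):
        if self.members:
            # State transfer: initialize the newcomer from an existing member.
            member.state = copy.deepcopy(self.members[0].state)
        self.members.append(member)

    def update(self, key, value):
        # Every current member applies the update as an event.
        for m in self.members:
            m.state[key] = value


g = Group()
a = Member("A")
g.join(a)
g.update("x", 1)

b = Member("B")
g.join(b)          # B starts with a copy of {"x": 1}
g.update("y", 2)   # both replicas apply the later update
```

The deep copy matters: the replicas end up equal but independent, so a later local bug in one cannot silently corrupt the other.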

JGroups Performance & Drawbacks

ISIS Performance & Drawbacks

Summary
  Reliability
  Availability
  ISIS uses the Virtual Synchrony Model:
    Message ordering
    Timeout interval
    Replication using RPC
  JGroups adds the following:
    Building Blocks
    Flexible Protocol Stack

Questions?

Appendix: JGroups Channel
  Handle to a group.
  Has a “view” of the channel (its current members).
  Unicast
  Multicast
  Anycasting (TCP): View = {A, B, C, D, E}, Dest = {C, D}
  A member that has crashed or failed is labeled “suspected”.
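The view, the anycast example and suspicion can be sketched together as below. This is an illustrative Python model, not the JGroups Channel API: `ToyChannel`, `send`, `suspect` and `exclude_suspected` are invented names, and dropping suspected members by installing a new view is a simplification of what a membership protocol would do.

```python
class ToyChannel:
    """Hypothetical handle to a group, with a membership view."""

    def __init__(self, members):
        self.view = list(members)
        self.suspected = set()
        self.inbox = {m: [] for m in members}

    def send(self, payload, dest=None):
        # dest=None multicasts to the whole view; a list of members is an
        # anycast to that subset; a single-element list acts as a unicast.
        targets = self.view if dest is None else [m for m in dest if m in self.view]
        for m in targets:
            self.inbox[m].append(payload)
        return targets

    def suspect(self, member):
        # A crashed or unresponsive member is first only labeled "suspected".
        self.suspected.add(member)

    def exclude_suspected(self):
        # Install a new view without the suspected members.
        self.view = [m for m in self.view if m not in self.suspected]
        self.suspected.clear()


ch = ToyChannel(["A", "B", "C", "D", "E"])
anycast_targets = ch.send("m1", dest=["C", "D"])   # View = {A..E}, Dest = {C, D}

ch.suspect("E")
ch.exclude_suspected()
multicast_targets = ch.send("m2")                  # goes to the new view only
```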

Appendix: Diffusion Group
  A kind of client-server group.
  Servers multicast messages to the full set of servers and clients.
  Clients only receive messages.
  Example: brokerage trading floor.

Appendix: Hierarchical Group
  Used if several groups are needed.
  Tree-structured set of groups.
  A group redirects each request to an appropriate subgroup.
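One way to picture the redirection is a tree walk from the root group down to the subgroup that handles a request. The sketch below is purely illustrative; `GroupNode`, the topic names and the group names are all made up for the example.

```python
class GroupNode:
    """Hypothetical group in a tree-structured set of groups."""

    def __init__(self, name, handles=(), children=()):
        self.name = name
        self.handles = set(handles)    # request topics served by this group
        self.children = list(children)

    def route(self, topic):
        """Return the path of group names leading to the handler, or None."""
        if topic in self.handles:
            return [self.name]
        for child in self.children:
            path = child.route(topic)
            if path is not None:
                return [self.name] + path
        return None


root = GroupNode("root", children=[
    GroupNode("equities", handles={"IBM"}),
    GroupNode("bonds", children=[
        GroupNode("treasuries", handles={"T-BILL"}),
    ]),
])

path = root.route("T-BILL")
unknown = root.route("GOLD")
```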

Appendix: Building Blocks examples
  MessageDispatcher
    Deals with sending message requests.
    Correlates message responses.
  RpcDispatcher
    Implements its own serialization.
    Method lookup.
    May exist in multiple channels.
  And many more…
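The two ideas named here, correlating responses with a request and looking up a method by name, can be sketched in a few lines. This is a toy in-process model, not the JGroups MessageDispatcher or RpcDispatcher API; `ToyRpcDispatcher`, `invoke_on_all` and `QuoteServer` are hypothetical names.

```python
class ToyRpcDispatcher:
    """Hypothetical dispatcher: invokes a named method on every member."""

    def __init__(self):
        self.members = {}    # member name -> server object
        self.next_id = 0

    def register(self, name, server):
        self.members[name] = server

    def invoke_on_all(self, method_name, *args):
        # Each request gets an id so that responses can be correlated with it.
        request_id = self.next_id
        self.next_id += 1
        responses = {}
        for name, server in self.members.items():
            method = getattr(server, method_name)   # method lookup by name
            responses[name] = method(*args)
        return request_id, responses


class QuoteServer:
    def __init__(self, offset):
        self.offset = offset

    def price(self, symbol):
        # Dummy computation so that each member returns a distinct value.
        return len(symbol) + self.offset


d = ToyRpcDispatcher()
d.register("A", QuoteServer(10))
d.register("B", QuoteServer(20))
first_id, first = d.invoke_on_all("price", "IBM")
second_id, _ = d.invoke_on_all("price", "IBM")
```

In a real deployment the call would be serialized and sent over the channel; here the members live in one process so the lookup and correlation logic stays visible.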