Reliable Distributed Systems Membership 11/29/2018
Web services standards stack:
http/https 1.1
WS-Addressing
SOAP 1.2, MTOM
XML 1.1: XML Schema, XPath, XSL
WS-Policy, WS-MetadataExchange
WSDL 1.2, UDDI 3.0
WS-Reliability, WS-Membership
WS-Security, WS-Trust, WS-SecureConversation
WS-Transaction, WS-Coordination
WS-BPEL, WS-Choreography
WS-Resource, WS-ResourceProperties
WS-Notification, WS-Eventing
Group Membership
Group membership is a foundational concept for high-speed data replication protocols. It is also essential for large-scale, grid-based virtual organizations (VOs) and for resource discovery and scheduling.
Solution: a group membership service (GMS).
2-tier architecture: first manage the membership of the GMS services themselves, then use them to manage the general membership of other services.
The Group Membership Protocol (GMP) is used among the GMS servers to manage their own membership; the GMS then works on the groups it serves.
Another problem is static vs. dynamic membership.
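The 2-tier idea can be sketched as follows. This is a minimal, hypothetical illustration, not the slides' design: the class `TwoTierGMS` and its methods are invented names, and the rule that the GMS acts only while a majority of its own current view is reachable is an assumption (a common choice in GMP-style protocols). Tier 1 is the GMS servers' own view; tier 2 holds the views of the application groups.

```python
from typing import Dict, FrozenSet, Iterable, Set


class TwoTierGMS:
    """Hypothetical sketch of a 2-tier GMS.

    Tier 1: the GMS servers' own membership view (maintained by GMP).
    Tier 2: the membership views of the application groups the GMS serves.
    """

    def __init__(self, servers: Iterable[str]) -> None:
        servers = list(servers)
        self.core: FrozenSet[str] = frozenset(servers)  # tier-1 view
        self.reachable: Set[str] = set(servers)         # servers we can talk to
        self.groups: Dict[str, FrozenSet[str]] = {}     # tier-2 views

    def core_has_majority(self) -> bool:
        # Assumption: act only while a majority of the current tier-1 view is reachable.
        return 2 * len(self.reachable & self.core) > len(self.core)

    def server_unreachable(self, server: str) -> None:
        self.reachable.discard(server)
        if self.core_has_majority():
            # Simplified tier-1 view change; a real GMP runs an agreement round here.
            self.core = self.core - {server}

    def update_group(self, group: str, add: Iterable[str] = (), remove: Iterable[str] = ()) -> bool:
        if not self.core_has_majority():
            return False  # a minority of GMS servers must not change client views
        old = self.groups.get(group, frozenset())
        self.groups[group] = (old | frozenset(add)) - frozenset(remove)
        return True


gms = TwoTierGMS(["X", "Y", "Z"])
print(gms.update_group("app", add=["A", "B"]))  # True
gms.server_unreachable("X")                      # tier-1 view becomes {Y, Z}
print(gms.update_group("app", remove=["B"]))    # True
gms.server_unreachable("Y")                      # 1 of 2 reachable: no majority
print(gms.update_group("app", add=["C"]))       # False
print(sorted(gms.core), sorted(gms.groups["app"]))  # ['Y', 'Z'] ['A']
```

Separating the tiers keeps the expensive agreement protocol confined to a small, fixed set of GMS servers, while the (possibly large and dynamic) application groups only talk to the GMS.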
Agreement on Membership
Detecting failure accurately is a lost cause: too many things (a slow process, a congested network) can mimic failure.
To be accurate, we would end up waiting for a process to recover.
Instead, substitute agreement on membership.
Now we can drop a process simply because it isn't fast enough.
This can seem "arbitrary", e.g., A kills B…
The GMS implements this service for everyone else.
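A minimal sketch of "agreement instead of detection", assuming a simple heartbeat timeout (the class `MembershipAgreement`, its methods, and the numeric clock are hypothetical). A member that misses its deadline is dropped from the view whether it crashed or was merely slow, and its later heartbeats are ignored until it rejoins.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MembershipAgreement:
    """Hypothetical sketch: exclusion by agreement instead of exact failure detection.

    Times are plain numbers supplied by the caller.
    """
    timeout: float
    members: Set[str] = field(default_factory=set)
    last_heard: Dict[str, float] = field(default_factory=dict)

    def join(self, pid: str, now: float) -> None:
        self.members.add(pid)
        self.last_heard[pid] = now

    def heartbeat(self, pid: str, now: float) -> bool:
        if pid not in self.members:
            return False  # dropped processes are ignored until they rejoin
        self.last_heard[pid] = now
        return True

    def tick(self, now: float) -> Set[str]:
        # Drop every member that has been silent longer than the timeout.
        slow = {p for p in self.members if now - self.last_heard[p] > self.timeout}
        self.members -= slow
        return slow


m = MembershipAgreement(timeout=2.0)
m.join("A", now=0.0)
m.join("B", now=0.0)
m.heartbeat("A", now=1.5)
print(m.tick(now=2.5))            # {'B'}: B was silent for 2.5 > 2.0
print(m.heartbeat("B", now=2.6))  # False: B may only have been slow, but it is out
```

The design choice is the one the slide describes: the system never tries to decide whether B really failed; it only needs everyone to agree that B is no longer a member.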
Architecture
Applications use replicated data for high availability.
2PC-like protocols use membership changes instead of failure notifications.
The GMS provides membership agreement in response to "join/leave" requests and "P seems to be unresponsive" reports.
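To make "membership changes instead of failure notifications" concrete, here is a hypothetical 2PC-like coordinator (the class and method names are invented for illustration). It waits for votes only from members of the current view; when the GMS installs a view that drops a participant, the coordinator stops waiting for it. Discarding the votes of dropped processes is a simplifying assumption of this sketch.

```python
from typing import Dict, Iterable, Optional


class MembershipAwareCommit:
    """Hypothetical 2PC-like coordinator driven by membership views."""

    def __init__(self, view: Iterable[str]) -> None:
        self.view = frozenset(view)
        self.votes: Dict[str, bool] = {}

    def vote(self, pid: str, ok: bool) -> None:
        if pid in self.view:  # votes from processes outside the view are ignored
            self.votes[pid] = ok

    def on_view_change(self, new_view: Iterable[str]) -> None:
        self.view = frozenset(new_view)
        # Simplification: votes from processes no longer in the view are discarded.
        self.votes = {p: v for p, v in self.votes.items() if p in self.view}

    def decision(self) -> Optional[str]:
        if not all(self.votes.values()):
            return "abort"
        if set(self.votes) == self.view:
            return "commit"
        return None  # still waiting for some members of the current view


c = MembershipAwareCommit(["A", "B", "D"])
c.vote("A", True)
c.vote("D", True)
print(c.decision())           # None: still waiting for B
c.on_view_change(["A", "D"])  # the GMS reports that B was dropped
c.vote("B", False)            # a late vote from an excluded process is ignored
print(c.decision())           # commit
```

The coordinator never asks "did B crash?"; it only reacts to the agreed view, so every surviving replica reaches the same decision.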
Architecture
[Figure: application processes A, B, C, D and GMS processes X, Y, Z. Requests to the GMS (join, B leave, join C, "A seems to have failed") produce the sequence of membership views {A}, {A,B,D}, {A,D}, {A,D,C}, {D,C}.]
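The view sequence in the figure can be replayed with a tiny sketch (the `apply` helper and the event list are hypothetical). The figure's labels do not say which processes the second join adds; B and D are inferred from the resulting view {A,B,D}. The point is that the GMS turns joins, leaves, and suspected failures into one ordered sequence of views that every member sees.

```python
from typing import FrozenSet, Iterable, List


def apply(view: FrozenSet[str], op: str, pids: Iterable[str]) -> FrozenSet[str]:
    """Compute the next membership view from one GMS event."""
    if op == "join":
        return view | frozenset(pids)
    if op in ("leave", "fail"):
        return view - frozenset(pids)
    raise ValueError(f"unknown op {op!r}")


# Events inferred from the figure; D joining alongside B is an assumption.
events = [("join", ["A"]), ("join", ["B", "D"]), ("leave", ["B"]),
          ("join", ["C"]), ("fail", ["A"])]

views: List[FrozenSet[str]] = []
current: FrozenSet[str] = frozenset()
for op, pids in events:
    current = apply(current, op, pids)
    views.append(current)

print([sorted(v) for v in views])
# [['A'], ['A', 'B', 'D'], ['A', 'D'], ['A', 'C', 'D'], ['C', 'D']]
```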
GMS API
Guess?
GMS API
Three operations:
Join(process-id, callback)
Leave(process-id)
Monitor(process-id, callback)
The GMS itself needs to be highly available.
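A toy, single-process sketch of the three operations follows. The Python names (`GroupMembershipService`, `join`, `leave`, `monitor`) mirror the slide, but the callback signatures, the returned `(view_id, members)` pair, and the extra `report_unresponsive` call (modelling "P seems to be unresponsive") are assumptions. Unlike a real GMS, this one is not replicated, so it is not highly available.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

View = FrozenSet[str]


class GroupMembershipService:
    """Toy, single-process GMS exposing Join / Leave / Monitor (hypothetical)."""

    def __init__(self) -> None:
        self.view_id = 0
        self.members: View = frozenset()
        self._view_callbacks: Dict[str, Callable[[int, View], None]] = {}
        self._monitors: Dict[str, List[Callable[[str], None]]] = {}

    def _install(self, members: View) -> None:
        # Install a new view and notify every current member.
        self.view_id += 1
        self.members = members
        for callback in list(self._view_callbacks.values()):
            callback(self.view_id, members)

    def _remove(self, pid: str) -> None:
        if pid not in self.members:
            return
        self._view_callbacks.pop(pid, None)
        self._install(self.members - {pid})
        for callback in self._monitors.pop(pid, []):
            callback(pid)  # tell monitors that pid is no longer a member

    def join(self, pid: str, callback: Callable[[int, View], None]) -> Tuple[int, View]:
        self._view_callbacks[pid] = callback
        self._install(self.members | {pid})
        return self.view_id, self.members

    def leave(self, pid: str) -> None:
        self._remove(pid)

    def monitor(self, pid: str, callback: Callable[[str], None]) -> None:
        self._monitors.setdefault(pid, []).append(callback)

    def report_unresponsive(self, pid: str) -> None:
        # Hypothetical extra call: the GMS does not verify the report,
        # it simply agrees that pid is out.
        self._remove(pid)


events = []
gms = GroupMembershipService()
gms.join("A", lambda vid, v: events.append(("A", vid, sorted(v))))
gms.monitor("B", lambda pid: events.append(("monitor", pid)))
gms.join("B", lambda vid, v: events.append(("B", vid, sorted(v))))
gms.report_unresponsive("B")
for e in events:
    print(e)
```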
Example
An example of a distributed system using the GMS is an air-traffic control system: after a process fails, the system must reconfigure itself with the remaining processes.
In other settings, such as a grid VO, membership change is a fact of life: membership may be changing dynamically.
WS-Membership: Failure Management in a Web Services World
WS-Membership, by W. Vogels and C. Re