FT-ERF Fault-Tolerance in an Event Rule Framework for Distributed Systems Hillary Caituiro-Monge, Graduate Student. Advisor: Javier Arroyo-Figueroa, Ph.D. Presentation 3
Presentation Objectives
- Understand the scalable and fault-tolerant ERF architecture
- Describe the challenges of active replication
- Analyze the core deficiencies among RUBIES replicas that must be resolved to achieve fault tolerance:
  - Lack of timing synchronization of Rule Evaluation Cycles (REC)
  - Lack of consistency of Event Sets (ES)
- Distributed agreement protocol
Presentation Objectives
- Introduce the new research objective
SCALABLE AND FAULT TOLERANT ERF ARCHITECTURE DISTRIBUTION DIMENSION REPLICATION DIMENSION
REPLICATION CLASS DIAGRAM
DISTRIBUTION CLASS DIAGRAM
Challenges of Active Replication
- Strong replica consistency
  - All replicas must have the same state after method invocations
- Duplicate invocation detection and suppression
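Duplicate invocation detection and suppression can be illustrated with a minimal sketch. It assumes every logical invocation carries an identifier shared by all replicas that forward it; the `DuplicateSuppressor` class, the `(client, sequence number)` identifier, and the `increment` operation are hypothetical names chosen for illustration and are not part of ERF, RUBIES, or any CORBA product.

```python
class DuplicateSuppressor:
    """Executes each logical invocation once and caches its result (illustrative only)."""

    def __init__(self):
        self._results = {}  # invocation_id -> cached result

    def invoke(self, invocation_id, operation, *args):
        # A repeated invocation_id means another replica already forwarded this call,
        # so the cached reply is returned instead of executing the operation again.
        if invocation_id in self._results:
            return self._results[invocation_id]
        result = operation(*args)
        self._results[invocation_id] = result
        return result


counter = {"value": 0}


def increment(amount):
    """Hypothetical state-changing operation on the server object."""
    counter["value"] += amount
    return counter["value"]


suppressor = DuplicateSuppressor()
# Three replicas of the same client forward the same logical invocation ("client-1", 7).
replies = [suppressor.invoke(("client-1", 7), increment, 5) for _ in range(3)]
```

All three replicas receive the same reply while the server state changes only once, which is the property active replication needs from this mechanism.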
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas
- The REC is a source of non-deterministic behavior among RUBIES replicas
- It is not triggered in response to a client’s direct or indirect method invocation
- It is always running
- Therefore, replica consistency cannot be reached through interface-based consistency mechanisms
Lack of Timing Synchronization of Rule Evaluation Cycles (REC) among RUBIES replicas
- Each replica in a group runs its own independent REC, so across replicas:
  - The starting time differs
  - The duration differs
- As a result, each replica runs each REC over a different set of events
Lack of Consistency of Event Sets (ES) among RUBIES replicas
- It is a source of non-deterministic behavior among RUBIES replicas
- The content of the ES changes differently on each replica
- The content of the ES changes for two reasons:
  - Incoming events
  - Expired events
Lack of Consistency of Event Sets (ES) among RUBIES replicas
- The content of the ES changes differently on each replica because events reach each replica after different communication delays
What is the problem?
- Replicas belonging to the same group include different events in equivalent consecutive REC executions
- As a result, each RUBIES replica posts different events, at different times and with different state
- Such behavior is a problem for load distribution and replication
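The divergence described above can be reproduced with a small deterministic simulation. This is only a sketch under stated assumptions: the event names, posting times, delivery delays, and REC timings are invented, and a REC is assumed to consider every event that has arrived by the time the cycle ends.

```python
def events_seen(arrival_times, rec_start, rec_duration):
    """Return the IDs of events that arrived before this REC finished (assumed semantics)."""
    cutoff = rec_start + rec_duration
    return {event_id for event_id, t in arrival_times.items() if t <= cutoff}


# Hypothetical events and the time (in ms) at which each one is posted.
posted = {"e1": 0, "e2": 10, "e3": 20}

# Hypothetical replicas of one group: each has its own delivery delay,
# its own REC starting time, and its own REC duration.
replicas = {
    "r1": {"delay": 1, "start": 5, "duration": 10},
    "r2": {"delay": 8, "start": 0, "duration": 12},
}

event_sets = {}
for name, cfg in replicas.items():
    # The same event reaches each replica at a different time.
    arrivals = {event_id: t + cfg["delay"] for event_id, t in posted.items()}
    event_sets[name] = events_seen(arrivals, cfg["start"], cfg["duration"])
```

With these values, replica `r1` evaluates its cycle over `e1` and `e2`, while `r2` evaluates the equivalent cycle over `e1` only, so the two replicas may fire different rules.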
What is the issue?
- Strong replica consistency
  - Synchronize rule evaluation cycles among RUBIES replicas
  - Make event sets consistent among RUBIES replicas
How to do it?
- Distributed agreement (consensus) protocol (work in progress)
- RUBIES replicas must start each REC only after reaching agreement
  - Each REC must have a unique ID
  - RECs with the same ID must run simultaneously
How to do it?
- Distributed agreement (consensus) protocol (work in progress)
- RUBIES replicas must include the same events in RECs with the same ID
  - The agreement must specify which events to consider
  - Sliding window
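One possible shape of such an agreement round is sketched below. It is not the protocol under development: it assumes no replica fails during the round, that the agreed set is the intersection of what every replica has received, and that the sliding window is a range of event timestamps. The function names `agree_on_event_set` and `run_round` and all data are hypothetical.

```python
def agree_on_event_set(proposals, window_start, window_size):
    """Decide one event set for a REC from every replica's pending events.

    proposals maps replica name -> {event_id: timestamp}; it must not be empty.
    """
    window_end = window_start + window_size
    in_window = [
        {eid for eid, ts in pending.items() if window_start <= ts < window_end}
        for pending in proposals.values()
    ]
    # Assumed decision rule: keep only the events that every replica has received.
    return set.intersection(*in_window)


def run_round(rec_id, proposals, window_start, window_size):
    """Give every replica the same (REC ID, event list) pair for this cycle."""
    decided = agree_on_event_set(proposals, window_start, window_size)
    # Sorting fixes a common evaluation order across replicas.
    return {name: (rec_id, sorted(decided)) for name in proposals}


# Hypothetical pending events; replica r2 has not received e3 yet.
proposals = {
    "r1": {"e1": 0, "e2": 10, "e3": 20},
    "r2": {"e1": 0, "e2": 10},
    "r3": {"e1": 0, "e2": 10, "e3": 20},
}
round_1 = run_round(1, proposals, window_start=0, window_size=15)

# e3 reaches r2 later; the window slides forward for the next REC.
proposals["r2"]["e3"] = 20
round_2 = run_round(2, proposals, window_start=15, window_size=15)
```

In this sketch every replica runs REC 1 over `e1` and `e2` and REC 2 over `e3`. A real protocol would also need to handle replica failures and events that a replica receives only after the window has moved past them.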
New Research Objective
- The proposed research will focus on the fault-tolerance problem in ERF
- The main purpose is to design and implement a strong replica consistency mechanism to achieve fault tolerance
Procedure
- Select active replication software
  - Must be compatible with Fault-Tolerant CORBA
  - Must be portable
  - Must not be intrusive
  - Must not be commercial
- Build a distributed agreement protocol related to the above
OGS (Object Group Service)
- Non-intrusive
- Service approach
  - Requires no change to the underlying ORB
  - Compliant with the CORBA specification
- Not proprietary
- Designed and implemented as a set of CORBA objects, which makes it interoperable between different ORBs
- There are plans to extend OGS and make it compliant with the FT-CORBA specification
- White box
Eternal Systems FTORB
- Non-intrusive
- Interception approach
- CORBA objects above the ORB support the interfaces of the OMG Fault-Tolerant standard specifications
- Replication mechanisms below the ORB provide strong replica consistency
- Interceptors provide independence from the ORB and the applications
Others
- GMS (Group Communication Service)
- IRL
- Isis+Orbix
- Electra
- AQuA
Comparison among Fault-Tolerant CORBA systems
Carlo Marchetti et al., “Architectural Issues on Fault Tolerance in CORBA”, in Proceedings of the SSGRR 2000 Computer & Business Conference, L'Aquila, Italy, 2000
Conclusion
- Fault tolerance in ERF requires the design and implementation of an agreement protocol to achieve strong replica consistency
- Strong replica consistency will enable ERF for distributed scenarios such as replication, load distribution, and load balancing
Thanks