Presentation is loading. Please wait.

Presentation is loading. Please wait.

FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale

Similar presentations


Presentation on theme: "FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale"— Presentation transcript:

1 FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale gokhale@dre.vanderbilt.edu (Co-Advisor) Dr. Douglas C. Schmidt schmidt@dre.vanderbilt.edu (Advisor) Jaiganesh Balasubramanian jai@dre.vanderbilt.edu http://www.dre.vanderbilt.edu/~jai Department of Electrical Engineering and Computer Science Vanderbilt University, Nashville, TN, USA Middleware 2007 Doctoral Symposium (MDS 2007) Newport Beach, CA, USA

2 2 Stringent simultaneous QoS demands, e.g., “never die,” soft real-time, etc. predominantly stateless, tolerates weaker consistency if stateful Distributed Object Computing middleware used to design and develop DRE systems support for highly available systems (e.g., FT-CORBA) end-to-end predictable behavior for requests (e.g., RT-CORBA) Focus: Distributed Real-time Embedded (DRE) Systems Goal is to provide real-time fault tolerance to DRE systems FT uses redundancy; RT assured by resource management

3 3 Active replication client requests multicast and executed at all the replicas strong state consistency deterministic behavior of replicas very fast recovery resource-expensive Passive replication low resource/execution overhead better suited for weaker consistency no restrictions on deterministic behavior enables making tradeoffs between FT and resource consumption applies to a class of soft real-time DRE systems Determining the Replication Scheme for DRE Systems Passive replication better suited for our purpose Goal is to provide RT+FT for DRE systems using passive replication

4 4 Challenges: Using Passive Replication for DRE Systems Challenge 1: Maintain real-time performance of applications at all times Focus: Real-time performance after failover Decision-making algorithms used for electing a new primary Client response times depend on the loads of the processor hosting the failover target Task deadlines are met if the CPU utilization is under a threshold Failure could affect multiple clients – failover to multiple processors

5 5 Challenges: Using Passive Replication for DRE Systems Challenge 2: Fast failover on client side Focus: Faster and predictable failover Client-side middleware could maintain static list of references Round-robin approach of trying out different references Faster failover – but not appropriate failover No RT guarantee after failover Client-side middleware need to be updated with references based on dynamic operating conditions

6 6 Challenges: Using Passive Replication for DRE Systems Challenge 3: FT+RT in spite of resource overloads Focus: Dynamic reconfigurations and overload management Long running systems – continued operation through many failures Periodic loss of resources – simultaneous failures Graceful degradation of applications Operate higher priority applications at all times Overload management – predictable and fast Alternate degraded and assured functionality

7 7 Challenges: Using Passive Replication for DRE Systems Challenge 4: Resource-aware stateful replication Focus: State consistency in stateful DRE systems State transfer requires CPU and network reservations Support for resource-constrained operations Different consistency models – strong, weak, and no consistency Adapt consistency of certain tolerant applications depending on available resources Utility optimizations – better state consistency for higher priority applications

8 8 Our Approach: FLARe RT-FT Middleware FLARe = Fault-tolerant Lightweight Adaptive Real-time Middleware Transparent and Fast Failover Redirection using client-side portable interceptors catches COMM_FAILURE exceptions and transparently throws LOCATION_FORWAR D exceptions Failure detection can be improved with better protocols – e.g., SCTP

9 9 Our Approach: FLARe RT-FT Middleware Real-time performance after failover monitor CPU utilizations at hosts where backups are deployed adaptive failover target selection algorithms operated by a resource manager failover targets chosen on the least loaded host hosting the backups better chance to provide RT performance

10 10 Our Approach: FLARe RT-FT Middleware Predictable failover failover target decisions computed periodically by the resource manager conveyed to client-side middleware agents – forwarding agents agents work in tandem with portable interceptors redirect clients quickly and predictably to appropriate targets agents periodically/proactively updated when targets change

11 11 Current Progress Initial prototype of FLARe developed using The ACE ORB (TAO) Stateless FT using passive replication Implemented a resource- aware adaptive failover target selection algorithm Compared and contrasted the performance of the FT middleware when using static failover strategies versus adaptive failover strategies Significant reduction in client response times and system utilization Current Progress FLARe is open-source and available at www.dre.vanderbilt.edu

12 12 Proposed Research and Expected Milestones Overload management investigate overload management algorithms that do not degrade application QoS minimum client disturbance implemented within the resource manager extreme resource constrained operating conditions – investigate opportunities to change implementations and reduce overloads Utility optimizations – when to degrade QoS and when not to RT/FT trade-offs Deadline : March 2008

13 13 Proposed Research and Expected Milestones State Synchronization View state synchronization as an aperiodic scheduling problem more slack available – more time available to synchronize state slack devoted for higher priority applications always availability of slack – support for different consistency management schemes (e.g., weak, strong, none) application informs middleware when to synchronize state Deadline : July 2008

14 14 Proposed Research and Expected Milestones Network Reservations View real-time fault- tolerance as an end-to- end scheduling problem network reservations are required for state transfers without reservations, no predictability middleware-mediated mechanisms to use external network QoS mechanisms such as DiffServ network monitors for alternate routes in the presence of failures (leverage existing network research) Deadline : September 2008

15 15 Concluding Remarks Passive replication – a promising approach for DRE systems Resource-aware adaptive fault-tolerance – required for adapting passive replication for DRE system requirements Adaptive algorithms required for trading off RT versus FT requirements Middleware transparently supports FT for applications – works in conjunction with adaptive algorithms to take care of RT requirements as well FLARe is open-source and available at www.dre.vanderbilt.edu


Download ppt "FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale"

Similar presentations


Ads by Google