Detection and Reaction to Unplanned Operational Events in Large Scale Distributed Real-Time Embedded Systems
Jianming Ye, Joe Loyall, Rick Schantz, Gary Duzan
BBN Technologies, Cambridge, MA
{jye, jloyall, schantz, gduzan}@bbn.com
Outline
- Distributed Real-Time Embedded (DRE) System Overview
- Unplanned Events in DRE Systems
- Multi-layer Architecture to Handle Unplanned Events
- QuO as Underlying Technology
- Conclusions
DRE System Overview
- Applications are distributed and network-centric
- Operate in dynamic environments
- Stringent QoS requirements
- Resources are constrained and shared
- Potentially large number of unplanned events
Example systems: military systems of systems, industrial production, signal analysis and geolocation, shipboard systems, disaster response systems (FEMA)
Unplanned Events in DRE Systems
Definitions:
- Unplanned events are events that are not expected to occur during normal operation and therefore cannot be specified in pre-planned mission requirements.
- Unplanned events can be further divided into two subtypes: unpredictable events and unexpected events.
- Unpredictable events are those that we can envision might happen, but for which we cannot predict when, where, or how they will manifest. Examples include hardware defects or failures, unexpected loads, and security breaches.
- Unexpected events are those that cannot even be envisioned; we normally do not know their nature in advance.
- Our discussion of unplanned events focuses primarily on unpredictable events rather than unexpected events.
Unplanned Events in DRE Systems cont.
Characteristics of unplanned events:
- Unplanned events have key symptoms that can be detected; e.g., a network hardware defect causes decreased bandwidth capacity and prolonged delay.
- Each symptom can have multiple causes. For example, diminished bandwidth capacity can be caused by hardware defects, unexpected loads, or a security breach.
- Key symptoms can lead us to the right remedies for a large number of unplanned events.
- We need to react quickly to symptoms to minimize adverse effects.
- Determining the cause can be slower and may require a larger view of the system.
- Unplanned event handling can be treated as part of the overall QoS management strategy.
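The split between reacting to a symptom and diagnosing its cause can be illustrated with a small Python sketch. It assumes a single bandwidth/delay measurement and hypothetical thresholds, symptom names, cause table (`CANDIDATE_CAUSES`) and remedy table (`RAPID_REMEDY`); none of these come from the original work. The remedy is chosen from the symptom alone, while the candidate causes are only narrowed, leaving the final diagnosis to a component with a wider view.

```python
from dataclasses import dataclass

# Hypothetical table: one symptom, several candidate causes.
CANDIDATE_CAUSES = {
    "low_bandwidth": {"hardware_defect", "unexpected_load", "security_breach"},
    "high_delay": {"hardware_defect", "unexpected_load"},
}

# Hypothetical rapid remedies keyed on the symptom alone, so they can be
# applied before any cause has been identified.
RAPID_REMEDY = {
    "low_bandwidth": "compress_data",
    "high_delay": "reduce_data_rate",
}


@dataclass
class Measurement:
    bandwidth_kbps: float
    delay_ms: float


def detect_symptoms(m: Measurement, min_bw_kbps: float = 500.0,
                    max_delay_ms: float = 100.0) -> list[str]:
    """Compare one measurement against illustrative thresholds."""
    symptoms = []
    if m.bandwidth_kbps < min_bw_kbps:
        symptoms.append("low_bandwidth")
    if m.delay_ms > max_delay_ms:
        symptoms.append("high_delay")
    return symptoms


def rapid_remedies(symptoms: list[str]) -> list[str]:
    """Fast path: pick remedies from the symptoms only."""
    return [RAPID_REMEDY[s] for s in symptoms]


def candidate_causes(symptoms: list[str]) -> set[str]:
    """Slower path: narrow the causes to those consistent with every symptom."""
    if not symptoms:
        return set()
    causes = set(CANDIDATE_CAUSES[symptoms[0]])
    for s in symptoms[1:]:
        causes &= CANDIDATE_CAUSES[s]
    return causes


symptoms = detect_symptoms(Measurement(bandwidth_kbps=200.0, delay_ms=250.0))
print(symptoms)                            # ['low_bandwidth', 'high_delay']
print(rapid_remedies(symptoms))            # ['compress_data', 'reduce_data_rate']
print(sorted(candidate_causes(symptoms)))  # ['hardware_defect', 'unexpected_load']
```

Even with two symptoms, two causes remain plausible, which is why the remedy does not wait for the diagnosis.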
Multi-layer Architecture for Unplanned Events
Overview of the Multi-layer Architecture (MLA):
- QoS management is divided into multiple layers
- Unplanned event handling is part of overall QoS management
- Rapid symptom handling at lower levels
- Diagnosis and cause analysis at higher levels
- The number of QoS management layers can scale up and down based on the size and characteristics of the DRE system
[Figure: a hierarchy of QoS managers in layers i, i+1, and i+2, with more managers at each lower layer; managers in adjacent layers exchange feedback and control.]
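The layered hierarchy in the figure can be sketched as a tree of managers in which control is pushed down and feedback is collected upward. This is a minimal Python illustration under stated assumptions: the `QoSManager` class, the `build_hierarchy` helper, the fan-out, and the policy dictionary format are all hypothetical. The number of layers is a parameter, mirroring the point that the architecture can scale up or down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QoSManager:
    """A node in a hypothetical multi-layer QoS management hierarchy."""
    name: str
    layer: int
    children: list[QoSManager] = field(default_factory=list)
    policy: dict[str, str] = field(default_factory=dict)
    local_symptoms: list[str] = field(default_factory=list)

    def push_control(self, policy: dict[str, str]) -> None:
        """Control flows down: adopt the policy, then forward it to the layer below."""
        self.policy = dict(policy)
        for child in self.children:
            child.push_control(policy)

    def collect_feedback(self) -> list[tuple[str, str]]:
        """Feedback flows up: gather (manager, symptom) reports from this subtree."""
        reports = [(self.name, s) for s in self.local_symptoms]
        for child in self.children:
            reports.extend(child.collect_feedback())
        return reports


def build_hierarchy(layers: int, fanout: int, name: str = "m") -> QoSManager:
    """Build a tree with `layers` levels; the top manager sits at layer `layers - 1`."""
    node = QoSManager(name=name, layer=layers - 1)
    if layers > 1:
        node.children = [
            build_hierarchy(layers - 1, fanout, f"{name}.{i}") for i in range(fanout)
        ]
    return node


def count_managers(node: QoSManager) -> int:
    return 1 + sum(count_managers(c) for c in node.children)


root = build_hierarchy(layers=3, fanout=2)  # layers 2, 1, 0 play the roles of i+2, i+1, i
print(count_managers(root))                 # 7
root.push_control({"traffic_class": "mission_critical"})
root.children[1].children[0].local_symptoms.append("low_bandwidth")
print(root.collect_feedback())              # [('m.1.0', 'low_bandwidth')]
```

Changing `layers` or `fanout` reshapes the hierarchy without changing the manager code, which is one way to read the scaling claim above.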
Multi-layer Architecture for Unplanned Events cont.
Each QoS manager can be divided into three logical units:
- Detection: monitors local conditions and events through probes in the system and the operating environment.
- Decision: determines adaptation strategies based on information from the detection unit, control/policy signals pushed down from higher-layer QoS managers, and feedback from lower-layer QoS managers.
- Reaction: executes the adaptation strategies and pushes them down to lower-layer QoS managers for enforcement.
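The decomposition of a manager into detection, decision, and reaction units might be sketched in Python as three small components wired together. The probe is modeled as a plain callable, and the policy keys (`min_bandwidth_kbps`, `max_delay_ms`), the `"overloaded"` feedback token, and the action names are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    bandwidth_kbps: float
    delay_ms: float


class DetectionUnit:
    """Reads local conditions through a probe (here just a callable)."""

    def __init__(self, probe: Callable[[], Observation]):
        self.probe = probe

    def detect(self) -> Observation:
        return self.probe()


class DecisionUnit:
    """Combines local observations, higher-layer policy and lower-layer feedback."""

    def decide(self, obs: Observation, policy: dict, feedback: list) -> list:
        actions = []
        if obs.bandwidth_kbps < policy.get("min_bandwidth_kbps", 0.0):
            actions.append("reduce_data_rate")
        if obs.delay_ms > policy.get("max_delay_ms", float("inf")):
            actions.append("compress_data")
        if "overloaded" in feedback:
            actions.append("shed_low_priority_load")
        return actions


class ReactionUnit:
    """Pushes each chosen action down to lower-layer managers for enforcement."""

    def __init__(self, enforce: Callable[[str], None]):
        self.enforce = enforce

    def react(self, actions: list) -> None:
        for action in actions:
            self.enforce(action)


class QoSManager:
    """One manager built from the three logical units."""

    def __init__(self, detection: DetectionUnit, decision: DecisionUnit,
                 reaction: ReactionUnit):
        self.detection = detection
        self.decision = decision
        self.reaction = reaction

    def step(self, policy: dict, feedback: list) -> list:
        obs = self.detection.detect()
        actions = self.decision.decide(obs, policy, feedback)
        self.reaction.react(actions)
        return actions


enforced = []
manager = QoSManager(
    DetectionUnit(lambda: Observation(bandwidth_kbps=300.0, delay_ms=40.0)),
    DecisionUnit(),
    ReactionUnit(enforced.append),
)
print(manager.step({"min_bandwidth_kbps": 500.0, "max_delay_ms": 100.0}, ["overloaded"]))
# ['reduce_data_rate', 'shed_low_priority_load']
```

Keeping the units separate means the detection probe or the enforcement path can be swapped without touching the decision logic.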
Multi-layer Architecture for Unplanned Events cont.
Handling unplanned events (step, with example):
1. Detection units probe for symptoms of unplanned events. Example: decreased bandwidth capacity, prolonged delay.
2. Decision and reaction units react rapidly to the symptoms of unplanned events based on local information. Example: compress data and/or reduce the data rate.
3. Higher-layer manager(s) determine a corrective strategy to bring the system back to normal operation. Example: if the cause is a defective router, a higher-level manager may redirect the traffic through a different path; if the cause is external load, it may elevate the privilege of the intended traffic.
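The three handling steps can be traced end to end in a short Python sketch. It assumes the higher-layer manager has a system-wide view summarized by a hypothetical `PathStatus` (router health and external load fraction); the thresholds, the load cut-off of 0.8, and the strategy names are illustrative assumptions, not values from the original work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathStatus:
    """Hypothetical system-wide view available only to the higher-layer manager."""
    router_ok: bool
    external_load: float  # fraction of link capacity consumed by other traffic


def local_reaction(bandwidth_kbps: float, delay_ms: float,
                   min_bw_kbps: float = 500.0, max_delay_ms: float = 100.0) -> list:
    """Steps 1-2: detect symptoms and react at once using local information only."""
    actions = []
    if bandwidth_kbps < min_bw_kbps:
        actions.append("compress_data")
    if delay_ms > max_delay_ms:
        actions.append("reduce_data_rate")
    return actions


def corrective_strategy(status: PathStatus, load_threshold: float = 0.8) -> str:
    """Step 3: the higher-layer manager diagnoses the cause and picks a correction."""
    if not status.router_ok:
        return "redirect_via_alternate_path"
    if status.external_load > load_threshold:
        return "elevate_traffic_privilege"
    return "keep_monitoring"


def handle(bandwidth_kbps: float, delay_ms: float,
           status: PathStatus) -> tuple[list, Optional[str]]:
    quick = local_reaction(bandwidth_kbps, delay_ms)
    # Escalate to the higher layer only when a symptom was actually detected.
    correction = corrective_strategy(status) if quick else None
    return quick, correction


print(handle(200.0, 250.0, PathStatus(router_ok=False, external_load=0.3)))
# (['compress_data', 'reduce_data_rate'], 'redirect_via_alternate_path')
print(handle(200.0, 50.0, PathStatus(router_ok=True, external_load=0.95)))
# (['compress_data'], 'elevate_traffic_privilege')
```

The same symptom (low bandwidth) yields the same quick reaction in both calls, while the corrective strategy differs with the diagnosed cause.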
QuO as Underlying Technology
QuO middleware framework:
- Works with CORBA and Java RMI
- Provides measurement, control, and contract-driven adaptation
- Provides a QoS behavior encapsulation model, called Qosket
- CORBA Component Model (CCM) support
QuO as Underlying Technology cont.
Using QuO in MLA (MLA element: QuO components):
- Detection unit: system condition objects
- Decision unit: contracts, controllers, resource managers
- Reaction unit: callback objects, delegates
- Multi-layer feedback/control
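The mapping of MLA units onto QuO components can be pictured with a Python analog. QuO itself defines contracts in its own specification language and works with CORBA and Java RMI, so the classes below (`SystemCondition`, `Contract`, `SenderDelegate`) are hypothetical stand-ins that only mimic the roles listed above: a system condition object supplies a measurement, a contract selects an operating region from it, callbacks fire on region transitions, and a delegate adapts each call to the current region. Region names and the 500 kbps boundary are assumptions.

```python
from __future__ import annotations

import zlib
from typing import Callable, Optional


class SystemCondition:
    """Stand-in for a QuO system condition object: exposes one measured value."""

    def __init__(self, name: str, value: float):
        self.name = name
        self.value = value


class Contract:
    """Stand-in for a QuO contract: ordered regions, first matching predicate wins.

    The last region should be a catch-all so that some region always matches.
    """

    def __init__(self, regions: list[tuple[str, Callable[[], bool]]]):
        self.regions = regions
        self.current: Optional[str] = None
        self._callbacks: list[Callable[[Optional[str], str], None]] = []

    def on_transition(self, callback: Callable[[Optional[str], str], None]) -> None:
        self._callbacks.append(callback)

    def evaluate(self) -> str:
        new = next(name for name, predicate in self.regions if predicate())
        if new != self.current:
            old, self.current = self.current, new
            for callback in self._callbacks:  # reaction on region transition
                callback(old, new)
        return new


class SenderDelegate:
    """Stand-in for a QuO delegate: adapts each call to the current region."""

    def __init__(self, contract: Contract, send: Callable[[bytes], None]):
        self.contract = contract
        self._send = send

    def send(self, payload: bytes) -> None:
        if self.contract.evaluate() == "Degraded":
            payload = zlib.compress(payload)  # adapt: trade CPU time for bandwidth
        self._send(payload)


bandwidth = SystemCondition("bandwidth_kbps", 1000.0)
contract = Contract([
    ("Normal", lambda: bandwidth.value >= 500.0),
    ("Degraded", lambda: True),  # catch-all region
])
transitions = []
contract.on_transition(lambda old, new: transitions.append((old, new)))

sent = []
delegate = SenderDelegate(contract, sent.append)
delegate.send(b"frame" * 100)  # Normal region: sent as-is
bandwidth.value = 200.0        # the probe reports a drop in bandwidth
delegate.send(b"frame" * 100)  # Degraded region: sent compressed
print(transitions)  # [(None, 'Normal'), ('Normal', 'Degraded')]
```

Callbacks fire only when the region changes, so repeated evaluations in the same region do not retrigger the reaction.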
Conclusions
- The Multi-layer Architecture provides the flexibility to handle unplanned events at multiple layers
- Unplanned event handling using this architecture is analogous to exception handling in various programming languages
- Various QoS management approaches, from centralized to highly distributed and combined strategies in between, can be accommodated within the same architecture
- QuO adaptive middleware supports key functionalities of the architecture
For more info visit: http://quo.bbn.com
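The exception-handling analogy can be made concrete with a Python sketch in which a detected symptom is raised as an exception, the lower-layer manager applies a quick local remedy and re-raises, and the higher-layer manager catches it to perform diagnosis. The `UnplannedEvent` type, the 500 kbps threshold, and the logged actions are hypothetical; the sketch only illustrates the analogy, not how the architecture is implemented.

```python
class UnplannedEvent(Exception):
    """Hypothetical exception type carrying a detected symptom."""

    def __init__(self, symptom: str):
        super().__init__(symptom)
        self.symptom = symptom


def transmit(bandwidth_kbps: float) -> str:
    """Application-level operation; the probe threshold is illustrative."""
    if bandwidth_kbps < 500.0:
        raise UnplannedEvent("low_bandwidth")
    return "sent"


def lower_layer_manager(bandwidth_kbps: float, log: list) -> str:
    try:
        return transmit(bandwidth_kbps)
    except UnplannedEvent as event:
        # Handle what can be handled locally and quickly...
        log.append(f"lower layer: {event.symptom} -> compress_data")
        # ...then propagate, like a re-raised exception, for diagnosis above.
        raise


def higher_layer_manager(bandwidth_kbps: float, log: list) -> str:
    try:
        return lower_layer_manager(bandwidth_kbps, log)
    except UnplannedEvent as event:
        log.append(f"higher layer: diagnose cause of {event.symptom} -> reroute")
        return "recovered"


log = []
print(higher_layer_manager(200.0, log))  # recovered
print(log)
```

As with nested exception handlers, any layer may stop the propagation once it can fully resolve the event.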