Published by Carl Brodersen. Modified over 6 years ago.
1
Principles and Patterns for QoS-enabled Fault Tolerant Middleware
Aniruddha Gokhale Bell Labs, Lucent Technologies
2
Motivating Technology Forces
Hardware is becoming faster & cheaper
Ubiquitous & affordable wireless/wireline broadband internet connectivity
Distributed real-time & embedded (DRE) systems are becoming more complex & mission-critical
Increasing demand for COTS-based multi-dimensional quality-of-service (QoS) support, e.g., simultaneous requirements for efficiency, predictability, scalability, security, & dependability
Figure labels: Total Ship C&C Center; Total Ship Computing Environments; IOM; BSE; Modalities, e.g., MRI, CT, CR, Ultrasound, etc.
3
QoS-enabled dependability
Key open challenge: QoS-enabled dependability
Figure labels: IOM; BSE
4
Research Synopsis
Create the new generation of middleware technologies for distributed real-time & embedded (DRE) systems that enable simultaneous control of multiple QoS properties & composable & customizable common technology bases
Figure labels: Hardware; Middleware; OS & Protocols; Applications
Research Challenge | Research Approach | Impact
Domain-specific middleware services shielding lower-level middleware | Model description; code generation mapping to lower-level middleware | Model-driven approach for network mgmt of 3G wireless components
Standardized middleware for fault tolerance | Influence standards body from experience with prototype | Fault Tolerant middleware Standard
High-Performance Middleware | Iterative process involving benchmarking, quantifying/profiling, and optimizing | QoS Enabled Middleware
Robust Middleware | Based on patterns | Dependable, Extensible
Andy, add a slide here that summarizes concisely what you are working on and why it represents a research contribution. I’ve cut&pasted an example from later in your talk and from my DARPA pitch, which isn’t perfect for what you’re trying to do but it’s a starting point. Please modify this so that it summarizes what you’re working on and what you’ve contributed thus far and its impact.
5
FT CORBA Contributions and Impact
Influenced the Fault Tolerant CORBA standard by incorporating principles and patterns from a prototype called DOORS
Patterns-based optimizations in DOORS provide high performance & predictability to DRE systems
Andy, it’s not clear from this what your research contributions are. Can you please make this more clear? I recommend that you consider not having this slide unless you can really show how it demonstrates your research contributions. DONE – is this alright?
6
Overview of Fault Tolerant CORBA
Provides a standard set of CORBA interfaces, policies, & services
Entity redundancy of objects is used for fault tolerance via replication, fault detection, & recovery from failure
Features: Inter-Operable Group References (IOGR); Replication Manager; Fault Detector & Notifier; Message Logging for recovery; Fault tolerance Domains
Andy, please revise all your diagrams to use a different font (e.g., arial rather than times roman) since it’s hard to read the text in all these figures. OK, will do later today
7
FT CORBA Use Case Scenario
Base stations (Node B) manage several thousand subscribers modeled as objects
Radio Network Controllers manage several Node Bs modeled as objects requiring high reliability and availability (3 min per year downtime)
Andy, the following slides are interesting, but they don’t really help to convey what *you’ve* done, but rather show that you understand how the FT CORBA spec works. I’m not sure what the right approach is to fix this, but you need to figure out how to emphasize the research dimensions of what you’re doing, since otherwise it comes across too much like an “implementation” effort, rather than a research effort.
Doug, I want to show this as a use-case scenario and then show that the results presented on next slide do not suffice to support such systems. So we The optimizations.
8
Effect of Polling Interval on Total Recovery Time
Recovery Time = Failure detection time + Replica Group Management time
Analysis: Average failure detection time is half the polling interval; Replica Group Management time is constant
Key Challenge: Minimize replica group management time; setting the right polling interval
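The relationship on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the sample values are assumptions, not measurements from the talk, and the "half the polling interval" figure assumes failures occur uniformly at random within a polling period.

```python
def average_detection_time(polling_interval: float) -> float:
    """Expected delay before the next poll notices a failure,
    assuming the failure time is uniform within the polling period."""
    return polling_interval / 2.0


def average_recovery_time(polling_interval: float, group_mgmt_time: float) -> float:
    """Recovery time = failure detection time + replica group management time."""
    return average_detection_time(polling_interval) + group_mgmt_time


if __name__ == "__main__":
    # Hypothetical values in seconds, chosen only for illustration.
    for interval in (0.1, 1.0, 10.0):
        print(interval, average_recovery_time(interval, group_mgmt_time=0.25))
```

Because the group management term is constant, it dominates when the polling interval is short, which is why both terms appear under the key challenge.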
9
Optimization Opportunities to Improve Fault Tolerant CORBA Performance
ORB Core Optimizations: Efficient IOGR parsing & connection establishment; Reliable handling & ordering of GIOP messages; Predictable behavior during transparent connection establishment & retransmission; Tracking requests with respect to the server object group
CORBA Service Optimizations: Support for dynamic system configuration; Bounded recovery time; Minimize overhead of FT CORBA components
Andy, it seems to me that one way you could position your research is in terms of “improving the performance of middleware by applying patterns and optimization principles.” If so, then I recommend putting this slide much earlier, i.e., right after you introduce the notion of FT-CORBA. Then, in your “Future work” section you could apply the same argument to what you’d like to work on at VU, i.e., “improving the performance of model-generated middleware by applying patterns and optimization principles.”
Doug, I will not present the “interactions in FT CORBA” stuff. That way, this slide will immediately follow the result
10
Overview of Patterns
Patterns formalize expert knowledge to help generate software architectures by capturing recurring structures & dynamics and resolving common design forces
Design patterns capture the static & dynamic roles & relationships in solutions that occur repeatedly
Architectural patterns express a fundamental structural organization for software systems that provide a set of predefined subsystems, specify their relationships, & include the rules and guidelines for organizing the relationships between them
Optimization principle patterns document rules for avoiding common design & implementation mistakes that degrade performance
11
Challenge: Predictable and Scalable Fault Monitoring
Figure labels: Fault Detector; Polling Thread; Replica; Fault Notifier
Naïve implementation of Polling Thread:
    while (true) {
        for each obj to be monitored {
            poll the object;                 // can hang
            if (no response) {
                report fault to notifier;    // can hang
            }
        }
        sleep (polling_interval);
    }
Allocate new thread per monitored object?
Andy, there’s a big disconnect with your talk here. You need to motivate why you are showing this stuff now, i.e., is it to illustrate that you understand how to apply patterns to DOORS (if so, what’s the research contribution)? Make sure that you motivate this stuff properly in terms of its research impact.
12
Predictable & Scalable Fault Monitoring
Context: Periodic polling of several objects & fault notification done in the same polling thread can block the thread
Problem: Blocking can cause missed polls
Forces: Must guarantee polling of all registered objects; must minimize concurrency overhead; must be scalable
Solution: Leader/Followers and Asynchronous Completion Token (ACT) architectural patterns
Figure label: AMI
Andy, make sure that you can address the question that Sandeep asked you about “scalability” here...
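The forces on this slide can be illustrated with a simplified sketch. It is not the Leader/Followers implementation used in DOORS: a plain `ThreadPoolExecutor` stands in for the pattern, and the function name `poll_round`, the liveness callables, and the timeout are all hypothetical. The mapping from each pending poll back to its object id plays the role of a completion token, and the bounded wait keeps one hung replica from delaying the polls of the others.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def poll_round(
    monitored: Dict[str, Callable[[], bool]],
    timeout: float,
    pool: ThreadPoolExecutor,
) -> List[str]:
    """Poll every registered object once and return the ids considered faulty.

    Each poll runs on a pool thread, so a hung poll only occupies one
    worker. The future-to-id mapping acts as the completion token that
    ties each reply (or missing reply) back to its object.
    """
    tokens = {pool.submit(is_alive): obj_id for obj_id, is_alive in monitored.items()}
    done, not_done = wait(list(tokens), timeout=timeout)

    faulty = [tokens[f] for f in not_done]  # no reply within the timeout
    for f in done:
        # A poll that raised or answered "not alive" is also a fault.
        if f.exception() is not None or not f.result():
            faulty.append(tokens[f])
    return sorted(faulty)
```

A caller would invoke `poll_round` once per polling interval and forward the returned ids to the fault notifier from a separate thread, so that notification cannot block the next round.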
13
Challenge: Scalable, Prioritized Fault Recovery
Context: Replication Manager (RM) needs to handle bursts of fault reports; handle faults based on importance
Problem: Reduced responsiveness; wasted resources during normal conditions; lack of handling faults based on importance
Forces: Predictable amount of time for failure recovery irrespective of burst size; prioritize fault handling
Solution: Apply the Thread Pool Active Object design pattern
Andy, you’ve got too many slides for a 1 hour talk, so I recommend zapping the previous example and just using this example. For one thing, it’s different than the example you gave in STL, so that’ll demonstrate that you’ve got a broader focus…
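A minimal sketch of the idea behind this solution follows, assuming a fixed pool of worker threads that drain a priority queue of fault reports. The class name `PrioritizedFaultHandler`, its methods, and the convention that lower numbers mean higher priority are illustrative assumptions, not the DOORS API.

```python
import queue
import threading
from typing import Callable, List, Tuple


class PrioritizedFaultHandler:
    """Fixed pool of workers draining a priority queue of fault reports."""

    def __init__(self, handle: Callable[[str], None], workers: int = 2) -> None:
        self._queue: "queue.PriorityQueue[Tuple[int, int, str]]" = queue.PriorityQueue()
        self._handle = handle
        self._seq = 0  # tie-breaker: keeps FIFO order within one priority
        self._lock = threading.Lock()
        self._threads: List[threading.Thread] = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self) -> None:
        for t in self._threads:
            t.start()

    def report(self, priority: int, fault: str) -> None:
        """Enqueue a fault report; lower numbers are handled first."""
        with self._lock:
            self._seq += 1
            seq = self._seq
        self._queue.put((priority, seq, fault))

    def drain(self) -> None:
        """Block until every queued report has been handled."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            _, _, fault = self._queue.get()
            try:
                self._handle(fault)
            finally:
                self._queue.task_done()
```

The pool size bounds the resources held during normal conditions, while the queue absorbs bursts and orders them by importance rather than by arrival time.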
14
Challenges using QoS-enabled Middleware
Domain-Specific Middleware Services: Network mgmt, SIP, location-based services, web hosting services
Common Middleware Services (Distributed Services): Fault tolerance, Notification, Naming, Event, Logging, Transaction, etc.
Distribution Middleware (QoS-enabled Distribution Middleware): RT/FT CORBA, RT Java, .NET
Infrastructure Middleware (platform independent): e.g., ACE, Java VMs, RogueWave, Microsoft CLR
Operating Systems & Protocols: Realtime OS, e.g., VxWorks, PSOS; General OS, e.g., Solaris, Win2K, Linux
Hardware: IntServ/DiffServ using COTS networks
Andy, I like this chart, though it doesn’t really explain what the requirements & challenges are! Maybe you could have two charts, the first of which shows the MDA layers of applicability (i.e., this chart) and the second of which shows the MDA requirements and challenges…
15
Challenges using QoS-enabled Middleware
Configuring middleware QoS parameters is hard
Needs expert understanding of middleware layers and their APIs
Big learning curve for application developers
Applications get tightly coupled with the underlying middleware technology
Figure labels: Meta programming; Domain-Specific Middleware Services; Common Middleware Services; Distribution Middleware; Infrastructure Middleware; Operating Systems & Protocols; Hardware
Andy, I like this chart, though it doesn’t really explain what the requirements & challenges are! Maybe you could have two charts, the first of which shows the MDA layers of applicability (i.e., this chart) and the second of which shows the MDA requirements and challenges…
16
Challenge: Resolving Economic Forces
Context; Solution
Reduce/amortize costs associated with existing product lifecycles
Less spending by customers => Shrinking revenues and profit margins
Maintaining competitive advantage and higher returns on investment
Rapid time-to-market of high quality products
Plethora of competing technologies, e.g., CORBA CCM, EJB/RMI, COM+, XML/SOAP
Platform independent, highly interoperable design
Dealing with disruptive technologies or unprecedented hype
Minimal effort in incorporation of upcoming/new enabling technologies
Andy, if you’re giving a talk for the ISIS group alone this slide is fine. If you’re giving a faculty job talk, then don’t use this slide since it’s considered “crass” to talk about “economics” in an academic environment… OK.
17
Candidate Solution: Model Driven Architecture
Consequence of codifying years of R&D efforts in: Meta-programming techniques, e.g., UML/XML; Model-integrated computing; Middleware-centric pattern languages; Platform independence, interoperability; Sophisticated compiler techniques & code generation tools
Open R&D question: How can we apply MDA to DRE systems most effectively?
18
Challenges using MDA Problems
Models used primarily for feasibility and system schedulability analysis
High software lifecycle costs, e.g., mapping models to platforms incurs many “accidental complexities” & low-level platform dependencies
Andy, the next several slides are cool, but they don’t explain what *you* are doing in your research. Therefore, I recommend you remove them and/or merge them into a single slide and use it for motivation. For example, maybe you could replace the various overlays on your “Motivating Technology Forces” slide with some of the icons in these slides.
19
AMW Project: MDA in Action
Aurora library management software Aurora generated management code Application-specific management s/w Application non-management software . Code Generator input specification (1) (2) (3) (4) Software libraries provide application-independent management building blocks Code generators from input model descriptions customize building blocks for specific applications Application developers implement application-specific management building blocks Building blocks cooperate to implement management functionality Andy, it’s not really clear how this stuff relates to research topics. If you can use it to motivate research that’s great, otherwise, I recommend omitting it.
20
Aurora Management Workbench
Aurora library management software Aurora generated management code Application-specific management s/w Application non-management software . Code Generator input specification (1) (2) (3) (4) Contributions Fault Escalation strategies during initialization and recovery Component status querying Event Notification Dynamic growth/degrowth Impact Management of network elements in 3G (UMTS) wireless architecture network element = self-contained distributed system hardware = processors + communication network software = network element application + management application network element application distributed, cooperating software processes that interact via message passing to provide a service to network element users management application distributed software processes that cooperatively manage the network element application “management layer”
21
Crossing the Chasm
Model-driven multidimensional QoS management (performance, scalability, predictability, dependability, security) in heterogeneous networks
Possible Application Domains: Defense; Transportation; Healthcare; Finance; Consumer Products
Andy, you might be able to use the first bullet items here in the “Research Synopsis” slide at the beginning of this talk to explain what you’re doing.
22
Concluding Remarks & Future Work
Model Driven Architecture intends to resolve the technological and economic forces, however … Researchers & developers of distributed systems face common challenges, e.g.: Connection management, service initialization, error handling, flow control, event demuxing, distribution, concurrency control, fault tolerance, synchronization, scheduling, & persistence The application of patterns, frameworks, & components can help to resolve these challenges Carefully applying these techniques can yield efficient, scalable, predictable, dependable, & flexible middleware & applications Key challenge Patterns bridging the gap between models and middleware Model-driven multidimensional QoS management
23
EXTRA SLIDES
24
Concluding Remarks & Future Work
Researchers & developers of distributed systems face common challenges, e.g.: Connection management, service initialization, error handling, flow control, event demuxing, distribution, concurrency control, fault tolerance, synchronization, scheduling, & persistence Carefully applying these techniques can yield efficient, scalable, predictable, dependable, & flexible middleware & applications The application of patterns can help to resolve these challenges Model-driven multidimensional QoS management
25
Middleware for Broadband Cable
Video-on-demand Telephony and other services over cable QoS management of infrastructure components
26
Middleware for Broadband Wireless
Voice-over-IP-over-wireless M-Commerce applications Messaging Services Location Services Security Billing QoS management of infrastructure components
27
Middleware for Adhoc Networks
Peer-to-peer communication Efficient routing Fault tolerance and QoS Seamless interoperability between Bluetooth, , 3G
28
Problems with Modeling Approaches
Dynamic DRE combat system QoS requirements historically not supported by COTS, i.e., COTS is too big, slow, buggy, incapable, & inflexible
Likewise, the proprietary multiple technology bases in DRE combat systems today limit effectiveness by impeding Assurability (of QoS), Adaptability, & Affordability
Today, each combat system brings its own: networks; computers; displays; software; people
Problems: Non-scalable tactical performance; Inadequate QoS control for joint operations, e.g., distributed weapons control; High software lifecycle costs, e.g., many “accidental complexities” & low-level platform dependencies
Figure: Sensor Systems, Command & Control System, Engagement System, Weapon Control Systems, and Weapon Systems, each with its own technology base (Proprietary MW, Mercury, Link16/11/4; DII-COE, POSIX, ATM/Ethernet; Proprietary MW, POSIX, NTDS; Proprietary MW, VxWorks, FDDI/LANS; Proprietary MW, POSIX, VME/1553)
Andy, please adapt this slide to explain why current approaches to addressing the technology/economic forces are inadequate…
29
Research Contributions
Context | Technical Challenge | Approach | Impact
Meta Programming; Domain-specific middleware services | N/W elements need mgmt; not reinvent mgmt code | High-level config file; code generation | N/W Mgmt of 3G Wireless Components
Core Middleware Services | Standardized middleware for fault tolerance | Influence standards body from experience with prototype | Fault Tolerant CORBA Standard
Distribution middleware | High-Performance, Realtime Middleware | Iterative process involving benchmarking, quantifying/profiling, and optimizing | QoS Enabled Middleware
Pattern languages | Robust, configurable middleware | Based on middleware-centric patterns | Dependable, Extensible, Flexible code
Andy, it’s not clear from this what YOUR research contributions are. Can you please make this more clear? Also, make sure that you emphasize that your work has spanned over a number of years, i.e., this isn’t just stuff you did in a year or two, but rather stuff you’ve been working on for the past 7 years or so.
30
Distribution Middleware Research Contributions
Meta programming Domain-Specific Middleware Services Common Middleware Services TAO Real-time CORBA ORB Contributions Distribution Middleware Infrastructure Middleware Operating Systems & Protocols Hardware
31
Distribution Middleware Research Contributions
Andy, since this is older work, I recommend that you move it to later in the talk, I.e., after the DOORS stuff. Moreover, I recommend that you add a slide or two that shows an example of the kinds of work you did on this topic, e.g., the principle patterns + optimizations.
32
Core Middleware Services Research Contributions
Meta programming Domain-Specific Middleware Services Common Middleware Services Fault tolerant CORBA (DOORS) Distribution Middleware Infrastructure Middleware Operating Systems & Protocols Hardware
33
Domain-specific Middleware Contributions
Meta programming HARP/AMW Contributions Domain-Specific Middleware Services Common Middleware Services Distribution Middleware Infrastructure Middleware Operating Systems & Protocols Hardware
34
Domain-specific Middleware and Meta-programming Contributions
Andy, it’s not clear how the title of this slide corresponds to what the diagram is and what the text bullet say. I don’t understand what value this slide has, so please try to motivate it better. Majority of network element software is management software (60 – 80 %) Includes sequenced initialization, fault tolerance, load balancing, growth/degrowth, maintenance, component querying, and event notification Management Software network element = self-contained distributed system hardware = processors + communication network software = network element application + management application network element application distributed, cooperating software processes that interact via message passing to provide a service to network element users management application distributed software processes that cooperatively manage the network element application “management layer” Network Element Code
35
Supporting Interchangeable Behavior
Context: FT properties can be set statically (as defaults) or set dynamically
Problem: Hard-coding properties makes the FT-CORBA design inflexible & non-extensible
Forces: Need highly extensible services that can be composed transparently from configurable properties
Solution: Apply the Strategy design pattern
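As a rough illustration of this solution, the sketch below makes the replication style a pluggable strategy that can be set as a default and swapped at run time. All class and method names are hypothetical, and the two styles are deliberately simplified (passive: only the first member handles requests; active: every member does).

```python
from abc import ABC, abstractmethod
from typing import List


class ReplicationStyle(ABC):
    """Strategy interface: decides which replicas receive a request."""

    @abstractmethod
    def targets(self, members: List[str]) -> List[str]:
        ...


class PassiveReplication(ReplicationStyle):
    def targets(self, members: List[str]) -> List[str]:
        return members[:1]  # only the primary executes the request


class ActiveReplication(ReplicationStyle):
    def targets(self, members: List[str]) -> List[str]:
        return list(members)  # every replica executes the request


class ObjectGroup:
    """Context object; its behavior changes with the installed strategy."""

    def __init__(self, members: List[str], style: ReplicationStyle) -> None:
        self._members = list(members)
        self._style = style

    def set_style(self, style: ReplicationStyle) -> None:
        self._style = style  # dynamic reconfiguration, no code change

    def dispatch_targets(self) -> List[str]:
        return self._style.targets(self._members)
```

Adding another style means adding another subclass; `ObjectGroup` itself stays untouched, which is the extensibility the forces call for.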
36
Consolidating Strategies
Context: FT CORBA implementations can have many properties, e.g., membership, replication, consistency, monitoring, # of replicas, etc.
Problem: Risk of combining semantically incompatible properties
Forces: Ensure semantically compatible properties; simplify management of properties
Solution: Apply the Abstract Factory design pattern
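One way to picture this solution: each concrete factory produces a whole family of properties that are meant to be used together, so callers never mix values by hand. The names and the specific pairings below are illustrative assumptions; they are not taken from the FT CORBA specification or from DOORS.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class FTProperties:
    replication_style: str
    monitoring_style: str
    initial_replicas: int


class FTPropertyFactory(ABC):
    """Abstract factory: one subclass per compatible family of properties."""

    @abstractmethod
    def replication_style(self) -> str: ...

    @abstractmethod
    def monitoring_style(self) -> str: ...

    @abstractmethod
    def initial_replicas(self) -> int: ...

    def build(self) -> FTProperties:
        return FTProperties(
            self.replication_style(),
            self.monitoring_style(),
            self.initial_replicas(),
        )


class PassiveGroupFactory(FTPropertyFactory):
    # Hypothetical family: passive replication monitored by polling.
    def replication_style(self) -> str:
        return "passive"

    def monitoring_style(self) -> str:
        return "pull"

    def initial_replicas(self) -> int:
        return 2


class ActiveGroupFactory(FTPropertyFactory):
    # Hypothetical family: active replication with pushed heartbeats.
    def replication_style(self) -> str:
        return "active"

    def monitoring_style(self) -> str:
        return "push"

    def initial_replicas(self) -> int:
        return 3
```

Client code depends only on `FTPropertyFactory`, so choosing a different family is a one-line change at the point where the factory is selected.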
37
Dynamic Configuration
Context: There are many potential FT properties that can be used
Problem: Static configuration of properties is inflexible & overly resource intensive
Forces: The behavior of FT-CORBA properties should be decoupled from the time when they are actually configured
Solution: Apply the Component Configurator design pattern
Andy, can you please enhance the diagram on this page so that it illustrates the “comp.conf” file that would be used to configure the server! Doug – does this look alright?
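The decoupling described here can be sketched as a loader that reads component names from a configuration text and instantiates them at run time. The one-line-per-component format (`name module:Class`) is invented for this example and is not the format of the “comp.conf” file mentioned in the note; the function name is likewise hypothetical.

```python
import importlib
from typing import Any, Dict


def load_components(config_text: str) -> Dict[str, Any]:
    """Instantiate the components named in a configuration text.

    Each non-empty line has the hypothetical form
        <component-name> <module>:<class>
    and '#' starts a comment. Classes are looked up only when the
    configuration is processed, not when the program is built.
    """
    components: Dict[str, Any] = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, target = line.split(None, 1)
        module_name, class_name = target.strip().split(":")
        cls = getattr(importlib.import_module(module_name), class_name)
        components[name] = cls()
    return components
```

For instance, `load_components("store collections:OrderedDict")` returns a dictionary holding one freshly created `OrderedDict` under the key `store`; changing the text changes which classes are loaded without touching the calling code.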
38
Research Directions (TO-DO)
Middleware for broadband cable (set top boxes) 3G wireless services/applications Intelligent networking, billing, network management, real-time monitoring, load balancing Web services/E services E/M commerce FT+RT systems Smart transducer RFP Wireless CORBA
39
Benchmarking Results (TO-DO)
40
Caching Object References
Context: Replication Manager queries Naming Service for Fault Detector object reference
Problem: Adds overhead of querying
Forces: Minimize time spent in querying for object reference
Solution: Optimize for common case, store redundant info, eliminate gratuitous waste
Andy, the point in this slide seems rather “obvious,” i.e., “cache the object reference to avoid going to the naming service.” Is it possible to make this more interesting, e.g., what happens if the object reference becomes invalid, i.e., how are “stale” cached object references handled? Maybe there are some other patterns (e.g., Observer) that could be used to address this issue!
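A small sketch of the caching idea, including one possible answer to the stale-reference question raised in the note: on a failed call, drop the cached entry, resolve again once, and retry. The class `ReferenceCache`, the `resolve` callable standing in for a Naming Service lookup, and the use of `ConnectionError` to signal a stale reference are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict


class ReferenceCache:
    """Caches resolved references; re-resolves once when one turns out stale."""

    def __init__(self, resolve: Callable[[str], Any]) -> None:
        self._resolve = resolve  # stands in for a Naming Service lookup
        self._cache: Dict[str, Any] = {}
        self.lookups = 0  # number of (expensive) resolutions performed

    def get(self, name: str) -> Any:
        if name not in self._cache:
            self.lookups += 1
            self._cache[name] = self._resolve(name)
        return self._cache[name]

    def invoke(self, name: str, call: Callable[[Any], Any]) -> Any:
        """Call through the cached reference (the common, fast case).

        If the call fails with ConnectionError, treat the reference as
        stale: evict it, resolve again, and retry exactly once.
        """
        try:
            return call(self.get(name))
        except ConnectionError:
            self._cache.pop(name, None)
            return call(self.get(name))
```

The common case costs one dictionary lookup; only a stale reference pays for a second resolution, and a second failure propagates to the caller.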
41
Interoperable Object Group References
Composite & enhanced Interoperable Object Reference (IOR) for referencing server object groups
Comprises one or more TAG_INTERNET_IOP profiles, each of which must contain a TAG_FT_GROUP component and zero or more TAG_IIOP_ALTERNATE_ADDRESS components
TAG_PRIMARY component in at most one TAG_INTERNET_IOP profile
Client ORBs operate on IOGRs in the same way as with IORs
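The structural rules on this slide can be expressed as a small validity check. This is a toy model, not a CDR-level IOGR parser: the `Profile` class and its field names are hypothetical stand-ins for the tagged components named above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    """Toy stand-in for one IIOP profile inside an IOGR."""
    host: str
    port: int
    has_group_component: bool = True   # the mandatory group component
    is_primary: bool = False           # carries the primary marker
    alternate_addresses: List[str] = field(default_factory=list)  # zero or more


def validate_iogr(profiles: List[Profile]) -> List[str]:
    """Return the list of rule violations (empty means structurally valid)."""
    errors: List[str] = []
    if not profiles:
        errors.append("an IOGR needs at least one profile")
    if any(not p.has_group_component for p in profiles):
        errors.append("every profile must carry the group component")
    if sum(1 for p in profiles if p.is_primary) > 1:
        errors.append("at most one profile may be marked primary")
    return errors
```

A group reference with no primary at all passes the check, which matches the "at most one" wording on the slide.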
42
Promising Solution: Fault Tolerant (FT) Distributed Object Computing Middleware
Challenges Limitations of non-OO FT strategies that focus on application processes Techniques based on process-based failure detection & recovery are not applicable to distributed object computing applications due to: Overly coarse granularity Inability to restore complex object relationships Restrictions on process checkpointing & recovery
43
Fault Tolerant Middleware Strategies
Integration Strategy Modify middleware to support FT, e.g.: Orbix+Isis Electra electra.html Obtrusive Interception Strategy Intercepts messages outside the ORB, e.g.: Eternal AQuA AQuA.html Unobtrusive, but complex Andy, can you please split the figure on this page into *two* diagrams, one that shows the integration strategy and one that shows the interception strategy? That way, I can animate it in a more sensible manner than I’m currently doing (to see what I’m currently doing, please run this slide in “slide show” mode!).
44
Fault Tolerant Middleware Strategies (cont’d)
Service Strategy FT as a higher-layer service, e.g., DOORS org/11356/html/doors.html Unobtrusive, but requires standard ORB support Summary of FT Strategies Integration strategy requires extensive non-portable & non-standard modifications to an ORB Interception strategy provides out-of-band solution that can be inefficient (due to duplication of effort) and is hard to port across operating systems Service strategy requires some ORB modifications, but is now standardized…
45
DOORS & FT-CORBA
DOORS is a “Distributed OO Reliable Service” developed prior to FT-CORBA
Uses the service strategy to provide FT to CORBA objects
Patterns and mechanisms in DOORS were integrated into FT-CORBA standard
DOORS implements most of FT-CORBA standard
Focus on passive replication
Available as open-source for non-commercial use from Lucent
Runs atop the TAO open-source real-time ORB
Andy, can you please lighten the background colors, as per my previous request?!
46
Replication Manager Components Property manager
Allows properties to be set for an object group Properties include replication style, membership style, consistency style, monitoring style, & number of replicas Generic factory Creates object groups & each member of the object group Used when the membership-style is infrastructure-controlled Object group manager Used by applications to create, add or delete members of an object group
47
Fault Detection & Notification
Components Fault Detectors Detect faults using a pull-based or a push-based mechanism Fault Notifier Notified of any faults by the Fault Detector A Fault Notifier notifies its Replication Manager when it detects faults
48
Logging & Recovery
Used in an infrastructure-controlled consistency style
The mechanism intercepts & logs GIOP messages
On failure detection, the messages can be “played back” for recovery
Mechanism is transparent to the client application
Andy, can you please break the diagram on this slide into two diagrams, so that I can animate them more effectively?!
49
FT CORBA Spec Requirements
Preserving the CORBA Object Model with enhancements
No single point of failure
Transparent failover
Transparent client redirection & reinvocation
FT CORBA cannot handle commission faults or correlated faults
50
Effect of Polling Interval on Failure Detection Times
Analysis Failure detection time increases with the polling interval Average failure detection time is half the polling interval Challenge Choosing small polling interval Minimize message overhead Fault detection time measured as the time between the failure of replica & the FaultDetector detecting failure
51
Research Synopsis
Technical Challenge | DOORS Approach | Impact
Standardized middleware for fault tolerance | Influence standards body from experience with prototype | Fault Tolerant CORBA Standard
High-Performance Middleware | Iterative process involving benchmarking, quantifying/profiling, and optimizing | QoS Enabled Middleware
Robust Middleware | Based on patterns | Dependable, Extensible
DOORS is a “Distributed OO Reliable Service” developed prior to FT-CORBA
Uses the service strategy to provide FT to CORBA objects
Patterns and mechanisms in DOORS were integrated into FT-CORBA standard
52
What is CORRIDOR?
CORRIDOR = Component Oriented Reliable Realtime Infrastructure for Distributed Object MiddlewaRe
Next generation of meta middleware technology for distributed real-time and embedded (DRE) telecom systems that enables: simultaneous control of multiple Quality of Service properties; composable & customizable common platform bases for telecommunications and data networking applications
53
Architecture
54
CORRIDOR Dependencies
Applications Domain-Specific Services will leverage from lower layers Common Services } QoS-enabled Distribution Middleware RT/FT CORBA, RT Java, .NET Services Fault tolerance, Notification, Naming, Event, Logging, etc Distribution Middleware Infrastructure Middleware Platform independent: e.g., ACE, JVM, RogueWave Operating Systems & Protocols Realtime OS e.g., VxWorks, PSOS General OS e.g., Solaris, Win2K, Linux Hardware Network elements using COTS hardware
55
CORRIDOR Impact
Technical Challenge | CORRIDOR Approach | Impact
Generating optimized middleware | Algorithms & tools to auto-generate optimized code for repetitive tasks | Time-to-Market: saves time and resources to implement code
Supporting multiple QoS properties | Components providing fault tolerance, real-time scheduling, throughput and/or latency, dynamic growth/degrowth, and runtime upgrades | Configurability: rapid development of applications requiring multiple QoS properties by composition of CORRIDOR components
Assuring dynamic flexibility and QoS in telecom systems | Robust, yet dynamically reconfigurable, meta-programming middleware | Adaptability: support varying workloads & environment over telecom system lifecycles
Interoperability with third party components | Based on standards-based middleware | Competitive Edge: rapidly integrate newer or third party components at lower costs over life cycle of installed base
Formalizing QoS-related design expertise for telecom systems | Pattern languages & frameworks to encapsulate complexity in meta middleware | Affordability: reduce recurring maintenance costs; reduced initial costs for new products
56
Novel Features Tools for automatic code generation for repetitive features that require domain specific knowledge (e.g. fault tolerance, real-time, security) Composition of service-enabled components based on declarative description in XML or similar formats Adaptive and Reflective capabilities to fine tune the system vertically in order to optimize resource usage Abstraction over distribution middleware allows newer middleware technology to be plugged underneath Support for run-time upgrades and growth/degrowth How does Aurora Management Workbench help? provides substantial portion of management infrastructure for run-time software maintenance allows developers to focus on implementing network element control software Plug & Play Architecture - involves a flexible, extensible and standards-based framework of reusable components shielding applications from the heterogeneity incurred by hardware, operating systems, and communication protocols
57
Technical Challenges How to effectively decouple application-specific software from management software in different dimensions (fault tolerance, security, real-time)? How to design an extensible and flexible infrastructure of components that enables the rapid incorporation of new technologies? How to specify QoS policies and requirements in different dimensions? Is XML suitable? How to monitor resources at runtime and adapt without causing performance degradation? How to handle run-time growth and software upgrades? run-time growth: adding new components while the application is running software upgrades: replacing a software component with an updated version of the component while the application is running
58
Research Directions CORRIDOR: QoS-enabled framework of middleware components Higher level middleware framework shielding applications from lower level middleware Multi-dimensional QoS support Patterns-based architecture of plug & play components Code generation tools for repetitive tasks Middleware for Ad hoc/Wireless networks FT CORBA enhancements for JINI-like systems CORBA Pluggable Protocol for Bluetooth devices Middleware enhancements for 3G wireless/mobile internet Fault tolerance and FT CORBA Sequenced Initialization and Recovery (dealing with object dependencies) Handling failure groups and collocated groups Fault Escalation strategies and Fault Analysis Growth/degrowth, runtime upgrades
59
Acknowledgments
FT CORBA (DOORS): Douglas C. Schmidt, Balachandran Natarajan, Shalini Yajnik
Network Management (HARP/AMW): Richard Buskens, Ali Siddiqui, Oscar Gonzalez