Published by Ethelbert Bradford. Modified over 9 years ago.
Issues and Ideas in Software Reliability for FCS
Joe Loyall
BBN Technologies
5/18/2004, Joe Loyall

General Issues Affecting Reliability of FCS
Size and complexity: very large, complex systems
  – Many interoperating parts, developed by different people, including legacy
  – Unreliability of any one part can affect the system, but reliability of any one part may have little effect on the reliability of the entire system
Large mission requirements that decompose into distributed (and some local) requirements
  – Too easy to decompose poorly
One can verify, validate, and unit test individual pieces
  – However, reliability of the whole is not the sum of the reliability of the parts
Abstracting away the details can help one to understand some of the high-level design
  – However, putting back in the details later can put back in the complexity and the bugs
Some things can’t be put back in later, because they are pervasive
  – Trying to insert some things after the fact can greatly increase the fragility of the system
  – QoS, security, fault tolerance are examples
Tying too tightly to a hardware platform can lead to future brittleness; tying too loosely can lead to bugs associated with lack of control
  – Motivates the need for a middle layer
Reliability of the system can be limited by the quality of the least capable programming group
  – Motivates the need for strong processes, tools, patterns, etc.
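The point that one part's unreliability can hurt the whole system, while one part's reliability barely helps, can be illustrated with a simple series model. This is a minimal sketch, not something from the slides: it assumes independent failures and a system that needs every part to work, and the part count and reliabilities are hypothetical.

```python
import math


def series_reliability(part_reliabilities):
    """Probability that every part works, assuming independent failures
    and a system that needs all of its parts (a series model)."""
    return math.prod(part_reliabilities)


# Hypothetical system: 100 parts, each 99% reliable.
parts = [0.99] * 100
baseline = series_reliability(parts)

# Making one part perfect barely changes the system figure...
one_perfect = series_reliability([1.0] + parts[1:])

# ...while making one part poor roughly halves it.
one_weak = series_reliability([0.5] + parts[1:])

print(f"baseline={baseline:.3f} one_perfect={one_perfect:.3f} one_weak={one_weak:.3f}")
```

Under these assumptions the system is far less reliable than any single part, which is one way to read the warning against reasoning about the whole from unit-level results alone.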
Topic 1: Building Reliable FCS Software with Managed Quality of Service (QoS)
Managed QoS in DRE systems is crucial
  – Providing managed QoS currently complicates application development significantly, especially in distributed environments
  – Has traditionally been handled with static provisioning
  – Recent research has developed the ability to handle QoS at runtime with control and adaptation
New advances are needed to develop reliable FCS software
  – Can’t move backward to only static provisioning because FCS is too dynamic
  – Runtime QoS control, however, is only one part of software reliability
Need to continue to build upon the advances of recent years…
  – Separate programming of QoS and functionality
  – Design-time specification and runtime enforcement of QoS
  – Predictable end-to-end QoS in dynamic environments
  – Component-sized units for encapsulation, reuse, and composition
While moving forward to support the design and implementation of reliable QoS-managed FCS software
  – Modeling of QoS aspects separately from, but alongside, functional and component modeling
  – Programming to well-defined QoS interfaces and standard protocols
  – Reusable encapsulated, but configurable, QoS behaviors that can be assembled with reliability
  – Models, tools, patterns, and processes
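Separating QoS programming from functionality, with a design-time specification enforced at runtime, might look roughly like the sketch below. Everything in it is a hypothetical illustration rather than an API from the presentation: `send_image`, `QoSPolicy`, `with_qos`, and the bandwidth thresholds are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def send_image(image: bytes, scale: float) -> int:
    """Functional code: 'sends' an image at the given scale and returns
    the number of bytes sent. It contains no QoS policy."""
    return int(len(image) * scale)


@dataclass
class QoSPolicy:
    """Design-time specification: (minimum bandwidth in kbps, image scale),
    listed from the highest threshold to the lowest."""
    levels: List[Tuple[float, float]]

    def choose_scale(self, bandwidth_kbps: float) -> float:
        for min_bandwidth, scale in self.levels:
            if bandwidth_kbps >= min_bandwidth:
                return scale
        return self.levels[-1][1]  # fall back to the lowest level


def with_qos(func: Callable[[bytes, float], int],
             policy: QoSPolicy,
             measure_bandwidth: Callable[[], float]) -> Callable[[bytes], int]:
    """Runtime enforcement: each call uses the scale the policy selects
    for the bandwidth measured at that moment."""
    def wrapped(image: bytes) -> int:
        return func(image, policy.choose_scale(measure_bandwidth()))
    return wrapped


policy = QoSPolicy(levels=[(1000, 1.0), (200, 0.5), (0, 0.25)])
readings = iter([1500, 300, 50])  # stand-in for a real bandwidth monitor
managed_send = with_qos(send_image, policy, lambda: next(readings))

image = bytes(1000)
sent = [managed_send(image) for _ in range(3)]
print(sent)
```

The functional routine and the policy can be changed independently here, which is the property the slide asks for; a real system would replace the canned readings with live measurements.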
Area of Focus: SoS QoS
Designing SoS must consider several dimensions of QoS
QoS for each individual end-to-end string (SDMS/W)
QoS for multiple end-to-end application strings competing for resources
Doing this for non-fixed, changing numbers of application strings
Handling it dynamically, where conditions change over time
Technologies and processes to make it feasible to handle QoS at the System of Systems (SoS) level of abstraction
Modeling tools that support design of QoS aspects of SoS separate from, but alongside, functional components
QoS interfaces and patterns of use that enforce managed assembly and disciplined composition of QoS and functional components (à la type checking and IDL)
Multi-layer QoS design and management
  – Mission layer coordinates missions and mission-level policies
  – Coordination layer manages QoS for logically or physically related sets of components
  – Resource layer manages QoS for individual resources or mechanisms
Reusable, validated QoS components
Assembly, deployment, and configuration with validated behavior
  – Validated QoS behaviors assembled into validated patterns using enforcing interfaces
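The multi-layer idea, with several application strings competing for one resource, can be sketched as follows. This is an assumption-laden toy, not the presentation's design: the strict-priority rule, the string names, and the capacity figures are all hypothetical.

```python
def allocate(capacity, requests, priorities):
    """Coordination-layer sketch: grant requests in mission-priority order
    until the resource layer's capacity is used up."""
    grants = {}
    remaining = capacity
    for name in sorted(requests, key=lambda n: priorities[n], reverse=True):
        grants[name] = min(requests[name], remaining)
        remaining -= grants[name]
    return grants


# Mission layer: policy expressed as priorities (higher is more important).
priorities = {"telemetry": 3, "video": 2, "chat": 1}

# Application strings and the bandwidth (arbitrary units) each one asks for.
requests = {"video": 70, "telemetry": 20, "chat": 30}

# Resource layer reports capacity; the allocation is recomputed when it changes.
normal = allocate(100, requests, priorities)
degraded = allocate(60, requests, priorities)

print(normal)
print(degraded)
```

Rerunning the allocation whenever capacity or the set of strings changes is one simple way to handle a changing number of strings under changing conditions.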
Topic 2: Processes and Methods for FCS Software Reliability
Modeling is important, but not a silver bullet, and can be dangerous
  – Models can diverge from implementation over time (is incorrect documentation worse than no documentation?)
  – Models frequently are higher level, and more abstract, to capture the top-down design, but introducing the details later introduces bugs and complexity (need proper abstractions and correct/complete decomposition support)
  – Modeling can introduce more opportunities for errors
    • Models can be incorrect (need for model validation)
    • Code synthesizers can be incorrect
    • Interaction with legacy or handwritten code can introduce errors
Well-defined interfaces and “type” enforcement
  – Component interfaces and type enforcement have reduced many instances of common errors
  – With some attention and research, could the equivalent for QoS, security, fault tolerance, etc. be developed?
Verification and constraint concepts might provide partial solutions to FCS reliability
  – Constraints and verification at many levels (higher abstract design level down through each decomposition), the only way to scale the idea to the size and complexity of FCS
  – “Proof carrying code”-like enforcement constraints for functionality and QoS, for assembly, deployment, configuration, and runtime
    • Can prevent errors in some cases
    • Earlier detection (in the life cycle) of other problems
    • Aid in software correctness over the system’s lifetime
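A QoS analogue of interface type checking could take the form of contracts compared at assembly time. The sketch below is only one possible reading: `QoSContract`, `check_connection`, the two properties chosen, and the numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSContract:
    """For a provider: what it promises. For a consumer: what it tolerates."""
    max_latency_ms: float
    min_rate_hz: float


def check_connection(provided, required):
    """Return a list of violations; an empty list means the connection
    'type checks' with respect to these two QoS properties."""
    problems = []
    if provided.max_latency_ms > required.max_latency_ms:
        problems.append(
            f"latency {provided.max_latency_ms} ms exceeds limit "
            f"{required.max_latency_ms} ms")
    if provided.min_rate_hz < required.min_rate_hz:
        problems.append(
            f"rate {provided.min_rate_hz} Hz is below required "
            f"{required.min_rate_hz} Hz")
    return problems


sensor = QoSContract(max_latency_ms=20, min_rate_hz=30)
tracker_needs = QoSContract(max_latency_ms=50, min_rate_hz=10)
display_needs = QoSContract(max_latency_ms=10, min_rate_hz=60)

ok = check_connection(sensor, tracker_needs)   # compatible
bad = check_connection(sensor, display_needs)  # rejected before deployment

print(ok)
print(bad)
```

Running such checks at assembly or deployment time moves a class of QoS mismatches earlier in the life cycle, in the same spirit as the earlier-detection point above.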
Topic 3: Open-Standards, Open-Source, and Alternative Models
Open standards and open-source are trends that are unlikely to reverse
  – Economic benefits: no single vendor for a technology; longer-lived technology bases
  – Fewer stove-piped, one-of-a-kind systems
  – Pushes the technology up
    • System developers can assume the existence of infrastructure and the programmers that understand it
    • Enables the development of systems with greater capability because they don’t have to be built from the ground up
  – However, they make integration more important and more frequent
Program development models that increase reliability
  – There are domains in which software is well-engineered and reliable
  – For example, many business applications (which previously were developed by professional programmers) are developed today by domain experts (e.g., accountants) in well-established, reliable tools (e.g., spreadsheet programs)
  – Are there parts of FCS SoS building that can likewise, with the proper tool support, be turned over to domain experts, and what would be needed to enable it?
    • Patterns of use, idioms that lead programmers to producing correct software
    • Modeling or other programming tool environments with domain-friendly interfaces
    • These tools could be highly constraining to allow production of only well-behaved, reliable software because their focus is narrow and domain-specific
Topic 4: Certification of FCS SoS Software
Certification is already a difficult issue, and the highly distributed, heterogeneous, and dynamic nature of large SoS software makes certification with current processes more difficult
  – However, systems of greater scale, distribution, interaction, and dynamism are inevitable and need to be certified
  – The nature of the systems being certified and the nature of the certification process might need to evolve simultaneously
  – Certification of individual components or participants is unlikely to scale well to certification of the entire system
  – Can certification of individual behaviors contribute to certification of a system that can change its behavior?
We can provide techniques that support the certification of dynamic systems
  – Increase the ability to certify dynamic systems by constraining their dynamism
    • Critical subsystems limited to dynamically choosing from a set of certified static choices
  – If we can’t certify exactly correct behavior for highly dynamic systems, perhaps we can certify their limits
    • For example, certify that an adaptive system can do no harm; while we might not be able to certify exactly how it can adapt, we can certify how much, or within what limits, it can adapt or that its adaptation can affect the rest of the system
  – Can we certify the adaptive mechanisms that delimit behavior, recover, protect, or keep software operating within a “safe” subset of possibilities?
    • In a highly dynamic, distributed system, even if we cannot certify that it is free from defects, perhaps it is sufficient to certify that the system would gracefully handle, recover from, or fix defects
    • How do we certify the adaptive mechanisms? This is useful if we can presume it is simpler than certifying the full system behavior
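Constraining dynamism so that it stays certifiable might be sketched as below: the component may only switch among configurations that were certified ahead of time, and a tunable parameter is clamped to certified limits. The class, the mode names, and the limits are hypothetical illustrations, not a certification technique described in the slides.

```python
class BoundedAdapter:
    """Adaptive component restricted to a pre-certified set of configurations."""

    def __init__(self, certified, initial):
        if initial not in certified:
            raise ValueError("initial configuration must be certified")
        self.certified = frozenset(certified)
        self.current = initial

    def request(self, config):
        """Switch only if the target was certified; otherwise keep the
        current configuration and report the refusal."""
        if config in self.certified:
            self.current = config
            return True
        return False


def clamp(value, low, high):
    """Keep an adaptive parameter inside its certified limits."""
    return max(low, min(high, value))


adapter = BoundedAdapter({"full", "degraded", "safe"}, initial="full")
accepted = adapter.request("degraded")      # certified choice: accepted
rejected = adapter.request("experimental")  # uncertified choice: ignored
frame_rate = clamp(120, low=5, high=30)     # request exceeds certified range

print(accepted, rejected, adapter.current, frame_rate)
```

Only the small guard needs to be trusted to keep behavior inside the certified set, which echoes the slide's hope that certifying the adaptive mechanism is simpler than certifying full system behavior.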
Some Additional Technical Ideas Relevant to Reliable FCS Software
Survivability for FCS
Defense Enabling: Dynamism for Survivability
Survival of critical systems, as much as security, is crucial
Adaptation is essential to survive organized, malicious attack
  – Tolerate and recover from failures induced by the attack
  – Compensate (e.g., graceful degradation) if attacker succeeds in preventing use of required resources
  – Introduce artificial diversity to increase attacker work factor
Adaptive response involves dynamic management of system resources and properties
  – Integration of system properties (e.g., real-time, security, dependability) and the associated tradeoffs
  – Strategies for coordinated, distributed, but secure adaptation and management
Adaptive response is supported by
  – Redundancy (eliminate single point failures)
  – Heterogeneity (prevent common mode failures)
  – Uncertainty (slow staged attacks)
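One common way to exploit redundancy and heterogeneity together, offered here as an assumption since the slides do not name a mechanism, is majority voting over the answers of independently implemented replicas. The function below is a minimal sketch of that idea.

```python
from collections import Counter


def vote(replica_results):
    """Return the value that a strict majority of replicas agree on,
    or None when there is no majority (or no results at all)."""
    if not replica_results:
        return None
    value, count = Counter(replica_results).most_common(1)[0]
    return value if count > len(replica_results) / 2 else None


# Three replicas, ideally on different platforms; one has been corrupted.
masked = vote([42, 42, 7])

# No two replicas agree, so the caller should fall back or degrade.
no_majority = vote([1, 2, 3])

print(masked, no_majority)
```

Voting masks a single faulty or compromised replica, but it only helps against attacks if the replicas do not share the same vulnerability, which is where heterogeneity comes in.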
Architecting Survivability into FCS (and other SoS)
Reliability requires architecting in multiple dimensions
Even more so when the goal is to be resilient not only against errors, but also against attacks...
Diversity
  – Avoid common mode vulnerabilities
Layers of protection
  – Both HW and SW
  – Design principles, architectural constraints
  – High barrier to intrusion
Adaptive response
  – Adaptive middleware
  – Rapid and coordinated response
  – Isolation, recovery, graceful degradation
Redundancy
  – No single point of failure in critical functionality
Weak assumptions
  – Less susceptible to attacker’s manipulation of environment
Detection and correlation
  – Embedded sensors
  – Mix of IDS and policy violation
  – Advanced, distributed correlation
General principles for survivability
  – Protect as best as possible
  – Improve chances of detection
  – Adapt to manage gaps
Use of Modeling for Validation of Integrated Survivability
Requirements decomposition
Executable model of the system (probabilistic or logical)
Model assumptions
Supporting arguments and experimentation
Survivability results obtained through modeling
  – Critical functionality available with high probability even when under heavy successful attack
  – 98% of all functions successful even with vulnerabilities discovered daily, or faster
  – Operating system diversity bolsters reliability of critical functionality when under attack
  – With the current architecture, attackers are more effective compromising functionality than crashing components
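To give a flavor of what an executable probabilistic model can look like, the toy simulation below compares replicas that share one operating system (a single exploit compromises all of them) with replicas on diverse operating systems (each must be compromised separately). It is not the model used in the presentation, and the attack probability, replica count, and trial count are hypothetical values chosen only for illustration.

```python
import random


def simulate_availability(p_compromise, replicas, diverse,
                          trials=20000, seed=0):
    """Estimate the probability that at least one replica survives an attack.

    diverse=True  : each replica is compromised independently.
    diverse=False : one exploit works on every replica (common mode).
    """
    rng = random.Random(seed)
    available = 0
    for _ in range(trials):
        if diverse:
            compromised = [rng.random() < p_compromise
                           for _ in range(replicas)]
        else:
            hit = rng.random() < p_compromise
            compromised = [hit] * replicas
        if not all(compromised):
            available += 1
    return available / trials


# Hypothetical scenario: three replicas, each exploit succeeds half the time.
same_os = simulate_availability(0.5, replicas=3, diverse=False)
diverse_os = simulate_availability(0.5, replicas=3, diverse=True)

print(f"same OS: {same_os:.3f}, diverse OS: {diverse_os:.3f}")
```

Under the independence assumption the diverse case should approach 1 - p^n (0.875 here) and the common-mode case 1 - p (0.5 here); the estimates say nothing about the figures reported on the slide.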