Investigating Survivability Strategies for Ultra-Large Scale (ULS) Systems Vanderbilt University Nashville, Tennessee Institute for Software Integrated.

Presentation transcript:

Investigating Survivability Strategies for Ultra-Large Scale (ULS) Systems
Jaiganesh Balasubramanian, Dr. Aniruddha Gokhale, Dr. Douglas C. Schmidt, Dr. Sherif Abdelwahed
Institute for Software Integrated Systems, Vanderbilt University, Nashville, Tennessee

2 Ultra-Large Scale (ULS) System Characteristics
Key characteristics of the problem space:
- Network-centric, dynamic, very large-scale "systems of systems"
- Stringent simultaneous QoS demands, e.g., "never die," time-critical, etc.
- Highly diverse, complex, & increasingly integrated/autonomous application domains
Key characteristics of the solution space:
- Vast accidental & inherent complexities
- Continuous evolution & change
- Highly heterogeneous (& legacy-constrained) platform, language, & tool environments
Mapping & integrating problem artifacts to solution artifacts is hard

3 Motivating Scenario for ULS
Impact of Service-Oriented Architectures on enterprise distributed real-time & embedded (DRE) ULS systems:
- Applications are composed of an "operational string" of services
- A service is an assembly of components
- Dynamic (re)deployment of services into operational strings is necessary
- Performability = performance + survivability requirements
Key challenges:
- Regulating & adapting to (dis)continuous changes in runtime environments, e.g., online prognostics, dependable upgrades
- Satisfying tradeoffs between multiple (often conflicting) QoS demands, e.g., secure, real-time, reliable, etc.
- Satisfying QoS demands in the face of fluctuating and/or insufficient resources, e.g., mobile ad hoc networks (MANETs)
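
The composition described on this slide (components assembled into services, services chained into an operational string, services swapped at run time) can be sketched as a small data model. This is an illustrative sketch only: the class names, the `redeploy` method, and the example service names are hypothetical and do not come from the presentation or from any middleware API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str


@dataclass
class Service:
    """A service is an assembly of components."""
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class OperationalString:
    """An application modeled as an ordered string of services."""
    name: str
    services: List[Service] = field(default_factory=list)

    def redeploy(self, old: str, new: Service) -> None:
        """Swap the service named `old` for `new`, keeping its position."""
        for i, svc in enumerate(self.services):
            if svc.name == old:
                self.services[i] = new
                return
        raise KeyError(old)

    def component_names(self) -> List[str]:
        """Flatten the string into the names of all its components, in order."""
        return [c.name for s in self.services for c in s.components]
```

Replacing one service leaves the rest of the string untouched, which is the property dynamic (re)deployment relies on.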

4 Some Performability Challenges for ULS Systems
Performability challenges in dynamic provisioning of operational strings & services:
- Service workloads & resource capacity issues: service placement depends on workloads & available resources
- Service accessibility patterns: the survivability a service needs depends on how widely it is shared
- Differentiated levels of QoS: affect resource provisioning & survivability strategies
- Operational string & service failover: different failover possibilities, e.g., a whole operational string, part of one, or one service at a time
- No one-size-fits-all dependability strategy: one survivability strategy cannot be dictated for all services & operational strings
Application performability is addressed by resolving the service placement & survivability problems

5 Model of Approach
The model addresses various concerns:
- Per-service concern: choice of implementation; depends on resources & compatibility with other components in the assembly
- Coupling concern: choice of invocation & communication mechanism used
- Sharing concern: shared services need proactive survivability, since their failure affects several services simultaneously
- Failure recovery concern: what is the unit of failover?
- Availability concerns: what is the degree of redundancy? Which replication styles to use? Does it apply to the whole assembly?
- Deployment concerns: how to select resources? How much sharing?
- Assembly concerns: which components to assemble dynamically? Which configurations & optimizations give end-to-end performability?
Service placement & service survivability strategies address these concerns
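
The "degree of redundancy" question can be made concrete with a standard reliability estimate. The sketch below is not from the slides: it assumes replicas fail independently and each is available with the same probability `a`, so that at least one of `n` replicas is up with probability 1 - (1 - a)^n. The function names are hypothetical.

```python
def combined_availability(a: float, n: int) -> float:
    """Availability of n replicas, assuming independent failures."""
    return 1.0 - (1.0 - a) ** n


def replicas_needed(a: float, target: float, limit: int = 64) -> int:
    """Smallest replica count whose combined availability meets `target`."""
    if not (0.0 < a < 1.0 and 0.0 < target < 1.0):
        raise ValueError("a and target must lie strictly between 0 and 1")
    for n in range(1, limit + 1):
        if combined_availability(a, n) >= target:
            return n
    raise ValueError("target not reachable within limit")
```

Under this assumption a shared service with a stricter availability target simply needs a larger `n`. Correlated failures (for example replicas on the same node) break the independence assumption, which is one reason replica placement matters.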

6 Addressing the Service Placement Problem
Service placement algorithms must trade off performance against survivability for applications, allocating resources either to primaries or to replicas
The service placement problem must consider:
- A set of computation nodes, each attributed by:
  - Processing index or capacity
  - Memory index or capacity
  - Survivability index
- A set of communication links, each attributed by:
  - Bandwidth index
  - Survivability index
- A set of components, attributed by:
  - Different implementations offering performance tradeoffs across quality dimensions
  - Different implementations consuming various amounts of resources
  - Constraints on being deployed as an assembly to offer a complete service
Replica placement issues involve:
- Different availability requirements for different assemblies of components: multiple replicas may be needed, & how much replica non-availability is tolerated depends on the importance of the assembly
- Replica resource provisioning, which depends on the replication scheme used
- Load balancing across replicas when resources are available, which introduces run-time consistency problems
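
To illustrate how the node and component attributes listed above could feed a placement decision, here is a minimal greedy sketch. It is not the authors' algorithm. It assumes a single scalar `quality` score per implementation, ignores communication links, places each primary and its replicas on distinct nodes, and prefers nodes with a higher survivability index. All names (`Node`, `Implementation`, `place_assembly`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    cpu: float            # processing capacity
    memory: float         # memory capacity
    survivability: float  # assumed index; higher means more robust


@dataclass
class Implementation:
    name: str
    cpu: float      # processing demand
    memory: float   # memory demand
    quality: float  # assumed scalar performance score


def place_assembly(
    impls: Dict[str, List[Implementation]],
    nodes: List[Node],
    replicas: int = 1,
) -> Optional[Dict[str, Tuple[str, List[str]]]]:
    """Return {component: (implementation, [primary, replica hosts...])}.

    Returns None if any component cannot be placed, since the assembly
    must be deployed as a whole to offer a complete service.
    """
    free = {n.name: [n.cpu, n.memory] for n in nodes}  # working copy
    ranked = sorted(nodes, key=lambda n: n.survivability, reverse=True)
    plan: Dict[str, Tuple[str, List[str]]] = {}
    for comp, options in impls.items():
        chosen = None
        # Try the best-performing implementation first, then fall back.
        for impl in sorted(options, key=lambda i: i.quality, reverse=True):
            hosts = [n.name for n in ranked
                     if free[n.name][0] >= impl.cpu
                     and free[n.name][1] >= impl.memory]
            if len(hosts) >= 1 + replicas:
                chosen = (impl, hosts[:1 + replicas])
                break
        if chosen is None:
            return None
        impl, hosts = chosen
        for h in hosts:  # reserve resources for the primary and each replica
            free[h][0] -= impl.cpu
            free[h][1] -= impl.memory
        plan[comp] = (impl.name, hosts)
    return plan
```

Raising `replicas` shows the tradeoff from the slide: resources reserved for replicas are no longer available to higher-quality primaries, so the planner either falls back to cheaper implementations or fails to place the assembly.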

7 Addressing the Survivability Problem
A configurable approach to survivability, including micro- (infrastructure) & macro- (assembly & operational string) level strategies
Micro-level strategies monitor infrastructure state to make proactive decisions at:
- Component level (swapping & migration)
- Middleware level (configurations)
- Component server level (process resource allocations)
- Node level (multiple components)
Macro-level strategies monitor assembly health to make failover decisions:
- Failover based on type of failover unit
- Affects service placement decisions
- May involve load balancing
- State synchronization issues
- Replication styles (hidden by FT strategies)
Initial prototype developed using Component-Integrated ACE ORB (CIAO) & Deployment & Configuration Engine (DAnCE)
Future work on Data Distribution Service (DDS) & Distributed Real-time Specification for Java (DRTSJ)
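
The macro-level idea of failing over by "type of failover unit" can be sketched as follows. This is a simplified illustration, not the CIAO/DAnCE prototype: the `FailoverUnit` enum, the `plan_failover` function, and the health map are hypothetical, and only two of the possible units (one service at a time, or the whole operational string) are modeled.

```python
from enum import Enum
from typing import Dict, List


class FailoverUnit(Enum):
    SERVICE = "service"            # fail over one service at a time
    STRING = "operational_string"  # fail over the whole string together


def plan_failover(
    string: List[str],
    healthy: Dict[str, bool],
    unit: FailoverUnit,
) -> List[str]:
    """Return the services to switch to replicas, given observed health.

    Services missing from `healthy` are treated as failed.
    """
    failed = [s for s in string if not healthy.get(s, False)]
    if not failed:
        return []
    if unit is FailoverUnit.STRING:
        return list(string)  # any failure moves the entire string
    return failed
```

Whole-string failover moves more state and needs replica capacity for every service, while per-service failover is cheaper but leaves the string split across primaries and replicas. That difference is why the choice of failover unit affects placement decisions.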