Heading Off Correlated Failures through Independence-as-a-Service

Heading Off Correlated Failures through Independence-as-a-Service CS 523 Paper Presentation Kai Huang

Abstract: Today’s systems rely on redundancy to ensure reliability. However, complex, multi-layered hardware/software stacks may share deep, hidden dependencies, which can undermine redundancy efforts and introduce unanticipated correlated failures.

Abstract: Solution: Independence-as-a-Service (INDaaS), an architecture to proactively audit the independence of redundant systems. It uses pluggable dependency acquisition modules to collect structural dependency information (network, hardware, software, …) and pluggable auditing modules to quantify the independence of systems.

Introduction: Example: a glitch on one Amazon Elastic Block Store (EBS) server disabled the EBS service. This led to correlated failures across multiple Elastic Compute Cloud (EC2) instances and disabled applications designed for redundancy across those EC2 instances.

Introduction: Existing techniques usually require human intervention, which is slow. Correlated failures can also be hidden behind non-transparent business contracts between cloud providers (e.g., EC2 and Azure were disabled at the same time because a storm took down a local power source and its backup generator).

Introduction: The paper proposes Independence-as-a-Service (INDaaS), which collects and audits structural dependency data to evaluate the independence of redundant systems before failures occur. It consists of pluggable dependency acquisition modules that collect dependency data, and pluggable auditing modules that quantify independence and identify common dependencies. INDaaS builds on traditional fault analysis techniques and supports independence auditing even across mutually distrustful cloud providers who may be unwilling to share full dependency data (private independence auditing, or PIA).

Architecture Overview
Step 1: The auditing client, Alice, specifies to the auditing agent which services she wishes to audit and in what way. The specification includes: a) the relevant data sources; b) the level of redundancy desired; c) the types of components and dependencies to be considered; and d) the metrics used to quantify independence.
Step 2: The auditing agent issues a request to each data source Alice specified.
Step 3: Each specified data source uses one or more dependency acquisition modules to collect the dependency data for later independence auditing.
Step 4: In the private independence auditing (PIA) case, the data sources collaborate to obtain the auditing results without revealing their proprietary dependency data to each other.
Step 5: Each data source returns to the auditing agent either the full dependency data (for structural independence auditing) or, in the PIA case, the collaboratively computed independence auditing results.
Step 6: The auditing agent returns to Alice an auditing report quantifying the independence of the various redundancy deployments, optionally including useful information such as estimates of correlated failure probabilities and ranked lists of potential risk groups.

Architecture Overview
Three main types of entities:
Auditing client: requests an audit of the independence of cloud systems; may request a one-time or periodic audit.
Dependency data sources: providers of cloud systems, covering computation, storage, and networking components.
Auditing agent: mediates the interaction between the auditing client and the data sources, constructs a dependency graph from the data they return, then processes that graph to quantify its independence or to identify unexpected common dependencies, using a set of pluggable independence auditing modules.

Dependency Acquisition: A sample distributed storage system. Suppose an auditing client wants two-way redundancy for her service, running on two of the three servers S1-S3 within her cloud. She submits to the auditing agent a specification indicating 1) the IP addresses of the three servers, and 2) the relevant software components running on them. The current prototype requires the auditing client to list the software components of interest manually, e.g., Query Engine and Riak (a distributed database) in this example. With this specification, the auditing agent invokes the dependency acquisition modules (i.e., NSDMiner, lshw, and apt-rdepends) on each server to collect the network, hardware, and software dependencies, and stores them in the DepDB.

Dependency Acquisition: Three main categories of dependency. Network dependency: a route from source to destination via various network components (e.g., routers). Hardware dependency: a physical component, e.g., a disk or CPU of a server; the Hw field denotes the physical component, Type specifies its type (CPU, disk, RAM, etc.), and Dep specifies its model number. Software dependency: the package information of a software component; the Pgm field denotes the software component, Hw specifies the hardware on which it runs, and Dep lists the packages it uses.
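The three record types above could be modelled roughly as follows. This is only a sketch: the slide names the fields Hw, Type, Dep, and Pgm, but the class names, the network fields (src, dst, route), and all sample values are hypothetical and are not taken from the INDaaS prototype.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NetworkDep:
    src: str                 # assumed field: source host
    dst: str                 # assumed field: destination
    route: Tuple[str, ...]   # network components on the path


@dataclass(frozen=True)
class HardwareDep:
    hw: str        # "Hw": the physical component (here, the host it belongs to)
    hw_type: str   # "Type": CPU, disk, RAM, ...
    dep: str       # "Dep": model number of the component


@dataclass(frozen=True)
class SoftwareDep:
    pgm: str               # "Pgm": the software component
    hw: str                # "Hw": hardware it runs on
    dep: Tuple[str, ...]   # "Dep": packages it uses


def components_of(record):
    """Return the set of components that one record depends on."""
    if isinstance(record, NetworkDep):
        return set(record.route)
    if isinstance(record, HardwareDep):
        return {record.dep}
    if isinstance(record, SoftwareDep):
        return set(record.dep)
    raise TypeError(f"unknown record type: {type(record).__name__}")


# Hypothetical sample records for server S1.
records = [
    NetworkDep(src="S1", dst="Internet", route=("ToR1", "Core1")),
    HardwareDep(hw="S1", hw_type="Disk", dep="disk-model-X"),
    SoftwareDep(pgm="Riak", hw="S1", dep=("libc6", "libssl")),
]

# Flatten everything S1 depends on into one component set.
all_components = set().union(*(components_of(r) for r in records))
```

Flattening the records into one set per server gives the input that a component-set comparison between servers would need.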

Independence Auditing: Two scenarios. Structural independence auditing: data sources are willing to provide full dependency data. Private independence auditing: supports analysis across multiple cloud providers that are unwilling to reveal full dependency data.

Independence Auditing – structural independence auditing: Generate an explicit dependency graph representation by adapting traditional fault tree models to a directed acyclic graph (DAG) structure.
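A dependency graph of this kind can be sketched with the Python standard library alone. The prototype is reported to use NetworkX; this sketch substitutes `graphlib` (Python 3.9+) so that it stays self-contained. The node names and edges are hypothetical, and the helper functions are illustrative, not part of INDaaS.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical graph: each node maps to the components it directly depends on.
DEPENDS_ON = {
    "S1": {"ToR1", "disk-A"},
    "S2": {"ToR1", "disk-B"},
    "S3": {"ToR2", "disk-C"},
    "ToR1": {"Core1"},
    "ToR2": {"Core2"},
}


def is_dag(graph):
    """True if the dependency graph contains no cycle."""
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False


def transitive_deps(graph, node):
    """All components reachable from node by following dependency edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def common_deps(graph, a, b):
    """Components that both a and b depend on, directly or indirectly."""
    return transitive_deps(graph, a) & transitive_deps(graph, b)
```

With these sample edges, `common_deps(DEPENDS_ON, "S1", "S2")` contains the shared switch and core router, while S1 and S3 share nothing, so S1 with S3 would be the more independent two-way deployment.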

Independence Auditing – structural independence auditing: The representation is generalized to express dependencies at three levels of detail. Component-set: the most basic level of detail. Fault-set: additionally assigns a weight to each component, giving each failure event a probability. Fault graph: assumes a single level of redundancy across data sources.

Independence Auditing – structural independence auditing: Determining risk groups (RGs): the minimal RG algorithm and the failure sampling algorithm. Ranking risk groups: size-based ranking and failure probability ranking.
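To make the two steps concrete, here is a small sketch on a toy fault graph. It is not the paper's algorithm: minimal risk groups are found by brute-force enumeration, the sampling routine simply flips a fair coin per component, and the probability ranking multiplies per-component probabilities under an independence assumption. The graph, component names, and probabilities are all hypothetical.

```python
import math
import random
from itertools import combinations

# Toy fault graph: the deployment fails only if both replicas fail (AND);
# a server fails if any component it depends on fails (OR).
FAULT_GRAPH = {
    "deployment": ("AND", ["S1", "S2"]),
    "S1": ("OR", ["ToR1", "disk-A"]),
    "S2": ("OR", ["ToR1", "disk-B"]),
}
LEAVES = ["ToR1", "disk-A", "disk-B"]
FAILURE_PROB = {"ToR1": 0.01, "disk-A": 0.05, "disk-B": 0.05}  # hypothetical


def fails(node, failed):
    """Evaluate whether node fails given the set of failed leaf components."""
    if node not in FAULT_GRAPH:
        return node in failed
    gate, children = FAULT_GRAPH[node]
    results = (fails(child, failed) for child in children)
    return all(results) if gate == "AND" else any(results)


def minimal_risk_groups(top, leaves):
    """Brute force: smallest component sets whose failure brings down top."""
    groups = []
    for size in range(1, len(leaves) + 1):
        for combo in combinations(leaves, size):
            candidate = frozenset(combo)
            if any(group <= candidate for group in groups):
                continue  # contains a smaller risk group, so not minimal
            if fails(top, candidate):
                groups.append(candidate)
    return groups


def sample_risk_groups(top, leaves, rounds, rng):
    """Randomly fail components; keep every sampled set that fails top."""
    found = set()
    for _ in range(rounds):
        failed = frozenset(leaf for leaf in leaves if rng.random() < 0.5)
        if fails(top, failed):
            found.add(failed)
    return found  # risk groups, but not necessarily minimal ones


def rank_by_size(groups):
    return sorted(groups, key=len)


def rank_by_probability(groups, prob):
    # Assumes independent component failures (an assumption of this sketch).
    return sorted(groups, key=lambda g: math.prod(prob[c] for c in g), reverse=True)


minimal = minimal_risk_groups("deployment", LEAVES)
sampled = sample_risk_groups("deployment", LEAVES, rounds=200, rng=random.Random(0))
```

In this toy graph the shared switch ToR1 is a risk group of size one, which is the kind of unexpected common dependency the audit is meant to surface. Enumeration is exponential in the number of components, whereas sampling is cheaper but may return non-minimal groups.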

Independence Auditing – private independence auditing: Existing general approach: use secure multi-party computation to compute and reveal the overlap among the datasets of multiple cloud providers while keeping the data themselves private. Problem: it scales poorly due to its complexity.

Independence Auditing – private independence auditing: Trust assumption. Three main types of entities: auditing client, cloud providers, and auditing agent. Auditing clients are assumed to be potentially malicious and to wish to learn as much as possible about the cloud providers’ private dependency data.

Independence Auditing – private independence auditing: Techniques. Jaccard similarity, computed via MinHash. Private set intersection cardinality protocol (P-SOP): allows a group of parties, each with a local dataset, to compute the number of overlapping elements without learning any elements of the other parties’ datasets.
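The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|, and MinHash estimates it from short signatures. The sketch below shows only that estimation step on plaintext sets; it does not implement P-SOP or any privacy protection. The SHA-256-based hash family and all function names are assumptions of this sketch, not the paper's construction.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, item):
    # One member of a seeded hash family (assumed construction).
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the minimum hash over the (non-empty) set."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Hypothetical component sets of two cloud providers.
provider_a = {"ToR1", "Core1", "libc6"}
provider_b = {"Core1", "libc6", "Core2"}

exact = jaccard(provider_a, provider_b)
estimate = estimate_jaccard(
    minhash_signature(provider_a), minhash_signature(provider_b)
)
```

Each signature has a fixed length regardless of the size of the underlying set, which is why MinHash helps when component sets are large. The estimate approaches the exact value as the number of hash functions grows.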

Independence Auditing – private independence auditing: Generate a local dependency graph at each cloud provider. Normalize the local dependency graphs so that a component shared across cloud providers has the same identifier everywhere. Use P-SOP to compute the number of common and unique components across cloud providers. For large datasets, use MinHash: if the cloud providers in a potential redundancy deployment have large component-sets, PIA uses M hash functions based on the MinHash technique to map each component-set to a much smaller dataset Si, and then feeds these MinHash-generated datasets to P-SOP as usual to obtain the number of common components across cloud providers.

Limitations and Practical Issues: Acquiring accurate failure probabilities may be challenging. Only static software dependencies are taken into account; a potential solution is to use access logs and configuration scripts. Cloud providers may have no incentive to join. Cloud providers may not behave honestly.

Implementation and Deployment: The auditing client is written in Python. The dependency acquisition modules (written in Python) wrap three open-source tools: NSDMiner, lshw, and apt-rdepends. The auditing agent (written in Python, using the NetworkX library) collects dependency data from the dependency acquisition modules over SSH.

Evaluation: Common network dependency. Of the 190 possible two-way redundancy deployments, only 27 have no unexpected RGs. Without INDaaS, a random selection therefore has only about a 14% chance (27/190) of avoiding unexpected RGs.

Evaluation: Efficiency vs. Accuracy

Conclusion: INDaaS is an architecture to audit the independence of redundant service deployments in the cloud.

Thank you!