Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Programming Model Support for Dependable,

Slides:

Advertisements

Similar presentations

UW-Madison Computer Sciences Multifacet Group© 2011 Karma: Scalable Deterministic Record-Replay Arkaprava Basu Jayaram Bobba Mark D. Hill Work done at.

Advertisements

Building a Better Backtrace: Techniques for Postmortem Program Analysis Ben Liblit & Alex Aiken.

Chuang, Sang, Yoo, Gu, Killian and Kulkarni, “EventWave” SoCC ‘13 EventWave: Programming Model and Runtime Support for Tightly-Coupled Elastic Cloud Applications.

Analysis of RAMCloud Crash Probabilities Asaf Cidon Stanford University.

Database Administration and Security Transparencies 1.

CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Byzantine Fault Tolerance Steve Ko Computer Sciences and Engineering University at Buffalo.

Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.

Transparent Environment for Replicated Ravenscar Applications Luís Miguel Pinho Francisco Vasques Ada-Europe 2002 Vienna, Austria June 2002.

Virtual Synchrony Jared Cantwell. Review Multicast Causal and total ordering Consistent Cuts Synchronized clocks Impossibility of consensus Distributed.

What Should the Design of Cloud- Based (Transactional) Database Systems Look Like? Daniel Abadi Yale University March 17 th, 2011.

MPICH-V: Fault Tolerant MPI Rachit Chawla. Outline  Introduction  Objectives  Architecture  Performance  Conclusion.

Distributed Database Management Systems

CS-550 (M.Soneru): Recovery [SaS] 1 Recovery. CS-550 (M.Soneru): Recovery [SaS] 2 Recovery Computer system recovery: –Restore the system to a normal operational.

16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.

Causal Logging : Manetho Rohit C Fernandes 10/25/01.

Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.

GS 3 GS 3 : Scalable Self-configuration and Self-healing in Wireless Networks Hongwei Zhang & Anish Arora.

What is Fault in an Overlay Network and How Can We Tolerate Them?

Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.

IPDPS, Supporting Fault Tolerance in a Data-Intensive Computing Middleware Tekin Bicer, Wei Jiang and Gagan Agrawal Department of Computer Science.

PicsouGrid Viet-Dung DOAN. Agenda Motivation PicsouGrid’s architecture –Pricing scenarios PicsouGrid’s properties –Load balancing –Fault tolerance Perspectives.

DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.

Fault Tolerance via the State Machine Replication Approach Favian Contreras.

Tolerating Faults in Distributed Systems

CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.

IMDGs An essential part of your architecture. About me

RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah

Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.

Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.

EEC 688/788 Secure and Dependable Computing Lecture 7 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University

Lecture 12 Recoverability and failure. 2 Optimistic Techniques Based on assumption that conflict is rare and more efficient to let transactions proceed.

Cloud Testing Haryadi Gunawi Towards thousands of failures and hundreds of specifications.

ISADS'03 Message Logging and Recovery in Wireless CORBA Using Access Bridge Michael R. Lyu The Chinese Univ. of Hong Kong

Toward Fault-tolerant P2P Systems: Constructing a Stable Virtual Peer from Multiple Unstable Peers Kota Abe, Tatsuya Ueda (Presenter), Masanori Shikano,

Research of P2P Architecture based on Cloud Computing Speaker : 吳靖緯 MA0G0101.

CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.

OpenMP for Networks of SMPs Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel ECE1747 – Parallel Programming Vicky Tsang.

Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.

Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies

Google File System Robert Nishihara. What is GFS? Distributed filesystem for large-scale distributed applications.

Section 2.1 Distributed System Design Goals Alex De Ruiter

Concurrent Object-Oriented Programming Languages Chris Tomlinson Mark Scheevel.

FTOP: A library for fault tolerance in a cluster R. Badrinath Rakesh Gupta Nisheeth Shrivastava.

Data-Centric Systems Lab. A Virtual Cloud Computing Provider for Mobile Devices Gonzalo Huerta-Canepa presenter 김영진.

Hierarchical Coordinated Checkpointing Protocol Himadri Sekhar Paul. Arobinda Gupta. R. Badrinath. Dept. of Computer Sc. & Engg. Indian Institute of Technology,

Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.

Garbage Collecting the World Presentation: Mark Mastroieni Authors: Bernard Lang, Christian Queinne, Jose Piquer.

Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.

Remote Backup Systems.

8.6. Recovery By Hemanth Kumar Reddy.

PREGEL Data Management in the Cloud

EEC 688/788 Secure and Dependable Computing

Supporting Fault-Tolerance in Streaming Grid Applications

RAID RAID Mukesh N Tekwani

湖南大学-信息科学与工程学院-计算机与科学系

EECS 498 Introduction to Distributed Systems Fall 2017

Outline Announcements Fault Tolerance.

Fault Tolerance Distributed Web-based Systems

Ch 4. The Evolution of Analytic Scalability

EEC 688/788 Secure and Dependable Computing

Slides prepared by Samkit

SAND: Towards High-Performance Serverless Computing

EEC 688/788 Secure and Dependable Computing

RAID RAID Mukesh N Tekwani April 23, 2019

Abstractions for Fault Tolerance

Remote Backup Systems.

University of Wisconsin-Madison Presented by: Nick Kirchem

Presentation transcript:

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Programming Model Support for Dependable, Elastic Cloud Applications Wei-Chiu Chuang, Bo Sang, Charles Killian, Milind Kulkarni 1

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Implementation Conclusion 2

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation – Cloud in Reality 3

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Imagine an ideal world… 4

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation -“Ideal” Cloud 5

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation - Failures 6 Failure! Single node failure induces global failure recovery

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation – Failure in Large Systems 7 Naïve elasticity makes system more likely to fail

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation - Too Big To Fail 8

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Implementation Conclusion 9

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity – Context 10 Node state X Y Z e1 = 2 e2 = 5 e3 = 4

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity – Independent Contexts 11 Node state X Y Z Context 1: x Context 2: y Context 3: z e1e2 e3 Commit in sequential order

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity - Hierarchical Contexts 12 global c’ c c e1e2 message

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity – Logical Node 13 Demands Distribute contexts to more nodes demand throughput

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity – Logical Node 14 demand throughput

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Elasticity – Logical Node 15 Elasticity: change the mapping of contexts to physical nodes demand throughput

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Implementation Conclusion 16

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Partial Recovery Failure! Failure recovery is per-context basis

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Event Replay Failure! Event replay is safe: outgoing messages is deferred until commit

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Implementation Conclusion 19

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Head Node 20 Remote nodes interact with head nodes Logical node1 Logical node2 Logical node3

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Implementation Conclusion 21

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Conclusion Elasticity is crucial for cloud applications. Our programming model enables elastic execution. The elastic mechanism also helps fault tolerance MaceSystems

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 23 Questions?

Chuang, Sang, Killian and Kulkarni, “Programming Model Support for Dependable, Elastic Cloud Applications” HotDep ‘12 Motivation Elasticity Dependability Programming Model Conclusion Backup Slides 24