Replication and Availability in Distributed Systems

CS455 Introduction to Distributed Systems
Department of Computer Science, Colorado State University
Sagar Reddy Bijjam, Srinivas Reddy Kontham

Why is this Problem Important?
– We are producing more data than ever: 90% of all data ever produced was generated in the last two years
– We need to access this data concurrently
– We need fast data access
– We need high system availability

Problem Characterization
Failures are inevitable, whether in hardware or in software.
Two primary reasons for replication:
Increasing reliability:
– If a replica crashes, the system can continue working by switching to other replicas.
Improving performance:
– Important for distributed systems spread over large geographical areas.
– Divide the work over a number of servers.
– Place data in the proximity of clients.

Trade-off Space for Solutions in this Area
The CAP principle:
– Strong consistency: every read sees the most recent write, as if there were a single copy of the data, even under concurrent updates
– High availability: any consumer of the data can always reach some replica
– Partition resilience: the system can survive network partitions
CAP: Strong Consistency, High Availability, Partition-resilience: Pick at most 2!

Dominant Approaches to the Problem (1/2)
Pessimistic Algorithms
– Guarantee strong consistency at the cost of availability
– Based on quorum consensus
– Example: Gifford’s voting protocol
Optimistic Algorithms
– Favor availability at the expense of consistency
– A good technique if the probability of conflicts is small
– Example: Coda replication
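The quorum-consensus idea behind the pessimistic approach can be sketched in a few lines of Python. This is a minimal illustration, not Gifford's protocol in full: the function names, the vote assignment, and the chosen read/write thresholds are all assumptions made for the example. It checks the two standard overlap conditions for weighted voting (a read quorum r and a write quorum w over N total votes must satisfy r + w > N and 2w > N) and whether a set of reachable replicas holds enough votes.

```python
def is_valid_quorum_config(votes, r, w):
    """Check the two overlap conditions for read quorum r and write quorum w."""
    total = sum(votes.values())
    # r + w > total: every read quorum intersects every write quorum,
    #                so a read always sees at least one up-to-date copy.
    # 2 * w > total: any two write quorums intersect,
    #                so two conflicting writes cannot both succeed.
    return r + w > total and 2 * w > total


def can_form_quorum(votes, reachable, threshold):
    """True if the reachable replicas together hold at least `threshold` votes."""
    return sum(votes[name] for name in reachable) >= threshold


# Hypothetical vote assignment: replica A is weighted more heavily (5 votes total).
votes = {"A": 2, "B": 1, "C": 1, "D": 1}
r, w = 2, 4

print(is_valid_quorum_config(votes, r, w))         # True
print(can_form_quorum(votes, {"A"}, r))            # True: A alone can serve reads
print(can_form_quorum(votes, {"B", "C", "D"}, w))  # False: 3 votes cannot write
```

The last line shows the availability cost named on the slide: a partition holding three of the four replicas still cannot accept writes, because consistency requires four votes.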

Dominant Approaches to the Problem (2/2)
Highly Available Pessimistic Algorithm
– Increased availability with strong consistency
Dynamic Voting Protocol
– Network partitions and other failures can hurt the fault tolerance of static voting
– Dynamic voting can boost fault tolerance by adapting:
  – the number of votes assigned to various nodes
  – the set of nodes that can form read/write quorums
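One way to picture how the quorum set adapts is the simplified Python sketch below. It is an assumption-laden illustration rather than the protocol from the slides: each replica is assumed to store a version number plus the number of replicas that took part in its last update, and a partition may write only if it contains a strict majority of those last participants. The names `Replica` and `try_update` are hypothetical, vote reassignment and tie-breaking for exact halves are left out, and `reachable` is assumed to be non-empty.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0      # number of updates this copy has applied
    cardinality: int = 0  # how many replicas took part in that last update


def try_update(replicas, reachable):
    """Attempt a write inside the partition `reachable`; return True if allowed."""
    latest = max(replicas[n].version for n in reachable)
    current = [n for n in reachable if replicas[n].version == latest]
    needed = replicas[current[0]].cardinality
    # Allowed only if a strict majority of the last update's participants is here.
    if 2 * len(current) <= needed:
        return False
    # Commit: every reachable replica becomes current, and the quorum base shrinks
    # (or grows) to the size of this partition.
    for n in reachable:
        replicas[n] = Replica(latest + 1, len(reachable))
    return True


replicas = {n: Replica(version=0, cardinality=5) for n in "ABCDE"}

ok1 = try_update(replicas, {"A", "B", "C"})  # 3 of the 5 original replicas
ok2 = try_update(replicas, {"A", "B"})       # 2 of the 3 last participants
ok3 = try_update(replicas, {"C", "D", "E"})  # only C is recent, and it is stale
print(ok1, ok2, ok3)                         # True True False
```

Static majority voting over five replicas would have rejected the second write (two of five) and accepted the third (three of five). The adaptive rule does the opposite, which keeps the A/B partition available without letting the stale C/D/E partition diverge.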

Insights Gleaned
– Replication is mainly used to improve availability and performance
– But there are several trade-offs in choosing a replication strategy
– Optimistic algorithms are best suited where availability is the main criterion
– Pessimistic algorithms are at their best where strong consistency matters more than availability
– So choose the approach that best fits the situation

Problem Space in the Future
– Which files should be replicated?
– How many replicas should be created?
– Where should the replicas be placed?
– Which replica should be deleted if there is not enough storage space?

Future Trade-Off Space
More uniform distribution of files among nodes
– Reduces bottlenecks at nodes that are accessed more often
– NP-hard (a variant of the knapsack problem)
– Use the best algorithm/heuristic combination for the task
Demand-based replication
– Popular files have a higher level of replication
– Files are placed closer to the nodes making the most requests
– When space runs out, delete the file with the least demand
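The demand-based policy can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the linear scaling rule, the `min_replicas`/`max_replicas` bounds, the function names, and the access counts are hypothetical, and a real system would also have to account for placement and file size.

```python
def replica_counts(demand, min_replicas=1, max_replicas=5):
    """Scale each file's replica count with its share of the peak demand."""
    peak = max(demand.values()) or 1  # avoid dividing by zero if nothing was accessed
    return {
        name: max(min_replicas, round(max_replicas * hits / peak))
        for name, hits in demand.items()
    }


def evict_least_demanded(stored, demand):
    """Pick the stored file with the fewest requests as the deletion victim."""
    return min(stored, key=lambda name: demand[name])


# Hypothetical access counts collected over some observation window.
demand = {"a.dat": 100, "b.dat": 40, "c.dat": 5}

counts = replica_counts(demand)
victim = evict_least_demanded({"b.dat", "c.dat"}, demand)
print(counts)  # {'a.dat': 5, 'b.dat': 2, 'c.dat': 1}
print(victim)  # c.dat
```

Popular files get more copies, every file keeps at least one, and a node that runs out of space drops the copy that is requested least.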

THANK YOU!