Distributed Systems CS

Presentation transcript:

Distributed Systems CS 15-440 Consistency and Replication – Part III Lecture 14, Oct 24, 2016 Mohammad Hammoud

Today… Last Session: Consistency and Replication – Part II (Client-Centric Consistency Models). Today’s Session: Consistency and Replication – Part III (Replica Management & Consistency Protocols); Programming Models (Intro). Announcements: P3 will be posted by tomorrow; it is due on Nov 14. Your virtual clusters are ready; we will show you how to use them during the recitation.

Overview. Consistency Models: Data-centric Consistency Models; Client-centric Consistency Models. Replica Management: Replica Server Placement; Content Replication and Placement. Consistency Protocols: Primary-based protocols; Replicated-write protocols; Cache-coherence protocols.

Summary of Consistency Models. Data-centric: Models for Specifying Consistency (Continuous Consistency Model); Models for Consistent Ordering of Operations (Sequential Consistency Model, Causal Consistency Model). Client-centric: Eventual Consistency; Client Consistency Guarantees (Monotonic Reads, Read Your Writes, Writes Follow Reads).

Replica Management. Replica management describes where, when and by whom replicas should be placed. We will study two problems under replica management. Replica-Server Placement: decides the best locations to place the replica servers that can host data-stores. Content Replication and Placement: finds the best server for placing the contents.

Replica Server Placement. Factors that affect placement of replica servers: What are the possible locations where servers can be placed? Should we place replica servers close-by or distribute them uniformly? How many replica servers can be placed? What are the trade-offs between placing many replica servers vs. few? How many clients are accessing the data from a location? Placing more replicas at the locations where most clients access the data improves performance and fault-tolerance. If K replicas have to be placed out of N possible locations, find the best K out of N locations (K < N).

Replica Server Placement – An Example Approach. Problem: K replica servers should be placed on some of the N possible replica sites such that clients have low-latency/high-bandwidth connections. A possible greedy approach: (1) Evaluate the cost of placing a replica on each of the N potential sites by examining the cost of C clients connecting to the replica; the cost of a link can be 1/bandwidth or latency. (2) Choose the lowest-cost site. (3) Do the same every time you need to add a replica site for a new group or old sub-group of clients (this could lead to sub-optimal locations). [Figure: candidate replica sites R1 to R4 annotated with example costs C=100, C=40, C=90 and C=60.]
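The greedy approach above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the cost matrix `cost[c][s]` (cost of client `c` reaching candidate site `s`, for example latency or 1/bandwidth), the function name and the sample numbers are all assumptions. In each round, it adds the site that gives the lowest total cost when every client connects to its cheapest chosen replica.

```python
def greedy_placement(cost, k):
    """Greedily pick k replica sites out of the candidate sites.

    cost[c][s] is a hypothetical cost (e.g. latency or 1/bandwidth)
    for client c to reach candidate site s.
    """
    n_clients = len(cost)
    n_sites = len(cost[0])
    chosen = []
    # best[c]: cheapest cost from client c to any site chosen so far
    best = [float("inf")] * n_clients

    for _ in range(k):
        candidates = [s for s in range(n_sites) if s not in chosen]

        def total_with(site):
            # Total client cost if `site` joined the current replica set
            return sum(min(best[c], cost[c][site]) for c in range(n_clients))

        site = min(candidates, key=total_with)
        chosen.append(site)
        best = [min(best[c], cost[c][site]) for c in range(n_clients)]

    return chosen, sum(best)


# Illustrative data: 4 clients, 3 candidate sites, place K = 2 replicas
example_cost = [
    [1, 5, 9],
    [2, 4, 8],
    [9, 1, 3],
    [8, 2, 1],
]
sites, total = greedy_placement(example_cost, 2)
print(sites, total)
```

Because each round only extends the previous choice, the result can be sub-optimal compared with trying every combination of K sites, which is the trade-off the slide points out.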

Content Replication and Placement. In addition to the server placement, it is important to know how, when and by whom different data items (contents) are placed on possible replica servers. Identify how webpage replicas are replicated: Primary Servers in an organization; Replica Servers on external hosting sites. Kinds of replicas: Permanent Replicas, Server-initiated Replicas, Client-initiated Replicas.

Logical Organization of Replicas. [Figure: Permanent Replicas, Server-initiated Replicas, Client-initiated Replicas and Clients, linked by server-initiated replication and client-initiated replication.]

1. Permanent Replicas. Permanent replicas are the initial set of replicas that constitute a distributed data-store. They are typically small in number. There can be two types of permanent replicas. Primary replicas: one or more servers in an organization; whenever a request arrives, it is forwarded to one of the primary replicas. Mirror sites: geographically spread, and replicas are generally statically configured; clients pick one of the mirror sites to download the data.

2. Server-initiated Replicas. A third party (provider) owns the secondary replica servers and provides a hosting service. The provider has a collection of servers across the Internet. The hosting service dynamically replicates files on different servers, e.g., based on the popularity of a file in a region. The permanent server chooses to host the data item on different secondary replica servers. The scheme is efficient when updates are rare. Example of server-initiated replicas: replicas in Content Delivery Networks (CDNs).

Dynamic Replication in Server-initiated Replicas. Dynamic replication at secondary servers helps to reduce the server load and improve client performance, but replicas have to dynamically push the updates to other replicas. Rabinovich et al. [3] proposed a distributed scheme for replication. Each server keeps track of: the closest server to any requesting client; the number of requests per file per closest server. E.g., each server Q keeps track of cntQ(P,F), which denotes how many requests for a file F arrived at Q from clients that are closer to server P. Rules: If cntQ(P,F) > 0.5 * cntQ(Q,F), request P to replicate a copy of file F. If cntQ(Q,F) < LOWER_BOUND, delete the file at replica Q. In other words: if some other server is nearer to the clients, request replication on that server; if the replica is not popular, delete it.
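The two threshold rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names are invented, the value of `LOWER_BOUND` is arbitrary, and the deletion rule is interpreted as comparing server Q's own request count for F against the threshold.

```python
from collections import defaultdict

LOWER_BOUND = 3  # hypothetical deletion threshold


class ReplicaServer:
    """Server Q in the scheme above (names are illustrative)."""

    def __init__(self, name):
        self.name = name
        # cnt[(P, F)]: requests for file F that arrived here from clients
        # whose closest server is P
        self.cnt = defaultdict(int)

    def record_request(self, closest_server, file):
        self.cnt[(closest_server, file)] += 1

    def decide(self, file, other_servers):
        """Return the list of actions the two rules suggest for `file`."""
        actions = []
        local = self.cnt[(self.name, file)]
        for p in other_servers:
            # Rule 1: many requests come from clients closer to P
            if self.cnt[(p, file)] > 0.5 * local:
                actions.append(("replicate", p, file))
        # Rule 2 (assumed reading): the local replica is unpopular
        if local < LOWER_BOUND:
            actions.append(("delete", self.name, file))
        return actions


q = ReplicaServer("Q")
for _ in range(10):
    q.record_request("Q", "F")   # clients closest to Q itself
for _ in range(6):
    q.record_request("P", "F")   # clients that are closer to P
q.record_request("Q", "G")       # a rarely requested file

print(q.decide("F", ["P", "R"]))
print(q.decide("G", ["P", "R"]))
```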

3. Client-initiated Replicas. Client-initiated replicas are known as client caches. Client caches are used only to reduce the access latency of data, e.g., a browser caching a web page locally. Typically, managing a cache is entirely the responsibility of the client. Occasionally, the data-store may inform the client when the replica has become stale.

Summary of Replica Management. Replica management deals with the placement of servers and content for improving performance and fault-tolerance. Kinds of replicas: Permanent Replicas, Server-initiated Replicas, Client-initiated Replicas. So far, we know: the required consistency models for applications; how to place replica servers and content. What else do we need to provide consistency in a distributed system?

Consistency Protocols. A consistency protocol describes the implementation of a specific consistency model. We are going to study three consistency protocols. Primary-based protocols: one primary coordinator is elected to control replication across multiple replicas. Replicated-write protocols: multiple replicas coordinate together to provide consistency guarantees. Cache-coherence protocols: a special case of client-controlled replication.

Overview of Consistency Protocols Primary-based Protocols Replicated-Write Protocols Cache Coherence Protocols

Primary-based Protocols. In primary-based protocols, a simple centralized design is used to implement consistency models. Each data item x has an associated “Primary Replica”, which is responsible for coordinating write operations. We will study one example of primary-based protocols that implements the Sequential Consistency Model: the Remote-Write Protocol. When consistency models become complex, designing distributed consistency protocols is difficult, so simple protocols are often used for ease of development.

Remote-Write Protocol. Rules: All write operations are forwarded to the primary replica. Read operations are carried out locally at each replica. Approach for write operations (Budhiraja et al. 1993): The client connects to some replica RC. If the client issues a write operation to RC: (1) RC forwards the request to the primary replica RP. (2) RP updates its local value. (3) RP forwards the update to the other replicas Ri. (4) The other replicas Ri update and send an ACK back to RP. (5) After RP receives all ACKs, it informs RC that the write operation is completed. (6) RC acknowledges the client, which in turn completes the write operation. [Figure: Client 1 issues x+=5; the primary server and replicas R1, R2 and R3 in the data-store go from x1=0, x2=0, x3=0 to x1=5, x2=5, x3=5.]
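The write path above can be sketched as a small single-process simulation. It is not a networked implementation: the class and method names are invented for illustration, and message passing is modeled as plain method calls so the blocking behavior is easy to see.

```python
class Replica:
    """One copy of data item x (names are illustrative, not from the slides)."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value

    def install(self, new_value):
        # Apply an update pushed by the primary and acknowledge it
        self.value = new_value
        return "ACK"


class RemoteWriteStore:
    def __init__(self, primary, backups):
        self.primary = primary   # RP
        self.backups = backups   # the other replicas Ri

    def write(self, rc, update):
        # RC does not apply the write itself; it forwards it to RP
        new_value = update(self.primary.value)
        self.primary.value = new_value                        # RP updates locally
        acks = [r.install(new_value) for r in self.backups]   # RP pushes to every Ri
        # Blocking step: the write completes only after every Ri has ACKed
        if not all(ack == "ACK" for ack in acks):
            raise RuntimeError("a replica failed to acknowledge the update")
        return "write completed"

    def read(self, rc):
        # Reads are served locally by whichever replica the client uses
        return rc.value


r1, r2, r3 = Replica("R1"), Replica("R2"), Replica("R3")
store = RemoteWriteStore(primary=r1, backups=[r2, r3])
print(store.write(r2, lambda x: x + 5))           # the client is connected to R2
print([store.read(r) for r in (r1, r2, r3)])
```

Since `write` returns only after all replicas have installed the new value, any later local read sees it, at the price of the client waiting for the slowest replica.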

Remote-Write Protocol – Discussion. The Remote-Write protocol provides a simple way to implement sequential consistency and guarantees that clients see the most recent write operations. However, latency is high in Remote-Write protocols, because clients block until all the replicas are updated. Can a non-blocking strategy be applied? Remote-Write protocols are applied to distributed databases and file systems that require fault-tolerance. Replicas are placed on the same LAN to reduce latency.

Overview of Consistency Protocols Primary-based Protocols Remote-Write Protocol Replicated-Write Protocols Cache Coherence Protocols

Replicated-Write Protocol. In a replicated-write protocol, updates can be carried out at multiple replicas. We will study one example of replicated-write protocols, called the Active Replication Protocol. Here, clients write at any replica, and the modified replica propagates updates to the other replicas.

Active Replication Protocol. When a client writes at a replica, the replica sends the write operation updates to all other replicas. Challenge with active replication: the ordering of operations cannot be guaranteed across the replicas. [Figure: Client 1 issues x+=2 and Client 2 issues x*=3 against replicas R1, R2 and R3, which all start at x=0. Depending on the order in which the two writes arrive, a replica ends with x=6 (x+=2 then x*=3) or x=2 (x*=3 then x+=2), so reads R(x) can return 0, 2 or 6.]
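The ordering problem is easy to reproduce with the two operations from this slide. The sketch below is illustrative only (the helper name `apply_all` is an assumption): it applies x+=2 and x*=3 to two replicas in opposite orders and shows that they end up with different values.

```python
def apply_all(initial, ops):
    """Apply a list of update operations to a replica's value, in order."""
    x = initial
    for op in ops:
        x = op(x)
    return x


def add2(x):   # Client 1: x += 2
    return x + 2


def mul3(x):   # Client 2: x *= 3
    return x * 3


# Replica A happens to receive Client 1's write first, replica B Client 2's
replica_a = apply_all(0, [add2, mul3])   # (0 + 2) * 3 = 6
replica_b = apply_all(0, [mul3, add2])   # (0 * 3) + 2 = 2
print(replica_a, replica_b)              # the replicas have diverged
```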

Centralized Active Replication Protocol. Approach: There is a centralized coordinator called the sequencer (Seq). When a client connects to a replica RC and issues a write operation: (1) RC forwards the update to the Seq. (2) Seq assigns a sequence number to the update operation. (3) RC propagates the sequence number and the operation to the other replicas. Operations are carried out at all the replicas in the order defined by the sequencer. [Figure: Client 1 issues x+=5 and Client 2 issues x-=2; Seq assigns sequence number 10 to x+=5 and 11 to x-=2, and replicas R1, R2 and R3 in the data-store apply them in that order.]
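A minimal sketch of the sequencer idea follows. The class names, the buffering of early arrivals in a dictionary, and the starting sequence number 10 (taken from the figure) are illustrative assumptions. To make the effect visible, it reuses the non-commuting operations x+=2 and x*=3 rather than the x+=5 and x-=2 from the figure.

```python
import itertools


class Sequencer:
    """Hands out globally increasing sequence numbers (illustrative)."""

    def __init__(self, start=10):
        self._counter = itertools.count(start)

    def next_number(self):
        return next(self._counter)


class SequencedReplica:
    """Applies operations strictly in sequence-number order."""

    def __init__(self, first_seq=10):
        self.value = 0
        self.next_seq = first_seq
        self.pending = {}   # operations that arrived too early

    def deliver(self, seq, op):
        self.pending[seq] = op
        # Apply every operation whose turn has come; hold back the rest
        while self.next_seq in self.pending:
            self.value = self.pending.pop(self.next_seq)(self.value)
            self.next_seq += 1


seq = Sequencer(start=10)
first = (seq.next_number(), lambda x: x + 2)    # gets number 10
second = (seq.next_number(), lambda x: x * 3)   # gets number 11

in_order, reordered = SequencedReplica(), SequencedReplica()
for number, op in (first, second):
    in_order.deliver(number, op)
for number, op in (second, first):              # network delivers them swapped
    reordered.deliver(number, op)

print(in_order.value, reordered.value)          # both replicas agree
```

The sequencer is a single point of coordination, so this buys a consistent order at the cost of one extra round trip per write.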

Overview of Consistency Protocols Primary-based Protocols Remote-Write Protocols Replicated-Write Protocols Active Replication Protocol Cache Coherence Protocols

Cache Coherence Protocols. Caches are special types of replicas; typically, caches are client-controlled replicas. Cache coherence refers to the consistency of data stored in caches. How are the cache coherence protocols in shared-memory multiprocessor (SMP) systems different from those in distributed systems? Coherence protocols in SMP systems assume cache states can be broadcast efficiently. In distributed systems, this is difficult because caches may reside on different machines.

Cache Coherence Protocols (Cont’d). Cache coherence protocols determine how caches are kept consistent. Caches may become inconsistent when a data item is modified at the server replicas or at the cache.

When Data is Modified at the Server. Two approaches for enforcing coherence. Server-initiated invalidation: the server sends all caches an invalidation message when a data item is modified. Server updates the cache: the server propagates the update to the cache.

When Data is Modified at the Cache. The enforcement protocol may use one of three techniques. Read-only cache: the client does not modify the data in the cache; updates are propagated to the server replica. Write-through cache: the client directly modifies the cache and forwards the update to the server. Write-back cache: the client allows multiple writes to take place at the cache, batches a set of writes, and sends the batched write updates to the server replica.
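The difference between write-through and write-back can be sketched with two small cache classes. Everything here is illustrative: the class names, the `writes_received` counter and the explicit `flush` method are assumptions chosen to make the number of server updates visible.

```python
class Server:
    """Server replica that counts how many updates it receives."""

    def __init__(self):
        self.data = {}
        self.writes_received = 0

    def write(self, key, value):
        self.data[key] = value
        self.writes_received += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.local = {}

    def write(self, key, value):
        self.local[key] = value
        self.server.write(key, value)   # every write is forwarded at once


class WriteBackCache:
    def __init__(self, server):
        self.server = server
        self.local = {}
        self.dirty = set()              # keys modified since the last flush

    def write(self, key, value):
        self.local[key] = value         # only the cache is updated for now
        self.dirty.add(key)

    def flush(self):
        # Send the batched updates: one server write per modified key
        for key in sorted(self.dirty):
            self.server.write(key, self.local[key])
        self.dirty.clear()


wt_server, wb_server = Server(), Server()
wt, wb = WriteThroughCache(wt_server), WriteBackCache(wb_server)
for v in (1, 2, 3):
    wt.write("x", v)
    wb.write("x", v)
wb.flush()
print(wt_server.writes_received, wb_server.writes_received)
```

Write-back sends fewer messages when the same item is written repeatedly, but the server replica is stale until the flush.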

Summary of Consistency Protocols Primary-based Protocols Remote-Write Protocols Replicated-Write Protocols Active Replication Protocol Cache Coherence Protocols Coherence Enforcement Strategies

Consistency and Replication – A Very Brief Summary. Replication improves performance and fault-tolerance; however, replicas have to be kept reasonably consistent. Consistency Models: a contract between the data-store and processes. Types: Data-centric and Client-centric. Replication Management: describes where, when and by whom replicas should be placed. Types: Replica Server Placement, Content Replication and Placement. Consistency Protocols: implement consistency models. Types: Primary-based, Replicated-Write, Cache Coherence.

New Topic: Programming Models – Part I

Objectives. Discussion on Programming Models: Why parallelism? Parallel computer architectures; Traditional models of parallel programming; Types of parallel programs; Message Passing Interface (MPI); MapReduce, Pregel and GraphLab. Current topic: Why parallelism?

Amdahl’s Law. We parallelize our programs in order to run them faster. How much faster will a parallel program run? Suppose that the sequential execution of a program takes T1 time units and the parallel execution on p processors takes Tp time units. Suppose that, out of the entire execution of the program, a fraction s is not parallelizable while a fraction 1-s is parallelizable. Then the speedup (Amdahl’s formula) is: $\text{Speedup} = \frac{T_1}{T_p} = \frac{1}{s + \frac{1 - s}{p}}$

Amdahl’s Law: An Example. Suppose that 80% of your program can be parallelized and that you use 4 processors to run your parallel version of the program. The speedup you can get according to Amdahl’s law is: $\frac{1}{s + \frac{1 - s}{p}} = \frac{1}{0.2 + \frac{0.8}{4}} = \frac{1}{0.4} = 2.5$. Although you use 4 processors, you cannot get a speedup of more than 2.5 times!
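The example above can be checked with a one-line function. The function name is an assumption; the formula is Amdahl's, with s the non-parallelizable fraction and p the number of processors.

```python
def amdahl_speedup(s, p):
    """Speedup on p processors when a fraction s of the run is serial."""
    return 1.0 / (s + (1.0 - s) / p)


# 80% parallelizable (s = 0.2) on 4 processors, as in the example above
print(round(amdahl_speedup(0.2, 4), 3))

# Adding processors helps less and less: the speedup approaches 1/s = 5
for p in (4, 16, 64, 1024):
    print(p, round(amdahl_speedup(0.2, p), 3))
```

Even with an unlimited number of processors, the serial 20% caps the speedup at 1/s = 5.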

Ideal vs. Actual Cases. Amdahl’s argument is too simplified to be applied to real cases. When we run a parallel program, there is, in general, communication overhead and workload imbalance among processes. [Figure, two panels: (1) Parallel Speed-up: An Ideal Case; (2) Parallel Speed-up: An Actual Case. Each panel compares a serial run (20 time units that cannot be parallelized plus 80 that can) with a parallel run on Processes 1 to 4. In the ideal case each process gets 20 units of the parallelizable part; the actual case adds communication overhead and load imbalance.]
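A rough way to see the gap between the two cases is to compute both run times. The 20/80 split and the 4 processes come from the figure; the uneven per-process workloads and the overhead of 5 time units are hypothetical numbers chosen only for illustration, as are the function names.

```python
def ideal_time(serial, parallel, p):
    """Run time when the parallel part divides perfectly over p processes."""
    return serial + parallel / p


def actual_time(serial, per_process_work, comm_overhead):
    """Run time when the parallel phase ends with the slowest process.

    per_process_work and comm_overhead are hypothetical inputs.
    """
    return serial + max(per_process_work) + comm_overhead


t_serial_run = 20 + 80                               # one process does everything
t_ideal = ideal_time(20, 80, 4)                      # 20 + 80/4
t_actual = actual_time(20, [18, 22, 25, 15], 5)      # imbalance plus overhead

print(t_serial_run / t_ideal)    # ideal speedup
print(t_serial_run / t_actual)   # speedup with imbalance and overhead
```

With these made-up numbers, the slowest process and the communication overhead reduce the speedup from 2.5 to 2.0.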

Guidelines. In order to benefit efficiently from parallelization, we can follow some guidelines: maximize the fraction of your program that can be parallelized; balance the workload of parallel processes; minimize the time spent on communication.

Next Class Continue with Programming Models