Distributed Systems CS 15-440 Consistency and Replication – Part II Lecture 11, Oct 2, 2013 Mohammad Hammoud.

Slides:



Advertisements
Similar presentations
CS 542: Topics in Distributed Systems Diganta Goswami.
Advertisements

COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 7: Consistency 4/13/20151Distributed Systems - COMP 655.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 6 Instructor: Haifeng YU.
Replication. Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.
Consistency and Replication Chapter Introduction: replication and scalability 6.2 Data-Centric Consistency Models 6.3 Client-Centric Consistency.
Consistency and Replication Chapter Topics Reasons for Replication Models of Consistency –Data-centric consistency models: strict, linearizable,
Consistency and Replication
Consistency and Replication Chapter 6. Object Replication (1) Organization of a distributed remote object shared by two different clients.
Replication and Consistency Chapter 6. Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated.
Consistency and Replication. Replication of data Why? To enhance reliability To improve performance in a large scale system Replicas must be consistent.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Distributed Systems CS Consistency and Replication – Part II Lecture 11, Oct 10, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.
Distributed Systems Spring 2009
Distributed Systems Spring 2009
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Distributed Systems CS Consistency and Replication – Part I Lecture 10, Oct 5, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.
Computer Science Lecture 14, page 1 CS677: Distributed OS Consistency and Replication Today: –Introduction –Consistency models Data-centric consistency.
Distributed Systems CS Google Chubby and Message Ordering Recitation 4, Sep 29, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.
Distributed Systems CS Synchronization – Part II Lecture 8, Sep 28, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Replication and Consistency CS-4513 D-term Replication and Consistency CS-4513 Distributed Computing Systems (Slides include materials from Operating.
Computer Science Lecture 14, page 1 CS677: Distributed OS Consistency and Replication Introduction Consistency models –Data-centric consistency models.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Consistency.
Ordering of events in Distributed Systems & Eventual Consistency Jinyang Li.
Consistency. Consistency model: –A constraint on the system state observable by applications Examples: –Local/disk memory : –Database: What is consistency?
Distributed Systems CS Programming Models- Part V Replication and Consistency- Part I Lecture 18, Oct 29, 2014 Mohammad Hammoud 1.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Anh Nguyen.  The emergence of continuous consistency model (CCM)  Conit-based CCM  What is it?  Policies  Examples  Discussion Break…  The composability.
Consistency And Replication
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Consistency.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Distributed Shared Memory Steve Ko Computer Sciences and Engineering University at Buffalo.
Consistency and Replication. Replication of data Why? To enhance reliability To improve performance in a large scale system Replicas must be consistent.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Consistency and Replication Distributed Software Systems.
Synchronization. Why we need synchronization? It is important that multiple processes do not access shared resources simultaneously. Synchronization in.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Consistency.
第5讲 一致性与复制 §5.1 副本管理 Replica Management §5.2 一致性模型 Consistency Models
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
Distributed Systems CS Consistency and Replication – Part IV Lecture 21, Nov 10, 2014 Mohammad Hammoud.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Distributed Systems CS Consistency and Replication – Part I Lecture 10, September 30, 2013 Mohammad Hammoud.
Replication (1). Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Consistency and Replication. Outline Introduction (what’s it all about) Data-centric consistency Client-centric consistency Replica management Consistency.
Distributed Systems CS Consistency and Replication – Part IV Lecture 13, Oct 23, 2013 Mohammad Hammoud.
Hwajung Lee.  Improves reliability  Improves availability ( What good is a reliable system if it is not available?)  Replication must be transparent.
Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.
Distributed Systems CS Consistency and Replication – Part III Lecture 13, Oct 26, 2015 Mohammad Hammoud.
Distributed Systems CS Consistency and Replication – Part I Lecture 11, Oct 19, 2015 Mohammad Hammoud.
COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 6: Synchronyzation 3/5/20161 Distributed Systems - COMP 655.
6.2 Logical Clocks Kranthi Koya09/23/2015. Overview Introduction Lamport’s Logical Clocks Vector Clocks Research Areas Conclusion.
Distributed Systems CS Consistency and Replication – Part IV Lecture 14, Oct 28, 2015 Mohammad Hammoud.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Consistency and Replication CSCI 6900/4900. FIFO Consistency Relaxes the constraints of the causal consistency “Writes done by a single process are seen.
Consistency and Replication (1). Topics Why Replication? Consistency Models – How do we reason about the consistency of the “global state”? u Data-centric.

Distributed Systems CS
Consistency and Replication
Consistency and Replication
Consistency Models.
Distributed Systems CS
Distributed Systems CS
DATA CENTRIC CONSISTENCY MODELS
Distributed Systems CS
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Distributed Systems CS
Distributed Systems CS
Distributed Systems CS
Replication and Consistency
Presentation transcript:

Distributed Systems CS Consistency and Replication – Part II Lecture 11, Oct 2, 2013 Mohammad Hammoud

Today…  Last Session:  Synchronization: Mutual Exclusion and Election Algorithms  Consistency and Replication: Introduction  Today’s Session:  Consistency and Replication  Data-Centric Consistency Models  Announcements:  P2 design report is due on Oct 7, 2013  Midterm exam is on Wednesday Oct 9, 2013  Next session (on Monday Oct 7) will be an overview session  PS2 is due on Oct 12, 2013  Eid Al-Adha Break from Oct (NO CLASSES) 2

Why Consistency? In a DS with replicated data, one of the main problems is keeping the data consistent An example: In an e-commerce application, the bank database has been replicated across two servers Maintaining consistency of replicated data is a challenge 3 Bal=1000 Replicated Database Event 1 = Add $1000 Event 2 = Add interest of 5% Bal= Bal= Bal= Bal=2100

Overview of Consistency and Replication Consistency Models Data-Centric Consistency Models Client-Centric Consistency Models Replica Management When, where and by whom replicas should be placed? Which consistency model to use for keeping replicas consistent? Consistency Protocols We study various implementations of consistency models 4 Next lectures Today’s lecture

Overview Consistency Models Data-Centric Consistency Models Client-Centric Consistency Models Replica Management Consistency Protocols 5

Introduction to Consistency and Replication In a distributed system, shared data is typically stored in distributed shared memory, distributed databases or distributed file systems The storage can be distributed across multiple computers Simply, we refer to a series of such data storage units as data-stores Multiple processes can access shared data by accessing any replica on the data-store Processes generally perform read and write operations on the replicas Process 1 Process 2 Process 3 Local Copy Distributed data-store

Maintaining Consistency of Replicated Data 7 x=0 Replica 1Replica 2Replica 3Replica n Process 1 Process 2 Process 3 R(x)b =Read variable x; Result is b W(x)b = Write variable x; Result is b P1 =Process P1=Timeline at P1 R(x)0 W(x)2 x=2 R(x)?R(x)2 W(x)5 R(x)?R(x)5 x=5 DATA-STORE Strict Consistency Data is always fresh After a write operation, the update is propagated to all the replicas A read operation will result in reading the most recent write If there are occasional writes and reads, this leads to large overheads

Maintaining Consistency of Replicated Data (Cont’d) 8 x=0 Replica 1Replica 2Replica 3Replica n Process 1 Process 2 Process 3 R(x)b =Read variable x; Result is b W(x)b = Write variable x; Result is b P1 =Process P1=Timeline at P1 R(x)0 R(x)5 W(x)2 x=2 R(x)?R(x)3 W(x)5 R(x)?R(x)5 x=0x=5x=3 DATA-STORE Loose Consistency Data might be stale A read operation may result in reading a value that was written long back Replicas are generally out-of-sync The replicas may sync at coarse grained time, thus reducing the overhead

Trade-offs in Maintaining Consistency Maintaining consistency should balance between the strictness of consistency versus efficiency Good-enough consistency depends on your application 9 Strict Consistency Generally hard to implement, and is inefficient Loose Consistency Easier to implement, and is efficient

Consistency Model A consistency model is a contract between the process that wants to use the data, and the replicated data repository (or data-store) A consistency model states the level of consistency provided by the data-store to the processes while reading and writing the data 10

Types of Consistency Models Consistency models can be divided into two types: Data-Centric Consistency Models These models define how the data updates are propagated across the replicas to keep them consistent Client-Centric Consistency Models These models assume that clients connect to different replicas at different times The models ensure that whenever a client connects to a replica, the replica is brought up-to-date with the replica that the client accessed previously 11

Overview Consistency Models Data-Centric Consistency Models Client-Centric Consistency Models Replica Management Consistency Protocols 12

Data-centric Consistency Models Data-centric Consistency Models describe how the replicated data is kept consistent, and what the processes can expect Under Data-centric Consistency Models, we study two types of models: Consistency Specification Models: These models enable specifying the consistency levels that can be tolerated by the application Models for Consistent Ordering of Operations: These models specify the order in which the data updates are propagated to different replicas 13

Overview Consistency Models Data-Centric Consistency Models Consistency Specification Models Models for Consistent Ordering of Operations Client-Centric Consistency Models Replica Management Consistency Protocols 14

Consistency Specification Models In replicated data-stores, there should be a mechanism to: Measure how inconsistent the data might be on different replicas How replicas and applications can specify the tolerable inconsistency levels Consistency Specification Models enable measuring and specifying the level of inconsistency in a replicated data-store We study a Consistency Specification Model called Continuous Consistency Model 15

Continuous Consistency Model Continuous Consistency Model is used to measure inconsistencies and express what inconsistencies can be expected in the system Yu and Vahdat [1] provided a framework for measuring and expressing consistency in replicated data-stores 16

Continuous Consistency Ranges 17 Level of consistency is defined over three independent axes: Numerical Deviation: Deviation in the numerical values between replicas Order Deviation: Deviation with respect to the ordering of update operations Staleness Deviation: Deviation in the staleness between replicas Numerical Deviation Staleness Deviation Ordering Deviation Example: Two copies a stock price should not deviate by more than $0.02 Example: Weather data should not be more than four hours stale Example: In a bulletin board application, a maximum of six messages can be issued out-of-order

Consistency Unit (Conit) Consistency unit (Conit) specifies the data unit over which consistency is measured For example, conit can be defined as a record representing a single stock Level of consistency is measured by each replica along the three dimensions Numerical Deviation For a given replica R, how many updates at other replicas are not yet seen at R? What is the effect of the non-propagated updates on local Conit values? Order Deviation For a given replica R, how many local updates are not propagated to other replicas? Staleness Deviation For a given replica R, how long has it been since updates were propagated? 18

Numerical Deviation at replica R is defined as n(w), where n = # of operations at other replicas that are not yet seen by R, w = weight of the deviation = max(update amount of all variables in a Conit) Replica AReplica B 19 Example of Conit and Consistency Measures x; y x+=2x=2 Operation Result y+=1y=1 x+=1x=3 y+=3y=4 x; y x+=2x=2 Operation Result y+=1y=1 Order Deviation at a replica R is the number of operations in R that are not present at the other replicas Replica AReplica B xyVCOrdNumxyVCOrdNum 00(0,0)00(0)00(0,0)00(0) 00 (0,0) 01(2)20(0,5)10(0) 20 (1,5) 00(0) 20(0,5)00(0) 21 (10,5) 10(0) 20(0,5)01(1) 21(0,16)11(1) 21 (10,5) 11(1) 31 (14,5) 21(1) 21(0,16)12(2) Operation performed at B when the vector clock was 5 = = Committed operation x;y = A Conit = Uncommitted operation 34 (23,5) 31(1) 21(0,16)13(5)

Overview Consistency Models Data-Centric Consistency Models Continuous Specification Models Models for Consistent Ordering of Operations Client-Centric Consistency Models Replica Management Consistency Protocols 20

Why is Consistent Ordering Required in Replication? In several applications, the order or the sequence in which the replicas commit to the data-store is critical Example: Continuous Specification Models define how inconsistency is measured However, the models do not provide any indication about the order in which the data are committed 21 Bal=1000 Replicated Databases Event 1 = Add $1000 Event 2 = Add interest of 5% Bal= Bal= Bal= Bal=2100

Consistent Ordering of Operations Besides continuous consistency, we need to express the semantics of parallel accesses when shared resources are replicated Before updates at replicas are committed, all replicas shall reach an agreement on a global ordering of the updates Replicas in shared data-stores should agree on a consistent ordering of updates What consistent ordering of updates can replicas agree on? 22

Types of Ordering  We will study three types of orderings, which can be utilized by consistency models to agree upon and meet the needs of different applications: 1.Total Ordering 2.Sequential Ordering i.Sequential Consistency Model 3.Causal Ordering i.Causal Consistency Model 23

Types of Ordering 1.Total Ordering 2.Sequential Ordering 3.Causal Ordering 24

Total Ordering Total Order If process P i sends a message m i and P j sends m j, and if one correct process delivers m i before m j then every correct process delivers m i before m j Messages can refer to replica updates In the example Ex1, if P 1 issues the operation m (1,1) : x=x+1; and If P 3 issues m (3,1) : print(x); and P 1 or P 2 or P 3 delivers m (3,1) before m (1,1) Then, at all replicas P 1, P 2, P 3 the following order of operations are executed print(x); x=x+1; m (1,1) P1 P2 P3 m (3,1) Ex1: Total Order m (1,1) P1 P2 P3 m (3,1) Ex2: Not in Total Order

Types of Ordering 1.Total Ordering 2.Sequential Ordering 3.Causal Ordering 26

Sequential Ordering If a process Pi sends a sequence of messages m (i,1),...., m (i,ni), and Process Pj sends a sequence of messages m (j,1),...., m (j,nj), Then: At any process, the set of messages received are in some sequential order Messages from each individual process appear in this sequence in the order sent by the sender At every process, m i,1 should be delivered before m i,2, which is delivered before m i,3 and so on... At every process, m j,1 should be delivered before m j,2, which is delivered before m j,3 and so on... m(1,1) P1 P2 P3 m(3,1) m(3,2) Valid Sequential Orders m(1,2) m(3,3) m(1,1) P1 P2 P3 m(3,1) m(3,2) Invalid Sequential Orders, but Valid Total Order m(1,2) m(3,3)

28 Sequential Consistency Model The Sequential Consistency Model entails that all update operations are executed at replicas in a sequential order Consider a data-store with variable x (Initialized to NULL ) In the two data-stores below, identify the sequentially consistent data-store P1 P2 P3 P4 W(x)a W(x)b R(x)b R(x)a R(x)b P1 P2 P3 P4 W(x)a W(x)b R(x)b R(x)a R(x)b =Read variable x; Result is b W(x)b = Write variable x; Result is b P1 =Process P1 (a) Results while operating on DATA-STORE-1(b) Results while operating on DATA-STORE-2 =Timeline at P1

Sequential Consistency (Cont’d) Consider three processes P 1, P 2 and P 3 executing multiple instructions on three shared variables x, y and z Assume that x, y and z are set to zero at start There are many valid sequences in which operations can be executed at the replica respecting sequential consistency Identify the output 29 P1 x = 1 print (y,z) P2 y = 1 print (x,z) P3 z = 1 print (x,y) x = 1 print (y,z) y = 1 print (x,z) z = 1 print (x,y) Output x = 1 y = 1 print (x,z) print (y,z) z = 1 print (x,y) z = 1 print (x,y) print (x,z) y = 1 x = 1 print (y,z) y = 1 z = 1 print (x,y) print (x,z) x = 1 print (y,z)

Implications of Adopting A Sequential Consistency Model for Applications There might be several different sequentially consistent combinations of ordering Number of combinations for a total of n instructions = The contract between the process and the distributed data-store is that the process must accept all of the sequential orderings as valid results A process that works for some of the sequential orderings and does not work correctly for others is INCORRECT 30

Types of Ordering 1.Total Ordering 2.Sequential Ordering 3.Causal Ordering 31

Causality (Recap) Causal relation between two events If a and b are two events such that a happened- before b (i.e., a  b), and If the (logical) times when events a and b occur at a process P i are denoted as C i (a) and C i (b) Then, if we can infer that a  b by observing that C i (a) < C i (b), then a and b are causally related Causality can be implemented using Vector Clocks 32

Causal vs. Concurrent events Consider an interaction between processes P 1 and P 2 operating on replicated data x and y 33 P1 P2 W(x)a R(x)a P1 P2 W(x)a W(y)b R(x)b =Read variable x; Result is b W(x)b = Write variable x; Result is b P1 =Process P1=Timeline at P1 W(y)b Events are causally related Events are not concurrent Computation of y at P 2 may have depended on value of x written by P 1 Events are not causally related Events are concurrent Computation of y at P 2 does not depend on the value of x written by P 1 R(x)a

Causal Ordering Causal Order If process P i sends a message m i and P j sends m j, and if m i  m j (operator ‘  ’ is Lamport’s happened-before relation) then any correct process that delivers m j will deliver m i before m j In Ex1: m (1,1) and m (3,1) are in Causal Order and m (1,1) and m (1,2) are in Causal Order m (1,1) m (1,2) P1 P2 P3 m (3,1) Ex1: Valid Causal Orders m (1,1) m (1,2) P1 P2 P3 m (3,1) Invalid Causal Order

Causal Consistency Model A data-store is causally consistent if: Writes that are potentially causally related must be seen by all the processes in the same order Concurrent writes may be seen in a different order on different machines 35

Example of a Causally Consistent Data-store 36 P1 P2 P3 P4 W(x)a R(x)a R(x)b =Read variable x; Result is b W(x)b = Write variable x; Result is b P1 =Process P1=Timeline at P1 W(x)b W(x)c R(x)cR(x)b R(x)c A Causally Consistent Data-Store But not a Sequentially Consistent Data-Store Concurrent writes Causal writes

Implications of adopting a Causally Consistent Data-store for Applications Processes have to keep track of which processes have seen which writes This requires maintaining a dependency graph between write and read operations Vector clocks provides a way to maintain causally consistent data-base 37

Topics Covered in Data-centric Consistency Models 38 Data-centric Consistency Models Models for Specifying Consistency Continuous Consistency Model Models for Consistent Ordering of Operations Sequential Consistency Model Causal Consistency Model But, is Data-Centric Consistency Model good for all applications?

Applications that Can Use Data-centric Models Data-centric models are applicable when many processes are concurrently updating the data-store But, do all applications need all replicas to be consistent? 39 Webpage-A Event: Update Webpage-A Webpage-A Data-Centric Consistency Model is too strict when One client process updates the data Other processes read the data, and are OK with reasonably stale data

Summary of Data-Centric Consistency Models 40 These models allow measuring and specifying the consistency levels that are tolerable to the application Data-centric Consistency Models Models for Specifying Consistency Continuous Consistency Model Models for Consistent Ordering of Operations Sequential Consistency Model Causal Consistency Model These models specify what ordering of operations are ensured at the replicas Data-centric consistency models describe how the replicated data is kept consistent across different data-stores, and what a process can expect from the data-store Data-centric models are too strict when: Most operations are read operations Updates are generally triggered from one client process

Next Classes Consistency Models Client-Centric Consistency Models Replica Management Replica management studies: when, where and by whom replicas should be placed which consistency model to use for keeping replicas consistent Consistency Protocols We study various implementations of consistency models 41

References [1] Haifeng Yu and Amin Vahdat, “Design and evaluation of a conit-based continuous consistency model for replicated services” [2] load-on-the-web/ [3] [4] [5] [6] [7] 42