Continuous Consistency and Availability Haifeng Yu CPS 212 Fall 2002.

Presentation transcript:

Continuous Consistency and Availability Haifeng Yu CPS 212 Fall 2002

2 Consistency in Replication
- Replication comes with a consistency cost
- Reasons for replication: better performance and availability
- Replication transforms client-server communication into server-server communication:
  - Decreases performance
  - Decreases availability

3 Strong Consistency and Optimistic Consistency
- Traditionally, two choices for consistency level:
  - Strong consistency: strictly “in sync”
  - Optimistic consistency: no guarantee at all
- Associated tradeoffs with each model
(Figure: Availability / Performance / Scalability vs. Consistency, with Optimistic Consistency and Strong Consistency marked.)

4 Problems with Binary Choice
- Strong consistency incurs prohibitive overheads for many WAN apps
  - Replication may even decrease performance, availability and scalability relative to a single server!
- Optimistic consistency provides no consistency guarantee at all
  - Resulting in upset users: unbounded reservation conflicts
  - Potentially rendering the app unusable: if traffic data is more than 1 hour stale, it is probably of little use
- Applications cannot tune their consistency level based on their environment
  - Need to adapt to client, service and network characteristics

5 Continuous Consistency
- Consistency is continuous rather than binary for many WAN apps
  - These apps can benefit from exploiting the consistency spectrum between strong and optimistic consistency.
(Figure: Availability / Performance / Scalability vs. Consistency, with Continuous Consistency shown alongside Optimistic Consistency and Strong Consistency.)

6 Quantifying Consistency
- Many ways:
  - Staleness (TTL in web caching)
  - Limit the number of locally buffered writes
(Figure labels: Invalidate; buffered updates; To Other Replicas.)
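The two measures named on this slide can be written down as a pair of numeric bounds. The following is only a minimal sketch: the class name `ConsistencyBound`, its field names, and the two constants are hypothetical and do not come from the slides. It treats strong consistency as both bounds set to zero and optimistic consistency as both bounds unlimited, with everything in between forming the spectrum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyBound:
    """Hypothetical container for the two measures named on the slide."""
    max_staleness_s: float      # oldest tolerated age of local data, in seconds
    max_buffered_writes: float  # most local writes allowed to remain unpropagated

    def satisfied(self, staleness_s: float, buffered_writes: int) -> bool:
        # A replica is within its bound only if both measures are within limits.
        return (staleness_s <= self.max_staleness_s
                and buffered_writes <= self.max_buffered_writes)


# The two ends of the spectrum, expressed in the same vocabulary (an assumption).
STRONG = ConsistencyBound(max_staleness_s=0.0, max_buffered_writes=0)
OPTIMISTIC = ConsistencyBound(max_staleness_s=float("inf"),
                              max_buffered_writes=float("inf"))
```

A web cache with a 60-second TTL and no local writes would then be `ConsistencyBound(60.0, 0)`, a point strictly between the two ends.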

7 Applications?
- Applications:
  - Web caching
  - Airline reservation
  - Distributed games
  - Shared editor
- Non-applications:
  - Some scientific computing problems
  - Banking system
  - Any application that has binary output
- The application’s nature determines whether continuous consistency is applicable

8 Trading Consistency for Performance
- Airline reservation: running at Berkeley, Utah, Duke
(Figure labels: Strong Consistency; Optimistic Consistency.) [Yu’02, TOCS]

9 The Cost of Increased Performance
- Increased performance comes with a cost
  - Adaptively trade consistency for performance based on client, network, and service conditions

10 Model vs. Protocol
- A continuous consistency model is a spec.
- A protocol is anything that can enforce the spec.
  - Corollary: a strong consistency protocol is a protocol for any model
- There are many protocols for a specific model; some are good, others are not

11 Designing a Continuous Consistency Model
- A model is a spec, so quantifying consistency (in a bad way) is trivial
- Only the application knows its own definition of consistency
  - Airline reservation vs. distributed games
- What is a “good” continuous consistency model?
  - Can be used by diverse apps
  - Practical

12 Distributed Consensus and Leader Election
- What does “continuous consistency” mean?
  - Allow at most k decision values
  - Allow at most k leaders
- Helps overcome some impossibilities
  - A unique decision value requires a ½ majority
  - k decision values allow any partition with 1/(k + 1) of the nodes to decide
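The threshold arithmetic on this slide can be checked with a short sketch. It assumes one particular reading of the slide: a partition may decide when it holds strictly more than 1/(k + 1) of all nodes, which reduces to the usual majority rule at k = 1. The function names are hypothetical.

```python
def can_decide(partition_size: int, total_nodes: int, k: int) -> bool:
    """True if the partition holds strictly more than 1/(k+1) of all nodes.

    Assumed reading of the slide; k = 1 gives the ordinary majority rule.
    """
    if k < 1 or total_nodes < 1 or not 0 <= partition_size <= total_nodes:
        raise ValueError("invalid arguments")
    # Integer form of partition_size / total_nodes > 1 / (k + 1).
    return partition_size * (k + 1) > total_nodes


def max_deciding_partitions(total_nodes: int, k: int) -> int:
    """Largest number of disjoint partitions that can all pass can_decide."""
    # Smallest size that is strictly more than total_nodes / (k + 1).
    smallest_deciding = total_nodes // (k + 1) + 1
    return total_nodes // smallest_deciding
```

Under this reading, at most k disjoint partitions can pass the test at the same time, so at most k decision values (or leaders) can arise. With 10 nodes and k = 2, a partition of 4 nodes may decide while one of 3 may not.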

13 Group Membership Service
- Def: keep track of which nodes belong to which group
- Traditionally, group membership services maintain only a single group
  - Primary-partition membership services
  - Corresponds to strong consistency
- Recently, partitionable membership services
  - Still an active area of research
  - Corresponds to optimistic consistency
- Continuous consistency: allow at most k groups
  - Again, helps overcome the ½ majority limitation

14 Continuous Consistency Summary
- WAN replication needs dynamically tunable consistency
- Tradeoff between consistency and performance
- How to design a continuous consistency model
- Continuous consistency in other contexts
- Next: Availability

15 What is Availability?
- No well-accepted availability metric for Internet services
- The “uptime” metric can be misleading for Internet services
  - Server may be inaccessible because of network partition
- Available: “present or ready for immediate use” (from Webster’s Collegiate Dictionary)
  - What does “immediate” mean? Time-out
- Availability = (accepted accesses) / (submitted accesses)
  - Implicit time-out in the definition
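The ratio above, with its implicit time-out made explicit, might be computed as in the following sketch. The function name `request_availability` and the convention that `None` marks an access that never received a response are assumptions made for illustration.

```python
from typing import Optional, Sequence


def request_availability(latencies_s: Sequence[Optional[float]],
                         timeout_s: float) -> float:
    """Availability = accepted accesses / submitted accesses.

    An access counts as accepted only if a response arrived within timeout_s.
    None marks an access that got no response at all (assumed convention).
    """
    if not latencies_s:
        raise ValueError("no accesses were submitted")
    accepted = sum(1 for t in latencies_s if t is not None and t <= timeout_s)
    return accepted / len(latencies_s)
```

For example, `request_availability([0.1, 0.5, None, 3.0], timeout_s=1.0)` counts two of the four submitted accesses as accepted. Changing the time-out changes the measured availability, which is why the time-out is part of the definition.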

16 Perform-ability
- User satisfaction is not binary
  - What if a partial result is returned before time-out?
  - What if the result is sent back after an hour, or a day?
  - Availability is related to performance
- Performability = reward function (quality and timeliness of result)
- Determining the reward function is hard!

17 Availability of an Internet Service
- We use user-observed availability in our study:
  - Availability = (accepted accesses) / (submitted accesses)
(Figure: client and server. Failure on the path between them: 2% [Chandra et al., USITS’01]. Reject due to server failure: 0.1% [MS press release, Jan’01].)

18 Effects of Replication
- Consistency may force a replica to reject an otherwise acceptable request
(Figure: client and replicas. Network Failure Rate: < 2%. Replica Rejection Rate: > 0.1%, since a replica also rejects when communication to maintain consistency has failed.)

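19 Limitations of Strong Consistency
(Figure legend: replicas and clients, drawn in two groups, here called side A and side B.)
- Option 1:
  - Side A: accept reads, reject writes
  - Side B: accept reads, reject writes
- Option 2:
  - Side A: accept reads, accept writes
  - Side B: reject reads, reject writes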

20 Effects of Continuous Consistency
- Option 1:
  - Side A: accept reads, reject writes
  - Side B: accept reads, reject writes
- New Option 1 (allow replica to buffer 5 writes):
  - Side A: accept reads, accept first 10 writes
  - Side B: accept reads, accept first 5 writes

21 Effects of Continuous Consistency
- Option 2:
  - Side A: accept reads, accept writes
  - Side B: reject reads, reject writes
- New Option 2 (allow replica to buffer 5 writes):
  - Side A: accept reads, accept writes
  - Side B: accept first few reads, accept first 5 writes
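The idea of letting a replica buffer 5 writes can be sketched as follows. This is an illustrative toy rather than the protocol from the talk: the class `Replica`, its methods, and the `partitioned` flag are hypothetical, reads are not modeled, and actual propagation to other replicas is elided.

```python
class Replica:
    """Toy replica that may buffer a bounded number of writes while cut off."""

    def __init__(self, write_budget: int):
        self.write_budget = write_budget  # e.g. 5, as on the slide
        self.buffered = []                # writes not yet seen by other replicas
        self.partitioned = False          # set by some failure detector (assumed)

    def handle_write(self, write) -> bool:
        """Return True if the write is accepted, False if it must be rejected."""
        if not self.partitioned:
            # Connected: the write can be propagated right away (elided here).
            return True
        if len(self.buffered) < self.write_budget:
            # Cut off, but still within the allowed inconsistency: keep it locally.
            self.buffered.append(write)
            return True
        # Budget exhausted: accepting more would exceed the consistency bound.
        return False

    def heal(self) -> list:
        """Partition is over: hand back the buffered writes for propagation."""
        pending, self.buffered = self.buffered, []
        self.partitioned = False
        return pending
```

With `write_budget=0` this behaves like the strong consistency options, rejecting every write while partitioned. A larger budget lets the first few writes through, which is where the availability gain comes from.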

22 Consistency Impact is Inherent
(Figure: Availability vs. Inconsistency, showing the Hard Bound curve; labeled points: 100% Consistency, and 0% Consistency with 100% Availability.)
- A hard bound always exists
- We always know the two end points, but may not know the exact shape of the curve

23 Effects of Consistency Protocol
- Achieved availability also depends on the protocol
  - Design better protocols
  - Job of system designers
(Figure: Availability vs. Inconsistency, showing the Upper Bound together with Protocol A and Protocol B.)

24 Availability Optimizations
- Technique should not be tied to model
- Focus on two techniques:
  - Retiring replicas
  - Aggressive write propagation

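25 Limitations of Strong Consistency
(Figure legend: replicas and clients, drawn in two groups, here called side A and side B.)
- Option 1:
  - Side A: accept reads, reject writes
  - Side B: accept reads, reject writes
- Option 2:
  - Side A: accept reads, accept writes
  - Side B: reject reads, reject writes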

26 Retiring Replicas
- Obviously, such a decision may not be optimal unless we have future knowledge
  - Importance of prediction
- Even with future knowledge, it is hard
- In option 2, all replicas must reach an agreement
  - Leader election
  - We are experiencing partitions
  - One option: voting
  - What if we don’t have a majority?

27 Aggressive Write Propagation
- Applicable to continuous consistency
- Continuous consistency gives us “buffers” that can be utilized in case of network partition
- Keep the buffer empty:
  - Cannot predict the occurrence of network partitions
  - Propagate writes more aggressively
  - Cut down the amount of inconsistency accumulated in times of good connectivity

28 Effects of Aggressive Propagation
- Baseline: propagate writes only when necessary (lazily)
- Aggressive: when necessary and every 3 seconds
(Figure: 8 replicas with measured faultload. From [Yu’01, SOSP])
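The difference between the baseline and the aggressive policy can be illustrated with a small deterministic simulation. It is a sketch under simplifying assumptions, not the experiment from the cited paper: propagation is instantaneous and always succeeds before the partition, and the function name, parameters, and workload are all made up.

```python
from typing import Iterable, Optional


def buffered_at_partition(write_times: Iterable[float],
                          partition_time: float,
                          bound: int,
                          period: Optional[float] = None) -> int:
    """Count the writes still buffered at the moment a partition begins.

    period=None models the lazy baseline: propagate only when the buffer is
    full and another write arrives. A numeric period adds a periodic flush.
    """
    buffered = 0
    next_flush = period
    for t in sorted(write_times):
        if t >= partition_time:
            break
        # Apply every periodic flush scheduled at or before this write.
        while next_flush is not None and next_flush <= t:
            buffered = 0
            next_flush += period
        if buffered == bound:
            # Propagation is now necessary in order to admit the write.
            buffered = 0
        buffered += 1
    # Apply periodic flushes between the last write and the partition.
    while next_flush is not None and next_flush < partition_time:
        buffered = 0
        next_flush += period
    return buffered
```

With one write per second for ten seconds, a bound of 5, and a partition at t = 10.5, the lazy policy enters the partition with a full buffer, while flushing every 3 seconds leaves only one buffered write. The replica that flushed aggressively can therefore accept more writes before it has to start rejecting.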

29 More Aggressive Propagation
- Aggressive write propagation does not work in all cases
- Availability optimizations can incur more communication
  - Best availability achieved when we use a strong consistency protocol
- Speaks of availability / performance tradeoffs

30 Availability of Other Systems
- Consensus and leader election
  - Blocks without majority
- Group membership
  - Blocks without majority
- Relaxing consistency enables them to make progress
  - Open question: but will these systems still be useful?

31 Availability Summary
- Availability definition
- Inherent impact of consistency on availability
- Availability also depends on consistency protocols
- Availability optimizations:
  - Replica retirement
  - Aggressive write propagation

32 Why can we easily approach the upper bound?
- Simple protocols in our study can approach the upper bound closely
  - Remember that reaching the upper bound in general needs future knowledge
- Related to the characteristics of the faultloads we measured and simulated
  - Most partitions are singleton partitions
  - Most transitions are: fully-connected → singleton partition → fully-connected
- These characteristics are consistent with the Internet’s hierarchical architecture

33 Dual Effects of Replication Scale on Availability
- Consistency may force a replica to reject a request
- Adding more replicas:
  - Decreases the Network Failure Rate
  - Increases the Replica Rejection Rate
- Availability = (1 - Network Failure Rate) * (1 - Rejection Rate)
  - Too large or too small a replication scale can hurt availability
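The formula above can be evaluated directly. The sketch below first plugs in the single-server figures quoted earlier in the talk (2% network failure, 0.1% server rejection), then sweeps the number of replicas under a purely hypothetical model of how the two rates change with scale. That model (`hypothetical_rates`) is invented for illustration and is not measured data from the study.

```python
def replicated_availability(network_failure_rate: float,
                            rejection_rate: float) -> float:
    """Availability = (1 - Network Failure Rate) * (1 - Rejection Rate)."""
    return (1.0 - network_failure_rate) * (1.0 - rejection_rate)


def hypothetical_rates(n_replicas: int) -> tuple:
    """Made-up scaling model, for illustration only.

    Assumes a client fails only if all n replicas are unreachable (independent
    2% failures), and that each extra replica adds 0.5% to the rejection rate.
    """
    network_failure = 0.02 ** n_replicas
    rejection = 0.001 + 0.005 * (n_replicas - 1)
    return network_failure, rejection


def best_scale(max_replicas: int) -> int:
    """Replica count with the highest availability under the made-up model."""
    return max(range(1, max_replicas + 1),
               key=lambda n: replicated_availability(*hypothetical_rates(n)))
```

Under these invented numbers a single server gives roughly 0.979 availability, two replicas give roughly 0.994, and further replicas lower it again because the growing rejection rate outweighs the shrinking network failure rate. The actual optimum depends on the real rates, which this sketch does not model.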

34 Optimal Replication Scale
- Optimal replication scale: adding more replicas can hurt!
  - Increase in “replica rejection rate” outweighs decrease in “network failure rate”
- Optimal replication scale depends on:
  - Consistency level
  - Network failure rate among replicas