1 CS 194: Elections, Exclusion and Transactions Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley, Berkeley, CA

2 Finishing Last Lecture  We discussed time synchronization, Lamport clocks, and vector clocks -Time synchronization makes the clocks agree more closely -Lamport clocks establish clocks that are causally consistent, but leave too much ambiguity -Vector clocks tighten up that ambiguity by weaving a much finer web of causality, at the cost of a lot of per-message overhead  I’ll now finish up the material on global state

3 Global State  Global state is the local state of each process, including any sent messages -Think of this as the sequence of events in each process -Useful for debugging a distributed system, etc.  If we had perfect synchronization, it would be easy to get the global state at some time t -But we don’t have synchronization, so the snapshot must be taken at different times in different processes  A consistent state is one in which every message recorded as received is also recorded as sent -No causal relationships are violated

4 Method #1: Use Lamport Clocks  Pick some time t  Collect the state of each process when its local Lamport clock is t (or the largest time less than t)  Can causality be violated?  A violation would require that the receipt of a message happen at or before t while its sending happens after t; this is impossible, since a receive’s Lamport clock always exceeds the send’s

5 Method #2: Distributed Snapshot  Initiating process records local state and sends out “marker” along its channels -Note: all communication goes through channels! -Each process has some set of channels to various other processes  Whenever a process receives a marker: -First marker: records state, then sends out marker -Otherwise: records all messages received after it recorded its own local state  A process is done when it has received a marker along each channel; it then sends state to initiator -Can’t receive any more messages
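The marker rules above can be sketched in code; this is a minimal single-process view (the class and method names are mine, and the channel/transport plumbing is assumed away):

```python
class SnapshotProcess:
    """Chandy-Lamport-style participant: records local state on the first
    marker, then records messages arriving on channels that have not yet
    delivered a marker."""

    def __init__(self, channels):
        self.channels = channels          # ids of incoming channels
        self.recorded_state = None        # local state, once snapshotted
        self.pending = set()              # channels still awaiting a marker
        self.channel_msgs = {}            # in-flight messages per channel

    def local_state(self):
        return "state"                    # placeholder for real process state

    def on_marker(self, channel, send_marker):
        if self.recorded_state is None:   # first marker: record state, flood markers
            self.recorded_state = self.local_state()
            self.pending = set(self.channels) - {channel}
            self.channel_msgs = {c: [] for c in self.pending}
            send_marker()                 # send a marker out on all outgoing channels
        else:                             # later marker: that channel's recording is done
            self.pending.discard(channel)
        return not self.pending           # True once a marker arrived on every channel

    def on_message(self, channel, msg):
        # messages received after recording, on a channel without a marker yet,
        # belong to the snapshot as in-flight channel state
        if self.recorded_state is not None and channel in self.pending:
            self.channel_msgs[channel].append(msg)
```

When `on_marker` returns True, the process would send its recorded state (plus the per-channel messages) to the initiator.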

6 Why Does This Work?  Assume A sends a message to B, but in the snapshot B records the receipt while A does not record the send  A’s events: receive marker, send marker out all channels, then send message to B  B’s events: receive message from A, then receive marker  This can’t happen! Why?

7 What Does This Rely On?  Ordered message delivery  Limited communication patterns (channels)  In the Internet, this algorithm would require n² messages

8 Lamport Clocks vs Snapshot  What are the tradeoffs?  Lamport: overhead on every message, but no messages beyond those already being sent  Snapshot: no per-message overhead, but the snapshot requires extra messages along each channel -If channels are limited, snapshot might be better -If channels are unlimited, Lamport is probably better

9 Termination Detection  Assume processes are in either a passive state or an active state: -Active: still performing computation, might send messages -Passive: done with computation, won’t become active unless it receives a message  Want to know whether the computation has terminated -i.e., all processes are passive  Not really a snapshot algorithm

10 Termination Detection (2)  Send markers as before (no state recording)  Set up predecessor/successor relationships -Your first marker came from your predecessor -You are your successor’s predecessor  Send “done” to your predecessor if: -All your successors have sent you a “done” -You are passive  Otherwise, send “continue”  If the initiator gets any “continue” message, it resends the marker  If the initiator gets all “done” messages, the computation has terminated
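The done/continue propagation can be sketched as a recursive check over the predecessor/successor tree (a simplified model with illustrative names; it ignores the marker plumbing and the initiator's re-sends):

```python
def subtree_done(node, children, passive):
    """A node reports "done" to its predecessor only if it is passive and
    all of its successors have reported "done"; any active node anywhere
    below forces a "continue"."""
    if not passive[node]:
        return False
    return all(subtree_done(c, children, passive) for c in children[node])

# initiator (node 0) declares termination only when the whole tree is done
children = {0: [1, 2], 1: [3], 2: [], 3: []}
passive = {0: True, 1: True, 2: True, 3: False}
print(subtree_done(0, children, passive))   # False: node 3 is still active
```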

11 Comments  Few of these algorithms work at scale, with unreliable messages and flaky nodes  What do we do in those cases?

12 Back to Lecture 7  Elections  Exclusion  Transactions

13 Elections  Need to select a node as the “coordinator” -It doesn’t matter which node  At the end of the election, all nodes agree on who the coordinator is

14 Assumptions  All nodes have a unique ID number  All nodes know the ID numbers of all other nodes -What world are these people living in???  But they don’t know which nodes are down  Someone will always notice when the coordinator is down

15 Bully Algorithm  When a node notices the coordinator is down, it initiates an election  Election: -Send a message to all nodes with higher IDs -If no one responds, you win! -If someone else responds, they take over and hold their own election -Winner sends out a message to all announcing their election
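A sketch of one election round, modeling "send to all higher IDs and wait for responses" with an `is_alive` predicate (an assumption for illustration; a real implementation would use messages and timeouts):

```python
def bully_election(my_id, all_ids, is_alive):
    """One bully election from my_id's viewpoint. is_alive(n) stands in for
    sending an ELECTION message to node n and seeing whether it answers."""
    live_higher = [n for n in all_ids if n > my_id and is_alive(n)]
    if not live_higher:
        return my_id                # nobody higher answered: I win and announce
    # a higher node answered; it takes over and holds its own election,
    # modeled here by recursing from the highest responder
    return bully_election(max(live_higher), all_ids, is_alive)

# nodes 1..5 with node 4 down: whoever starts, the election settles on 5
alive = {1, 2, 3, 5}
print(bully_election(1, [1, 2, 3, 4, 5], lambda n: n in alive))   # 5
```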

16 Gossip-Based Method  Does not require that everyone know everyone else  Assume each node knows a few other nodes, and that the “knows-about” graph is connected  The coordinator periodically sends out a message with a sequence number and its ID, which is then “flooded” to all nodes  If a node notices that its ID is larger than the current coordinator’s, it starts sending out such messages itself  If the sequence number hasn’t changed recently, a node assumes the coordinator is down and starts announcing itself

17 Which is Better?  In small systems, Bully might be easier  In large and dynamic systems, Gossip dominates  Why?

18 Exclusion  Ensuring that a critical resource is accessed by no more than one process at the same time  Centralized: send all requests to a coordinator (who was picked using the election algorithm) -3-message exchange per access -Problem: coordinator failures  Distributed: treat everyone as a coordinator -2(n-1)-message exchange per access -Problem: a crash of any node blocks everyone
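The centralized scheme can be sketched as a coordinator that grants the resource to one requester at a time and queues the rest (names are illustrative; the request/grant/release calls stand in for the 3-message exchange):

```python
class Coordinator:
    """Centralized mutual exclusion: a node sends a request, the coordinator
    replies with a grant (immediately or after queueing), and the node sends
    a release when done."""

    def __init__(self):
        self.holder = None      # node currently in the critical region
        self.waiting = []       # FIFO queue of blocked requesters

    def request(self, node):
        if self.holder is None:
            self.holder = node
            return "grant"      # resource is free: grant immediately
        self.waiting.append(node)
        return "wait"           # queued until the holder releases

    def release(self, node):
        assert self.holder == node, "only the holder may release"
        self.holder = self.waiting.pop(0) if self.waiting else None
        return self.holder      # next node granted, if any

c = Coordinator()
print(c.request(1))   # grant
print(c.request(2))   # wait
print(c.release(1))   # 2
```

The single point of failure is visible here: if the `Coordinator` object dies, both the holder and the queue are lost.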

19 Majority Algorithm  Require that a node get permission from over half of the nodes before accessing the resource -Nodes don’t grant permission to more than one node at a time  Why is this better?  N=1000, p=.99 (per-node availability) -Unanimous: prob of success = 0.99^1000 ≈ 4x10^-5 -Majority: prob of failure is many orders of magnitude better!!
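These numbers can be checked directly. This sketch computes the exact binomial probabilities with fractions, since the majority-failure probability is far too small for floating point:

```python
from fractions import Fraction
from math import comb

def unanimous_success(n, p):
    """Probability that all n nodes are up, each up with probability p."""
    return p ** n

def majority_failure(n, p):
    """Probability that fewer than a majority (n//2 + 1) of n nodes are up,
    i.e. no majority quorum can be assembled. Exact binomial tail."""
    need = n // 2 + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(need))

p = Fraction(99, 100)                       # each node independently up 99% of the time
print(float(unanimous_success(1000, p)))    # ~4.3e-05: all 1000 up almost never
# majority_failure(1000, p) is below 10**-700, far under the float range,
# which is the "orders of magnitude better" claimed on the slide
```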

20 Interlocking Permission Sets  Every node I can access the resource if it gets permission from a set V(I) -Want the sets to be as small as possible, with the load evenly distributed  What are the requirements on the sets V?  For every I,J, V(I) and V(J) must share at least one member  If we assume all sets V are the same size, and that each node is a member of the same number of sets, how big are they?
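One classic construction meeting these requirements (Maekawa's grid quorums; the slide does not name it, so this is an assumption) arranges N = k×k nodes in a grid and takes V(I) to be node I's row plus its column, so any two sets intersect and each has size 2√N - 1:

```python
from math import isqrt

def grid_quorum(i, n):
    """Maekawa-style quorum for node i among n = k*k nodes laid out in a
    k-by-k grid: the union of i's row and i's column. Any two such sets
    intersect (at the row/column crossing point)."""
    k = isqrt(n)
    assert k * k == n, "grid construction assumes n is a perfect square"
    row, col = divmod(i, k)
    rows = {row * k + c for c in range(k)}   # everyone in i's row
    cols = {r * k + col for r in range(k)}   # everyone in i's column
    return rows | cols

n = 16
quorums = [grid_quorum(i, n) for i in range(n)]
assert all(q1 & q2 for q1 in quorums for q2 in quorums)   # pairwise intersection
print(len(grid_quorum(0, n)))   # 7, i.e. 2*sqrt(16) - 1
```

This answers the sizing question on the slide: sets of size on the order of √N suffice.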

21 Transactions  Atomic: changes are all or nothing  Consistent: Does not violate system invariants  Isolated: Concurrent transactions do not interfere with each other (serializable)  Durable: Changes are permanent

22 Implementation Methods  Private workspace  Write-ahead log

23 Concurrency Control  Want to allow several transactions to be in progress at once  But the result must be the same as some sequential order of transactions  Transactions are a series of operations on data items: -Write(A), Read(B), Write(B), etc. -We will represent them generically as O(A) -In general, A should be a set of items, but we ignore that for convenience  Question: how to schedule these operations coming from different transactions?

24 Example  T1: O1(A), O1(A,B), O1(B)  T2: O2(A), O2(B)  Possible schedules: -O1(A),O1(A,B),O1(B),O2(A),O2(B) = T1, T2 -O1(A),O2(A),O1(A,B),O2(B),O1(B) = ?? -O1(A),O1(A,B),O2(A),O1(B),O2(B) = T1, T2  How do you know? What are general rules?
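One general rule: build the precedence graph between transactions whose operations touch a common item, and check it for cycles; an acyclic graph yields the equivalent serial order. A sketch of that check (my construction, not from the slides; following the generic O(A) notation, it treats any two operations on a shared item as conflicting):

```python
def serial_order(schedule):
    """Serializability check for a schedule of (txn, items) operations.
    Returns an equivalent serial order of transactions, or None if the
    precedence graph has a cycle (not serializable)."""
    edges = set()
    txns = []
    for i, (t1, items1) in enumerate(schedule):
        if t1 not in txns:
            txns.append(t1)
        for t2, items2 in schedule[i + 1:]:
            if t1 != t2 and items1 & items2:
                edges.add((t1, t2))          # t1's conflicting op comes first
    # topological sort of the precedence graph
    order, remaining = [], list(txns)
    while remaining:
        free = [t for t in remaining
                if not any(a != t and b == t and a in remaining for a, b in edges)]
        if not free:
            return None                      # cycle: no equivalent serial order
        order.append(free[0])
        remaining.remove(free[0])
    return order

# the slide's second schedule: O1(A),O2(A),O1(A,B),O2(B),O1(B)
s = [("T1", {"A"}), ("T2", {"A"}), ("T1", {"A", "B"}), ("T2", {"B"}), ("T1", {"B"})]
print(serial_order(s))   # None: both T1->T2 and T2->T1 edges exist
```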

25 Grab and Hold  At start of transaction, lock all data items you’ll use  Release only at end  Obviously serializable: done in order of lock grabbing

26 Grab and Unlock When Not Needed  At the start, lock all data items you’ll need  When none of your remaining operations involve a data item, release its lock  Why is this serializable?

27 Lock When First Needed  Lock data items only when you first need them  When done with computation, release all locks  Why does this work?  What is the serial order?

28 Potential Problem  Deadlocks!  If two transactions get started, but each needs the other’s data item, then they are doomed to deadlock  T1=O1(A),O1(A,B)  T2=O2(B),O2(A,B)  O1(A),O2(B) is a legal starting schedule, but then they deadlock, each waiting for the lock the other holds

29 Deadlocks  Releasing early does not cause deadlocks  Locking late can cause deadlocks

30 Lock When Needed, Unlock When Not Needed  Grab when first needed  Unlock when no longer needed  Does this work?

31 Example  T1 = O1(A),O1(B)  T2 = O2(A),O2(B)  O1(A),O2(A),O1(B),O2(B) = T1,T2  O1(A),O2(A),O2(B),O1(B) = ??

32 Two Phase Locking  Lock data items only when you first need them  After you’ve gotten all the locks you need, unlock data items when you no longer need them  Growing phase followed by shrinking phase  Why does this work?  What is the serial order?
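The growing/shrinking discipline can be enforced mechanically; a minimal sketch (class and method names are mine) that rejects any lock acquired after the first unlock:

```python
class TwoPhaseLocking:
    """Enforces the two-phase rule: all lock acquisitions (growing phase)
    must precede all releases (shrinking phase)."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("two-phase rule violated: lock after unlock")
        self.held.add(item)            # growing phase

    def unlock(self, item):
        self.shrinking = True          # first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseLocking()
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # shrinking begins
# t.lock("C") would now raise: that late lock is exactly the interleaving
# that broke serializability on the previous slide
```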

33 Alternative to Locking  Use timestamps!  Each transaction has a timestamp, and every operation carries that timestamp  Serializable order is timestamp order  Data items have: -Read timestamp tR: timestamp of the transaction that last read it -Write timestamp tW: timestamp of the transaction that last wrote it

34 Pessimistic Timestamp Ordering  If ts < tW(A) when transaction tries to read A, then abort  If ts < tR(A) when transaction tries to write A, then abort  But can allow -ts > tW(A) for reading -ts > tR(A) for writing  No need to look at tR for reading or tW for writing
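The slide's two abort rules can be written down directly (a sketch with illustrative names; `ts` is the requesting transaction's timestamp):

```python
class Item:
    """Data item carrying the read/write timestamps from the slide."""
    def __init__(self):
        self.tR = 0   # timestamp of the last reader
        self.tW = 0   # timestamp of the last writer

def ts_read(item, ts):
    """Abort if a younger transaction already wrote the item."""
    if ts < item.tW:
        return "abort"
    item.tR = max(item.tR, ts)
    return "ok"

def ts_write(item, ts):
    """Abort if a younger transaction already read the item
    (per the slide, tW need not be checked for writing)."""
    if ts < item.tR:
        return "abort"
    item.tW = max(item.tW, ts)
    return "ok"

a = Item()
ts_write(a, 5)          # transaction 5 writes A
print(ts_read(a, 3))    # abort: reader 3 is older than the last writer
```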

35 Optimistic Timestamp Ordering  Do whatever you want (in your private workspace), but keep track of timestamps  Before committing results, check to see if any of the data has changed since when you started  Useful if few conflicts
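A sketch of the validate-at-commit idea, using per-item version numbers as the "has the data changed since I started" check (the versioning scheme and all names are my assumptions, not from the slides):

```python
class OptimisticTxn:
    """Work in a private workspace; at commit, validate that nothing read
    has been overwritten by another committed transaction."""

    def __init__(self, store):
        self.store = store          # shared store: {item: (value, version)}
        self.reads = {}             # item -> version observed at first read
        self.writes = {}            # private workspace of buffered writes

    def read(self, item):
        if item in self.writes:
            return self.writes[item]            # read your own write
        value, version = self.store[item]
        self.reads.setdefault(item, version)    # remember what we saw
        return value

    def write(self, item, value):
        self.writes[item] = value               # buffered until commit

    def commit(self):
        # validation: abort if any item we read has since changed version
        for item, version in self.reads.items():
            if self.store[item][1] != version:
                return False
        # install buffered writes, bumping versions
        for item, value in self.writes.items():
            _, version = self.store.get(item, (None, 0))
            self.store[item] = (value, version + 1)
        return True

store = {"A": (1, 0)}               # item -> (value, version)
t = OptimisticTxn(store)
t.read("A"); t.write("A", 9)
print(t.commit())                   # True: nothing changed underneath us
```

The "useful if few conflicts" point shows up directly: validation is cheap, but every conflict costs a full abort and retry.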