EECS 498 Introduction to Distributed Systems Fall 2017
Harsha V. Madhyastha
Dynamo Recap
- Consistent hashing: 1-hop DHT enabled by gossip
- Execution of reads and writes:
  - Coordinated by first available successor of the key
  - Read from / write to N available successors
  - Declare success if R reads or W writes succeed
- Vector clocks identify causal precedence: a vector of (coordinator, count) pairs
November 8, 2017 EECS 498 – Lecture 16
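The causal-precedence check on (coordinator, count) vectors can be sketched as below. The type and function names (`VectorClock`, `Descends`, `Concurrent`) are illustrative, not Dynamo's actual API: one clock causally precedes another if every counter in it is less than or equal to the corresponding counter in the other; if neither precedes the other, the versions conflict.

```go
package main

import "fmt"

// VectorClock maps a coordinator's ID to its update count,
// mirroring Dynamo's (coordinator, count) pairs.
type VectorClock map[string]int

// Descends reports whether clock a causally precedes (or equals) b:
// every counter in a is <= the corresponding counter in b.
// A missing entry in b counts as 0.
func Descends(a, b VectorClock) bool {
	for node, ca := range a {
		if ca > b[node] {
			return false
		}
	}
	return true
}

// Concurrent reports whether neither clock descends from the other,
// i.e., the versions conflict and need reconciliation.
func Concurrent(a, b VectorClock) bool {
	return !Descends(a, b) && !Descends(b, a)
}

func main() {
	v1 := VectorClock{"Sx": 2}
	v2 := VectorClock{"Sx": 2, "Sy": 1}
	v3 := VectorClock{"Sx": 2, "Sz": 1}
	fmt.Println(Descends(v1, v2))   // true: v1 happened before v2
	fmt.Println(Concurrent(v2, v3)) // true: sibling versions conflict
}
```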
Dealing with Failures
- Permanent failures: identified and marked manually; discovered via gossip
- Temporary failures: handled by sloppy quorums
  - What if fewer than W of N replicas are up?
  - How does a node catch up once it recovers?
Hinted Handoff
- Suppose the coordinator doesn't receive W replies from the N successors
  - Could return failure, but we want to maximize availability
- Coordinator tries writing to subsequent nodes (beyond the N successors of the key)
- Coordinator informs each such recipient of the intended node
Hinted Handoff: Example
- Say N = 3, W = 3
- Hinted handoff: B (the coordinator for key K) writes to itself, D, and E; E holds a hint pointing to node C
- When C is available again, E forwards the replicated data back to C
[Figure: ring with key K, coordinator B, and successors C, D, E]
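The coordinator's walk down the successor list can be sketched as follows. This is a toy model, not Dynamo's implementation: the coordinator writes to the first W reachable nodes in ring order, and any node past the first N successors records a hint naming the down node it is standing in for.

```go
package main

import "fmt"

// replica records where a write landed and, if it was a hinted write,
// which home node it is intended for.
type replica struct {
	node string
	hint string // intended node; "" for a home replica
}

// writeWithHandoff walks the ring's successor list for a key, writing to
// the first w reachable nodes. Nodes beyond the first n successors take
// over for down home replicas, carrying a hint.
func writeWithHandoff(ring []string, up map[string]bool, n, w int) []replica {
	var out []replica
	missed := []string{} // home replicas that were down
	for i := 0; i < len(ring) && len(out) < w; i++ {
		node := ring[i]
		if up[node] {
			hint := ""
			if i >= n && len(missed) > 0 {
				hint, missed = missed[0], missed[1:]
			}
			out = append(out, replica{node, hint})
		} else if i < n {
			missed = append(missed, node)
		}
	}
	return out
}

func main() {
	// The slide's example: successors of key K are B, C, D, E; C is down.
	ring := []string{"B", "C", "D", "E"}
	up := map[string]bool{"B": true, "C": false, "D": true, "E": true}
	for _, r := range writeWithHandoff(ring, up, 3, 3) {
		fmt.Println(r.node, r.hint) // E carries the hint for C
	}
}
```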
Removing threats to durability
- A hinted-handoff node may crash before it replicates data to the appropriate node
- Need to ensure that each key-value pair is replicated N times
- Solution: replica synchronization
  - Nodes nearby on the ring periodically gossip
  - Compare the (k, v) pairs they hold
  - Copy any missing keys the other has
- How to efficiently sync between two nodes?
How to sync, quickly?
- With a version vector, B can tell A the highest timestamp it has for every node, e.g., "X 30, Y 40"
- What if every node stores only a (key, value) map? How to efficiently sync then?
[Figure: A and B each hold versioned entries, e.g., ⟨-,10, X⟩ ⟨-,20, Y⟩ ⟨-,30, X⟩ ⟨-,40, X⟩ ⟨-,40, Y⟩, summarized by a version vector]
Efficient synchronization with Merkle trees
- A Merkle tree hierarchically summarizes the key-value pairs a node holds
  - Leaf node = hash of one key's value
  - Internal node = hash of the concatenation of its children
- Compare roots; if they match, the values match
- If they don't match, compare children, and iterate this process down the tree
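The build-and-compare procedure above can be sketched as a minimal Merkle tree over a fixed array of values. This is an illustrative sketch, not Dynamo's implementation: leaves hash individual values, internal nodes hash the concatenation of their children, and `diff` walks both trees top-down, pruning any subtree whose hashes already match.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// node is one entry of a Merkle tree covering key indices [lo, hi).
type node struct {
	hash        [32]byte
	left, right *node
	lo, hi      int
}

// build constructs a balanced Merkle tree over values[lo:hi]
// (hi-lo assumed to be a power of two for simplicity).
func build(values []string, lo, hi int) *node {
	if hi-lo == 1 {
		return &node{hash: sha256.Sum256([]byte(values[lo])), lo: lo, hi: hi}
	}
	mid := (lo + hi) / 2
	l, r := build(values, lo, mid), build(values, mid, hi)
	combined := append(l.hash[:], r.hash[:]...)
	return &node{hash: sha256.Sum256(combined), left: l, right: r, lo: lo, hi: hi}
}

// diff descends both trees in lockstep, pruning matching subtrees, and
// appends the key ranges of mismatched leaves to out.
func diff(a, b *node, out *[][2]int) {
	if a.hash == b.hash {
		return // identical subtree: prune, nothing to exchange
	}
	if a.left == nil { // leaf with differing values
		*out = append(*out, [2]int{a.lo, a.hi})
		return
	}
	diff(a.left, b.left, out)
	diff(a.right, b.right, out)
}

func main() {
	va := []string{"v0", "v1", "v2", "v3"}
	vb := []string{"v0", "v1", "vX", "v3"} // differs only at index 2
	var out [][2]int
	diff(build(va, 0, 4), build(vb, 0, 4), &out)
	fmt.Println(out) // only the differing leaf range survives pruning
}
```

Because equal subtrees are pruned at the highest matching level, two mostly identical nodes exchange O(log n) hashes rather than their entire key sets.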
Merkle tree reconciliation
- B is missing the orange key; A is missing the green one
- Exchange and compare hash nodes from the root downwards, pruning when hashes match
[Figure: A's and B's Merkle trees over [0, 2^128), each split into [0, 2^127) and [2^127, 2^128)]
Load Balancing
- What needs to happen to bootstrap N72?
  - Copy (k, v) pairs from N90 to N72
[Figure: ring with nodes N10, N32, N60, N72 (new), N90, N105, N120]
Initializing New Server

    kvStore map[string]string

    func InitServer(nodeID int64) {
        s := getSuccessor(hash(nodeID))
        call(s, "Transfer", nodeID, &kvStore)
    }
Handing Off to New Server

    kvStore map[string]string

    func Transfer(ID int64, shard map[string]string) {
        for key, val := range kvStore {
            if hash(key) < hash(ID) {
                shard[key] = val
                delete(kvStore, key)
            }
        }
    }
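The slide's handoff can be made self-contained with a toy stand-in hash, as below. The `hash` function here (positions 0-99) and the function names are illustrative assumptions, not the lecture's actual ring hash: the point is only that the successor splits its map, moving every key whose ring position falls below the new node's position.

```go
package main

import "fmt"

// hash is a toy stand-in for a ring position in [0, 100); not a real
// consistent-hashing function.
func hash(s string) int {
	h := 0
	for _, c := range s {
		h = (h*31 + int(c)) % 100
	}
	return h
}

// transfer moves every key owned by the new node (ring position below
// newNodePos) out of kvStore and into the shard handed to the new node.
func transfer(kvStore map[string]string, newNodePos int) map[string]string {
	shard := map[string]string{}
	for key, val := range kvStore {
		if hash(key) < newNodePos {
			shard[key] = val
			delete(kvStore, key) // deleting during range is safe in Go
		}
	}
	return shard
}

func main() {
	kv := map[string]string{"a": "1", "b": "2", "c": "3"}
	shard := transfer(kv, 98)
	fmt.Println(len(shard), len(kv)) // keys split between new and old node
}
```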
Revisiting Consistent Hashing
- What needs to happen to bootstrap N72? Copy (k, v) pairs from N90 to N72
  - Need to scan the (k, v) map at N90
  - Worse with virtual nodes
  - Need to recompute Merkle trees
- Root cause of the problems: consistent hashing determines both partitioning and placement
  - Node addition/removal affects both how data is partitioned and where partitions are placed
[Figure: ring with nodes N10, N32, N60, N72 (new), N90, N105, N120]
Decoupling Partitioning and Placement
- Partition the hash space statically into a fixed number of equal-sized shards
- Place each shard on the first N virtual nodes after its end
- Need to identify only which shards to hand off
- Maintain a Merkle tree per shard
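The decoupled scheme can be sketched as two separate mappings: a static key-to-shard function, and a shard-to-nodes placement over the ring. The constants and names below (`numShards`, `ringSize`, `placeShard`) are illustrative assumptions.

```go
package main

import "fmt"

const numShards = 8     // fixed, equal-sized shards of the hash space
const ringSize = 1 << 16 // toy ring size for illustration

// shardOf maps a key's hash to its shard; this never changes when nodes
// join or leave, so partitioning is static.
func shardOf(keyHash int) int {
	return keyHash / (ringSize / numShards)
}

// placeShard returns the n nodes responsible for a shard: the first n
// virtual nodes at or after the shard's end, wrapping around the ring.
// nodePos holds node positions sorted in ring order.
func placeShard(shard int, nodePos []int, n int) []int {
	end := ((shard + 1) * (ringSize / numShards)) % ringSize
	i := 0
	for i < len(nodePos) && nodePos[i] < end {
		i++
	}
	var out []int
	for k := 0; k < n; k++ {
		out = append(out, nodePos[(i+k)%len(nodePos)])
	}
	return out
}

func main() {
	nodes := []int{1000, 20000, 40000, 55000} // sorted ring positions
	fmt.Println(shardOf(30000))          // which shard a key lands in
	fmt.Println(placeShard(2, nodes, 3)) // the N replicas for shard 2
}
```

When a node joins, only `placeShard` results change: whole shards (each with its own Merkle tree) move to the new node, with no per-key scan and no Merkle recomputation.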
Dynamo Summary
- Scalability and low latency: 1-hop DHT enabled by inter-node gossip
- High availability: sloppy quorums, hinted handoff
- Eventual consistency: vector clocks, Merkle trees
- Load balancing: decoupling partitioning and placement of data
Impact of Dynamo
- NoSQL systems: eventual consistency popularized by Dynamo
  - Better scalability than strongly consistent systems
  - But frustration over the application complexity needed to offer an intuitive user experience
- Recent trend: scalable strong consistency, e.g., Google's Spanner
Impact of Partitioning State
- Why consistent hashing or a DHT?
  - Enables load balancing across partitions
  - Accommodates state too big for one server
- What if an operation touches multiple partitions? Examples:
  - Add a meeting to the calendars of two participants
  - Transfer money from one account to another
  - What about looking up the balance of two accounts?
- Need distributed transactions
Executing Transactions
- To ensure atomicity, execute one transaction at a time. Problem?
- Desired property: serializability
  - Despite concurrent execution, externally visible effects are equivalent to some serial order of execution
Example of Serializability
- Concurrent execution of transactions:
  - T1: transfer $10 from Alice to Bob
  - T2: read the balance of Alice's and Bob's accounts
- Initial balance of $100 in both accounts
- Permissible outputs for T2: (Alice: $100, Bob: $100) or (Alice: $90, Bob: $110)
- Invalid outputs for T2: (Alice: $90, Bob: $100) or (Alice: $100, Bob: $110)
Example Scenario
- Many students are each looking to host a party
- Each host invites a subset of the other students; a party is on only if all invitees can make it
- Consensus among all students would work, but it is slow to identify all parties that are on
- How can each host decide whether their party is on without coordinating all students?
Achieving Serializability
- Client submits a transaction to a coordinator (TC)
- TC acquires locks on all data involved
- Once the locks are acquired, TC executes the transaction and releases the locks
Two Phase Locking
[Figure: TC sends Lock requests to P1, P2, and P3, then sends Commit]
Two Phase Locking
[Figure: TC sends Lock requests to P1, P2, and P3, then sends Abort]
Two Phase Locking
- TC acquires locks on all necessary shards before attempting the transaction
- Disjoint transactions can execute concurrently
- Related transactions wait on each other
- How to ensure fault tolerance?