Asynchronous rebalancing of a replicated tree

Slides:



Advertisements
Similar presentations
Knowledge Construction
Advertisements

Abdulrahman Idlbi COE, KFUPM Jan. 17, Past Schedulers: 1.2 & : circular queue with round-robin policy. Simple and minimal. Not focused on.
AVL Trees1 Part-F2 AVL Trees v z. AVL Trees2 AVL Tree Definition (§ 9.2) AVL trees are balanced. An AVL Tree is a binary search tree such that.
Chapter 4: Trees Part II - AVL Tree
AVL Trees Balancing. The AVL Tree An AVL tree is a balanced binary search tree. What does it mean for a tree to be balanced? It means that for every node.
Conflict-free Replicated Data Types MARC SHAPIRO, NUNO PREGUIÇA, CARLOS BAQUERO AND MAREK ZAWIRSKI Presented by: Ron Zisman.
CSC 213 Lecture 7: Binary, AVL, and Splay Trees. Binary Search Trees (§ 9.1) Binary search tree (BST) is a binary tree storing key- value pairs (entries):
CS 582 / CMPE 481 Distributed Systems Replication.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
CS 61B Data Structures and Programming Methodology Aug 11, 2008 David Sun.
Franck Petit INRIA, LIP Lab. Univ. / ENS of Lyon France Optimal Probabilistic Ring Exploration by Semi-Synchronous Oblivious Robots Joint work with Stéphane.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
Practical Byzantine Fault Tolerance
S-Paxos: Eliminating the Leader Bottleneck
Totally Ordered Broadcast in the face of Network Partitions [Keidar and Dolev,2000] INF5360 Student Presentation 4/3-08 Miran Damjanovic
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Tree Traversals, TreeSort 20 February Expression Tree Leaves are operands Interior nodes are operators A binary tree to represent (A - B) + C.
Vertex Coloring Distributed Algorithms for Multi-Agent Networks
Keeping Binary Trees Sorted. Search trees Searching a binary tree is easy; it’s just a preorder traversal public BinaryTree findNode(BinaryTree node,
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables II.
Improving Authenticated Dynamic Dictionaries
AVL Trees CSE, POSTECH.
Data Flow Analysis Suman Jana
Query Optimization Heuristic Optimization
Delay-Tolerant Networks (DTNs)
File Organization and Processing Week 3
Module 2: Conditionals and Logical Equivalences
Model and complexity Many measures Space complexity Time complexity
Multiway Search Trees Data may not fit into main memory
Parallel Tasks Decomposition
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
CO4301 – Advanced Games Development Week 10 Red-Black Trees continued
Instructor: Vincent Conitzer
Week 11 - Friday CS221.
Elastic Consistent Hashing for Distributed Storage Systems
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Communication Costs (cont.) Dr. Xiao.
CSE373: Data Structures & Algorithms Lecture 7: AVL Trees
CS223 Advanced Data Structures and Algorithms
Efficient Distance Computation between Non-Convex Objects
Replication and Consistency
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Dr. David Matuszek Heapsort Dr. David Matuszek
Replication and Consistency
Topic 23 Red Black Trees "People in every direction No words exchanged No time to exchange And all the little ants are marching Red and Black.
Presented by Marek Zawirski
Active replication for fault tolerance
Fault-tolerance techniques RSM, Paxos
Copyright © Aiman Hanna All rights reserved
SPLAY TREES.
Data Structures & Algorithms
EEC 688/788 Secure and Dependable Computing
Algorithms and Data Structures Lecture VIII
EEC 688/788 Secure and Dependable Computing
Heapsort.
COMP60621 Fundamentals of Parallel and Distributed Systems
CE 221 Data Structures and Algorithms
Balanced binary search trees
CSC 143 Java Applications of Trees.
Haitao Wang Utah State University SoCG 2017, Brisbane, Australia
CSC 143 Binary Search Trees.
Heapsort.
EEC 688/788 Secure and Dependable Computing
Heapsort.
COMP60611 Fundamentals of Parallel and Distributed Systems
326 Lecture 9 Henry Kautz Winter Quarter 2002
CO 303 Algorithm Analysis and Design
Replication and Consistency
Announcement Sign up google sheet for in class lectures
Presentation transcript:

Asynchronous rebalancing of a replicated tree Marek Zawirski INRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Preguiça UNL, Portugal CFSE, May 2011, Saint-Malo

Summary Overview of Treedoc: Problem faced: Abstractly, always-responsive replicated sequence Built as a replicated ordering tree Problem faced: Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence 1 Replicated representation: Grow-only binary tree Stable, unique position ids 1 L P L P Total order ”<”: infix traversal = 1 P Sequence of atoms: Ops: read, addAt, removeAt L I P replica3 I 1 L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence 1 1 L P L P L I P replica3 I S 1 addAt( , S) S < < I S P L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence 1 1 L P L P S L I S P replica3 I 1 addAt( , S) S < < I S P = 10 S L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence 1 1 L P L P S X L I S P replica3 I 1 addAt( , S) S L P removeAt( ) P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence 1 1 L P L P S L I S replica3 I 1 S addAt( , S) L P P removeAt( ) [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) addAt( , S) S addAt( , S) S S L P P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) S L P P removeAt( ) P removeAt( ) P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E S A S A addAt( , A) A addAt( , A) E L I S replica3 addAt( , E) E I 1 L P ? S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E A S A S A addAt( , A) A addAt( , A) E A L I S replica3 addAt( , E) E addAt( , E) E addAt( , E) E I 1 A addAt( , A) L P Predefined order: red < green < blue… S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Treedoc – a replicated sequence Operation-based replication: Immediate local execution Propagate (cbcast) & replay Concurrent commute Eventually consistent I I 1 1 L P L P E A S E A S E A L I S replica3 I 1 L P Predefined order: red < green < blue… E A S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

The tree rebalance problem With time tree gets worse and worse Unbalanced, empty nodes, lot of colors… Various negative impacts I 1 L P E A S Tree rebalance: Create minimal tree from nonempty nodes Keep order “<“ Use single color (white) New ids (rectangles), incompatible with old rebalance L 1 Challenge: How to avoid system-wide consensus? E S 1 A I Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

The core-nebula architecture Idea: limit consensus to a smaller number of replicas Divide replicas into two disjoint sets: [Leția et. al, 2009] CORE a stable group execute tree operations & agree on rebalance easier agreement NEBULA sites join & leave, dynamic generate tree operations learns about rebalance perform catch-up protocol to integrate conc. changes never blocked Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S S addAt( , S) S addAt( , S) S addAt( , S) S addAt( , S) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S P removeAt( ) P removeAt( ) P removeAt( ) P removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 S S S S I removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) U addAt( , U) rebalance Any pair of replicas can exchange operations in the same epoch rebalance@core initiates new epoch rebalance@core and operations@core inherently concurrent to ops@nebula! S L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) rebalance X Pairwise catch-up moves nebula replica to the next epoch S ? L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state S L T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Rebalance in core, catch-up from nebula 1 1 1 1 L P I L I P L P I L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state replay rebalance on final nodes translate nodes of nebula operations || rebalance into the new tree S S L L 1 ? T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up S S Naive translation algorithm: Create new position respecting old order observed at the nebula replica L L 1 < < L P S ~[Leția et. al, 2009] T P < < L P S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance catch-up S S L L 1 T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U < < L U S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U addAt( , P) P addAt( , P) U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up Order observed at nebula2 broken! S S S L L L < 1 U P 1 T P U < P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Towards correct translate: requirements Order-preserving For every , the order is preserved between epochs: < => < Deterministic For every , nebulai, nebulaj, is translated identically: @nebulai = @nebulaj Non-disruptive For every created by addAt and created by translate: != Solution: designate all cases in advance using final state only! X Y X Y X Y X X X X X Y X Y Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state Sentinel position : Designated position for every potential translate < < … < < Materialized on translate S Xi S L L S1 X X1 X2 Xim Y T L1 L2 L3 L4 Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Right child of final node S S L L S1 T L1 L1 L2 L3 L4 1 U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Child of empty f. node Etc. S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 1 P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate algorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 1 1 1 1 U P U P U P U P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

F-Translate: sentinel implementation How to implement sentinelAfter? TODO: REPLCACE w/figure!!! show if time allows & audience is not lost Empty nodes are necessary! Is more empty nodes discarded than introduced? Yes, but… How to minimize empty nodes in sentinelAfter Using balanced binary tree! Define addAt such that it avoids empty nodes, long paths Two-round empty node discard X O(1) number of empty nodes / path + O(log imax ) / path Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Summary Problem faced: Solution for asynchronous tree rebalance Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm: Identify and utilize final state prior to rebalance Use sentinel positions Prototype catch-up implementation Future work? Evaluation of sentinelAfter implementation Formal order-preservation proof Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

Appendix: the unbalance problem Use sparse tree and heuristic to assign PosID [Weiss et. al, ‘09] Tested on Wikipedia traces; works at the cost of possible anomaly Use list instead of a tree [Roh et. al, ‘10] Different costs and convergence characteristics? Rebalance the tree [Shapiro, Preguiça et. al, ‘09] System-wide consensus; inherent limitations The core-nebula idea [Leția et. al ‘09]; incorrect translation This work brings: More formalization of the core-nebula for asynchronous systems Flaws revealed in naive algorithms Translation requirements statement Novel F-translate algorithm First prototype implementation in Java (subject to further studies) Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree