Presentation is loading. Please wait.

Presentation is loading. Please wait.

Asynchronous rebalancing of a replicated tree

Similar presentations


Presentation on theme: "Asynchronous rebalancing of a replicated tree"— Presentation transcript:

1 Asynchronous rebalancing of a replicated tree
Marek Zawirski INRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Preguiça UNL, Portugal CFSE, May 2011, Saint-Malo

2 Summary Overview of Treedoc: Problem faced:
Abstractly, always-responsive replicated sequence Built as a replicated ordering tree Problem faced: Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

3 Treedoc – a replicated sequence
1 Replicated representation: Grow-only binary tree Stable, unique position ids 1 L P L P Total order ”<”: infix traversal = 1 P Sequence of atoms: Ops: read, addAt, removeAt L I P replica3 I 1 L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

4 Treedoc – a replicated sequence
1 1 L P L P L I P replica3 I S 1 addAt( , S) S < < I S P L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

5 Treedoc – a replicated sequence
1 1 L P L P S L I S P replica3 I 1 addAt( , S) S < < I S P = 10 S L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

6 Treedoc – a replicated sequence
1 1 L P L P S X L I S P replica3 I 1 addAt( , S) S L P removeAt( ) P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

7 Treedoc – a replicated sequence
1 1 L P L P S L I S replica3 I 1 S addAt( , S) L P P removeAt( ) [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

8 Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) addAt( , S) S addAt( , S) S S L P P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

9 Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) S L P P removeAt( ) P removeAt( ) P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

10 Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E S A S A addAt( , A) A addAt( , A) E L I S replica3 addAt( , E) E I 1 L P ? S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

11 Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E A S A S A addAt( , A) A addAt( , A) E A L I S replica3 addAt( , E) E addAt( , E) E addAt( , E) E I 1 A addAt( , A) L P Predefined order: red < green < blue… S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

12 Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay Concurrent commute Eventually consistent I I 1 1 L P L P E A S E A S E A L I S replica3 I 1 L P Predefined order: red < green < blue… E A S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

13 The tree rebalance problem
With time tree gets worse and worse Unbalanced, empty nodes, lot of colors… Various negative impacts I 1 L P E A S Tree rebalance: Create minimal tree from nonempty nodes Keep order “<“ Use single color (white) New ids (rectangles), incompatible with old rebalance L 1 Challenge: How to avoid system-wide consensus? E S 1 A I Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

14 The core-nebula architecture
Idea: limit consensus to a smaller number of replicas Divide replicas into two disjoint sets: [Leția et. al, 2009] CORE a stable group execute tree operations & agree on rebalance easier agreement NEBULA sites join & leave, dynamic generate tree operations learns about rebalance perform catch-up protocol to integrate conc. changes never blocked Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

15 Rebalance in core, catch-up from nebula
1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S S addAt( , S) S addAt( , S) S addAt( , S) S addAt( , S) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

16 Rebalance in core, catch-up from nebula
1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S P removeAt( ) P removeAt( ) P removeAt( ) P removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

17 Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L I P 1 1 1 1 S S S S I removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

18 Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) U addAt( , U) rebalance Any pair of replicas can exchange operations in the same epoch initiates new epoch and inherently concurrent to S L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

19 Rebalance in core, catch-up from nebula
1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) rebalance X Pairwise catch-up moves nebula replica to the next epoch S ? L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

20 Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state S L T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

21 Rebalance in core, catch-up from nebula
1 1 1 1 L P I L I P L P I L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state replay rebalance on final nodes translate nodes of nebula operations || rebalance into the new tree S S L L 1 ? T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

22 Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up S S Naive translation algorithm: Create new position respecting old order observed at the nebula replica L L 1 < < L P S ~[Leția et. al, 2009] T P < < L P S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

23 Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance catch-up S S L L 1 T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

24 Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U < < L U S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

25 Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U addAt( , P) P addAt( , P) U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

26 Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up Order observed at nebula2 broken! S S S L L L < 1 U P 1 T P U < P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

27 Towards correct translate: requirements
Order-preserving For every , the order is preserved between epochs: < => < Deterministic For every , nebulai, nebulaj, is translated identically: @nebulai = @nebulaj Non-disruptive For every created by addAt and created by translate: != Solution: designate all cases in advance using final state only! X Y X Y X Y X X X X X Y X Y Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

28 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state Sentinel position : Designated position for every potential translate < < … < < Materialized on translate S Xi S L L S1 X X1 X2 Xim Y T L1 L2 L3 L4 Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

29 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Right child of final node S S L L S1 T L1 L1 L2 L3 L4 1 U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

30 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Child of empty f. node Etc. S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

31 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

32 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 1 P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

33 F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 1 1 1 1 U P U P U P U P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

34 F-Translate: sentinel implementation
How to implement sentinelAfter? TODO: REPLCACE w/figure!!! show if time allows & audience is not lost Empty nodes are necessary! Is more empty nodes discarded than introduced? Yes, but… How to minimize empty nodes in sentinelAfter Using balanced binary tree! Define addAt such that it avoids empty nodes, long paths Two-round empty node discard X O(1) number of empty nodes / path + O(log imax ) / path Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

35 Summary Problem faced: Solution for asynchronous tree rebalance
Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm: Identify and utilize final state prior to rebalance Use sentinel positions Prototype catch-up implementation Future work? Evaluation of sentinelAfter implementation Formal order-preservation proof Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree

36 Appendix: the unbalance problem
Use sparse tree and heuristic to assign PosID [Weiss et. al, ‘09] Tested on Wikipedia traces; works at the cost of possible anomaly Use list instead of a tree [Roh et. al, ‘10] Different costs and convergence characteristics? Rebalance the tree [Shapiro, Preguiça et. al, ‘09] System-wide consensus; inherent limitations The core-nebula idea [Leția et. al ‘09]; incorrect translation This work brings: More formalization of the core-nebula for asynchronous systems Flaws revealed in naive algorithms Translation requirements statement Novel F-translate algorithm First prototype implementation in Java (subject to further studies) Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree


Download ppt "Asynchronous rebalancing of a replicated tree"

Similar presentations


Ads by Google