Download presentation
Presentation is loading. Please wait.
Published byEleanore Stanley Modified over 6 years ago
1
Asynchronous rebalancing of a replicated tree
Marek Zawirski INRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Preguiça UNL, Portugal CFSE, May 2011, Saint-Malo
2
Summary Overview of Treedoc: Problem faced:
Abstractly, always-responsive replicated sequence Built as a replicated ordering tree Problem faced: Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
3
Treedoc – a replicated sequence
1 Replicated representation: Grow-only binary tree Stable, unique position ids 1 L P L P Total order ”<”: infix traversal = 1 P Sequence of atoms: Ops: read, addAt, removeAt L I P replica3 I 1 L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
4
Treedoc – a replicated sequence
1 1 L P L P L I P replica3 I S 1 addAt( , S) S < < I S P L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
5
Treedoc – a replicated sequence
1 1 L P L P S L I S P replica3 I 1 addAt( , S) S < < I S P = 10 S L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
6
Treedoc – a replicated sequence
1 1 L P L P S X L I S P replica3 I 1 addAt( , S) S L P removeAt( ) P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
7
Treedoc – a replicated sequence
1 1 L P L P S L I S replica3 I 1 S addAt( , S) L P P removeAt( ) [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
8
Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) addAt( , S) S addAt( , S) S S L P P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
9
Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P S S L I S replica3 I 1 addAt( , S) S L P P removeAt( ) P removeAt( ) P removeAt( ) S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
10
Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E S A S A addAt( , A) A addAt( , A) E L I S replica3 addAt( , E) E I 1 L P ? S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
11
Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay I I 1 1 L P L P E A S A S A addAt( , A) A addAt( , A) E A L I S replica3 addAt( , E) E addAt( , E) E addAt( , E) E I 1 A addAt( , A) L P Predefined order: red < green < blue… S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
12
Treedoc – a replicated sequence
Operation-based replication: Immediate local execution Propagate (cbcast) & replay Concurrent commute Eventually consistent I I 1 1 L P L P E A S E A S E A L I S replica3 I 1 L P Predefined order: red < green < blue… E A S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
13
The tree rebalance problem
With time tree gets worse and worse Unbalanced, empty nodes, lot of colors… Various negative impacts I 1 L P E A S Tree rebalance: Create minimal tree from nonempty nodes Keep order “<“ Use single color (white) New ids (rectangles), incompatible with old rebalance L 1 Challenge: How to avoid system-wide consensus? E S 1 A I Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
14
The core-nebula architecture
Idea: limit consensus to a smaller number of replicas Divide replicas into two disjoint sets: [Leția et. al, 2009] CORE a stable group execute tree operations & agree on rebalance easier agreement NEBULA sites join & leave, dynamic generate tree operations learns about rebalance perform catch-up protocol to integrate conc. changes never blocked Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
15
Rebalance in core, catch-up from nebula
1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S S addAt( , S) S addAt( , S) S addAt( , S) S addAt( , S) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
16
Rebalance in core, catch-up from nebula
1 1 1 1 L P P L P P L P P L P P 1 1 1 1 S S S S P removeAt( ) P removeAt( ) P removeAt( ) P removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
17
Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L I P 1 1 1 1 S S S S I removeAt( ) Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
18
Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) U addAt( , U) rebalance Any pair of replicas can exchange operations in the same epoch initiates new epoch and inherently concurrent to S L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
19
Rebalance in core, catch-up from nebula
1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state P addAt( , P) P addAt( , P) U addAt( , U) rebalance X Pairwise catch-up moves nebula replica to the next epoch S ? L T addAt( , T) T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
20
Rebalance in core, catch-up from nebula
1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state S L T Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
21
Rebalance in core, catch-up from nebula
1 1 1 1 L P I L I P L P I L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state replay rebalance on final nodes translate nodes of nebula operations || rebalance into the new tree S S L L 1 ? T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
22
Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up S S Naive translation algorithm: Create new position respecting old order observed at the nebula replica L L 1 < < L P S ~[Leția et. al, 2009] T P < < L P S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
23
Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) rebalance catch-up S S L L 1 T P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
24
Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U < < L U S Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
25
Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up S S S L L L 1 1 T P U addAt( , P) P addAt( , P) U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
26
Naive translation algorithm(s)
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state rebalance catch-up catch-up Order observed at nebula2 broken! S S S L L L < 1 U P 1 T P U < P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
27
Towards correct translate: requirements
Order-preserving For every , the order is preserved between epochs: < => < Deterministic For every , nebulai, nebulaj, is translated identically: @nebulai = @nebulaj Non-disruptive For every created by addAt and created by translate: != Solution: designate all cases in advance using final state only! X Y X Y X Y X X X X X Y X Y Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
28
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L P I L P I 1 1 1 1 1 1 S P S U P S U S final state Sentinel position : Designated position for every potential translate < < … < < Materialized on translate S Xi S L L S1 X X1 X2 Xim Y T L1 L2 L3 L4 Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
29
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Right child of final node S S L L S1 T L1 L1 L2 L3 L4 1 U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
30
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state F-Translate: Child of empty f. node Etc. S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
31
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L I P L I P L I P 1 1 1 1 1 1 S P S U P S U S final state I removeAt( ) S S S L L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
32
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L P I L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 1 1 P U P U Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
33
F-Translate algorithm & sentinels
core1 nebula1 nebula2 nebula3 I I I I 1 1 1 1 L I P L P I L I P L P I 1 1 1 1 1 1 S P S U P S U S final state S S S S L L S1 L S1 L S1 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 1 1 1 1 U P U P U P U P Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
34
F-Translate: sentinel implementation
How to implement sentinelAfter? TODO: REPLCACE w/figure!!! show if time allows & audience is not lost Empty nodes are necessary! Is more empty nodes discarded than introduced? Yes, but… How to minimize empty nodes in sentinelAfter Using balanced binary tree! Define addAt such that it avoids empty nodes, long paths Two-round empty node discard X O(1) number of empty nodes / path + O(log imax ) / path Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
35
Summary Problem faced: Solution for asynchronous tree rebalance
Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm: Identify and utilize final state prior to rebalance Use sentinel positions Prototype catch-up implementation Future work? Evaluation of sentinelAfter implementation Formal order-preservation proof Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
36
Appendix: the unbalance problem
Use sparse tree and heuristic to assign PosID [Weiss et. al, ‘09] Tested on Wikipedia traces; works at the cost of possible anomaly Use list instead of a tree [Roh et. al, ‘10] Different costs and convergence characteristics? Rebalance the tree [Shapiro, Preguiça et. al, ‘09] System-wide consensus; inherent limitations The core-nebula idea [Leția et. al ‘09]; incorrect translation This work brings: More formalization of the core-nebula for asynchronous systems Flaws revealed in naive algorithms Translation requirements statement Novel F-translate algorithm First prototype implementation in Java (subject to further studies) Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.