Fault Tolerant Storage And Quorum Systems in Dynamic Environments Uri Nadav, Master thesis Advisor: Moni Naor The Weizmann Institute of Science.


1 Fault Tolerant Storage And Quorum Systems in Dynamic Environments Uri Nadav, Master thesis Advisor: Moni Naor The Weizmann Institute of Science

2 Slide - 2 Agenda Fault-Tolerant Storage System  Fighting Censors Quorum System for Dynamic Networks

3 Slide - 3 Goal Distributed file storage system  Peer-to-peer environment  Processors join and leave the system Partial Solutions  Distributed File sharing applications [Gnutella, Kazaa]  Distributed Hash Tables [DH, Chord, Viceroy] Store (key, value) pairs and perform lookup on key

4 Slide - 4 Fault-Tolerant Storage System Censor  Aims to eliminate access to some files Design Goal:  A reader should be able to reconstruct each file with high probability after faults have been caused Probability taken over coins of the writer and reader

5 Slide - 5 Adversarial Model Adversary chooses the set of processors to crash fail-stop failures  We do not consider Byzantine failures Different degrees of adaptiveness  Non adaptive adversary Choice of faulty processors is not based on their content  Adversary with a limited number of queries May query some processors

6 Slide - 6 Other Fault Models Random faults model: examples are Distance Halving DHT and Chord; the standard technique is replication to $\log n$ processors, which assures survival with high probability. Adversarial faults [Fiat, Saia]: a large fraction of the data remains accessible after the adversary crashes a linear fraction of the processors. Still, a censor can target a specific file.

7 Slide - 7 Measures of Quality Read/Write complexity:  Average number of processors accessed during a read/write operation Number of rounds:  Number of rounds required from an adaptive reader Blowup Ratio:  Ratio between the total number of bits used for the storage of a file and its size

8 Slide - 8 Quorum Systems A quorum system is an intersecting family of sets over some universe. Formal definition: $U$ is the universe; a family $\mathcal{F} \subseteq 2^U$ with $\forall A, B \in \mathcal{F}: A \cap B \neq \emptyset$ is called a quorum system, and $A, B$ are called quorum sets. Probabilistic $\varepsilon$-intersecting quorum system [Malkhi et al]: a strategy (distribution) $w$ over $2^U$ such that two sets $A, B$ drawn from the strategy $w$ intersect with probability at least $1 - \varepsilon$. The set of processors from which a file is read must intersect the set of processors to which a file was written.

9 Slide - 9 Storage system example The $\varepsilon$-intersecting quorum system [Malkhi et al]: the quorum system is made of all sets of size -----; pick one quorum uniformly at random; intersection follows from the birthday paradox. Storage System: Storage: a file is replicated to all members of a quorum set. Retrieval: choose a quorum set and probe its members.
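The replicate-and-probe scheme on this slide can be simulated in a few lines. The sketch below is only an illustration: the quorum size is elided in the transcript, so the code assumes quorums of size ceil(ell * sqrt(n)) as in the probabilistic quorum construction of Malkhi et al., and the names `random_quorum`, `estimate_intersection`, and `ell` are hypothetical.

```python
import math
import random

def random_quorum(n, ell, rng):
    """Draw one quorum: a uniformly random subset of the n processors.

    Assumption (not in the transcript): the quorum size is ceil(ell * sqrt(n)).
    """
    size = min(n, math.ceil(ell * math.sqrt(n)))
    return set(rng.sample(range(n), size))

def estimate_intersection(n, ell, trials, seed=0):
    """Fraction of trials in which a write quorum and a read quorum intersect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        write_q = random_quorum(n, ell, rng)  # processors holding a replica
        read_q = random_quorum(n, ell, rng)   # processors probed by the reader
        if write_q & read_q:                  # the reader found at least one replica
            hits += 1
    return hits / trials
```

Calling `estimate_intersection(10_000, 2, 500)` gives an empirical intersection rate; by the birthday-paradox argument the miss probability shrinks quickly as `ell` grows.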

10 Slide - 10 Properties of the Probabilistic Storage System Pros :  Simplicity  Resilient against linear number of faults Even if the processors are chosen by the adversary adaptively  Adapted to a dynamic environment [Abraham, Malkhi] Target: Come up with a storage system with better parameters Cons :  High read/write complexity (  )  High blowup-ratio (  )

11 Slide - 11 The Model Non-adaptive adversary: chooses a set of processors (of linear size) without accessing any processor first. Non-adaptive reader: processors are chosen without accessing any processor. Theorem: A fault-tolerant storage system in the non-adaptive reader model, resilient against $\Theta(n)$ faults, cannot do better than the $\varepsilon$-intersecting storage system example.

12 Slide - 12 Lower Bounds for the Non-Adaptive Reader Model Lower bound on the blowup-ratio, Theorem: A system which tolerates $\Theta(n)$ faults, with ----- read complexity, has Blowup Ratio = -----. Lower bound on the read/write complexity, Theorem: A system which tolerates $\Theta(n)$ faults has Read Complexity $\cdot$ Write Complexity = ----. Formal definitions of the non-adaptive storage system.

13 Slide - 13 Slightly Adaptive Adversary Model Reader can have adaptive queries  Wish to have a small number of rounds to shorten time complexity of read operation Slightly adaptive adversary:  Adversary is less adaptive than the reader  queries Fail-stop and not Byzantine faults

14 Slide - 14 Generic Storage Scheme Storing a file:  Encode a file using a coding scheme with a constant blowup ratio (Reed Solomon, IDA [Rabin] )  Distribute to a set chosen by a write strategy with load ------ (optimal) Retrieval of a file, after faults:  Find enough processors from the ‘write set’  Decode the file using the coding scheme Fault-tolerance:  With high probability: The adversary doesn’t find any element during the adaptive queries phase. At least half of the processors in a chosen write set survive  Any half of the processors in the write set can reconstruct the file To instantiate a storage system – plug-in a write strategy and a read algorithm Load: Maximal probability of a processor to be chosen
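The property "any half of the processors in the write set can reconstruct the file" can be illustrated with a Reed-Solomon-style erasure code in evaluation form. This is a minimal sketch, not the scheme used in the thesis: the prime field of size 257, the byte-per-share layout, and the names `encode`, `decode`, and `_lagrange_eval` are assumptions made for the example.

```python
P = 257  # assumed prime field size; large enough to hold any byte value

def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Spread k data bytes over n shares; share x is the pair (x, p(x)),
    where p(0), ..., p(k-1) are the data bytes themselves."""
    k = len(data)
    if not k <= n <= P:
        raise ValueError("need k <= n <= P")
    base = list(enumerate(data))
    return [(x, data[x] if x < k else _lagrange_eval(base, x)) for x in range(n)]

def decode(shares, k):
    """Rebuild the k data bytes from any k surviving shares."""
    if len(shares) < k:
        raise ValueError("not enough surviving shares")
    pts = shares[:k]
    return bytes(_lagrange_eval(pts, x) for x in range(k))
```

With n = 2k the blowup ratio is the constant 2, and a block survives as long as at least half of its shares do, for example `decode(encode(b"quorum!!", 16)[8:], 8)` returns the original block.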

15 Slide - 15 Choosing a write strategy What about the random strategy of the example? Not a good candidate: a read algorithm that finds a constant fraction of the surviving processors requires access to $\Omega(n)$ processors. We will present a strategy with ---------- read complexity and a logarithmic number of rounds, using the And-Or tree.

16 Slide - 16 The And-Or Tree Structure Complete binary tree: leaves represent processors; inner nodes are AND/OR gates in alternating layers. [Figure: complete binary tree with alternating AND and OR gate layers over leaf processors 1-16.]

17 Slide - 17 The And-Or Tree Structure Recursive definition of the ANDset and ORset collections: a recursive procedure for selecting a set. Write strategy: pick a set from the ANDset collection uniformly at random. Intersection Property: a set from the ANDset collection and a set from the ORset collection intersect. [Figure: the And-Or tree over processors 1-16 with its AND/OR gates marked.]
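The recursive selection procedure can be sketched as follows. The transcript does not spell out the recursion, so this code assumes the usual reading: an ANDset takes a set from both children at an AND gate and from one child at an OR gate, and an ORset does the opposite. The function name `pick_set`, the leaf numbering from 0, and the fair coin at each one-child gate are illustrative assumptions.

```python
import random

def pick_set(lo, hi, gate, kind, rng):
    """Pick a set of leaves from the `kind` collection ("AND" or "OR") of the
    subtree that covers leaves lo..hi-1 and whose root gate is `gate`.
    The number of leaves (hi - lo) is assumed to be a power of two."""
    if hi - lo == 1:
        return {lo}                       # a leaf is a single processor
    mid = (lo + hi) // 2
    child_gate = "OR" if gate == "AND" else "AND"   # gate layers alternate
    if gate == kind:
        # Matching gate: the set must include a set from both children.
        return (pick_set(lo, mid, child_gate, kind, rng)
                | pick_set(mid, hi, child_gate, kind, rng))
    # Opposite gate: a set from one child, chosen by a fair coin, is enough.
    if rng.random() < 0.5:
        return pick_set(lo, mid, child_gate, kind, rng)
    return pick_set(mid, hi, child_gate, kind, rng)
```

For a 16-leaf tree with an AND gate at the root, `pick_set(0, 16, "AND", "AND", rng)` plays the role of a write set and `pick_set(0, 16, "AND", "OR", rng)` the role of a probe set. Under these definitions the two always share a leaf, because at every gate one of them descends into both children.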

18 Slide - 18 Adaptive Read Algorithm Pick a set from the ORset collection to find an element from the write set. [Figure: And-Or tree over processors 1-16; write set: 1, 2, 5, 6.]

19 Slide - 19 Read Algorithm - Pruning the Tree… To find the remaining items, the algorithm is recursively applied to the remaining subtrees. Total of ----- processor-accesses during the read algorithm. [Figure: pruned And-Or tree over processors 1-16 with the write set marked.]

20 Slide - 20 Properties of the And-Or Storage System Constant blowup ratio, write complexity and ---------- read complexity; logarithmic number of rounds; resilient against $\Theta(n)$ faults of a slightly adaptive adversary. Cannot expect anything much better in terms of read/write tradeoff!

21 Slide - 21 Early Stopping When Less Faults Occur Drawback: The read complexity is high even when no faults occur Dynamic read-complexity:  When up to t faults occur the read complexity is  Pay in logarithmic instead of constant blowup ratio

22 Slide - 22 Dynamically Adjusting to the Number of Faults Each node represents a processor, not only the leaves. Redefine the ANDset collection so that each set includes all the visited nodes. The size of a set in the collection remains ------. [Figure: And-Or tree with alternating AND and OR layers.]

23 Slide - 23 Where do we stand? And-Or for static network Ignored routing scheme Adaptation of the And-Or storage system Storage coupled with the routing Use the distance-halving network [Naor-Wieder] Next: Dynamic Environment

24 Slide - 24 Dynamic Hash Tables The continuous space is partitioned locally (on the fly) into cells corresponding to processors: each point in $[0,1)$ is covered by exactly one processor.

25 Slide - 25 The Distance Halving Network [Naor, Wieder] Continuous graph: the nodes are the points of the interval $[0,1)$; the edges are the left and right outgoing edges of each point; each point is the root of a binary tree subgraph.
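The binary tree rooted at a point of the continuous graph can be made concrete with a short sketch. The slide does not give the edge maps, so the code assumes the standard distance-halving definition, left(y) = y/2 and right(y) = y/2 + 1/2. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def left(y):
    """Left outgoing edge of point y (assumed map: y -> y/2)."""
    return y / 2

def right(y):
    """Right outgoing edge of point y (assumed map: y -> y/2 + 1/2)."""
    return y / 2 + Fraction(1, 2)

def tree_level(x, depth):
    """Points reached from root x after `depth` steps along left/right edges."""
    level = [Fraction(x)]
    for _ in range(depth):
        level = [child for y in level for child in (left(y), right(y))]
    return level
```

For any root x in [0,1), the points at depth d are (x + j) / 2^d for j = 0, ..., 2^d - 1, so they are spaced exactly 1/2^d apart. For example, `sorted(tree_level(Fraction(1, 3), 3))` yields eight points with gaps of 1/8.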

26 Slide - 26 The Distance Halving Network [Naor, Wieder] Connect two processors if their respective cells are connected in the continuous graph

27 Slide - 27 Embedding the Storage System The binary tree: a subgraph of the continuous graph, well defined for each point, of depth $\log n$. Edges are covered by network connections. Gossip protocols. Each file has a different tree.

28 Slide - 28 Storage Through Gossip Data percolate using DH edges for $\log n$ steps. After a single write operation the writer is done, and the file can already be retrieved. Fault-tolerance is built during gossip: when messages reach the nodes in the $i$-th level, the file is $\Theta(2^i)$ fault-tolerant.

29 Slide - 29 Retrieval Uses the routing protocol of the DH-network: routing dilation is $O(\log n)$. Total time for retrieval is $O(\log^2 n)$. Read complexity can be dynamically adjusted to the number of faults: store in every processor visited.

30 Slide - 30 Fault-Tolerance Balanced network: a processor covers a segment of size $O(1/n)$; various balancing techniques exist (Manku, Karger and Ruhl, Naor and Wieder, Abraham et al). Theorem: When the network is balanced, the system is $\Theta(n)$ fault-tolerant.

31 Slide - 31 Open Questions Do the lower bounds shown when both the reader and the adversary are non-adaptive hold when both are adaptive? Is there a fault-tolerant storage system in the adaptive reader model with $o(\log n)$ rounds?

32 Slide - 32 Summary The probabilistic solution is optimal in the non-adaptive reader model. The And-Or storage system: constant blowup-ratio; almost optimal read/write complexity. Adaptation of the storage system in a dynamic environment: storage uses the network topology; when the system is balanced it maintains fault-tolerance.

33 Slide - 33 Agenda Fault-Tolerant Storage System  Fighting Censors The And-Or Quorum System  Static case  Dynamic Networks Quorum systems are important beyond their application in storage (mutual exclusion, load balancing, access control…)

34 Slide - 34 Measures of Quality Load:  Load of strategy: maximal probability of a processor to be chosen  Minimum over all strategies Availability:  Probability all quorums are hit under random faults Probe Complexity:  Number of probes required to obtain a live quorum w.h.p
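The load measure can be worked out exactly for one simple case. The sketch below rests on assumptions that are not on the slide: a complete binary And-Or tree with an AND gate at the root, strictly alternating layers, and a strategy that picks one child with probability 1/2 at every OR gate. Under those assumptions every leaf is chosen with the same probability, so that probability is the load of the strategy. The function name `and_or_load` is hypothetical.

```python
from fractions import Fraction

def and_or_load(depth, root_gate="AND"):
    """Probability that a fixed leaf belongs to a randomly picked ANDset in a
    complete And-Or tree of the given depth (2**depth leaves)."""
    p = Fraction(1)
    gate = root_gate
    for _ in range(depth):
        if gate == "OR":
            p /= 2            # an OR gate keeps only one of its two children
        gate = "OR" if gate == "AND" else "AND"
    return p
```

With depth 4 (16 leaves) the result is 1/4, and with depth 6 (64 leaves) it is 1/8, which is 1/sqrt(n) in both cases.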

35 Slide - 35 The And-Or Quorum System Known Properties [Naor, Wool] :  Optimal Load, High Availability Our contribution: Static network:  Optimal non-adaptive algorithm  Optimal adaptive algorithm Construction in a dynamic network

36 Slide - 36 Non Adaptive Algorithm Probes Matches a lower bound [Naor, Wieder] 2loglog n

37 Slide - 37 Adaptive Algorithm Probe complexity Run in parallel  2loglogn rounds Local Adjustments

38 Slide - 38 Dynamic Quorum System The universe constantly changes Two challenges:  Integrity: Intersection property Combinatorial structure and properties  Locality: Local way to access a quorum

39 Slide - 39 Dynamic And-Or Embedding of a binary tree in the DH-graph: left and right children; define a tree on each point; the leaves equally divide $[0,1)$. A quorum of processors is the set that covers the points in a quorum.

40 Slide - 40 Dynamic And-Or Locality  Natural gossip protocol Integrity  When network grows/shrinks members of quorums gossip themselves to children/parent in the continuous graph  Network connections cover edges in the continuous graph

41 Slide - 41 Load A processor is chosen when leaves it covers are chosen. Optimal load on the leaves and a balanced network induce optimal load on the processors.

42 Slide - 42 Availability of the Dynamic Quorum Static case:  Global Failure probability exponentially decays  Processor fails with probability < 0.25 Dynamic case:  Problem in analysis: Faults are not independent When the network is balanced… Two leaves are dependent, only if covered by same processor Constant number of dependent faults Domination by a product measure

43 Slide - 43 Domination by a Product Measure Finite set $S$. Space of configurations: $\Omega = \{0,1\}^S$. Partial order: for $\omega_1, \omega_2 \in \Omega$, $\omega_1 \geq \omega_2$ if $\forall s \in S,\ \omega_1(s) \geq \omega_2(s)$. A function $f$ is increasing if $\omega_1 \geq \omega_2 \Rightarrow f(\omega_1) \geq f(\omega_2)$. Product measure ($\pi_p$): $\forall s \in S,\ \Pr[\omega(s)=0] = p$, and $\omega(s)$ is independent of all others.

44 Slide - 44 Domination by a Product Measure $\mu, \nu$: probability measures on $\Omega$. $\mu$ dominates $\nu$ ($\nu \preceq \mu$): for every increasing $f$, $E_\nu(f) \leq E_\mu(f)$. [Liggett et al]: If $\forall s \in S,\ \Pr[\omega(s)=0] < p$ and this event is dependent on at most $k$ other such events (where $k$ is a constant), then $\exists p' < p$ s.t. $\pi_{p'} \preceq \mu$. By decreasing $p$, $p'$ can be made arbitrarily close to 0.

45 Slide - 45 Availability of the Dynamic And-Or $S$ is the set of leaves, $\Omega$ the configurations, and $\mu$ the probability measure induced by random faults on processors. Balanced network: limited independence, so $\mu$ dominates a product measure $\pi_{p'}$. When $p' < 0.25$, $F_{p'} \leq O(\exp(-n^{0.5}))$.

46 Slide - 46 Probe Complexity of the Dynamic And-Or Nonadaptive  Subtrees are not independent  Positively correlated Adaptive  Expected constant height for local subtrees  Expected number of probes  Markov: Optimal probe complexity with probability 1-o(1)

47 Slide - 47 Other Dynamic Quorum Systems Dynamic Probabilistic QS [Abraham, Malkhi]  Random walk  Very high availability For arbitrary failure probability  Higher load Dynamic Paths [Naor, Wieder]  Emulate Paths quorum system Voronoi diagram  High availability Failure probability < 0.5  Slower Adaptive algorithm

48 Slide - 48 Summary Non-adaptive, Adaptive Algorithms to And-Or  Optimal  Adaptive case: Excellent time complexity Adaptation over dynamic overlay network Optimal Load, probe complexity and high availability  Domination by product measure

49 Slide - 49 Open Questions (on Quorum Systems) Lower bound to the adaptive algorithmic probe complexity Better analysis of the adaptive algorithm for dynamic network

50 Slide - 50 Elementary Storage System Write strategy $\mu_w$: a distribution on $\mathbb{N}^n$. Encoder: $E(f, q_w) \to (x_1, \ldots, x_n)$, with $q_w$ chosen by $\mu_w$. Read strategy $\mu_r$: a distribution on $\{0,1\}^n$. Decoder: $D(x_1, \ldots, x_n) \to \{0,1\}^k$.

51 Slide - 51 Reconstruction Decoding a previously encoded file: $D(\Pi(E(f, q_w), q_r)) = f$. $(\varepsilon, k)$-Storage System: $\forall f \in \{0,1\}^k,\ \Pr[D(\Pi(E(f, q_w), q_r)) = f] > 1 - \varepsilon$, with $q_w, q_r$ chosen from $\mu_w, \mu_r$. $\Pi$ is the projection that masks unread processors: $\Pi((x_1, x_2, x_3, x_4), (1,1,0,1)) = (x_1, x_2, \bot, x_4)$.

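52 Slide - 52 $(\varepsilon, k)$-Intersection Property Write strategy $\mu_w$, read strategy $\mu_r$. The pair $(\mu_w, \mu_r)$ satisfies the $(\varepsilon, k)$-intersection property if $\Pr[\langle q_w, q_r \rangle > k] > 1 - \varepsilon$, where $\langle q_w, q_r \rangle$ is the number of bits read.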
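The two ingredients of these definitions, the masking projection and the inner product that counts the bits read, can be written out directly. This sketch assumes that entry i of the write vector is the number of bits stored on processor i and that entry i of the read vector is 1 when processor i is probed. The names `project` and `bits_read` are hypothetical, and `None` stands in for the masked symbol.

```python
def project(x, q_r):
    """Mask the memories of processors that the reader does not access.

    x is the tuple of processor memories and q_r the 0/1 read vector.
    """
    return tuple(xi if bit else None for xi, bit in zip(x, q_r))

def bits_read(q_w, q_r):
    """Inner product of the write and read vectors: the number of stored
    bits that sit on processors the reader actually probes."""
    return sum(w * r for w, r in zip(q_w, q_r))
```

For instance, `project(("x1", "x2", "x3", "x4"), (1, 1, 0, 1))` hides the third memory, and `bits_read((3, 0, 2, 5), (1, 1, 0, 1))` counts the 8 bits on probed processors. A reader cannot hope to recover a k-bit file unless this count exceeds k.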

53 Slide - 53 Storage System Characterization Theorem: Let $S = (\mu_w, E, \mu_r, D)$ be an $(\varepsilon, k)$-storage system. Then $\mu_w, \mu_r$ maintain the $(2\varepsilon, k)$-intersection property.

54 Slide - 54 Error Correcting Codes View storage-system as coding scheme: Message: files concatenated Codeword: Processors’ memories concatenated Worst case Faults-Model  Adversary “knows” the content of each processor

55 Slide - 55 Locally Decodable Codes Decode a single symbol, instead of the whole message  No need to read all the codeword(?) Rates:  No linear code for constant number of queries [Katz, Trevisan]  Exponential lower bound for 2 queries [Goldreich et al],[Wolf, Ker’]  Linear rate for polynomial number of queries Multivariate code [Reed-Muller]

56 Slide - 56 Balanced And-Or Tree Family of trees  Constant difference  Optimal Load  Good Availability Different constants Useful for the dynamic case

