Fault-Tolerant Storage and Quorum Systems in Dynamic Environments
Uri Nadav, Master's thesis. Advisor: Moni Naor. The Weizmann Institute of Science
Slide - 2 Agenda: Fault-Tolerant Storage System (Fighting Censors); Quorum System for Dynamic Networks.
Slide - 3 Goal: a distributed file storage system for a peer-to-peer environment, where processors join and leave the system. Partial solutions: distributed file-sharing applications [Gnutella, Kazaa]; distributed hash tables [DH, Chord, Viceroy], which store (key, value) pairs and perform lookup on the key.
Slide - 4 Fault-Tolerant Storage System. A censor aims to eliminate access to some files. Design goal: a reader should be able to reconstruct each file with high probability after faults have been caused, where the probability is taken over the coins of the writer and the reader.
Slide - 5 Adversarial Model. The adversary chooses the set of processors to crash (fail-stop failures); we do not consider Byzantine failures. Different degrees of adaptiveness: a non-adaptive adversary, whose choice of faulty processors is not based on their content; an adversary with a limited number of queries, who may query some processors.
Slide - 6 Other Fault Models. Random faults model (examples: the Distance Halving DHT, Chord). The standard technique is replication to $\log n$ processors, which assures survival with high probability. Adversarial faults [Fiat, Saia]: a large fraction of the data remains accessible after the adversary crashes a linear fraction of the processors. Still, a censor can target a specific file.
Slide - 7 Measures of Quality. Read/write complexity: the average number of processors accessed during a read/write operation. Number of rounds: the number of rounds required by an adaptive reader. Blowup ratio: the ratio between the total number of bits used to store a file and its size.
Slide - 8 Quorum Systems. Formal definition: let $U$ be a universe. A family $\mathcal{F} \subseteq 2^U$ such that $\forall A, B \in \mathcal{F}:\ A \cap B \neq \emptyset$ is called a quorum system, and $A, B$ are called quorum sets. That is, a quorum system is an intersecting family of sets over some universe. Probabilistic $\varepsilon$-intersecting quorum system [Malkhi et al]: a strategy (distribution) $w$ over $2^U$ such that two sets $A, B$ drawn from $w$ intersect with probability at least $1 - \varepsilon$. The set of processors from which a file is read must intersect the set of processors to which the file was written.
Slide - 9 Storage System Example: the $\varepsilon$-intersecting quorum system [Malkhi et al]. The quorum sets are all sets of size $\ell\sqrt{n}$; pick one quorum uniformly at random. Intersection follows from the birthday paradox. Storage system: to store, replicate the file to all members of a quorum set; to retrieve, choose a quorum set and probe its members.
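A small simulation makes the birthday-paradox argument concrete. The sketch assumes quorums of exactly $\lceil \ell\sqrt{n} \rceil$ processors drawn uniformly at random; `random_quorum` and `miss_rate` are illustrative names, not taken from the thesis. Two independent such sets are disjoint with probability roughly $(1 - \ell/\sqrt{n})^{\ell\sqrt{n}} \approx e^{-\ell^2}$, which the empirical rate should approach.

```python
import math
import random

def random_quorum(n, ell, rng):
    """Pick a uniformly random set of ceil(ell * sqrt(n)) processors out of n."""
    size = math.ceil(ell * math.sqrt(n))
    return set(rng.sample(range(n), size))

def miss_rate(n, ell, trials, seed=0):
    """Fraction of trials in which an independent write set and read set are disjoint."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        write_set = random_quorum(n, ell, rng)  # the file is replicated here
        read_set = random_quorum(n, ell, rng)   # the reader probes these processors
        if write_set.isdisjoint(read_set):
            misses += 1
    return misses / trials

if __name__ == "__main__":
    n, ell = 10_000, 2
    print(f"empirical miss rate: {miss_rate(n, ell, 2000):.4f}")
    print(f"exp(-ell^2) estimate: {math.exp(-ell ** 2):.4f}")
```

With $\ell = 2$ the estimate is about 0.018, so a read succeeds with probability close to 0.98 while each operation touches only $2\sqrt{n}$ processors; the price is that every one of them stores the whole file.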
Slide - 10 Properties of the Probabilistic Storage System. Pros: simplicity; resilient against a linear number of faults, even if the processors are chosen by the adversary adaptively; adapted to a dynamic environment [Abraham, Malkhi]. Cons: high read/write complexity ($\Theta(\sqrt{n})$); high blowup ratio ($\Theta(\sqrt{n})$). Target: come up with a storage system with better parameters.
Slide - 11 The Model. Non-adaptive adversary: chooses a set of processors (of linear size) without accessing any processor first. Non-adaptive reader: chooses the processors to read without accessing any processor first. Theorem: in the non-adaptive reader model, a fault-tolerant storage system resilient against $\Theta(n)$ faults cannot do better than the $\varepsilon$-intersecting storage system example.
Slide - 12 Lower Bounds for the Non-Adaptive Reader Model. Lower bound on the read/write complexity. Theorem: a system that tolerates $\Theta(n)$ faults has $\text{Read Complexity} \cdot \text{Write Complexity} = \Omega(n)$. Lower bound on the blowup ratio. Theorem: a system that tolerates $\Theta(n)$ faults, with ----- read complexity, has blowup ratio -----. Formal definitions of the non-adaptive storage system.
Slide - 13 Slightly Adaptive Adversary Model. The reader can make adaptive queries; we wish to have a small number of rounds to shorten the time complexity of a read operation. Slightly adaptive adversary: the adversary has slightly fewer adaptive queries than the reader. Faults are fail-stop, not Byzantine.
Slide - 14 Generic Storage Scheme. Storing a file: encode the file using a coding scheme with a constant blowup ratio (Reed-Solomon, IDA [Rabin]), then distribute the pieces to a set chosen by a write strategy with load $O(1/\sqrt{n})$ (optimal), where the load is the maximal probability of a processor being chosen. Retrieving a file after faults: find enough processors from the write set, then decode the file using the coding scheme. Fault tolerance: with high probability, the adversary does not find any element of the write set during its adaptive queries phase; at least half of the processors in the chosen write set survive; and any half of the processors in the write set can reconstruct the file. To instantiate a storage system, plug in a write strategy and a read algorithm.
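The coding step can be sketched with a Reed-Solomon-style erasure code: treat the $k$ file symbols as polynomial coefficients over a prime field and store one evaluation per processor. This is a minimal illustration under stated assumptions (the field $GF(2^{31}-1)$, $2k$ pieces so the blowup ratio is 2, fail-stop crashes only); `encode` and `decode` are hypothetical helpers, not the construction used in the thesis. Rabin's IDA gives the same any-$k$-of-$m$ reconstruction guarantee.

```python
import random

P = 2**31 - 1  # a Mersenne prime; all arithmetic is in the field GF(P)

def encode(msg, m):
    """Use the k symbols of msg as polynomial coefficients and evaluate at x = 1..m.
    Each (x, y) pair is the piece stored at one processor of the write set."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(msg)) % P)
            for x in range(1, m + 1)]

def _times_linear(poly, a):
    """Multiply a coefficient list (lowest degree first) by (X - a) mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - a * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def decode(shares, k):
    """Recover the k coefficients from any k surviving pieces (Lagrange interpolation)."""
    pts = shares[:k]
    if len(pts) < k:
        raise ValueError("not enough surviving pieces")
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(pts):
            if i != j:
                basis = _times_linear(basis, xi)
                denom = denom * (xj - xi) % P
        scale = yj * pow(denom, P - 2, P) % P  # division via Fermat inverse
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs

if __name__ == "__main__":
    msg = list(b"censor-proof")              # k = 12 symbols, each < P
    pieces = encode(msg, 2 * len(msg))       # blowup ratio 2: 24 pieces
    rng = random.Random(7)
    survivors = rng.sample(pieces, len(msg)) # half of the write set crashed
    print(bytes(decode(survivors, len(msg))))  # b'censor-proof'
```

Because any $k$ of the $2k$ pieces determine the degree-$(k-1)$ polynomial, losing any half of the write set still leaves the file recoverable, which is the "any half of the write set" condition on the slide.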
Slide - 15 Choosing a Write Strategy. What about the random strategy of the example? It is not a good candidate: a read algorithm that finds a constant fraction of the surviving processors requires access to $\Omega(n)$ processors. We will present a strategy with ---------- read complexity and a logarithmic number of rounds, using the And-Or tree.
Slide - 16 The And-Or Tree Structure. A complete binary tree whose leaves represent processors and whose inner nodes are AND/OR gates, arranged in alternating layers. [Figure: an And-Or tree with 16 leaves, numbered 1 to 16.]
Slide - 17 The And-Or Tree Structure. Recursive definition of the ANDset and ORset collections, with a recursive procedure for selecting a set. Write strategy: pick a set from the ANDset collection uniformly at random. Intersection property: a set from the ANDset collection and a set from the ORset collection intersect. [Figure: the 16-leaf And-Or tree with selected sets highlighted.]
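The recursive selection can be sketched as follows, assuming $n = 4^h$ leaves numbered $0, \ldots, n-1$ and an AND gate at the root (the figure's exact layering is not recoverable from the text). A gate of the same type as the requested collection contributes a set from each child; a gate of the other type contributes a set from one child chosen uniformly. Since sibling subtrees have the same shape, this yields a uniformly random member of the collection. The helper `pick` is hypothetical.

```python
import random

def pick(lo, hi, gate, want, rng):
    """Select a set from the `want` collection ('AND' or 'OR') of the subtree
    whose leaves are processors lo..hi-1 and whose root gate is `gate`.
    Gates alternate between AND and OR from one layer to the next."""
    if hi - lo == 1:
        return {lo}                       # a leaf is a single processor
    mid = (lo + hi) // 2
    child = "OR" if gate == "AND" else "AND"
    if gate == want:
        # e.g. an ANDset at an AND gate: take a set from each child
        return pick(lo, mid, child, want, rng) | pick(mid, hi, child, want, rng)
    # otherwise take a set from one child, chosen uniformly at random
    if rng.random() < 0.5:
        return pick(lo, mid, child, want, rng)
    return pick(mid, hi, child, want, rng)

if __name__ == "__main__":
    rng = random.Random(1)
    n = 256                                    # 4**4 leaves, 8 alternating layers
    write_set = pick(0, n, "AND", "AND", rng)  # write strategy: a random ANDset
    read_set = pick(0, n, "AND", "OR", rng)    # one ORset
    print(len(write_set), len(read_set), bool(write_set & read_set))  # 16 16 True
```

Both sets have $\sqrt{n} = 16$ elements here. By induction on the tree, an ANDset and an ORset always share a leaf: at every gate where one of them descends into a single child, the other contains a set of its own collection inside that same child.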
Slide - 18 Adaptive Read Algorithm. Pick a set from the ORset collection to find an element of the write set. [Figure: the 16-leaf And-Or tree with the write set {1, 2, 5, 6} highlighted.]
Slide - 19 Read Algorithm: Pruning the Tree. To find the remaining items, the algorithm is recursively applied to the remaining subtrees. Total number of processor accesses during the read algorithm: -----. [Figure: the pruned 16-leaf And-Or tree with the write set highlighted.]
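The recursion can be illustrated in the simplest setting, assuming no processor has crashed. At an AND gate the write set (an ANDset) continues into both children; at an OR gate it continues into exactly one, which the reader identifies by probing one ORset of the left child, since that ORset meets the write set exactly when the write set lies there. `find_write_set` and `probe` are hypothetical names, and the algorithm on the slides must also tolerate crashed processors, which this sketch does not attempt.

```python
import random

def pick(lo, hi, gate, want, rng):
    """Select a set from the `want` collection ('AND' or 'OR') of the subtree over leaves lo..hi-1."""
    if hi - lo == 1:
        return {lo}
    mid = (lo + hi) // 2
    child = "OR" if gate == "AND" else "AND"
    if gate == want:
        return pick(lo, mid, child, want, rng) | pick(mid, hi, child, want, rng)
    if rng.random() < 0.5:
        return pick(lo, mid, child, want, rng)
    return pick(mid, hi, child, want, rng)

def find_write_set(lo, hi, gate, probe, rng):
    """Fault-free sketch: return (write-set leaves in lo..hi-1, probes used),
    assuming the write set restricted to this subtree is one of its ANDsets."""
    if hi - lo == 1:
        return ({lo} if probe(lo) else set()), 1
    mid = (lo + hi) // 2
    child = "OR" if gate == "AND" else "AND"
    if gate == "AND":
        # The write set continues into both children; these searches can run in parallel.
        left, cost_l = find_write_set(lo, mid, child, probe, rng)
        right, cost_r = find_write_set(mid, hi, child, probe, rng)
        return left | right, cost_l + cost_r
    # OR gate: the write set continues into exactly one child. An ORset of the
    # left child meets it iff it lies in the left child (one round of probes).
    test = pick(lo, mid, child, "OR", rng)
    hit = any([probe(leaf) for leaf in test])  # probe the whole ORset in parallel
    if hit:
        found, cost = find_write_set(lo, mid, child, probe, rng)
    else:
        found, cost = find_write_set(mid, hi, child, probe, rng)
    return found, len(test) + cost

if __name__ == "__main__":
    rng = random.Random(3)
    n = 256
    write_set = pick(0, n, "AND", "AND", rng)
    found, probes = find_write_set(0, n, "AND", lambda leaf: leaf in write_set, rng)
    print(found == write_set, probes)
```

Each OR level costs one round of parallel probes, so the number of rounds grows logarithmically with $n$; for $n = 256$ this fault-free sketch spends 80 probes to locate all 16 members of the write set.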
Slide - 20 Properties of the And-Or Storage System. Constant blowup ratio, write complexity and ---------- read complexity; logarithmic number of rounds; resilient against $\Theta(n)$ faults of a slightly adaptive adversary. We cannot expect anything much better in terms of the read/write tradeoff.
Slide - 21 Early Stopping When Fewer Faults Occur. Drawback: the read complexity is high even when no faults occur. Dynamic read complexity: when up to $t$ faults occur, the read complexity is -----. The price: a logarithmic instead of a constant blowup ratio.
Slide - 22 Dynamically Adjusting to the Number of Faults. Each node represents a processor, not only the leaves. Redefine the ANDset collection so that each set includes all the visited nodes; the size of a set in the collection remains ------. [Figure: an And-Or tree with alternating AND/OR layers.]
Slide - 23 Where Do We Stand? The And-Or storage system is for a static network and ignores the routing scheme. Next: the dynamic environment, via an adaptation of the And-Or storage system in which storage is coupled with routing, using the distance-halving network [Naor-Wieder].
Slide - 24 Dynamic Hash Tables. The continuous space is partitioned locally (on the fly) into cells corresponding to processors, so that each point in $[0,1)$ is covered by exactly one processor.
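One simple way to realize such a partition, assumed here for illustration since the slide does not fix the rule, is to let each processor pick an identifier in $[0,1)$ and own the segment from its identifier up to the next one, wrapping around at 1. Lookup is then a binary search, and a joining processor splits only the one cell it lands in. `build_cells`, `owner` and `join` are hypothetical helpers.

```python
import bisect

def build_cells(ids):
    """Sorted identifiers; processor i owns [cells[i], cells[i+1]), and the last
    processor's cell wraps around, covering [cells[-1], 1) and [0, cells[0])."""
    return sorted(ids)

def owner(cells, y):
    """Index of the unique processor whose cell covers y in [0, 1)."""
    i = bisect.bisect_right(cells, y) - 1
    return i % len(cells)  # i == -1 (y < cells[0]) wraps to the last cell

def join(cells, new_id):
    """A joining processor splits only the cell that contains new_id."""
    bisect.insort(cells, new_id)

if __name__ == "__main__":
    cells = build_cells([0.5, 0.1, 0.8])
    print([owner(cells, y) for y in (0.05, 0.3, 0.5, 0.95)])  # [2, 0, 1, 2]
    join(cells, 0.3)
    print(cells[owner(cells, 0.35)], cells[owner(cells, 0.25)])  # 0.3 0.1
```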
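Slide - 25 The Distance Halving Network [Naor, Wieder]. A continuous graph: the nodes are the points of the interval $[0,1)$, and each point has a left and a right outgoing edge. Each point is the root of a binary-tree subgraph. [Figure: the interval [0,1) with a point x and its two outgoing edges.]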
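In the distance-halving graph of Naor and Wieder, the two outgoing edges of a point $y$ go to $\ell(y) = y/2$ and $r(y) = (y+1)/2$. The sketch below unfolds the binary tree rooted at a point $x$ using exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def left(y):
    """Left outgoing edge of the continuous distance-halving graph."""
    return y / 2

def right(y):
    """Right outgoing edge."""
    return (y + 1) / 2

def tree_levels(x, depth):
    """levels[i] lists the 2**i points reached from x by i left/right steps."""
    levels = [[x]]
    for _ in range(depth):
        levels.append([f(y) for y in levels[-1] for f in (left, right)])
    return levels

if __name__ == "__main__":
    leaves = sorted(tree_levels(Fraction(1, 3), 3)[-1])
    print(leaves[0], leaves[1] - leaves[0])  # 1/24 1/8
```

By induction, level $d$ consists of the points $(x + j)/2^d$ for $j = 0, \ldots, 2^d - 1$, so the leaves of the tree are evenly spaced $2^{-d}$ apart across $[0,1)$.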
Slide - 26 The Distance Halving Network [Naor, Wieder]. Two processors are connected if their respective cells are connected in the continuous graph.
Slide - 27 Embedding the Storage System. The binary tree is a subgraph of the continuous graph, well defined for each point, with depth $\log n$. Its edges are covered by network connections, which enables gossip protocols. Each file has a different tree.
Slide - 28 Storage Through Gossip. Data percolate along DH edges for $\log n$ steps. After a single write operation the writer is done, and the file can already be retrieved. Fault tolerance is built up during the gossip: when messages reach the nodes in the $i$-th level, the file is $\Theta(2^i)$ fault-tolerant.
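A toy model of the gossip phase, under two illustrative assumptions: each processor owns the segment of $[0,1)$ from its identifier up to the next one, and the tree edges are the distance-halving maps $y \mapsto y/2$ and $y \mapsto (y+1)/2$. The file starts at the processor covering the root point $x$ and moves down one level per step; `owner` and `gossip_store` are hypothetical names.

```python
import bisect
import random

def owner(cells, y):
    """Index of the processor whose cell [cells[i], cells[i+1]) covers y (wrapping at 1)."""
    return (bisect.bisect_right(cells, y) - 1) % len(cells)

def gossip_store(cells, x, depth):
    """Send a file from the processor covering x down the binary tree rooted at x.
    holders[i] is the set of processors reached at level i; every processor
    reached keeps a copy of the file."""
    points = [x]
    holders = [{owner(cells, x)}]
    for _ in range(depth):
        # Each tree edge y -> y/2 and y -> (y+1)/2 is covered by a network connection.
        points = [z for y in points for z in (y / 2, (y + 1) / 2)]
        holders.append({owner(cells, y) for y in points})
    return holders

if __name__ == "__main__":
    rng = random.Random(5)
    n = 1024
    cells = sorted(rng.random() for _ in range(n))
    holders = gossip_store(cells, x=rng.random(), depth=10)
    stored = set().union(*holders)
    for i, h in enumerate(holders):
        print(i, len(h))
    print("processors holding a copy:", len(stored))
```

Level $i$ reaches at most $2^i$ distinct processors; how close it gets to $2^i$ depends on how balanced the cells are.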
Slide - 29 Retrieval. Uses the routing protocol of the DH network; the routing dilation is $O(\log n)$, so the total time for retrieval is $O(\log^2 n)$. The read complexity can be dynamically adjusted to the number of faults: store the file in every processor visited.
Slide - 30 Fault-Tolerance. Balanced network: each processor covers a segment of size $O(1/n)$. Various balancing techniques exist (Manku; Karger and Ruhl; Naor and Wieder; Abraham et al). Theorem: when the network is balanced, the system is $\Theta(n)$ fault-tolerant.
Slide - 31 Open Questions. Do the lower bounds shown when both the reader and the adversary are non-adaptive also hold when both are adaptive? Is there a fault-tolerant storage system in the adaptive reader model with $o(\log n)$ rounds?
Slide - 32 Summary. The probabilistic solution is optimal in the non-adaptive reader model. The And-Or storage system has a constant blowup ratio and almost optimal read/write complexity. Adaptation of the storage system to a dynamic environment: storage uses the network topology, and when the system is balanced it maintains fault tolerance.
Slide - 33 Agenda. Fault-Tolerant Storage System (Fighting Censors). The And-Or Quorum System: static case; dynamic networks. Quorum systems are important beyond their application in storage (mutual exclusion, load balancing, access control…).
Slide - 34 Measures of Quality. Load: the load of a strategy is the maximal probability of a processor being chosen; the load of a quorum system is the minimum over all strategies. Availability: measured by the failure probability $F_p$, the probability that all quorums are hit under random faults. Probe complexity: the number of probes required to obtain a live quorum w.h.p.
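For the static And-Or tree, the load and a failure probability can be computed exactly by walking the tree. The sketch assumes an AND gate at the root with alternating layers and independent leaf failures with probability $p$. It evaluates, for one collection (ANDsets or ORsets), the probability that every set in that collection contains a failed leaf; how the collections combine into the availability of the full quorum system follows [Naor, Wool] and is not reproduced here. All function names are hypothetical.

```python
from fractions import Fraction

def gate_at(d, root="AND"):
    """Gate type at depth d; layers alternate starting from the root gate."""
    other = "OR" if root == "AND" else "AND"
    return root if d % 2 == 0 else other

def leaf_load(depth, want="AND", root="AND"):
    """Probability that a fixed leaf lies in a random set of the `want` collection.
    The leaf is included iff every gate on its path that descends into a single
    child (a gate of the other type) picks the leaf's side, each with probability 1/2."""
    single = sum(1 for d in range(depth) if gate_at(d, root) != want)
    return Fraction(1, 2) ** single

def failure_probability(p, depth, want="AND", root="AND"):
    """Probability that every set of the `want` collection contains a failed leaf,
    when each leaf fails independently with probability p. A subtree still has a
    live set iff both children do (gate == want) or at least one does (otherwise)."""
    q = p                                 # failure probability of a single leaf
    for d in reversed(range(depth)):      # combine level by level, bottom-up
        q = 1 - (1 - q) ** 2 if gate_at(d, root) == want else q ** 2
    return q

if __name__ == "__main__":
    print(leaf_load(8))  # 1/16
    for depth in (2, 4, 6, 8):
        print(depth, failure_probability(0.2, depth))
```

Every leaf has the same probability of being chosen, so `leaf_load(8)` is the load $1/16 = 1/\sqrt{256}$. With $p = 0.2$ the computed failure probability of the ANDset collection falls from 0.0784 at depth 2 to below $10^{-6}$ at depth 8.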
Slide - 35 The And-Or Quorum System. Known properties [Naor, Wool]: optimal load, high availability. Our contribution: in a static network, an optimal non-adaptive algorithm and an optimal adaptive algorithm; a construction in a dynamic network.
Slide - 36 Non-Adaptive Algorithm. Probes: -----, matching a lower bound [Naor, Wieder]. [Figure annotation: $2\log\log n$.]
Slide - 37 Adaptive Algorithm. Probe complexity: -----. Runs in parallel in $2\log\log n$ rounds, with local adjustments.
Slide - 38 Dynamic Quorum System. The universe constantly changes. Two challenges: integrity (the intersection property; the combinatorial structure and its properties) and locality (a local way to access a quorum).
Slide - 39 Dynamic And-Or. Embedding of a binary tree in the DH graph: left and right children. A tree is defined on each point, and its leaves equally divide $[0,1)$. A quorum of processors is the set of processors that cover the points in a quorum. [Figure: the tree embedded in the interval [0,1).]
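The embedding can be sketched by mapping each leaf of an And-Or tree, read as a path of left/right choices from the root point $x$, to its point in $[0,1)$ and then to the processor covering that point. Illustrative assumptions: distance-halving edges $y \mapsto y/2$ and $y \mapsto (y+1)/2$, an AND gate at the root, and cells owned from each processor's identifier up to the next one; all helper names are hypothetical.

```python
import bisect
import random

def pick(lo, hi, gate, want, rng):
    """Select a set from the `want` collection ('AND' or 'OR') of the subtree over leaves lo..hi-1."""
    if hi - lo == 1:
        return {lo}
    mid = (lo + hi) // 2
    child = "OR" if gate == "AND" else "AND"
    if gate == want:
        return pick(lo, mid, child, want, rng) | pick(mid, hi, child, want, rng)
    if rng.random() < 0.5:
        return pick(lo, mid, child, want, rng)
    return pick(mid, hi, child, want, rng)

def leaf_point(x, leaf, depth):
    """Point of [0,1) reached from x along the leaf's path: bit t of the leaf
    index (most significant first) selects the left (0) or right (1) edge."""
    y = x
    for t in range(depth):
        bit = (leaf >> (depth - 1 - t)) & 1
        y = (y + bit) / 2  # left edge y/2, right edge (y+1)/2
    return y

def owner(cells, y):
    """Index of the processor whose cell [cells[i], cells[i+1]) covers y (wrapping at 1)."""
    return (bisect.bisect_right(cells, y) - 1) % len(cells)

def processor_quorum(cells, x, depth, leaves):
    """Processors whose cells cover the points of a leaf quorum."""
    return {owner(cells, leaf_point(x, leaf, depth)) for leaf in leaves}

if __name__ == "__main__":
    rng = random.Random(4)
    depth = 8                                    # 256 leaves
    cells = sorted(rng.random() for _ in range(100))
    x = rng.random()
    and_q = processor_quorum(cells, x, depth, pick(0, 2 ** depth, "AND", "AND", rng))
    or_q = processor_quorum(cells, x, depth, pick(0, 2 ** depth, "AND", "OR", rng))
    print(len(and_q), len(or_q), bool(and_q & or_q))
```

Intersecting leaf sets share a point, and each point has exactly one covering processor, so the processor quorums inherit the intersection property however the cells are drawn; balance only affects how many leaves a single processor absorbs.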
Slide - 40 Dynamic And-Or. Locality: a natural gossip protocol. Integrity: when the network grows or shrinks, members of quorums gossip themselves to their children or parent in the continuous graph. Network connections cover the edges in the continuous graph.
Slide - 41 Load. A processor is chosen when the leaves it covers are chosen. The load on the leaves is optimal, so in a balanced network the induced load on the processors is also optimal.
Slide - 42 Availability of the Dynamic Quorum. Static case: the global failure probability decays exponentially when each processor fails with probability $< 0.25$. Dynamic case: the problem in the analysis is that faults are not independent. When the network is balanced, two leaves are dependent only if they are covered by the same processor, so each fault depends on a constant number of others. Tool: domination by a product measure.
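Slide - 43 Domination by a Product Measure. Let $S$ be a finite set and $\Omega = \{0,1\}^S$ the space of configurations. Partial order: for $\omega_1, \omega_2 \in \Omega$, $\omega_1 \geq \omega_2$ if $\forall s \in S,\ \omega_1(s) \geq \omega_2(s)$. A function $f$ is increasing if $\omega_1 \geq \omega_2 \Rightarrow f(\omega_1) \geq f(\omega_2)$. Product measure $\pi_p$: $\forall s \in S$, $\Pr[\omega(s) = 0] = p$, independently of all others.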
Slide - 44 Domination by a Product Measure. Let $\mu, \nu$ be probability measures on $\Omega$. $\mu$ dominates $\nu$ ($\nu \preceq \mu$) if for every increasing $f$, $E_\nu(f) \leq E_\mu(f)$. [Liggett et al]: if $\forall s \in S$, $\Pr[\omega(s) = 0] < p$, and this event is dependent on at most $k$ other such events (where $k$ is a constant), then there exists $p'$ such that $\pi_{p'} \preceq \mu$. By decreasing $p$, $p'$ can be made arbitrarily close to 0.
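On a tiny space the definitions can be checked by brute force. The sketch below takes $|S| = 3$ only to keep the enumeration small, builds $\Omega = \{0,1\}^S$, enumerates all increasing 0/1-valued functions, and tests domination between two product measures. Checking 0/1-valued increasing functions (indicators of up-sets) suffices on a finite space, because every increasing $f$ equals its minimum plus a non-negative combination of the indicators $\mathbf{1}\{f \geq t\}$, each of which is increasing. The names are illustrative.

```python
from itertools import product
from math import prod

S_SIZE = 3                                      # |S|, kept tiny so we can enumerate
CONFIGS = list(product((0, 1), repeat=S_SIZE))  # Omega = {0,1}^S

def geq(w1, w2):
    """Partial order: w1 >= w2 iff w1(s) >= w2(s) for every s in S."""
    return all(a >= b for a, b in zip(w1, w2))

def is_increasing(f):
    """f is increasing iff w1 >= w2 implies f(w1) >= f(w2)."""
    return all(f[w1] >= f[w2] for w1 in CONFIGS for w2 in CONFIGS if geq(w1, w2))

def product_measure(p):
    """pi_p: each coordinate is 0 with probability p, independently."""
    return {w: prod(p if x == 0 else 1 - p for x in w) for w in CONFIGS}

def expect(mu, f):
    return sum(mu[w] * f[w] for w in CONFIGS)

# Every 0/1-valued function on Omega; the increasing ones are indicators of up-sets.
ALL_FUNCS = [dict(zip(CONFIGS, vals)) for vals in product((0, 1), repeat=len(CONFIGS))]
INCREASING = [f for f in ALL_FUNCS if is_increasing(f)]

def dominates(mu, nu):
    """nu is dominated by mu: E_nu(f) <= E_mu(f) for every increasing f."""
    return all(expect(nu, f) <= expect(mu, f) + 1e-12 for f in INCREASING)

if __name__ == "__main__":
    print(len(INCREASING))                                        # 20
    print(dominates(product_measure(0.1), product_measure(0.3)))  # True
    print(dominates(product_measure(0.3), product_measure(0.1)))  # False
```

The product measure with the smaller failure probability dominates the other and not vice versa; the 20 increasing 0/1 functions on $\{0,1\}^3$ are counted by the Dedekind number $M(3)$.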
Slide - 45 Availability of the Dynamic And-Or. $S$ is the set of leaves, and $\mu$ is the probability measure on configurations induced by random faults on processors. A balanced network has limited dependence, so $\mu$ dominates a product measure $\pi_{p'}$. When $p' < 0.25$, $F_{p'} \leq O(\exp(-n^{0.5}))$.
Slide - 46 Probe Complexity of the Dynamic And-Or. Non-adaptive: subtrees are not independent but are positively correlated. Adaptive: local subtrees have expected constant height; expected number of probes: -----. By Markov's inequality, the probe complexity is optimal with probability $1 - o(1)$.
Slide - 47 Other Dynamic Quorum Systems. Dynamic Probabilistic QS [Abraham, Malkhi]: random walk; very high availability, for an arbitrary failure probability; higher load. Dynamic Paths [Naor, Wieder]: emulates the Paths quorum system using a Voronoi diagram; high availability for failure probability $< 0.5$; slower adaptive algorithm.
Slide - 48 Summary. Non-adaptive and adaptive algorithms for And-Or, both optimal; in the adaptive case, excellent time complexity. Adaptation over a dynamic overlay network with optimal load, optimal probe complexity and high availability, analyzed via domination by a product measure.
Slide - 49 Open Questions (on Quorum Systems). A lower bound on the probe complexity of adaptive algorithms. A better analysis of the adaptive algorithm for a dynamic network.
Slide - 50 Elementary Storage System. Write strategy $w$: a distribution on $\mathbb{N}^n$. Encoder: $E(f, q_w) = (x_1, \ldots, x_n)$, where $q_w$ is chosen by $w$. Read strategy $r$: a distribution on $\{0,1\}^n$. Decoder: $D(x_1, \ldots, x_n) \in \{0,1\}^k$.
Slide - 51 Reconstruction. Decoding a previously encoded file: $D(\pi(E(f, q_w), q_r)) = f$. $(\varepsilon, k)$-storage system: $\forall f \in \{0,1\}^k$, $\Pr[D(\pi(E(f, q_w), q_r)) = f] > 1 - \varepsilon$, where $q_w, q_r$ are chosen from $w, r$. The projection $\pi$ masks unread processors: $\pi((x_1, x_2, x_3, x_4), (1,1,0,1)) = (x_1, x_2, \bot, x_4)$.
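Slide - 52 $(\varepsilon, k)$-Intersection Property. Given a write strategy $w$ and a read strategy $r$, the pair $(w, r)$ satisfies the $(\varepsilon, k)$-intersection property if $\Pr[\langle q_w, q_r \rangle > k] > 1 - \varepsilon$, where $\langle q_w, q_r \rangle$ is the number of bits read.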
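The projection and the inner product can be written out directly. This sketch reads $q_w \in \mathbb{N}^n$ as the number of bits the writer stores on each processor and $q_r \in \{0,1\}^n$ as the indicator of processors the reader accesses; that reading, the `BOT` marker (Python's `None`) and the function names are assumptions made for illustration.

```python
BOT = None  # placeholder for a processor the reader did not access

def project(codeword, q_r):
    """Projection: keep x_i where q_r[i] == 1, mask it with BOT otherwise."""
    return tuple(x if bit else BOT for x, bit in zip(codeword, q_r))

def bits_read(q_w, q_r):
    """<q_w, q_r>: total bits stored on the processors the reader accesses."""
    return sum(w * r for w, r in zip(q_w, q_r))

def has_enough_bits(q_w, q_r, k):
    """The event inside the (eps, k)-intersection property, with '>' as on the slide."""
    return bits_read(q_w, q_r) > k

if __name__ == "__main__":
    print(project(("x1", "x2", "x3", "x4"), (1, 1, 0, 1)))  # ('x1', 'x2', None, 'x4')
    print(bits_read((3, 0, 2, 5), (1, 1, 0, 1)))            # 8
```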
Slide - 53 Storage System Characterization. Theorem: let $S = (w, E, r, D)$ be an $(\varepsilon, k)$-storage system. Then $w, r$ maintain the $(2\varepsilon, k)$-intersection property.
Slide - 54 Error-Correcting Codes. View the storage system as a coding scheme: the message is the files concatenated, and the codeword is the processors' memories concatenated. Worst-case faults model: the adversary “knows” the content of each processor.
Slide - 55 Locally Decodable Codes. Decode a single symbol instead of the whole message; there is no need to read the whole codeword(?). Rates: no code of linear length for a constant number of queries [Katz, Trevisan]; an exponential lower bound for 2 queries [Goldreich et al], [Kerenidis, de Wolf]; linear length for a polynomial number of queries, using multivariate codes [Reed-Muller].
Slide - 56 Balanced And-Or Tree. A family of trees with a constant difference. Optimal load and good availability (with different constants). Useful for the dynamic case.