1
Coterie availability in sites
Flavio Junqueira and Keith Marzullo
University of California, San Diego
DISC, Krakow, Poland, September 2005
2
DISC’05 2 Multi-site systems
- Emerging class of distributed systems
  - Collection of sites across a WAN
  - Multiple nodes in each site
  - Share resources
    - Data sets
    - Computational power
  - E.g. BIRN, Geon, TeraGrid, PlanetLab
- Site failure
  - All the nodes in a site simultaneously unavailable
3
Site availability: BIRN
- 10 sites experience at least one outage
- One site under 97% availability
4
Improving availability
- Better availability through replication
- Coteries
  - Set system of processes: a set of subsets of processes
  - Each subset is called a quorum
  - Minimal sets, pairwise intersect
- Coteries are useful
  - Distributed mutual exclusion
  - Distributed registers
  - Consensus through Paxos
- Coterie availability in multi-site systems
5
Roadmap
- System model
- Availability metrics
  - Previous deterministic metrics not necessarily good
  - A new metric
- Failure model
  - Characterize failures using survivor sets
  - Survivor sets: more expressive
- Quorum construction
  - Multi-site hierarchical construction
- Practical issues
  - Failure model in practice
  - PlanetLab experiment
- Conclusions
6
System model
- Set $P$ of processes
  - Pairwise connected by quasi-reliable asynchronous channels
  - Process failure: crash
  - Processes can recover
- Set $B$ of sites
  - Partition of the set of processes
  - Site failure: simultaneous failure of all the processes in the site
  - Process failures are not independent
- Execution
  - Sequence of steps of processes
  - $E$: set of all executions
- In a step $s$
  - Available process in $s$: $p \in P$ is available if $p \notin F(s)$
7
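Survivor sets
- A set $S \subseteq P$ is a survivor set iff [condition missing in the source]
- Example
  - [Figure: processes and sites; executions $E = \{E_1, E_2, E_3, E_4\}$, where $E_1$ and $E_2$ have steps $s_1, s_2$ and $E_3$, $E_4$ each have step $s_1$; the sets $NF(s_i)$ and the resulting survivor sets]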
8
Availability metrics
- Traditional deterministic metrics
  - Undirected graph: nodes = processes, edges = communication links
  - Node vulnerability: minimal number of nodes
  - Edge vulnerability: minimal number of edges
  - Majority is optimal [Barbara and Garcia-Molina’86]
    - Complete graphs
9
A counterexample
- [Figure: processes, sites, and survivor sets]
- Majority
  - Quorum: 5 processes
  - In some step, no quorum can be formed
- Using $S_p$ as quorums
  - In every step, at least one quorum can be formed
- Majority is not optimal
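The counterexample can be checked mechanically. The sketch below assumes a layout that the slide only implies: three sites with three processes each (so a majority is 5 of 9), and survivor sets made of two processes from each of two sites. The names `sites`, `survivor_sets`, `can_form_majority` and `can_form_survivor_quorum` are illustrative and do not come from the talk.

```python
from itertools import combinations

# Assumed layout: 3 sites, 3 processes per site (9 processes in total).
sites = {b: [(b, i) for i in (1, 2, 3)] for b in ("B1", "B2", "B3")}
processes = [p for members in sites.values() for p in members]

# Assumed survivor sets: two processes from each of two distinct sites.
survivor_sets = [
    frozenset(pair_a + pair_b)
    for site_a, site_b in combinations(sites, 2)
    for pair_a in combinations(sites[site_a], 2)
    for pair_b in combinations(sites[site_b], 2)
]

majority = len(processes) // 2 + 1  # 5 of 9


def can_form_majority(available):
    """A majority quorum needs at least `majority` available processes."""
    return len(available) >= majority


def can_form_survivor_quorum(available):
    """With survivor sets as quorums, one of them must be fully available."""
    return any(q <= available for q in survivor_sets)


# Steps in which exactly one survivor set is available.
blocked = [s for s in survivor_sets if not can_form_majority(s)]
print(len(survivor_sets), len(blocked))                          # 27 27
print(all(can_form_survivor_quorum(s) for s in survivor_sets))   # True
```

Under these assumptions every survivor set has only four processes, so a majority quorum is never available in such a step, while a quorum system built from the survivor sets always is.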
10
Availability metrics
- Traditional deterministic metrics
  - Undirected graph: nodes = processes, edges = communication links
  - Node vulnerability: minimal number of nodes
  - Edge vulnerability: minimal number of edges
  - Majority is optimal [Barbara and Garcia-Molina’86]
    - Complete graphs
- A new metric
  - $A(Q)$, where $Q$ is a coterie
  - Number of covered survivor sets in $Q$
  - A survivor set $S$ is covered in $Q$ if: [condition missing in the source]
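A small sketch of how the metric could be computed. The covering condition is not legible in the slide text, so the code assumes that a survivor set is covered when it contains at least one quorum of the coterie; that reading, the toy process set `abcde`, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def is_covered(survivor_set, coterie):
    # Assumption: "covered" means some quorum fits inside the survivor set.
    return any(quorum <= survivor_set for quorum in coterie)


def availability(coterie, survivor_sets):
    """A(Q): number of survivor sets covered by the coterie Q."""
    return sum(1 for s in survivor_sets if is_covered(s, coterie))


# Hypothetical toy system: five processes, three survivor sets.
processes = "abcde"
survivor_sets = [frozenset("ab"), frozenset("ac"), frozenset("bc")]

majority = [frozenset(c) for c in combinations(processes, 3)]  # any 3 of 5
pairs = list(survivor_sets)          # the survivor sets used as quorums
single = [frozenset("a")]            # a single-quorum coterie

print(availability(majority, survivor_sets))  # 0
print(availability(pairs, survivor_sets))     # 3
print(availability(single, survivor_sets))    # 2
```

In this toy case the majority coterie covers no survivor set, whereas using the survivor sets themselves as quorums covers all three.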
11
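Failure model
- Multi-site hierarchical model
  - A set $F_s$ of subsets of $B$
    - Subsets of simultaneously faulty sites
  - An array $F_p$
    - One entry per site
    - Each entry: subsets of processes in the site
    - Subsets of simultaneously faulty processes at a site
- A survivor set $S$: a choice of $FS \in F_s$ and, for each site $B_i \notin FS$, of $FP \in F_p[i]$, with a condition relating $P \setminus FP$ and $S$ whose relation symbol is lost in the source, and with $B_i \cap S = \emptyset$ for every $B_i \in FS$
- Example
  - Processes ($P$) and sites ($B$): three sites $B_1$, $B_2$, $B_3$, each with processes 1, 2, 3
  - $F_s = \{\{B_1\}, \{B_2\}, \{B_3\}\}$
  - $F_p[1]$, $F_p[2]$, $F_p[3]$: each is the set of singletons $\{i\}$, $i \in \{1,2,3\}$, over the processes of its site
  - $S_p$: sets of four processes $\{i, j, k, l\}$ with $i, j, k, l \in \{1,2,3\}$, $i \neq j$, $k \neq l$, where $i, j$ belong to one site and $k, l$ to another (one family per pair of sites)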
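The example on this slide can be reproduced with a short enumeration. The sketch assumes that a survivor set is obtained by picking one element $FS$ of $F_s$ and one element $FP$ of $F_p[i]$ for every site outside $FS$, and keeping the processes that remain; this reading matches the four-process sets in the example but is an interpretation, and the function and variable names are illustrative.

```python
from itertools import product


def survivor_sets(sites, F_s, F_p):
    """Enumerate survivor sets: pick failed sites FS, then one FP per surviving site."""
    result = set()
    for FS in F_s:
        alive = [b for b in sites if b not in FS]
        # One choice of simultaneously faulty processes for every surviving site.
        for choice in product(*(F_p[b] for b in alive)):
            survivors = frozenset(
                p for b, FP in zip(alive, choice) for p in sites[b] - FP
            )
            result.add(survivors)
    return result


# Three sites with three processes each, as in the slide's example.
sites = {b: frozenset((b, i) for i in (1, 2, 3)) for b in ("B1", "B2", "B3")}
F_s = [frozenset({b}) for b in sites]                         # one site fails
F_p = {b: [frozenset({p}) for p in sites[b]] for b in sites}  # one process per site fails

S_p = survivor_sets(sites, F_s, F_p)
print(len(S_p))                        # 27
print(all(len(s) == 4 for s in S_p))   # True
```

Each resulting set holds two processes from each of the two surviving sites, which is the shape of $S_p$ given in the example.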
12
Quorum construction
- Optimal availability with respect to $A$
  - Coterie $Q$: $S_p = Q$ or $Q$ dominates $S_p$
- Survivor sets in $S_p$ pairwise intersect
  - If not, then optimally discarding survivor sets is NP-complete
- A special case: Qsite
  - All subsets of $B$ of size $f_s$ in $F_s$
  - All subsets of size $t$ of $B_i$ in $F_p[i]$, for every $i$
- E.g.: $f_s = 1$, $t = 1$
  - [Figure: quorums over Site 1, Site 2, Site 3]
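The Qsite special case lends itself to a parameterized sketch: build the survivor sets from the thresholds $f_s$ and $t$, then test whether they pairwise intersect and can therefore serve directly as quorums. The helper names `qsite_survivor_sets` and `pairwise_intersect`, and the three-by-three site layout, are assumptions for illustration only.

```python
from itertools import combinations, product


def qsite_survivor_sets(sites, f_s, t):
    """Survivor sets when exactly f_s sites fail and t processes fail per surviving site."""
    out = set()
    for failed in combinations(sites, f_s):
        alive = [b for b in sites if b not in failed]
        # For each surviving site, every way of losing t of its processes.
        per_site = [
            [frozenset(sites[b]) - frozenset(fp) for fp in combinations(sites[b], t)]
            for b in alive
        ]
        for parts in product(*per_site):
            out.add(frozenset().union(*parts))
    return out


def pairwise_intersect(sets):
    """Intersection property required of the quorums of a coterie."""
    return all(a & b for a, b in combinations(sets, 2))


sites = {b: [(b, i) for i in (1, 2, 3)] for b in ("B1", "B2", "B3")}

quorums = qsite_survivor_sets(sites, f_s=1, t=1)
print(len(quorums), pairwise_intersect(quorums))   # 27 True

too_weak = qsite_survivor_sets(sites, f_s=1, t=2)
print(pairwise_intersect(too_weak))                # False
```

With $f_s = 1$ and $t = 1$ the survivor sets already intersect pairwise. Raising $t$ to 2 in this layout breaks the intersection property, which is the situation where some survivor sets would have to be discarded.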
13
Model in practice
- Qsite
  - $f_s$: threshold on site failures
    - Data on site availability
  - $t$: threshold on process failures
    - Markov chains
- One Markov chain for each site
- Transitions
  - Failure transitions: same probability, homogeneous processes
  - Repair transitions: variable probability, amount of resources used
- [Figure: failure transitions and repair transitions]
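One possible way to turn a per-site Markov chain into a threshold $t$ is sketched below. It is not the authors' model: it assumes a birth-death chain whose state is the number of failed processes in the site, a single failure probability `p_fail` for every failure transition, a state-dependent repair probability `p_repair[k]`, and a hypothetical availability target. All numeric values are made up for illustration.

```python
def stationary(n, p_fail, p_repair):
    """Stationary distribution over k = number of failed processes (0..n).

    For a birth-death chain, detailed balance gives
    pi[k + 1] = pi[k] * p_fail / p_repair[k + 1].
    """
    weights = [1.0]
    for k in range(n):
        weights.append(weights[-1] * p_fail / p_repair[k + 1])
    total = sum(weights)
    return [w / total for w in weights]


def choose_t(dist, target):
    """Smallest t such that P(failed processes <= t) >= target."""
    cumulative = 0.0
    for t, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= target:
            return t
    return len(dist) - 1


n = 3                                  # processes in the site
p_repair = {1: 0.5, 2: 0.3, 3: 0.2}    # hypothetical repair probabilities per state
dist = stationary(n, p_fail=0.01, p_repair=p_repair)
t = choose_t(dist, target=0.999)
print(t)  # 1
```

With these made-up numbers the site has at most one failed process about 99.93% of the time, so a threshold of $t = 1$ would meet a 99.9% target.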
14
PlanetLab experiment
- Toy application
  - Paxos: quorums of acceptors
  - Client accessing quorums
- Hosts used
  - Three sites: three hosts from each site
  - One UCSD host: proposer, learner
- Three settings
  - 3Sites: one acceptor per site
    - Quorum: two hosts
  - 3SitesMaj: all hosts
    - Quorum: four hosts, majority from each of two sites
  - SimpleMaj: all hosts
    - Quorum: any five processes
- [Figure labels: UC Davis, UT Austin, Duke, UC San Diego]
- SimpleMaj has worse availability
- 3SitesMaj has better availability
15
The Bimodal model
- Sites are survivor sets
  - $S_p$ is not a coterie
  - “Throw out” survivor sets
  - In general, optimal solution is NP-complete
  - Simple solution for this model
- Practical issues
  - Practical for two sites
  - More than two sites: open problem
16
Conclusions
- Coteries for multi-site systems
  - Site failures: process failures not independent
- A new metric
  - Counts covered survivor sets
- Multi-site hierarchical construction
  - Practical
  - Illustrated with Markov model
  - Experiment shows better availability
- Using majority quorums is not a good idea
  - Not optimal
  - Poor performance
- Future work
  - More experiments, more constructions, real deployment
17
END
18
Backup Slides
19
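Failure models
- The multi-site hierarchical model
  - A set $F_s$ of subsets of $B$
  - An array $F_p$
    - One entry per site
    - Each entry: subsets of processes in the site
  - A survivor set $S$: a choice of $FS \in F_s$ and, for each site $B_i \notin FS$, of $FP \in F_p[i]$, with a condition relating $P \setminus FP$ and $S$ whose relation symbol is lost in the source, and with $B_i \cap S = \emptyset$ for every $B_i \in FS$
- The bimodal model
  - A set $F_s$ of subsets of $B$
    - There is one site that is in no element of $F_s$
  - An array $F_p$
  - A survivor set $S$
    - As in the previous model, or
    - $\exists B_i \in B: S = B_i$
- Example
  - Processes: two sites $B_1$, $B_2$, each with processes 1, 2, 3
  - $F_s$ = [value lost in the source]
  - $F_p[1]$, $F_p[2]$: each is the set of singletons $\{i\}$, $i \in \{1,2,3\}$, over the processes of its site
  - MSH: $S_p$ = sets $\{i, j, k, l\}$ with $i, j, k, l \in \{1,2,3\}$, $i \neq j$, $k \neq l$
  - B: $S_p$ = the same sets $\{i, j, k, l\}$, together with whole sites as survivor sets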
20
Bimodal construction
- Bimodal model
  - By construction: not all pairs of survivor sets intersect
  - Discard survivor sets until the remaining ones intersect
  - Selecting optimally is NP-complete
- Solution: remove $|B| - 1$ survivor sets
  - Survivor sets containing processes from multiple sites pairwise intersect
  - Construction is also optimal with respect to metric $A$
- A special case: Bsite
  - All elements of $F_s$ have size $f_s$
  - All elements of $F_p[i]$ have the same size $t$, for every $i$
- E.g.: $f_s = 1$, $t = 1$
  - [Figure: quorums over $B_1$ and $B_2$]
21
Site availability
- Goals
  - Show that sites are unavailable frequently enough
- BIRN: Biomedical Informatics Research Network
  - Test bed projects centered around brain imaging
  - Currently: 19 universities, 26 research groups
- Availability
  - Monthly basis
  - Pings (BIRN-CC)
  - Storage broker logs
- Site availability Jan/04-Aug/04
  - Availability under 100%
  - On average in 5 out of the 8 months
22
Causes of site failures
- Misconfigured software
- Shared resources
  1. Storage
  2. Power circuits
  3. Cooling pipes
  4. Air conditioning
  5. Network