Correctness of Gossip-Based Membership under Message Loss
Maxim Gurevich, Idit Keidar
Technion
The Setting
Many nodes: n in the 10,000s, 100,000s, 1,000,000s, …
Nodes come and go
▫ Churn
Fully connected network
▫ Like the Internet
Every joining node knows some others
▫ (Initial) connectivity
Membership
Each node needs to know some live nodes
Each node has a view
▫ Set of node ids
▫ Supplied to the application
▫ Constantly refreshed
Typical view size: log n
Applications
▫ Gossip-based algorithms
▫ Unstructured overlay networks
▫ Gathering statistics
These work best with random node samples
▫ Gossip algorithms converge fast
▫ Overlay networks are robust, good expanders
▫ Statistics are accurate
Modeling Membership Views
Views are modeled as a directed graph: an edge from u to v means that v is in u's view
[Figure: example view graph]
Modeling Protocols: Graph Transformations
The view is used for maintenance
Example: push protocol
[Figure: a push step as a graph transformation]
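To make the graph-transformation view concrete, here is a minimal Python sketch (not from the paper) that represents views as a directed graph and applies one push-style step, in which a node introduces its own id to a member of its view; the data layout and function names are illustrative assumptions.

```python
import random

# Views as a directed graph: views[u] is the set of ids u knows,
# i.e. an edge u -> v exists exactly when v is in u's view.
views = {
    "u": {"v", "w"},
    "v": {"w"},
    "w": {"u"},
}

def push_step(views, u, rng=random):
    """One push-style action: u sends its own id to a random node in its
    view, and the receiver adds u to its view (a new edge receiver -> u)."""
    if not views[u]:
        return
    target = rng.choice(sorted(views[u]))
    views[target].add(u)

push_step(views, "u")
print(views)  # the chosen target now has "u" in its view
```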
Desirable Properties?
Randomness
▫ View should include random samples
Holy grail for samples: IID
▫ Each sample uniformly distributed
▫ Each sample independent of other samples
Avoid spatial dependencies among view entries
Avoid correlations between nodes
▫ Good load balance among nodes
What About Churn?
Views should constantly evolve
▫ Remove failed nodes, add joining ones
Views should evolve to IID from any state
Minimize temporal dependencies
▫ Dependence on the past should decay quickly
▫ Useful for applications requiring fresh samples
Global Markov Chain
A global state: all n views in the system
A protocol action: a transition between global states
Together these define the global Markov chain G
[Figure: a transition between two global states]
Defining Properties Formally
Small views
▫ Bounded dout(u)
Load balance
▫ Low variance of din(u)
From any starting state, eventually (in the stationary distribution of the MC on G):
▫ Uniformity: Pr(v ∈ u.view) = Pr(w ∈ u.view)
▫ Spatial independence: Pr(v ∈ u.view | y ∈ w.view) = Pr(v ∈ u.view)
▫ Perfect uniformity + spatial independence ⇒ load balance
Temporal Independence
Time to obtain views independent of the past
From an expected state
▫ Refresh rate in the steady state
Would have been much longer had we considered starting from an arbitrary state
▫ O(n^14) [Cooper09]
Existing Work: Practical Protocols
Tolerate asynchrony, message loss
Studied only empirically
▫ Good load balance [Lpbcast, Jelasity et al 07]
▫ Fast decay of temporal dependencies [Jelasity et al 07]
▫ Induce spatial dependence
[Figure: push protocol example]
Existing Work: Analysis
Analyzed theoretically [Allavena et al 05, Mahlmann et al 06]
▫ Uniformity, load balance, spatial independence
▫ Weak bounds (worst case) on temporal independence
Unrealistic assumptions, hard to implement
▫ Atomic actions with bi-directional communication
▫ No message loss
[Figure: shuffle protocol example]
Our Contribution: Bridging This Gap
A practical protocol
▫ Tolerates message loss, churn, failures
▫ No complex bookkeeping for atomic actions
Formally prove the desirable properties
▫ Including under message loss
Send & Forget Membership
The best of push and shuffle: in a step, u picks two ids v and w from its view, sends its own id together with w to v, and forgets (removes) v and w from its view; the receiver adds the received ids to its view
Some view entries may be empty
[Figure: a Send & Forget step]
S&F: Message Loss
The ids sent by u can be dropped
▫ Due to message loss, or because there are no empty entries in v's view
[Figure: edges disappear when the message is lost]
S&F: Compensating for Loss
Edges (view entries) disappear due to loss
Need to prevent views from emptying out
Keep the sent ids when there are too few ids in the view
▫ Push-like behavior when views are too small
[Figure: a compensating S&F step]
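A minimal sketch of one Send & Forget step under loss, assuming the step described above (u forwards its own id plus one more view entry and normally forgets both). The threshold constant, loss-rate handling, and data layout are illustrative assumptions, not the paper's pseudocode; bounded view arrays with empty entries are not modeled.

```python
import random

LOW_VIEW_THRESHOLD = 3   # illustrative: below this, keep the sent ids (push-like)
LOSS_RATE = 0.05         # illustrative probability that the message is dropped

def send_and_forget_step(views, u, rng=random):
    """One Send & Forget action by node u (a sketch, not the paper's pseudocode)."""
    view = views[u]
    if len(view) < 2:
        return
    v, w = rng.sample(sorted(view), 2)   # destination v and forwarded id w

    # Normally u forgets both ids it used; if its view is already small,
    # it keeps them, compensating for edges lost in previous rounds.
    if len(view) > LOW_VIEW_THRESHOLD:
        view.discard(v)
        view.discard(w)

    # The message carrying (u, w) to v may be lost; then the edges simply vanish.
    if rng.random() >= LOSS_RATE:
        views[v].update({u, w})
        views[v].discard(v)   # a node does not keep its own id in its view

views = {0: {1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 1}}
send_and_forget_step(views, 0)
print(views)
```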
S&F: Advantages over Other Protocols
No bi-directional communication
▫ No complex bookkeeping
▫ Tolerates message loss
Simple
▫ Without unrealistic assumptions
▫ Amenable to formal analysis
Easy to implement
Key Contribution: Analysis
Degree distribution
▫ Closed-form approximation without loss
▫ Degree Markov Chain with loss
Stationary distribution of the MC on the global graph G
▫ Uniformity
▫ Spatial independence
▫ Temporal independence
These hold even under (reasonable) message loss!
Degree Distribution without Loss
In all reachable graphs:
▫ dout(u) + 2·din(u) = const
▫ Better than in a random graph: indegree is bounded
Uniform stationary distribution on reachable states in G
Combinatorial approximation of the degree distribution
▫ The fraction of reachable graphs with a specified node degree
▫ Ignoring dependencies among nodes
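One way to see the invariant is to track degree changes across a single lossless step: the sender trades two outgoing edges for one incoming edge, and the receiver trades one incoming edge for two outgoing ones. The sketch below checks dout(u) + 2·din(u) for every node before and after one such step; it reuses the simplified step from the earlier sketch and treats views as plain sets (an assumption; the paper's views are bounded arrays).

```python
from collections import Counter

def degree_invariant(views):
    """Return dout(x) + 2*din(x) for every node x in the view graph."""
    indeg = Counter()
    for view in views.values():
        for v in view:
            indeg[v] += 1
    return {u: len(views[u]) + 2 * indeg[u] for u in views}

views = {0: {1, 2}, 1: {3}, 2: {0, 3}, 3: {0, 1}}
before = degree_invariant(views)

# One lossless Send & Forget step by node 0:
# 0 sends (0, 2) to 1 and forgets 1 and 2; node 1 adds 0 and 2 to its view.
views[0] -= {1, 2}
views[1] |= {0, 2}
after = degree_invariant(views)

assert before == after   # dout + 2*din is preserved at every node
print(before, after, sep="\n")
```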
Degree Distribution without Loss: Results
Similar to (in fact better than) that of a random graph
Validated by a more accurate Markov model
Setting Degree Thresholds to Compensate for Loss
Note: the dout(u) + 2·din(u) = const invariant no longer holds, so indegree is not bounded
Degree Markov Chain
Given the loss rate, degree thresholds, and degree distributions
Iteratively compute the stationary distribution
[Figure: chain over (outdegree, indegree) states, with transitions without loss, transitions due to loss, and a state corresponding to an isolated node]
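The "iteratively compute the stationary distribution" step can be illustrated with plain power iteration on a row-stochastic transition matrix. The matrix below is a toy stand-in, not the actual degree chain; in the paper's chain, states correspond to (indegree, outdegree) pairs and transitions model protocol steps and message losses.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Iteratively compute the stationary distribution of a row-stochastic
    transition matrix P by power iteration."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Toy 3-state chain (illustrative only).
P = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
])
print(stationary_distribution(P))
```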
Results
Outdegree is bounded by the protocol
▫ Decreases with increasing loss
Indegree is not bounded by the protocol
▫ Still, its variance is low, even under loss
▫ Typical overload is at most 2x
Uniformity
Simple! Nodes are identical
Graphs where u → v are isomorphic to graphs where u → w
▫ Same probability in the stationary distribution
Decay of Spatial Dependencies
Assume that initially > 2/3 of view entries are independent ⇒ the graph is a good expander
For uniform loss < 15%, dependencies decay faster than they are created
[Figure: a dependency is created when u does not delete the sent ids]
Decay of Spatial Dependencies: Results
A (1 − 2 · loss rate) fraction of view entries is independent
▫ E.g., for a loss rate of 3%, more than 90% of entries are independent
Temporal Independence
Start from an expected state
▫ Uniform and spatially independent views
High “expected conductance” of G
Short mixing time
▫ While staying in the “good” component
Temporal Independence: Results
Ids travel fast enough
▫ Reach random nodes in O(log n) hops
▫ Due to “sufficiently many” independent ids in views
Dependence on past views decays within O(log n · view size) time
Conclusions
Formalized the desired properties of a membership protocol
The Send & Forget protocol
▫ Simple for both implementation and analysis
Analysis under message loss
▫ Load balance
▫ Uniformity
▫ Spatial independence
▫ Temporal independence
Thank You