On Heterogeneous Overlay Construction and Random Node Selection in Unstructured P2P Networks Presenter: 游創文
Outline Motivation Related work Algorithm for graph construction Selection walks on the built graph Experiments Conclusion
Motivation Unstructured P2P and overlay networks –Use random walks to build unstructured graphs and do node selection Requirements to build a good graph and do random selection –Load balance –Heterogeneity –Simplicity –Scalability Design heterogeneous graph building and random node selection algorithms –Practical to deploy and functional over a wide range of requirements
Unstructured P2P overlay network Organize peers in a random graph in a flat or hierarchical manner (e.g. a super-peer layer) –Random graph construction Two steps involved in the functioning of P2P systems –Query phase (random node selection on top of the graph) –Delivery phase
Related work Gia –Unstructured file-sharing system that uses random walks –Gives high-capacity nodes higher degrees and more information to store –Does not give control over degree or load SCAMP –Builds graphs where the average node degree is proportional to the log of the number of nodes –Does not give enough control over node degree or load
Related work Araneola –Builds regular graphs that could potentially be used for random selection –Does not discuss the case of heterogeneity –Assumption: the existing nodes contacted by newly joining nodes are picked uniformly –Runs a constant background protocol Law and Siu –A distributed mechanism to construct regular random graphs –Does not handle heterogeneity and is vulnerable to unexpected node departures SelfLoops & Iterative Scaling –Random-walk methods that are not suitable for graph construction and do not accommodate heterogeneity –Extended by this paper
Initial node discovery Any node that wants to join the graph –Must know at least one existing member of the graph Light-weight approach –Keep a small set (of about 10) of the most recently joined nodes
Algorithms for graph construction (requirements) Truly random walk –Each hop is selected uniformly at random –High-degree nodes have a higher probability of being selected –More severe for build walks (the effect compounds as the network grows) Add bias to ensure high-degree nodes don't keep collecting more links Heterogeneity requirement –Higher-capacity nodes have proportionally higher node degrees than lower-capacity nodes
Algorithms for graph construction Each node i in the graph establishes a fixed number of links Ki, called outlinks, with randomly selected nodes in the graph –Leads to nodes obtaining roughly as many inlinks as outlinks A node never has the option of refusing a request to create an inlink –For simplicity –The bias tends to prevent the need for this Counteract the effect of early joiners obtaining more inlinks –Stronger bias –Actively manage each node's indegree: nodes with high indegrees move an inlink to nodes with low indegrees
Taxonomy of Biased walks Biased-halting –Next hop at a node is picked uniformly from all links at the node –The walk ends at each node with a probability inversely proportional to the degree of the node –Average length could be fixed –Ex: SelfLoops, SCAMP Biased-forwarding –Selection of the next hop in the walk is weighted against high-degree nodes –# of hops is set to a fixed constant H –Ex: InlinkInvProb, TotalInvProb, Iterative Scaling
Taxonomy of Biased walks Tradeoff –Biased-forwarding walks require nodes to exchange state (e.g. their degree) with their neighbors –Biased-halting walks tend to unfairly load high-degree nodes (walks tend to be forwarded to high-degree nodes)
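The two walk families above can be contrasted in a short Python sketch. This is an illustration only: the adjacency map `neighbors`, the function names, and the choice of 1/degree for both the halting probability and the forwarding weight are assumptions, not definitions taken from the slides. The halting walk here also always takes at least one hop, which is a simplification.

```python
import random

def biased_halting_walk(neighbors, start, halt_prob, rng):
    """Forward uniformly over all links; halt at each visited node with halt_prob(node)."""
    node, hops = start, 0
    while True:
        node = rng.choice(neighbors[node])   # uniform choice among all links
        hops += 1
        if rng.random() < halt_prob(node):   # bias lives in the halting decision
            return node, hops

def biased_forwarding_walk(neighbors, start, weight, hops, rng):
    """Take exactly `hops` steps; each next hop is drawn with probability ∝ weight(node)."""
    node = start
    for _ in range(hops):
        candidates = neighbors[node]
        node = rng.choices(candidates, weights=[weight(n) for n in candidates], k=1)[0]
    return node

# Hypothetical 4-node graph; links are treated as undirected.
neighbors = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
degree = lambda n: len(neighbors[n])
rng = random.Random(0)

end, hops = biased_halting_walk(neighbors, "a", lambda n: 1.0 / degree(n), rng)
end_fixed = biased_forwarding_walk(neighbors, "a", lambda n: 1.0 / degree(n), 10, rng)
```

Note that the forwarding walk needs `weight(n)` for each neighbor, which is exactly the neighbor state exchange named in the tradeoff, while the halting walk only needs the current node's own degree.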
SelfLoops (SL) Emulate a graph with perfectly uniform node degrees –By adding virtual links to oneself (self loops) Original work –Does not support heterogeneity or provide the needed bias for build walks Modification –For selection walks, the virtual degree of each node ∝ its outdegree –For build walks, the virtual degree ∝ (outdegree^2 / indegree)
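A small Python sketch may help show why the virtual degree controls selection probability. It assumes links are walked in both directions (an undirected view) and that every virtual degree is at least the real degree; the graph and the virtual degrees are made-up values. Under those assumptions, a walk that pads each node with self loops up to its virtual degree has a stationary distribution proportional to the virtual degree.

```python
def selfloop_transition(neighbors, virtual_degree):
    """Transition probabilities when node i is padded with self loops up to virtual_degree[i]."""
    P = {}
    for i, nbrs in neighbors.items():
        v = virtual_degree[i]
        if v < len(nbrs):
            raise ValueError("virtual degree must be at least the real degree")
        row = {j: 0.0 for j in neighbors}
        for j in nbrs:
            row[j] += 1.0 / v                  # each real link is one of v virtual links
        row[i] += 1.0 - len(nbrs) / v          # remaining virtual links are self loops
        P[i] = row
    return P

def step_distribution(P, dist):
    """Apply one walk step to a probability distribution over nodes."""
    return {j: sum(dist[i] * P[i][j] for i in P) for j in P}

# Hypothetical undirected graph and virtual degrees (illustrative values only).
neighbors = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
virtual_degree = {"a": 4, "b": 6, "c": 4, "d": 2}

P = selfloop_transition(neighbors, virtual_degree)
total = sum(virtual_degree.values())
pi = {n: virtual_degree[n] / total for n in neighbors}
pi_next = step_distribution(P, pi)   # equals pi up to rounding: pi is stationary
```

Setting every virtual degree to the same value recovers the original uniform-selection behaviour.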
Refresh of SelfLoops (SL) A node discards one of its links and chooses another Steady state –Net change in the expected indegree of i due to the refresh is zero: rate of losing an inlink = rate of gaining an inlink –c · indeg(i) = c' · (outdeg(i)^2 / indeg(i)) –indeg(i) = c'' · outdeg(i) –c'' = 1, since the indegrees and the outdegrees both sum to the total number of links Much harder to estimate the virtual hop length needed to achieve a desired average real hop length –The conservative option is to use a large enough value, which results in a larger average hop length
Problems of biased-halting Any given walk can be quite short Hybrid approach –Suppose the expected walk length is h hops –For the first h/2 hops, use one of the biased-forwarding walks –For the second half, use SelfLoops –Call this hybrid TotalInvProb-SelfLoops
Inverse-Probability walks Probability of forwarding a walk to a node –InlinkInvProb (IP): ∝ (outdegree / indegree), used for build walks –TotalInvProb (TIP): ∝ (outdegree / total degree), used for selection walks
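The two forwarding rules can be written as a normalization over a node's neighbors. The sketch below is a minimal illustration; the function name, the `(indegree, outdegree)` tuple layout, the reading of total degree as indegree + outdegree, and the guard against a zero denominator are all assumptions made for the example.

```python
def forwarding_probs(neighbor_degrees, mode):
    """Normalized probability of forwarding a walk to each neighbor.

    neighbor_degrees maps neighbor -> (indegree, outdegree).
    mode "IP":  weight ∝ outdegree / indegree        (build walks)
    mode "TIP": weight ∝ outdegree / total degree    (selection walks)
    """
    weights = {}
    for node, (indeg, outdeg) in neighbor_degrees.items():
        if mode == "IP":
            denom = indeg
        elif mode == "TIP":
            denom = indeg + outdeg
        else:
            raise ValueError("mode must be 'IP' or 'TIP'")
        weights[node] = outdeg / max(denom, 1)   # assumed guard for a zero denominator
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

# Two hypothetical neighbors with equal outdegree but different indegree.
degrees = {"x": (5, 5), "y": (10, 5)}
ip = forwarding_probs(degrees, "IP")     # x is favored: it has fewer inlinks
tip = forwarding_probs(degrees, "TIP")
```

In both modes the neighbor that already holds more inlinks receives the smaller share, which is the bias against high-degree nodes that the build walk needs.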
Iterative Scaling (IS) Biased-forwarding walk –Each node assigns outgoing and incoming weights to each of its links –Iterative computation across all links, analogous to deriving the elements of a matrix when the row and column sums are known –The outgoing weight of a link is the probability that the node picks that link when forwarding a walk –The incoming weight of a link is the node's perception of the probability that it is picked by a walk arriving from the other end of the link
Iterative Scaling (IS) Incoming weight assigned by A to link l = the outgoing weight assigned by B to link l –Set Wt_A^in(l) = Wt_B^out(l) Bring the system to a state where, at every node, the incoming weights add up to 1 and the outgoing weights add up to 1 –A sufficiently long random walk is then equally likely to end at any node Heterogeneity –Select: the ideal probability that a node is selected ∝ its outdegree –Build: the ideal probability ∝ (outdegree^2 / indegree) Update at A –The incoming weight for each link between A and B is scaled by the estimated probability of a walk reaching B before the normalization is performed
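The underlying numerical idea, fitting a matrix to known row and column sums, can be sketched centrally with generic iterative proportional fitting. This is not the distributed per-link protocol from the slides and it omits the heterogeneity scaling; the function name, the dense-matrix representation, and the starting weights are assumptions chosen for illustration. Here `w[i][j]` is the weight node i assigns to forwarding to node j, so row i holds i's outgoing weights and column j holds j's incoming weights.

```python
def iterative_scaling(weights, iterations=200):
    """Alternately normalize rows and columns until both sum to 1 (homogeneous case)."""
    n = len(weights)
    w = [[float(x) for x in row] for row in weights]
    for _ in range(iterations):
        for i in range(n):                       # outgoing weights at node i sum to 1
            row_sum = sum(w[i])
            w[i] = [x / row_sum for x in w[i]]
        for j in range(n):                       # incoming weights at node j sum to 1
            col_sum = sum(w[i][j] for i in range(n))
            for i in range(n):
                w[i][j] /= col_sum
    return w

# Hypothetical starting weights on a 4-node graph without self links.
start = [
    [0, 1, 2, 3],
    [4, 0, 1, 2],
    [3, 4, 0, 1],
    [2, 3, 4, 0],
]
scaled = iterative_scaling(start)
row_sums = [sum(row) for row in scaled]
col_sums = [sum(scaled[i][j] for i in range(4)) for j in range(4)]
```

When both sums equal 1 the forwarding matrix is doubly stochastic, and the uniform distribution is stationary for a walk that follows it, which matches the claim that a long enough walk is equally likely to end at any node.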
Some issues Exchanging neighbor information (IP or IS) –Each node sends a message to all of its neighbors every time it experiences a link change –Or piggybacks the neighbor information on other messages –Not good for high-degree graphs Graph refreshes –Periodically remove an outlink and replace it with another, randomly chosen one –Overhead, graph changes, more complexity
SwapLinks (SW) Inspired by Law and Siu –Modified to handle heterogeneity and to be robust to unexpected node departures Two kinds of walks –OnlyInLinks: each node chooses uniformly at random among its inlinks only –OnlyOutLinks: each node chooses uniformly at random among its outlinks only
SwapLinks (SW) On losing an outlink –Replace the outlink with a new neighbor O discovered with an OnlyInLinks build walk On losing an inlink –Check whether the node's indegree < its outdegree –If so, use an OnlyOutLinks walk to discover a node I No need to exchange state with neighbors (Figure: node I and B, a neighbor of I)
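The two SwapLinks walks and the repair decisions described above can be sketched as follows. The function names, the link maps, and the behaviour at a dead end (stop early) are assumptions for illustration, and the sketch only returns the discovered node: what is done with node I afterwards is not spelled out on the slide and is not filled in here.

```python
import random

def directed_walk(links, start, hops, rng):
    """Walk `hops` steps, choosing uniformly among links[node] at every step.

    Pass the inlink map for an OnlyInLinks walk, the outlink map for OnlyOutLinks.
    """
    node = start
    for _ in range(hops):
        choices = links[node]
        if not choices:          # assumed behaviour: stop early at a dead end
            break
        node = rng.choice(choices)
    return node

def replacement_for_lost_outlink(inlinks, node, hops, rng):
    """Discover node O with an OnlyInLinks build walk."""
    return directed_walk(inlinks, node, hops, rng)

def candidate_for_lost_inlink(inlinks, outlinks, node, hops, rng):
    """Discover node I with an OnlyOutLinks walk, but only if indegree < outdegree."""
    if len(inlinks[node]) < len(outlinks[node]):
        return directed_walk(outlinks, node, hops, rng)
    return None

# Hypothetical directed ring 0 -> 1 -> 2 -> 0.
outlinks = {0: [1], 1: [2], 2: [0]}
inlinks = {0: [2], 1: [0], 2: [1]}
rng = random.Random(0)

balanced = candidate_for_lost_inlink(inlinks, outlinks, 1, 2, rng)   # None: indegree == outdegree
inlinks[1] = []                                                       # node 1 loses its inlink
found = candidate_for_lost_inlink(inlinks, outlinks, 1, 2, rng)       # walk 1 -> 2 -> 0
```

Each step uses only the current node's own link list, which is why no degree information has to be exchanged with neighbors.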
Selection walks How a graph is walked Four ways –TotalInverseProb (TIP), Iterative Scaling (IS), SelfLoops (SL), hybrid TIP-SL (Hyb-TIP-SL)
Experiments setup Static simulation –Each node is fully added or deleted before the next node is added or deleted –No packet loss Two scenarios –Shrink (build, then shrink to 25% of the original size) –Churn (build, then churn: 2N churn events, expected network size is N) Setup –N = 5000, build walk length of 10 hops –A constant outdegree of 5 at every node, except in the heterogeneous case –10M selection walks, where M is the number of nodes in the graph at the time of selection –Look at the distribution of the selected nodes and the selection load balance
Four graph-construction techniques (Homogeneous case) Graphs with or without refreshes (except SwapLinks) Status updates (IS and IP) –1-hop updates on any link change –Piggybacking SCAMP, TrueRandom (each node forms 5 (expected) outlinks with distinct uniformly chosen nodes) Load balance under node addition –10 nodes are added and the load placed on previously existing nodes is measured (repeated 100 times) –Average load per node (AvgBLoad-Add), standard deviation of the load (Dev(BLoad-Add)) –Focus on the load placed on already existing nodes –The same load on the network irrespective of the size of the graph
Four graph-construction techniques (Homogeneous case) Node departure –M/5 nodes are removed at random –Average load per node (AvgBLoad-Kill) and standard deviation of the load (Dev(BLoad-Kill))
Result of four graph-construction techniques (Homogeneous case)
Graph Construction under heterogeneity Heterogeneous –Expected outdegree is also 5 –Each of the N nodes is a default-degree node with probability 0.5 (outdegree of 5) –Or a heterogeneous node with probability 0.5 (outdegree chosen uniformly at random from [2, 50]) –Churn or shrink is performed after all nodes have joined and formed all their outlinks The modifications made to the walk probabilities indeed work as intended
Graph Construction under heterogeneity Average indegree and the build load grow linearly with the outdegree
Quality of random selection on homogeneous graphs Distribution of the selected nodes, distribution of load imposed by the selection walks –Represented by the standard deviation of the hit rate Piggybacking is not employed on the selection walks –# of selection walks is comparatively large –Start from a single node, log the number of hits each node receives 1. All nodes have the same outdegree of 5 2. # of walks = 10 * current number of nodes in the graph
Quality of random selection on homogeneous graphs Selection load seen by a node –# of selection walks that pass through or end at the node The origins of the walks are distributed across the graph –Load distribution should be uniform
Selection with heterogeneity Quality of selection when nodes have different outdegrees Same setup as graph building under heterogeneity Running random selection walks All selection methods function satisfactorily Distribution of selection hits as a function of the outdegree
Scaling to larger sizes Measure the # of hops it takes to obtain a random selection distribution whose standard deviation is within 5% of that of a true random distribution Graphs are churned before the selections # of selections = 10 * the network size
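The comparison metric can be sketched in Python as the standard deviation of per-node hit counts, checked against a true-random baseline. The function names and the reading of "within 5%" as a relative difference are assumptions; the slides do not give the exact formula.

```python
import statistics
from collections import Counter

def hit_stddev(selected, nodes):
    """Population standard deviation of the number of hits each node receives."""
    counts = Counter(selected)
    return statistics.pstdev([counts[n] for n in nodes])   # nodes never hit count as 0

def within_tolerance(walk_std, random_std, tolerance=0.05):
    """True if the walk's deviation is within `tolerance` (relative) of the baseline."""
    return abs(walk_std - random_std) <= tolerance * random_std

# Tiny hand-made example: three nodes, three selections.
nodes = ["a", "b", "c"]
dev = hit_stddev(["a", "a", "b"], nodes)   # hit counts are 2, 1, 0
```

In the experiment described above, the walk length would be increased until `within_tolerance` holds for a baseline built from uniformly random selections over the same node set.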
Scaling to larger sizes SwapLinks –Builds good graphs even at large scale
Conclusion A mechanism for building random graphs and doing random selection (SwapLinks) –Simple –Scalable –Good control over heterogeneity –Enables the desired random selection by setting only a single parameter (the desired node degree) at each node Future work –Implement and test in a real setting –Compare with a random selection strategy that uses DHTs –Consider misbehaving nodes –Establish proximal neighbors (low latency or high bandwidth)