Controlling the Cost of Reliability in Peer-to-Peer Overlays
Michael Fairbanks
Authors: Ratul Mahajan, Miguel Castro, and Antony Rowstron
University of Washington, Seattle; Microsoft Research, Cambridge. Published in the Proceedings of the Second International Workshop on Peer-to-Peer Systems (IPTPS), 2003.
Outline: Introduction; Background; Reliability and Cost Models; Reducing Maintenance Cost; Node Arrivals and Departures; Self-tuning; Massive Failures; Applicability Beyond Pastry; Related Work / Conclusions
Introduction Structured p2p overlay networks are a useful substrate for building distributed applications. They provide a hash-table-like primitive, and the overlays update their routing state automatically.
Intro, cont. Scalability, self-organization, and reliability have a cost. The current approach is to configure overlays statically to achieve the desired reliability and performance. This results in high cost in the common case, or poor reliability under worse-than-expected conditions.
Intro, cont. The paper studies the cost of overlay maintenance in realistic environments. It also presents techniques to reduce maintenance cost by observing and adapting to the environment: a self-tuning mechanism, and mechanisms to deal with uncommon conditions.
Background - PASTRY Nodes and objects are assigned random identifiers from a 128-bit id space. A primitive sends a message to a key, routing it to the live node whose nodeID is numerically closest to the key in the id space.
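The key-routing primitive above can be illustrated with a minimal sketch (not the authors' code): a flat sorted list stands in for the distributed overlay, and wrap-around of the circular id space is ignored for simplicity.

```python
import bisect

ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def closest_node(node_ids, key):
    """Return the nodeID numerically closest to key in the id space."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)
    # Candidates are the neighbors of the insertion point (wrap-around of
    # the circular id space is ignored in this simplified sketch).
    candidates = [ids[j] for j in (i - 1, i) if 0 <= j < len(ids)]
    return min(candidates, key=lambda n: abs(n - key))

nodes = [10, 400, 9000, 70000]
assert closest_node(nodes, 500) == 400
```

In the real overlay no node knows the full membership list; Pastry reaches the numerically closest node by forwarding through routing tables instead.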
PASTRY, routing The routing state maintained by each node consists of the leaf set and the routing table. On average only about log_{2^b} N rows of the routing table have non-empty entries (b is Pastry's base parameter, N the number of overlay nodes).
PASTRY, routing cont. Routes a message to a key using no more than about log_{2^b} N hops on average. Routing state is updated when nodes join and leave the overlay. Periodic probing is used for failure detection: keep-alives are sent to leaf set members at a fixed period, a member is presumed faulty if it does not respond within a timeout, and each routing table entry is also probed for liveness periodically.
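The periodic probing with a response timeout can be sketched as follows (the parameter names and values are illustrative assumptions, not from the paper):

```python
import time

PROBE_PERIOD = 30.0   # seconds between liveness probes (illustrative value)
PROBE_TIMEOUT = 5.0   # grace period for a response before declaring a fault

class LivenessTracker:
    """Tracks when each node last responded, and flags silent nodes."""

    def __init__(self):
        self.last_heard = {}  # nodeID -> timestamp of last response

    def heard_from(self, node_id, now=None):
        self.last_heard[node_id] = time.monotonic() if now is None else now

    def faulty(self, node_id, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_heard.get(node_id)
        # Faulty if never heard from, or silent for a full probing period
        # plus the response timeout.
        return last is None or now - last > PROBE_PERIOD + PROBE_TIMEOUT

tracker = LivenessTracker()
tracker.heard_from("n1", now=0.0)
assert not tracker.faulty("n1", now=10.0)
assert tracker.faulty("n1", now=100.0)
```

The probing period is the knob the self-tuning mechanism later adjusts: probing more often detects failures faster but costs more control traffic.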
PASTRY Routing, cont. Faulty entries are removed from the routing state, but must be replaced with other nodes. Replacement information is piggybacked on keep-alive messages. Routing table maintenance: periodically ask a node in each row of the routing table for the corresponding row of its routing table.
Environmental Model Assumptions:
Nodes join according to a Poisson process with rate λ, and session times are exponentially distributed with failure rate μ. All nodes leave ungracefully, without informing other nodes. Nodes never return with the same nodeID.
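The assumed environment can be simulated in a few lines. This is a sketch with illustrative parameter values; only the 2.3-hour mean session time is taken from the Gnutella measurements later in the talk.

```python
import random

def simulate_churn(lam, mu, horizon, seed=1):
    """Return (arrival_time, departure_time) pairs for nodes joining
    before `horizon`. Arrivals form a Poisson process with rate lam;
    session times are exponential with rate mu."""
    rng = random.Random(seed)
    sessions, t = [], 0.0
    while True:
        t += rng.expovariate(lam)       # Poisson process: exponential gaps
        if t >= horizon:
            break
        lifetime = rng.expovariate(mu)  # exponential session time
        sessions.append((t, t + lifetime))
    return sessions

# ~10 arrivals/hour over 60 hours; mean session time 1/mu = 2.3 hours.
sessions = simulate_churn(lam=10.0, mu=1 / 2.3, horizon=60.0)
```

Driving a simulator with traces like this (or with real measured traces) is how the paper evaluates its maintenance-cost models.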
Reliability and Cost Models
Pastry forwards messages over UDP with no acknowledgments by default. Applications can retransmit messages and set a flag indicating that they should be acknowledged at each hop. The message loss rate L is used as the metric, since it models both performance and reliability.
Reliability and Cost Models, cont.
Cost of maintaining the overlay Each node generates control traffic for five operations: leaf set keep-alives, routing table entry probes, node joins, background routing table maintenance, and locality probes
Reliability and Cost Models, cont.
The loss-rate and cost models were verified using simulation.
Reducing Maintenance Cost
Node arrivals and departures in realistic environments: monitored 17,000 nodes in the Gnutella overlay over a 60-hour period. The average session time was 2.3 hours, and the number of active nodes varied between 1,300 and 2,700, with large daily variations in failure rate.
Reducing Maintenance Cost, cont.
Monitored 65,000 nodes in the Microsoft corporate network, probing each one every hour for a month. The average session time was 37.7 hours, with large daily and weekly variations in failure rate. The current static-configuration approach would require different settings for each environment, and expensive configurations if good performance is desired at all times.
Reducing Maintenance Cost, cont.
Self-tuning. Goal: enable an overlay to operate at the desired trade-off point between cost and reliability. The loss-rate and control-traffic equations contain four variables that can be set.
Reducing Maintenance Cost, cont.
Mechanisms to estimate N and μ: the density of nodeIDs in the leaf set is used to estimate N, and μ is estimated from observed node failures in the routing table and leaf set. If nodes fail with rate μ, a node with M unique nodes in its routing state should observe K failures in time K/(M·μ).
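Both estimators reduce to one-line formulas. The sketch below uses assumed, simplified helper names; in particular the N estimate ignores wrap-around of the circular id space.

```python
ID_SPACE = 1 << 128  # Pastry's 128-bit id space

def estimate_N(leaf_set_ids, own_id):
    """If l leaf set members span a fraction f of the id space, roughly
    N ~ l / f nodes exist overall (density argument)."""
    ids = sorted(leaf_set_ids + [own_id])
    span = ids[-1] - ids[0]          # id range covered (wrap ignored)
    fraction = span / ID_SPACE
    return round(len(leaf_set_ids) / fraction)

def estimate_mu(failures_observed, unique_nodes_tracked, elapsed_time):
    """K failures among M tracked nodes in time T imply mu ~ K / (M * T)."""
    return failures_observed / (unique_nodes_tracked * elapsed_time)

# If 8 leaf set members span roughly 1/1000 of the id space, N is ~8000.
leaf = [i * (ID_SPACE // 8000) for i in range(1, 9)]
assert abs(estimate_N(leaf, 0) - 8000) < 1000
```

The trade-off the next slide describes falls out of `estimate_mu`: a larger K averages over more failures (better accuracy) but takes longer to accumulate (worse responsiveness).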
Reducing Maintenance Cost, cont.
The accuracy of the μ estimate depends on K: increasing K increases accuracy but decreases responsiveness. The self-tuning mechanism was evaluated using simulations driven by the Gnutella trace. Self-tuned: targets a loss rate of 1%. Hand-tuned: configured by trial and error.
Dealing With Massive Failures
Currently Pastry relies on the invariant that each node has at least one live leaf set member on each side, which requires large leaf sets (l = 32). Using entries in the routing table instead allows smaller leaf sets, and smaller leaf sets mean less maintenance traffic.
Dealing With Massive Failures, cont.
Algorithm: when node n detects that all members on one side of its leaf set are faulty, it selects the live node in its routing state with the numerically closest nodeID on that side (the seed). The seed node returns the entry in its routing state with the nodeID closest to n's. The process is repeated until no live node with a closer nodeID is found. The node with the closest nodeID is inserted into the leaf set, and its leaf set is used to complete the repair.
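The repair walk is a greedy search over each node's known routing state. A minimal sketch with hypothetical helper names, modeling each node's routing state as a dictionary:

```python
def repair_walk(start_state, key, lookup_state):
    """Repeatedly ask the closest known live node for an even closer one.

    start_state:  nodeIDs known to the repairing node
    key:          the id-space position being repaired toward
    lookup_state: maps a nodeID to that node's own known nodeIDs
                  (stands in for a remote query to that node)
    """
    best = min(start_state, key=lambda n: abs(n - key))
    while True:
        candidates = lookup_state.get(best, [])
        nxt = min(candidates, key=lambda n: abs(n - key), default=best)
        if abs(nxt - key) >= abs(best - key):
            return best  # no closer live node found; walk terminates
        best = nxt

states = {50: [40, 90], 40: [42, 50], 42: [40, 41]}
assert repair_walk([90, 50], key=43, lookup_state=states) == 42
```

Each step strictly decreases the distance to the key, so the walk terminates; the shadow leaf set on the next slide short-circuits it to a single round.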
Dealing With Massive Failures, cont.
Improved by adding a shadow leaf set: it contains the l/2 nodes in the right leaf set of the furthest leaf on the right, and the l/2 nodes in the left leaf set of the furthest leaf on the left. This is inexpensive, and leaf set repairs complete in one round using the nodes in the shadow.
Dealing With Massive Failures, cont.
The failure of a large number of nodes in a short time results in a large number of faulty entries in a node's routing table, which increases the loss rate. Repair takes a long time in environments tuned for a low failure rate. A massive failure is detected when several nodes in the leaf set fail in the same probing period.
Applicability beyond Pastry
Average number of hops: CAN (d/4)·N^(1/d); Chord (1/2)·log2 N.
Average size of routing state: CAN 2d entries; Chord log2 N entries.
Estimating N from density: CAN uses the size of the local and neighboring zones; Chord uses the density of the successor set.
Estimating μ: from failures observed in the routing state, as in Pastry.
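The hop-count formulas above (the well-known averages from the CAN and Chord papers) are easy to compare numerically:

```python
import math

def can_hops(N, d):
    """CAN average route length with d dimensions: (d/4) * N^(1/d)."""
    return (d / 4) * N ** (1 / d)

def chord_hops(N):
    """Chord average route length: (1/2) * log2(N)."""
    return 0.5 * math.log2(N)

N = 2 ** 20  # about a million nodes
# With d = 10, CAN and Chord both average ~10 hops at this size.
assert abs(chord_hops(N) - 10.0) < 1e-9
assert abs(can_hops(N, d=10) - 10.0) < 1e-9
```

Because each overlay exposes routing state whose size and density depend on N and on observed failures, the same estimation ideas carry over from Pastry.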
Related Work: previous work studied overlay maintenance under static conditions, overlay maintenance cost, and efficient node failure discovery.
Future Work: exploring different self-tuning goals, such as operating at arbitrary points on the reliability vs. cost curve using different targets, or choosing a target that takes into account the application's retransmission behavior; and studying the impact of failures on other performance criteria such as locality.
Conclusions: examined the cost of maintaining a structured p2p overlay under realistic, dynamic conditions. The techniques adjust control traffic based on the observed failure rate, and detect and recover from massive failures efficiently. The results show that concerns over overlay maintenance cost are no longer warranted.
Conclusions, cont. The techniques enable high reliability and performance even in adverse conditions, at low maintenance cost. The work was done in the context of Pastry but can be extended to CAN and Chord.
Any Questions?