1
Self-organized fault-tolerant routing in P2P overlays
  Wojciech Galuba, Karl Aberer (EPFL, Switzerland)
  Zoran Despotovic, Wolfgang Kellerer (Docomo Euro-Labs, Munich, Germany)
2
What are P2P overlays?
  Underlying network (blue in the figure), e.g. TCP/IP
  Peers (red) come and go
  Peers form an overlay network (red links)
© 2009 EPFL, Docomo Euro-Labs
3
Routing in P2P overlays
  Overlays (usually) have their own address space
  Goal: provide point-to-point connectivity, or rather point-to-service connectivity
  [Figure: route from source to destination]
4
What is the problem?
  Failures in large-scale systems are the norm, not the exception
  Permanent failures are well understood
    Addressed by overlay maintenance algorithms
  Intermittent failures
    Transient network connectivity problems
    Peer overload, resource exhaustion
    Cannot be addressed in the same way as permanent failures
5
Existing solutions: multipath
  Multiple paths
  Goal: at least one path reaches the destination
  [Figure: paths from source to destination; legend: lossy peer]
6
Existing solutions: iterative routing
  Source controls the routing process
  Successively asks nodes for their neighbors
  High redundancy: if one node fails, use the others
  [Figure: source and destination; legend: lossy peer]
7
Existing solutions: problems
  Rely heavily on message redundancy
    High bandwidth cost
  Do not learn from failures
    Likely to repeat the same routing mistakes
8
Forward feedback protocol (FFP)
  The requestor determines the quality of the provided service
    The decision is binary: good or bad
  Feedback follows the same path as the request
  Feedback is obligatory: no feedback = bad feedback
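The "no feedback = bad feedback" rule can be sketched as a table of forwarded requests with deadlines. This is only an illustration: the class name, the method names, and the 3-second default are assumptions, not part of the slides. It also assumes a reading of "follows the same path" in which feedback is passed on to the next hop that the request was forwarded to.

```python
import time


class PendingFeedback:
    """Tracks forwarded requests until feedback arrives or the deadline passes."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed value; the slides give no feedback timeout
        self.clock = clock
        self.next_hop_of = {}  # request id -> (next hop, deadline)

    def on_forward(self, req_id, next_hop):
        # Remember which neighbor the request was handed to.
        self.next_hop_of[req_id] = (next_hop, self.clock() + self.timeout_s)

    def on_feedback(self, req_id, good):
        """Return (next_hop, good), or None if the request is unknown or expired."""
        entry = self.next_hop_of.pop(req_id, None)
        if entry is None:
            return None
        return entry[0], good

    def expire(self):
        """Convert every overdue request into negative feedback."""
        now = self.clock()
        overdue = [r for r, (_, deadline) in self.next_hop_of.items() if deadline <= now]
        return [self.on_feedback(r, False) for r in overdue]
```

Each `(next_hop, good)` pair returned here is what a peer would use both to score that neighbor and to pass the feedback along.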
9
A peer on the path
  Knows only its overlay neighbors
  Based on feedback, learns which neighbors are reliable
  Associates a success estimator with each (j, dz) pair:
    j: neighbor address
    dz: destination zone
  A success estimator is an exponentially averaged success rate in [0, 1]
    Initially 0.5
    Increased on positive feedback
    Decreased on negative feedback or feedback timeout
  [Figure: ph, peer, nh]
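A minimal sketch of such an estimator, assuming a standard exponential moving average. The smoothing weight `ALPHA` and all names are illustrative assumptions; the slides only state the initial value 0.5, the [0, 1] range, and the direction of the updates.

```python
from dataclasses import dataclass

ALPHA = 0.2  # assumed smoothing weight; the slides do not give a value


@dataclass
class SuccessEstimator:
    """Exponentially averaged success rate in [0, 1] for one (j, dz) pair."""

    value: float = 0.5  # initial value stated on the slide

    def update(self, success: bool) -> float:
        # Move toward 1 on positive feedback, toward 0 on negative
        # feedback or a feedback timeout.
        target = 1.0 if success else 0.0
        self.value = (1 - ALPHA) * self.value + ALPHA * target
        return self.value


# One estimator per (neighbor address j, destination zone dz) pair.
estimators = {}


def record_feedback(j, dz, success):
    est = estimators.setdefault((j, dz), SuccessEstimator())
    return est.update(success)
```

Because each update is a convex combination of the old value and 0 or 1, the estimate can never leave [0, 1].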
10
Next hop selection
  Based on the state of the success estimators
  Pick the neighbor j whose success estimator currently has the highest value
    i.e. maximize the probability of success based on performance history
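The selection rule amounts to an argmax over the neighbors. In this sketch the function name, the dictionary layout, and the tie-breaking (first neighbor in the list wins) are assumptions; neighbors without history are given the initial value 0.5.

```python
def pick_next_hop(estimates, neighbors, dz, default=0.5):
    """Return the neighbor with the highest success estimate for zone dz.

    estimates maps (neighbor, zone) pairs to values in [0, 1].
    """
    if not neighbors:
        raise ValueError("no neighbors to forward to")
    # max() keeps the first maximal element, so ties go to the earlier neighbor.
    return max(neighbors, key=lambda j: estimates.get((j, dz), default))
```

For example, with `estimates = {("a", 1): 0.4, ("b", 1): 0.7}` and neighbors `["a", "b", "c"]`, zone 1 selects `"b"`.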
11
The FFP protocol in action
  nh1 has a history of success but starts failing
  peer switches to nh2
  [Figure: ph, peer, nh1, nh2, with + and - feedback marks on the links]
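The switch from nh1 to nh2 can be replayed in a few lines. This is a toy simulation under assumed parameters (smoothing weight 0.3, ties broken in favor of nh1), not the evaluated implementation.

```python
ALPHA = 0.3  # assumed smoothing weight, not from the slides


def simulate(outcomes_nh1, outcomes_nh2):
    """Replay per-neighbor feedback and return (chosen next hops, final estimates)."""
    est = {"nh1": 0.5, "nh2": 0.5}
    remaining = {"nh1": list(outcomes_nh1), "nh2": list(outcomes_nh2)}
    chosen = []
    while True:
        nh = max(est, key=est.get)  # highest estimator wins; ties go to nh1
        if not remaining[nh]:
            break  # no more scripted feedback for the selected neighbor
        ok = remaining[nh].pop(0)
        est[nh] = (1 - ALPHA) * est[nh] + ALPHA * (1.0 if ok else 0.0)
        chosen.append(nh)
    return chosen, est


chosen, est = simulate([True, True, True, False, False, False], [True, True])
```

With these inputs nh1 is selected for its three successes and its first two failures. Its estimate then drops below nh2's untouched 0.5, and the peer switches to nh2.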
12
Cumulative effect
  The root cause of the failure receives the most negative feedback
  The links to the faulty peer are avoided by its neighbors
  [Figure legend: lossy peer]
13
Scalability through destination zoning
  O(log N) zones and O(log N) neighbors
  Total state at each node: O(log^2 N)
  [Figure: destination zones 0, 1, 2, 3; labels: increasing overlay distance to destination, increasing destination zone number, exponentially decreasing zone size]
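The state bound follows from keeping one success estimator per (neighbor, destination zone) pair:

```latex
\[
\underbrace{O(\log N)}_{\text{neighbors}} \times
\underbrace{O(\log N)}_{\text{destination zones}}
= O(\log^2 N) \quad \text{estimators per node}
\]
```

As a rough illustration, assuming base-2 logarithms and constant factors of 1 (neither is stated on the slide), an overlay of N = 1024 peers would need about 10 x 10 = 100 estimators per node.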
14
Evaluation
  PlanetLab: a planetary-scale testbed
    350 peers
  Conditions:
    Median system load: 5.3
    Unpredictable delays and loss are "natural" on PlanetLab
  Challenge:
    introduce loss and delays in a Chord-like DHT
    place a tight 3 s timeout on service requests
    see if protocols can route around faulty peers
  Workload: multi-source, multi-destination
15
The line-up
  BASE: baseline, no fault-tolerance mechanisms
  MULTI4: 4-way multipath routing
  ITER4: Kademlia-based iterative routing, 4 parallel RPCs
  FFP
16
Every 5 minutes, a new 10% of peers become droppers
Droppers drop all requests
18
Every 5 minutes, a new 10% of peers become delayers
Delayers delay all messages by 100-2000 ms
19
25% of droppers arrive at 300 s
Convergence time depends on the traffic pattern
20
Topology-oblivious routing
  Starts with all success estimators = 0.5
    Empty routing tables
  Learn by trial and error
    Which neighbors are good forwarders for which destinations
  Routing tables are entirely emergent
    Initially random walks converge to reliable routes
21
Warmup: initially use the original Chord routing tables
After some time, switch to FFP routing tables
22
Summary
  FFP uses 2-5 times less bandwidth than MULTI and ITER
  Same or higher fault-tolerance
  More suitable for workloads that are high-rate and have fewer source-destination pairs
23
Benefits of the self-organized approach
  Decentralized
    Scalability
  Topology-oblivious
    Applicable to many networks
  Agnostic to the causes of failures
    Robust to many failure scenarios
    Even those it was not designed for
24
FFP used for secure routing in MANETs
  Additional crypto to prevent feedback forgery
  No PKI!
  Tech report: http://tinyurl.com/ffp-manet
25
FFP: a signaling meta-protocol
  Feedback is binary
  FFP can be used to signal any Boolean property of the routing path
    Service provisioning success (currently)
    Congestion on the path (ECN bit in IP)
      Congestion control
    Delay exceeding thresholds
      Latency-minimizing routing?
  What about non-Boolean?
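The idea of signaling different Boolean path properties can be pictured as interchangeable predicates over what the requestor observed. Everything in this sketch is hypothetical: the `Observation` fields, the function names, and the 500 ms threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What the requestor saw for one request (hypothetical fields)."""

    served: bool                # did the service request succeed?
    delay_ms: Optional[float]   # end-to-end delay, None if nothing came back
    ecn_marked: bool = False    # was a congestion mark seen on the path?


def service_ok(o: Observation) -> bool:
    # The property the slides say is currently signaled.
    return o.served


def not_congested(o: Observation) -> bool:
    # Good feedback only if the request succeeded without a congestion mark.
    return o.served and not o.ecn_marked


def within_delay(o: Observation, threshold_ms: float = 500.0) -> bool:
    # Good feedback only if the delay stayed within an assumed threshold.
    return o.served and o.delay_ms is not None and o.delay_ms <= threshold_ms
```

Whichever predicate is chosen, its result is the single good/bad bit that the feedback carries along the path.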
26
At 800 s, 40% of peers become droppers
FFP's performance is not affected by churn
27
Loop-freedom
  Requests stuck in a loop -> negative feedback
  If requests exit a loop, they have already accumulated a large delay -> potentially negative feedback
  All in all, the
© 2007 EPFL, DoCoMo Euro-Labs
28
Compared to ant algorithms
  FFP is designed with a focus on scalability
  FFP is designed for highly dynamic systems:
    P2P overlays
    MANETs
  Not exactly an ant algorithm:
    we found "evaporation" to degrade performance