1 An Overlay Infrastructure for Decentralized Object Location and Routing
Ben Y. Zhao, University of California at Berkeley, Computer Science Division

2 Peer-based Distributed Computing
A cooperative approach to large-scale applications: peer-based systems have available resources that scale with the number of participants, better than client/server systems with their limited resources and scalability. Large-scale cooperative applications are coming: content distribution networks (e.g. FastForward), large-scale backup and storage utilities that leverage peers' storage for higher resiliency and availability, cooperative web caching, application-level multicast, and video on demand / streaming movies.

3 What Are the Technical Challenges?
File systems replicate files for resiliency and performance. How do you find nearby replicas? How does this scale to millions of users and billions of files?

4 Node Membership Changes
Nodes join and leave the overlay, or fail. Data and control state need to track the available resources, so node membership management is a necessity.

5 A Fickle Internet Internet disconnections are not rare (UMichTR98,IMC02) TCP retransmission is not enough, need route-around IP route repair takes too long: IS-IS  5s, BGP  3-15mins good end-to-end performance requires fast response to faults People will just say TCP, TCP is good for retransmitting to get past congestion, but if there is a disconnected route, TCP will just keep going til it times out. Then application has to deal with error, which is very hard. Or if it’s heavy congestion, then TCP scales back transmission rate heavily, result is similar to large packet loss to application. September 20, 2018

6 An Infrastructure Approach
First-generation large-scale applications took a vertical approach: each one (e.g. FastForward, Yahoo IM, SETI) solved the same hard problems on its own, and they are difficult to get right. Instead, solve the common challenges once: build a single overlay infrastructure at the application layer, above the physical Internet, that provides efficient, scalable data location, dynamic node membership algorithms, and reliable communication to every application.

7 Personal Research Roadmap
Earlier work: a service discovery service, and XSet, a lightweight XML DB (Mobicom 99; 5000+ downloads; TSpaces). Tapestry (building on PRR 97) with applications layered on top: multicast (Bayeux, NOSSDAV 02), file system (OceanStore, ASPLOS 99 / FAST 03), spam filtering (SpamWatch, Middleware 03), rapid mobility (Warp, IPTPS 04). Underneath: robust dynamic algorithms, resilient overlay routing, DOLR and structured overlay APIs (SPAA 02 / TOCS, ICNP 03, IPTPS 03, JSAC 04), landmark routing (Brocade, IPTPS 02), modeling of non-stationary datasets, and a WAN deployment (1500+ downloads). Usage to date: 4300 visits (8-9 per day) and 1300 downloads from 50+ countries; 421 academic, 193 industry, 55 labs, including a search engine company, a European bank, an open-source film distribution effort, a hospital, and a TV station.

8 Talk Outline
Motivation; decentralized object location and routing; resilient routing; Tapestry deployment performance; wrap-up

9 What should this infrastructure look like?
here is one appealing direction…

10 Structured Peer-to-Peer Overlays
Node IDs and keys come from a randomized namespace (SHA-1). Routing proceeds incrementally towards the destination ID: each node keeps a small set of outgoing routes, e.g. prefix routing, giving log(n) neighbors per node and log(n) hops between any pair of nodes. (Figure: a message addressed to ABCD routed over nodes with IDs such as A930, AB5F, ABC0, ABCE, matching a longer prefix at each hop.)
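As an illustration of the per-hop decision described above, here is a minimal Java sketch of prefix routing; the class name, table layout, and fixed-length hex digits are assumptions made for the example, not Tapestry's actual data structures.

```java
/**
 * Minimal sketch of prefix routing (illustrative only; assumes fixed-length hex IDs).
 * routeTable[i][d] holds a known node whose ID matches the local ID in its first i
 * digits and has digit d at position i; an entry is null if no such neighbor is known.
 */
public class PrefixRouter {
    private final String localId;            // e.g. "ABC0"
    private final String[][] routeTable;

    public PrefixRouter(String localId, int digitBase) {
        this.localId = localId;
        this.routeTable = new String[localId.length()][digitBase];
    }

    /** Length of the shared prefix between two equal-length IDs. */
    static int sharedPrefixLength(String a, String b) {
        int i = 0;
        while (i < a.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Pick the next hop: extend the matched prefix by one digit per hop. */
    public String nextHop(String destId) {
        int p = sharedPrefixLength(localId, destId);
        if (p == destId.length()) return localId;          // we are the destination
        int nextDigit = Character.digit(destId.charAt(p), 16);
        return routeTable[p][nextDigit];                   // null -> fall back to a backup link
    }
}
```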

11 Related Work
Unstructured peer-to-peer approaches: Napster, Gnutella, and KaZaa, with probabilistic search (optimized for the hay, not the needle) and locality-agnostic routing (resulting in high network bandwidth costs). Structured peer-to-peer overlays: the first protocols (2001) were Tapestry, Pastry, Chord, and CAN; then came Kademlia, SkipNet, Viceroy, Symphony, Koorde, Ulysseus, and others. One distinction is how to choose your neighbors: Tapestry and Pastry build a latency-optimized routing mesh. Another distinction is the application interface: a distributed hash table provides put(key, data) and data = get(key), while Tapestry provides decentralized object location and routing.

12 Defining the Requirements
Efficient routing to nodes and data: low routing stretch (the ratio of overlay latency to shortest-path distance). Flexible data location: applications want and need to control data placement, which allows application-specific performance optimizations; the directory interface is publish(ObjID) and RouteToObj(ObjID, msg), so data can stay in place and be mutable. Resilient and responsive to faults: more than just retransmission, route around failures and reduce the negative impact (loss and jitter) on the application.
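To make the interface distinction concrete, here is a minimal Java sketch of the two API styles; the method shapes follow the slides (put/get versus publish/RouteToObj), but the exact signatures and byte[] types are illustrative assumptions, not the actual Tapestry API.

```java
/** Illustrative interface sketch only; signatures are assumptions, not Tapestry's API. */
interface DistributedHashTable {
    void put(byte[] key, byte[] data);     // the overlay decides where the data lives
    byte[] get(byte[] key);                // fetch it back by key
}

interface DecentralizedObjectLocation {
    void publish(byte[] objectId);                   // announce a replica stored locally
    void routeToObj(byte[] objectId, byte[] msg);    // deliver msg to a nearby published replica
}
```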

13 Decentralized Object Location & Routing
(Figure: publish(k) and routeObj(k) messages for key k meeting on the overlay backbone.) Where objects are placed is orthogonal: we provide a slightly lower-level abstraction and let the application place the data; data placement strategy is its own area of research. Data traffic is redirected using log(n) in-network redirection pointers, so the average number of pointers per machine is log(n) times the average number of files per machine. The keys to performance are a proximity-enabled routing mesh and routing convergence.
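The pointer mechanism described above can be sketched as follows; this is a toy, in-memory illustration in which the overlay path is stubbed out, and the class and method names are hypothetical.

```java
import java.util.*;

/** Toy sketch of DOLR publish/locate with in-network pointers (illustrative only). */
class DolrNode {
    final String id;
    final Map<String, String> pointers = new HashMap<>();   // objectId -> node holding a replica
    DolrNode(String id) { this.id = id; }
}

class DolrSketch {
    /** In a real overlay this would be the ~log(n)-hop prefix route toward the object's root. */
    static List<DolrNode> overlayPath(DolrNode from, String objectId) {
        return List.of(from);                                // stubbed out for the sketch
    }

    /** publish(k): walk toward the object's root, dropping a redirection pointer at each hop. */
    static void publish(DolrNode server, String objectId) {
        for (DolrNode hop : overlayPath(server, objectId)) {
            hop.pointers.put(objectId, server.id);
        }
    }

    /** routeObj(k): walk toward the root; the first pointer found redirects to a nearby replica. */
    static String routeToObj(DolrNode client, String objectId) {
        for (DolrNode hop : overlayPath(client, objectId)) {
            String server = hop.pointers.get(objectId);
            if (server != null) return server;
        }
        return null;                                         // no replica published
    }
}
```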

14 Why Proximity Routing?
Fewer and shorter IP hops mean shorter end-to-end latency, less bandwidth use and congestion, and less likelihood of crossing broken or lossy links.

15 Performance Impact (Proximity)
Simulated Tapestry with and without proximity on a 5000-node transit-stub network; measured pair-wise routing stretch between 200 random nodes.

16 DOLR vs. Distributed Hash Table
DHT: hash(content) → name → replica placement; modifications mean replicating a new version into the DHT. DOLR: the application places a copy near the requests, and the overlay routes messages to it.

17 Performance Impact (DOLR)
Simulated Tapestry with DOLR and DHT interfaces on a 5000-node transit-stub network; measured route-to-object latency from clients in 2 stub networks. DHT: 5 object replicas; DOLR: 1 replica placed in each stub network.

18 Talk Outline
Motivation; decentralized object location and routing; resilient and responsive routing; Tapestry deployment performance; wrap-up

19 How do you get fast responses to faults?
Response time = fault detection + alternate path discovery + time to switch over.

20 Fast Response via Static Resiliency
Reducing fault-detection time: monitor paths to neighbors with periodic UDP probes; with only O(log(n)) neighbors, probing can be frequent at low bandwidth. Use an exponentially weighted moving average for link quality estimation to avoid route flapping due to short-term loss artifacts: loss rate Ln = (1 - α) × Ln-1 + α × p. Eliminate synchronous backup path discovery: actively maintain redundant paths, redirect traffic immediately, and repair redundancy asynchronously; create and store backups at node insertion, and restore redundancy via random pair-wise queries after failures. End result: fast detection plus precomputed paths gives increased responsiveness.
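A minimal Java sketch of the loss-rate filter above; the class name and probe-reporting method are assumptions made for the example.

```java
/** Sketch of the EWMA link-quality filter: Ln = (1 - alpha) * Ln-1 + alpha * p. */
class LinkQualityEstimator {
    private final double alpha;       // filter constant (e.g. 0.2 or 0.4 in the PlanetLab runs)
    private double lossRate = 0.0;    // current estimate Ln

    LinkQualityEstimator(double alpha) { this.alpha = alpha; }

    /** Called after each periodic UDP probe; p = 1.0 if the probe was lost, 0.0 otherwise. */
    void recordProbe(double p) {
        lossRate = (1 - alpha) * lossRate + alpha * p;
    }

    double lossRate() { return lossRate; }
}
```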

21 Routing Policies
Use estimated overlay link quality to choose the shortest "usable" link: the shortest overlay link whose quality clears a minimal threshold T. Alternative policies prioritize low loss over latency: use the least lossy overlay link, or the path with the minimal cost function cf = x × latency + y × loss rate. This is not perfect because of possible correlated failures, but it can leverage existing work on failure-independent overlay construction; if there is only one link, no policy can help.
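The two policies above can be sketched like this; OverlayLink and its fields are hypothetical names used only for the example, not Tapestry's actual classes.

```java
import java.util.*;

/** Illustrative sketch of the routing policies; not the deployed implementation. */
class OverlayLink {
    final String nextHop;
    final double latencyMs;
    final double lossRate;    // e.g. from the EWMA estimator above
    OverlayLink(String nextHop, double latencyMs, double lossRate) {
        this.nextHop = nextHop;
        this.latencyMs = latencyMs;
        this.lossRate = lossRate;
    }
}

class RoutingPolicy {
    /** Default policy: lowest-latency link among those whose loss rate clears threshold T. */
    static Optional<OverlayLink> shortestUsable(List<OverlayLink> candidates, double maxLossT) {
        return candidates.stream()
                .filter(l -> l.lossRate <= maxLossT)
                .min(Comparator.comparingDouble(l -> l.latencyMs));
    }

    /** Alternative policy: minimize cf = x * latency + y * lossRate. */
    static Optional<OverlayLink> minCost(List<OverlayLink> candidates, double x, double y) {
        return candidates.stream()
                .min(Comparator.comparingDouble(l -> x * l.latencyMs + y * l.lossRate));
    }
}
```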

22 Talk Outline
Motivation; decentralized object location and routing; resilient and responsive routing; Tapestry deployment performance; wrap-up

23 Tapestry, a DOLR Protocol
Routing is based on incremental prefix matching. The routing mesh is latency-optimized using a nearest-neighbor algorithm (HKRZ02) and supports massive failures and large group joins. Redundant overlay links are built in: 2 backup links are maintained with each primary. "Objects" serve as endpoints for rendezvous: nodes publish names to announce their presence, e.g. a wireless proxy publishes a nearby laptop's ID, or multicast listeners publish the multicast session name to self-organize around a rendezvous point.

24 Weaving a Tapestry
Inserting node 0123 into the network: (1) route to its own ID, find the 012X nodes, and fill the last routing-table column; (2) request backpointers to 01XX nodes; (3) measure distance and add to the routing table; (4) prune to the nearest K nodes; (5) repeat steps 2-4 for each shorter prefix. (Figure: the new node's routing table, with levels XXXX, 0XXX, 01XX, 012X, joining the existing Tapestry.)
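The insertion loop can be sketched in Java under stated assumptions: the Overlay interface below is a hypothetical stand-in for the network queries (find nodes with a prefix, fetch backpointers, measure distance), and the level numbering is simplified.

```java
import java.util.*;

/** Rough sketch of the node-insertion loop; all network calls are stubbed behind Overlay. */
class NodeInsertion {
    interface Overlay {
        List<String> nodesWithPrefix(String prefix);        // e.g. the known 012X nodes
        List<String> backpointers(String node, int level);  // nodes that point at `node` at this level
        double distanceTo(String node);                     // measured network distance
    }

    /** Fill routing-table levels from the longest shared prefix back toward level 0. */
    static void insert(String newId, Overlay net, int k, Map<Integer, List<String>> rTable) {
        Set<String> candidates =
                new HashSet<>(net.nodesWithPrefix(newId.substring(0, newId.length() - 1)));
        for (int level = newId.length() - 1; level >= 0; level--) {
            // step 2: ask current candidates for their backpointers at this level
            for (String c : new ArrayList<>(candidates)) {
                candidates.addAll(net.backpointers(c, level));
            }
            // steps 3-4: measure distances and keep the nearest k as this level's entries
            List<String> nearest = candidates.stream()
                    .sorted(Comparator.comparingDouble(net::distanceTo))
                    .limit(k)
                    .toList();
            rTable.put(level, nearest);
            candidates = new HashSet<>(nearest);            // step 5: repeat with the pruned set
        }
    }
}
```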

25 Implementation Performance
Java implementation. Micro-benchmarks: per-message overhead is ~50 µs, with most latency coming from byte copying; performance scales with CPU speedup; for 5 KB messages on a P-IV 2.4 GHz, throughput is ~10,000 msgs/sec. Routing stretch: route to node < 2; route to objects/endpoints < 3, with higher stretch for nearby objects.

26 Responsiveness to Faults (PlanetLab)
(Figure: switch times for filter constants α = 0.2 and α = 0.4, with 300 and 660 marked on the plot; 20 runs for each point.) Probing bandwidth grows with network size N: N = 300 gives about 7 KB/s per node, and N = 10^6 about 20 KB/s. Simulation: if link failure rates stay below 10%, we can route around 90% of survivable failures.

27 Stability Under Membership Changes
(Figure: success rate (%) of routing operations under killed nodes, constant churn, and a large group join.) Routing operations on a 40-node Tapestry cluster; churn: nodes join and leave every 10 seconds, with an average lifetime of 2 minutes.

28 Talk Outline
Motivation; decentralized object location and routing; resilient and responsive routing; Tapestry deployment performance; wrap-up

29 Lessons and Takeaways
Consider system constraints in algorithm design: you are limited by finite resources (e.g. file descriptors, bandwidth), and simplicity wins over small performance gains through easier adoption and faster time to implementation. For wide-area state management (e.g. routing state), use a reactive algorithm for best-effort fast response and proactive periodic maintenance for correctness. The naïve event programming model is too low-level: much code complexity comes from managing stack state, which matters for protocols with asynchronous control algorithms; they need explicit thread support for callbacks and stack management.

30 Future Directions
Ongoing work to explore the p2p application space: resilient anonymous routing and attack resiliency. Intelligent overlay construction: router-level listeners allow application queries, yielding efficient meshes, fault-independent backup links, and failure notification. Deploying and measuring a lightweight peer-based application: focus on usability and low overhead, where p2p incentives, security, and deployment meet the real world. A holistic approach to overlay security and control: p2p is good for self-organization, not for security and management, so decouple administration from normal operation using explicit domains and hierarchy for configuration, analysis, and control, balancing the interplay between these two goals in a first-class infrastructure.

31 Thanks! Questions, comments?

32 Impact of Correlated Events
(Figure: an event handler waiting on correlated events A, B, and C arriving from the network.) Historically, events were largely independent, e.g. web server requests, where the focus is maximizing individual throughput. Event relationships are becoming increasingly prevalent: peer-to-peer control messages and large-scale data aggregation networks, where an action requires A *and* B *and* C to make progress, i.e. correlated requests A+B+C → D, e.g. online continuous queries, sensor aggregation, the p2p control layer, and streaming data mining.

33 Some Details
Simple fault detection techniques: periodically probe overlay links to neighbors and use an exponentially weighted moving average for link quality estimation, avoiding route flapping due to short-term loss artifacts; loss rate Ln = (1 - α) × Ln-1 + α × p, where p is the instantaneous loss rate and α the filter constant. Other techniques remain topics of open research. How do we get and repair the backup links? Each hop has a flexible routing constraint (e.g. in prefix routing, the first hop only requires one fixed digit), so backups are always available until the last hop to the destination. Create and store backups at node insertion, and restore redundancy via random pair-wise queries after failures, e.g. to replace a 123X neighbor, talk to local 12XX neighbors.
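A small Java sketch of that backup-repair query (replacing a lost 123X entry by asking local 12XX neighbors); the Neighbor interface and method names are assumptions for the example.

```java
import java.util.*;

/** Sketch of restoring a lost backup via pair-wise queries to same-prefix neighbors. */
class BackupRepair {
    interface Neighbor {
        List<String> entriesFor(String prefix);   // that neighbor's own entries for the prefix, e.g. 123X
    }

    /** Ask each local same-prefix neighbor for a replacement entry we do not already know. */
    static Optional<String> findReplacement(String lostPrefix,
                                            List<Neighbor> samePrefixNeighbors,
                                            Set<String> alreadyKnown) {
        for (Neighbor n : samePrefixNeighbors) {
            for (String candidate : n.entriesFor(lostPrefix)) {
                if (!alreadyKnown.contains(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();                  // widen the query if nobody has a spare entry
    }
}
```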

34 Route Redundancy (Simulator)
The simulation constructs shortest paths to emulate IP routes; we then break links and look at reachability. Simulation of Tapestry with 2 backup paths per routing entry: 2 backups give low maintenance overhead and good resiliency.

35 Another Perspective on Reachability
(Figure legend: the portion of all pair-wise paths where no failure-free path remains; where a path exists but neither IP nor FRLS can locate it; where IP and FRLS both route successfully; and where FRLS finds a path while short-term IP routing fails.)

36 Single Node Software Architecture
(Figure: single-node software architecture. Applications sit on the application programming interface; below it are Dynamic Tapestry, Patchwork, the core router, and the distance map, all running on the SEDA event-driven framework over the Java Virtual Machine and the network.)

37 Related Work
Unstructured peer-to-peer applications: Napster, Gnutella, KaZaa; probabilistic search, difficult to scale, inefficient bandwidth use. Structured peer-to-peer overlays: Chord, CAN, Pastry, Kademlia, SkipNet, Viceroy, Symphony, Koorde, Coral, Ulysseus, …; they differ in routing efficiency and application interface. Resilient routing: traffic redirection layers such as Detour, Resilient Overlay Networks (RON), and the Internet Indirection Infrastructure (I3); our goals are scalability and in-network traffic redirection.

38 Node to Node Routing (PlanetLab)
Median = 31.5, 90th percentile = 135. Ratio of end-to-end latency to ping distance between nodes; all node pairs measured and placed into buckets.

39 Object Location (PlanetLab)
90th percentile = 158. Ratio of end-to-end latency to client-object ping distance; local-area stretch improves with additional location state.

40 Micro-benchmark Results (LAN)
On a 100 Mb/s LAN: per-message overhead is ~50 µs, with latency dominated by byte copying; performance scales with CPU speedup; for 5 KB messages, throughput is ~10,000 msgs/sec.

41 Structured Peer to Peer Overlay
Traffic tunneling: legacy nodes A and B (identified by IP addresses) each register with a proxy; the proxy stores a mapping from the end host's IP to its own overlay ID via put(hash(B), P'(B)) (and put(hash(A), P'(A))), and a sender's proxy retrieves P'(B) with get(hash(B)) to tunnel traffic across the structured peer-to-peer overlay. This is not a unique engineering approach; it is similar to the approach in the Internet Indirection Infrastructure (I3).
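A toy Java sketch of the registration and lookup steps above, with the overlay's put/get mocked as an in-memory map; the class name and the use of SHA-1 hex strings are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

/** Toy sketch of traffic-tunneling registration; the DHT is mocked as a local map. */
class LegacyTunneling {
    static final Map<String, String> dht = new HashMap<>();   // stand-in for overlay put/get

    static String hash(String ip) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                                     .digest(ip.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    /** A proxy registers the legacy host it fronts: put(hash(B), P'(B)). */
    static void register(String legacyIp, String proxyOverlayId) throws NoSuchAlgorithmException {
        dht.put(hash(legacyIp), proxyOverlayId);
    }

    /** A sending proxy looks up the destination's proxy with get(hash(B)), then tunnels to it. */
    static String lookupProxy(String legacyIp) throws NoSuchAlgorithmException {
        return dht.get(hash(legacyIp));
    }
}
```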

42 Constrained Multicast
Used only when all paths are below the quality threshold: send duplicate messages on multiple paths and leverage route convergence. Assign unique message IDs to mark duplicates, keep a moving window of IDs, and recognize and drop duplicates. Limitations: assumes loss is not from congestion; ideal for local-area routing. (Figure: duplicate copies routed over nodes with IDs such as 2225, 2299, 2274, 2286, 2046, 2281, 2530, 1111.)
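The moving-window duplicate filter can be sketched as follows; the window size and the use of long message IDs are arbitrary choices made for the example.

```java
import java.util.*;

/** Sketch of duplicate suppression for constrained multicast: a bounded window of seen IDs. */
class DuplicateFilter {
    private final int windowSize;
    private final Deque<Long> arrivalOrder = new ArrayDeque<>();  // oldest ID first
    private final Set<Long> seen = new HashSet<>();

    DuplicateFilter(int windowSize) { this.windowSize = windowSize; }

    /** Returns true if the message should be delivered, false if it is a recognized duplicate. */
    boolean accept(long messageId) {
        if (seen.contains(messageId)) return false;               // duplicate: drop it
        seen.add(messageId);
        arrivalOrder.addLast(messageId);
        if (arrivalOrder.size() > windowSize) {
            seen.remove(arrivalOrder.removeFirst());              // slide the window forward
        }
        return true;
    }
}
```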

43 Link Probing Bandwidth (PL)
Bandwidth increases logarithmically with overlay size. Medium-sized routing overlays incur low probing bandwidth.

