Tapestry: Scalable and Fault-tolerant Routing and Location
Stanford Networking Seminar, October 2001
Ben Y. Zhao


Challenges in the Wide-area
- Trends: exponential growth in CPU and storage; networks expanding in reach and bandwidth
- Can applications leverage these new resources?
  - Scalability: increasing users, requests, and traffic
  - Resilience: more components → correspondingly lower MTBF
  - Management: intermittent resource availability → complex management schemes
- Proposal: an infrastructure that solves these issues and passes the benefits on to applications

Driving Applications
- Leverage cheap and plentiful resources: CPU cycles, storage, network bandwidth
- Global applications share distributed resources
  - Shared computation: SETI, Entropia
  - Shared storage: OceanStore, Gnutella, Scale-8
  - Shared bandwidth: application-level multicast, content distribution networks

Key: Location and Routing
- Hard problem: locating and messaging to resources and data
- Goals for a wide-area overlay infrastructure:
  - Easy to deploy
  - Scalable to millions of nodes, billions of objects
  - Available in the presence of routine faults
  - Self-configuring, adaptive to network changes
  - Localize the effects of operations/failures

Talk Outline
- Motivation
- Tapestry overview
- Fault-tolerant operation
- Deployment / evaluation
- Related / ongoing work

What is Tapestry?
- A prototype of a decentralized, scalable, fault-tolerant, adaptive location and routing infrastructure (Zhao, Kubiatowicz, Joseph et al., U.C. Berkeley)
- Network layer of OceanStore
- Routing: suffix-based hypercube, similar to Plaxton, Rajaraman, Richa (SPAA 1997)
- Decentralized location: a virtual hierarchy per object with cached location references
- Core API:
  - publishObject(ObjectID, [serverID])
  - routeMsgToObject(ObjectID)
  - routeMsgToNode(NodeID)
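A minimal Java sketch of how this three-call API could be expressed as an interface; the parameter types and the Message placeholder are assumptions for illustration, not the actual OceanStore/Tapestry signatures:

```java
/**
 * Sketch of the three-call core API listed above. The concrete Java types
 * (byte[] IDs, the Message payload, nullable serverID) are assumptions,
 * not the real OceanStore/Tapestry code.
 */
public interface TapestryCoreAPI {

    /** Publish: leave location pointers along the route toward the object's
     *  root, naming serverID (or the local node, if serverID is null) as a
     *  holder of the object. */
    void publishObject(byte[] objectID, byte[] serverID);

    /** Route a message toward some nearby replica of the object, using cached
     *  location pointers found en route. */
    void routeMsgToObject(byte[] objectID, Message msg);

    /** Route a message to the live node whose ID best matches nodeID. */
    void routeMsgToNode(byte[] nodeID, Message msg);

    /** Placeholder payload type, assumed for this sketch. */
    interface Message extends java.io.Serializable {}
}
```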

Routing and Location
- Namespace (nodes and objects): 160-bit IDs → roughly 2^80 names before an expected collision (birthday bound)
- Each object has its own hierarchy rooted at its root node: f(ObjectID) = RootID, via a dynamic mapping function
- Suffix routing from A to B:
  - At the h-th hop, arrive at the nearest node hop(h) that shares a suffix of length h digits with B
  - Example: 5324 routes to 0629 via 5324 → 2349 → 1429 → 7629 → 0629
- Object location:
  - The root is responsible for storing the object's location
  - Publish and search both route incrementally toward the root
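To make the suffix-routing rule concrete, here is a small self-contained Java sketch (names and structure are assumptions, not the Tapestry implementation) that checks the slide's 5324 → 0629 example hop by hop; a real node would resolve each required digit through its per-level neighbor map, sketched after the "Routing in Detail" slide:

```java
/** Sketch of the incremental suffix-routing rule: the h-th hop toward the
 *  destination must share the destination's last h digits. */
public class SuffixRouting {

    /** Length of the shared suffix (in digits) between two equal-length IDs. */
    static int sharedSuffixLen(String a, String b) {
        int n = 0;
        while (n < a.length()
               && a.charAt(a.length() - 1 - n) == b.charAt(b.length() - 1 - n)) {
            n++;
        }
        return n;
    }

    /** Suffix that the h-th hop toward dest must match: dest's last h digits. */
    static String requiredSuffix(String dest, int h) {
        return dest.substring(dest.length() - h);
    }

    public static void main(String[] args) {
        // Slide example: 5324 routes to 0629 via 5324 -> 2349 -> 1429 -> 7629 -> 0629
        String[] path = {"5324", "2349", "1429", "7629", "0629"};
        String dest = "0629";
        for (int h = 1; h < path.length; h++) {
            System.out.printf("hop %d: %s shares suffix \"%s\" (length %d) with %s%n",
                    h, path[h], requiredSuffix(dest, h), sharedSuffixLen(path[h], dest), dest);
        }
    }
}
```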

Publish / Lookup
- Publish an object with ObjectID:
  // route toward the "virtual root," whose ID = ObjectID
  for (i = 0; i < log2(N); i += j) {   // defines the hierarchy; j = # of bits per digit (e.g. j = 4 for hex digits)
      insert an entry into the nearest node that matches on the last i bits;
      if no match is found, deterministically choose an alternative;
  }
  // the real root node is reached when no external routes are left
- Lookup an object:
  - Traverse the same path toward the root as publish, but search for an entry at each node:
  for (i = 0; i < log2(N); i += j) {
      search for a cached object location;
  }
  - Once found, route via IP or Tapestry to the object
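A toy Java sketch of the publish/lookup idea above, assuming the route toward the object's root is already known (in reality each hop is chosen by suffix routing); all names and data structures here are illustrative assumptions:

```java
import java.util.*;

/** Toy sketch: publish leaves a location pointer at every hop toward the
 *  object's root; lookup walks its own route toward the same root and stops
 *  at the first hop where a cached pointer is found. */
public class PublishLookupSketch {
    /** Location pointers cached at each node: nodeID -> (objectID -> serverID). */
    static final Map<String, Map<String, String>> pointerCache = new HashMap<>();

    /** Publish: insert a pointer at every hop on the route toward the root. */
    static void publish(List<String> routeToRoot, String objectID, String serverID) {
        for (String node : routeToRoot) {
            pointerCache.computeIfAbsent(node, k -> new HashMap<>())
                        .put(objectID, serverID);
        }
    }

    /** Lookup: return as soon as a cached pointer is found along the route. */
    static Optional<String> lookup(List<String> routeToRoot, String objectID) {
        for (String node : routeToRoot) {
            Map<String, String> cache = pointerCache.get(node);
            if (cache != null && cache.containsKey(objectID)) {
                return Optional.of(cache.get(objectID));   // then route to this server via IP or Tapestry
            }
        }
        return Optional.empty();                           // reached the root without finding the object
    }

    public static void main(String[] args) {
        // Hypothetical routes that converge at the object's root "0629".
        publish(List.of("2349", "1429", "7629", "0629"), "obj42", "server-5324");
        System.out.println(lookup(List.of("9429", "7629", "0629"), "obj42"));   // Optional[server-5324]
    }
}
```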

Tapestry Mesh
- Incremental suffix-based routing
- (figure: a mesh of NodeIDs such as 0x43FE, 0x13FE, 0xABFE, 0x73FE, ..., showing routes that match the destination suffix one digit per hop)

Routing in Detail
Neighbor map for node "5712" (octal), one routing level per digit:
  Level 1:  xxx0  xxx1  xxx2  xxx3  xxx4  xxx5  xxx6  xxx7
  Level 2:  xx02  xx12  xx22  xx32  xx42  xx52  xx62  xx72
  Level 3:  x012  x112  x212  x312  x412  x512  x612  x712
  Level 4:  0712  1712  2712  3712  4712  5712  6712  7712
Example: octal digits, 2^12 namespace, routing 5712 → 7510
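A compact Java sketch of such a neighbor map and the next-hop choice it supports; the class layout and the sample table contents are assumptions for illustration only:

```java
/** Sketch of a per-node neighbor map: one routing level per digit of the ID,
 *  BASE entries per level. IDs are fixed-length strings of octal digits. */
public class NeighborMapSketch {
    static final int BASE = 8;      // octal digits, as in the slide's example
    static final int DIGITS = 4;    // 2^12 namespace -> 4 octal digits

    final String selfId;
    /** neighbors[level][digit] (array index = slide level - 1): a node sharing
     *  the last `level` digits with selfId whose next digit from the right is `digit`. */
    final String[][] neighbors = new String[DIGITS][BASE];

    NeighborMapSketch(String selfId) { this.selfId = selfId; }

    /** Choose the next hop toward dest: extend the shared suffix by one digit. */
    String nextHop(String dest) {
        int shared = 0;
        while (shared < DIGITS
               && selfId.charAt(DIGITS - 1 - shared) == dest.charAt(DIGITS - 1 - shared)) {
            shared++;
        }
        if (shared == DIGITS) return selfId;                 // we are the destination
        int wantedDigit = dest.charAt(DIGITS - 1 - shared) - '0';
        return neighbors[shared][wantedDigit];               // may be null if no such neighbor is known
    }

    public static void main(String[] args) {
        NeighborMapSketch node = new NeighborMapSketch("5712");
        node.neighbors[0][0] = "3620";                       // level-1 entry: some node ending in 0
        System.out.println(node.nextHop("7510"));            // prints 3620 (first hop fixes the last digit)
    }
}
```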

Object Location: Randomization and Locality
(figure)

Talk Outline
- Motivation
- Tapestry overview
- Fault-tolerant operation
- Deployment / evaluation
- Related / ongoing work

Fault-tolerant Location
- Minimize soft-state vs. explicit fault-recovery
- Redundant roots:
  - Object names are hashed with small salts → multiple names/roots per object
  - Queries and publishing use all roots in parallel
  - P(finding a reference under partition) = 1 − (1/2)^n, where n = # of roots
- Soft-state periodic republish:
  - 50 million files/node, daily republish, b = 16, N = 2^160, 40 B/msg
  - Worst-case update traffic: 156 kb/s; expected traffic with 2^40 real nodes: 39 kb/s
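A hedged Java sketch of the salted-roots idea: derive several root IDs by hashing the object name with small salts, then evaluate the slide's availability estimate. SHA-1 and the "#salt" suffix format are assumptions for illustration, not necessarily what Tapestry uses:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of redundant roots: one 160-bit root ID per salt value, so lookups
 *  can try all roots in parallel and tolerate partitions. */
public class RedundantRoots {

    /** Derive n candidate root IDs for an object name, one per salt value. */
    static byte[][] saltedRootIds(String objectName, int n) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");   // 160-bit digest, matching the namespace
        byte[][] roots = new byte[n][];
        for (int salt = 0; salt < n; salt++) {
            roots[salt] = sha1.digest((objectName + "#" + salt).getBytes(StandardCharsets.UTF_8));
        }
        return roots;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int n = 4;
        saltedRootIds("my-object", n);
        // Slide's estimate: if each root is independently unreachable with probability 1/2
        // under a partition, P(at least one root reachable) = 1 - (1/2)^n.
        System.out.printf("n = %d roots -> P(reference found) = %.4f%n", n, 1 - Math.pow(0.5, n));
    }
}
```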

Fault-tolerant Routing
- Strategy:
  - Detect failures via soft-state probe packets
  - Route around a problematic hop via backup pointers
- Handling:
  - 3 forward pointers per outgoing route (2 backups)
  - Second-chance algorithm for intermittent failures
  - Upgrade backup pointers and replace failed ones
- Protocols:
  - First Reachable Link Selection (FRLS)
  - Proactive duplicate packet routing
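A minimal Java sketch of a routing entry with one primary and two backup pointers plus a simple second-chance rule; the two-miss threshold and all names are assumptions, not the actual protocol parameters:

```java
import java.util.*;

/** Sketch: a hop is only demoted behind its backups after it misses two
 *  consecutive soft-state probes, giving intermittent failures a second chance. */
public class BackupPointerEntry {
    static final int SECOND_CHANCE = 2;   // assumed: consecutive missed probes before demotion

    private final Deque<String> nextHops = new ArrayDeque<>();     // primary first, then 2 backups
    private final Map<String, Integer> missedProbes = new HashMap<>();

    BackupPointerEntry(String primary, String backup1, String backup2) {
        nextHops.addLast(primary);
        nextHops.addLast(backup1);
        nextHops.addLast(backup2);
    }

    /** Called by the soft-state prober for each neighbor on this route. */
    void recordProbe(String hop, boolean responded) {
        if (responded) {
            missedProbes.put(hop, 0);
            return;
        }
        int misses = missedProbes.merge(hop, 1, Integer::sum);
        if (misses >= SECOND_CHANCE && hop.equals(nextHops.peekFirst())) {
            // Demote the failing primary behind the backups; a real node would
            // also try to acquire a fresh backup pointer here.
            nextHops.pollFirst();
            nextHops.addLast(hop);
        }
    }

    /** Current next hop used to forward packets for this route. */
    String currentNextHop() {
        return nextHops.peekFirst();
    }

    public static void main(String[] args) {
        BackupPointerEntry entry = new BackupPointerEntry("0x73FE", "0x23FE", "0xABFE");
        entry.recordProbe("0x73FE", false);            // first miss: second chance
        System.out.println(entry.currentNextHop());    // still 0x73FE
        entry.recordProbe("0x73FE", false);            // second miss: demote
        System.out.println(entry.currentNextHop());    // 0x23FE
    }
}
```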

Summary
- Decentralized location and routing infrastructure
  - Core routing similar to PRR97
  - Distributed algorithms for object-root mapping and node insertion / deletion
  - Fault handling with redundancy, soft-state beacons, self-repair
  - Decentralized and scalable, with locality
- Analytical properties (N = size of namespace, n = # of physical nodes):
  - Per-node routing table size: b·log_b(N)
  - Find an object in log_b(n) overlay hops
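A quick worked example of these two formulas in Java, under assumed parameters (hex digits, the 160-bit namespace from earlier slides, and roughly a million physical nodes as an illustrative value):

```java
/** Worked example of the slide's analytical properties under assumed parameters. */
public class TapestryScaling {
    public static void main(String[] args) {
        int b = 16;                                           // digit base (hex)
        double N = Math.pow(2, 160);                          // namespace size
        double n = Math.pow(2, 20);                           // assumed ~1M physical nodes
        double tableSize = b * (Math.log(N) / Math.log(b));   // b * log_b(N) = 16 * 40 = 640 entries
        double hops = Math.log(n) / Math.log(b);              // log_b(n) = 5 overlay hops
        System.out.printf("routing table entries: %.0f, expected overlay hops: %.0f%n", tableSize, hops);
    }
}
```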

Talk Outline
- Motivation
- Tapestry overview
- Fault-tolerant operation
- Deployment / evaluation
- Related / ongoing work

Deployment Status
- Java implementation in OceanStore
  - Running static Tapestry
  - Deploying dynamic Tapestry with fault-tolerant routing
- Packet-level simulator
  - Delay measured in network hops
  - No cross traffic or queuing delays
  - Topologies: AS, MBone, GT-ITM, TIERS
- ns2 simulations

Evaluation Results
- Cached object pointers
  - Efficient lookup for nearby objects
  - Reasonable storage overhead
- Multiple object roots
  - Improve availability under attack
  - Improve performance and performance stability
- Reliable packet delivery
  - Redundant pointers approximate optimal reachability
  - FRLS, a simple fault-tolerant UDP protocol

First Reachable Link Selection
- Use periodic UDP packets to gauge link condition
- Packets are routed to the shortest "good" link
- Assumes IP cannot correct its routing tables in time for packet delivery
- (figure: nodes A-E; under IP no path exists to the destination, while Tapestry routes around the failure)
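A simplified Java sketch of the link-selection step: periodic probes (not shown) stamp each candidate link with the time of its last reply, and the forwarder picks the shortest link whose probes are still fresh. The timeout value and record layout are assumptions for illustration:

```java
import java.util.*;

/** Sketch of First Reachable Link Selection: forward over the shortest link
 *  that still answers probes, falling back to longer backup links otherwise. */
public class FrlsSketch {
    /** One candidate outgoing link; the list below is ordered shortest first. */
    record Link(String nextHop, long lastProbeReplyMillis) {}

    static final long PROBE_TIMEOUT_MS = 3_000;   // assumed liveness window

    /** Pick the first (shortest) link whose last probe reply is recent enough. */
    static Optional<String> selectLink(List<Link> linksShortestFirst, long nowMillis) {
        for (Link link : linksShortestFirst) {
            if (nowMillis - link.lastProbeReplyMillis() <= PROBE_TIMEOUT_MS) {
                return Optional.of(link.nextHop());
            }
        }
        return Optional.empty();                  // all candidate links look dead
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Link> links = List.of(
                new Link("primary", now - 10_000),   // primary link: probe replies stopped
                new Link("backup1", now - 500),      // first backup still answering probes
                new Link("backup2", now - 700));
        System.out.println(selectLink(links, now));  // Optional[backup1]
    }
}
```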

Talk Outline
- Motivation
- Tapestry overview
- Fault-tolerant operation
- Deployment / evaluation
- Related / ongoing work

Bayeux
- Global-scale application-level multicast (NOSSDAV 2001)
- Scalability
  - Scales to > 10^5 nodes
  - Self-forming member group partitions
- Fault tolerance
  - Multicast root replication
  - FRLS for resilient packet delivery
- More optimizations
  - Group ID clustering for better bandwidth utilization

Bayeux: Multicast
(figure: a multicast tree over the Tapestry mesh, rooted at the Multicast Root and delivering to receivers such as 793E and 79FE)

Bayeux: Tree Partitioning
(figure: receivers send JOIN messages and are partitioned across multiple replicated Multicast Roots)

Overlay Routing Networks
- CAN: Ratnasamy et al. (ACIRI / UCB)
  - Uses a d-dimensional coordinate space to implement a distributed hash table
  - Routes to the neighbor closest to the destination coordinate
  - Fast insertion / deletion; constant-sized routing state; unconstrained # of hops; overlay distance not proportional to physical distance
- Chord: Stoica, Morris, Karger, et al. (MIT / UCB)
  - Linear namespace modeled as a circular address space
  - "Finger table" points to a logarithmic # of increasingly remote hosts
  - Simplicity in algorithms; fast fault-recovery; log_2(N) hops and routing state; overlay distance not proportional to physical distance
- Pastry: Rowstron and Druschel (Microsoft / Rice)
  - Hypercube routing similar to PRR97
  - Objects replicated to servers by name
  - Fast fault-recovery; log(N) hops and routing state; data replication required for fault-tolerance

Ongoing Research
- Fault-tolerant routing
  - Reliable Overlay Networks (MIT)
  - Fault-tolerant Overlay Routing (UCB)
- Application-level multicast
  - Bayeux (UCB), CAN (AT&T), Scribe and Herald (Microsoft)
- File systems
  - OceanStore (UCB)
  - PAST (Microsoft / Rice)
  - Cooperative File System (MIT)

For More Information
- Tapestry:
- OceanStore:
- Related papers: