Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure
Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T. Rowstron
IEEE Journal on Selected Areas in Communications, October 2002

Outline
Pastry: a peer-to-peer location and routing substrate
Scribe: built on top of Pastry
Experimental evaluation: delay penalty, node stress (routing tables), link stress (network bandwidth)

Pastry (1/2)
Each Pastry node has a unique, 128-bit nodeId.
The set of existing nodeIds is uniformly distributed; this is achieved by basing the nodeId on a secure hash of the node's public key or IP address.
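A minimal sketch of how such a nodeId might be derived (assuming SHA-256 as the secure hash; the paper does not mandate a particular hash function):

```python
import hashlib

def node_id(seed: str, bits: int = 128) -> int:
    """Derive a 128-bit nodeId from a node's IP address or public key.
    Hashing spreads the resulting ids uniformly over the id space."""
    digest = hashlib.sha256(seed.encode()).digest()
    # Keep the top `bits` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - bits)
```

The same seed always yields the same nodeId, while distinct seeds land far apart in the id space with high probability.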

Pastry (2/2)
Each node maintains:
A routing table with entries for a subset of live nodes; each entry maps a nodeId to the associated node's IP address.
A leaf set of l nodes: the l/2 live nodes with numerically closest larger nodeIds, the l/2 with numerically closest smaller nodeIds, and their IP addresses.
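The leaf set can be sketched as follows (a simplification that ignores wraparound at the ends of the circular id space; `l` and the sample ids are illustrative):

```python
def leaf_set(nid: int, live_ids: list[int], l: int = 8) -> list[int]:
    """The l/2 numerically closest smaller and l/2 numerically closest
    larger nodeIds of `nid` among the live nodes."""
    smaller = sorted(i for i in live_ids if i < nid)[-l // 2:]
    larger = sorted(i for i in live_ids if i > nid)[:l // 2]
    return smaller + larger
```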

Routing
Given a message and a key, Pastry reliably routes the message to the live node whose nodeId is numerically closest to the key.
In each routing step, the current node normally forwards the message to a node whose nodeId shares a longer prefix with the key.
The key can be different from the destination nodeId.

Routing a message: from node 65a1fc with key d46a1c.
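One such prefix-routing step can be sketched as follows (ids as hex strings; the set of known candidate nodes and the helper names are illustrative):

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits two ids share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current: str, key: str, known: list[str]) -> str:
    """Forward to a known node whose id shares a strictly longer prefix
    with the key; otherwise fall back to the numerically closest id."""
    p = shared_prefix_len(current, key)
    better = [n for n in known if shared_prefix_len(n, key) > p]
    if better:
        return max(better, key=lambda n: shared_prefix_len(n, key))
    return min(known + [current], key=lambda n: abs(int(n, 16) - int(key, 16)))
```

Routing from 65a1fc toward key d46a1c, a node that knows d13da3, d4213f, and d462ba would forward to d462ba, the candidate sharing the longest prefix with the key.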

Locality properties
Short routes property: concerns the total distance that messages travel along Pastry routes; in each step, a message is routed to the nearest node with a longer prefix match.
Route convergence property: concerns the distance traveled by two messages sent to the same key before their routes converge.

Node addition
A new node X initializes its state by contacting a nearby node A.
A routes a special message using X's nodeId as the key; the message arrives at the existing node Z whose nodeId is numerically closest to X's.
X then obtains its leaf set and routing table from Z. Because Z is the numerically closest node, their leaf sets are almost the same and their routing tables are very similar.

Failure
To handle node failures, neighboring nodes in the nodeId space periodically exchange keep-alive messages.
If a node is unresponsive for a period T, it is presumed failed; all members of the failed node's leaf set are then notified and update their leaf sets.
Routing table entries that refer to failed nodes are repaired lazily.

Scribe
Scribe uses Pastry to manage group creation and group joining, and to build a per-group multicast tree.
The implementation provides four operations: CREATE, JOIN, MULTICAST, and LEAVE.

Multicast tree creation
Example (figure): node 0111 issues CREATE for a group; nodes 1001 and 1101 issue JOINs, and the nodes along their Pastry routes toward 0111 (such as 0100 and 1111) become forwarders.
Because b = 1, routing matches one bit at a time, so both 1111 and 1101 can be forwarders.
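The tree-building rule above can be sketched at a single node (the class and callback are illustrative, not Scribe's actual interface):

```python
class ScribeNode:
    """Per-group children tables at one Scribe node."""

    def __init__(self, nid: str):
        self.nid = nid
        self.children = {}  # group_id -> set of child nodeIds

    def on_join(self, group_id: str, child: str, forward) -> None:
        """A JOIN routed through this node toward the group's rendezvous
        point makes this node a forwarder. If it already forwards for the
        group, it only records the new child; otherwise it also sends a
        JOIN of its own further toward the rendezvous point."""
        already_forwarder = group_id in self.children
        self.children.setdefault(group_id, set()).add(child)
        if not already_forwarder:
            forward(group_id, self.nid)
```

This is why JOINs stop as soon as they hit an existing forwarder: only the first JOIN for a group propagates onward.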

Membership
Rendezvous point: the root of the multicast tree; it can be changed.
Forwarders: Scribe nodes that are part of a group's multicast tree; they may or may not be members of the group. Each forwarder maintains a children table.

Multicast message dissemination
Multicast sources use Pastry to locate a group's rendezvous point: they route to it and ask it to return its IP address, then cache that address for subsequent multicasts to the group.
Multicast messages are disseminated from the rendezvous point along the multicast tree.
Why disseminate from the rendezvous point? Each multicast source could also be viewed as a root, but if each source transmitted data itself, the worst-case delay penalty could double.
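Dissemination down the per-group tree can be sketched as follows (the dict-of-lists tree representation is illustrative):

```python
def disseminate(children: dict, root: str, msg: str, deliver) -> None:
    """Flood a multicast message from the rendezvous point down the tree:
    each forwarder passes the message to every node in its children table."""
    pending = [root]
    while pending:
        node = pending.pop()
        deliver(node, msg)
        pending.extend(children.get(node, []))
```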

Reliability
Each nonleaf node in the tree sends heartbeat messages to its children; a child suspects that its parent is faulty when it fails to receive them.
Upon detecting its parent's failure, a node calls Pastry to route a JOIN message to a new parent.
If the failed node is the root, a new root (the live node with the nodeId numerically closest to the groupId) replaces it.
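Root replacement follows the same "numerically closest" rule Pastry uses for routing; a sketch with integer ids for simplicity:

```python
def new_root(group_id: int, live_ids: list[int]) -> int:
    """After the root fails, the live node whose nodeId is numerically
    closest to the groupId becomes the new rendezvous point."""
    return min(live_ids, key=lambda nid: abs(nid - group_id))
```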

Experimental evaluation
Comparison with IP multicast: delay penalty, node stress, link stress.
Experimental setup: a network topology with 5,050 routers; Scribe runs on 100,000 end nodes; 1,500 groups.

Delay penalty
Scribe increases the delay to deliver messages relative to IP multicast.
RMD: the ratio between the maximum delay using Scribe and the maximum delay using IP multicast.
RAD: the ratio between the average delay using Scribe and the average delay using IP multicast.
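Both ratios are straightforward to compute from per-member delivery delays; a sketch (the sample delays in the test are illustrative):

```python
def delay_penalty(scribe_delays, ip_delays):
    """RMD and RAD for one group: the ratios of the maximum and of the
    average delivery delay under Scribe versus IP multicast."""
    def avg(delays):
        return sum(delays) / len(delays)

    rmd = max(scribe_delays) / max(ip_delays)
    rad = avg(scribe_delays) / avg(ip_delays)
    return rmd, rad
```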

Delay penalty (figure, Scribe / IP multicast): the number of groups with a RAD or RMD lower than or equal to a given relative delay.

Node stress (1/2)

Node stress (2/2)
On average, each node maintains only a few children table entries, but the distribution has a long tail.

Link stress
Maximum link stress: 950 for IP multicast versus 4,031 for Scribe.

Bottleneck remover (1/3)
Motivation: some nodes may have less computational power or bandwidth available than others, and the distribution of children table entries has a long tail.
Algorithm: when a node is overloaded, it selects the group that consumes the most resources, then chooses the child in that group that is farthest away.

Bottleneck remover (2/3)
The parent drops the chosen child by sending it a message containing the children table for the group.
When the child receives the message, it measures the delay between itself and the other nodes in the table, computes the total delay between itself and the parent via each node, and sends a JOIN message to the node that provides the smallest combined delay.
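The child's choice of a new parent can be sketched as follows (the two delay tables are illustrative; the child measures the first itself, and the second is assumed derivable from the message the old parent sent):

```python
def pick_new_parent(child_to_sibling: dict, sibling_to_parent: dict) -> str:
    """The dropped child rejoins under the former sibling that minimizes
    its total delay back to the old parent:
    child->sibling plus sibling->parent."""
    return min(child_to_sibling,
               key=lambda s: child_to_sibling[s] + sibling_to_parent[s])
```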

Bottleneck remover (3/3)

Node stress
With the bottleneck remover, the distribution no longer has a long tail.

Scalability
Evaluating Scribe's scalability with a large number of groups.
Experimental setup: 50,000 Scribe nodes; 30,000 groups with 11 members each.

Node stress (1/2)
(The collapse algorithm is introduced later.)

Node stress (2/2)
The distribution again has a long tail: unmodified Scribe is ill-suited to many small groups.

Scribe collapse (1/2)
If a multicast group has few members, it may still require many other nodes to become forwarders, making the tree inefficient.
The new algorithm collapses long paths in the tree by removing nodes that are not members of the group and have only one entry in the group's children table.
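The collapse rule can be sketched on a parent-pointer representation of the tree (the representation and names are illustrative):

```python
def collapse(parent_of: dict, members: set) -> dict:
    """Splice out every node that is not a group member and has exactly
    one child, reconnecting that child to its grandparent.
    `parent_of` maps each node to its tree parent (the root maps to None)."""
    def children_of(n):
        return [c for c, p in parent_of.items() if p == n]

    changed = True
    while changed:
        changed = False
        for n in list(parent_of):
            kids = children_of(n)
            if (n not in members and parent_of[n] is not None
                    and len(kids) == 1):
                parent_of[kids[0]] = parent_of[n]  # bypass n
                del parent_of[n]
                changed = True
    return parent_of
```

On a chain root -> a -> b -> m where only m is a member, both a and b are spliced out and m hangs directly off the root.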

Scribe collapse (2/2)

Link stress
Comparison (figure): naïve unicast, Scribe, Scribe with collapse, and IP multicast.