Plethora: A Locality Enhancing Peer-to-Peer Network. Ronaldo Alves Ferreira. Advisor: Ananth Grama. Co-advisor: Suresh Jagannathan. Department of Computer Sciences, Purdue University.

Plethora: A Locality Enhancing Peer-to-Peer Network
Ronaldo Alves Ferreira
Advisor: Ananth Grama; Co-advisor: Suresh Jagannathan
Department of Computer Sciences – Purdue University, July

Outline
- Introduction
- Motivation
- IP Addresses as Virtual IDs
- Autonomous Systems as Basis for Locality: Plethora
- Organization and Algorithms
- Simulation Results
- Conclusions
- Ongoing Work

Introduction
Peer-to-Peer (P2P) networks are self-organizing distributed systems in which participating nodes both provide and receive services from each other cooperatively, without distinguished roles as pure clients or pure servers. P2P Internet applications have recently been popularized by file-sharing applications such as Napster and Gnutella. P2P systems have many interesting technical aspects, such as decentralized control, self-organization, adaptation, and scalability. One of the key problems in large-scale P2P applications is to provide efficient algorithms for object location and routing within the network.

Location and Routing
- Central server (Napster)
- Controlled flooding (Gnutella)
- Sequential version of flooding (Freenet)
- Structured solutions: DHTs (Chord, Pastry, Tapestry, CAN)

Location and Routing - DHT
All known proposals take a key as input and, in response, route a message to the node responsible for that key. Keys are fixed-length strings of digits (typically 128 bits). Node identifiers are drawn from the same space as the keys (same number of digits). Each node maintains a routing table containing a small subset of the nodes in the system, and nodes forward queries to the neighbor that makes the most “progress” toward resolving the query.

Location and Routing - DHT
The notion of progress differs from algorithm to algorithm. Plaxton developed the first ideas that could be applied in a scalable manner. Although it was intended for a static node population, the Plaxton algorithm provides efficient routing of queries. It works by “correcting” a single digit at a time. Chord, Pastry, and Tapestry are variants of the Plaxton algorithm.

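Location and Routing - DHT
[Figure: ID space divided into the regions 0XXX, 1XXX, 2XXX, and 3XXX]
Example: node 0112 routes a message to key 2000. The first hop fixes the first digit (2XXX), the second hop fixes the second digit (20XX), and the route ends at node 2001, the closest live node to 2000.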
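A minimal sketch of the digit-correcting route above, assuming base-4 digits, 4-digit IDs, and (for brevity) a router that can see the whole live node set instead of a per-node routing table. The function names and the node set are hypothetical. To mirror the one-digit-per-hop picture, each hop picks the candidate that fixes the fewest additional digits, and the last hop goes to the numerically closest live node.

```python
from __future__ import annotations


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def numeric_distance(a: str, b: str, base: int = 4) -> int:
    return abs(int(a, base) - int(b, base))


def route(start: str, key: str, nodes: list[str], base: int = 4) -> list[str]:
    """Return the nodes visited when routing from `start` toward `key`."""
    path = [start]
    current = start
    while True:
        p = shared_prefix_len(current, key)
        # Candidates that share a strictly longer prefix with the key.
        better = [n for n in nodes if shared_prefix_len(n, key) > p]
        if better:
            # Fix as few extra digits as possible (one per hop when such a node exists).
            current = min(better, key=lambda n: (shared_prefix_len(n, key),
                                                 numeric_distance(n, key, base)))
            path.append(current)
            continue
        # No node extends the prefix: deliver to the numerically closest live node.
        closest = min(nodes, key=lambda n: numeric_distance(n, key, base))
        if closest != current:
            path.append(closest)
        return path
```

With the node set `["0112", "2310", "2033", "2001"]`, routing from 0112 to key 2000 visits 0112, 2310, 2033, and finally 2001.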

Location and Routing - DHT
[Figures: routing table and leaf set of node 0]

Location and Routing - Pastry
- Computers (nodes) have unique IDs, typically 128 bits long. ID assignment should lead to a uniform distribution in the node ID space, for example by taking the SHA-1 hash of the node's IP address.
- Primitive: route(msg, key) delivers msg to the currently live node whose ID is numerically closest to key.
- Node state: routing table, neighborhood set, and leaf set.
- Scalable and efficient: O(log N) routing table entries per node, and routes take O(log N) hops.
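A sketch of Pastry-style ID assignment and delivery, assuming the 128-bit node ID is the top 128 bits of SHA-1 of the IP address (SHA-1 produces 160 bits, so the truncation is an assumption) and a circular ID space. `node_id`, `ring_distance`, and `responsible_node` are illustrative names, not Pastry's API.

```python
from __future__ import annotations

import hashlib

ID_BITS = 128


def node_id(ip: str) -> int:
    """128-bit node ID: the top 128 bits of SHA-1(ip)."""
    digest = hashlib.sha1(ip.encode("ascii")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def ring_distance(a: int, b: int) -> int:
    """Numeric distance on the circular 2**128 ID space."""
    d = abs(a - b)
    return min(d, (1 << ID_BITS) - d)


def responsible_node(key: int, live_ids: list[int]) -> int:
    """route(msg, key) target: the live node ID numerically closest to key."""
    return min(live_ids, key=lambda n: ring_distance(n, key))
```

Hashing spreads nodes uniformly over the ID space, which is what gives the O(log N) bounds, but it also discards any relation between ID proximity and network proximity.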

DHT Performance Issues
Virtualization destroys locality: messages may have to travel around the world to reach a node in the same LAN. Query responses do not contain locality information. Heuristics to mitigate the problem:
- Proximity routing
- Topology-based node ID assignment
- Proximity neighbor selection

Motivation
Virtualization destroys locality, and query responses do not contain locality information. Recent studies show that the distribution of queries across keys in P2P networks is Zipf-like. In many wide-area distributed applications, such as music sharing, nodes in the same region share common interests.

IP Addresses as Virtual IDs
A natural way of building locality into an overlay network is to exploit the addressing scheme of the underlying network. In most cases, nodes with numerically close IP addresses are also physically close, owing to the organization of the Internet into autonomous systems (ASs). By correcting a few bits at each hop, starting from the most significant bits, the last hops would stay inside an AS.
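The locality argument rests on shared leading bits: when IPv4 addresses serve directly as 32-bit IDs, routing that corrects bits from the most significant end only touches low-order bits in its final hops, and those hops stay among addresses with a long common prefix. A small sketch using Python's standard `ipaddress` module; the helper names are hypothetical, and the example addresses come from documentation ranges rather than real peers.

```python
import ipaddress


def ip_to_id(ip: str) -> int:
    """Use the 32-bit IPv4 address itself as the overlay ID."""
    return int(ipaddress.IPv4Address(ip))


def common_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0..32)."""
    diff = ip_to_id(a) ^ ip_to_id(b)
    return 32 - diff.bit_length()
```

For example, `common_prefix_bits("192.0.2.5", "192.0.2.9")` is 28, so only the last 4 bits remain to be corrected, while 192.0.2.5 and 198.51.100.7 share just 5 leading bits.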

IP Addresses as Virtual IDs
However, the IP address space is not uniformly populated by peers. This causes load imbalance at the peers, and the O(log n) upper bound on the number of hops can no longer be guaranteed.
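To see why a non-uniform population hurts, the sketch below measures the fraction of a circular 32-bit ID space that each node owns when keys go to the numerically closest node, comparing raw IPv4 addresses with hashed IDs. The addresses are hypothetical (ten peers in one /24 and one peer elsewhere), not trace data, and taking the first 4 bytes of SHA-1 as the hashed ID is an illustrative choice.

```python
from __future__ import annotations

import hashlib
import ipaddress

SPACE = 1 << 32  # size of the circular 32-bit ID space


def ownership_shares(ids: list[int]) -> list[float]:
    """Fraction of the ID space owned by each node (in sorted ID order)
    when every key is assigned to the numerically closest node on the ring."""
    ids = sorted(ids)
    n = len(ids)
    if n == 1:
        return [1.0]
    shares = []
    for i, x in enumerate(ids):
        gap_prev = (x - ids[i - 1]) % SPACE        # ids[-1] wraps around for i == 0
        gap_next = (ids[(i + 1) % n] - x) % SPACE
        shares.append((gap_prev + gap_next) / 2 / SPACE)  # half of each adjacent gap
    return shares


def imbalance(ids: list[int]) -> float:
    """Largest share divided by the fair share 1/n."""
    return max(ownership_shares(ids)) * len(ids)


def ip_ids(ips: list[str]) -> list[int]:
    return [int(ipaddress.IPv4Address(ip)) for ip in ips]


def hashed_ids(ips: list[str]) -> list[int]:
    return [int.from_bytes(hashlib.sha1(ip.encode()).digest()[:4], "big") for ip in ips]


peers = [f"192.0.2.{i}" for i in range(1, 11)] + ["198.51.100.7"]
print(f"IP-based IDs: {imbalance(ip_ids(peers)):.1f}x fair share")
print(f"Hashed IDs:   {imbalance(hashed_ids(peers)):.1f}x fair share")
```

With these addresses, the lone peer at 198.51.100.7 owns roughly half of the ID space under IP-based IDs, about 5.5 times the fair share of 1/11, while most of the clustered peers own almost nothing.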

IP Addresses as Virtual IDs
How severe would the load imbalance be if we used the IP address of each node as its overlay identifier? Is it possible to find a boundary in the IP address such that the distribution of peers is uniform and some form of locality is still captured? Experimental basis: Gnutella traces from June 2002 with 56M messages and 62,000 distinct IP addresses. Addresses were validated using a whois server and ping.

IP Addresses as Virtual IDs
[Figure: 2,420 nodes, 20 keys per node]

IP Addresses as Virtual IDs
The average CIDR prefix length of the addresses is over 19 bits. This is a negative result, but it provides the insight behind a two-level overlay architecture: one global overlay and several local overlays. A local overlay is formed by nodes that share the first 8 bits of their IP addresses.
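The grouping rule above (nodes that share the first 8 bits form one local overlay) can be expressed with Python's standard `ipaddress` module. `local_overlay_prefix` and `group_local_overlays` are hypothetical helpers, and the prefix length is exposed as a parameter only for experimentation.

```python
from __future__ import annotations

import ipaddress
from collections import defaultdict


def local_overlay_prefix(ip: str, prefix_bits: int = 8) -> str:
    """CIDR block (by default the /8) that determines a node's local overlay."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_bits}", strict=False))


def group_local_overlays(ips: list[str], prefix_bits: int = 8) -> dict[str, list[str]]:
    """Partition peers into local overlays keyed by their shared prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ip in ips:
        groups[local_overlay_prefix(ip, prefix_bits)].append(ip)
    return dict(groups)
```

For instance, 18.7.0.1 and 18.200.3.4 land in the same local overlay (18.0.0.0/8), while 128.10.2.5 goes to 128.0.0.0/8.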


Plethora
- Two-level overlay: one global overlay and several local overlays.
- The global overlay is the main repository of data, and any DHT protocol can be used for it. The global overlay also helps nodes organize themselves into local overlays.
- Local overlays exploit the organization of the Internet into ASs and use a modified version of Pastry.
- The size of each local overlay is controlled by a local overlay leader.
- Efficient distributed algorithms are used for merging and splitting local overlays.

Plethora – Data Access
[Figure: data access in Plethora]

Plethora – Local Overlay (LO) Routing Information
Routing corrects a single bit at each hop. Each node has a routing table and a leaf set, as in Pastry. Each routing table entry has pointers to a primary and a secondary neighbor. Primary neighbors are used to implement proximity neighbor selection; secondary neighbors are used to implement the local overlay split operation.
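A sketch of such a routing table, assuming 8-bit IDs for readability, a caller-supplied latency estimate, and a caller-supplied AS hash. Row `r` holds nodes that share the first `r` bits with this node and differ at bit `r`, so forwarding to that row corrects exactly one bit. Choosing the lowest-latency candidate as primary and the lowest-latency same-AS-hash candidate as secondary is an assumption about how the two pointers are filled; all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

ID_BITS = 8  # tiny ID length for illustration


def first_diff_bit(a: int, b: int) -> Optional[int]:
    """Index (0 = most significant) of the first bit where a and b differ, or None."""
    x = a ^ b
    return None if x == 0 else ID_BITS - x.bit_length()


@dataclass
class Entry:
    primary: Optional[int] = None    # lowest-latency candidate (proximity neighbor selection)
    secondary: Optional[int] = None  # candidate with my AS hash value (kept for splits)


def build_table(me: int, candidates: list[int],
                latency: Callable[[int], float],
                as_hash: Callable[[int], int]) -> list[Entry]:
    table = [Entry() for _ in range(ID_BITS)]
    for row in range(ID_BITS):
        # Row `row`: nodes sharing my first `row` bits and differing at bit `row`.
        fits = [c for c in candidates if first_diff_bit(me, c) == row]
        if not fits:
            continue
        table[row].primary = min(fits, key=latency)
        same = [c for c in fits if as_hash(c) == as_hash(me)]
        if same:
            table[row].secondary = min(same, key=latency)
    return table


def next_hop(me: int, key: int, table: list[Entry]) -> Optional[int]:
    """Correct the first differing bit; None means key == me or the row is empty."""
    row = first_diff_bit(me, key)
    return None if row is None else table[row].primary
```

Keeping a same-AS-hash pointer in every row ahead of time means that a split only has to discard pointers, not search for new ones.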

Node Arrivals
When joining the network, a node first joins the global overlay using the global overlay's DHT protocol. The new node then contacts the rendezvous point of its AS to determine which local overlay to join. When no node of its own AS is in the network yet, the new node uses its AS neighborhood information to join the local overlay of another AS.
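A sketch of the local-overlay choice during a join. The slides do not say how the rendezvous point is located; here it is assumed to be the global-overlay entry stored under a key derived from the AS number, with a plain dict standing in for the global DHT. `rendezvous_key`, `choose_local_overlay`, and the "LO-<asn>" naming of a new overlay are all hypothetical.

```python
from __future__ import annotations

import hashlib


def rendezvous_key(asn: int) -> str:
    """Hypothetical global-overlay key under which AS `asn` records its local overlay."""
    return hashlib.sha1(f"AS{asn}".encode()).hexdigest()


def choose_local_overlay(asn: int, as_neighbors: list[int],
                         global_dht: dict[str, str]) -> str:
    """Return the local overlay a new node in AS `asn` should join."""
    lo = global_dht.get(rendezvous_key(asn))
    if lo is not None:
        return lo                                  # my AS already has a local overlay
    for nbr in as_neighbors:                       # no node of my AS yet: try neighbor ASs
        lo = global_dht.get(rendezvous_key(nbr))
        if lo is not None:
            global_dht[rendezvous_key(asn)] = lo   # record it for later arrivals from my AS
            return lo
    new_lo = f"LO-{asn}"                           # assumption: otherwise start a new overlay
    global_dht[rendezvous_key(asn)] = new_lo
    return new_lo
```

Because the AS-to-overlay mapping lives in the global overlay, any node can resolve it with one global lookup, whichever DHT protocol the global overlay uses.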

Splitting Local Overlays
AS invariant: nodes of the same autonomous system must always stay in the same local overlay after a split operation.

Splitting Local Overlays
Nodes apply a hash function to their AS numbers to determine which nodes will stay together in the same local overlay after a split operation. During normal operation, nodes make the secondary neighbor pointers in their routing tables point to nodes with the same AS hash value. The local overlay leader periodically circulates a message to count the nodes in the LO. If the count exceeds the maximum threshold, the leader issues a split message to all nodes in the LO. On receiving a split message, a node n discards pointers to nodes whose hash values differ from its own.
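A sketch of the split step, assuming a 1-bit AS hash (the slides do not name the hash function; the SHA-1-based `as_hash_bit` below is a hypothetical choice) and a routing table given as (primary, secondary) pairs. Because the hash is computed from the AS number, all nodes of one AS land on the same side, which preserves the AS invariant. Letting the secondary pointer take over when the primary is discarded is an assumption consistent with the role of secondary neighbors described above.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional


def as_hash_bit(asn: int) -> int:
    """Hypothetical 1-bit AS hash: lowest bit of the last SHA-1 byte of the AS number."""
    return hashlib.sha1(str(asn).encode()).digest()[-1] & 1


def split_members(members: dict[int, int],
                  h: Callable[[int], int] = as_hash_bit) -> tuple[set[int], set[int]]:
    """members maps node ID -> AS number; returns the two new local overlays."""
    return ({n for n, a in members.items() if h(a) == 0},
            {n for n, a in members.items() if h(a) == 1})


def split_table(my_asn: int,
                table: list[tuple[Optional[int], Optional[int]]],
                node_asn: dict[int, int],
                h: Callable[[int], int] = as_hash_bit) -> list[Optional[int]]:
    """On a split message, drop pointers to nodes whose AS hash differs from mine.
    Returns the surviving pointer for each routing table row."""
    mine = h(my_asn)
    result = []
    for primary, secondary in table:
        keep = [n for n in (primary, secondary)
                if n is not None and h(node_asn[n]) == mine]
        result.append(keep[0] if keep else None)  # secondary takes over if primary is dropped
    return result
```

No messages beyond the split notification are needed: each node decides locally which pointers survive.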

Splitting Local Overlays
Lemma: After a split operation, the two new local overlays are connected with high probability.
- Set the leaf set size to K log M, where M is the maximum number of nodes allowed in a local overlay and K is a constant greater than 1.
- Assuming that the hash values 0 and 1 are equally likely, the probability that a node n is disconnected (every member of its leaf set has the other hash value) is $(1/2)^{K \log_2 M} = M^{-K}$.
- The probability that n is in a connected overlay is therefore $1 - M^{-K}$.
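A short union-bound sketch extending the per-node bound to the whole local overlay. It assumes log base 2, independent and equally likely hash values for the leaf-set members, and that "disconnected" means none of n's K log M leaf-set members shares its hash value. It bounds only this isolation event and is not the original proof.

```latex
\begin{aligned}
\Pr[\text{$n$ isolated}] &= \left(\tfrac{1}{2}\right)^{K \log_2 M} = M^{-K} \\
\Pr[\text{some node isolated}] &\le M \cdot M^{-K} = M^{1-K} \\
\text{e.g. } M = 1000,\ K = 2: &\quad M^{1-K} = 10^{-3}
\end{aligned}
```

Since K > 1, the bound M^{1-K} goes to 0 as M grows, which is where the requirement on K comes from in this sketch.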

Node Departures
Node departures are handled lazily. If a node detects that one of its neighbors has left the network, it routes using alternative mechanisms (for example, the leaf set) and tries to find a replacement for the missing node. If the local overlay leader leaves, the first node that detects its departure triggers a new leader election.
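A sketch of lazy repair at forwarding time, assuming a routing table of (primary, secondary) pairs, a caller-supplied liveness check, and 8-bit IDs. Trying the secondary before the leaf set, and using plain (non-circular) numeric distance in the leaf-set fallback, are simplifying assumptions; the names are hypothetical. Dead neighbors are only queued for replacement, not repaired on the spot.

```python
from __future__ import annotations

from typing import Callable, Optional

ID_BITS = 8


def first_diff_bit(a: int, b: int) -> Optional[int]:
    x = a ^ b
    return None if x == 0 else ID_BITS - x.bit_length()


def next_hop_lazy(me: int, key: int,
                  table: list[tuple[Optional[int], Optional[int]]],
                  leaf_set: list[int],
                  alive: Callable[[int], bool],
                  repair_queue: list[tuple[int, int]]) -> Optional[int]:
    row = first_diff_bit(me, key)
    if row is None:
        return None                       # the key is my own ID
    for n in table[row]:                  # try primary, then secondary
        if n is None:
            continue
        if alive(n):
            return n
        repair_queue.append((row, n))     # dead neighbor: find a replacement later
    # Fall back to the live leaf-set member numerically closest to the key,
    # but only if it is closer to the key than this node is.
    live = [n for n in leaf_set if alive(n)]
    if not live:
        return None
    best = min(live, key=lambda n: abs(n - key))
    return best if abs(best - key) < abs(me - key) else None
```

Deferring repair keeps departures cheap: no message is sent until a route actually touches the missing node.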

Merging Local Overlays
If the sizes of the two overlays differ by more than a constant factor α, the nodes of the smaller overlay are simply inserted into the larger overlay. If the sizes are within a constant factor α, a distributed algorithm based on hypercube merging is used, analogous to merging two hypercubes of dimension d to produce a hypercube of dimension d+1. On receiving a merge message, nodes add a new row to their routing tables.
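A sketch of both merge cases, assuming that "differ by more than α" means the larger size exceeds α times the smaller, and that each overlay is a complete d-dimensional hypercube in which row i of node x points to x with bit i flipped (bit 0 being the most significant). Prepending a side bit to every ID keeps the old rows valid and adds one new top row pointing to the counterpart in the other overlay. The function names are hypothetical.

```python
from __future__ import annotations


def merge_plan(size_a: int, size_b: int, alpha: float) -> str:
    """Pick the merge strategy from the two overlay sizes."""
    small, large = sorted((size_a, size_b))
    return "insert" if large > alpha * small else "hypercube"


def hypercube_tables(d: int) -> dict[int, list[int]]:
    """Complete hypercube of dimension d: row i flips bit i (0 = most significant)."""
    return {x: [x ^ (1 << (d - 1 - i)) for i in range(d)] for x in range(1 << d)}


def hypercube_merge(tables_a: dict[int, list[int]],
                    tables_b: dict[int, list[int]], d: int) -> dict[int, list[int]]:
    """Merge two d-dimensional overlays into one (d+1)-dimensional overlay."""
    merged = {}
    for side, tables, other in ((0, tables_a, 1), (1, tables_b, 0)):
        for old_id, rows in tables.items():
            new_id = (side << d) | old_id
            new_top_row = (other << d) | old_id          # counterpart in the other overlay
            old_rows = [(side << d) | n for n in rows]   # existing rows gain the side bit
            merged[new_id] = [new_top_row] + old_rows
    return merged
```

Merging two 2-dimensional hypercubes this way yields exactly the 3-dimensional hypercube, and each node changes only its own ID prefix and one new routing table row.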

Merging Local Overlays
[Figures: routing table and leaf sets (L1, L2) of node 0 during a merge]

Simulation Setup
- Internet topology generated with the GT-ITM topology generator: 10 transit domains, 1,000 stub domains, and 100,000 hosts. Each stub domain is one AS.
- 10,000 overlay nodes selected randomly from the hosts.
- NLANR web proxy trace with 500,254 objects.
- Zipf distribution parameters: {0.70, 0.75, 0.80, 0.85, 0.90}.
- Maximum overlay sizes: {200; 300; 400; 500; 1,000; 2,000}.
- Local cache size: 5 MB (LRU replacement policy).
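The workload parameters above can be explored in miniature. The sketch below draws Zipf-like requests (the object of rank i is requested with probability proportional to 1/i^alpha) and serves them from an LRU cache bounded in bytes. It assumes a fixed, hypothetical object size because the trace's object sizes are not given here; it does not model Plethora's routing and is not the authors' simulator.

```python
from __future__ import annotations

import random
from collections import OrderedDict
from typing import Callable, Hashable


def zipf_weights(n: int, alpha: float) -> list[float]:
    """Unnormalized Zipf-like popularity: rank i (1-based) gets weight 1 / i**alpha."""
    return [1.0 / (i ** alpha) for i in range(1, n + 1)]


class ByteLRU:
    """LRU cache whose capacity is a total size in bytes."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.items: OrderedDict[Hashable, int] = OrderedDict()  # key -> size, LRU first

    def get(self, key: Hashable) -> bool:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        return False

    def put(self, key: Hashable, size: int) -> None:
        if size > self.capacity:
            return  # larger than the whole cache: do not cache
        if key in self.items:
            self.used -= self.items.pop(key)
        while self.used + size > self.capacity:
            _, old_size = self.items.popitem(last=False)  # evict least recently used
            self.used -= old_size
        self.items[key] = size
        self.used += size


def hit_ratio(n_objects: int, alpha: float, n_requests: int, capacity_bytes: int,
              size_of: Callable[[int], int], seed: int = 0) -> float:
    rng = random.Random(seed)
    requests = rng.choices(range(n_objects), weights=zipf_weights(n_objects, alpha),
                           k=n_requests)
    cache = ByteLRU(capacity_bytes)
    hits = 0
    for key in requests:
        if cache.get(key):
            hits += 1
        else:
            cache.put(key, size_of(key))
    return hits / n_requests
```

For the 5 MB local cache, `capacity_bytes` would be `5 * 1024 * 1024` (assuming binary megabytes); running `hit_ratio` for each Zipf parameter in the list shows how skew affects local hit rates under these assumptions.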

Simulation Results
[Figures not reproduced: response delay (three slides), number of messages (two slides), split operation, and merge operation]

Conclusions
Using IP addresses as virtual IDs would probably produce overlays with good locality properties, but the non-uniform population of nodes in the IP space leads to severe load imbalances, and there is no guarantee on the number of hops. Plethora is a two-level overlay architecture in which local overlays cluster nodes that are close in the underlying network. Plethora uses efficient distributed algorithms for merging and splitting local overlays. The performance gains of the two-level architecture are significant compared with a single global overlay, and the costs of maintaining it are very low.

Future Work
Short-term goal: develop a cache replacement policy that uses node availability as a parameter. Long-term goal: implement a version-based, wide-area, read-write distributed file system using Plethora as its routing core.