ECE 6102 Qiyu Liu Ethan Trewhitt

PeerCluster: A Cluster-Based Peer-to-Peer System
Xin-Mao Huang, Cheng-Yue Chang, and Ming-Syan Chen, Fellow, IEEE

Agenda
- Background
- Structure
- Functional Protocols
- Structural Protocols
- Scaling
- Performance

Background – Existing P2P Systems
- Centralized system (Napster)
  - Pro: low cost to resolve queries
  - Con: single point of failure
- Decentralized/unstructured (Gnutella)
  - Pro: fault-tolerant, resilient to joins and leaves
  - Con: search mechanism scales poorly
- Decentralized/structured (PeerCluster)
  - Same benefits as decentralized/unstructured systems
  - Cluster structure reduces broadcast flooding

Background – PeerCluster
- Principle of interest grouping
  - A given user has few interests
  - Queries relate to interests
- How to exploit?
  - Logically group users with similar topics
  - Increases query efficiency

Background – PeerCluster

Background – Query Resolution
When a node receives a query:
    if (query topic == present cluster's interest topic) {
        broadcast to all nodes in present cluster    // intracluster broadcasting
    } else {
        route to responsible node in corresponding interest cluster    // intercluster routing
    }
- Intracluster broadcasting and intercluster routing are the main operations in query resolution
- How to implement?

Structure – Hypercube
- Three interests can be implemented with a 5-D hypercube
- Nodes and edges are virtual
- One hypercube address → one computer
- However, one computer → multiple hypercube addresses

Structure – Clusters
- Interest-based
- Realized with hypercubes within the overall system hypercube
- Initial size based on popularity, using Huffman coding

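Structure – Tree Creation
- Assume an n-dimensional hypercube with k different interest topics
- $I_j$: the jth interest topic, where $0 \le j \le k - 1$
- $\mathrm{pop}[I_j]$: popularity of $I_j$, with $0 < \mathrm{pop}[I_j] < 1$ and $\sum_{j=0}^{k-1} \mathrm{pop}[I_j] = 1$
- Construct a Huffman tree based on $\mathrm{pop}[I_j]$
- Cluster size $= 2^{\,n - \mathrm{length}(\mathrm{prefix}[I_j])}$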
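The tree-creation step can be illustrated with a minimal Python sketch. It assumes a standard Huffman construction over the popularities and reads each topic's Huffman code as its cluster prefix. The topic names, the popularity values, and the helper names `huffman_prefixes` and `cluster_sizes` are hypothetical and not taken from the slides.

```python
import heapq
from itertools import count


def huffman_prefixes(pop):
    """Map each topic to a binary prefix using a Huffman tree built on popularity."""
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(p, next(tiebreak), {topic: ""}) for topic, p in pop.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least popular subtrees; prepend one bit to each side.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {t: "0" + code for t, code in left.items()}
        merged.update({t: "1" + code for t, code in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def cluster_sizes(pop, n):
    """Cluster size = 2^(n - length(prefix)) for each topic in an n-D hypercube."""
    return {t: 2 ** (n - len(code)) for t, code in huffman_prefixes(pop).items()}


# Hypothetical popularities for three interests in a 5-D hypercube.
pop = {"music": 0.5, "movies": 0.25, "books": 0.25}
prefixes = huffman_prefixes(pop)
sizes = cluster_sizes(pop, n=5)
```

More popular topics receive shorter prefixes and therefore larger clusters, and the cluster sizes together account for all 2^n addresses.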

Structure – Routing Table
- A routing table is created for each computer
- It must keep track of the mapping of neighboring computers in order to send messages
- $\mathrm{addr}(A)$: addresses owned by computer A
- $\mathrm{NH}(A)$: neighboring hypercube addresses, $\mathrm{NH}(A) = \bigl(\bigcup_{a_i \in \mathrm{addr}(A)} \mathrm{Ne}(a_i)\bigr) - \mathrm{addr}(A)$
- $\mathrm{Ne}(a_i)$ is the set of hypercube addresses adjacent to address $a_i$
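As a rough illustration of the NH(A) definition, the sketch below treats addresses as integers and takes "adjacent" to mean "differs in exactly one bit", which is the usual hypercube convention. The function names `ne` and `nh` and the example addresses are assumptions made for this example.

```python
def ne(addr, n):
    """Ne(a): addresses adjacent to addr, obtained by flipping each of the n bits."""
    return {addr ^ (1 << i) for i in range(n)}


def nh(owned, n):
    """NH(A): union of Ne(a) over all owned addresses, minus the owned addresses."""
    neighbors = set()
    for a in owned:
        neighbors |= ne(a, n)
    return neighbors - set(owned)


# Hypothetical computer A owning two addresses of a 3-D hypercube.
owned_by_a = {0b000, 0b001}
result = nh(owned_by_a, n=3)  # {0b010, 0b011, 0b100, 0b101}
```

Subtracting addr(A) matters when a computer owns adjacent addresses: its own addresses would otherwise appear in its neighbor set.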

Structure – Assigned Tree
- The assigned tree records the number of free addresses in every cluster
- The root address is the lowest address
- Parent and child addresses differ by 1 bit only
- A child address is longer than its parent address
- The present address manages the assignment of its child addresses
- Every address records the number of free addresses of all its children
- Initial number of free addresses of children = total number of subtrees
- When a parent address wants to assign a free address to a joining request, it checks the number of free addresses starting from the lowest address

Functional Protocol – Broadcast
Proc_Broadcast(subq, msg, node_addr, step)
    for (i = step to subq - 1) {
        dest_addr = node_addr xor 2^i;
        send(subq, msg, dest_addr, i + 1);
    }
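A small Python simulation can show how this procedure behaves. It is a sketch under two assumptions: the cluster occupies the low `subq` bits of the address, and each receiver calls the same procedure with the step value it was sent. The `send` function here is a local stand-in for the network, and `broadcast_from` and the `delivered` list are hypothetical helpers added for the simulation.

```python
def proc_broadcast(subq, msg, node_addr, step, delivered):
    """Forward msg along dimensions step..subq-1, following Proc_Broadcast."""
    for i in range(step, subq):
        dest_addr = node_addr ^ (1 << i)  # flip bit i to reach the neighbor
        send(subq, msg, dest_addr, i + 1, delivered)


def send(subq, msg, dest_addr, step, delivered):
    """Stand-in for the network: record delivery, then let the receiver forward."""
    delivered.append(dest_addr)
    proc_broadcast(subq, msg, dest_addr, step, delivered)


def broadcast_from(origin, subq, msg="query"):
    """Start a broadcast at origin and return every address that saw the message."""
    delivered = [origin]
    proc_broadcast(subq, msg, origin, 0, delivered)
    return delivered


reached = broadcast_from(origin=0b101, subq=3)
```

Because each receiver only forwards along dimensions above the one it was reached on, every address in the subcube gets the message exactly once.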

Functional Protocol – Route
Proc_Route(msg, dest_addr, node_addr)
    if (dest_addr != node_addr) {
        i = Compare(dest_addr, node_addr);
        send(msg, dest_addr, node_addr xor 2^i);
    }
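The routing step can be traced hop by hop with the sketch below. The slides do not say which differing bit `Compare` returns, so this example assumes the lowest one. The `route_path` helper, which replaces the network `send` with a loop, is hypothetical.

```python
def compare(dest_addr, node_addr):
    """Index of a bit where the addresses differ (the lowest one, by assumption)."""
    diff = dest_addr ^ node_addr
    return (diff & -diff).bit_length() - 1


def route_path(dest_addr, node_addr):
    """Apply the Proc_Route step repeatedly and collect the addresses visited."""
    path = [node_addr]
    while node_addr != dest_addr:
        i = compare(dest_addr, node_addr)
        node_addr ^= 1 << i  # next hop: fix one differing bit
        path.append(node_addr)
    return path


path = route_path(dest_addr=0b110, node_addr=0b011)  # [0b011, 0b010, 0b110]
```

Each hop corrects one differing bit, so the number of hops equals the Hamming distance between the source and destination addresses.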

JOIN Protocol
1. Joining computer A finds any computer B in the system
2. A asks computer B to find a computer C with the same major interest
3. A asks computer C to find a computer D that holds an available alias address*
4. A takes the available address and notifies its neighbors
5. Computer D notifies its parent nodes that one fewer address is available
*If there are no available addresses, a cluster expansion must be performed

LEAVE Protocol
1. Leaving computer A finds the root node B (smallest address) of the cluster
2. A donates its address (and aliases) to the computer at B
3. Computer B notifies its neighbors that A has left

SEARCH Protocol
1. Searching computer A wants to find something
2. A queries the computer B in the corresponding interest cluster that has the same postfix
3. Computer B broadcasts the query to its cluster
4. Computers in the queried cluster respond directly to A with relevant results

Cluster Expansion
- Runs whenever a computer wants to join but the cluster is full
- Query the utilization rates of neighboring clusters
- Choose a neighboring cluster
- The neighboring cluster splits and loans the upper half of its addresses
- Upper-half addresses rejoin at the lower half

Cluster Expansion Issues
- Expansion and splitting cause partitions
- Clusters are no longer a single hypercube
- System restoration consolidates clusters
- If the cluster can't be expanded or the system is full, the system must be expanded

System Expansion
- Easier than cluster expansion
- Addresses gain an additional bit, so the entire system doubles in size
- Each node becomes two
- Each cluster doubles in size
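The doubling can be sketched in Python as follows. The slides do not say where the extra bit goes or who owns the new addresses. This example assumes the bit is appended at the low-order end, which leaves the cluster prefixes unchanged, and that the current owner keeps both resulting addresses. The `expand_system` helper and the owner names are hypothetical.

```python
def expand_system(owner_of):
    """Append one bit to every address; each owner keeps both resulting addresses."""
    expanded = {}
    for addr, owner in owner_of.items():
        expanded[addr + "0"] = owner  # assumption: new bit is the last (low-order) bit
        expanded[addr + "1"] = owner
    return expanded


# Hypothetical 2-D system: address string -> owning computer.
owner_of = {"00": "A", "01": "B", "10": "C", "11": "C"}
expanded = expand_system(owner_of)
```

Under this assumption, a cluster identified by a prefix keeps that prefix while the number of addresses under it doubles, matching the last bullet above.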

Performance Setup
- Uses data from the Open Directory Project
- Compares Gnutella and PeerCluster
- Determined the "query efficiency", the ratio of files found to query messages sent
- Varied the Search Limit (SL), which acts like a TTL value
- Also varied the number of interest clusters
- Base 4 vs. base 2

Performance

Questions?