Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.

Outline
-- Background
-- Design of Interest-based Locality
-- Simulation of Interest-based Locality
-- Enhancement of Interest-based Locality
-- Understanding the scheme

Background: 3 types of P2P systems
-- Centralized P2P: Napster
-- Decentralized unstructured: Gnutella
-- Decentralized structured: Distributed Hash Table (DHT)

Background (Gnutella)
-- Peers are connected randomly, and searching is done by flooding.
-- Flooding allows keyword search.
-- [Figure: searching for an MP3 file in a Gnutella network. The query is flooded across the network.]
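The flooding search described above can be sketched as a TTL-limited breadth-first traversal. This is a minimal illustration, not Gnutella's actual protocol: the adjacency-dict graph, the `has_file` predicate and the way messages are counted (one per forwarded copy, including duplicates the receiver would drop) are all assumptions made for the example.

```python
from collections import deque


def flood_search(graph, start, has_file, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    graph    -- dict mapping each peer to a list of its neighbours (assumed format)
    has_file -- hypothetical predicate: has_file(peer) is True if the peer matches
    Returns (peers that matched, number of query messages sent).
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    hits = []
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != start and has_file(peer):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL expired: this peer does not forward the query
        for neighbour in graph[peer]:
            messages += 1  # every forwarded copy costs one message
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return hits, messages
```

Raising `ttl` widens the set of peers reached, but the message count grows with every extra hop, which is the cost the shortcut scheme tries to avoid.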

Background (DHT: Chord)
-- Given a key, Chord maps the key to the node responsible for it.
-- Each node needs to maintain O(log N) routing information.
-- Each query uses O(log N) messages.
-- Key search means searching by exact name.
-- [Figure: a Chord ring with about 50 nodes. The black lines point to adjacent nodes, while the red lines are “finger” pointers that allow a node to find a key in O(log N) time.]
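To make the finger-pointer idea concrete, here is a small sketch of key lookup on a static Chord-style ring. It is a simplification under stated assumptions: a 6-bit identifier space (`M = 6`), finger tables computed from a global view of the ring, and no joins, failures or stabilization. The class and function names are invented for the example.

```python
import bisect

M = 6            # identifier bits (assumption for the example)
RING = 2 ** M    # identifiers live in [0, RING)


def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # finger i of node n points at the first node at or after n + 2^i
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        """The node responsible for identifier `ident` (global view)."""
        i = bisect.bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start, key):
        """Route from `start` using finger tables only.

        Returns (responsible node, number of times the query was forwarded).
        """
        node, hops = start, 0
        while True:
            if key == node:
                return node, hops
            succ = self.fingers[node][0]
            if in_interval(key, node, succ, inclusive_right=True):
                return succ, hops
            # forward to the farthest finger that still precedes the key
            for finger in reversed(self.fingers[node]):
                if in_interval(finger, node, key):
                    node = finger
                    break
            hops += 1
```

Each forwarding step jumps to the farthest known node that does not overshoot the key, which is what keeps the number of messages logarithmic in the ring size.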

Interest-based Locality
-- Peers with similar interests tend to share similar content.

Architecture
-- Shortcuts are modular.
-- Shortcuts are performance-enhancement hints.

Creation of shortcuts
-- A peer uses the underlying topology (e.g. Gnutella) for its first few searches.
-- One of the peers returned by a search is selected at random and added to the shortcut list.
-- Shortcuts are ordered by a metric, e.g. success rate or path latency.
-- Subsequent queries go through the shortcut list first. If that fails, the lookup falls back to the underlying topology.
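The steps above can be sketched as a small class that wraps any underlying search. Everything here is illustrative: the class name, the `underlying_search` and `peer_has` callables, the `max_shortcuts` limit with its eviction rule, and the choice of success rate as the ranking metric are assumptions, not details taken from the slides.

```python
import random


class ShortcutPeer:
    def __init__(self, underlying_search, peer_has, max_shortcuts=10, rng=None):
        self.underlying_search = underlying_search  # item -> list of peers holding it
        self.peer_has = peer_has                    # (peer, item) -> bool
        self.max_shortcuts = max_shortcuts          # hypothetical list-size limit
        self.rng = rng or random.Random()
        self.stats = {}                             # shortcut -> [successes, attempts]

    def ranked_shortcuts(self):
        """Shortcuts ordered by success rate, best first."""
        def rate(peer):
            successes, attempts = self.stats[peer]
            return successes / attempts if attempts else 0.0
        return sorted(self.stats, key=rate, reverse=True)

    def lookup(self, item):
        """Try shortcuts first, then fall back to the underlying topology."""
        for peer in self.ranked_shortcuts():
            self.stats[peer][1] += 1
            if self.peer_has(peer, item):
                self.stats[peer][0] += 1
                return peer, "shortcut"
        providers = self.underlying_search(item)
        if not providers:
            return None, "miss"
        chosen = self.rng.choice(providers)  # pick one returned peer at random
        if chosen not in self.stats:
            if len(self.stats) >= self.max_shortcuts:
                # evict the lowest-ranked shortcut to make room (assumed policy)
                del self.stats[self.ranked_shortcuts()[-1]]
            self.stats[chosen] = [0, 0]
        return chosen, "underlying"
```

Because the shortcut layer only calls `underlying_search` as a black box, the same class could sit on top of a flooding search or a DHT lookup, which matches the modular role the slides give to shortcuts.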

Performance Evaluation
Performance metrics:
-- success rate
-- load characteristics (the number of query packets each peer in the system processes)
-- query scope (the fraction of peers involved in each query)
-- minimum reply path length
-- additional state kept at each node

Methodology – query workload
Create traffic traces from real application traffic:
-- Boeing firewall proxies
-- Microsoft firewall proxies
-- Web traffic between CMU and the Internet, collected passively
-- Typical P2P traffic (KaZaA, Gnutella), collected passively
The simulation uses exact matching rather than keyword matching: “song.mp3” and “my artist – song.mp3” are treated as different files.

Methodology – underlying peer topology
-- Based on the Gnutella connectivity graph from 2001, with 95% of nodes about 7 hops apart.
-- The search TTL is set to 7.
-- For each kind of traffic (Boeing, Microsoft, etc.), run 8 simulations, each 1 hour long.

Methodology – storage and replication modeling (web)
-- The first peer to request a web page is modeled as the first node containing that page.
-- Subsequent searches from other peers find the page at this peer and replicate it.
Example with nodes a, b and c:
-- Node a is the first peer to search for a.html, so it is modeled as the first node containing a.html.
-- Node b retrieves a.html from node a.
-- Node c can retrieve a.html from node a or node b.
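The web replication model above can be replayed over a request trace with a few lines of code. This is a sketch under assumptions: the trace is taken to be a time-ordered list of `(peer, url)` pairs, and the function name and return format are invented for the example.

```python
def replay_web_trace(requests):
    """Replay a time-ordered trace of (peer, url) requests.

    Returns a list of (peer, url, sources), where `sources` is the set of
    peers that could have served the page at the time of the request.
    """
    holders = {}  # url -> set of peers currently holding a copy
    log = []
    for peer, url in requests:
        if url not in holders:
            # first requester is modeled as the first node containing the page
            holders[url] = {peer}
            log.append((peer, url, set()))
        else:
            sources = holders[url] - {peer}
            log.append((peer, url, sources))
            holders[url].add(peer)  # the requester now holds a replica
    return log
```

Replaying the three requests from the slide's example (a, then b, then c, all for a.html) yields an empty source set for a, `{"a"}` for b and `{"a", "b"}` for c.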

Methodology – storage and replication modeling (P2P)
-- If the collected traffic trace shows a file being downloaded at t_0, the file should also have been available for download before t_0.
-- However, if the file is not downloaded during the sampled trace, there is no information to indicate that the file exists.
-- [Figure: timeline running from t_S through t = t_0 to the simulation end (t_E); the file is downloaded from t_0.]

Simulation Results – success rate

Simulation Results – load, scope and path length
-- Query load for Boeing and Microsoft traffic: [table]
-- Query scope for the shortcut scheme is about 0.3%, whereas in Gnutella it is about 100%.
-- Average path length of the traces: [table]

Increase Number of Shortcuts
-- 7% to 12% performance gain
-- Diminishing returns
-- Add all shortcuts at a time, with no limit on the shortcut list size.
-- Add k shortcuts at a time; only 100 shortcuts are used.

Using Shortcuts’ Shortcuts
-- Idea: also add the shortcuts of a peer’s shortcuts.
-- Performance gain of 7% on average.

Interest-based Structures
When the shortcuts are viewed as an undirected graph:
-- In the first 10 minutes, there are many connected components, each containing only a few peers.
-- At the end of the simulation, there are few connected components, each containing several hundred peers.
-- Each component is well connected. The clustering coefficient is about 0.6 to 0.7, which is higher than that of the Web graph.
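The two graph measurements mentioned above, connected components and clustering coefficient, can be computed as follows. The slides do not say which definition of clustering coefficient was used, so this sketch assumes the average local (Watts-Strogatz) form; the adjacency-set graph format and function names are also assumptions.

```python
from itertools import combinations


def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj -- dict mapping each node to the set of its neighbours (assumed format)
    """
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def clustering_coefficient(adj):
    """Average local clustering coefficient (nodes of degree < 2 count as 0)."""
    if not adj:
        return 0.0
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue
        # count the links that exist between pairs of this node's neighbours
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```

A value near 1 means that a peer's shortcuts are mostly shortcuts of each other as well, which is what "well connected" suggests for the clusters described on this slide.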

Web Objects Locality
-- A web page contains several web objects, so locality should exist among these objects.
-- There is a performance drop of 10% when retrieving web objects rather than web pages.
-- The performance is regained when all the shortcuts are exhausted.

Locality Across Publishers
-- Same-publisher shortcuts exhibit low interest-based locality; a peer may in fact be interested in content from different publishers.
-- A same-publisher shortcut is a shortcut that was originally created while accessing content from the same publisher as the current request.

Sensitivity of Shortcuts
-- Run interest-based shortcuts over a DHT (Chord) instead of Gnutella.
-- Query load is reduced by a factor of 2 to 4.
-- Query scope is reduced from 7/N to 1.5/N.

Conclusion
-- Interest-based shortcuts are modular performance-enhancement hints layered over an existing P2P topology.
-- The simulations show that shortcuts improve search efficiency.
-- Shortcuts form clusters within a P2P topology, and the clusters are well connected.