Search and Replication in Unstructured Peer-to-Peer Networks. Pei Cao, Christine Lv, Edith Cohen, Kai Li and Scott Shenker. ICS 2002.

Outline
Brief survey of P2P architectures
Evaluation Methodology
Search Methods
Replication
Conclusions

Peer-to-Peer Networks
Peers are connected by an overlay network.
Users cooperate to share files (e.g., music, videos).
Dynamic: nodes join or leave frequently.

P2P Network Architectures I
Centralized:
  – Uses a central directory server (CDS)
  – Peers query the CDS to find other peers that hold the desired object
Pros: very efficient
Cons: scales poorly; single point of failure

P2P Network Architectures II
Decentralized: no central directory server
  – But structured: the P2P network topology is tightly controlled, and files are placed at specified locations
  – Unstructured: no control over network topology or file placement

P2P Network Architectures III
Decentralized but Structured
Loosely structured
  – Placement of files is based on hints
Tightly structured
  – The structure of the P2P network and the placement of files are precisely defined
  – Uses a distributed hash table
Pros: queries are satisfied efficiently; scales well
Cons: no proof that it works

P2P Network Architectures IV
Decentralized and Unstructured
Placement of files is not based on topology knowledge
Finding files
  – A node queries its neighbors (usually by flooding)
Pros: extremely resilient to network changes
Cons: extremely unscalable; generates large loads

Evaluation Methodology I
Terminology
  – Network topology: the graph formed by the nodes in the network at a given instant
  – Query distribution: frequency of lookups for each file
  – Replication distribution: percentage of nodes that hold a particular file

Evaluation Methodology II
Network Topologies
  – Power-Law Random Graph (PLRG): max node degree 1746, median 1, average 4.46
  – Normal Random Graph (Random): average and median node degree is 4
  – Gnutella graph (Gnutella): Oct 2000 snapshot; max degree 136, median 2, average 5.5
  – Two-dimensional Grid: 100x100 nodes

Evaluation Methodology III
Object query distribution $q_i$
  – Uniform
  – Zipf-like
Object replication density distribution $r_i$
  – Uniform
  – Proportional: $r_i \propto q_i$
  – Square-root: $r_i \propto \sqrt{q_i}$
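
The distributions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the function names, the Zipf exponent `alpha`, and the object count of 100 are assumptions chosen for the example.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query probabilities q_i proportional to 1 / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication_shares(q, strategy):
    """Share of all replicas given to each object under one strategy."""
    if strategy == "uniform":
        raw = [1.0] * len(q)
    elif strategy == "proportional":
        raw = list(q)
    elif strategy == "square-root":
        raw = [math.sqrt(x) for x in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(raw)
    return [x / total for x in raw]

# Hypothetical setup: 100 objects, Zipf exponent 1.0.
q = zipf_like(100, alpha=1.0)
for name in ("uniform", "proportional", "square-root"):
    shares = replication_shares(q, name)
    print(f"{name:13s} most popular: {shares[0]:.4f}  least popular: {shares[-1]:.4f}")
```

For the most popular object the square-root share falls between the uniform and proportional shares, and the same holds in the opposite direction for the least popular object.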

Evaluation Methodology IV
Metrics
  – User aspects: Pr(success), #hops
  – Load aspects: average #messages per node, #nodes visited, peak #messages

Limitation of Flooding I
Gnutella uses a TTL to limit the number of hops a query travels
Problem:
  – It is hard to choose the TTL: for objects that are widely present in the network, small TTLs suffice; for objects that are rare, large TTLs are necessary
  – The number of query messages grows exponentially as the TTL grows

Limitation of Flooding II
A node may receive the same message more than once
Duplicate-detection mechanisms are needed
Even so, duplication increases as the TTL increases in flooding
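
A small simulation makes the message growth and duplication concrete. The sketch below assumes a simplified flooding model (each node forwards a first-seen query to every neighbor except the sender and drops duplicates) and a made-up random topology; `random_graph` and `flood` are illustrative names, not part of Gnutella.

```python
import random
from collections import deque

def random_graph(n, degree, seed=0):
    """Undirected graph in which every node links to `degree` random nodes (hypothetical topology)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in rng.sample(range(n), degree):
            if u != v:
                adj[v].add(u)
                adj[u].add(v)
    return adj

def flood(adj, source, ttl):
    """Flood a query from `source`. Returns (messages, nodes_reached, duplicates)."""
    seen = {source}
    # Each queue entry is one message in flight: (sender, receiver, remaining TTL).
    queue = deque((source, nbr, ttl) for nbr in adj[source])
    messages = duplicates = 0
    while queue:
        sender, node, remaining = queue.popleft()
        messages += 1
        if node in seen:
            duplicates += 1          # detected and dropped, but the message was still sent
            continue
        seen.add(node)
        if remaining > 1:
            for nbr in adj[node]:
                if nbr != sender:
                    queue.append((node, nbr, remaining - 1))
    return messages, len(seen), duplicates

adj = random_graph(2000, 2)
for ttl in range(1, 7):
    print(ttl, flood(adj, 0, ttl))
```

Duplicate detection stops a node from forwarding a query twice, but it cannot stop neighbors from sending the duplicate in the first place, which is why the duplicate count still rises with the TTL.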

Limitation of Flooding: Conclusion
Flooding increases per-node overhead
More scalable search methods are needed:
  – Expanding Ring
  – Random Walks

Expanding Ring
Adaptively adjust the TTL
  – Multiple floods: start with TTL=1; increment the TTL by 2 each time until the search succeeds
Still produces duplicate messages
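
The expanding-ring loop can be sketched as follows. The start value and increment come from the slide; the `max_ttl` cutoff, the function names, and the adjacency-dictionary graph format are assumptions added so that the example terminates and runs on its own.

```python
from collections import deque

def nodes_within(adj, source, ttl):
    """Nodes reachable from `source` in at most `ttl` hops (the coverage of one flood)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == ttl:
            continue                 # TTL exhausted, do not forward further
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)

def expanding_ring(adj, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Re-flood with a growing TTL until some holder is covered.

    Returns (ttl_used, floods_issued); ttl_used is None if max_ttl was exceeded.
    """
    floods = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        floods += 1
        if nodes_within(adj, source, ttl) & holders:
            return ttl, floods
        ttl += step
    return None, floods

# Hypothetical 5-node path 0-1-2-3-4, with the object stored on node 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(expanding_ring(path, 0, {3}))
```

Each new flood starts again from the originator, so nodes covered by an earlier ring receive the query again. That is one source of the duplicate messages the slide mentions.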

Random Walk
Simple random walk
  – Takes too long to find anything
Multiple-walker random walk
  – K walkers, each walking T steps, visit about as many nodes as 1 walker walking K*T steps
  – More messages → more overhead
  – When to terminate the search:
      TTL
      Checking: check back with the query originator once every C steps
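
A K-walker search with periodic checking might look like the sketch below. It is a simplification under stated assumptions: every walker keeps moving until the next checkpoint, each check costs one message per walker, the graph is connected, and `max_steps` is a hypothetical safety limit. None of the names come from the original slides.

```python
import random

def k_walker_search(adj, source, holders, walkers=32, check_every=4,
                    max_steps=1024, seed=0):
    """Multiple-walker random walk with checking.

    Returns (first_hit_step, messages). first_hit_step is None if no walker
    reached a holder within max_steps.
    """
    rng = random.Random(seed)
    positions = [source] * walkers
    first_hit = None
    messages = 0
    for step in range(1, max_steps + 1):
        for i in range(walkers):
            # Each walker moves to a uniformly random neighbor (one message per move).
            positions[i] = rng.choice(sorted(adj[positions[i]]))
            messages += 1
            if first_hit is None and positions[i] in holders:
                first_hit = step
        if step % check_every == 0:
            messages += walkers      # assumed cost: one check message per walker
            if first_hit is not None:
                break                # originator reports success, all walkers stop
    return first_hit, messages

# Hypothetical two-node graph: every move from node 0 lands on the holder, node 1.
pair = {0: {1}, 1: {0}}
print(k_walker_search(pair, 0, {1}, walkers=2, check_every=4))
```

In this model the object is found at step 1, but the walkers keep going until the first checkpoint at step 4. The checking interval C therefore trades extra walking against extra check messages.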

Search Traffic Comparison

Search Delay Comparison

Lessons Learned about Search Methods
Key: cover the right number of nodes as quickly as possible and with as little overhead as possible
Pay attention to:
  – Adaptive termination
  – Minimizing message duplication
  – Small expansion in each step

Replication
In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object, so replication density matters
Goal: minimize the average search size (number of probes until the query is satisfied)
Theoretical optimum: copy everything everywhere
  – Not feasible, because node storage is limited

Replication Strategies
Uniform Replication
  – $p_i = 1/m$
  – Simple; resources are divided equally
Proportional Replication
  – $p_i = q_i$
  – "Fair": resources per item are proportional to demand
  – Reflects current P2P practices

Square-Root Replication
$p_i$ is proportional to $\sqrt{q_i}$
Lies "in between" Uniform and Proportional
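
A quick numeric check shows why the square-root allocation is attractive. The sketch assumes random probing, so that an object with r_i copies among n nodes needs about n / r_i probes on average; the three query probabilities and the values of `n_nodes` and `total_copies` are made up for illustration.

```python
import math

def average_search_size(q, p, n_nodes, total_copies):
    """Approximate probes per query: sum of q_i * n / r_i, where r_i = total_copies * p_i."""
    return sum(qi * n_nodes / (total_copies * pi) for qi, pi in zip(q, p))

# Hypothetical workload: three objects with query probabilities 0.6, 0.3, 0.1.
q = [0.6, 0.3, 0.1]
m = len(q)
root_total = sum(math.sqrt(x) for x in q)

allocations = {
    "uniform": [1.0 / m] * m,
    "proportional": list(q),
    "square-root": [math.sqrt(x) / root_total for x in q],
}

sizes = {name: average_search_size(q, p, n_nodes=10000, total_copies=300)
         for name, p in allocations.items()}
for name, size in sizes.items():
    print(f"{name:13s} {size:.2f}")
```

Under this model, uniform and proportional replication give the same average search size, and the square-root allocation gives a smaller one.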

Achieving Square-Root Replication I
Assume that each query keeps track of the number of probes needed
Store an object at a number of nodes proportional to the number of probes
Two implementations:
  – Path replication: store the object along the path of a successful "walk"
  – Random replication: store the object at randomly chosen nodes among those visited by the agents
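
The two placement rules can be sketched as below. The data layout (a dictionary from node to the set of stored objects) and the function names are assumptions for illustration. In both rules the number of new copies equals the length of the successful walk, which is one way to read "proportional to the number of probes".

```python
import random

def path_replication(stores, obj, walk_path):
    """Store `obj` on every node along the successful walk. Returns the number of copies placed."""
    for node in walk_path:
        stores.setdefault(node, set()).add(obj)
    return len(walk_path)

def random_replication(stores, obj, walk_path, visited, rng):
    """Store `obj` on as many nodes as the walk is long, drawn at random from all visited nodes."""
    count = min(len(walk_path), len(visited))
    for node in rng.sample(sorted(visited), count):
        stores.setdefault(node, set()).add(obj)
    return count

# Hypothetical query: the successful walker went 1 -> 2 -> 3, all walkers together visited nodes 1..10.
stores = {}
placed_on_path = path_replication(stores, "song-a", [1, 2, 3])
placed_randomly = random_replication(stores, "song-b", [1, 2, 3],
                                     set(range(1, 11)), random.Random(0))
print(placed_on_path, placed_randomly, stores)
```

Path replication concentrates copies along one route, while random replication spreads the same number of copies over the whole visited set.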

Achieving Square-Root Replication II

Evaluation of Replication Methods I
Metrics
  – Overall message traffic
  – Search delay
Dynamic simulation
  – Assume a Zipf-like object query probability
  – Poisson query arrivals at 5 queries/sec
  – Results are collected from 5000 sec to 9000 sec
  – Search method: 32-walker random walk with state keeping, checking every 4 steps

Evaluation of Replication Methods II
Square-Root Replication reduces search traffic

Evaluation of Replication Methods III

Conclusions
Multi-walker random walk scales much better than flooding
  – Can find data more quickly
  – Reduces the traffic overload
Square-root replication distribution is desirable
  – Minimizes search delay
  – Minimizes the overall search traffic