Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)

Disclaimer
Results, statements, and opinions in this talk do not represent Cisco in any way.
This talk is about technical problems in networking, and does not discuss moral, legal, or other issues related to P2P networks and their applications.

Outline
Brief survey of P2P architectures
Evaluation methodologies
Search methods
Replication strategies and analysis
Simulation results

Characteristics of Peer-to-Peer Networks
Unregulated overlay network
Current application: file swapping
Dynamic: nodes join or leave frequently
Example systems:
–Napster, Gnutella
–Freenet, FreeHaven, Mojo Nation, Alpine, ...
–JXTA, Ohaha, …
–Chord, CAN, “Past”, “Tapestry”, Oceanstore

Architecture Comparisons
Napster: centralized
–A central website holds the file directory of all participants
–Very efficient
–Scales
–Problem: single point of failure
Gnutella: decentralized
–No central directory; uses “flooding w/ TTL”
–Very resilient against failure
–Problem: doesn’t scale

Architecture Comparisons
Various research projects such as CAN: decentralized, but “structured”
–CAN: distributed hash table
–“Structure”: all nodes participate in a precise scheme to maintain certain invariants
–Extra work when nodes join and leave
–Scales very well, but can be fragile

Architecture Comparisons
FreeNet: decentralized, but semi-structured
–Intended for file storage
–Files are stored along a route biased by hints
–Queries for files follow a route biased by the same hints
–Scales very well
–Problem: would it really work? Simulation says yes in most cases, but no proof so far

Our Focus: Gnutella-Style Systems
Advantages of Gnutella:
–Supports more flexible queries (typically, precise “name” search is a small portion of all queries)
–Simplicity, high resilience against node failures
Problems of Gnutella: scalability
–Bottleneck: interrupt rates on individual nodes
–Self-limiting network: nodes have to exit to get real work done!

Evaluation Methodologies
Simulation based:
–Network topology
–Distribution of object popularity
–Distribution of replication density of objects

Evaluation Methods
Network topologies:
–Uniform Random Graph (Random): average and median node degree is 4
–Power-Law Random Graph (PLRG): max node degree 1746, median 1, average 4.46
–Gnutella network snapshot (Gnutella): Oct 2000 snapshot; max degree 136, median 2, average 5.5
–Two-dimensional grid (Grid)
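
The degree figures quoted above (max, median, average) can be computed for any topology stored as an adjacency map. The following is a minimal sketch, not the generator used in the talk: it assumes a uniform random graph built by picking distinct edges at random, and the names `random_graph` and `degree_stats` are hypothetical. With 1000 nodes and 2000 edges the average degree is exactly 4; the graph is not guaranteed to be connected.

```python
import random
from statistics import mean, median


def random_graph(num_nodes, num_edges, seed=0):
    """Build an undirected graph with num_edges distinct random edges (assumed model)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(num_nodes)}
    edges = 0
    while edges < num_edges:
        a, b = rng.sample(range(num_nodes), 2)  # two distinct endpoints
        if b not in adj[a]:                     # skip edges that already exist
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj


def degree_stats(adj):
    """Return (max, median, average) node degree of an adjacency map."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    return max(degrees), median(degrees), mean(degrees)


adj = random_graph(num_nodes=1000, num_edges=2000)
max_deg, med_deg, avg_deg = degree_stats(adj)
print(max_deg, med_deg, avg_deg)
```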

Modeling Methods
Object popularity distribution $p_i$
–Uniform
–Zipf-like
Object replication density distribution $r_i$
–Uniform
–Proportional: $r_i \propto p_i$
–Square-Root: $r_i \propto \sqrt{p_i}$
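
As an illustration of these modeling choices, the sketch below builds a Zipf-like popularity vector and the three replication distributions. It rests on assumptions not stated in the slides: the Zipf exponent `alpha`, the total replica budget `total_copies`, and fractional (non-integer) replica counts. The function names are hypothetical.

```python
def zipf_like(num_objects, alpha=1.0):
    """Popularity p_i proportional to 1 / rank^alpha, normalized to sum to 1."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def replication(p, total_copies, strategy):
    """Split total_copies across objects; r_i may be fractional in this sketch."""
    if strategy == "uniform":
        weights = [1.0] * len(p)
    elif strategy == "proportional":
        weights = list(p)
    elif strategy == "square-root":
        weights = [pi ** 0.5 for pi in p]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    scale = total_copies / sum(weights)
    return [w * scale for w in weights]


p = zipf_like(num_objects=10)
for name in ("uniform", "proportional", "square-root"):
    r = replication(p, total_copies=100.0, strategy=name)
    print(name, [round(x, 1) for x in r])
```

Square-root replication sits between the other two: popular objects get more copies than under uniform, but fewer than under proportional.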

Evaluation Metrics
Overhead: average # of messages per node per query
Probability of search success: Pr(success)
Delay: # of hops till success

Load on Individual Nodes
Why is a node interrupted:
–To process a query
–To route the query to other nodes
–To process duplicated queries sent to it

Duplication in Flooding-Based Searches
Duplication increases as TTL increases in flooding
Worst case: a node A is interrupted by N * q * degree(A) messages
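
To see why duplicates grow with TTL, here is a minimal sketch of TTL-limited flooding that counts every message and every duplicate delivery. It assumes a simplified forwarding rule (a node forwards to all neighbors except the one the query came from, and drops queries it has already seen); the function name `flood` and the 4-node ring are hypothetical.

```python
def flood(adj, source, ttl):
    """Flood from source for ttl hops; return (nodes_reached, messages, duplicates)."""
    seen = {source}
    frontier = [(source, None)]  # (node, neighbor the query arrived from)
    messages = 0
    duplicates = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for neighbor in adj[node]:
                if neighbor == sender:
                    continue              # do not send the query straight back
                messages += 1
                if neighbor in seen:
                    duplicates += 1       # receiver is interrupted but drops the query
                else:
                    seen.add(neighbor)
                    next_frontier.append((neighbor, node))
        frontier = next_frontier
    return len(seen), messages, duplicates


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for ttl in (1, 2, 3):
    print(ttl, flood(ring, 0, ttl))
```

On the ring, the duplicate count rises with each extra hop even though no new nodes are reached after the second hop.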

Duplications in Various Network Topologies

Relationship between TTL and Search Successes

Problems with Simple TTL-Based Flooding
Hard to choose TTL:
–For objects that are widely present in the network, small TTLs suffice
–For objects that are rare in the network, large TTLs are necessary
Number of query messages grows exponentially as TTL grows

Idea #1: Adaptively Adjust TTL
“Expanding Ring”
–Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds
Success varies by network topology
–For “Random”, 30- to 70-fold reduction in message traffic
–For Power-law and Gnutella graphs, only 3- to 9-fold reduction
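
The expanding-ring schedule can be sketched as repeated floods with TTL 1, 3, 5, and so on, summing the messages of every attempt. This is an illustrative sketch only: the cap `max_ttl`, the `holders` set of nodes storing the object, and both function names are assumptions, and the flood uses the same simplified forwarding rule as any basic TTL flood (no send-back to the sender, seen queries dropped).

```python
def flood_search(adj, source, holders, ttl):
    """One TTL-limited flood; return (found, messages_sent)."""
    seen = {source}
    frontier = [(source, None)]
    messages = 0
    found = source in holders
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for neighbor in adj[node]:
                if neighbor == sender:
                    continue
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    found = found or neighbor in holders
                    next_frontier.append((neighbor, node))
        frontier = next_frontier
    return found, messages


def expanding_ring(adj, source, holders, max_ttl, start_ttl=1, step=2):
    """Flood with growing TTL until the object is found; return (ttl_used, total_messages)."""
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        found, messages = flood_search(adj, source, holders, ttl)
        total += messages            # every failed attempt still costs messages
        if found:
            return ttl, total
        ttl += step
    return None, total


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(expanding_ring(path, source=0, holders={3}, max_ttl=7))
```

Each retry repeats the work of the previous flood, which is the price paid for not having to guess the TTL up front.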

Limitations of Expanding Ring

Idea #2: Random Walk
Simple random walk
–Takes too long to find anything!
Multiple-walker random walk
–N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps
–When to terminate the search: check back with the query originator once every C steps
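
A minimal sketch of the multiple-walker random walk with periodic check-back follows. Several details are assumptions made for illustration rather than taken from the talk: a walker that hits a holder stops immediately, each check-back costs one message per walker still running, every node has at least one neighbor, and the parameter names (`walkers` for N, `check_every` for C, `max_steps`) are hypothetical.

```python
import random


def k_walker_search(adj, source, holders, walkers=16, check_every=4,
                    max_steps=1024, seed=0):
    """Run several random walkers from source; return (found, messages)."""
    rng = random.Random(seed)
    if source in holders:
        return True, 0
    active = [source] * walkers       # current position of each running walker
    messages = 0
    found = False
    for step in range(1, max_steps + 1):
        still_walking = []
        for node in active:
            nxt = rng.choice(adj[node])   # forward the query to one random neighbor
            messages += 1
            if nxt in holders:
                found = True              # this walker reports success and stops
            else:
                still_walking.append(nxt)
        active = still_walking
        if step % check_every == 0:
            messages += len(active)       # assumed cost: one check-back per walker
            if found:
                break                     # originator tells the others to stop
        if not active:
            break
    return found, messages


ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
print(k_walker_search(ring, source=0, holders={10}, walkers=4))
```

Unlike flooding, the message count per step is bounded by the number of walkers, so it grows linearly rather than exponentially with search depth.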

Search Traffic Comparison

Search Delay Comparison

Lessons Learnt about Search Methods
Adaptive termination
Minimize message duplication
Small expansion in each step

Flexible Replication
In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
Limited node storage => what’s the optimal replication density distribution?
–In Gnutella, only nodes that query an object store it => $r_i \propto p_i$
–What if we have different replication strategies?

Optimal $r_i$ Distribution
Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$
Calculation:
–Introduce a Lagrange multiplier $\lambda$; find $r_i$ and $\lambda$ that minimize $\sum_i p_i / r_i + \lambda \left( \sum_i r_i - R \right)$
=> $\lambda - p_i / r_i^2 = 0$ for all $i$
=> $r_i \propto \sqrt{p_i}$
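
The calculation on this slide can be written out step by step. The worked derivation below is a sketch under the assumption that each $r_i$ is a positive real number (integer and per-node capacity constraints are ignored); it also gives the normalizing constant and the resulting minimum, which the slide leaves implicit.

```latex
\begin{align*}
\mathcal{L}(r, \lambda) &= \sum_i \frac{p_i}{r_i} + \lambda \Big( \sum_i r_i - R \Big) \\
\frac{\partial \mathcal{L}}{\partial r_i} &= -\frac{p_i}{r_i^2} + \lambda = 0
  \;\Longrightarrow\; r_i = \sqrt{p_i / \lambda} \\
\sum_i r_i = R &\;\Longrightarrow\; \sqrt{\lambda} = \frac{1}{R} \sum_j \sqrt{p_j} \\
r_i &= R \, \frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}}, \qquad
\sum_i \frac{p_i}{r_i} = \frac{1}{R} \Big( \sum_i \sqrt{p_i} \Big)^2
\end{align*}
```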

Square-Root Distribution
General principle: to minimize $\sum_i p_i / r_i$ under the constraint $\sum_i r_i = R$, make $r_i$ proportional to the square root of $p_i$
Other application examples:
–Bandwidth allocation to minimize expected download times
–Server load balancing to minimize expected request latency
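
The principle can be checked numerically. The sketch below evaluates the objective $\sum_i p_i / r_i$ for uniform, proportional, and square-root allocations under an assumed Zipf-like popularity with exponent 1, 100 objects, and a budget of 1000 fractional copies; all of these numbers and the helper names are illustrative assumptions. For any popularity vector, uniform and proportional allocations both give exactly (number of objects) / R, and square-root gives a smaller value.

```python
def objective(p, r):
    """The quantity being minimized: sum of p_i / r_i."""
    return sum(pi / ri for pi, ri in zip(p, r))


def allocate(weights, total_copies):
    """Scale weights so the allocation sums to total_copies."""
    scale = total_copies / sum(weights)
    return [w * scale for w in weights]


num_objects = 100
total_copies = 1000.0

raw = [1.0 / rank for rank in range(1, num_objects + 1)]  # assumed Zipf-like, exponent 1
norm = sum(raw)
p = [w / norm for w in raw]

costs = {
    "uniform": objective(p, allocate([1.0] * num_objects, total_copies)),
    "proportional": objective(p, allocate(p, total_copies)),
    "square-root": objective(p, allocate([pi ** 0.5 for pi in p], total_copies)),
}
for name, cost in costs.items():
    print(f"{name:12s} {cost:.4f}")
```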

Achieving Square-Root Distribution
Suggestions from some heuristics
–Store an object at a number of nodes proportional to the number of nodes visited in order to find the object
–Each node uses random replacement
Two implementations:
–Path replication: store the object along the path of a successful “walk”
–Random replication: store the object randomly among nodes visited by the agents
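
The two implementations differ only in which nodes receive the new copies. The sketch below is one possible reading, with several assumptions: walker paths are given as lists of node ids, random replication places as many copies as path replication would, each node's cache is a list with capacity of at least 1, and all function names are hypothetical.

```python
import random


def path_replication(successful_path):
    """Distinct nodes on the successful walk, in visiting order."""
    return list(dict.fromkeys(successful_path))


def random_replication(visited_by_agents, successful_path, rng):
    """Same number of copies as path replication, placed uniformly among all visited nodes."""
    count = len(set(successful_path))
    candidates = sorted(set(visited_by_agents))
    return rng.sample(candidates, min(count, len(candidates)))


def store(cache, obj, capacity, rng):
    """Insert obj into a node's cache, evicting a random entry when the cache is full."""
    if obj in cache:
        return
    if len(cache) >= capacity:
        cache.pop(rng.randrange(len(cache)))  # random replacement
    cache.append(obj)


rng = random.Random(0)
walks = [[0, 3, 5, 3, 7], [0, 2, 4, 6, 8], [0, 1, 9, 1, 0]]  # hypothetical walker paths
successful = walks[0]
visited = [node for walk in walks for node in walk]

path_targets = path_replication(successful)
random_targets = random_replication(visited, successful, rng)
print(path_targets, random_targets)
```

In both variants the number of new copies tracks the length of the search, which is the heuristic stated on the slide.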

Evaluation of Replication Methods
Metrics
–Overall message traffic
–Search delay
Dynamic simulation
–Assume Zipf-like object query probability
–5 queries/sec Poisson arrival
–Results are measured during 5000 sec to 9000 sec
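
A query workload with these properties can be generated as follows. This is a sketch of the workload only, not of the simulator: the number of objects, the Zipf exponent `alpha`, and the function name `query_workload` are assumptions. Poisson arrivals are produced by drawing exponential inter-arrival times, and only queries inside the 5000 to 9000 second window are kept for measurement.

```python
import random
from itertools import accumulate


def query_workload(num_objects, rate_per_sec, duration_sec, alpha=1.0, seed=0):
    """Return a list of (time, object_id) queries with Poisson arrivals."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    cum_weights = list(accumulate(weights))      # Zipf-like popularity, cumulative form
    now = 0.0
    queries = []
    while True:
        now += rng.expovariate(rate_per_sec)     # exponential gap => Poisson process
        if now >= duration_sec:
            break
        obj = rng.choices(range(num_objects), cum_weights=cum_weights)[0]
        queries.append((now, obj))
    return queries


queries = query_workload(num_objects=100, rate_per_sec=5.0, duration_sec=9000.0)
window = [(t, obj) for t, obj in queries if 5000.0 <= t < 9000.0]
print(len(queries), len(window))
```

Discarding the first 5000 seconds lets the replica distribution settle before measurements are taken.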

Distribution of r i

Total Search Message Comparison
Observation: path replication is slightly inferior to random replication

Search Delay Comparison

Summary
Multi-walker random walk scales much better than flooding
–It won’t scale as perfectly as a structured network, but current unstructured networks can be improved significantly
Square-root replication distribution is desirable and can be achieved via path replication