“A Local Search Mechanism for Peer-to-Peer Networks”

“A Local Search Mechanism for Peer-to-Peer Networks”
Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside) < vana@cs.ucr.edu, dg@cs.ucr.edu, csyiazti@cs.ucr.edu >
CIKM 2002 – Eleventh International Conference on Information and Knowledge Management, November 4-9, McLean, VA
http://www.cs.ucr.edu/~csyiazti/publications.html

Presentation Outline
- Introduction: Information Retrieval (I.R.) in Peer-to-Peer networks.
- Techniques for Distributed I.R.:
  - Breadth-First Search.
  - Random Breadth-First Search.
  - Intelligent Search with profiling.
- Experimental Evaluation.
- Related Work.
- Conclusions & Future Work.

Introduction to Peer-to-Peer
[Figure: the physical topology vs. the virtual P2P topology]
- Peer-to-Peer computing definition: “Sharing of computer resources and information through direct exchange.”
- Clients (downloaders) are also servers.
- Clients may join or leave the network at any time, so the system is highly fault-tolerant, but at a cost.
- Searches are performed within the virtual network, while the actual downloads happen out of band, directly between peers (over HTTP).
- Cost = increased network activity, large scale, heterogeneous resources, geographically distributed peers.

Introduction to Peer-to-Peer
- Peer-to-Peer (P2P) systems are becoming increasingly popular.
- P2P file-sharing systems such as Gnutella, Napster and Freenet provide a distributed infrastructure for sharing files.
- Traditionally, files were shared using the client-server model (e.g., HTTP), which does not scale well because the services are centralized.
- P2P systems offer new advantages in simplicity of use, robustness, self-organization and scalability.
- Advancements in networks and multimedia.

Information Retrieval in P2P
- Problem: “How can information be retrieved efficiently in P2P systems where each node shares a collection of documents?”
- I.R.: build a system that retrieves the documents users are likely to find relevant. Documents consist of keywords.
- The problem resembles classical Information Retrieval, but the resources are now distributed.
- Primary data structures such as global inverted indexes cannot be maintained efficiently.
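To make the data structure concrete, here is a minimal Python sketch of a global inverted index over the documents shared by peers. The peer names, document IDs and keywords are hypothetical toy data. Building the index requires every peer's document list in one place, and every peer that joins, leaves or changes its collection forces an update, which is why such an index is hard to maintain in a P2P setting.

```python
from collections import defaultdict

# Hypothetical toy data: each peer shares a few documents, each a bag of keywords.
peers = {
    "p1": {"d1": ["italy", "earthquake"], "d2": ["elections", "bush"]},
    "p2": {"d3": ["italy", "disaster"]},
}

def build_global_index(peers):
    """Map each keyword to the set of (peer, document) pairs that contain it."""
    index = defaultdict(set)
    for peer, docs in peers.items():
        for doc, words in docs.items():
            for word in words:
                index[word].add((peer, doc))
    return index

index = build_global_index(peers)
print(sorted(index["italy"]))  # [('p1', 'd1'), ('p2', 'd3')]
```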

Solutions for P2P Information Retrieval
1) Centralized approaches: centralized indexes, e.g., Napster, SETI@home.
2) Purely distributed approaches: each node has only local knowledge, and I.R. is done using brute-force mechanisms, e.g., Gnutella, FastTrack (Kazaa).
3) Hybrid approaches: one or more peers hold partial indexes of the contents of others, e.g., LimeWire's Ultrapeers.
[Figure: message flow in the three approaches. Centralized: 1) upload index to the centralized index, 2) Query/QueryHit, 3) download (out of band). Purely distributed: 1) connect, 2) Query/QueryHit, 3) download (out of band). Hybrid: 1) connect, 2) intelligent Query/QueryHit, 3) download (out of band).]

Motivation
- On 1st June we crawled the Gnutella P2P network for 5 hours with 17 workstations and analyzed 15,153,524 query messages.
- Observation: specific queries show high locality.
- Can we exploit this property for more efficient searches?

Techniques for Distributed I.R.
1. Breadth-First Search (Gnutella)
- Each query message is propagated along all outgoing links of a peer, using a TTL (time-to-live).
- The TTL is decremented on each forward until it reaches 0.
- This is the I.R. technique used in P2P systems such as Gnutella.
- Results? The physical network comes to its knees, and search results arrive with long delays.
[Figure: peer q floods a QUERY (1) through P2P network N; peer d returns a QUERYHIT (2).]
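A minimal sketch of TTL-limited flooding, assuming the overlay is given as an adjacency dictionary and that peers drop duplicate copies of a query they have already seen (Gnutella identifies messages by GUID for this purpose). The function name `flood` and the graph representation are illustrative assumptions. The sketch counts every forwarded message, including duplicates, which is where the cost of BFS comes from.

```python
from collections import deque

def flood(graph, source, ttl):
    """TTL-limited flooding: forward to every neighbour except the sender.

    Returns (messages_sent, peers_reached). A peer that receives a query it
    has already seen drops it instead of forwarding it again.
    """
    seen = {source}
    messages = 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        peer, sender, t = frontier.popleft()
        if t == 0:                 # TTL exhausted: stop forwarding
            continue
        for nb in graph[peer]:
            if nb == sender:
                continue
            messages += 1          # every forward costs one message
            if nb not in seen:     # duplicates are counted but not re-forwarded
                seen.add(nb)
                frontier.append((nb, peer, t - 1))
    return messages, seen

triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(flood(triangle, "A", 2)[0])  # 4
```

On the triangle with TTL = 2 the query reaches only two new peers but costs 4 messages, because B and C also forward it to each other.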

Techniques for Distributed I.R.
2. Modified Random BFS
- Each query message is forwarded to only a fraction of the outgoing links (e.g., ½ of them).
- The TTL is again decremented on each forward until it reaches 0.
- Results? Fewer messages, but possibly fewer results.
- The algorithm is probabilistic: some segments of the network may become unreachable.
[Figure: peer q sends a QUERY (1) through P2P network N via a subset of peers (A, B, C); peer d returns a QUERYHIT (2); part of the network remains unreachable.]
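The flooding sketch adapted to the modified random BFS. The rule for how many neighbors to pick (rounding fraction × candidates up, and excluding the peer the query came from) is an assumption made for illustration; the slide only states that a fraction such as ½ of the links is used. The name `random_flood` is hypothetical.

```python
import math
import random
from collections import deque

def random_flood(graph, source, ttl, fraction=0.5, rng=None):
    """Modified random BFS: forward to a random subset of the neighbours.

    Each peer picks ceil(fraction * candidates) neighbours, excluding the
    sender, uniformly at random; fraction=0.5 mirrors the 'degree/2' setting.
    Returns (messages_sent, peers_reached).
    """
    rng = rng if rng is not None else random.Random()
    seen = {source}
    messages = 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        peer, sender, t = frontier.popleft()
        if t == 0:
            continue
        candidates = [nb for nb in graph[peer] if nb != sender]
        if not candidates:
            continue
        k = math.ceil(fraction * len(candidates))
        for nb in rng.sample(candidates, k):
            messages += 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, peer, t - 1))
    return messages, seen
```

With `fraction=1.0` the function behaves like plain flooding; with `fraction=0.5` a peer with four neighbors sends only two copies, which saves messages but may leave parts of the network unreached.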

Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM)
- Idea: each query message is forwarded intelligently, based on which queries peers answered in the past.
- Components of ISM (for each node u):
  a) Profile Mechanism: a profile for each neighbor in N(u).
  b) Peer Ranking Mechanism: ranks peers locally, so that a search query is sent only to the peers most likely to answer it.
  c) Similarity Function: finds similar search queries.
  d) Search Mechanism: propagates queries based on the local indexes.
[Figure: peer q consults its profiles before forwarding a QUERY (1); peer d returns a QUERYHIT (2).]

Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM)
a) Profile Mechanism
- Maintains, for each neighbor, a list of past queries whose QueryHits were routed through that neighbor. The table is updated every time a QueryHit is received.
- The profile manager uses a Least Recently Used (LRU) policy to keep the most recent queries in the repository.
- Profiles are kept only for neighbors, so the cost of maintaining them is O(T·d), where T is the maximum number of queries per profile and d is the degree of the node (size: T·d entries).

Query                     | GUID   | Connection | Timestamp
Elections Bush Clinton    | G439ID | Socket1    | 100002222
Super Bowl San Diego      | F549QL | ---        | 100065652
...                       | ...    | ...        | ...
Italy earthquake disaster | PN329D | Socket5    | 100022453
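A minimal sketch of the profile mechanism, assuming one profile per neighbor capped at T queries with least-recently-used eviction, which bounds the total state at T·d entries. The class `NeighborProfiles`, its methods and the neighbor name "P1" are hypothetical, and the sketch stores only the query text and a timestamp rather than every column of the profile table.

```python
from collections import OrderedDict

class NeighborProfiles:
    """Hypothetical per-neighbour query profiles with an LRU cap of T queries each."""

    def __init__(self, T):
        self.T = T
        self.profiles = {}  # neighbour -> OrderedDict(query -> timestamp)

    def record_query_hit(self, neighbour, query, timestamp):
        """Update the profile when a QueryHit for `query` arrives via `neighbour`."""
        prof = self.profiles.setdefault(neighbour, OrderedDict())
        prof.pop(query, None)         # re-inserting moves the query to the most-recent end
        prof[query] = timestamp
        while len(prof) > self.T:
            prof.popitem(last=False)  # evict the least recently used query

    def queries(self, neighbour):
        """Queries in the neighbour's profile, least recently used first."""
        return list(self.profiles.get(neighbour, {}))

profiles = NeighborProfiles(T=2)
profiles.record_query_hit("P1", "elections bush clinton", 1)
profiles.record_query_hit("P1", "super bowl san diego", 2)
profiles.record_query_hit("P1", "elections bush clinton", 3)  # refreshed
profiles.record_query_hit("P1", "italy earthquake disaster", 4)
print(profiles.queries("P1"))  # ['elections bush clinton', 'italy earthquake disaster']
```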

Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM)
b) Peer Ranking Mechanism
- Before forwarding a query message, a peer performs an on-the-fly ranking of its peers to determine the best paths.
- We use the Aggregate Similarity of peer Pi to a query q, computed by peer Pk as:

$$\mathrm{PsimP}(P_i, q) = \sum_{q_j \in \mathcal{Q}_k(P_i)} \mathrm{Sim}(q, q_j)^{\alpha}$$

  where $\mathcal{Q}_k(P_i)$ is the set of queries in Pk's profile that were answered by Pi, and α is a constant that allows us to add more weight to the most similar queries.
- Example: host Pk needs to forward a query q = “italy disaster” to two of its peers {P1, P2, P3}. Pk maintains queries {q1, q2, ..., q5} in its profile: q1 was answered by P1, q2 and q3 by P2, and q4 and q5 by P3, with
  Sim(q, q1) = 0.8, Sim(q, q2) = 0.6, Sim(q, q3) = 0.5, Sim(q, q4) = 0.4, Sim(q, q5) = 0.3.
  With α = 1:
  PsimP(P1, q) = 0.8^1 = 0.8
  PsimP(P2, q) = 0.6^1 + 0.5^1 = 1.1
  PsimP(P3, q) = 0.4^1 + 0.3^1 = 0.7
  so Pk forwards q to P2 and P1.
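The ranking step can be sketched in Python, assuming each neighbor's profile has already been reduced to the list of similarities Sim(q, q_j) between the new query and the past queries that neighbor answered. The function names are hypothetical. The example uses the slide's values for P1 and P2, and 0.4 and 0.3 for P3's two queries, the values that give P3's stated total of 0.7.

```python
def aggregate_similarity(sims, alpha=1.0):
    """PsimP(P_i, q): sum of Sim(q, q_j) ** alpha over the queries q_j answered by P_i."""
    return sum(s ** alpha for s in sims)

def rank_peers(profile_sims, alpha=1.0):
    """Return (peers sorted by decreasing aggregate similarity, score per peer)."""
    scores = {peer: aggregate_similarity(s, alpha) for peer, s in profile_sims.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

example = {"P1": [0.8], "P2": [0.6, 0.5], "P3": [0.4, 0.3]}
order, scores = rank_peers(example)
print(order[:2])  # ['P2', 'P1']
```

With α = 2 the order becomes P1, P2, P3 (0.64 > 0.61 > 0.25): raising the exponent lets P1's single highly similar query outweigh P2's two moderately similar ones, which is the effect α is meant to control.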

Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM)
c) Similarity Function: the cosine similarity
- Assume that L is the set of all words in the Profile Manager, e.g., L = {elections, bush, clinton, super, bowl, san, diego, …, italy, earthquake, disaster}.
- We define an |L|-dimensional space in which each query is a vector. If q = “italy disaster”, then its vector is $\vec{q}$ = [0,0,0,…,1,0,1].
- Recall that we have a vector $\vec{q_i}$ for each query q_i stored in the Profile Manager. The similarity of q and q_i is the cosine of the angle between their vectors:

$$\mathrm{Sim}(q, q_i) = \frac{\vec{q} \cdot \vec{q_i}}{\lVert \vec{q} \rVert \, \lVert \vec{q_i} \rVert}$$
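A small sketch of the similarity function, assuming queries are split on whitespace into lowercase words and mapped to binary vectors over the vocabulary L. Only the words listed explicitly in the slide's example vocabulary are used, so the vectors here are shorter than the full |L|-dimensional ones. The helper names are hypothetical.

```python
import math

def to_vector(query, vocabulary):
    """Binary |L|-dimensional vector: 1 if the word occurs in the query, else 0."""
    words = set(query.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Only the explicitly listed words of the example vocabulary.
L = ["elections", "bush", "clinton", "super", "bowl", "san", "diego",
     "italy", "earthquake", "disaster"]
q = to_vector("italy disaster", L)
q3 = to_vector("Italy earthquake disaster", L)
print(q)                         # [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(round(cosine(q, q3), 3))   # 0.816
```

“italy disaster” and “Italy earthquake disaster” share two words, so their cosine is 2/(√2·√3) ≈ 0.816, while a query with no words in common scores 0.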

Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM)
d) Search Mechanism
- Uses the Peer Ranking Mechanism to forward queries to the nodes most likely to hold the information we are looking for.
[Figure: peer q consults its profiles to choose where to forward a QUERY (1) toward peer d.]
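A minimal sketch of the forwarding choice, assuming the aggregate similarities of the neighbors have already been computed. The defaults mirror the setting used in the experimental evaluation (the M = 3 highest-ranked neighbors plus one random neighbor); excluding the sender and drawing the extra neighbor from the remaining ones are assumptions, and `select_neighbors` is a hypothetical name. One plausible reason for the random pick is that it lets queries reach neighbors whose profiles are still empty.

```python
import random

def select_neighbors(scores, sender=None, m=3, rng=None):
    """Pick the m highest-ranked neighbours plus one random other neighbour.

    `scores` maps neighbour -> aggregate similarity for the current query;
    the neighbour the query came from (`sender`) is never selected.
    """
    rng = rng if rng is not None else random.Random()
    candidates = [p for p in scores if p != sender]
    ranked = sorted(candidates, key=lambda p: scores[p], reverse=True)
    chosen = ranked[:m]
    rest = ranked[m:]
    if rest:
        chosen.append(rng.choice(rest))  # one extra neighbour chosen at random
    return chosen

scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.0, "E": 0.7}
print(select_neighbors(scores, sender="A"))  # ['E', 'C', 'B', 'D']
```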

Experimental Evaluation
- We use a decentralized newspaper application built on top of the REUTERS dataset (22,531 documents grouped by 84 countries).
- Random network of 100 peers; each peer holds documents from 3 countries.
- The average degree of a node is 7 ≈ log2(100) ≈ 6.6, chosen so that the graph is connected.
[Figure: architecture of a data peer (e.g., usa): P2P Network Module, Routing Structures (Profiles), and a PDOM-XML Manager that evaluates XQL queries over XML data files; example peers: u.k., germany, france, mexico, argentina, china, india, greece, italy, usa (usa.graph).]
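The network setup can be approximated with an Erdős–Rényi random graph, assuming edges are drawn independently with probability p = 7/99 so that the expected degree is 7. The slides do not say which generator was used, so this is only an illustration with hypothetical function names. Since a random graph of this density is not guaranteed to be connected, one simple option is to resample until `is_connected` returns True.

```python
import math
import random

def random_network(n=100, avg_degree=7, seed=None):
    """Erdős–Rényi G(n, p) graph with p chosen so the expected degree is avg_degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                graph[u].add(v)
                graph[v].add(u)
    return graph

def is_connected(graph):
    """Depth-first traversal from an arbitrary node; connected if it reaches all nodes."""
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for nb in graph[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(graph)

print(round(math.log2(100), 2))  # 6.64
g = random_network(seed=42)
print(sum(len(s) for s in g.values()) / len(g), is_connected(g))
```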

Experimental Evaluation
- We perform 400 sequential queries with a delay of 4 sec.
- We compare the document ratio (recall rate) against the number of messages for:
  - BFS (Gnutella message flooding): forward to all neighbors (degree nodes).
  - Modified BFS: forward to degree/2 randomly chosen neighbors.
  - Intelligent Search Mechanism: forward to the M = 3 highest-ranked neighbors plus 1 random neighbor.

Experimental Evaluation
- We measure the document ratio (recall rate) against the number of messages with Time-to-Live (TTL) = 4:
  - BFS (Gnutella) uses ~763 messages with a recall rate of 100%.
  - Random BFS (degree/2) uses ~120 messages (16% of BFS) with a recall rate of 42%.
  - Intelligent Search uses ~131 messages (17% of BFS) with a recall rate of ~55%.
- The recall rate of Intelligent Search improves over time, since the peer profiles accumulate more knowledge.
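The percentages on this slide are message counts relative to BFS. A quick check of the arithmetic from the reported numbers (the variable names are illustrative):

```python
BFS_MESSAGES = 763  # messages used by BFS with TTL = 4 (from the slide)

results = {"Random BFS (degree/2)": (120, 0.42), "Intelligent Search": (131, 0.55)}
for name, (msgs, recall) in results.items():
    share = msgs / BFS_MESSAGES
    print(f"{name}: {share:.1%} of BFS messages, recall {recall:.0%}")
# Random BFS (degree/2): 15.7% of BFS messages, recall 42%
# Intelligent Search: 17.2% of BFS messages, recall 55%
```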

Experimental Evaluation
- We again measure the document ratio (recall rate) against the number of messages, now increasing the Time-to-Live to TTL = 5:
  - BFS (TTL = 4) uses ~763 messages with a recall rate of 100%.
  - Random BFS (degree/2) uses ~28% of the BFS messages with a recall rate of ~72%.
  - Intelligent Search uses ~35% of the BFS messages with a recall rate of ~90%!
- Under BFS, a large number of peers receive unnecessary messages: Intelligent Search achieves almost the same recall (90%) with only 35% of the messages.

Related Work
- Improving Search in P2P, B. Yang et al. (Stanford):
  - Iterative deepening: repeat the search with increasing depth until Z results are returned.
  - Directed BFS based on aggregate statistics: neighbors are selected by, e.g., the number of results they returned, the shortest message queue, or the most data forwarded.
  - Local indexes: each node maintains an index over the data of all peers within r hops.
- Routing Indices for P2P, Crespo et al. (Stanford):
  - Compound indices: each node sends a clustered summary of its topics to its neighbors (e.g., 100 databases, 4 theory, 10 OS documents).
  - Compound indices might be too costly for highly dynamic P2P systems.
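A simplified sketch of iterative deepening, assuming a `search(depth)` callback (hypothetical) that returns the results found within `depth` hops, for example a TTL-limited flood, and a policy listing the depths to try. This sketch simply re-runs the search at each depth, which is a simplification of the technique described by Yang et al.

```python
def iterative_deepening(search, policy, z):
    """Run successively deeper searches until at least z results are found.

    `search(depth)` returns the results found within `depth` hops; `policy`
    is the increasing list of depths to try. Returns (results, depth_used),
    where depth_used is None if the policy is empty.
    """
    results, used = [], None
    for depth in policy:
        used = depth
        results = search(depth)
        if len(results) >= z:  # enough results: stop deepening
            break
    return results, used
```

For example, with a search that finds two new results per hop, `iterative_deepening(lambda d: list(range(2 * d)), [1, 2, 3], 4)` stops at depth 2 with four results and never pays for the depth-3 search.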

Related Work
- Freenet (Clark et al.): search by identifiers. Freenet uses SHA-1 hashes of resources, and information is retrieved based on key closeness, in a depth-first (DFS) manner.
- Others, such as Chord: systems that focus on scalable object location, which becomes feasible by hashing and distributing objects in the P2P system. Searches are again by identifier.
- In contrast, we search using keywords.
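Why identifier-based lookup does not directly support keyword search can be seen by hashing two phrasings of the same query with SHA-1, using Python's standard `hashlib` module. The helper `key` is hypothetical; it maps an identifier to an integer in the 160-bit SHA-1 key space, the kind of key these systems route on.

```python
import hashlib

def key(identifier):
    """SHA-1 digest of an identifier, as an integer in [0, 2**160)."""
    return int.from_bytes(hashlib.sha1(identifier.encode("utf-8")).digest(), "big")

a = key("italy disaster")
b = key("disaster italy")
print(a == b)  # False: reordering the words yields an unrelated key
```

Any change to the identifier, even reordering its words, yields an unrelated key, so a peer must know the exact identifier to locate an object.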

Conclusions
- P2P systems offer several advantages, such as scalability, robustness and simplicity of use.
- Efficient P2P Information Retrieval is not feasible with the current search algorithms.
- We propose an Intelligent Search Mechanism that uses local knowledge to improve Information Retrieval in P2P.
- Our mechanism achieves a 90% recall rate while using only 35% of the messages required by BFS.

Future Work
- We plan to deploy our middleware infrastructure on a larger P2P network with more queries.
- We want to explore different network topologies, such as AS maps with power laws.
- We want to explore different peer-profile maintenance policies.
- We want to compare the performance of our method with other proposed algorithms (iterative deepening, local indexes, etc.).

“A Local Search Mechanism for Peer-to-Peer Networks” Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside) < vana@cs.ucr.edu, dg@cs.ucr.edu, csyiazti@cs.ucr.edu > CIKM 2002 – Eleventh International Conference on Information and Knowledge Management November 4-9, Mclean VA http://www.cs.ucr.edu/~csyiazti/publications.html