IR Techniques For P2P Networks
Information Retrieval Techniques For Peer-To-Peer Networks
Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos

Presented By Ranjan Dash

Layout
- Introduction
- P2P Network IR Techniques
- PeerWare Infrastructure and experiments

Introduction
- Major challenge: efficiently searching the content of other peers
- Definition: a large number of peers collaborate dynamically, in an ad hoc manner, and share information in large-scale distributed environments without centralized coordination
- Characteristics of the P2P environment:
  - Each peer has a database or collection of documents
  - A query contains a set of keywords
  - A reply message contains pointers to matching documents
- Differences from static data environments:
  - There is no central repository
  - Nodes join and leave dynamically, in an ad hoc manner
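As a minimal sketch of the environment described above, assuming each document is plain text and a keyword query matches a document only when every keyword appears in it, a peer's local answer to a query could look like the following. The `Peer` class, its fields, and the `peer_id/doc_id` pointer format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer model: an ID plus a local document collection."""
    peer_id: str
    documents: dict = field(default_factory=dict)  # doc_id -> document text

    def answer(self, keywords):
        """Return pointers ("peer_id/doc_id") to local documents containing all keywords."""
        terms = {k.lower() for k in keywords}
        pointers = []
        for doc_id, text in self.documents.items():
            if terms <= set(text.lower().split()):
                pointers.append(f"{self.peer_id}/{doc_id}")
        return pointers


peer = Peer("p1", {"d1": "peer to peer search", "d2": "web caching survey"})
print(peer.answer(["Peer", "search"]))  # ['p1/d1']
```

The reply carries pointers rather than the documents themselves; fetching a document would be a separate step.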

P2P Network IR Techniques
- Breadth-First Search (BFS)
- Random Breadth-First Search (RBFS)
- Intelligent Search Mechanism (ISM)
- Directed BFS and >RES
- Random-Walker Searches
- Randomized Gossiping
- Local Routing Indices
- Centralized Approaches
- Searching Object Identifiers
- Distributed IR

Breadth-First Search (BFS)
- Widely used in file-sharing systems
- Each peer propagates the query to all of its neighbors except the sender
- QueryHit messages (number of documents, bandwidth information) travel back along the same path
- Simple, and achieves a high hit rate
- Poor performance and network utilization, since every reachable peer receives the query
- A single low-bandwidth node can become a bottleneck
- The cost can be bounded with a time-to-live (TTL) that limits how far the query travels
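The flooding behaviour above can be sketched as a breadth-first traversal over an overlay graph. This is a simulation under assumptions: the overlay is a plain adjacency dict, each peer's content is a set of keywords, and a peer drops a query it has already seen, while the duplicate still counts as a sent message. The function name and parameters are hypothetical.

```python
from collections import deque


def bfs_search(graph, content, source, keyword, ttl):
    """Flood `keyword` from `source` up to `ttl` hops; return (hit peers, messages sent).

    graph:   dict peer -> list of neighbor peers (hypothetical overlay)
    content: dict peer -> set of keywords that peer can answer
    """
    hits, messages = [], 0
    seen = {source}
    queue = deque([(source, None, ttl)])  # (peer, peer it came from, hops left)
    while queue:
        peer, sender, hops_left = queue.popleft()
        if peer != source and keyword in content.get(peer, set()):
            hits.append(peer)  # this peer would send a QueryHit back along the path
        if hops_left == 0:
            continue
        for nbr in graph[peer]:
            if nbr == sender:
                continue  # never send the query back to the peer it came from
            messages += 1  # every forward costs a message, even a duplicate
            if nbr not in seen:  # duplicates are dropped on arrival
                seen.add(nbr)
                queue.append((nbr, peer, hops_left - 1))
    return hits, messages


line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
docs = {"d": {"chord"}}
print(bfs_search(line, docs, "a", "chord", ttl=3))  # (['d'], 3)
print(bfs_search(line, docs, "a", "chord", ttl=2))  # ([], 2): the TTL stops the query early
```

In a graph with cycles, the duplicate forwards are what make BFS wasteful: in a triangle, a TTL of 2 already costs four messages to reach two peers.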

Random Breadth-First Search (RBFS)
- Dramatic improvement over BFS in the number of messages
- Each peer forwards the query to only a fraction of its neighbors, selected at random
- Needs no global knowledge; decisions are local, and therefore fast
- Probabilistic: it might not reach some large segments of the network
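A minimal RBFS sketch, assuming an adjacency-dict overlay and a keyword set per peer: each peer forwards to a random subset of its neighbors other than the sender. The `fraction` parameter (0.5 by default, matching the "half the neighbors" setting used in the experiments) and the rule of always forwarding to at least one neighbor are assumptions for illustration; the function name is hypothetical.

```python
import random
from collections import deque


def rbfs_search(graph, content, source, keyword, ttl, fraction=0.5, rng=None):
    """Random BFS: like flooding, but each peer forwards to a random fraction of neighbors.

    Returns (hit peers, messages sent). graph/content use hypothetical dict layouts.
    """
    rng = rng or random.Random()
    hits, messages = [], 0
    seen = {source}
    queue = deque([(source, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if peer != source and keyword in content.get(peer, set()):
            hits.append(peer)
        if hops_left == 0:
            continue
        candidates = [n for n in graph[peer] if n != sender]
        if not candidates:
            continue
        count = max(1, int(len(candidates) * fraction))  # purely local decision
        for nbr in rng.sample(candidates, count):
            messages += 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, peer, hops_left - 1))
    return hits, messages


leaves = ["p1", "p2", "p3", "p4"]
star = {"hub": leaves, **{p: ["hub"] for p in leaves}}
hits, msgs = rbfs_search(star, {"p1": {"can"}}, "hub", "can", ttl=1, rng=random.Random(7))
print(msgs)  # 2: only half of the hub's four neighbors receive the query
```

Whether `p1` is found depends on the random draw, which is exactly the probabilistic miss described above.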

Intelligent Search Mechanism (ISM)
- Aims for fast, efficient search with the least communication cost
- Propagates the query only to the peers most likely to reply
- Consists of two components that run on each peer:
  - Profile mechanism
  - Relevance rank
- Works well when queries exhibit locality
- Drawback: it may keep forwarding to the same neighbors, starving new peers
- Solution: add a small random subset of peers to the most relevant set
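The fix for starvation can be sketched as a selection step: take the best-ranked neighbors and add a few others at random. How the scores are computed is left open here; `ism_select`, `n_best`, and `n_random` are hypothetical names, and the split between ranked and random picks is an assumption.

```python
import random


def ism_select(scores, n_best, n_random, rng=None):
    """Choose forwarding targets: the n_best top-scoring neighbors plus n_random others.

    scores: dict neighbor -> relevance score (computed elsewhere)
    The random extras give newly joined or rarely chosen peers a chance to be explored.
    """
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, rest = ranked[:n_best], ranked[n_best:]
    explore = rng.sample(rest, min(n_random, len(rest)))
    return best + explore


scores = {"a": 5.0, "b": 3.0, "c": 1.0, "new_peer": 0.0}
print(ism_select(scores, n_best=2, n_random=1, rng=random.Random(1)))
```

Without the random extra, `new_peer` would never be chosen and so would never build up a profile.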

Profile Mechanism
- Each peer builds a profile for each of its neighbors
- The profile keeps the T most recent queries and QueryHits, along with the number of results
- A least-recently-used (LRU) replacement policy keeps only the T most recent queries
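A minimal sketch of one neighbor's profile, assuming queries are identified by their text: an `OrderedDict` gives LRU behaviour, since re-recording a query moves it to the most-recent end and the oldest entry is evicted once more than T entries are stored. The class and method names are hypothetical.

```python
from collections import OrderedDict


class NeighborProfile:
    """Hypothetical profile of one neighbor: its T most recent queries and result counts."""

    def __init__(self, capacity):
        self.capacity = capacity          # T in the slide
        self.entries = OrderedDict()      # query text -> number of results in its QueryHit

    def record(self, query, num_results):
        if query in self.entries:
            self.entries.move_to_end(query)   # refresh: now the most recently used
        self.entries[query] = num_results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used query


profile = NeighborProfile(capacity=2)
profile.record("chord lookup", 3)
profile.record("bloom filter", 1)
profile.record("chord lookup", 5)
profile.record("gnutella", 2)
print(list(profile.entries.items()))  # [('chord lookup', 5), ('gnutella', 2)]
```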

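Relevance Rank
- Ranks the neighbors to decide which ones receive a query
- A peer ranks each neighbor P_i for a query q as
  $\mathrm{RelevanceRank}(P_i, q) = \sum_{j} \mathrm{Qsim}(q_j, q)^{\alpha} \cdot S(P_i, q_j)$
  where the q_j are past queries answered by P_i and S(P_i, q_j) is the number of results P_i returned for q_j
- Qsim is the cosine similarity between two queries
- \alpha controls how much weight the most similar queries receive; with \alpha = 0, only the number of results returned in the past matters, as in >RES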
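The ranking formula can be sketched as follows, assuming queries are compared as bag-of-words term-frequency vectors and a neighbor's profile is a dict from past query text to its result count S(P_i, q_j). The names `qsim` and `relevance_rank` are hypothetical. In Python `0.0 ** 0` is `1.0`, so with alpha = 0 every past query counts regardless of similarity, which reproduces the >RES-like case.

```python
import math
from collections import Counter


def qsim(q1, q2):
    """Cosine similarity between two queries as term-frequency vectors."""
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance_rank(profile, query, alpha):
    """Sum over past queries q_j answered by the neighbor of Qsim(q_j, q)^alpha * S(P_i, q_j)."""
    return sum(qsim(past, query) ** alpha * results for past, results in profile.items())


profile = {"p2p search": 10, "web caching": 4}
print(relevance_rank(profile, "p2p search", alpha=0))  # 14.0, like >RES
print(relevance_rank(profile, "p2p search", alpha=1))  # about 10: only the similar query counts
```

Larger alpha values push the sum further toward the single most similar past query.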

Directed BFS and >RES
- A peer forwards a query to a subset of its neighbors chosen from aggregated statistics
- >RES sends the query to the k neighbors that returned the most results for the last m queries
- With k = 1 (e.g., m = 10), BFS turns into a DFS
- Similar to ISM, but simpler
- Does not explore nodes whose content is related to the query
- Performs well because it routes queries toward larger network segments
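A sketch of the >RES selection rule, assuming each peer keeps, per neighbor, a list of result counts for past queries (most recent last). `k` and `m` follow the slide's parameters; the `history` layout and the function name are hypothetical.

```python
def most_results_select(history, k, m):
    """Return the k neighbors with the most results over their last m queries.

    history: dict neighbor -> list of result counts, most recent last
    """
    totals = {nbr: sum(counts[-m:]) if m > 0 else 0 for nbr, counts in history.items()}
    return sorted(totals, key=totals.get, reverse=True)[:k]


history = {"a": [5, 0, 0], "b": [1, 1, 1], "c": [0, 0, 9]}
print(most_results_select(history, k=2, m=2))  # ['c', 'b']
print(most_results_select(history, k=2, m=3))  # ['c', 'a']
```

Note that only result counts are used: nothing in the rule looks at what the query is about, which is the limitation listed above.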

Random-Walker Searches
- Each node forwards a query message, called a walker, to one randomly chosen neighbor
- Can be extended from 1 walker to k walkers
- Resembles RBFS, but the number of messages grows linearly (at most k per hop) rather than exponentially
- Like RBFS, it does not use knowledge of peers' content to guide the query
- Adaptive Probabilistic Search (APS) is similar, but uses feedback from previous searches to probabilistically guide future walkers
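A k-walker sketch under stated assumptions: the overlay is an adjacency dict, each walker avoids stepping straight back to the peer it just came from when it has another choice, and a walker stops at its first hit or after `ttl` steps. Published random-walk schemes also have walkers periodically check back with the requester; that is omitted here. Each step is one message, so the total is at most k * ttl, which is the linear growth mentioned above. The function name is hypothetical.

```python
import random


def k_walker_search(graph, content, source, keyword, k, ttl, rng=None):
    """Launch k independent random walkers from `source`; return (hit peers, messages sent)."""
    rng = rng or random.Random()
    hits, messages = set(), 0
    for _ in range(k):
        peer, previous = source, None
        for _ in range(ttl):
            # Prefer not to bounce straight back, unless it is the only option.
            choices = [n for n in graph[peer] if n != previous] or graph[peer]
            if not choices:
                break  # isolated peer: the walker cannot move
            previous, peer = peer, rng.choice(choices)
            messages += 1  # one message per walker step
            if keyword in content.get(peer, set()):
                hits.add(peer)
                break  # this walker stops at its first hit
    return hits, messages


line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(k_walker_search(line, {"c": {"can"}}, "a", "can", k=2, ttl=5))  # ({'c'}, 4)
```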

Randomized Gossiping (PlanetP)
- Builds a global inverted index that each node partially constructs: every node summarizes its local index as a Bloom filter
- Each node propagates its filter to the rest of the network through gossiping
- Advantage of Bloom filters: smaller messages, and therefore savings in network I/O
- Drawback: PlanetP has scalability problems
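A Bloom filter summary of a local index can be sketched as a fixed-size bit array in which several hash functions set bits for each term. Deriving the hash functions by salting SHA-256 with an index is an assumption for illustration, not PlanetP's actual scheme, and the class name is hypothetical. Lookups can return false positives but never false negatives, which is why a filter can stand in for a node's term list in gossip messages.

```python
import hashlib


class BloomFilter:
    """Hypothetical fixed-size Bloom filter over index terms."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # size is fixed, whatever is added

    def _positions(self, term):
        # Simulate num_hashes independent hash functions by salting one hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(term))


local_index = BloomFilter(num_bits=1024, num_hashes=3)
for term in ["peer", "retrieval", "gossip"]:
    local_index.add(term)
print(local_index.might_contain("gossip"))  # True
print(len(local_index.bits))  # 128 bytes, whatever the number of terms
```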

Local Routing Indices (Arturo Crespo and Hector Garcia-Molina)
- A hybrid technique: each node keeps local indices that point in the "direction" of the documents
- Three variants: compound routing indices (CRI), hop-count routing indices (HRI), and exponentially aggregated indices (ERI)
- Suited to topologies in which only a few nodes have very large numbers of neighbors (trees, trees with cycles)
- The routing indices resemble the routing tables used in Bellman–Ford (distance-vector) routing
- CRI: a node maintains, for each neighbor, statistics on how many documents are reachable through that neighbor
- HRI: keeps CRI statistics for each hop up to k hops away; the storage cost becomes prohibitive for large k
- ERI: addresses this HRI problem by aggregating the hop-count statistics with a cost formula
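Two of the index computations can be sketched as follows, under stated assumptions. `cri_goodness` estimates how many documents matching all query topics are reachable through one neighbor from its compound-index counts, assuming topics are independent so that per-topic fractions multiply. `eri_value` is a sketch of exponential aggregation: it collapses hop-count entries into one number by dividing the documents j+1 hops away by F^j, assuming a regular tree with fanout F. Function names and data layouts are hypothetical.

```python
from math import prod


def cri_goodness(total_docs, topic_counts, query_topics):
    """Estimate documents reachable via one neighbor that match every query topic.

    total_docs:   documents reachable through the neighbor
    topic_counts: dict topic -> documents on that topic reachable through the neighbor
    Assumes topics are independent, so per-topic fractions are multiplied.
    """
    if total_docs == 0:
        return 0.0
    return total_docs * prod(topic_counts.get(t, 0) / total_docs for t in query_topics)


def eri_value(docs_per_hop, fanout):
    """Aggregate hop-count entries (index j = documents j+1 hops away) into one number,
    discounting each additional hop by the fanout F."""
    return sum(count / fanout ** hop for hop, count in enumerate(docs_per_hop))


# Hypothetical index entries for one neighbor.
print(cri_goodness(100, {"db": 20, "p2p": 50}, ["db", "p2p"]))
print(eri_value([8, 16, 32], fanout=4))  # 8 + 16/4 + 32/16 = 14.0
```

A single aggregated number per neighbor is what keeps ERI's storage independent of the hop horizon, at the price of losing the per-hop breakdown.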

Centralized Approaches
- Maintain an inverted index over all the documents in the participating hosts' collections (e.g., Google, Yahoo, Napster)
- Each joining peer A uploads an index of all its shared documents to the central repository R
- A querying node B searches A's documents through R
- B then communicates with A directly, using an out-of-band protocol such as HTTP
- Kazaa is slightly different: it uses a set of more powerful peers that act as central repositories, making them a different kind of node from the rest
- Advantages: simple, robust, short search times, and guaranteed to find all results
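A minimal in-memory sketch of the central repository R, assuming whitespace tokenization and conjunctive keyword queries: peers upload postings for their shared documents, and a query returns (peer address, document ID) pairs so the querying peer can contact the owner directly. `CentralIndex` and the address strings are hypothetical; the out-of-band transfer itself is not modeled.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central repository R holding an inverted index over all shared documents."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(peer address, doc ID)}

    def upload(self, peer_address, documents):
        """Called when a peer joins: index every shared document it reports."""
        for doc_id, text in documents.items():
            for term in set(text.lower().split()):
                self.postings[term].add((peer_address, doc_id))

    def search(self, keywords):
        """Return (peer address, doc ID) pairs whose documents contain all keywords."""
        matches = [self.postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*matches) if matches else set()


repo = CentralIndex()
repo.upload("peerA:8080", {"song1": "live jazz recording", "notes": "jazz theory"})
repo.upload("peerB:8080", {"talk": "p2p search talk"})
print(repo.search(["jazz", "live"]))  # {('peerA:8080', 'song1')}
```

Because every document is in one index, a search is a single lookup and misses nothing, which is the "guaranteed to find all results" property; the repository is also a single point that must scale with all peers.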

Searching Object Identifiers
- Distributed file indexing systems: Chord, OceanStore, Content-Addressable Network (CAN), Freenet
- Search efficiently using object identifiers (a hash code of the file name) rather than keywords
- An object lookup returns the address (an IP address) of the node that stores the object
- Object retrieval is optimized by minimizing the number of messages and hops required
- Disadvantage: they can search only by object identifier, so they cannot capture the relevance of a document's content
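The identifier idea can be sketched with consistent hashing in the style of Chord: a file name is hashed to an identifier, and the object is stored at the first node whose ID is equal to or follows that identifier on the ring (its successor). This is a simplification under assumptions: it uses a tiny 16-bit identifier space and computes the successor from a global sorted list of node IDs, whereas Chord locates the successor by routing through finger tables in O(log N) hops. Function names are hypothetical.

```python
import bisect
import hashlib

ID_BITS = 16  # tiny identifier space for illustration; Chord derives IDs from SHA-1


def object_id(name, bits=ID_BITS):
    """Map a file name to an identifier on the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, key):
    """First node ID clockwise from `key` (inclusive), wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]


nodes = [1200, 20000, 45000]
key = object_id("song.mp3")
print(key, "->", successor(nodes, key))
print(successor(nodes, 50000))  # 1200: wraps past the largest ID
```

Only the exact name hashes to the same identifier, so a keyword such as "song" alone cannot find `song.mp3`: this is the relevance limitation noted above.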

Distributed IR
- With distributed databases, the main IR problem is deciding which databases are most likely to contain the most relevant documents
- Good results are possible for conceptually separated collections
- However, this assumes that the querying party has some statistical knowledge of each database's contents (word frequencies in documents), and therefore a global view of the system
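To make the assumption concrete: if the querying party knew, for each database, how many documents contain each term, it could rank databases before sending the query. The scoring below (summing the document frequencies of the query terms) is a deliberately naive stand-in chosen for illustration; real database-selection methods use more refined estimates. The `stats` layout and the function name are hypothetical.

```python
def rank_databases(stats, query_terms):
    """Order databases by the summed document frequency of the query terms.

    stats: dict database -> dict term -> number of its documents containing the term
    This requires global knowledge of every database's statistics.
    """
    scores = {db: sum(df.get(t, 0) for t in query_terms) for db, df in stats.items()}
    return sorted(scores, key=scores.get, reverse=True)


stats = {
    "music": {"jazz": 120, "p2p": 2},
    "papers": {"p2p": 80, "search": 60},
    "news": {"search": 15},
}
print(rank_databases(stats, ["p2p", "search"]))  # ['papers', 'news', 'music']
```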

PeerWare Infrastructure and experiments
- Evaluation metrics:
  - Recall rate: the fraction of documents each search mechanism retrieves
  - Efficiency: the number of messages needed to find the results
- Only algorithms that require local knowledge when searching for documents were implemented:
  - BFS (the baseline)
  - RBFS
  - >RES, with k = 0.5 * d and m = 100, where d is the degree of a node
  - ISM
- RBFS, >RES, and ISM forward query messages to half the neighbors that BFS contacts
- >RES and ISM use previous knowledge to decide which peers receive the query
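The two metrics can be computed as below. The reference set used as the recall denominator (for instance, the matching documents that a full search finds) is an assumption of this sketch, as are the function names; the usage numbers are made up for illustration and are not measurements.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the reference documents that a search mechanism retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def message_ratio(messages, baseline_messages):
    """Messages used relative to the baseline (BFS); lower is more efficient."""
    return messages / baseline_messages


reference = ["d1", "d2", "d3", "d4"]
print(recall_rate(["d1", "d2", "d9"], reference))  # 0.5
print(message_ratio(380, 1000))  # 0.38
```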

- BFS requires almost 2.5 times as many messages as its competitors.

- ISM found the most documents: it achieved almost a 90-percent recall rate while using only 38 percent of the messages BFS required.
- ISM improves its knowledge over time.
- Both >RES and ISM started out with a low recall rate (around 40 to 50 percent) because initially they choose their neighbors at random.