AlvisP2P : Scalable Peer-to-Peer Text Retrieval in a Structured P2P Network Toan Luu, Gleb Skobeltsyn, Fabius Klemm, Maroje Puh, Ivana Podnar Zarko, Martin.

Slides:



Advertisements
Similar presentations
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Advertisements

P2PIR'06: "Distributed Cache Table (DCT)" Gleb Skobeltsyn, Karl Aberer D istributed T able: Efficient Query-Driven Processing of Multi-Term Queries in.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
GridVine: Building Internet-Scale Semantic Overlay Networks By Lan Tian.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Building a Distributed Full-Text Index for the Web S. Melnik, S. Raghavan, B.Yang, H. Garcia-Molina.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
PDPTA03, Las Vegas, June S-Chord: Using Symmetry to Improve Lookup Efficiency in Chord Valentin Mesaros 1, Bruno Carton 2, and Peter Van Roy 1 1.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
University of Cincinnati1 Towards A Content-Based Aggregation Network By Shagun Kakkar May 29, 2002.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
A Distributed Indexing Strategy for Efficient XML Retrieval Efficiency Issues in Information Retrieval Workshop 30th European Conference on Information.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
A Scalable Semantic Indexing Framework for Peer-to-Peer Information Retrieval University of Illinois at Urbana-Champain Zhichen XuYan Chen Northwestern.
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
Routing of Structured Queries in Large-Scale Distributed Systems Workshop on Large-Scale Distributed Systems for Information Retrieval ACM.
Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,
Submitting: Barak Pinhas Gil Fiss Laurent Levy
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Ecole Polytechnique Fédérale de Lausanne, Switzerland Efficient processing of XPath queries with structured overlay networks Gleb Skobeltsyn, Manfred Hauswirth,
Peer-to-peer file-sharing over mobile ad hoc networks Gang Ding and Bharat Bhargava Department of Computer Sciences Purdue University Pervasive Computing.
Overview of Search Engines
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
G.Skobeltsyn | Query-Driven Indexing for Scalable P2P Text Retrieval Query-Driven Indexing for Scalable P2P Text Retrieval Infoscale’07, June 6-8, 2007.
G.Skobeltsyn | Query-Driven Indexing for P2P Text Retrieval Query-Driven Indexing for P2P Text Retrieval The Future of Web Search Bertinoro,
Query-Driven Indexing for Peer-to-Peer Text Retrieval ** WWW 2007 Banff, Canada Contact: Gleb Skobeltsyn Contact: Gleb Skobeltsyn
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
On P2P Collaboration Infrastructures Manfred Hauswirth, Ivana Podnar, Stefan Decker Infrastructure for Collaborative Enterprise, th IEEE International.
A Distributed Architecture for Multi-dimensional Indexing and Data Retrieval in Grid Environments Athanasia Asiki, Katerina Doka, Ioannis Konstantinou,
1 P2P File-Sharing Solution CS654 – Software Architecture course project Guide: T V Prabhakar Members: S Pavan Kumar – Y1306 D V Janardhan Rao – Y
Parallel and Distributed IR. 2 Papers on Parallel and Distributed IR Introduction Paper A: Inverted file partitioning schemes in Multiple Disk Systems.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Full-Text Search in P2P Networks Christof Leng Databases and Distributed Systems Group TU Darmstadt.
Internet Information Retrieval Sun Wu. Course Goal To learn the basic concepts and techniques of internet search engines –How to use and evaluate search.
Efficient Peer to Peer Keyword Searching Nathan Gray.
PSI Peer Search Infrastructure. Introduction What are P2P Networks? The term "peer-to-peer" refers to a class of systems and applications that employ.
Othman Othman M.M., Koji Okamura Kyushu University 1.
Presentation 1 By: Hitesh Chheda 2/2/2010. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT Laboratory for Computer Science.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
Serverless Network File Systems Overview by Joseph Thompson.
Modern Information Retrieval Chapter 9: Parallel and Distributed IR Section 9.1: Introduction Section : MIMD Architectures Inverted Files November.
Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys ICDE 2007 Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys Ivana.
Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers A DISC’04 paper Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang.
Efficient Peer-to-Peer Keyword Searching 1 Efficient Peer-to-Peer Keyword Searching Patrick Reynolds and Amin Vahdat presented by Volker Kudelko.
Freenet “…an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity.
Building a Distributed Full-Text Index for the Web by Sergey Melnik, Sriram Raghavan, Beverly Yang and Hector Garcia-Molina from Stanford University Presented.
Plethora: Infrastructure and System Design. Introduction Peer-to-Peer (P2P) networks: –Self-organizing distributed systems –Nodes receive and provide.
1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie 2. Complex Queries in DHT-based Peer-to-Peer Networks Lintao Liu 5/21/2002.
Concept-based P2P Search How to find more relevant documents Ingmar Weber Max-Planck-Institute for Computer Science Joint work with Holger Bast Torino,
Data Indexing in Peer- to-Peer DHT Networks Garces-Erice, P.A.Felber, E.W.Biersack, G.Urvoy-Keller, K.W.Ross ICDCS 2004.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
Peer-to-Peer Systems: An Overview Hongyu Li. Outline  Introduction  Characteristics of P2P  Algorithms  P2P Applications  Conclusion.
G.Skobeltsyn | Web Text Retrieval with a P2P Query-Driven Index Web Text Retrieval with a P2P Query-Driven Index Gleb Skobeltsyn EPFL, Lausanne Switzerland.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
P2P Content Search: Give the Web Back to the People Matthias Bender Sebastin Michel Peter Triantafillou Gerhard Weikum Christian Zimmer Mariam John CSE.
Malugo – a scalable peer-to-peer storage system..
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin and L. Page, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pages , April.
CHAPTER 3 Architectures for Distributed Systems
Determining the Peer Resource Contributions in a P2P Contract
Presentation transcript:

AlvisP2P : Scalable Peer-to-Peer Text Retrieval in a Structured P2P Network Toan Luu, Gleb Skobeltsyn, Fabius Klemm, Maroje Puh, Ivana Podnar Zarko, Martin Rajman, Karl Aberer VLDB_ /5/161

Outlines Introduction Distributed indexing/retrieval AlvisP2P architecture AlvisP2P software Conclusion 2012/5/162

Introduction The AlvisP2P IR engine enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network. ◦ Uses an optimized overlay network. ◦ A novel indexing/retrieval mechanisms. ◦ Ensure low bandwidth consumption. ◦ Enabling unlimited network growth. 2012/5/163

Cont. Two important properties: ◦ the generated distributed index stores posting lists for carefully chosen indexing term combinations (hereafter called keys), and ◦ the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. 2012/5/164

Cont. Guaranteeing acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. 2012/5/165

Cont. Two key generation techniques : ◦ Highly Discriminative Keys (HDK) ◦ Query-Driven Indexing (QDI) Overcoming the scalability problem of single-term retrieval in structured P2P networks, while preserving a retrieval quality fully. 2012/5/166

Distributed indexing/retrieval Each peer is responsible for: ◦ the generation of index entries to be stored in the global distributed index for its local documents, and ◦ the storage and maintenance of the fraction of the global index associated with the keys that have been assigned to the peer by the hashing mechanism used in the underlying DHT. 2012/5/167

Cont. HDK approach : ◦ Generates new keys during the indexing phase based on observed document frequencies: ◦ Each time a posting list for some key k exceeds a predefined size, new indexing keys with more terms are generated. QDI approach : ◦ Using decentralized monitoring of query statistics to detect and index new popular keys, as well as to remove obsolete keys from the index. 2012/5/168

Cont. Steps : A peer receives a new query. To explore the lattice of query term combinations(query lattice). The querying peer requests the posting list associated with the term combination from the peer responsible for it. 2012/5/169

Cont. If the term combination is indeed present in the global index, the requested posting list is sent back to the querying peer. If this list is not truncated, the part of the query lattice dominated by the term combination is excluded from further lattice exploration. 2012/5/1610

Cont. 2012/5/1611 Additional approximations can be made to improve load balancing with an only marginal loss in retrieval precision.

Cont. During the exploration, each contacted peer also updates the usage statistics for the requested term combination. The querying peer produces their union, ranks all the documents w.r.t the original query, and presents the top-ranked results to the user. 2012/5/1612

Cont. On-demand indexing mechanism. The peer responsible for this key acquires a new posting list containing a bounded number of top-ranked document references. The key is then considered as indexed and can thus be used for subsequent query processing. 2012/5/1613

Cont. Obsolete keys can be removed. Resulting in an efficient indexing structure adaptive to the current query popularity distribution. Increasing the overall retrieval quality. 2012/5/1614

AlvisP2P architecture 2012/5/1615

Cont. The components in higher layers exclusively rely on the functionalities provided by lower layers. Layers 1 and 2 implement the peer-to- peer overlay infrastructure. Layer 2 consists of a Distributed Hash Table (DHT) that is able to sustain high traffic loads. 2012/5/1616

Cont. Layer 3 implements one of the aforementioned techniques. Layer 4 is responsible for ranking. ◦ Ranking model ◦ Global document frequencies, average document length, term frequencies and other statistical information. 2012/5/1617

Cont. Layer 5 implements a possibly sophisticated “local search engine”. 2012/5/1618

AlvisP2P software The user can use a Web browser to query the AlvisP2P network. 2012/5/1619

Cont. Figure 5 shows the client's “Search” tab with a query result. 2012/5/1620

Cont. The corresponding “Manager of shared documents” tab is displayed in Figure /5/1621

Conclusion We have presented the AlvisP2P prototype for scalable full-text P2P-IR that uses carefully selected indexing term combinations associated with possibly truncated posting lists. The AlvisP2P can comparable to state-of- the-art centralized search engine. 2012/5/1622