P2P Content Search: Give the Web Back to the People. Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer. Mariam John, CSE.


P2P Content Search: Give the Web Back to the People. Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer. Mariam John, CSE /27/2006

Why P2P Web Search
- Full-fledged web search is under the control of centralized search engines.
- There is growing concern about the world's dependency on a few quasi-monopolistic search engines and their susceptibility to commercial interests, spam, censorship, etc.
- A P2P search engine might be more robust than centralized search, as the demise of a single server or site is unlikely to paralyze the entire search system.
- All this leads to the postulation that "the Web should be given back to the people".

Challenges: Is P2P Web Search Likely to Work?
- A P2P search system has two main resource constraints: storage and bandwidth.
- Distribute a conceptually global keyword index across a DHT-style network.
- From a query processing and IR viewpoint, one of the key issues is query routing: given a query, to which other peers should it be forwarded to obtain the top-k ranked result set?
- This decision requires statistical information about the data contents in the network. It can be made fairly efficient by utilizing a DHT-based distributed directory.

Challenges
- Efficiency of P2P query routing is only one side of the coin. What about the quality of the search results?
- The goal is to be as good as centralized search engines.
- The P2P approach faces the challenge that the index lists and statistical information that lead to good search results are scattered across the network.

System Architecture - Minerva
- Minerva is a fully operational distributed search engine consisting of autonomous peers.
- Each peer has a local document collection. The local collection is indexed by inverted lists, one for each keyword or term.
- A conceptually global but physically distributed directory, layered on top of a Chord-style distributed hash table (DHT), manages aggregated information about the peers' local knowledge in compact form.
- The Chord DHT partitions the term space such that each peer is responsible for the statistics and metadata of a randomized subset of terms within the directory.
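
The partitioning of the term space can be illustrated with a minimal sketch. It assumes a 32-bit identifier ring, SHA-1 hashing, and the usual Chord successor rule (a key belongs to the first peer at or after its hash, wrapping around); the names `ring_id`, `TermRing`, and the peer names are hypothetical, and Chord's finger-table routing is left out.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed identifier-space size, chosen only for illustration


def ring_id(key: str) -> int:
    """Hash a string onto the identifier ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class TermRing:
    """Assigns each term to the first peer clockwise from the term's hash."""

    def __init__(self, peer_names):
        # Peers sorted by their position on the ring.
        self._ring = sorted((ring_id(name), name) for name in peer_names)
        self._ids = [peer_id for peer_id, _ in self._ring]

    def responsible_peer(self, term: str) -> str:
        # Successor rule: first peer id >= term id, wrapping to the start.
        idx = bisect.bisect_left(self._ids, ring_id(term))
        return self._ring[idx % len(self._ring)][1]


ring = TermRing(["peer-a", "peer-b", "peer-c", "peer-d"])
assignment = {t: ring.responsible_peer(t)
              for t in ["chord", "index", "minerva", "query"]}
```

Because the hash spreads terms pseudo-randomly over the ring, each peer ends up responsible for a randomized subset of terms, which is the property the slide relies on.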

Directory Maintenance
- In the publishing process, each peer distributes per-term summaries (Posts) of its local index to the global directory.
- For each term, the DHT determines the responsible peer, and this peer maintains a PeerList of all Posts for the term.
- The system proactively replicates directory information to maintain a certain degree of replication.
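
The publishing step can be sketched as follows. This is an illustration under stated assumptions, not Minerva's actual data model: the fields `doc_freq` and `max_score` are assumed examples of per-term statistics, `summarize` and `DirectoryPeer` are hypothetical names, and a single in-memory directory stands in for the DHT lookup and for replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    peer: str
    term: str
    doc_freq: int     # assumed statistic: local documents containing the term
    max_score: float  # assumed statistic: best local score for the term


def summarize(peer, inverted_index):
    """Build one Post per term from a local index {term: {doc_id: score}}."""
    return [Post(peer, term, len(postings), max(postings.values()))
            for term, postings in inverted_index.items() if postings]


class DirectoryPeer:
    """Holds the PeerLists for the terms this directory peer is responsible for."""

    def __init__(self):
        self._peer_lists = {}  # term -> {peer name: latest Post}

    def publish(self, post):
        # A newer Post from the same peer replaces the older one.
        self._peer_lists.setdefault(post.term, {})[post.peer] = post

    def peer_list(self, term):
        posts = self._peer_lists.get(term, {}).values()
        return sorted(posts, key=lambda p: (-p.doc_freq, p.peer))


directory = DirectoryPeer()
for post in summarize("peer-a", {"p2p": {"d1": 0.9, "d2": 0.4}, "web": {"d1": 0.2}}):
    directory.publish(post)
for post in summarize("peer-b", {"p2p": {"d7": 0.5}}):
    directory.publish(post)
```

Here `directory.peer_list("p2p")` lists peer-a before peer-b because peer-a reports more matching documents; the ordering criterion is an assumption made for the example.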

Query Execution
- A query with multiple terms is processed as follows:
- The query is first executed locally using the peer's local index.
- If the user considers this result unsatisfactory, the peer issues a PeerList request to the directory, looking up potentially promising peers for each query term separately.
- The query is then executed completely on each of the selected remote peers.
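
The execution steps above can be sketched roughly as below. Several parts are assumptions made only to keep the example runnable: scoring by raw term counts, ranking candidate peers by summed document frequency, the `satisfied` callback standing in for the user's judgment, and the final merge of local and remote results. The names `Peer` and `run_query` are hypothetical.

```python
class Peer:
    def __init__(self, name, docs):
        self.name = name
        self.docs = docs  # doc_id -> text

    def search(self, terms):
        """Toy scoring: number of query-term occurrences in each document."""
        results = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                results.append((score, doc_id, self.name))
        return sorted(results, reverse=True)


def run_query(initiator, terms, directory, peers, satisfied, max_peers=2):
    # Step 1: execute the query on the initiator's local index.
    local = initiator.search(terms)
    if satisfied(local):
        return local
    # Step 2: look up the PeerList for each query term separately.
    candidates = {}
    for term in terms:
        for peer_name, doc_freq in directory.get(term, []):
            if peer_name != initiator.name:
                candidates[peer_name] = candidates.get(peer_name, 0) + doc_freq
    chosen = sorted(candidates, key=lambda n: (-candidates[n], n))[:max_peers]
    # Step 3: run the complete query on each chosen remote peer, then merge.
    merged = list(local)
    for name in chosen:
        merged.extend(peers[name].search(terms))
    return sorted(merged, reverse=True)


peers = {
    "a": Peer("a", {"a1": "p2p search engine"}),
    "b": Peer("b", {"b1": "p2p web search p2p"}),
    "c": Peer("c", {"c1": "cooking recipes"}),
}
# directory: term -> PeerList of (peer name, document frequency)
directory = {"p2p": [("a", 1), ("b", 1)], "search": [("a", 1), ("b", 1)]}
results = run_query(peers["a"], ["p2p", "search"], directory, peers,
                    satisfied=lambda r: False)
```

In this toy setup peer c never receives the query, since no PeerList mentions it; that is the saving query routing aims for.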

Query Routing
- Most query routing techniques work well on disjoint data collections.
- What happens when autonomous peers crawl the web independently of each other?
- The result is overlap: the same information may be indexed by many peers.

Exploiting Correlations in Queries
- Directory information about term correlations can be exploited for query routing in several ways.
- First method:
- Treat correlated term combinations as keys for the DHT-based overlay network.
- The query initiator can locate the responsible directory peer by simply hashing the key and using standard DHT lookup routing.

Exploiting Correlations in Queries (cont'd)
- The directory entry directly provides the query initiator with a query-specific PeerList that reflects the best peers for the entire query.
- What happens if this PeerList is too short?
- The query initiator always has the fallback option of decomposing the query into its individual terms and retrieving a PeerList for each term.
- What is the problem with the above method?
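
The first method and its fallback can be sketched as below. The sketch assumes that a term combination is canonicalized by lowercasing and sorting its terms, that "too short" means fewer than `min_peers` entries, and that a plain dictionary stands in for the DHT lookup; `combo_key` and `lookup_peers` are hypothetical names.

```python
def combo_key(terms):
    """Canonical directory key for a term combination (order-insensitive)."""
    return " ".join(sorted({t.lower() for t in terms}))


def lookup_peers(terms, directory, min_peers=2):
    # Try the query-specific PeerList stored under the combined key.
    combined = directory.get(combo_key(terms), [])
    if len(combined) >= min_peers:
        return combined
    # Fallback: decompose the query and fetch one PeerList per term.
    merged = list(combined)
    for term in terms:
        for peer in directory.get(combo_key([term]), []):
            if peer not in merged:
                merged.append(peer)
    return merged


# In a real deployment each key would be hashed and resolved via the DHT.
directory = {
    "p2p search": ["peer-b", "peer-c"],
    "p2p": ["peer-a", "peer-b"],
    "search": ["peer-c", "peer-d"],
    "engine web": ["peer-e"],
    "web": ["peer-a"],
    "engine": ["peer-d"],
}
good = lookup_peers(["search", "P2P"], directory)
fallback = lookup_peers(["web", "engine"], directory)
```

Note that the fallback path costs one extra lookup per query term on top of the initial combined lookup.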

Exploiting Correlations in Queries (cont'd)
- We still collect PeerLists for highly correlated term combinations.
- The query initiator looks up the directory for each query term separately.
- Whenever a directory peer has a good PeerList for the entire query, this information is returned to the query initiator together with the per-term PeerList.
- This causes no additional communication cost and gives the query initiator the best available information on all individual terms as well as on the entire query.
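
A minimal sketch of this piggybacking variant follows. It assumes each directory peer stores, next to its per-term PeerLists, any PeerLists it has collected for term combinations, keyed by the set of terms; `DirectoryPeer`, `route`, and the `directory_for` callback (standing in for the DHT lookup) are hypothetical names.

```python
class DirectoryPeer:
    def __init__(self):
        self.term_lists = {}   # term -> PeerList
        self.combo_lists = {}  # frozenset of terms -> PeerList

    def lookup(self, term, full_query):
        """Answer a per-term request; attach a whole-query PeerList if known."""
        combo = self.combo_lists.get(frozenset(full_query))
        return self.term_lists.get(term, []), combo


def route(query_terms, directory_for):
    per_term, whole_query, messages = {}, None, 0
    for term in query_terms:
        # One request per term, exactly as in plain per-term routing.
        term_list, combo = directory_for(term).lookup(term, query_terms)
        messages += 1
        per_term[term] = term_list
        if combo is not None:
            whole_query = combo
    return per_term, whole_query, messages


d1, d2 = DirectoryPeer(), DirectoryPeer()
d1.term_lists["p2p"] = ["peer-a", "peer-b"]
d1.combo_lists[frozenset(["p2p", "search"])] = ["peer-b"]
d2.term_lists["search"] = ["peer-c"]
owners = {"p2p": d1, "search": d2}
per_term, whole_query, messages = route(["p2p", "search"], owners.__getitem__)
```

The message count equals the number of query terms, which matches the slide's point that the whole-query PeerList arrives without extra communication.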

Conclusion
- Research efforts in the area of P2P content search are driven by the desire to "give the Web back to the people".
- This paper has explored the theme of leveraging the "power of users" in a P2P Web search engine.
- Observing user and community behavior is one potential key to better search result quality.

References
- Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, and Christian Zimmer, "Minerva: Collaborative P2P Search", in VLDB, 2005.
- Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, and Christian Zimmer, "P2P Content Search: Give the Web Back to the People".
- Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, and Christian Zimmer, "Improving Collection Selection with Overlap-Awareness", in SIGIR, 2005.