A Hybrid Search Engine -- Combining Google and P2P Xuanhui Wang.

What's wrong with Google?
– It is unlikely to index everything of interest (the deep web).
– It is infeasible to run expensive algorithms on 8 billion documents.
– It is difficult to incorporate human knowledge.

Peer-to-peer search: Approach 0
– Each peer has a local crawler and index.
– Nobody posts any information about local indices.
– Search can only be done by (limited) flooding.
– There is no way to know in advance where to find information.
– Recall is very low for unpopular queries.
[Figure labels: "Matrix factorization", "Relevant nerd"]
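
To make the cost of Approach 0 concrete, here is a minimal sketch of limited flooding, assuming a breadth-first forward with a hop limit (TTL). The function name `flood_search`, the toy `network` topology, and the `local_index` contents are all hypothetical and are not taken from the slides.

```python
from collections import deque


def flood_search(network, local_index, start, term, ttl):
    """Flood a query from `start`, forwarding it at most `ttl` hops.

    Returns the peers whose local index contains `term` and the number of
    messages sent. Illustrative only: no real P2P protocol is modelled.
    """
    visited = {start}
    queue = deque([(start, 0)])
    hits = []
    messages = 0
    while queue:
        peer, hops = queue.popleft()
        # Each peer can only answer from its own local index.
        if term in local_index.get(peer, set()):
            hits.append(peer)
        if hops == ttl:
            continue  # hop limit reached: do not forward any further
        for neighbor in network.get(peer, []):
            messages += 1  # a forward costs a message even if the peer was already reached
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return hits, messages


# Hypothetical five-peer network; only peer "e" holds the unpopular term.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
local_index = {
    "a": {"p2p"},
    "b": {"search"},
    "e": {"matrix factorization"},
}

print(flood_search(network, local_index, "a", "matrix factorization", ttl=2))
print(flood_search(network, local_index, "a", "matrix factorization", ttl=3))
```

With a TTL of 2 the query never reaches peer "e", so an unpopular term is missed. Raising the TTL finds it, but only by sending more messages.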

P2P search
Other methods have been proposed (see I. Weber 2004). What's wrong with them?
– The protocols for coordinating the peers are too complicated.
– They generate too much data traffic and communication.
– They are slow.

Hybrid: a possible solution
Combine Google and P2P:
– Google indexes all the peer machines (but how?).
– Each peer machine has a local index.
– At query time, Google selects the “appropriate” peers and sends them the query.
– Finally, Google merges all the results.
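
The select, query, and merge steps can be sketched as follows. The slides leave open how the central engine learns about the peers, so this sketch assumes each peer reports a plain set of the terms it holds. The names `select_peers`, `query_peer`, and `hybrid_search`, the term-overlap scoring, and the toy data are all hypothetical.

```python
def select_peers(peer_summaries, query_terms, k):
    """Central step: keep the k peers whose term summary covers most query terms."""
    scored = []
    for peer, summary in peer_summaries.items():
        overlap = len(set(query_terms) & summary)
        if overlap > 0:
            scored.append((overlap, peer))
    # Highest overlap first; ties are broken by peer name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [peer for _, peer in scored[:k]]


def query_peer(peer_index, query_terms):
    """Peer step: score each local document by the number of query terms it contains."""
    results = []
    for doc, terms in peer_index.items():
        score = len(set(query_terms) & terms)
        if score:
            results.append((doc, score))
    return results


def hybrid_search(peer_summaries, peer_indices, query_terms, k=2):
    """Select peers centrally, query only those peers, then merge their results."""
    merged = {}
    for peer in select_peers(peer_summaries, query_terms, k):
        for doc, score in query_peer(peer_indices[peer], query_terms):
            merged[doc] = max(score, merged.get(doc, 0))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical local indices: document name -> set of terms.
peer_indices = {
    "p1": {"p1/notes.txt": {"matrix", "factorization", "svd"}},
    "p2": {"p2/recipes.txt": {"pasta", "sauce"}},
    "p3": {
        "p3/intro.txt": {"matrix", "algebra"},
        "p3/mf.pdf": {"matrix", "factorization"},
    },
}
# Assumed summary format: the union of all terms a peer holds.
peer_summaries = {
    peer: set().union(*docs.values()) for peer, docs in peer_indices.items()
}

print(hybrid_search(peer_summaries, peer_indices, ["matrix", "factorization"]))
```

Only the selected peers receive the query, which is where the traffic saving over flooding would come from. The merge here compares raw local scores, which is a simplification: scores computed on different peers are not necessarily comparable.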

Hybrid: a possible solution
Benefits:
– Efficient compared to pure P2P search
– May overcome Google’s drawbacks
Challenges:
– Google’s PageRank benefits from the large scale of its indexed documents. How can it be adapted to the hybrid system?
– How does Google collaborate with the peer machines? How can the peer machines benefit from Google’s PageRank?
Would you agree to fund this with $10M?

References
I. Weber et al. (2004). Concept-based P2P Search. sb.mpg.de/~iweber/peer-to-peer/Concept-based%20P2P%20Search.ppt
Inspired by a discussion with Shui-Lung Chuang.