A review on “Answering Relationship Queries on the Web” Bhushan Pendharkar ASU ID 993934582.

Problem statement
- Existing search engines excel at keyword matching and document ranking, but they cannot answer relationship queries.
- Given two entities E1 and E2, a Web search engine returns top-ranked pages that do not show any relationship between E1 and E2.
- The paper focuses on finding the relationship between two entities given as queries: it retrieves the top-ranked Web pages for each query and matches them to form a list of Web page pairs.
- Connecting terms are used to determine the relationship and to rank the Web page pairs.
- The paper attempts to overcome this shortcoming of current search engines by providing a system and interface for relationship queries.
- The proposed system depends on the Google search engine.

Solution Proposed
- The system accepts two entities, E1 and E2, as queries through its interface.
- The top-ranked pages for E1 and for E2 are retrieved separately from a search engine such as Google.
- These pages are preprocessed: HTML tags are eliminated, stop words are removed, words are stemmed (Porter stemmer), and irrelevant words are eliminated (noise removal).
- For each term ‘t’ common to a page pair, a term weight is calculated that reflects how strongly ‘t’ indicates a relationship between P1 and P2 (P1 is a result page for query E1, P2 a result page for query E2).
- Connecting terms are the terms with the highest term weights.
- Cosine similarity (Okapi method) is used to calculate the similarity between P1 and P2, with P1 and P2 replacing ‘document’ and ‘query’ respectively.
- The Web page pairs are sorted in descending order of similarity (or weight) and displayed along with the connecting terms for each pair.
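
The steps above can be sketched in Python as a rough illustration, not the paper's implementation. Assumptions: the page strings stand in for Google results; the stop-word list is a tiny hypothetical one; stemming is left out; noise removal is assumed to mean dropping one-character tokens; and the weight of a common term t (the smaller term frequency in the pair times IDF over all retrieved pages) is a hypothetical choice, since the exact weighting formula is not given here. The helper names `preprocess` and `rank_pairs` are likewise hypothetical. Pairs are ranked by the summed weight of their top-k connecting terms.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical, deliberately tiny stop-word list; a real system would use a full one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "on", "to", "for", "was"}


def preprocess(html: str) -> list[str]:
    """Strip HTML tags, lowercase, tokenize, and drop stop words and noise tokens."""
    text = re.sub(r"<[^>]+>", " ", html)              # elimination of HTML tags
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # simple alphanumeric tokenizer
    # Stemming (e.g. the Porter stemmer) would go here; it is omitted in this sketch.
    # Assumed noise rule: drop stop words and one-character tokens.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def rank_pairs(pages1: list[str], pages2: list[str], k: int = 3):
    """Pair every E1 page with every E2 page; rank pairs by connecting-term weight."""
    docs1 = [Counter(preprocess(p)) for p in pages1]
    docs2 = [Counter(preprocess(p)) for p in pages2]
    n = len(docs1) + len(docs2)
    # Document frequency of each term over all retrieved pages.
    df = Counter(t for doc in docs1 + docs2 for t in doc)
    ranked = []
    for (i, d1), (j, d2) in product(enumerate(docs1), enumerate(docs2)):
        common = d1.keys() & d2.keys()
        # Hypothetical weight of a common term t: smaller TF in the pair times IDF.
        weights = {t: min(d1[t], d2[t]) * math.log(n / df[t]) for t in common}
        # Connecting terms: the k common terms with the highest weights.
        connecting = sorted(weights, key=lambda t: (-weights[t], t))[:k]
        ranked.append((sum(weights[t] for t in connecting), i, j, connecting))
    # Descending order of score (ties broken by page indices).
    ranked.sort(key=lambda r: (-r[0], r[1], r[2]))
    return ranked


# Illustrative input: hypothetical page snippets standing in for search results.
pages_e1 = ["<p>Marie Curie studied radium in Paris</p>", "<p>Curie won the Nobel Prize</p>"]
pages_e2 = ["<p>Pierre Curie worked in Paris on radium</p>", "<p>The Sorbonne is in Paris</p>"]
for score, i, j, terms in rank_pairs(pages_e1, pages_e2):
    print(f"P1[{i}] + P2[{j}]: {score:.3f} {terms}")
```

With this toy input, the pair of the two pages that both mention radium, Curie and Paris comes first, and its connecting terms are listed alongside it, mirroring the displayed output the slide describes.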

Criticism of the solution
- Assumption: the top-ranked pages for E1 and the top-ranked pages for E2 do not contain any relationship between E1 and E2. No ground truth is provided for this assumption, and the opposite might be true.
- The relationship between entities E1 and E2 is summarized by an arbitrary term ‘Ec’, but the paper does not explain ‘Ec’.
- The system performs few processing tasks of its own and depends heavily on Google's results. If Google's results are imperfect or incorrect (rare, but possible), the system fails; the paper explicitly mentions that its results change when Google's results vary.
- The standard Porter stemmer is used, and it is imperfect: for example, “ignition” is stemmed to “ignit” and “Monday” to “Mondai”.
- The paper concludes with an unnecessary explanation of how the results change when the steps of the proposed approach are eliminated one at a time, even though all steps are necessary for the system to work properly.
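
To show where stems like “ignit” and “mondai” come from, the sketch below implements only two rules of the original Porter algorithm: Step 1c (a final y becomes i when the stem before it contains a vowel) and the -ion rule of Step 4 (removed when the stem's measure m is greater than 1 and the stem ends in s or t). It is not a full stemmer, and the function name `porter_two_rules` is hypothetical; the omitted steps happen not to change these two example words.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a, e, i, o, u, and 'y' only when not after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    kinds = ["c" if is_consonant(stem, i) else "v" for i in range(len(stem))]
    return sum(1 for a, b in zip(kinds, kinds[1:]) if a == "v" and b == "c")


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def porter_two_rules(word: str) -> str:
    """Apply only Step 1c and the -ION rule of Step 4 (NOT the full Porter stemmer)."""
    word = word.lower()
    # Step 1c: (*v*) Y -> I
    if word.endswith("y") and contains_vowel(word[:-1]):
        word = word[:-1] + "i"
    # Step 4: (m > 1 and (*S or *T)) ION -> (removed)
    if word.endswith("ion"):
        stem = word[:-3]
        if measure(stem) > 1 and stem.endswith(("s", "t")):
            word = stem
    return word


print(porter_two_rules("ignition"))  # ignit
print(porter_two_rules("Monday"))    # mondai
```

Both rules fire exactly as specified, which is why these outputs are a property of the algorithm rather than a bug in any one implementation.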

Relevance to IRM
- The paper is highly relevant to the topics taught in the course.
- Its crux is the similarity calculation between Web page pairs (P1, P2), for which cosine similarity is used.
- TF-IDF is used to determine the weights of the terms present in documents P1 and P2.
- Stemming is used to reduce words to their root forms.
- Web page pairs are ranked by their similarity values.
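
The TF-IDF weighting and cosine-similarity ranking named above can be sketched as follows. Assumptions: pages are already preprocessed into token lists, TF is the raw term count, and IDF is log(N/df) over the pooled E1 and E2 result pages; the paper's exact weighting may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(pages1: list[list[str]], pages2: list[list[str]]):
    """Score every (P1, P2) pair and sort in descending order of similarity."""
    vectors = tfidf_vectors(pages1 + pages2)   # IDF computed over both result sets
    v1, v2 = vectors[: len(pages1)], vectors[len(pages1):]
    pairs = [(cosine(a, b), i, j) for i, a in enumerate(v1) for j, b in enumerate(v2)]
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))


# Illustrative, already-preprocessed token lists (hypothetical data).
p1 = [["curie", "radium", "paris"], ["curie", "nobel", "prize"]]
p2 = [["curie", "paris", "radium", "pierre"], ["sorbonne", "paris"]]
for score, i, j in rank_by_cosine(p1, p2):
    print(f"P1[{i}] + P2[{j}]: {score:.3f}")
```

Plain cosine over TF-IDF has no tuning parameters, unlike Okapi BM25 (k1, b), which makes it a simple baseline for comparing against an Okapi-style score.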
