Query Expansion.


Outline
- Motivation
- Definition
- Issues Involved
- Existing Techniques
  - Global: Thesauri / WordNet, Automatically Generated Thesauri
  - Local: Relevance Feedback, Pseudo Relevance Feedback
- Conclusion

Problem with Keywords
- May not retrieve relevant documents that use synonymous terms: "restaurant" vs. "cafe", "India" vs. "Bharat"
- May retrieve irrelevant documents that contain ambiguous terms: "bat" (baseball vs. mammal), "Apple" (company vs. fruit), "bit" (unit of data vs. past tense of bite)

Why Search Engines Fail to Retrieve Relevant Documents
- Users do not give a sufficient number of keywords
- Users do not give good keywords
- Vocabulary gaps, and the diversity and vastness of the web
- Lack of domain knowledge
Solution: automatically generate a new query that is better than the initial query.

Query Expansion
Definition: adding more terms ("keyword spices") to a user's basic query.
Goal: to improve precision and/or recall.
Example:
  User query: car
  Expanded query: car, cars, automobile, automobiles, auto, etc.

Naïve Methods
- Finding synonyms of query terms and searching for the synonyms as well
- Finding morphological variants of query words by stemming each word in the query
- Fixing spelling errors and automatically searching for the corrected forms
- Re-weighting the terms in the original query
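The first two naïve methods can be sketched in a few lines. The synonym table and the crude suffix-stripping "stemmer" below are illustrative stand-ins, not a real thesaurus or a real stemmer such as Porter's.

```python
# Toy synonym table; a real system would use a thesaurus or WordNet.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "restaurant": ["cafe"],
}

def crude_stem(word):
    """Strip a trailing plural 's' (a toy stand-in for a real stemmer)."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def expand_query(terms):
    """Add stemmed variants and synonyms of each query term."""
    expanded = []
    for t in terms:
        stem = crude_stem(t.lower())
        for variant in (t.lower(), stem):
            if variant not in expanded:
                expanded.append(variant)
        for syn in SYNONYMS.get(stem, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

print(sorted(expand_query(["cars"])))
```

For the query "cars" this yields the original term plus its stem "car" and the synonyms of the stem.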

Query Expansion Issues
Two major issues:
- Which terms to include?
- Which terms to weight more?
Concept-based versus term-based QE: is it better to expand based on the individual terms in the query, or on the overall concept of the query?

Objective
To find a proper set of words that improves precision when added to the basic search query, without losing a considerable amount of recall.

Existing QE Techniques
Global methods (static; use all documents in the collection):
- Thesaurus-based expansion (manual thesauri or WordNet)
- Automatic thesaurus generation
Local methods (dynamic; analyze the documents in the result set):
- Relevance feedback
- Pseudo relevance feedback

Global Analysis

Thesaurus-Based QE
For each term t in a query, expand the query with synonyms and related words of t from the thesaurus, e.g. feline → feline cat.
Added terms may be weighted less than the original query terms.
Generally increases recall, but may significantly decrease precision, particularly with ambiguous terms: "interest rate" → "interest rate fascinate evaluate".
Manually producing a thesaurus is expensive, and so is keeping it up to date as the field changes.
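A minimal sketch of thesaurus-based expansion with down-weighted added terms. The thesaurus entries (built from the slide's own examples) and the 0.5 weight are illustrative assumptions.

```python
# Toy thesaurus built from the examples above.
THESAURUS = {
    "feline": ["cat"],
    "interest": ["fascinate"],
    "rate": ["evaluate"],
}

def expand_with_weights(query_terms, added_weight=0.5):
    """Return {term: weight}: original terms get 1.0, thesaurus terms less."""
    weighted = {t: 1.0 for t in query_terms}
    for t in query_terms:
        for related in THESAURUS.get(t, []):
            # Don't overwrite an original query term's weight.
            weighted.setdefault(related, added_weight)
    return weighted

print(expand_with_weights(["interest", "rate"]))
```

This reproduces the precision hazard noted above: "interest rate" picks up "fascinate" and "evaluate" because the thesaurus cannot see that the query is about finance.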

Automatically Generated Thesauri
Attempt to generate a thesaurus automatically by analyzing the document collection.
Two main approaches:
- Co-occurrence based: words that frequently co-occur are more likely to be similar.
- Shallow analysis of grammatical relations: e.g., entities that are grown, cooked, eaten, and digested are more likely to be food items.
Co-occurrence-based methods are more robust; grammatical relations are more accurate.
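The co-occurrence approach can be sketched with document-level co-occurrence and Jaccard similarity; the tiny "collection" below is illustrative, and a real system would use much larger windows and corpora.

```python
# Each document is modeled as a set of terms.
docs = [
    {"car", "engine", "road"},
    {"automobile", "engine", "wheel"},
    {"car", "automobile", "dealer"},
    {"apple", "fruit"},
]

def jaccard(a, b, docs):
    """Jaccard similarity of the document sets containing terms a and b."""
    da = {i for i, d in enumerate(docs) if a in d}
    db = {i for i, d in enumerate(docs) if b in d}
    return len(da & db) / len(da | db) if da | db else 0.0

# Rank candidate expansion terms for "car" by co-occurrence similarity
# (alphabetical pre-sort makes ties deterministic).
vocab = set().union(*docs) - {"car"}
ranked = sorted(sorted(vocab), key=lambda t: jaccard("car", t, docs),
                reverse=True)
print(ranked[:3])
```

Terms that share documents with "car" ("dealer", "road", "automobile") rank high; unrelated terms ("apple", "fruit") score zero, which is the sense in which co-occurring words are "more likely to be similar".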

Example

Semantic Network / WordNet
To expand a query, find each query word in the semantic network and follow the various arcs to other related words.
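Following arcs in a semantic network amounts to a bounded graph walk. The hand-built network below is a stand-in for WordNet; each edge links a word to related words (synonyms, hypernyms, and so on), and the two-hop limit is an assumption.

```python
# Toy semantic network: word -> related words reachable via one arc.
NETWORK = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "auto"],
    "vehicle": ["conveyance"],
}

def expand(word, max_hops=2):
    """Breadth-first walk up to max_hops arcs from the query word."""
    frontier, seen = [word], {word}
    for _ in range(max_hops):
        frontier = [n for w in frontier for n in NETWORK.get(w, [])
                    if n not in seen]
        seen.update(frontier)
    return seen

print(sorted(expand("car")))
```

Limiting the number of hops matters: walking too far drifts to loosely related words and hurts precision, the same hazard as with manual thesauri.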

Global Methods: Summary
Pros:
- Thesauri and semantic networks (WordNet) can be used to find good terms for a user's "more like this" request.
Cons:
- Little improvement has been found from automatic expansion without user intervention.
Overall, global methods are not as useful as relevance feedback, though they may be about as good as pseudo relevance feedback.

Local Analysis

Relevance Feedback
Relevance feedback: user feedback on the relevance of documents in an initial set of results.
1. The user issues a (short, simple) query.
2. The user marks returned documents as relevant or non-relevant.
3. The system computes a better representation of the information need based on the feedback.
Relevance feedback can go through one or more iterations.
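One standard way to "compute a better representation of the information need" is the Rocchio update (not named on the slide). The sketch below uses plain {term: weight} dicts as vectors; the alpha/beta/gamma values are conventional defaults, not taken from the slides.

```python
def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant docs and away from non-relevant ones."""
    new_q = {t: alpha * w for t, w in query.items()}
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for t, w in doc.items():
                new_q[t] = new_q.get(t, 0.0) + coef * w / len(docs)
    # Negative weights are usually clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

q = {"space": 1.0, "satellite": 1.0}
rel = [{"satellite": 2.0, "nasa": 1.0}]
nonrel = [{"telecom": 1.0}]
print(rocchio(q, rel, nonrel))
```

Terms from the marked-relevant document ("nasa") enter the query with positive weight, while terms that appear only in non-relevant documents are pushed to zero and dropped, mirroring the expanded-query example below.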

Relevance Feedback Example: Initial Query and Top 8 Results
Query: new space satellite applications
("+" marks documents the user judged relevant.)
+ 1. 0.539, 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
+ 2. 0.533, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
  3. 0.528, 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
  4. 0.526, 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
  5. 0.525, 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
  6. 0.524, 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
  7. 0.516, 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
+ 8. 0.509, 12/02/87, Telecommunications Tale of Two Companies

Relevance Feedback Example: Expanded Query (term weights)
2.074 new
15.106 space
30.816 satellite
5.660 application
5.991 nasa
5.196 eos
4.196 launch
3.972 aster
3.516 instrument
3.446 arianespace
3.004 bundespost
2.806 ss
2.790 rocket
2.053 scientist
2.003 broadcast
1.172 earth
0.836 oil
0.646 measure

Top 8 Results After Relevance Feedback
+ 1. 0.513, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
+ 2. 0.500, 08/13/91, NASA Hasn't Scrapped Imaging Spectrometer
  3. 0.493, 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
  4. 0.493, 07/31/89, NASA Uses 'Warm' Superconductors For Fast Circuit
+ 5. 0.492, 12/02/87, Telecommunications Tale of Two Companies
  6. 0.491, 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
  7. 0.490, 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
  8. 0.490, 06/14/90, Rescue of Satellite By Space Agency To Cost $90 Million

Relevance Feedback: Problems
Why do most search engines not use relevance feedback?
- Users are often reluctant to provide explicit feedback.
- It is often harder to understand why a particular document was retrieved after relevance feedback has been applied.

Pseudo Relevance Feedback (Automatic Local Analysis)
Pseudo relevance feedback attempts to automate the manual part of relevance feedback:
1. Retrieve an initial set of documents.
2. Assume the top m ranked documents are relevant.
3. Do relevance feedback as usual.
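The three steps above can be sketched directly: pool the terms of the top-m ranked documents, take the k heaviest terms not already in the query, and append them. Documents are {term: weight} dicts, and m and k are assumed parameters.

```python
from collections import Counter

def pseudo_feedback(query_terms, ranked_docs, m=2, k=3):
    """Expand the query with the k top terms of the top-m ranked docs."""
    pool = Counter()
    for doc in ranked_docs[:m]:          # assume top-m docs are relevant
        pool.update(doc)
    for t in query_terms:                # don't re-add original terms
        pool.pop(t, None)
    added = [t for t, _ in pool.most_common(k)]
    return query_terms + added

ranked = [
    {"space": 2.0, "satellite": 3.0, "nasa": 2.5, "launch": 1.0},
    {"satellite": 1.5, "eos": 2.0, "nasa": 0.5},
    {"telecom": 4.0},
]
print(pseudo_feedback(["space", "satellite"], ranked))
```

No user judgments are needed, but the method inherits the obvious risk: if the top-m documents are not actually relevant (like the "telecom" document here, had m been 3), the query drifts toward the wrong topic.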

Future Work
Not much work has been done that exhaustively exploits measures of semantic relatedness and similarity for expanding queries. I propose to approach the problem of query expansion through semantic relatedness.