Relevance Feedback & Query Expansion
Sampath Jayarathna, Cal Poly Pomona
Credit for some of the slides in this lecture goes to Prof. Ray Mooney (UT Austin) and Prof. Rong Jin (MSU).

Queries and Information Needs
- An information need is the underlying cause of the query that a person submits to a search engine; it is generally related to a task.
- A query can be a poor representation of the information need:
  - Users may find it difficult to express the information need.
  - Users are encouraged to enter short queries, both by the search engine interface and by the fact that long queries often don't work well.

Paradox of IR
- If the user knew exactly what to ask, there would often be no work left to do: "The need to describe that which you do not know in order to find it" (Roland Hjerppe).
- It may be difficult to formulate a good query when you don't know the collection well, but it is easy to judge particular documents returned by a query.

Interaction
- Interaction is a key aspect of effective retrieval:
  - Users can't change the ranking algorithm, but they can change the results through interaction.
  - Interaction helps refine the description of the information need.
- Interaction with the system occurs during query formulation and reformulation, and while browsing the results.
- The system can help with query refinement, either fully automatically or with the user in the loop.

Global vs. Local Methods
- Global methods (independent of the results of any particular query):
  - Query expansion/reformulation with a thesaurus or WordNet
  - Query expansion via automatic thesaurus generation
  - Techniques like spelling correction
- Local methods (based on the results of the initial query):
  - Relevance feedback
  - Pseudo relevance feedback (blind relevance feedback)

Relevance Feedback
- After initial retrieval results are presented, allow the user to provide feedback on the relevance of one or more of the retrieved documents.
- The user identifies relevant (and possibly non-relevant) documents in the initial result list.
- The system uses this feedback to reformulate the query, adding terms from the relevant documents, and re-ranks the documents.
- New results are produced from the reformulated query, allowing a more interactive, multi-pass process.

Relevance Feedback Architecture
[Diagram: the user's query string is reformulated into a revised query; the IR system ranks documents from the document corpus; the user marks documents in the ranked list as relevant or non-relevant, and this feedback flows back into query reformulation, producing a re-ranked list.]

Query Reformulation
Revise the query to account for feedback:
- Query expansion: add new terms to the query from relevant documents.
- Term reweighting: increase the weight of terms that appear in relevant documents and decrease the weight of terms that appear in non-relevant documents.

Query Reformulation
Change the query vector using vector algebra:
- Add the vectors for the relevant documents to the query vector.
- Subtract the vectors for the non-relevant documents from the query vector.
This both adds positively and negatively weighted new terms to the query and reweights the initial query terms.

Relevance Feedback in the Vector Model
[Diagram series: documents D1-D6 and a query plotted in a three-term vector space with axes Java, Microsoft, and Starbucks; successive slides illustrate how feedback moves the query vector toward the documents marked relevant and away from those marked non-relevant.]

Relevance Feedback in Vector Space (Rocchio)
- Goal: move the new query closer to the relevant documents and, at the same time, away from the non-relevant documents.
- Approach: the new query is a weighted combination of the original query and the relevant and non-relevant document vectors. Since the full set of relevant documents is unknown, use the known relevant set R and non-relevant set NR, and include the initial query Q:

  Q_{new} = \alpha Q + \frac{\beta}{|R|} \sum_{d \in R} d - \frac{\gamma}{|NR|} \sum_{d \in NR} d

- \alpha weights the original query, \beta weights the relevant documents, and \gamma weights the non-relevant documents.
- R: the set of known relevant docs; |R|: the number of relevant docs.
- NR: the set of known non-relevant docs; |NR|: the number of non-relevant docs.
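A minimal sketch of this update in Python (the function name, the toy vectors, and the defaults α=1.0, β=0.75, γ=0.15 are illustrative assumptions, though those defaults are common in the literature):

```python
import numpy as np

def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward the centroid of the relevant
    documents and away from the centroid of the non-relevant documents."""
    new_query = alpha * query
    if len(rel_docs) > 0:
        new_query = new_query + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs) > 0:
        new_query = new_query - gamma * np.mean(nonrel_docs, axis=0)
    # Negative term weights are usually clipped to zero.
    return np.maximum(new_query, 0.0)

# Toy 3-term vocabulary: (java, microsoft, starbucks)
q  = np.array([1.0, 0.0, 0.0])              # original query: "java"
R  = np.array([[0.9, 0.1, 0.0],             # documents marked relevant
               [0.8, 0.0, 0.2]])
NR = np.array([[0.1, 0.9, 0.0]])            # document marked non-relevant
print(rocchio(q, R, NR))                    # query shifts toward the R centroid
```

Clipping negative weights keeps the reformulated query usable with standard inverted-index scoring, which typically assumes non-negative term weights.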

Evaluating Relevance Feedback
- By construction, the reformulated query will rank explicitly marked relevant documents higher and explicitly marked non-relevant documents lower.
- The method should not get credit for improvement on these documents, since it was told their relevance. In machine learning, this error is called "testing on the training data."
- Evaluation should focus on generalization to other, unrated documents.

Fair Evaluation of Relevance Feedback
- Remove from the corpus any documents for which feedback was provided.
- Measure recall/precision performance on the remaining residual collection.
- Compared to the complete corpus, the absolute recall/precision numbers may decrease, since relevant documents were removed.
- However, relative performance on the residual collection provides fair data on the effectiveness of relevance feedback.
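A sketch of the residual-collection measurement (the document IDs and the choice of Precision@k are illustrative):

```python
def residual_precision_at_k(ranking, relevant, feedback_docs, k=10):
    """Precision@k on the residual collection: documents already judged
    during feedback are removed from the ranking before scoring."""
    residual = [d for d in ranking if d not in feedback_docs]
    hits = sum(1 for d in residual[:k] if d in relevant)
    return hits / k

# doc1 and doc3 were judged during feedback, so they are excluded
ranking = ["doc1", "doc4", "doc3", "doc7", "doc2"]
print(residual_precision_at_k(ranking,
                              relevant={"doc4", "doc2"},
                              feedback_docs={"doc1", "doc3"},
                              k=3))  # 2/3 of the residual top-3 are relevant
```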

Relevance Feedback Problems
- Users are sometimes reluctant to provide explicit feedback, so relevance feedback is not used in many applications.
- There are reliability issues, especially with queries that don't retrieve many relevant documents.
- Some applications do use relevance feedback: document filtering, "more like this" features.
- Query suggestion is more popular; it may be less accurate, but it can work even when the initial query fails.

Pseudo Relevance Feedback
- What if users only mark relevant documents?
- What if users only mark non-relevant documents?
- What if users do not provide any relevance judgments?

Pseudo Relevance Feedback
- What if users only mark relevant documents? Assume the bottom-ranked documents are non-relevant.
- What if users only mark non-relevant documents? Let the query itself stand in as the relevant document.
- What if users do not provide any relevance judgments? Treat the top-ranked documents as relevant and the bottom-ranked documents as non-relevant.
- Implicit relevance feedback: use user click-through data as implicit judgments.
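A blind-feedback sketch along the same lines, assuming document vectors stored as rows of a NumPy matrix and a ranking given as row indices sorted by score (the top_k/bottom_k cutoffs and parameter values are illustrative):

```python
import numpy as np

def pseudo_feedback(query, doc_vectors, ranking, top_k=10, bottom_k=10,
                    alpha=1.0, beta=0.75, gamma=0.15):
    """Blind feedback: with no user judgments, assume the top-ranked
    documents are relevant and the bottom-ranked are non-relevant."""
    rel    = doc_vectors[ranking[:top_k]]      # pseudo-relevant set
    nonrel = doc_vectors[ranking[-bottom_k:]]  # pseudo-non-relevant set
    new_query = (alpha * query
                 + beta * rel.mean(axis=0)
                 - gamma * nonrel.mean(axis=0))
    return np.maximum(new_query, 0.0)          # clip negative weights
```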

Pseudo Feedback Architecture
[Diagram: the same loop as the relevance feedback architecture, except that no user judgments are collected; the top-ranked documents from the initial ranking are fed back automatically as pseudo-relevant to drive query reformulation and re-ranking.]

Pseudo Feedback Results
- Pseudo feedback has been found to improve performance on the TREC ad hoc retrieval task.
- It works even better if the top documents must also satisfy additional Boolean constraints in order to be used in feedback.

Thesaurus
A thesaurus provides information on synonyms and semantically related words and phrases.
Example: physician
- syn: croaker, doc, doctor, MD, medical, mediciner, medico, sawbones
- rel: medic, general practitioner, surgeon

Thesaurus-based Query Expansion
- For each term t in a query, expand the query with synonyms and related words of t from the thesaurus.
- Added terms may be weighted less than the original query terms.
- Generally increases recall, but may significantly decrease precision, particularly with ambiguous terms: "interest rate" → "interest rate fascinate evaluate"
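A sketch of this expansion with down-weighted added terms; the toy thesaurus below is invented for illustration and deliberately reproduces the sense-ambiguity problem from the example above:

```python
# Hypothetical toy thesaurus; a real system would use a curated resource.
THESAURUS = {
    "physician": ["doctor", "doc", "medico"],
    "interest":  ["fascinate"],   # wrong sense for "interest rate"!
    "rate":      ["evaluate"],
}

def expand_query(terms, weight=0.5):
    """Return (term, weight) pairs: original terms at weight 1.0,
    thesaurus synonyms at a lower weight."""
    expanded = [(t, 1.0) for t in terms]
    for t in terms:
        expanded += [(syn, weight) for syn in THESAURUS.get(t, [])]
    return expanded

print(expand_query(["interest", "rate"]))
# [('interest', 1.0), ('rate', 1.0), ('fascinate', 0.5), ('evaluate', 0.5)]
```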

WordNet
- A more detailed database of semantic relationships between English words, developed by the cognitive psychologist George Miller and a team at Princeton University.
- Contains about 144,000 English words: nouns, adjectives, verbs, and adverbs grouped into about 109,000 synonym sets called synsets.
- https://wordnet.princeton.edu/

WordNet Synset Relationships
- Antonym: front → back
- Attribute: benevolence → good (noun to adjective)
- Pertainym: alphabetical → alphabet (adjective to noun)
- Similar: unquestioning → absolute
- Cause: kill → die
- Entailment: breathe → inhale
- Holonym: chapter → text (part to whole)
- Meronym: computer → cpu (whole to part)
- Hyponym: plant → tree (specialization)
- Hypernym: apple → fruit (generalization)

WordNet Query Expansion
- Add synonyms from the same synset.
- Add hyponyms to add specialized terms.
- Add hypernyms to generalize the query.
- Add other related terms to expand the query.
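A sketch of these expansions using NLTK's WordNet interface (requires the `nltk` package and a one-time `nltk.download('wordnet')`; how aggressively to expand, and for which senses, is an application choice):

```python
from nltk.corpus import wordnet as wn  # run nltk.download('wordnet') first

def wordnet_expand(term):
    """Collect synonyms, hyponyms (specializations), and hypernyms
    (generalizations) across every WordNet sense of the term."""
    expansions = set()
    for synset in wn.synsets(term):
        expansions.update(l.name() for l in synset.lemmas())    # synonyms
        for hypo in synset.hyponyms():
            expansions.update(l.name() for l in hypo.lemmas())  # narrower
        for hyper in synset.hypernyms():
            expansions.update(l.name() for l in hyper.lemmas()) # broader
    expansions.discard(term)
    return sorted(expansions)

print(wordnet_expand("apple")[:10])
```

Because all senses of the term contribute expansions, this naive version inherits the ambiguity problem noted above; practical systems usually disambiguate or weight expansions by sense.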

Context and Personalization
- If a query has the same words as another query, the results will be the same regardless of:
  - who submitted the query
  - why the query was submitted
  - where the query was submitted
  - what other queries were submitted in the same session
- These other factors (the context) could have a significant impact on relevance, but are difficult to incorporate into ranking.

User Models
- Generate user profiles based on documents that the person looks at, such as web pages visited, email messages, or word-processing documents on the desktop.
- Modify queries using words from the profile.
- Generally not effective: profiles are imprecise, and information needs can change significantly.

Query Logs
- Query logs provide important contextual information that can be used effectively. Context in this case includes:
  - previous queries that are the same
  - previous queries that are similar
  - query sessions that include the same query
- Query history for individuals could also be used for caching.

Local Search
- Location is context.
- Local search uses geographic information to modify the ranking of search results:
  - location derived from the query text
  - location of the device where the query originated
- e.g., the query "cpp", which issued near Pomona presumably refers to Cal Poly Pomona.

Local Search
- Identify the geographic region associated with web pages: use location metadata, or automatically identify locations such as place names, city names, or country names in the text.
- Identify the geographic region associated with the query: 10-15% of queries contain some location reference.
- Rank web pages using location information in addition to text- and link-based features.
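One way such a combination might look; the linear blend, the exponential distance decay, and all parameter values below are illustrative assumptions, not a description of any production ranker:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def local_score(text_score, page_loc, query_loc, lam=0.7, scale_km=50.0):
    """Blend text relevance with a distance decay: nearby pages get a boost."""
    dist = haversine_km(*page_loc, *query_loc)
    geo_score = math.exp(-dist / scale_km)  # 1.0 at the query location
    return lam * text_score + (1 - lam) * geo_score
```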

Query Expansion Conclusions
- Expanding queries with related terms can improve performance, particularly recall.
- However, the expansion terms must be selected very carefully to avoid problems such as loss of precision.