Relevance Feedback and other Query Modification Techniques. Course: Information Retrieval and Recommendation Techniques. Advisor: Prof. 黃三益. Presenters: 楊錦生 (d9142801) and 曾繁絹 (d9142803), first-year PhD students.

2 Introduction
Precision vs. recall: when high recall is critical to users, they have to retrieve more of the relevant documents. There are two ways to retrieve more:
1. “Expand” the search by broadening a narrow Boolean query or by looking further down a ranked list of retrieved documents.
2. Modify the original query.
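A small illustration of the trade-off: the sketch below computes precision and recall at several cutoffs of one ranked list. The ranking and relevance judgments are made-up data, and the function name is hypothetical.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall over the top-k documents of a ranked list."""
    retrieved = ranking[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments.
ranking = ["d3", "d1", "d7", "d9", "d2", "d4", "d8", "d5"]
relevant = {"d3", "d1", "d2", "d5"}

for k in (2, 4, 8):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"top {k}: precision={p:.2f}, recall={r:.2f}")
```

With these data, looking further down the list raises recall from 0.50 (top 2) to 1.00 (top 8), while precision falls from 1.00 to 0.50.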

3 Introduction (cont’d)
The “word mismatch” problem: some of the unretrieved relevant documents are indexed by a different set of terms than those in the query or in most of the other relevant documents. Two approaches for improving the initial query are relevance feedback and automatic query modification.

4 Conceptual Model of Relevance Feedback
Query → result set → user relevance feedback → new query based on the result set

5 Basic Ideas about Relevance Feedback
Relevance feedback has two components:
1. Reweighting the query terms based on the distribution of these terms in the relevant and nonrelevant documents retrieved in response to the query.
2. Changing the actual terms in the query.

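6 Basic Ideas about Relevance Feedback (cont’d)
Evaluation of relevance feedback: comparing the results after one iteration of feedback with those obtained without feedback generally shows spectacular improvement. Part of this gain is artificial, because the relevant documents the user has already judged move to the top of the feedback ranking. Another evaluation therefore compares only the residual collections, i.e., the collection with the documents already seen by the user removed.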
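To see why the residual-collection comparison is more conservative, the sketch below scores a made-up initial run and a made-up feedback run both ways. The function names and data are illustrative assumptions; only the idea of removing the already-judged documents before comparing comes from the slide.

```python
def residual(ranking, judged):
    """Remove the documents the user already saw and judged during feedback."""
    return [doc for doc in ranking if doc not in judged]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    top = ranking[:k]
    return sum(doc in relevant for doc in top) / len(top) if top else 0.0


# Hypothetical runs: the user judged the top 4 documents of the initial run.
relevant = {"d1", "d3", "d6", "d8"}
initial = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
judged = set(initial[:4])
feedback = ["d1", "d3", "d5", "d6", "d2", "d7", "d4", "d8"]

# Full-collection comparison: feedback gets credit for d1 and d3, already known relevant.
full = (precision_at_k(initial, relevant, 2), precision_at_k(feedback, relevant, 2))
# Residual-collection comparison: both runs are scored without the judged documents.
res = (
    precision_at_k(residual(initial, judged), relevant, 2),
    precision_at_k(residual(feedback, judged), relevant, 2),
)
print("full:", full, "residual:", res)
```

In this toy case the apparent improvement on the full collection (0.5 to 1.0) disappears on the residual collection (0.5 vs. 0.5), because the feedback run only re-ranked documents the user had already judged.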

7 Basic Approach to Relevance Feedback
Rocchio’s approach used the vector space model to rank documents.
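Rocchio’s formula is commonly written as Q1 = α·Q0 + (β/n1)·ΣR_i − (γ/n2)·ΣS_i, where R_i are the n1 relevant and S_i the n2 nonrelevant document vectors. The sketch below applies it to sparse term-weight dictionaries. The α, β, γ defaults and the clipping of negative weights to zero are common conventions assumed here, not values taken from this presentation.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query; all vectors are dicts mapping term -> weight."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    # Add the centroid of the relevant document vectors, scaled by beta.
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    # Subtract the centroid of the nonrelevant document vectors, scaled by gamma.
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # Assumed convention: drop terms whose weight is no longer positive.
    return {t: w for t, w in new_q.items() if w > 0}


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 1.0}, {"car": 1.0, "engine": 1.0}]
nonrelevant = [{"jaguar": 1.0, "cat": 1.0}]
print(rocchio(query, relevant, nonrelevant, beta=0.75, gamma=0.25))
```

The new query gains "car" and "engine" from the relevant documents, while "cat", which appears only in the nonrelevant document, ends up negative and is dropped.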

8 Ide developed three particular strategies extending Rocchio’s approach:
1. The basic Rocchio formula, minus the normalization for the number of relevant and nonrelevant documents.
2. Allowing feedback only from relevant documents.
3. Allowing limited negative feedback, from only the highest-ranked nonrelevant document.

9 Term Reweighting without Query Expansion
A probabilistic model proposed by Robertson and Sparck Jones (1976), where:
W_ij = the term weight for term i in query j
r = the number of relevant documents for query j that contain term i
R = the total number of relevant documents for query j
n = the number of documents in the collection that contain term i
N = the number of documents in the collection
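The Robertson and Sparck Jones relevance weight is usually written in terms of these quantities as W_ij = log[(r/(R − r)) / ((n − r)/(N − n − R + r))]. The sketch below computes it with 0.5 added to each count, a widely used adjustment that avoids division by zero and log(0) when R is small; treat that adjustment (the hypothetical `k` parameter) as an assumption.

```python
import math


def rsj_weight(r, R, n, N, k=0.5):
    """Robertson-Sparck Jones relevance weight for one term.

    r: relevant documents containing the term; R: relevant documents;
    n: documents containing the term; N: documents in the collection.
    k: additive adjustment (0.5 is customary; an assumption here).
    """
    odds_in_relevant = (r + k) / (R - r + k)
    odds_in_nonrelevant = (n - r + k) / (N - n - R + r + k)
    return math.log(odds_in_relevant / odds_in_nonrelevant)


# A term in 8 of 10 relevant documents but only 100 of 1,000 documents overall.
print(rsj_weight(8, 10, 100, 1000))
# A term absent from all relevant documents gets a negative weight.
print(rsj_weight(0, 10, 100, 1000))
```

A term that is no more common in the relevant set than in the collection as a whole gets a weight near zero.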

10 Term Reweighting without Query Expansion (cont’d)
Croft (1983) extended this weighting scheme with one formula for the initial search and another for feedback, where:
W_ijk = the term weight for term i in query j and document k
IDF_i = the IDF weight for term i in the entire collection
P_ij = the probability that term i is assigned within the set of relevant documents for query j
Q_ij = the probability that term i is assigned within the set of nonrelevant documents for query j
F_ik = K + (1 − K)(freq_ik / maxfreq_k)
freq_ik = the frequency of term i in document k
maxfreq_k = the maximum frequency of any term in document k
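A sketch of Croft’s two weights as they are usually reported: W_ijk = (C + IDF_i)·F_ik for the initial search and W_ijk = (C + log[P_ij(1 − Q_ij) / (Q_ij(1 − P_ij))])·F_ik after feedback. The constant C, the value of K, and the smoothed estimates of P_ij and Q_ij from relevance counts are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def f_ik(freq_ik, maxfreq_k, K=0.3):
    """Within-document factor F_ik = K + (1 - K) * freq_ik / maxfreq_k.
    K = 0.3 is an illustrative value, not one prescribed by the slides."""
    return K + (1 - K) * freq_ik / maxfreq_k


def initial_weight(idf_i, f, C=0.0):
    """Initial search: W_ijk = (C + IDF_i) * F_ik."""
    return (C + idf_i) * f


def feedback_weight(p_ij, q_ij, f, C=0.0):
    """Feedback: W_ijk = (C + log(P(1 - Q) / (Q(1 - P)))) * F_ik."""
    return (C + math.log(p_ij * (1 - q_ij) / (q_ij * (1 - p_ij)))) * f


def estimate_p_q(r, R, n, N):
    """Assumed smoothed estimates from relevance counts (r, R, n, N as on slide 9)."""
    p = (r + 0.5) / (R + 1)
    q = (n - r + 0.5) / (N - R + 1)
    return p, q


# Term i occurs 3 times in document k, whose most frequent term occurs 6 times.
f = f_ik(3, 6)
p, q = estimate_p_q(r=8, R=10, n=100, N=1000)
print(initial_weight(idf_i=math.log(1000 / 100), f=f))
print(feedback_weight(p, q, f))
```

When P_ij equals Q_ij the log term is zero, so after feedback a term that does not discriminate relevant from nonrelevant documents contributes only C·F_ik.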

11 Query Expansion
The query could be expanded by offering users a selection of terms: either the terms most closely related to the initial query terms (from a thesaurus), or a sorted list of terms from the relevant documents or from all retrieved documents.

12 Query Expansion (cont’d)
A list of terms from the relevant/nonrelevant documents is proposed using ranking methods; the user then selects from the top N terms, or the terms are added to the query automatically. The early SMART experiments both expanded the query and reweighted the query terms by adding the vectors of the relevant and nonrelevant documents.
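One simple way to produce such a ranked term list is sketched below: candidate terms are scored by the fraction of relevant documents containing them minus the fraction of nonrelevant documents containing them. This scoring rule, the function name, and the `top_n` cut-off are illustrative assumptions, not the ranking methods used in the SMART experiments.

```python
from collections import Counter


def rank_expansion_terms(query_terms, relevant_docs, nonrelevant_docs, top_n=5):
    """Return up to top_n new terms, best first. Documents are sets of terms."""
    rel_df = Counter(t for doc in relevant_docs for t in doc)
    non_df = Counter(t for doc in nonrelevant_docs for t in doc)
    n_rel = max(len(relevant_docs), 1)
    n_non = max(len(nonrelevant_docs), 1)
    # Score only terms that occur in relevant documents and are not in the query.
    scores = {
        t: rel_df[t] / n_rel - non_df[t] / n_non
        for t in rel_df
        if t not in query_terms
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, s in ranked if s > 0][:top_n]


query_terms = {"jaguar"}
relevant_docs = [{"jaguar", "car", "engine"}, {"jaguar", "car", "speed"}]
nonrelevant_docs = [{"jaguar", "cat", "speed"}]
print(rank_expansion_terms(query_terms, relevant_docs, nonrelevant_docs))
```

The list can either be shown to the user for selection or have its top entries appended to the query automatically.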

13 Query Expansion (cont’d)
Modification using terms in relevant/nonrelevant documents: any relevant document(s) can be used as a “new query” (Noreault, 1979). If no relevant documents are indicated, the term list shown to the user consists of related terms based on those previously sorted in the inverted file.

14 Query Expansion with Term Reweighting
Most relevance feedback and query expansion research has used both query expansion and term reweighting. Three of the most widely used feedback methods:
Ide regular

15 Query Expansion with Term Reweighting (cont’d)
Ide dec-hi
Standard Rocchio
In Ide dec-hi, S_i = the top-ranked nonrelevant document.
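The three methods are usually written as: Ide regular, Q1 = Q0 + ΣR_i − ΣS_i; Ide dec-hi, Q1 = Q0 + ΣR_i − S_i, with S_i the top-ranked nonrelevant document; and standard Rocchio, Q1 = Q0 + (β/n1)ΣR_i − (γ/n2)ΣS_i. The sketch below implements the two Ide variants on term-weight dictionaries; the function names are hypothetical and dropping non-positive weights is an assumed convention.

```python
from collections import defaultdict


def _add(acc, vec, sign):
    """Add sign * vec into the accumulator acc (term -> weight)."""
    for term, w in vec.items():
        acc[term] += sign * w


def ide_regular(query, relevant, nonrelevant):
    """Q1 = Q0 + sum of relevant vectors - sum of nonrelevant vectors (no normalization)."""
    new_q = defaultdict(float)
    _add(new_q, query, 1.0)
    for doc in relevant:
        _add(new_q, doc, 1.0)
    for doc in nonrelevant:
        _add(new_q, doc, -1.0)
    # Assumed convention: keep only positive weights.
    return {t: w for t, w in new_q.items() if w > 0}


def ide_dec_hi(query, relevant, ranked_nonrelevant):
    """Q1 = Q0 + sum of relevant vectors - the top-ranked nonrelevant vector only.
    ranked_nonrelevant must be in retrieval order."""
    return ide_regular(query, relevant, ranked_nonrelevant[:1])


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 1.0}]
ranked_nonrelevant = [{"cat": 1.0, "jaguar": 0.5}, {"car": 0.5, "zoo": 1.0}]
print(ide_regular(query, relevant, ranked_nonrelevant))
print(ide_dec_hi(query, relevant, ranked_nonrelevant))
```

Here the nonrelevant document ranked second mentions "car", so Ide regular lowers that term’s weight, while Ide dec-hi subtracts only the top-ranked nonrelevant document and leaves "car" at full weight.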

16 Automatic Query Modification
The major disadvantage of relevance feedback is that it increases the burden on users [X97]. Approaches to automatic query modification:
Local feedback
Automatic query expansion: dictionary-based, global analysis, and local analysis

17 Local Feedback
Local feedback is similar to relevance feedback. The difference is that the top-ranked documents are assumed to be relevant, without human judgment. This saves the cost of relevance judgments, but it can result in poor retrieval if the top-ranked documents are nonrelevant.
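A minimal local (pseudo-relevance) feedback loop, assuming a toy dot-product ranker over term counts: the top `k` documents of the first pass are treated as relevant and their centroid, scaled by `beta`, is added to the query. The corpus, function names, `k`, and `beta` are illustrative assumptions. In this toy collection the top documents happen to be about cars, so the expanded query drifts toward that sense of "jaguar", which is the risk described above when the top-ranked documents are not what the user wanted.

```python
from collections import Counter, defaultdict


def score(query, doc):
    """Dot product between a weighted query and a document's term counts."""
    return sum(w * doc.get(t, 0) for t, w in query.items())


def search(query, docs):
    """Return document ids sorted by descending score (ties broken by id)."""
    return sorted(docs, key=lambda d: (-score(query, docs[d]), d))


def local_feedback(query, docs, k=2, beta=0.5):
    """Assume the top-k documents are relevant; add their scaled centroid to the query."""
    top = search(query, docs)[:k]
    new_q = defaultdict(float, query)
    for d in top:
        for t, c in docs[d].items():
            new_q[t] += beta * c / k
    new_q = dict(new_q)
    return new_q, search(new_q, docs)


docs = {
    "d1": Counter("jaguar car engine".split()),
    "d2": Counter("jaguar car speed".split()),
    "d3": Counter("car engine repair".split()),
    "d4": Counter("jaguar cat jungle".split()),
}
new_query, reranked = local_feedback({"jaguar": 1.0}, docs)
print(new_query)
print(reranked)
```

No human judgment is involved: document d3, which never mentions "jaguar", now receives a nonzero score purely because it shares terms with the assumed-relevant top documents.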

18 Automatic Query Expansion
Basic idea: expand a user query by adding semantically similar and/or statistically associated terms, each with a corresponding weight. Thesauri are needed to judge similarity. There are two approaches to thesaurus construction: manual thesauri and automatic thesauri.

19 Dictionary-based Query Expansion
Based on manual thesauri (e.g., WordNet [M95]). In the expansion process, synonyms (or words in other semantic relations) of the initial query terms are selected, and each is assigned a weight. Disadvantages: constructing a manual thesaurus requires a lot of human labor, and a general manual thesaurus does not consistently improve retrieval performance.
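A minimal sketch of dictionary-based expansion, with a tiny hand-built dictionary standing in for a manual thesaurus such as WordNet. The entries, relation names, weights (0.8 for synonyms, 0.5 for broader terms), and function name are all hypothetical; a real system would look the terms up in the thesaurus itself.

```python
# Hypothetical hand-built thesaurus: term -> relation -> related terms.
THESAURUS = {
    "car": {"synonym": ["automobile", "auto"], "hypernym": ["motor vehicle"]},
    "fast": {"synonym": ["quick", "rapid"]},
}
# Hypothetical weight assigned to a related term, per relation type.
RELATION_WEIGHTS = {"synonym": 0.8, "hypernym": 0.5}


def expand_with_thesaurus(query_terms, thesaurus=THESAURUS, weights=RELATION_WEIGHTS):
    """Original terms keep weight 1.0; related terms get a relation-specific weight."""
    expanded = {t: 1.0 for t in query_terms}
    for term in query_terms:
        for relation, related in thesaurus.get(term, {}).items():
            w = weights.get(relation, 0.0)
            for r in related:
                # Keep the highest weight if a term is reachable in several ways.
                if w > expanded.get(r, 0.0):
                    expanded[r] = w
    return expanded


print(expand_with_thesaurus(["fast", "car"]))
```

Terms missing from the dictionary pass through unexpanded, which is one reason a general-purpose thesaurus helps some queries and not others.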

20 Example - WordNet

21 Automatic Thesaurus Construction Approach
Thesauri are constructed from the whole data corpus (or a part of it). The basic idea of automatic thesaurus construction is term co-occurrence. Methods of automatic thesaurus construction:
Traditional TF×IDF [Y02]
A variant of TF×IDF (i.e., the similarity thesaurus [QF93])
The association rule mining approach [WBO00]
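To make the co-occurrence idea concrete, the sketch below builds a small association thesaurus in which two terms are related in proportion to how often they appear in the same documents, measured with the Dice coefficient. The choice of Dice, the function name, and the toy corpus are assumptions for illustration; this is not one of the three specific methods cited above.

```python
from itertools import combinations


def cooccurrence_thesaurus(docs, min_score=0.0):
    """Build term -> {related term: Dice coefficient} from document co-occurrence.

    docs: list of term sets. Dice(u, v) = 2|D_u & D_v| / (|D_u| + |D_v|),
    where D_t is the set of documents containing term t.
    """
    postings = {}
    for i, doc in enumerate(docs):
        for t in doc:
            postings.setdefault(t, set()).add(i)
    thesaurus = {t: {} for t in postings}
    for u, v in combinations(sorted(postings), 2):
        shared = len(postings[u] & postings[v])
        if shared == 0:
            continue
        dice = 2 * shared / (len(postings[u]) + len(postings[v]))
        if dice > min_score:
            thesaurus[u][v] = dice
            thesaurus[v][u] = dice
    return thesaurus


docs = [
    {"data", "mining", "clustering"},
    {"data", "mining", "classification"},
    {"data", "warehouse"},
    {"text", "mining"},
]
th = cooccurrence_thesaurus(docs)
print(th["data"])
```

"data" and "mining" share two documents and get the strongest link; "text" is related only to "mining".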

22 Example of Thesaurus Construction
According to [QF93], each term t_i is associated with a vector of weights over the documents, and the relationship between two terms t_u and t_v is computed from their two vectors.
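The sketch below follows the similarity thesaurus as described in [BR99]: each term is a vector over documents with weight w_ij = (0.5 + 0.5·f_ij/maxf_i)·itf_j, normalized to unit length, where maxf_i is the term’s highest frequency in any document and itf_j = log(t/t_j) compares the vocabulary size t with the number of distinct terms t_j in document j. The relationship between t_u and t_v is the correlation factor c_uv = t_u · t_v. Giving a term weight 0 in documents that do not contain it is an assumption here, the function names are hypothetical, and details may differ from the original paper.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Unit-length term vectors over documents (docs: list of Counters)."""
    vocab = sorted({t for d in docs for t in d})
    t = len(vocab)
    # itf_j = log(t / t_j), with t_j the number of distinct terms in document j.
    itf = [math.log(t / len(d)) if d else 0.0 for d in docs]
    vectors = {}
    for term in vocab:
        maxf = max(d[term] for d in docs)
        # Assumption: weight 0 in documents that do not contain the term.
        raw = [
            (0.5 + 0.5 * d[term] / maxf) * itf[j] if d[term] > 0 else 0.0
            for j, d in enumerate(docs)
        ]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors[term] = [x / norm if norm else 0.0 for x in raw]
    return vectors


def correlation(vectors, u, v):
    """Correlation factor c_uv = t_u . t_v."""
    return sum(a * b for a, b in zip(vectors[u], vectors[v]))


docs = [
    Counter("data mining data clustering".split()),
    Counter("data mining classification".split()),
    Counter("data warehouse".split()),
    Counter("text mining".split()),
]
vectors = term_vectors(docs)
print(round(correlation(vectors, "data", "mining"), 3))
```

Because the vectors are normalized, each term correlates 1.0 with itself, and terms that never share a document correlate 0.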

23 Example of Thesaurus Construction (cont’d)
[Figure: example thesaurus graph linking the terms Data Mining, Knowledge Discovery, Data Warehouse, Classification Analysis, Clustering Analysis, C4.5, Decision Tree, CRM, Text Mining, and Prediction, with weighted edges (e.g., 0.12)]

24 Global Analysis
The whole collection of documents is used for thesaurus creation. Approaches: similarity thesaurus [QF93] and statistical thesaurus [CY92].

25 Global Analysis (cont’d)
[Diagram: data corpus → thesaurus construction → thesaurus; initial user query + thesaurus → query expansion → expanded query → retrieve relevant documents]
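Once a global thesaurus is available, the query expansion step of this pipeline can be sketched as follows, after the similarity-thesaurus method described in [BR99]: each candidate term v is scored by sim(q, v) = Σ_u w_uq·c_uv over the query terms u, and the top `r` candidates are added with weight sim(q, v)/Σ_u w_uq. The correlation values, the function name, and `r` are illustrative assumptions.

```python
def expand_query_globally(query, corr, r=2):
    """query: term -> weight; corr: term -> {term: correlation factor}.

    Scores each candidate v by sim(q, v) = sum_u w_uq * c_uv and adds the
    top-r candidates with weight sim(q, v) / sum_u w_uq.
    """
    total = sum(query.values())
    sims = {}
    for u, w_uq in query.items():
        for v, c_uv in corr.get(u, {}).items():
            if v not in query:
                sims[v] = sims.get(v, 0.0) + w_uq * c_uv
    # Best candidates first; ties broken alphabetically.
    best = sorted(sims.items(), key=lambda item: (-item[1], item[0]))[:r]
    expanded = dict(query)
    for v, s in best:
        if s > 0 and total > 0:
            expanded[v] = s / total
    return expanded


# Made-up correlation factors from a global thesaurus.
corr = {
    "data": {"mining": 0.5, "warehouse": 0.5, "knowledge": 0.25},
    "mining": {"data": 0.5, "knowledge": 0.5, "text": 0.25},
}
print(expand_query_globally({"data": 1.0, "mining": 1.0}, corr))
```

"knowledge" wins because it is related to both query terms, which is the point of scoring candidates against the whole query rather than against single terms.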

26 Local Analysis
Unlike global analysis, local analysis uses only the top-ranked documents to construct the thesaurus. Approaches: local clustering [AF77] and local context analysis [X97, XC96, XC00]. According to [XC96, X97, XC00], local analysis is more effective than global analysis.
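Local clustering can be illustrated with association clusters built from the top-ranked documents only, in the form presented in [BR99] for the approach of [AF77]: c_uv sums the products of the two terms’ frequencies over the local documents and is normalized as s_uv = c_uv/(c_uu + c_vv − c_uv). The sketch returns, for each query term, its `n` most strongly associated local terms; the toy documents, the function name, and `n` are assumptions.

```python
from collections import Counter


def association_clusters(local_docs, query_terms, n=2):
    """For each query term, its n most strongly associated terms in the
    locally retrieved documents (local_docs: list of Counters)."""
    vocab = sorted({t for d in local_docs for t in d})

    def c(u, v):
        # Unnormalized association: sum over local docs of f_u,d * f_v,d.
        return sum(d[u] * d[v] for d in local_docs)

    clusters = {}
    for u in query_terms:
        c_uu = c(u, u)
        scores = {}
        for v in vocab:
            if v == u:
                continue
            c_uv = c(u, v)
            denom = c_uu + c(v, v) - c_uv
            if c_uv > 0 and denom > 0:
                scores[v] = c_uv / denom  # normalized association s_uv
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        clusters[u] = [v for v, _ in ranked[:n]]
    return clusters


# Hypothetical top-ranked documents from the first retrieval pass.
local_docs = [
    Counter({"jaguar": 2, "car": 1, "engine": 1}),
    Counter({"jaguar": 1, "car": 2, "speed": 1}),
    Counter({"jaguar": 1, "cat": 1}),
]
print(association_clusters(local_docs, ["jaguar"]))
```

The neighbors found this way are then added to the query before the second retrieval.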

27 Local Analysis (cont’d)
[Diagram: initial user query → 1st retrieval → top-ranked documents → thesaurus construction → query expansion → expanded query → 2nd retrieval → relevant documents]


29 References
[AF77] Attar, R. and Fraenkel, A. S., “Local Feedback in Full-Text Retrieval Systems,” Journal of the ACM, Volume 24, Issue 3, 1977, pp
[BR99] Baeza-Yates, R. and Ribeiro-Neto, B., Modern Information Retrieval, Addison Wesley/ACM Press, Harlow, England,
[CY92] Crouch, C. J. and Yang, B., “Experiments in Automatic Statistical Thesaurus Construction,” Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp
[M95] Miller, G. A., “WordNet: A Lexical Database for English,” Communications of the ACM, Vol. 38, No. 11, November 1995, pp
[QF93] Qiu, Y. and Frei, H. P., “Concept Based Query Expansion,” Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1993, pp
[WBO00] Wei, J., Bressan, S., and Ooi, B. C., “Mining Term Association Rules for Automatic Global Query Expansion: Methodology and Preliminary Results,” Proceedings of the First International Conference on Web Information Systems Engineering, Volume 1, 2000, pp

30 References (cont’d)
[X97] Xu, J., “Solving the Word Mismatch Problem Through Automatic Text Analysis,” PhD Thesis, University of Massachusetts at Amherst,
[XC96] Xu, J. and Croft, W. B., “Query Expansion Using Local and Global Document Analysis,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996, pp
[XC00] Xu, J. and Croft, W. B., “Improving the Effectiveness of Information Retrieval with Local Context Analysis,” ACM Transactions on Information Systems, Volume 18, Issue 1, 2000, pp
[Y02] Yang, C., “Investigation of Term Expansion on Text Mining Techniques,” Master’s Thesis, National Sun Yat-sen University, Taiwan, 2002.