Query Operations

1- Introduction
2- Relevance feedback with user relevance information
3- Relevance feedback without user relevance information
   - Local analysis (pseudo-relevance feedback)
   - Global analysis (thesaurus)
4- Evaluation
5- Issues

Introduction (1)

- No detailed knowledge of the collection and retrieval environment:
  - Difficult to formulate queries well designed for retrieval
  - Many query formulations are needed for effective retrieval
- First formulation: often a naïve attempt to retrieve relevant information
- Documents initially retrieved are:
  - Examined for relevance information (by the user, or automatically)
  - Used to improve the query formulation so as to retrieve additional relevant documents
- Query reformulation:
  - Expanding the original query with new terms
  - Reweighting the terms in the expanded query

Introduction (2)

- Approaches based on feedback from users (relevance feedback)
- Approaches based on information derived from the set of initially retrieved documents (the local set of documents)
- Approaches based on global information derived from the whole document collection

Relevance feedback with user relevance information (1)

- Most popular query reformulation strategy
- Cycle:
  - The user is presented with a list of retrieved documents
  - The user marks those which are relevant
    - In practice: only the top-ranked documents are examined
    - Incremental process
  - Important terms are selected from the documents assessed relevant by the user
  - The importance of these terms is enhanced in a new query
- Expected effect:
  - The new query moves towards the relevant documents and away from the non-relevant documents

Relevance feedback with user relevance information (2)

- Two basic techniques:
  - Query expansion: add new terms from relevant documents
  - Term reweighting: modify term weights based on user relevance judgements
- Advantages:
  - Shields the user from the details of the query reformulation process
  - Breaks the search down into a sequence of small steps
  - Provides a controlled process:
    - Emphasise some terms (relevant ones)
    - De-emphasise others (non-relevant ones)

Relevance feedback with user relevance information (3)

- Query expansion and term reweighting in the vector space model
- Term reweighting in the probabilistic model

Query expansion and term reweighting in the vector space model

- Term weight vectors of documents assessed relevant show similarities among themselves
- Term weight vectors of documents assessed non-relevant are dissimilar from those of the relevant documents
- The reformulated query is moved closer to the term weight vectors of the relevant documents

Query expansion and term reweighting in the vector space model

For a query q:
- Dr: set of relevant documents among the retrieved documents
- Dn: set of non-relevant documents among the retrieved documents
- Cr: set of relevant documents among all documents in the collection
- α, β, γ: tuning constants

Assume that Cr is known (unrealistic!). The best query vector for distinguishing the relevant documents from the non-relevant documents can then be derived from Cr.
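The optimal query formula itself did not survive the transcript. A reconstruction, following the standard presentation of this result in Modern Information Retrieval (symbols as defined above; N is the total number of documents):

```latex
\vec{q}_{opt} \;=\; \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j
\;-\; \frac{1}{N - |C_r|} \sum_{\vec{d}_j \notin C_r} \vec{d}_j
```

The first term is the centroid of the relevant documents; the second is the centroid of all non-relevant documents.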

Query expansion and term reweighting in the vector space model

- Problem: |Cr| is unknown
- Approach:
  - Formulate an initial query
  - Incrementally change the initial query vector, using |Dr| and |Dn| instead of |Cr|
- Rocchio formula
- Ide formula

Rocchio formula

- Direct application of the previous formula, with the original query added
- Initial formulation: α = 1
- Usually the information in relevant documents is more important than that in non-relevant documents (γ << β)
- Positive relevance feedback: γ = 0
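The Rocchio formula itself is missing from the transcript; in its standard form (as in Modern Information Retrieval, with Dr, Dn, α, β, γ as defined earlier):

```latex
\vec{q}_m \;=\; \alpha \vec{q}
\;+\; \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
\;-\; \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j
```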

Rocchio formula in practice (SMART)

- α = 1
- Terms used for expansion:
  - The original query terms
  - Terms that appear in more relevant documents than non-relevant documents
  - Terms that appear in more than half of the relevant documents
- Negative weights are ignored
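The reformulation described above can be sketched in code. This is a minimal illustration assuming tf-idf weighted numpy vectors; the function name and the default constants (0.75 and 0.15 are common choices in later literature) are illustrative, not from the slides:

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the reformulated query vector.

    query       : 1-D array, the original query vector
    relevant    : list of 1-D arrays (documents marked relevant, Dr)
    nonrelevant : list of 1-D arrays (documents marked non-relevant, Dn)
    """
    q_new = alpha * np.asarray(query, dtype=float)
    if relevant:
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if nonrelevant:
        q_new = q_new - gamma * np.mean(nonrelevant, axis=0)
    # SMART practice: negative weights are ignored (clipped to zero)
    return np.maximum(q_new, 0.0)
```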

Ide formula

- Initial formulation: α = β = γ = 1
- Same comments as for the Rocchio formula
- Neither Ide nor Rocchio is based on an optimal criterion
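The Ide formula drops the normalisation by |Dr| and |Dn|; as usually presented alongside Rocchio:

```latex
\vec{q}_m \;=\; \alpha \vec{q}
\;+\; \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j
\;-\; \gamma \sum_{\vec{d}_j \in D_n} \vec{d}_j
```

(The Ide "Dec-Hi" variant subtracts only the highest-ranked non-relevant document instead of the full sum over Dn.)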

Term reweighting for the probabilistic model

- (See the note on the BIR model)
- Use idf to rank documents for the original query
- Use relevance information to calculate weights that better predict relevance, giving an improved (optimal) retrieval function

Term reweighting for the probabilistic model

- Independence assumptions:
  - I1: the distribution of terms in relevant documents is independent, and their distribution in all documents is independent
  - I2: the distribution of terms in relevant documents is independent, and their distribution in irrelevant documents is independent
- Ordering principles:
  - O1: probable relevance is based on the presence of search terms in documents
  - O2: probable relevance is based on both the presence of search terms in documents and their absence from documents

Term reweighting for the probabilistic model

Combining each independence assumption with each ordering principle yields four term-weighting formulas, F1 to F4.

Term reweighting for the probabilistic model

F1 formula: the ratio of the proportion of relevant documents in which the query term ti occurs to the proportion of all documents in which ti occurs.

- ri = number of relevant documents containing ti
- ni = number of documents containing ti
- R = number of relevant documents
- N = number of documents in the collection

Term reweighting for the probabilistic model

F2 formula: the ratio of the proportion of relevant documents in which ti occurs to the proportion of all irrelevant documents in which ti occurs.

- ri = number of relevant documents containing ti
- ni = number of documents containing ti
- R = number of relevant documents
- N = number of documents in the collection

Term reweighting for the probabilistic model

F3 formula: the ratio of the "relevance odds" (the ratio of relevant documents containing ti to non-relevant documents containing ti) to the "collection odds" (the ratio of documents containing ti to documents not containing ti).

- ri = number of relevant documents containing ti
- ni = number of documents containing ti
- R = number of relevant documents
- N = number of documents in the collection

Term reweighting for the probabilistic model

F4 formula: the ratio of the "relevance odds" to the "non-relevance odds" (the ratio of relevant documents not containing ti to non-relevant documents not containing ti).

- ri = number of relevant documents containing ti
- ni = number of documents containing ti
- R = number of relevant documents
- N = number of documents in the collection
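The four formulas themselves are lost in the transcript. As given by Robertson and Sparck Jones, and matching the verbal descriptions on the preceding slides up to the logarithm:

```latex
F1 = \log \frac{r_i/R}{n_i/N}
\qquad
F2 = \log \frac{r_i/R}{(n_i - r_i)/(N - R)}
\qquad
F3 = \log \frac{r_i/(R - r_i)}{n_i/(N - n_i)}
\qquad
F4 = \log \frac{r_i/(R - r_i)}{(n_i - r_i)/(N - n_i - R + r_i)}
```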

Experiments

- F1, F2, F3 and F4 all outperform both no relevance weighting and idf weighting
- F1 and F2 perform in the same range, as do F3 and F4
- F3 and F4 outperform F1 and F2
- F4 performs slightly better than F3
- Conclusion: O2 is correct (looking at both the presence and the absence of terms)
- No conclusion with respect to I1 and I2, although I2 seems the more realistic assumption
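As an illustration of computing the best-performing weight, F4: a small helper in which the 0.5 smoothing is the usual practical addition to avoid zero counts, not something stated on the slides.

```python
import math

def f4_weight(r_i, n_i, R, N):
    """Robertson-Sparck Jones F4 term weight with 0.5 smoothing.

    r_i : relevant documents containing the term
    n_i : documents containing the term
    R   : number of relevant documents
    N   : number of documents in the collection
    """
    relevance_odds = (r_i + 0.5) / (R - r_i + 0.5)
    nonrelevance_odds = (n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)
    return math.log(relevance_odds / nonrelevance_odds)
```

With no relevance information at all (r_i = R = 0), the weight reduces to an idf-like quantity, which is why idf is a reasonable starting point for the first query.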

Relevance feedback without user relevance information

- Relevance feedback with user relevance information:
  - Clustering hypothesis: known relevant documents contain terms which can be used to describe a larger cluster of relevant documents
  - The description of the cluster is built interactively, with user assistance
- Relevance feedback without user relevance information:
  - Obtain the cluster description automatically
  - Identify terms related to the query terms (e.g. synonyms, stemming variations, terms close to the query terms in the text)
- Local strategies
- Global strategies

Local analysis (pseudo-relevance feedback)

- Examine the documents retrieved for the query to determine terms for query expansion
- No user assistance
- Clustering techniques
- Risk of query "drift" (the expanded query may move away from the user's information need)
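The pseudo-relevance feedback loop described above can be sketched as follows; `search`, the function name, and the parameters are illustrative assumptions, not an API from the slides:

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, search, k=10, n_expansion=5):
    """Expand a query with frequent terms from the top-k results.

    query_terms : list of strings
    search      : callable returning a ranked list of documents,
                  each document represented as a list of terms
    """
    top_docs = search(query_terms)[:k]        # assume the top k are relevant
    counts = Counter(t for doc in top_docs for t in doc)
    for t in query_terms:                     # do not re-add query terms
        counts.pop(t, None)
    expansion = [t for t, _ in counts.most_common(n_expansion)]
    return query_terms + expansion            # the expanded query is run next
```

Because no user checks the top-k assumption, a bad initial result list feeds bad expansion terms back in, which is exactly the query-drift risk noted above.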

Clusters (1)

- Synonymy association (one example): terms that frequently co-occur inside the local set of documents
- Term-term (e.g., stem-stem) association matrix (normalised)
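A sketch of the normalised association matrix, assuming the common co-occurrence definition c(u,v) = sum over local documents of freq(u,d) * freq(v,d), normalised as c(u,v) / (c(u,u) + c(v,v) - c(u,v)); the names are illustrative:

```python
import numpy as np

def association_matrix(term_doc_freq):
    """term_doc_freq: 2-D array, rows = terms, columns = local documents,
    entries = term frequencies. Returns the normalised association matrix,
    with values in [0, 1] and 1.0 on the diagonal."""
    c = (term_doc_freq @ term_doc_freq.T).astype(float)  # raw co-occurrence
    diag = np.diag(c)
    denom = diag[:, None] + diag[None, :] - c
    # Normalise; entries with a zero denominator (terms absent everywhere)
    # are left as 0.
    return np.divide(c, denom, out=np.zeros_like(c), where=denom > 0)
```

The normalisation makes the measure insensitive to raw frequency: two terms that always occur together get a value near 1 regardless of how often they occur.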

Clusters (2)

- For a term ti:
  - Take the n largest values m(i,j) in the association matrix
  - The resulting terms tj form the cluster for ti
- For a query q:
  - Find the clusters for the |q| query terms
  - Keep the clusters small
  - Expand the original query with the cluster terms

Global analysis

- Expand the query using information from the whole set of documents in the collection
- A thesaurus-like structure is built using all documents:
  - An approach to automatically build the thesaurus (e.g. a similarity thesaurus based on co-occurrence frequency)
  - An approach to select terms for query expansion

Evaluation of relevance feedback strategies

- Use the original query qi and compute a precision-recall graph
- Use the reformulated query q(i+1) and compute a precision-recall graph
  - If evaluated against all documents in the collection:
    - Spectacular improvements
    - But partly due to documents already known to the user: the relevant documents used for feedback are simply ranked higher
    - Must instead evaluate with respect to the documents not seen by the user
- Three techniques for doing so

Evaluation of relevance feedback strategies

- Freezing
  - Full freezing:
    - The top n documents are frozen (the ones used in relevance feedback)
    - The remaining documents are re-ranked
    - Precision-recall is computed on the whole ranking
    - Changes in effectiveness thus come from the unseen documents
    - With many iterations, the growing contribution of the frozen documents may lead to a decrease in measured effectiveness
  - Modified freezing:
    - Freeze the ranking up to the position of the last document marked relevant

Evaluation of relevance feedback strategies

- Test and control groups
  - Randomly split the documents into a test group and a control group:
    - Query reformulation is performed on the test documents
    - The new query is run against the control documents
  - Relevance feedback is thus based only on the test group
  - Difficulty: splitting the collection (the distribution of relevant documents across the two groups)

Evaluation of relevance feedback strategies

- Residual ranking
  - The documents used in assessing relevance are removed
  - Precision-recall is computed on the "residual collection"
  - Considers only the effect on unseen documents
  - Results are not comparable with the original ranking (fewer relevant documents)
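Residual-collection evaluation can be sketched as follows (a hypothetical helper, not from the slides): the judged documents are removed from the ranking before precision is measured.

```python
def residual_precision_at_k(ranking, relevant, judged, k=10):
    """Precision at k on the residual collection.

    ranking  : list of doc ids ranked by the reformulated query
    relevant : set of all relevant doc ids
    judged   : set of doc ids shown to the user during feedback
    """
    residual = [d for d in ranking if d not in judged]  # unseen documents only
    top = residual[:k]
    return sum(1 for d in top if d in relevant) / max(len(top), 1)
```

Note the caveat from the slide: because judged relevant documents are removed, this number cannot be compared directly with precision on the full collection.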

Issues

- Interface:
  - Allow the user to quickly identify relevant and non-relevant documents
  - What happens with 2D and 3D visualisation?
- Global analysis:
  - On the web? (e.g. Yahoo!)
- Local analysis:
  - Computational cost (it is performed on-line)
- Interactive query expansion:
  - The user chooses the terms to be added

Negative relevance feedback

- Uses documents explicitly marked as non-relevant by users
- Issues:
  - Implementation
  - Clarity
  - Usability