
1 WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

2 INTRODUCTION Evaluation is necessary. Why evaluate? What to evaluate? How to evaluate?

3 WHY EVALUATE We need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results, and whether it is cost-effective to use a particular IRS based on evaluation results.

4 WHAT TO EVALUATE What is measured should reflect the ability of the IRS to satisfy user needs:
- Coverage of the system – to what extent the IRS includes relevant material
- Time lag – average interval between the time the user query request is made and the time an answer set is obtained
- Form of presentation of the output
- Effort involved on the part of the user in getting answers to his/her query request
- Recall of the IRS – percentage of the relevant material that is actually retrieved in answer to a query request
- Precision of the IRS – percentage of the retrieved material that is actually relevant

5 HOW TO EVALUATE? Various methods available.

6 EVALUATION Two main elements in IR:
- The user query request (also called query request, information query, query retrieval strategy, or search request)
- The answer set (hits)
We need to know whether the documents retrieved in the answer set fulfil the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on two main components:
- A test reference collection
- An evaluation measure

7 EVALUATION The test reference collection consists of:
- A collection of documents
- A set of example information requests
- A set of relevant documents (provided by specialists) for each information request
Two interrelated measures are used: RECALL and PRECISION.

8 RETRIEVAL PERFORMANCE EVALUATION Relevance, Recall and Precision. Parameters defined:
- I = information request
- R = set of relevant documents; |R| = number of documents in this set
- A = document answer set retrieved by the information request; |A| = number of documents in this set
- Ra = intersection of sets R and A; |Ra| = number of documents in this set

9 RETRIEVAL PERFORMANCE EVALUATION
Recall = fraction of the relevant documents (set R) which have been retrieved:
Recall = |Ra| / |R|
Precision = fraction of the retrieved documents (set A) which are relevant:
Precision = |Ra| / |A|
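
As a minimal sketch (in Python, with hypothetical document identifiers, not taken from the slides), the two measures can be computed directly from the sets R and A:

# Minimal sketch: recall and precision from a relevant set R and an answer set A.
def recall_precision(relevant, answer):
    """Return (recall, precision) for a set of relevant docs and a retrieved answer set."""
    ra = relevant & answer               # Ra = intersection of R and A
    recall = len(ra) / len(relevant)     # |Ra| / |R|
    precision = len(ra) / len(answer)    # |Ra| / |A|
    return recall, precision

R = {"d3", "d5", "d9"}        # relevant documents (hypothetical)
A = {"d3", "d9", "d7", "d8"}  # retrieved answer set (hypothetical)
r, p = recall_precision(R, A)
print(f"Recall = {r:.0%}, Precision = {p:.0%}")  # Recall = 67%, Precision = 50%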

10 [Figure: Venn diagram of the collection showing the relevant documents |R|, the answer set |A|, and their intersection |Ra| – precision and recall for a given example information request]

11 RETRIEVAL PERFORMANCE EVALUATION Recall and precision are expressed as percentages. Retrieved documents are sorted by degree of relevance (ranking), so the user sees a ranked list.

12 RETRIEVAL PERFORMANCE EVALUATION a. In an IRS with a collection of 100 documents, 10 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123. b. A query request was submitted and the following documents were retrieved, ranked according to relevance.

13 RETRIEVAL PERFORMANCE EVALUATION 1. d123* 2. d84 3. d56* 4. d6 5. d8 6. d9* 7. d511 8. d129 9. d187 10. d25* 11. d38 12. d48 13. d250 14. d113 15. d3* (* marks a relevant document)

14 RETRIEVAL PERFORMANCE EVALUATION c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).

15 RETRIEVAL PERFORMANCE EVALUATION
- d123 ranked 1st: R = 1/10 × 100% = 10%, P = 1/1 × 100% = 100%
- d56 ranked 3rd: R = 2/10 × 100% = 20%, P = 2/3 × 100% = 66%
- d9 ranked 6th: R = 3/10 × 100% = 30%, P = 3/6 × 100% = 50%
- d25 ranked 10th: R = 4/10 × 100% = 40%, P = 4/10 × 100% = 40%
- d3 ranked 15th: R = 5/10 × 100% = 50%, P = 5/15 × 100% = 33%
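
These figures can be reproduced with a short Python sketch that walks down the ranked list from slide 13 and reports recall and precision each time a relevant document is found (the relevance judgements and ranking are the ones given in the example above):

# Sketch: recall/precision at each rank where a relevant document appears.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

found = 0
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        recall = found / len(relevant)   # fraction of all relevant docs seen so far
        precision = found / rank         # fraction of docs retrieved so far that are relevant
        print(f"{doc} ranked {rank}: R = {recall:.0%}, P = {precision:.0%}")
# First line: d123 ranked 1: R = 10%, P = 100%
# Last line:  d3 ranked 15: R = 50%, P = 33%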

16 Contingency table
A = relevant documents, Ā = non-relevant documents
C = retrieved documents, Ĉ = not retrieved documents
N = total number of documents in the system

                 Relevant    Non-relevant
Retrieved        A ∩ C       Ā ∩ C
Not retrieved    A ∩ Ĉ       Ā ∩ Ĉ

17 RETRIEVAL PERFORMANCE EVALUATION Contingency table for the example: N = 100, A = 10, Ā = 90, C = 15, Ĉ = 85

                 Relevant       Non-relevant
Retrieved        5              15 - 5 = 10
Not retrieved    10 - 5 = 5     100 - 10 - 10 = 80

Recall = 5/10 × 100% = 50%, Precision = 5/15 × 100% = 33%
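
The four cells of this table can be checked with a small sketch (same document sets as in the worked example; the variable names are mine):

# Sketch: contingency-table cells for the slide-17 example (N = 100 documents).
N = 100
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}   # set A, |A| = 10
retrieved = {"d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
             "d187", "d25", "d38", "d48", "d250", "d113", "d3"}                   # set C, |C| = 15

rel_ret = len(relevant & retrieved)              # A ∩ C  -> 5
nonrel_ret = len(retrieved) - rel_ret            # Ā ∩ C  -> 15 - 5 = 10
rel_notret = len(relevant) - rel_ret             # A ∩ Ĉ  -> 10 - 5 = 5
nonrel_notret = N - len(relevant) - nonrel_ret   # Ā ∩ Ĉ  -> 100 - 10 - 10 = 80

print(rel_ret, nonrel_ret, rel_notret, nonrel_notret)    # 5 10 5 80
print(f"Recall = {rel_ret / len(relevant):.0%}")         # 50%
print(f"Precision = {rel_ret / len(retrieved):.0%}")     # 33%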

18 OTHER ALTERNATIVE MEASURES
- Harmonic mean – a single measure which combines R and P
- E measure – a single measure which combines R and P, where the user specifies whether he/she is more interested in R or in P
- User-oriented measures – based on the user's interpretation of which documents are relevant and which are not
- Expected search length
- Satisfaction – focuses only on relevant documents
- Frustration – focuses only on non-relevant documents
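
A minimal Python sketch of the first two measures, using the usual textbook formulations of the harmonic mean F = 2/(1/R + 1/P) and the E measure E = 1 - (1 + b²)/(b²/R + 1/P); the parameter name b and the example values are assumptions, not something stated on this slide:

# Sketch: harmonic mean (F) and E measure combining recall R and precision P.
# Assumes 0 < R <= 1 and 0 < P <= 1.

def harmonic_mean(r, p):
    """F = 2 / (1/R + 1/P): high only when both recall and precision are high."""
    return 2 / (1 / r + 1 / p)

def e_measure(r, p, b=1.0):
    """E = 1 - (1 + b^2) / (b^2 / R + 1 / P).
    The parameter b shifts the emphasis between recall and precision;
    b = 1 weighs them equally and gives E = 1 - F."""
    return 1 - (1 + b ** 2) / (b ** 2 / r + 1 / p)

r, p = 0.50, 0.33   # recall and precision from the worked example above
print(f"F = {harmonic_mean(r, p):.2f}")       # ~0.40
print(f"E(b=1) = {e_measure(r, p):.2f}")      # ~0.60
print(f"E(b=2) = {e_measure(r, p, b=2):.2f}")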

19 REFERENCE COLLECTION IR experimentation is done on test collections. Example of a test collection: the yearly conference known as TREC (Text REtrieval Conference), dedicated to experimentation with a large test collection of over 1 million documents; testing is therefore time consuming. For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS. TREC NIST site – http://trec.nist.gov


21 REFERENCE COLLECTION The collection is known as the TIPSTER/TREC test collection. It is composed of:
- The documents
- A set of example information requests, or topics
- A set of relevant documents for each example information request

22 OTHER TEST COLLECTIONS
- ADI – documents on information science
- CACM – computer science
- INSPEC – abstracts on electronics, computer and physics
- ISI – library science
- Medlars – medical articles
These collections were used by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983 – "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types" – http://www.ncstrl.org

