1
Information Retrieval: Quality of a Search Engine
2
Is it good?
- How fast does it index: number of documents/hour (average document size)
- How fast does it search: latency as a function of index size
- Expressiveness of the query language
3
Measures for a search engine
- All of the preceding criteria are measurable
- The key measure: user happiness
- …useless answers won't make a user happy
4
Happiness: elusive to measure
The commonest approach is to measure the relevance of the search results. How do we measure it? It requires 3 elements:
1. A benchmark document collection
2. A benchmark suite of queries
3. A binary assessment of either Relevant or Irrelevant for each query-doc pair
5
Evaluating an IR system
Standard benchmarks:
- TREC: the National Institute of Standards and Technology (NIST) has run a large IR testbed for many years
- Other doc collections: marked by human experts, for each query and for each doc, as Relevant or Irrelevant
On the Web everything is more complicated, since we cannot mark the entire corpus!
6
General scenario (Venn diagram): the Retrieved set and the Relevant set within the whole collection.
7
Precision vs. Recall
- Precision: % of retrieved docs that are relevant [issue: how much "junk" is found]
- Recall: % of relevant docs that are retrieved [issue: how much of the "info" is found]
(Venn diagram: the Retrieved and Relevant sets within the collection)
8
How to compute them
- Precision: fraction of retrieved docs that are relevant, P = tp / (tp + fp)
- Recall: fraction of relevant docs that are retrieved, R = tp / (tp + fn)

                 Relevant               Not Relevant
  Retrieved      tp (true positive)     fp (false positive)
  Not Retrieved  fn (false negative)    tn (true negative)
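The two fractions follow directly from the contingency table. A minimal Python sketch, not from the slides; the doc-id sets are purely illustrative:

```python
# Precision and Recall from a set of retrieved doc ids and a set of
# relevant doc ids (illustrative example, not from the slides).

def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    tp = len(retrieved & relevant)   # retrieved and relevant
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved docs are relevant, out of 6 relevant docs overall.
p, r = precision_recall({"d1", "d2", "d3", "d9"},
                        {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)   # 0.75 0.5
```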
9
Some considerations
- You can get high recall (but low precision) by retrieving all docs for all queries!
- Recall is a non-decreasing function of the number of docs retrieved
- Precision usually decreases as more docs are retrieved
10
Precision vs. Recall (Venn diagram): highest precision, very low recall; a tiny Retrieved set lying entirely inside the Relevant set.
11
Precision vs. Recall (Venn diagram): lowest precision and recall; the Retrieved and Relevant sets are disjoint.
12
Precision vs. Recall (Venn diagram): low precision and very high recall; a large Retrieved set that covers almost all of the Relevant set but also contains many non-relevant docs.
13
Precision vs. Recall (Venn diagram): very high precision and recall; the Retrieved set almost coincides with the Relevant set.
14
Precision-Recall curve
We measure Precision at various levels of Recall. Note: the curve is an AVERAGE over many queries.
(Plot: precision on the y-axis against recall on the x-axis, one point per recall level)
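For a single query, the points of the curve can be obtained by scanning the ranked result list from the top and recording a (recall, precision) pair after each result; the slide's curve is then the average of such per-query curves. A hedged sketch with made-up doc ids:

```python
# (recall, precision) points for ONE query, obtained by scanning the
# ranked result list top-down. Doc ids are made up for illustration.

def pr_points(ranked: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    points, hits = [], 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))   # (recall, precision)
    return points

print(pr_points(["d3", "d7", "d1", "d9"], {"d3", "d1", "d9"}))
# roughly [(0.33, 1.0), (0.33, 0.5), (0.67, 0.67), (1.0, 0.75)]
```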
15
A common picture (plot of the precision-recall curve: precision on the y-axis, recall on the x-axis).
16
Interpolated precision
If you can increase precision by increasing recall, then you should get to count that: the interpolated precision at recall level r is the maximum precision measured at any recall level ≥ r.
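A sketch of that rule, assuming the (recall, precision) points of a single query's curve are available in the same representation as above:

```python
# Interpolated precision at recall level r: the maximum precision observed
# at any recall level >= r, so precision never drops just because recall grew.

def interpolated_precision(points: list[tuple[float, float]], r: float) -> float:
    return max((p for rec, p in points if rec >= r), default=0.0)
```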
17
Other measures
- Precision at a fixed retrieval level (e.g., the top 10 results): most appropriate for web search
- 11-point interpolated average precision: the standard measure for TREC; take the interpolated precision at 11 recall levels, from 0% to 100% in steps of 10%, and average them
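A sketch of the 11-point measure for one query, using the same (recall, precision) representation as above; in a TREC-style evaluation the per-query values are then averaged over the whole query set:

```python
# 11-point interpolated average precision for ONE query: average the
# interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

def eleven_point_avg_precision(points: list[tuple[float, float]]) -> float:
    levels = [i / 10 for i in range(11)]                        # 0.0 .. 1.0
    interp = [max((p for r, p in points if r >= level), default=0.0)
              for level in levels]                              # interpolated precision
    return sum(interp) / len(interp)
```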
18
F measure
Combined measure (weighted harmonic mean):
F = 1 / (α·(1/P) + (1−α)·(1/R)) = (β²+1)·P·R / (β²·P + R), with β² = (1−α)/α
People usually use the balanced F1 measure, i.e., with β = 1 (equivalently α = ½), thus 1/F = ½·(1/P + 1/R), i.e., F1 = 2·P·R / (P + R).
Use this if you need to optimize a single measure that balances precision and recall.
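A sketch of the weighted F measure in the same style; β = 1 gives the balanced F1:

```python
# Weighted harmonic mean of precision and recall:
# F = (beta^2 + 1) * P * R / (beta^2 * P + R); beta = 1 gives F1 = 2PR/(P+R).

def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)

print(f_measure(0.75, 0.5))   # 0.6
```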