COMP307 NLP 4: Information Retrieval
Xiaoying Gao, Computer Science, Victoria University of Wellington
Menu
Information Retrieval (IR) basics
An example
Evaluation: Precision and Recall
Information Retrieval (IR)
The field of information retrieval deals with the representation, storage, organization of, and access to information items.
Search engines: search a large collection of documents to find the ones that satisfy an information need, e.g. find relevant documents.
Indexing
Document representation
Comparison with query
Evaluation/feedback
IR basics
Indexing
Manual indexing: using controlled vocabularies, e.g. libraries, early versions of Yahoo
Automatic indexing: an indexing program assigns keywords, phrases or other features, e.g. words from the text of the document
Popular retrieval models
Boolean: exact match
Vector space: best match
Citation analysis models: best match
Probabilistic models: best match
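As an illustration (not from the slides), here is a minimal Python sketch of automatic indexing with an inverted index, plus Boolean exact-match retrieval over it, using the small document collection from the later example slide; the best-match models in the list replace this exact match with a ranked similarity score.

```python
# Minimal sketch of automatic indexing + Boolean exact-match retrieval.
# Illustrative only; the slides do not prescribe this implementation.

docs = {
    1: "cat eat mouse mouse eat chocolate",
    2: "cat eat mouse",
    3: "mouse eat chocolate mouse",
}

# Automatic indexing: build an inverted index (term -> set of doc ids).
inverted_index = {}
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index.setdefault(term, set()).add(doc_id)

def boolean_and(query):
    """Boolean model: exact match, every query term must appear in the document."""
    result = set(docs)
    for term in query.split():
        result &= inverted_index.get(term, set())
    return sorted(result)

print(boolean_and("cat mouse"))      # [1, 2]
print(boolean_and("cat chocolate"))  # [1]
```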
Vector Space Model
Any text object can be represented by a term vector.
Similarity is determined by distance in a vector space.
Example
Doc1: 0.3, 0.1, 0.4
Doc2: 0.8, 0.5, 0.6
Query: 0.0, 0.2, 0.0
Vector space similarity: cosine of the angle between the two vectors
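As a quick sketch (not from the slides), the cosine similarity between the example vectors above can be computed directly:

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc1  = [0.3, 0.1, 0.4]
doc2  = [0.8, 0.5, 0.6]
query = [0.0, 0.2, 0.0]

print(cosine(doc1, query))  # ~0.196
print(cosine(doc2, query))  # ~0.447 -> Doc2 is the closer match
```

Note that cosine normalizes for vector length, so Doc2 wins here because its direction is closer to the query's, not simply because its values are larger.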
Text representation: term weights
Term weights reflect the estimated importance of each term.
The more often a word occurs in a document, the better that term is at describing what the document is about.
But terms that appear in many documents in the collection are not very useful for distinguishing a relevant document from a non-relevant one.
Term weights: TF.IDF
Term weight x_i = TF * IDF
TF: Term Frequency
IDF: Inverse Document Frequency
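The slide leaves the exact formulas open; a common choice (an assumption here, not prescribed by the slides) is raw term frequency for TF and a log-scaled IDF, idf(t) = log(N / df(t)), where N is the number of documents in the collection and df(t) is the number of documents containing term t:

```python
import math

def tf_idf(term_count, n_docs, doc_freq):
    """One common TF.IDF variant: raw tf times log(N / df).
    The slides do not fix a particular formula; this is an assumption."""
    tf = term_count
    idf = math.log(n_docs / doc_freq)
    return tf * idf

# 'mouse' occurs twice in a document, but in all 3 documents of the collection:
print(tf_idf(2, 3, 3))  # 0.0 -> appears everywhere, so it discriminates nothing
# 'cat' occurs once, and in only 2 of the 3 documents:
print(tf_idf(1, 3, 2))  # ~0.405
```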
An example
Document 1: cat eat mouse, mouse eat chocolate
Document 2: cat eat mouse
Document 3: mouse eat chocolate mouse
Indexing terms:
Vector representation for each document
Query: cat
Vector representation of query
Which document is more relevant?
Example
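The worked figures from this slide did not survive the transcription; the sketch below is an illustrative reconstruction, assuming the raw-tf times log-idf weighting from the previous sketch. It builds TF.IDF vectors for the three documents over the indexing terms and ranks them against the query "cat" by cosine similarity.

```python
import math
from collections import Counter

docs = {
    "Doc1": "cat eat mouse mouse eat chocolate".split(),
    "Doc2": "cat eat mouse".split(),
    "Doc3": "mouse eat chocolate mouse".split(),
}
terms = sorted({t for d in docs.values() for t in d})            # indexing terms
n_docs = len(docs)
df = {t: sum(t in d for d in docs.values()) for t in terms}      # document frequency

def tfidf_vector(tokens):
    """TF.IDF vector over the indexing terms (raw tf * log(N/df), an assumption)."""
    counts = Counter(tokens)
    return [counts[t] * math.log(n_docs / df[t]) for t in terms]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

query_vec = tfidf_vector("cat".split())
for name, tokens in docs.items():
    print(name, round(cosine(tfidf_vector(tokens), query_vec), 3))
```

Under this particular weighting Doc2 scores highest and Doc3 scores zero, since Doc3 never mentions "cat"; the terms "eat" and "mouse" get zero weight because they occur in every document.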
Evaluation
Test collection: TREC
Parameters
Recall: percentage of all relevant documents that are found by a search
Precision: percentage of retrieved documents that are relevant
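As a quick sketch (not part of the slides), both measures for a single query can be computed from the set of retrieved documents and the set of documents judged relevant:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of all relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 docs retrieved, 5 relevant in the collection, 3 overlap.
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 8, 9}))  # (0.75, 0.6)
```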
Discussion
How to compare two IR systems?
Which is more important in Web search: precision or recall?
What leads to Google’s success in IR?