Modern Information Retrieval, Chapter 2: Modeling
Can keywords be used to represent a document or a query? Using keywords as the query and keyword matching as query processing does not, in general, produce good results. Key concepts: ranking algorithm, document relevance, and IR model.
Taxonomy of IR models
Ad hoc and filtering retrieval. Ad hoc retrieval: the document collection is static and queries are submitted. Filtering retrieval: the queries are static and documents stream in. A user profile describes the user's preferences: keywords, relevance feedback, and dynamic keyword adjustment.
Formal characterization of IR models
Classic IR: index terms. Deciding on the importance of a term is difficult; consider a term's semantics as well as its distribution across all documents. Weights are used to quantify the importance of the index terms for describing the document contents.
The mutual independence assumption (index term weights are treated as independent of one another) simplifies the task of fast ranking computation.
Boolean model: index term weights are binary. A query is a Boolean expression with not, and, or as connectives. Users might find it difficult to specify their information needs as Boolean expressions.
Advantages and disadvantages: each document is either relevant or non-relevant, with no partial match. Given the binary vector (0,1,0), is document d_j an answer?
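The all-or-nothing matching of the Boolean model can be sketched in a few lines. This is a minimal illustration, assuming the query has been rewritten in disjunctive normal form so that each conjunctive component is a binary vector such as (0,1,0) over the query terms. The term names `ka`, `kb`, `kc`, the sample documents, and the helper functions are hypothetical.

```python
def term_pattern(doc_terms, query_terms):
    """Binary weights of the query terms in one document (1 = present)."""
    return tuple(1 if term in doc_terms else 0 for term in query_terms)


def boolean_match(doc_terms, query_terms, dnf):
    """A document is an answer only if its pattern equals some conjunctive component."""
    return term_pattern(doc_terms, query_terms) in dnf


# Hypothetical query "kb AND NOT ka AND NOT kc" over the terms (ka, kb, kc).
query_terms = ("ka", "kb", "kc")
dnf = {(0, 1, 0)}

# Hypothetical documents, each represented by its set of index terms.
docs = {
    "d1": {"kb", "kx"},   # pattern (0, 1, 0): an answer
    "d2": {"ka", "kb"},   # pattern (1, 1, 0): not an answer
    "d3": {"kc"},         # pattern (0, 0, 1): not an answer
}

answers = [name for name, terms in docs.items()
           if boolean_match(terms, query_terms, dnf)]
print(answers)
```

Note that `d2` contains `kb` but is rejected outright; there is no notion of a partial match or of ranking among the answers.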
Vector model: allows partial matching and ranking by a similarity measure.
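As a rough sketch of ranking by similarity, the example below uses the cosine of the angle between the document and query vectors. The slide does not name the measure, so cosine is an assumption here (it is a common choice); the sample weights and function names are also hypothetical.

```python
import math


def cosine_similarity(doc_vec, query_vec):
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(d * q for d, q in zip(doc_vec, query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)


def rank(docs, query_vec):
    """Sort documents by decreasing similarity to the query."""
    scored = [(name, cosine_similarity(vec, query_vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical non-binary weights over three index terms.
docs = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
}
query = [1.0, 1.0, 0.0]

ranking = rank(docs, query)
for name, score in ranking:
    print(name, round(score, 3))
```

Unlike Boolean matching, `d1` and `d2` each match only one query term and still receive a nonzero score, so they appear in the ranking below the full match `d3`.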
Computing index term weights. Term frequency (tf factor): how well the term describes the document contents. Inverse document frequency (idf factor): how well the term distinguishes the document from the other documents in the collection.
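One way to combine the two factors is sketched below. It assumes a particular weighting scheme: tf normalized by the most frequent term in the document, idf computed as log(N / n_i) where N is the number of documents and n_i the number of documents containing the term, and the weight taken as their product. Other variants exist; the function name and the toy collection are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, with weight = tf * idf."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            # tf: raw count normalized by the document's most frequent term.
            # idf: log of (collection size / document frequency).
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized collection.
docs = [
    ["retrieval", "model", "retrieval"],
    ["boolean", "model"],
    ["vector", "model"],
]

weights = tf_idf_weights(docs)
for doc_weights in weights:
    print(doc_weights)
```

In this toy collection "model" occurs in every document, so its idf is log(3/3) = 0 and its weight vanishes everywhere, while "retrieval", which occurs in one document only, receives the highest weight there.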
The vector model is a popular retrieval model because of its simplicity and its good retrieval performance.