INFORMATION RETRIEVAL - Monday, March 21, 2006 - Arjun Dasgupta
IR DOMAINS
- Ranking functions in traditional IR
- Ranking functions used in the WWW
- Ranking functions used in databases
RANKING FUNCTIONS IN TRADITIONAL IR
- Started as a branch of library science
- Definitions:
  - Corpus: a collection of documents
  - Document: a set/bag of words
  - Keyword query: a small set of keywords
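The definitions above can be made concrete with a small sketch. The two-document corpus, the document ids, and the whitespace tokenizer below are assumptions made for illustration; they are not part of the lecture.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

# Hypothetical corpus: a collection of documents, keyed by an invented id.
corpus = {
    "d1": "Microsoft Corporation announced that Microsoft will ship a new product",
    "d2": "the corporation filed its annual report",
}

# Bag of words: each word mapped to its count; word order is discarded.
bags = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}

# Set of words: only presence or absence of each word is kept.
sets = {doc_id: set(bag) for doc_id, bag in bags.items()}

# Keyword query: a small set of keywords.
query = {"microsoft", "corporation"}
```

The bag keeps how often a word occurs, which a frequency-based ranking function needs; the set keeps only whether it occurs. Neither keeps word positions.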
An Example
- Query over the web = {Microsoft, Corporation}
- An obvious approach is to retrieve all documents containing both “Microsoft” and “Corporation”, ordered by the ranking function
- The ranking function determines the order in which query results are presented
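A minimal sketch of this approach, assuming a tiny in-memory corpus: keep only the documents that contain every query keyword, then sort them by a ranking function. The names `retrieve` and `keyword_count`, the sample documents, and the choice of raw keyword counts as the score are all hypothetical placeholders, not the method the lecture prescribes.

```python
from collections import Counter

def retrieve(corpus, query, ranking_function):
    """Return ids of documents containing every query keyword, best score first."""
    bags = {doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()}
    # Conjunctive (AND) match: every keyword must occur at least once.
    matches = [doc_id for doc_id, bag in bags.items()
               if all(bag[kw] > 0 for kw in query)]
    return sorted(matches, key=lambda d: ranking_function(bags[d], query), reverse=True)

def keyword_count(bag, query):
    """Placeholder ranking function: total occurrences of the query keywords."""
    return sum(bag[kw] for kw in query)

# Hypothetical documents, for illustration only.
corpus = {
    "d1": "Microsoft Corporation is a software corporation",
    "d2": "Microsoft released a new product",
    "d3": "Microsoft Corporation and Microsoft Research are hiring at Microsoft",
}

results = retrieve(corpus, ["microsoft", "corporation"], keyword_count)
print(results)  # ['d3', 'd1']
```

Here `d2` is dropped because it lacks "corporation", and `d3` precedes `d1` because it contains four keyword occurrences against three. Swapping in a different `ranking_function` changes only the order of the results, not which documents are retrieved.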
What constitutes a good ranking function?
- Relative frequency of occurrence of the keywords
- Proximity of the keywords (cannot be computed in a bag-of-words model, which discards word positions)
- Specificity or importance
- Links to/from other documents
- Popularity/relevance of the page with respect to the query
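The first two criteria can be illustrated with a rough sketch. Both functions below are assumptions chosen for illustration: `relative_frequency` takes the fraction of a document's tokens that are query keywords, and `min_span` measures proximity as the length of the shortest window containing every keyword. The lecture does not specify either formula.

```python
def relative_frequency(tokens, query):
    """Fraction of the document's tokens that are query keywords."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in query) / len(tokens)

def min_span(tokens, query):
    """Length of the shortest window containing every query keyword, or None."""
    needed = set(query)
    last_seen = {}  # keyword -> most recent position at which it occurred
    best = None
    for i, t in enumerate(tokens):
        if t in needed:
            last_seen[t] = i
            if len(last_seen) == len(needed):
                # Shortest window ending at i starts at the oldest "last seen" position.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

query = {"microsoft", "corporation"}
# Two hypothetical documents with the same bag of words but different word order.
a = "microsoft corporation sells software".split()
b = "microsoft sells software corporation".split()

print(relative_frequency(a, query), relative_frequency(b, query))  # 0.5 0.5
print(min_span(a, query), min_span(b, query))                      # 2 4
```

The two documents are indistinguishable as bags of words, so they get the same relative frequency, yet their keyword spans differ. Proximity therefore needs the token sequence (or stored positions), which is why it is unavailable in a pure bag-of-words model.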