Published by Randolph Sims. Modified over 9 years ago.
1
Information Retrieval Models - 1 Boolean
2
Introduction
- IR systems usually adopt index terms to process queries.
- Index terms: a keyword or group of selected words; more generally, any word.
- Stemming might be used, e.g. connect: connecting, connection, connections, connected.
- An inverted file is built for the chosen index terms.
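The idea of stemming index terms and building an inverted file can be sketched as follows. This is a minimal illustration, not a production indexer: the suffix list, the `stem` helper, and the sample documents are all assumptions made up for the example, and a real system would use a proper stemmer such as Porter's.

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only (not a real stemming algorithm).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_file(docs):
    """Map each stemmed index term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


# Made-up sample collection.
docs = {
    1: "connecting the servers",
    2: "a lost connection",
    3: "servers connected again",
}
index = build_inverted_file(docs)
print(sorted(index["connect"]))
```

All four variants from the slide (connecting, connection, connections, connected) reduce to the same index term here, so a query for any of them reaches the same posting set.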
3
Introduction
- Matching a query to documents based on index terms is imprecise, so it is no surprise that users can get unsatisfactory results.
- How much training do end users typically have?
- As a result, users are frustrated with web search results, too.
- The system needs to locate documents but also rank them, based on the concept of relevancy.
4
Introduction
- A ranking is an ordering of the retrieved documents that reflects their relevance to the user (as expressed through the query).
- Ranking is based on fundamental premises regarding the notion of relevancy, such as:
  - common sets of index terms
  - sharing of weighted terms
  - likelihood of relevance
- Each set of premises leads to a distinct IR model.
5
Boolean Retrieval
- Index terms are either present or absent: no middle ground.
- The weights are either 0 (not present) or 1 (present); in set notation, $w_{i,j} \in \{0,1\}$.
- In IR, relevancy is treated as a degree of similarity between a document (or set of documents) and the query's term (or terms).
- $\mathrm{sim}(d_j, q)$: similarity of document $j$ to query $q$.
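A small sketch of binary weights and a Boolean similarity may help here. It assumes a purely conjunctive (AND) query, which is only one of the query forms the Boolean model allows; the function names and the sample documents are hypothetical.

```python
def binary_weight(term, doc_terms):
    """w_{i,j}: 1 if the index term occurs in the document, else 0."""
    return 1 if term in doc_terms else 0


def sim(doc_terms, query_terms):
    """Boolean similarity for an AND query: 1 if every query term is present, else 0."""
    return int(all(binary_weight(t, doc_terms) == 1 for t in query_terms))


# Made-up documents, each represented as a set of index terms.
d1 = {"boolean", "retrieval", "model"}
d2 = {"vector", "model"}
query = ["boolean", "model"]

print(sim(d1, query), sim(d2, query))
```

Because the result is only ever 0 or 1, every matching document looks equally relevant, which is the "no middle ground" property the slide describes.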
6
Boolean Sets Demo on board
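The board demo itself is not recorded in the slides. As a rough stand-in, the sketch below shows how Boolean operators map onto set operations over posting sets; the terms, document ids, and variable names are invented for illustration.

```python
# Hypothetical postings: index term -> set of ids of documents containing it.
postings = {
    "information": {1, 2, 4},
    "retrieval": {2, 3, 4},
    "boolean": {4, 5},
}
all_docs = {1, 2, 3, 4, 5}

# information AND retrieval: set intersection
and_result = postings["information"] & postings["retrieval"]

# information OR boolean: set union
or_result = postings["information"] | postings["boolean"]

# NOT boolean: complement with respect to the whole collection
not_result = all_docs - postings["boolean"]

# (information AND retrieval) AND NOT boolean
combined = and_result - postings["boolean"]

print(sorted(and_result), sorted(or_result), sorted(not_result), sorted(combined))
```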
7
Boolean Retrieval
- The Boolean model is better suited for data retrieval; compare the SQL query "SELECT * FROM libraryDB WHERE author = 'Smith'".
- Question: what about a lot of matches?
- How do we distinguish between matches (e.g. author = "smith" and title = "Learning Swedish")?
- Can we use the binary model and modify it for ranking? Alternatives? [You bet!]
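One possible way to get a ranking out of binary weights, offered here only as an illustration and not as the approach the slides go on to develop, is to count how many query terms each document matches (sometimes called coordination-level matching). The documents and helper names below are hypothetical.

```python
def matched_terms(doc_terms, query_terms):
    """Number of query terms that are present in the document (binary weights summed)."""
    return sum(1 for t in query_terms if t in doc_terms)


def rank(docs, query_terms):
    """Return ids of documents matching at least one term, best match first."""
    scored = [(matched_terms(terms, query_terms), doc_id) for doc_id, terms in docs.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Sort by descending score; break ties by document id for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Made-up catalogue records, each reduced to a set of index terms.
docs = {
    "d1": {"smith", "learning", "swedish"},
    "d2": {"smith", "cooking"},
    "d3": {"jones", "learning"},
    "d4": {"jones", "cooking"},
}
query = ["smith", "learning", "swedish"]
ranking = rank(docs, query)
print(ranking)
```

A document matching both the author and the title terms now outranks one matching the author alone, which a strict Boolean AND or OR cannot express.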
8
IR Models (taxonomy by user task)
- User task: Retrieval (ad hoc, filtering)
  - Classic models: Boolean, vector, probabilistic
  - Set theoretic: fuzzy, extended Boolean
  - Algebraic: generalized vector, latent semantic indexing, neural networks
  - Probabilistic: inference network, belief network
  - Structured models: non-overlapping lists, proximal nodes
- User task: Browsing
  - Browsing: flat, structure guided, hypertext
9
IR Models
- The IR model, the logical view of the docs, and the retrieval task are distinct aspects of the system.
10
Basic Concepts: Classic IR Models
- Inherent properties of documents: words, aka keywords, aka index terms.
- Represent the document through "sets of keywords" (or index terms; the main themes).
- Use nouns, because nouns are believed to carry the most (semantic) meaning.
- Search engines, however, assume that all words are index terms ("full text representation").
11
Classic IR Models - Basic Concepts
- Not all terms are equally useful for representing the document contents: less frequent terms allow identifying a narrower set of docs.
- The importance of the index terms is represented by weights.
- Recall that the Boolean model uses weights in {0,1}.
- All other models use a value in the interval [0, 1].
- Degrees of similarity.
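To make the point about less frequent terms concrete, the sketch below counts in how many documents each term occurs and turns that into a weight in [0, 1]. The weighting used, an inverse document frequency scaled by log N, is an assumption chosen for illustration; the slides do not name a specific formula, and the collection is made up.

```python
import math

# Made-up collection: document id -> set of index terms.
docs = {
    1: {"information", "retrieval", "boolean"},
    2: {"information", "retrieval", "vector"},
    3: {"information", "probabilistic"},
    4: {"information", "retrieval"},
}


def document_frequency(term, docs):
    """Number of documents that contain the term."""
    return sum(1 for terms in docs.values() if term in terms)


def idf_weight(term, docs):
    """Assumed weighting (not from the slides): log(N / df) / log(N), which lies in [0, 1]."""
    n = len(docs)
    df = document_frequency(term, docs)
    if df == 0 or n < 2:
        return 0.0
    return math.log(n / df) / math.log(n)


for term in ("information", "retrieval", "boolean"):
    print(term, document_frequency(term, docs), round(idf_weight(term, docs), 3))
```

A term that appears in every document gets weight 0 because it cannot narrow the result set, while a term that appears in a single document gets weight 1.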
12
Classic IR Models - Basic Concepts
- Let $k_i$ be an index term, $d_j$ a document, and $w_{i,j}$ a weight associated with the pair $(k_i, d_j)$.
- The weight $w_{i,j}$ quantifies the importance of the index term for describing the document contents.