1
Tutorial#3
2
Retrieval models
Retrieval models match a query with documents to:
  separate documents into relevant and non-relevant classes
  rank the documents according to relevance
Boolean model
Vector space model (VSM)
Probabilistic models
3
Boolean model
The Boolean model is the most common exact-match model.
Queries are logic expressions with document features as operands.
In the pure Boolean model, retrieved documents are not ranked.
4
Example
{D7} OR ({D1, D2, D5} AND {D2, D4, D5, D6, D8})
= {D7} OR {D2, D5}
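The set arithmetic in this example can be sketched in Python, treating AND as set intersection and OR as set union. The term names `a`, `b`, `c` and the query shape `a OR (b AND c)` are assumptions made for illustration; only the document sets come from the example.

```python
# Posting sets taken from the example; the term names a, b, c are hypothetical.
postings = {
    "a": {"D7"},
    "b": {"D1", "D2", "D5"},
    "c": {"D2", "D4", "D5", "D6", "D8"},
}

# AND is set intersection, OR is set union.
b_and_c = postings["b"] & postings["c"]
result = postings["a"] | b_and_c

print(sorted(b_and_c))  # ['D2', 'D5']
print(sorted(result))   # ['D2', 'D5', 'D7']
```

The result is an unordered set, which matches the point that a pure Boolean model does not rank what it retrieves.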
7
Vector space model (VSM)
Documents and queries are represented as vectors:
  d_j = (w_{1,j}, w_{2,j}, ..., w_{t,j})
  q = (w_{1,q}, w_{2,q}, ..., w_{t,q})
Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
8
[Figure: term-by-query matrix (keywords K0 to K5 × queries Q0 to Q4) and term-by-document matrix (keywords K0 to K5 × documents D0 to D4)]
9
Vector space model (VSM)
Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is (tf-idf) weighting:
10
(tf-idf) weighting
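A minimal sketch of one common tf-idf variant, assuming tf is the raw count of a term in a document and idf = log(N / df), where N is the number of documents and df the number of documents containing the term. Other variants normalise differently, so this is not necessarily the exact formula used in the tutorial; the function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = raw count of the term in the document,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy, hypothetical documents (already tokenised).
docs = [["bread", "bread", "recipe"], ["recipe", "pastry"], ["recipe"]]
w = tf_idf(docs)
print(w[0])
```

In this toy collection 'recipe' occurs in every document, so its idf and therefore its weight is 0, while 'bread' is both frequent in its document and rare in the collection, so it gets the largest weight.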
11
Vector space model (VSM)
12
Example documents:
  D0: 'How to Bake Bread Without Recipes'
  D1: 'The Classic Art of Viennese Pastry'
  D2: 'Numerical Recipes: The Art of Scientific Computing'
  D3: 'Breads, Pastries, Pies and Cakes : Quantity Baking Recipes'
  D4: 'Pastry: A Book of Best French Recipe'
Keywords: ['bak', 'recipe', 'bread', 'cake', 'pastr', 'pie']
13
This will generate a matrix of 6 terms × 5 documents.
[Matrix: rows 'bak', 'recipe', 'bread', 'cake', 'pastr', 'pie'; columns D0, D1, D2, D3, D4]
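As a rough illustration, such a matrix can be derived from the titles by checking which keyword stems occur in each title. This sketch assumes binary weights (1 if some word of the title starts with the stem, 0 otherwise) and uses a simple prefix match in place of a real stemmer; the helper name `build_tdm` is hypothetical.

```python
import re

titles = [
    "How to Bake Bread Without Recipes",
    "The Classic Art of Viennese Pastry",
    "Numerical Recipes: The Art of Scientific Computing",
    "Breads, Pastries, Pies and Cakes : Quantity Baking Recipes",
    "Pastry: A Book of Best French Recipe",
]
keywords = ["bak", "recipe", "bread", "cake", "pastr", "pie"]

def build_tdm(titles, keywords):
    """Binary term-document matrix: rows are keywords, columns are titles."""
    # Lowercase each title and split it into alphabetic words.
    tokenised = [re.findall(r"[a-z]+", t.lower()) for t in titles]
    return [
        [int(any(word.startswith(k) for word in words)) for words in tokenised]
        for k in keywords
    ]

tdm = build_tdm(titles, keywords)
for k, row in zip(keywords, tdm):
    print(f"{k:>7}", row)
```

Under these assumptions D3 contains all six stems, while D1 contains only 'pastr'.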
14
Query: "baking bread"
[Matrix: 6 terms × 5 documents; rows 'bak', 'recipe', 'bread', 'cake', 'pastr', 'pie'; columns D0, D1, D2, D3, D4]
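One way to score this query against the documents is cosine similarity between the query vector and each document column. The sketch below assumes binary weights and hard-codes the binary matrix implied by the titles and keywords. The choice of cosine similarity and the helper name `cosine` are assumptions here, not necessarily the scheme used in the tutorial.

```python
import math

keywords = ["bak", "recipe", "bread", "cake", "pastr", "pie"]
# Assumed binary term-document matrix (rows: keywords, columns: D0..D4).
tdm = [
    [1, 0, 0, 1, 0],  # bak
    [1, 0, 1, 1, 1],  # recipe
    [1, 0, 0, 1, 0],  # bread
    [0, 0, 0, 1, 0],  # cake
    [0, 1, 0, 1, 1],  # pastr
    [0, 0, 0, 1, 0],  # pie
]
# "baking bread" maps to the stems 'bak' and 'bread'.
query = [1, 0, 1, 0, 0, 0]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Each document is one column of the matrix.
docs = [list(col) for col in zip(*tdm)]
scores = {f"D{j}": cosine(query, d) for j, d in enumerate(docs)}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 4))
```

Under these assumptions D0 scores highest (about 0.82), followed by D3 (about 0.58); the remaining documents share no terms with the query and score 0.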
15
VSM Implementation
VSMranker.java ranks documents for a query.
Provides functions to develop different user interfaces.
Stand-alone usage needs document and query TDMs (term-document matrices):
  java -cp ../java VSMranker cacm.tdm query.tdm 7
Retrieves the top 7 documents for CACM queries.
16
Ex#3 (solve in tutorial time)