VECTOR SPACE MODEL Its Applications and implementations


1 VECTOR SPACE MODEL: Its Applications and Implementations in Information Retrieval
Lecture 3. 11/14/2019, Dr. G. Das

2 Slides for Lecture 3
The Vector Space Model (VSM) is a way of representing documents through the words that they contain. It is a standard technique in Information Retrieval. The VSM allows decisions to be made about which documents are similar to each other and to keyword queries.

3 The Vector Space Model
Documents and queries are both represented as vectors: Di = (w_i1, w_i2, ..., w_im), where each w_ij is the weight of term j in document i. The similarity between two vectors is measured as the cosine of the angle between them.

4 Documents and queries are represented as vectors
Position 1 corresponds to term 1, position 2 to term 2, ..., position m to term m.

5 Cosine Similarity Measure
Similarity(d, q) = cos θ, from the identity x · y = |x| |y| cos θ:

sim(d, q) = ( Σ_{j=1..m} w_ij · q_j ) / ( ( Σ_{j=1..m} w_ij² )^{1/2} · ( Σ_{j=1..m} q_j² )^{1/2} )

Cosine is a normalized dot product.
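The formula above can be sketched in Python. This is a minimal illustration of the normalized dot product, not a production implementation, and the example weight vectors are hypothetical.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between weight vectors d and q:
    the dot product divided by the product of the vector norms."""
    dot = sum(w_d * w_q for w_d, w_q in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # a zero vector shares no terms with anything
    return dot / (norm_d * norm_q)

# Identical vectors give 1; vectors with no shared terms give 0
print(cosine_similarity([0.5, 0.8, 0.3], [0.5, 0.8, 0.3]))  # ~1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```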

6 TF-IDF Normalization
Normalize the term weights so that longer documents are not unfairly given more weight. The longer the document, the more likely it is for a given term to appear in it, and the more often a given term is likely to appear in it. So we want to reduce the importance attached to a term appearing in a document based on the length of the document.
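One simple way to realize this length normalization, as a sketch: divide the raw term count by the document length, so the same count contributes less in a longer document. The counts and lengths below are illustrative.

```python
def normalized_tf(term_count, doc_length):
    """Term frequency normalized by document length, so that longer
    documents are not unfairly given more weight."""
    if doc_length == 0:
        return 0.0
    return term_count / doc_length

# The same raw count matters less in a longer document
print(normalized_tf(3, 100))   # 0.03
print(normalized_tf(3, 1000))  # 0.003
```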

7 Cosine is the normalized dot product
Documents are ranked in decreasing order of cosine value: sim(d, q) = 1 when d = q; sim(d, q) = 0 when d and q share no terms.

8 A user enters a query
The query is compared to all documents using a similarity measure. The vector distance between the query and each document is used to rank the retrieved pages. The user is shown the documents in decreasing order of similarity to the query.
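The retrieval loop described above can be sketched as follows; the document ids, their weight vectors, and the use of cosine as the similarity measure are illustrative stand-ins.

```python
import math

def cosine(d, q):
    """Normalized dot product of two weight vectors."""
    dot = sum(a * b for a, b in zip(d, q))
    nd = math.sqrt(sum(a * a for a in d))
    nq = math.sqrt(sum(b * b for b in q))
    return dot / (nd * nq) if nd and nq else 0.0

def rank(documents, query):
    """Score every document against the query, then return the ids
    in decreasing order of similarity."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in documents.items()]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

# Hypothetical collection: three documents over a three-term vocabulary
docs = {"d1": [1.0, 0.0, 1.0], "d2": [0.0, 1.0, 0.0], "d3": [1.0, 1.0, 1.0]}
print(rank(docs, [1.0, 0.0, 1.0]))  # d1 first: it matches the query exactly
```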

9 How to weight terms?
The higher a term's weight, the higher its impact on the cosine. What terms are important?
> terms present in the query (their presence in a document is then relevant to the query)
> terms infrequent in other documents
> terms frequent in the document itself
So the cosine needs to be modified in this respect.

10 Modeling and Implementation
Example: suppose the user fires a query specifying three particular terms T1, T2, T3, i.e. query q = (T1, T2, T3). Let there be n documents containing a total of m terms. Now for the implementation:

11 Document Ranking
A user enters a query. The query is compared to all documents using a similarity measure. The user is shown the documents in decreasing order of similarity to the query.

12 Example
Term:       T1    T2    T3    ...   Tm
Documents:  d1    d2    d1    ...   d1
            d2    d4    d7    ...   d7
            d3    d8    d9    ...   d10
            d9    d10   d6    ...   d11
            d65   d7    d     ...   d76
Within each term's list, we can arrange the documents in descending order of the corresponding score computed from tf * idf.
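A table like the one above is an inverted index: each term maps to the list of documents that contain it. A minimal sketch (the sample documents are hypothetical):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # dict.fromkeys deduplicates terms while preserving their order
        for term in dict.fromkeys(text.split()):
            index[term].append(doc_id)
    return index

docs = {
    "d1": "vector space model",
    "d2": "vector retrieval",
    "d3": "space probe",
}
index = build_inverted_index(docs)
print(index["vector"])  # ['d1', 'd2']
print(index["space"])   # ['d1', 'd3']
```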

13 The tf * idf measure
tf: term frequency; idf: inverse document frequency.
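One common formulation of the measure (one of several variants; the log base and any smoothing differ between systems) is tf * log(N / df), where N is the number of documents and df the number containing the term:

```python
import math

def tf_idf(tf, df, n_docs):
    """tf * idf, with idf = log(N / df): a term's weight grows with its
    frequency in the document and shrinks as more documents contain it."""
    return tf * math.log(n_docs / df)

# A term in 1 of 100 documents is a strong discriminator;
# a term appearing in all 100 documents carries no weight
print(tf_idf(2, 1, 100))    # ~9.21
print(tf_idf(2, 100, 100))  # 0.0
```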

14 Merging the lists
In the case of multiple query terms, we start from the smallest of the posting lists corresponding to the query terms T1, T2, T3. FA and TA algorithms are used for merging the lists (FA: Fagin's Algorithm; TA: Threshold Algorithm).
T1    T2    T3
d1    d2    d
d2    d3    d
d3    d4    d2
d5    d2    d4
d     d6    d7
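FA and TA are more refined than this, but a plain intersection that starts from the smallest posting list, as the slide suggests, can be sketched as follows (the posting lists are illustrative):

```python
def intersect_postings(posting_lists):
    """Intersect posting lists, starting from the smallest list so the
    candidate set shrinks as quickly as possible."""
    ordered = sorted(posting_lists, key=len)
    result = set(ordered[0])
    for plist in ordered[1:]:
        result &= set(plist)
    return sorted(result)

t1 = ["d1", "d2", "d3", "d5", "d6"]
t2 = ["d2", "d3", "d4", "d7"]
t3 = ["d2", "d4"]
print(intersect_postings([t1, t2, t3]))  # ['d2']
```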

15 Combining scores
After the intersected list for the multiple query terms has been found, take the tf*idf score for each document under each term, add them across the corresponding terms, and arrange the documents in decreasing order of the total:
      T1        T2        T3
d   (tf*idf  +  tf*idf  +  tf*idf)   (this document will be ranked higher)
d
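The final ranking step can be sketched like this: per-term tf*idf scores (the numbers below are hypothetical) are summed per document and the totals sorted in decreasing order.

```python
def rank_by_total_score(per_term_scores):
    """Sum each document's tf*idf contributions across the query terms,
    then rank documents by the total, highest first."""
    totals = {}
    for scores in per_term_scores.values():
        for doc_id, score in scores.items():
            totals[doc_id] = totals.get(doc_id, 0.0) + score
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)

# Hypothetical tf*idf scores for the documents surviving the intersection
per_term = {
    "T1": {"d2": 1.2, "d4": 0.4},
    "T2": {"d2": 0.9, "d4": 1.1},
    "T3": {"d2": 2.0, "d4": 0.3},
}
print(rank_by_total_score(per_term))  # d2 (total ~4.1) ranked above d4 (~1.8)
```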

16 DATABASE CONTEXT
All distinct values are words or terms, and a tuple is taken as a document. Important points: the vector space model does not force any broad conditions. No search engine uses the vector space model as-is; it is implemented with some added constraints.

17 Advantages and Disadvantages
Advantages:
> Ranked retrieval
> Terms are weighted according to importance
Disadvantages:
> Terms are taken as independent
> Weighting is not very formal

18 Thank you
Slides made by: Arjun Saraswat

