1
Project Description 3: Latent Semantic Index
2
Compute TFIDF(token_i, document_j) = tf(ti; dj) * log(|Tr| / |Tr(ti)|)
The tokens in each file are sorted, and each token has its TFIDF value attached.
3
1. |Tr(ti)| = the number of documents in Tr in which ti occurs at least once.
2. tf(ti; dj) = 1 + log(N(ti; dj)) if N(ti; dj) > 0; tf(ti; dj) = 0 otherwise.
3. N(ti; dj) = the frequency of ti in dj.
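The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the logarithm base (natural log is assumed here), the documents are assumed to be tokenized already, and the names `tf` and `tfidf_per_document` are hypothetical helpers, not part of the project specification.

```python
import math
from collections import Counter


def tf(count):
    """tf(ti; dj) = 1 + log(N(ti; dj)) if N(ti; dj) > 0, else 0 (natural log assumed)."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def tfidf_per_document(docs):
    """Return {doc_id: [(token, tfidf), ...]} with each token list sorted.

    docs maps a document id to its list of tokens (tokenization is assumed
    to have been done beforehand).
    """
    # N(ti; dj): raw frequency of each token in each document.
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

    # |Tr(ti)|: number of documents in which the token occurs at least once.
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())

    n_docs = len(docs)  # |Tr|
    result = {}
    for doc_id, counter in counts.items():
        result[doc_id] = [
            (token, tf(counter[token]) * math.log(n_docs / doc_freq[token]))
            for token in sorted(counter)
        ]
    return result
```

Note that a token occurring in every document gets log(|Tr| / |Tr(ti)|) = log(1) = 0, so its TFIDF value is 0 regardless of how often it appears.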
4
Project
1. |Tr(ti)| = the number of documents in Tr in which ti occurs at least once.
2. tf(ti; dj) = 1 + log(N(ti; dj)) if N(ti; dj) > 0; tf(ti; dj) = 0 otherwise.
3. N(ti; dj) = the frequency (normalization) of ti in dj.
5
Important points about tokens
TFIDF(token_i, document_j) = tf(ti; dj) * log(|Tr| / |Tr(ti)|)
Correction: only consider tokens with threshold2 (??) >= |Tr(ti)| >= threshold1
Discuss some properties of these numerical values
Stemming (call the system dictionary)
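The document-frequency correction can be illustrated with a small filter. This is a sketch, not the project's required implementation: the slide leaves the upper threshold open ("??"), so both bounds are plain parameters here, and `filter_by_document_frequency` is a hypothetical name.

```python
from collections import Counter


def filter_by_document_frequency(docs, threshold1, threshold2):
    """Keep only tokens whose document frequency lies in [threshold1, threshold2].

    docs maps a document id to its list of tokens. Both thresholds are
    assumptions chosen by the caller; the slides do not fix their values.
    """
    # Count each token once per document to get |Tr(ti)|.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    kept = {t for t, df in doc_freq.items() if threshold1 <= df <= threshold2}
    return {
        doc_id: [t for t in tokens if t in kept]
        for doc_id, tokens in docs.items()
    }
```

With three documents and both thresholds set to 2, a token that appears in all three documents is dropped, as is a token that appears in only one.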
6
Create a Token Database
Organize all inverted files of the following documents into a database:
http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html
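One possible way to hold the inverted files in a database is a single postings table. The sketch below uses Python's standard `sqlite3` module; the slides do not name a database system or schema, so the table name `postings`, its columns, and the helper names `build_token_database` and `lookup` are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def build_token_database(docs, path=":memory:"):
    """Store one (token, doc_id, freq) row per token occurrence set.

    docs maps a document id to its list of tokens. The schema is a
    hypothetical choice, not something the project description prescribes.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "token TEXT NOT NULL, doc_id TEXT NOT NULL, freq INTEGER NOT NULL, "
        "PRIMARY KEY (token, doc_id))"
    )
    rows = [
        (token, doc_id, freq)
        for doc_id, tokens in docs.items()
        for token, freq in Counter(tokens).items()
    ]
    conn.executemany("INSERT OR REPLACE INTO postings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, token):
    """Return the posting list [(doc_id, freq), ...] for a token."""
    cursor = conn.execute(
        "SELECT doc_id, freq FROM postings WHERE token = ? ORDER BY doc_id",
        (token,),
    )
    return cursor.fetchall()
```

Keying the table on (token, doc_id) means |Tr(ti)| for a token is simply the number of rows returned by `lookup`.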
7
LSI example
token / document   !     a     Dumb
D1                 0.9   0     1.2
D2                 0     0     0
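The LSI example above arranges per-document token weights into a matrix with one row per document and one column per token. A minimal sketch of that arrangement follows; it assumes the example reads as D1 = (0.9, 0, 1.2) and D2 = (0, 0, 0) over the tokens "!", "a", "Dumb", and the helper name `to_matrix` is hypothetical.

```python
def to_matrix(weights, vocabulary):
    """Build a dense document-by-token matrix.

    weights maps doc_id -> {token: weight}; vocabulary fixes the column
    order. Tokens missing from a document get weight 0.0.
    """
    doc_ids = sorted(weights)
    rows = [[weights[d].get(t, 0.0) for t in vocabulary] for d in doc_ids]
    return doc_ids, rows


# Assumed reading of the slide's example values.
example = {"D1": {"!": 0.9, "Dumb": 1.2}, "D2": {}}
doc_ids, matrix = to_matrix(example, ["!", "a", "Dumb"])
```

A matrix in this form is the usual input to the dimensionality reduction step of latent semantic indexing.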
8
High Dimension LSI example
token / document   Dumb!aDumb
D1
D2