Published by Teresa Lawson. Modified over 9 years ago.
1
Vector Space Classification
1. Vector space text classification
2. Rocchio text classification
2
Vector Space Classification
3
Using projection to handle 2D and 3D graphs
4
Rocchio Text Classification
5
Illustration of Rocchio Text Categorization
6
Rocchio Text Categorization Algorithm (Training)
Assume the set of categories is {c_1, c_2, …, c_n}.
For i from 1 to n, let p_i = ⟨0, 0, …, 0⟩ (initialize the prototype vectors).
For each training example x ∈ D, with category c(x):
    Let d be the frequency-normalized TF/IDF term vector for document x.
    Let i = j such that c_j = c(x).
    (Sum all the document vectors in c_i to get p_i.)
    Let p_i = p_i + d.
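The training loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the slide author's code: documents are lists of tokens, vectors are sparse dicts (term → weight), "frequency-normalized" is taken to mean tf = count / (largest term count in the document), idf = log(N / df), and the names `tfidf_vectors` and `train_rocchio` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Frequency-normalized TF/IDF vectors, one sparse dict (term -> weight) per doc.

    Assumption: tf = count / (max term count in the doc), idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values(), default=1)
        vectors.append({t: (c / max_count) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors


def train_rocchio(docs, labels):
    """Return a prototype vector p_i per category: the sum of its documents' vectors."""
    # Every prototype starts as the zero vector (missing terms read as 0.0).
    prototypes = defaultdict(lambda: defaultdict(float))
    for d, category in zip(tfidf_vectors(docs), labels):
        # Indexing by the label plays the role of "let i = j such that c_j = c(x)".
        for term, weight in d.items():
            prototypes[category][term] += weight  # p_i = p_i + d
    return {c: dict(p) for c, p in prototypes.items()}


# Hypothetical toy corpus, for illustration only.
docs = [["ball", "goal", "team"], ["goal", "match"],
        ["vote", "party"], ["party", "election", "vote"]]
labels = ["sports", "sports", "politics", "politics"]
print(train_rocchio(docs, labels)["sports"])
```

One design difference from the slide: categories are discovered from the labels, so a category with no training example gets no prototype here instead of a zero vector.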
7
Rocchio Text Categorization Algorithm (Test)
Given test document x:
Let d be the TF/IDF weighted term vector for x.
Let m = -2 (initialize the maximum cosSim).
For i from 1 to n (compute the similarity to each prototype vector):
    Let s = cosSim(d, p_i).
    If s > m:
        Let m = s.
        Let r = c_i (update the most similar class prototype).
Return class r.
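The test procedure above can be sketched in Python, again assuming sparse vectors stored as dicts (term → weight) and a prototype dict keyed by category. The helper names `cos_sim` and `classify_rocchio` and the toy prototypes are hypothetical, chosen for illustration.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two sparse vectors stored as dicts (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify_rocchio(d, prototypes):
    """Return the category whose prototype p_i has the highest cosSim with d."""
    m = -2.0  # below the smallest possible cosine (-1), so the first prototype is always taken
    r = None
    for category, p in prototypes.items():
        s = cos_sim(d, p)
        if s > m:  # strict comparison: on a tie the earlier category is kept
            m, r = s, category
    return r


# Hypothetical prototypes and test vector, for illustration only.
prototypes = {"sports": {"goal": 1.4, "team": 1.4, "ball": 1.4},
              "politics": {"vote": 1.4, "party": 1.4}}
print(classify_rocchio({"goal": 0.7, "vote": 0.2}, prototypes))  # sports
```

Initializing m to -2 mirrors the slide: since every cosine is at least -1, the first comparison always succeeds, so r is never left unset when at least one prototype exists.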
8
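Rocchio Anomaly
Prototype models have problems with polymorphic (disjunctive) categories: when a category consists of several separate clusters, its single prototype can lie far from all of them, so some of the category's own documents end up closer to another category's prototype. (Sec. 14.2)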
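A tiny numeric illustration of the anomaly, as a sketch with made-up two-term vectors (the data and the names `training`, `prototypes`, and `classify` are hypothetical). Category "A" is polymorphic: its two documents share no terms. Prototypes are sums, as in the training algorithm, and `max` with a key stands in for the explicit m = -2 loop of the test algorithm.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical two-term vectors. Category "A" is polymorphic: its two documents
# are about unrelated sub-topics and share no terms.
training = {
    "A": [[1.0, 0.0], [0.0, 1.0]],
    "B": [[1.0, 0.3]],
}

# Prototype = sum of the category's vectors, as in the training algorithm.
prototypes = {c: [sum(col) for col in zip(*vs)] for c, vs in training.items()}


def classify(d):
    """Pick the category whose prototype is most cosine-similar to d."""
    return max(prototypes, key=lambda c: cos_sim(d, prototypes[c]))


for category, vectors in training.items():
    for v in vectors:
        print(category, v, "->", classify(v))
# A [1.0, 0.0] -> B
# A [0.0, 1.0] -> A
# B [1.0, 0.3] -> B
```

A's prototype [1.0, 1.0] points between its two sub-topics, so the training document [1.0, 0.0] (cosine about 0.71 with A) is closer to B's prototype (cosine about 0.96). The same run shows that Rocchio does not guarantee classifications consistent with the training data.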
9
Properties
10
Rocchio classification
• Rocchio forms a simple representation for each class: the centroid/prototype.
• Classification is based on similarity to / distance from the prototype/centroid.
• It does not guarantee that classifications are consistent with the given training data.
• It is little used outside text classification.
  – It has been used quite effectively for text classification.
  – But it is in general worse than Naïve Bayes.
• Again, it is cheap to train and to test documents.
(Sec. 14.2)
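The bullets describe the prototype as a centroid, while the training algorithm builds p_i as a plain sum. Under cosine similarity the two give the same decisions. A short derivation, writing D_i (notation introduced here) for the set of training documents in category c_i and \vec{d}(x) for the vector of document x:

```latex
\vec{\mu}(c_i) = \frac{1}{|D_i|} \sum_{x \in D_i} \vec{d}(x),
\qquad
\vec{p}_i = \sum_{x \in D_i} \vec{d}(x) = |D_i|\,\vec{\mu}(c_i)

\operatorname{cosSim}(\vec{d}, \vec{p}_i)
  = \frac{\vec{d} \cdot |D_i|\,\vec{\mu}(c_i)}{\|\vec{d}\|\,|D_i|\,\|\vec{\mu}(c_i)\|}
  = \operatorname{cosSim}(\vec{d}, \vec{\mu}(c_i))
```

Because the positive factor |D_i| cancels, the sum and the centroid select the same class. The equivalence is specific to cosine similarity: with Euclidean distance the prototype has to be the mean, since the sum grows with the number of training documents in the class.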
11
References
Büttcher, S., Clarke, C. L. A., and Cormack, G. V. 2010. Information Retrieval. MIT Press.
Rocchio, J. J. 1971. Relevance feedback in information retrieval. In Salton (1971b), pp. 313–323.