Vector Space Classification: 1. Vector space text classification 2. Rocchio text classification

1 Vector Space Classification: 1. Vector space text classification 2. Rocchio text classification

2 Vector Space Classification

3 Using Projection to handle 2D and 3D graphs

4 Rocchio Text Classification

5 Illustration of Rocchio Text Categorization

6 Rocchio Text Categorization Algorithm (Training)
Assume the set of categories is {c_1, c_2, …, c_n}
For i from 1 to n: let p_i = <0, 0, …, 0> (init. prototype vectors)
For each training example <x, c(x)> ∈ D:
    Let d be the frequency-normalized TF/IDF term vector for doc x
    Let i = j such that c_j = c(x)
    Let p_i = p_i + d (sum all the document vectors in c_i to get p_i)
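The training loop above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: it assumes each document already arrives as a dense TF/IDF vector, it interprets "frequency normalized" as L2 (unit-length) normalization, and the names `l2_normalize`, `train_rocchio`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def train_rocchio(examples, dim):
    """Sum the normalized vectors of each class into one prototype.

    examples: iterable of (tfidf_vector, class_label) pairs.
    Returns a dict mapping class_label -> prototype vector.
    """
    # Every prototype starts as the zero vector <0, 0, ..., 0>.
    prototypes = defaultdict(lambda: [0.0] * dim)
    for vec, label in examples:
        d = l2_normalize(vec)
        p = prototypes[label]
        for k in range(dim):
            p[k] += d[k]  # p_i = p_i + d
    return dict(prototypes)


# Toy training set: two-term vocabulary, made-up weights.
toy = [
    ([3.0, 4.0], "sports"),
    ([0.0, 2.0], "sports"),
    ([5.0, 0.0], "politics"),
]
protos = train_rocchio(toy, dim=2)
```

Training is a single pass over the data, which is why the method is cheap to train.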

7 Rocchio Text Categorization Algorithm (Test)
Given test document x
Let d be the TF/IDF weighted term vector for x
Let m = –2 (init. maximum cosSim)
For i from 1 to n: (compute similarity to prototype vector)
    Let s = cosSim(d, p_i)
    If s > m:
        Let m = s
        Let r = c_i (update most similar class prototype)
Return class r
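A hedged Python sketch of the test step follows. It mirrors the pseudocode, including the initial maximum of -2 (safely below the minimum cosine of -1). The function names and the prototype values are illustrative assumptions; the prototypes are taken as already trained.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def classify_rocchio(d, prototypes):
    """Return the class whose prototype is most cosine-similar to d."""
    m = -2.0  # initial maximum, below any possible cosine value
    r = None
    for label, p in prototypes.items():
        s = cos_sim(d, p)
        if s > m:
            m, r = s, label  # update most similar class prototype
    return r


# Hypothetical prototypes over a two-term vocabulary.
prototypes = {"sports": [0.6, 1.8], "politics": [1.0, 0.0]}
predicted = classify_rocchio([1.0, 3.0], prototypes)
```

Because cosine similarity ignores vector length, it makes no difference here whether the prototype is the sum or the mean of the class's document vectors.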

8 Rocchio Anomaly (Sec. 14.2)
Prototype models have problems with polymorphic (disjunctive) categories.
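A small constructed example may make the anomaly concrete. Class "A" below is deliberately disjunctive: its two documents point along different axes, so their summed prototype lands between them, close to class "B". The data and helper names are invented for illustration only.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def nearest_prototype(d, prototypes):
    """Pick the class whose prototype has the highest cosine with d."""
    return max(prototypes, key=lambda c: cos_sim(d, prototypes[c]))


# Class "A" covers two unrelated sub-topics; class "B" sits between them.
train = [
    ([1.0, 0.0], "A"),
    ([0.0, 1.0], "A"),
    ([1.0, 0.6], "B"),
]

# Build prototypes by summing unit-length document vectors per class.
prototypes = {}
for vec, label in train:
    p = prototypes.setdefault(label, [0.0] * len(vec))
    norm = math.sqrt(sum(v * v for v in vec))
    for k, v in enumerate(vec):
        p[k] += v / norm

# Re-classify the training documents themselves.
predictions = [nearest_prototype(vec, prototypes) for vec, _ in train]
```

In this toy setup the first "A" document is closer to the "B" prototype than to its own class prototype, so the classifier mislabels a document it was trained on.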

9 Properties

10 Rocchio classification (Sec. 14.2)
Rocchio forms a simple representation for each class: the centroid/prototype.
Classification is based on similarity to / distance from the prototype/centroid.
It does not guarantee that classifications are consistent with the given training data.
It is little used outside text classification.
– It has been used quite effectively for text classification
– But in general worse than Naïve Bayes
Again, cheap to train and test documents.
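The "distance from the centroid" variant mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the slides' method: it uses the per-class mean vector as the centroid and plain Euclidean distance, and all names and data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(d, centroids):
    """Return the class whose centroid is closest to d."""
    return min(centroids, key=lambda c: euclidean(d, centroids[c]))


# Made-up document vectors grouped by class.
by_class = {
    "sports": [[2.0, 0.0], [4.0, 2.0]],
    "politics": [[0.0, 3.0], [0.0, 5.0]],
}
centroids = {c: centroid(vs) for c, vs in by_class.items()}
label = nearest_centroid([2.5, 1.5], centroids)
```

Training touches each document once and testing compares against one vector per class, which matches the "cheap to train and test" point.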

11 References
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval. MIT Press, 2010.
Rocchio, J. J. 1971. Relevance feedback in information retrieval. In Salton (1971b), pp. 313-323.
