Implementation Details of the Text Classification Project

1 Implementation Details of the Text Classification Project
Prerak Sanghvi
Computer Science and Engineering Department
State University of New York at Buffalo
Spring 2001

2 Feature Selection Step
Keywords are selected from the text by scoring each word; here, the scoring measure is Information Gain. For each unique word, we record the number of documents in each class in which the word occurs.
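A minimal Python sketch of this counting step follows. It assumes each training document arrives as a (category, text) pair and that words are obtained by lower-casing and splitting on whitespace; both the input format and the tokenization are assumptions, since the slides do not specify them, and the function name `count_document_frequencies` is hypothetical. Each word is counted at most once per document, because the record holds document counts rather than occurrence counts.

```python
from collections import Counter, defaultdict


def count_document_frequencies(training_set):
    """Build one record per word: a Counter of {category: documents containing the word}.

    training_set: iterable of (category, text) pairs (hypothetical input format).
    Tokenization by lower-casing and splitting on whitespace is an assumption.
    """
    records = defaultdict(Counter)
    for category, text in training_set:
        # set() makes each word count once per document, however often it repeats
        for word in set(text.lower().split()):
            # defaultdict creates the record the first time a word is seen;
            # later documents simply increment the count for their category
            records[word][category] += 1
    return dict(records)


docs = [
    ("sports", "soccer news soccer"),
    ("politics", "nation news"),
    ("sports", "soccer nation"),
]
records = count_document_frequencies(docs)
print(records["soccer"])  # Counter({'sports': 2})
```

"soccer" appears twice in the first document but is counted once there, so its record shows two sports documents, not three occurrences.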

3 Feature Selection Step - Algorithm
for each document d in the training set
    for each distinct word w in d
        if w has been encountered before
            increment the document count for Category(d) in the record for w
        else
            create a new record for w with a document count of 1 for Category(d)
for each word w
    using the record for w, calculate the Information Gain of w
select the NUM_KEYWORDS words with the highest Information Gain
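The last part of the algorithm, scoring every record and keeping the best NUM_KEYWORDS words, could be sketched in Python as below. This is a sketch under assumptions: each record is taken to be a list of per-category document counts (one row of the keyword table per word), and `score` stands for any function of one record, for example an Information Gain implementation. The names `select_keywords`, `score`, and `spread` are hypothetical.

```python
import heapq


def select_keywords(records, score, num_keywords):
    """Return the num_keywords words whose records receive the highest score.

    records: dict mapping word -> list of per-category document counts (assumed format).
    score:   function taking one record and returning a number, e.g. Information Gain.
    """
    # Score each word once, then keep only the num_keywords largest scores
    scored = ((score(counts), word) for word, counts in records.items())
    return [word for _, word in heapq.nlargest(num_keywords, scored)]


def spread(counts):
    """Toy stand-in for Information Gain: how unevenly a word's documents are spread."""
    return max(counts) - min(counts)


# Invented counts over three categories, for illustration only
records = {
    "goal": [9, 0, 1],
    "vote": [1, 8, 0],
    "the": [10, 10, 10],
    "church": [2, 1, 5],
}
print(select_keywords(records, spread, 2))  # ['goal', 'vote']
```

`heapq.nlargest` avoids sorting the whole vocabulary when NUM_KEYWORDS is much smaller than the number of distinct words; words with equal scores are ordered by the word string.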

4 Feature Selection
Word Cat1 Cat2 Cat3 Cat4 … Cat20
Nation 5 15 4 3 1
God 12 13 7 9
Soccer 6 2 19
News 10

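5 Information Gain
$$G(t) = -\sum_{i=1}^{m} \Pr(c_i) \log \Pr(c_i) + \Pr(t) \sum_{i=1}^{m} \Pr(c_i \mid t) \log \Pr(c_i \mid t) + \Pr(\bar{t}) \sum_{i=1}^{m} \Pr(c_i \mid \bar{t}) \log \Pr(c_i \mid \bar{t})$$
where $m = 20$ is the number of categories, $w$ is the number of distinct words, $\mathrm{Cat}_i(j)$ is the number of documents in category $i$ that contain word $j$, $\bar{t}$ denotes the absence of $t$, and $\Pr(\bar{t}) = 1 - \Pr(t)$:
$$\Pr(c_i) = \frac{1}{20}, \qquad \Pr(t) = \frac{\sum_{i=1}^{m} \mathrm{Cat}_i(t)}{\sum_{i=1}^{m} \sum_{j=1}^{w} \mathrm{Cat}_i(j)}, \qquad \Pr(c_i \mid t) = \frac{\mathrm{Cat}_i(t)}{\sum_{k=1}^{m} \mathrm{Cat}_k(t)}$$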

6 Classification Algorithm
KeyWord Cat1 Cat2 Cat3 Cat4 … Cat20
Nation 5 15 4 3 1
God 12 13 7 9
Soccer 6 2 19
News 10
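The slide gives the keyword table but not the decision rule, so the following is only one plausible illustration, not the method used in the project: each selected keyword present in a test document votes for every category with weight $\Pr(c_i \mid t)$, estimated from its row of the table as on the Information Gain slide, and the document is assigned the category with the largest total. The function name `classify`, the input formats, and the example counts are all hypothetical.

```python
def classify(text, keyword_table, categories):
    """Assign text to the category with the largest summed Pr(c_i | t) over its keywords.

    keyword_table: dict keyword -> list of per-category document counts (assumed format)
    categories:    category names, in the same order as the count lists
    Returns None when the text contains no keyword with nonzero counts.
    This voting rule is an illustrative assumption, not the slides' algorithm.
    """
    scores = [0.0] * len(categories)
    found = False
    for word in set(text.lower().split()):
        counts = keyword_table.get(word)
        if not counts or sum(counts) == 0:
            continue  # not a selected keyword, or no evidence for any category
        found = True
        row_total = sum(counts)
        for i, count in enumerate(counts):
            scores[i] += count / row_total  # Pr(c_i | t) from the table row
    if not found:
        return None
    best = max(range(len(categories)), key=scores.__getitem__)
    return categories[best]


# Invented keyword table over three categories, for illustration only
categories = ["sports", "politics", "religion"]
table = {"goal": [9, 0, 1], "vote": [1, 8, 0], "church": [2, 1, 5]}
print(classify("The final goal was late", table, categories))  # sports
print(classify("vote at the church", table, categories))  # politics
```

Normalizing each row makes every keyword cast the same total weight, so a very frequent keyword cannot outvote the others simply by having larger counts.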