A Comparison of Event Models for Naïve Bayes Text Classification
Andrew McCallum and Kamal Nigam, Just Research and Carnegie Mellon University


1 A Comparison of Event Models for Naïve Bayes Text Classification. Andrew McCallum and Kamal Nigam, Just Research and Carnegie Mellon University

2 Naïve Bayes for Text. Commonly and successfully used for text classification. Two implementations with different event models: multi-variate Bernoulli and multinomial.

3 Multi-variate Bernoulli. Documents are binary word vectors: each vocabulary word is either present or absent. Generative model: biased coin flips, one per vocabulary word. Used by: [Robertson & Sparck-Jones 76], [Lewis 92], [Kalt & Croft 96], [Larkey & Croft 96], [Sahami 96], [Koller & Sahami 97]. (Slide figure: a binary vector over words such as "corn", "activity", "dropped".)
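As a minimal sketch of the multi-variate Bernoulli document likelihood, under this event model every vocabulary word contributes a factor, whether it occurs in the document or not. The vocabulary and per-class probabilities below are hypothetical, not from the slides:

```python
import math

def bernoulli_log_likelihood(doc_words, word_probs):
    """Log P(d|c) under the multi-variate Bernoulli event model:
    each vocabulary word contributes p if present, (1 - p) if absent."""
    log_p = 0.0
    for word, p in word_probs.items():
        if word in doc_words:
            log_p += math.log(p)
        else:
            log_p += math.log(1.0 - p)
    return log_p

# Hypothetical per-class probabilities P(w|c) for a toy vocabulary.
probs_corn_class = {"corn": 0.9, "maize": 0.5, "oil": 0.1}

# Binary document representation: the set of words that occur.
doc = {"corn", "maize"}
score = bernoulli_log_likelihood(doc, probs_corn_class)
```

Note that the absent word "oil" still contributes a factor (1 − 0.1), which is the key difference from the multinomial model.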

4 Multinomial. Documents are vectors of word occurrence counts. Generative model: drawing words from an urn. Used by: [Guthrie & Walker 94], [Lewis & Gale 94], [Kalt & Croft 96], [Li & Yamanishi 97], [Mitchell 97], [McCallum et al. 98], [Nigam et al. 98]. (Slide figure: a count vector over words such as "corn", "maize", "activity".)
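A matching sketch for the multinomial likelihood: only words that occur contribute, each weighted by its count. The count-independent multinomial coefficient is dropped since it does not affect classification. Again, the vocabulary and probabilities are hypothetical:

```python
import math
from collections import Counter

def multinomial_log_likelihood(word_counts, word_probs):
    """Log P(d|c) under the multinomial event model (up to a
    count-only constant): sum over occurring words of N_t * log P(w_t|c)."""
    return sum(n * math.log(word_probs[w]) for w, n in word_counts.items())

# Hypothetical per-class word-emission probabilities P(w|c); they sum to 1.
probs = {"corn": 0.6, "maize": 0.3, "oil": 0.1}

# Count representation of a document containing "corn" twice, "maize" once.
counts = Counter({"corn": 2, "maize": 1})
score = multinomial_log_likelihood(counts, probs)
```

Unlike the Bernoulli sketch above, the word "oil" contributes nothing here because its count is zero.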

5 Classifier Representation. Multi-variate Bernoulli: P(d|c) = ∏_t [B_t P(w_t|c) + (1 − B_t)(1 − P(w_t|c))], where B_t is 1 if word w_t occurs in d and 0 otherwise. Multinomial: P(d|c) ∝ ∏_t P(w_t|c)^(N_t), where N_t is the number of times w_t occurs in d.

6 Building Classifiers. Estimate parameters from labeled training documents (slide examples: "Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty."). Multi-variate Bernoulli: P(w_t|c) is the (smoothed) fraction of documents in class c that contain w_t. Multinomial: P(w_t|c) is the (smoothed) fraction of word occurrences in class c that are w_t.
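A sketch of multinomial parameter estimation from training text, using the slide's example sentences and Laplace (add-one) smoothing; the vocabulary and tokenization are illustrative assumptions:

```python
from collections import Counter

def estimate_multinomial(docs, vocab):
    """Per-class P(w|c) under the multinomial event model: the smoothed
    fraction of in-vocabulary word occurrences in the class that are w."""
    counts = Counter()
    for doc in docs:
        for tok in doc.lower().split():
            tok = tok.strip(".,;:!?")  # crude tokenization for the sketch
            if tok in vocab:
                counts[tok] += 1
    total = sum(counts.values())
    # Laplace (add-one) smoothing over the vocabulary.
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

vocab = {"corn", "maize", "oil"}
corn_docs = ["Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty."]
params = estimate_multinomial(corn_docs, vocab)
```

With 3 occurrences of "corn", 1 of "maize", and 0 of "oil" among 4 in-vocabulary tokens, smoothing gives P(corn) = 4/7, P(maize) = 2/7, P(oil) = 1/7.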

7 Document Representation. Example document: "Corn prices rose today while corn futures dropped in surprising trading activity." Multi-variate Bernoulli: binary vector (corn = 1, dropped = 1, ...). Multinomial: count vector (corn = 2, dropped = 1, ...).

8 Classification with Bayes' Rule. Choose the class c maximizing P(c|d) ∝ P(c) P(d|c). Multi-variate Bernoulli: evidence is binary over all vocabulary words, present or absent. Multinomial: evidence comes only from the words that occur in the document.
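Classification then reduces to an argmax over log-posteriors. A sketch for the multinomial model, with hypothetical priors and class parameters:

```python
import math

def classify(doc_counts, priors, class_params):
    """Pick argmax_c [log P(c) + sum_t N_t log P(w_t|c)] under the
    multinomial model; only occurring words contribute evidence."""
    best_class, best_score = None, -math.inf
    for c, params in class_params.items():
        score = math.log(priors[c])
        for w, n in doc_counts.items():
            score += n * math.log(params[w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical priors and per-class word probabilities for two classes.
priors = {"corn": 0.5, "oil": 0.5}
class_params = {
    "corn": {"corn": 0.7, "oil": 0.1, "maize": 0.2},
    "oil":  {"corn": 0.1, "oil": 0.8, "maize": 0.1},
}
label = classify({"corn": 2, "maize": 1}, priors, class_params)
```

Working in log space avoids floating-point underflow when the product runs over many word occurrences.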

9 Feature Selection. Select the words with the highest average mutual information with the class variable. Multi-variate Bernoulli: mutual information is computed over document events (whether a word occurs in a document). Multinomial: computed over word events (individual word occurrences). (Slide examples: "Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty.", "Oil is expensive.", "Candy corn is sweet.")
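A sketch of the document-event (Bernoulli-style) mutual information criterion on a tiny hypothetical corpus, where each document is the set of words it contains:

```python
import math

def word_class_mi(docs_by_class, word):
    """Average mutual information I(C; W) between the class label and the
    binary event 'word occurs in the document' (document events)."""
    n = sum(len(ds) for ds in docs_by_class.values())
    p_w = sum(1 for ds in docs_by_class.values() for d in ds if word in d) / n
    mi = 0.0
    for c, ds in docs_by_class.items():
        p_c = len(ds) / n
        for present, p_word in ((True, p_w), (False, 1 - p_w)):
            joint = sum(1 for d in ds if (word in d) == present) / n
            if joint > 0 and p_word > 0:
                mi += joint * math.log(joint / (p_c * p_word))
    return mi

# Toy corpus grouped by hypothetical class labels.
docs_by_class = {
    "corn": [{"corn", "tastes"}, {"eat", "corn", "drink"}],
    "oil":  [{"oil", "expensive"}, {"candy", "oil"}],
}
```

Here "corn" perfectly predicts its class, so its mutual information equals the class entropy log 2, while a word like "tastes" scores lower.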

10 Experiments. Five domains with 2 to 100 classes. Vocabulary size varied via mutual-information feature selection. Compare multi-variate Bernoulli and multinomial.

11 Industry Sector Hierarchy 71 classes; 6500 documents

12 Yahoo ‘Science’ Hierarchy 95 classes; 13500 documents

13 Twenty Newsgroups 20 classes; 20000 documents

14 WebKB Homepages 4 classes; 4200 documents

15 Reuters-21578. Categories include interest, money-fx, and ship. Binary classification; 12900 documents.

16 Discussion. Multinomial is better for tasks needing large vocabularies. Vocabulary size matters. The multinomial accounts for the amount of evidence (the number of words in a document). Limited word dependencies are easier to model with the multi-variate Bernoulli. Take care when adding non-text features to the multinomial.

17 Naïve Bayes Classification. A simplistic independence assumption, yet surprisingly good results in many domains.

18 (Closing slide: example words "corn", "maize", "activity".)

