A Comparison of Event Models for Naïve Bayes Text Classification
Andrew McCallum and Kamal Nigam, Just Research and Carnegie Mellon University


1 A Comparison of Event Models for Naïve Bayes Text Classification. Andrew McCallum and Kamal Nigam, Just Research and Carnegie Mellon University

2 Naïve Bayes for Text. Commonly and successfully used for text classification. Two implementations with different event models: multi-variate Bernoulli and multinomial.

3 Multi-variate Bernoulli. Documents are binary word vectors: each vocabulary word is either present or absent. Generative model: biased coin flips, one per vocabulary word. Used by: [Robertson & Sparck-Jones 76], [Lewis 92], [Kalt & Croft 96], [Larkey & Croft 96], [Sahami 96], [Koller & Sahami 97]. (Slide figure: a binary vector over words such as "corn", "activity", "dropped".)
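As a minimal sketch of the multi-variate Bernoulli document likelihood, under this event model every vocabulary word contributes a factor, whether it occurs in the document or not. The vocabulary and per-class probabilities below are hypothetical, not from the slides:

```python
import math

def bernoulli_log_likelihood(doc_words, word_probs):
    """Log P(d|c) under the multi-variate Bernoulli event model:
    each vocabulary word contributes p if present, (1 - p) if absent."""
    log_p = 0.0
    for word, p in word_probs.items():
        if word in doc_words:
            log_p += math.log(p)
        else:
            log_p += math.log(1.0 - p)
    return log_p

# Hypothetical per-class probabilities P(w|c) for a toy vocabulary.
probs_corn_class = {"corn": 0.9, "maize": 0.5, "oil": 0.1}

# Binary document representation: the set of words that occur.
doc = {"corn", "maize"}
score = bernoulli_log_likelihood(doc, probs_corn_class)
```

Note that the absent word "oil" still contributes a factor (1 − 0.1), which is the key difference from the multinomial model.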

4 Multinomial. Documents are vectors of word occurrence counts. Generative model: drawing words from an urn. Used by: [Guthrie & Walker 94], [Lewis & Gale 94], [Kalt & Croft 96], [Li & Yamanishi 97], [Mitchell 97], [McCallum et al. 98], [Nigam et al. 98]. (Slide figure: a count vector over words such as "corn", "maize", "activity".)
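A matching sketch for the multinomial likelihood: only words that occur contribute, each weighted by its count. The count-independent multinomial coefficient is dropped since it does not affect classification. Again, the vocabulary and probabilities are hypothetical:

```python
import math
from collections import Counter

def multinomial_log_likelihood(word_counts, word_probs):
    """Log P(d|c) under the multinomial event model (up to a
    count-only constant): sum over occurring words of N_t * log P(w_t|c)."""
    return sum(n * math.log(word_probs[w]) for w, n in word_counts.items())

# Hypothetical per-class word-emission probabilities P(w|c); they sum to 1.
probs = {"corn": 0.6, "maize": 0.3, "oil": 0.1}

# Count representation of a document containing "corn" twice, "maize" once.
counts = Counter({"corn": 2, "maize": 1})
score = multinomial_log_likelihood(counts, probs)
```

Unlike the Bernoulli sketch above, the word "oil" contributes nothing here because its count is zero.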

5 Classifier Representation. Multi-variate Bernoulli: P(d|c) = ∏_t [B_t P(w_t|c) + (1 − B_t)(1 − P(w_t|c))], where B_t is 1 if word w_t occurs in d and 0 otherwise. Multinomial: P(d|c) ∝ ∏_t P(w_t|c)^(N_t), where N_t is the number of times w_t occurs in d.

6 Building Classifiers. Estimate parameters from labeled training documents (slide examples: "Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty."). Multi-variate Bernoulli: P(w_t|c) is the (smoothed) fraction of documents in class c that contain w_t. Multinomial: P(w_t|c) is the (smoothed) fraction of word occurrences in class c that are w_t.
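A sketch of multinomial parameter estimation from training text, using the slide's example sentences and Laplace (add-one) smoothing; the vocabulary and tokenization are illustrative assumptions:

```python
from collections import Counter

def estimate_multinomial(docs, vocab):
    """Per-class P(w|c) under the multinomial event model: the smoothed
    fraction of in-vocabulary word occurrences in the class that are w."""
    counts = Counter()
    for doc in docs:
        for tok in doc.lower().split():
            tok = tok.strip(".,;:!?")  # crude tokenization for the sketch
            if tok in vocab:
                counts[tok] += 1
    total = sum(counts.values())
    # Laplace (add-one) smoothing over the vocabulary.
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

vocab = {"corn", "maize", "oil"}
corn_docs = ["Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty."]
params = estimate_multinomial(corn_docs, vocab)
```

With 3 occurrences of "corn", 1 of "maize", and 0 of "oil" among 4 in-vocabulary tokens, smoothing gives P(corn) = 4/7, P(maize) = 2/7, P(oil) = 1/7.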

7 Document Representation. Example document: "Corn prices rose today while corn futures dropped in surprising trading activity." Multi-variate Bernoulli: binary vector (corn = 1, dropped = 1, ...). Multinomial: count vector (corn = 2, dropped = 1, ...).

8 Classification with Bayes' Rule. Choose the class c maximizing P(c|d) ∝ P(c) P(d|c). Multi-variate Bernoulli: evidence is binary over all vocabulary words, present or absent. Multinomial: evidence comes only from the words that occur in the document.
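Classification then reduces to an argmax over log-posteriors. A sketch for the multinomial model, with hypothetical priors and class parameters:

```python
import math

def classify(doc_counts, priors, class_params):
    """Pick argmax_c [log P(c) + sum_t N_t log P(w_t|c)] under the
    multinomial model; only occurring words contribute evidence."""
    best_class, best_score = None, -math.inf
    for c, params in class_params.items():
        score = math.log(priors[c])
        for w, n in doc_counts.items():
            score += n * math.log(params[w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical priors and per-class word probabilities for two classes.
priors = {"corn": 0.5, "oil": 0.5}
class_params = {
    "corn": {"corn": 0.7, "oil": 0.1, "maize": 0.2},
    "oil":  {"corn": 0.1, "oil": 0.8, "maize": 0.1},
}
label = classify({"corn": 2, "maize": 1}, priors, class_params)
```

Working in log space avoids floating-point underflow when the product runs over many word occurrences.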

9 Feature Selection. Select the words with the highest average mutual information with the class variable. Multi-variate Bernoulli: mutual information is computed over document events (whether a word occurs in a document). Multinomial: computed over word events (individual word occurrences). (Slide examples: "Corn tastes good.", "Eat corn; drink corn.", "Maize is pretty.", "Oil is expensive.", "Candy corn is sweet.")
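A sketch of the document-event (Bernoulli-style) mutual information criterion on a tiny hypothetical corpus, where each document is the set of words it contains:

```python
import math

def word_class_mi(docs_by_class, word):
    """Average mutual information I(C; W) between the class label and the
    binary event 'word occurs in the document' (document events)."""
    n = sum(len(ds) for ds in docs_by_class.values())
    p_w = sum(1 for ds in docs_by_class.values() for d in ds if word in d) / n
    mi = 0.0
    for c, ds in docs_by_class.items():
        p_c = len(ds) / n
        for present, p_word in ((True, p_w), (False, 1 - p_w)):
            joint = sum(1 for d in ds if (word in d) == present) / n
            if joint > 0 and p_word > 0:
                mi += joint * math.log(joint / (p_c * p_word))
    return mi

# Toy corpus grouped by hypothetical class labels.
docs_by_class = {
    "corn": [{"corn", "tastes"}, {"eat", "corn", "drink"}],
    "oil":  [{"oil", "expensive"}, {"candy", "oil"}],
}
```

Here "corn" perfectly predicts its class, so its mutual information equals the class entropy log 2, while a word like "tastes" scores lower.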

10 Experiments. Five domains with 2 to 100 classes. Vocabulary size varied via mutual-information feature selection. Compare multi-variate Bernoulli and multinomial.

11 Industry Sector Hierarchy 71 classes; 6500 documents

12 Yahoo ‘Science’ Hierarchy 95 classes; 13500 documents

13 Twenty Newsgroups 20 classes; 20000 documents

14 WebKB Homepages 4 classes; 4200 documents

15 Reuters-21578. Categories include interest, money-fx, and ship. Binary classification; 12900 documents.

16 Discussion. Multinomial is better for tasks needing large vocabularies. Vocabulary size matters. The multinomial accounts for the amount of evidence (the number of words in a document). Limited word dependencies are easier to model with the multi-variate Bernoulli. Take care when adding non-text features to the multinomial.

17 Naïve Bayes Classification. A simplistic independence assumption, yet surprisingly good results in many domains.

18 (Closing slide: example words "corn", "maize", "activity".)

