1
A Comparison of Event Models for Naïve Bayes Text Classification
Andrew McCallum and Kamal Nigam
Just Research and Carnegie Mellon University
2
Naïve Bayes for Text
Commonly and successfully used for text classification
Two implementations with different underlying formulations: multi-variate Bernoulli and multinomial
3
Multi-variate Bernoulli
Documents are binary word vectors
Generative model: one biased coin flip per vocabulary word
Used by:
–[Robertson & Sparck-Jones 76]
–[Lewis 92]
–[Kalt & Croft 96]
–[Larkey & Croft 96]
–[Sahami 96]
–[Koller & Sahami 97]
(Figure: example binary vector over the words "activity", "cable", "corn", "damp", "drawer", "dropped")
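The "biased coin flips" generative story can be sketched in a few lines. This is an illustrative sketch, not code from the slides; the function name and example probabilities are made up for demonstration.

```python
import random

def generate_bernoulli_doc(word_probs, rng):
    """Generate one document under the multi-variate Bernoulli event model.

    word_probs: dict mapping each vocabulary word w to P(w appears | class).
    Each word is an independent biased coin flip; the document is a
    binary vector over the whole vocabulary.
    """
    return {w: int(rng.random() < p) for w, p in word_probs.items()}

rng = random.Random(0)
# Hypothetical class-conditional probabilities for three vocabulary words.
probs = {"corn": 0.9, "drawer": 0.1, "damp": 0.05}
doc = generate_bernoulli_doc(probs, rng)
# doc records presence (1) or absence (0) for every vocabulary word.
```

Note that every vocabulary word appears in the output vector, present or not; absence is itself an event under this model.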
4
Multinomial
Documents have word occurrence counts
Generative model: urn drawing (words drawn with replacement)
Used by:
–[Guthrie & Walker 94]
–[Lewis & Gale 94]
–[Kalt & Croft 96]
–[Li & Yamanishi 97]
–[Mitchell 97]
–[McCallum et al. 98]
–[Nigam et al. 98]
(Figure: example word counts for "corn", "activity", "dropped", "maize", "in", "while")
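The "urn drawing" story can likewise be sketched; again this is an illustrative sketch with made-up names and probabilities, not the authors' code. Document length is fixed up front, and each word position is an independent draw from the class's word distribution.

```python
import random

def generate_multinomial_doc(word_dist, length, rng):
    """Generate one document under the multinomial event model.

    word_dist: dict mapping word -> P(word | class), summing to 1.
    Each of the `length` positions is drawn with replacement from the
    class's word distribution, like pulling balls from an urn.
    """
    words, probs = zip(*word_dist.items())
    return rng.choices(words, weights=probs, k=length)

rng = random.Random(0)
# Hypothetical class-conditional word distribution.
dist = {"corn": 0.5, "maize": 0.3, "dropped": 0.2}
doc = generate_multinomial_doc(dist, 10, rng)
# doc is a list of 10 word tokens; counts, not just presence, carry evidence.
```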
5
Classifier Representation
Multi-variate Bernoulli and Multinomial
6
Building Classifiers
Example training documents: "Corn tastes good." "Eat corn; drink corn." "Maize is pretty."
(Slide shows parameter estimation from these documents under both the multi-variate Bernoulli and multinomial models.)
7
Document Representation
Example document: "Corn prices rose today while corn futures dropped in surprising trading activity."
Multinomial: word occurrence counts ("corn" counted twice)
Multi-variate Bernoulli: binary presence/absence
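The two representations of the slide's example sentence can be built directly; a minimal sketch, using naive whitespace tokenization for illustration:

```python
from collections import Counter

text = ("Corn prices rose today while corn futures dropped "
        "in surprising trading activity.")
# Crude tokenization: lowercase and strip trailing periods.
tokens = [t.strip(".").lower() for t in text.split()]

# Multinomial representation: each word mapped to its occurrence count.
multinomial_rep = Counter(tokens)

# Multi-variate Bernoulli representation: each word mapped to presence (1);
# all other vocabulary words would implicitly be absent (0).
bernoulli_rep = {w: 1 for w in set(tokens)}
```

"corn" appears twice in the sentence: the multinomial representation keeps the count, while the Bernoulli representation records only that it occurred.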
8
Classification with Bayes’ Rule
Multi-variate Bernoulli: binary evidence over all vocabulary words (absent words contribute too)
Multinomial: evidence only from the words that occur, weighted by their counts
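The contrast between the two likelihoods can be sketched as code. This is a hedged illustration of the standard Naïve Bayes log-likelihoods (smoothing and class priors omitted); function names are invented for the example.

```python
import math

def bernoulli_loglik(doc_words, word_probs):
    """Log P(doc | class) under the multi-variate Bernoulli model.

    doc_words: set of words present in the document.
    word_probs: dict word -> P(w | class) over the whole vocabulary.
    Every vocabulary word contributes, whether present or absent.
    """
    return sum(
        math.log(p) if w in doc_words else math.log(1.0 - p)
        for w, p in word_probs.items()
    )

def multinomial_loglik(doc_counts, word_dist):
    """Log P(doc | class) under the multinomial model (up to the
    count-independent multinomial coefficient).

    doc_counts: dict word -> occurrence count.
    word_dist: dict word -> P(w | class).
    Only occurring words contribute, weighted by their counts.
    """
    return sum(n * math.log(word_dist[w]) for w, n in doc_counts.items())
```

A classifier would compute each class's log prior plus its log-likelihood and pick the argmax.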
9
Feature Selection
Select the words with the highest average mutual information with the class variable:
Multi-variate Bernoulli: computed over document events
Multinomial: computed over word events
Example documents: "Corn tastes good." "Eat corn; drink corn. Maize is pretty." "Oil is expensive." "Candy corn is sweet."
(Slide highlights which occurrences each event model counts when estimating the mutual information.)
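The selection criterion, mutual information between a word variable W and the class variable C, I(W;C) = Σ P(w,c) log [P(w,c) / (P(w)P(c))], can be sketched generically; the two event models differ only in how the joint probabilities are estimated (over documents vs. over word occurrences). The function below is an illustrative sketch, not the authors' implementation.

```python
import math

def mutual_information(joint):
    """I(W; C) from a joint distribution.

    joint: dict mapping (w, c) -> P(W=w, C=c), with entries summing to 1.
    """
    # Marginal distributions P(w) and P(c).
    pw, pc = {}, {}
    for (w, c), p in joint.items():
        pw[w] = pw.get(w, 0.0) + p
        pc[c] = pc.get(c, 0.0) + p
    # Sum P(w,c) * log( P(w,c) / (P(w) P(c)) ) over nonzero entries.
    return sum(
        p * math.log(p / (pw[w] * pc[c]))
        for (w, c), p in joint.items() if p > 0
    )
```

A word perfectly predictive of the class gets the maximal score, while a word independent of the class gets zero, so ranking the vocabulary by this quantity and keeping the top words prunes uninformative features.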
10
Experiments
5 domains, with between 2 and 100 classes
Vary vocabulary size using mutual information feature selection
Compare multi-variate Bernoulli and multinomial
11
Industry Sector Hierarchy 71 classes; 6500 documents
12
Yahoo ‘Science’ Hierarchy 95 classes; 13500 documents
13
Twenty Newsgroups 20 classes; 20000 documents
14
WebKB Homepages 4 classes; 4200 documents
15
Reuters-21578 Categories: interest, money-fx, ship. Binary classification; 12900 documents
16
Discussion
Multinomial is better for tasks needing large vocabularies
Vocabulary size matters
Multinomial accounts for the amount of evidence (number of words)
Limited dependencies are easier to model with multi-variate Bernoulli
Take care when adding non-text features to the multinomial model
17
Naïve Bayes Classification Simplistic independence assumption Surprisingly good results in many domains