1
Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Published by: Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger. Liang Lan, 11/19/2007
2
Outline Introduce the Multinomial Naive Bayes model for text classification. The poor assumptions of the Multinomial Naive Bayes model. Solutions to some problems of the Naive Bayes classifier.
3
Multinomial Naive Bayes Model for Text Classification
Given: A description of the document d: f = (f1, …, fn), where fi is the frequency count of word i occurring in document d. A fixed number of classes: C = {1, 2, …, m}. A parameter vector for each class: the parameter vector for class c is θc = (θc1, …, θcn), where θci is the probability that word i occurs in class c. Determine: the class label of d.
4
Introduce the Multinomial Naive Bayes Model for Text Classification
The likelihood of a document is a product of the parameters of the words that appear in the document, and the classifier selects the class with the largest posterior probability.
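The equations on this slide do not survive the transcription; as a sketch of the standard multinomial naive Bayes formulation implied by the definitions above, the likelihood and decision rule can be written as:

```latex
% Multinomial likelihood of a document d with word counts f = (f_1, ..., f_n)
p(d \mid \theta_c) \propto \prod_i \theta_{ci}^{f_i}

% Select the class with the largest posterior (log form, with class prior p(\theta_c))
l(d) = \arg\max_c \Big[ \log p(\theta_c) + \sum_i f_i \log \theta_{ci} \Big]
```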
5
Parameter Estimation for Naive Bayes Model
The parameters θci must be estimated from the training data; this yields the MNB classifier. For simplicity, a uniform class prior is used: lMNB(d) = argmaxc Σi fi wci, where wci = log θ̂ci.
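As an illustration only (not the authors' code), a minimal MNB estimator with add-one smoothing and a uniform class prior might look like the following; the function and variable names are hypothetical.

```python
import numpy as np

def train_mnb(X, y, n_classes, alpha=1.0):
    """Estimate smoothed log-parameters w[c, i] = log(theta_ci).

    X: (n_docs, n_words) array of term counts f_i
    y: (n_docs,) array of class labels in {0, ..., n_classes - 1}
    """
    n_words = X.shape[1]
    w = np.zeros((n_classes, n_words))
    for c in range(n_classes):
        N_ci = X[y == c].sum(axis=0)          # count of word i in documents of class c
        theta_c = (N_ci + alpha) / (N_ci.sum() + alpha * n_words)
        w[c] = np.log(theta_c)
    return w

def classify_mnb(f, w):
    """l_MNB(d) = argmax_c sum_i f_i * w_ci (uniform class prior)."""
    return int(np.argmax(w @ f))
```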
6
The Poor Assumptions of the Multinomial Naive Bayes Model
Two systemic errors (occurring in any naive Bayes classifier): 1. Skewed data bias (caused by uneven training-set sizes). 2. Weight magnitude errors (caused by the independence assumption). In addition, the multinomial distribution does not model text well.
7
Correcting the skewed data bias
Having more training examples for one class than another can cause the classifier to prefer one class over the other. Solution: use Complement Naive Bayes (CNB), which estimates parameters from the complement of each class: Ñci is the number of times word i occurred in documents of classes other than c.
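A minimal sketch of the complement estimate, reusing the hypothetical setup from the MNB example above: the only changes are that counts come from every class except c, and that the complement score is minimized rather than maximized.

```python
import numpy as np

def train_cnb(X, y, n_classes, alpha=1.0):
    """Complement estimate: ~theta_ci is built from counts outside class c."""
    n_words = X.shape[1]
    w = np.zeros((n_classes, n_words))
    for c in range(n_classes):
        N_tilde_ci = X[y != c].sum(axis=0)    # count of word i outside class c
        theta_tilde = (N_tilde_ci + alpha) / (N_tilde_ci.sum() + alpha * n_words)
        w[c] = np.log(theta_tilde)
    return w

def classify_cnb(f, w):
    """l_CNB(d) = argmax_c [ -sum_i f_i * log(~theta_ci) ] (uniform prior)."""
    return int(np.argmin(w @ f))   # minimizing the complement score
```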
8
Correcting the Weight Magnitude Errors
Caused by the independence assumption. Example: “San Francisco” versus “Boston”: the two words of “San Francisco” each contribute a weight, so it receives double the evidence of the single word “Boston”. Solution: normalize the weight vectors. We call this Weight-normalized Complement Naive Bayes (WCNB).
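As a sketch, normalizing each class's weight vector by the sum of its absolute weights (the WCNB step) could look like this, applied to the complement weights from the previous example:

```python
import numpy as np

def normalize_weights(w):
    """WCNB step: w_ci <- w_ci / sum_k |w_ck|, so every class's weight vector
    has the same magnitude and no class gains extra evidence merely because
    it contains strongly dependent (effectively duplicated) terms."""
    return w / np.abs(w).sum(axis=1, keepdims=True)
```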
9
Modeling Text Better
Transforming Term Frequency
Transforming by Document Frequency
Transforming Based on Length
10
Transforming Term Frequency
The empirical term distribution has heavier tails than predicted by the multinomial model, instead appearing like a power-law distribution, whose probability is proportional to (d + fi)^(log θ). So we can use the multinomial model to generate probabilities proportional to this class of power-law distributions via a simple transform: fi ← log(d + fi), which with d = 1 becomes fi ← log(1 + fi).
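A one-line sketch of this transform with d = 1, in the same hypothetical style as the earlier examples:

```python
import numpy as np

def tf_transform(f):
    """Power-law-style damping of raw term counts: f'_i = log(1 + f_i)."""
    return np.log(1.0 + f)
```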
11
Transforming by Document Frequency
Common words are unlikely to be related to the class of a document, but random variation can create apparent fictitious correlations. Solution: discount the weight of common words with the inverse document frequency (a common IR transform), which discounts each term by its document frequency: fi ← fi · log((number of documents) / (number of documents containing word i)).
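A sketch of this IDF discount applied to a whole document-term count matrix (rows are documents); the helper name is hypothetical:

```python
import numpy as np

def idf_transform(X):
    """Discount common words: f'_i = f_i * log(N / df_i),
    where df_i is the number of documents containing word i."""
    n_docs = X.shape[0]
    df = (X > 0).sum(axis=0)                     # document frequency of each word
    return X * np.log(n_docs / np.maximum(df, 1))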
12
Transforming Based on Length
Larger term frequencies become disproportionately likely as the length of the document grows. Discount the influence of long documents by normalizing the term-frequency vector: fi ← fi / sqrt(Σk fk²).
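A sketch of this length normalization, applied per document (per row of the count matrix):

```python
import numpy as np

def length_normalize(X):
    """Discount long documents: f'_i = f_i / sqrt(sum_k f_k^2) within each document."""
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return X / np.maximum(norms, 1e-12)          # guard against empty documents
```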
13
The New Naive Bayes procedure
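The step-by-step procedure shown on this slide is not reproduced in the transcription; as a sketch, chaining the hypothetical helpers defined in the earlier examples gives a TWCNB-style trainer under the same assumptions (raw count matrix X, labels y, uniform class prior):

```python
import numpy as np

def train_twcnb(X, y, n_classes, alpha=1.0):
    """Sketch of the combined procedure:
    1) transform term frequencies, 2) apply the IDF discount,
    3) length-normalize each document, 4) estimate complement weights,
    5) normalize the weight vectors (WCNB step)."""
    X = tf_transform(X)
    X = idf_transform(X)
    X = length_normalize(X)
    w = train_cnb(X, y, n_classes, alpha)
    return normalize_weights(w)

def classify_twcnb(f, w):
    """Label a new document's (already transformed) term vector f."""
    return int(np.argmin(w @ f))
```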
14
The experiments comparing MNB with TWCNB and the SVM show that TWCNB performs substantially better than MNB and approaches the performance of the SVM.