1
Naive Bayes for Document Classification
Illustrative Example
2
Document Classification
Given a document, find its class (e.g. headlines, sports, economics, fashion, ...). We assume the document is a "bag of words": d ~ { t_1, t_2, t_3, ..., t_{n_d} }, where n_d is the number of tokens in d. Using Naive Bayes with a multinomial distribution, we choose the most probable class:

c_{MAP} = \arg\max_{c} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)
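As a minimal sketch of the bag-of-words assumption (the tokenizer and the example string are illustrative, not from the slides):

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Lowercase, split on whitespace, and count occurrences of each term.
    Word order is discarded; only the counts survive."""
    return Counter(document.lower().split())

print(bag_of_words("Chinese Beijing Chinese"))
# Counter({'chinese': 2, 'beijing': 1})
```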
3
Binomial Distribution
n independent trials (each a Bernoulli trial), each of which results in success with probability p. The binomial distribution gives the probability of any particular split of the n trials between the two categories (success and failure). E.g. you flip a coin 10 times with P(Heads) = 0.6; what is the probability of getting 8 H and 2 T?

P(k) = \binom{n}{k} p^k (1 - p)^{n - k}

with k being the number of successes (or, to see the similarity with the multinomial, consider the first category being selected k times and the second n - k times).
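A quick check of the coin example, assuming nothing beyond the Python standard library:

```python
from math import comb

n, k, p = 10, 8, 0.6   # 10 flips, 8 heads, P(Heads) = 0.6
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(8 heads, 2 tails) = {prob:.4f}")   # about 0.1209
```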
4
Multinomial Distribution
A generalization of the binomial distribution: n independent trials, each of which results in one of k outcomes. The multinomial distribution gives the probability of any particular combination of counts over the k categories. E.g. you have balls in three colours in a bin (3 balls of each colour, so p_R = p_G = p_B = 1/3), from which you draw n = 9 balls with replacement. What is the probability of getting 8 Red, 1 Green, 0 Blue?

P(x_1, x_2, x_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}
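The ball-drawing example worked out the same way (counts and probabilities taken from the slide):

```python
from math import factorial

counts = [8, 1, 0]            # red, green, blue
probs = [1/3, 1/3, 1/3]       # 3 balls of each colour
n = sum(counts)               # 9 draws with replacement

coeff = factorial(n)
for x in counts:
    coeff //= factorial(x)    # multinomial coefficient n! / (x1! x2! x3!)

prob = coeff
for x, p in zip(counts, probs):
    prob *= p**x
print(f"P(8R, 1G, 0B) = {prob:.6f}")   # about 0.000457
```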
5
Naive Bayes w/ Multinomial Model
from McCallum and Nigam, 1998 (Advanced)
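The body of this slide is a figure that did not survive extraction. For reference, the multinomial event model of McCallum and Nigam is usually written as follows (notation adapted: N_{td} is the count of term t in document d, and n_d is the document length):

P(d \mid c) = P(n_d)\, n_d! \prod_{t \in V} \frac{P(t \mid c)^{N_{td}}}{N_{td}!}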
6
Naive Bayes w/ Multivariate Bernoulli Model
from McCallum and Nigam, 1998 (Advanced)
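The slide body is again a missing figure. In the same paper, the multivariate Bernoulli event model treats each vocabulary term as an independent binary occurrence (B_{td} = 1 if t appears in d, 0 otherwise):

P(d \mid c) = \prod_{t \in V} \left[ B_{td}\, P(t \mid c) + (1 - B_{td})(1 - P(t \mid c)) \right]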
7
Smoothing
For each term t, we need to estimate P(t|c):

\hat{P}(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}

T_{ct} is the count of term t in all documents of class c.
8
Smoothing
Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing. Laplace (add-one) smoothing:

\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} T_{ct'} + |V|}

|V| is the number of terms in the vocabulary.
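A minimal sketch of the add-one estimate, using the per-class term counts that the "China" example on the following slides produces:

```python
def smoothed_prob(term_counts: dict, term: str, vocab_size: int) -> float:
    """Laplace-smoothed P(term | class) from a class's term-count dictionary."""
    total = sum(term_counts.values())
    return (term_counts.get(term, 0) + 1) / (total + vocab_size)

china_counts = {"Chinese": 5, "Beijing": 1, "Shanghai": 1, "Macao": 1}
print(smoothed_prob(china_counts, "Chinese", vocab_size=6))   # 6/14 = 3/7
print(smoothed_prob(china_counts, "Tokyo", vocab_size=6))     # 1/14
```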
9
Two topic classes: “China”, “not China”
Training set:
  docID  document                             c = China?
  1      Chinese Beijing Chinese              Yes
  2      Chinese Chinese Shanghai             Yes
  3      Chinese Macao                        Yes
  4      Tokyo Japan Chinese                  No
Test set:
  5      Chinese Chinese Chinese Tokyo Japan  ?

V = {Beijing, Chinese, Japan, Macao, Tokyo, Shanghai}, |V| = 6
N = 4 training documents
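The vocabulary and document count can be read off directly; a small sketch that reproduces them:

```python
# Training documents from the slide; True means c = China
train = [
    ("Chinese Beijing Chinese".split(), True),
    ("Chinese Chinese Shanghai".split(), True),
    ("Chinese Macao".split(), True),
    ("Tokyo Japan Chinese".split(), False),
]

vocab = sorted({t for doc, _ in train for t in doc})
print(vocab)        # ['Beijing', 'Chinese', 'Japan', 'Macao', 'Shanghai', 'Tokyo']
print(len(vocab))   # |V| = 6
print(len(train))   # N = 4
```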
10
Probability Estimation and Classification
Using the training and test documents from the previous slide with Laplace smoothing (c = China, \bar{c} = not China):

\hat{P}(c) = 3/4, \quad \hat{P}(\bar{c}) = 1/4
\hat{P}(\text{Chinese} \mid c) = (5+1)/(8+6) = 6/14 = 3/7
\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0+1)/(8+6) = 1/14
\hat{P}(\text{Chinese} \mid \bar{c}) = (1+1)/(3+6) = 2/9
\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1+1)/(3+6) = 2/9

Classification of the test document d_5:
\hat{P}(c \mid d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003
\hat{P}(\bar{c} \mid d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001

so d_5 is assigned to the class "China".
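A compact sketch that reproduces these numbers end to end (kept in raw probabilities for readability; a real implementation would sum logs, as the summary slide notes):

```python
from collections import Counter
from math import prod

train = [
    ("Chinese Beijing Chinese".split(), "China"),
    ("Chinese Chinese Shanghai".split(), "China"),
    ("Chinese Macao".split(), "China"),
    ("Tokyo Japan Chinese".split(), "not China"),
]
test = "Chinese Chinese Chinese Tokyo Japan".split()

vocab = {t for doc, _ in train for t in doc}

scores = {}
for c in {label for _, label in train}:
    class_docs = [doc for doc, label in train if label == c]
    prior = len(class_docs) / len(train)
    counts = Counter(t for doc in class_docs for t in doc)
    total = sum(counts.values())
    # prior times Laplace-smoothed P(t | c) for every token of the test document
    scores[c] = prior * prod(
        (counts[t] + 1) / (total + len(vocab)) for t in test
    )

print(scores)                       # China ~ 0.0003, not China ~ 0.0001
print(max(scores, key=scores.get))  # China
```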
11
Summary: Miscellaneous
Naïve Bayes is linear in the time it takes to scan the data.
When we have many terms, the product of probabilities will cause a floating-point underflow; therefore we sum log probabilities instead of multiplying:

c_{MAP} = \arg\max_{c} \left[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \right]

For a large training set, the vocabulary is large, and it is better to select only a subset of terms; "feature selection" is used for that. However, accuracy is not badly affected by irrelevant attributes if the data set is large.
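To make the underflow point concrete, a toy illustration (the probability values are made up):

```python
from math import log

probs = [1e-5] * 100   # many small conditional probabilities

product = 1.0
for p in probs:
    product *= p
print(product)         # 0.0 -- the raw product underflows

log_score = sum(log(p) for p in probs)
print(log_score)       # about -1151.29, perfectly representable
```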