Naïve Bayes Classifier
Christina Wallin, Period 3
Computer Systems Research Lab 2008-2009

Goal
- Create a naïve Bayes classifier and test its effectiveness on the 20 Newsgroups dataset
- Compare the effectiveness of a simple naïve Bayes classifier with an optimized one
- One possible optimization is a Porter stemmer. It makes the program treat words such as “runs” and “running” as the same word, because they share the stem “run”
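The real Porter stemmer applies several ordered steps of suffix rules. The sketch below is much simpler. It uses a hypothetical `crude_stem` function, a deliberately crude suffix stripper and NOT the Porter algorithm. Its only purpose is to show how stemming merges the counts of related word forms before classification.

```python
from collections import Counter


def crude_stem(word):
    """Toy suffix stripper (NOT the Porter algorithm): removes -ing, -ed or -s."""
    for suffix in ("ing", "ed", "s"):
        # Strip only if at least three letters of stem remain ("is" and "sing" stay intact).
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant: "running" -> "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


words = ["runs", "running", "run", "ran"]
print(Counter(crude_stem(w) for w in words))  # → Counter({'run': 3, 'ran': 1})
```

Irregular forms such as “ran” are not merged. A suffix-based stemmer, including Porter's, only removes endings and cannot handle them.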

What is it?
- A classification method based on an independence assumption: given the class of a text, each of its words is treated as independent of the others
- A form of machine learning: it is trained on example texts labeled with their classes, and it can then classify new texts
- Classification is based on the probability that each word appears in a specific class of text
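The method described above can be sketched as a multinomial naïve Bayes classifier. For each class c, it adds the log prior log P(c) to the sum of log P(w | c) over the words w in the text, and it picks the class with the highest total. Summing the per-word terms is exactly the independence assumption. This is a minimal sketch, not the project's code. Several choices are assumptions: the `train`/`classify` names, the (word list, label) training format, add-one (Laplace) smoothing, and the tiny example data.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (list_of_words, class_label) pairs (hypothetical format)."""
    class_counts = Counter()            # number of training texts per class
    word_counts = defaultdict(Counter)  # class -> frequency of each word in that class
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(words, class_counts, word_counts, vocab):
    """Return the class with the highest log-probability score."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(c)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing keeps unseen words from making the probability zero.
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)  # independence: per-word terms simply add up
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("the shuttle launch orbit".split(), "sci.space"),
    ("nasa orbit moon".split(), "sci.space"),
    ("the pitcher hit a home run".split(), "rec.sports.baseball"),
    ("inning pitcher bat".split(), "rec.sports.baseball"),
]
model = train(training)
print(classify("orbit of the moon".split(), *model))  # → sci.space
```

The sketch works with log probabilities instead of multiplying raw probabilities. With thousands of words per post, the raw product would underflow to zero.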

Previous Research
- The algorithm has been around for a long time (its first use dates to 1966)
- At first it was thought to be less effective because of its simplicity and its false independence assumption. However, a recent review of its uses found that it is actually rather effective ("Idiot's Bayes--Not So Stupid After All?" by David Hand and Keming Yu)

Procedures
So far, there is a program that:
- Reads in a text file
- Parses that file and removes all punctuation and capitalization, so that “The.” is treated the same as “the”
- Builds a dictionary of every word present and its frequency
- Uses PyLab to graph the 20 most frequent words
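The parsing and counting steps above might look like the following minimal sketch. It is not the original program. The function names `word_frequencies` and `top_words` are hypothetical. So is the `latin-1` file encoding, chosen because newsgroup posts are not always valid UTF-8. The sketch assumes "punctuation" means any character that is not a letter, digit, or whitespace.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, delete punctuation, and count each remaining word."""
    # Lowercasing plus deleting non-alphanumeric characters turns "The." into "the".
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return Counter(cleaned.split())


def top_words(path, n=20):
    """Read a text file and return its n most frequent (word, count) pairs."""
    with open(path, encoding="latin-1") as f:
        return word_frequencies(f.read()).most_common(n)


print(word_frequencies("The. the, cat!").most_common(1))  # → [('the', 2)]
```

The (word, count) pairs from `top_words` can then be split into two lists and passed to a bar chart, for example `matplotlib.pyplot.bar`, to draw graphs like the ones below.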

Results
[Figure: 20 most frequent words in sci.space from 20 Newsgroups]
[Figure: 20 most frequent words in rec.sports.baseball from 20 Newsgroups]

Results
- Stories in the two newsgroups are approximately the same length
- sci.space posts are denser and less to the point
- The most frequent word, ‘the’, is the same in both