1
Naïve Bayes Classifier Christina Wallin, Period 3 Computer Systems Research Lab 2008-2009
2
Goal
-Create and test the effectiveness of a naïve Bayes classifier on the 20 Newsgroups database
-Compare the effectiveness of a simple naïve Bayes classifier and an optimized one
-One possible optimization is a Porter stemmer, which makes the program recognize words such as "runs" and "running" as the same word since they share the same stem (illustrated below)
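As a minimal sketch of the stemming idea, the snippet below uses NLTK's PorterStemmer; NLTK is an assumption here, since the slides do not say which stemmer implementation the project uses.

```python
# Minimal stemming sketch. NLTK's PorterStemmer is an assumed
# implementation; the original program may use a different one.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["runs", "running", "run"]:
    # All three reduce to the stem "run", so the classifier
    # would count them as the same feature.
    print(word, "->", stemmer.stem(word))
```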
3
What is it?
-A classification method based on the assumption that each word occurs independently of the others
-Machine learning: trained with labeled test cases as to what the classes are, and then able to classify new texts
-Classification is based on the probability that a word will appear in a specific class of text (see the sketch below)
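To make the independence assumption concrete: the classifier picks the class c that maximizes P(c) multiplied by the product of P(w | c) over the words w in the text. Below is a minimal scoring sketch, assuming probability tables already estimated from training data; the function name, dictionary layout, and smoothing constant are illustrative, not taken from the original program.

```python
# Minimal naive Bayes scoring sketch. The probability tables are assumed
# to be precomputed from training counts; all names are illustrative.
import math

def classify(words, class_priors, word_probs):
    """class_priors: {class: P(class)}
    word_probs: {class: {word: P(word | class)}}"""
    best_class, best_score = None, float("-inf")
    for c, prior in class_priors.items():
        # Sum log probabilities to avoid underflow when multiplying many terms
        score = math.log(prior)
        for w in words:
            # A tiny default probability stands in for unseen words (smoothing)
            score += math.log(word_probs[c].get(w, 1e-6))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```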
4
Previous Research
The algorithm has been around for a while (first used in 1966). At first, it was thought to be less effective because of its simplicity and false independence assumption, but a recent review of its uses found that it is actually rather effective ("Idiot's Bayes--Not So Stupid After All?" by David Hand and Keming Yu).
5
Procedures
-So far, a program which reads in a text file
-It parses the file, removing all punctuation and capitalization so that "The." is treated the same as "the"
-It makes a dictionary of all of the words present and their frequencies
-With PyLab, it graphs the 20 most frequent words (a sketch of these steps follows)
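A minimal sketch of those preprocessing and plotting steps, assuming plain-text input; the file name and function name are illustrative rather than from the original program.

```python
# Sketch of the preprocessing described above: lowercase the text,
# strip punctuation, count word frequencies, and plot the top 20
# with PyLab. File and function names are illustrative.
import string
from collections import Counter
import pylab

def word_frequencies(path):
    with open(path) as f:
        text = f.read().lower()  # "The." and "the" now match on case
    # Remove punctuation so "The." and "the" become the same token
    text = text.translate(str.maketrans("", "", string.punctuation))
    return Counter(text.split())

freqs = word_frequencies("sci_space.txt")  # illustrative file name
top = freqs.most_common(20)
pylab.bar(range(len(top)), [count for _, count in top])
pylab.xticks(range(len(top)), [word for word, _ in top], rotation=90)
pylab.show()
```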
6
Results
[Figure: 20 most frequent words in sci.space from 20 Newsgroups]
[Figure: 20 most frequent words in rec.sport.baseball from 20 Newsgroups]
7
Results
-The stories were approximately the same length
-sci.space was more dense and less to the point
-The most frequent word, "the", was the same for both groups