Presentation is loading. Please wait.

Presentation is loading. Please wait.

Learning to Classify Documents Edwin Zhang Computer Systems Lab

Similar presentations


Presentation on theme: "Learning to Classify Documents Edwin Zhang Computer Systems Lab"— Presentation transcript:

1 Learning to Classify Documents Edwin Zhang Computer Systems Lab 2009-2010

2 Abstract Classifying documents
Will use a Bayesian method and calculate conditional probability Use a set of Training Documents Choose a set of features

3 Introduction Learning to Classify Documents Use a Bayesian Method
Code in Java

4 Background Naïve Bayes Classifier/Bayesian Method
computes the conditional probability p(T|D) for a given document D for every topic Assigns the document D to the topic with the largest conditional probability

5 Development Program has two steps: Learning Prediction
training documents conditional probability features selection

6 Development Prediction
Predicting what a unknown document is talking about based on prediction section

7 Development (continued)
Created Document, Category classes Document class deals with the documents, has two functions Category class deals with the categories, has three classes Each category contains an array of documents Each document contains an array of terms. Right now, my program: Reads in documents Creates array of categories, which has array of documents Has two categories right now

8 Development (continued)
What I still need to do: Get documents to read in so that my program can learn Develop and program a learning formula Test my program's learning Add more categories

9 Expected Results Initially, the program may have trouble classifying documents into the correct category As the program learns more and improves its formulas, it will get better at classifying documents into the correct categories.

10 Works Cited http://www.nltk.org/book My dad
Chai, Kian Ming Adam, Hai Leong Chieu, and Hwee Tou Ng. ACM Poral. Assocation of Computing Machinery, Web. 14 Jan < &coll=Portal&dl=ACM&CFID= &CFTOKEN= >.

11 Works Cited (continued)
Eyheramendy, Susana, and David Madigan. "A Flexible Bayesian Generalized Linear Model for Dichotomous Response Data with an Application to Text Categorization." Lecture Notes-Monograph Series 54 (2007): JSTOR. Web. 25 Oct < Lavine, Michael, and Mike West. "A Bayesian Method for Classification and Discrimination." Canadian Journal of Statistics 20.4 (1992): JSTOR. Web. 14 Jan <


Download ppt "Learning to Classify Documents Edwin Zhang Computer Systems Lab"

Similar presentations


Ads by Google