Text Classification: An Implementation Project
Prerak Sanghvi
Computer Science and Engineering Department
State University of New York at Buffalo
Algorithm to be used
I intend to use a backpropagation artificial neural network (ANN).
–Inputs: whether or not each keyword is present in a document
–Output: the category into which the document should be classified
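As a minimal sketch of the input and output encoding described above, each document can be turned into a 0/1 keyword-presence vector and each category into a one-hot target vector. This assumes simple lowercase word tokenization and one output unit per category, neither of which the project specifies. The function names `tokenize`, `keyword_vector` and `one_hot`, and the example keywords and categories, are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of alphabetic word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_vector(text: str, keywords: list[str]) -> list[int]:
    """Return a 0/1 input vector: position i is 1 if keyword i occurs in the document."""
    tokens = tokenize(text)
    return [1 if k in tokens else 0 for k in keywords]


def one_hot(category: str, categories: list[str]) -> list[int]:
    """Encode the target category as a one-hot vector (assumed: one output unit per category)."""
    return [1 if c == category else 0 for c in categories]


if __name__ == "__main__":
    keywords = ["goal", "match", "election", "vote"]  # hypothetical keyword list
    categories = ["sports", "politics"]               # hypothetical categories
    doc = "The match ended with a late goal."
    print(keyword_vector(doc, keywords))  # [1, 1, 0, 0]
    print(one_hot("sports", categories))  # [1, 0]
```

Because only presence matters, a keyword that appears many times in a document contributes the same 1 as a keyword that appears once.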
What are the keywords?
Choosing the keywords is an instance of a broader class of problems known as Feature Selection. A Feature Selection technique will be used to pick the keywords automatically or semi-automatically.
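The project does not name a Feature Selection technique, so the sketch below uses information gain purely as an illustrative stand-in: it measures how much knowing whether a term is present reduces the entropy of the category label, IG(t) = H(C) - H(C | t present or absent). Every vocabulary term is scored this way and the top k are kept as keywords. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """IG(term) = H(C) - H(C | term present/absent)."""
    h_c = entropy(Counter(labels).values())
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            # Weight each branch's label entropy by the fraction of documents in it.
            h_cond += len(subset) / n * entropy(Counter(subset).values())
    return h_c - h_cond


def select_keywords(docs: list[set[str]], labels: list[str], k: int) -> list[str]:
    """Rank every vocabulary term by information gain and keep the top k (ties stay alphabetical)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is its set of tokens.
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "election"}, {"vote", "party"}]
    labels = ["sports", "sports", "politics", "politics"]
    print(select_keywords(docs, labels, 2))  # ['goal', 'vote']
```

Here "goal" and "vote" each split the toy corpus perfectly by category, so they receive the maximum gain of 1 bit.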
Organization of the Project
The project consists of two phases, each equally important for good results:
–Feature Selection
–Implementation of the ANN
Artificial Neural Network
[Figure: keyword inputs k1, k2, k3, …, kn feed a hidden layer, which feeds the classification output]
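The network in the figure can be sketched as below: keyword-presence inputs, one hidden layer of sigmoid units, and one sigmoid output unit per category, trained by backpropagation with online gradient descent on squared error. The class name `BackpropNet`, the hidden-layer size, learning rate, weight initialization and epoch count are all assumptions for illustration, not the project's actual settings.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """One-hidden-layer network with sigmoid units, trained by backpropagation on squared error."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        # Each weight row has one extra entry at the end for the bias (its input is fixed at 1.0).
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h] + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_o]
        return xb, hb, o

    def train_step(self, x, target, lr: float = 0.5) -> None:
        xb, hb, o = self.forward(x)
        # Output deltas for E = 1/2 * sum((t - o)^2) with sigmoid outputs.
        d_o = [(t - ok) * ok * (1 - ok) for t, ok in zip(target, o)]
        # Hidden deltas: propagate output deltas back through w_o (bias column skipped),
        # computed before any weight is changed.
        d_h = [hb[j] * (1 - hb[j]) * sum(d_o[k] * self.w_o[k][j] for k in range(len(d_o)))
               for j in range(len(hb) - 1)]
        # Gradient-descent updates; each delta already points downhill on E.
        for k, row in enumerate(self.w_o):
            for j in range(len(row)):
                row[j] += lr * d_o[k] * hb[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                row[i] += lr * d_h[j] * xb[i]

    def predict(self, x) -> int:
        """Index of the output unit with the highest activation, i.e. the predicted category."""
        _, _, o = self.forward(x)
        return max(range(len(o)), key=lambda k: o[k])


if __name__ == "__main__":
    # Hypothetical toy data: 4 keywords, 2 categories (0 = sports, 1 = politics).
    X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
    Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
    net = BackpropNet(n_in=4, n_hidden=3, n_out=2)
    for epoch in range(2000):
        for x, t in zip(X, Y):
            net.train_step(x, t)
    print([net.predict(x) for x in X])
```

Sigmoid outputs with squared error are the classic textbook form of backpropagation; a softmax output layer with cross-entropy loss is a common alternative when there are many categories.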
Example data set
One of the several text corpora available on the web will be used.
After ANN
Once the technique to extract the feature set from the data set is implemented, any classification algorithm can be applied to the resulting features. After the ANN is successfully implemented, other algorithms, especially the Naïve Bayes classification method, will be implemented, and the results from the different methods will be compared. Another possibility is coupling two methods to improve overall performance.
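For the planned Naïve Bayes comparison, the Bernoulli variant is a natural fit for binary keyword-presence inputs, since it models both the presence and the absence of each keyword. The sketch below uses add-one (Laplace) smoothing; the choice of variant and smoothing is an assumption, and the class name, the `accuracy` helper and the toy data are hypothetical.

```python
import math


class BernoulliNaiveBayes:
    """Naive Bayes over 0/1 keyword-presence vectors with add-one (Laplace) smoothing."""

    def fit(self, X: list[list[int]], y: list[str]) -> "BernoulliNaiveBayes":
        n_features = len(X[0])
        self.classes = sorted(set(y))
        self.log_prior: dict[str, float] = {}
        self.p_present: dict[str, list[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(keyword i present | c), smoothed so no estimate is exactly 0 or 1.
            self.p_present[c] = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                                 for i in range(n_features)]
        return self

    def log_posterior(self, x: list[int], c: str) -> float:
        """Unnormalized log P(c | x); absent keywords contribute log(1 - p)."""
        score = self.log_prior[c]
        for xi, p in zip(x, self.p_present[c]):
            score += math.log(p if xi else 1.0 - p)
        return score

    def predict(self, x: list[int]) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


def accuracy(predict, X, y) -> float:
    """Fraction of documents classified correctly by any prediction function."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical toy data: 4 keyword-presence features, 2 categories.
    X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
    y = ["sports", "sports", "politics", "politics"]
    nb = BernoulliNaiveBayes().fit(X, y)
    print(nb.predict([1, 0, 0, 0]))      # sports
    print(accuracy(nb.predict, X, y))    # 1.0
```

Because `accuracy` takes a prediction function rather than a specific model, each method can be scored on the same held-out documents, which keeps the comparison between methods consistent.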