Download presentation
Presentation is loading. Please wait.
Published byJulia Harrison Modified over 6 years ago
1
CATEGORIZATION OF NEWS ARTICLES USING NEURAL TEXT CATEGORIZER
FUZZ-IEEE 2009, Korea, August 20-24, 2009 Taeho Jo Inha University Reporter:洪紹祥 Adviser:鄭淑真
2
OUTLINE Introduction Previous Works Framework Empirical Results
Conclusions
3
INTRODUCTION(1/2) Text categorization is necessary for managing textual text as efficiently as possible. Text categorization is requires the two manual preliminary: The predefined of categorization The preparation of sample labeled documents.
4
INTRODUCTION(2/2) Traditional Text Categorization requires encoding documents into numerical vector Cause two main problems: Huge dimensionality. Sparse distribution. Solve the two main problem Encoded into String Vector Different from numerical Vector, words are given as feature value. Propose a neural network, called NTC(Neural Text Categorizer)
5
PREVIOUS WORKS(1/2) Popular approaches for Text categorization
KNN(K Nearest Neighbor) NB(Naïve Bayes) SVM(Support Vector Machine) Neural Networks Causes two Main problems Huge dimensionality. Sparse distribution. Using String kernel in SVM Failed to improve the performance.
6
PREVIOUS WORKS(2/2) String kernel
Receives two raw texts as inputs and computes their syntactical similarity between them Advantage Don’t need to be encoded into numerical vector. More transparent than numerical vector . Easier to trace why each document is classified. Disadvantages Cost too much time for computing the similarity.
7
FRAMEWORK(1/2) Bag of Words
8
FRAMEWORK(2/2)
9
EMPIRICAL RESULTS(1/3) The collection of news articles, called NewsPage. News articles 500 dimensional numerical vectors. 50 dimensional string vectors.
10
EMPIRICAL RESULTS(2/3) The configuration of participating approaches
11
EMPIRICAL RESULTS(3/3) The Results of This Set of Experiments
12
CONCLUSIONS The four contributions are considered as the significance of this research. According to the results of the set of experiments, this research proposes the practical approach. It solved the two main problems, the huge dimensionality the sparse distribution Created a new neural network, called NTC. Provides the potential easiness for tracing why each document is classified so.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.