Published by Chad Sparks. Modified over 9 years ago.
1
Combining Unsupervised Feature Selection Strategy for Automatic Text Categorization
Ping-Tsun Chang
Intelligent Systems Laboratory, Computer Science and Information Engineering, National Taiwan University
2
Introduction
In recent research:
–Statistical and computational approaches to natural language understanding are running into their limits
–Machine learning techniques have almost reached their performance bound
–Natural language is infinite and nonlinear!
Unsupervised Feature Selection
3
Background Knowledge: Text Categorization
Problem definition: text categorization is the problem of assigning labels to a large number of unlabeled documents on the basis of a large amount of text data.
Pipeline: Sensing → Segmentation → Feature Extraction → Classification → Post-Processing → Decision
4
Background Knowledge: Machine Learning
Using computers to help us perform induction over large amounts of complex pattern data
Bayesian Learning
Instance-Based Learning
–K-Nearest Neighbors
Neural Networks
Support Vector Machine
5
Background Knowledge: Feature Selection
Information Gain
Mutual Information
Chi-Square
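The slide lists the three scores without formulas. The sketch below assumes the definitions commonly used in the text categorization literature, computed for one term and one category from a 2x2 contingency table. The function names and the counts a, b, c, d are illustrative and do not come from the slides: a = documents in the category that contain the term, b = documents outside the category that contain it, c = documents in the category without it, d = documents outside the category without it.

```python
import math

def chi_square(a, b, c, d):
    """Chi-square statistic for term/category dependence (0 = independent)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log(a * n / ((a + c) * (a + b)))

def _entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(a, b, c, d):
    """Drop in category entropy once term presence/absence is known."""
    n = a + b + c + d
    h_category = _entropy([(a + c) / n, (b + d) / n])
    h_given_term = 0.0
    # First pair: documents containing the term; second pair: documents without it.
    for in_cat, out_cat in ((a, b), (c, d)):
        total = in_cat + out_cat
        if total:
            h_given_term += total / n * _entropy([in_cat / total, out_cat / total])
    return h_category - h_given_term
```

A term that is independent of the category scores zero on all three measures, so ranking terms by any of these scores and keeping the top ones is one plausible way to build the feature set.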
6
Bayesian Classifier
Recent research
–Naïve Bayes classifiers are competitive with other techniques in accuracy
–Fast: training takes a single pass, and new documents are classified quickly
–ATHENA: EDBT 2000
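To illustrate the "single pass" point, here is a minimal sketch of a Naïve Bayes text classifier. The slides do not say which variant was used, so this assumes the multinomial model with add-one (Laplace) smoothing. The class and method names are illustrative.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed variant)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents per category
        self.term_counts = defaultdict(Counter)   # term frequencies per category
        self.vocab = set()

    def fit(self, docs, labels):
        # A single pass over the training data: only counts are updated.
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.doc_counts.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore terms never seen in training
                    count = self.term_counts[label][t]
                    score += math.log((count + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Classifying a new document only needs one lookup per token and category, which is why prediction is quick once the counts are in place.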
7
Machine Learning Approaches: kNN Classifier
[Figure: kNN classification of an unlabeled document d]
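The figure on this slide is not recoverable, so the following is only a sketch of how a kNN text classifier typically labels a new document d. It assumes bag-of-words vectors, cosine similarity, and similarity-weighted voting among the k nearest training documents. None of these choices is stated on the slide, and the function names are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def knn_scores(train, tokens, k=3):
    """Sum the similarities of the k nearest training documents per category."""
    query = Counter(tokens)
    sims = sorted(
        ((cosine(query, Counter(doc)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = Counter()
    for sim, label in sims[:k]:
        scores[label] += sim
    return scores

def knn_predict(train, tokens, k=3):
    """Return the category with the highest summed similarity."""
    scores = knn_scores(train, tokens, k)
    return max(scores, key=scores.get)
```

The per-category scores returned by `knn_scores` are the kind of quantity on which a certainty rule could be defined, for example by comparing the best and second-best category.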
8
Machine Learning Approaches: Support Vector Machine
Basic hypotheses: consistent hypotheses of the version space
Project the original training data in space X to a higher-dimensional feature space F via a Mercer kernel operator K
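A small worked example may help with the projection step. The slides do not name a specific kernel, so the sketch below assumes the degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-dimensional inputs. It checks that evaluating K in the original space X gives the same value as an inner product in the explicit 3-dimensional feature space F. The names `phi`, `dot`, and `poly_kernel` are illustrative.

```python
import math

def phi(x):
    """Explicit feature map into F for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever building phi(x)."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # inner product taken in the feature space F
implicit = poly_kernel(x, z)     # the same quantity computed in the input space X
```

Because the kernel only needs inner products in X, an SVM can work in a high-dimensional F without materializing the projected vectors.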
9
What is Certainty?
Rule for SVM
Rule for kNN
10
Algorithm for Two-Stage Automatic Text Categorization
ALGORITHM Two-Stage-Text-Categorization (input: document d) returns category C
  Static:
    Trained classifier: Traditional-Classifier
    The feature set: F
    The new feature set by user feedback: U_i for related category C_i
  For new document d
    C ← Traditional-Classifier(d)
    If NOT satisfy the rule of uncertainty
      Return C
    Else
      For all categories C_i
        If d has the feature in F
          C ← C_i
          Return C
        End If
      C_j ← User-Input
      U_j ← U_j + User-Selected
      C ← C_j
    End If
    Return C
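The pseudocode above can be sketched in Python as follows. This is an interpretation under stated assumptions, not the author's implementation. The pseudocode tests d against "the feature in F", but a single global set could not tell categories apart, so the sketch assumes the per-category feedback sets U_i are what is matched. The callables `base_classifier`, `is_uncertain`, and `ask_user` are hypothetical stand-ins for the trained classifier, the uncertainty rule, and the user-feedback step.

```python
def two_stage_categorize(doc_tokens, base_classifier, is_uncertain,
                         feedback_features, ask_user):
    """Two-stage categorization sketch.

    base_classifier(tokens) -> (category, scores)      # hypothetical interface
    is_uncertain(scores) -> bool                        # the uncertainty rule
    feedback_features: dict mapping category C_i -> set of features U_i
    ask_user(tokens) -> (category, selected_features)   # user feedback
    """
    # Stage 1: trust the traditional classifier when it is certain.
    category, scores = base_classifier(doc_tokens)
    if not is_uncertain(scores):
        return category

    # Stage 2a: look for a feature previously selected through user feedback
    # (assumption: matched against U_i rather than the global set F).
    tokens = set(doc_tokens)
    for cat, features in feedback_features.items():
        if tokens & features:
            return cat

    # Stage 2b: ask the user, then remember the selected features as U_j.
    cat, selected = ask_user(doc_tokens)
    feedback_features.setdefault(cat, set()).update(selected)
    return cat
```

With this shape, each piece of user feedback enlarges U_j, so later uncertain documents that share those features can be resolved without asking again.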
11
Determining the Threshold of the Rule
12
Experiments