Combining Unsupervised Feature Selection Strategy for Automatic Text Categorization
Ping-Tsun Chang
Intelligent Systems Laboratory
Computer Science and Information Engineering
National Taiwan University

Introduction
In recent research:
– The limits of statistical and computational approaches to natural language understanding
– The development of machine learning techniques has almost reached its bound
– Natural language is infinite and nonlinear!
Unsupervised Feature Selection

Background Knowledge: Text Categorization
Problem definition: text categorization is the problem of assigning labels, initially unknown, to a large number of documents, learning from a large amount of text data.
[Figure: pattern-recognition pipeline: Sensing → Segmentation → Feature Extraction → Classification → Post-Processing → Decision]
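For text, the segmentation and feature-extraction stages of this pipeline can be sketched as below. This is a minimal illustration under stated assumptions, not the system described in the talk: the regex tokenizer, the function names `segment` and `extract_features`, and the four-term vocabulary are all hypothetical choices (unspaced scripts such as Chinese would need a real word segmenter).

```python
import re
from collections import Counter

def segment(text):
    """Segmentation: split raw text into lowercase alphanumeric tokens.
    Assumption: English-style tokenization; unspaced scripts such as Chinese
    need a dedicated word segmenter instead."""
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_features(tokens, vocabulary):
    """Feature extraction: term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vocabulary = ["market", "stock", "game", "team"]  # hypothetical vocabulary
doc = "Stock market falls; market analysts blame the stock slump."
print(extract_features(segment(doc), vocabulary))  # [2, 2, 0, 0]
```

The resulting vectors are what the classification stage consumes; feature selection decides which terms belong in the vocabulary.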

Background Knowledge: Machine Learning
Using computers to help us perform induction on large amounts of complex pattern data:
– Bayesian learning
– Instance-based learning: k-nearest neighbors
– Neural networks
– Support vector machines

Background Knowledge: Feature Selection
– Information gain
– Mutual information
– Chi-square (χ²)
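All three criteria can be computed from a 2×2 table of document counts for a term t and a category c. The sketch below uses common textbook formulations: the χ² statistic of independence, pointwise mutual information, and information gain restricted to one binary split (c versus not-c). The cell names A, B, C, D and the function `feature_scores` are illustrative assumptions, not notation from the slides.

```python
import math

def _entropy(*counts):
    """Entropy in bits of the distribution given by non-negative counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)

def feature_scores(A, B, C, D):
    """Score term t for category c from a 2x2 document contingency table:
    A = docs in c containing t,   B = docs outside c containing t,
    C = docs in c without t,      D = docs outside c without t."""
    N = A + B + C + D
    # Chi-square statistic of independence between t and c.
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    chi2 = N * (A * D - C * B) ** 2 / denom if denom else 0.0
    # Pointwise mutual information log2 P(t, c) / (P(t) P(c)).
    mi = math.log2(A * N / ((A + B) * (A + C))) if A else float("-inf")
    # Information gain: class entropy minus expected entropy after observing t.
    h_class = _entropy(A + C, B + D)
    h_with_t = _entropy(A, B) if A + B else 0.0
    h_without_t = _entropy(C, D) if C + D else 0.0
    ig = h_class - ((A + B) / N) * h_with_t - ((C + D) / N) * h_without_t
    return {"ig": ig, "mi": mi, "chi2": chi2}

scores = feature_scores(A=40, B=10, C=10, D=40)
print(scores["chi2"])  # 36.0
```

A term that is independent of the category (e.g. A = B = C = D) scores 0 under all three criteria; terms are typically ranked by score and the top ones kept.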

Bayesian Classifier
Recent research:
– Naïve Bayes classifiers are competitive with other techniques in accuracy
– Fast: training takes a single pass over the data, and new documents are classified quickly
– ATHENA (EDBT 2000)
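The single-pass property is easy to see in a multinomial naïve Bayes classifier: training only accumulates counts. A minimal sketch, assuming tokenized documents and add-one (Laplace) smoothing; the class name `NaiveBayes` and the toy data are hypothetical and not the ATHENA system.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.
    Training is a single pass that only accumulates counts."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)          # documents per category
        self.term_counts = defaultdict(Counter)    # term counts per category
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.doc_counts:
            # log P(c) + sum of log P(t | c); terms never seen in training are skipped.
            score = math.log(self.doc_counts[c] / n_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.term_counts[c][t] + 1) / (self.total_terms[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

docs = [["stock", "market", "rise"], ["market", "trade", "stock"],
        ["team", "game", "win"], ["game", "score", "team"]]
labels = ["finance", "finance", "sports", "sports"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict(["stock", "game", "market"]))  # finance
```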

Machine Learning Approaches: kNN Classifier
[Figure: a new, unlabeled document d (marked "?") among labeled training documents]
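A kNN text classifier labels the new document d by a vote among its k most similar training documents. The sketch assumes term-frequency vectors stored as `Counter` objects and cosine similarity; `knn_classify` and its default k = 3 are illustrative choices, not parameters from the talk.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(d, training, k=3):
    """training: list of (Counter of term frequencies, label).
    Returns the majority label among the k most similar documents;
    ties go to the label whose first neighbor is most similar."""
    neighbors = sorted(training, key=lambda item: cosine(d, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [(Counter(["stock", "market"]), "finance"),
            (Counter(["market", "trade"]), "finance"),
            (Counter(["team", "game"]), "sports"),
            (Counter(["game", "score"]), "sports")]
print(knn_classify(Counter(["stock", "market", "game"]), training))  # finance
```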

Machine Learning Approaches: Support Vector Machine
– Basic hypotheses: the consistent hypotheses of the version space
– Project the original training data from space X into a higher-dimensional feature space F via a Mercer kernel K
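A Mercer kernel K computes inner products in the feature space F without building the projection explicitly. The sketch checks this for one standard example, the homogeneous degree-2 polynomial kernel on 2-D inputs, whose explicit map is φ(x) = (x₁², √2·x₁x₂, x₂²). The kernel choice and function names are illustrative assumptions; the slides do not say which kernel was used.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def phi(x):
    """Explicit feature map from R^2 into F = R^3 that matches poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))   # 1.0
print(dot(phi(x), phi(y)))  # approximately 1.0 (floating-point rounding)
```

The SVM only ever needs K(x, y), so training and classification can work in F at the cost of evaluating the kernel in X.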

What Is Certainty?
– Rule for SVM
– Rule for kNN
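The actual certainty rules on this slide did not survive in the transcript. Purely as an illustrative assumption, two widely used confidence signals are sketched here: the magnitude of the SVM decision value |f(d)|, and the share of the k nearest neighbors that vote for the winning category. The function names and default thresholds are hypothetical and would need tuning.

```python
from collections import Counter

def svm_is_certain(decision_value, threshold=1.0):
    """Hypothetical rule: trust the SVM when |f(d)| >= threshold, i.e. the
    document lies far enough from the separating hyperplane.
    threshold=1.0 is the margin boundary of a canonical SVM."""
    return abs(decision_value) >= threshold

def knn_is_certain(neighbor_labels, threshold=0.6):
    """Hypothetical rule: trust kNN when the winning category receives at
    least `threshold` of the k neighbor votes."""
    votes = Counter(neighbor_labels)
    top_share = votes.most_common(1)[0][1] / len(neighbor_labels)
    return top_share >= threshold

print(svm_is_certain(-1.2), knn_is_certain(["a", "b", "c"]))  # True False
```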

Algorithm for Two-Stage Automatic Text Categorization
ALGORITHM Two-Stage-Text-Categorization(input: document d) returns category C
  Static:
    Trained classifier: Traditional-Classifier
    The feature set: F
    The new feature set from user feedback: U_i for the related category C_i
  For new document d
    C ← Traditional-Classifier(d)
    If d does NOT satisfy the rule of uncertainty
      Return C
    Else
      For all categories C_i
        If d has a feature in F
          C ← C_i
          Return C
        End If
      End For
      C_j ← User-Input
      U_j ← U_j + User-Selected
      C ← C_j
    End If
  Return C
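The two-stage algorithm can be sketched in Python under several assumptions. The first-stage classifier is modeled as a hypothetical callable returning a category together with a certainty flag. The per-category check is interpreted as testing the user-feedback feature set U_i of each category C_i. User input is a hypothetical callback returning the chosen category C_j and the features the user selected. As in the pseudocode, the first matching category wins (here, in dictionary insertion order).

```python
from typing import Callable, Dict, Iterable, Set, Tuple

def two_stage_categorize(
    doc_terms: Set[str],
    classifier: Callable[[Set[str]], Tuple[str, bool]],
    user_features: Dict[str, Set[str]],
    ask_user: Callable[[Set[str]], Tuple[str, Iterable[str]]],
) -> str:
    """Sketch of two-stage text categorization.
    classifier(d) -> (category, is_certain): hypothetical first-stage interface.
    user_features: U_i, features collected from user feedback for each C_i.
    ask_user(d) -> (C_j, selected_terms): hypothetical user-feedback callback."""
    category, is_certain = classifier(doc_terms)
    if is_certain:
        return category
    # Second stage: reuse features learned from earlier user feedback.
    for c_i, u_i in user_features.items():
        if doc_terms & u_i:
            return c_i
    # No match: ask the user and add the selected features to U_j.
    c_j, selected = ask_user(doc_terms)
    user_features.setdefault(c_j, set()).update(selected)
    return c_j
```

Because `user_features` is updated in place, later uncertain documents that share a user-selected term are routed to that category without asking again.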

Determining the Threshold of the Rule

Experiments
