CATEGORIZATION OF NEWS ARTICLES USING NEURAL TEXT CATEGORIZER

FUZZ-IEEE 2009, Korea, August 20-24, 2009
Taeho Jo, Inha University
Reporter: Shau-Shiang Hung (洪紹祥)
Adviser: Shu-Chen Cheng (鄭淑真)

OUTLINE
- Introduction
- Previous Works
- Framework
- Empirical Results
- Conclusions

INTRODUCTION(1/2)
Text categorization is necessary for managing textual data as efficiently as possible.
Text categorization requires two manual preliminary tasks:
- predefining the categories
- preparing labeled sample documents

INTRODUCTION(2/2)
Traditional text categorization requires encoding documents into numerical vectors, which causes two main problems:
- huge dimensionality
- sparse distribution
To solve these two problems, this research:
- encodes documents into string vectors, in which, unlike numerical vectors, words are given as feature values;
- proposes a neural network called NTC (Neural Text Categorizer).
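To make the contrast concrete, here is a minimal Python sketch of both encodings. It assumes that a string vector lists a document's d most frequent words in descending order of frequency, with ties broken alphabetically and padding with empty strings. The stopword list, the tokenizer, and the function names are hypothetical choices and are not taken from the slides.

```python
from collections import Counter
import re

# Hypothetical stopword list; the slides do not specify any preprocessing.
STOPWORDS = {"the", "a", "an", "as", "of", "to", "in", "and", "is", "on", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def numerical_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Bag-of-words encoding: one frequency per vocabulary word (mostly zeros)."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text: str, d: int) -> list[str]:
    """String-vector encoding (assumed): the d most frequent words, most frequent first.

    Ties are broken alphabetically so the result is deterministic;
    documents with fewer than d distinct words are padded with empty strings.
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))[:d]
    return ranked + [""] * (d - len(ranked))
```

For the sample document "The market fell as oil prices rose. Oil traders expect prices to fall again.", `string_vector(doc, 4)` returns `["oil", "prices", "again", "expect"]`. Its numerical vector over a 500-word vocabulary would have at most 9 non-zero entries, because the document contains only 9 distinct non-stopword words. This is the sparsity problem in miniature.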

PREVIOUS WORKS(1/2)
Popular approaches to text categorization:
- KNN (K Nearest Neighbor)
- NB (Naïve Bayes)
- SVM (Support Vector Machine)
- Neural networks
These approaches rely on numerical vectors and therefore share the two main problems: huge dimensionality and sparse distribution.
Using a string kernel in SVM failed to improve the performance.

PREVIOUS WORKS(2/2)
A string kernel receives two raw texts as input and computes the syntactic similarity between them.
Advantages:
- Documents do not need to be encoded into numerical vectors.
- It is more transparent than a numerical vector.
- It is easier to trace why each document is classified as it is.
Disadvantage:
- Computing the similarity takes too much time.
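The slides do not say which string kernel was used. As an illustration only, the sketch below uses the p-spectrum kernel, one of the simplest string kernels. It counts the contiguous character substrings of length p that two raw texts share, weighted by how often each occurs. The function names and the default p = 3 are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum_counts(text: str, p: int) -> Counter:
    """Count every contiguous character substring of length p in the raw text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 3) -> int:
    """Unnormalized p-spectrum kernel: sum of count products over shared substrings."""
    cs, ct = spectrum_counts(s, p), spectrum_counts(t, p)
    return sum(n * ct[u] for u, n in cs.items() if u in ct)


def normalized_kernel(s: str, t: str, p: int = 3) -> float:
    """Cosine-style normalization so that a text compared with itself scores 1.0."""
    kss, ktt = spectrum_kernel(s, s, p), spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:
        return 0.0  # a text shorter than p has no substrings to compare
    return spectrum_kernel(s, t, p) / sqrt(kss * ktt)
```

Each evaluation works on the raw characters of both texts. An SVM needs one evaluation for every pair of documents it compares, and that is where the time cost comes from. Richer variants such as gapped-subsequence kernels cost more per pair.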

FRAMEWORK(1/2)
[Figure: Bag of Words]

FRAMEWORK(2/2)  
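The following Python sketch gives a rough illustration of how a neural categorizer can operate directly on string vectors. It assumes a three-layer design: an input layer holding the words, a learning layer with one word-weight table per category, and an output layer that picks the highest-scoring category. The weight initialization and the perceptron-style update rule are assumptions, not the published NTC learning rule. The class name `NTCSketch` and its methods are hypothetical.

```python
class NTCSketch:
    """Hypothetical three-layer categorizer operating on string vectors.

    Input layer:    the words of a string vector.
    Learning layer: one node per category, each holding a word -> weight table.
    Output layer:   the category whose node produces the largest score.
    """

    def __init__(self, categories, learning_rate=0.1):
        self.categories = list(categories)
        self.rate = learning_rate
        self.weights = {c: {} for c in self.categories}

    def scores(self, string_vector):
        # Each learning-layer node sums the weights it holds for the input words.
        return {
            c: sum(self.weights[c].get(w, 0.0) for w in string_vector if w)
            for c in self.categories
        }

    def classify(self, string_vector):
        # Output layer: pick the category with the highest score.
        s = self.scores(string_vector)
        return max(self.categories, key=lambda c: s[c])

    def _adjust(self, category, words, delta):
        # Add delta to the weight of every non-empty word for one category.
        for w in words:
            if w:
                self.weights[category][w] = self.weights[category].get(w, 0.0) + delta

    def train(self, examples, epochs=10):
        # Assumed initialization: weight = how often the word appears in the
        # string vectors of training documents labeled with that category.
        for words, label in examples:
            self._adjust(label, words, 1.0)
        # Assumed update rule (perceptron-style): on a misclassification,
        # strengthen the true category and weaken the wrongly chosen one.
        for _ in range(epochs):
            errors = 0
            for words, label in examples:
                predicted = self.classify(words)
                if predicted != label:
                    errors += 1
                    self._adjust(label, words, self.rate)
                    self._adjust(predicted, words, -self.rate)
            if errors == 0:
                break

    def explain(self, string_vector, category):
        # Per-word contributions to one category's score, for tracing a decision.
        return {w: self.weights[category].get(w, 0.0) for w in string_vector if w}
```

Each score is a sum of per-word weights. As a result, `explain()` shows which words pushed a document into its category, which mirrors the traceability claimed for string vectors.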

EMPIRICAL RESULTS(1/3)
Test collection: news articles, called NewsPage.
The news articles are encoded into 500-dimensional numerical vectors and 50-dimensional string vectors.

EMPIRICAL RESULTS(2/3)
The configuration of the participating approaches

EMPIRICAL RESULTS(3/3)
The results of this set of experiments

CONCLUSIONS
This research makes four contributions:
- The results of the set of experiments show that the proposed approach is practical.
- It solves the two main problems of numerical vectors: huge dimensionality and sparse distribution.
- It creates a new neural network, called NTC.
- It potentially makes it easier to trace why each document is classified as it is.