
1 Automated multi-label text categorization with VG-RAM weightless neural networks Presenter: Guan-Yu Chen A. F. De Souza, F. Pedroni, E. Oliveira, P. M. Ciarelli, W. F. Henrique, L. Veronese, C. Badue. Neurocomputing, 2009.

2 Outline 1. Introduction 2. Multi-label text categorization 3. VG-RAM WNN 4. ML-KNN 5. Experimental evaluation 6. Conclusions & future work

3 1. Introduction (1/2) Most works on text categorization in the literature focus on single-label text categorization problems, where each document may have only a single label. However, in real-world problems, multi-label categorization is frequently necessary.

4 1. Introduction (2/2) 2 methods: –Virtual Generalizing Random Access Memory Weightless Neural Networks (VG-RAM WNN), –Multi-Label K-Nearest Neighbors (ML-KNN). 4 metrics: –Hamming loss, One-error, Coverage, & Average precision. 2 problems: –Categorization of free-text descriptions of economic activities, –Categorization of Web pages.

5 2. Multi-label text categorization

6 2.1 Evaluation metrics (1/5) Hamming loss (hloss_j) evaluates how many times the test document d_j is misclassified: –A category not belonging to the document is predicted, –A category belonging to the document is not predicted. where |C| is the number of categories and Δ is the symmetric difference between the set of predicted categories P_j and the set of appropriate categories C_j of the test document d_j.
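The Hamming-loss formula the "where" clause refers to appears only as an image in the original transcript; it can be reconstructed from the surrounding definitions (this is the standard definition, not copied from the slide):

```latex
\mathrm{hloss}_j = \frac{1}{|C|}\,\bigl|\,P_j \,\Delta\, C_j\,\bigr|
```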

7 2.1 Evaluation metrics (2/5) One-error (one-error_j) evaluates whether the top-ranked category is present in the set of proper categories C_j of the test document d_j: where arg max_c f(d_j, c) returns the top-ranked category for the test document d_j.
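The one-error formula is likewise an image in the original; reconstructed from the slide's definition (standard form, not copied from the slide):

```latex
\text{one-error}_j =
\begin{cases}
1 & \text{if } \arg\max_{c \in C} f(d_j, c) \notin C_j \\
0 & \text{otherwise}
\end{cases}
```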

8 2.1 Evaluation metrics (3/5) Coverage (coverage_j) measures how far we need to go down the ranking of categories in order to cover all the possible categories assigned to a test document: where max_{c ∈ C_j} r(d_j, c) returns the maximum rank over the set of appropriate categories of the test document d_j.
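The coverage formula, again an image in the original, reconstructed from the definition above (standard form):

```latex
\mathrm{coverage}_j = \max_{c \in C_j} r(d_j, c) - 1
```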

9 2.1 Evaluation metrics (4/5) Average precision (average-precision_j) evaluates the average of the precisions computed after truncating the ranking of categories at each category c_i ∈ C_j in turn: where R_{jk} is the set of ranked categories that goes from the top-ranked category down to a ranking position k at which some category c_i ∈ C_j for d_j appears, and precision_j(R_{jk}) is the number of pertinent categories in R_{jk} divided by |R_{jk}|.
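The average-precision formula is also an image in the original; a reconstruction consistent with the slide's definition of R_{jk} and precision_j (standard form, not copied from the slide):

```latex
\text{average-precision}_j
  = \frac{1}{|C_j|} \sum_{c_i \in C_j} \mathrm{precision}_j\bigl(R_{j\,r(d_j, c_i)}\bigr),
\qquad
\mathrm{precision}_j(R_{jk}) = \frac{|R_{jk} \cap C_j|}{|R_{jk}|}
```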

Evaluation metrics (5/5)
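As a concrete companion to the four definitions above, the metrics for a single test document can be sketched in Python. This is illustrative code, not from the paper; the function and argument names are assumptions.

```python
# Metrics for one test document d_j. `ranking` is the category list
# sorted by f(d_j, c) descending; `true` is the set C_j; `predicted`
# is the set P_j; ranks are 1-based, as on the slides.

def hamming_loss(predicted, true, n_categories):
    # size of the symmetric difference P_j Δ C_j, normalized by |C|
    return len(predicted ^ true) / n_categories

def one_error(ranking, true):
    # 1 if the top-ranked category is not a proper category of d_j
    return 0 if ranking[0] in true else 1

def coverage(ranking, true):
    # how far down the ranking we must go to cover all of C_j
    return max(ranking.index(c) + 1 for c in true) - 1

def average_precision(ranking, true):
    # precision truncated at each true category's rank, averaged over C_j
    ranks = sorted(ranking.index(c) + 1 for c in true)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / len(true)
```

For example, with ranking ["a", "b", "c", "d"], C_j = {"a", "c"} and P_j = {"a", "b"}, the Hamming loss is 2/4, one-error is 0, coverage is 2, and average precision is 5/6.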

11 3. VG-RAM WNN (1/5) Virtual Generalizing Random Access Memory Weightless Neural Networks, VG-RAM WNN. RAM-based neural networks (n-tuple categorizers, or weightless neural networks, WNN) do not store knowledge in their connections but in Random Access Memories (RAM) inside the neurons. These neurons operate with binary input values and use the RAM as lookup tables. –Each neuron's synapses collect a vector of bits from the network's inputs that is used as the RAM address. –The value stored at this address is the neuron's output. Training can be done in one shot and basically consists of storing the desired output at the address associated with the input vector of the neuron.
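The lookup-table behavior described above, plus the "virtual generalizing" part (a VG-RAM neuron stores only the trained input-output pairs and, at test time, answers with the output of the stored input nearest in Hamming distance), can be sketched as follows. This is an illustrative minimal sketch, not the paper's implementation; class and method names are assumptions.

```python
# One VG-RAM neuron: training stores (binary input, label) pairs;
# testing returns the label of the nearest stored input, which is how
# VG-RAM generalizes beyond the exact addresses seen during training.
class VGRamNeuron:
    def __init__(self):
        self.memory = []  # list of (bit-tuple, label) pairs

    def train(self, bits, label):
        # one-shot training: just store the pair
        self.memory.append((tuple(bits), label))

    def output(self, bits):
        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))
        # nearest stored input wins; ties resolved by storage order here
        # (the paper's VG-RAM resolves ties randomly)
        return min(self.memory, key=lambda m: hamming(m[0], bits))[1]
```

An unseen input such as [0, 1, 1] is answered with the label of whichever stored pattern it is closest to in Hamming distance.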

12 3. VG-RAM WNN (2/5)

13 3. VG-RAM WNN (3/5)

14 3. VG-RAM WNN (4/5)

15 3. VG-RAM WNN (5/5) A threshold τ may be used with the function f(d_j, c_i) to define the set of categories to be assigned to the test document.

16 4. ML-KNN Multi-Label K-Nearest Neighbors, ML-KNN. –(Zhang & Zhou, 2007) The ML-KNN categorizer is derived from the popular KNN algorithm. It is based on an estimate of the probability that a category should be assigned to a test document d_j, considering the occurrence of that category among the k nearest neighbors of d_j. If a category is assigned to the majority (more than 50%) of the k neighbors of d_j, then that category is also assigned to d_j; otherwise it is not.
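The majority-vote reading of ML-KNN on this slide can be sketched directly. Note this is a simplification for illustration: the full ML-KNN algorithm (Zhang & Zhou, 2007) uses Bayesian (MAP) estimates with prior and posterior counts learned from the training set; only the neighbor-counting rule described above is shown, and the function name is an assumption.

```python
# Simplified ML-KNN decision rule: assign a category to d_j when more
# than half of its k nearest neighbors carry that category.
def ml_knn_predict(neighbor_label_sets, k):
    # neighbor_label_sets: label set of each of the k nearest neighbors of d_j
    counts = {}
    for labels in neighbor_label_sets:
        for c in labels:
            counts[c] = counts.get(c, 0) + 1
    return {c for c, n in counts.items() if n > k / 2}
```

For instance, with neighbor label sets {a, b}, {a}, {b, c} and k = 3, categories a and b each appear in 2 of 3 neighbors and are assigned, while c is not.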

17 5. Experimental evaluation (1/3) Event Associative Machine (MAE) –An open-source framework for modeling VG-RAM neural networks developed at the Universidade Federal do Espírito Santo. Neural Representation Modeler (NRM) –Developed by the Neural Systems Engineering Group at Imperial College London. –Commercialized by Novel Technical Solutions.

18 5. Experimental evaluation (2/3) 3 differences between MAE and NRM: –MAE is open source, –Runs on UNIX (and Linux), –Uses a textual language to describe WNNs: the MAE Neural Architecture Description Language (NADL). MAE also provides: –A built-in graphical user interface, –An interpreter of the MAE Control Script Language (CDL).

19 5. Experimental evaluation (3/3)

Categorization of free-text descriptions of economic activities (1/3) In Brazil, social contracts contain the statement of purpose of the company. –Classificação Nacional de Atividades Econômicas, CNAE (National Classification of Economic Activities).

Categorization of free-text descriptions of economic activities (2/3)

Categorization of free-text descriptions of economic activities (3/3)

Categorization of web pages (1/3) Yahoo directory (

Categorization of web pages (2/3)

Categorization of web pages (3/3)

Conclusions In the categorization of free-text descriptions of economic activities, VG-RAM WNN outperformed ML-KNN in terms of the four multi-label evaluation metrics adopted. In the categorization of Web pages, VG-RAM WNN outperformed ML-KNN in terms of Hamming loss, coverage and average precision, and showed similar categorization performance in terms of one-error.

Future work To compare VG-RAM WNN performance against other multi-label text categorization methods. To examine correlated VG-RAM WNN and other mechanisms for taking advantage of the correlation between categories. To evaluate the categorization performance of VG-RAM WNN on different multi-label categorization problems (image annotation & gene function prediction).