Using Error-Correcting Codes For Text Classification. Rayid Ghani, Center for Automated Learning & Discovery, Carnegie Mellon University.

Outline: Review of ECOC; Previous Work; Types of Codes; Experimental Results; Semi-Theoretical Model; Drawbacks; Conclusions & Work in Progress.

Overview of ECOC. Decompose a multiclass problem into multiple binary problems. The conversion can be independent of the data or dependent on it (it always depends on the number of classes). Any learner that can learn binary functions can then be used to learn the original multivalued function.

Training ECOC. Given m distinct classes, create an m x n binary matrix M. Each class is assigned one row of M. Each column of the matrix divides the classes into two groups. Train the base classifiers to learn the n binary problems.
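The training step can be sketched as follows. This is a minimal illustration, not the setup used in the talk: the 4 x 7 matrix `CODE_MATRIX` and the helper `binary_targets` are hypothetical names and values chosen for the example. Each column of M relabels the multiclass training data as one binary problem, and a base learner would then be fit to each relabelling.

```python
# Illustrative code matrix M for m = 4 classes and n = 7 bits.
# These rows are assumptions made for the example, not the BCH codes from the talk.
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0, 0],  # codeword (row) for class 0
    [0, 1, 1, 1, 1, 0, 0],  # class 1
    [1, 0, 1, 1, 0, 1, 0],  # class 2
    [1, 1, 0, 1, 0, 0, 1],  # class 3
]


def binary_targets(labels, code_matrix):
    """Return one list of binary labels per column of the code matrix."""
    n_bits = len(code_matrix[0])
    return [[code_matrix[y][j] for y in labels] for j in range(n_bits)]


# Multiclass labels of five hypothetical training documents.
labels = [0, 2, 1, 3, 2]

# targets[j] is the binary training signal for the j-th base classifier.
targets = binary_targets(labels, CODE_MATRIX)
```

Any binary learner (for example the Naive Bayes classifier mentioned later in the talk) could be trained on each `targets[j]`; the learner itself is left out to keep the sketch short.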

Testing ECOC. To test a new instance, apply each of the n classifiers to it. Combine the predictions to obtain a binary string (codeword) for the new point. Classify to the class with the nearest codeword (Hamming distance is usually used as the distance measure).
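Nearest-codeword decoding can be sketched like this. The code matrix and the names `hamming_distance` and `decode` are assumptions made for the illustration; the predicted bit string stands in for the outputs of n trained binary classifiers.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits, code_matrix):
    """Return the index of the class whose codeword is nearest in Hamming distance."""
    distances = [hamming_distance(predicted_bits, row) for row in code_matrix]
    return min(range(len(code_matrix)), key=distances.__getitem__)


# Illustrative 4-class, 7-bit code matrix (an assumption, not the talk's code).
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

# Codeword of class 2 with one bit (index 4) flipped, i.e. one classifier was wrong.
noisy_prediction = [1, 0, 1, 1, 1, 1, 0]
predicted_class = decode(noisy_prediction, CODE_MATRIX)
```

Ties are broken here in favour of the lowest class index; that is a choice of the sketch, not something the slides specify.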

ECOC-Picture [figure with regions labeled A, B, and C]

Previous Work. Combined with boosting: ADABOOST.OC (Schapire, 1997), (Guruswami & Sahai, 1999). Local learners. Text classification (Berger, 1999).

Experimental Setup. Generate the code: BCH codes. Choose a base learner: Naive Bayes classifier as used in text classification tasks (McCallum & Nigam 1998).

Dataset. Industry Sector dataset: company web pages classified into 105 economic sectors. Standard stoplist. No stemming. Skip all MIME headers and HTML tags. Experimental approach similar to McCallum et al. (1998) for comparison purposes.

Results. Classification accuracies on five random train-test splits of the Industry Sector dataset with a vocabulary size of [number missing]. ECOC is 88% accurate!

Results, Industry Sector data set. Naïve Bayes: 66.1%. Shrinkage [1]: 76%. ME [2]: 79%. ME with prior [3]: 81.1%. ECOC 63-bit: 88.5%. ECOC reduces the error of the Naïve Bayes classifier by 66% (from 33.9% to 11.5%). [1] McCallum et al. 1998. [2, 3] Nigam et al. 1999.

The Longer the Better! Table 2: Average classification accuracy on 5 random train-test splits of the Industry Sector dataset with a vocabulary size of [number missing] words selected using Information Gain. Longer codes mean larger codeword separation. The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C. If the minimum Hamming distance is h, then the code can correct ⌊(h-1)/2⌋ errors.
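The relation between minimum Hamming distance and the number of correctable errors can be checked with a short sketch. The codewords and the function names `min_hamming_distance` and `correctable_errors` are hypothetical, chosen only to illustrate the definition above.

```python
from itertools import combinations


def min_hamming_distance(codewords):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(codewords, 2)
    )


def correctable_errors(h):
    """Number of bit errors a code with minimum distance h can correct: floor((h - 1) / 2)."""
    return (h - 1) // 2


# Illustrative 7-bit codewords (an assumption, not the codes from the talk).
codewords = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

h = min_hamming_distance(codewords)
t = correctable_errors(h)
print(f"minimum distance h = {h}, correctable errors = {t}")
```

For ECOC this means up to `t` of the binary classifiers can be wrong on an instance and nearest-codeword decoding still returns the right class.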

Size Matters?

Size does NOT matter!

Semi-Theoretical Model. Model ECOC by a binomial distribution B(n, p), where n = length of the code and p = probability of each bit being classified incorrectly. Table columns: # of Bits, H_min, E_max, P_ave, Accuracy.
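One way to read this model is sketched below. It rests on assumptions that the slide does not spell out: bit errors are independent with the same probability p, E_max is taken to be floor((H_min - 1) / 2), and an instance counts as correctly classified exactly when at most E_max bits are wrong. The function name `ecoc_accuracy` and the example values are hypothetical.

```python
from math import comb


def ecoc_accuracy(n, p, h_min):
    """Probability that at most E_max of n independent bits are wrong, under B(n, p).

    Assumes E_max = floor((h_min - 1) / 2), i.e. the number of errors the code
    is guaranteed to correct.
    """
    e_max = (h_min - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(e_max + 1))


# Hypothetical values for illustration only; they are not results from the talk.
for n, h_min in [(15, 5), (31, 11), (63, 31)]:
    print(n, h_min, round(ecoc_accuracy(n, p=0.15, h_min=h_min), 3))
```

Because decoding can still succeed when more than E_max bits are wrong, this estimate is best read as a lower bound under the stated assumptions.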

Types of Codes. Data-independent: algebraic, random, hand-constructed. Data-dependent: adaptive.

What is a Good Code? Row separation. Column separation (independence of errors for each binary classifier). Efficiency (for long codes).
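Row and column separation can be measured directly on a code matrix. The sketch below reports the minimum and maximum pairwise Hamming distance over rows and over columns; the matrix and the helper name `separation` are assumptions for the example.

```python
from itertools import combinations


def separation(vectors):
    """Return (min, max) pairwise Hamming distance among the given bit vectors."""
    distances = [
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(vectors, 2)
    ]
    return min(distances), max(distances)


# Illustrative 4-class, 7-bit code matrix (an assumption, not the talk's code).
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

rows = CODE_MATRIX
columns = list(zip(*CODE_MATRIX))  # transpose: one tuple per binary problem

row_sep = separation(rows)     # how far apart the class codewords are
col_sep = separation(columns)  # how different the binary problems are
print("row separation (min, max):", row_sep)
print("column separation (min, max):", col_sep)
```

A fuller check of column separation would also compare each column with the complement of every other column, since a classifier and its complement learn the same split; that is omitted here for brevity.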

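Choosing Codes. Row separation: random codes, on average for long codes; algebraic codes, guaranteed. Column separation: random codes, on average for long codes; algebraic codes, can be guaranteed. Efficiency: random codes, no; algebraic codes, yes.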
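The "on average, for long codes" behaviour of random codes can be explored with a small experiment. This is a sketch under assumptions: bits are drawn uniformly at random, the seed and code lengths are arbitrary, and `random_code` and `min_row_distance` are hypothetical helper names. The 105 classes match the Industry Sector dataset described earlier.

```python
import random
from itertools import combinations


def random_code(n_classes, n_bits, seed=0):
    """Draw an n_classes x n_bits code matrix with uniformly random bits."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_classes)]


def min_row_distance(code):
    """Smallest Hamming distance between any two rows (class codewords)."""
    return min(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2)
    )


# Row separation is not guaranteed for a random code; it only tends to grow with length.
for n_bits in (15, 31, 63):
    code = random_code(n_classes=105, n_bits=n_bits)
    print(n_bits, "bits -> minimum row distance", min_row_distance(code))
```

An algebraic code such as BCH fixes the minimum row distance by construction, which is the "guaranteed" entry in the comparison above.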

Experimental Results. Table columns: Code, Min Row HD, Max Row HD, Min Col HD, Max Col HD, Error Rate. Codes compared: 15-bit BCH, 19-bit Hybrid, 15-bit Random. Surviving entry for 15-bit Random: 2 (1.5).

Interesting Observations. NBC does not give good probability estimates; using ECOC results in better estimates.

Drawbacks. Can be computationally expensive. Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems.

Conclusion. Improves classification accuracy considerably! Can be used when training data is sparse. Algebraic codes perform better than random codes for a given code length. Hand-constructed codes are not the answer.

Future Work. Combine ECOC with co-training. Automatically construct optimal / adaptive codes. Sufficient and necessary conditions for optimal behavior.

Future Work. Combine ECOC with co-training or shrinkage methods. Sufficient and necessary conditions for optimal behavior.