Efficient Text Categorization with a Large Number of Categories Rayid Ghani KDD Project Proposal

Text Categorization  Numerous applications: Search Engines/Portals, Customer Service, …  Domains: Topics, Genres, Languages  $$$ Making

How do people deal with a large number of classes?  Use fast multiclass algorithms (e.g., Naïve Bayes) that build one model per class  Use binary classification algorithms (e.g., SVMs) and break an n-class problem into n binary problems  What happens with a 1000-class problem?  Can we do better?
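To make the n-binary-problems route concrete, here is a minimal sketch of one-vs-rest decomposition. The library choice (scikit-learn) and the synthetic toy data are illustrative assumptions, not part of the proposal.

from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy stand-in for a bag-of-words text collection with 10 classes.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=30, n_classes=10)

# One binary SVM per class: a 1000-class problem would mean 1000 binary models.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(ovr.predict(X[:5]))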

ECOC to the Rescue!  An n-class problem can be solved by solving log2 n binary problems  More efficient than one-per-class  Does it actually perform better?

What is ECOC?  Solve multiclass problems by decomposing them into multiple binary problems  Use a learner to learn the binary problems
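A minimal sketch of the decomposition, assuming NumPy and scikit-learn. The random code matrix and the Gaussian Naïve Bayes base learner are assumptions for illustration, not the proposal's prescribed setup.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_ecoc(X, y, n_bits):
    """Decompose a multiclass problem into n_bits binary problems via a random code."""
    classes = np.unique(y)
    # Draw a random code matrix: one codeword row per class, one column per binary problem.
    # Re-draw until no column is degenerate (all 0s or all 1s).
    while True:
        code = np.random.randint(0, 2, size=(len(classes), n_bits))
        if all(0 < code[:, b].sum() < len(classes) for b in range(n_bits)):
            break
    learners = []
    for b in range(n_bits):
        # Relabel every example with the b-th bit of its class's codeword
        # and train one binary learner on that relabeling.
        bits = code[np.searchsorted(classes, y), b]
        learners.append(GaussianNB().fit(X, bits))
    return classes, code, learners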

Training and Testing ECOC  [Figure: a binary code matrix with one codeword row per class (A, B, C, D) and one column per binary function (f1–f5); at test time an instance X is run through each binary function and matched against the rows]

ECOC - Picture  [Figure: the same A–D by f1–f5 code matrix, built up one binary function at a time]

Preliminary Results  [Figure: classification performance vs. efficiency, comparing NB, ECOC, the preliminary results, and this proposal]  ECOC reduces the error of the Naïve Bayes classifier by 66% with no increase in computational cost

Proposed Solutions  Design codewords that minimize cost and maximize “performance”  Investigate the assignment of codewords to classes  Learn the decoding function  Incorporate unlabeled data into ECOC

Use unlabeled data  Current learning algorithms that use unlabeled data (EM, Co-Training) don't work well with a large number of categories  ECOC works great with a large number of classes, but there is no framework for using unlabeled data

Use Unlabeled Data  ECOC decomposes multiclass problems into binary problems  Co-Training works great with binary problems  ECOC + Co-Train = Learn each binary problem in ECOC with Co-Training (and variants of Co-Training such as Co-EM)
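A rough sketch of what "learn each binary problem with Co-Training" could look like, assuming two feature views and a Naïve Bayes learner per view. The view split, the confidence-based selection, and the batch size are assumptions, not the proposal's algorithm.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train_binary(Xl1, Xl2, y_bits, Xu1, Xu2, rounds=10, k=5):
    """Co-train one ECOC binary problem from labeled bits plus unlabeled examples."""
    clf1, clf2 = GaussianNB(), GaussianNB()
    pool = np.arange(len(Xu1))                     # indices of still-unlabeled examples
    for _ in range(rounds):
        clf1.fit(Xl1, y_bits)
        clf2.fit(Xl2, y_bits)
        if len(pool) == 0:
            break
        # Each view labels the k unlabeled examples it is most confident about
        # and adds them (with its predicted bit) to the shared labeled pool.
        for clf, Xu in ((clf1, Xu1), (clf2, Xu2)):
            if len(pool) == 0:
                break
            conf = clf.predict_proba(Xu[pool]).max(axis=1)
            pick = pool[np.argsort(-conf)[:k]]
            Xl1 = np.vstack([Xl1, Xu1[pick]])
            Xl2 = np.vstack([Xl2, Xu2[pick]])
            y_bits = np.concatenate([y_bits, clf.predict(Xu[pick])])
            pool = np.setdiff1d(pool, pick)
    return clf1, clf2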

Summary

Testing ECOC  To test a new instance: Apply each of the binary classifiers to the new instance Combine the predictions to obtain a binary string (codeword) for the new point Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure)
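Continuing the hypothetical train_ecoc sketch above, nearest-codeword decoding under Hamming distance might look like this (names and library choice are assumptions):

import numpy as np

def predict_ecoc(X, classes, code, learners):
    """Classify by nearest codeword under Hamming distance."""
    # Each binary classifier contributes one bit of the predicted codeword.
    pred_bits = np.column_stack([clf.predict(X) for clf in learners])
    # Hamming distance from each predicted codeword to every class codeword.
    dists = np.array([[np.sum(p != cw) for cw in code] for p in pred_bits])
    return classes[np.argmin(dists, axis=1)]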

The Decoding Step  Standard: Map to the nearest codeword according to Hamming distance  Can we do better?

The Real Question?  Tradeoff between “learnability” of binary problems and the error-correcting power of the code

Codeword assignment  Standard Procedure: Assign codewords to classes randomly  Can we do better?

Goal of Current Research  Improve classification performance without increasing cost Design short codes that perform well Develop algorithms that increase performance without affecting code length

Previous Results  Performance increases with the length of the code  Gives the same percentage increase in performance over NB regardless of training set size  BCH Codes > Random Codes > Hand-constructed Codes

Others have shown that ECOC  Works great with arbitrarily long codes  Longer codes = More Error-Correcting Power = Better Performance  Longer codes = More Computational Cost

Previous Results (Industry Sector Data Set)
Naïve Bayes: 66.1%
Shrinkage (McCallum et al. 1998): 76%
Maximum Entropy (Nigam et al. 1999): 79%
Maximum Entropy with Prior (Nigam et al. 1999): 81.1%
ECOC 63-bit (Ghani 2000): 88.5%
ECOC reduces the error of the Naïve Bayes classifier by 66% with no increase in computational cost

Design codewords  Maximize performance (accuracy, precision, recall, F1?)  Minimize the length of the codes  Search the space of codewords by gradient descent on G = Error + Code_Length
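One way to sketch this search is a discrete bit-flip local search standing in for the gradient-descent idea: flip one bit at a time and keep the flip only if G = validation error + lambda * code length does not get worse. The objective weighting, the Gaussian Naïve Bayes base learner, and all names below are assumptions.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def code_objective(code, X_tr, y_tr, X_val, y_val, classes, lam=0.01):
    """G = validation error of the ECOC ensemble + lam * code length."""
    learners = []
    for b in range(code.shape[1]):
        bits = code[np.searchsorted(classes, y_tr), b]
        learners.append(GaussianNB().fit(X_tr, bits))
    pred_bits = np.column_stack([c.predict(X_val) for c in learners])
    dists = np.array([[np.sum(p != cw) for cw in code] for p in pred_bits])
    err = np.mean(classes[np.argmin(dists, axis=1)] != y_val)
    return err + lam * code.shape[1]

def search_code(X_tr, y_tr, X_val, y_val, n_bits, iters=50):
    classes = np.unique(y_tr)
    code = np.random.randint(0, 2, size=(len(classes), n_bits))
    best = code_objective(code, X_tr, y_tr, X_val, y_val, classes)
    for _ in range(iters):
        i, j = np.random.randint(len(classes)), np.random.randint(n_bits)
        code[i, j] ^= 1                              # flip one bit of one codeword
        g = code_objective(code, X_tr, y_tr, X_val, y_val, classes)
        if g <= best:
            best = g                                 # keep a non-worsening flip
        else:
            code[i, j] ^= 1                          # otherwise undo it
    return code, best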

Codeword Assignment  Generate the confusion matrix and use it to assign the most confusable classes the codewords that are farthest apart  Pros: Focusing more on confusable classes can help  Cons: Individual binary problems can become very hard
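A hedged sketch of confusion-driven assignment: the most confusable class pairs get the codeword pairs that are farthest apart in Hamming distance. The greedy pairing below is an assumption; the slide does not fix an algorithm. conf_matrix is assumed to be an n-by-n NumPy array of confusion counts from a held-out run, and code a pre-generated n-row code matrix.

import numpy as np

def assign_codewords(conf_matrix, code):
    """Greedily map classes to codeword rows using pairwise confusion."""
    n = len(conf_matrix)
    # Symmetric confusion between class pairs, ignoring the diagonal.
    confusion = conf_matrix + conf_matrix.T
    np.fill_diagonal(confusion, -1)
    # Pairwise Hamming distances between the rows (codewords) of the code matrix.
    ham = np.array([[np.sum(code[i] != code[j]) for j in range(n)] for i in range(n)])
    assignment = -np.ones(n, dtype=int)            # class index -> codeword row
    free_classes, free_rows = set(range(n)), set(range(n))
    while len(free_classes) > 1:
        # The most confusable unassigned class pair...
        a, b = max(((i, j) for i in free_classes for j in free_classes if i < j),
                   key=lambda p: confusion[p])
        # ...gets the pair of unassigned codewords that are farthest apart.
        r, s = max(((i, j) for i in free_rows for j in free_rows if i < j),
                   key=lambda p: ham[p])
        assignment[a], assignment[b] = r, s
        free_classes -= {a, b}
        free_rows -= {r, s}
    if free_classes:                               # odd class count: one class left over
        assignment[free_classes.pop()] = free_rows.pop()
    return assignment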

The Decoding Step  Weight the individual classifiers according to their training accuracies and do weighted majority decoding  Pose the decoding as a separate learning problem and use regression or a neural network
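A minimal sketch of the accuracy-weighted variant, as one reading of the "weighted majority" idea: disagreement on an accurate classifier's bit costs more than disagreement on a weak one's. The weighting scheme and names are assumptions.

import numpy as np

def weighted_decode(pred_bits, code, weights):
    """Nearest-codeword decoding with per-classifier weights on each bit mismatch."""
    dists = np.array([[np.sum(weights * (p != cw)) for cw in code] for p in pred_bits])
    return np.argmin(dists, axis=1)                # index of the nearest class codeword

# weights[b] could be, e.g., the training or held-out accuracy of binary classifier b.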