Combining labeled and unlabeled data for text categorization with a large number of categories Rayid Ghani KDD Lab Project.

Presentation transcript:

Combining labeled and unlabeled data for text categorization with a large number of categories Rayid Ghani KDD Lab Project

Supervised Learning with Labeled Data  Labeled data is required in large quantities and can be very expensive to collect.

Why use Unlabeled data?  Very cheap in the case of text (web pages, newsgroups, messages)  May not be as useful as labeled data, but is available in enormous quantities

Goal  Make learning easier and more efficient by reducing the amount of labeled data required for text classification with a large number of categories

ECOC: very accurate and efficient for text categorization with a large number of classes. Co-Training: useful for combining labeled and unlabeled data with a small number of classes.

Related research with unlabeled data  Using EM in a generative model (Nigam et al. 1999)  Transductive SVMs (Joachims 1999)  Co-Training type algorithms (Blum & Mitchell 1998, Collins & Singer 1999, Nigam & Ghani 2000)

What is ECOC?  Solve multiclass problems by decomposing them into multiple binary problems (Dietterich & Bakiri 1995)  Use a learner to learn the binary problems

Training and Testing ECOC [diagram: classes A through D are each assigned a binary codeword over binary functions f1 to f5; training learns one classifier per function, and a test example X is assigned to the class whose codeword is closest to the predicted bit string]
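A minimal sketch of that train/decode cycle. The random code matrix, the 15-bit code length, and scikit-learn's MultinomialNB as the binary learner are illustrative assumptions, not details taken from the slides; X is assumed to be a document-term count matrix and y an array of integer class ids.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def train_ecoc(X, y, n_classes, n_bits=15, seed=0):
    """Train one binary classifier per column of a random code matrix."""
    rng = np.random.RandomState(seed)
    code = rng.randint(0, 2, size=(n_classes, n_bits))   # code[c, b] = target bit for class c
    classifiers = []
    for b in range(n_bits):
        bit_targets = code[y, b]                          # relabel each example by its class's bit
        classifiers.append(MultinomialNB().fit(X, bit_targets))
    return code, classifiers

def predict_ecoc(X, code, classifiers):
    """Predict every bit, then decode to the class with the nearest codeword (Hamming distance)."""
    bits = np.column_stack([clf.predict(X) for clf in classifiers])
    dist = np.array([[np.sum(row != codeword) for codeword in code] for row in bits])
    return dist.argmin(axis=1)
```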

The Co-training algorithm [Blum & Mitchell 1998]  Loop while unlabeled documents remain: build classifiers A and B (naïve Bayes) from the labeled data; classify the unlabeled documents with A and B; add the most confident A predictions and the most confident B predictions to the labeled training examples.

The Co-training Algorithm [Blum & Mitchell, 1998] [diagram: naïve Bayes on view A and naïve Bayes on view B each learn from the labeled data, estimate labels for the unlabeled documents, and the most confident predictions from each classifier are added to the shared labeled data]
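A rough sketch of this loop, assuming the two views live in separate count matrices (Xa, Xb for labeled documents, Xa_u, Xb_u for unlabeled ones) and using scikit-learn's MultinomialNB as the naïve Bayes learner; the pool size k and the number of rounds are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def most_confident(clf, X, k):
    """Indices and predicted labels of the k most confident predictions."""
    proba = clf.predict_proba(X)
    idx = np.argsort(proba.max(axis=1))[-k:]
    return idx, clf.classes_[proba[idx].argmax(axis=1)]

def co_train(Xa, Xb, y, Xa_u, Xb_u, rounds=20, k=2):
    """Two naïve Bayes classifiers, one per view, label data for each other."""
    for _ in range(rounds):
        if len(Xa_u) == 0:
            break
        clf_a = MultinomialNB().fit(Xa, y)
        clf_b = MultinomialNB().fit(Xb, y)
        # Pool the most confident predictions from each view
        # (the two sets may overlap; good enough for a sketch)
        idx_a, lab_a = most_confident(clf_a, Xa_u, k)
        idx_b, lab_b = most_confident(clf_b, Xb_u, k)
        idx = np.concatenate([idx_a, idx_b])
        labels = np.concatenate([lab_a, lab_b])
        # Move those examples, with their predicted labels, into the labeled pool
        Xa, Xb = np.vstack([Xa, Xa_u[idx]]), np.vstack([Xb, Xb_u[idx]])
        y = np.concatenate([y, labels])
        keep = np.setdiff1d(np.arange(len(Xa_u)), idx)
        Xa_u, Xb_u = Xa_u[keep], Xb_u[keep]
    return MultinomialNB().fit(Xa, y), MultinomialNB().fit(Xb, y)
```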

One intuition behind co-training  A and B are redundant  A features are independent of B features  Co-training is like learning with random classification noise: to the B classifier, the examples labeled by A's most confident predictions look like random examples whose labels carry a small amount of noise (A's misclassification error)

ECOC + Co-Training = ECoTrain  ECOC decomposes multiclass problems into binary problems  Co-Training works well on binary problems  ECOC + Co-Training = learn each binary problem in ECOC with Co-Training (as sketched below)
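A compressed sketch of that combination, reusing the pieces of the two previous sketches (random code matrix, two feature views, MultinomialNB); all of those specifics are assumptions for illustration, not the exact setup used in the experiments.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def ecotrain(Xa, Xb, y, Xa_u, Xb_u, n_classes, n_bits=15, rounds=10, k=2, seed=0):
    """ECoTrain sketch: decompose with a random ECOC code matrix, then learn
    each binary column with co-training over the two feature views."""
    rng = np.random.RandomState(seed)
    code = rng.randint(0, 2, size=(n_classes, n_bits))        # one codeword per class
    bit_clfs = []
    for b in range(n_bits):
        A, B, t, Au, Bu = Xa, Xb, code[y, b], Xa_u, Xb_u       # binary targets for this bit
        for _ in range(rounds):                                # terse co-training inner loop
            if len(Au) == 0:
                break
            ca, cb = MultinomialNB().fit(A, t), MultinomialNB().fit(B, t)
            idx, lab = [], []
            for clf, Xu in ((ca, Au), (cb, Bu)):
                p = clf.predict_proba(Xu)
                top = np.argsort(p.max(axis=1))[-k:]           # most confident predictions
                idx.extend(top)
                lab.extend(clf.classes_[p[top].argmax(axis=1)])
            idx = np.array(idx)
            A, B = np.vstack([A, Au[idx]]), np.vstack([B, Bu[idx]])
            t = np.concatenate([t, lab])
            keep = np.setdiff1d(np.arange(len(Au)), idx)
            Au, Bu = Au[keep], Bu[keep]
        # Final per-bit classifier over the concatenated views (one possible choice)
        bit_clfs.append(MultinomialNB().fit(np.hstack([A, B]), t))
    def predict(Xa_t, Xb_t):
        bits = np.column_stack([c.predict(np.hstack([Xa_t, Xb_t])) for c in bit_clfs])
        return np.array([[np.sum(r != cw) for cw in code] for r in bits]).argmin(axis=1)
    return predict
```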

[slide: example category labels: SPORTS, SCIENCE, ARTS, HEALTH, POLITICS, LAW]

What happens with sparse data?

ECOC + CoTrain: Results (105-class problem). Columns: 300 labeled + 0 unlabeled per class; 50 labeled + 250 unlabeled per class; 5 labeled + 295 unlabeled per class. Rows: Naïve Bayes and ECOC 15-bit (use no unlabeled data); EM, Co-Training, and ECoTrain (ECOC + Co-Training) (use unlabeled data). [the numeric accuracies on the original slide are not preserved in this transcript]

Datasets  Hoovers-255: a collection of 4285 corporate websites; each company is classified into one of 255 categories; baseline 2%  Jobs-65 (from WhizBang): job postings with two feature sets (title and description); 65 categories; baseline 11%

Results. Columns: Naïve Bayes (no unlabeled data; 10% and 100% labeled), ECOC (no unlabeled data; 10% and 100% labeled), EM, Co-Training, and ECOC + Co-Training (10% labeled). Rows: Jobs, Hoovers. [the numeric results on the original slide are not preserved in this transcript]

Results

What Next?  Use an improved version of co-training (gradient descent): less prone to random fluctuations, uses all unlabeled data at every iteration  Use Co-EM (Nigam & Ghani 2000), a hybrid of EM and Co-Training

Summary

 Use ECOC for efficient text classification with a large number of categories  Reduce code length without sacrificing performance  Fix code length and increase performance  Generalize to domain-independent classification tasks involving a large number of categories

The Feature Split Setting [diagram: example web pages seen through two views, the anchor text of links pointing to the page ("…My research advisor…", "…Professor Blum…", "…My grandson…") and the page text itself ("Tom Mitchell, Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!"); Classifier A uses one view, Classifier B the other, and the label of a new example (marked ??) is to be predicted]

The Co-training setting [diagram: the same two-view web-page example as above, with Classifier A and Classifier B each trained on its own view]

Learning from Labeled and Unlabeled Data: Using Feature Splits  Co-training [Blum & Mitchell 98]  Meta-bootstrapping [Riloff & Jones 99]  coBoost [Collins & Singer 99]  Unsupervised WSD [Yarowsky 95]  Consider this the co-training setting

Learning from Labeled and Unlabeled Data: Extending supervised learning  MaxEnt Discrimination [Jaakkola et al. 99]  Expectation Maximization [Nigam et al. 98]  Transductive SVMs [Joachims 99]

Using Unlabeled Data with EM [diagram: a loop in which the current naïve Bayes classifier estimates labels for the unlabeled documents, and all documents are then used to build a new naïve Bayes classifier]

Co-training vs. EM  Co-training: uses the feature split, incremental labeling, hard labels  EM: ignores the feature split, iterative labeling, probabilistic labels  Which differences matter?

Hybrids of Co-training and EM
                 Uses Feature Split?
Labeling         Yes            No
Incremental      co-training    self-training
Iterative        co-EM          EM
[diagram: the feature-split variants run naïve Bayes on views A and B separately; the incremental variants add only the best predictions to the labeled data, while the iterative variants label all unlabeled documents and learn from all of them each round]
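Co-training, EM, and co-EM are described on other slides in this deck; for completeness, a minimal sketch of the remaining cell, self-training (single view, incremental hard labels), with MultinomialNB, the pool size k, and the number of rounds as illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def self_train(X, y, X_u, rounds=20, k=5):
    """Self-training: a single classifier, incremental hard labels, no feature split."""
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        clf = MultinomialNB().fit(X, y)
        proba = clf.predict_proba(X_u)
        best = np.argsort(proba.max(axis=1))[-k:]          # most confident predictions
        X = np.vstack([X, X_u[best]])
        y = np.concatenate([y, clf.classes_[proba[best].argmax(axis=1)]])
        X_u = np.delete(X_u, best, axis=0)
    return MultinomialNB().fit(X, y)
```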

Text Classification with naïve Bayes  “Bag of Words” document representation  Naïve Bayes classification rule  Estimate parameters of the generative model (both written out below)
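A standard statement of the multinomial naïve Bayes classification rule and parameter estimates this slide refers to (as in Nigam et al.; Laplace smoothing assumed, with N(w_t, d) the count of word w_t in document d, V the vocabulary, D the document collection, and |c_j| the number of documents in class c_j):

```latex
% Classification: choose the class with the highest posterior under the word-independence assumption
c^{*}(d) = \arg\max_{c_j} \; P(c_j) \prod_{w_i \in d} P(w_i \mid c_j)

% Parameter estimation from labeled documents
\hat{P}(w_t \mid c_j) = \frac{1 + \sum_{d \in c_j} N(w_t, d)}{|V| + \sum_{s=1}^{|V|} \sum_{d \in c_j} N(w_s, d)},
\qquad
\hat{P}(c_j) = \frac{|c_j|}{|D|}
```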

The Feature Split Setting [diagram: example sentences ("Meanwhile the black fish swam far away.", "Experience the great thrill of our roller coaster.", "The speaker, Dr. Mary Rosen, will discuss effects…") whose features are split into two views, one for Classifier A and one for Classifier B; the label of a new example (marked ??) is to be predicted]

Learning from Unlabeled Data using Feature Splits  coBoost [Collins & Singer 99]  Meta-bootstrapping [Riloff & Jones 99]  Unsupervised WSD [Yarowsky 95]  Co-training [Blum & Mitchell 98]

Intuition behind Co-training  A and B are redundant  A features are independent of B features  Co-training is like learning with random classification noise: to the B classifier, the examples labeled by A's most confident predictions look like random examples whose labels carry a small amount of noise (A's misclassification error)

Extending Supervised Learning with Unlabeled Data  Transductive SVMs [Joachims 99]  MaxEnt Discrimination [Jaakkola et al. 99]  Expectation-Maximization [Nigam et al. 98]

Using Unlabeled Data with EM [Nigam, McCallum, Thrun & Mitchell, 1998] [diagram: initially learn a naïve Bayes classifier from the labeled documents only, then loop: estimate labels for the unlabeled documents, and use all documents to rebuild the naïve Bayes classifier]
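A minimal sketch of that loop with soft (probabilistic) labels. The Laplace smoothing constants, the fixed number of iterations, and the dense document-term count matrices are assumptions made for illustration; the slide itself only specifies the loop.

```python
import numpy as np

def fit_nb(X, resp, alpha=1.0):
    """M-step: multinomial naïve Bayes parameters from (soft) class assignments."""
    class_counts = resp.sum(axis=0)
    word_counts = resp.T @ X
    log_prior = np.log((class_counts + 1.0) / (class_counts.sum() + len(class_counts)))
    log_pword = np.log((word_counts + alpha) /
                       (word_counts.sum(axis=1, keepdims=True) + alpha * X.shape[1]))
    return log_prior, log_pword

def posteriors(X, log_prior, log_pword):
    """E-step: P(class | document) under the current naïve Bayes model."""
    log_post = log_prior + X @ log_pword.T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes, n_iter=10):
    """Initially learn from labeled documents only, then loop:
    estimate labels of unlabeled documents, rebuild the classifier from all documents."""
    resp_lab = np.zeros((len(y_lab), n_classes))
    resp_lab[np.arange(len(y_lab)), y_lab] = 1.0          # true labels stay fixed
    params = fit_nb(X_lab, resp_lab)                       # initial classifier: labeled data only
    for _ in range(n_iter):
        resp_unlab = posteriors(X_unlab, *params)          # estimate labels of unlabeled docs
        params = fit_nb(np.vstack([X_lab, X_unlab]),       # rebuild from all documents
                        np.vstack([resp_lab, resp_unlab]))
    return params
```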

Co-EM [diagram: initialize with the labeled data; the naïve Bayes classifier on view A estimates labels for the unlabeled documents, the naïve Bayes classifier on view B is then built with all the data and estimates labels, and so on in alternation]
                 Uses Feature Split?
                 No             Yes
Label All        EM             co-EM
Label Few                       co-training
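A hedged sketch of co-EM in that spirit: each view's classifier labels the entire unlabeled pool for the other view every iteration. Hard labels are used here for brevity (the original algorithm keeps probabilistic labels), and MultinomialNB is an assumed stand-in for the naïve Bayes learner.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_em(Xa_lab, Xb_lab, y_lab, Xa_u, Xb_u, n_iter=10):
    """co-EM: classifiers on views A and B repeatedly label *all* unlabeled
    documents for each other (hard-label simplification of the original)."""
    # Initialize with labeled data: the view-A classifier labels the whole unlabeled pool
    clf_a = MultinomialNB().fit(Xa_lab, y_lab)
    y_u = clf_a.predict(Xa_u)
    for _ in range(n_iter):
        # View B learns from all documents (true + estimated labels), then relabels the pool
        clf_b = MultinomialNB().fit(np.vstack([Xb_lab, Xb_u]), np.concatenate([y_lab, y_u]))
        y_u = clf_b.predict(Xb_u)
        # View A does the same using B's current labels
        clf_a = MultinomialNB().fit(np.vstack([Xa_lab, Xa_u]), np.concatenate([y_lab, y_u]))
        y_u = clf_a.predict(Xa_u)
    return clf_a, clf_b
```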