1
Combining labeled and unlabeled data for text categorization with a large number of categories Rayid Ghani KDD Lab Project
2
Supervised Learning with Labeled Data Labeled data is required in large quantities and can be very expensive to collect.
3
Why use unlabeled data? Very cheap in the case of text (web pages, newsgroups, email messages). May not be as useful as labeled data, but is available in enormous quantities.
4
Goal: make learning more efficient and easier by reducing the amount of labeled data required for text classification with a large number of categories.
5
ECOC: very accurate and efficient for text categorization with a large number of classes. Co-Training: useful for combining labeled and unlabeled data with a small number of classes.
6
Related research with unlabeled data: using EM in a generative model (Nigam et al. 1999), transductive SVMs (Joachims 1999), Co-Training type algorithms (Blum & Mitchell 1998, Collins & Singer 1999, Nigam & Ghani 2000).
7
What is ECOC? Solve multiclass problems by decomposing them into multiple binary problems (Dietterich & Bakiri 1995), and use a binary learner to learn each of the binary problems.
8
Training ECOC (figure): each class A, B, C, D is assigned a binary codeword over the binary functions f1-f5 (a 4 x 5 code matrix of 0s and 1s), and one binary classifier is trained per column. Testing ECOC (figure): the five classifiers produce a codeword for a test example X (e.g. 1 1 1 1 0), which is assigned to the class with the nearest codeword.
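As a concrete illustration of ECOC training and decoding, here is a minimal Python sketch. The bit patterns in the code matrix, the scikit-learn-style learner interface, and the helper names are illustrative assumptions, not the exact values or code from the slides.

```python
# Minimal ECOC sketch: a 4-class problem (A-D) decomposed into 5 binary
# problems via a code matrix; the bit patterns here are illustrative.
import numpy as np

classes = ["A", "B", "C", "D"]
# rows = classes, columns = binary functions f1..f5 (illustrative codewords)
code_matrix = np.array([
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
])

def train_ecoc(X, y, make_learner):
    """Train one binary classifier per column of the code matrix."""
    learners = []
    for bit in range(code_matrix.shape[1]):
        # relabel each example by its class's bit in this column
        binary_y = np.array([code_matrix[classes.index(c), bit] for c in y])
        learners.append(make_learner().fit(X, binary_y))
    return learners

def predict_ecoc(learners, x):
    """Concatenate the binary predictions into a codeword and pick the
    class whose codeword is nearest in Hamming distance."""
    codeword = np.array([lrn.predict([x])[0] for lrn in learners])
    distances = (code_matrix != codeword).sum(axis=1)
    return classes[int(np.argmin(distances))]
```

Any classifier with fit/predict methods (e.g. a naïve Bayes text classifier) can be plugged in as make_learner.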
9
The Co-training algorithm [Blum & Mitchell 1998]. Loop while unlabeled documents remain: build classifiers A and B (naïve Bayes) from the labeled data; classify the unlabeled documents with A and B; add the most confident A predictions and the most confident B predictions as labeled training examples.
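A minimal sketch of this loop, assuming two feature views of the same documents (XA, XB as dense count matrices) and scikit-learn's MultinomialNB; the pool handling, the number of documents added per round, and the confidence measure are simplifications of Blum & Mitchell's original procedure.

```python
# Co-training sketch (simplified from Blum & Mitchell 1998): two naive Bayes
# classifiers, one per feature view, repeatedly label the unlabeled pool and
# feed each other their most confident predictions.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, n_per_round=5, rounds=30):
    XA_lab, XB_lab, y_lab = XA_lab.copy(), XB_lab.copy(), list(y_lab)
    unl = list(range(XA_unl.shape[0]))          # still-unlabeled document indices
    for _ in range(rounds):
        clf_a = MultinomialNB().fit(XA_lab, y_lab)
        clf_b = MultinomialNB().fit(XB_lab, y_lab)
        for clf, X_view in ((clf_a, XA_unl), (clf_b, XB_unl)):
            if not unl:
                break
            probs = clf.predict_proba(X_view[unl])
            best = np.argsort(probs.max(axis=1))[-n_per_round:]   # most confident
            labels = clf.classes_[probs[best].argmax(axis=1)]
            chosen = [unl[i] for i in best]
            # add the confident predictions as labeled training examples
            XA_lab = np.vstack([XA_lab, XA_unl[chosen]])
            XB_lab = np.vstack([XB_lab, XB_unl[chosen]])
            y_lab.extend(labels)
            unl = [i for i in unl if i not in chosen]
        if not unl:
            break
    return clf_a, clf_b
```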
10
The Co-training Algorithm (diagram): naïve Bayes classifiers on feature sets A and B each learn from the labeled data, estimate labels for the unlabeled data, select their most confident predictions, and add them to the labeled data. [Blum & Mitchell, 1998]
11
One intuition behind co-training: A and B are redundant (the A features are independent of the B features given the class), so co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, labeled with A's small misclassification error.
12
ECOC + Co-Training = ECoTrain. ECOC decomposes multiclass problems into binary problems; Co-Training works great with binary problems. ECOC + Co-Training: learn each binary problem in ECOC with Co-Training.
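The combination can be sketched by wrapping the co-training loop inside the ECOC decomposition. The co_train helper is the one sketched above, and the rule for turning a pair of co-trained classifiers into one predicted bit (averaging the two views' probabilities) is an illustrative choice, not the exact procedure used in the experiments; it also assumes each code column contains both bit values.

```python
# ECoTrain sketch: learn each ECOC binary problem with co-training.
# Builds on the co_train sketch above; the bit-combination rule is illustrative.
import numpy as np

def train_ecotrain(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, classes, code_matrix):
    column_classifiers = []
    for bit in range(code_matrix.shape[1]):
        # relabel the labeled data by this column's bit: a binary problem
        binary_y = [code_matrix[classes.index(c), bit] for c in y_lab]
        # co-train a pair of classifiers on it, using the unlabeled documents too
        column_classifiers.append(co_train(XA_lab, XB_lab, binary_y, XA_unl, XB_unl))
    return column_classifiers

def predict_ecotrain(column_classifiers, xa, xb, classes, code_matrix):
    codeword = []
    for clf_a, clf_b in column_classifiers:
        p_a = clf_a.predict_proba([xa])[0][list(clf_a.classes_).index(1)]
        p_b = clf_b.predict_proba([xb])[0][list(clf_b.classes_).index(1)]
        codeword.append(1 if (p_a + p_b) / 2 > 0.5 else 0)
    # assign the class whose codeword is nearest in Hamming distance
    distances = (code_matrix != np.array(codeword)).sum(axis=1)
    return classes[int(np.argmin(distances))]
```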
13
SPORTS SCIENCE ARTS HEALTH POLITICS LAW
14
What happens with sparse data?
15
ECOC+CoTrain - Results (105-class problem, classification accuracy %):

  Algorithm                                 300L + 0U   50L + 250U   5L + 295U
                                            per class   per class    per class
  Naïve Bayes (no unlabeled data)           76          67           40.3
  ECOC 15-bit (no unlabeled data)           76.5        68.5         49.2
  EM (uses unlabeled data)                  -           68.2         51.4
  Co-Train (uses unlabeled data)            -           67.6         50.1
  ECoTrain (ECOC + Co-Training,
            uses unlabeled data)            -           72.0         56.1
16
Datasets.
Hoovers-255: collection of 4285 corporate websites; each company is classified into one of 255 categories; baseline 2%.
Jobs-65 (from WhizBang): job postings with two feature sets (title, description); 65 categories; baseline 11%.
18
Results (classification accuracy %):

  Dataset       Naïve Bayes (no unlabeled)   ECOC (no unlabeled)    EM         Co-Training   ECOC + Co-Training
                10% lab.     100% lab.       10% lab.   100% lab.   10% lab.   10% lab.      10% lab.
  Jobs-65       50.1         68.2            59.3       71.2        58.2       54.1          64.5
  Hoovers-255   15.2         32.0            24.8       36.5        9.1        10.2          27.6
19
Results
20
What next? Use an improved version of co-training (gradient descent): less prone to random fluctuations, and uses all the unlabeled data at every iteration. Use Co-EM (Nigam & Ghani 2000), a hybrid of EM and Co-Training.
21
Summary
22
Use ECOC for efficient text classification with a large number of categories. Reduce code length without sacrificing performance. Fix code length and increase performance. Generalize to domain-independent classification tasks involving a large number of categories.
23
The Feature Split Setting (figure): example web pages ("Tom Mitchell: Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!") together with the text of links pointing to them ("…My research advisor…", "…Professor Blum…", "…My grandson…"); Classifier A works on one feature set and Classifier B on the other, and some documents are unlabeled (??).
24
The Co-training setting (figure): the same example, with the link text ("…My advisor…", "…Professor Blum…", "…My grandson…") as one feature set and the page text ("Tom Mitchell: Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!") as the other; Classifier A works on one feature set and Classifier B on the other.
25
Learning from Labeled and Unlabeled Data: Using Feature Splits. Co-training [Blum & Mitchell 98], Meta-bootstrapping [Riloff & Jones 99], coBoost [Collins & Singer 99], Unsupervised WSD [Yarowsky 95]. Consider this the co-training setting.
26
Learning from Labeled and Unlabeled Data: Extending supervised learning MaxEnt Discrimination [Jaakkola et al. 99] Expectation Maximization [Nigam et al. 98] Transductive SVMs [Joachims 99]
27
Using Unlabeled Data with EM (diagram): a naïve Bayes classifier estimates labels for the unlabeled documents, then all documents are used to build a new naïve Bayes classifier.
28
Co-training vs. EM. Co-training: uses the feature split, incremental labeling, hard labels. EM: ignores the feature split, iterative labeling, probabilistic labels. Which differences matter?
29
Hybrids of Co-training and EM:

                      Uses feature split?
  Labeling            Yes            No
  Incremental         co-training    self-training
  Iterative           co-EM          EM

(Diagram: the four algorithms as combinations of naïve Bayes on A and B, or on A & B together, labeling all of the unlabeled data vs. adding only the best predictions, and learning from all the data.)
30
Text Classification with naïve Bayes: "bag of words" document representation; naïve Bayes classification and estimation of the parameters of the generative model.
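The equations for this slide are given below in the standard multinomial naïve Bayes formulation (notation roughly as in Nigam et al.: w_t words, c_j classes, d_i documents, N(w_t, d_i) the count of w_t in d_i, V the vocabulary, C the class set, D the document set); this is the usual textbook form, not a copy of the slide's own rendering.

```latex
% Standard multinomial naive Bayes for text classification
\begin{align*}
  P(c_j \mid d_i) &\propto P(c_j) \prod_{k=1}^{|d_i|} P(w_{d_i,k} \mid c_j)
  && \text{(classification)} \\
  \hat{P}(w_t \mid c_j) &=
    \frac{1 + \sum_{i} N(w_t, d_i)\, P(c_j \mid d_i)}
         {|V| + \sum_{s=1}^{|V|} \sum_{i} N(w_s, d_i)\, P(c_j \mid d_i)}
  && \text{(word probabilities, Laplace smoothed)} \\
  \hat{P}(c_j) &= \frac{1 + \sum_{i} P(c_j \mid d_i)}{|C| + |D|}
  && \text{(class priors)}
\end{align*}
```

For labeled documents P(c_j | d_i) is 0 or 1; with EM (later slides) it becomes the estimated posterior for unlabeled documents.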
31
The Feature Split Setting (figure): example text snippets ("Meanwhile the black fish swam far away.", "Experience the great thrill of our roller coaster.", "The speaker, Dr. Mary Rosen, will discuss effects…") handled by two classifiers, Classifier A and Classifier B, with unlabeled examples marked ??.
32
Learning from Unlabeled Data using Feature Splits coBoost [Collins & Singer 99] Meta-bootstrapping [Riloff & Jones 99] Unsupervised WSD [Yarowsky 95] Co-training [Blum & Mitchell 98]
33
Intuition behind Co-training: A and B are redundant (the A features are independent of the B features given the class), so co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, labeled with A's small misclassification error.
34
Extending Supervised Learning with Unlabeled Data Transductive SVMs [Joachims 99] MaxEnt Discrimination [Jaakkola et al. 99] Expectation-Maximization [Nigam et al. 98]
35
Using Unlabeled Data with EM: initially learn a naïve Bayes classifier from the labeled documents only; then repeatedly estimate labels for the unlabeled documents and use all documents to rebuild the naïve Bayes classifier. [Nigam, McCallum, Thrun & Mitchell, 1998]
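A minimal sketch of this EM loop in Python, assuming scikit-learn's MultinomialNB and count-vectorized documents. The probabilistic labels are implemented by replicating each unlabeled document once per class with sample_weight set to its posterior; this is one simple way to realize the soft M-step, not necessarily how the original experiments implemented it.

```python
# Semi-supervised EM with naive Bayes (Nigam et al. style sketch).
import numpy as np
from scipy.sparse import vstack
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unl, n_iter=10):
    clf = MultinomialNB().fit(X_lab, y_lab)      # initially learn from labeled only
    classes = clf.classes_
    for _ in range(n_iter):
        # E-step: estimate (probabilistic) labels of the unlabeled documents
        post = clf.predict_proba(X_unl)          # shape (n_unlabeled, n_classes)
        # M-step: rebuild naive Bayes from all documents, weighting each
        # unlabeled document by its class posteriors
        X_all = vstack([X_lab] + [X_unl] * len(classes))
        y_all = np.concatenate([y_lab] +
                               [np.full(X_unl.shape[0], c) for c in classes])
        w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                               [post[:, k] for k in range(len(classes))])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
    return clf
```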
36
Co-EM (diagram): initialize with the labeled data; naïve Bayes on feature set A estimates labels for all the unlabeled documents, naïve Bayes on feature set B is built with all the data, and the two alternate (estimate labels, build naïve Bayes with all data). Like EM, co-EM labels all the unlabeled data each round (co-training labels only a few); unlike EM, it uses the feature split.
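A minimal co-EM sketch along the same lines, reusing the soft-labeling trick from the EM sketch above; the two views XA/XB, the MultinomialNB learner, and the fixed number of iterations are assumptions for illustration.

```python
# Co-EM sketch (Nigam & Ghani 2000 style): two naive Bayes classifiers on
# feature views A and B alternately label *all* unlabeled documents for each
# other; soft labels are handled as in the EM sketch above.
import numpy as np
from scipy.sparse import vstack
from sklearn.naive_bayes import MultinomialNB

def fit_with_soft_labels(X_lab, y_lab, X_unl, post, classes):
    """Rebuild naive Bayes from all documents, weighting unlabeled ones
    by the class posteriors `post` supplied by the other view."""
    X_all = vstack([X_lab] + [X_unl] * len(classes))
    y_all = np.concatenate([y_lab] +
                           [np.full(X_unl.shape[0], c) for c in classes])
    w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                           [post[:, k] for k in range(len(classes))])
    return MultinomialNB().fit(X_all, y_all, sample_weight=w_all)

def co_em(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, n_iter=10):
    clf_a = MultinomialNB().fit(XA_lab, y_lab)   # initialize with labeled data
    classes = clf_a.classes_
    for _ in range(n_iter):
        # A estimates labels for all unlabeled docs; B learns from all data
        clf_b = fit_with_soft_labels(XB_lab, y_lab, XB_unl,
                                     clf_a.predict_proba(XA_unl), classes)
        # B estimates labels for all unlabeled docs; A learns from all data
        clf_a = fit_with_soft_labels(XA_lab, y_lab, XA_unl,
                                     clf_b.predict_proba(XB_unl), classes)
    return clf_a, clf_b
```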