1
Combining labeled and unlabeled data for text categorization with a large number of categories Rayid Ghani KDD Lab Project
2
Supervised Learning with Labeled Data Labeled data is required in large quantities and can be very expensive to collect.
3
Why use unlabeled data? Very cheap in the case of text (web pages, newsgroups, email messages). May not be as useful as labeled data, but is available in enormous quantities.
4
Goal: make learning more efficient and easier by reducing the amount of labeled data required for text classification with a large number of categories.
5
ECOC: very accurate and efficient for text categorization with a large number of classes. Co-Training: useful for combining labeled and unlabeled data with a small number of classes.
6
Related research with unlabeled data: using EM in a generative model (Nigam et al. 1999), transductive SVMs (Joachims 1999), Co-Training type algorithms (Blum & Mitchell 1998, Collins & Singer 1999, Nigam & Ghani 2000).
7
What is ECOC? Solve multiclass problems by decomposing them into multiple binary problems (Dietterich & Bakiri 1995), and use a binary learner to learn each of the binary problems.
8
Training ECOC (figure): each class A, B, C, D is assigned a binary codeword over the binary functions f1-f5 (a 4 x 5 code matrix of 0s and 1s), and one binary classifier is trained per column. Testing ECOC (figure): the five classifiers produce a codeword for a test example X (e.g. 1 1 1 1 0), which is assigned to the class with the nearest codeword.
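As a concrete illustration of ECOC training and decoding, here is a minimal Python sketch. The bit patterns in the code matrix, the scikit-learn-style learner interface, and the helper names are illustrative assumptions, not the exact values or code from the slides.

```python
# Minimal ECOC sketch: a 4-class problem (A-D) decomposed into 5 binary
# problems via a code matrix; the bit patterns here are illustrative.
import numpy as np

classes = ["A", "B", "C", "D"]
# rows = classes, columns = binary functions f1..f5 (illustrative codewords)
code_matrix = np.array([
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
])

def train_ecoc(X, y, make_learner):
    """Train one binary classifier per column of the code matrix."""
    learners = []
    for bit in range(code_matrix.shape[1]):
        # relabel each example by its class's bit in this column
        binary_y = np.array([code_matrix[classes.index(c), bit] for c in y])
        learners.append(make_learner().fit(X, binary_y))
    return learners

def predict_ecoc(learners, x):
    """Concatenate the binary predictions into a codeword and pick the
    class whose codeword is nearest in Hamming distance."""
    codeword = np.array([lrn.predict([x])[0] for lrn in learners])
    distances = (code_matrix != codeword).sum(axis=1)
    return classes[int(np.argmin(distances))]
```

Any classifier with fit/predict methods (e.g. a naïve Bayes text classifier) can be plugged in as make_learner.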
9
The Co-training algorithm [Blum & Mitchell 1998]. Loop while unlabeled documents remain: build classifiers A and B (naïve Bayes) from the labeled data; classify the unlabeled documents with A and B; add the most confident A predictions and the most confident B predictions as labeled training examples.
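A minimal sketch of this loop, assuming two feature views of the same documents (XA, XB as dense count matrices) and scikit-learn's MultinomialNB; the pool handling, the number of documents added per round, and the confidence measure are simplifications of Blum & Mitchell's original procedure.

```python
# Co-training sketch (simplified from Blum & Mitchell 1998): two naive Bayes
# classifiers, one per feature view, repeatedly label the unlabeled pool and
# feed each other their most confident predictions.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, n_per_round=5, rounds=30):
    XA_lab, XB_lab, y_lab = XA_lab.copy(), XB_lab.copy(), list(y_lab)
    unl = list(range(XA_unl.shape[0]))          # still-unlabeled document indices
    for _ in range(rounds):
        clf_a = MultinomialNB().fit(XA_lab, y_lab)
        clf_b = MultinomialNB().fit(XB_lab, y_lab)
        for clf, X_view in ((clf_a, XA_unl), (clf_b, XB_unl)):
            if not unl:
                break
            probs = clf.predict_proba(X_view[unl])
            best = np.argsort(probs.max(axis=1))[-n_per_round:]   # most confident
            labels = clf.classes_[probs[best].argmax(axis=1)]
            chosen = [unl[i] for i in best]
            # add the confident predictions as labeled training examples
            XA_lab = np.vstack([XA_lab, XA_unl[chosen]])
            XB_lab = np.vstack([XB_lab, XB_unl[chosen]])
            y_lab.extend(labels)
            unl = [i for i in unl if i not in chosen]
        if not unl:
            break
    return clf_a, clf_b
```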
10
The Co-training Algorithm (diagram): naïve Bayes classifiers on feature sets A and B each learn from the labeled data, estimate labels for the unlabeled data, select their most confident predictions, and add them to the labeled data. [Blum & Mitchell, 1998]
11
One intuition behind co-training: A and B are redundant (the A features are independent of the B features given the class), so co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, labeled with A's small misclassification error.
12
ECOC + Co-Training = ECoTrain. ECOC decomposes multiclass problems into binary problems; Co-Training works great with binary problems. ECOC + Co-Training: learn each binary problem in ECOC with Co-Training.
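The combination can be sketched by wrapping the co-training loop inside the ECOC decomposition. The co_train helper is the one sketched above, and the rule for turning a pair of co-trained classifiers into one predicted bit (averaging the two views' probabilities) is an illustrative choice, not the exact procedure used in the experiments; it also assumes each code column contains both bit values.

```python
# ECoTrain sketch: learn each ECOC binary problem with co-training.
# Builds on the co_train sketch above; the bit-combination rule is illustrative.
import numpy as np

def train_ecotrain(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, classes, code_matrix):
    column_classifiers = []
    for bit in range(code_matrix.shape[1]):
        # relabel the labeled data by this column's bit: a binary problem
        binary_y = [code_matrix[classes.index(c), bit] for c in y_lab]
        # co-train a pair of classifiers on it, using the unlabeled documents too
        column_classifiers.append(co_train(XA_lab, XB_lab, binary_y, XA_unl, XB_unl))
    return column_classifiers

def predict_ecotrain(column_classifiers, xa, xb, classes, code_matrix):
    codeword = []
    for clf_a, clf_b in column_classifiers:
        p_a = clf_a.predict_proba([xa])[0][list(clf_a.classes_).index(1)]
        p_b = clf_b.predict_proba([xb])[0][list(clf_b.classes_).index(1)]
        codeword.append(1 if (p_a + p_b) / 2 > 0.5 else 0)
    # assign the class whose codeword is nearest in Hamming distance
    distances = (code_matrix != np.array(codeword)).sum(axis=1)
    return classes[int(np.argmin(distances))]
```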
13
SPORTS SCIENCE ARTS HEALTH POLITICS LAW
14
What happens with sparse data?
15
ECOC+CoTrain - Results (105-class problem, classification accuracy %):

  Algorithm                                 300L + 0U   50L + 250U   5L + 295U
                                            per class   per class    per class
  Naïve Bayes (no unlabeled data)           76          67           40.3
  ECOC 15-bit (no unlabeled data)           76.5        68.5         49.2
  EM (uses unlabeled data)                  -           68.2         51.4
  Co-Train (uses unlabeled data)            -           67.6         50.1
  ECoTrain (ECOC + Co-Training,
            uses unlabeled data)            -           72.0         56.1
16
Datasets.
Hoovers-255: collection of 4285 corporate websites; each company is classified into one of 255 categories; baseline 2%.
Jobs-65 (from WhizBang): job postings with two feature sets (title, description); 65 categories; baseline 11%.
18
Results (classification accuracy %):

  Dataset       Naïve Bayes (no unlabeled)   ECOC (no unlabeled)    EM         Co-Training   ECOC + Co-Training
                10% lab.     100% lab.       10% lab.   100% lab.   10% lab.   10% lab.      10% lab.
  Jobs-65       50.1         68.2            59.3       71.2        58.2       54.1          64.5
  Hoovers-255   15.2         32.0            24.8       36.5        9.1        10.2          27.6
19
Results
20
What next? Use an improved version of co-training (gradient descent): less prone to random fluctuations, and uses all the unlabeled data at every iteration. Use Co-EM (Nigam & Ghani 2000), a hybrid of EM and Co-Training.
21
Summary
22
Use ECOC for efficient text classification with a large number of categories. Reduce code length without sacrificing performance. Fix code length and increase performance. Generalize to domain-independent classification tasks involving a large number of categories.
23
The Feature Split Setting (figure): example web pages ("Tom Mitchell: Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!") together with the text of links pointing to them ("…My research advisor…", "…Professor Blum…", "…My grandson…"); Classifier A works on one feature set and Classifier B on the other, and some documents are unlabeled (??).
24
The Co-training setting (figure): the same example, with the link text ("…My advisor…", "…Professor Blum…", "…My grandson…") as one feature set and the page text ("Tom Mitchell: Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!") as the other; Classifier A works on one feature set and Classifier B on the other.
25
Learning from Labeled and Unlabeled Data: Using Feature Splits. Co-training [Blum & Mitchell 98], Meta-bootstrapping [Riloff & Jones 99], coBoost [Collins & Singer 99], Unsupervised WSD [Yarowsky 95]. Consider this the co-training setting.
26
Learning from Labeled and Unlabeled Data: Extending supervised learning MaxEnt Discrimination [Jaakkola et al. 99] Expectation Maximization [Nigam et al. 98] Transductive SVMs [Joachims 99]
27
Using Unlabeled Data with EM (diagram): a naïve Bayes classifier estimates labels for the unlabeled documents, then all documents are used to build a new naïve Bayes classifier.
28
Co-training vs. EM. Co-training: uses the feature split, incremental labeling, hard labels. EM: ignores the feature split, iterative labeling, probabilistic labels. Which differences matter?
29
Hybrids of Co-training and EM:

                      Uses feature split?
  Labeling            Yes            No
  Incremental         co-training    self-training
  Iterative           co-EM          EM

(Diagram: the four algorithms as combinations of naïve Bayes on A and B, or on A & B together, labeling all of the unlabeled data vs. adding only the best predictions, and learning from all the data.)
30
Text Classification with naïve Bayes: "bag of words" document representation; naïve Bayes classification and estimation of the parameters of the generative model.
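The equations for this slide are given below in the standard multinomial naïve Bayes formulation (notation roughly as in Nigam et al.: w_t words, c_j classes, d_i documents, N(w_t, d_i) the count of w_t in d_i, V the vocabulary, C the class set, D the document set); this is the usual textbook form, not a copy of the slide's own rendering.

```latex
% Standard multinomial naive Bayes for text classification
\begin{align*}
  P(c_j \mid d_i) &\propto P(c_j) \prod_{k=1}^{|d_i|} P(w_{d_i,k} \mid c_j)
  && \text{(classification)} \\
  \hat{P}(w_t \mid c_j) &=
    \frac{1 + \sum_{i} N(w_t, d_i)\, P(c_j \mid d_i)}
         {|V| + \sum_{s=1}^{|V|} \sum_{i} N(w_s, d_i)\, P(c_j \mid d_i)}
  && \text{(word probabilities, Laplace smoothed)} \\
  \hat{P}(c_j) &= \frac{1 + \sum_{i} P(c_j \mid d_i)}{|C| + |D|}
  && \text{(class priors)}
\end{align*}
```

For labeled documents P(c_j | d_i) is 0 or 1; with EM (later slides) it becomes the estimated posterior for unlabeled documents.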
31
The Feature Split Setting (figure): example text snippets ("Meanwhile the black fish swam far away.", "Experience the great thrill of our roller coaster.", "The speaker, Dr. Mary Rosen, will discuss effects…") handled by two classifiers, Classifier A and Classifier B, with unlabeled examples marked ??.
32
Learning from Unlabeled Data using Feature Splits coBoost [Collins & Singer 99] Meta-bootstrapping [Riloff & Jones 99] Unsupervised WSD [Yarowsky 95] Co-training [Blum & Mitchell 98]
33
Intuition behind Co-training: A and B are redundant (the A features are independent of the B features given the class), so co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, labeled with A's small misclassification error.
34
Extending Supervised Learning with Unlabeled Data Transductive SVMs [Joachims 99] MaxEnt Discrimination [Jaakkola et al. 99] Expectation-Maximization [Nigam et al. 98]
35
Using Unlabeled Data with EM: initially learn a naïve Bayes classifier from the labeled documents only; then repeatedly estimate labels for the unlabeled documents and use all documents to rebuild the naïve Bayes classifier. [Nigam, McCallum, Thrun & Mitchell, 1998]
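A minimal sketch of this EM loop in Python, assuming scikit-learn's MultinomialNB and count-vectorized documents. The probabilistic labels are implemented by replicating each unlabeled document once per class with sample_weight set to its posterior; this is one simple way to realize the soft M-step, not necessarily how the original experiments implemented it.

```python
# Semi-supervised EM with naive Bayes (Nigam et al. style sketch).
import numpy as np
from scipy.sparse import vstack
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unl, n_iter=10):
    clf = MultinomialNB().fit(X_lab, y_lab)      # initially learn from labeled only
    classes = clf.classes_
    for _ in range(n_iter):
        # E-step: estimate (probabilistic) labels of the unlabeled documents
        post = clf.predict_proba(X_unl)          # shape (n_unlabeled, n_classes)
        # M-step: rebuild naive Bayes from all documents, weighting each
        # unlabeled document by its class posteriors
        X_all = vstack([X_lab] + [X_unl] * len(classes))
        y_all = np.concatenate([y_lab] +
                               [np.full(X_unl.shape[0], c) for c in classes])
        w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                               [post[:, k] for k in range(len(classes))])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
    return clf
```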
36
Co-EM (diagram): initialize with the labeled data; naïve Bayes on feature set A estimates labels for all the unlabeled documents, naïve Bayes on feature set B is built with all the data, and the two alternate (estimate labels, build naïve Bayes with all data). Like EM, co-EM labels all the unlabeled data each round (co-training labels only a few); unlike EM, it uses the feature split.
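A minimal co-EM sketch along the same lines, reusing the soft-labeling trick from the EM sketch above; the two views XA/XB, the MultinomialNB learner, and the fixed number of iterations are assumptions for illustration.

```python
# Co-EM sketch (Nigam & Ghani 2000 style): two naive Bayes classifiers on
# feature views A and B alternately label *all* unlabeled documents for each
# other; soft labels are handled as in the EM sketch above.
import numpy as np
from scipy.sparse import vstack
from sklearn.naive_bayes import MultinomialNB

def fit_with_soft_labels(X_lab, y_lab, X_unl, post, classes):
    """Rebuild naive Bayes from all documents, weighting unlabeled ones
    by the class posteriors `post` supplied by the other view."""
    X_all = vstack([X_lab] + [X_unl] * len(classes))
    y_all = np.concatenate([y_lab] +
                           [np.full(X_unl.shape[0], c) for c in classes])
    w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                           [post[:, k] for k in range(len(classes))])
    return MultinomialNB().fit(X_all, y_all, sample_weight=w_all)

def co_em(XA_lab, XB_lab, y_lab, XA_unl, XB_unl, n_iter=10):
    clf_a = MultinomialNB().fit(XA_lab, y_lab)   # initialize with labeled data
    classes = clf_a.classes_
    for _ in range(n_iter):
        # A estimates labels for all unlabeled docs; B learns from all data
        clf_b = fit_with_soft_labels(XB_lab, y_lab, XB_unl,
                                     clf_a.predict_proba(XA_unl), classes)
        # B estimates labels for all unlabeled docs; A learns from all data
        clf_a = fit_with_soft_labels(XA_lab, y_lab, XA_unl,
                                     clf_b.predict_proba(XB_unl), classes)
    return clf_a, clf_b
```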