
1 Combining labeled and unlabeled data for text categorization with a large number of categories Rayid Ghani KDD Lab Project

2 Supervised Learning with Labeled Data Labeled data is required in large quantities and can be very expensive to collect.

3 Why use unlabeled data?  Very cheap in the case of text (web pages, newsgroups, email messages)  May not be as useful as labeled data, but it is available in enormous quantities

4 Goal  Make learning easier and more efficient by reducing the amount of labeled data required for text classification with a large number of categories

5 ECOC: very accurate and efficient for text categorization with a large number of classes. Co-Training: useful for combining labeled and unlabeled data with a small number of classes.

6 Related research with unlabeled data  Using EM in a generative model (Nigam et al. 1999)  Transductive SVMs (Joachims 1999)  Co-Training type algorithms (Blum & Mitchell 1998, Collins & Singer 1999, Nigam & Ghani 2000)

7 What is ECOC?  Solve multiclass problems by decomposing them into multiple binary problems (Dietterich & Bakiri 1995)  Use a learner to learn each binary problem

8 Training ECOC: assign each class (A, B, C, D) a binary codeword over the binary functions f1–f5 and train one binary classifier per bit. Testing ECOC: run the five binary classifiers on a new example X to obtain a predicted bit vector (e.g., 1 1 1 1 0) and assign the class whose codeword is closest.
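As a concrete illustration of slides 7 and 8, here is a minimal sketch of ECOC training and Hamming-distance decoding. The 4x5 code matrix is illustrative (not necessarily the matrix shown on the slide), scikit-learn's MultinomialNB stands in for the binary learner, and the function names are assumptions for this sketch.

```python
# ECOC sketch: decompose a 4-class problem (A-D) into 5 binary problems.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

CODE = np.array([        # rows: classes A-D, columns: binary functions f1-f5
    [0, 0, 1, 1, 0],     # A
    [1, 0, 1, 0, 0],     # B
    [0, 1, 1, 1, 0],     # C
    [0, 1, 0, 0, 1],     # D
])

def train_ecoc(X, y):
    """Train one binary classifier per column of the code matrix.
    X: bag-of-words count matrix; y: class indices 0..3 (A..D)."""
    classifiers = []
    for bit in range(CODE.shape[1]):
        binary_labels = CODE[np.asarray(y), bit]   # relabel each doc by its class's bit
        classifiers.append(MultinomialNB().fit(X, binary_labels))
    return classifiers

def predict_ecoc(classifiers, X):
    """Predict a bit vector per document, then return the nearest codeword's class."""
    bits = np.column_stack([clf.predict(X) for clf in classifiers])
    dists = (bits[:, None, :] != CODE[None, :, :]).sum(axis=2)  # Hamming distances
    return dists.argmin(axis=1)
```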

9 The Co-training algorithm [Blum & Mitchell 1998]  Loop while unlabeled documents remain: build Naïve Bayes classifiers A and B from the labeled data; classify the unlabeled documents with A and B; add the most confident A predictions and the most confident B predictions to the labeled training examples.

10 The Co-training Algorithm [Blum & Mitchell, 1998]  Diagram: learn Naïve Bayes classifiers on views A and B from the labeled data; each estimates labels for the unlabeled data; the most confident predictions from each view are selected and added to the labeled data, and the loop repeats.
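A minimal sketch of this loop, assuming two bag-of-words views held as dense count matrices XA/XB, scikit-learn's MultinomialNB as the per-view classifier, and a parameter k for how many confident predictions each view contributes per round; these names and defaults are illustrative, not from the slides.

```python
# Co-training sketch in the spirit of Blum & Mitchell (1998).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(XA_lab, XB_lab, y_lab, XA_unlab, XB_unlab, k=5, rounds=20):
    XA_lab, XB_lab = XA_lab.copy(), XB_lab.copy()
    y_lab = np.asarray(y_lab).copy()
    unlabeled = np.arange(XA_unlab.shape[0])            # indices still unlabeled
    for _ in range(rounds):
        clf_a = MultinomialNB().fit(XA_lab, y_lab)      # build classifier A on view A
        clf_b = MultinomialNB().fit(XB_lab, y_lab)      # build classifier B on view B
        if len(unlabeled) == 0:
            break
        picked = []
        for clf, X_view in ((clf_a, XA_unlab), (clf_b, XB_unlab)):
            proba = clf.predict_proba(X_view[unlabeled])
            conf = proba.max(axis=1)                    # confidence of each prediction
            best = unlabeled[np.argsort(-conf)[:k]]     # most confident unlabeled docs
            picked.append((best, clf.predict(X_view[best])))
        for best, labels in picked:                     # add predictions as labeled data
            XA_lab = np.vstack([XA_lab, XA_unlab[best]])
            XB_lab = np.vstack([XB_lab, XB_unlab[best]])
            y_lab = np.concatenate([y_lab, labels])
        unlabeled = np.setdiff1d(unlabeled, np.concatenate([b for b, _ in picked]))
    return clf_a, clf_b                                 # most recently built classifiers
```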

11 One intuition behind co-training  A and B are redundant  A features are independent of B features given the class  Co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, with a small misclassification error contributed by A.

12 ECOC + Co-Training = ECoTrain  ECOC decomposes multiclass problems into binary problems  Co-Training works well on binary problems  ECOC + Co-Training = learn each binary problem in ECOC with Co-Training
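A sketch of the combination, reusing the illustrative CODE matrix and co_train function from the sketches above; combining the two views' bit predictions with an OR is just one simple choice for this sketch, not necessarily the rule used in the talk.

```python
# ECoTrain sketch: co-train each binary problem induced by an ECOC bit.
import numpy as np

def ecotrain(XA_lab, XB_lab, y_lab, XA_unlab, XB_unlab):
    per_bit = []
    for bit in range(CODE.shape[1]):
        y_bit = CODE[np.asarray(y_lab), bit]             # binary labels for this ECOC bit
        per_bit.append(co_train(XA_lab, XB_lab, y_bit, XA_unlab, XB_unlab))
    return per_bit                                       # one (clf_a, clf_b) pair per bit

def predict_ecotrain(per_bit, XA, XB):
    # OR the two views' bit predictions, then pick the class whose ECOC
    # codeword is closest in Hamming distance.
    bits = np.column_stack([
        ((a.predict(XA) + b.predict(XB)) >= 1).astype(int) for a, b in per_bit
    ])
    dists = (bits[:, None, :] != CODE[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)
```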

13 Example categories: SPORTS, SCIENCE, ARTS, HEALTH, POLITICS, LAW

14 What happens with sparse data?

15 ECOC+CoTrain - Results (105-class problem)

Algorithm                                     | 300L + 0U per class | 50L + 250U per class | 5L + 295U per class
Naïve Bayes (uses no unlabeled data)          | 76                  | 67                   | 40.3
ECOC 15-bit (uses no unlabeled data)          | 76.5                | 68.5                 | 49.2
EM (uses unlabeled data)                      | -                   | 68.2                 | 51.4
Co-Train (uses unlabeled data)                | -                   | 67.6                 | 50.1
ECoTrain (ECOC + Co-Training, unlabeled data) | -                   | 72.0                 | 56.1

16 Datasets  Hoovers-255 Collection of 4285 corporate websites Each company is classified into one of 255 categories Baseline 2%  Jobs-65 (from WhizBang) Job Postings (Two feature sets – Title, Description) 65 categories Baseline 11%

17

18 Results (Naïve Bayes and ECOC use no unlabeled data; EM, Co-Training, and ECOC + Co-Training use unlabeled data with 10% labeled)

Dataset     | Naïve Bayes 10% lab | Naïve Bayes 100% lab | ECOC 10% lab | ECOC 100% lab | EM 10% lab | Co-Training 10% lab | ECOC+Co-Training 10% lab
Jobs-65     | 50.1                | 68.2                 | 59.3         | 71.2          | 58.2       | 54.1                | 64.5
Hoovers-255 | 15.2                | 32.0                 | 24.8         | 36.5          | 9.1        | 10.2                | 27.6

19 Results

20 What Next?  Use an improved version of co-training (gradient descent): less prone to random fluctuations, uses all unlabeled data at every iteration  Use Co-EM (Nigam & Ghani 2000), a hybrid of EM and Co-Training

21 Summary

22  Use ECOC for efficient text classification with a large number of categories  Reduce code length without sacrificing performance  Fix code length and increase performance  Generalize to domain-independent classification tasks involving a large number of categories

23 The Feature Split Setting  Diagram: each example has two feature views. Classifier A sees one view (snippets like "…My research advisor…", "…Professor Blum…", "…My grandson…"); Classifier B sees the other (text like "Tom Mitchell, Fredkin Professor of AI…", "Avrim Blum: My research interests are…", "Johnny: I like horsies!"); both must predict the labels of new examples (shown as "?").

24 The Co-training setting  Diagram: the same two-view examples as on the previous slide, with Classifier A trained on one view and Classifier B on the other.

25 Learning from Labeled and Unlabeled Data: Using Feature Splits  Co-training [Blum & Mitchell 98]  Meta-bootstrapping [Riloff & Jones 99]  coBoost [Collins & Singer 99]  Unsupervised WSD [Yarowsky 95]  Consider this the co-training setting

26 Learning from Labeled and Unlabeled Data: Extending supervised learning  MaxEnt Discrimination [Jaakkola et al. 99]  Expectation Maximization [Nigam et al. 98]  Transductive SVMs [Joachims 99]

27 Using Unlabeled Data with EM  Diagram: estimate labels of the unlabeled documents with the current naïve Bayes classifier, then use all documents to build a new naïve Bayes classifier, and repeat.
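A minimal sketch of this loop with naïve Bayes, assuming integer class labels 0..n_classes-1 that all appear in the labeled set, dense count matrices, and scikit-learn's MultinomialNB with sample weights carrying the probabilistic labels; the names are illustrative.

```python
# Semi-supervised EM sketch in the spirit of Nigam, McCallum, Thrun & Mitchell.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes, iters=10):
    clf = MultinomialNB().fit(X_lab, y_lab)          # initially learn from labeled data
    for _ in range(iters):
        # E-step: estimate (probabilistic) labels of the unlabeled documents.
        post = clf.predict_proba(X_unlab)            # shape (n_unlab, n_classes)
        # M-step: rebuild naive Bayes from all documents; each unlabeled doc
        # contributes one weighted copy per class (weight = its posterior).
        X_all = np.vstack([X_lab] + [X_unlab] * n_classes)
        y_all = np.concatenate([np.asarray(y_lab)] +
                               [np.full(X_unlab.shape[0], c) for c in range(n_classes)])
        w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                               [post[:, c] for c in range(n_classes)])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
    return clf
```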

28 Co-training vs. EM  Co-training: uses the feature split, incremental labeling, hard labels  EM: ignores the feature split, iterative labeling, probabilistic labels  Which differences matter?

29 Hybrids of Co-training and EM

Labeling     | Uses feature split: Yes | Uses feature split: No
Incremental  | co-training             | self-training
Iterative    | co-EM                   | EM

(Diagram: the feature-split variants learn Naïve Bayes on views A and B; incremental variants add only the best predictions to the labeled data, while iterative variants label all the unlabeled data and learn from all of it.)

30 Text Classification with naïve Bayes  "Bag of Words" document representation  Naïve Bayes classification: choose the class that maximizes the class prior times the product of the per-word class-conditional probabilities  Estimate the parameters of the generative model (word probabilities per class and class priors) from the labeled documents
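For reference, the standard multinomial naïve Bayes decision rule and smoothed parameter estimates take the following form, where N(w_t, c_j) counts occurrences of word w_t in labeled documents of class c_j and |V| is the vocabulary size; the add-one (Laplace) smoothing shown here is the usual choice, assumed rather than taken from the slide.

```latex
% Classification: pick the most probable class for document d = (w_1, ..., w_{|d|})
\hat{c} = \arg\max_{c_j} \; P(c_j) \prod_{i=1}^{|d|} P(w_i \mid c_j)

% Parameter estimation from labeled documents D, with add-one (Laplace) smoothing
P(w_t \mid c_j) = \frac{1 + N(w_t, c_j)}{|V| + \sum_{s=1}^{|V|} N(w_s, c_j)}
\qquad
P(c_j) = \frac{|\{\, d \in D : \mathrm{label}(d) = c_j \,\}|}{|D|}
```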

31 The Feature Split Setting  Diagram: example sentences ("Meanwhile the black fish swam far away.", "Experience the great thrill of our roller coaster.", "The speaker, Dr. Mary Rosen, will discuss effects…") whose features are split between Classifier A and Classifier B, each of which must predict the label (shown as "?").

32 Learning from Unlabeled Data using Feature Splits  coBoost [Collins & Singer 99]  Meta-bootstrapping [Riloff & Jones 99]  Unsupervised WSD [Yarowsky 95]  Co-training [Blum & Mitchell 98]

33 Intuition behind Co-training  A and B are redundant  A features are independent of B features given the class  Co-training is like learning with random classification noise: the most confident A predictions look like randomly drawn examples to B, with a small misclassification error contributed by A.

34 Extending Supervised Learning with Unlabeled Data  Transductive SVMs [Joachims 99]  MaxEnt Discrimination [Jaakkola et al. 99]  Expectation-Maximization [Nigam et al. 98]

35 Using Unlabeled Data with EM [Nigam, McCallum, Thrun & Mitchell, 1998]  Diagram: initially learn a naïve Bayes classifier from the labeled data only; estimate labels of the unlabeled documents; use all documents to rebuild the naïve Bayes classifier; iterate.

36 Co-EM  Diagram: initialize with the labeled data; Naïve Bayes on view A estimates labels for all the data, Naïve Bayes on view B is built with all of that data, B then estimates labels and A is rebuilt, and the process iterates.

Labeling   | Uses feature split: No | Uses feature split: Yes
Label all  | EM                     | co-EM
Label few  |                        | co-training
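A minimal sketch of co-EM under the same assumptions as the EM sketch above (dense count matrices for the two views, integer labels 0..n_classes-1, scikit-learn's MultinomialNB with sample weights); the helper and argument names are illustrative.

```python
# Co-EM sketch: like EM it relabels all unlabeled data with probabilistic
# labels each round, but like co-training it alternates between the two views.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def _fit_weighted(X_lab, y_lab, X_unlab, post, n_classes):
    # Build naive Bayes from all data, weighting each unlabeled document by
    # the posterior `post` estimated by the classifier on the *other* view.
    X_all = np.vstack([X_lab] + [X_unlab] * n_classes)
    y_all = np.concatenate([np.asarray(y_lab)] +
                           [np.full(X_unlab.shape[0], c) for c in range(n_classes)])
    w_all = np.concatenate([np.ones(X_lab.shape[0])] +
                           [post[:, c] for c in range(n_classes)])
    return MultinomialNB().fit(X_all, y_all, sample_weight=w_all)

def co_em(XA_lab, XB_lab, y_lab, XA_unlab, XB_unlab, n_classes, iters=10):
    clf_a = MultinomialNB().fit(XA_lab, y_lab)            # initialize with labeled data
    clf_b = MultinomialNB().fit(XB_lab, y_lab)
    for _ in range(iters):
        post_a = clf_a.predict_proba(XA_unlab)            # view A estimates labels
        clf_b = _fit_weighted(XB_lab, y_lab, XB_unlab, post_a, n_classes)
        post_b = clf_b.predict_proba(XB_unlab)            # view B estimates labels
        clf_a = _fit_weighted(XA_lab, y_lab, XA_unlab, post_b, n_classes)
    return clf_a, clf_b
```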

