1
Combining Labeled and Unlabeled Data with Co-Training
Avrim Blum and Tom Mitchell, COLT 1998 (Madison, WI)
2
Semi-Supervised Learning
Most of ML assumes labeled training data, but labels can be expensive to obtain. Semi-supervised learning: make use of unlabeled data in combination with a small amount of labeled data.
3
Some Co-Training Slides from a Tutorial by Avrim Blum
4
Co-training: Many problems have two different sources of info you can use to determine the label. E.g., classifying webpages: you can use the words on the page or the words on links pointing to the page.
[Figure: an example webpage ("Prof. Avrim Blum", with a "My Advisor" link pointing to it); x1 = link info, x2 = text info, x = link info & text info.]
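As a concrete illustration of the two views, each example x is just a pair of feature sets, one per source of information. A minimal sketch; the field names and word lists below are made up, not from the paper:

```python
# Hypothetical two-view representation of one webpage example;
# the field names and word lists are invented for illustration.
page = {
    "inlink_text": ["my", "advisor"],        # x1: words on links pointing here
    "page_text":   ["i", "am", "teaching"],  # x2: words on the page itself
}
x1, x2 = page["inlink_text"], page["page_text"]
x = (x1, x2)  # the full example is the pair <x1, x2>
```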
5
Co-training idea: use a small labeled sample to learn initial rules.
E.g., "my advisor" pointing to a page is a good indicator that it is a faculty home page. E.g., "I am teaching" on a page is a good indicator that it is a faculty home page.
6
Co-training (continued): Then look for unlabeled examples where one rule is confident and the other is not, and have the confident rule label the example for the other. This trains two classifiers, one on each type of info in an example ⟨x1, x2⟩, using each to help train the other.
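A minimal sketch of that loop, not the authors' exact code: the base learner (scikit-learn's GaussianNB), the function name co_train, and the synthetic interface are my assumptions; the defaults p = 1 positive and n = 3 negative self-labeled examples per classifier per round follow the paper's reported setup.

```python
# Sketch of the co-training loop of Blum & Mitchell (1998).
# Assumptions: GaussianNB as the base learner (the paper used naive Bayes
# over word features), and dense feature matrices X1, X2 for the two views.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=30, p=1, n=3):
    """X1, X2: the two views of each example; y is used only on labeled_idx."""
    labeled = set(labeled_idx)
    unlabeled = set(range(len(y))) - labeled
    y_hat = {i: y[i] for i in labeled}          # pseudo-labels, seeded with truth
    for _ in range(rounds):
        idx = sorted(labeled)
        h1 = GaussianNB().fit(X1[idx], [y_hat[i] for i in idx])
        h2 = GaussianNB().fit(X2[idx], [y_hat[i] for i in idx])
        if not unlabeled:
            break
        # Each classifier labels its most confident unlabeled examples
        # for the benefit of the other.
        for h, X in ((h1, X1), (h2, X2)):
            pool = sorted(unlabeled)
            if not pool:
                break
            proba = h.predict_proba(X[pool])    # columns follow h.classes_
            for cls, k in ((1, p), (0, n)):     # p positives, n negatives
                if cls not in h.classes_:
                    continue
                col = list(h.classes_).index(cls)
                for r in np.argsort(-proba[:, col])[:k]:
                    i = pool[r]
                    if i in unlabeled:          # skip if already taken this round
                        y_hat[i] = cls
                        unlabeled.discard(i)
                        labeled.add(i)
    return h1, h2
```

In the paper's webpage experiment the two views were the page text and the anchor text of inbound links; here any two feature matrices over the same examples will do.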
7
Results (webpages): 12 labeled examples, 1000 unlabeled.
8
Sample Run
9
Co-Training Theorems [BM98]: if x1, x2 are conditionally independent given the label, and if we have an algorithm that is robust to classification noise, then we can learn from an initial "weakly-useful" rule plus unlabeled data.
[Figure: diagram relating faculty home pages, faculty with advisees, and the "my advisor" rule.]
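Stated in symbols (my notation, not copied from the slide), the two conditions are conditional independence of the views and a weakly-useful initial rule:

```latex
% Conditional independence of the two views given the label:
\Pr(x_1, x_2 \mid y) = \Pr(x_1 \mid y)\,\Pr(x_2 \mid y)

% A rule h is "weakly useful" if it fires with non-negligible probability
% and beats the base rate when it fires (f is the target concept):
\Pr[h(x) = 1] \ge \varepsilon, \qquad
\Pr[f(x) = 1 \mid h(x) = 1] \ge \Pr[f(x) = 1] + \varepsilon
```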