Co-Training
Presented by: Shankar B S, DMML Lab
Bootstrapping
- Bootstrapping: use the initial labeled data to build a predictive labeling procedure, then use the newly labeled data to build a new predictive procedure.
- Examples:
  1. EM algorithm: in each iteration the model parameters are updated; the model defines a joint probability distribution on the observed data.
  2. Rule-based bootstrapping.
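A minimal self-training-style sketch of this bootstrapping loop, in Python; the classifier, confidence threshold, and variable names are illustrative assumptions, not part of the slides:

import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_labels(X_labeled, y_labeled, X_unlabeled, rounds=5, conf=0.95):
    # Repeatedly fit a model, label the confident unlabeled points, and refit.
    X_l, y_l, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)
        confident = proba.max(axis=1) >= conf        # keep only high-confidence predictions
        if not confident.any():
            break
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, model.classes_[proba[confident].argmax(axis=1)]])
        X_u = X_u[~confident]
    return model

EM-style bootstrapping differs in that it re-estimates soft (probabilistic) labels for all unlabeled points in each iteration rather than committing to hard labels.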
Co-Training
- Two views: X1, X2.
- Two distinct hypothesis classes H1, H2, consisting of functions predicting Y from X1 and X2 respectively.
- Bootstrap using h1 ∈ H1, h2 ∈ H2.
- "If X1 is conditionally independent of X2 given Y, then given a weak predictor in H1 and an algorithm which can learn H2 under random misclassification noise, it is possible to learn a good predictor in H2."
Example
- The description of a web page can be partitioned into two views:
  - words occurring on that page
  - words occurring on the hyperlinks pointing to that page (anchor text)
- Train one learning algorithm on each view.
- Use the predictions of each algorithm on unlabeled examples to enlarge the training set of the other.
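A co-training sketch along these lines, assuming two count-vectorized views X1 (page words) and X2 (anchor-text words); the classifiers, round counts, and helper name are illustrative assumptions:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_round=5):
    # Each round, each view's classifier labels its most confident unlabeled
    # examples and hands them to the shared training set used by the other view.
    X1_l, X2_l, y_l = list(X1_l), list(X2_l), list(y_l)
    pool = list(range(len(X1_u)))                  # indices of still-unlabeled examples
    h1, h2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        h1.fit(np.array(X1_l), y_l)
        h2.fit(np.array(X2_l), y_l)
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            if not pool:
                break
            proba = h.predict_proba(np.array([X_u[i] for i in pool]))
            top = np.argsort(proba.max(axis=1))[-per_round:]   # most confident picks
            for t in sorted(top, reverse=True):                # pop from the back first
                i = pool.pop(int(t))
                X1_l.append(X1_u[i]); X2_l.append(X2_u[i])
                y_l.append(h.classes_[proba[t].argmax()])
    return h1, h2

The original Blum and Mitchell formulation additionally draws from a small random pool and adds a fixed number of positive and negative examples per round; the sketch above simplifies that to "most confident overall".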
Co-Training framework
- Instance space X = X1 × X2, where X1 and X2 are two different views of the same example.
- Label l = f1(X1) = f2(X2) = f(X), where f1 and f2 are the target functions and f is the combined target function.
- C1 and C2 are concept classes defined over X1 and X2; f1 ∈ C1, f2 ∈ C2, and f = (f1, f2) ∈ C1 × C2.
- Even if C1 and C2 are large concept classes with high complexity, the set of compatible target functions may be much simpler and smaller.
Co-Training framework (example)
- X1 = X2 = {0,1}^n; C1 = C2 = conjunctions over {0,1}^n.
- If the first coordinate is known to appear in the target conjunction f1 and X1 has a 0 in that coordinate, the example must be negative, so its X2 view becomes a labeled negative example for f2.
- If the distribution has non-zero probability only on pairs where X1 = X2, then no useful information about f2 can be obtained this way.
- If X2 is conditionally independent of X1 given Y, then each such example is a fresh random negative example for f2, which is quite useful.
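A tiny concrete instance of that argument, assuming the target conjunction f1 is known to contain the first variable (all data below is made up for illustration):

def conjunction(x, relevant):
    # A monotone conjunction over {0,1}^n: true iff every relevant bit is 1.
    return all(x[i] == 1 for i in relevant)

# Suppose coordinate 0 is known to appear in the target conjunction f1.
x1 = (0, 1, 1, 0)   # first bit is 0, so f1(x1) = 0 regardless of the other bits
x2 = (1, 0, 1, 1)   # the paired view of the same example
negative_example_for_f2 = (x2, 0)   # a free labeled negative example for learning f2
assert conjunction(x1, relevant=[0, 2]) is False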
Idea 1: Feature selection with multiple views
- As in co-training, suppose we have two views with f1(X1) = f2(X2) = C.
- We want to do feature selection on X1.
- Using X2 can reduce the number of labeled instances required, or, given a fixed set of labeled instances, X2 can be used to select a better set of features.
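One way this idea could be made concrete, purely as a hedged sketch (the pseudo-labeling step, the scoring choice, and all names are my assumptions, not something the slides specify): train a classifier on the X2 view, use its predictions on unlabeled data as pseudo-labels, and score X1 features against the enlarged label set.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def select_x1_features(X1_l, X2_l, y_l, X1_u, X2_u, k=20):
    # Score X1 features on real labels plus pseudo-labels derived from the X2 view.
    h2 = LogisticRegression(max_iter=1000).fit(X2_l, y_l)
    pseudo = h2.predict(X2_u)                     # labels guessed from the other view
    X1_all = np.vstack([X1_l, X1_u])
    y_all = np.concatenate([y_l, pseudo])
    scores = mutual_info_classif(X1_all, y_all)
    return np.argsort(scores)[-k:]                # indices of the k highest-scoring features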
Idea 2: Feature expansion
- Suppose we have two views of the same data, X1 and X2, and the classifier uses the combined data set.
- If X2 is available only for some instances, we can use X1 to construct X2 for the remaining instances, using the labeled training data and/or the unlabeled test data.
- This is related to the missing-features problem, which can be handled with the EM algorithm, the KNN algorithm, median/mean imputation, etc.
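A minimal sketch of the KNN variant mentioned above, assuming X2 is missing for some rows while X1 is always present; the function name and neighbour count are hypothetical:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def fill_missing_x2(X1, X2, missing_mask, n_neighbors=5):
    # For rows with no X2, copy the average X2 of their nearest neighbours in X1 space.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X1[~missing_mask])
    _, idx = nn.kneighbors(X1[missing_mask])
    X2_known = X2[~missing_mask]
    X2_filled = X2.copy()
    X2_filled[missing_mask] = X2_known[idx].mean(axis=1)   # average over the neighbours
    return X2_filled

Mean or median imputation corresponds to the degenerate case where every row uses the whole training set as its neighbourhood; the EM approach instead alternates between imputing X2 and refitting the model.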