SSL Chapter 4: Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
Amount of Data
Is it better to have more unlabeled data? The literature stresses the positive value of unlabeled data: "unlabeled data should certainly not be discarded" (O'Neill, 1978).
Model Selection: Correct Model
Assume samples (Xv, Yv) are drawn from the joint distribution P(Xv, Yv). Suppose we know there exists a parameter set Q such that P(Xv, Yv | Q) = P(Xv, Yv); we then call the model "correct". In this case extra labeled or unlabeled data will reduce the error, with labeled data being more effective.
Detailed Analysis: Shahshahani & Landgrebe
Unlabeled data can degrade the performance of naive Bayes with Gaussian variables, due to deviations from the modeling assumptions. Their suggestion: unlabeled data should be used only when the labeled data alone produce poor performance.
Detailed Analysis: Nigam et al. (2000)
Reasons for poor performance: numerical problems in the learning method, and a mismatch between the natural clusters in the data and the actual labels. Various studies report that adding unlabeled data can degrade classification accuracy.
Empirical Study: Notation and Assumptions
Binary classification. Xv is an instance of the data, while Xvi is an attribute of Xv. All classifiers use EM to maximize the likelihood.
Empirical Study
Naive Bayes classifier with an increasing number of unlabeled samples, generated randomly. Xi and Xj are independent given the class label, so the model is correct.
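As a sketch of this setup (not the slides' exact experiment; the class priors, attribute probabilities, and sample sizes below are invented for illustration), a naive Bayes classifier over binary attributes can be fit with EM on a small labeled set plus a large unlabeled set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "correct model": 2 classes, 4 binary attributes,
# attributes conditionally independent given the class label.
prior = np.array([0.5, 0.5])
cond = np.array([[0.8, 0.7, 0.3, 0.2],   # P(X_i = 1 | Y = 0)
                 [0.2, 0.3, 0.7, 0.8]])  # P(X_i = 1 | Y = 1)

def sample(n):
    y = rng.choice(2, size=n, p=prior)
    x = (rng.random((n, 4)) < cond[y]).astype(float)
    return x, y

def nb_posterior(x, prior_hat, cond_hat):
    # P(y | x) proportional to P(y) * prod_i P(x_i | y)
    like = np.prod(np.where(x[:, None, :] == 1, cond_hat, 1 - cond_hat), axis=2)
    joint = prior_hat * like
    return joint / joint.sum(axis=1, keepdims=True)

xl, yl = sample(20)      # small labeled set
xu, _ = sample(2000)     # large unlabeled set (labels discarded)

# Initialize from labeled counts (Laplace smoothing), then run EM.
prior_hat = np.bincount(yl, minlength=2) + 1.0
prior_hat /= prior_hat.sum()
cond_hat = np.array([(xl[yl == c].sum(axis=0) + 1) / ((yl == c).sum() + 2)
                     for c in range(2)])

for _ in range(50):
    # E-step: soft class memberships for unlabeled points;
    # labeled points keep their observed labels.
    q = np.vstack([np.eye(2)[yl], nb_posterior(xu, prior_hat, cond_hat)])
    x = np.vstack([xl, xu])
    # M-step: re-estimate parameters from expected counts.
    prior_hat = q.sum(axis=0) / q.sum()
    cond_hat = (q.T @ x + 1) / (q.sum(axis=0)[:, None] + 2)

xt, yt = sample(1000)
acc = (nb_posterior(xt, prior_hat, cond_hat).argmax(axis=1) == yt).mean()
print(f"test accuracy: {acc:.3f}")
```

Anchoring the initialization on the labeled counts keeps the two mixture components aligned with the true classes, which is what the slides' experiments rely on when the model is correct.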
Empirical Study
Tree-augmented naive Bayes (TAN) is used: each attribute depends directly on the class and on at most one other attribute. Here the model is incorrect.
Empirical Study
A more complex model generated under the TAN assumptions. With few labeled data the performance improves, but the model is still incorrect.
Empirical Study
Naive Bayes classifier on real data with binary classes (UCI repository). Unlabeled data help when the labeled set is small, similar to the previous case.
Empirical Study
Summary of the First Part
A correct model guarantees benefits from unlabeled data; an incorrect model may degrade performance when the characteristics of the data distribution and the classes do not match. But how do we know a priori that the model is the "correct" one?
Asymptotic Bias
AL: asymptotic bias of the labeled-data estimate
Au: asymptotic bias of the unlabeled-data estimate
AL and Au can be different. Scenario: train with labeled data so that the result is close to AL; then add a huge amount of unlabeled data, and the result may tend toward Au.
Toy Problem: Gender Prediction
G: Girl or Boy. C: mother craved chocolate (Yes or No). W: mother's weight gain (More or Less). W and G are conditionally independent given C: G -> C -> W, so P(G, C, W) = P(G) P(C|G) P(W|C).
Toy Problem: Gender Prediction
P(G = Boy) = 0.5
P(C = No | G = Boy) = 0.1
P(C = No | G = Girl) = 0.8
P(W = Less | C = No) = 0.7
P(W = Less | C = Yes) = 0.2
From these we can compute P(W = Less | G = Boy) = 0.25 and P(W = Less | G = Girl) = 0.6.
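These marginals follow by summing C out of P(W | C) P(C | G); a quick sanity check, reading the second conditional as P(C = No | G = Girl) = 0.8:

```python
# Parameters of the correct model G -> C -> W from the slide.
p_c_no = {"Boy": 0.1, "Girl": 0.8}   # P(C = No | G)
p_w_less = {"No": 0.7, "Yes": 0.2}   # P(W = Less | C)

def p_w_less_given_g(g):
    # Marginalize over C: P(W=Less | G) = sum_c P(W=Less | C=c) P(C=c | G)
    return p_w_less["No"] * p_c_no[g] + p_w_less["Yes"] * (1 - p_c_no[g])

print(round(p_w_less_given_g("Boy"), 2))   # 0.25
print(round(p_w_less_given_g("Girl"), 2))  # 0.6
```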
Toy Problem: Gender Prediction
From the independence assumption and Bayes' rule:
P(G = Girl | C = No) = 0.89, P(G = Boy | C = No) = 0.11
P(G = Girl | C = Yes) = 0.18, P(G = Boy | C = Yes) = 0.82
So if C = No choose G = Girl, else choose G = Boy.
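These posteriors are plain Bayes' rule, P(G | C) proportional to P(C | G) P(G); a check with the slide's parameters:

```python
p_g = {"Boy": 0.5, "Girl": 0.5}     # P(G)
p_c_no = {"Boy": 0.1, "Girl": 0.8}  # P(C = No | G)

def posterior_girl(c):
    # P(G = Girl | C) = P(C | Girl) P(Girl) / sum_g P(C | g) P(g)
    pc = {g: (p_c_no[g] if c == "No" else 1 - p_c_no[g]) for g in p_g}
    num = pc["Girl"] * p_g["Girl"]
    return num / (num + pc["Boy"] * p_g["Boy"])

print(round(posterior_girl("No"), 2))   # 0.89
print(round(posterior_girl("Yes"), 2))  # 0.18
```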
Toy Problem: Gender Prediction, Incorrect Model
C <- G -> W: C and W are conditionally independent given G, so P(G, C, W) = P(G) P(C|G) P(W|G). Suppose an "oracle" gave us P(C|G); we still need to estimate P(G) and P(W|G).
Toy Problem: Gender Prediction, Incorrect Model
With only labeled data, the estimates are unbiased and their variance is inversely proportional to the size of DL, so even a small DL produces good estimates.
Toy Problem: Gender Prediction, Incorrect Model
P(G) ~ 0.5, P(W = Less | G = Girl) ~ 0.6, P(W = Less | G = Boy) ~ 0.25. The resulting a posteriori probabilities:

                 P(G=Girl|C,W)  P(G=Boy|C,W)
C=No,  W=Less    0.95           0.05
C=No,  W=More    0.81           0.19
C=Yes, W=Less    0.35           0.65
C=Yes, W=More    0.11           0.89
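The table can be reproduced from the incorrect factorization P(G, C, W) = P(G) P(C|G) P(W|G) with the labeled-data estimates; a quick check:

```python
p_g = 0.5                              # P(G = Girl) = P(G = Boy)
p_c_no = {"Girl": 0.8, "Boy": 0.1}     # P(C = No | G), from the oracle
p_w_less = {"Girl": 0.6, "Boy": 0.25}  # P(W = Less | G), labeled-data estimate

def posterior_girl(c, w):
    # Incorrect model: P(G | C, W) proportional to P(G) P(C | G) P(W | G)
    def joint(g):
        pc = p_c_no[g] if c == "No" else 1 - p_c_no[g]
        pw = p_w_less[g] if w == "Less" else 1 - p_w_less[g]
        return p_g * pc * pw
    return joint("Girl") / (joint("Girl") + joint("Boy"))

for c in ("No", "Yes"):
    for w in ("Less", "More"):
        print(c, w, round(posterior_girl(c, w), 2))  # 0.95, 0.81, 0.35, 0.11
```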
Toy Problem: Gender Prediction, Incorrect Model
Classify with the maximum a posteriori value of G. The "bias" from the "true" a posteriori probabilities is not zero, but the decision rule is the same optimal Bayes rule as before (C = No -> Girl, else Boy), so the classifier still achieves the minimum classification error.
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
In the limit DL/Du -> 0, the estimates become P(G = Boy) = 0.5, P(W = Less | G = Girl) = 0.78, P(W = Less | G = Boy) = 0.07.
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
The a posteriori probabilities for G:

                 P(G=Girl|C,W)  P(G=Boy|C,W)
C=No,  W=Less    0.99           0.01
C=No,  W=More    0.65           0.35
C=Yes, W=Less    0.71           0.29
C=Yes, W=More    0.05           0.95
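Plugging the unlabeled-data limits into the same incorrect factorization (reusing the oracle's P(C | G)) reproduces these posteriors:

```python
p_g = 0.5                               # P(G = Girl) = P(G = Boy)
p_c_no = {"Girl": 0.8, "Boy": 0.1}      # P(C = No | G), still from the oracle
p_w_less = {"Girl": 0.78, "Boy": 0.07}  # P(W = Less | G), unlabeled-data limit

def posterior_girl(c, w):
    # Incorrect model: P(G | C, W) proportional to P(G) P(C | G) P(W | G)
    def joint(g):
        pc = p_c_no[g] if c == "No" else 1 - p_c_no[g]
        pw = p_w_less[g] if w == "Less" else 1 - p_w_less[g]
        return p_g * pc * pw
    return joint("Girl") / (joint("Girl") + joint("Boy"))

for c in ("No", "Yes"):
    for w in ("Less", "More"):
        print(c, w, round(posterior_girl(c, w), 2))
```

Note that only the estimate of P(W | G) has changed relative to the labeled-data case, yet three of the four posteriors now favor Girl.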
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
In 3 of the 4 configurations the classifier now chooses Girl over Boy: the prediction has changed from the optimal rule, and the expected error rate increases. What happened? The unlabeled data changed the asymptotic limit of the estimates. When the model is incorrect, the effect of unlabeled data matters.
Asymptotic Analysis
(Xv, Yv): instance vector and class label; binary classes with values -1 and +1. Assume 0-1 loss; applying the Bayes rule gives the Bayes error. There are n independent samples: l labeled and u unlabeled, with n = l + u.
Asymptotic Analysis
With probability h a sample is labeled and with probability (1 - h) it is unlabeled. P(Xv, Yv | Q) is the parametric form of the model; EM is used for estimation.
Asymptotic Analysis: Likelihood of the labeled and unlabeled data
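The likelihood formulas on this slide did not survive extraction; in the slide's notation, the standard semi-supervised log-likelihood it presumably showed is:

```latex
\log L(Q) \;=\; \sum_{i \in D_L} \log P(x_i, y_i \mid Q)
          \;+\; \sum_{j \in D_U} \log P(x_j \mid Q),
\qquad P(x_j \mid Q) \;=\; \sum_{y} P(x_j, y \mid Q).
```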
Asymptotic Analysis
The parameter estimates are obtained by maximizing the likelihood; as n -> infinity, this is equivalent to maximizing its expected value.
Theorem on Asymptotic Analysis
The limiting value Q* of the maximum-likelihood estimates is the maximizer of the expected log-likelihood.
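The maximized expression did not survive extraction; given the setup (a sample is labeled with probability h), the limiting objective is presumably:

```latex
Q^{*}_{h} \;=\; \arg\max_{Q}\;
  h \,\mathbb{E}_{P(X,Y)}\!\big[\log P(X, Y \mid Q)\big]
  \;+\; (1-h)\,\mathbb{E}_{P(X)}\!\big[\log P(X \mid Q)\big].
```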
Theorem on Asymptotic Analysis
Qh* is the value of Q that maximizes the objective in the previous theorem; Ql* is the optimum using labeled data only (h = 1), and Qu* is the optimum using unlabeled data only (h = 0).
Theorem on Asymptotic Analysis
If the model is correct, i.e. P(Xv, Yv | QT) = P(Xv, Yv) for some QT, then QT = Ql* = Qu* = Qh*, and the asymptotic bias is zero.
Theorem on Asymptotic Analysis
Now suppose the model is incorrect: P(Xv, Yv) does not belong to the family P(Xv, Yv | Q). Let e(Q) be the classification error with parameter Q, and assume e(Ql*) < e(Qu*).
Theorem on Asymptotic Analysis
Labeled data train the model toward error e(Ql*); as unlabeled data are added, the error moves toward e(Qu*). So in this case using only the labeled data results in a smaller classification error.