SSL Chapter 4: Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
Amount of Data
Is it better to have more unlabeled data? The literature stresses the positive value of unlabeled data: "unlabeled data should certainly not be discarded" (O'Neill, 1978).
Model Selection: Correct Model
Assume samples (Xv, Yv) are drawn from the joint distribution P(Xv, Yv). Suppose we know there exists a parameter set Q such that P(Xv, Yv | Q) = P(Xv, Yv); we then call the model "correct". In this case extra labeled or unlabeled data will reduce the error, with labeled data being more effective.
Detailed Analysis: Shahshahani & Landgrebe
Unlabeled data can degrade the performance of naive Bayes with Gaussian variables, due to deviations from the modeling assumptions. Their suggestion: unlabeled data should be used only when the labeled data alone produce poor performance.
Detailed Analysis: Nigam et al. (2000)
Reasons for poor performance: numerical problems in the learning method, and a mismatch between the natural clusters in the data and the actual labels. Various studies report that adding unlabeled data can degrade classification accuracy.
Empirical Study: Notation and Assumptions
Binary classification. Xv is an instance of the data, while Xvi is an attribute of Xv. All classifiers use EM to maximize the likelihood.
Empirical Study
Naive Bayes classifier with an increasing number of unlabeled samples, generated randomly. Xi and Xj are independent given the class label, so the model is correct.
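As a sketch of this setup (not the slides' exact experiment; the class priors, attribute probabilities, and sample sizes below are invented for illustration), a naive Bayes classifier over binary attributes can be fit with EM on a small labeled set plus a large unlabeled set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "correct model": 2 classes, 4 binary attributes,
# attributes conditionally independent given the class label.
prior = np.array([0.5, 0.5])
cond = np.array([[0.8, 0.7, 0.3, 0.2],   # P(X_i = 1 | Y = 0)
                 [0.2, 0.3, 0.7, 0.8]])  # P(X_i = 1 | Y = 1)

def sample(n):
    y = rng.choice(2, size=n, p=prior)
    x = (rng.random((n, 4)) < cond[y]).astype(float)
    return x, y

def nb_posterior(x, prior_hat, cond_hat):
    # P(y | x) proportional to P(y) * prod_i P(x_i | y)
    like = np.prod(np.where(x[:, None, :] == 1, cond_hat, 1 - cond_hat), axis=2)
    joint = prior_hat * like
    return joint / joint.sum(axis=1, keepdims=True)

xl, yl = sample(20)      # small labeled set
xu, _ = sample(2000)     # large unlabeled set (labels discarded)

# Initialize from labeled counts (Laplace smoothing), then run EM.
prior_hat = np.bincount(yl, minlength=2) + 1.0
prior_hat /= prior_hat.sum()
cond_hat = np.array([(xl[yl == c].sum(axis=0) + 1) / ((yl == c).sum() + 2)
                     for c in range(2)])

for _ in range(50):
    # E-step: soft class memberships for unlabeled points;
    # labeled points keep their observed labels.
    q = np.vstack([np.eye(2)[yl], nb_posterior(xu, prior_hat, cond_hat)])
    x = np.vstack([xl, xu])
    # M-step: re-estimate parameters from expected counts.
    prior_hat = q.sum(axis=0) / q.sum()
    cond_hat = (q.T @ x + 1) / (q.sum(axis=0)[:, None] + 2)

xt, yt = sample(1000)
acc = (nb_posterior(xt, prior_hat, cond_hat).argmax(axis=1) == yt).mean()
print(f"test accuracy: {acc:.3f}")
```

Anchoring the initialization on the labeled counts keeps the two mixture components aligned with the true classes, which is what the slides' experiments rely on when the model is correct.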
Empirical Study
Tree-augmented naive Bayes (TAN) is used: each attribute depends directly on the class and on at most one other attribute. Here the model is incorrect.
Empirical Study
A more complex model generated under the TAN assumptions. With few labeled data the performance improves, but the model is still incorrect.
Empirical Study
Naive Bayes classifier on real data with binary classes (UCI repository). Unlabeled data help when the labeled set is small, similar to the previous case.
Empirical Study
Summary of the First Part
A correct model guarantees benefits from unlabeled data; an incorrect model may degrade performance when the characteristics of the data distribution and the classes do not match. But how do we know a priori that the model is the "correct" one?
Asymptotic Bias
AL: asymptotic bias of the labeled-data estimate
Au: asymptotic bias of the unlabeled-data estimate
AL and Au can be different. Scenario: train with labeled data so that the result is close to AL; then add a huge amount of unlabeled data, and the result may tend toward Au.
Toy Problem: Gender Prediction
G: Girl or Boy. C: mother craved chocolate (Yes or No). W: mother's weight gain (More or Less). W and G are conditionally independent given C: G -> C -> W, so P(G, C, W) = P(G) P(C|G) P(W|C).
Toy Problem: Gender Prediction
P(G = Boy) = 0.5
P(C = No | G = Boy) = 0.1
P(C = No | G = Girl) = 0.8
P(W = Less | C = No) = 0.7
P(W = Less | C = Yes) = 0.2
From these we can compute P(W = Less | G = Boy) = 0.25 and P(W = Less | G = Girl) = 0.6.
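These marginals follow by summing C out of P(W | C) P(C | G); a quick sanity check, reading the second conditional as P(C = No | G = Girl) = 0.8:

```python
# Parameters of the correct model G -> C -> W from the slide.
p_c_no = {"Boy": 0.1, "Girl": 0.8}   # P(C = No | G)
p_w_less = {"No": 0.7, "Yes": 0.2}   # P(W = Less | C)

def p_w_less_given_g(g):
    # Marginalize over C: P(W=Less | G) = sum_c P(W=Less | C=c) P(C=c | G)
    return p_w_less["No"] * p_c_no[g] + p_w_less["Yes"] * (1 - p_c_no[g])

print(round(p_w_less_given_g("Boy"), 2))   # 0.25
print(round(p_w_less_given_g("Girl"), 2))  # 0.6
```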
Toy Problem: Gender Prediction
From the independence assumption and Bayes' rule:
P(G = Girl | C = No) = 0.89, P(G = Boy | C = No) = 0.11
P(G = Girl | C = Yes) = 0.18, P(G = Boy | C = Yes) = 0.82
So if C = No choose G = Girl, else choose G = Boy.
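These posteriors are plain Bayes' rule, P(G | C) proportional to P(C | G) P(G); a check with the slide's parameters:

```python
p_g = {"Boy": 0.5, "Girl": 0.5}     # P(G)
p_c_no = {"Boy": 0.1, "Girl": 0.8}  # P(C = No | G)

def posterior_girl(c):
    # P(G = Girl | C) = P(C | Girl) P(Girl) / sum_g P(C | g) P(g)
    pc = {g: (p_c_no[g] if c == "No" else 1 - p_c_no[g]) for g in p_g}
    num = pc["Girl"] * p_g["Girl"]
    return num / (num + pc["Boy"] * p_g["Boy"])

print(round(posterior_girl("No"), 2))   # 0.89
print(round(posterior_girl("Yes"), 2))  # 0.18
```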
Toy Problem: Gender Prediction, Incorrect Model
C <- G -> W: C and W are conditionally independent given G, so P(G, C, W) = P(G) P(C|G) P(W|G). Suppose an "oracle" gave us P(C|G); we still need to estimate P(G) and P(W|G).
Toy Problem: Gender Prediction, Incorrect Model
With only labeled data, the estimates are unbiased and their variance is inversely proportional to the size of DL, so even a small DL produces good estimates.
Toy Problem: Gender Prediction, Incorrect Model
P(G) ~ 0.5, P(W = Less | G = Girl) ~ 0.6, P(W = Less | G = Boy) ~ 0.25. The resulting a posteriori probabilities:

                 P(G=Girl|C,W)  P(G=Boy|C,W)
C=No,  W=Less    0.95           0.05
C=No,  W=More    0.81           0.19
C=Yes, W=Less    0.35           0.65
C=Yes, W=More    0.11           0.89
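The table can be reproduced from the incorrect factorization P(G, C, W) = P(G) P(C|G) P(W|G) with the labeled-data estimates; a quick check:

```python
p_g = 0.5                              # P(G = Girl) = P(G = Boy)
p_c_no = {"Girl": 0.8, "Boy": 0.1}     # P(C = No | G), from the oracle
p_w_less = {"Girl": 0.6, "Boy": 0.25}  # P(W = Less | G), labeled-data estimate

def posterior_girl(c, w):
    # Incorrect model: P(G | C, W) proportional to P(G) P(C | G) P(W | G)
    def joint(g):
        pc = p_c_no[g] if c == "No" else 1 - p_c_no[g]
        pw = p_w_less[g] if w == "Less" else 1 - p_w_less[g]
        return p_g * pc * pw
    return joint("Girl") / (joint("Girl") + joint("Boy"))

for c in ("No", "Yes"):
    for w in ("Less", "More"):
        print(c, w, round(posterior_girl(c, w), 2))  # 0.95, 0.81, 0.35, 0.11
```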
Toy Problem: Gender Prediction, Incorrect Model
Classify with the maximum a posteriori value of G. The "bias" from the "true" a posteriori probabilities is not zero, but the decision rule is the same optimal Bayes rule as before (C = No -> Girl, else Boy), so the classifier still achieves the minimum classification error.
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
In the limit DL/Du -> 0, the estimates become P(G = Boy) = 0.5, P(W = Less | G = Girl) = 0.78, P(W = Less | G = Boy) = 0.07.
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
The a posteriori probabilities for G:

                 P(G=Girl|C,W)  P(G=Boy|C,W)
C=No,  W=Less    0.99           0.01
C=No,  W=More    0.65           0.35
C=Yes, W=Less    0.71           0.29
C=Yes, W=More    0.05           0.95
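Plugging the unlabeled-data limits into the same incorrect factorization (reusing the oracle's P(C | G)) reproduces these posteriors:

```python
p_g = 0.5                               # P(G = Girl) = P(G = Boy)
p_c_no = {"Girl": 0.8, "Boy": 0.1}      # P(C = No | G), still from the oracle
p_w_less = {"Girl": 0.78, "Boy": 0.07}  # P(W = Less | G), unlabeled-data limit

def posterior_girl(c, w):
    # Incorrect model: P(G | C, W) proportional to P(G) P(C | G) P(W | G)
    def joint(g):
        pc = p_c_no[g] if c == "No" else 1 - p_c_no[g]
        pw = p_w_less[g] if w == "Less" else 1 - p_w_less[g]
        return p_g * pc * pw
    return joint("Girl") / (joint("Girl") + joint("Boy"))

for c in ("No", "Yes"):
    for w in ("Less", "More"):
        print(c, w, round(posterior_girl(c, w), 2))
```

Note that only the estimate of P(W | G) has changed relative to the labeled-data case, yet three of the four posteriors now favor Girl.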
Toy Problem: Gender Prediction, Incorrect Model + Unlabeled Data
In 3 of the 4 configurations the classifier now chooses Girl over Boy: the prediction has changed from the optimal rule, and the expected error rate increases. What happened? The unlabeled data changed the asymptotic limit of the estimates. When the model is incorrect, the effect of unlabeled data matters.
Asymptotic Analysis
(Xv, Yv): instance vector and class label; binary classes with values -1 and +1. Assume 0-1 loss; applying the Bayes rule gives the Bayes error. There are n independent samples: l labeled and u unlabeled, with n = l + u.
Asymptotic Analysis
With probability h a sample is labeled and with probability (1 - h) it is unlabeled. P(Xv, Yv | Q) is the parametric form of the model; EM is used for estimation.
Asymptotic Analysis: Likelihood of the labeled and unlabeled data
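The likelihood formulas on this slide did not survive extraction; in the slide's notation, the standard semi-supervised log-likelihood it presumably showed is:

```latex
\log L(Q) \;=\; \sum_{i \in D_L} \log P(x_i, y_i \mid Q)
          \;+\; \sum_{j \in D_U} \log P(x_j \mid Q),
\qquad P(x_j \mid Q) \;=\; \sum_{y} P(x_j, y \mid Q).
```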
Asymptotic Analysis
The parameter estimates are obtained by maximizing the likelihood; as n -> infinity, this is equivalent to maximizing its expected value.
Theorem on Asymptotic Analysis
The limiting value Q* of the maximum-likelihood estimates is the maximizer of the expected log-likelihood.
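The maximized expression did not survive extraction; given the setup (a sample is labeled with probability h), the limiting objective is presumably:

```latex
Q^{*}_{h} \;=\; \arg\max_{Q}\;
  h \,\mathbb{E}_{P(X,Y)}\!\big[\log P(X, Y \mid Q)\big]
  \;+\; (1-h)\,\mathbb{E}_{P(X)}\!\big[\log P(X \mid Q)\big].
```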
Theorem on Asymptotic Analysis
Qh* is the value of Q that maximizes the objective in the previous theorem; Ql* is the optimum using labeled data only (h = 1), and Qu* is the optimum using unlabeled data only (h = 0).
Theorem on Asymptotic Analysis
If the model is correct, i.e. P(Xv, Yv | QT) = P(Xv, Yv) for some QT, then QT = Ql* = Qu* = Qh*, and the asymptotic bias is zero.
Theorem on Asymptotic Analysis
Now suppose the model is incorrect: P(Xv, Yv) does not belong to the family P(Xv, Yv | Q). Let e(Q) be the classification error with parameter Q, and assume e(Ql*) < e(Qu*).
Theorem on Asymptotic Analysis
Labeled data train the model toward error e(Ql*); as unlabeled data are added, the error moves toward e(Qu*). So in this case using only the labeled data results in a smaller classification error.