A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples
Dell Zhang (BBK) and Wee Sun Lee (NUS)
Problem Supervised Learning
Problem Semi-Supervised Learning
Problem PU Learning
Problem Unlabeled Examples Help
Problem PU Learning
To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-) by learning a classifier from a set of positive examples P and a set of unlabeled examples U.
There are no labeled negative examples!
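The PU setting above can be simulated from a fully labeled dataset, which is a common way to build benchmarks. The following is a minimal sketch, not the authors' procedure: the function name `make_pu_split` and the parameter `labeled_fraction` are hypothetical, introduced only for illustration.

```python
import random


def make_pu_split(examples, labels, labeled_fraction, seed=0):
    """Split a fully labeled dataset into P (positive) and U (unlabeled).

    labeled_fraction is a hypothetical parameter: the share of true
    positives revealed as P. All remaining examples, positive or
    negative, go into U with their labels hidden.
    """
    rng = random.Random(seed)
    positives = [x for x, y in zip(examples, labels) if y == 1]
    negatives = [x for x, y in zip(examples, labels) if y != 1]

    # Reveal a random subset of the true positives as P.
    rng.shuffle(positives)
    n_labeled = int(round(labeled_fraction * len(positives)))
    P = positives[:n_labeled]

    # U mixes the hidden positives with all negatives; no example
    # in U carries a label, and no negative is ever labeled.
    U = positives[n_labeled:] + negatives
    rng.shuffle(U)
    return P, U
```

A learner in this setting sees only `P` and `U`; the hidden labels would be used for evaluation only.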
Applications
To automatically filter web pages according to a user's preference: the browsed or bookmarked pages can be used as positive examples, while unlabeled examples can be easily collected from the web.
To automatically find machine learning literature: the ICML papers can be used as positive examples, while unlabeled examples can be easily collected from the ACM or IEEE digital library.
To automatically identify cancer patients: the patients known to have cancer can be used as positive examples, while unlabeled examples can be easily collected from the patient database.
To automatically discover future customers for direct marketing: the current customers of the company can be used as positive examples, while unlabeled examples can be purchased at a low cost compared with obtaining negative examples.
……
Approaches
Existing Approaches
PNB (Denis et al. 2002); PNCT (Denis et al. 2003)
S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)
PEBL (Yu et al. 2004); SVMC (Yu 2005)
PN-SVM (Fung et al. 2005)
W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)
Our Proposed Approach
B-Pr
Our Approach A Probabilistic Model
Our Approach
Biased PrTFIDF (B-Pr)
Estimate
PrTFIDF (Joachims 1997)
Estimate
Maximize
On a held-out validation set (Lee & Liu 2003)
Linear Time Complexity!
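The quantity maximized on the validation set did not survive on this slide. Lee & Liu (2003) propose the criterion r^2 / Pr[f(X)=1], where r is recall measured on held-out positive examples and Pr[f(X)=1] is the fraction of examples the classifier labels positive; both can be estimated without any labeled negatives. Assuming that is the criterion meant here, a minimal sketch of model selection with it might look as follows. The names `pu_validation_score`, `select_threshold`, `predict`, and `score` are hypothetical, and estimating Pr[f(X)=1] from the unlabeled validation examples alone is a simplifying assumption.

```python
def pu_validation_score(predict, positives_val, unlabeled_val):
    """Estimate r^2 / Pr[f(X)=1] from positive and unlabeled validation data.

    predict is a hypothetical callable that returns True when an
    example is classified as positive.
    """
    if not positives_val or not unlabeled_val:
        raise ValueError("both validation sets must be non-empty")

    # Recall r: fraction of held-out positives classified as positive.
    recall = sum(1 for x in positives_val if predict(x)) / len(positives_val)

    # Pr[f(X)=1]: fraction of unlabeled examples classified as positive
    # (assumption: the unlabeled set stands in for the full distribution).
    frac_pos = sum(1 for x in unlabeled_val if predict(x)) / len(unlabeled_val)

    if frac_pos == 0.0:
        return 0.0  # nothing predicted positive; treat the score as zero
    return recall * recall / frac_pos


def select_threshold(score, candidates, positives_val, unlabeled_val):
    """Pick the candidate threshold with the highest validation score.

    score is a hypothetical callable mapping an example to a real value;
    an example is classified positive when score(x) >= threshold.
    """
    return max(
        candidates,
        key=lambda t: pu_validation_score(
            lambda x: score(x) >= t, positives_val, unlabeled_val
        ),
    )
```

The criterion rewards classifiers that recover most held-out positives while labeling only a small share of all examples positive. Each candidate costs one pass over the validation data.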
Experiments Reuters
B-Pr > RC-SVM > PEBL (p = 0.55)
RC-SVM > B-Pr > PEBL (p = 0.85)
Experiments 20NewsGroups
B-Pr > W-LR > S-EM (p = 0.3)
B-Pr > W-LR > S-EM (p = 0.7)
Conclusion
A New Approach to Learning from Positive and Unlabeled Examples
As effective as the state-of-the-art approaches
Yet simpler and faster
Thank you Questions? Comments? Suggestions? ……