A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples Dell Zhang (BBK) and Wee Sun Lee (NUS)


Problem Supervised Learning

Problem Semi-Supervised Learning

Problem PU Learning

Problem Unlabeled Examples Help

Problem PU Learning
To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-) by learning a classifier from a set of positive examples P and a set of unlabeled examples U.
There are no labeled negative examples!
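The PU setting described above can be illustrated with a small sketch. It assumes a fully labeled toy dataset from which a PU problem is simulated by keeping labels for only some positives; the function name `make_pu_split` and the parameter `labeled_fraction` are hypothetical and do not come from the slides.

```python
import random

def make_pu_split(labels, labeled_fraction, seed=0):
    """Simulate a PU problem from fully labeled data (illustrative only).

    labels: list of 0/1 ground-truth labels.
    labeled_fraction: share of the positives that keep their label (assumed knob).
    Returns (P, U): indices of labeled positives and of unlabeled examples.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    rng.shuffle(positives)
    k = round(labeled_fraction * len(positives))
    P = sorted(positives[:k])
    chosen = set(P)
    # Everything else is unlabeled: hidden positives mixed with all negatives.
    U = [i for i in range(len(labels)) if i not in chosen]
    return P, U

labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
P, U = make_pu_split(labels, labeled_fraction=0.5)
print("P:", P)
print("U:", U)
```

Note that U still contains positive instances; the learner simply does not know which ones, and it never sees an example marked as negative.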

Applications
To automatically filter web pages according to a user's preference: the browsed or bookmarked pages can be used as positive examples, while unlabeled examples can be easily collected from the web.
To automatically find machine learning literature: the ICML papers can be used as positive examples, while unlabeled examples can be easily collected from the ACM or IEEE digital library.
To automatically identify cancer patients: the patients known to have cancer can be used as positive examples, while unlabeled examples can be easily collected from the patient database.
To automatically discover future customers for direct marketing: the current customers of the company can be used as positive examples, while unlabeled examples can be purchased at a low cost compared with obtaining negative examples.
……

Approaches
Existing Approaches
PNB (Denis et al. 2002); PNCT (Denis et al. 2003)
S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)
PEBL (Yu et al. 2004); SVMC (Yu 2005)
PN-SVM (Fung et al. 2005)
W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)
Our Proposed Approach
B-Pr

Our Approach A Probabilistic Model

Our Approach

Biased PrTFIDF (B-Pr)
Estimate [formula omitted in transcript]: PrTFIDF (Joachims 1997)
Estimate [formula omitted in transcript]
Maximize [formula omitted in transcript] on a held-out validation set (Lee & Liu 2003)
Linear time complexity!
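The transcript does not preserve the quantity that is maximized on the validation set. As a hedged illustration, the sketch below assumes it is the PU model-selection criterion usually attributed to Lee & Liu (2003), r^2 / Pr[f(X)=1], where r is the recall measured on held-out positive examples and Pr[f(X)=1] is the fraction of the whole held-out set that is predicted positive. The names `pu_score` and `select_threshold` are hypothetical, and the decision threshold is only a stand-in for whatever bias parameter B-Pr actually tunes.

```python
def pu_score(pred_on_pos, pred_on_all):
    """Assumed criterion: recall^2 / Pr[f(X)=1], computable without negatives.

    pred_on_pos: 0/1 (or bool) predictions on held-out positive examples.
    pred_on_all: 0/1 (or bool) predictions on the whole held-out set (P and U).
    """
    if not pred_on_pos or not pred_on_all:
        raise ValueError("validation sets must be non-empty")
    recall = sum(pred_on_pos) / len(pred_on_pos)
    frac_predicted_pos = sum(pred_on_all) / len(pred_on_all)
    if frac_predicted_pos == 0:
        return 0.0  # nothing predicted positive, so recall is 0 as well
    return recall ** 2 / frac_predicted_pos

def select_threshold(scores_pos, scores_all, candidates):
    """Pick the candidate threshold with the highest pu_score."""
    def score_at(t):
        return pu_score([s >= t for s in scores_pos],
                        [s >= t for s in scores_all])
    return max(candidates, key=score_at)

# Toy classifier scores on a held-out set (made-up numbers for illustration).
scores_pos = [0.9, 0.8, 0.7, 0.4]
scores_unl = [0.85, 0.3, 0.2, 0.1, 0.6, 0.05]
best = select_threshold(scores_pos, scores_pos + scores_unl, [0.1, 0.5, 0.75])
print("selected threshold:", best)
```

The appeal of a criterion of this form is that both terms can be estimated from P and U alone, so model selection needs no labeled negative examples.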

Experiments Reuters
B-Pr > RC-SVM > PEBL (p = 0.55)
RC-SVM > B-Pr > PEBL (p = 0.85)

Experiments 20NewsGroups
B-Pr > W-LR > S-EM (p = 0.3)
B-Pr > W-LR > S-EM (p = 0.7)

Conclusion
A new approach to learning from positive and unlabeled examples
As effective as the state-of-the-art approaches
Yet simpler and faster

Thank you Questions? Comments? Suggestions? ……