Semi-Supervised Learning
Qiang Yang
Adapted from slides by Zhi-Hua Zhou (http://cs.nju.edu.cn/people/zhouzh/), LAMDA Group, National Laboratory for Novel Software Technology, Nanjing University, China

Supervised learning
Supervised learning is the typical machine learning setting, where labeled examples are used as training data to train a model (a decision tree, neural network, support vector machine, etc.). The trained model is then applied to unseen data whose label is unknown, e.g. predicting whether the label of (Jeff, Professor, 7, ?) is "yes".
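As a concrete illustration of this setting, the sketch below trains a decision tree on a handful of labeled examples and then predicts the label of an unseen one. The feature encoding and values are made up for this example (loosely mirroring the (Jeff, Professor, 7, ?) record above) and are not from the original slides.

```python
# Minimal sketch of the supervised setting, using scikit-learn.
# Features and values are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

# Training data: each row is (years_of_service, has_phd); label: tenured?
X_train = [[7, 1], [2, 0], [10, 1], [1, 0]]
y_train = ["yes", "no", "yes", "no"]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Unseen example, e.g. (Jeff, Professor, 7, ?): the label is unknown
# until the trained model predicts it.
print(clf.predict([[7, 1]]))   # -> ['yes']
```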

Labeled vs. Unlabeled
In many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain because labeling requires human effort. For example, there is an (almost) infinite number of web pages on the Internet, but assigning a class label such as "war" to each page must be done by hand.

Three main paradigms for semi-supervised learning:
- Transductive learning: the unlabeled examples are exactly the test examples.
- Active learning: assume a user (oracle) can continue to label data; the learner actively selects some unlabeled examples to query, i.e. the learner has some control over the input space.
- Semi-supervised learning proper: the unlabeled examples may be different from the test examples; approaches include regularization (minimize error while maximizing smoothness) and multi-view learning / co-training.

SSL: Why can unlabeled data be helpful?
Suppose the data is well-modeled by a mixture density:
$f(x \mid \theta) = \sum_{l=1}^{L} \alpha_l f(x \mid \theta_l)$, where $\sum_{l=1}^{L} \alpha_l = 1$ and $\theta = \{\theta_l\}$.
The class labels are viewed as random quantities and are assumed chosen conditioned on the selected mixture component $m_i \in \{1, 2, \ldots, L\}$ and possibly on the feature value, i.e. according to the probabilities $P[c_i \mid x_i, m_i]$.
Thus, the optimal classification rule for this model is the MAP rule:
$S(x_i) = \arg\max_k \sum_{j=1}^{L} P[c_i = k \mid m_i = j, x_i]\, P[m_i = j \mid x_i]$,
where $P[m_i = j \mid x_i] = \dfrac{\alpha_j f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l f(x_i \mid \theta_l)}$.
Unlabeled examples can be used to help estimate this term [D. J. Miller & H. S. Uyar, NIPS'96].
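The sketch below illustrates this idea on synthetic 1-D data: a two-component Gaussian mixture whose parameters (mixing weights, means, variances) are refined with EM on the unlabeled pool, with the few labeled examples used only to initialize the components and tie them to classes, and prediction done with the MAP rule. It is a simplified illustration of the idea, not the exact algorithm of Miller & Uyar.

```python
# Synthetic illustration: unlabeled examples help estimate the mixture
# parameters, while a handful of labeled examples anchor components to classes.
import numpy as np

rng = np.random.default_rng(0)
x_l = np.array([-2.1, -1.8, 2.0, 2.3]); y_l = np.array([0, 0, 1, 1])   # labeled
x_u = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])  # unlabeled

# Initialise one Gaussian component per class from the labeled examples only.
mu = np.array([x_l[y_l == k].mean() for k in (0, 1)])
var = np.array([1.0, 1.0])
alpha = np.array([0.5, 0.5])

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):                       # EM on the unlabeled pool
    # E-step: posterior P[m_i = j | x_i] for each unlabeled point
    resp = alpha * gauss(x_u[:, None], mu, var)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, variances
    nk = resp.sum(axis=0)
    alpha = nk / nk.sum()
    mu = (resp * x_u[:, None]).sum(axis=0) / nk
    var = (resp * (x_u[:, None] - mu) ** 2).sum(axis=0) / nk

# MAP rule: assign a new point to the component (class) with highest posterior.
x_new = 1.5
post = alpha * gauss(x_new, mu, var)
print("predicted class:", int(np.argmax(post)))
```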

Transductive SVM
Transductive SVM: take a particular test set into account and try to minimize misclassifications of just those particular examples. Concretely, the unlabeled examples are used to help identify the maximum-margin hyperplane.
[Figure reprinted from T. Joachims, ICML'99]
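The sketch below conveys the transductive intuition in a heavily simplified form: train an SVM on the labeled data, pseudo-label the particular test set, and retrain so that the decision boundary also reflects those test points. This is not Joachims' TSVM optimization (which carefully manages label switches and slack penalties); the data and loop here are illustrative assumptions.

```python
# Simplified "use the test set while training" loop; not the TSVM algorithm.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(-2, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_test = np.vstack([rng.normal(-2, 0.8, (50, 2)), rng.normal(2, 0.8, (50, 2))])

clf = LinearSVC(C=1.0).fit(X_lab, y_lab)
for _ in range(5):
    y_test_guess = clf.predict(X_test)            # pseudo-label the test set
    X_all = np.vstack([X_lab, X_test])
    y_all = np.concatenate([y_lab, y_test_guess])
    clf = LinearSVC(C=1.0).fit(X_all, y_all)      # retrain on labeled + test

print(clf.predict(X_test[:5]))
```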

Active learning: getting more from queries
The labels of the training examples are obtained by querying an oracle. Thus, for the same number of queries, more helpful information can be obtained by actively selecting which unlabeled examples to query. The key is to select the unlabeled examples whose labels will convey the most helpful information to the learner.

Active learning: representative approaches
- Uncertainty sampling: train a single learner and then query the unlabeled instances on which the learner is least confident [Lewis & Gale, SIGIR'94]
- Committee-based sampling: generate a committee of multiple learners and query the unlabeled examples on which the committee members disagree the most [Seung et al., COLT'92; Abe & Mamitsuka, ICML'98]
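A minimal sketch of uncertainty sampling in this spirit, with the oracle simulated by a hidden label array; the data, model, and number of queries are illustrative assumptions.

```python
# Uncertainty sampling: repeatedly query the unlabeled example the current
# model is least confident about, then retrain on the enlarged labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_pool = rng.normal(size=(300, 2))
y_hidden = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # simulated oracle

# Seed the labeled set with one example from each class.
labeled = [int(np.argmax(y_hidden)), int(np.argmin(y_hidden))]
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

clf = LogisticRegression()
for _ in range(20):                          # 20 queries to the oracle
    clf.fit(X_pool[labeled], y_hidden[labeled])
    probs = clf.predict_proba(X_pool[unlabeled])
    uncertainty = 1 - probs.max(axis=1)      # least confident prediction
    query = unlabeled[int(np.argmax(uncertainty))]
    labeled.append(query)                    # "ask the oracle" for its label
    unlabeled.remove(query)

print("accuracy on the rest:", clf.score(X_pool[unlabeled], y_hidden[unlabeled]))
```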

Active learning application: image retrieval
The task is to retrieve images from a (usually large) image database according to user interest. This is very useful in digital libraries, digital photo albums, etc. (e.g. "Where are my photos taken at Guilin?").

Active learning: text-based image retrieval
- Every image is associated with a text annotation
- The user poses a keyword query
- The system retrieves images by matching the keyword against the annotations
For example, the query "tiger" may match both "tiger lily" and "white tiger" annotations.
[Figure: database, text interface, and text-based retrieval engine]

Co-training
In some applications there are two sufficient and redundant views, i.e. two attribute sets, each of which is sufficient for learning and conditionally independent of the other given the class label. For example, two views for web page classification are: 1) the text appearing on the page itself, and 2) the anchor text attached to hyperlinks pointing to this page from other pages.

Co-training (cont'd)
[Figure: two learners, one trained on the X1 view and one on the X2 view of the labeled training examples; each labels unlabeled training examples for the other. A. Blum & T. Mitchell, COLT'98]

- Theoretical analysis [Blum & Mitchell, COLT'98; Dasgupta, NIPS'01; Balcan et al., NIPS'04; etc.]
- Experimental studies [Nigam & Ghani, CIKM'00]
- New algorithms: co-training without two views [Goldman & Zhou, ICML'00; Zhou & Li, TKDE'05]; semi-supervised regression [Zhou & Li, IJCAI'05]
- Applications: statistical parsing [Sarkar, NAACL'01; Steedman et al., EACL'03; Hwa et al., ICML'03 workshop]; noun phrase identification [Pierce & Cardie, EMNLP'01]; image retrieval [Zhou et al., ECML'04; Zhou et al., TOIS'06]

Multi-view Learning and Co-training
Multi-view learning describes the setting of learning from data where observations are represented by multiple independent sets of features. An example with two views, where the features can be split into two sets:
- The instance space: $X = X_1 \times X_2$
- Each instance: $x = (x_1, x_2)$

Inductive vs. Transductive
- Transductive: produces labels only for the available unlabeled data; the output of the method is not a classifier.
- Inductive: not only produces labels for the unlabeled data, but also produces a classifier.

An example of two views
Web-page classification, e.g. finding homepages of faculty members:
- Page text: words occurring on the page itself, e.g. "research interest", "teaching"
- Hyperlink text: words occurring in hyperlinks that point to the page, e.g. "my advisor"

Another example
Classifying job postings for FlipDog:
- X1: job title
- X2: job description

Two views
- $C_1$: the set of target functions over $X_1$
- $C_2$: the set of target functions over $X_2$
- $C$: the set of target functions over $X = X_1 \times X_2$
Instead of learning a single function from $C$, multi-view learning aims to learn a pair of functions $(f_1, f_2)$ from $C_1 \times C_2$ such that $f_1(x_1) = f_2(x_2) = f(x)$.

Co-training
Proposed by Blum and Mitchell (1998); combines multi-view learning and semi-supervised learning. Related work includes (Yarowsky 1995), (Nigam and Ghani 2000), (Goldman and Zhou 2000), (Abney 2002), (Sarkar 2002), and others. Used in document classification, parsing, etc.

The Yarowsky Algorithm (Yarowsky 1995)
In each iteration: train a classifier on the current labeled data (supervised learning), choose the unlabeled instances it labels with high confidence, add them to the pool of labeled training data, and repeat.
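A minimal sketch of this self-training loop, assuming synthetic data, a naive Bayes base learner, and an illustrative confidence threshold:

```python
# Self-training: the classifier labels the unlabeled pool, the most confident
# predictions are added to the training set, and the classifier is retrained.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])

clf = GaussianNB().fit(X_lab, y_lab)
for _ in range(10):                                   # self-training iterations
    if len(X_unlab) == 0:
        break
    probs = clf.predict_proba(X_unlab)
    confident = probs.max(axis=1) > 0.95              # high-confidence instances
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])    # add self-labeled examples
    y_lab = np.concatenate([y_lab, probs[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]                     # remove them from the pool
    clf = GaussianNB().fit(X_lab, y_lab)
```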

Co-training Assumption 1: compatibility
Definition: the instance distribution $D$ is compatible with the target function $f = (f_1, f_2)$ if for any instance $x = (x_1, x_2)$ with non-zero probability, $f(x) = f_1(x_1) = f_2(x_2)$. In other words, each set of features is sufficient for classification.

Co-training Assumption 2: conditional independence
Definition: a pair of views $(x_1, x_2)$ satisfies view independence when
$P(X_1 = x_1 \mid X_2 = x_2, Y = y) = P(X_1 = x_1 \mid Y = y)$ and
$P(X_2 = x_2 \mid X_1 = x_1, Y = y) = P(X_2 = x_2 \mid Y = y)$.
A classification problem instance satisfies view independence when all pairs $(x_1, x_2)$ satisfy view independence.

Co-training Algorithm
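The algorithm itself appeared as a figure in the original slides. Below is a hedged sketch of the Blum & Mitchell (1998) loop under simplifying assumptions: the two views are just two feature matrices, naive Bayes is the base learner, the whole unlabeled pool is used instead of a small sampled pool U', and the growth counts p and n are illustrative.

```python
# Sketch of a co-training loop: two classifiers, one per view, each label
# their most confident unlabeled examples, which then grow the shared
# labeled set used to retrain both classifiers.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, p=1, n=1):
    """Each round, the classifier on each view labels its most confident
    positive (p) and negative (n) examples from the unlabeled pool."""
    h1 = h2 = None
    for _ in range(rounds):
        h1 = GaussianNB().fit(X1_l, y_l)          # classifier on view 1
        h2 = GaussianNB().fit(X2_l, y_l)          # classifier on view 2
        if len(X1_u) == 0:
            break
        new_idx, new_lab = [], []
        for h, X_view in ((h1, X1_u), (h2, X2_u)):
            probs = h.predict_proba(X_view)
            for cls, k in ((0, n), (1, p)):       # most confident per class
                for i in np.argsort(probs[:, cls])[-k:]:
                    if int(i) not in new_idx:
                        new_idx.append(int(i))
                        new_lab.append(cls)
        X1_l = np.vstack([X1_l, X1_u[new_idx]])   # grow the labeled set
        X2_l = np.vstack([X2_l, X2_u[new_idx]])
        y_l = np.concatenate([y_l, new_lab])
        keep = np.setdiff1d(np.arange(len(X1_u)), new_idx)  # shrink the pool
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1, h2

# Illustrative usage on synthetic two-view data.
rng = np.random.default_rng(4)
X1 = rng.normal(size=(200, 2))
X2 = rng.normal(size=(200, 2))
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)
lab = np.concatenate([np.where(y == 1)[0][:5], np.where(y == 0)[0][:5]])
unlab = np.setdiff1d(np.arange(200), lab)
h1, h2 = co_train(X1[lab], X2[lab], y[lab], X1[unlab], X2[unlab])
print(h1.score(X1[unlab], y[unlab]), h2.score(X2[unlab], y[unlab]))
```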

Co-Training (Blum and Mitchell 1998)
Instances contain two sufficient sets of features, i.e. an instance is $x = (x_1, x_2)$, and each set of features is called a view.
- The two views are independent given the label: $P(x_1 \mid x_2, y) = P(x_1 \mid y)$ and $P(x_2 \mid x_1, y) = P(x_2 \mid y)$
- The two views are consistent: $f_1(x_1) = f_2(x_2) = f(x)$

Co-Training (cont'd)
In each iteration: C1, a classifier trained on view 1, and C2, a classifier trained on view 2, are each allowed to label some unlabeled instances; the self-labeled instances are added to the pool of training data, and both classifiers are retrained.

Agreement Maximization
A side effect of co-training is agreement between the two views. Is it possible to pose agreement as the explicit goal? Yes: the resulting algorithm is Agreement Boost (Leskes 2005).

What if the co-training assumptions are not perfectly satisfied?
Idea: we want classifiers that produce a maximally consistent labeling of the data. If learning is an optimization problem, what function should we optimize?

Other Related Work
- Multi-view clustering (Bickel & Scheffer 2004): modified the co-training algorithm by replacing the class variable (class label) with a mixture coefficient, yielding a multi-view clustering algorithm.
- Manifold co-regularization (Sindhwani et al. 2005): extended manifold regularization to multi-view learning.
- Active multi-view learning (Muslea 2002): combines active learning and multi-view learning.
More related work can be found in the ICML 2005 Workshop on Learning with Multiple Views.

References
- A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of COLT, 1998.
- D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of ACL, 1995.
- K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of CIKM, 2000.
- S. Abney. Bootstrapping. In Proceedings of ACL, 2002.
- U. Brefeld and T. Scheffer. Co-EM support vector learning. In Proceedings of ICML, 2004.
- S. Bickel and T. Scheffer. Multi-view clustering. In Proceedings of ICDM, 2004.
- V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularization approach to semi-supervised learning with multiple views. In Workshop on Learning with Multiple Views at ICML, 2005.
- I. Muslea. Active Learning with Multiple Views. PhD thesis, University of Southern California, 2002.