Semi-supervised Learning
Rong Jin

Semi-supervised learning
- Label propagation
- Transductive learning
- Co-training
- Active learning

Label Propagation
- A toy problem
  - Each node in the graph is an example
  - Two examples are labeled
  - Most examples are unlabeled
- Compute the similarity $S_{ij}$ between examples
- Connect each example to its most similar examples
- How do we predict labels for the unlabeled nodes using this graph?
[Figure: graph with two labeled examples, the remaining unlabeled examples, and edge weights $w_{ij}$]

Label Propagation
- Forward propagation
- How should conflicting cases be resolved? What label should be given to a node that receives conflicting labels?

Label Propagation
- Let $S = [S_{i,j}]_{n \times n}$ be the similarity matrix
- Let $D$ be the diagonal matrix with $D_{i,i} = \sum_{j \neq i} S_{i,j}$
- Compute the normalized similarity matrix $S' = D^{-1/2} S D^{-1/2}$
- Let $Y$ be the initial assignment of class labels
  - $Y_i = 1$ when the i-th node is assigned to the positive class
  - $Y_i = -1$ when the i-th node is assigned to the negative class
  - $Y_i = 0$ when the i-th node is not initially labeled
- Let $F$ be the predicted class labels
  - The i-th node is assigned to the positive class if $F_i > 0$
  - The i-th node is assigned to the negative class if $F_i < 0$

Label Propagation
- One iteration: $F = Y + \alpha S' Y = (I + \alpha S') Y$, where $\alpha$ weights the propagated values
- Two iterations: $F = Y + \alpha S' Y + \alpha^2 S'^2 Y = (I + \alpha S' + \alpha^2 S'^2) Y$
- Infinitely many iterations: $F = \left( \sum_{n=0}^{\infty} \alpha^n S'^n \right) Y = (I - \alpha S')^{-1} Y$
- Any problems with such an approach?
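As a rough illustration of the infinite-iteration formula, the sketch below solves $(I - \alpha S') F = Y$ directly with NumPy rather than summing the series. The function name `propagate_labels`, the value $\alpha = 0.9$, and the six-node chain graph are assumptions made for this example and do not come from the slides. The series converges when $0 \le \alpha < 1$, and the sketch assumes every node has at least one neighbor.

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.9):
    """Closed-form label propagation: F = (I - alpha * S')^{-1} Y."""
    S = np.asarray(S, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d = S.sum(axis=1)                      # node degrees (S has a zero diagonal here)
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes no isolated nodes
    S_norm = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^{-1/2} S D^{-1/2}
    n = S.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S_norm, Y)

# Hypothetical toy graph: a chain 0-1-2-3-4-5 with unit similarities.
S = np.zeros((6, 6))
for i in range(5):
    S[i, i + 1] = S[i + 1, i] = 1.0

# Node 0 is labeled positive, node 5 negative, the rest are unlabeled.
Y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])

F = propagate_labels(S, Y)
labels = np.sign(F)    # nodes 0-2 come out positive, nodes 3-5 negative
```

Note that nothing in this computation forces $F_i$ to keep the sign of $Y_i$ on the labeled nodes in general.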

Label Consistency Problem
- The predicted vector $F$ may not be consistent with the initially assigned class labels $Y$

Energy Minimization
- Using the same notation
  - $S_{i,j}$: similarity between the i-th node and the j-th node
  - $Y$: initially assigned class labels
  - $F$: predicted class labels
- Energy: $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2$
- Goal: find a label assignment $F$ that is consistent with the labeled examples $Y$ and minimizes the energy function $E(F)$

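Harmonic Function
- $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2 = 2 F^T (D - S) F$ (the sum counts each pair twice; the constant factor does not change the minimizer)
- Thus the minimizer of $E(F)$ should satisfy $(D - S) F = 0$, while $F$ remains consistent with $Y$
- Partition into labeled ($l$) and unlabeled ($u$) parts: $F^T = (F_l^T, F_u^T)$, $Y^T = (Y_l^T, Y_u^T)$, with $F_l = Y_l$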
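One way to read the harmonic condition: with $F_l = Y_l$ fixed, requiring the unlabeled rows of $(D - S) F = 0$ to hold gives $(D_{uu} - S_{uu}) F_u = S_{ul} Y_l$. The sketch below solves that linear system with NumPy. The function name `harmonic_solution` and the five-node chain are assumptions for illustration, and the sketch assumes every unlabeled node is connected to some labeled node so the system is non-singular.

```python
import numpy as np

def harmonic_solution(S, labeled_idx, y_labeled):
    """Clamp F on labeled nodes and solve the unlabeled rows of (D - S) F = 0."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    L = np.diag(S.sum(axis=1)) - S                     # graph Laplacian D - S
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]     # D_uu - S_uu
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]       # -S_ul

    F = np.zeros(n)
    F[labeled_idx] = y_labeled                         # F_l = Y_l
    F[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return F

# Hypothetical toy graph: a chain 0-1-2-3-4 with unit similarities,
# node 0 labeled +1 and node 4 labeled -1.
S = np.zeros((5, 5))
for i in range(4):
    S[i, i + 1] = S[i + 1, i] = 1.0

F = harmonic_solution(S, labeled_idx=[0, 4], y_labeled=[1.0, -1.0])
# On a chain the result is the linear interpolation 1, 0.5, 0, -0.5, -1:
# every unlabeled value is the average of its neighbors.
```

Unlike the propagation formula, this solution agrees with $Y$ on the labeled nodes by construction.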

Optical Character Recognition
- Given an image of a digit, determine its value (for example, 1 or 2)
- Create a graph over the digit images

Optical Character Recognition
- #Labeled_Examples + #Unlabeled_Examples = 4000
- CMN: label propagation
- 1NN: for each unlabeled example, use the label of its closest labeled example

Spectral Graph Transducer
- Problem with the harmonic function
- Why could this happen?
- The condition $(D - S) F = 0$ does not hold in the constrained case

Spectral Graph Transducer
$\min_F \; F^T L F + c\,(F - Y)^T C (F - Y) \quad \text{s.t.} \quad F^T F = n,\; F^T e = 0$
- $C$ is the diagonal cost matrix: $C_{i,i} = 1$ if the i-th node is initially labeled, and zero otherwise
- The parameter $c$ controls the balance between the consistency requirement and energy minimization
- Can be solved efficiently by computing eigenvectors

Empirical Studies

Problems with Spectral Graph Transducer
$\min_F \; F^T L F + c\,(F - Y)^T C (F - Y) \quad \text{s.t.} \quad F^T F = n,\; F^T e = 0$
- The obtained solution differs from the desired one, which minimizes the energy function while remaining consistent with the labeled examples $Y$
- It is difficult to extend the approach to multi-class classification

Green's Function
- Minimizing the energy while remaining consistent with the initially assigned class labels can be formulated as a Green's function problem
- Minimizing $E(F) = F^T L F$ $\Rightarrow$ $LF = 0$
  - $L$ can be viewed as the Laplacian operator in the discrete case: $LF = 0 \Leftrightarrow \nabla^2 F = 0$
- Thus the problem is to find $F$ with $\nabla^2 F = 0$ subject to $F = Y$ for the labeled examples
  - The constraint $F = Y$ on the labeled examples acts as a boundary condition; because it fixes the values of $F$ on the boundary, it is a Dirichlet boundary condition
  - This is a standard Green's function problem

Why Energy Minimization?
[Figure: final classification results]

Label Propagation
- How do the unlabeled data help classification?
- Consider a smaller number of unlabeled examples
- Classification results can be very different

Cluster Assumption
- Cluster assumption: the decision boundary should pass through low-density regions
- Unlabeled data provide a more accurate estimate of the local density

Cluster Assumption vs. Maximum Margin
- Maximum margin classifier (e.g. SVM) with decision function $w \cdot x + b$
- Maximum margin $\Rightarrow$ low density around the decision boundary $\Rightarrow$ cluster assumption
- Any thoughts on utilizing the unlabeled data in a support vector machine?
[Figure: points labeled +1 and -1 separated by the hyperplane $w \cdot x + b$]

Transductive SVM
- Decision boundary given a small number of labeled examples
- How will the decision boundary change given both labeled and unlabeled examples?
- Move the decision boundary to a region with low local density
- Classification results
- How can this idea be formulated?

Transductive SVM: Formulation
- Labeled data $L = \{(x_1, y_1), \ldots, (x_n, y_n)\}$
- Unlabeled data $D = \{x_{n+1}, \ldots, x_{n+m}\}$
- Maximum margin principle for a mixture of labeled and unlabeled data
  - For each label assignment of the unlabeled data, compute its maximum margin
  - Find the label assignment whose maximum margin is largest
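To make the "try every label assignment" principle concrete, here is a minimal brute-force sketch on one-dimensional data, where the maximum margin of a separable labeling is simply half the gap between the two classes. The helper names `max_margin_1d` and `best_assignment` and the toy points are hypothetical. The sketch skips labelings that are not linearly separable and ignores any class-balance constraint a practical transductive SVM might impose.

```python
from itertools import product

def max_margin_1d(xs, ys):
    """Hard margin of the best threshold classifier on 1-D data (None if not separable)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    if not pos or not neg:
        return None                      # degenerate single-class labeling
    if max(neg) < min(pos):
        return (min(pos) - max(neg)) / 2
    if max(pos) < min(neg):
        return (min(neg) - max(pos)) / 2
    return None                          # classes interleave: not separable

def best_assignment(x_l, y_l, x_u):
    """Enumerate all 2^m labelings of the unlabeled points and keep the widest margin."""
    best_labels, best_margin = None, -1.0
    for y_u in product([-1, 1], repeat=len(x_u)):
        margin = max_margin_1d(list(x_l) + list(x_u), list(y_l) + list(y_u))
        if margin is not None and margin > best_margin:
            best_labels, best_margin = y_u, margin
    return best_labels, best_margin

# Two labeled points and five unlabeled points forming two clumps.
x_l, y_l = [0.0, 10.0], [-1, 1]
x_u = [1.0, 2.0, 3.0, 8.0, 9.0]

labels, margin = best_assignment(x_l, y_l, x_u)
# The widest gap lies between 3.0 and 8.0, so the boundary moves from 5.0
# (labeled data only) to 5.5, inside the low-density region.
```

The loop visits $2^m$ assignments, which is why this direct formulation does not scale beyond toy problems.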

Transductive SVM
- Different label assignments for the unlabeled data $\Rightarrow$ different maximum margins

Transductive SVM: Formulation
- Original SVM vs. transductive SVM
- The transductive SVM adds constraints for the unlabeled data
- A binary variable for the label of each unlabeled example
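The equations on this slide did not survive extraction. For orientation only, a commonly used soft-margin form of the two problems is sketched below; it may differ from what the slide showed. The slack variables $\xi_i$, $\eta_j$ and the trade-off constants $C$, $C^*$ are assumptions introduced here, with $n$ labeled and $m$ unlabeled examples.

```latex
% Original SVM (labeled data only)
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\ldots,n
\end{aligned}

% Transductive SVM (labels of the unlabeled data become variables)
\begin{aligned}
\min_{y_{n+1},\ldots,y_{n+m}}\ \min_{w,\,b,\,\xi,\,\eta}\quad
  & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + C^{*} \sum_{j=n+1}^{n+m} \eta_j \\
\text{s.t.}\quad & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\ldots,n \\
  & y_j\,(w \cdot x_j + b) \ge 1 - \eta_j,\quad \eta_j \ge 0,\quad j = n+1,\ldots,n+m \\
  & y_j \in \{-1, +1\},\quad j = n+1,\ldots,n+m
\end{aligned}
```

The binary variables $y_j$ are what distinguish the second problem from the first.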

Computational Issue
- No longer a convex optimization problem. (Why?)
- How to optimize the transductive SVM?
- Alternating optimization

Alternating Optimization
- Step 1: fix $y_{n+1}, \ldots, y_{n+m}$ and learn the weights $w$
- Step 2: fix the weights $w$ and predict $y_{n+1}, \ldots, y_{n+m}$ (How?)
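The two steps can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the procedure from the slides: the linear SVM is trained by plain full-batch subgradient descent on the hinge loss, Step 2 assigns each unlabeled point the label $\mathrm{sign}(w \cdot x + b)$ (for fixed $w$ this choice minimizes that point's hinge loss), and the names `train_linear_svm`, `transductive_svm`, `c_unlabeled` and all hyperparameter values are hypothetical. Practical transductive SVM solvers add safeguards such as class-balance constraints.

```python
import numpy as np

def train_linear_svm(X, y, weights, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on a weighted hinge loss plus L2 penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = (margins < 1) * weights      # weighted mask of margin violators
        w -= lr * (lam * w - (active * y) @ X / n)
        b -= lr * (-np.sum(active * y) / n)
    return w, b

def transductive_svm(X_l, y_l, X_u, c_unlabeled=0.5, rounds=10):
    # Initialise the unknown labels with an SVM trained on labeled data only.
    w, b = train_linear_svm(X_l, y_l, np.ones(len(y_l)))
    y_u = np.where(X_u @ w + b >= 0, 1.0, -1.0)
    X_all = np.vstack([X_l, X_u])
    weights = np.concatenate([np.ones(len(y_l)), np.full(len(X_u), c_unlabeled)])
    for _ in range(rounds):
        # Step 1: labels fixed, learn the weights.
        w, b = train_linear_svm(X_all, np.concatenate([y_l, y_u]), weights)
        # Step 2: weights fixed, re-predict the unlabeled labels.
        new_y_u = np.where(X_u @ w + b >= 0, 1.0, -1.0)
        if np.array_equal(new_y_u, y_u):      # labels stopped changing
            break
        y_u = new_y_u
    return w, b, y_u

# Toy data: one labeled point per class, six unlabeled points in two clumps.
X_l = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_l = np.array([-1.0, 1.0])
X_u = np.array([[-2.0, 1.0], [-2.0, -1.0], [-3.0, 0.0],
                [2.0, 1.0], [2.0, -1.0], [3.0, 0.0]])

w, b, y_u = transductive_svm(X_l, y_l, X_u)
```

Because each step only improves the objective given the other block of variables, the loop can stop at a local optimum that depends on the initial labeling.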

Empirical Study with Transductive SVM
- 10 categories from the Reuters collection
- 3299 test documents
- 1000 informative words selected using the mutual information (MI) criterion

Co-training for Semi-supervised Learning
- Consider the task of classifying web pages into two categories: pages of students and pages of professors
- Two aspects of web pages should be considered
  - Content of web pages: "I am currently the second year Ph.D. student …"
  - Hyperlinks: "My advisor is …", "Students: …"

Co-training for Semi-Supervised Learning
- It is easy to classify the type of this web page based on its content
- It is easier to classify this web page using hyperlinks

Co-training
- Two representations for each web page
  - Content representation: (doctoral, student, computer, university, …)
  - Hyperlink representation: inlinks: Prof. Cheng; outlinks: Prof. Cheng

Co-training: Classification Scheme
1. Train a content-based classifier using the labeled web pages
2. Apply the content-based classifier to the unlabeled web pages
3. Label the web pages that have been confidently classified
4. Train a hyperlink-based classifier using the web pages that were initially labeled and those labeled by the content-based classifier
5. Apply the hyperlink-based classifier to the remaining unlabeled web pages
6. Label the web pages that have been confidently classified
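The six steps above can be sketched with two feature views of the same examples. This is a minimal illustration, assuming a nearest-centroid classifier for each view and a distance-gap confidence score; both choices, the names `fit_centroids`, `predict_with_confidence` and `co_train`, the threshold, and the toy numbers are hypothetical stand-ins for the content-based and hyperlink-based classifiers.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class (+1 and -1)."""
    return X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)

def predict_with_confidence(centroids, X):
    pos, neg = centroids
    d_pos = np.linalg.norm(X - pos, axis=1)
    d_neg = np.linalg.norm(X - neg, axis=1)
    score = d_neg - d_pos                  # > 0: closer to the positive centroid
    return np.where(score >= 0, 1, -1), np.abs(score)

def co_train(view_a, view_b, y, threshold=1.0, rounds=5):
    """y holds +1/-1 for labeled examples and 0 for unlabeled ones."""
    y = y.copy()
    for _ in range(rounds):
        for view in (view_a, view_b):      # steps 1-3 on view A, steps 4-6 on view B
            labeled = y != 0
            unlabeled_idx = np.flatnonzero(~labeled)
            if unlabeled_idx.size == 0:
                return y
            model = fit_centroids(view[labeled], y[labeled])
            pred, conf = predict_with_confidence(model, view[unlabeled_idx])
            confident = conf >= threshold  # keep only confident predictions
            y[unlabeled_idx[confident]] = pred[confident]
    return y

# Six pages; view A plays the role of content, view B the role of hyperlinks.
view_a = np.array([[0.0], [4.0], [0.2], [3.8], [2.1], [1.9]])
view_b = np.array([[0.0], [4.0], [2.0], [2.0], [3.9], [0.1]])
y = np.array([-1, 1, 0, 0, 0, 0])          # only the first two pages are labeled

result = co_train(view_a, view_b, y)
# Pages 2 and 3 are labeled confidently from view A; pages 4 and 5 are
# ambiguous in view A and are only labeled once view B is used.
```

In this toy run neither view alone labels every page confidently, which is the situation co-training is designed for.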

Co-training
- Train a content-based classifier using labeled examples
- Label the unlabeled examples that are confidently classified
- Train a hyperlink-based classifier (e.g. Prof.: outlinks to students)
- Label the unlabeled examples that are confidently classified