Semi-supervised Machine Learning
Gergana Lazarova
Sofia University “St. Kliment Ohridski”
Semi-Supervised Learning
- Training data consists of labeled and unlabeled examples.
- Usually the number of unlabeled examples is much larger than the number of labeled ones.
- Unlabeled examples are easy to collect.
Self-Training
- At first, only the labeled instances are used for learning.
- The resulting classifier then predicts the labels of the unlabeled instances.
- A portion of the newly labeled examples (formerly unlabeled) augments the set of labeled examples, and the classifier is retrained.
- The procedure is iterative (see the sketch below).
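A minimal self-training sketch, assuming scikit-learn-style base learners (LogisticRegression stands in for any classifier with fit/predict_proba; the confidence threshold and iteration cap are illustrative choices, not values from the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_iter=10):
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        pred = clf.predict(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is predicted confidently enough; stop early
        # Move the confidently pseudo-labeled examples into the labeled set
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pred[confident]])
        X_unlab = X_unlab[~confident]
    return clf
```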
Cluster-then-label first clusters all instances (labeled and unlabeled) into k groups using an unsupervised clustering algorithm. Then, for each cluster Cj, a supervised learner is trained on the labeled examples in Cj and used to classify the unlabeled examples that belong to Cj (see the sketch below).
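A minimal cluster-then-label sketch, assuming KMeans as the unsupervised clusterer and logistic regression as the per-cluster supervised learner (both are illustrative stand-ins; k and the fallback rule for label-free clusters are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def cluster_then_label(X_lab, y_lab, X_unlab, k=5):
    X_all = np.vstack([X_lab, X_unlab])
    clusters = KMeans(n_clusters=k, n_init=10).fit_predict(X_all)
    lab_c, unlab_c = clusters[:len(X_lab)], clusters[len(X_lab):]
    # Fallback: a global classifier covers clusters without labeled points
    global_clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    y_pred = global_clf.predict(X_unlab)
    for j in range(k):
        in_lab, in_unlab = lab_c == j, unlab_c == j
        if not in_unlab.any() or not in_lab.any():
            continue
        if len(np.unique(y_lab[in_lab])) > 1:
            # Train a supervised learner on the labeled points in cluster j
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X_lab[in_lab], y_lab[in_lab])
            y_pred[in_unlab] = clf.predict(X_unlab[in_unlab])
        else:
            # Only one class present in the cluster: assign it directly
            y_pred[in_unlab] = y_lab[in_lab][0]
    return y_pred
```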
Semi-supervised Support Vector Machines
Since unlabeled examples have no labels, we do not know on which side of the decision boundary they lie. Unlabeled points are therefore penalized with the hat loss, max(0, 1 − |f(x)|), which is zero when a point lies confidently on either side of the boundary and largest on the boundary itself.
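A tiny sketch of the hat loss alongside the usual hinge loss for labeled points, to make the symmetry explicit (f_x is the classifier's real-valued output):

```python
import numpy as np

def hinge_loss(f_x, y):
    """Hinge loss for labeled points, with y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * f_x)

def hat_loss(f_x):
    """Hat loss for unlabeled points: penalizes |f(x)| < 1, i.e. points
    inside the margin, regardless of which side of the boundary they fall."""
    return np.maximum(0.0, 1.0 - np.abs(f_x))
```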
Graph-based Semi-supervised Learning
Graph-based semi-supervised learning constructs a graph from the training examples. The nodes of the graph are the data points (labeled and unlabeled) and the edges represent similarities between points.
Fig. 1 A semi-supervised graph
An edge between two vertices i and j carries a weight w_ij representing their similarity: the closer the two vertices, the higher w_ij. The MinCut algorithm finds a minimum set of edges whose removal blocks all flow from one class's labeled vertices to the other's; the resulting partition labels the unlabeled vertices (see the sketch below).
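A MinCut sketch using networkx for the flow computation, assuming binary labels and a Gaussian similarity kernel (the kernel, sigma, and the dense O(n²) graph construction are illustrative assumptions):

```python
import numpy as np
import networkx as nx

def mincut_ssl(X, y, labeled_idx, sigma=1.0):
    n = len(X)
    G = nx.DiGraph()
    # Symmetric similarity edges (Gaussian kernel) between all point pairs
    for i in range(n):
        for j in range(i + 1, n):
            w = float(np.exp(-np.sum((X[i] - X[j]) ** 2) / (2 * sigma ** 2)))
            G.add_edge(i, j, capacity=w)
            G.add_edge(j, i, capacity=w)
    # Labeled points are tied to virtual terminals; edges without a
    # capacity attribute are treated as having infinite capacity
    for i in labeled_idx:
        if y[i] == 1:
            G.add_edge('s', i)
        else:
            G.add_edge(i, 't')
    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    return np.array([1 if i in source_side else 0 for i in range(n)])
```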
Semi-supervised Multi-view Learning
Fig. 2 Semi-supervised Multi-view Learning
Multi-View Learning – examples
Fig. 3 – Multiple Sources of Information
Semi-supervised Multi-view Learning
Co-training: the algorithm augments the set of labeled examples of each classifier based on the other learner's predictions (see the sketch below). It assumes that:
(1) each view (set of features) is sufficient for classification;
(2) the two views (the feature sets of each instance) are conditionally independent given the class.
Co-EM is a related variant that exchanges probabilistic labels for all unlabeled examples at every iteration instead of committing to only the most confident ones.
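A co-training sketch with Gaussian naïve Bayes as both base learners (the learner choice, the number of rounds, and the per-round count are illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=10, per_round=2):
    lab = list(labeled_idx)
    unlab = [i for i in range(len(y)) if i not in set(lab)]
    y = np.array(y)
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        c1.fit(X1[lab], y[lab])
        c2.fit(X2[lab], y[lab])
        if not unlab:
            break
        # Each view labels the examples it is most confident about and
        # hands them to the shared labeled pool for the other view
        for clf, X in ((c1, X1), (c2, X2)):
            if not unlab:
                break
            proba = clf.predict_proba(X[unlab])
            best = np.argsort(proba.max(axis=1))[-per_round:]
            for b in sorted(best, reverse=True):
                i = unlab.pop(b)
                y[i] = clf.classes_[proba[b].argmax()]
                lab.append(i)
    return c1, c2
```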
Multi-View Learning – error minimization
Loss function: measures the cost of the prediction f(x) when the true label is y.
Risk: the risk associated with f is defined as the expectation of the loss function.
Empirical risk: the average loss of f on a labeled training set.
Multi-view minimization problem: typically minimizes each view's empirical risk together with a penalty on disagreement between the views on unlabeled data (a standard form is sketched below).
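The definitions above written out, together with a common co-regularized form of the two-view objective (a sketch of the standard formulation; the weight λ and the squared-difference disagreement term are assumptions, not necessarily the exact objective of the talk):

```latex
% Risk and empirical risk
R(f) = \mathbb{E}_{(x,y)}\bigl[L(f(x), y)\bigr], \qquad
\hat{R}(f) = \frac{1}{l}\sum_{i=1}^{l} L\bigl(f(x_i), y_i\bigr)

% Co-regularized two-view objective (sketch)
\min_{f_1, f_2}\;
\sum_{i=1}^{l} L\bigl(f_1(x_i^{(1)}), y_i\bigr)
+ \sum_{i=1}^{l} L\bigl(f_2(x_i^{(2)}), y_i\bigr)
+ \lambda \sum_{i=l+1}^{l+u} \bigl(f_1(x_i^{(1)}) - f_2(x_i^{(2)})\bigr)^{2}
```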
Semi-supervised Multi-view Genetic Algorithm
- Minimizes the semi-supervised multi-view learning error.
- It can be applied to multiple sources of data.
- It works for both convex and non-convex functions; approaches based on gradient descent are only guaranteed to find the global optimum of a convex function, and non-convex optimization is a hard problem.
Individual (chromosome): the concatenated weight vectors of all views —
view 1: w11 … w1s | … | view j: wj1 … wjl | … | view k: wk1 … wkp
Fitness function: the semi-supervised multi-view learning error of the weights encoded by the individual.
Crossover and mutation must not change the size of the chromosome and must not mix the features of different views (see the operators sketched below).
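A sketch of view-respecting GA operators, assuming a real-valued chromosome stored as one NumPy array with known per-view block sizes and at least two views (the mutation rate and scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(parent_a, parent_b, view_sizes):
    """One-point crossover at a view boundary: chromosome size is preserved
    and features of different views are never mixed."""
    bounds = np.cumsum(view_sizes)[:-1]   # cut points between view blocks
    cut = rng.choice(bounds)
    return np.concatenate([parent_a[:cut], parent_b[cut:]])

def mutate(chrom, rate=0.05, scale=0.1):
    """Gaussian perturbation of individual genes; size stays fixed."""
    mask = rng.random(len(chrom)) < rate
    chrom = chrom.copy()
    chrom[mask] += rng.normal(0.0, scale, mask.sum())
    return chrom
```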
Experimental Results: “Diabetes” (UCI Machine Learning Repository)
Views: k = 2, x = (x(1), x(2)); MAX_ITER = 20000, N = 100.

Table 2. Comparison to supervised equivalents

Algorithm                        % labeled examples   RMSE
SSMVGA                           3%                   0.63
Linear regression                90%                  0.40
kNN                              90%                  0.45
Backpropagation (steps = 5000)   90%                  0.54
Sentiment analysis in Bulgarian
- Most sentiment-analysis research has been conducted on English text.
- Sentiment analysis in Bulgarian suffers from a shortage of labeled examples.
- A sentiment analysis system for Bulgarian: each instance has attributes from multiple sources of data (a Bulgarian view and an English view).
Dataset
English reviews – Amazon
Bulgarian reviews –
Big Data
Bulgarian view: 17099 features
English view: 12391 features
Fig. 4 Big Data - Modelling
Examples (1)
Rating: **
F(SSMVGA) = F(supervised) = 3.13
Examples (2)
Rating: **
F(SSMVGA) = F(supervised) = 1.98
Examples (3)
Rating: *****
F(SSMVGA) = F(supervised) = 1.98
Multi-view Teaching Algorithm
- A semi-supervised two-view learning algorithm.
- A modification of the standard co-training algorithm: only the weaker classifier is improved, using only the most confident examples of the stronger view (see the sketch below).
- The views are combined for the final prediction.
- Application: object segmentation.
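A sketch of the teaching idea under the assumptions stated above: the stronger view (here judged by accuracy on the original labeled set — an assumption, the talk may use another criterion) pseudo-labels its most confident unlabeled examples, and only the weaker classifier is retrained on them. Gaussian naïve Bayes is an illustrative base learner:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def teach(X1_l, X2_l, y_l, X1_u, X2_u, per_round=10, rounds=5):
    # Per-view labeled pools; each grows as pseudo-labels arrive for it
    A1, y1 = X1_l.copy(), y_l.copy()
    A2, y2 = X2_l.copy(), y_l.copy()
    c1 = GaussianNB().fit(A1, y1)
    c2 = GaussianNB().fit(A2, y2)
    for _ in range(rounds):
        if len(X1_u) == 0:
            break
        # The stronger view teaches; only the weaker one is retrained
        if c1.score(X1_l, y_l) >= c2.score(X2_l, y_l):
            proba = c1.predict_proba(X1_u)
            top = np.argsort(proba.max(axis=1))[-per_round:]
            A2 = np.vstack([A2, X2_u[top]])
            y2 = np.concatenate([y2, c1.classes_[proba[top].argmax(axis=1)]])
            c2 = GaussianNB().fit(A2, y2)
        else:
            proba = c2.predict_proba(X2_u)
            top = np.argsort(proba.max(axis=1))[-per_round:]
            A1 = np.vstack([A1, X1_u[top]])
            y1 = np.concatenate([y1, c2.classes_[proba[top].argmax(axis=1)]])
            c1 = GaussianNB().fit(A1, y1)
        keep = np.setdiff1d(np.arange(len(X1_u)), top)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return c1, c2
```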
A Semi-supervised Image Segmentation System
- A “teacher” labels a few points of each class, giving the algorithm an idea of the clusters.
- The aim is to augment the training set with more labeled examples, yielding a better predictor.
- The first view contains the coordinates of the pixels: view1 = (x, y).
- The second view contains the RGB values of the pixels (red, green and blue values ranging from 0 to 255); see the sketch below.
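A small sketch of building the two pixel views from an image, assuming an H×W×3 uint8 RGB array (e.g. as loaded by PIL or imageio):

```python
import numpy as np

def pixel_views(img):
    """Split an H x W x 3 RGB image into the two views used by the system."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    view1 = np.stack([xs.ravel(), ys.ravel()], axis=1)  # coordinates (x, y)
    view2 = img.reshape(-1, 3).astype(float)            # colors (r, g, b)
    return view1, view2
```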
Dataset
Fig. 5 – Original image, desired segmentation
Experimental Results
Two experiments:
1. Comparison of the multi-view teaching algorithm based on naïve Bayes classifiers (as the underlying learners) to a supervised naïve Bayes classifier.
2. Comparison of the multi-view teaching algorithm based on a multivariate normal distribution (MND-MVTA) to a Bayesian supervised classifier based on a multivariate normal distribution (MND-SL).
Results (1): comparison of the multi-view teaching algorithm based on naïve Bayes classifiers (as the underlying learners) to a supervised naïve Bayes classifier. Each pixel of the image is an instance; at each cross-validation step only a small number of labeled pixels is used. Multiple tests were run, varying the number of labeled examples (4, 6, 10, 16, 20, 50 pixels).

Table 4. Accuracy based on the number of labeled examples

Algorithm    4        6        10       16       20       50
NB           63.30%   76.23%   85.44%   89.57%   90.33%   92.37%
MTA          68.62%   81.30%   88.14%   90.74%   91.24%   92.51%
Results (1), continued: NB vs. MVTA with 16 labeled examples.

Table 5. Comparison of the NB and MVTA algorithms

           MVTA     NB
Image 1    90.74%   89.57%
Image 2    80.76%   78.82%
Image 3    90.10%   89.12%
Results (2): comparison of the multi-view teaching algorithm based on a multivariate normal distribution (MND-MVTA) to a Bayesian supervised classifier based on a multivariate normal distribution (MND-SL), with 16 labeled examples.

Table 6. Comparison of MND-MVTA and MND-SL

           MND-MVTA   MND-SL
Image 1    84.36%     79.22%
Image 2    79.14%     73.74%
Image 3    86.02%     80.18%
Examples: segmentation results of the multi-view teaching algorithm vs. the supervised naïve Bayes classifier
Thank you for your attention!