Semi-Supervised Learning
Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019
Administrative: HW 4 due April 10
Recommender Systems
- Motivation
- Problem formulation
- Content-based recommendations
- Collaborative filtering
- Mean normalization
Problem motivation

Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action)
Love at last         | 5         | 5       | 0         | 0        | 0.9          | 0
Romance forever      | 5         | ?       | ?         | 0        | 1.0          | 0.01
Cute puppies of love | ?         | 4       | 0         | ?        | 0.99         | 0
Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1          | 1.0
Swords vs. karate    | 0         | 0       | 5         | ?        | 0            | 0.9
Problem motivation

Suppose instead that we are given each user's parameter vector:
θ^(1) = [0; 5; 0], θ^(2) = [0; 5; 0], θ^(3) = [0; 0; 5], θ^(4) = [0; 0; 5]

Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action)
Love at last         | 5         | 5       | 0         | 0        | ?            | ?
Romance forever      | 5         | ?       | ?         | 0        | ?            | ?
Cute puppies of love | ?         | 4       | 0         | ?        | ?            | ?
Nonstop car chases   | 0         | 0       | 5         | 4        | ?            | ?
Swords vs. karate    | 0         | 0       | 5         | ?        | ?            | ?

From the ratings, can we infer the feature vector x^(1) for "Love at last"?
Optimization algorithm

Given θ^(1), θ^(2), …, θ^(n_u), to learn x^(i):

  min_{x^(i)}  (1/2) Σ_{j: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{k=1}^{n} (x_k^(i))²

Given θ^(1), θ^(2), …, θ^(n_u), to learn all of x^(1), x^(2), …, x^(n_m):

  min_{x^(1), …, x^(n_m)}  (1/2) Σ_{i=1}^{n_m} Σ_{j: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{i=1}^{n_m} Σ_{k=1}^{n} (x_k^(i))²
Collaborative filtering

Given x^(1), x^(2), …, x^(n_m) (and movie ratings), we can estimate θ^(1), θ^(2), …, θ^(n_u).
Given θ^(1), θ^(2), …, θ^(n_u), we can estimate x^(1), x^(2), …, x^(n_m).
Collaborative filtering optimization objective

Given x^(1), …, x^(n_m), estimate θ^(1), …, θ^(n_u):

  min_{θ^(1), …, θ^(n_u)}  (1/2) Σ_{j=1}^{n_u} Σ_{i: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{j=1}^{n_u} Σ_{k=1}^{n} (θ_k^(j))²

Given θ^(1), …, θ^(n_u), estimate x^(1), …, x^(n_m):

  min_{x^(1), …, x^(n_m)}  (1/2) Σ_{i=1}^{n_m} Σ_{j: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{i=1}^{n_m} Σ_{k=1}^{n} (x_k^(i))²
Collaborative filtering optimization objective

Rather than alternating, minimize over x^(1), …, x^(n_m) and θ^(1), …, θ^(n_u) simultaneously:

  J = (1/2) Σ_{(i,j): r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{j=1}^{n_u} Σ_{k=1}^{n} (θ_k^(j))²  +  (λ/2) Σ_{i=1}^{n_m} Σ_{k=1}^{n} (x_k^(i))²
Collaborative filtering optimization objective

  J(x^(1), …, x^(n_m), θ^(1), …, θ^(n_u)) = (1/2) Σ_{(i,j): r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j))²  +  (λ/2) Σ_{j=1}^{n_u} Σ_{k=1}^{n} (θ_k^(j))²  +  (λ/2) Σ_{i=1}^{n_m} Σ_{k=1}^{n} (x_k^(i))²
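As a sketch, this objective can be computed in a few lines of NumPy, assuming a ratings matrix Y and an indicator matrix R with R[i, j] = 1 iff user j rated movie i (the names `cost`, `X`, `Theta` are illustrative, not from the slides):

```python
import numpy as np

def cost(X, Theta, Y, R, lam):
    """Regularized collaborative-filtering cost J.

    X:     (n_m, n) movie feature vectors, one row per movie
    Theta: (n_u, n) user parameter vectors, one row per user
    Y:     (n_m, n_u) ratings matrix
    R:     (n_m, n_u) indicator, R[i, j] = 1 iff user j rated movie i
    """
    E = (X @ Theta.T - Y) * R          # errors only where a rating exists
    return (0.5 * np.sum(E ** 2)
            + 0.5 * lam * np.sum(Theta ** 2)
            + 0.5 * lam * np.sum(X ** 2))
```

Masking the error matrix with R implements the "Σ over (i, j) with r(i,j)=1" restriction without any explicit loops.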
Collaborative filtering algorithm

1. Initialize x^(1), …, x^(n_m), θ^(1), …, θ^(n_u) to small random values.
2. Minimize J(x^(1), …, x^(n_m), θ^(1), …, θ^(n_u)) using gradient descent (or an advanced optimization algorithm). For every j = 1, …, n_u and i = 1, …, n_m:

   x_k^(i) := x_k^(i) − α ( Σ_{j: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j)) θ_k^(j) + λ x_k^(i) )
   θ_k^(j) := θ_k^(j) − α ( Σ_{i: r(i,j)=1} ((θ^(j))ᵀ x^(i) − y^(i,j)) x_k^(i) + λ θ_k^(j) )

3. For a user with parameters θ and a movie with (learned) features x, predict a star rating of θᵀx.
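The two update rules above can be written as one vectorized step. This is a minimal sketch using the same Y/R matrix convention as before, with rows of X and Theta holding x^(i) and θ^(j) (the function name is illustrative):

```python
import numpy as np

def gradient_step(X, Theta, Y, R, lam, alpha):
    """One full gradient-descent update on all x^(i) and all theta^(j)."""
    E = (X @ Theta.T - Y) * R           # (n_m, n_u) errors on rated entries
    X_grad = E @ Theta + lam * X        # dJ/dX: sums over users who rated movie i
    Theta_grad = E.T @ X + lam * Theta  # dJ/dTheta: sums over movies user j rated
    return X - alpha * X_grad, Theta - alpha * Theta_grad
```

Repeated calls with a small α drive down J; in practice one would monitor the cost and stop when it plateaus.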
Collaborative filtering

Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4)
Love at last         | 5         | 5       | 0         | 0
Romance forever      | 5         | ?       | ?         | 0
Cute puppies of love | ?         | 4       | 0         | ?
Nonstop car chases   | 0         | 0       | 5         | 4
Swords vs. karate    | 0         | 0       | 5         | ?
Collaborative filtering

Predicted ratings:

  Y = X Θᵀ,  where X has rows (x^(1))ᵀ, …, (x^(n_m))ᵀ and Θ has rows (θ^(1))ᵀ, …, (θ^(n_u))ᵀ

This is a low-rank matrix factorization.
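A minimal illustration of the factorization view, assuming random features and parameters (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 2                  # movies, users, latent dimensions
X = rng.normal(size=(n_m, n))          # rows are (x^(i))^T
Theta = rng.normal(size=(n_u, n))      # rows are (theta^(j))^T

# Every predicted rating at once: Y = X Theta^T, a matrix of rank at most n
Y_pred = X @ Theta.T
```

Because Y_pred is a product of an n_m×n and an n×n_u matrix, its rank is at most n, which is what "low-rank" refers to.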
Finding related movies/products

For each product i, we learn a feature vector x^(i) ∈ ℝⁿ (e.g., x_1: romance, x_2: action, x_3: comedy, …).

How do we find movies j related to movie i? A small distance ‖x^(i) − x^(j)‖ means movies i and j are "similar".
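The similarity search can be sketched as follows; the feature values in the example reuse the x1/x2 numbers from the earlier table, and `most_similar` is an illustrative helper:

```python
import numpy as np

def most_similar(X, i, k=2):
    """Indices of the k movies whose feature vectors are closest to movie i."""
    d = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    d[i] = np.inf                          # exclude the movie itself
    return np.argsort(d)[:k]
```

For the romance/action features from the table, the nearest neighbors of "Love at last" are the other two romance movies, as expected.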
Users who have not rated any movies

Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5)
Love at last         | 5         | 5       | 0         | 0        | ?
Romance forever      | 5         | ?       | ?         | 0        | ?
Cute puppies of love | ?         | 4       | 0         | ?        | ?
Nonstop car chases   | 0         | 0       | 5         | 4        | ?
Swords vs. karate    | 0         | 0       | 5         | ?        | ?

Eve has rated nothing, so no term of the data part of J involves θ^(5); only the regularization term (λ/2) Σ_k (θ_k^(5))² does. Minimizing J therefore gives θ^(5) = [0; 0], and every predicted rating (θ^(5))ᵀ x^(i) = 0.
Mean normalization

Let μ_i be the mean of the observed ratings of movie i. Subtract μ_i from each rating, learn θ^(j) and x^(i) on the normalized ratings, and for user j on movie i predict:

  (θ^(j))ᵀ x^(i) + μ_i

For user 5 (Eve), θ^(5) = [0; 0], so the prediction falls back to (θ^(5))ᵀ x^(i) + μ_i = μ_i, the movie's average rating.
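A minimal sketch of the normalization step, assuming the same Y/R matrix convention as above (`mean_normalize` is an illustrative name):

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean observed rating; return (Y_norm, mu)."""
    counts = R.sum(axis=1)
    counts[counts == 0] = 1                 # avoid divide-by-zero for unrated movies
    mu = (Y * R).sum(axis=1) / counts       # mean over observed ratings only
    return (Y - mu[:, None]) * R, mu
```

After training on Y_norm, predictions are made as X @ Theta.T + mu[:, None], which returns μ_i for users with no ratings.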
Review: Supervised Learning

- K-nearest neighbors
- Linear regression
- Naïve Bayes
- Logistic regression
- Support vector machines
- Neural networks
Review: Unsupervised Learning

- Clustering, K-means
- Expectation maximization
- Dimensionality reduction
- Anomaly detection
- Recommendation systems
Advanced Topics

- Semi-supervised learning
- Probabilistic graphical models
- Generative models
- Sequence prediction models
- Deep reinforcement learning
Semi-supervised Learning

- Motivation
- Problem formulation
- Consistency regularization
- Entropy-based methods
- Pseudo-labeling
Classic Paradigm Insufficient Nowadays

Modern applications involve massive amounts of raw data, of which only a tiny fraction can be annotated by human experts: protein sequences, billions of webpages, images.
Semi-supervised Learning

Active Learning
Semi-supervised Learning Problem Formulation

Labeled data: D_l = { (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m_l), y^(m_l)) }
Unlabeled data: D_u = { x^(1), x^(2), …, x^(m_u) } (no labels)
Goal: learn a hypothesis h_θ (e.g., a classifier) that has small error.
Combining labeled and unlabeled data: classical methods

- Transductive SVM [Joachims '99]
- Co-training [Blum and Mitchell '98]
- Graph-based methods [Blum and Chawla '01] [Zhu, Ghahramani, Lafferty '03]
Transductive SVM

The separator goes through low-density regions of the space (large margin).
SVM vs. Transductive SVM

SVM — inputs: (x_l^(i), y_l^(i)):

  min_θ Σ_j θ_j²
  s.t. y_l^(i) θᵀ x_l^(i) ≥ 1

Transductive SVM — inputs: (x_l^(i), y_l^(i)) and unlabeled x_u^(i); optimize over the unknown labels y_u^(i) as well:

  min_{θ, y_u} Σ_j θ_j²
  s.t. y_l^(i) θᵀ x_l^(i) ≥ 1
       y_u^(i) θᵀ x_u^(i) ≥ 1
       y_u^(i) ∈ {−1, +1}
Transductive SVMs

First maximize the margin over the labeled points. Use the resulting separator to give initial labels to the unlabeled points. Then try flipping labels of unlabeled points to see if doing so can increase the margin.
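The labeling-and-flipping steps can be sketched as follows. This is a deliberate simplification: in the full transductive SVM the separator θ is re-trained after flips, whereas here θ is held fixed and a pair swap (which preserves the class balance) is kept only if it lowers the total hinge loss. All function names are illustrative:

```python
import numpy as np

def hinge(theta, X, y):
    """Total hinge loss: sum of max(0, 1 - y_i * theta^T x_i)."""
    return np.maximum(0.0, 1.0 - y * (X @ theta)).sum()

def assign_and_flip(theta, X_u):
    """Label unlabeled points with the separator, then greedily swap pairs of
    opposite labels whenever the swap lowers the hinge loss."""
    y_u = np.where(X_u @ theta >= 0, 1.0, -1.0)   # initial labels from separator
    improved = True
    while improved:
        improved = False
        for i in range(len(y_u)):
            for j in range(len(y_u)):
                if y_u[i] != y_u[j]:
                    y_try = y_u.copy()
                    y_try[i], y_try[j] = y_u[j], y_u[i]
                    if hinge(theta, X_u, y_try) < hinge(theta, X_u, y_u):
                        y_u, improved = y_try, True
    return y_u
```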
Deep Semi-supervised Learning
Stochastic Perturbations / Π-Model

Realistic perturbations x → x̂ of data points x ∈ D_UL should not significantly change the output of h_θ(x).
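The Π-model consistency term can be sketched as a mean squared difference between two stochastically perturbed forward passes. This is a minimal version: `h` stands in for any model h_θ and `perturb` for any stochastic augmentation, both supplied by the caller:

```python
import numpy as np

def consistency_loss(h, x_batch, perturb, rng):
    """Pi-model consistency term: the model's outputs under two independent
    random perturbations of the same inputs should agree."""
    z1 = h(perturb(x_batch, rng))
    z2 = h(perturb(x_batch, rng))
    return float(np.mean((z1 - z2) ** 2))
```

This term needs no labels, so it can be evaluated on unlabeled data and added to the supervised loss with a weighting coefficient.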
Temporal Ensembling: use an exponential moving average of the model's past predictions on each unlabeled example as the consistency target.

Mean Teacher: use a teacher model whose weights are an exponential moving average of the student's weights to produce consistency targets.

Virtual Adversarial Training: perturb each input in the adversarial direction (the direction that most changes the output) and penalize the change in prediction.
Entropy Minimization (EntMin)

Add the entropy of the model's predictions on unlabeled data to the loss, which encourages more confident (low-entropy) predictions on unlabeled data.
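The entropy penalty can be sketched as follows (a minimal NumPy version; `entropy_loss` takes a batch of predicted class probabilities, one row per example):

```python
import numpy as np

def entropy_loss(p):
    """Mean prediction entropy H = -sum_c p_c log p_c over a batch of
    class-probability rows p (shape: batch x classes)."""
    eps = 1e-12                        # avoid log(0) for confident predictions
    return float(np.mean(-np.sum(p * np.log(p + eps), axis=1)))
```

A uniform prediction has maximal entropy, a one-hot prediction near-zero entropy, so minimizing this term pushes the decision boundary away from dense regions of unlabeled data.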
Pseudo-labeling

Use the model's own confident predictions on unlabeled data as if they were ground-truth labels for further training.
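Pseudo-labeling can be sketched as a confidence-threshold selection step (a minimal version; the 0.95 threshold and the function name are illustrative):

```python
import numpy as np

def pseudo_label(p_unlabeled, threshold=0.95):
    """Keep unlabeled examples whose max class probability exceeds the
    threshold; return (indices, hard labels) to add to the training set."""
    conf = p_unlabeled.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, p_unlabeled[keep].argmax(axis=1)
```

The selected examples and their hard labels are then mixed into the labeled set, and the model is retrained; the threshold controls the trade-off between coverage and label noise.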
Comparison

Varying the number of labels

Class mismatch between the labeled and unlabeled datasets hurts performance

Lessons
- Standardized architecture + equal budget for tuning hyperparameters
- Unlabeled data from a different class distribution is not that useful
- Most methods don't work well in the very low labeled-data regime
- Transferring a pre-trained ImageNet model produces a lower error rate
- These conclusions are based on small datasets, though