Optimal Reverse Prediction: A Unified Perspective on Supervised, Unsupervised and Semi-supervised Learning
Linli Xu, Martha White, Dale Schuurmans
University of Alberta
ICML 2009
Motivation
There is a lack of unification between supervised and unsupervised learning. In semi-supervised learning one needs to consider both, so it is beneficial to unify the two principles.
Overview
A unified view of classical principles:
Supervised learning: least squares
Unsupervised learning: dimensionality reduction (principal component analysis) and clustering (k-means, normalized graph cut)
These principles differ only in the assumptions on the labels: given/missing, continuous/discrete.
New algorithms for semi-supervised learning
Supervised Learning
Supervised Least Squares
Given: a t×n data matrix X and a t×k label matrix Y, with k < n.
Learn: parameters W (n×k) for a linear model.
Solve: the forward least squares objective.
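A reconstruction of the objective this slide refers to: the standard forward least squares problem and its closed-form solution, assuming X has rank n so that X^T X is invertible:

```latex
\min_{W}\ \|XW - Y\|_F^2
\qquad\Longrightarrow\qquad
W = (X^\top X)^{-1} X^\top Y
```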
Supervised Least Squares
The same formulation supports regularization, kernelization, and instance weighting.
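The slide's formulas are not preserved in this transcript; as an illustration, standard forms of these three variants (with regularization parameter λ, diagonal instance-weight matrix Λ, and kernel matrix K = XX^T) are:

```latex
W_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top Y, \qquad
W_{\text{weighted}} = (X^\top \Lambda X)^{-1} X^\top \Lambda Y, \qquad
\hat{y}(x) = Y^\top (K + \lambda I)^{-1} k(x)
```

where k(x) = Xx is the vector of kernel values between a test point x and the training points.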
Supervised Least Squares
Problem types:
Regression: the Y labels are continuous; at test time, given x, output the model's real-valued predictions.
Classification: the rows of Y indicate the class labels; at test time, given x, threshold the outputs of the learned model.
Reverse Least Squares
An alternative but equivalent view of supervised least squares.
Reverse Least Squares
Traditional forward least squares: predict the labels from the inputs.
Reverse least squares: predict the inputs from the labels.
Reverse Least Squares
Key point: the forward solution can be recovered from the reverse solution if X has rank n.
The forward and reverse optimization problems have the same equilibrium condition, so the forward solution can be exactly recovered from the reverse solution.
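A sketch of the recovery, assuming the reverse model predicts X from Y as X ≈ YU, with X of rank n and Y of full column rank:

```latex
W = (X^\top X)^{-1} X^\top Y, \qquad
U = (Y^\top Y)^{-1} Y^\top X
\;\;\Longrightarrow\;\;
X^\top Y = U^\top (Y^\top Y)
\;\;\Longrightarrow\;\;
W = (X^\top X)^{-1} U^\top Y^\top Y
```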
Reverse Least Squares
Key point: the forward solution can still be recovered from the reverse solution under regularization, kernelization, and instance weighting.
Forward/reverse relationship
Under supervised least squares the forward and reverse views are equivalent: each solution can be recovered exactly from the other. But the forward and reverse losses are not identical; they are measured in different units.
Unsupervised Least Squares
Unsupervised least squares
What if no training labels are given? Principle of optimism: optimize over guesses Z of the missing labels Y, as well as over the parameters, in either the forward or the reverse formulation.
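A reconstruction of the two optimistic objectives referred to here, in the notation of the surrounding slides (Z for the guessed labels, W and U for the forward and reverse parameters):

```latex
\text{Forward:}\;\; \min_{W,\,Z}\ \|XW - Z\|_F^2
\qquad\qquad
\text{Reverse:}\;\; \min_{U,\,Z}\ \|ZU - X\|_F^2
```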
Unsupervised least squares
Note: optimistic forward least squares is vacuous. For any W one can choose Z = XW and obtain zero loss, so good W cannot be distinguished from bad W.
Unsupervised least squares
Interestingly, optimistic reverse least squares is not vacuous: it gives non-trivial results and unifies classical training principles.
Unsup. Least Squares = PCA
Claim: optimistic reverse least squares is equivalent to principal components analysis.
Proof: the minimization over U is solved in closed form by the least squares solution, reducing the problem to an optimization over Z alone.
Given the solution, one can recover the forward model and embed new points: given x, z = W'x.
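A minimal numerical sketch of the claimed equivalence, assuming the reverse model X ≈ ZU with an unconstrained real-valued Z and centered data; the variable names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
X -= X.mean(axis=0)          # PCA is stated for centered data
k = 3

# Optimistic reverse least squares: min_{Z,U} ||Z U - X||_F^2 with Z (t x k) free.
# The optimum is the best rank-k approximation of X, so it can be read off the SVD.
Uc, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = Uc[:, :k] * s[:k]        # optimal "guessed labels" (PCA scores, up to rotation)
U = Vt[:k]                   # optimal reverse weights (principal directions)
reverse_recon = Z @ U

# PCA reconstruction with the top k principal components
components = Vt[:k]
pca_recon = (X @ components.T) @ components

print(np.allclose(reverse_recon, pca_recon))   # True: same rank-k reconstruction
```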
Unsup. Least Squares = k-means
Claim: constrained optimistic reverse prediction is equivalent to k-means clustering.
Proof: solving for U in closed form yields an equivalent problem over Z alone; considering the difference between X and its reconstruction, the equivalent objective is the sum of squared distances between each point and the mean of the points in the class it is assigned to, which is exactly the k-means objective.
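A small numerical check of this equivalence, assuming Z is constrained to be a class-indicator matrix (exactly one 1 per row); the data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))       # t x n data matrix
k = 3
labels = rng.integers(0, k, size=60)   # an arbitrary hard assignment (assume every class occurs)

# Class-indicator matrix Z (t x k): exactly one 1 per row
Z = np.zeros((60, k))
Z[np.arange(60), labels] = 1.0

# Reverse least squares with Z fixed: U = (Z'Z)^{-1} Z'X, whose rows are the class means
U = np.linalg.solve(Z.T @ Z, Z.T @ X)
reverse_loss = np.sum((Z @ U - X) ** 2)

# k-means objective for the same assignment: squared distances to each class mean
kmeans_obj = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2) for c in range(k))

print(np.isclose(reverse_loss, kmeans_obj))   # True
```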
Unsup. Least Squares = k-means
Note: k-means clustering can also be kernelized and instance-weighted within this framework.
Unsup. Least Squares = Norm-Cut
Claim: if K is doubly nonnegative, then normalized graph cut is equivalent to weighted k-means clustering, where X and the instance weights are determined from K.
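The slide's exact expressions are not preserved in this transcript; as an assumption, one standard way to state the correspondence (via the weighted kernel k-means connection), with degree matrix D = diag(K1), is:

```latex
XX^\top = D^{-1} K D^{-1}, \qquad \Lambda = D = \mathrm{diag}(K\mathbf{1})
```

Since K is doubly nonnegative, D^{-1} K D^{-1} is positive semidefinite, so such an X exists.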
Semi-supervised Learning
[Diagram: reverse prediction unifies supervised learning (least squares), unsupervised learning (principal component analysis, k-means, normalized graph cut), and new semi-supervised learning.]
Decomposition of Reverse Loss
The supervised and unsupervised reverse losses are each a sum of squared lengths, so they are related by the Pythagorean theorem.
Claim: the supervised reverse loss decomposes into the unsupervised reverse loss plus an additional squared-distance term.
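The claimed identity itself is not preserved in this transcript; from the structure of the argument it is presumably of the following form, where (Ẑ, Û) denotes the optimistic unsupervised solution (a reconstruction, not the slide's own formula):

```latex
\|X - YU\|_F^2 \;=\; \|X - \hat{Z}\hat{U}\|_F^2 \;+\; \|\hat{Z}\hat{U} - YU\|_F^2
```

The first term depends only on the inputs, which is what lets unlabeled data enter on the next slide.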
Decomposition of reverse loss
The supervised loss can therefore be estimated with the help of unlabeled data, reducing the variance of the estimate.
Reverse semi-supervised training
Standard approach: combine an unsupervised (reverse) loss with a supervised (forward) loss. Problem: the forward loss is not in the same units as the reverse loss. Our idea: combine the supervised and unsupervised reverse losses.
Least squares + PCA: combined supervised/unsupervised objective
Algorithm: the objective is not jointly convex and has no obvious closed-form solution, so currently we just alternate (initializing U from the supervised solution), then recover the forward solution. Testing: given x, apply the recovered forward model. Straightforward, yet it appears to be novel. A sketch of the alternating procedure is given below.
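A minimal sketch of the alternating scheme, assuming the combined objective trades off the labeled reverse loss against the unlabeled optimistic reverse loss with a weight mu; the function name, mu, and the exact weighting are illustrative rather than taken from the paper:

```python
import numpy as np

def semi_supervised_reverse_ls(X_lab, Y_lab, X_unlab, mu=0.5, iters=50):
    """Alternate on  mu*||Y_lab U - X_lab||^2 + (1-mu)*||Z U - X_unlab||^2
    over the reverse weights U (k x n) and the guessed labels Z (real-valued, PCA-style)."""
    # Initialize U from the labeled (supervised) reverse problem
    U = np.linalg.lstsq(Y_lab, X_lab, rcond=None)[0]
    for _ in range(iters):
        # Z-step: best real-valued label guesses for the unlabeled reverse loss
        Z = np.linalg.lstsq(U.T, X_unlab.T, rcond=None)[0].T
        # U-step: weighted least squares over the stacked labeled/unlabeled parts
        A = np.vstack([np.sqrt(mu) * Y_lab, np.sqrt(1 - mu) * Z])
        B = np.vstack([np.sqrt(mu) * X_lab, np.sqrt(1 - mu) * X_unlab])
        U = np.linalg.lstsq(A, B, rcond=None)[0]
    return U, Z
```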
Least squares + PCA: comparison to semi-supervised regression
[Table: forward error rates (average root mean squared error, ± standard deviation) for different regression algorithms on various data sets. The values of (k, n; tL, tU) are indicated for each data set.]
Least squares + k-means
For classification/clustering, impose the class-indicator constraints on Z. Algorithm: this is a hard optimization problem; one could attempt a relaxation, but so far we use alternation (the constrained Z-step is sketched below), then recover the forward solution. Testing: given x, compute the forward outputs and predict the class with the maximum response.
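Relative to the alternating sketch above, only the Z-step changes: the guesses must remain class indicators. A hypothetical helper illustrating that step (assuming the same X_unlab and U as in the earlier sketch):

```python
import numpy as np

def constrained_z_step(X_unlab, U):
    """Hard-assignment Z-step: assign each unlabeled point to the row of U
    (its reconstruction in input space) that is closest in squared error."""
    dists = ((X_unlab[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)  # t_U x k
    return np.eye(U.shape[0])[dists.argmin(axis=1)]                   # one-hot rows
```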
Least squares + norm-cut
Given K, the required representation can be obtained as above. Algorithm: a hard optimization problem; so far we use alternation, then recover the forward solution. Testing: given x, compute the model outputs and predict the class with the maximum response.
Least squares + k-means/norm-cut: comparison to semi-supervised classification
[Table: forward error rates (average misclassification error in percent, ± standard deviation) for different classification algorithms on various data sets. The values of (k, n; tL, tU) are indicated for each data set.]
Conclusion
A unified framework for supervised, unsupervised and semi-supervised algorithms based on reverse least squares.
Future work: other loss functions; more complex data (structured outputs); possible convex algorithms.