Optimal Reverse Prediction: A Unified Perspective on Supervised, Unsupervised and Semi-supervised Learning
Linli Xu, Martha White, Dale Schuurmans
University of Alberta
ICML 2009
Motivation
There is a lack of unification between supervised and unsupervised learning. In semi-supervised learning one needs to consider both, so it is beneficial to unify the two principles.
Overview
A unified view of classical principles:
Supervised learning: least squares
Unsupervised learning: dimensionality reduction (principal component analysis) and clustering (k-means, normalized graph cut)
These principles differ only in the assumptions on the labels: given/missing, continuous/discrete.
New algorithms for semi-supervised learning
Supervised Learning
Supervised Least Squares
Given: a t×n data matrix X and a t×k label matrix Y, with k < n.
Learn: parameters W (n×k) for a linear model.
Solve: the forward least squares objective.
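A reconstruction of the objective this slide refers to: the standard forward least squares problem and its closed-form solution, assuming X has rank n so that X^T X is invertible:

```latex
\min_{W}\ \|XW - Y\|_F^2
\qquad\Longrightarrow\qquad
W = (X^\top X)^{-1} X^\top Y
```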
Supervised Least Squares
The same formulation supports regularization, kernelization, and instance weighting.
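The slide's formulas are not preserved in this transcript; as an illustration, standard forms of these three variants (with regularization parameter λ, diagonal instance-weight matrix Λ, and kernel matrix K = XX^T) are:

```latex
W_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top Y, \qquad
W_{\text{weighted}} = (X^\top \Lambda X)^{-1} X^\top \Lambda Y, \qquad
\hat{y}(x) = Y^\top (K + \lambda I)^{-1} k(x)
```

where k(x) = Xx is the vector of kernel values between a test point x and the training points.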
Supervised Least Squares
Problem types:
Regression: the Y labels are continuous; at test time, given x, output the model's real-valued predictions.
Classification: the rows of Y indicate the class labels; at test time, given x, threshold the outputs of the learned model.
Reverse Least Squares
An alternative but equivalent view of supervised least squares.
Reverse Least Squares
Traditional forward least squares: predict the labels from the inputs.
Reverse least squares: predict the inputs from the labels.
Reverse Least Squares
Key point: the forward solution can be recovered from the reverse solution if X has rank n.
The forward and reverse optimization problems have the same equilibrium condition, so the forward solution can be exactly recovered from the reverse solution.
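A sketch of the recovery, assuming the reverse model predicts X from Y as X ≈ YU, with X of rank n and Y of full column rank:

```latex
W = (X^\top X)^{-1} X^\top Y, \qquad
U = (Y^\top Y)^{-1} Y^\top X
\;\;\Longrightarrow\;\;
X^\top Y = U^\top (Y^\top Y)
\;\;\Longrightarrow\;\;
W = (X^\top X)^{-1} U^\top Y^\top Y
```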
Reverse Least Squares
Key point: the forward solution can still be recovered from the reverse solution under regularization, kernelization, and instance weighting.
Forward/reverse relationship
Under supervised least squares the forward and reverse views are equivalent: each solution can be recovered exactly from the other. But the forward and reverse losses are not identical; they are measured in different units.
Unsupervised Least Squares
Unsupervised least squares
What if no training labels are given? Principle of optimism: optimize over guesses Z of the missing labels Y, as well as over the parameters, in either the forward or the reverse formulation.
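A reconstruction of the two optimistic objectives referred to here, in the notation of the surrounding slides (Z for the guessed labels, W and U for the forward and reverse parameters):

```latex
\text{Forward:}\;\; \min_{W,\,Z}\ \|XW - Z\|_F^2
\qquad\qquad
\text{Reverse:}\;\; \min_{U,\,Z}\ \|ZU - X\|_F^2
```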
Unsupervised least squares
Note: optimistic forward least squares is vacuous. For any W one can choose Z = XW and obtain zero loss, so good W cannot be distinguished from bad W.
Unsupervised least squares
Interestingly, optimistic reverse least squares is not vacuous: it gives non-trivial results and unifies classical training principles.
Unsup. Least Squares = PCA
Claim: optimistic reverse least squares is equivalent to principal components analysis.
Proof: the minimization over U is solved in closed form by the least squares solution, reducing the problem to an optimization over Z alone.
Given the solution, one can recover the forward model and embed new points: given x, z = W'x.
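A minimal numerical sketch of the claimed equivalence, assuming the reverse model X ≈ ZU with an unconstrained real-valued Z and centered data; the variable names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
X -= X.mean(axis=0)          # PCA is stated for centered data
k = 3

# Optimistic reverse least squares: min_{Z,U} ||Z U - X||_F^2 with Z (t x k) free.
# The optimum is the best rank-k approximation of X, so it can be read off the SVD.
Uc, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = Uc[:, :k] * s[:k]        # optimal "guessed labels" (PCA scores, up to rotation)
U = Vt[:k]                   # optimal reverse weights (principal directions)
reverse_recon = Z @ U

# PCA reconstruction with the top k principal components
components = Vt[:k]
pca_recon = (X @ components.T) @ components

print(np.allclose(reverse_recon, pca_recon))   # True: same rank-k reconstruction
```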
Unsup. Least Squares = k-means
Claim: constrained optimistic reverse prediction is equivalent to k-means clustering.
Proof: solving for U in closed form yields an equivalent problem over Z alone; considering the difference between X and its reconstruction, the equivalent objective is the sum of squared distances between each point and the mean of the points in the class it is assigned to, which is exactly the k-means objective.
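A small numerical check of this equivalence, assuming Z is constrained to be a class-indicator matrix (exactly one 1 per row); the data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))       # t x n data matrix
k = 3
labels = rng.integers(0, k, size=60)   # an arbitrary hard assignment (assume every class occurs)

# Class-indicator matrix Z (t x k): exactly one 1 per row
Z = np.zeros((60, k))
Z[np.arange(60), labels] = 1.0

# Reverse least squares with Z fixed: U = (Z'Z)^{-1} Z'X, whose rows are the class means
U = np.linalg.solve(Z.T @ Z, Z.T @ X)
reverse_loss = np.sum((Z @ U - X) ** 2)

# k-means objective for the same assignment: squared distances to each class mean
kmeans_obj = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2) for c in range(k))

print(np.isclose(reverse_loss, kmeans_obj))   # True
```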
Unsup. Least Squares = k-means
Note: k-means clustering can also be kernelized and instance-weighted within this framework.
Unsup. Least Squares = Norm-Cut
Claim: if K is doubly nonnegative, then normalized graph cut is equivalent to weighted k-means clustering, where X and the instance weights are determined from K.
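The slide's exact expressions are not preserved in this transcript; as an assumption, one standard way to state the correspondence (via the weighted kernel k-means connection), with degree matrix D = diag(K1), is:

```latex
XX^\top = D^{-1} K D^{-1}, \qquad \Lambda = D = \mathrm{diag}(K\mathbf{1})
```

Since K is doubly nonnegative, D^{-1} K D^{-1} is positive semidefinite, so such an X exists.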
Semi-supervised Learning
[Diagram: reverse prediction unifies supervised learning (least squares), unsupervised learning (principal component analysis, k-means, normalized graph cut), and new semi-supervised learning.]
Decomposition of Reverse Loss
The supervised and unsupervised reverse losses are each a sum of squared lengths, so they are related by the Pythagorean theorem.
Claim: the supervised reverse loss decomposes into the unsupervised reverse loss plus an additional squared-distance term.
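The claimed identity itself is not preserved in this transcript; from the structure of the argument it is presumably of the following form, where (Ẑ, Û) denotes the optimistic unsupervised solution (a reconstruction, not the slide's own formula):

```latex
\|X - YU\|_F^2 \;=\; \|X - \hat{Z}\hat{U}\|_F^2 \;+\; \|\hat{Z}\hat{U} - YU\|_F^2
```

The first term depends only on the inputs, which is what lets unlabeled data enter on the next slide.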
Decomposition of reverse loss
The supervised loss can therefore be estimated with the help of unlabeled data, reducing the variance of the estimate.
Reverse semi-supervised training
Standard approach: combine an unsupervised (reverse) loss with a supervised (forward) loss. Problem: the forward loss is not in the same units as the reverse loss. Our idea: combine the supervised and unsupervised reverse losses.
Least squares + PCA: combined supervised/unsupervised objective
Algorithm: the objective is not jointly convex and has no obvious closed-form solution, so currently we just alternate (initializing U from the supervised solution), then recover the forward solution. Testing: given x, apply the recovered forward model. Straightforward, yet it appears to be novel. A sketch of the alternating procedure is given below.
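A minimal sketch of the alternating scheme, assuming the combined objective trades off the labeled reverse loss against the unlabeled optimistic reverse loss with a weight mu; the function name, mu, and the exact weighting are illustrative rather than taken from the paper:

```python
import numpy as np

def semi_supervised_reverse_ls(X_lab, Y_lab, X_unlab, mu=0.5, iters=50):
    """Alternate on  mu*||Y_lab U - X_lab||^2 + (1-mu)*||Z U - X_unlab||^2
    over the reverse weights U (k x n) and the guessed labels Z (real-valued, PCA-style)."""
    # Initialize U from the labeled (supervised) reverse problem
    U = np.linalg.lstsq(Y_lab, X_lab, rcond=None)[0]
    for _ in range(iters):
        # Z-step: best real-valued label guesses for the unlabeled reverse loss
        Z = np.linalg.lstsq(U.T, X_unlab.T, rcond=None)[0].T
        # U-step: weighted least squares over the stacked labeled/unlabeled parts
        A = np.vstack([np.sqrt(mu) * Y_lab, np.sqrt(1 - mu) * Z])
        B = np.vstack([np.sqrt(mu) * X_lab, np.sqrt(1 - mu) * X_unlab])
        U = np.linalg.lstsq(A, B, rcond=None)[0]
    return U, Z
```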
Least squares + PCA: comparison to semi-supervised regression
[Table: forward error rates (average root mean squared error, ± standard deviation) for different regression algorithms on various data sets. The values of (k, n; tL, tU) are indicated for each data set.]
Least squares + k-means
For classification/clustering, impose the class-indicator constraints on Z. Algorithm: this is a hard optimization problem; one could attempt a relaxation, but so far we use alternation (the constrained Z-step is sketched below), then recover the forward solution. Testing: given x, compute the forward outputs and predict the class with the maximum response.
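Relative to the alternating sketch above, only the Z-step changes: the guesses must remain class indicators. A hypothetical helper illustrating that step (assuming the same X_unlab and U as in the earlier sketch):

```python
import numpy as np

def constrained_z_step(X_unlab, U):
    """Hard-assignment Z-step: assign each unlabeled point to the row of U
    (its reconstruction in input space) that is closest in squared error."""
    dists = ((X_unlab[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)  # t_U x k
    return np.eye(U.shape[0])[dists.argmin(axis=1)]                   # one-hot rows
```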
Least squares + norm-cut
Given K, the required representation can be obtained as above. Algorithm: a hard optimization problem; so far we use alternation, then recover the forward solution. Testing: given x, compute the model outputs and predict the class with the maximum response.
Least squares + k-means/norm-cut: comparison to semi-supervised classification
[Table: forward error rates (average misclassification error in percent, ± standard deviation) for different classification algorithms on various data sets. The values of (k, n; tL, tU) are indicated for each data set.]
Conclusion
A unified framework for supervised, unsupervised and semi-supervised algorithms based on reverse least squares.
Future work: other loss functions; more complex data (structured outputs); possible convex algorithms.