Optimal Reverse Prediction: A Unified Perspective on Supervised, Unsupervised and Semi-supervised Learning
Linli Xu, Martha White and Dale Schuurmans
ICML 2009, Best Overall Paper Honorable Mention
Discussion led by Chunping Wang, ECE, Duke University, October 23, 2009
Outline (1/31)
– Motivations
– Preliminary Foundations
– Reverse Supervised Least Squares
– Relationship between Unsupervised Least Squares and PCA, K-means, and Normalized Graph-cut
– Semi-supervised Least Squares
– Experiments
– Conclusions
Motivations (2/31)
– There is a lack of a foundational connection between supervised and unsupervised learning: supervised learning minimizes prediction error, while unsupervised learning re-represents the input data
– For semi-supervised learning, one needs to consider both together
– The semi-supervised learning literature relies on intuitions: the “cluster assumption” and the “manifold assumption”
– The unification demonstrated in this paper leads to a novel semi-supervised principle
Preliminary Foundations: Forward Supervised Least Squares (3/31)
Data:
– an input matrix X and an output matrix Y
– t instances, n features, k responses
– regression: Y is real-valued (t x k)
– classification: Y is a {0,1} class-indicator matrix with one nonzero entry per row
– assumption: X and Y are full rank
Problem:
– Find the parameters W minimizing the least squares loss ||XW - Y||_F^2 for the linear model XW ≈ Y
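As a concrete illustration of the forward problem (my own sketch, not code from the slides; the synthetic data and sizes are arbitrary), a minimal NumPy version of min_W ||XW - Y||_F^2:

```python
import numpy as np

# Synthetic data: t instances, n features, k responses (illustrative sizes).
rng = np.random.default_rng(0)
t, n, k = 100, 5, 2
X = rng.normal(size=(t, n))                                       # input matrix
Y = X @ rng.normal(size=(n, k)) + 0.1 * rng.normal(size=(t, k))   # output matrix

# Forward least squares: find W minimizing ||XW - Y||_F^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Equivalent closed form (X assumed full column rank): W = (X^T X)^{-1} X^T Y.
W_closed = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(W, W_closed)
```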
Preliminary Foundations (4/31)
Variants of least squares:
– Linear
– Ridge regularization
– Kernelization
– Instance weighting
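The slide's equations were not captured in the transcript; the following is a hedged reconstruction of the standard forms of these variants (the regularizer weight β, the kernel matrix K = XXᵀ with dual parameters A, and the diagonal weight matrix Λ are my notation and may differ from the slide's):

```latex
\begin{align*}
\text{Linear:}            \quad & \min_W \; \|XW - Y\|_F^2 \\
\text{Ridge regularized:} \quad & \min_W \; \|XW - Y\|_F^2 + \beta \|W\|_F^2 \\
\text{Kernelized:}        \quad & W = X^\top A, \quad K = XX^\top, \quad \min_A \; \|KA - Y\|_F^2 \\
\text{Instance weighted:} \quad & \min_W \; \big\|\Lambda^{1/2}(XW - Y)\big\|_F^2, \quad \Lambda \ \text{diagonal, nonnegative}
\end{align*}
```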
Preliminary Foundations (5/31)
– Principal Components Analysis: dimensionality reduction
– k-means: clustering
– Normalized Graph-cut: clustering
Weighted undirected graph: nodes, edges, affinity matrix.
Graph partition problem: find a partition minimizing the total weight of edges connecting nodes in distinct subsets.
Preliminary Foundations: Normalized Graph-cut (6/31)
– Partition indicator matrix Z
– Weighted degree matrix
– Total cut
– Normalized cut: constraint and objective
(From Xing & Jordan, 2003)
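A hedged reconstruction of the quantities this slide refers to, in the matrix notation of Xing & Jordan (2003); the affinity matrix A, degree matrix D, and indicator matrix Z are standard symbols and may not match the slide's layout exactly:

```latex
\begin{align*}
& Z \in \{0,1\}^{t \times k}, \quad Z\mathbf{1} = \mathbf{1}
  && \text{(partition indicator matrix)} \\
& D = \operatorname{diag}(A\mathbf{1})
  && \text{(weighted degree matrix)} \\
& \operatorname{tr}\!\left(Z^\top (D - A) Z\right)
  && \text{(total cut)} \\
& \min_{Z} \; \operatorname{tr}\!\left((Z^\top D Z)^{-1} Z^\top (D - A) Z\right)
  && \text{(normalized cut objective)}
\end{align*}
```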
First contribution (7/31)
In the literature: least squares regression and classification are supervised methods, while Principal Components Analysis, K-means, and graph normalized cut are separate unsupervised methods.
This paper: a unification of all of these under least squares.
Reverse Supervised Least Squares (8/31)
– Traditional forward least squares: predict the outputs from the inputs
– Reverse least squares: predict the inputs from the outputs
– Given the reverse solutions U, the corresponding forward solutions W can be recovered exactly
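A small numerical check of the recovery claim (my own sketch, not the paper's code). With the forward solution W = (XᵀX)⁻¹XᵀY and the reverse solution U = (YᵀY)⁻¹YᵀX, the identity XᵀY = Uᵀ(YᵀY) lets us rebuild W from U, assuming Y has full column rank; the paper may state the recovery in a different but equivalent form:

```python
import numpy as np

rng = np.random.default_rng(1)
t, n, k = 200, 6, 3
X = rng.normal(size=(t, n))
Y = X @ rng.normal(size=(n, k)) + 0.1 * rng.normal(size=(t, k))

# Forward: predict outputs from inputs, min_W ||XW - Y||^2.
W_fwd = np.linalg.solve(X.T @ X, X.T @ Y)

# Reverse: predict inputs from outputs, min_U ||YU - X||^2.
U_rev = np.linalg.solve(Y.T @ Y, Y.T @ X)

# Recovery: U = (Y'Y)^{-1} Y'X  =>  X'Y = U'(Y'Y), hence
# W = (X'X)^{-1} X'Y = (X'X)^{-1} U'(Y'Y).
W_rec = np.linalg.solve(X.T @ X, U_rev.T @ (Y.T @ Y))
print(np.allclose(W_fwd, W_rec))   # True
```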
Reverse Supervised Least Squares (9/31)
– Ridge regularization: reverse problem and recovery of the forward solution
– Kernelization: reverse problem and recovery of the forward solution
– Instance weighting: reverse problem and recovery of the forward solution
Reverse Supervised Least Squares (10/31)
For supervised learning with the least squares loss:
– the forward and reverse perspectives are equivalent; each can be recovered exactly from the other
– the forward and reverse losses are not identical, since they are measured in different units – it is not principled to combine them directly!
Unsupervised Least Squares (11/31)
Unsupervised learning: no training labels Y are given.
Principle: optimize over guessed labels Z.
– Forward: does not work – for any W we can choose Z = XW to achieve zero loss, so it only gives trivial solutions.
– Reverse: gives non-trivial solutions.
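In symbols (a sketch consistent with the slide's argument; Z denotes the guessed labels, W and U the forward and reverse models):

```latex
\begin{align*}
\text{Forward:} \quad & \min_{Z}\,\min_{W} \; \|XW - Z\|_F^2 \;=\; 0
  \qquad \text{(take } Z = XW \text{ for any } W\text{): trivial} \\
\text{Reverse:} \quad & \min_{Z}\,\min_{U} \; \|ZU - X\|_F^2
  \qquad \text{non-trivial: } X \text{ must be reconstructed from } Z
\end{align*}
```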
Unsupervised Least Squares – PCA (12/31)
Proposition 1: Unconstrained reverse prediction is equivalent to principal components analysis.
This connection was made in Jong & Kotz, 1999; the authors extend it to the kernelized case.
Corollary 1: Kernelized reverse prediction is equivalent to kernel principal components analysis.
Unsupervised Least Squares – PCA (13/31)
Proposition 1: Unconstrained reverse prediction is equivalent to principal components analysis.
Proof: recall the reverse least squares solution; note that the solution for Z is not unique.
Unsupervised Least Squares – PCA (14/31)
Proposition 1: Unconstrained reverse prediction is equivalent to principal components analysis.
Proof (continued): consider the SVD of Z; substituting it into the objective yields the solution.
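A quick numerical illustration of Proposition 1 (my sketch, not the paper's code): the optimal unconstrained reverse reconstruction ZU is the best rank-k approximation of X, which coincides with the reconstruction produced by PCA with k components (on centered data):

```python
import numpy as np

rng = np.random.default_rng(2)
t, n, k = 300, 8, 3
X = rng.normal(size=(t, n))
X = X - X.mean(axis=0)                    # center the data, as in PCA

# Optimal unconstrained reverse prediction min_{Z,U} ||ZU - X||^2 is attained
# by the best rank-k approximation of X (Eckart-Young theorem).
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)
X_rank_k = (U_svd[:, :k] * s[:k]) @ Vt[:k, :]

# PCA with k components: project onto the top-k principal directions, reconstruct.
components = Vt[:k, :]                    # principal directions (rows)
X_pca = (X @ components.T) @ components

print(np.allclose(X_rank_k, X_pca))       # True: identical reconstructions
```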
Unsupervised Least Squares – k-means (15/31)
Proposition 2: Constrained reverse prediction is equivalent to k-means clustering.
The connection between PCA and k-means clustering was made in Ding & He, 2004, but the authors show the connection of both to supervised (reverse) least squares.
Corollary 2: Constrained kernelized reverse prediction is equivalent to kernel k-means.
Unsupervised Least Squares – k-means (16/31)
Proposition 2: Constrained reverse prediction is equivalent to k-means clustering.
Proof: rewrite as an equivalent problem and consider the difference. Z^T Z is a diagonal matrix whose entries count the data points in each class; Z^T X is a matrix whose rows are the sums of the data points in each class.
Unsupervised Least Squares – k-means (17/31)
Proposition 2: Constrained reverse prediction is equivalent to k-means clustering.
Proof (continued): the optimal U is the means encoding – each row of U is the mean of the data points assigned to the corresponding class.
Unsupervised Least Squares – k-means (18/31)
Proposition 2: Constrained reverse prediction is equivalent to k-means clustering.
Proof (continued): therefore the constrained reverse prediction objective equals the k-means (within-cluster sum-of-squares) objective.
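A small numerical check of this equivalence (my own sketch; the random hard assignment is just for illustration): for a fixed indicator matrix Z, the inner solution U = (ZᵀZ)⁻¹ZᵀX stacks the class means, and the reverse loss ||ZU − X||² equals the within-cluster sum of squares that k-means minimizes:

```python
import numpy as np

rng = np.random.default_rng(3)
t, n, k = 120, 4, 3
X = rng.normal(size=(t, n))

# An arbitrary hard assignment (each class guaranteed non-empty).
labels = np.concatenate([np.arange(k), rng.integers(0, k, size=t - k)])
Z = np.eye(k)[labels]                     # t x k class-indicator matrix

# Inner minimization over U for fixed Z: rows of U are the class means.
U = np.linalg.solve(Z.T @ Z, Z.T @ X)

# Reverse loss for this assignment ...
reverse_loss = np.linalg.norm(Z @ U - X) ** 2

# ... equals the within-cluster sum of squared distances (the k-means objective).
wcss = sum(np.linalg.norm(X[labels == c] - X[labels == c].mean(axis=0)) ** 2
           for c in range(k))
print(np.allclose(reverse_loss, wcss))    # True
```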
Unsupervised Least Squares – Norm-cut (19/31)
Proposition 3: For a doubly nonnegative matrix K and a matching instance weighting, weighted reverse prediction is equivalent to normalized graph-cut.
Proof: for any Z, solve the inner minimization over U and substitute to obtain the reduced objective.
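A hedged sketch of the inner minimization this proof relies on. I write the weighting as a diagonal matrix Λ entering the loss as ||Λ^{1/2}(ZU − X)||²; the precise pairing of Λ with K (for instance Λ = diag(K1)) is an assumption of this sketch rather than a quote of the slide:

```latex
\begin{align*}
& \min_{U} \; \big\|\Lambda^{1/2}(ZU - X)\big\|_F^2
  \;\;\Longrightarrow\;\;
  U^{\ast} = (Z^\top \Lambda Z)^{-1} Z^\top \Lambda X \\[4pt]
& \text{Reduced objective:}\quad
  \operatorname{tr}(X^\top \Lambda X)
  \;-\;
  \operatorname{tr}\!\big((Z^\top \Lambda Z)^{-1} Z^\top \Lambda X X^\top \Lambda Z\big)
\end{align*}
```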
Unsupervised Least Squares – Norm-cut (20/31)
Proposition 3: For a doubly nonnegative matrix K and a matching instance weighting, weighted reverse prediction is equivalent to normalized graph-cut.
Proof (continued): recall the normalized cut (from Xing & Jordan, 2003). Since K is doubly nonnegative, it can serve as an affinity matrix, and the reduced objective is equivalent to normalized graph-cut.
Unsupervised Least Squares – Norm-cut (21/31)
Corollary 3: For a specific choice of K and weighting, the weighted least squares problem is equivalent to normalized graph-cut; that is, with a specific K we can relate normalized graph-cut to reverse least squares.
Second contribution (22/31)
[Figure, taken from Xu's slides] Reverse prediction links supervised least squares learning with the unsupervised methods (Principal Components Analysis, K-means, graph normalized cut) and leads to a new semi-supervised method.
Semi-supervised Least Squares (23/31)
A principled approach: reverse loss decomposition.
[Figure, taken from Xu's slides: supervised reverse losses and unsupervised reverse losses]
Semi-supervised Least Squares (24/31)
Proposition 4: For any X, Y, and U, the supervised loss decomposes as
  supervised loss = unsupervised loss + squared distance,
where the unsupervised loss depends only on the input data X and the squared distance depends on both X and Y.
Note: we cannot compute the true supervised loss since we do not have all the labels Y. We may estimate it using only labeled data, or also using auxiliary unlabeled data.
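A hedged reconstruction of the decomposition (stated here in its unconstrained Pythagorean form, which can be verified directly; the paper's statement may constrain Ẑ to the relevant label set). With Ẑ = argmin_Z ||ZU − X||²_F, the product ẐU is the projection of the rows of X onto the row space of U (assuming U has full row rank, otherwise use the pseudoinverse), so the cross term vanishes and

```latex
\begin{align*}
\underbrace{\|YU - X\|_F^2}_{\text{supervised reverse loss}}
\;=\;
\underbrace{\|\hat{Z}U - X\|_F^2}_{\substack{\text{unsupervised reverse loss} \\ \text{(depends only on } X\text{)}}}
\;+\;
\underbrace{\|(Y - \hat{Z})U\|_F^2}_{\substack{\text{squared distance} \\ \text{(depends on } X \text{ and } Y\text{)}}}
\end{align*}
```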
Semi-supervised Least Squares (25/31)
Corollary 4: For any U, the supervised loss estimate decomposes into an unsupervised loss estimate plus a squared distance estimate.
Labeled data are scarce, but plenty of unlabeled data are available: using the unlabeled data to form a better unbiased estimate of the unsupervised term strictly reduces the variance of the supervised loss estimate.
Semi-supervised Least Squares (26/31)
A naive approach: combine the loss on labeled data with the loss on unlabeled data.
Advantages:
– The authors combine supervised and unsupervised reverse losses, while previous approaches combine an unsupervised (reverse) loss with a supervised (forward) loss, which are not in the same units.
– Compared to the principled approach, it admits more straightforward optimization procedures (alternating between U and Z).
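In symbols, the naive combination might look as follows (a sketch: X_L, Y_L are the labeled inputs and outputs, X_U the unlabeled inputs, Z the guessed labels, and μ a trade-off weight; the exact weighting used in the paper may differ):

```latex
\begin{align*}
\min_{U,\,Z}\;\;
\underbrace{\|Y_L U - X_L\|_F^2}_{\text{loss on labeled data}}
\;+\; \mu\,
\underbrace{\|Z U - X_U\|_F^2}_{\text{loss on unlabeled data}}
\end{align*}
```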
Regression Experiments: Least Squares + PCA (27/31)
– Basic formulation: the two terms are not jointly convex, so there is no closed-form solution.
– Learning method: alternating optimization, with an initial U obtained from the supervised (labeled-data) solution.
– The forward solution is then recovered; testing: given a new x, predict with the recovered forward model.
– Can be kernelized.
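A minimal sketch of such an alternating procedure for Least Squares + PCA (unconstrained Z), under the combined objective sketched above; this is my illustration of the recipe, not the authors' code, and mu, the iteration count, and the initialization details are assumptions:

```python
import numpy as np

def semi_supervised_ls_pca(X_L, Y_L, X_U, mu=1.0, n_iters=50):
    """Alternate between the guessed labels Z for the unlabeled data and the
    reverse model U, starting from the supervised (labeled-data) solution."""
    # Supervised reverse initialization: min_U ||Y_L U - X_L||^2.
    U = np.linalg.lstsq(Y_L, X_L, rcond=None)[0]
    Z = None
    for _ in range(n_iters):
        # Z-step: for fixed U, min_Z ||Z U - X_U||^2 (row-wise least squares).
        Z = X_U @ U.T @ np.linalg.inv(U @ U.T)
        # U-step: for fixed Z, minimize ||Y_L U - X_L||^2 + mu ||Z U - X_U||^2
        # by stacking the two systems with a sqrt(mu) scaling.
        A = np.vstack([Y_L, np.sqrt(mu) * Z])
        B = np.vstack([X_L, np.sqrt(mu) * X_U])
        U = np.linalg.lstsq(A, B, rcond=None)[0]
    return U, Z
```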
Regression Experiments Least Squares + PCA 28/31 Forward root mean squared error (mean± standard deviations for 10 random splits of the data) The values of (k, n; T L, T U ) are indicated for each data set. The table is taken from Xu’s paper
Classification Experiments: Least Squares + k-means and Least Squares + Norm-cut (29/31)
The forward solution is recovered; testing: given a new x, predict the class with the maximum response.
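For the classification setting, prediction with a recovered forward solution might look like this sketch; the recovery formula W = (XᵀX)⁻¹XᵀZ from the learned label matrix Z is an assumption of this illustration, mirroring the regression case rather than quoting the slide:

```python
import numpy as np

def predict_class(x_new, X, Z):
    """Recover a forward model from the learned label matrix Z (hypothetical
    recovery W = (X'X)^{-1} X'Z) and predict the class with maximum response."""
    W = np.linalg.solve(X.T @ X, X.T @ Z)   # n x k forward weights
    responses = x_new @ W                   # one score per class
    return int(np.argmax(responses))
```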
Classification Experiments: Least Squares + k-means (30/31)
Forward root mean squared error (mean ± standard deviation for 10 random splits of the data). The values of (k, n; T_L, T_U) are indicated for each data set. [Table taken from Xu's paper]
Conclusions (31/31)
Two main contributions:
1. A unified framework based on the reverse least squares loss is proposed for several existing supervised and unsupervised algorithms.
2. Within the unified framework, a novel semi-supervised principle is proposed.