1
Semi-Supervised Learning in Gigantic Image Collections
Rob Fergus (NYU), Yair Weiss (Hebrew U.), Antonio Torralba (MIT)
2
What does the world look like?
High-level image statistics
Object recognition for large-scale search
Gigantic image collections
3
Spectrum of Label Information
Human annotations → Noisy labels → Unlabeled
4
Semi-Supervised Learning using Graph Laplacian [Zhu03, Zhou04]
V = data points, E = n x n affinity matrix W, e.g. $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\epsilon^2)$
Graph Laplacian: $L = D - W$, where $D$ is diagonal with $D_{ii} = \sum_j W_{ij}$ (the normalized variant is $D^{-1/2}(D - W)\,D^{-1/2}$)
5
SSL using Graph Laplacian
Want to find a label function f that minimizes:
$J(f) = f^\top L f + (f - y)^\top \Lambda (f - y)$
(first term: smoothness; second term: agreement with labels)
y = labels; $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, with $\lambda_i = \lambda$ if point i is labeled and $\lambda_i = 0$ if unlabeled
Straightforward solution: $f = (L + \Lambda)^{-1} \Lambda y$
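For concreteness, a minimal NumPy sketch of this direct solve on toy data. The Gaussian affinity, bandwidth `eps`, and weight `lam` are illustrative assumptions rather than settings from the talk, and the unnormalized Laplacian is used for simplicity:

```python
import numpy as np

def direct_ssl(X, y, labeled, eps=0.5, lam=100.0):
    """Minimize f'Lf + (f-y)'Lam(f-y) by solving (L + Lam) f = Lam y."""
    # Gaussian affinity between all pairs of points
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / (2 * eps ** 2))
    L = np.diag(W.sum(1)) - W            # (unnormalized) graph Laplacian
    Lam = np.zeros(len(X))
    Lam[labeled] = lam                   # lambda_i = lam if labeled, else 0
    return np.linalg.solve(L + np.diag(Lam), Lam * y)

# Toy example: two 1-D clusters, one labeled seed each
X = np.concatenate([np.random.randn(50) * 0.1,
                    np.random.randn(50) * 0.1 + 3])[:, None]
y = np.zeros(100); y[0], y[50] = 1, -1
f = direct_ssl(X, y, labeled=[0, 50])
print((f[:50] > 0).mean(), (f[50:] < 0).mean())  # labels fill each cluster
```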
6
Eigenvectors of Laplacian [Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al 03, 08]
Smooth vectors will be linear combinations of the eigenvectors U with small eigenvalues: $f = U\alpha$
7
Rewrite System
Let U = smallest k eigenvectors of L, $\alpha$ = coefficients.
Optimal $\alpha$ is now the solution to a k x k system:
$(\Sigma + U^\top \Lambda U)\,\alpha = U^\top \Lambda y$, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ holds the k smallest eigenvalues
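A hedged sketch of this reduced solve, assuming U (the n x k smallest eigenvectors) and their eigenvalues `sig` are already available; beyond two small products, the cost no longer depends on n:

```python
import numpy as np

def reduced_ssl(U, sig, y, labeled, lam=100.0):
    """With f = U @ alpha, minimizing alpha'Sigma alpha + (U alpha - y)'Lam(U alpha - y)
    gives the k x k system (Sigma + U' Lam U) alpha = U' Lam y."""
    Ul = U[labeled]                        # rows of U at the labeled points
    A = np.diag(sig) + lam * Ul.T @ Ul     # k x k, cheap even for huge n
    b = lam * Ul.T @ y[labeled]
    alpha = np.linalg.solve(A, b)
    return U @ alpha                       # label function for all n points
```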
8
Computational Bottleneck
Consider a dataset of 80 million images:
- Inverting L: inverting an 80 million x 80 million matrix
- Finding eigenvectors of L: diagonalizing an 80 million x 80 million matrix
9
Large-Scale SSL: Related Work [see Zhu '08 survey]
Nystrom method: pick a small set of landmark points
- Compute exact solution on these
- Interpolate solution to the rest
Others iteratively use classifiers to label data
- E.g. boosting-based method of Loeff et al., ICML'08
[Figure: data points with a subset selected as landmarks]
10
Our Approach
11
Overview of Our Approach
[Diagram: Nystrom reduces n by picking landmark points from the data; our approach instead takes the limit n → ∞ and works directly with the data density]
12
Consider Limit as n → ∞
Consider x to be drawn from a 2D distribution p(x).
Let $L_p(F)$ be a smoothness operator on p(x), for a function F(x):
$L_p(F) = \frac{1}{2}\iint W(x_1, x_2)\,(F(x_1) - F(x_2))^2\, p(x_1)\, p(x_2)\, dx_1\, dx_2$, where $W(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / 2\epsilon^2)$
Analyze the eigenfunctions of $L_p(F)$
13
Eigenvectors & Eigenfunctions
14
Key Assumption: Separability of Input Data [Nadler et al. 06, Weiss et al. 08]
Claim: if p is separable, i.e. $p(x_1, x_2) = p(x_1)\,p(x_2)$, then eigenfunctions of the marginals are also eigenfunctions of the joint density, with the same eigenvalues.
15
Numerical Approximations to Eigenfunctions in 1D
300k points drawn from distribution p(x); consider the marginal $p(x_1)$, approximated by a histogram $h(x_1)$
[Figure: data drawn from p(x); marginal p(x_1); histogram h(x_1)]
16
Numerical Approximations to Eigenfunctions in 1D
Solve for the values g of the eigenfunction at a set of discrete locations (the histogram bin centers), and the associated eigenvalues (see the sketch below):
$(\tilde{D} - PWP)\,g = \sigma\,\tilde{D}\,g$
- W = affinity between the discrete locations
- P = diag(h(x_1)); $\tilde{D}$ = diagonal matrix of the row sums of PWP
- B x B system (# histogram bins B = 50)
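A minimal NumPy/SciPy sketch of this step under the assumptions above (Gaussian affinity between bin centers, illustrative bandwidth `eps`); the weighting of the generalized eigenproblem follows the reconstruction on this slide:

```python
import numpy as np
from scipy.linalg import eigh

def eigenfunctions_1d(x, n_bins=50, eps=0.2, n_funcs=3):
    """Numerically approximate 1-D eigenfunctions from a histogram of x."""
    h, edges = np.histogram(x, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    h = h + 1e-10                                    # guard against empty bins
    # Affinity between the B discrete locations (bin centers)
    W = np.exp(-(centers[:, None] - centers[None, :]) ** 2 / (2 * eps ** 2))
    PWP = h[:, None] * W * h[None, :]                # density-weighted affinity
    Dt = np.diag(PWP.sum(1))                         # \tilde{D}
    # Generalized eigenproblem (Dt - PWP) g = sigma Dt g, smallest sigma first
    sig, G = eigh(Dt - PWP, Dt)
    return centers, sig[1:n_funcs + 1], G[:, 1:n_funcs + 1]  # skip constant g
```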
17
1D Approximate Eigenfunctions
[Figure: 1st, 2nd, and 3rd eigenfunctions of h(x_1)]
18
Separability over Dimension
Build a histogram over dimension 2: h(x_2). Now solve for the eigenfunctions of h(x_2).
[Figure: data; 1st, 2nd, and 3rd eigenfunctions of h(x_2)]
19
From Eigenfunctions to Approximate Eigenvectors
- Take each data point
- Do a 1-D interpolation in each eigenfunction (sketched below) → k-dimensional vector (for k eigenfunctions)
- Very fast operation (has to be done nk times)
[Plot: eigenfunction value vs. histogram bin]
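Each data point is mapped into the k eigenfunctions with k one-dimensional table lookups; a sketch using np.interp, taking `centers` and `G` as produced by the `eigenfunctions_1d` sketch above:

```python
import numpy as np

def interpolate_eigenvectors(x, centers, G):
    """Map n data values x into approximate eigenvector entries by 1-D
    linear interpolation in each numerical eigenfunction (n*k interps)."""
    return np.column_stack([np.interp(x, centers, G[:, j])
                            for j in range(G.shape[1])])
```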
20
Preprocessing
Need to make the data separable: rotate using PCA.
[Figure: data that is not separable becomes separable after rotation]
21
Overall Algorithm
1. Rotate data to maximize separability (currently use PCA)
2. For each dimension:
   - Construct 1D histogram
   - Solve numerically for eigenfunctions/values
3. Order eigenfunctions from all dimensions by increasing eigenvalue and take the first k
4. Interpolate data into the k eigenfunctions
   - Yields approximate eigenvectors of the normalized Laplacian
5. Solve k x k least-squares system to give the label function (see the sketch below)
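Putting the pieces together, a hedged end-to-end sketch that reuses `eigenfunctions_1d` from the earlier sketch; the parameter values (k, lam) are illustrative:

```python
import numpy as np

def eigenfunction_ssl(X, y, labeled, k=48, lam=100.0):
    # 1. Rotate data with PCA to encourage separability
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xr = Xc @ Vt.T
    # 2-3. Per-dimension eigenfunctions, ordered by increasing eigenvalue
    cands = []
    for d in range(Xr.shape[1]):
        centers, sig, G = eigenfunctions_1d(Xr[:, d])
        cands += [(s, d, centers, G[:, j]) for j, s in enumerate(sig)]
    cands.sort(key=lambda t: t[0])
    # 4. Interpolate data into the first k eigenfunctions -> approx. eigenvectors
    U = np.column_stack([np.interp(Xr[:, d], c, g)
                         for _, d, c, g in cands[:k]])
    sig_k = np.array([s for s, _, _, _ in cands[:k]])
    # 5. Solve the k x k system for the label function
    Ul = U[labeled]
    alpha = np.linalg.solve(np.diag(sig_k) + lam * Ul.T @ Ul,
                            lam * Ul.T @ y[labeled])
    return U @ alpha
```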
22
Experiments on Toy Data
23
Comparison of Approaches
24
Data
25
Nystrom Comparison
Too few landmark points result in highly unstable eigenvectors
26
Nystrom Comparison
Eigenfunctions fail when the data has significant dependencies between dimensions
27
Experiments on Real Data
28
Experiments
Images from 126 classes downloaded from Internet search engines, 63,000 images in total (example classes: dump truck, emu).
Labels (correct/incorrect) provided by Geoff Hinton, Alex Krizhevsky, Vinod Nair (U. Toronto and CIFAR)
29
Input Image Representation
- Pixels are not a convenient representation
- Use the Gist descriptor (Oliva & Torralba, 2001), PCA'd down to 64 dimensions
- L2 distance between Gist vectors is a rough substitute for human perceptual distance
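A small sketch of this representation step, assuming a matrix `gist` of precomputed 384-dimensional Gist descriptors (computing Gist itself is outside this snippet); PCA is done with a plain SVD:

```python
import numpy as np

def pca_project(gist, n_dims=64):
    """Project Gist descriptors (n x 384) down to n_dims with PCA."""
    Xc = gist - gist.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_dims].T

# L2 distance between projected Gist vectors as a rough perceptual distance:
# X = pca_project(gist); d = np.linalg.norm(X[i] - X[j])
```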
30
Are Dimensions Independent?
[Figure: joint histograms for pairs of dimensions, from the raw 384-dimensional Gist and after PCA; MI is a mutual information score, 0 = independent]
31
Real 1-D Eigenfunctions of PCA'd Gist Descriptors
[Figure: eigenfunctions 1 through 256, eigenfunction value vs. histogram bin (1-50); color indicates the input dimension, range x_min to x_max]
32
Protocol
- Task is to re-rank the images of each class (63,000 images in total)
- Measure precision @ 15% recall (one way to compute this is sketched below)
- Vary # of labeled examples
- Chance-level performance is 33%
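A plausible implementation of the precision @ 15% recall measurement, not code from the talk: `scores` is the predicted label function for one class and `rel` the 0/1 correct flags:

```python
import numpy as np

def precision_at_recall(scores, rel, recall=0.15):
    """Precision of the ranked list at the point where `recall` of the
    relevant images have been retrieved."""
    order = np.argsort(-scores)            # rank images by decreasing score
    hits = np.cumsum(rel[order])           # relevant images retrieved so far
    target = recall * rel.sum()            # e.g. 15% of all relevant images
    n = int(np.searchsorted(hits, target)) + 1  # list length reaching target
    return hits[n - 1] / n
```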
37
80 Million Images
38
Running on 80 Million Images
- PCA to 32 dims, k = 48 eigenfunctions
- Precompute approximate eigenvectors (~20 GB)
- For each class, propagate labels through all 80 million images
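For scale: storing 48 single-precision eigenfunction values per image works out to roughly $80 \times 10^6 \times 48 \times 4\,\text{bytes} \approx 15$ GB, consistent with the ~20 GB figure once overhead is included (a back-of-envelope estimate, not a number from the talk).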
40
Summary
- Semi-supervised scheme that can scale to very large problems
- Rather than sub-sampling the data, we take the limit of infinite unlabeled data
- Assumes the input data distribution is separable
- Can propagate labels in a graph with 80 million nodes in a fraction of a second
42
Future Work
- Can potentially use 2D or 3D histograms instead of 1D (requires more data)
- Consider diagonal eigenfunctions
- Sharing of labels between classes
44
Are Dimensions Independent?
[Figure: joint histograms for pairs of dimensions, from the raw 384-dimensional Gist and after ICA; MI is a mutual information score, 0 = independent]
46
Overview of Our Approach
- Existing large-scale SSL methods try to reduce the # of points
- We consider what happens as n → ∞: eigenvectors become eigenfunctions
- Assume the input distribution is separable
- Make a crude numerical approximation to the eigenfunctions
- Interpolate the data into these approximate eigenfunctions to give approximate eigenvectors
47
Eigenfunctions
Eigenfunctions are the limit of eigenvectors as n → ∞ [Nadler et al. 06, Weiss et al. 08]
Analytical forms of eigenfunctions exist only in a few cases (uniform, Gaussian) [Coifman et al. 05, Nadler et al. 06, Belkin & Niyogi 07]
Instead, we calculate a numerical approximation to the eigenfunctions
48
Complexity Comparison
Key: n = # data points (big, >10^6); l = # labeled points (small, <100); m = # landmark points; d = # input dims (~100); k = # eigenvectors (~100); b = # histogram bins (~50)

Nystrom (polynomial in # landmarks):
- Select m landmark points
- Get smallest k eigenvectors of an m x m system
- Interpolate n points into the k eigenvectors
- Solve a k x k linear system

Eigenfunction (linear in # data points):
- Rotate n points
- Form d 1-D histograms
- Solve d linear systems, each b x b
- k 1-D interpolations of n points
- Solve a k x k linear system
49
Key Assumption: Separability of Input Data [Nadler et al. 06, Weiss et al. 08]
- Can't build accurate high-dimensional histograms: need too many points
- Currently just use 1-D histograms (2- or 3-D ones possible with enough data)
- This assumes the distribution is separable: $p(x) = p(x_1)\,p(x_2)\cdots p(x_d)$
- For separable distributions, the eigenfunctions are also separable
50
Varying # Training Examples