1
Non-Linear Probabilistic PCA with Gaussian Process Latent Variable Models
Neil Lawrence Machine Learning Group Department of Computer Science University of Sheffield, U.K.
2
Overview
Probabilistic Visualisation Algorithms: Probabilistic PCA, Density Networks, GTM, Gaussian Process Latent Variable Models. Optimisation: sparse algorithm. Visualisation. Inverse Kinematics. Discussion.
3
Notation
q – dimension of latent/embedded space.
d – dimension of data space.
N – number of data points.
Y – centred data, X – latent variables, W – mapping matrix, W ∈ ℝ^{d×q}.
a(i) is the vector from the i-th row of A; a_i is the vector from the i-th column of A.
4
Reading Notation
X and Y are design matrices.
Covariance given by N⁻¹YᵀY. Inner product matrix given by YYᵀ.
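For readers following along in code, a quick numpy illustration of the two matrices (the sizes here are arbitrary):

```python
import numpy as np

N, d = 100, 12                     # arbitrary example sizes
Y = np.random.randn(N, d)
Y -= Y.mean(axis=0)                # centred design matrix

covariance = Y.T @ Y / N           # d x d covariance matrix, N^-1 Y^T Y
inner_product = Y @ Y.T            # N x N inner product matrix, Y Y^T
print(covariance.shape, inner_product.shape)   # (12, 12) (100, 100)
```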
5
Linear Embeddings
Represent data, Y, with a lower dimensional embedding X. Assume a linear relationship of the form y_n = W x_n + η_n. Probabilistically we implement this as p(y_n | x_n, W) = N(y_n | W x_n, β⁻¹I).
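As a concrete illustration of this generative model, here is a minimal numpy sketch that samples data under the assumptions above (Gaussian noise with precision β; all sizes and parameter values are placeholders):

```python
import numpy as np

q, d, N = 2, 12, 100                     # latent dim, data dim, number of points
beta = 100.0                             # assumed noise precision

X = np.random.randn(N, q)                # latent variables
W = np.random.randn(d, q)                # mapping matrix
noise = np.random.randn(N, d) / np.sqrt(beta)
Y = X @ W.T + noise                      # y_n = W x_n + eta_n, row by row
```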
6
PCA – Probabilistic Interpretation
[Graphical model: latent variables X and mapping W generate data Y; the latent variables X are marginalised and W is optimised.]
7
Maximum Likelihood Solution
If U_q are the first q eigenvectors of N⁻¹YᵀY and the corresponding eigenvalues are Λ_q, the solution is W = U_q (Λ_q − β⁻¹I)^{1/2} Vᵀ, where V is an arbitrary rotation matrix.
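A minimal numpy sketch of this solution, with the rotation V taken to be the identity and β⁻¹ treated as a fixed, assumed value:

```python
import numpy as np

def ppca_ml_W(Y, q, beta_inv=0.01):
    """Maximum likelihood W for probabilistic PCA (rotation V set to identity)."""
    N, d = Y.shape
    S = Y.T @ Y / N                            # N^-1 Y^T Y
    evals, evecs = np.linalg.eigh(S)           # ascending order
    idx = np.argsort(evals)[::-1][:q]          # first q eigen-pairs
    U_q, Lam_q = evecs[:, idx], evals[idx]
    return U_q * np.sqrt(np.maximum(Lam_q - beta_inv, 0.0))   # U_q (Lam_q - beta^-1 I)^{1/2}

Y = np.random.randn(200, 12)
Y -= Y.mean(axis=0)
print(ppca_ml_W(Y, q=2).shape)                 # (12, 2)
```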
8
Non-Linear Embeddings
Probabilistic, non-linear, relationship between Y and X. Propagating distributions through non-linear mappings is problematic. Typically use point representations. Density Networks use sampling (MacKay) GTM uses a grid of points (Bishop, Svensen & Williams) GPLVM takes a different approach. Start with a novel probabilistic PCA.
9
PCA – Probabilistic Interpretation
[Graphical model: latent variables X and mapping W generate data Y; the latent variables X are marginalised and W is optimised.]
10
Dual Probabilistic PCA
[Graphical model: now the mapping W is marginalised and the latent variables X are optimised.]
11
Maximum Likelihood Solution
If U'_q are the first q eigenvectors of d⁻¹YYᵀ and the corresponding eigenvalues are Λ_q, the solution is X = U'_q (Λ_q − β⁻¹I)^{1/2} Vᵀ, where V is an arbitrary rotation matrix.
12
Equivalence of PPCA Formulations
Solution for PPCA: W = U_q (Λ_q − β⁻¹I)^{1/2} Vᵀ. Solution for Dual PPCA: X = U'_q (Λ_q − β⁻¹I)^{1/2} Vᵀ. Equivalence follows from the singular value decomposition of Y, which relates the eigenvectors of YᵀY and YYᵀ.
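A small numpy check of this relationship (the normalisation constants N⁻¹ and d⁻¹ are dropped so the non-zero eigenvalues match exactly; the sign of an eigenvector is arbitrary):

```python
import numpy as np

N, d = 100, 12
Y = np.random.randn(N, d)
Y -= Y.mean(axis=0)

lam_w, U = np.linalg.eigh(Y.T @ Y)     # eigen-pairs used by PPCA's W
lam_x, Up = np.linalg.eigh(Y @ Y.T)    # eigen-pairs used by dual PPCA's X

# The non-zero eigenvalues of Y^T Y and Y Y^T coincide.
print(np.allclose(np.sort(lam_w)[-1], np.sort(lam_x)[-1]))

# Eigenvectors are linked through Y (i.e. the SVD of Y): u' = Y u / sqrt(lambda).
i, j = np.argmax(lam_w), np.argmax(lam_x)
u_prime = Y @ U[:, i] / np.sqrt(lam_w[i])
print(np.allclose(np.abs(u_prime), np.abs(Up[:, j])))      # equal up to sign
```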
13
Gaussian Processes
A Gaussian Process (GP) likelihood is of the form p(Y | X) = ∏_{j=1..d} N(y_j | 0, K), where y_j is the j-th column of Y and K is the covariance function or kernel. If we select the linear kernel K = XXᵀ + β⁻¹I, we see Dual PPCA is a product of GPs.
14
Dual Probabilistic PCA is a GPLVM
Log-likelihood: L = −(dN/2) ln 2π − (d/2) ln|K| − (1/2) tr(K⁻¹YYᵀ).
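A minimal numpy sketch of this log-likelihood; with the linear kernel K = XXᵀ + β⁻¹I it is exactly the dual PPCA likelihood above (the value of β and all sizes are placeholders):

```python
import numpy as np

def gplvm_log_likelihood(Y, K):
    """L = -dN/2 ln 2pi - d/2 ln|K| - 1/2 tr(K^-1 Y Y^T)."""
    N, d = Y.shape
    _, logdetK = np.linalg.slogdet(K)
    Kinv_Y = np.linalg.solve(K, Y)
    return (-0.5 * d * N * np.log(2.0 * np.pi)
            - 0.5 * d * logdetK
            - 0.5 * np.sum(Kinv_Y * Y))        # equals tr(K^-1 Y Y^T)

N, d, q, beta = 100, 12, 2, 100.0
X = np.random.randn(N, q)
Y = np.random.randn(N, d)
Y -= Y.mean(axis=0)
K = X @ X.T + np.eye(N) / beta                 # linear kernel
print(gplvm_log_likelihood(Y, K))
```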
15
Non-linear Kernel
Instead of the linear kernel function, use, for example, an RBF kernel function, k(x_i, x_j) = α exp(−(γ/2)‖x_i − x_j‖²) + β⁻¹δ_ij. This leads to non-linear embeddings.
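A sketch of an RBF kernel of this form over the latent points (the parameter values are placeholders):

```python
import numpy as np

def rbf_kernel(X, alpha=1.0, gamma=1.0, beta=100.0):
    """RBF covariance over latent points X (N x q), plus white noise."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq_dists) + np.eye(len(X)) / beta

X = np.random.randn(100, 2)
print(rbf_kernel(X).shape)    # (100, 100)
```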
16
Pros & Cons of GPLVM
Pros: Probabilistic. Missing data straightforward. Can sample from model given X. Different noise models can be handled. Kernel parameters can be optimised.
Cons: Speed of optimisation. Optimisation is non-convex.
17
GPLVM Optimisation
Gradient based optimisation wrt X, α, β and γ (scaled conjugate gradients, SCG).
Example data-set: oil flow data. Three phases of flow (stratified, annular, homogeneous). Twelve measurement probes. 1000 data-points; we sub-sampled to 100 data-points.
18
SCG GPLVM Oil Results 2-D Manifold in 12-D space (shading is precision).
19
Efficient GPLVM Optimisation
Optimising the q×N matrix X is slow. There are correlations between data-points.
20
‘Sparsification’
If X_I is a sub-set of X, then for a well chosen active set, I, with |I| ≪ N, and for each n ∉ I, we can optimise the q-dimensional x_(n) independently.
21
Algorithm
We selected the active set according to the IVM scheme.
Select active set.
Optimise α, β and γ.
For all n ∉ I optimise x_n.
For small data-sets optimise X_I.
Repeat.
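A rough, runnable numpy/scipy sketch of this loop. It is not the talk's implementation: the active set is chosen at random as a stand-in for the IVM criterion, the kernel parameters are held fixed rather than optimised, and scipy's default optimiser is used in place of SCG. Each x_n outside the active set is optimised against the GP predictive density of y_n given the active set.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(Xa, Xb, alpha, gamma):
    d2 = np.sum((Xa[:, None, :] - Xb[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * d2)

def point_neg_log_lik(x_n, y_n, X_I, Y_I, alpha, gamma, beta):
    """Negative GP predictive log-density of y_n at latent point x_n, given the active set."""
    K_I = rbf(X_I, X_I, alpha, gamma) + np.eye(len(X_I)) / beta
    k_n = rbf(x_n[None, :], X_I, alpha, gamma).ravel()
    Kinv_k = np.linalg.solve(K_I, k_n)
    mean = Kinv_k @ Y_I                              # predictive mean, one per data dimension
    var = alpha + 1.0 / beta - k_n @ Kinv_k          # shared predictive variance
    d = len(y_n)
    return 0.5 * (d * np.log(2.0 * np.pi * var) + np.sum((y_n - mean) ** 2) / var)

def sparse_gplvm(Y, q=2, n_active=20, n_iters=3,
                 alpha=1.0, gamma=1.0, beta=100.0, seed=0):
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    X = np.linalg.svd(Yc, full_matrices=False)[0][:, :q]   # PCA-style initialisation
    for _ in range(n_iters):
        I = rng.choice(N, size=n_active, replace=False)    # stand-in for IVM selection
        X_I, Y_I = X[I], Yc[I]
        # Kernel parameters alpha, beta, gamma would be optimised here; fixed in this sketch.
        for n in range(N):
            if n in I:
                continue                                   # active points handled separately
            res = minimize(point_neg_log_lik, X[n],
                           args=(Yc[n], X_I, Y_I, alpha, gamma, beta))
            X[n] = res.x
    return X

X = sparse_gplvm(np.random.randn(80, 12))
print(X.shape)    # (80, 2)
```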
22
Some Results
As well as the RBF kernel we will use an ‘MLP kernel’.
Revisit the oil data, with the full training set this time. Used the RBF kernel for the GPLVM. Compare with GTM.
23
Oil Data: GTM and GPLVM (RBF) embeddings shown.
24
Different Kernels: RBF kernel and MLP kernel embeddings shown.
25
Classification in Latent Space
Classify flow regimes in latent space. Test errors:
GPLVM RBF: 4.3 %
GPLVM MLP: 3.0 %
GTM: 2.0 %
PCA: 14 %
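The slide does not say which classifier was used; as an illustration of the general idea, here is a sketch of nearest-neighbour classification in a learned latent space, with synthetic latent coordinates and labels standing in for the oil data:

```python
import numpy as np

def nn_error_rate(X_train, y_train, X_test, y_test):
    """1-nearest-neighbour error rate in the latent space."""
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    predictions = y_train[np.argmin(d2, axis=1)]
    return np.mean(predictions != y_test)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                 # stand-in latent coordinates
labels = rng.integers(0, 3, size=300)         # stand-in flow-regime labels
print(nn_error_rate(X[:200], labels[:200], X[200:], labels[200:]))
```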
26
Swiss Roll - Initialisation Aside
27
Local Minima: embeddings shown with PCA initialisation and with Isomap initialisation.
28
Digits Data
Digits 0 to 4 from the USPS data. 600 of each digit randomly selected. 16x16 greyscale images.
29
Digits: RBF kernel and MLP kernel embeddings shown.
0 – red, 1 – green, 2 – blue, 3 – cyan, 4 – magenta.
30
PCA and GTM for Digits: GTM and PCA embeddings shown.
31
Digits Classifications
Test errors:
GPLVM RBF: 5.9 %
GPLVM MLP: 5.8 %
GTM: 3.7 %
PCA: 29 %
32
Demos
Fantasy Digits.
Face data: a video of Brendan Frey’s face, 1965 frames with time information removed, 20x28 greyscale images. Fantasy Brendans.
33
Twos Data
So far – Gaussian noise. Can use a Bernoulli likelihood. Use the ADF approximation; can easily extend to EP.
Practical consequences: about d times slower, needs d times more storage.
Twos data – 8x8 binary images modelled with a Gaussian noise model and a Bernoulli noise model.
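For concreteness, the difference between the two point likelihoods can be sketched as below. This is a generic illustration, not the talk's implementation: a sigmoid link is assumed for the Bernoulli model, and the ADF/EP machinery for propagating it through the GP is not shown.

```python
import numpy as np

def gaussian_log_lik(y, f, beta=100.0):
    """log N(y | f, beta^-1) for a real-valued pixel y given GP function value f."""
    return 0.5 * (np.log(beta) - np.log(2.0 * np.pi) - beta * (y - f) ** 2)

def bernoulli_log_lik(y, f):
    """log p(y | f) for a binary pixel y in {0, 1}, sigmoid link on f."""
    p = 1.0 / (1.0 + np.exp(-f))
    return y * np.log(p) + (1 - y) * np.log(1 - p)

print(gaussian_log_lik(1.0, 0.8), bernoulli_log_lik(1, 0.8))
```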
34
Twos Data
Cedar CD-ROM digits. 700 examples of 8x8 twos. Binary images.
35
Twos Results: Gaussian noise model and Bernoulli noise model embeddings shown.
36
Reconstruction Experiment
Pixel error rates:
GPLVM, Bernoulli noise: 23.5 %
GPLVM, Gaussian noise: 35.9 %
Missing pixels not ink: 51.5 %
37
Horse Colic Data
Ordinal data. Many missing values.
38
Horse Colic Data
Linear, MLP and RBF kernel embeddings shown. Legend: death – green, survival – red, put down – blue.
39
Classifying Colic Outcome
Increase the dimension of the latent space; there is a corresponding decrease in outcome prediction error. Experiments were repeated for different train/test partitions; the trend is similar for each partition.
40
Inverse Kinematics
Style-Based Inverse Kinematics. Keith Grochow, Steve L. Martin, Aaron Hertzmann, Zoran Popović. ACM Trans. on Graphics (Proc. SIGGRAPH 2004). Learn a GPLVM on motion capture data. Use the GPLVM as a ‘soft style constraint’ in combination with hard kinematic constraints.
41
Video styleik.mov
42
Why GPLVM in IK? My thoughts:
GPLVM is probabilistic (soft constraints). GPLVM can capture non-linearities in the data. Inverse kinematics can be viewed as a missing value problem, and GPLVM handles missing values well. Grochow et al. also mix styles by mixing GPLVMs.
43
On-line Code
Source code for the GPLVM was made available in June ’03. Grochow et al. downloaded it in August ’03 and submitted to SIGGRAPH in January ’04. Sheffield has sub-licensed the source code to the University of Washington. On-line source code is good!
44
Kernel PCA, MDS and GPLVMs
Maximum likelihood ≡ minimum Kullback-Leibler divergence. PCA – minimise the KL divergence between Gaussians. K_y non-linear – kernel PCA. K_x non-linear – GPLVM.
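For reference, the KL divergence between two zero-mean Gaussians with covariances K_y and K_x is KL = ½[tr(K_x⁻¹K_y) − N + ln|K_x| − ln|K_y|]. A short numpy sketch of this generic quantity (the slide does not spell out which kernel plays which role, so the covariances below are arbitrary):

```python
import numpy as np

def kl_zero_mean_gaussians(K_y, K_x):
    """KL( N(0, K_y) || N(0, K_x) ) for positive definite covariances."""
    N = K_y.shape[0]
    _, logdet_x = np.linalg.slogdet(K_x)
    _, logdet_y = np.linalg.slogdet(K_y)
    return 0.5 * (np.trace(np.linalg.solve(K_x, K_y)) - N + logdet_x - logdet_y)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(50, 5)), rng.normal(size=(50, 5))
K_y = A @ A.T + 1e-2 * np.eye(50)
K_x = B @ B.T + 1e-2 * np.eye(50)
print(kl_zero_mean_gaussians(K_y, K_x))
```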
45
KL Divergence as an Objective
Points towards what it means to use a non-positive definite similarity measure in metric MDS. Should provide an approach to dealing with missing data in Kernel PCA. Usable for kernel parameter optimisation in KPCA? Unifies GPLVM, PCA, Kernel PCA, metric MDS in one framework.
46
Interpretation Kernels are similarity measures.
Express correlations between points. GPLVM and KPCA try to match them. cf. MDS and principal co-ordinate analysis.
47
Conclusions GPLVM is a Probabilistic Non-Linear PCA
Can sample from it. Evaluate likelihoods. Handle non-continuous data. Missing data no problem. Optimisation of X is the difficult part. We presented a sparse optimisation algorithm. The model has been ‘proven’ in a real application. Put your source code on-line!