Goodfellow: Chapter 14 Autoencoders

1 Goodfellow: Chapter 14 Autoencoders
Dr. Charles Tappert. The information here, although greatly condensed, comes almost entirely from the chapter content.

2 Chapter 14 Sections
Introduction
1. Undercomplete Autoencoders
2. Regularized Autoencoders
3. Representational Power, Layer Size and Depth
4. Stochastic Encoders and Decoders
5. Denoising Autoencoders
6. Learning Manifolds with Autoencoders
7. Contractive Autoencoders
8. Predictive Sparse Decomposition
9. Applications of Autoencoders

3 Introduction
An autoencoder is a neural network trained to copy its input to its output
The network has an encoder function and a decoder function
An autoencoder should not copy its input perfectly; it is restricted by design to copy only approximately
By doing so, it learns useful properties of the data
Modern autoencoders generalize these functions to stochastic mappings
Autoencoders have traditionally been used for dimensionality reduction and feature learning

4 Structure of an Autoencoder
Figure 14.1: The general structure of an autoencoder, mapping an input x through an internal representation or code h = f(x) in the hidden layer to a reconstruction r = g(h). (Goodfellow 2016)

5 1. Undercomplete Autoencoders
Design the autoencoder to copy only approximately; by doing so, it learns useful properties of the data
One way is to constrain the code h to have smaller dimension than the input x
Undercomplete: h has smaller dimension than x
Overcomplete: h has greater dimension than x
The learning process minimizes a loss function L(x, g(f(x))) that penalizes the reconstruction for being dissimilar from x
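A minimal sketch of such an undercomplete autoencoder in PyTorch (not from the slides; the layer sizes, code dimension, and training settings are arbitrary illustrations):

import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, n_in=784, n_code=32):
        super().__init__()
        # encoder f: maps the input x to a lower-dimensional code h
        self.f = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_code))
        # decoder g: maps the code h back to a reconstruction of x
        self.g = nn.Sequential(nn.Linear(n_code, 128), nn.ReLU(), nn.Linear(128, n_in))

    def forward(self, x):
        h = self.f(x)      # code
        return self.g(h)   # reconstruction r = g(f(x))

# toy training loop on random data, just to show the objective L(x, g(f(x)))
model = UndercompleteAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)    # stand-in for a batch of inputs
for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)   # reconstruction error
    loss.backward()
    opt.step()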

6 1. Undercomplete Autoencoders
Principal Component Analysis (PCA)
An undercomplete autoencoder with a linear decoder and MSE loss learns to span the same subspace as PCA
Nonlinear encoder and decoder functions yield more powerful nonlinear generalizations of PCA
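A toy check of this relationship (an illustration, not from the chapter; the data, rank, and optimizer settings are arbitrary, and the step count or learning rate may need tuning): train a purely linear autoencoder with MSE and compare its reconstructions with the rank-k PCA reconstruction obtained from the SVD.

import torch

torch.manual_seed(0)
X = torch.randn(500, 20) @ torch.randn(20, 20)   # toy data: 500 samples, 20 features
X = X - X.mean(dim=0)                            # center the data, as PCA assumes
k = 5                                            # code dimension / number of components

# purely linear autoencoder trained with MSE
W_e = torch.randn(20, k, requires_grad=True)     # encoder weights
W_d = torch.randn(k, 20, requires_grad=True)     # decoder weights
opt = torch.optim.Adam([W_e, W_d], lr=1e-2)
for step in range(3000):
    opt.zero_grad()
    loss = (((X @ W_e) @ W_d - X) ** 2).mean()
    loss.backward()
    opt.step()

# PCA reconstruction using the top-k right singular vectors
U, S, Vh = torch.linalg.svd(X, full_matrices=False)
X_pca = (X @ Vh[:k].T) @ Vh[:k]

# the two reconstructions should approach each other as training converges
print(((X @ W_e) @ W_d - X_pca).abs().max())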

7 Avoiding Trivial Identity
Undercomplete autoencoders
h has lower dimension than x
f or g has low capacity (e.g., linear g)
Must discard some information in h
Overcomplete autoencoders
h has higher dimension than x
Must be regularized

8 2. Regularized Autoencoders
Allow the overcomplete case, but regularize
Use a loss function that encourages properties other than copying the input to the output, such as:
Sparsity of the representation
Robustness to noise or to missing inputs
Smallness of the derivative of the representation

9 2.1 Sparse Autoencoders
Limit capacity by adding a term Ω(h) to the cost function that penalizes the code for being large: L(x, g(f(x))) + Ω(h), for example Ω(h) = λ Σ_i |h_i|
Sparse autoencoders are usually used to learn features for another task, such as classification
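A minimal sketch with an L1 penalty on the code (the layer sizes and penalty weight are arbitrary illustrations; the ReLU keeps code values non-negative so the penalty drives many of them toward zero):

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(784, 1024), nn.ReLU())   # overcomplete encoder: code larger than input
g = nn.Linear(1024, 784)                              # decoder
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
lam = 1e-3                                            # sparsity weight (illustrative)

x = torch.rand(64, 784)                               # stand-in batch
for step in range(100):
    opt.zero_grad()
    h = f(x)
    # L(x, g(f(x))) + lambda * sum_i |h_i|
    loss = nn.functional.mse_loss(g(h), x) + lam * h.abs().sum(dim=1).mean()
    loss.backward()
    opt.step()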

10 2.2 Denoising Autoencoder
Rather than adding a penalty to the cost function, we minimize L(x, g(f(x̃))), where x̃ is a copy of x that has been corrupted by some form of noise
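A couple of typical corruption processes C(x̃ | x), sketched as a helper function (the noise level and masking probability are arbitrary illustrations):

import torch

def corrupt(x, noise_std=0.3, drop_prob=0.0):
    """Sample x_tilde ~ C(x_tilde | x): add Gaussian noise and optionally zero out random entries."""
    x_tilde = x + noise_std * torch.randn_like(x)
    if drop_prob > 0:
        x_tilde = x_tilde * (torch.rand_like(x) > drop_prob).float()
    return x_tilde

x = torch.rand(64, 784)
x_tilde = corrupt(x)   # the autoencoder sees x_tilde but is trained to reconstruct x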

11 2.2 Denoising Autoencoder
Figure 14.3: The computational graph of the cost for a denoising autoencoder. A corruption process C(x̃ | x) introduces noise, the encoder computes h = f(x̃), and the loss is L = - log pdecoder(x | h = f(x̃)). (Goodfellow 2016)

12 2.3 Regularizing by Penalizing Derivatives
Cost function with a derivative penalty: L(x, g(f(x))) + Ω(h, x), with Ω(h, x) = λ Σ_i ||∇_x h_i||²
This forces the model to learn a function that does not change much when x changes slightly (see the Jacobian-penalty sketch under the contractive autoencoder section below)

13 3. Representational Power, Layer Size, and Depth
Autoencoders are often trained with only a single-layer encoder and a single-layer decoder
Using deep encoders and decoders offers the usual advantages of depth in feedforward networks

14 4. Stochastic Encoders and Decoders
Modern autoencoders use stochastic mappings
We can generalize the notion of the encoding and decoding functions to encoding and decoding distributions, pencoder(h | x) and pdecoder(x | h)
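A sketch of the decoding-distribution view (an illustration, not the chapter's code): if the decoder output is taken as the mean of a fixed-variance Gaussian pdecoder(x | h), then minimizing -log pdecoder(x | h) is, up to scale and an additive constant, the familiar squared-error loss.

import torch
import torch.nn as nn
from torch.distributions import Normal

f = nn.Linear(784, 32)                   # encoder (kept deterministic here for simplicity)
g = nn.Linear(32, 784)                   # decoder output = mean of p_decoder(x | h)

x = torch.rand(16, 784)
h = f(x)
p_decoder = Normal(loc=g(h), scale=1.0)  # Gaussian decoding distribution with fixed variance
nll = -p_decoder.log_prob(x).sum(dim=1).mean()   # -log p_decoder(x | h)
# with a fixed-variance Gaussian this negative log-likelihood reduces to MSE up to constants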

15 Stochastic Autoencoders
Figure 14.2: The structure of a stochastic autoencoder, with an encoding distribution pencoder(h | x) and a decoding distribution pdecoder(x | h). (Goodfellow 2016)

16 5. Denoising Autoencoders
A denoising autoencoder (DAE) receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as its output
It learns the reconstruction distribution preconstruct(x | x̃) as follows:
Choose a training sample x from the training data
Obtain a corrupted version x̃ from the corruption process C(x̃ | x)
Use the pair (x, x̃) as a training example for estimating the reconstruction distribution
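A minimal sketch of that training procedure (the dimensions, noise level, and optimizer settings are arbitrary illustrations):

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))   # encoder
g = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))   # decoder
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

data = torch.rand(1024, 784)                 # stand-in for the training set
for step in range(200):
    idx = torch.randint(0, data.shape[0], (64,))
    x = data[idx]                            # 1. choose a training sample x
    x_tilde = x + 0.3 * torch.randn_like(x)  # 2. corrupt it: x_tilde ~ C(x_tilde | x)
    recon = g(f(x_tilde))                    # reconstruct from the corrupted input
    loss = nn.functional.mse_loss(recon, x)  # 3. compare with the clean x, not x_tilde
    opt.zero_grad()
    loss.backward()
    opt.step()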

17 Denoising Autoencoder
Figure 14.3 (repeated): A corrupted sample x̃ ~ C(x̃ | x) is encoded to h = f(x̃), and the loss L = - log pdecoder(x | h = f(x̃)) compares the reconstruction with the clean x. (Goodfellow 2016)

18 5.1 Estimating the Score
Score matching is a statistical alternative to maximum likelihood; the score of a density is the gradient field ∇_x log p(x)
It fits a density model by matching the score of the model to the score of the data
Some denoising autoencoders are equivalent to score matching applied to density models

19 5.1 Estimating the Score
The following background on manifolds comes from section 5.11.3:
A manifold is a connected region
Locally, it appears to be a Euclidean space
Its dimension is smaller than that of the space it is embedded in
Example: we experience the surface of the Earth as a 2-D plane, even though it sits in 3-D space

20 Denoising Autoencoders
Learn a Manifold
A corrupted point x̃ ~ C(x̃ | x) is locally mapped back to the original point x by the composition g ∘ f
Figure 14.4 (Goodfellow 2016)

21 Vector Field Learned by a Denoising Autoencoder
2D vector field around a 1D curved manifold where the data concentrates (Goodfellow 2016)

22 6. Learning Manifolds with Autoencoders
Like other machine learning algorithms, autoencoders exploit the idea that data concentrates around a low-dimensional manifold
Autoencoders take this idea further and aim to learn the structure of the manifold itself

23 Tangent Hyperplane of a Manifold
1D manifold in 784-D pixel MNIST space Figure 14.6 (Goodfellow 2016)

24 Learning a Collection of 0-D Manifolds by Resisting Perturbation
Figure 14.7: The reconstruction function r(x), compared with the identity and the optimal reconstruction, is invariant to small perturbations near the data points x0, x1, and x2. (Goodfellow 2016)

25 Non-Parametric Manifold Learning with Nearest-Neighbor Graphs
Figure 14.8: Non-parametric manifold learning with a nearest-neighbor graph. (Goodfellow 2016)

26 Tiling a Manifold with Local Coordinate Systems
Each local patch is a locally flat Gaussian “pancake”
Figure 14.9 (Goodfellow 2016)

27 7. Contractive Autoencoders
The contractive autoencoder (CAE) uses a regularizer to make the derivatives of f(x) as small as possible
The name “contractive” arises from the way the CAE warps space: an input neighborhood is contracted to a smaller output neighborhood
The CAE is contractive only locally
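A sketch of the penalty computed with automatic differentiation for a single example (an illustration; the sizes and penalty weight are arbitrary, and computing per-example Jacobians this way is slow but makes the definition explicit):

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

f = nn.Sequential(nn.Linear(20, 10), nn.Tanh())   # encoder f(x)
g = nn.Linear(10, 20)                             # decoder
lam = 0.1                                         # penalty weight (illustrative)

x = torch.rand(20)                                # a single input vector
# df(x)/dx, shape (10, 20); create_graph=True so the penalty can be backpropagated into the weights
J = jacobian(f, x, create_graph=True)
omega = lam * (J ** 2).sum()                      # lambda * squared Frobenius norm of the Jacobian
loss = ((g(f(x)) - x) ** 2).mean() + omega        # reconstruction error + CAE penalty
loss.backward()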

28 Contractive Autoencoders
The CAE regularizer is the squared Frobenius norm of the encoder's Jacobian: Ω(h) = λ ||∂f(x)/∂x||²_F (14.18)
Figure 14.10: Tangent vectors at an input point estimated by local PCA (no sharing across regions) and by a contractive autoencoder. (Goodfellow 2016)

29 8. Predictive Sparse Decomposition
Predictive sparse decomposition (PSD) is a hybrid of sparse coding and parametric autoencoders
The model consists of an encoder and a decoder that are both parametric
Predictive sparse coding is an example of learned approximate inference (section 19.5)
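Training minimizes a criterion of the form ||x - g(h)||² + λ|h|_1 + γ||h - f(x)||², alternating between the code h and the model parameters. A rough sketch of one such alternation (the sizes, weights, and step counts are arbitrary illustrations; per-element means are used for stable step sizes):

import torch
import torch.nn as nn

f = nn.Linear(100, 25)                       # parametric encoder: predicts the code
g = nn.Linear(25, 100)                       # parametric decoder
lam, gamma = 0.1, 1.0                        # illustrative weights

x = torch.rand(8, 100)

# step 1: optimize the code h itself (as in sparse coding), holding f and g fixed
h = f(x).detach().clone().requires_grad_(True)
h_opt = torch.optim.SGD([h], lr=0.1)
for _ in range(50):
    h_opt.zero_grad()
    cost = ((x - g(h)) ** 2).mean() + lam * h.abs().mean() + gamma * ((h - f(x)) ** 2).mean()
    cost.backward()
    h_opt.step()

# step 2: update the encoder/decoder parameters toward the optimized code, holding h fixed
p_opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
p_opt.zero_grad()                            # also clears gradients accumulated on the parameters above
cost = ((x - g(h.detach())) ** 2).mean() + gamma * ((h.detach() - f(x)) ** 2).mean()
cost.backward()
p_opt.step()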

30 9. Applications of Autoencoders
Autoencoder applications:
Feature learning: good features can be obtained in the hidden layer
Dimensionality reduction: for example, a 2006 study yielded better results than PCA, with a representation that was easier to interpret and in which the categories manifested as well-separated clusters
Information retrieval: a task that benefits more than usual from dimensionality reduction is finding entries in a database that resemble a query entry

