PixelGAN Autoencoders
Alireza Makhzani, Brendan Frey — University of Toronto
Presented by Liu Ze, University of Science and Technology of China, Dec 30th, 2017
Outline
1. Background
   - PixelCNNs
   - Variational Autoencoders
   - Adversarial Autoencoders
2. PixelGAN Autoencoders
   - Limitations of VAE/AAE
   - Structure and Training
   - Benefits of PixelGAN Autoencoders
3. Experiments
4. Conclusion
PixelCNNs
Conditional PixelCNNs
[Figure: a PixelCNN conditioned on a latent/label vector h]
✦ Learn the image statistics directly at the pixel level.
✦ Good at modelling low-level pixel statistics.
✦ Conditional PixelCNNs can learn conditional densities p(x|h) given a conditioning vector h.
✦ Samples lack global structure.
✦ No latent representation is learned.
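For reference, a conditional PixelCNN factorizes the image density autoregressively over pixels, conditioned on the vector h (this is the standard PixelCNN factorization):

$$p(\mathbf{x} \mid \mathbf{h}) \;=\; \prod_{i=1}^{n} p\big(x_i \mid x_1, \ldots, x_{i-1}, \mathbf{h}\big)$$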
Variational Autoencoders
✦ Good at capturing the global structure, but samples are blurry.
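For comparison, the VAE maximizes the evidence lower bound (ELBO); the KL term is what pulls the approximate posterior toward the prior:

$$\log p(\mathbf{x}) \;\ge\; \mathbb{E}_{q(\mathbf{z}\mid\mathbf{x})}\big[\log p(\mathbf{x}\mid\mathbf{z})\big] \;-\; \mathrm{KL}\big(q(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big)$$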
Adversarial Autoencoders
✦ Use a GAN on the code space to match the aggregated posterior q(z) to an arbitrary prior p(z).
[Figure: code space of MNIST under a Gaussian prior vs. a mixture of Gaussians]
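The adversarial autoencoder replaces the KL term with a GAN played on the code space, so any prior we can sample from works (this is the standard GAN value function, here applied to latent codes):

$$\min_{q}\;\max_{D}\;\; \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\big[\log D(\mathbf{z})\big] \;+\; \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\Big[\log\big(1 - D(\hat{\mathbf{z}})\big)\Big], \qquad \hat{\mathbf{z}} \sim q(\mathbf{z}\mid\mathbf{x})$$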
2. PixelGAN Autoencoders
Limitations of VAE/AAE
✦ All the image statistics are captured by the single latent vector.
  VAE/AAE: the latent variable p(z) must capture label, style, and both global and local statistics;
  the decoder p(x|z) is deterministic (factorized Gaussians) with no autoregressive component.
Structure and Training
✦ Cost function of the PixelGAN autoencoder = reconstruction cost + adversarial cost.
✦ Reconstruction phase: the conditional PixelCNN decoder p(x|z) is trained to maximize the likelihood of x given the inferred code z.
✦ Adversarial phase: a GAN on the code space matches the aggregated posterior q(z) to the prior p(z).
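A minimal sketch of the two training phases, assuming PyTorch. The module names (Encoder, PixelCNNDecoder, Discriminator) are hypothetical stand-ins, not the authors' released code, and the decoder below is a linear placeholder where a real implementation would use masked convolutions conditioned on z:

```python
# Minimal sketch of one PixelGAN autoencoder training step (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 16

class Encoder(nn.Module):                       # q(z|x): image -> latent code
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(784, 512),
                                 nn.ReLU(), nn.Linear(512, LATENT_DIM))
    def forward(self, x):
        return self.net(x)

class PixelCNNDecoder(nn.Module):               # p(x|z): placeholder; a real
    def __init__(self):                         # version uses masked convs
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 784)
    def forward(self, z):
        return self.net(z)                      # per-pixel Bernoulli logits

class Discriminator(nn.Module):                 # separates p(z) from q(z)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, z):
        return self.net(z)

enc, dec, disc = Encoder(), PixelCNNDecoder(), Discriminator()
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

def train_step(x):                              # x: (batch, 1, 28, 28), binary
    # Reconstruction phase: minimize the negative log-likelihood of x
    # under the decoder, conditioned on the inferred code z.
    recon = F.binary_cross_entropy_with_logits(dec(enc(x)), x.view(-1, 784))
    opt_ae.zero_grad(); recon.backward(); opt_ae.step()

    # Adversarial phase (discriminator): prior samples are "real",
    # encoder outputs are "fake".
    z = enc(x).detach()
    d_real, d_fake = disc(torch.randn_like(z)), disc(z)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Adversarial phase (generator): the encoder tries to make its
    # codes indistinguishable from prior samples.
    d_code = disc(enc(x))
    g_loss = F.binary_cross_entropy_with_logits(d_code, torch.ones_like(d_code))
    opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()
    return recon.item(), d_loss.item(), g_loss.item()
```

Each minibatch thus runs a reconstruction step on the autoencoder, a discriminator step, and a generator step on the encoder, mirroring the adversarial autoencoder training schedule.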
Benefits of PixelGAN Autoencoders
✦ The image statistics are captured jointly by the latent vector and the autoregressive decoder.
  PixelGAN (Gaussian prior):    latent variable p(z) -> global (low-frequency) statistics
                                PixelCNN p(x|z)      -> local (high-frequency) statistics
  PixelGAN (Categorical prior): latent variable p(z) -> discrete (label) information
                                PixelCNN p(x|z)      -> continuous (style) information
                                -> enables semi-supervised learning
3. Experiments
Global vs. Local Decomposition
✦ With a Gaussian prior and a PixelCNN decoder with a limited receptive field, the latent code captures the global (low-frequency) structure while the decoder models the local (high-frequency) details.
Code Space of MNIST
[Figure: learned code space of MNIST]
PixelGAN Autoencoders with Categorical Priors
✦ Imposing a categorical prior on the latent code encourages the encoder to put discrete content (e.g., the digit label) into z, leaving continuous style to the PixelCNN decoder.
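In the categorical case, a natural scheme (a sketch under that assumption, continuing the earlier hypothetical modules) is for the discriminator's positive samples to be one-hot draws from a uniform categorical prior, with the encoder's softmax output as the negatives:

```python
import torch
import torch.nn.functional as F

NUM_CLUSTERS = 10   # e.g., one code dimension per expected digit class

def sample_categorical_prior(batch_size, num_clusters=NUM_CLUSTERS):
    """Positive samples for the adversarial phase: one-hot vectors
    drawn from a uniform categorical prior p(z)."""
    idx = torch.randint(0, num_clusters, (batch_size,))
    return F.one_hot(idx, num_clusters).float()

# Negatives are the encoder's softmax output; the adversarial cost
# pushes it toward one-hot, cluster-like codes:
#   z_fake = F.softmax(encoder_logits, dim=1)
#   z_real = sample_categorical_prior(z_fake.size(0))
```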
Discrete vs. Continuous Decomposition (Clustering)
Unsupervised Clustering
✦ PixelGAN autoencoders with a categorical prior reach roughly a 5% error rate on unsupervised MNIST clustering.
Semi-supervised Learning
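One plausible way to use the few available labels, following the adversarial autoencoder recipe, is an extra supervised phase that trains the encoder's categorical output directly with cross-entropy on labeled minibatches (a sketch, assuming the hypothetical encoder from the earlier sketch now outputs class logits):

```python
import torch.nn.functional as F

def semi_supervised_step(enc, opt, x_labeled, y):
    """Extra phase on a labeled minibatch: the categorical code
    doubles as a classifier trained with cross-entropy."""
    logits = enc(x_labeled)            # categorical code = class logits
    loss = F.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```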
Semi-supervised Classification
4. Conclusion
Conclusion
A. Proposed the PixelGAN autoencoder, a generative autoencoder that combines a generative PixelCNN with a GAN inference network that can impose arbitrary priors on the latent code.
B. Showed that imposing different priors lets us learn latent representations that capture the statistics we care about, while the remaining structure of the image is captured by the PixelCNN decoder.
C. Demonstrated the application of PixelGAN autoencoders to downstream tasks such as semi-supervised learning, and discussed other potential applications such as learning cross-domain relations between two different domains.
Thank you!