Variational Autoencoders Theory and Extensions

Variational Autoencoders: Theory and Extensions
Xiao Yang, Deep Learning Journal Club, March 29

Variational Inference
Use a simple distribution to approximate a complex distribution.
Variational parameters:
- Gaussian distribution: μ, σ
- Gaussian mixture: [μ], [σ], [w]

Autoencoder variants: basic, denoising, variational (figure from http://www.cs.unc.edu/~eunbyung/papers/manifold_variational.pdf)

Why Variational Autoencoders?
...when we have Boltzmann Machines?
- Directed models are more useful these days
- A recurrent model cannot be built on top of a Boltzmann Machine
- It needs to be deep, and even then depth does not help that much
...when we have denoising autoencoders?*
- Mathematically reasonable, but the results are underwhelming
- Hyperparameters must be tuned by hand, and the learned representation is not very expressive
*Generalized Denoising Auto-Encoders as Generative Models, Yoshua Bengio et al., NIPS 2013

Variational Autoencoders Auto-Encoding Variational Bayes, Diederik P. Kingma, Max Welling, ICLR 2014

Theory: Variational Inference
x: data
z: latent variable (hidden layer values)
ϕ: inference network parameters (encoder q_ϕ(z|x))
θ: generative network parameters (decoder p_θ(x|z))
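As a concrete anchor for this notation, here is a minimal sketch of the two networks in PyTorch; the layer sizes and the MLP architecture are illustrative assumptions, not details from the slides.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Inference network q_phi(z|x): maps data to the parameters of a Gaussian over z.
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):   # sizes are assumptions
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q_phi(z|x)
        self.log_var = nn.Linear(h_dim, z_dim)   # log-variance of q_phi(z|x)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):
    # Generative network p_theta(x|z): maps a latent code back to pixel space.
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)                       # Bernoulli mean of p_theta(x|z)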

Theory: Variational Inference
Posterior distribution: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x)
Goal: use the variational posterior q_ϕ(z|x) to approximate the true posterior p_θ(z|x).
The true posterior is intractable, because the marginal likelihood p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz has no closed form.

Theory: Variational Inference
Minimize the KL divergence between the variational posterior and the true posterior: KL(q_ϕ(z|x) || p_θ(z|x)).
Finding 1: log p_θ(x) is constant with respect to ϕ, so minimizing the KL divergence is the same as maximizing L(θ, ϕ; x) = log p_θ(x) - KL(q_ϕ(z|x) || p_θ(z|x)).
Finding 2: the KL divergence is non-negative, so L(θ, ϕ; x) ≤ log p_θ(x): a variational lower bound on the data likelihood.
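The step from the KL divergence to the lower bound is the standard decomposition, written out here for completeness (it is not spelled out on the slide):

\begin{align*}
\mathrm{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)
  &= \mathbb{E}_{q_\phi(z|x)}\!\left[\log q_\phi(z|x) - \log p_\theta(z|x)\right] \\
  &= \mathbb{E}_{q_\phi(z|x)}\!\left[\log q_\phi(z|x) - \log p_\theta(x,z)\right] + \log p_\theta(x),
\end{align*}

so that

\begin{align*}
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x,z) - \log q_\phi(z|x)\right]}_{\mathcal{L}(\theta,\phi;x)}
  + \mathrm{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)
  \;\ge\; \mathcal{L}(\theta,\phi;x).
\end{align*}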

Variational Lower Bound of the data likelihood
L(θ, ϕ; x) = -KL(q_ϕ(z|x) || p_θ(z)) + E_{q_ϕ(z|x)}[log p_θ(x|z)]
Regularization term: -KL(q_ϕ(z|x) || p_θ(z)), keeps the posterior close to the prior.
Reconstruction term: E_{q_ϕ(z|x)}[log p_θ(x|z)], the expected log-likelihood of the data.
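For the common choice p_θ(z) = N(0, I) and q_ϕ(z|x) = N(μ, diag(σ²)), the setting used in the Kingma & Welling paper, the regularization term has a closed form:

\begin{align*}
-\mathrm{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z)\right)
  = \frac{1}{2}\sum_{j=1}^{J}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right),
\end{align*}

while the reconstruction term is estimated by sampling z from q_ϕ(z|x) and evaluating log p_θ(x|z).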

The Reparameterization Trick
Problem with optimizing the variational lower bound: updating ϕ.
With z ~ q_ϕ(z|x), the gradient has to pass through the sampling process with respect to ϕ (the encoder is probabilistic).

The Reparameterization Trick
Solution: make the randomness independent of the encoder output, so the encoder becomes deterministic.
Gaussian example (a code sketch follows below):
- Previously: encoder output = random variable z ~ N(μ, σ)
- Now: encoder output = distribution parameters [μ, σ], and z = μ + ε ∗ σ with ε ~ N(0, 1)
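A minimal sketch of the trick and of the resulting loss in PyTorch, reusing the Encoder/Decoder sketched above; the Bernoulli reconstruction term is an illustrative choice for binary data.

import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    # z = mu + eps * sigma, with the randomness eps drawn independently of phi
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)            # eps ~ N(0, I)
    return mu + eps * std                  # differentiable w.r.t. mu and log_var

def vae_loss(x, encoder, decoder):
    mu, log_var = encoder(x)               # deterministic encoder output: [mu, sigma]
    z = reparameterize(mu, log_var)
    x_recon = decoder(z)
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')      # -E_q[log p_theta(x|z)]
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())   # KL(q_phi(z|x) || N(0, I))
    return recon + kl                      # negative variational lower bound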

Result

Result

Importance Weighted Autoencoders Importance Weighted Autoencoders, Yuri Burda, Roger Grosse & Ruslan Salakhutdinov, ICLR 2016

Different lower bounds
Lower bound for the VAE: L_VAE = E_{z ~ q_ϕ(z|x)}[log (p_θ(x, z) / q_ϕ(z|x))]
Lower bound for the IWAE: L_k = E_{z_1,…,z_k ~ q_ϕ(z|x)}[log (1/k) Σ_i p_θ(x, z_i) / q_ϕ(z_i|x)]
Differences:
- a single z vs. multiple independent z samples
- different weighting when multiple z are sampled
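A sketch of the k-sample IWAE bound, reusing the encoder/decoder and the Gaussian posterior from above; the Bernoulli likelihood and the standard normal prior are assumptions carried over from the VAE sketch.

import torch
import torch.nn.functional as F
import torch.distributions as D

def iwae_bound(x, encoder, decoder, k=5):
    mu, log_var = encoder(x)
    q = D.Normal(mu, torch.exp(0.5 * log_var))
    prior = D.Normal(torch.zeros_like(mu), torch.ones_like(mu))
    z = q.rsample((k,))                    # k independent z_i ~ q_phi(z|x), reparameterized
    x_recon = decoder(z)                   # p_theta(x|z_i), broadcast over the k samples
    log_px_z = -F.binary_cross_entropy(x_recon, x.expand_as(x_recon), reduction='none').sum(-1)
    log_w = log_px_z + prior.log_prob(z).sum(-1) - q.log_prob(z).sum(-1)   # log importance weights
    # log (1/k) sum_i w_i; differentiating this expression yields the importance-weighted gradient
    return torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(k)))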

Sampling difference
VAE: one random z, sampled k times. Gradient estimate: (1/k) Σ_i ∇ log w(x, z_i), with w(x, z) = p_θ(x, z) / q_ϕ(z|x).
IWAE: k independent random z, each sampled once. Gradient estimate: Σ_i w̃_i ∇ log w(x, z_i), with normalized importance weights w̃_i = w_i / Σ_j w_j.

Sampling difference
VAE gradient: plain Monte Carlo averaging of the per-sample gradients.
IWAE gradient: importance-weighted averaging of the per-sample gradients. The derivation is sketched below.
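The two weightings follow directly from differentiating the two bounds, with w_i = p_θ(x, z_i)/q_ϕ(z_i|x):

\begin{align*}
\nabla\,\frac{1}{k}\sum_{i=1}^{k}\log w_i
  &= \frac{1}{k}\sum_{i=1}^{k}\nabla\log w_i
  && \text{(VAE: uniform Monte Carlo weights)} \\
\nabla\,\log\frac{1}{k}\sum_{i=1}^{k} w_i
  &= \sum_{i=1}^{k}\tilde{w}_i\,\nabla\log w_i,
  \quad \tilde{w}_i = \frac{w_i}{\sum_j w_j}
  && \text{(IWAE: normalized importance weights)}
\end{align*}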

Result

Posterior heatmaps (figure): VAE; IWAE, k=5; IWAE, k=50

Denoising Variational Autoencoders Denoising Criterion for Variational Auto-encoding Framework, Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio, http://arxiv.org/abs/1511.06406

Denoising for Variational Autoencoders?
Variational autoencoder: uncertainty in the hidden layer.
Denoising autoencoder: noise in the input layer.
Can the two be combined?

Posterior for the Denoising VAE
Image corruption distribution (adding noise): p(x̃|x)
Original variational posterior (encoder network): q_ϕ(z|x̃)
Variational posterior for denoising: q̃_ϕ(z|x) = ∫ q_ϕ(z|x̃) p(x̃|x) dx̃

Posterior for the Denoising VAE
q_ϕ(z|x̃) is a Gaussian; the denoising posterior q̃_ϕ(z|x), obtained by integrating over the corruption, is a mixture of Gaussians.

What does this lower bound even mean?
Maximizing L_VAE = minimizing KL(q_ϕ(z|x) || p_θ(z|x)).
Maximizing L_DVAE = minimizing KL(q̃_ϕ(z|x) || p_θ(z|x)).
The denoising posterior tends to be more robust!

Training procedure
1. Add noise to the input, then feed it to the network.
2. That is it. Nothing else changes.
The denoising criterion can be used with both the VAE and the IWAE. A training-step sketch follows below.
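A sketch of one training step under these rules, reusing reparameterize from the earlier VAE sketch; the isotropic Gaussian corruption, the noise level, and reconstructing the clean input are illustrative choices, not prescriptions from the paper.

import torch
import torch.nn.functional as F

def dvae_step(x, encoder, decoder, optimizer, noise_std=0.1):
    # 1. Add noise to the input: x_tilde ~ p(x_tilde|x), here an isotropic Gaussian.
    x_tilde = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
    # 2. That is it: below is the unchanged VAE objective, except that the encoder
    #    sees the corrupted x_tilde while the clean x is the reconstruction target.
    mu, log_var = encoder(x_tilde)
    z = reparameterize(mu, log_var)
    x_recon = decoder(z)
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    loss = recon + kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()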

Test result

Test result

Deep Convolutional Inverse Graphics Network
Tejas D. Kulkarni, Will Whitney, Pushmeet Kohli, Joshua B. Tenenbaum, NIPS 2015

Hidden Layer = Transformation attributes

Transformation-specific training

Manipulating Image = Changing Hidden Layer Value
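A hedged sketch of what "changing a hidden layer value" looks like in code: once a latent unit has been tied to a transformation attribute during training (pose, light, and so on), images are manipulated by decoding while sweeping only that unit. The unit index and the value range here are hypothetical.

import torch

@torch.no_grad()
def manipulate(decoder, z, unit=0, values=(-3.0, -1.5, 0.0, 1.5, 3.0)):
    # Sweep a single hidden-layer value while keeping the rest of the code fixed,
    # and decode each variant to see the corresponding image transformation.
    frames = []
    for v in values:
        z_new = z.clone()
        z_new[:, unit] = v
        frames.append(decoder(z_new))
    return torch.stack(frames)             # one decoded image per swept value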

DRAW: A Recurrent Neural Network For Image Generation Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra ICML 2015

Variational Recurrent Network
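A much-simplified sketch of the recurrent variational loop in DRAW, without the attention mechanism; the GRU cells, layer sizes, and number of steps are assumptions, and only the overall structure (one latent per time step, an additive canvas, a summed KL) follows the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDRAW(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=10, steps=8):
        super().__init__()
        self.steps = steps
        self.enc_rnn = nn.GRUCell(2 * x_dim + h_dim, h_dim)  # reads x, the error image, and h_dec
        self.dec_rnn = nn.GRUCell(z_dim, h_dim)
        self.q_mu = nn.Linear(h_dim, z_dim)
        self.q_log_var = nn.Linear(h_dim, z_dim)
        self.write = nn.Linear(h_dim, x_dim)                 # additive update to the canvas

    def forward(self, x):
        canvas = torch.zeros_like(x)
        h_enc = x.new_zeros(x.size(0), self.enc_rnn.hidden_size)
        h_dec = x.new_zeros(x.size(0), self.dec_rnn.hidden_size)
        kl = 0.0
        for _ in range(self.steps):
            err = x - torch.sigmoid(canvas)                  # what the canvas still gets wrong
            h_enc = self.enc_rnn(torch.cat([x, err, h_dec], dim=1), h_enc)
            mu, log_var = self.q_mu(h_enc), self.q_log_var(h_enc)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # one latent per step
            kl = kl - 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
            h_dec = self.dec_rnn(z, h_dec)
            canvas = canvas + self.write(h_dec)              # iteratively refine the image
        recon = F.binary_cross_entropy_with_logits(canvas, x, reduction='sum')
        return recon + kl                                    # negative lower bound over all steps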

Example

Further reading
Adversarial Autoencoders, A. Makhzani et al., ICLR 2016.
Adversarial learning for a better posterior representation.

Further reading
The Variational Fair Autoencoder, Christos Louizos et al., ICLR 2016.
Removes unwanted sources of variation from the data.

Further reading
The Variational Gaussian Process, Dustin Tran et al., ICLR 2016.
A generalization of variational inference for deep networks; can model highly complex posteriors.