Stacking RBMs and Auto-encoders for Deep Architectures
References: [Bengio, 2009], [Vincent et al., 2008]
2011/03/03, 강병곤

Introduction
Deep architectures for various levels of representation
– Implicitly learn representations
– Layer-by-layer unsupervised training
Generative model
– Stack Restricted Boltzmann Machines (RBMs)
– Forms a deep belief network (DBN)
Discriminative model
– Stack auto-encoders (AEs)
– Multi-layered classifier

Generative Model
Given a training set {x_i}, i = 1, …, n:
– Construct a generative model that produces samples from the same distribution
– Start with sigmoid belief networks
Need parameters for each component of the top-most layer, i.e. Bernoulli priors

Deep Belief Network
Same as a sigmoid belief network, but with a different top-layer structure
– Use an RBM to model the top layer
Restricted Boltzmann Machine (more on the next slide):
– Divided into visible and hidden layers (two levels)
– Connections form a bipartite graph
Called "restricted" because there are no connections among units of the same layer
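
As a reference point (this formula is not in the transcript), the joint distribution of a DBN with hidden layers h^1, …, h^l can be written, following [Bengio, 2009], as

    P(x, h^1, \ldots, h^l) = P(h^{l-1}, h^l) \Big( \prod_{k=1}^{l-2} P(h^k \mid h^{k+1}) \Big) P(x \mid h^1)

where the top two layers form the RBM and each remaining conditional P(h^k | h^{k+1}) is a sigmoid belief network layer.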

Restricted Boltzmann Machines
Energy-based model of the joint distribution over visible and hidden units
– Or, by marginalizing out h, expressed as a distribution over the visible variable alone
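
The slide's equations are missing from the transcript; for a binary RBM over visible units x and hidden units h, with parameters θ = {W, b, c} (visible biases b, hidden biases c, as on the training slide below), the standard definitions are

    E(x, h) = -b^\top x - c^\top h - h^\top W x
    P(x, h) = \frac{e^{-E(x, h)}}{Z}, \qquad P(x) = \frac{1}{Z} \sum_h e^{-E(x, h)}, \qquad Z = \sum_{x, h} e^{-E(x, h)}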

RBMs (Cont'd)
How the posteriors factorize: notice that the energy is a sum of terms, each involving only one hidden unit
– The posterior over h therefore factorizes across the hidden units
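
Writing out the step the slide alludes to (reconstructed from the standard form above), the energy decomposes as

    E(x, h) = -b^\top x - \sum_j h_j \big( c_j + \sum_i W_{ji} x_i \big)

so the x-dependent factor e^{b^\top x} cancels in the posterior and

    P(h \mid x) = \frac{e^{-E(x, h)}}{\sum_{h'} e^{-E(x, h')}} = \prod_j \frac{e^{h_j (c_j + \sum_i W_{ji} x_i)}}{\sum_{h_j' \in \{0, 1\}} e^{h_j' (c_j + \sum_i W_{ji} x_i)}} = \prod_j P(h_j \mid x)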

More on Posteriors
Using the same factorization trick, we can compute the posterior over each hidden unit, which is just the sigmoid function for binary h
– The posterior over the visible units can be derived similarly
Due to this factorization, Gibbs sampling is easy: alternately sample all hidden units given the visible units, and all visible units given the hidden units
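
Concretely, the factorized conditionals used for block Gibbs sampling take the standard sigmoid form, with sigm(a) = 1 / (1 + e^{-a}):

    P(h_j = 1 \mid x) = \mathrm{sigm}\big( c_j + \sum_i W_{ji} x_i \big), \qquad P(x_i = 1 \mid h) = \mathrm{sigm}\big( b_i + \sum_j W_{ji} h_j \big)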

Training RBMs
Given parameters θ = {W, b, c}, compute the log-likelihood gradient for a steepest-ascent method
– The first term is tractable, but the second term is intractable due to the partition function
– Use k-step Gibbs sampling to approximately sample for the second term (contrastive divergence, CD-k)
– k = 1 performs well empirically
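
The gradient the slide refers to (not shown in the transcript) has the standard two-term form

    \frac{\partial \log P(x)}{\partial \theta} = -\,\mathbb{E}_{P(h \mid x)}\!\Big[ \frac{\partial E(x, h)}{\partial \theta} \Big] + \mathbb{E}_{P(x', h')}\!\Big[ \frac{\partial E(x', h')}{\partial \theta} \Big]

The first expectation is over the tractable posterior P(h | x); the second is over the model distribution, which involves the partition function, and is the one CD-k approximates with a sample obtained by k Gibbs steps started at the training example.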

Training DBNs
Every time we see a sample x, we lower the energy of the distribution at that point
Start from the bottom layer, train it unsupervised, and move up one layer at a time
– Each layer has its own set of parameters
* Q(·) is the RBM posterior over the hidden variables; the data are propagated upward through Q(·) to train the next layer (see the sketch below)
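
A minimal NumPy sketch of this procedure, assuming binary units, CD-1 updates, and mean-field propagation of Q(h = 1 | x) as the next layer's training data (function and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_train_rbm(X, n_hidden, lr=0.1, epochs=20):
    """Train a binary RBM with 1-step contrastive divergence (CD-1)."""
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, (n_hidden, n_visible))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        for x0 in X:
            ph0 = sigm(W @ x0 + c)                        # positive phase: P(h = 1 | x0)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            px1 = sigm(W.T @ h0 + b)                      # one Gibbs step: reconstruct x
            x1 = (rng.random(n_visible) < px1).astype(float)
            ph1 = sigm(W @ x1 + c)                        # negative phase: P(h = 1 | x1)
            # CD-1 updates: approximate gradient ascent on log P(x0)
            W += lr * (np.outer(ph0, x0) - np.outer(ph1, x1))
            b += lr * (x0 - x1)
            c += lr * (ph0 - ph1)
    return W, b, c

def greedy_pretrain_dbn(X, layer_sizes):
    """Train a stack of RBMs bottom-up; each layer sees Q(h = 1 | x) of the layer below."""
    data, layers = X, []
    for n_hidden in layer_sizes:
        W, b, c = cd1_train_rbm(data, n_hidden)
        layers.append((W, b, c))
        data = sigm(data @ W.T + c)   # propagate the data upward through Q(.)
    return layers

# usage sketch: two hidden layers on random binary "data"
X = (rng.random((100, 30)) < 0.5).astype(float)
dbn = greedy_pretrain_dbn(X, layer_sizes=[20, 10])
```

Each call to cd1_train_rbm trains one layer; the loop in greedy_pretrain_dbn implements the bottom-up, layer-by-layer schedule described above.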

How to Sample from DBNs
1. Sample h^{l-1} from the top-level RBM (using Gibbs sampling); h^{l-1} plays the role of the RBM's visible layer
2. For k = l-1 down to 1, sample h^{k-1} ~ P(· | h^k) from the DBN's conditionals
3. x = h^0 is the final sample

Discriminative Model
Receives an input x to classify
– Unlike DBNs, which did not have inputs
A multi-layer neural network will do
– Use auto-encoders to discover compact representations
– Use denoising AEs to add robustness to corrupted inputs

Auto-encoders
A neural network trained so that output = input
– Hence the name "auto"
– Has one hidden layer that gives a representation y of the d-dimensional input x; the output z is the reconstruction
– The representation y is d'-dimensional, with d' < d (a lower-dimensional representation is necessary to avoid learning the identity function)

AE Mechanism
Parameterize each layer with parameters θ = {W, b}
Aim to "reconstruct" the input by minimizing a reconstruction error
Can be trained in an "unsupervised" way
– For any x in the training set, train the AE to reconstruct x
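
The slide's formula is missing from the transcript; in the notation of [Vincent et al., 2008], the encoder, decoder, and reconstruction error are typically

    y = f_\theta(x) = \mathrm{sigm}(W x + b), \qquad z = g_{\theta'}(y) = \mathrm{sigm}(W' y + b')
    L(x, z) = \lVert x - z \rVert^2 \quad \text{or} \quad L(x, z) = -\sum_i \big[ x_i \log z_i + (1 - x_i) \log (1 - z_i) \big]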

Denoising Auto-encoders
Also need to be robust to missing data
– Same structure as a regular AE
– But trained against corrupted inputs: a fixed proportion of the input components is removed (set to zero) at random
Rationale: learning the latent structure is what allows the missing data to be rebuilt
– The hidden layer will learn a representation of that structure
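
Written out (following [Vincent et al., 2008]), a corrupted version x̃ of each input is drawn and the clean input is the reconstruction target:

    \tilde{x} \sim q_D(\tilde{x} \mid x) \quad \text{(a fixed fraction of the components of } x \text{ is set to } 0 \text{)}
    \min_{\theta, \theta'} \; \mathbb{E}\big[ L\big( x, \, g_{\theta'}(f_\theta(\tilde{x})) \big) \big]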

Training Stacked DAEs
Stack the DAEs to form a deep architecture
– Take each DAE's hidden layer
– This hidden layer becomes the input of the next layer
Training is simple. Given a training set {(x_i, y_i)}:
– Initialize each layer (sequentially) in an unsupervised fashion
– Each layer's output is fed as the input to the next layer
– Finally, fine-tune the entire architecture with supervised learning on the training set (a sketch of the pretraining stage follows)
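
A minimal NumPy sketch of the greedy pretraining stage, assuming tied weights, masking corruption, squared reconstruction error, and plain SGD; names are illustrative, and the supervised fine-tuning pass (ordinary backpropagation through the stacked encoders plus an output layer) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=20):
    """One denoising auto-encoder layer: masking noise, tied weights, squared error, SGD."""
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b = np.zeros(n_hidden)    # encoder bias
    c = np.zeros(n_visible)   # decoder bias
    for _ in range(epochs):
        for x in X:
            x_tilde = x * (rng.random(n_visible) > corruption)   # zero out a random fraction
            y = sigm(x_tilde @ W + b)     # encode the corrupted input
            z = sigm(y @ W.T + c)         # decode with tied weights
            # gradients of 0.5 * ||x - z||^2 with respect to the parameters
            dz = (z - x) * z * (1.0 - z)
            dy = (dz @ W) * y * (1.0 - y)
            W -= lr * (np.outer(x_tilde, dy) + np.outer(dz, y))
            b -= lr * dy
            c -= lr * dz
    return W, b

def pretrain_stacked_dae(X, layer_sizes):
    """Greedy layer-wise pretraining: each layer's code on clean inputs feeds the next DAE."""
    reps, layers = X, []
    for n_hidden in layer_sizes:
        W, b = train_dae_layer(reps, n_hidden)
        layers.append((W, b))
        reps = sigm(reps @ W + b)   # uncorrupted activations become the next training set
    return layers

# usage sketch: pretrain two layers; supervised fine-tuning would follow
X = rng.random((100, 30))
stack = pretrain_stacked_dae(X, layer_sizes=[20, 10])
```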

References
[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009.
[Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of ICML 2008.