Deep Architectures for Artificial Intelligence
Learning Features: The Past
The traditional model of pattern recognition (since the late 50's) involves fixed kernel machines and hand-crafted features.
The first learning machine was the "Perceptron". Built at Cornell in 1960, the Perceptron was a linear classifier on top of a simple feature extractor.
The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching.
Learning Features: The Future
Designing a feature extractor by hand requires considerable effort by experts.
Modern approaches are therefore based on trainable features AND a trainable classifier.
Machine Learning
Supervised learning: the training data consists of inputs together with their corresponding outputs.
Unsupervised learning: the training data consists of inputs without their corresponding outputs.
Neural networks
Generative model: models the joint distribution of input and output, P(x, y).
Discriminative model: models the posterior probabilities, P(y | x).
[Figure: joint densities P(x, y1), P(x, y2) versus posteriors P(y1 | x), P(y2 | x)]
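The two views are linked by a standard identity (not shown on the slide, added here for concreteness):

```latex
% A discriminative posterior can always be recovered from a generative joint model:
P(y \mid x) \;=\; \frac{P(x, y)}{P(x)} \;=\; \frac{P(x, y)}{\sum_{y'} P(x, y')}
```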
Neural networks
Two-layer neural networks (sigmoid neurons) are trained with back-propagation:
Step 1: Randomly initialize the weights and compute the output vector (forward pass).
Step 2: Evaluate the gradient of an error function with respect to the weights.
Step 3: Adjust the weights. Repeat steps 1-3 until the error is low enough.
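A minimal sketch of these three steps for a two-layer sigmoid network in NumPy; the toy data, layer sizes, and learning rate are illustrative assumptions (bias terms omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((100, 4))                               # toy inputs
y = (X.sum(axis=1, keepdims=True) > 2).astype(float)  # toy targets

# Step 1: randomly initialize the weights
W1 = rng.standard_normal((4, 8)) * 0.1
W2 = rng.standard_normal((8, 1)) * 0.1
lr = 0.5

for epoch in range(1000):
    # forward pass: determine the output vector
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Step 2: gradient of a squared-error function (backward pass)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Step 3: adjust the weights; repeat until the error is low enough
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_h / len(X)
```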
Deep Neural Networks
ANNs with more than two hidden layers are referred to as deep.
Given enough hidden neurons, a single hidden layer is enough to approximate any function to any degree of precision.
However, too many neurons may quickly make the network infeasible to train.
Adding layers greatly improves the network's learning capacity, thus reducing the number of neurons needed.
Deep Learning
Deep Learning is about representing high-dimensional data.
Learning representations of data means discovering and disentangling the independent explanatory factors that underlie the data distribution.
The Manifold Hypothesis: natural data lives on a low-dimensional (non-linear) manifold, because the variables in natural data are mutually dependent.
Internal intermediate representations can be viewed as latent variables to be inferred, and deep belief networks are a particular type of latent-variable model.
Hierarchy of Representations
A hierarchy of representations with increasing levels of abstraction; each stage is a kind of trainable feature transform.
Image recognition: pixel → edge → texton → motif → part → object
Text: character → word → word group → clause → sentence → story
Speech: sample → spectral band → sound → phoneme → word
How to train deep models?
Purely supervised:
Initialize parameters randomly.
Train in supervised mode, typically with SGD, using backprop to compute gradients.
Used in most practical systems for speech and image recognition.
Unsupervised, layerwise + supervised classifier on top:
Train each layer unsupervised, one after the other.
Train a supervised classifier on top, keeping the other layers fixed.
Good when very few labeled samples are available (a sketch of this option follows the list).
Unsupervised, layerwise + global supervised fine-tuning:
Train each layer unsupervised, one after the other, then add a classifier layer and retrain the whole thing supervised.
Good when the label set is poor (e.g. pedestrian detection).
Unsupervised pre-training often uses regularized auto-encoders.
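One possible sketch of the second strategy (unsupervised, layerwise + supervised classifier on top) using scikit-learn's BernoulliRBM; the toy data, layer sizes, and hyperparameters are assumptions, not taken from the slides:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = (rng.random((200, 64)) > 0.5).astype(float)  # toy binary inputs
y = rng.integers(0, 2, size=200)                 # toy labels

# Two RBM layers trained unsupervised (by contrastive divergence),
# then a supervised classifier trained on the top-level features.
model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=200)),
])
model.fit(X, y)
```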
Boltzmann Machine Model
One input (visible) layer and one hidden layer, typically with binary states for every unit.
Stochastic (vs. deterministic), recurrent (vs. feed-forward).
Generative model (vs. discriminative): it estimates the distribution of the observations (say, p(image)), while traditional discriminative networks only estimate the labels (say, p(label | image)).
It defines an energy of the network and a probability for each unit's state (the scalar T is referred to as the "temperature"):
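In the standard formulation (notation assumed here: binary states s_i, weights w_ij, biases θ_i, energy gap ΔE_i):

```latex
E \;=\; -\sum_{i<j} w_{ij}\, s_i s_j \;-\; \sum_i \theta_i\, s_i,
\qquad
p(s_i = 1) \;=\; \frac{1}{1 + e^{-\Delta E_i / T}},
\quad
\Delta E_i \;=\; \sum_j w_{ij}\, s_j + \theta_i .
```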
Restricted Boltzmann Machine Model
A bipartite graph with no intra-layer connections: units are connected only between the visible and hidden layers.
The RBM does not have the temperature factor T; the rest is the same as the BM.
One important feature of the RBM is that the visible units and the hidden units are conditionally independent given each other, which will lead to a beautiful result later on:
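Concretely, the conditional distributions factorize over units (the standard RBM property; v_i visible, h_j hidden):

```latex
p(h \mid v) \;=\; \prod_j p(h_j \mid v),
\qquad
p(v \mid h) \;=\; \prod_i p(v_i \mid h).
```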
Restricted Boltzmann Machine Model
Two ingredients define a Restricted Boltzmann Machine:
States of all the units: obtained through the probability distribution.
Weights of the network: obtained through training (Contrastive Divergence).
As mentioned before, the objective of the RBM is to estimate the distribution of the input data, and this goal is fully determined by the weights, given the input.
Energy defined for the RBM:
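In standard notation (assumed here: visible biases a_i, hidden biases b_j, weights w_ij), the RBM energy takes the form:

```latex
E(v, h) \;=\; -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_i \sum_j v_i\, w_{ij}\, h_j .
```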
Restricted Boltzmann Machine Model
Distribution of the visible layer of the RBM (the Boltzmann distribution), where Z is the partition function, defined as the sum of e^{-E(v,h)} over all possible configurations of {v, h}.
Probability that a unit is on (binary state 1), where σ is the logistic/sigmoid function:
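In the same (assumed) notation, the standard forms are:

```latex
p(v) \;=\; \frac{1}{Z} \sum_h e^{-E(v, h)},
\qquad
Z \;=\; \sum_{v, h} e^{-E(v, h)},
\\[4pt]
p(h_j = 1 \mid v) \;=\; \sigma\Big(b_j + \sum_i v_i\, w_{ij}\Big),
\qquad
p(v_i = 1 \mid h) \;=\; \sigma\Big(a_i + \sum_j w_{ij}\, h_j\Big),
\qquad
\sigma(x) \;=\; \frac{1}{1 + e^{-x}} .
```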
Deep Belief Net
DBNs are based on stacks of RBMs (layers: data, h1, h2, h3 in the figure).
The top two hidden layers form an undirected associative memory (regarded as a shorthand for an infinite stack), and the remaining hidden layers form a directed acyclic graph.
The red arrows in the figure are NOT part of the generative model; they are just for inference.
Training Deep Belief Nets
The previous discussion gives an intuition for training a stack of RBMs one layer at a time.
Hinton proved that this greedy learning algorithm is efficient, in the sense that it never decreases a variational bound on the log likelihood of the data.
First, learn all the weights tied.
Training Deep Belief Nets
Then freeze the bottom layer and relearn all the other layers.
Training Deep Belief Nets
Then freeze the bottom two layers and relearn all the other layers.
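A compact sketch of this greedy layer-wise procedure with one-step Contrastive Divergence (CD-1); the layer sizes, learning rate, number of epochs, and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=30):
    """Train one RBM on `data` with one-step Contrastive Divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.standard_normal((n_visible, n_hidden)) * 0.01
    a = np.zeros(n_visible)   # visible biases
    b = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # positive phase: sample hidden states from p(h | v)
        ph = sigmoid(data @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        # negative phase: reconstruct the visible layer, then re-infer the hiddens
        pv = sigmoid(h @ W.T + a)
        ph2 = sigmoid(pv @ W + b)
        # CD-1 updates: <v h>_data minus <v h>_reconstruction
        W += lr * (data.T @ ph - pv.T @ ph2) / len(data)
        a += lr * (data - pv).mean(axis=0)
        b += lr * (ph - ph2).mean(axis=0)
    return W, b

# Greedy stacking: train a layer, freeze it, and train the next layer on its output.
v0 = (rng.random((500, 64)) > 0.5).astype(float)  # toy binary "data" layer
W1, b1 = train_rbm(v0, n_hidden=32)               # learn the first RBM
h1 = sigmoid(v0 @ W1 + b1)                        # freeze layer 1, propagate up
W2, b2 = train_rbm(h1, n_hidden=16)               # learn the second RBM on h1
```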
Training Deep Belief Nets
Each time we learn a new layer, the inference at the lower layers becomes incorrect, but the variational bound on the log probability of the data improves (proved by Hinton).
Since the inference at the lower layers becomes incorrect, Hinton uses a fine-tuning procedure, the wake-sleep algorithm, to adjust the weights.
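For reference, the variational bound in question is the standard lower bound (Q denotes the approximate posterior given by the recognition weights; notation assumed here):

```latex
\log p(v) \;\ge\; \sum_{h} Q(h \mid v)\, \log p(v, h) \;+\; \mathcal{H}\big(Q(h \mid v)\big).
```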
Training Deep Belief Nets
Wake-sleep algorithm:
Wake phase: do a bottom-up pass; sample h using the recognition weights, based on the input v, for each RBM, and then adjust the generative weights by the RBM learning rule.
Sleep phase: do a top-down pass, starting from a random state of h at the top layer, and generate v; then modify the recognition weights.
An analogy for the wake-sleep algorithm:
Wake phase: if reality differs from what is imagined, modify the generative weights to make what is imagined as close as possible to reality.
Sleep phase: if the illusions produced by the concepts learned during the wake phase differ from those concepts, modify the recognition weights to make the illusions as close as possible to the concepts.
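A sketch of the corresponding delta-rule updates in standard wake-sleep notation (ε is a learning rate, s are sampled binary states, p and q are the top-down and bottom-up predictions; these symbols are assumptions, not taken from the slides):

```latex
% wake phase: adjust generative weights toward reproducing the sampled states below
\Delta w^{\text{gen}}_{jk} \;=\; \epsilon\, s_j\,\big(s_k - p_k\big),
\qquad p_k = \sigma\Big(\textstyle\sum_j s_j\, w^{\text{gen}}_{jk}\Big)
\\[4pt]
% sleep phase: adjust recognition weights toward reproducing the "dreamed" states above
\Delta w^{\text{rec}}_{kj} \;=\; \epsilon\, s_k\,\big(s_j - q_j\big),
\qquad q_j = \sigma\Big(\textstyle\sum_k s_k\, w^{\text{rec}}_{kj}\Big)
```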
Useful Resources
Webpages:
Geoffrey E. Hinton's readings (with source code available for DBN): http://www.cs.toronto.edu/~hinton/csc2515/deeprefs.html
Notes on Deep Belief Networks: http://www.quantumg.net/dbns.php
MLSS Tutorial, October 2010, ANU Canberra, Marcus Frean: http://videolectures.net/mlss2010au_frean_deepbeliefnets/
Deep Learning Tutorials: http://deeplearning.net/tutorial/
Hinton's Tutorial: http://videolectures.net/mlss09uk_hinton_dbn/
Fergus's Tutorial: http://cs.nyu.edu/~fergus/presentations/nips2013_final.pdf
CUHK MMlab project: http://mmlab.ie.cuhk.edu.hk/project_deep_learning.html
People:
Geoffrey E. Hinton: http://www.cs.toronto.edu/~hinton
Andrew Ng: http://www.cs.stanford.edu/people/ang/index.html
Yoshua Bengio: www.iro.umontreal.ca/~bengioy
Yann LeCun: http://yann.lecun.com/
Rob Fergus: http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php