Plan for today Ist part IInd part

Presentation transcript:

Plan for today.
Part I: Brief introduction to biological systems. Historical background. Deep belief learning procedure.
Part II: Theoretical considerations. Different interpretations.

Biological Neurons

The Retina. Most common in the preliminary stages of data processing (retina, ears).

What is known about the learning process.
Activation: every activity leads to the firing of a certain set of neurons.
Habituation: the psychological process in humans and other organisms in which the psychological and behavioral response to a stimulus decreases after repeated exposure to that stimulus over time.
Hebbian learning: introduced by Donald Hebb in 1949. Synchronous activation increases the synaptic strength; asynchronous activation decreases it. When activities were repeated, the connections between those neurons strengthened. This repetition was what led to the formation of memory.

A spectrum of machine learning tasks.
Typical Statistics: Low-dimensional data (e.g. fewer than 100 dimensions). Lots of noise in the data. There is not much structure in the data, and what structure there is can be represented by a fairly simple model. The main problem is distinguishing true structure from noise.
Artificial Intelligence: High-dimensional data (e.g. more than 100 dimensions). The noise is not sufficient to obscure the structure in the data if we process it right. There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model. The main problem is figuring out a way to represent the complicated structure so that it can be learned.

Artificial Neural Networks. Artificial neural networks have been applied successfully to speech recognition, image analysis, and adaptive control. [Figure: a neuron. The inputs are multiplied by the weights W and summed (Σ); the activation function f(n) produces the outputs.]
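The neuron in the figure can be sketched in a few lines. This is only an illustration: the logistic activation, the bias argument, and the example numbers are assumptions, since the slide does not say which f(n) is used.

```python
import math

def logistic(n):
    """One common choice of activation function f(n) (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(inputs, weights, bias=0.0, activation=logistic):
    """Weighted sum of the inputs (the Σ box) followed by the activation f(n)."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(n)

# Hypothetical inputs and weights, chosen so that the weighted sum is 1.0.
out = neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.5])
```

Any other differentiable or threshold function can be passed as `activation` without changing the rest of the unit.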

Hebbian Learning. In 1949 Donald Hebb introduced Hebbian learning: synchronous activation increases the synaptic strength; asynchronous activation decreases the synaptic strength. When activities were repeated, the connections between those neurons strengthened. This repetition was what led to the formation of memory. Update: [formula not preserved in the transcript]
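A minimal sketch of a Hebbian-style update follows. It is one common textbook form and not necessarily the formula on the slide: activities are assumed to be coded as +1 (firing) and -1 (silent), and the learning rate `lr` is a hypothetical parameter.

```python
def hebbian_update(w, pre, post, lr=0.1):
    """Return the new weight w + lr * pre * post for activities coded as +1 / -1."""
    return w + lr * pre * post

w = 0.0
w = hebbian_update(w, +1, +1)   # synchronous activation: the weight grows
w_after_sync = w
w = hebbian_update(w, +1, -1)   # asynchronous activation: the weight shrinks
w_after_async = w
```

With this coding the product `pre * post` is positive when both units agree and negative when they disagree, which matches the verbal rule above.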

The simplest model: the Perceptron. The Perceptron was introduced in 1957 by Frank Rosenblatt. [Figure: input layer, output layer, destinations D0, D1, D2, d. The formulas for the perceptron output, the activation functions, and the learning update are not preserved in the transcript.]
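As an illustration of perceptron learning, here is a sketch of the classic error-driven rule (weights move by the learning rate times the error times the input). The step activation, the learning rate, and the AND data set are assumptions made for the example, not taken from the slide.

```python
def step(n):
    """Threshold activation: fire when the weighted input is non-negative."""
    return 1 if n >= 0 else 0

def predict(x, w, b):
    return step(w[0] * x[0] + w[1] * x[1] + b)

def train_perceptron(samples, lr=1.0, epochs=20):
    """Classic perceptron rule: w += lr * (d - y) * x, b += lr * (d - y)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, d in samples:
            err = d - predict(x, w, b)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so the rule converges on it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
preds = [predict(x, w, b) for x, _ in AND]
```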

The simplest model: the Perceptron. It is a linear classifier and can only perfectly classify a set of linearly separable data. It is incapable of processing the Exclusive Or (XOR) circuit. How to learn multiple layers?
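The XOR limitation can be checked directly. In the sketch below a single threshold unit trained with the perceptron rule always leaves at least one XOR case wrong, while a two-layer network computes XOR as AND(OR, NAND). The second-layer weights are hand-picked for illustration, not learned.

```python
def step(n):
    return 1 if n >= 0 else 0

def unit(x, w, b):
    """A single threshold unit with weights w and bias b."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def errors_after_training(samples, epochs=100):
    """Train one unit with the perceptron rule and count remaining mistakes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, d in samples:
            err = d - unit(x, w, b)
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    return sum(unit(x, w, b) != d for x, d in samples)

single_layer_errors = errors_after_training(XOR)

def xor_two_layer(x):
    h_or = unit(x, [1, 1], -1)                 # OR(A, B)
    h_nand = unit(x, [-1, -1], 1.5)            # NAND(A, B)
    return unit([h_or, h_nand], [1, 1], -2)    # AND of the two hidden units

two_layer_outputs = [xor_two_layer(x) for x, _ in XOR]
```

The hidden layer is what makes the problem separable; the open question raised on the slide is how to learn such hidden weights instead of setting them by hand.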

Second generation neural networks (~1985): Back Propagation. Compare the outputs with the correct answer to get an error signal. Back-propagate the error signal to get derivatives for learning. [Figure: input vector, hidden layers, outputs.]

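BP-algorithm. [Figure: forward pass of the activations and backward pass of the errors, with the formulas for the error and for the weight update; plots of an activation function with values between 0 and 1 and of its derivative with maximum .25, both over inputs from -5 to 5.]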
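The forward and backward passes can be sketched for a tiny network with one hidden layer. The logistic activation, the half squared error, the absence of biases, and all numbers are assumptions for the example; the slide's own formulas are not available.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, W1, W2):
    """Return the hidden activations h and the output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)))
    return h, y

def error(x, target, W1, W2):
    # Assumed error measure: half the squared difference to the target.
    _, y = forward(x, W1, W2)
    return 0.5 * (y - target) ** 2

def backprop(x, target, W1, W2):
    """Return dE/dW1 and dE/dW2 by propagating the error signal backwards."""
    h, y = forward(x, W1, W2)
    delta_out = (y - target) * y * (1.0 - y)   # error signal at the output
    grad_W2 = [delta_out * hj for hj in h]
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hid]
    return grad_W1, grad_W2

def gradient_step(x, target, W1, W2, lr=0.5):
    """One weight update in the direction that reduces the error."""
    g1, g2 = backprop(x, target, W1, W2)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    new_W2 = [w - lr * g for w, g in zip(W2, g2)]
    return new_W1, new_W2

# Hypothetical example: 2 inputs, 2 hidden units, 1 output.
x, target = [1.0, 0.0], 1.0
W1 = [[0.5, -0.4], [0.3, 0.8]]
W2 = [0.7, -0.6]
before = error(x, target, W1, W2)
W1_new, W2_new = gradient_step(x, target, W1, W2)
after = error(x, target, W1_new, W2_new)
```

The factor `y * (1 - y)` is the derivative of the logistic function, whose maximum is 0.25 at zero input.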

Back Propagation.
Advantages: A multi-layer Perceptron network can be trained by the back-propagation algorithm to perform any mapping between the input and the output.
What is wrong with back-propagation? It requires labeled training data, and almost all data is unlabeled. The learning time does not scale well: it is very slow in networks with multiple hidden layers. It can get stuck in poor local optima.
A temporary digression: Vapnik and his co-workers developed a very clever type of perceptron called a Support Vector Machine. In the 1990s, many researchers abandoned neural networks with multiple adaptive hidden layers because Support Vector Machines worked better.

Overcoming the limitations of back-propagation: Restricted Boltzmann Machines. Keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input. Adjust the weights to maximize the probability that a generative model would have produced the sensory input. Learn p(image), not p(label | image).

Restricted Boltzmann Machines (RBM). An RBM is a multi-layer perceptron network and a graphical model. The inference problem: infer the states of the unobserved variables. The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data. [Figure: input layer, hidden layer, output layer.]

Graphical models.
Markov random field (MRF) or Boltzmann Machine: undirected. Each edge represents a mutual dependency between nodes.
Bayesian network or belief network: directed acyclic.
HMM: the simplest Bayesian network.
Restricted Boltzmann Machine: symmetric connections between the hidden layer and the data layer, with no intra-layer connections.

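Stochastic binary units (Bernoulli variables). These have a state of 1 or 0. The probability of turning on is determined by the weighted input from other units (plus a bias). [Figure: probability, between 0 and 1, that unit i turns on as a function of its total input from the units j.]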
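A stochastic binary unit can be sketched as follows. The logistic form of the turn-on probability is the usual choice for Boltzmann machines and is assumed here; the function names and example values are hypothetical.

```python
import math
import random

def p_on(states, weights, bias):
    """Probability of turning on: logistic function of the bias plus weighted input."""
    total = bias + sum(s * w for s, w in zip(states, weights))
    return 1.0 / (1.0 + math.exp(-total))

def sample_unit(states, weights, bias, rng):
    """Draw the unit's binary state (1 or 0) as a Bernoulli variable."""
    return 1 if rng.random() < p_on(states, weights, bias) else 0

# Hypothetical example: the weighted input sums to zero, so the unit is on half the time.
rng = random.Random(0)
states, weights, bias = [1, 0, 1], [2.0, -1.0, -2.0], 0.0
freq = sum(sample_unit(states, weights, bias, rng) for _ in range(10000)) / 10000
```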

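The Energy of a joint configuration (ignoring terms to do with biases), with i indexing the visible units and j the hidden units.
The energy of the current state: $E(v,h) = -\sum_{i,j} v_i h_j w_{ij}$
The joint probability distribution: $p(v,h) = e^{-E(v,h)} / Z$
Probability distribution over the visible vector v: $p(v) = \sum_h e^{-E(v,h)} / Z$
Partition function: $Z = \sum_{v,h} e^{-E(v,h)}$
The derivative of the energy function: $-\partial E(v,h) / \partial w_{ij} = v_i h_j$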
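For a very small network these quantities can be computed by brute force, which makes the role of the partition function concrete. The sketch assumes the bias-free energy E(v,h) = -sum over i,j of v_i h_j w_ij and binary units; the weight matrix is a hypothetical example.

```python
import math
from itertools import product

def energy(v, h, W):
    """E(v, h) = -sum_ij v_i h_j w_ij (bias terms ignored)."""
    return -sum(v[i] * h[j] * W[i][j] for i in range(len(v)) for j in range(len(h)))

def partition(W):
    """Z: sum of exp(-E) over every joint configuration (exponential cost)."""
    nv, nh = len(W), len(W[0])
    return sum(math.exp(-energy(v, h, W))
               for v in product((0, 1), repeat=nv)
               for h in product((0, 1), repeat=nh))

def p_visible(v, W):
    """p(v): sum the joint probability over all hidden configurations."""
    nh = len(W[0])
    return sum(math.exp(-energy(v, h, W)) for h in product((0, 1), repeat=nh)) / partition(W)

# Hypothetical weights: 2 visible units (rows) and 2 hidden units (columns).
W = [[1.0, -0.5], [0.0, 2.0]]
e = energy((1, 1), (1, 1), W)
total = sum(p_visible(v, W) for v in product((0, 1), repeat=2))
```

The enumeration in `partition` doubles with every added unit, which is why exact computation is limited to toy networks.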

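Maximum Likelihood method. Parameter (weight) update: at each iteration t the weights are moved along the gradient of the log-likelihood, scaled by the learning rate. The gradient of the log-likelihood has two terms: an average w.r.t. the data distribution, which is computed using the sample data x, and an average w.r.t. the model distribution, which can't generally be computed.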
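On a toy RBM both averages can be evaluated exactly by enumeration, which shows what the maximum likelihood update computes. The sketch assumes a bias-free binary RBM with energy -sum over i,j of v_i h_j w_ij; the data set, the learning rate, and the helper names are hypothetical.

```python
import math
from itertools import product

def unnorm(v, h, W):
    """exp(-E(v, h)) for the bias-free energy E = -sum_ij v_i h_j w_ij."""
    return math.exp(sum(v[i] * h[j] * W[i][j]
                        for i in range(len(v)) for j in range(len(h))))

def states(n):
    return list(product((0, 1), repeat=n))

def log_likelihood(data, W):
    """Average log p(v) over the data, computed by full enumeration."""
    nv, nh = len(W), len(W[0])
    Z = sum(unnorm(v, h, W) for v in states(nv) for h in states(nh))
    return sum(math.log(sum(unnorm(v, h, W) for h in states(nh)) / Z)
               for v in data) / len(data)

def gradient(data, W):
    """<v_i h_j> under the data minus <v_i h_j> under the model."""
    nv, nh = len(W), len(W[0])
    Z = sum(unnorm(v, h, W) for v in states(nv) for h in states(nh))
    grad = [[0.0] * nh for _ in range(nv)]
    for v in data:                       # data term, with h drawn from p(h | v)
        pv = sum(unnorm(v, h, W) for h in states(nh))
        for h in states(nh):
            p = unnorm(v, h, W) / pv / len(data)
            for i in range(nv):
                for j in range(nh):
                    grad[i][j] += p * v[i] * h[j]
    for v in states(nv):                 # model term, over all joint states
        for h in states(nh):
            p = unnorm(v, h, W) / Z
            for i in range(nv):
                for j in range(nh):
                    grad[i][j] -= p * v[i] * h[j]
    return grad

data = [(1, 1), (1, 1), (0, 0)]
W = [[0.0, 0.0], [0.0, 0.0]]
g = gradient(data, W)
W_new = [[W[i][j] + 0.5 * g[i][j] for j in range(2)] for i in range(2)]
ll_before, ll_after = log_likelihood(data, W), log_likelihood(data, W_new)
```

The model term is the part that requires the sum over all joint states; in a network of realistic size it has to be approximated by sampling.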

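Hinton's method: Contrastive Divergence. The maximum likelihood method minimizes the Kullback-Leibler divergence between the data distribution and the model distribution. [Formula and intuitive illustration not preserved in the transcript.]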

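Contrastive Divergence (CD) method. In 2002 Hinton proposed a new learning procedure. CD approximately follows the gradient of the difference of two divergences, $KL(p^0 \| p^\infty) - KL(p^1 \| p^\infty)$, where $KL(p^n \| p^\infty)$ is the "distance" of the distribution $p^n$ after n Gibbs steps from the equilibrium distribution $p^\infty$. Practically: run the chain only for a small number of steps (actually one is sufficient). The update formula for the weights becomes: $\Delta w_{ij} = \eta ( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 )$, with learning rate $\eta$. This greatly reduces both the computation per gradient step and the variance of the estimated gradient. Experiments show good parameter estimation capabilities.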

A picture of the maximum likelihood learning algorithm for an RBM. Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel. [Figure: the alternating chain over visible units i and hidden units j at t = 0, t = 1, t = 2, ..., t = ∞. The state at t = ∞ is the fantasy (i.e. the model); CD uses one Gibbs sample.]
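One step of the alternating chain, followed by a weight update from the difference between the statistics at t = 0 and t = 1, can be sketched as below. This is a minimal CD-1 sketch, not the exact procedure from the slides: biases are omitted, hidden probabilities rather than sampled states are used in the statistics, and the sizes, learning rate, and seed are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W):
    """p(h_j = 1 | v) for every hidden unit j (all updated in parallel)."""
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(W[0]))]

def visible_probs(h, W):
    """p(v_i = 1 | h) for every visible unit i (all updated in parallel)."""
    return [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(W))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_step(v0, W, lr, rng):
    """Update W in place from one training vector using a single Gibbs step."""
    ph0 = hidden_probs(v0, W)                  # t = 0: hidden given the data
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W), rng)     # t = 1: reconstruction
    ph1 = hidden_probs(v1, W)
    for i in range(len(W)):
        for j in range(len(W[0])):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    return v1

rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]             # 3 visible units, 2 hidden units
reconstruction = cd1_step([1, 0, 1], W, lr=0.1, rng=rng)
```

Running the chain to t = ∞ would give the maximum likelihood update; stopping at t = 1 trades exactness for a much cheaper step.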

Multi Layer Network. [Figure: a stack of layers: data, h1, h2, h3.] After Gibbs sampling for sufficiently long, the network reaches thermal equilibrium: the states of the units still change, but the probability of finding the system in any particular configuration does not. Adding another layer always improves the variational bound on the log-likelihood, unless the top-level RBM is already a perfect model of the data it's trained on.

The network for the 4 squares task: 2 input units, 4 logistic units, 4 labels.

entirely unsupervised except for the colors

Results. The network used to recognize handwritten binary digits from the MNIST database: 28x28 pixels, 500 neurons, 2000 neurons, 10 labels, output vector. [Figure, class: new test images from the digit class that the model was trained on. Figure, non-class: images from an unfamiliar digit class (the network tries to see every image as a 2).]

Examples of correctly recognized handwritten digits that the neural network had never seen before. Pros: good generalization capabilities. Cons: only binary values permitted; no invariance (neither translation nor rotation).

How well does it discriminate on the MNIST test set with no extra information about geometric distortions? Test error by method:
Generative model based on RBMs: 1.25%
Support Vector Machine (Decoste et al.): 1.4%
Backprop with 1000 hiddens (Platt): ~1.6%
Backprop with 500 --> 300 hiddens: ~1.6%
K-Nearest Neighbor: ~3.3%

A non-linear generative model for human motion. Data: CMU Graphics Lab Motion Capture Database. Motion sampled from video (30 Hz). Each frame is a 1x60 vector of the skeleton parameters (3D joint angles). The data does not need to be heavily preprocessed or dimensionality reduced.

Conditional RBM (cRBM). It can model temporal dependencies by treating the visible variables in the past as additional biases. Add two types of connections from the past n frames of visibles: to the current visibles and to the current hiddens. Given the past n frames, the hidden units at time t are conditionally independent, so we can still use CD for training cRBMs. [Figure: frames t-2, t-1, t.]
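The idea of treating the past frames as additional biases can be sketched as follows. The names `A` (past visibles to current visibles), `B` (past visibles to current hiddens), `a`, `b` (static biases) and all numbers are assumptions made for the illustration; the slide does not name these parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dynamic_biases(past_frames, A, B, a, b):
    """Fold the past n frames into biases for the current visibles and hiddens."""
    history = [x for frame in past_frames for x in frame]   # concatenate the frames
    a_dyn = [a[i] + sum(A[k][i] * history[k] for k in range(len(history)))
             for i in range(len(a))]
    b_dyn = [b[j] + sum(B[k][j] * history[k] for k in range(len(history)))
             for j in range(len(b))]
    return a_dyn, b_dyn

def hidden_probs(v_t, W, b_dyn):
    """Given the past, the hidden units are independent, as in a plain RBM."""
    return [sigmoid(b_dyn[j] + sum(v_t[i] * W[i][j] for i in range(len(v_t))))
            for j in range(len(b_dyn))]

# Hypothetical example: n = 2 past frames of 1 value each, 1 visible, 2 hiddens.
past = [[1.0], [2.0]]
A = [[0.5], [-0.5]]
B = [[1.0, 0.0], [0.0, 1.0]]
a_dyn, b_dyn = dynamic_biases(past, A, B, a=[0.1], b=[0.0, 0.0])
probs = hidden_probs([0.0], [[0.3, 0.3]], b_dyn)
```

Because the history only shifts the biases, the CD update for the weights between the current visibles and hiddens keeps the same form as in an ordinary RBM.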

THANK YOU

Structured input. Independent input. Much easier to learn!

The Perceptron is a linear classifier. [Figure: example outputs 1, .01, .99.]

[Figure: truth tables for XOR(A,B), OR(A,B), AND(A,B), and NAND(A,B) of the inputs A and B, with the corresponding perceptron inputs x0 and x1.]