CSC 578 Neural Networks and Deep Learning 9. Hopfield Networks, Boltzmann Machines Noriko Tomuro

Unsupervised Neural Networks
- Hopfield Networks: Concepts
- Boltzmann Machines
- Restricted Boltzmann Machines
- Deep Boltzmann Machines

1 Hopfield Networks
A Hopfield network is a form of recurrent artificial neural network, and one of the oldest neural network models. There are many variations; the one presented here is a discrete network that takes bipolar inputs (1 or -1). A Hopfield network stores patterns and then recovers the stored patterns from partial or corrupted patterns -- it acts as an associative memory. Hopfield networks have also been applied to combinatorial optimization problems, e.g. the Traveling Salesman Problem.

Overview of Hopfield Networks
Weights between units are bidirectional (hence a "feedback" or "recurrent" network), so the network is fully connected (but with no self-loop weights, i.e., wii = 0). Each unit/node represents a neuron, and every neuron functions as both an input and an output unit. The activation of a neuron (in the binary/bipolar case; similar to a thresholded perceptron) is xi = 1 if Σ_j wij xj ≥ θi, and -1 otherwise. If xi is 1, the unit is called "active"; if -1, it is called "inactive".
[Figure: a 3-node network x1, x2, x3 with symmetric weights w21 = w12, w31 = w13, w23 = w32.]

A state of the network is defined by the activations of the nodes, <x1, ..., xn>. Given a set of (current) weights, the values of the nodes are updated asynchronously (parallel relaxation): pick a node at random and compute the new activation for that node (a node "fires" if it becomes 1). Repeat this procedure until no node changes value; the network then settles into one of its stable states.
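Below is a minimal sketch of this asynchronous relaxation in Python/NumPy. The function name, the optional threshold vector theta, and the sweep limit max_sweeps are illustrative choices, not from the slides.

```python
import numpy as np

def hopfield_relax(W, x, theta=None, max_sweeps=100, rng=None):
    """Asynchronous relaxation of a bipolar (+1/-1) Hopfield network.

    W     : symmetric weight matrix with zero diagonal (w_ii = 0)
    x     : initial state vector of +1/-1 values (the presented pattern)
    theta : optional per-node activation thresholds (default 0)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x, dtype=float)
    theta = np.zeros(len(x)) if theta is None else np.asarray(theta, dtype=float)

    for _ in range(max_sweeps):
        changed = False
        # Visit nodes one at a time in random order (asynchronous update).
        for i in rng.permutation(len(x)):
            new_xi = 1.0 if W[i] @ x >= theta[i] else -1.0
            if new_xi != x[i]:
                x[i] = new_xi
                changed = True
        if not changed:   # no node changed value -> a stable state was reached
            break
    return x
```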

[Figure: worked example of asynchronous relaxation on a 3-node network (x1, x2, x3; edge weights include 1 and -2). (0) Pattern <-1, -1, -1> is presented; (1) after activating x2; (2) after activating x1; (3) after activating x3; (4)-(6) after activating x1 through x3 again, no further state change occurs.]

It is proven that a Hopfield network with asynchronous updates will always converge to a stable state. Depending on the weights and input values, there may be several states to which the network can converge. The change of the network state is essentially a search through the space of possible states. Closeness to a stable state is measured by the notion of energy:

E = − Σ_{i<j} wij si sj + Σ_i θi si

where wij is the connection weight between unit j and unit i, si ∈ {0, 1} is the state of unit i, and θi is the bias of unit i (−θi is the activation threshold for the unit). When a node is activated, the change in the energy is always ≤ 0.
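Continuing the sketch above, a small helper that computes this energy (assuming a symmetric W with zero diagonal, so the sum over i < j equals half of the full double sum):

```python
import numpy as np

def hopfield_energy(W, s, theta):
    """E = -sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i.

    With w_ii = 0 and W symmetric, the pairwise sum over i < j is
    half of s^T W s, hence the factor 0.5 below.
    """
    s = np.asarray(s, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return float(-0.5 * s @ W @ s + theta @ s)
```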

So searching for a stable state is a minimization problem, and gradient descent can be used to find the minimum. However, there is the danger of getting stuck in a local minimum -- the convergence goes to the closest local minimum.

Capacity limitation of Hopfield networks: it has been shown that Hopfield networks can only memorize a limited number of patterns. A later analysis put the ratio of reliably recallable vectors to nodes at about 0.138 (approximately 138 vectors can be recalled from storage for every 1000 nodes) (Hertz et al., 1991). A Hopfield network with N nodes can store M patterns, where M = 0.15N (for a binary network), or M = N / (2 log2 N) (for a bipolar network).

Training for Hopfield networks: values in the input patterns are bipolar, e.g. <1, -1, 1>. Weights are updated incrementally; we basically want the stored patterns to be the stable states.

0. Initialize the network weights.
1. Do until no change in the weights occurs:
2.   Initialize the delta_w's to 0.0.
3.   For each pattern d in the training set:
4.     Present the pattern to the network.
5.     For each node xi:
6.       If xi's activation is different from the input xi,
7.         update the weights connected to xi.

The weight update for node xi is Δwij = η · xi · xj, where η (eta) is the learning rate and xi, xj are the values in the input pattern d.
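A hedged sketch of this training loop in NumPy, using the Hebbian-style update Δwij = η·xi·xj described above; names such as train_hopfield and max_epochs, and the zero-threshold activation test, are illustrative assumptions.

```python
import numpy as np

def train_hopfield(patterns, eta=0.1, max_epochs=100):
    """Incremental training so that the given bipolar patterns become stable states."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for _ in range(max_epochs):
        changed = False
        for x in patterns:
            x = np.asarray(x, dtype=float)
            for i in range(n):
                # Activation of node i under the current weights (threshold 0 assumed).
                act = 1.0 if W[i] @ x >= 0 else -1.0
                if act != x[i]:
                    # Node i disagrees with the pattern: update its incoming weights.
                    for j in range(n):
                        if j != i:
                            dw = eta * x[i] * x[j]
                            W[i, j] += dw
                            W[j, i] += dw   # keep the weight matrix symmetric
                    changed = True
        if not changed:   # every pattern is already a stable state
            break
    return W
```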

Modern Hopfield networks: Instead of using the net to store memories, we use it to construct interpretations of sensory input. The input is represented by the visible units, the interpretation is represented by the states of the hidden units, and the badness of the interpretation is represented by the energy. [Video by Geoff Hinton, 2012]

Summary: Hopfield networks
- suffer from spurious local minima that form on the energy hypersurface,
- require the input patterns to be uncorrelated,
- are limited in the number of patterns that can be stored,
- are usually fully connected and not stacked.

2 Boltzmann Machines
A Boltzmann machine (also called a stochastic Hopfield network with hidden units) is a type of stochastic recurrent neural network (and Markov random field). Its units produce binary results. Unlike Hopfield nets, Boltzmann machine units are stochastic. [Wikipedia]
[Figure: a graphical representation of an example Boltzmann machine. Each undirected edge represents a dependency. In this example there are 3 hidden units and 4 visible units.]

The global energy in a Boltzmann machine is identical to that of a Hopfield network:

E = − Σ_{i<j} wij si sj + Σ_i θi si

where wij is the connection weight between unit j and unit i, si ∈ {0, 1} is the state of unit i, and θi is the bias of unit i (−θi is the activation threshold for the unit).

The goal of the Boltzmann machine learning algorithm is to maximize the product of the probabilities that the Boltzmann machine assigns to the binary vectors in the training set. For the connection between the assigned probabilities and the energy (and temperature), consult the [Wikipedia] page.
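For concreteness, a sketch of the standard stochastic unit update used in Boltzmann machines: each unit turns on with a probability determined by its energy gap and a temperature T. The function name, the random sweep order, and the argument conventions are assumptions, not from the slides.

```python
import numpy as np

def gibbs_step(W, s, theta, T=1.0, rng=None):
    """One sweep of stochastic updates for a Boltzmann machine with 0/1 states.

    delta_E_i = sum_j w_ij s_j - theta_i is the energy gap between unit i
    being off and on; the unit is set to 1 with probability sigmoid(delta_E_i / T).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.array(s, dtype=float)
    for i in rng.permutation(len(s)):
        delta_E = W[i] @ s - theta[i]
        p_on = 1.0 / (1.0 + np.exp(-delta_E / T))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```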


Training of Boltzmann Machines usually uses KL-divergence, or log likelihood. The loss function G (for binary vectors) is

G = Σ_v P+(v) · ln( P+(v) / P-(v) )

where P+(v) is the distribution over the training set V, and P-(v) is the distribution over the visible (i.e., not hidden) units produced by the network. G is the KL divergence from P- to P+, so training minimizes G; this is equivalent to maximizing the likelihood the machine assigns to the training vectors.
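A small sketch computing G from two explicit distributions over the visible configurations; the array-based representation and the eps guard against log(0) are illustrative assumptions.

```python
import numpy as np

def kl_loss(p_data, p_model, eps=1e-12):
    """G = sum_v P+(v) * ln( P+(v) / P-(v) ), summed over visible vectors v.

    p_data and p_model are probability arrays over the same ordering of
    visible configurations; terms with P+(v) = 0 contribute nothing.
    """
    p_data = np.asarray(p_data, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    mask = p_data > 0
    return float(np.sum(p_data[mask] * np.log(p_data[mask] / (p_model[mask] + eps))))
```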

Restricted Boltzmann Machines
It was discovered that original Boltzmann Machines stop learning correctly when the machine is scaled up to anything larger than a trivial size. An architecture called the "restricted Boltzmann machine" (RBM) addresses this: an RBM does not allow intralayer connections among the hidden units (or among the visible units). This type of architecture was shown to make inference and learning easier.

RBM Learning:
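The slides do not spell out the training procedure here; the sketch below shows one widely used method, contrastive divergence (CD-1), for a binary RBM. All variable names and the mini-batch convention are my own choices, not from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, eta=0.01, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    W     : (num_visible, num_hidden) weight matrix
    b_vis : visible biases, b_hid : hidden biases
    v0    : batch of binary training vectors, shape (batch, num_visible)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities and samples given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one step of Gibbs sampling (reconstruction).
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)

    # Gradient approximation: <v h>_data - <v h>_reconstruction.
    batch = v0.shape[0]
    W     = W     + eta * (v0.T @ p_h0 - v1.T @ p_h1) / batch
    b_vis = b_vis + eta * (v0 - v1).mean(axis=0)
    b_hid = b_hid + eta * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```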

Deep Restricted Boltzmann Machines

RBM vs. Autoencoder
An autoencoder is a simple 3-layer neural network where the output units are directly connected back to the input units. The task of training is to minimize the reconstruction error, i.e. to find the most efficient compact representation (encoding) of the input data. An RBM shares a similar idea, but uses stochastic units with a particular (usually binary or Gaussian) distribution. The task of training is to find out how the visible random variables are actually connected/related to the hidden random variables.
https://www.quora.com/What-is-the-difference-between-autoencoders-and-a-restricted-Boltzmann-machine
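To make the autoencoder side of the comparison concrete, a minimal 3-layer autoencoder sketch using Keras; the layer sizes, activations, and loss are illustrative choices, not from the slides.

```python
from tensorflow import keras

# Hypothetical sizes: 784-dim inputs (e.g., flattened 28x28 images), 64-dim code.
input_dim, code_dim = 784, 64

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(input_dim,)),
    keras.layers.Dense(code_dim, activation="relu", name="encoder"),
    keras.layers.Dense(input_dim, activation="sigmoid", name="decoder"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Training minimizes reconstruction error: the target is the input itself.
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```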

RBM vs. GAN
Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework.
https://en.wikipedia.org/wiki/Generative_adversarial_network

https://stats.stackexchange.com/questions/338328/restricted-boltzmann-machines-vs-gan