Lecture 10 Boltzmann machine


Soft Computing Lecture 10: Boltzmann machine (24.10.2005)

Definition of BM (from Wikipedia): A Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey Hinton and Terry Sejnowski. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They were an early example of a neural network capable of forming internal representations. Because they are very slow to simulate, they are not very useful for most practical purposes; however, they are theoretically intriguing because of the biological plausibility of their training algorithm.

Definition of BM (2): A Boltzmann machine, like a Hopfield net, is a network of binary units with an "energy" defined for the network as a whole. Unlike Hopfield nets, though, Boltzmann machines only ever have units that take the values 1 or 0. The global energy E of a Boltzmann machine has the same form as that of a Hopfield network:

E = -\sum_{i<j} w_{ij} s_i s_j + \sum_i \theta_i s_i

where w_ij is the connection weight from unit j to unit i, s_i is the state (1 or 0) of unit i, and θ_i is the threshold of unit i.
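As a concrete illustration, here is a minimal C sketch of evaluating this energy for a toy three-unit network; all of the names (global_energy, w, s, theta, N) are illustrative and are not part of the simulator listed at the end of the lecture.

#include <stdio.h>

#define N 3   /* number of units in the toy network */

/* E = -sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i */
double global_energy(double w[N][N], const int s[N], const double theta[N])
{
    double E = 0.0;
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++)
            E -= w[i][j] * s[i] * s[j];   /* each pair counted once */
        E += theta[i] * s[i];             /* threshold term */
    }
    return E;
}

int main(void)
{
    double w[N][N] = {{0.0, 1.0, -2.0}, {1.0, 0.0, 0.5}, {-2.0, 0.5, 0.0}};
    double theta[N] = {0.1, -0.3, 0.2};
    int s[N] = {1, 0, 1};                 /* one particular binary state */
    printf("E = %f\n", global_energy(w, s, theta));
    return 0;
}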

Definition of BM (3): The difference in global energy that results from a single unit i being off (0) versus on (1), written ΔE_i, is therefore

\Delta E_i = \sum_j w_{ij} s_j - \theta_i

A Boltzmann machine is made up of stochastic units: the probability p_i of the i-th unit being on is

p_i = \frac{1}{1 + e^{-\Delta E_i / T}}

where the scalar T is referred to as the "temperature" of the system.
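This single-unit rule is what a simulator has to implement (compare the PropagateUnit routine in the listing at the end of the lecture). A minimal C sketch follows; the names update_unit, N, w, s and theta are illustrative assumptions, not part of that simulator.

#include <math.h>
#include <stdlib.h>

#define N 3   /* number of units */

/* Turn unit i on with probability p_i = 1 / (1 + exp(-DeltaE_i / T)),
   where DeltaE_i = sum_j w_ij s_j - theta_i (w_ii is assumed to be 0). */
void update_unit(int i, double w[N][N], int s[N], const double theta[N], double T)
{
    double dE = -theta[i];
    for (int j = 0; j < N; j++)
        dE += w[i][j] * s[j];
    double p = 1.0 / (1.0 + exp(-dE / T));
    s[i] = ((double)rand() / RAND_MAX) <= p;   /* sample the new binary state */
}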

Definition of BM (4): Notice that the temperature T plays a crucial role in this equation; in the course of running the network, T starts high and is gradually 'cooled down' to a lower value. The probability function is the logistic function: a continuous function with a characteristic sigmoid shape that maps any input, from minus infinity to plus infinity, into a real number in the interval [0, 1]. When the net input ΔE_i is 0, e^{-ΔE_i / T} = 1 (any number raised to the power 0 is 1), and this holds at every temperature, so the probability of firing is always 1/2: a unit with zero net input is as likely to fire as not.

Definition of BM (5): At very low temperatures (e.g. 0.001), even a small positive net input drives the probability of firing towards 1, and a small negative net input drives it towards 0. In the limit, as the temperature approaches 0, the Boltzmann machine becomes deterministic. Conversely, the higher the temperature, the more the behaviour diverges from this deterministic limit (and the closer every firing probability gets to 1/2).

Definition of BM (6): Units are divided into "visible" units, V, and "hidden" units, H. The visible units are those which receive information from the "environment", i.e. the units that are given binary state vectors during training. The connections in a Boltzmann machine satisfy three restrictions: w_ii = 0 for all i (no unit has a connection with itself); w_ij = w_ji for all i, j (all connections are symmetric); and visible units have no connections between them.

Structure of BM

Alternative structure of BM

Training of BM: Boltzmann machines can be viewed as a maximum likelihood model, i.e. training modifies the parameters (weights) of the network so as to maximize the probability that the network produces the data as it was seen in the training set. In other words, the network must successfully model the probabilities of the data in its environment. Boltzmann machine training has two phases. One is the "positive" phase, in which the visible units' states are clamped to a particular binary state vector from the training set. The other is the "negative" phase, in which the network is allowed to run freely, i.e. no unit has its state determined by external data. A vector over the visible units is denoted V_α and a vector over the hidden units is denoted H_β. The probabilities P+(S) and P−(S) represent the probability of a given state S in the positive and negative phases respectively. Note that P+(V_α) is determined by the environment for every V_α, because the visible units are set by the environment in the positive phase.
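As a rough illustration of the two phases, here is a C sketch in which the first NV of N units are taken to be visible; all names (resample, positive_phase_sweep, negative_phase_sweep) are illustrative, and resample simply repeats the single-unit update from the earlier sketch.

#include <math.h>
#include <stdlib.h>

#define N  5   /* total number of units */
#define NV 3   /* the first NV units are treated as visible */

/* One stochastic update of unit i (same rule as in the earlier sketch). */
static void resample(int i, double w[N][N], int s[N], const double theta[N], double T)
{
    double dE = -theta[i];
    for (int j = 0; j < N; j++)
        dE += w[i][j] * s[j];
    s[i] = ((double)rand() / RAND_MAX) <= 1.0 / (1.0 + exp(-dE / T));
}

/* Positive phase: visible units are clamped to a training vector,
   only the hidden units are resampled. */
void positive_phase_sweep(double w[N][N], int s[N], const double theta[N],
                          double T, const int data[NV])
{
    for (int i = 0; i < NV; i++)
        s[i] = data[i];
    for (int i = NV; i < N; i++)
        resample(i, w, s, theta, T);
}

/* Negative phase: the network runs freely, every unit is resampled. */
void negative_phase_sweep(double w[N][N], int s[N], const double theta[N], double T)
{
    for (int i = 0; i < N; i++)
        resample(i, w, s, theta, T);
}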

Training of BM (2): Boltzmann machines are trained by gradient descent: a given weight w_ij is changed by subtracting the partial derivative of a cost function with respect to that weight. The cost function used for Boltzmann machines is

G = \sum_{\alpha} P^{+}(V_\alpha) \, \ln \frac{P^{+}(V_\alpha)}{P^{-}(V_\alpha)}

(the Kullback-Leibler divergence between the clamped and free-running distributions over the visible units). The cost is lowest when the probability of every vector in the negative phase equals the probability of the same vector in the positive phase, and the most probable vectors in the data have the greatest effect on the cost.
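A minimal C sketch of evaluating G from two arrays of visible-vector probabilities (the name cost_G and the array layout are illustrative; it assumes P−(V_α) > 0 wherever P+(V_α) > 0):

#include <math.h>

/* G = sum_v P+(v) * ln( P+(v) / P-(v) ), summed over all visible vectors v. */
double cost_G(const double *p_plus, const double *p_minus, int n_states)
{
    double G = 0.0;
    for (int v = 0; v < n_states; v++)
        if (p_plus[v] > 0.0)
            G += p_plus[v] * log(p_plus[v] / p_minus[v]);
    return G;
}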

Training of BM (3): This cost function would seem complicated to perform gradient descent on. Surprisingly, though, at thermal equilibrium the gradient with respect to a given weight w_ij is given by the very simple equation

\frac{\partial G}{\partial w_{ij}} = -\frac{1}{T} \left[ p^{+}_{ij} - p^{-}_{ij} \right]

where p+_ij is the probability of units i and j both being on in the positive phase, and p−_ij is the probability of units i and j both being on in the negative phase.
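The resulting gradient-descent step simply nudges each weight in proportion to the difference between the two statistics. A minimal C sketch, with illustrative names (update_weights, and epsilon for an assumed learning rate):

#define N 3   /* number of units */

/* Gradient descent on G: each weight moves so that the co-activation
   statistics of the free-running (negative) phase approach those of the
   clamped (positive) phase. */
void update_weights(double w[N][N], double p_plus[N][N], double p_minus[N][N],
                    double epsilon)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i != j)
                w[i][j] += epsilon * (p_plus[i][j] - p_minus[i][j]);
}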

Training of BM (4): This result follows from the fact that, at thermal equilibrium, the probability of any state of the free-running network is given by the Boltzmann distribution (hence the name "Boltzmann machine"). Thermal equilibrium is crucial here, so the network must be brought to thermal equilibrium before the probabilities of two units both being on are measured. In a Boltzmann machine, thermal equilibrium is reached by simulated annealing, and it is this need for simulated annealing that can make training a Boltzmann machine on a digital computer a very slow process. On the other hand, the learning rule is fairly biologically plausible, because the only information needed to change a weight is "local": a connection (biologically speaking, a synapse) needs no information about anything other than the two neurons it connects. This is far more biologically realistic than the information required by a connection in many other neural network training algorithms, such as backpropagation.

Training of BM (5): Simulated annealing (SA) is a generic probabilistic meta-algorithm for global optimization, i.e. for locating a good approximation to the global optimum of a given function in a large search space. It was invented independently by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Cerny in 1985. The name and inspiration come from annealing in metallurgy, a technique in which a material is heated and then cooled in a controlled way to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and to wander randomly through states of higher energy; the slow cooling then gives them a better chance of finding configurations with lower internal energy than the initial one.
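The core of simulated annealing is the acceptance rule (the Metropolis criterion): a move that lowers the energy is always accepted, while a move that raises it is accepted with a probability that shrinks as the temperature falls. A minimal C sketch with an illustrative function name:

#include <math.h>
#include <stdlib.h>

/* Accept a downhill move unconditionally; accept an uphill move with
   probability exp(-deltaE / T), which decreases as T is lowered. */
int accept_move(double deltaE, double T)
{
    if (deltaE <= 0.0)
        return 1;
    return ((double)rand() / RAND_MAX) < exp(-deltaE / T);
}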

Training of BM (6)


Similarity and difference of BM and Hopfield model

Boltzmann machines are sometimes used in combination with a Hopfield model or a perceptron, as a device for finding the global minimum of the energy function or of the error function respectively. In the first case, during operation (recall), the state of a neuron is changed according to the temperature T; if the energy function decreases, the change is accepted and the process continues. In the perceptron case, during learning, a weight is changed, and the change is accepted or rejected according to the resulting value of the error function. This stochastic updating can be interleaved with the usual recall or learning process, or run after it, to improve the result.
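A minimal C sketch of the second scheme (annealed weight perturbation for a supervised network); everything here, including the quadratic stand-in error() and the names anneal_weight, NW and step, is hypothetical and only illustrates the accept/reject idea described above.

#include <math.h>
#include <stdlib.h>

#define NW 4   /* number of weights in the toy network */

/* Stand-in error function, for illustration only. */
static double error(const double w[NW])
{
    double e = 0.0;
    for (int k = 0; k < NW; k++)
        e += (w[k] - 1.0) * (w[k] - 1.0);
    return e;
}

/* Perturb weight k at random; keep the change if the error decreases,
   otherwise keep it only with probability exp(-dE / T). */
void anneal_weight(double w[NW], int k, double step, double T)
{
    double old = w[k];
    double e_old = error(w);
    w[k] = old + step * (2.0 * (double)rand() / RAND_MAX - 1.0);
    double dE = error(w) - e_old;
    if (dE > 0.0 && ((double)rand() / RAND_MAX) >= exp(-dE / T))
        w[k] = old;   /* reject the change and restore the old weight */
}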


Boltzmann Machine Simulator

/******************************************************************************
        Network:     Boltzmann Machine with Simulated Annealing
        Application: Optimization - Traveling Salesman Problem
 ******************************************************************************/

void InitializeApplication(NET* Net)
{
  INT  n1,n2;
  REAL x1,x2,y1,y2;
  REAL Alpha1, Alpha2;

  Gamma = 7;
  /* place the cities evenly on a unit circle and precompute all distances */
  for (n1=0; n1<NUM_CITIES; n1++) {
    for (n2=0; n2<NUM_CITIES; n2++) {
      Alpha1 = ((REAL) n1 / NUM_CITIES) * 2 * PI;
      Alpha2 = ((REAL) n2 / NUM_CITIES) * 2 * PI;
      x1 = cos(Alpha1);
      y1 = sin(Alpha1);
      x2 = cos(Alpha2);
      y2 = sin(Alpha2);
      Distance[n1][n2] = sqrt(sqr(x1-x2) + sqr(y1-y2));
    }
  }
  f = fopen("BOLTZMAN.txt", "w");
  fprintf(f, "Temperature Valid Length Tour\n\n");
}

void CalculateWeights(NET* Net)
{
  INT  n1,n2,n3,n4;
  INT  i,j;
  INT  Pred_n3, Succ_n3;
  REAL Weight;

  for (n1=0; n1<NUM_CITIES; n1++) {
    for (n2=0; n2<NUM_CITIES; n2++) {
      i = n1*NUM_CITIES+n2;      /* unit: city n2 occupies tour position n1 */
      for (n3=0; n3<NUM_CITIES; n3++) {
        for (n4=0; n4<NUM_CITIES; n4++) {
          j = n3*NUM_CITIES+n4;
          Weight = 0;
          if (i!=j) {
            Pred_n3 = (n3==0 ? NUM_CITIES-1 : n3-1);
            Succ_n3 = (n3==NUM_CITIES-1 ? 0 : n3+1);
            if ((n1==n3) OR (n2==n4))
              Weight = -Gamma;                            /* same position or same city: inhibit */
            else if ((n1 == Pred_n3) OR (n1 == Succ_n3))
              Weight = -Distance[n2][n4];                 /* neighbouring positions: cost = city distance */
          }
          Net->Weight[i][j] = Weight;
        }
      }
      Net->Threshold[i] = -Gamma/2;
    }
  }
}

void PropagateUnit(NET* Net, INT i)
{
  INT  j;
  REAL Sum, Probability;

  Sum = 0;
  for (j=0; j<Net->Units; j++) {
    Sum += Net->Weight[i][j] * Net->Output[j];
  }
  Sum -= Net->Threshold[i];
  /* stochastic update: unit i fires with the Boltzmann probability */
  Probability = 1 / (1 + exp(-Sum / Net->Temperature));
  if (RandomEqualREAL(0, 1) <= Probability)
    Net->Output[i] = TRUE;
  else
    Net->Output[i] = FALSE;
}

void BringToThermalEquilibrium(NET* Net)
{
  INT n,i;

  for (i=0; i<Net->Units; i++) {
    Net->On[i] = 0;
    Net->Off[i] = 0;
  }
  /* let the network settle at the current temperature */
  for (n=0; n<1000*Net->Units; n++) {
    PropagateUnit(Net, i = RandomEqualINT(0, Net->Units-1));
  }
  /* keep updating and record how often each unit is on or off */
  for (n=0; n<100*Net->Units; n++) {
    PropagateUnit(Net, i = RandomEqualINT(0, Net->Units-1));
    if (Net->Output[i])
      Net->On[i]++;
    else
      Net->Off[i]++;
  }
  /* freeze each unit to its more probable state */
  for (i=0; i<Net->Units; i++) {
    Net->Output[i] = Net->On[i] > Net->Off[i];
  }
}

void Anneal(NET* Net)
{
  Net->Temperature = 100;
  do {
    BringToThermalEquilibrium(Net);
    WriteTour(Net);
    Net->Temperature *= 0.99;     /* geometric cooling schedule */
  } while (NOT ValidTour(Net));
}

void main()
{
  NET Net;

  InitializeRandoms();
  GenerateNetwork(&Net);
  InitializeApplication(&Net);
  CalculateWeights(&Net);
  SetRandom(&Net);
  Anneal(&Net);
  FinalizeApplication(&Net);
}