Laurent Itti: CS564 - Brain Theory and Artificial Intelligence

Laurent Itti: CS564 - Brain Theory and Artificial Intelligence Lecture 8. Hopfield Networks, Constraint Satisfaction, and Optimization Reading Assignments: HBTNN: I.3 Dynamics and Adaptation in Neural Networks (Arbib); III. Associative Networks (Anderson); III. Energy Functions for Neural Networks (Goles). TMB2: 8.2 Connectionist Models of Adaptive Networks

Hopfield Networks A paper by John Hopfield in 1982 was the catalyst in attracting the attention of many physicists to "neural networks". In a network of McCulloch-Pitts neurons whose output is 1 iff Σj wij sj ≥ θi and is 0 otherwise, neurons are updated synchronously: every neuron processes its inputs at each time step to determine a new output.

Hopfield Networks A Hopfield net (Hopfield 1982) is a net of such units subject to the asynchronous rule for updating one neuron at a time: "Pick a unit i at random. If Σj wij sj ≥ θi, turn it on. Otherwise turn it off." Moreover, Hopfield assumes symmetric weights: wij = wji.

“Energy” of a Neural Network Hopfield defined the “energy”: E = −½ Σij si sj wij + Σi si θi. If we pick unit i and the firing rule (previous slide) does not change its si, it will not change E.
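
The following minimal Python sketch (not from the course materials; the random symmetric weights, the unit count, and the helper names are illustrative) implements the asynchronous update rule and the energy function above, and checks numerically that no single-unit update ever increases E.

import random

random.seed(0)
N = 8
w = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        w[i][j] = w[j][i] = random.uniform(-1, 1)   # symmetric weights, zero diagonal
theta = [random.uniform(-0.5, 0.5) for _ in range(N)]
s = [random.randint(0, 1) for _ in range(N)]

def energy(s):
    return (-0.5 * sum(s[i] * s[j] * w[i][j] for i in range(N) for j in range(N))
            + sum(s[i] * theta[i] for i in range(N)))

for _ in range(1000):
    e_before = energy(s)
    i = random.randrange(N)                          # pick a unit at random
    s[i] = 1 if sum(w[i][j] * s[j] for j in range(N)) >= theta[i] else 0
    assert energy(s) <= e_before + 1e-12             # the energy never increases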

si: 0 to 1 transition If si initially equals 0 and Σj wij sj ≥ θi, then si goes from 0 to 1 with all other sj held constant, and the "energy gap", or change in E, is given by ΔE = −½ Σj (wij sj + wji sj) + θi = −(Σj wij sj − θi) (by symmetry) ≤ 0.

si: 1 to 0 transition If si initially equals 1 and Σj wij sj < θi, then si goes from 1 to 0 with all other sj held constant. The "energy gap", or change in E, is given, for symmetric wij, by ΔE = Σj wij sj − θi < 0. So on every update we have ΔE ≤ 0.

Minimizing Energy On every update we have ΔE ≤ 0, hence the dynamics of the net tends to move E toward a minimum. We stress that there may be several such states: they are local minima. Global minimization is not guaranteed.

The Symmetry Condition wij = wji is crucial for ΔE ≤ 0. Without this condition, ½ Σj (wij + wji) sj − θi cannot be reduced to Σj wij sj − θi, so Hopfield's updating rule cannot be guaranteed to yield a passage to an energy minimum. It might instead yield a limit cycle, which can be useful in modeling control of action. In most vision algorithms, constraints can be formulated in terms of symmetric weights, so wij = wji is appropriate. [TMB2: Constraint Satisfaction §4.2; Stereo §7.1; Optic Flow §7.2] In a control problem, a link wij might express the likelihood that the action represented by i should precede that represented by j, and thus wij = wji is normally inappropriate.
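
A small hand-constructed example (not from the slides) makes the failure without symmetry concrete: with w12 = 1 but w21 = −1 and thresholds 0.5 and −0.5, the asynchronous rule has no stable state and cycles forever.

# Asymmetric two-unit net: unit 1 wants to copy unit 2, unit 2 wants to negate unit 1.
w = [[0.0, 1.0], [-1.0, 0.0]]
theta = [0.5, -0.5]

def update(s, i):
    s[i] = 1 if sum(w[i][j] * s[j] for j in range(2)) >= theta[i] else 0

s = [0, 0]
states = []
for i in [1, 0, 1, 0, 1, 0, 1, 0]:      # alternately update unit 2 and unit 1
    update(s, i)
    states.append(tuple(s))
print(states)   # (0,1) (1,1) (1,0) (0,0) repeating: a limit cycle, no equilibrium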

The condition of asynchronous update is crucial Consider a simple two-unit "flip-flop" with constant input 1, with w12 = w21 = 1 and θ1 = θ2 = 0.5. Under synchronous updates, the McCulloch-Pitts network will oscillate between the states (0,1) and (1,0), or will sit in the states (0,0) or (1,1). There is no guarantee that it will converge to an equilibrium.

The condition of asynchronous update is crucial However, with E = −½ Σij si sj wij + Σi si θi, we have E(0,0) = 0, E(0,1) = E(1,0) = 0.5, and E(1,1) = 0, and the Hopfield network (with asynchronous updates) will converge to the minimum at (0,0) or (1,1).
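
A quick numerical check of these flip-flop energies (an illustration, not from the slides):

w = [[0.0, 1.0], [1.0, 0.0]]        # w12 = w21 = 1
theta = [0.5, 0.5]

def E(s):
    return (-0.5 * sum(s[i] * s[j] * w[i][j] for i in range(2) for j in range(2))
            + sum(s[i] * theta[i] for i in range(2)))

for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s, E(s))   # 0.0, 0.5, 0.5, 0.0: minima at (0,0) and (1,1)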

Hopfield Nets and Optimization To design Hopfield nets to solve optimization problems: given a problem, choose weights for the network so that E is a measure of the overall constraint violation. A famous example is the traveling salesman problem. [HBTNN articles: Neural Optimization; Constrained Optimization and the Elastic Net. See also TMB2 Section 8.2.] Hopfield and Tank 1986 constructed VLSI chips for such networks which do indeed settle incredibly quickly to a local minimum of E. Unfortunately, there is no guarantee that this minimum is an optimal solution to the traveling salesman problem. Experience shows it will be "a pretty good approximation," but conventional algorithms exist which yield better performance.

The traveling salesman problem 1 There are n cities, with a road of length lij joining city i to city j. The salesman wishes to find a way to visit the cities that is optimal in two ways: each city is visited only once, and the total route is as short as possible. This is an NP-complete problem: the only known algorithms (so far) that solve it exactly have exponential complexity.
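
To make the cost of exact solution concrete, here is an illustrative brute-force solver (a sketch, not part of the course; the random road lengths are made up). It examines every ordering of the remaining cities, i.e., (n−1)! candidate tours, which is what becomes intractable as n grows.

import itertools, random

random.seed(0)
n = 8
L = [[0.0 if i == j else random.uniform(1, 10) for j in range(n)] for i in range(n)]

def tour_length(tour):
    return sum(L[tour[k]][tour[(k + 1) % n]] for k in range(n))

# Fix city 0 as the start and try every ordering of the other n-1 cities.
best = min(itertools.permutations(range(1, n)), key=lambda p: tour_length((0,) + p))
print((0,) + best, tour_length((0,) + best))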

Exponential Complexity Why is exponential complexity a problem? It means that the number of operations necessary to compute the exact solution of the problem grows exponentially with the size of the problem (here, the number of cities). exp(1) ≈ 2.72; exp(10) ≈ 2.20 × 10^4; exp(100) ≈ 2.69 × 10^43; exp(500) ≈ 1.40 × 10^217; exp(250,000) ≈ 10^108,573. (Most powerful computer: about 10^12 operations/second.)
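
These magnitudes are easy to reproduce (a side note, not from the slides); writing exp(n) = 10^(n / ln 10) avoids floating-point overflow for the larger values:

import math

for n in (1, 10, 100, 500, 250_000):
    p = n / math.log(10)               # exp(n) = 10**p
    frac, whole = math.modf(p)
    print(f"exp({n}) ~ {10 ** frac:.2f} x 10^{int(whole)}")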

The traveling salesman problem 2 We build a constraint satisfaction network as follows: let neuron Nij express the decision to go straight from city i to city j. The cost of this move is simply lij. We can re-express the "visit a city only once" criterion by saying that, for city j, there is one and only one city i from which j is directly approached. Thus (Σi Nij − 1)² can be seen as a measure of the extent to which this constraint is violated for paths entering city j. Thus, the cost of a particular "tour" (which may not actually be a closed path, but just a specification of a set of paths to be taken) is Σij Nij lij + Σj (Σi Nij − 1)².

Constraint Optimization Network

The traveling salesman problem 3 Cost to minimize: Σij Nij lij + Σj (Σi Nij − 1)². Now (Σi Nij − 1)² = Σik Nij Nkj − 2 Σi Nij + 1, and so Σj (Σi Nij − 1)² = Σijk Nij Nkj − 2 Σij Nij + n = Σij,kl Nij Nkl vij,kl − 2 Σij Nij + n, where n is the number of cities and vij,kl equals 1 if j = l, and 0 otherwise. Thus, minimizing Σij Nij lij + Σj (Σi Nij − 1)² is equivalent to minimizing Σij,kl Nij Nkl vij,kl + Σij Nij (lij − 2), since the constant n makes no difference.

The traveling salesman problem 4 Minimize: Σij,kl Nij Nkl vij,kl + Σij Nij (lij − 2). Compare this to the general energy expression (with si now replaced by Nij): E = −½ Σij,kl Nij Nkl wij,kl + Σij Nij θij. Thus if we set up a network with connections wij,kl = −2 vij,kl (= −2 if j = l, 0 otherwise) and θij = lij − 2, it will settle to a local minimum of E.
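
The following sketch (not from the course materials; the random road lengths and the small n are illustrative) builds exactly this network, with wij,kl = −2 if j = l and 0 otherwise and θij = lij − 2, and verifies numerically that its energy equals the constraint cost minus the constant n for an arbitrary assignment of the units.

import random

random.seed(0)
n = 5
L = [[0.0 if i == j else random.uniform(1, 10) for j in range(n)] for i in range(n)]

units = [(i, j) for i in range(n) for j in range(n) if i != j]   # unit Nij: go from i to j
# -2 whenever the two moves enter the same city (including a == b, as in the slide's vij,kl)
w = {(a, b): (-2.0 if a[1] == b[1] else 0.0) for a in units for b in units}
theta = {(i, j): L[i][j] - 2.0 for (i, j) in units}

s = {u: random.randint(0, 1) for u in units}    # an arbitrary binary assignment

energy = (-0.5 * sum(s[a] * s[b] * w[a, b] for a in units for b in units)
          + sum(s[u] * theta[u] for u in units))
cost = (sum(s[(i, j)] * L[i][j] for (i, j) in units)
        + sum((sum(s[(i, j)] for i in range(n) if i != j) - 1) ** 2 for j in range(n)))
print(abs(energy - (cost - n)) < 1e-9)          # True: E = cost - n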

TSP Network Connections

Boltzmann Machines The Boltzmann Machine of Hinton, Sejnowski, and Ackley 1984 uses simulated annealing to escape local minima. To motivate their solution, consider how one might get a ball-bearing traveling along a curve with a shallow valley D and a deeper valley C to "probably end up" in the deepest minimum. The idea is to shake the box "about h hard"; then the ball is more likely to go from D to C than from C to D. So, on average, the ball should end up in C's valley. [HBTNN article: Boltzmann Machines. See also TMB2 Section 8.2.]

Boltzmann’s statistical theory of gases In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states. The 19th-century physicist Ludwig Boltzmann developed a theory giving this probability distribution for a gas in equilibrium at a given temperature (i.e., with every small region of the gas having the same average kinetic energy). Hinton, Sejnowski and Ackley’s idea was that this distribution might also be used to describe neural interactions, where the temperature T is replaced by a small noise term T (the neural analog of the random thermal motion of molecules).

Boltzmann Distribution At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as P(A)/P(B) = exp(−(E(A) − E(B))/T), where E(A) and E(B) are the energies associated with states A and B.
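
A two-line illustration (not from the slides) of why temperature matters: at high T the two states become nearly equally probable, while at low T the lower-energy state dominates.

import math

E_A, E_B = 1.0, 0.0
for T in (0.1, 1.0, 10.0):
    print(T, math.exp(-(E_A - E_B) / T))   # relative probability P(A)/P(B)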

Simulated Annealing Kirkpatrick et al. 1983: simulated annealing is a general method for making escape from local minima likely by allowing jumps to higher-energy states. The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy. He heats the metal, then slowly cools it as he hammers the blade into shape. If he cools the blade too quickly, the metal will form patches of different composition; if the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy. [HBTNN article: Simulated Annealing.]

Simulated Annealing in Hopfield Nets Pick a unit i at random. Compute the energy change ΔE that would result from flipping si (from the previous slides, ΔE = Σj wij sj − θi for a 1-to-0 flip, and the negative of that for a 0-to-1 flip). Accept the flip with probability 1/[1 + exp(ΔE/T)]. NOTE: this rule converges to the deterministic rule of the previous slides as T → 0. Optimization with simulated annealing: set T; optimize for the given T; lower T; repeat (see Geman & Geman, 1984).
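
A minimal Python sketch of this procedure (the random symmetric weights, the inner-loop length, and the geometric cooling schedule are assumptions for illustration, not part of the course):

import math, random

random.seed(1)
N = 20
w = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        w[i][j] = w[j][i] = random.uniform(-1, 1)    # symmetric weights, zero diagonal
theta = [random.uniform(-0.5, 0.5) for _ in range(N)]
s = [random.randint(0, 1) for _ in range(N)]

def energy(s):
    return (-0.5 * sum(s[i] * s[j] * w[i][j] for i in range(N) for j in range(N))
            + sum(s[i] * theta[i] for i in range(N)))

T = 2.0
while T > 0.01:
    for _ in range(200):                             # optimize at the current T
        i = random.randrange(N)                      # pick a unit at random
        net = sum(w[i][j] * s[j] for j in range(N)) - theta[i]
        dE = net if s[i] == 1 else -net              # energy change if unit i flips
        x = min(dE / T, 50.0)                        # cap to avoid overflow in exp
        if random.random() < 1.0 / (1.0 + math.exp(x)):
            s[i] = 1 - s[i]                          # accept the flip
    T *= 0.9                                         # lower T and repeat
print(energy(s))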

Statistical Mechanics of Neural Networks A good textbook which includes research by physicists studying neural networks: Hertz, J., Krogh, A., and Palmer, R.G., 1991, Introduction to the Theory of Neural Computation, Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley. The book is quite mathematical, but has much accessible material, exploiting the analogy between neuron states and atomic “spins” in a magnet. [cf. HBTNN: Statistical Mechanics of Neural Networks (Engel and Zippelius)]