CS623: Introduction to Computing with Neural Nets (lecture-17) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

Boltzmann Machine
- Hopfield net
- Probabilistic neurons
- Energy expression: E = -Σ_i Σ_{j>i} w_ij x_i x_j, where x_i = activation of the i-th neuron
- Used for optimization
- Central concern is to ensure the global minimum
- Based on simulated annealing
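
As a quick illustration of the energy expression (not part of the lecture), the sketch below computes E = -Σ_i Σ_{j>i} w_ij x_i x_j for an assumed symmetric weight matrix W and binary activation vector x:

```python
import numpy as np

def energy(W, x):
    """E = -sum over pairs i<j of w_ij * x_i * x_j (symmetric W, zero diagonal)."""
    x = np.asarray(x, dtype=float)
    # np.triu(W, k=1) keeps only the j > i entries, so each pair is counted once.
    return -np.sum(np.triu(W, k=1) * np.outer(x, x))

# Hypothetical 3-neuron net: all pairwise weights 1, all units active.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
print(energy(W, [1, 1, 1]))   # -3.0: three pairs, each contributing -1
```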

Comparative Remarks
Feed-forward n/w with BP: mapping device (i/p pattern → o/p pattern), i.e. classification; minimizes total sum-square error.
Hopfield net: associative memory + optimization device; minimizes energy.
Boltzmann m/c: constraint satisfaction (mapping + optimization device); minimizes entropy (Kullback-Leibler divergence).

Comparative Remarks (contd.)
Feed-forward n/w with BP: deterministic neurons; learns to associate i/p with o/p, i.e. equivalent to learning a function.
Hopfield net: deterministic neurons; learns a pattern.
Boltzmann m/c: probabilistic neurons; learns a probability distribution.

Comparative Remarks (contd.)
Feed-forward n/w with BP: can get stuck in a local minimum (greedy approach); credit/blame assignment (consistent with the Hebbian rule).
Hopfield net: local minimum possible; activation product (consistent with the Hebbian rule).
Boltzmann m/c: can come out of a local minimum; probability and activation product (consistent with the Hebbian rule).

Theory of Boltzmann m/c
For the m/c, computation means the following: at any time instant, make the state of the k-th neuron (s_k) equal to 1 with probability
P(s_k = 1) = 1 / (1 + exp(-ΔE_k / T))
where ΔE_k = change in energy of the m/c when the k-th neuron changes state, and T = temperature, a parameter of the m/c.
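
A minimal sketch of this stochastic update (illustrative, not the lecture's code). Here ΔE_k is taken as the energy gap Σ_{j≠k} w_kj s_j, i.e. E(s_k = 0) - E(s_k = 1) for the energy above; this sign convention is an assumption chosen so that the sigmoid form matches:

```python
import numpy as np

rng = np.random.default_rng(0)

def update_unit(W, s, k, T):
    """Set unit k to 1 with probability 1 / (1 + exp(-dE_k / T)).

    dE_k is assumed to be the energy gap sum_{j != k} w_kj * s_j.
    """
    delta_E = np.dot(W[k], s) - W[k, k] * s[k]      # exclude the j = k term
    p_on = 1.0 / (1.0 + np.exp(-delta_E / T))
    s[k] = 1 if rng.random() < p_on else 0
    return s

# One sweep over a hypothetical 3-unit net at temperature T = 1.0.
W = np.array([[0., 1., -1.],
              [1., 0.,  2.],
              [-1., 2., 0.]])
s = np.array([1, 0, 1])
for k in range(len(s)):
    s = update_unit(W, s, k, T=1.0)
print(s)
```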

Theory of Boltzmann m/c (contd.)
[Figure: P(s_k = 1) = 1 / (1 + exp(-ΔE_k / T)) plotted against ΔE_k, ranging between 0 and 1; the curve flattens with increasing T, with the value at ΔE_k = α marked.]

Theory of Boltzmann m/c (contd.)
ΔE_k = E_k^final - E_k^initial = (s_k^initial - s_k^final) * Σ_{j≠k} w_kj s_j
We observe:
1. For a fixed ΔE_k > 0, the higher the temperature, the lower is P(s_k = 1).
2. At T = infinity, P(s_k = 1) = P(s_k = 0) = 0.5: equal chance of being in state 0 or 1, i.e. completely random behavior.
3. If T → 0, then P(s_k = 1) → 1 (again for ΔE_k > 0).
4. The derivative of P(s_k = 1) with respect to ΔE_k is proportional to P(s_k = 1) * (1 - P(s_k = 1)).
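
A quick numeric check of observations 1-3, assuming an arbitrary fixed gap ΔE_k = 2:

```python
import numpy as np

def p_on(delta_E, T):
    return 1.0 / (1.0 + np.exp(-delta_E / T))

delta_E = 2.0
for T in [0.1, 1.0, 10.0, 1e6]:
    print(f"T = {T:>9}: P(s_k = 1) = {p_on(delta_E, T):.4f}")
# The printed probability approaches 1 as T -> 0 and 0.5 as T -> infinity.
# (The derivative with respect to delta_E is p_on * (1 - p_on) / T.)
```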

Consequence of the form of P(s_k = 1)
P(S_α) ∝ exp(-E(S_α) / T)
This probability distribution is called the Boltzmann Distribution.
P(S_α) is the probability of the state S_α (an N-bit vector of neuron activations).
Local "sigmoid" probabilistic behaviour leads to global Boltzmann-Distribution behaviour of the n/w.

[Figure: P(S_α) ∝ exp(-E(S_α)/T) plotted against energy E, for several temperatures T.]

Ratio of state probabilities
Normalizing,
P(S_α) = exp(-E(S_α)/T) / Σ_{β ∈ all states} exp(-E(S_β)/T)
P(S_α) / P(S_β) = exp(-(E(S_α) - E(S_β)) / T)
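
To make the link between the local sigmoid rule and the global Boltzmann distribution concrete, here is a small simulation sketch (assumed weights and temperature, not from the lecture): it runs many single-unit updates on a 3-unit net and compares the empirical state frequencies with exp(-E(S_β)/T) normalized over all states.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def energy(W, s):
    return -np.sum(np.triu(W, k=1) * np.outer(s, s))

W = np.array([[0., 1., -2.],
              [1., 0.,  3.],
              [-2., 3., 0.]])
T = 1.0
n_steps = 200_000

s = np.array([0, 1, 0])
counts = {}
for _ in range(n_steps):
    k = rng.integers(len(s))
    gap = np.dot(W[k], s) - W[k, k] * s[k]            # energy gap for unit k
    s[k] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-gap / T)) else 0
    key = tuple(s)
    counts[key] = counts.get(key, 0) + 1

# Compare visit frequencies with the Boltzmann distribution exp(-E/T)/Z.
states = list(itertools.product([0, 1], repeat=3))
Z = sum(np.exp(-energy(W, np.array(st)) / T) for st in states)
for st in states:
    empirical = counts.get(st, 0) / n_steps
    boltzmann = np.exp(-energy(W, np.array(st)) / T) / Z
    print(st, f"empirical = {empirical:.3f}", f"Boltzmann = {boltzmann:.3f}")
```

With enough update steps the two columns should roughly agree (burn-in is ignored in this sketch).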

Learning a probability distribution
Digression: estimation of a probability distribution Q by another distribution P.
D = deviation = Σ_{sample space} Q ln(Q/P)
D ≥ 0, which is a required property (just like sum-square error ≥ 0).

To prove D ≥ 0
Lemma: ln(1/x) ≥ (1 - x)
Let x = 1/(1 - y). Then
ln(1/x) = ln(1 - y) = -[y + y^2/2 + y^3/3 + y^4/4 + ...]

Proof (contd.)
(1 - x) = 1 - 1/(1 - y) = -y(1 - y)^(-1) = -y(1 + y + y^2 + y^3 + ...) = -[y + y^2 + y^3 + y^4 + ...]
But y + y^2/2 + y^3/3 + ... ≤ y + y^2 + y^3 + ... (term by term, for 0 ≤ y < 1), so ln(1/x) ≥ (1 - x).
Lemma proved.

Proof (contd.)
D = Σ Q ln(Q/P) ≥ Σ_{sample space} Q (1 - P/Q)   [by the lemma, with x = P/Q]
  = Σ (Q - P) = Σ Q - Σ P = 1 - 1 = 0
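
As a numeric sanity check of D ≥ 0 (not part of the lecture), the sketch below evaluates D = Σ Q ln(Q/P) for two hypothetical distributions over a three-element sample space:

```python
import numpy as np

def kl_divergence(Q, P):
    """D(Q || P) = sum_x Q(x) * ln(Q(x) / P(x)); assumes strictly positive Q and P."""
    Q, P = np.asarray(Q, dtype=float), np.asarray(P, dtype=float)
    return float(np.sum(Q * np.log(Q / P)))

Q = [0.5, 0.3, 0.2]          # distribution being estimated
P = [0.4, 0.4, 0.2]          # estimating distribution
print(kl_divergence(Q, P))   # ~0.025, positive since P != Q
print(kl_divergence(Q, Q))   # 0.0 when the two distributions coincide
```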