Stochastic Neural Networks, Optimal Perceptual Interpretation, and the Stochastic Interactive Activation Model PDP Class January 15, 2010.


Goodness Landscape of The Cube Network

The Boltzmann Machine: The Stochastic Hopfield Network Units have binary states {0, 1} and update is asynchronous. The activation function is p(a_i = 1) = 1 / (1 + e^{-net_i/T}). Suppose (as is the case in the cube network) that, for T > 0, it is possible to get from any state to any other state. If this condition holds, we say the network is ergodic. Although the process may start in some particular state, if we wait long enough the starting state no longer matters. From that time on, we say the network is at equilibrium, and under these circumstances the relative probability of any two states depends only on their goodness: P(S) / P(S') = e^{(G(S) - G(S'))/T}. More generally, at equilibrium we have the Probability-Goodness Equation: P(S) = e^{G(S)/T} / Σ_{S'} e^{G(S')/T}, or equivalently, log P(S) = G(S)/T - log Z, where Z is the normalizing sum in the denominator.
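
A minimal Python sketch of this update rule (not from the course materials): the small symmetric weight matrix, biases, and number of updates below are made-up placeholders, not the cube network's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small symmetric weight matrix and biases (placeholders, not the cube network).
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0]])
bias = np.zeros(3)

def update_unit(a, i, T):
    """Asynchronous stochastic update: unit i turns on with probability 1 / (1 + exp(-net_i / T))."""
    net = W[i] @ a + bias[i]
    p_on = 1.0 / (1.0 + np.exp(-net / T))
    a[i] = 1.0 if rng.random() < p_on else 0.0

a = rng.integers(0, 2, size=3).astype(float)   # random binary starting state
for _ in range(1000):                          # many asynchronous updates at a fixed temperature
    update_unit(a, rng.integers(len(a)), T=1.0)
```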

Why is This True? (Intuitive Explanation) Consider two states that differ only in the activation of a single unit. The two states differ in goodness by an amount equal to the net input to that unit. The probability of the unit being on is 1 / (1 + e^{-net_i/T}), and the probability of it being off is 1 / (1 + e^{net_i/T}). All of the other units have the same states in the two cases, so the ratio of the probabilities of the two states is [1 / (1 + e^{-net_i/T})] / [1 / (1 + e^{net_i/T})], which is e^{net_i/T} = e^{(G(S_on) - G(S_off))/T}, just as the Probability-Goodness Equation requires.
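
As a quick numeric check (an example added here, not on the original slide): with a net input of 1.0 and T = 1, p(on) = 1 / (1 + e^{-1}) ≈ .73 and p(off) ≈ .27, so the ratio of the two states' probabilities is .73 / .27 ≈ 2.72 ≈ e^{1/1}, i.e., e raised to the goodness difference divided by T.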

Simulated Annealing Start with high temperature. This means it is easy to jump from state to state. Gradually reduce temperature. In the limit of infinitely slow annealing, we can guarantee that the network will end up in the best possible state (or in one of them, if two or more are tied for best). Thus, the best possible interpretation (or one of the best, if there is a tie) can always be found, if you are patient!
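
A sketch of what such an annealing schedule might look like in code; the starting temperature, cooling rate, and tiny two-unit network are illustrative assumptions, not values from the class software.

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.array([[0.0, 2.0],
              [2.0, 0.0]])                 # hypothetical pair of mutually supporting units
bias = np.array([-1.0, -1.0])
a = rng.integers(0, 2, size=2).astype(float)

T, T_min, cooling = 10.0, 0.05, 0.95       # illustrative schedule: start hot, cool geometrically
while T > T_min:
    for _ in range(20):                    # several asynchronous stochastic updates per temperature
        i = rng.integers(len(a))
        net = W[i] @ a + bias[i]
        a[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0.0
    T *= cooling                           # gradually reduce the temperature

goodness = 0.5 * a @ W @ a + bias @ a      # with slow enough cooling this should be (near) maximal
```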

Exploring Probability Distributions over States Imagine settling to a non-zero temperature, such as T = 0.5. At this temperature, there is still some probability of being in a state that is less than perfect. Consider an ensemble of networks: at equilibrium (i.e., after enough cycles, possibly with annealing), the relative frequencies with which the different states occur will approximate the relative probabilities given by the Probability-Goodness equation. You will have an opportunity to explore this situation in the homework assignment.
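
A sketch of the kind of comparison involved: enumerate all states of a tiny made-up three-unit network, compute the equilibrium probabilities the Probability-Goodness equation predicts at T = 0.5, and compare them with the relative frequencies observed over a long run of stochastic updates (the weights, biases, and run length are arbitrary illustrations, not the homework network).

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
T = 0.5
W = np.array([[ 0.0, 1.5, -1.0],
              [ 1.5, 0.0,  0.5],
              [-1.0, 0.5,  0.0]])          # hypothetical symmetric weights
bias = np.array([0.2, -0.3, 0.1])

def goodness(a):
    return 0.5 * a @ W @ a + bias @ a

# Predicted equilibrium distribution: P(S) = exp(G(S)/T) / sum over S' of exp(G(S')/T)
states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=3)]
g = np.array([goodness(s) for s in states])
predicted = np.exp(g / T) / np.exp(g / T).sum()

# Empirical distribution from a long run of asynchronous stochastic updates at the same T.
a = states[0].copy()
counts = np.zeros(len(states))
for _ in range(200_000):
    i = rng.integers(3)
    net = W[i] @ a + bias[i]
    a[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0.0
    counts[int(a[0] * 4 + a[1] * 2 + a[2])] += 1

print(np.round(predicted, 3))
print(np.round(counts / counts.sum(), 3))  # should approximate the predicted probabilities
```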

A Problem For the Interactive Activation Model Bayes’ rule, Massaro’s model, and the logistic activation function all give rise to a pattern of data we will call ‘logistic additivity’, and data from many experiments exhibit this pattern. Unfortunately, the interactive activation model does not exhibit it. Does this mean that the interactive activation model is fundamentally wrong, i.e., that processing is strictly feedforward (as Massaro believed)? Or is there some other inadequacy in the model?

Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/) From Massaro & Cohen (1991)

Massaro’s Model Joint effects of context and stimulus obey the fuzzy logical model of perception (FLMP; see the next slide): p(r | S_ij) = t_i c_j / (t_i c_j + (1 - t_i)(1 - c_j)), where t_i is the stimulus support for r given input i and c_j is the contextual support for r given context j. Massaro sees this model as having a strictly feed-forward organization: Evaluate stimulus and Evaluate context operate separately, feeding an Integration stage that is followed by a Decision stage.
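
A small sketch of the FLMP combination rule in code; the function implements the formula above, while the particular t and c values are made up for illustration.

```python
import numpy as np

def flmp_p_r(t_i, c_j):
    """FLMP integration: p(r | S_ij) = t_i*c_j / (t_i*c_j + (1 - t_i)*(1 - c_j))."""
    return (t_i * c_j) / (t_i * c_j + (1.0 - t_i) * (1.0 - c_j))

t = np.linspace(0.1, 0.9, 5)                                   # illustrative stimulus support for /r/
contexts = {"favors l": 0.3, "neutral": 0.5, "favors r": 0.7}  # illustrative contextual support
for label, c_j in contexts.items():
    print(label, np.round(flmp_p_r(t, c_j), 3))
```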

Massaro’s model implies ‘logistic additivity’: logit(p_ij) = log(p_ij / (1 - p_ij)) = log(t_i / (1 - t_i)) + log(c_j / (1 - c_j)). The p_ij on this graph corresponds to the p(r | S_ij) on the preceding slide. [Graph: logit(p_ij) as a function of the stimulus, ranging from L-like to R-like; the different lines refer to different context conditions: r means ‘favors r’, l means ‘favors l’, n means ‘neutral’.]
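
Why the FLMP rule on the preceding slide implies this (a derivation added here, not on the original slide): substituting p_ij = t_i c_j / (t_i c_j + (1 - t_i)(1 - c_j)) gives 1 - p_ij = (1 - t_i)(1 - c_j) / (t_i c_j + (1 - t_i)(1 - c_j)), so p_ij / (1 - p_ij) = t_i c_j / ((1 - t_i)(1 - c_j)), and taking logs yields logit(p_ij) = logit(t_i) + logit(c_j).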

Massaro’s argument against the IA model In the IA model, feature information gets used twice, once on the way up and then again on the way back down. Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme. It appears from simulations that in a sense he is right (see next slide). Does this mean that processing is really not interactive? If not, why not?

Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).

What was wrong with the Interactive Activation model? The original interactive activation model ‘tacked the variability on at the end’, but neural activity is intrinsically stochastic. McClelland (1991) built that intrinsic variability into the computation of the net input, adding zero-mean Gaussian noise to each unit’s net input on every time step: net_i(t) = Σ_j w_ij a_j(t) + ε_i(t), where ε_i(t) is the intrinsic variability, drawn from N(0, σ²). Now we choose the alternative with the highest activation after settling, and logistic additivity is observed. The result holds up in full-scale models and can be proven to hold under certain constraints on network architecture (Movellan & McClelland, 2001).
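
A sketch of the modified computation, assuming for illustration that the intrinsic variability is zero-mean Gaussian noise added to the net input on every cycle; the logistic update, noise level, settling time, and two-alternative network below are simplifications, not the exact McClelland (1991) implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def settle_and_choose(W, bias, ext, n_cycles=50, sigma=0.3):
    """Settle with intrinsic variability in the net input, then pick the most active alternative.
    sigma (noise s.d.), the logistic update, and n_cycles are illustrative simplifications."""
    a = np.zeros(len(bias))
    for _ in range(n_cycles):
        # net input = weighted internal input + bias + external input + Gaussian noise
        net = W @ a + bias + ext + rng.normal(0.0, sigma, size=len(bias))
        a = 1.0 / (1.0 + np.exp(-net))
    return int(np.argmax(a))               # choose the alternative with the highest activation

# Hypothetical two-alternative example: mutual inhibition, external input slightly favoring unit 0.
W = np.array([[ 0.0, -1.0],
              [-1.0,  0.0]])
choices = [settle_and_choose(W, np.zeros(2), np.array([0.2, 0.0])) for _ in range(1000)]
print(sum(c == 0 for c in choices) / 1000) # proportion of runs on which alternative 0 is chosen
```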

Why logistic additivity holds in a Boltzmann machine version of the IA model Suppose the task is to identify the letter in position 2 in the IA model. The model is allowed to run to equilibrium; then states of the position 2 units are sampled until a state is found in which one and only one of the letters in this position is active. By the Probability-Goodness equation, the probability that the active letter is l is given by: p(l) = e^{G_l/T} / Σ_{l'} e^{G_{l'}/T}, where G_l is the goodness of the sampled state when letter l is the one active letter in position 2. Define G_l as the sum of three contributions: the bias on the letter unit, b_l; the goodness arising from its connections to the feature (stimulus) units, S_l; and the goodness arising from its connections to the word (context) units, C_l. This decomposes into: p(l) = e^{(b_l + S_l + C_l)/T} / Σ_{l'} e^{(b_{l'} + S_{l'} + C_{l'})/T}.

Why logistic additivity holds in the IA Model This reduces to: p(l) = e^{b_l/T} e^{S_l/T} e^{C_l/T} / Σ_{l'} e^{b_{l'}/T} e^{S_{l'}/T} e^{C_{l'}/T}. This consists of a factor for the bias, a factor for the stimulus, and a factor for the context.

Conditions on Logistic Additivity In Stochastic Interactive Models (Movellan & McClelland, 2001) Logistic additivity holds in a stochastic, bi-directionally connected neural network when two sources of input do not interact with each other except via the set of units that are the basis for specifying the response. This applies to the two sources of input to the identification of the letter in any position in the interactive activation model. Simulations suggest that the exact details of the activation function and source of variability are unimportant. Would the effects of two context letters on a third letter exhibit logistic additivity?

Effects of two different letters on a third letter can violate logistic additivity

A final D favors E when the first letter is R, but favors U when the first letter is M. Consider the case in which the external input supports E and U equally in the middle letter position. Then: when the first letter is R, p(E|D) ≈ .6 and p(E|N) ≈ .4; when the first letter is M, p(E|D) ≈ .4 and p(E|N) ≈ .6.
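
Working through these numbers in logit terms (arithmetic added here, using the approximate values above): when the first letter is R, logit(p(E|D)) - logit(p(E|N)) ≈ log(.6/.4) - log(.4/.6) ≈ +0.81, but when the first letter is M the same difference is ≈ -0.81. Under logistic additivity the final D would have to shift the logit for E by the same amount regardless of the first letter, so the effects of the two context letters are not logistic-additive.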

Logistic additivity as a tool for analyzing network architectures Sternberg used additivity in reaction times to analyze cognitive architectures under the discrete stage model. We can use logistic additivity in response probability to analyze the structure of network architectures within an interactive activation framework (a sketch of such a test follows): if two factors affect structurally separable pathways influencing the same response representations, they should have logistic-additive effects; if two factors have effects that are not logistically additive, this indicates that the processing pathways intersect somewhere other than at the response representation.
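
A sketch of how such a test might be run on response-probability data: for a 2 x 2 factorial manipulation of two factors, the interaction term computed in logit space should be near zero if the two factors have logistic-additive effects. The probabilities below are made up for illustration.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

# Hypothetical response probabilities: rows = levels of factor A, columns = levels of factor B.
p = np.array([[0.30, 0.55],
              [0.60, 0.81]])

L = logit(p)
interaction = (L[1, 1] - L[1, 0]) - (L[0, 1] - L[0, 0])
print(round(interaction, 3))  # near zero: consistent with logistic additivity (separable pathways);
                              # far from zero: the pathways intersect before the response units
```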

Gilchrist, A. L. (1977). Perceived lightness depends on perceived spatial arrangement. Science, 195. The experiment shows that subjects assign a surface a ‘color’ based on which other surfaces they see it as co-planar with. Thus perceived color depends on perceived depth, violating modularity.