Network States as Perceptual Inferences

Psychology 209, January 16, 2018

Overview
- Network state as a perceptual inference
- Goodness of a network state
- How networks maximize goodness
- The Hopfield network and Rumelhart’s continuous version
- The Boltzmann Machine, and the relationship between goodness and probability
- Sampling from the probability distribution over states
- The IA model: evidence and issues
- Problem and solution: the MIA model
- Mutual constraint satisfaction in the brain

Network Goodness and How to Increase it

The Hopfield Network
- Assume symmetric weights.
- Units have binary states [+1, -1].
- Units are set into initial states.
- Choose a unit to update at random: if its net input net_i = Σ_j w_ij s_j is positive, set its state to +1; otherwise set it to -1 (a sketch follows below).
- Goodness always increases… until it stops changing.
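
A minimal sketch of the update loop and the goodness measure, using a hypothetical 3-unit network (these weights are illustrative, not the cube network's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric weights and biases (hypothetical values).
W = np.array([[0., 1., -1.],
              [1., 0., 1.],
              [-1., 1., 0.]])
b = np.zeros(3)

def goodness(s):
    # G(s) = 1/2 * s^T W s + b^T s; the 1/2 counts each symmetric pair once.
    return 0.5 * s @ W @ s + b @ s

s = rng.choice([-1.0, 1.0], size=3)   # random initial states in {+1, -1}
for _ in range(20):
    i = rng.integers(len(s))          # choose a unit at random
    net = W[i] @ s + b[i]             # net input to unit i
    s[i] = 1.0 if net > 0 else -1.0   # threshold update
    print(goodness(s))                # never decreases under this update
```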

Rumelhart’s Continuous Version
- Unit states have values between 0 and 1.
- Units are updated asynchronously.
- Update is gradual, according to the rule sketched below.
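
The update rule itself did not survive in this transcript. A minimal sketch, assuming the standard gradual rule from the PDP constraint-satisfaction simulations: activation is driven toward its maximum when the net input is positive, and toward its minimum when it is negative.

```python
def gradual_update(a, net, a_max=1.0, a_min=0.0, rate=0.1):
    # Assumed rule: da = net * (a_max - a) if net > 0, else net * (a - a_min),
    # applied with a small rate so activation changes gradually.
    if net > 0:
        da = net * (a_max - a)
    else:
        da = net * (a - a_min)
    return a + rate * da
```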

The Cube Network
- Positive weights have value +1; negative weights have value -1.
- ‘External input’ is implemented as a positive bias of 0.5 to all units.

Goodness Landscape of Cube Network

The Boltzmann Machine: The Stochastic Hopfield Network
- Units have binary states [0, 1]; update is asynchronous.
- The activation function is: p(a_i = 1) = 1 / (1 + e^(-net_i / T))
- Assuming processing is ergodic (that is, it is possible to get from any state to any other state), then when the state of the network reaches equilibrium, the relative probability and relative goodness of two states A and B are related as follows:
  P(A) / P(B) = e^((G_A - G_B) / T), or log(P(A) / P(B)) = (G_A - G_B) / T
- More generally, at equilibrium we have the Probability-Goodness Equation (a sampling sketch follows below):
  P(S) = e^(G_S / T) / Σ_S' e^(G_S' / T)
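
A minimal sketch of the stochastic update, reusing the illustrative W and b from the Hopfield sketch above (states here are 0/1 rather than ±1):

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_step(s, W, b, T):
    # One asynchronous update: pick a unit, compute its net input, and turn it
    # on with probability given by the logistic function at temperature T.
    i = rng.integers(len(s))
    net = W[i] @ s + b[i]
    p_on = 1.0 / (1.0 + np.exp(-net / T))
    s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```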

Simulated Annealing
- Start with a high temperature: this makes it easy to jump from state to state.
- Gradually reduce the temperature (a sketch follows below).
- In the limit of infinitely slow annealing, we can guarantee that the network will be in the best possible state (or in one of them, if two or more are equally good).
- Thus, the best possible interpretation can always be found (if you are patient)!
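
A sketch of an annealing run built on boltzmann_step from the previous sketch; the geometric cooling schedule is an assumption, not necessarily the schedule used in the class simulations:

```python
def anneal(s, W, b, T_start=10.0, T_end=0.5, steps=5000):
    # Cool geometrically from T_start to T_end, updating one unit per step.
    for t in range(steps):
        T = T_start * (T_end / T_start) ** (t / (steps - 1))
        s = boltzmann_step(s, W, b, T)
    return s
```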

Exploring Probability Distributions over States
- Imagine settling at a fixed non-zero temperature, such as T = 1.
- At this temperature, there is still some probability of being in, or switching to, a state that is less good than one of the optimal states.
- Consider an ensemble of networks. At equilibrium (i.e., after enough cycles, possibly with annealing), the relative frequencies of being in the different states will approximate the relative probabilities given by the Probability-Goodness Equation (see the sketch below).
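
Rather than an ensemble, a single long run at T = 1 can illustrate the same point (assuming ergodicity, time averages match ensemble averages). This sketch reuses boltzmann_step from above; the tallied state frequencies should approach the Probability-Goodness distribution:

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)

def goodness(s, W, b):
    return 0.5 * np.dot(s, W @ s) + np.dot(b, s)

def state_frequencies(W, b, T=1.0, steps=50_000):
    # Tally visited states; at equilibrium, freq(S) should be approximately
    # proportional to exp(G_S / T).
    s = rng.choice([0.0, 1.0], size=len(b))
    counts = Counter()
    for _ in range(steps):
        s = boltzmann_step(s, W, b, T)
        counts[tuple(s)] += 1
    return {state: n / steps for state, n in counts.items()}
```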

Findings Motivating the IA Model
- The word superiority effect (Reicher, 1969): subjects identify letters in words better than single letters or letters in scrambled strings.
- The pseudoword advantage: the advantage over single letters and scrambled strings extends to pronounceable non-words (e.g., LEAT, LOAT, …).
- The contextual enhancement effect: increasing the duration of the context or of the target letter facilitates correct identification.
Reicher’s experiment:
- Used pairs of 4-letter words differing by one letter (READ, ROAD). The ‘critical letter’ is the letter that differs; critical letters occur in all four positions.
- The same critical letters occur alone or in scrambled strings: _E__, _O__, EADR, EODR.
[Figure: percent correct for words (W), pseudowords (PW), scrambled strings (Scr), and single letters (L).]

[Display example: the word READ, with a forced choice between E and O in the critical position (_E__ vs. _O__).]

Questions
- Can we explain the word superiority effect and the contextual enhancement effect as a consequence of a synergistic combination of ‘top-down’ and ‘bottom-up’ influences?
- Can the same processes also explain the pseudoword advantage?
- What specific assumptions are necessary to capture the data?
- What can we learn about these assumptions from the study of model variants and the effects of parameter changes?
- Can we derive novel predictions?
- What do we learn about the limitations as well as the strengths of the model?

Initial Approach
- Draw on ideas from the way neurons work.
- Keep it simple.

The Interactive Activation Model
- Feature, letter, and word units. Activation is the system’s only ‘currency’.
- Mutually consistent items on adjacent levels excite each other.
- Mutually exclusive alternatives inhibit each other.
- The response is selected from the letter units in the cued location according to the Luce choice rule (a sketch follows below):
  p(R_i) = S_i / Σ_j S_j, where S_i = e^(k a_i)
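
A minimal sketch of the Luce rule; the scaling constant k here is an illustrative value, not the model's fitted parameter:

```python
import numpy as np

def luce_choice_probs(a, k=10.0):
    # Response strengths S_i = exp(k * a_i); p(R_i) = S_i / sum_j S_j.
    strengths = np.exp(k * np.asarray(a))
    return strengths / strengths.sum()
```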

IAC Activation Function
- Calculate the net input to each unit: net_i = Σ_j o_j w_ij
- Set outputs: o_j = [a_j]+ (only units with positive activation send output; a sketch follows below)
[Figure: unit i receives output from unit j via weight w_ij; activation is bounded by max and min and decays toward rest.]
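
A sketch of the two steps, assuming an activation vector a and a weight matrix W with w_ij in row i, column j:

```python
import numpy as np

def iac_net_inputs(W, a):
    # Outputs: o_j = [a_j]+, i.e., only positively activated units send output.
    o = np.maximum(a, 0.0)
    # Net inputs: net_i = sum_j o_j * w_ij.
    return W @ o
```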

Interactive Activation

How the Model Works: Words vs. Single Letters

Word and Letter Level Activations for Words and Pseudowords
Key idea: a ‘conspiracy effect’, rather than consistency with rules, as the basis of performance on ‘regular’ items.

The Problem with the 1981 Model
- The model did not show the empirically observed pattern of ‘logistic additivity’ when context and stimulus information were separately manipulated.
- Massaro & Cohen (1991) presented different /l/- to /r/-like segments in four contexts: “p_ee”, “t_ee”, “s_ee”, “v_ee”.
[Figure: idealization of the empirical pattern vs. simulation.]

The Multinomial IA Model
- Very similar to Rumelhart’s 1977 formulation.
- Based on a simple generative model of displays in letter perception experiments:
  - The experimenter selects a word,
  - selects letters based on the word, with possible random errors,
  - selects features based on the letters, again with possible random error, AND/OR
  - the visual system registers features with some possibility of error.
  - Some features may be missing, as in the WOR? example above.
- Units without parents have biases equal to the log of the prior.
- Weights are defined ‘top down’: they correspond to log p(C|P), where C = child and P = parent.
- Units within a layer take on probabilistic activations based on the softmax function:
  r_i = e^(net_i) / Σ_i' e^(net_i')
  Only one unit is allowed to be active within each set of mutually exclusive hypotheses, with probability r_i (a sketch follows below).
- A state corresponds to one active word unit and one active letter unit in each position, together with the provided set of feature activations.
- Response selection:
  - Select the most active letter in the target position.
  - If it is one of the alternatives, select that alternative.
  - If not, choose randomly between the two alternatives.
- If the priors and weights correspond to those underlying the generative model, then states are ‘sampled’ in proportion to their posterior probability:
  - The state of the entire system = a sample from the joint posterior.
  - The state of the word or letter units in a given position = a sample from the marginal posterior.
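
A minimal sketch of sampling within one pool of mutually exclusive hypotheses (e.g., the word units, or the letter units in one position):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pool(net):
    # Softmax: r_i = exp(net_i) / sum_i' exp(net_i'); exactly one unit in the
    # pool becomes active, chosen with probability r_i.
    net = np.asarray(net, dtype=float)
    r = np.exp(net - net.max())   # subtract max for numerical stability
    r /= r.sum()
    active = rng.choice(len(r), p=r)
    one_hot = np.zeros(len(r))
    one_hot[active] = 1.0
    return one_hot
```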

Simulated and Calculated Probabilities (see citation on next slide)

Simulation of Word and Pseudoword Enhancement Effects
McClelland, J. L., Mirman, D., Bolger, D. J., & Khaitan, P. (2014). Interactive activation and mutual constraint satisfaction in perception and cognition. Cognitive Science, 38(6), 1139-1189. DOI: 10.1111/cogs.12146

Input and activation of units in PDP models
- General form of unit update: [equation not preserved in this transcript]
- Simple version used in the cube simulation: the gradual Δa rule sketched earlier.
- An activation function that links PDP models to Bayesian ideas: a_i = 1 / (1 + e^(-net_i))
- Or set the activation to 1 probabilistically: p(a_i = 1) = 1 / (1 + e^(-net_i)) (see the sketch below).
[Figure: a_i or p_i as a function of net_i for unit i with input from unit j via weight w_ij; max = 1, min = -.2, rest.]
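
Why the logistic function links PDP models to Bayesian ideas: if the bias is the log prior odds and each weighted input is a log likelihood ratio, the logistic of the net input equals the Bayesian posterior. A minimal check with hypothetical numbers:

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

# Hypothetical example: prior p(h) = 0.5, two conditionally independent
# pieces of evidence with likelihood ratios 3 and 2.
log_prior_odds = np.log(0.5 / 0.5)
net = log_prior_odds + np.log(3.0) + np.log(2.0)

# Direct Bayes: posterior odds = prior odds * product of likelihood ratios.
posterior = (0.5 * 3 * 2) / (0.5 * 3 * 2 + 0.5 * 1 * 1)
assert np.isclose(logistic(net), posterior)   # both give 6/7, about 0.857
```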

Interactivity in the Brain
- Bidirectional connectivity: Maunsell & Van Essen
- Interactions between V5 (MT) and V1/V2: Hupé et al.
- Subjective contours in V1: Lee and Nguyen
- Binocular rivalry: Leopold and Logothetis

Hupé, James, Payne, Lomber, Girard & Bullier (Nature, 1998, 394, 784-787)
- Investigated the effects of cooling V5 (MT) on neuronal responses in V1, V2, and V3 to a bar on a background grid of lower contrast.
- MT cooling typically produces a reversible reduction in firing rate to V1/V2/V3 cells’ optimal stimulus (see figure).
- The top-down effect is greatest for stimuli of low contrast: if the stimulus is easy to see when it is not moving, top-down influences from MT have little effect.
- The concept of ‘inverse effectiveness’ arises here and in many other related cases.

Lee & Nguyen (PNAS, 2001, 98, 1907-1911) They asked the question: Do V1 neurons participate in the formation of a representation of the illusory contour seen in the upper panel (but not in the lower panel)? They recorded from neurons in V1 tuned to the illusory line segment, and varied the position of the illusory segment with respect to the most responsive position of the neuron.

Response to the illusory contour is found at precisely the expected location.

Temporal Response to Real and Illusory Contours
The neuron’s receptive field falls right over the middle of the real or illusory line defining the bottom edge of the square.

The figure shows a V1/V2 neuron whose firing was strongly modulated around epochs in which the monkey perceived the cell’s preferred stimulus (from Leopold and Logothetis, 1996). Top: PSTHs show a strong orientation preference. Bottom: when both stimuli are presented simultaneously, the neuron is silent just before a response indicating perception of the null direction, but quite active just before a response (t < 0) indicating perception of the preferred direction.

Leopold and Logothetis (Nature, 1996, 379, 549-553) found that some neurons in V1/V2, as well as V4, modulate their responses in concert with the monkey’s percept, as if participating in a massively distributed constraint-satisfaction process. However, some neurons in all areas do not modulate their responses; thus the conscious percept appears to be correlated with the activity of only a subset of neurons. The fraction of neurons that covary with perception is greater in higher areas.