Graded Constraint Satisfaction, the IA Model, and Network States as Perceptual Inferences
Psychology 209, January 15, 2019

Overview
The IA model
- Evidence and issues
- The modeling process
- A problem for the IA model
Theory
- Network state as a perceptual inference
- Goodness of a network state
- How networks maximize goodness
- The Hopfield network and Rumelhart's continuous version
- The Boltzmann Machine, and the relation between goodness and probability
- Sampling from the probability distribution over states
- The multinomial IA model
- [Mutual constraint satisfaction in the brain]

Findings Motivating the IA Model
The word superiority effect (Reicher, 1969): subjects identify letters in words better than single letters or letters in scrambled strings.
The pseudoword advantage: the advantage over single letters and scrambled strings extends to pronounceable non-words (e.g., LEAT, LOAT, ...).
The contextual enhancement effect: increasing the duration of the context or of the target letter facilitates correct identification.
Reicher's experiment:
- Used pairs of 4-letter words differing by one letter (READ, ROAD); the 'critical letter' is the letter that differs.
- Critical letters occur in all four positions.
- The same critical letters occur alone or in scrambled strings (_E__  _O__  EADR  EODR).
[Figure: percent correct for words (W), pseudowords (PW), scrambled strings (Scr), and single letters (L)]

[Display example from Reicher's procedure: the word READ, followed by a forced choice between E and O in the critical position (_E__ vs. _O__)]

Questions
- Can we explain the Word Superiority Effect and the Contextual Enhancement Effect as a consequence of a synergistic combination of 'top-down' and 'bottom-up' influences?
- Can the same processes also explain the pseudoword advantage?
- What specific assumptions are necessary to capture the data?
- Can we derive novel predictions?
- Addressing failures as well as successes

Initial Approach
- Draw on ideas from the way neurons work.
- Keep the model as simple as possible while still allowing us to address the data.

The Interactive Activation Model
- Feature, letter, and word units. Activation is the system's only 'currency'.
- Mutually consistent items on adjacent levels excite each other.
- Mutually exclusive alternatives inhibit each other.
- The response is selected from the letter units in the cued location according to the Luce choice rule: p(r_i) = exp(μ·ā_i) / Σ_j exp(μ·ā_j), where ā_i is the time-averaged activation of letter unit i in the cued position.
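A minimal sketch of this readout rule in Python; the parameter name mu and the example activations are illustrative assumptions, not values from the published model:

```python
import numpy as np

def luce_choice_probabilities(a_bar, mu=10.0):
    """Luce choice rule: p(r_i) = exp(mu * a_bar_i) / sum_j exp(mu * a_bar_j).

    a_bar : time-averaged activations of the letter units in the cued position
            (one entry per candidate letter).
    mu    : response-strength scaling parameter (illustrative value).
    """
    strengths = np.exp(mu * np.asarray(a_bar, dtype=float))
    return strengths / strengths.sum()

# Example: the unit for the presented letter is more active than its competitors.
probs = luce_choice_probabilities([0.6, 0.2, 0.1])
print(probs)  # highest probability for the first (most active) letter
```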

IAC Activation Function
[Diagram: unit i receives output o_j from unit j via weight w_ij; its activation a is bounded between min and max and decays toward rest.]
Calculate the net input to each unit: net_i = Σ_j o_j w_ij
Set the outputs: o_j = [a_j]+ (the positive part of a_j)
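A rough sketch of one IAC-style update cycle built from these equations; the decay term and the particular parameter values are illustrative assumptions rather than the published 1981 settings:

```python
import numpy as np

def iac_step(a, W, ext, max_a=1.0, min_a=-0.2, rest=-0.1, decay=0.1):
    """One synchronous IAC-style update step (illustrative parameters).

    net_i = sum_j o_j w_ij + external input, with o_j = [a_j]+ (positive part).
    Activation is driven toward max for positive net input, toward min for
    negative net input, and decays toward the resting level.
    """
    o = np.clip(a, 0.0, None)            # o_j = [a_j]+
    net = W @ o + ext                    # net_i = sum_j o_j w_ij + ext_i
    da = np.where(net > 0,
                  net * (max_a - a),     # drive toward max
                  net * (a - min_a))     # drive toward min
    da -= decay * (a - rest)             # decay toward rest
    return np.clip(a + da, min_a, max_a)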

How the Model Works: Words vs. Single Letters
Assumption: under the conditions of Reicher's experiment, all features are activated correctly bottom-up; the role of the word level is to enhance and sustain the activation of letter units, increasing the probability of correct readout.

What About Pronounceable Nonwords?
- There is no unit for 'MAVE' or other non-words. Why do people see letters in these items better than single letters?
- Intuition: partial activation of similar words.
- Exploration: requires letter-to-word inhibition to be small, so that partially matching words can be activated.

Word and Letter Level Activations for Words and Pseudowords
The idea of a 'conspiracy effect', rather than consistency with rules, as the basis of performance on 'regular' items.

A Novel Prediction
It doesn't matter whether a letter string is pronounceable or not, as long as it partially activates known words.
Test:
- Pronounceable pairs: SPET – SNET
- Unpronounceable pairs: SPLT – SNLT
- Pairs with no word neighbors: ZPJQ – ZNJQ
In both simulation and experiment, performance on the first two types was about the same; performance on the last type was much worse.

Questions
- Can we explain the Word Superiority Effect and the Contextual Enhancement Effect as a consequence of a synergistic combination of 'top-down' and 'bottom-up' influences?
- Can the same processes also explain the pseudoword advantage?
- What specific assumptions are necessary to capture the data?
- Can we derive novel predictions?
- Understanding failures as well as successes

The Problem with the 1981 Model
The model did not show the empirically observed pattern of 'logistic additivity' when context and stimulus information were separately manipulated, as they often are in experiments.
For example, Massaro & Cohen (1991) presented speech segments ranging from /l/-like to /r/-like in four contexts: "p_ee", "t_ee", "s_ee", "v_ee".
[Figure: idealized logistic-additivity pattern vs. simulation results, both plotted as log(p(r)/(1−p(r)))]
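For concreteness, the logistic-additivity pattern says that the log odds of an /r/ response decompose into a stimulus term plus a context term; a minimal statement of the pattern (the symbols below are mine, not Massaro & Cohen's notation):

```latex
% Logistic additivity: stimulus and context contribute additively in log-odds.
\log\frac{p(r \mid s, c)}{1 - p(r \mid s, c)} = \alpha_{s} + \beta_{c}
% \alpha_s : effect of the acoustic segment s (ranging from /l/-like to /r/-like)
% \beta_c  : effect of the context c (e.g., "p_ee", "t_ee", "s_ee", "v_ee")
```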

What's Wrong? Two possibilities:
- Perception is not interactive.
- Some of the specific assumptions may be incorrect.
I explored this initially via simulations. Short answer: as long as there is intrinsic variability, one tends to observe logistic additivity (McClelland, 1991, Cognitive Psychology).
Can we fully understand why this is true? We need an analytic foundation (as in McClelland, 2014, Frontiers in Cognitive Science).

Network Goodness and How to Increase it

The Hopfield Network
- Assume symmetric weights. Units have binary states [+1, −1].
- Units are set into initial states.
- Choose a unit to update at random and compute its net input. If net > 0, set its state to +1; otherwise set it to −1.
- Goodness (G = Σ_{i<j} w_ij a_i a_j + Σ_i input_i a_i) always increases... until it stops changing.
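A compact sketch of asynchronous Hopfield updating together with the goodness it never decreases; the weight matrix, external inputs, and update count are placeholders:

```python
import numpy as np

def goodness(a, W, ext):
    """G = sum_{i<j} w_ij a_i a_j + sum_i ext_i a_i (symmetric W, zero diagonal)."""
    return 0.5 * a @ W @ a + ext @ a

def hopfield_settle(a, W, ext, n_updates=100, rng=np.random.default_rng(0)):
    """Asynchronously update binary (+1/-1) units; goodness is non-decreasing."""
    a = a.copy()
    for _ in range(n_updates):
        i = rng.integers(len(a))          # pick a unit at random
        net = W[i] @ a + ext[i]           # net input to unit i
        a[i] = 1 if net > 0 else -1       # deterministic threshold update
    return a
```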

Rumelhart's Continuous Version
- Unit states take values between 0 and 1.
- Units are updated asynchronously.
- The update is gradual: Δa_i = net_i (1 − a_i) if net_i > 0, and Δa_i = net_i a_i otherwise, so activation moves part of the way toward 1 or toward 0 in proportion to the net input.
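A sketch of the gradual, asynchronous update, assuming the rule as reconstructed above plus an illustrative step-size parameter epsilon:

```python
import numpy as np

def continuous_update(a, W, ext, epsilon=0.1, rng=np.random.default_rng(0)):
    """Asynchronously nudge one unit (values in [0, 1]) toward 1 or toward 0.

    delta_a = epsilon * net * (1 - a)  if net > 0
            = epsilon * net * a        otherwise
    """
    i = rng.integers(len(a))
    net = W[i] @ a + ext[i]
    if net > 0:
        a[i] += epsilon * net * (1.0 - a[i])
    else:
        a[i] += epsilon * net * a[i]
    return a
```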

The Cube Network
- Positive weights have value +1.
- Negative weights have value −1.
- 'External input' is implemented as a positive bias of .5 to all units.

Goodness Landscape of Cube Network

The Boltzmann Machine: The Stochastic Hopfield Network
- Units have binary states [0, 1]; updates are asynchronous.
- The activation function is stochastic: p(a_i = 1) = 1 / (1 + e^(−net_i / T)), where T is the temperature.
- Assuming processing is ergodic (it is possible to get from any state to any other state), then when the network reaches equilibrium, the relative probability and relative goodness of two states A and B are related as follows: p(A) / p(B) = e^((G_A − G_B) / T), or log(p(A) / p(B)) = (G_A − G_B) / T.
- More generally, at equilibrium we have the Probability-Goodness Equation: p(S) = e^(G_S / T) / Σ_S' e^(G_S' / T).
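A minimal sketch of the stochastic update at temperature T; the network contents are placeholders, and the comment restates the equilibrium relation rather than deriving it:

```python
import numpy as np

def boltzmann_update(a, W, ext, T=1.0, rng=np.random.default_rng(0)):
    """Asynchronous stochastic update of one binary (0/1) unit.

    p(a_i = 1) = 1 / (1 + exp(-net_i / T)); at equilibrium, state probabilities
    follow the Probability-Goodness equation, p(S) proportional to exp(G_S / T).
    """
    i = rng.integers(len(a))
    net = W[i] @ a + ext[i]
    p_on = 1.0 / (1.0 + np.exp(-net / T))
    a[i] = 1 if rng.random() < p_on else 0
    return a
```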

Simulated Annealing
- Start with a high temperature: it is easy to jump from state to state.
- Gradually reduce the temperature.
- In the limit of infinitely slow annealing, we can guarantee that the network will end up in the best possible state (or in one of them, if two or more are equally good).
- Thus, the best possible interpretation can always be found (if you are patient)!
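A sketch of a geometric cooling schedule wrapped around the stochastic update from the previous sketch; the starting temperature, cooling rate, and sweep counts are arbitrary illustrative choices:

```python
# Assumes the boltzmann_update function from the previous sketch is in scope.
def anneal(a, W, ext, T_start=10.0, T_min=0.1, cooling=0.95, sweeps_per_T=20):
    """Geometric cooling: at high T the network jumps freely between states;
    as T falls it settles into (one of) the highest-goodness states."""
    T = T_start
    while T > T_min:
        for _ in range(sweeps_per_T * len(a)):
            a = boltzmann_update(a, W, ext, T=T)
        T *= cooling
    return a
```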

Exploring Probability Distributions over States
- Imagine settling at a fixed non-zero temperature, such as T = 1. At this temperature, there is still some probability of being in, or switching to, a state that is less good than one of the optimal states.
- Consider an ensemble of networks. At equilibrium (i.e., after enough cycles, possibly with annealing), the relative frequencies of the different states will match the relative probabilities given by the Probability-Goodness equation.
- The state of any one of these networks is a sample from this distribution.
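A toy illustration of this sampling idea: run the stochastic update for many steps at T = 1 on a two-unit network and compare the observed state frequencies with the Probability-Goodness prediction. The weights, biases, and run length are arbitrary assumptions chosen only for illustration:

```python
import itertools
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)
W = np.array([[0.0, 2.0],          # two mutually supporting units
              [2.0, 0.0]])
ext = np.array([0.5, -0.5])
T = 1.0

def goodness(a):
    return 0.5 * a @ W @ a + ext @ a

a = np.array([0, 0])
counts = Counter()
for step in range(200_000):
    i = rng.integers(2)
    net = W[i] @ a + ext[i]
    a[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0
    if step > 10_000:                # discard burn-in samples
        counts[tuple(a)] += 1

states = [np.array(s) for s in itertools.product([0, 1], repeat=2)]
Z = sum(np.exp(goodness(s) / T) for s in states)
for s in states:
    predicted = np.exp(goodness(s) / T) / Z
    observed = counts[tuple(s)] / sum(counts.values())
    print(tuple(s), f"predicted={predicted:.3f}", f"observed={observed:.3f}")
```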

The Multinomial IA Model
Very similar to Rumelhart's 1977 formulation. Based on a simple generative model of displays in letter perception experiments:
- The experimenter selects a word.
- Letters are selected based on the word, with possible random errors.
- Features are selected based on the letters, again with possible random error, and/or the visual system registers features with some possibility of error.
- Some features may be missing, as in the WOR? example above.
Model specification:
- Units without parents have biases equal to the log of the prior.
- Weights are defined 'top down': they correspond to log p(C|P), where C = child and P = parent.
- Units within a layer take on probabilistic activations based on the softmax function: r_i = e^(net_i) / Σ_j e^(net_j). Only one unit is allowed to be active within each set of mutually exclusive hypotheses; unit i is the active one with probability r_i.
- A state corresponds to one active word unit and one active letter unit in each position, together with the provided set of feature activations.
Response selection:
- Select the most active letter in the target position.
- If it is one of the alternatives, select that alternative.
- If not, choose randomly between the two alternatives.
If the priors and weights correspond to those underlying the generative model, then states are 'sampled' in proportion to their posterior probability:
- The state of the entire system is a sample from the joint posterior.
- The state of the word units, or of the letter units in a given position, is a sample from the corresponding marginal posterior.
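A sketch of the within-pool sampling step: each pool of mutually exclusive hypotheses (the word pool, or the letter pool for one position) chooses exactly one active unit with softmax probabilities over its net inputs. The function name and the example net inputs are illustrative assumptions:

```python
import numpy as np

def sample_pool(net, rng=np.random.default_rng(0)):
    """Pick exactly one active unit in a pool of mutually exclusive hypotheses.

    r_i = exp(net_i) / sum_j exp(net_j); the returned index is a sample from
    that distribution, so repeated settling yields samples from the posterior
    when the weights and biases match the generative model.
    """
    net = np.asarray(net, dtype=float)
    r = np.exp(net - net.max())        # subtract the max for numerical stability
    r /= r.sum()
    return rng.choice(len(r), p=r)

# Example: three candidate letters in one position, the first clearly favored.
active_letter = sample_pool([4.0, 1.0, 0.5])
```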

Simulated and Calculated Probabilities (see the citation on the next slide)

Simulation of Word and Pseudoword Enhancement Effects
McClelland, J. L., Mirman, D., Bolger, D. J., & Khaitan, P. (2014). Interactive activation and mutual constraint satisfaction in perception and cognition. Cognitive Science, 38(6), 1139-1189. DOI: 10.1111/cogs.12146.

Input and Activation of Units in PDP Models
- General form of the unit update: each unit computes a net input, net_i = Σ_j o_j w_ij (plus any external input), and changes its activation as a function of that net input.
- Simple version used in the cube simulation: the bounded, gradual update described earlier, with max = 1, min = −.2, and a resting level.
- An activation function that links PDP models to Bayesian ideas: a_i = 1 / (1 + e^(−net_i)).
- Or set the activation to 1 probabilistically: p(a_i = 1) = 1 / (1 + e^(−net_i)).
[Diagram: unit i receiving input from unit j via w_ij; output a_i or p_i plotted as a function of net_i.]
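A small sketch of the two activation choices named above: the deterministic logistic function, and the probabilistic version that turns a unit on with that probability (the names and random-number handling are my assumptions):

```python
import numpy as np

def logistic_activation(net):
    """Deterministic version: a_i = 1 / (1 + exp(-net_i))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(net, dtype=float)))

def probabilistic_activation(net, rng=np.random.default_rng(0)):
    """Stochastic version: a_i = 1 with probability 1 / (1 + exp(-net_i)), else 0."""
    p = logistic_activation(net)
    return (rng.random(p.shape) < p).astype(float)
```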

Interactivity in the Brain
- Bidirectional connectivity: Maunsell & van Essen
- Subjective contours in V1: Lee and Nguyen
- Interactions between V5 (MT) and V1/V2: Hupé et al.
- Binocular rivalry: Leopold and Logothetis
Reference: McClelland, Mirman, Bolger, & Khaitan (2014), cited above.

Lee & Nguyen (PNAS, 2001, 98, 1907-1911)
They asked: do V1 neurons participate in forming a representation of the illusory contour seen in the upper panel (but not in the lower panel)?
They recorded from neurons in V1 tuned to the illusory line segment and varied the position of the illusory segment with respect to the most responsive position of the neuron.

Response to the illusory contour is found at precisely the expected location.

Temporal Response to Real and Illusory Contours
The neuron's receptive field falls right over the middle of the real or illusory line defining the bottom edge of the square.

Hupé, James, Payne, Lomber, Girard & Bullier (Nature, 1998, 394, 784-787)
Investigated the effects of cooling V5 (MT) on neuronal responses in V1, V2, and V3 to a bar on a background grid of lower contrast.
- MT cooling typically produces a reversible reduction in the firing rate of V1/V2/V3 cells to their optimal stimulus (figure).
- The top-down effect is greatest for stimuli of low contrast; if the stimulus is easy to see when it is not moving, top-down influences from MT have little effect.
- The concept of 'inverse effectiveness' arises here and in many other related cases.

The figure shows a V1/V2 neuron with strong modulation in firing around epochs in which the monkey perceived the cell's preferred stimulus (from Leopold and Logothetis, 1996).
Top: PSTHs show a strong orientation preference.
Bottom: when both stimuli are presented simultaneously, the neuron is silent just before a response indicating perception of the null direction, but quite active just before a response (t < 0) indicating perception of the preferred direction.

Leopold and Logothetis (Nature, 1996, 379, 549-553) found that some neurons in V1/V2, as well as V4, modulate their responses in concert with the monkey's percept, as if participating in a massively distributed constraint-satisfaction process.
However, some neurons in all areas do not modulate their responses. Thus the conscious percept appears to be correlated with the activity of only a subset of neurons.
The fraction of neurons that covary with perception is greater in higher areas.