Graded Constraint Satisfaction, the IA Model, and Network States as Perceptual Inferences
Psychology 209 January 15, 2019
Overview
- The IA model
  - Evidence and issues
  - The modeling process
  - A problem for the IA model
- Theory
  - Network state as a perceptual inference
  - Goodness of a network state
  - How networks maximize goodness
  - The Hopfield network and Rumelhart's continuous version
  - The Boltzmann Machine, and the relation between goodness and probability
  - Sampling from the probability distribution over states
  - The multinomial IA model
- [Mutual constraint satisfaction in the brain]
Findings Motivating the IA Model
- The word superiority effect (Reicher, 1969): subjects identify letters in words better than single letters or letters in scrambled strings.
- The pseudoword advantage: the advantage over single letters and scrambled strings extends to pronounceable non-words (e.g., LEAT, LOAT, ...).
- The contextual enhancement effect: increasing the duration of the context or of the target letter facilitates correct identification.

Reicher's experiment:
- Used pairs of 4-letter words differing by one letter (READ, ROAD). The 'critical letter' is the letter that differs, and critical letters occur in all four positions.
- The same critical letters occur alone or in scrambled strings: _E__, _O__, EADR, EODR.

[Figure: percent correct for words (W), pseudowords (PW), scrambled strings (Scr), and single letters (L)]
[Example displays from Reicher's task: the critical-letter probe _E__ (forced choice between E and O) and the word context READ]
Questions
- Can we explain the Word Superiority Effect and the Contextual Enhancement Effect as a consequence of a synergistic combination of 'top-down' and 'bottom-up' influences?
- Can the same processes also explain the pseudoword advantage?
- What specific assumptions are necessary to capture the data?
- Can we derive novel predictions?
- Addressing failures as well as successes
Initial Approach
- Draw on ideas from the way neurons work.
- Keep the model as simple as possible while still allowing us to address the data.
The Interactive Activation Model
- Feature, letter, and word units. Activation is the system's only 'currency'.
- Mutually consistent items on adjacent levels excite each other.
- Mutually exclusive alternatives inhibit each other.
- The response is selected from the letter units in the cued location according to the Luce choice rule (see the sketch below):
  p(R_i) = e^(μ·ā_i) / Σ_j e^(μ·ā_j), where ā_i is the time-averaged activation of letter unit i and μ is a scaling constant.
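A minimal sketch of this readout step in Python, assuming numpy; the function name, the value of μ, and the example activations are all hypothetical, chosen only to illustrate the rule:

```python
import numpy as np

def luce_choice_probs(avg_acts, mu=10.0):
    """Luce choice rule: response strength for each letter unit in the
    cued position is exponential in its time-averaged activation."""
    strengths = np.exp(mu * np.asarray(avg_acts))
    return strengths / strengths.sum()

# Hypothetical time-averaged activations for the alternatives E and O
print(luce_choice_probs([0.6, 0.2]))  # the better-supported letter dominates
```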
IAC Activation Function
[Diagram: unit i receives output o_j from unit j via weight w_ij; the activation a is bounded by max and min and decays toward rest]

Calculate the net input to each unit: net_i = Σ_j o_j w_ij
Set the outputs: o_j = [a_j]+ (that is, o_j = max(a_j, 0))
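A sketch of one update step, assuming the standard IAC update rule from the 1981 model; the parameter defaults here are illustrative, not the published settings:

```python
import numpy as np

def iac_step(a, W, ext, rest=-0.1, a_min=-0.2, a_max=1.0, decay=0.1):
    """One IAC update of the whole activation vector.
    a: activations; W: weights with W[i, j] = w_ij; ext: external input."""
    o = np.maximum(a, 0.0)                # o_j = [a_j]+
    net = W @ o + ext                     # net_i = sum_j o_j * w_ij
    delta = np.where(net > 0,
                     (a_max - a) * net,   # excitation drives a toward max
                     (a - a_min) * net)   # inhibition drives a toward min
    delta -= decay * (a - rest)           # decay pulls a back toward rest
    return np.clip(a + delta, a_min, a_max)
```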
How the Model Works: Words vs. Single Letters
Assumption: Under the conditions of Reicher’s experiment, all features are activated correctly bottom up; the role of the word level is to enhance and sustain the activation of letter units, increasing probability of correct readout.
What about pronounceable nonwords?
- There is no unit for 'MAVE' or other non-words. Why do people see letters in these items better than single letters?
- Intuition: partial activation of similar words.
- Exploration: requires letter-to-word inhibition to be small, so that partially matching words can be activated.
Word and Letter Level Activations for Words and Pseudowords
The idea: a 'conspiracy effect' among partially activated words, rather than consistency with rules, is the basis of performance on 'regular' items.
A novel prediction
- It doesn't matter whether a letter string is pronounceable or not, as long as it partially activates known words.
- Test:
  - Pronounceable pairs: SPET – SNET
  - Unpronounceable pairs: SPLT – SNLT
  - Pairs with no word neighbors: ZPJQ – ZNJQ
- In both simulation and experiment, performance on the first two types was about the same; performance on the last type was much worse.
Questions
- Can we explain the Word Superiority Effect and the Contextual Enhancement Effect as a consequence of a synergistic combination of 'top-down' and 'bottom-up' influences?
- Can the same processes also explain the pseudoword advantage?
- What specific assumptions are necessary to capture the data?
- Can we derive novel predictions?
- Understanding failures as well as successes
The problem with the 1981 Model
The model did not show the empirically observed pattern of 'logistic additivity' when context and stimulus information were separately manipulated, as they often are in experiments.

For example, Massaro & Cohen (1991) presented segments ranging from /l/-like to /r/-like in four contexts: "p_ee", "t_ee", "s_ee", "v_ee". In a logistic-additive pattern, stimulus and context contribute additively in log-odds space, so the curves for the different contexts are parallel when plotted as log(p(r)/(1−p(r))).

[Figure: idealized logistic-additivity pattern vs. the model's simulation, both plotted as log(p(r)/(1−p(r)))]
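A minimal illustration of the logistic-additivity test, using made-up response probabilities; all values here are hypothetical:

```python
import numpy as np

# Hypothetical p(/r/) for 5 stimulus steps (rows) in 2 contexts (columns).
p_r = np.array([[0.05, 0.20],
                [0.20, 0.50],
                [0.50, 0.80],
                [0.80, 0.95],
                [0.95, 0.99]])

log_odds = np.log(p_r / (1 - p_r))
# Logistic additivity: context shifts the log-odds by a constant, so the
# difference between the two columns should be (roughly) constant.
print(log_odds[:, 1] - log_odds[:, 0])
```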
What’s Wrong? Two possibilities:
- Perception is not interactive.
- Some of the specific assumptions may be incorrect.

I explored this initially via simulations. Short answer: as long as there is intrinsic variability, one tends to observe logistic additivity (McClelland, 1991, Cognitive Psychology). Can we fully understand why this is true? We need an analytic foundation (as in McClelland, 2014, Frontiers in Cognitive Science).
Network Goodness and How to Increase it
The Hopfield Network
- Assume symmetric weights.
- Units have binary states [+1, −1].
- Units are set into initial states.
- Choose a unit to update at random: if net_i = Σ_j w_ij s_j > 0, set its state to +1; else set it to −1.
- Goodness, G = Σ_{i<j} w_ij s_i s_j, always increases... until it stops changing.
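A minimal sketch of the update and the goodness it climbs, assuming the slide's conventions (symmetric W with a zero diagonal, states ±1); function names are mine:

```python
import numpy as np

def goodness(s, W):
    """G(s) = sum over pairs i < j of w_ij * s_i * s_j (symmetric W, zero diagonal)."""
    return 0.5 * s @ W @ s

def hopfield_step(s, W, rng):
    """Update one randomly chosen unit; each step can only raise G or leave it unchanged."""
    i = rng.integers(len(s))
    s[i] = 1 if W[i] @ s > 0 else -1
    return s
```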
Rumelhart’s Continuous Version
- Unit states have values between 0 and 1.
- Units are updated asynchronously.
- Update is gradual, according to the rule: Δa_i = net_i(1 − a_i) if net_i > 0, else Δa_i = net_i·a_i, where net_i = Σ_j w_ij a_j + ext_i.
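A sketch of one gradual, asynchronous update under this rule; the function name and the clipping safeguard are my additions:

```python
import numpy as np

def continuous_step(a, W, ext, i):
    """Rumelhart-style gradual update of unit i; activation stays in [0, 1]."""
    net = W[i] @ a + ext[i]
    delta = net * (1.0 - a[i]) if net > 0 else net * a[i]
    a[i] = np.clip(a[i] + delta, 0.0, 1.0)  # clip guards against large net inputs
    return a
```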
The Cube Network
- Positive weights have value +1; negative weights have value −1.
- 'External input' is implemented as a positive bias of 0.5 to all units.
Goodness Landscape of Cube Network
The Boltzmann Machine: The Stochastic Hopfield Network
- Units have binary states [0, 1]; update is asynchronous.
- The activation function is: p(s_i = 1) = 1 / (1 + e^(−net_i/T)).
- Assume processing is ergodic: it is possible to get from any state to any other state. Then when the network reaches equilibrium, the relative probability and relative goodness of two states are related as follows:
  p(S_a) / p(S_b) = e^((G(S_a) − G(S_b))/T), or log(p(S_a) / p(S_b)) = (G(S_a) − G(S_b)) / T.
- More generally, at equilibrium we have the Probability-Goodness Equation:
  p(S) = e^(G(S)/T) / Σ_S' e^(G(S')/T).
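A minimal sketch of the stochastic update; the logistic form follows the activation function above, while the unit-picking and rng details are my assumptions:

```python
import numpy as np

def boltzmann_step(s, W, T, rng):
    """Asynchronous stochastic update: the chosen unit turns on with
    probability 1 / (1 + exp(-net/T))."""
    i = rng.integers(len(s))
    p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s) / T))
    s[i] = 1 if rng.random() < p_on else 0
    return s
```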
Simulated Annealing
- Start with a high temperature: it is easy to jump from state to state.
- Gradually reduce the temperature.
- In the limit of infinitely slow annealing, we can guarantee that the network will end up in the best possible state (or in one of them, if two or more are equally good).
- Thus, the best possible interpretation can always be found (if you are patient)!
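A sketch of annealing built on the boltzmann_step above; the geometric cooling schedule and the step counts are one common choice, not the only one:

```python
def anneal(s, W, rng, T_start=10.0, T_end=0.05, steps=5000):
    """Cool gradually from T_start to T_end, updating one unit per step."""
    for t in range(steps):
        T = T_start * (T_end / T_start) ** (t / (steps - 1))
        s = boltzmann_step(s, W, T, rng)
    return s
```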
Exploring Probability Distributions over States
- Imagine settling at a fixed non-zero temperature, such as T = 1. At this temperature, there is still some probability of being in, or switching to, a state that is less good than one of the optimal states.
- Consider an ensemble of networks. At equilibrium (i.e., after enough cycles, possibly with annealing), the relative frequencies of the different states will depend on the relative probabilities given by the Probability-Goodness equation.
- The state of any one of these networks is a sample from this distribution (see the sketch below).
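A sketch of the ensemble idea, reusing boltzmann_step: settle many networks at T = 1 and tally their final states, whose frequencies should approach the normalized e^(G(S)/T). Ensemble size and settling time here are arbitrary choices:

```python
from collections import Counter
import numpy as np

def sample_final_states(W, T=1.0, n_nets=1000, settle_steps=2000, seed=0):
    """Each network's final state is one sample from the equilibrium distribution."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_nets):
        s = rng.integers(0, 2, size=len(W))
        for _ in range(settle_steps):
            s = boltzmann_step(s, W, T, rng)
        counts[tuple(s)] += 1
    return counts
```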
The Multinomial IA Model
- Very similar to Rumelhart's 1977 formulation.
- Based on a simple generative model of displays in letter perception experiments:
  - The experimenter selects a word,
  - selects letters based on the word, with possible random errors,
  - selects features based on the letters, again with possible random error, AND/OR
  - the visual system registers features with some possibility of error; some features may be missing, as in the WOR? example above.
- Units without parents have biases equal to the log of the prior.
- Weights are defined 'top down': they correspond to log p(C|P), where C = child and P = parent.
- Units within a layer take on probabilistic activations based on the softmax function: r_i = e^(net_i) / Σ_i' e^(net_i'). Only one unit is allowed to be active within each set of mutually exclusive hypotheses; it is chosen with probability r_i (see the sketch below).
- A state corresponds to one active word unit and one active letter unit in each position, together with the provided set of feature activations.
- Response selection:
  - Select the most active letter in the target position.
  - If it is one of the alternatives, select that alternative.
  - If not, choose randomly between the two alternatives.
- If the priors and weights correspond to those underlying the generative model, then states are 'sampled' in proportion to their posterior probability:
  - the state of the entire system is a sample from the joint posterior;
  - the state of the word units, or of the letter units in a given position, is a sample from the marginal posterior.
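A minimal sketch of the within-layer sampling step: exactly one hypothesis in a mutually exclusive set becomes active, chosen by the softmax. Function names and the example net inputs are mine:

```python
import numpy as np

def softmax_sample(net, rng):
    """Return the index of the single active unit, drawn with probability
    r_i = exp(net_i) / sum_i' exp(net_i')."""
    r = np.exp(net - np.max(net))   # subtract max for numerical stability
    r /= r.sum()
    return rng.choice(len(net), p=r)

rng = np.random.default_rng(0)
active_word = softmax_sample(np.array([2.0, 1.0, -1.0]), rng)  # hypothetical nets
```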
Simulated and Calculated Probabilities¹
¹See citation on the next slide.
Simulation of Word and Pseudoword Enhancement Effects¹
¹McClelland, J. L., Mirman, D., Bolger, D. J., & Khaitan, P. (2014). Interactive activation and mutual constraint satisfaction in perception and cognition. Cognitive Science, 38(6).
Input and activation of units in PDP models
- General form of unit update: the activation of unit i is driven by its net input, net_i = Σ_j o_j w_ij + ext_i.
- Simple version used in the cube simulation: Δa_i = net_i(1 − a_i) if net_i > 0, else net_i·a_i.
- An activation function that links PDP models to Bayesian ideas: a_i = 1 / (1 + e^(−net_i)).
- Or set the activation to 1 probabilistically: p_i = 1 / (1 + e^(−net_i)).

[Diagram: unit i receiving input from unit j via weight w_ij, with activation parameters max = 1, min = −0.2, and a resting level]
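A sketch of the two activation functions named above, the deterministic logistic and its probabilistic counterpart; the function names are mine:

```python
import numpy as np

def logistic(net):
    """Deterministic version: a_i = 1 / (1 + exp(-net_i))."""
    return 1.0 / (1.0 + np.exp(-net))

def logistic_sample(net, rng):
    """Probabilistic version: set the activation to 1 with probability logistic(net)."""
    return 1 if rng.random() < logistic(net) else 0
```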
Interactivity in the Brain
- Bidirectional connectivity: Maunsell & Van Essen
- Subjective contours in V1: Lee & Nguyen
- Interactions between V5 (MT) and V1/V2: Hupé et al.
- Binocular rivalry: Leopold & Logothetis

See: McClelland, J. L., Mirman, D., Bolger, D. J., & Khaitan, P. (2014). Interactive activation and mutual constraint satisfaction in perception and cognition. Cognitive Science, 38(6).
Lee & Nguyen (PNAS, 2001, 98) asked the question: do V1 neurons participate in forming a representation of the illusory contour seen in the upper panel (but not in the lower panel)? They recorded from neurons in V1 tuned to the illusory line segment, and varied the position of the illusory segment with respect to the neuron's most responsive position.
Response to the illusory contour is found at precisely the expected location.
Temporal Response to Real and Illusory Contours
The neuron's receptive field falls right over the middle of the real or illusory line defining the bottom edge of the square.
Hupé, James, Payne, Lomber, Girard & Bullier (Nature, 1998, 394, 784-787)
- Investigated the effects of cooling V5 (MT) on neuronal responses in V1, V2, and V3 to a bar on a background grid of lower contrast.
- MT cooling typically produces a reversible reduction in firing rate to V1/V2/V3 cells' optimal stimulus (figure).
- The top-down effect is greatest for stimuli of low contrast: if the stimulus is easy to see when it is not moving, top-down influences from MT have little effect.
- The concept of 'inverse effectiveness' arises here and in many other related cases.
The figure shows a V1/V2 neuron whose firing was strongly modulated around epochs in which the monkey perceived the cell's preferred stimulus (from Leopold and Logothetis, 1996). Top: PSTHs show a strong orientation preference. Bottom: when both stimuli are presented simultaneously, the neuron is silent just before a response indicating perception of the null direction, but quite active just before a response (t < 0) indicating perception of the preferred direction.
Leopold and Logothetis (Nature, 1996, 379) found that some neurons in V1/V2, as well as V4, modulate their responses in concert with the monkey's percept, as if participating in a massively distributed constraint-satisfaction process. However, some neurons in all areas do not modulate their responses. Thus the conscious percept appears to be correlated with the activity of only a subset of neurons. The fraction of neurons that covary with perception is greater in higher areas.