Slide 1
Perception, interaction, and optimality
Slide 2
DALMATIAN
Slide 8
The policeman ate the spaghetti with a nice sauce.
The policeman ate the spaghetti with a nice fork.
Slide 11
Context matters
Slide 12
Today:
- An influential early account of how and why context matters in perception/recognition, one that depends on interactive processing.
- A critique of the interactive approach, which depends on the proposal that perception/recognition is a process of optimal statistical inference.
- The conflict between "neural-like" processing accounts and "functional" accounts of perceptual and cognitive phenomena.
- A demonstration that an interactive, distributed network can carry out optimal statistical inference.
Slide 13
Key task: word superiority effect
People are faster to recognize (perceive?) a letter when it appears within a word or a word-like letter string than (1) when it appears alone or (2) when it appears within a non-word-like letter string.
So: is the final letter an E or an F? SAVE, MAVE … faster than … KLVE or … E alone.
This is important because perception/recognition of the *feature/letter* is influenced by information about the *object/word* in which it appears…
…so the process can't be: first recognize features, then recognize the object.
Slide 17
At the word level, about 1,100 four-letter English words…
At the letter level, 26 units (one for each letter) in each of 4 possible locations (104 units)…
At the feature level, "present" and "absent" units for each of 14 possible line segments at each of 4 locations (112 units).
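A quick back-of-the-envelope check of these layer sizes (the word count is the slide's approximate figure; the letter and feature counts follow from the unit descriptions):

```python
# Unit counts for the three layers of the interactive activation (IA) model,
# as described on the slide (word count is approximate).
N_WORDS = 1100            # ~1,100 four-letter English words
N_LETTERS = 26 * 4        # 26 letters x 4 positions = 104 units
N_FEATURES = 2 * 14 * 4   # present/absent x 14 segments x 4 positions = 112 units

print(N_WORDS, N_LETTERS, N_FEATURES)   # 1100 104 112
```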
Slide 18
Though a bit ad hoc, the model was highly successful.
Weight values: inhibitory among incompatible items, excitatory among mutually compatible items… BUT there was no principled way of setting them; they were set by hand to values that worked reasonably well.
Despite that, the model:
- Accounted for the basic word-superiority effect on letter perception
- Explained ambiguity resolution
- Explained aspects of the time course of processing
- Predicted word-superiority effects for well-formed nonwords, even unpronounceable ones
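To make "interactive processing" concrete, here is a minimal sketch (an assumption for illustration, not the authors' code) of the kind of activation update the IA model uses: units excite compatible units, inhibit incompatible ones, and decay toward a resting level. The specific parameter values are illustrative.

```python
import numpy as np

def iac_step(a, W, ext, rest=-0.1, decay=0.1, a_min=-0.2, a_max=1.0, dt=0.1):
    """One interactive-activation update.
    a: current activations; W: weight matrix (excitatory between compatible
    units, inhibitory between incompatible ones); ext: bottom-up input.
    Parameter values here are illustrative assumptions."""
    net = W @ np.clip(a, 0, None) + ext            # only active units send output
    drive = np.where(net > 0,
                     (a_max - a) * net,            # excitation pushes toward max
                     (a - a_min) * net)            # inhibition pushes toward min
    return np.clip(a + dt * (drive - decay * (a - rest)), a_min, a_max)

# Example: two mutually inhibitory units, one receiving external support
a = np.zeros(2)
W = np.array([[0.0, -0.2], [-0.2, 0.0]])
for _ in range(30):
    a = iac_step(a, W, ext=np.array([0.3, 0.0]))
print(a.round(2))   # the supported unit rises; its competitor is pushed down
```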
Slide 20
What's the problem? For recognition/perception problems like this, it is possible to compute the "right" answer (the exactly correct probability for the ambiguous or missing information) using Bayesian inference. Human behavior seems to accord with this "correct" inference… …but the IAC model does *not*: it behaves similarly to people, but does not compute exactly correct probabilities. This difference was thought to arise because the model is interactive. So, the important point: building in interactivity as a "neurally motivated" processing mechanism seemed to produce a model that is wrong in important ways.
Slide 21
Bayes' rule
You work for a Madison Avenue advertising agency in the early 1960s.
There are 80 men and 20 women at the firm.
72 men have short hair; 82 employees have short hair in total.
You spot someone with short hair at the end of the hallway. What is the probability that the person is a man?
Slide 22
p(m | s) = p(m) × p(s | m) / p(s)
where:
p(m) = #males / (#males + #females)
p(s | m) = #males with short hair / #males
p(s) = #people with short hair / (#males + #females)
Slide 23
Writing m = #males, f = #females, ms = #males with short hair, fs = #females with short hair:
p(m) × p(s | m) / p(s) = [m/(m+f)] × [ms/m] / [(ms+fs)/(m+f)] = ms/(ms+fs) = p(m | s)
Slide 24
P(m | s) = ?
p(m) = 80/100 = 0.8
p(m) × p(s | m) = 0.8 × 72/80 = 0.8 × 0.9 = 0.72
P(m | s) = p(m) × p(s | m) / p(s) = (0.8 × 0.9) / 0.82 ≈ 0.88
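The same arithmetic as a tiny sketch, just to make the three pieces of Bayes' rule explicit (counts taken from the slide's example):

```python
# Counts from the slide's example
males, females = 80, 20
males_short, short_total = 72, 82

p_m = males / (males + females)          # prior p(m) = 0.8
p_s_given_m = males_short / males        # likelihood p(s|m) = 0.9
p_s = short_total / (males + females)    # evidence p(s) = 0.82

p_m_given_s = p_m * p_s_given_m / p_s    # posterior
print(round(p_m_given_s, 3))             # 0.878

# Equivalently, count directly: short-haired males / all short-haired people
print(males_short / short_total)         # same answer
```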
Slide 26
But what about word recognition?
If you can specify the relevant priors, likelihoods, and unconditional probabilities, you can compute exact probabilities for letter identities, given some input features and a lexicon. To specify those probabilities you need a "generative model": a kind of hypothesis about how the distribution of words in the environment is generated, one that allows computation of the needed probabilities.
Slide 27
A probabilistic generative model of 4-letter words
1. Select a word from among all candidate words, with probability proportional to its log frequency…
2. For each location, select a letter based on the conditional probability of the letters in that location given the word, p(l | w).
3. For each feature in each location, select "present" or "absent" depending on the conditional probability of these values given the letter selected in the same location, p(f | l).
So the generative model specifies prior probabilities of words, conditional probabilities of letters in the 4 locations given the word, and conditional probabilities of feature status in different locations given the letter.
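A minimal sketch of the three sampling steps, with a toy lexicon and made-up probability tables (everything here, including the frequencies, the feature table, and the noise level, is a placeholder rather than the model's actual values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy lexicon with made-up frequencies (placeholders, not real word counts)
lexicon = {"SAVE": 120.0, "CAVE": 80.0, "CASE": 300.0, "CARE": 500.0}
N_FEATURES = 14          # line segments per letter position

# Step 1: choose a word with probability proportional to its log frequency
words = list(lexicon)
prior = np.log(list(lexicon.values()))
prior = prior / prior.sum()
word = rng.choice(words, p=prior)

# Step 2: choose a letter in each location given the word, p(l | w)
# (here the word's own letters are generated with probability 1)
letters = list(word)

# Step 3: choose each feature "present"/"absent" given the letter, p(f | l).
# FEATURE_TABLE is a placeholder lookup of which segments each letter contains;
# each feature is flipped with a small assumed noise probability.
FEATURE_TABLE = {l: rng.integers(0, 2, N_FEATURES) for l in set("".join(words))}
noise = 0.05
features = [np.where(rng.random(N_FEATURES) < noise,
                     1 - FEATURE_TABLE[l], FEATURE_TABLE[l]) for l in letters]

print(word, letters, [f.tolist() for f in features])
```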
Slide 28
For each feature in location 2:
Compute p(l | f) = p(f | l) × p(l) / p(f).
The product of these over features gives p(l | all features).
Compute this for all letters in all locations.
BUT this does not take context into account!
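Here is a minimal sketch of that bottom-up letter computation: combine the letter prior with the likelihood of each observed feature value and normalize (with the uniform prior used here, this matches the per-feature product the slide describes). The probability tables are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
N_FEATURES = 14

# Placeholder tables: prior p(l) and per-feature "present" probability p(f=1 | l)
p_l = np.full(len(letters), 1 / len(letters))
p_f_given_l = rng.random((len(letters), N_FEATURES))

def letter_posterior(observed):
    """observed: length-14 array of 0/1 feature values for one letter position.
    Returns p(l | all observed features), assuming features are independent
    given the letter; the normalization absorbs the p(f) terms."""
    lik = np.where(observed == 1, p_f_given_l, 1 - p_f_given_l).prod(axis=1)
    post = p_l * lik
    return post / post.sum()

print(letter_posterior(rng.integers(0, 2, N_FEATURES)).round(3))
```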
Slide 29
At the word level, for every word, compute p(w | l) = p(l | w) × p(w) / p(l) …for each letter in each position.
The product of these over letter positions is the probability of the word.
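And a companion sketch at the word level, computing p(w | letter string) ∝ p(w) × ∏ over positions of p(l | w) and then normalizing; the lexicon, the priors, and the 0.9 "correct letter" likelihood are assumptions for illustration only:

```python
import numpy as np

# Placeholder lexicon with made-up prior probabilities p(w)
lexicon = {"SAVE": 0.2, "CAVE": 0.1, "CASE": 0.3, "CARE": 0.4}

def p_l_given_w(letter, word, pos, correct=0.9):
    """Assumed letter likelihood: the word's own letter with probability 0.9,
    any other letter sharing the remaining mass evenly."""
    return correct if letter == word[pos] else (1 - correct) / 25

def word_posterior(letter_string):
    """p(w | letters) is proportional to p(w) * product of p(l_pos | w)."""
    scores = {w: pw * np.prod([p_l_given_w(letter_string[i], w, i)
                               for i in range(4)])
              for w, pw in lexicon.items()}
    z = sum(scores.values())
    return {w: round(s / z, 3) for w, s in scores.items()}

print(word_posterior("CAVE"))   # most of the mass lands on CAVE
```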
Slide 31
So… For a generative model that specifies the required probabilities, you can exactly compute what the "correct" recognition behavior is!
And you can do this in a feed-forward pass: no need for interaction.
AND interaction seems wrong, because the first-pass probabilities over letters and words are incorrect.
Conclusion: the interactive model is probably wrong…
…and the IAC model's behavior was shown to be non-optimal where human behavior is optimal…
Slide 32
Multinomial model
Slide 34
Bi-directional weights set to the natural log of the actual conditional probabilities stipulated by the generative model…
Bias weights on word units set to the natural log of the prior probability of each word (i.e., subjective frequency)…
Activation function: stochastic winner-take-all within competing pools (instead of direct inhibition), using a softmax:
p_i = exp(net_i) / Σ_j exp(net_j)
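A minimal toy sketch of such a network (not the paper's implementation): weights are log conditional probabilities, word-unit biases are log priors, and each pool stochastically picks a winner via a softmax over its net input. Alternating word and letter updates amounts to Gibbs sampling from the generative model's posterior. The lexicon, evidence values, and parameters below are all made up for illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

# --- Toy generative model (all values are illustrative placeholders) -------
lexicon = {"SAVE": 0.2, "CAVE": 0.1, "CASE": 0.3, "CARE": 0.4}
words = list(lexicon)
alphabet = sorted(set("".join(words)))
N_POS = 4

log_prior = np.log([lexicon[w] for w in words])    # bias weights on word units

def log_p_l_given_w(letter, word, pos, correct=0.95):
    """Weight between a word unit and a letter unit = log p(l | w)."""
    return np.log(correct if letter == word[pos]
                  else (1 - correct) / (len(alphabet) - 1))

def softmax_sample(log_scores):
    """Stochastic winner-take-all within a pool: sample one unit via softmax."""
    p = np.exp(log_scores - log_scores.max())
    p /= p.sum()
    return rng.choice(len(log_scores), p=p)

def run(letter_evidence, n_cycles=200):
    """letter_evidence[pos][letter] = log-likelihood of the bottom-up features
    for that letter (standing in for the feature-level computation).
    Returns the word chosen on each cycle."""
    letters = [str(rng.choice(alphabet)) for _ in range(N_POS)]   # random start
    word_samples = []
    for _ in range(n_cycles):
        # Word pool: net input = bias + sum over positions of log p(l | w)
        net_w = log_prior + np.array(
            [sum(log_p_l_given_w(letters[p], w, p) for p in range(N_POS))
             for w in words])
        word = words[softmax_sample(net_w)]
        # Letter pools: net input = bottom-up evidence + top-down log p(l | word)
        for p in range(N_POS):
            net_l = np.array([letter_evidence[p].get(l, -10.0)
                              + log_p_l_given_w(l, word, p) for l in alphabet])
            letters[p] = alphabet[softmax_sample(net_l)]
        word_samples.append(word)
    return word_samples

# Bottom-up evidence: the third position is ambiguous among V, S, and R
evidence = [{"C": 0.0}, {"A": 0.0}, {"V": -0.7, "S": -0.7, "R": -0.7}, {"E": 0.0}]
samples = run(evidence)
# Early cycles can be off; long-run sample frequencies approximate p(w | features)
print(Counter(samples[50:]))
```

With ambiguous evidence like this, the long-run samples should concentrate on CAVE, CASE, and CARE roughly in proportion to their posterior probabilities, which is the sense in which an interactive network can "sample from the correct posterior distribution" discussed on the following slides.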
Slide 37
Letter probabilities initially determined by bottom-up input
After 2 steps, determined partly by word activity… …but still INCORRECT! Need to let it continue to run for several cycles. Eventually…
Slide 39
So the interactive model can sample from the correct posterior distribution…
Why prefer one model over another?
Slide 41
Other reasons? For discussion…