Announcements
Next few lectures
–Require some syntactic knowledge
–Review Chapter 2’s Syntax Section
Readings
–Original articles (greater difficulty level)
–Read in order as stated in syllabus.
–Statistics knowledge?
Sample exam questions
–This week (Friday): I will post a few Qs in our discussion forum.
–Next week (Thursday): You will submit your Qs into dropbox
Psy1302 Psychology of Language Lecture 9 Models of Speech Recognition
Continuation of Last Lecture… Outline
We are fast at speech recognition.
How do we achieve speed?
–Parallel Activation
–Constrained by Contextual Effects
–Terminologies and Ideas
–Two Classic Models: Cohort Model, TRACE Model
[and many experimental paradigms and findings]
Top-down Example 1
Last time: Shadowing and Corrections
Intended          Mispronunciation    Feature
narrow            marrow              place
detrimental       tetrimental         voicing
perfectionistic   berfectionistic     voicing
lives             rives               place
back              mack                manner
hampered          kampered            place & manner
take              nake                manner
self              zelf                voicing
comfort           vomfort             all three
Bottom-Up vs. Top-Down Processing
Bottom-up: Processing that is stimulus- or data-driven.
Top-down: Processing that involves the use of knowledge obtained from higher-level sources.
Terminologies
Top-down Example 2: Lexical Influence on Phoneme Perception
Ganong (1980)
–Spliced speech waves to build /d/-to-/t/ continua:
  /d/ to /t/ + /æsk/: dask-task
  /d/ to /t/ + /æš/: dash-tash
–Obtained % of /d/ identification
Two possible outcomes:
–No Effect of Lexical Knowledge
–Effect of Lexical Knowledge
Top-down Example 2: Lexical Influence on Phoneme Perception
[Figure: % identification as /d/ (y-axis) against VOT, from short (d) to long (t) (x-axis), for the nonword-word continuum dask-task and the word-nonword continuum dash-tash]
Top-down Example 2: Lexical Influence on Phoneme Perception
Ganong (1980)
–Lexical knowledge influences perception
–It can only shift AMBIGUOUS phones, not those at the ends of the continuum
[Figure: % identification as /d/ (0 to 100) against VOT, from short (d) to long (t), for the nonword-word continuum dask-task and the word-nonword continuum dash-tash]
Top-down Example 3: Phoneme Restoration Effect
Warren (1970) & Warren & Warren (1970):
“The state governors met with their respective legiSlatures convening in the capital city”
–The S was replaced with a cough or noise and played to listeners
–Listeners were then asked to say where the sound had been replaced.
–What happened?
Top-down Example 3: Phoneme Restoration Effect
Warren (1970) & Warren & Warren (1970):
It was found that the *eel was on the orange.
It was found that the *eel was on the axle.
It was found that the *eel was on the fishing-rod.
It was found that the *eel was on the table.
It was found that the *eel was on the shoe.
Gating Task (Grosjean 1980)
Cumulative fragments of speech are played.
Measure how much of the word, from its onset, participants need to hear before identifying it.
–RECOGNITION POINT = earliest “gate” at which the participant picks the correct response and maintains it for the rest of the trials.
[Figure: gates at 50 ms, 100 ms, 150 ms, 200 ms, 250 ms, 300 ms, 367 ms]
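The recognition-point definition can be sketched in a few lines of Python. This is only an illustration: the function name `recognition_point` and the participant responses below are made up, and the gate durations are the ones shown on the slide.

```python
from typing import Optional, Sequence


def recognition_point(gates_ms: Sequence[int],
                      responses: Sequence[str],
                      target: str) -> Optional[int]:
    """Earliest gate (in ms) from which every later response is the target.

    Returns None if the participant never settles on the target.
    """
    point = None
    for gate, response in zip(gates_ms, responses):
        if response == target:
            if point is None:
                point = gate      # start of a run of correct responses
        else:
            point = None          # participant changed their mind; reset
    return point


# Hypothetical responses of one participant at each gate.
gates = [50, 100, 150, 200, 250, 300, 367]
responses = ["can", "cat", "camel", "candle", "camel", "camel", "camel"]
print(recognition_point(gates, responses, "camel"))
```

Note that the first correct guess (at 150 ms) does not count here, because the participant abandons it at the next gate; the definition requires the response to be maintained.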
Top-down Example 4: Gating Task (Grosjean 1980)
Compare word in isolation and in context.
In isolation: “camel”
In context: “The kids went to the zoo and rode on the camel”
–Recognition Point: in isolation ~333 ms; in context ~199 ms
[Figure: gates from 50 ms to 367 ms for the Isolation condition and from 50 ms to 300 ms for the Context condition]
Top-down Example 5: Word Monitoring (Marslen-Wilson, Brown, & Tyler, 1988)
Listening to sentences & monitoring for specific words
–Word in isolation: ~300 ms
–Normal: The boy held the guitar. ~240 ms
–Discourse incongruence: ~235 ms
–Pragmatically anomalous: The boy buried the guitar. ~268 ms
–Semantically anomalous: The boy drank the guitar. ~291 ms
–Categorically anomalous: The boy slept the guitar. ~320 ms
Speech Recognition
How do we achieve speed?
–Parallel search, i.e. activation of potential candidates in parallel
–Consult contextual information: use of contextual information to select or weed out candidates!
Models that consider contextual information
Examine 2 influential models of speech processing (evolved from Forster & Morton’s)
–Cohort Model
–TRACE Model
Other models also exist in the current literature.
Subtext
How might psychology experiments
–inform us of our mental processes
–help us create models of our mental representations and of how our minds process information?
–be designed to help us distinguish between models or help us revise an existing one?
Subtext
In evaluating any model, consider:
–How well does the model account for existing experimental findings?
–Is the representation depicted in the model an intuitively plausible one?
–Does the model make predictions that are not in fact borne out by available empirical (i.e. observational and/or experimental) evidence?
Cohort Model, Marslen-Wilson and Welsh (1978)
[Diagram: three stages, in processing order from the Input]
Input
ACCESS STAGE (perceptual representation used to activate lexical items, thus generating a candidate set of items: the cohort)
SELECTION STAGE (the most likely candidate is chosen from the cohort)
INTEGRATION STAGE (in which the semantic and syntactic properties of the chosen word are utilized)
Cohort Model – Access Stage, Marslen-Wilson and Welsh (1978)
Heard so far: S. Cohort: song, story, sparrow, saunter, slow, secret, sentry, ... (i.e., words beginning w/ the sound heard so far)
Heard so far: SP. Cohort: spice, spoke, spare, spin, splendid, spelling, spread, ... (candidates that no longer fit the incoming stream are eliminated)
Heard so far: SPI. Cohort: spit, spigot, spill, spiffy, spinnaker, spirit, spin, ...
Heard so far: SPIN. Cohort: spin, spinach, spinster, spinnaker, spindle
Heard so far: SPINA. Cohort: spinach. This is the word's uniqueness point.
Note: Some words have no uniqueness point (e.g., “spin”)
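The narrowing of the cohort and the uniqueness point can be sketched as follows. This is a toy illustration under stated assumptions: letters stand in for speech sounds, the lexicon is a small hypothetical sample drawn from the slide's examples, and the function names `cohort_trace` and `uniqueness_point` are invented for this sketch.

```python
from typing import List, Optional, Tuple

# Toy lexicon (assumption): a few words from the slides, spelled rather than
# phonemically transcribed.
LEXICON = ["song", "story", "sparrow", "spice", "spoke", "spare", "splendid",
           "spit", "spigot", "spill", "spirit", "spin", "spinach",
           "spinster", "spinnaker", "spindle"]


def cohort_trace(heard: str, lexicon: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (input so far, surviving candidates) after each segment."""
    steps = []
    for end in range(1, len(heard) + 1):
        prefix = heard[:end]
        steps.append((prefix, [w for w in lexicon if w.startswith(prefix)]))
    return steps


def uniqueness_point(word: str, lexicon: List[str]) -> Optional[int]:
    """Number of segments after which `word` is the only candidate left.

    Returns None when the word never becomes unique (it is the start of
    another word in the lexicon).
    """
    for prefix, cohort in cohort_trace(word, lexicon):
        if cohort == [word]:
            return len(prefix)
    return None


for prefix, cohort in cohort_trace("spinach", LEXICON):
    print(prefix.upper(), cohort)
```

With this lexicon, "spinach" becomes unique after five segments (SPINA), while "spin" never does, since it remains a prefix of spinach, spinster, spinnaker, and spindle.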
Cohort Model – Access Stage, Marslen-Wilson and Welsh (1978)
Uniqueness point and recognition point are highly correlated.
This supports the idea of a cohort.
Cohort Model
Auditory Lexical Decision.
Responding “NO, it’s not a word” takes a constant number of ms from the uniqueness point.
Cohort Model, Marslen-Wilson and Welsh (1978)
[Diagram: three stages, in processing order from the Input]
Input
ACCESS STAGE (perceptual representation used to activate lexical items, thus generating a candidate set of items: the cohort)
SELECTION STAGE (the most likely candidate is chosen from the cohort)
INTEGRATION STAGE (in which the semantic and syntactic properties of the chosen word are utilized)
Cohort Model – Selection Stage, Marslen-Wilson and Welsh (1978)
Selection stage: Making use of contextual effects to achieve speed.
Contexts:
–All the information not in the immediate sensory signal.
–E.g., information from previous sensory input (prior context) to higher knowledge sources (e.g., lexical, syntactic, semantic, and pragmatic info).
One big Q:
–Which contextual effects are helpful?
Cohort Model, Marslen-Wilson and Welsh (1978)
Another BIG Q: When do/can we consider contextual information?
–Generation vs. Selection (Proposal vs. Disposal)
–Pre-lexical or Post-lexical
How do we address the “when” Q experimentally?
Zwitserlood (1989)
Crazy complicated classic experiment.
Involves 3 separate groups of participants, one per task:
–Sentence Completion Task. Determines the strength of contextual information.
–Gating Task. Determines probe positions on the PRIME word.
–Cross-Modal Priming. Determines whether CAPTAIN primes BOAT (semantically related to CAPTAIN) and MONEY (semantically related to the competitor CAPITAL) at various probe positions (i.e. points in time).
Cross-Modal Priming
Hear prime: KAPITEIN (captain), whose competitor is KAPITAAL (capital)
Lexical decision on: BOOT (“BOAT”) or GELD (“MONEY”)
The position at which the lexical decision is made varies.
What is the strength of the context? (sentence completion)
What’s a good continuation for:
–They mourned the loss of their _______.
–With dampened spirits the men stood around the grave. They mourned the loss of their _______.
Classify responses of participants into:
–Biasing contexts: 16%-33% said the prime word and 0% said the prime competitor.
–Neutral contexts: 0% said the prime word, and 0% said the prime competitor.
Where to Probe for Activation? (Gating Task)
Isolation Point: the first gate at which 50% of the participants pick the correct word and stick with it to the end.
PROBE POSITIONS
Position 0: Onset of word
Position 1: Isolation Point with Biasing Context (ave. 130 ms after onset)
Position 2: Isolation Point with Neutral Context (ave. 199 ms after onset)
Position 3: Isolation Point in Carrier Phrase “The next word is ____.” (ave. 278 ms after onset)
Position 4: Recognition Point w/ Carrier Phrase (ave. 410 ms after onset)
When does context play a role? (Four Possible Outcomes)
–Before word spoken
–During lexical access
–During selection phase
–At post-lexical integration stage
TASK: Hear CAPTAIN; lexical decision on BOAT or MONEY
GRAPH LEGEND: BOAT is the solid line, MONEY is the dashed line
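Context plays a role BEFORE word spoken [Figure: predicted activation of BOAT and MONEY over the time course of CAPTAIN]
Context plays a role DURING lexical access [Figure: predicted activation of BOAT and MONEY over the time course of CAPTAIN]
Context plays a role DURING selection phase [Figure: predicted activation of BOAT and MONEY over the time course of CAPTAIN]
Context plays a role AT POST-LEXICAL integration [Figure: predicted activation of BOAT and MONEY over the time course of CAPTAIN]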
Comparing Data to Predictions
The prediction slides for Zwitserlood plot level of activation vs. time.
Her data are in terms of reaction time vs. probe position (~time).
How do we compare the two?
–Assumption: Faster reaction = higher level of activation
Results [Figure: reaction time (ms) for BOAT and MONEY at probe positions across CAPTAIN]
Cohort Model, Marslen-Wilson and Welsh (1978)
[Diagram: three stages, in processing order from the Input, with the labels Autonomous and Interactive; the access stage is the autonomous one]
Input
ACCESS STAGE (perceptual representation used to activate lexical items, thus generating a candidate set of items: the cohort)
SELECTION STAGE (the most likely candidate is chosen from the cohort)
INTEGRATION STAGE (in which the semantic and syntactic properties of the chosen word are utilized)
Some Terminologies
Serial vs. Parallel
Bottom-up vs. Top-down
Autonomous vs. Interactive
–Autonomous: a stage of processing proceeds independently of information from other processing modules
–Interactive: a stage of processing quickly considers information from other processing modules as info comes in
Incremental: structuring and interpreting information as it comes in
Problem for Cohort Model
If you set up the wrong cohort, how do you recover?
–e.g. dragedy for tragedy
–Misalignment problem: ThesKyisfalling! can be heard as “The sky is falling!” or “This guy is falling!”
Revised Cohort Model (Marslen-Wilson, 1987)
Still set up an initial cohort of candidates.
Elimination process is no longer all-or-nothing. Items that do not receive further positive information decay in activation rather than being eliminated.
–Allows backtracking for misheard/distorted words
–Context loses some of its power, as it cannot be used to influence the items that form the initial cohort.
A recognized word has a higher relative activation than other words in the cohort.
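The idea of decay instead of elimination can be illustrated with a toy simulation. Everything specific here is an assumption made for the sketch, not part of the model as published: letters stand in for speech sounds, match is scored position by position, every lexicon word starts as a candidate with zero activation, and the `boost` and `decay` values are arbitrary.

```python
from typing import Dict, List


def update_activations(activations: Dict[str, float], position: int,
                       segment: str, boost: float = 1.0,
                       decay: float = 0.5) -> Dict[str, float]:
    """One input segment: matching candidates gain, mismatching ones decay."""
    updated = {}
    for word, act in activations.items():
        matches = position < len(word) and word[position] == segment
        # A mismatch shrinks activation but never removes the candidate.
        updated[word] = act + boost if matches else act * decay
    return updated


def recognize(heard: str, lexicon: List[str]) -> Dict[str, float]:
    """Feed the input segment by segment and return final activations."""
    activations = {word: 0.0 for word in lexicon}
    for position, segment in enumerate(heard):
        activations = update_activations(activations, position, segment)
    return activations


# Hypothetical mini-lexicon; "dragedy" is the mispronunciation from the slides.
acts = recognize("dragedy", ["tragedy", "dragon", "drastic"])
best = max(acts, key=acts.get)
print(best, acts)
```

Because "tragedy" is not thrown out on the first mismatching segment, it can still accumulate the most support from the rest of the input, which is the kind of recovery an all-or-nothing cohort cannot make.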
TRACE Model (McClelland, Elman, Rumelhart ’86)
Model used for other things…
Connectionist Models
[Figures: a neuron; a network of neurons]
Connections can be either inhibitory or excitatory.
Digression: Connectionist Networks
Properties of Connectionist Unit
$\text{Activation Level} = w_1 A_1 + w_2 A_2 + \dots + w_8 A_8$, where $-1 \le w_n \le +1$
Squashing/Threshold Function
If Activation Level < 0.5, Output = 0
If Activation Level ≥ 0.5, Output = 1
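A single unit, with its weighted sum and 0.5 threshold, can be written out as a short sketch. The function names and the example weights are illustrative assumptions; the sketch also assumes that an activation level of exactly 0.5 turns the unit on.

```python
from typing import Sequence


def activation_level(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Weighted sum of incoming activations: w_1*A_1 + ... + w_n*A_n."""
    if len(weights) != len(inputs):
        raise ValueError("need exactly one weight per incoming connection")
    if any(not -1.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie between -1 and +1")
    return sum(w * a for w, a in zip(weights, inputs))


def unit_output(level: float, threshold: float = 0.5) -> int:
    """Threshold function: 1 if the level reaches the threshold, else 0."""
    return 1 if level >= threshold else 0


# Two excitatory connections and one inhibitory connection (made-up values).
level = activation_level([0.5, 0.75, -0.25], [1, 1, 1])
print(level, unit_output(level))
```

Negative weights play the role of inhibitory connections and positive weights the role of excitatory ones.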
Network of Connectionist Units
McClelland (1981) [Figure: network with units for names (Art, Lance, Ralph, Rick, Sam), ages (20s, 30s, 40s), gangs (Jet, Shark), marital status (Single, Married, Divorced), and occupations (Pusher, Burglar, Bookie)]
Inhibitory Connections [Figure: the same network]
Who’s Art? [Figure: the same network]
Content Addressability: Who is Single and 30-something? [Figure: the same network]
Who is Single and 30-something? [Figure: the same network]
Training a Connectionist Model
All connection weights are initially set to random numbers.
Input pattern is applied.
Model produces output. (garbage)
Output compared to “desired output”.
Connection weights adjusted slightly.
Repeat process with other inputs.
==> Memory is in the weights.
Simple Learning Rule for a Node
If Node is ON and is supposed to be OFF:
–turn down all connections from nodes passing activation to it (each w decreases slightly).
If Node is OFF and is supposed to be ON:
–turn up all connections from nodes passing activation to it (each w increases slightly).
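The learning rule can be tried out on a single threshold node. This is a minimal sketch, not the training procedure of any particular published model: the step size of 0.1, the 0.5 threshold, the zero starting weights (the slides start from random ones), and the choice of logical AND as the task are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def node_output(weights: Sequence[float], inputs: Sequence[float],
                threshold: float = 0.5) -> int:
    """Threshold node: ON (1) if the weighted input sum reaches the threshold."""
    return 1 if sum(w * a for w, a in zip(weights, inputs)) >= threshold else 0


def train_step(weights: Sequence[float], inputs: Sequence[float],
               target: int, step: float = 0.1) -> List[float]:
    """Apply the rule once: adjust only connections from active input nodes."""
    out = node_output(weights, inputs)
    if out == target:
        return list(weights)                  # correct already: no change
    direction = 1.0 if target == 1 else -1.0  # turn up if OFF but should be ON
    return [min(1.0, max(-1.0, w + direction * step)) if a > 0 else w
            for w, a in zip(weights, inputs)]


def train(patterns: Sequence[Tuple[Sequence[float], int]],
          n_inputs: int, epochs: int = 20) -> List[float]:
    """Repeat the rule over all patterns for a fixed number of passes."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            weights = train_step(weights, inputs, target)
    return weights


# Logical AND as a toy "desired output".
AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
learned = train(AND_PATTERNS, n_inputs=2)
print(learned, [node_output(learned, x) for x, _ in AND_PATTERNS])
```

After training, nothing but the weights has changed, which is the sense in which the memory is in the weights.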
TRACE Model, Elman & McClelland (note: TRACE is preconfigured, not trained)
TRACE Model [Figure: three levels of units between SENSORY INPUT and CONTEXT: Features (e.g., +voi, -voi, +nas, -son), Phonemes (e.g., /p/, /b/, /k/), and Words (e.g., pan, ban, man, can)]
Features of the TRACE Model (in comparison to OLD Cohort Model)
TRACE can “recover” if a given segment (even the first one) is missed
–Does not rely heavily on knowing the left edge of the word
TRACE’s bidirectional connections account for phoneme restoration & other contextual effects on speech recognition
TRACE predicts a lot of top-down information flow
–Potential problem: Weight given to contextual information may be too strong?
Cohort vs. Trace
Cohort vs. TRACE? Do rhymes compete?
Old Cohort Model: onset similarity is primary because of the incremental (serial) nature of speech
–Cat activates cap, cast, cattle, camera, etc.
–Rhymes won’t compete
TRACE: global similarity constrained by incremental nature of speech
–Cohorts and rhymes compete, but with different time course
Eye tracking, Allopenna, Magnuson & Tanenhaus (1998)
[Figure: head-mounted eye tracker with eye camera and scene camera]
“Pick up the beaker”, with a speaker in the display (RHYME COMPETITOR!)
TRACE predictions match eye-tracking data
[Figure adapted from Jim Magnuson, “Interaction in language processing: Pragmatic constraints on lexical access”]
Cohort vs. Trace? Is there lateral inhibition?
Old Cohort Model: units compete, but don’t necessarily have inhibition built in.
TRACE: within a level, units compete and inhibit each other.
[Figure: word units jog and job]
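What within-level inhibition does to two competing word units can be sketched with a generic leaky-integrator simulation. This is not TRACE's actual update equation or parameter set: the function name `compete`, the bottom-up input values, and the `inhibition`, `rate`, and `steps` settings are all assumptions chosen for illustration.

```python
from typing import Dict


def compete(bottom_up: Dict[str, float], inhibition: float = 0.2,
            rate: float = 0.1, steps: int = 50) -> Dict[str, float]:
    """Let word units settle while each one is inhibited by its rivals."""
    acts = {word: 0.0 for word in bottom_up}
    for _ in range(steps):
        new_acts = {}
        for word, act in acts.items():
            rivals = sum(a for other, a in acts.items() if other != word)
            net = bottom_up[word] - inhibition * rivals
            # Move a fraction of the way toward the net input, clipped to [0, 1].
            new_acts[word] = min(1.0, max(0.0, act + rate * (net - act)))
        acts = new_acts
    return acts


# Made-up bottom-up support: the input fits "job" better than "jog".
support = {"job": 1.0, "jog": 0.6}
with_inhibition = compete(support, inhibition=0.2)
without_inhibition = compete(support, inhibition=0.0)
print(with_inhibition, without_inhibition)
```

With inhibition switched on, an active competitor such as jog holds the target job below the level it would otherwise reach, which is why a model with lateral inhibition predicts slower responses when a real-word competitor has been activated.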
Marslen-Wilson & Warren (1994)
Auditory Lexical Decision on spliced & recombined sound waves.
Stimulus    Spliced from                  TRACE prediction
job         job + job                     FAST
jo(d)b      jod + job (Nonword + Word)    MEDIUM
jo(g)b      jog + job (Word + Word)       SLOW!!!
[Figure: word units jog and job, with input jo(g)]
Marslen-Wilson & Warren (1994)
Auditory Lexical Decision on spliced & recombined sound waves.
Stimulus    Spliced from                  TRACE prediction
job         job + job                     FAST
jo(d)b      jod + job (Nonword + Word)    MEDIUM
jo(g)b      jog + job (Word + Word)       SLOW
Found jo(g)b & jo(d)b equally slow, and slower than job. No lateral inhibition.
Let’s try a more natural & sensitive measure!
Dahan, Magnuson, Tanenhaus & Hogan (2001)
“Pick up the ...”
Stimulus    Spliced from
ne(t)t      net + net
ne(k)t      neck + net
ne(p)t      nep + net
Predictions
[Figure: display items beagle, bead, beast, camera, beak, bell, neck, net, ring, lobster]
Prediction: Delayed looks to the target (the net) with ne(k)t compared to ne(p)t
Results [Figure: fixation proportion on the net against time since target onset (in ms) for W1W1, W2W1 (ne(k)t), and N3W1 (ne(p)t), with a “Delayed look” annotation]
Interim Summary
Newer data are beginning to favor the TRACE model over the cohort model.
The cohort model proposes that the access stage is autonomous, but newer data suggest that there is continuous sensitivity to contextual information.