PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition.


[Architecture diagram: on the comprehension side, letter/phoneme recognition feeds word recognition, then syntactic and semantic analysis; on the production side, the conceptualizer (thought) feeds the formulator (grammatical and phonological encoding), then the articulator; both sides reach the lexicon via lexical access.]

Lexical access How do we retrieve linguistic information from long-term memory? How is the information organized/stored? What factors are involved in retrieving information from the lexicon? Models of lexical access.


Lexical access Factors affecting lexical access: morphological structure, phonological structure, concreteness/abstractness, imageability, frequency, semantic priming, role of prior context, lexical ambiguity, grammatical class. Some of these may reflect the structure of the lexicon; some may reflect the processes of access from the lexicon.

Role of prior context Swinney (1979): cross-modal priming task. Listen to a short paragraph. At some point during the paragraph a string of letters will appear on the screen. Decide if it is an English word or not. Say 'yes' or 'no' as quickly as you can.

Role of prior context "Rumor had it that, for years, the government building has been plagued with problems. The man was not surprised when he found several spiders, roaches and other bugs in the corner of his room." Visual probe: ant. Lexical decision by button press (yes/no). Swinney (1979)

Role of prior context Swinney (1979) Lexical decision task. Probes: context related: ant; context inappropriate: spy; context unrelated: sew. "Rumor had it that, for years, the government building has been plagued with problems. The man was not surprised when he found several spiders, roaches and other bugs in the corner of his room." Results and conclusions: within 400 msec of hearing "bugs", both ant and spy are primed (both faster than sew); after 700 msec, only ant is primed (ant faster, spy no different from sew).

Lexical ambiguity Hogaboam and Perfetti (1975) Words can have multiple interpretations; what is the role of frequency of meaning? Task: is the last word ambiguous? The jealous husband read the letter (dominant meaning). The antique typewriter was missing a letter (subordinate meaning). Results: participants are faster on the second sentence. The results may seem counterintuitive, but the task, "is the final word ambiguous?", is the key. In the first sentence the meaning is dominant and the context strongly biases that meaning, so the second meaning may not be active, which in turn makes the ambiguity judgment harder in the first sentence.


Lexical access How do we retrieve linguistic information from long-term memory? How is the information organized/stored? What factors are involved in retrieving information from the lexicon? Models of lexical access. [Architecture diagram repeated, highlighting lexical access between word recognition, the formulator, and the lexicon.]

(some) Models of lexical access Serial comparison models Search model (Forster, 1976, 1979, 1987, 1989) Parallel comparison models Logogen model (Morton, 1969) Cohort model (Marslen-Wilson, 1987, 1990) Connectionist models Interactive Activation Model (McClelland and Rumelhart, 1981) Important contrasting features: (1) interactive or autonomous, (2) serial or parallel, (3) cascaded or discrete

Search model (e.g., Forster, 1976) Access of the lexicon is considered autonomous, independent of other systems involved in processing language A complete perceptual representation of the perceived stimulus is constructed The representation is compared with representations in access files Three access files: Orthographic Phonological Syntactic/semantic (for language production) Access files are organized in a series of bins (first syllable or letters) Position within the bins is organized by lexical frequency Access files have “pointers” to meaning information in semantic memory
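The bin-plus-frequency organization described above can be sketched in a few lines of Python. This is a toy illustration, not Forster's implementation: the lexicon, frequency counts, and use of the first letter as the access code are all illustrative assumptions.

```python
from collections import defaultdict

# Toy access file: (word, frequency) pairs -- counts are illustrative
LEXICON = [("cat", 900), ("cap", 300), ("mat", 200), ("mouse", 500)]

def build_bins(lexicon):
    """Group entries into bins by access code (here: first letter),
    ordered by decreasing frequency within each bin."""
    bins = defaultdict(list)
    for word, freq in lexicon:
        bins[word[0]].append((word, freq))
    for code in bins:
        bins[code].sort(key=lambda entry: -entry[1])
    return bins

def search(bins, target):
    """Serial, frequency-ordered search within the target's bin; the
    position reached is a stand-in for access time."""
    for position, (word, _) in enumerate(bins[target[0]], start=1):
        if word == target:
            return word, position
    return None, None

bins = build_bins(LEXICON)
print(search(bins, "cat"))  # found after 1 comparison
print(search(bins, "cap"))  # lower frequency -> found after 2 comparisons
```

Because high-frequency entries sit at the front of their bin, they are reached after fewer comparisons, which is the model's account of the frequency effect.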

Search model (e.g., Forster, 1976) [Diagram: visual input "cat" and auditory input /kat/ map onto access codes; entries within a bin are listed in order of decreasing frequency, with pointers from entries such as "mat", "cat", "mouse" into the mental lexicon.]

Search model (Forster, 1976, 1979, 1987, 1989, 1994) Frequency effects: bin organization. Repetition priming effects: temporary reordering of bins in response to a recent encounter. Semantic priming effects: accounted for by cross-referencing in the lexicon. Context effects: search is considered to be autonomous, unaffected by context (so context effects are "post-access").

Logogen model (Morton, 1969) The lexical entry for each word comes with a logogen (cat, dog, cap, …). Logogens specify a word's attributes, e.g., semantic, orthographic, phonological. They are activated in two ways: by sensory input and by contextual information. Access (recognition) occurs when activation reaches threshold. Thresholds differ depending on various factors, e.g., frequency. Access makes the information associated with the word available.

Logogen model (Morton, 1969) [Diagram: auditory and visual stimuli pass through auditory/visual analysis into the logogen system, which also receives input from the context system; the logogen system makes semantic attributes available and feeds an output buffer that produces responses.]

Think of a logogen as being like a ‘strength-o-meter’ at a fairground When the bell rings, the logogen has ‘fired’

‘cat’ [kæt] What makes the logogen fire? – seeing/hearing the word What happens once the logogen has fired? – access to lexical entry!

So how does this help us to explain the frequency effect? High-frequency words have a lower threshold for firing, e.g., 'cat' [kæt] vs. 'cot' [kot]: the low-frequency word takes longer to reach threshold.

Spreading activation from doctor lowers the threshold for nurse to fire, so nurse takes less time to fire: 'doctor' [dɒktə] primes 'nurse' [nɜːs] through the spreading-activation network.
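The threshold idea on the last few slides can be sketched as a toy simulation. All the numbers here (base threshold, frequencies, evidence per step, priming boost) are made-up illustrations, not Morton's parameters:

```python
class Logogen:
    """Evidence counter that 'fires' when it reaches threshold."""
    def __init__(self, word, frequency, base_threshold=100):
        self.word = word
        # higher frequency -> lower threshold (illustrative formula)
        self.threshold = base_threshold - frequency
        self.activation = 0.0

    def prime(self, boost):
        """Contextual/spreading activation lowers the threshold further."""
        self.threshold -= boost

    def time_to_fire(self, evidence_per_step=10):
        """Steps of sensory evidence needed before the logogen fires."""
        steps = 0
        activation = self.activation
        while activation < self.threshold:
            activation += evidence_per_step
            steps += 1
        return steps

cat = Logogen("cat", frequency=60)        # high frequency
cot = Logogen("cot", frequency=10)        # low frequency
nurse = Logogen("nurse", frequency=30)
primed_nurse = Logogen("nurse", frequency=30)
primed_nurse.prime(30)                    # "doctor" was just heard

print(cat.time_to_fire(), cot.time_to_fire())             # frequency effect
print(nurse.time_to_fire(), primed_nurse.time_to_fire())  # priming effect
```

One counter-and-threshold mechanism yields both effects: frequency lowers the threshold permanently, priming lowers it temporarily.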

Interactive Activation Model (IAM) McClelland and Rumelhart, (1981) Proposed to account for Word Superiority effect Also goes by the name: Interactive Activation and Competition Model (IAC)

The Word-Superiority Effect (Reicher, 1969) A fixation cross ("+") is displayed until the participant hits some start key.

COURSE Presented briefly … say 25 ms The Word-Superiority Effect (Reicher, 1969)

U &&&&& A Mask presented with alternatives above and below the target letter … participants must pick one as the letter they believe was presented in that position. The Word-Superiority Effect (Reicher, 1969)

Letter only ("E" alone, then masked with alternatives E/T): say 60% correct. Letter in nonword ("KLANE"): say 65%. Letter in word ("PLANE"): say 80%. Why is identification better when a letter is presented in a word?

Interactive Activation Model (IAM) McClelland and Rumelhart (1981) Nodes: (visual) feature, (positional) letter, and word detectors, with inhibitory and excitatory connections between them. Previous models posited a bottom-up flow of information (from features to letters to words); the IAM also posits a top-down flow of information. Also goes by the name Interactive Activation and Competition Model (IAC).

Inhibitory connections within levels If the first letter of a word is “a”, it isn’t “b” or “c” or … Inhibitory and excitatory connections between levels (bottom-up and top-down) If the first letter is “a” the word could be “apple” or “ant” or …., but not “book” or “church” or…… If there is growing evidence that the word is “apple” that evidence confirms that the first letter is “a”, and not “b”….. Interactive Activation Model (IAM) McClelland and Rumelhart, (1981)

IAM & the word superiority effect The model processes at the word and letter levels simultaneously Letters in words benefit from bottom-up and top-down activation But letters alone receive only bottom-up activation. McClelland and Rumelhart, (1981)
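A minimal sketch of this interactive flow, assuming a made-up four-word lexicon and arbitrary weights (not McClelland and Rumelhart's actual parameters): letters excite consistent words, words compete within their level, and active words feed activation back down to their letters.

```python
WORDS = ["cart", "card", "care", "bore"]  # toy lexicon

def run_cycles(letter_evidence, n_cycles=5, up=0.2, down=0.2, inhibit=0.1):
    """letter_evidence maps position -> {letter: bottom-up input}."""
    word_act = {w: 0.0 for w in WORDS}
    letter_act = {(pos, letter): a
                  for pos, d in letter_evidence.items()
                  for letter, a in d.items()}
    for _ in range(n_cycles):
        # bottom-up excitation: letters support words containing them
        for w in WORDS:
            word_act[w] += up * sum(letter_act.get((i, c), 0.0)
                                    for i, c in enumerate(w))
        # within-level inhibition: words compete with one another
        total = sum(word_act.values())
        for w in WORDS:
            word_act[w] = max(word_act[w] - inhibit * (total - word_act[w]), 0.0)
        # top-down feedback: active words support their own letters
        for w in WORDS:
            for i, c in enumerate(w):
                letter_act[(i, c)] = letter_act.get((i, c), 0.0) + down * word_act[w]
    return word_act, letter_act

# Degraded final "e" seen in the context "car_" vs. in isolation
_, in_word = run_cycles({0: {"c": 1.0}, 1: {"a": 1.0}, 2: {"r": 1.0}, 3: {"e": 0.3}})
_, alone = run_cycles({3: {"e": 0.3}})
print(in_word[(3, "e")] > alone[(3, "e")])  # True: the word-superiority effect
```

The degraded letter ends up more active in the word context because "care" accumulates activation from the other letters and feeds it back down, which is exactly the model's account of Reicher's result.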

Cohort model (Marslen-Wilson & Welsh, 1978) Specifically for auditory word recognition (covered in chapter 9 of the textbook). Listeners can recognize a word very rapidly, often before it is complete. Recognition point (uniqueness point): the point at which a word is unambiguously different from other words and can be recognized (strong emphasis on word onsets). Three stages of word recognition: 1) activate a set of possible candidates; 2) narrow the search to one candidate; 3) integrate the single candidate into the semantic and syntactic context.

Cohort model Prior context: "I took the car for a …" As the input unfolds over time, the cohort narrows: /s/ → soap, spinach, psychologist, spin, spit, sun, spank, …; /sp/ → spinach, spin, spit, spank, …; /spi/ → spinach, spin, spit, …; /spin/ → spin.
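The narrowing cohort can be sketched as prefix filtering over a toy lexicon. The words below are orthographic stand-ins for the slide's phonemic forms (which is why "psychologist", an /s/-initial word only by sound, is omitted), and note that on spelling alone "spinach" survives at "spin": the slide's simplification assumes the word ends there.

```python
# Orthographic stand-ins for the slide's phonemic cohort (illustrative)
LEXICON = ["soap", "spinach", "spin", "spit", "sun", "spank"]

def cohorts(phoneme_input):
    """Yield (input-so-far, surviving candidates) after each segment."""
    for i in range(1, len(phoneme_input) + 1):
        prefix = phoneme_input[:i]
        yield prefix, [w for w in LEXICON if w.startswith(prefix)]

for prefix, cohort in cohorts("spin"):
    print(prefix, cohort)
```

Each new segment eliminates every candidate it is inconsistent with, so recognition can occur at the uniqueness point rather than at word offset.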

Comparing the models Each model can account for the major findings (e.g., frequency, semantic priming, context), but they do so in different ways. The search model is serial, bottom-up, and autonomous. The logogen model is parallel and interactive (information flows up and down). The IAM is both bottom-up and top-down, and uses facilitation and inhibition. The cohort model is bottom-up and parallel initially, but interactive at a later stage.

Different signals Visual word recognition: some parallel input; orthography; letters; clear delineation; difficult to learn. Speech perception: serial input; phonetics/phonology; acoustic features; usually no delineation; "easy" to learn. [Spectrogram of the utterance "Where are you going".]


Speech perception Articulatory phonetics: production based; place and manner of articulation. Acoustic phonetics: based on the acoustic signal; formants, transitions, co-articulation, etc.

Speech production to perception Acoustic cues are extracted, stored in sensory memory, and then mapped onto linguistic information. As air is pushed from the lungs through the larynx, across the vocal cords, and into the mouth and nose, different types of sounds are produced. The different qualities of the sounds are represented in formants, and the formants and other features are mapped onto phonemes.

Acoustic features Spectrogram: time on the x-axis, frequency on the y-axis; amplitude (related to the pressure under which the air is pushed) is represented by the darkness of the lines.

Acoustic features Formants: bands of resonant frequencies. Formant transitions: up or down movement of formants. Steady states: flat formant patterns. Bursts: sudden release of air. Voice onset time (VOT): when voicing begins relative to the onset of the phoneme.

Formants in a wide-band spectrogram [Spectrogram with F1, F2, and F3 marked: formants are bands of resonant frequencies; formant transitions are up or down movements; steady states are flat formant patterns.]

Formants in a wide-band spectrogram [Same spectrogram with the burst marked: a sudden release of air.]

Voice-Onset Time (VOT) [Waveforms contrasting "bit" (short VOT, ~5 ms) and "pit" (long VOT, ~40 ms).]

Categorical Perception Categorical perception is the perception of different sensory phenomena as being qualitatively, or categorically, different. Liberman et al. (1957) used a speech synthesizer to create a series of syllables spanning the categories /b/, /d/, /g/ (each followed by /a/), by manipulating the F2 formant. The stimuli formed a physical continuum. Result: people didn't "hear" a continuum; instead they classified the stimuli into three categories.

Categorical Perception Liberman et al. (1957) 1. Set up a continuum of sounds between two categories, e.g., steps 1 … 7 from /ba/ to /da/.

2. Run an identification experiment: plotting % /ba/ responses across steps 1 … 7 shows a sharp phoneme boundary. Our perception of phonemes is categorical rather than continuous. Categorical Perception Liberman et al. (1957)
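A toy version of the identification experiment: assume a listener who labels each continuum step by its nearest category prototype (the prototype positions are illustrative). Even though the stimuli change gradually, the labels flip abruptly, reproducing the sharp boundary.

```python
# Continuum steps 1..7; category prototypes at illustrative positions
PROTOTYPES = {"ba": 1.5, "da": 6.5}

def identify(step):
    """Label a stimulus by its nearest prototype (ties go to /ba/)."""
    return min(PROTOTYPES, key=lambda cat: abs(step - PROTOTYPES[cat]))

labels = [identify(step) for step in range(1, 8)]
print(labels)  # a run of "ba", then an abrupt switch to "da": no gradual zone
```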

Hard Problems in Speech Perception Linearity (parallel transmission): acoustic features often spread themselves out over other sounds. Where does "show" start and "money" end?

Hard Problems in Speech Perception Invariance: one phoneme should have one waveform, but the /i/ ('ee') in 'money' and 'me' are different. There aren't invariant cues for phonetic segments, although the search continues.

Hard Problems in Speech Perception Co-articulation: the influence of the articulation (pronunciation) of one phoneme on that of another phoneme; essentially, producing more than one speech sound at once.

Hard Problems in Speech Perception Trading relations: most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives rise to the distinction. In slit–split, the /p/ relies on silence and a rising formant; different mixtures of these cues can result in the same perception. Perception must establish some "trade-off" between the different cues.
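The trade-off can be sketched as a weighted combination of the two cues feeding a single decision. The weights and criterion here are illustrative assumptions, not measured values:

```python
def hears_split(silence_ms, transition_hz,
                w_silence=1.0, w_transition=0.5, criterion=60.0):
    """Combine two cues for the /p/ in 'split' into one decision
    variable; weights and criterion are illustrative."""
    evidence = w_silence * silence_ms + w_transition * transition_hz
    return evidence >= criterion

print(hears_split(60, 0))   # long silence alone -> "split"
print(hears_split(30, 60))  # shorter silence traded for a bigger transition
print(hears_split(30, 0))   # neither cue sufficient -> "slit"
```

Because the cues pool into one evidence value, a weaker silence cue can be compensated by a stronger formant cue, which is the defining signature of a trading relation.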

Hard Problems in Speech Perception The McGurk effect: McGurk and MacDonald (1976) showed people a video where the audio and the video don't match (think "dubbed movie"): visual /ga/ paired with auditory /ba/ is often heard as /da/. Implication: phoneme perception is an active process influenced by both auditory and visual information.

Motor theory of speech perception A. Liberman (and others; initially proposed in the late 1950s). Direct translation of acoustic speech into articulatorily defined categories. Holds that speech perception and motor control involve linked (or the same) neural processes. The theory held that categorical perception was a direct reflection of articulatory organization: categories with discrete gestures (e.g., consonants) will be perceived categorically; categories with continuous gestures (e.g., vowels) will be perceived continuously. There is a speech perception module that operates independently of general auditory perception.

Frontal slices showing differential activation elicited during lip and tongue movements (Left), syllable articulation including [p] and [t] (Center), and listening to syllables including [p] and [t] (Right) Pulvermüller F et al. PNAS 2006;103: ©2006 by National Academy of Sciences

Motor theory of speech perception Some problems for MT Categorical perception found in non-speech sounds (e.g., music) Categorical perception for speech sounds in non-humans Chinchillas can be trained to show categorical perception of /t/ and /d/ consonant-vowel syllables (Kuhl & Miller, 1975)

Other theories of speech perception Direct Realist Theory (C. Fowler and others) Similar to Motor theory, articulation representations are key, but here they are directly perceived Perceiving speech is part of a more general perception of gestures that involves the motor system General Auditory Approach (e.g., Diehl, Massaro) Do not invoke special mechanisms for speech perception, instead rely on more general mechanisms of audition and perception For nice reviews see: Diehl, Lotto, & Holt (2003) Galantucci, Fowler, Turvey (2006)

Top-down effects on Speech Perception Phoneme restoration effect Sentence context effects

Phoneme restoration effect Listen to a sentence containing a word from which a phoneme was deleted and replaced with another noise (e.g., a cough): "The state governors met with their respective legi*latures convening in the capital city." (* = /s/ deleted and replaced with a cough)

Phoneme restoration effect Typical results: Participants heard the word normally, despite the missing phoneme Usually failed to identify which phoneme was missing Interpretation We can use top-down knowledge to “fill in” the missing information

Phoneme restoration effect Further experiments (Warren and Warren, 1970): what if the missing phoneme was ambiguous? The *eel was on the axle. The *eel was on the shoe. The *eel was on the orange. The *eel was on the table. Results: participants heard the contextually appropriate word normally, despite the missing phoneme.
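One way to sketch how context could fill in the gap: treat the masked segment as a wildcard, collect the lexicon entries consistent with the rest of the word, and let the context word choose among them. The crude phonemic spellings and the context table are illustrative assumptions, not a commitment to either the perceptual or post-perceptual account.

```python
import re

# Crude phonemic spelling -> word, and which word each context favors
LEXICON = {"wil": "wheel", "hil": "heel", "pil": "peel", "mil": "meal"}
CONTEXT = {"axle": "wheel", "shoe": "heel", "orange": "peel", "table": "meal"}

def restore(masked, context_word):
    """masked uses '*' where the phoneme was replaced, e.g. '*il'."""
    pattern = re.compile("^" + masked.replace("*", ".") + "$")
    candidates = [w for form, w in LEXICON.items() if pattern.match(form)]
    preferred = [w for w in candidates if w == CONTEXT.get(context_word)]
    return preferred or candidates

print(restore("*il", "axle"))   # ['wheel']
print(restore("*il", "table"))  # ['meal']
```

The same masked input yields different restorations in different sentence contexts, mirroring Warren and Warren's result.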

Phoneme restoration effect Possible loci of phoneme restoration effects Perceptual loci of effect: Lexical or sentential context influences the way in which the word is initially perceived. Post-perceptual loci of effect: Lexical or sentential context influences decisions about the nature of the missing phoneme information.

Beyond the segment Shillcock (1990): hear a sentence, make a lexical decision to a word that pops up on a computer screen (cross-modal priming). Hear: "The scientist made a new discovery last year." or "The scientist made a novel discovery last year." See: NUDIST. Lexical decision is faster after "new discovery": NUDIST gets primed by a segmentation error, even though listeners give no conscious report of hearing "nudist".
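The segmentation error can be sketched by letting every point in a continuous phoneme string open a new cohort. The phonemic spellings and the two-segment minimum overlap are illustrative assumptions:

```python
# Toy lexicon: crude phonemic spelling -> word (illustrative)
LEXICON = {"nu": "new", "nudist": "nudist", "diskaveri": "discovery"}

def ever_activated(phoneme_string, min_overlap=2):
    """Words whose phonemic form begins with some chunk of the input,
    starting at any position -- candidates momentarily opened because
    the signal carries no reliable word boundaries."""
    hits = set()
    n = len(phoneme_string)
    for start in range(n):
        for end in range(start + min_overlap, n + 1):
            chunk = phoneme_string[start:end]
            for form, word in LEXICON.items():
                if form.startswith(chunk):
                    hits.add(word)
    return hits

# "new discovery" as one continuous phoneme string
print(ever_activated("nudiskaveri"))  # "nudist" is transiently opened too
```

Because "nudis…" is a legal beginning of "nudist", the spurious candidate is briefly active alongside the intended words, which is the priming Shillcock observed.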

Beyond the segment Prosody and intonation English: Speech is divided into phrases. Word stress is meaningful in English. Stressed syllables are aligned in a fairly regular rhythm, while unstressed syllables take very little time. Every phrase has a focus. An extended flat or low-rising intonation at the end of a phrase can indicate that a speaker intends to continue to speak. A falling intonation sounds more final.

Beyond the segment Prosodic factors (suprasegmentals): Stress: emphasis on syllables in sentences. Rate: speed of articulation. Intonation: use of pitch to signify different meanings across sentences.

Beyond the segment Stress effects on meaning: "black bird" versus "blackbird". Top-down effects on perception: better anticipation of upcoming segments when a syllable is stressed.

Beyond the segment Rate effects How fast you speak has an impact on the speech sounds Faster talking - shorter vowels, shorter VOT Normalization Taking speed and speaker information into account Rate normalization Speaker normalization