PSY 369: Psycholinguistics Language Comprehension Speech recognition.

Announcements Homeworks: Extended the due date for Homework 5 to Thursday. Posted Homework 6 (due Apr 3); will briefly go over it at the start of class. Cut the number of homeworks from 11 down to 8 (still will drop the lowest grade, so the top 7 count), so the class “total” will be out of 925 points instead of 1000. Hope to have Homeworks 7 & 8 posted soon (one speech error collection, one journal summary); they’ll be for after Exam 3.

Different features than visual word recognition:
Visual word recognition: some parallel input; orthography; letters; clear delineation between words; difficult to learn
Speech perception: serial input; phonetics/phonology; acoustic features; usually no delineation between words; “easy” to learn
(Example: “Where are you going”)

Acoustic features Spectrogram: time on the x-axis; frequency on the y-axis; amplitude (the pressure under which the air is pushed) represented by the darkness of the lines
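Not part of the original slides: a minimal Python sketch of how a spectrogram like the one described above is computed and displayed. The filename "speech.wav" and the analysis window settings are illustrative assumptions.

```python
# Minimal spectrogram sketch (assumes a mono WAV file named "speech.wav").
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs, samples = wavfile.read("speech.wav")              # fs = sampling rate (Hz)
f, t, Sxx = spectrogram(samples.astype(float), fs,
                        nperseg=512, noverlap=384)    # short-time Fourier analysis

# Darker = more energy, matching the "darkness of the lines" on the slide
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), cmap="gray_r")
plt.xlabel("Time (s)")         # x-axis: time
plt.ylabel("Frequency (Hz)")   # y-axis: frequency
plt.show()
```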

Acoustic features [spectrogram with F2, F3, and a burst labeled] Formants: bands of resonant frequencies. Formant transitions: up or down movement of formants. Steady states: flat formant patterns. Bursts: sudden release of air.

Acoustic features [spectrograms of “bit” (VOT ≈ 5 ms) and “pit” (VOT ≈ 40 ms)] Formants: bands of resonant frequencies. Formant transitions: up or down movement of formants. Steady states: flat formant patterns. Bursts: sudden release of air. Voice onset time (VOT): when voicing begins relative to the onset of the phoneme.
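Not from the slides: a toy Python sketch of how a single VOT boundary could separate /b/ from /p/ along a continuum like the bit/pit example above; the 25 ms boundary is an illustrative assumption, not a value from the lecture.

```python
# Toy illustration: labeling stop consonants by voice onset time (VOT).
# The 25 ms boundary is an illustrative assumption, not a measured value.
def label_stop(vot_ms: float, boundary_ms: float = 25.0) -> str:
    """Return 'b' (voiced) for short VOTs, 'p' (voiceless) for long ones."""
    return "b" if vot_ms < boundary_ms else "p"

# A continuum of tokens from a clear /b/ (5 ms) to a clear /p/ (40 ms):
for vot in [5, 10, 15, 20, 25, 30, 35, 40]:
    print(f"VOT = {vot:2d} ms -> perceived as /{label_stop(vot)}/")
```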

The confusion of palatalized labials > dentals & alveolars: [ ba bʲa da ]. What looks similar to the eye will probably seem similar to the ear!

Hard problems: ambiguity in the speech signal. Misheard segmentations sound like other phrases:
“Chest Jew Wade Aim In It” ≈ “Just you wait a minute”
“Delights Haven Dime” ≈ “Daylight Savings Time”
“Canoes He Wad Ice He” ≈ “Can You See What I See?”
“Free Quaintly As Quest Shuns” ≈ “Frequently Asked Questions”
(TV ad)

Hard Problems in Speech Perception: segmentation problem; lack of invariance; linearity (parallel transmission); co-articulation; trading relations.

Hard Problems in Speech Perception. Segmentation problem: unlike visual input, the acoustic input is not physically segmented. Illusion of silence: there are no silent gaps in the waveform, even though we may “hear” some.
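Not from the slides: a small Python sketch of why unsegmented input is hard. Given a tiny hypothetical lexicon (my own toy word list, not from the lecture), the same unsegmented string can be carved into words in more than one way.

```python
# Toy demonstration of the segmentation problem: an unsegmented string
# can often be split into words in more than one way.
LEXICON = {"a", "an", "i", "ice", "nice", "cream", "scream"}

def parses(s: str, lexicon=LEXICON):
    """Return every way of splitting s into a sequence of lexicon words."""
    if not s:
        return [[]]
    results = []
    for i in range(1, len(s) + 1):
        word = s[:i]
        if word in lexicon:
            for rest in parses(s[i:], lexicon):
                results.append([word] + rest)
    return results

print(parses("anicecream"))
# -> [['a', 'nice', 'cream'], ['an', 'ice', 'cream']]
```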

Hard Problems in Speech Perception. Segmentation problem: unlike visual input, the acoustic input is not physically segmented. Here the silence that we see in the acoustics isn’t perceived as a gap in the word.

Hard Problems in Speech Perception. Lack of invariance: one phoneme should map onto one waveform, but this is not the case. The /i/ (‘ee’) sounds in ‘money’ and ‘me’ are different. (“Show me the money”)

Hard Problems in Speech Perception. Lack of invariance: one phoneme should map onto one waveform. Another example: here is the phoneme /d/ followed by different vowels.

Hard Problems in Speech Perception. Lack of invariance: one phoneme should map onto one waveform. And another example: the phrase “Peter buttered the burnt toast” has five /t/ phonemes, but there are not five identical sweeps in the spectrogram. There aren’t invariant cues for phonetic segments, although the search continues.

Hard Problems in Speech Perception. Linearity (parallel transmission): acoustic features often spread themselves out over other sounds. In the waveform of “Show me the money”, where does “show” start and “money” end?

Hard Problems in Speech Perception. Co-articulation: the influence of the articulation (pronunciation) of one phoneme on that of another phoneme; essentially, producing more than one speech sound at once. Each sound is partially shaped by the sounds before and after it: keel vs. kill vs. cool, /kil/ vs. /kIl/ vs. /kul/ (IPA characters); the place of articulation and rounding on the /k/ differ a lot. We get different versions of “the same sound” in different contexts and from different speakers. This is what allows us to talk so fast, and it may be helpful because it allows some parallel transmission of information (possibly helping predict what’s coming next).

Hard Problems in Speech Perception. Trading relations: most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives the distinction: voice onset time (VOT), energy in the burst, onset frequency of the first formant, placement in the syllable. E.g., slit vs. split: the /p/ relies on a silent interval and a rising formant, and different mixtures of these cues can result in the same perception. Perception must establish some “trade-off” between the different cues.
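Not from the slides: a toy Python sketch of cue trading for the slit/split example above. The two cues follow the slide (silence duration and a rising formant), but the weights and decision threshold are made-up illustrative numbers.

```python
# Toy cue-trading sketch for slit vs. split.
# Two cues contribute to perceiving the /p/: silence duration and a rising F1.
# The weights and the decision threshold are illustrative assumptions only.
def perceive(silence_ms: float, f1_rise_hz: float) -> str:
    evidence = 0.02 * silence_ms + 0.005 * f1_rise_hz   # weighted combination of cues
    return "split" if evidence > 1.0 else "slit"

# Different mixtures of the two cues can yield the same percept:
print(perceive(silence_ms=60, f1_rise_hz=0))    # long silence, no F1 rise  -> split
print(perceive(silence_ms=30, f1_rise_hz=100))  # shorter silence + F1 rise -> split
print(perceive(silence_ms=10, f1_rise_hz=0))    # weak cues                 -> slit
```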

Hard Problems in Speech Perception. Many factors may be important: acoustic information, visual information, prosodic information, semantic context, syntactic structure. [Diagram: bottom-up and top-down routes to understanding]

Using visual information: the McGurk effect. McGurk and MacDonald (1976) showed people a video where the audio and the video don’t match (think “dubbed movie”): visual /ga/ paired with auditory /ba/ is often heard as /da/. Implications: phoneme perception is an active process, influenced by both auditory and visual information.

Beyond the segment: prosodic factors (suprasegmentals). In English: speech is divided into phrases, and every phrase has a focus. Word stress is meaningful in English. Stressed syllables are aligned in a fairly regular rhythm, while unstressed syllables take very little time. An extended flat or low-rising intonation at the end of a phrase can indicate that the speaker intends to continue speaking; a falling intonation sounds more final.

Beyond the segment: prosodic factors (suprasegmentals).
Stress: emphasis on syllables in sentences; affects meaning (“black bird” versus “blackbird”); top-down effects on perception (better anticipation of upcoming segments when a syllable is stressed).
Rate: speed of articulation; faster talking means shorter vowels and shorter VOTs; normalization: taking the speaker’s rate into account.
Intonation: use of pitch to signal different meanings across sentences.

Top-down effects on speech perception: sentence context effects (excised speech), phoneme restoration effect. [Diagram: bottom-up and top-down routes to understanding]

Excised speech: syntactic and semantic cues can help. Pollack & Pickett (1964). Task: recorded conversations and excised individual words, then presented the words to listeners for identification, either within context or out of context. Results: words out of context were recognized only 47% of the time; identification was greatly improved with context. Suggests that the perceived clarity of speech reflects processing (top-down as well as bottom-up).

Semantic influences. Garnes & Bond (1976): 16 tokens spanning the continuum bait-date-gate (/b/ /d/ /g/), so some were clear examples (unambiguous) and others were in between (ambiguous). Three carrier sentences (context): “Here’s the fishing gear and the ______.” “Check the time and the _______.” “Paint the fence and the _______.” Results: if the token was unambiguous, listeners reported what they heard even when it made the sentence semantically implausible (“Paint the fence and the bait.”); if it was ambiguous (near a phoneme boundary), semantic context effects appeared and listeners interpreted the word as contextually appropriate.

Phoneme restoration effect. Warren (1970). Task: listen to a sentence containing a word from which a phoneme was deleted and replaced with another noise (e.g., a cough): “The state governors met with their respective legi*latures convening in the capital city.” (* = /s/ deleted and replaced with a cough). Results: participants heard the word normally despite the missing phoneme, and usually failed to identify which phoneme was missing. Interpretation: we can use top-down knowledge to “fill in” the missing information.

Phoneme restoration effect. Warren and Warren (1970): what if the missing phoneme is ambiguous? “The *eel was on the axle.” “The *eel was on the shoe.” “The *eel was on the orange.” “The *eel was on the table.” Results: participants heard the contextually appropriate word (wheel, heel, peel, meal) normally, despite the missing phoneme.

Phoneme restoration effect. Possible loci of phoneme restoration effects. Perceptual locus: lexical or sentential context influences the way in which the word is initially perceived. Post-perceptual locus: lexical or sentential context influences decisions about the nature of the missing phoneme information. Samuel (2001) attempts to tease these apart.

Cross-modal priming. Shillcock (1990): hear a sentence, then make a lexical decision to a word that pops up on a computer screen (cross-modal priming). Hear either “The scientist made a new discovery last year.” or “The scientist made a novel discovery last year.” and see: NUDIST. Lexical decisions to NUDIST are faster after the “new discovery” sentence: NUDIST gets primed by a segmentation error, although listeners give no conscious report of hearing “nudist”.

Theories of speech perception: Motor Theory; Direct Realist Theory; General Auditory Approach; Cohort; TRACE Model.

Motor theory of speech perception. A. Liberman (initially proposed in the late 1950s; revised in Liberman & Mattingly, 1985). Direct translation of acoustic speech into articulatory categories. Holds that speech perception and motor control involve linked (or the same) neural processes. The theory held that categorical perception was a direct reflection of articulatory organization: categories with discrete gestures (e.g., consonants) will be perceived categorically, while categories with continuous gestures (e.g., vowels) will be perceived continuously. There is a speech perception module that operates independently of general auditory perception.

Speech perception & the brain. [Figure: frontal slices showing differential activation elicited during lip and tongue movements (left), syllable articulation including [p] and [t] (center), and listening to syllables including [p] and [t] (right). Pulvermüller F et al., PNAS 2006;103. ©2006 by National Academy of Sciences]

Motor theory of speech perception. Some problems for motor theory: categorical perception is found for non-speech sounds (e.g., music); categorical perception for speech sounds is found in non-humans: chinchillas can be trained to show categorical perception of /t/ and /d/ consonant-vowel syllables (Kuhl & Miller, 1975).

Other theories of speech perception. Direct Realist Theory (C. Fowler and others): similar to Motor Theory in that articulatory representations are key, but here they are directly perceived (related to Gibson’s perceptual theory); perceiving speech is part of a more general perception of gestures that involves the motor system. General Auditory Approach (e.g., Diehl, Massaro): does not invoke special mechanisms for speech perception, instead relying on more general mechanisms of audition and perception. For nice reviews see Diehl, Lotto, & Holt (2003) and Galantucci, Fowler, & Turvey (2006).

Other theories of spoken word recognition. Cohort Model (Marslen-Wilson & Welsh, 1978; discussed last time): 1) the acoustic information at the beginning of a word activates a “cohort” of possible words; 2) syntax and semantics influence the selection of the target word from the cohort. TRACE Model (Elman and McClelland, 1984, 1986): a connectionist, parallel distributed model in which processing occurs through excitatory and inhibitory connections between processing units called nodes; three levels of nodes (features, phonemes, and words), all highly interconnected.
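Not from the slides: a minimal Python sketch of the Cohort Model's first stage, using a tiny hypothetical lexicon with letters standing in for phonemes. Each incoming segment narrows the cohort of candidate words; the syntactic/semantic selection stage and TRACE's interactive activation are not modeled here.

```python
# Minimal Cohort-model sketch: word-initial input activates a cohort of
# candidates, and each new segment narrows it. Tiny hypothetical lexicon;
# letters stand in for phonemes for simplicity.
LEXICON = ["trespass", "tress", "trek", "trellis", "trust", "candle"]

def cohort(heard_so_far: str, lexicon=LEXICON):
    """Return all words still consistent with the input heard so far."""
    return [w for w in lexicon if w.startswith(heard_so_far)]

for prefix in ["t", "tr", "tre", "tres", "tresp"]:
    print(prefix, "->", cohort(prefix))
# "t"     -> every word beginning with /t/
# "tres"  -> ['trespass', 'tress']
# "tresp" -> ['trespass']   (only one candidate remains)
```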