Speech Science XII Speech Perception (acoustic cues) Version 2007-8.


Topics
- Psychoacoustics
- Psychophonetics: acoustic cues

Reading: BHR, chap. 6; (5th ed.) chaps. 9/10; 201 ff. P.-M., , first part. pp (2nd ed.) (1st ed.)

Psychoacoustics 1
Psychoacoustics investigates the relationship between basic (acoustic) signal properties and basic auditory impressions:
- How loud something sounds.
- How high- or low-pitched something sounds.
- How long something sounds.
- What the timbre (quality) of a sound is.
The questions asked are:
- Can the signal be heard? (signal strength)
- Can differences between signals be heard? (for all signal properties)

Psychoacoustics 2
Important: psychoacoustics relates the objective, measurable signal to subjective impressions. These are two different “worlds”.
The simplest “model” of psychoacoustic perception would be a linear relationship:
- A change in a signal parameter always produces an equivalent change in the auditory impression.
This is not the case, which makes psychoacoustics very complex. Some of these non-linearities have direct implications for phonetic understanding.

A non-linear relationship: loudness
[Figure: signal strength inside the ear plotted against signal strength outside the ear]

The reason for non-linear loudness
[Figure: resonance characteristics of the outer ear]

Non-linearity above threshold
Loudness level in phons equals the level in dB of an equally loud 1 kHz tone. So, e.g., 80 phons = 80 dB at 1 kHz, but approx. 100 dB at 50 Hz and 70 dB at 3.5 kHz.
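To make the decibel arithmetic behind these figures concrete, here is a minimal sketch. It uses only the three approximate 80-phon values quoted above. The dictionary and function names are hypothetical. The two conversions are the standard rules: 20·log10 for pressure ratios and 10·log10 for intensity ratios.

```python
# Approximate points on the 80-phon equal-loudness contour, as quoted on the slide:
# frequency (Hz) -> SPL (dB) judged as loud as an 80 dB tone at 1 kHz.
CONTOUR_80_PHON = {1000: 80.0, 50: 100.0, 3500: 70.0}


def level_difference_db(freq_hz, contour=CONTOUR_80_PHON, ref_hz=1000):
    """SPL needed at freq_hz minus SPL needed at the 1 kHz reference (dB)."""
    return contour[freq_hz] - contour[ref_hz]


def pressure_ratio(db):
    """Sound-pressure ratio for a level difference in dB (20 log10 rule)."""
    return 10 ** (db / 20)


def intensity_ratio(db):
    """Intensity (power) ratio for a level difference in dB (10 log10 rule)."""
    return 10 ** (db / 10)


for f in (50, 3500):
    d = level_difference_db(f)
    print(f"{f} Hz: {d:+.0f} dB, pressure x{pressure_ratio(d):.2f}, "
          f"intensity x{intensity_ratio(d):.2f}")
```

At 50 Hz the 80-phon contour needs 20 dB more than at 1 kHz. That is ten times the sound pressure and a hundred times the intensity for the same loudness. At 3.5 kHz it needs 10 dB less.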

Also, sounds mask one another
If noise is present, a tone has to be stronger to be heard (it has a higher audibility threshold). The closer the tone is in frequency to the centre frequency of the noise, the stronger it has to be to be heard.
[Figure: intensity of the masked pure-tone stimulus (dB) against intensity of the masking noise]

“Critical bands” (Barks and ERBs)
Wide-band noise with a gap still masks a tone in the middle of the gap … until the gap reaches a critical width. The tone is then heard at the same threshold as if there were no noise: the noise no longer interferes with the part of the hearing mechanism dealing with the tone. These “critical bands” are narrow at low frequencies and broader at higher frequencies.
[Figure panels: strong masking; no masking]
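The slide names Barks and ERBs but gives no formulas. The sketch below uses two widely cited published approximations, both brought in here as assumptions rather than taken from these slides: Glasberg & Moore's ERB formula and Zwicker & Terhardt's Bark formula. It shows that the auditory filter bandwidth grows with frequency.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore 1990 approximation)."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)


def bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt 1980 approximation)."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)


for f in (100, 500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz  ERB = {erb_hz(f):6.1f} Hz  Bark = {bark(f):5.2f}")
```

The ERB at 1 kHz comes out at about 133 Hz and keeps widening towards higher frequencies. This matches the narrow-to-broad pattern described above.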

Non-linearity of loudness with duration
Above approx. 300 ms (the exact duration is not certain), the perceived loudness of a sound is determined by signal strength (and frequency), independent of its duration. Below this duration, a shorter sound is heard as less loud than a longer sound of equal intensity. I.e., it is as if the energy is integrated over time, so that a shorter sound has less energy than a longer one.
Phonetic importance? Short (unstressed) syllables are perceptually less prominent than longer (stressed) syllables.
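The energy-integration idea can be put into numbers with a deliberately crude sketch. It assumes perfect integration up to a hard limit of 300 ms, and the slide itself notes that this value is uncertain. The result is expressed as a level change in dB relative to a long sound of equal intensity. This is an illustration, not a validated loudness model, and the function name is hypothetical.

```python
import math

# Approximate integration limit from the slide; the exact value is not certain.
INTEGRATION_TIME_MS = 300.0


def loudness_level_change_db(duration_ms, t_int=INTEGRATION_TIME_MS):
    """Level change (dB) of a short sound relative to a long one of equal intensity,
    assuming perfect energy integration up to t_int (a simplification)."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    effective = min(duration_ms, t_int)  # beyond t_int, duration no longer matters
    return 10 * math.log10(effective / t_int)


for d in (30, 100, 300, 600):
    print(f"{d} ms: {loudness_level_change_db(d):+.1f} dB")
```

Under this assumption, a 30 ms sound is effectively 10 dB weaker than a 300 ms sound of the same intensity. Beyond the limit, a longer duration makes no further difference.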

„Psychophonetics“
Used here as a term to parallel “psychoacoustics”. In our definition, psychophonetics is the study of the relationship between the acoustic speech signal and functional aspects of speech, e.g., speech sounds, (stressed/unstressed) syllables, tonal accents, junctural phenomena etc.
The experimental procedure typically requires changing the analytic properties of the acoustic speech signal in a controlled manner and recording the perceptual effect. The properties changed are those of acoustic analysis: duration, intensity, fundamental frequency and spectral structure.
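The procedure described above follows three steps: vary one property in controlled steps, present the stimuli, and record the perceptual effect. The sketch below covers the bookkeeping side of that procedure. It assumes equally spaced steps, a shuffled presentation order, and identification responses summarised as proportions per step. All names and the example duration continuum are hypothetical, and the actual signal manipulation is not shown.

```python
import random
from collections import defaultdict


def make_continuum(start, end, steps):
    """Equally spaced values of one acoustic parameter, endpoints included."""
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    width = (end - start) / (steps - 1)
    return [start + i * width for i in range(steps)]


def presentation_order(values, repetitions, seed=None):
    """Every continuum value repeated `repetitions` times, shuffled."""
    trials = [v for v in values for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials


def identification_curve(responses, target):
    """Proportion of `target` labels per stimulus value.

    `responses` is a list of (stimulus_value, listener_label) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [target hits, total trials]
    for value, label in responses:
        counts[value][1] += 1
        if label == target:
            counts[value][0] += 1
    return {v: hits / total for v, (hits, total) in sorted(counts.items())}


# Hypothetical setup: a duration continuum from 50 to 200 ms in 7 steps,
# each step presented 10 times in random order.
continuum = make_continuum(50, 200, 7)
trials = presentation_order(continuum, repetitions=10, seed=1)
```

Shuffling with a fixed seed keeps the order random for the listener while leaving it reproducible for the experimenter.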

„Acoustic Cues“
This term was coined in the 1950s, when synthesis and manipulation of the acoustic speech signal were starting (origin: Haskins Laboratories, New York, USA). The „cues“ are those acoustic properties that can be shown to affect the perception of a speech sound. So we have „acoustic cues“ for vowels and consonants, and within these categories, e.g., for voicing, manner and place of articulation in consonants, and for degree of opening, place, rounding etc. in vowels.

Acoustic cues – vowels 1
Cues: formants 1 and 2 (to a first approximation) … and the evidence from formant synthesis:
- rounded vowels: lower F2
- front vowels: higher F2
- open vowels: higher F1
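As a rough illustration of how F1 and F2 alone can separate vowels, here is a nearest-prototype sketch in the F1/F2 plane. The prototype values are approximate, illustrative figures for an adult male voice and are assumptions, not data from these slides. Plain Euclidean distance in Hz is also a simplification: it lets F2 differences dominate.

```python
import math

# Rough, illustrative (F1, F2) values in Hz for an adult male voice: assumptions.
PROTOTYPES = {
    "i": (270, 2290),  # close front unrounded: low F1, high F2
    "u": (300, 870),   # close back rounded: low F1, low F2
    "a": (730, 1090),  # open: high F1
}


def classify_vowel(f1, f2, prototypes=PROTOTYPES):
    """Return the prototype vowel nearest to (f1, f2) in the F1/F2 plane."""
    return min(prototypes, key=lambda v: math.dist((f1, f2), prototypes[v]))


print(classify_vowel(280, 2200), classify_vowel(700, 1150), classify_vowel(320, 900))
```

A fuller model would typically use a perceptual frequency scale and more formants. This sketch only shows that the two-formant pattern already carries most of the contrast.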

Acoustic cues - vowels 2
While monophthongs have a steady-state formant structure, diphthongs, e.g. [aɪ, aʊ, ɔɪ], and (vowel-glide) approximants, e.g. [j, w, ɥ], have changing formants as a „cue“ to their identity.
[aɪ, aʊ, ɔɪ] have a more or less fixed formant pattern, determined by the identity of the two vocalic elements that define them.
[j, w, ɥ] have a defined starting point, but the degree of formant change is determined by the following vowel. The starting point has a (slightly more damped) formant structure similar to that of the related vowel: [j] ~ [i]; [w] ~ [u]; [ɥ] ~ [y] (see acoustics slides).
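The point that the glide's starting point is fixed while the amount of formant change depends on the following vowel can be sketched with a simple F2 trajectory. Straight-line interpolation is an assumption, since real transitions are curved. The F2 values are likewise illustrative assumptions, not measurements.

```python
def f2_track(start_hz, target_hz, n_points=5):
    """F2 values from a glide's starting point to the following vowel's target.

    Linear interpolation is a simplifying assumption; real transitions are curved.
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    step = (target_hz - start_hz) / (n_points - 1)
    return [start_hz + i * step for i in range(n_points)]


# Assumed illustrative values (Hz): [j] starts near the F2 of [i].
J_START_F2 = 2290
FOLLOWING_VOWEL_F2 = {"i": 2290, "a": 1090, "u": 870}

for vowel, f2 in FOLLOWING_VOWEL_F2.items():
    track = f2_track(J_START_F2, f2)
    print(f"[j{vowel}]: F2 changes by {track[-1] - track[0]:+.0f} Hz")
```

The same [j] onset yields almost no F2 movement before [i] but a large fall before [a] or [u].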

Acoustic cues – plosives
Plosives have a temporally complex set of acoustic cues resulting from (i) the closing movement, (ii) the closure phase and (iii) the release of the closure.
The closure is a period with no energy (voiceless stops) or with a weak low-frequency periodic signal (voicing in the closure). This introduces a perceptible interruption.
The release burst is the result of turbulence as air escapes under the intra-oral pressure built up during the closure. It may be relatively weak (in voiced stops) or strong (in voiceless stops). The different spectral properties of the burst noise signal the different places of articulation.
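The closure interruption can be located in a signal by looking for a stretch of low short-time energy. The sketch below assumes non-overlapping 10 ms frames and a fixed RMS threshold, both arbitrary choices. It runs on a synthetic, schematic [VtV] signal rather than real speech, and all names are hypothetical.

```python
import math
import random


def frame_rms(signal, frame_len):
    """RMS amplitude of consecutive non-overlapping frames."""
    return [math.sqrt(sum(x * x for x in signal[i:i + frame_len]) / frame_len)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def find_closure(signal, frame_len, threshold):
    """(first, last) index of the longest run of frames whose RMS is below threshold,
    or None if no frame is below threshold."""
    best, start = None, None
    rms = frame_rms(signal, frame_len)
    for i, r in enumerate(rms + [math.inf]):  # sentinel ends a run at the signal end
        if r < threshold:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best


# Synthetic, schematic [VtV] at 10 kHz: 100 ms periodic "vowel", 80 ms silent
# closure, 10 ms noise burst, 100 ms "vowel". All values are assumptions.
FS = 10_000
rng = random.Random(0)
vowel = [0.5 * math.sin(2 * math.pi * 120 * n / FS) for n in range(1000)]
closure = [0.0] * 800
burst = [rng.uniform(-0.8, 0.8) for _ in range(100)]
signal = vowel + closure + burst + vowel

print(find_closure(signal, frame_len=100, threshold=0.05))  # 100-sample (10 ms) frames
```

With these settings the closure is found at frames 10 to 17, i.e. from 100 to 180 ms. A weak voice bar in the closure would still fall below the threshold as long as its amplitude stays low. Where that limit lies depends entirely on the chosen threshold.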

Release bursts and vowel quality

Vowel formant transitions as consonant cues
Formant transitions (changing formant values in the vowels preceding and following the stop consonant) reflect the articulator movement towards and away from the closure. The F2 transition is a cue to the consonantal place of articulation; F1 just signals the opening and closing movement. The place of the stop determines the F2 value from which, or towards which, the transition moves (called the locus). But the actual shape of the transition is determined by the vowel (as it is with vowel glides).

Locus frequencies, e.g. [d]
[Figure: formant transitions for [d] before different vowels; annotation: F1 rise = opening movement]

What sort of transitions for which place?
The previous slide showed that the locus for [d] (and, logically, for [t, n, l, s, z]) is fairly constant. Its value (for the average adult male vocal tract) is about 1800 Hz.
For labial consonants, the vowel can be formed independently of the consonant closure (the tongue is free to move). Both F2 and F1 therefore just reflect the opening and closing of the jaw and lips, and the “locus” is always low.
For velar consonants, the consonant closure is very dependent on the vowel (both use the tongue dorsum). The locus is higher than for alveolars before both front and back vowels, but it is lower before back vowels than before front vowels. F2 and F3 transitions often converge with velars.
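A fixed alveolar locus means the direction of the F2 transition flips depending on whether the vowel's F2 lies above or below about 1800 Hz. The sketch below treats the locus as the transition's starting point, which is a simplification. The vowel F2 targets are illustrative assumptions, not values from the slides.

```python
# Alveolar F2 locus for the average adult male vocal tract, from the slide above.
D_LOCUS_HZ = 1800

# Illustrative vowel F2 targets (Hz): assumed values, not taken from the slides.
VOWEL_F2_HZ = {"i": 2290, "a": 1090, "u": 870}


def transition_direction(locus_hz, vowel_f2_hz):
    """Direction of a CV F2 transition moving from the locus to the vowel target."""
    if vowel_f2_hz > locus_hz:
        return "rising"
    if vowel_f2_hz < locus_hz:
        return "falling"
    return "flat"


for vowel, f2 in VOWEL_F2_HZ.items():
    print(f"[d{vowel}]: {transition_direction(D_LOCUS_HZ, f2)} F2 transition "
          f"({f2 - D_LOCUS_HZ:+d} Hz)")
```

So [di] shows a rising F2 transition, while [da] and [du] show falling ones, even though the place of articulation is the same.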

The importance of timing as a cue to the „voicing“ distinction
The temporal differences shown here signal the difference between „weak“ and „strong“ plosives, whether or not there is voicing in the closure. It is therefore often claimed that the labels “fortis-lenis” describe the distinction better than “voiced-voiceless”.
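The figure this slide refers to has not survived. A common way to quantify such a timing difference is voice onset time (VOT): the interval from the release to the onset of voicing, negative when voicing starts during the closure. The sketch below assumes a single category boundary of 25 ms, which is an illustrative value only, since real boundaries differ between languages and contexts. It shows that closure voicing and short-lag voicing both end up in the same "lenis" category.

```python
# Assumed fortis/lenis category boundary (ms); illustrative, language-specific in reality.
BOUNDARY_MS = 25.0


def classify_stop(vot_ms, boundary_ms=BOUNDARY_MS):
    """Label a stop 'fortis' or 'lenis' from its voice onset time in ms.

    Negative VOT means voicing starts before the release (closure voicing).
    """
    return "fortis" if vot_ms >= boundary_ms else "lenis"


for vot in (-80, 10, 60):
    print(f"VOT {vot:+d} ms -> {classify_stop(vot)}")
```

This is one reason a timing-based "fortis-lenis" label can fit better than "voiced-voiceless": the category does not depend on whether closure voicing is present.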

Acoustic cues - fricatives
Fricative identity is determined by the spectral distribution of the energy (see also acoustics slides).
[Figure: spectra of [ð] [θ] [v] [f] [ʒ] [ʃ] [z] [s]]
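One common way to summarise a fricative's spectral distribution in a single number is the spectral centroid (first spectral moment). That measure is swapped in here as an illustration, since the slide does not name a measure. The spectra below are schematic and invented, and a single number cannot capture everything about spectral shape.

```python
def spectral_centroid(freqs_hz, weights):
    """Weighted mean frequency (first spectral moment) of a spectrum."""
    if len(freqs_hz) != len(weights):
        raise ValueError("freqs_hz and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


# Schematic spectra on a 1-8 kHz grid: energy weights are invented for illustration.
FREQS = [1000 * k for k in range(1, 9)]
S_LIKE = [0, 0, 0, 1, 2, 4, 8, 8]   # energy concentrated at high frequencies
SH_LIKE = [0, 1, 4, 8, 4, 2, 1, 0]  # energy peak lower down

print(round(spectral_centroid(FREQS, S_LIKE)), round(spectral_centroid(FREQS, SH_LIKE)))
```

The weights can be magnitudes or power values. Which one is used changes the numbers, so it should be stated when such measures are reported.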

Summary of cues - Manner

Summary of cues - Place

Summary of cues: fortis-lenis
[Figure label: voice bar]