Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Björn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto.



Two observations: Sound systems of natural languages underexploit the sound producing capabilities of humans. The sounds that are used in natural languages vary in frequency of occurrence.  /a/, /i/, /u/ are common; /š/, /õ/ are rare.  /p/, /t/ are common; />/, /q/ are rare.

Why are certain speech sounds favored? Possibilities: They are easy to hear (i.e., to distinguish from other sounds). They are easy to produce. They are easy to learn.

The role of auditory distinctiveness in the design of vowel inventories Liljencrants & Lindblom (1972) Diehl, Lindblom, & Creeger (2002, 2003)

Liljencrants & Lindblom (1972) Possible vowel sound: Any vowel-like output of a computational model of the human vocal tract (Lindblom & Sundberg, 1971). Auditory distance: Euclidean distance between any two vowel sounds i and j in a space defined by the frequencies of the first several formants: D_ij = ((ΔM1)² + (ΔM′2)²)^½. Selection criterion: For any given inventory size, select those vowels whose pairwise distances D_ij are maximal.
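As a toy sketch of the selection criterion (not the authors' actual code), candidate vowels can be treated as points (M1, M′2), with the inventory chosen to minimize the sum of inverse squared distances, one common formalization of maximal dispersion; the random-search strategy here is an assumption for illustration:

```python
import itertools
import math
import random

def distance(v1, v2):
    # Euclidean distance in the (M1, M'2) formant space
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])

def dispersion_cost(system):
    # Sum of inverse squared pairwise distances: lower cost = more dispersed
    return sum(1.0 / distance(a, b) ** 2
               for a, b in itertools.combinations(system, 2))

def best_system(candidates, k, trials=2000, seed=0):
    # Random search over k-vowel subsets for the lowest-cost (most dispersed) system
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        system = rng.sample(candidates, k)
        cost = dispersion_cost(system)
        if cost < best_cost:
            best, best_cost = system, cost
    return best
```

For small candidate sets the search reliably picks the peripheral points, mirroring the tendency of dispersion models to favor vowels at the edges of the space.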

Predicted vowel systems (Liljencrants & Lindblom, 1972)

A problem: Too many high vowels

These simulations were unrealistic in at least two ways: Acoustic distance (based on formant frequencies) is probably not a good proxy for auditory distance. Vowel sounds do not naturally occur in conditions of total quiet.

Improving the realism of the simulations (Diehl, Lindblom, and Creeger, 2002) Define a notion of ‘auditory distance’ based on plausible auditory representations of vowel sounds. Model vowel systems as they would have emerged under natural conditions of background noise.

From acoustic to auditory representations [Diagram: input acoustic spectrum (Hz) → Bark transform → auditory filtering → output auditory spectrum]
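A minimal sketch of the Hz-to-Bark step, using Traunmüller's (1990) approximation of the Bark scale; the slides do not specify which Bark formula the simulations used, so this particular equation is an assumption:

```python
def hz_to_bark(f_hz):
    # Approximate critical-band rate (Bark) for a frequency in Hz,
    # via Traunmuller's (1990) formula (assumed; the original model
    # may have used a different Bark transform).
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53
```

For example, a 1000 Hz component maps to roughly 8.5 Bark, and the mapping compresses higher frequencies relative to lower ones.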

Computing distances among auditory spectra
1. At each point along the Bark dimension, calculate the difference in Phons/Bark between any vowel pair.
2. Square these differences.
3. Sum the squares.
4. Take the square root of the sum.
This is a measure of the Euclidean auditory distance between two vowels.
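The four steps above amount to a Euclidean norm over the sampled auditory spectra; a minimal sketch, assuming both spectra are sampled at the same Bark points:

```python
import math

def auditory_distance(spec_a, spec_b):
    # spec_a, spec_b: loudness values (Phons/Bark) sampled at the same Bark points
    diffs = [a - b for a, b in zip(spec_a, spec_b)]  # step 1: differences
    return math.sqrt(sum(d * d for d in diffs))      # steps 2-4: square, sum, root
```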

Effects of the auditory transform [Figure: auditory-based system vs. formant-based system]

Effects of the auditory transform The problem of excessive high vowels is reduced—but not eliminated.

Effects of adding background noise Hypothesis: Vowel systems have evolved to be perceptually robust even at unfavorable signal/noise ratios.

Method We used noise whose spectral shape mimicked the long-term average for speech (-6 dB/octave). We computed auditory distances among vowels at 8 different S/N ratios, ranging from 10 dB to -7.5 dB. We then averaged these distances to determine the optimal vowel systems.
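The -6 dB/octave noise spectrum and the averaging over S/N conditions can be sketched as follows; the 1 kHz reference level and the exact 2.5 dB spacing of the eight S/N steps are assumptions, since the slides give only the endpoints:

```python
import math

def speech_shaped_noise_level(f_hz, level_at_1k_db=60.0):
    # Noise level falling 6 dB per octave, relative to an (assumed) 1 kHz reference
    return level_at_1k_db - 6.0 * math.log2(f_hz / 1000.0)

def mean_distance_over_snr(dist_at_snr, snrs_db):
    # Average a vowel pair's auditory distance across the S/N conditions
    return sum(dist_at_snr(s) for s in snrs_db) / len(snrs_db)

# 8 S/N ratios from 10 dB down to -7.5 dB (2.5 dB steps assumed)
SNRS_DB = [10.0, 7.5, 5.0, 2.5, 0.0, -2.5, -5.0, -7.5]
```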

Effects of adding background noise (3 vowel system) Quiet Noise

Effects of adding background noise (5 vowel system) Quiet Noise

Effects of adding background noise (7 vowel system) Quiet Noise

Effects of adding background noise (9 vowel system) Quiet Noise

Comparisons with actual vowel inventories The reduction in the number of high vowels (relative to the Liljencrants & Lindblom simulations) yields a much better fit with actual vowel systems. Some fronting/unrounding of the high back vowel /u/ also appears to be common among the world's languages (e.g., Japanese and many other 5-vowel systems, American English).

Why does background noise reduce the number of high vowels? First formant information tends to be more noise-resistant than higher formant information. This warps the auditory-distance space for vowels: the front-back dimension contracts relative to the open-close dimension. This, in turn, leaves less room for high vowels.

More recent modeling (Diehl, Lindblom, and Creeger, 2003) Further improving the realism of the auditory model by incorporating temporal (phase locking) information as well as spectral (excitation pattern) information yields predicted vowel systems that fairly closely match observed systems even without the presence of background noise.

Preferred vowel inventories are reasonably well predicted on the basis of a principle of maximal auditory contrast. What about preferred consonant inventories?

Voice distinctions Many languages distinguish certain consonants (e.g., /b/ vs /p/, /d/ vs /t/) based on differences in voice onset time (VOT): the interval between the release of the consonant closure and the onset of vocal fold vibration (voicing).

Voice categories across languages (Lisker & Abramson 1964)

Why do languages select from these three categories of VOT? One possibility: aerodynamic and biomechanical factors. Another possibility: enhanced auditory discriminability near -20 ms and +20 ms VOT yields robust perceptual distinctions among the three categories. Evidence: human infants, chinchillas, nonspeech analogs of VOT.

Voice onset time and tone onset time [Figure: panels A and B, frequency vs. time (ms), tone onsets ranging from -50 ms to +50 ms]

Discriminability of TOT stimuli

Are TOT categories that are consistent with the natural boundaries more learnable? (Holt, Lotto, and Diehl, JASA, 2004)

Summary of VOT results: Preferred voice categories are more discriminable than other possible voice categories. The results of Holt, Lotto, and Diehl (2004) suggest that they are also more learnable.

Conclusion Cross-language preferences in speech sound systems appear to reflect performance constraints on talkers, listeners, and language learners.

Unsolved problems Measuring articulatory energy costs Weighting contributions of auditory distinctiveness, least effort, and learnability Predicting variability