Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin University of Sussex With thanks to : Rob Hukin John Culling John Bird MRC.

Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin University of Sussex With thanks to : Rob Hukin John Culling John Bird MRC & EPSRC

Something old, something new, something borrowed, background blue.
1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices.
2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages.

Three types of experiment. A difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.

Broadbent & Ladefoged (1957)
PAT-generated sentence “What did you say before that?” (figure labels: F1, F2)
When Fo the same, 125 Hz (either natural or monotone), listeners heard:
- one voice only: 16/18
- in one place: 18/18
When Fo different, 125/135 Hz (monotone), listeners heard:
- two voices: 15/18
- in two places: 12/18

... Harvey Fletcher (1953) was there first! (almost) p. 216 describes an experiment (suggested by Arnold). Speech fuses but polyphonic music sounds weird since different notes are heard at different

B & L Conclusion
Common Fo integrates
- broadband frequency regions of a single voice
- coming simultaneously to different ears
into a single voice heard in one position.

Is a common Fo sufficient for fusion? Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. Sharply-filtered sounds sometimes give impression of two sound sources even with common Fo.

Formant T(f) & abs difference

Dichotic: same Fo
[Diagram: original → PSOLA (Fo → 0%) → LP filter → left ear; original → PSOLA (Fo → 0%) → HP filter → right ear. Apologies to Hideki.]

Dichotic: different Fo
[Diagram: original → PSOLA (Fo → −4%) → LP filter → left ear; original → PSOLA (Fo → +4%) → HP filter → right ear.]

Complementary LP/HP filters Variable bandwidth

Complementary LP/HP filters (dB)

Dichotic Results (female voice) Filter 1 kHz

Dichotic Results (male voice)

[Axis label: −|level difference| between ears (dB)]

Higher filter cut-offs need wider bandwidths Same Fo

Low-frequency overlap cf natural ILDs higher for low frequency sounds

ITD: same Fo
[Diagram: original → PSOLA (Fo → 0%) → LP filter; original → PSOLA (Fo → 0%) → HP filter; left ear, right ear; delay ±571 µs.]

ITD: different Fo
[Diagram: original → PSOLA (Fo → −4%) → LP filter; original → PSOLA (Fo → +4%) → HP filter; left ear, right ear; delay ±571 µs.]
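
The ITD manipulation can be illustrated with a minimal sketch that delays one ear's copy of a signal by a whole number of samples. The 44.1 kHz sample rate, the rounding to whole samples, and the function names are assumptions for illustration; the slides do not say how the ±571 µs delay was implemented.

```python
def itd_to_samples(itd_us: float, sample_rate_hz: int) -> int:
    """Convert an interaural time difference in microseconds to whole samples."""
    return round(itd_us * 1e-6 * sample_rate_hz)


def apply_itd(signal, itd_us, sample_rate_hz):
    """Return (left, right) channels of equal length.

    Convention assumed here: a positive ITD delays the right ear, so the
    sound leads at the left ear; a negative ITD does the opposite.
    """
    n = itd_to_samples(abs(itd_us), sample_rate_hz)
    leading = list(signal) + [0.0] * n   # undelayed copy, zero-padded at the end
    lagging = [0.0] * n + list(signal)   # delayed copy, zero-padded at the start
    if itd_us >= 0:
        return leading, lagging
    return lagging, leading


SAMPLE_RATE_HZ = 44_100  # assumed, not stated in the slides
click = [1.0, 0.0, 0.0]
left, right = apply_itd(click, 571, SAMPLE_RATE_HZ)
print(itd_to_samples(571, SAMPLE_RATE_HZ), left.index(1.0), right.index(1.0))
```

One plausible reading of the diagram is that the low-pass band receives one sign of delay and the high-pass band the other, with the two bands then summed at each ear.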

ITD Results (female voice) ±570 µs ITD

ITD Results (male voice) ±570 µs ITD

Summary
Dichotic: fusion at same Fo? Low-frequency overlap needed. Fusion at different Fo (±4%)? No.
But what about Fo’s ability to separate different voices? (original B & L question)

Three types of experiment. A difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.

∆Fo improves identification of double vowels and sentences.
Double vowels: improvement over by 1 semitone.
Sentences: improve for longer.

Mechanisms of ∆Fo improvement
A. Global: across-formant grouping by Fo (as originally conceived by B & L)
B. Local: better definition of individual formants, especially F1, where harmonics are resolved
At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?

∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)
Target sentence Fo = 140 Hz
Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones
Two sentences (same talker), only voiced consonants (with very few stops)
Task: write down target sentence
Replicates & extends Brokx & Nooteboom
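
As a worked example of the semitone steps on this slide, the sketch below converts each ∆Fo into masker Fo values in Hz using the equal-tempered relation f · 2^(n/12). It also expresses the ±4% Fo shift used in the earlier dichotic slides in semitones. The function names are illustrative, not from the talk.

```python
import math


def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Shift a frequency by a number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def percent_to_semitones(percent: float) -> float:
    """Express a percentage change in frequency as a semitone interval."""
    return 12.0 * math.log2(1.0 + percent / 100.0)


TARGET_F0_HZ = 140.0
STEPS_SEMITONES = [0, 1, 2, 5, 10]

# Masker Fo below and above the 140 Hz target for each step.
masker_f0s = {
    s: (shift_semitones(TARGET_F0_HZ, -s), shift_semitones(TARGET_F0_HZ, s))
    for s in STEPS_SEMITONES
}

for s, (low, high) in masker_f0s.items():
    print(f"+/-{s:>2} semitones: {low:6.1f} Hz / {high:6.1f} Hz")

print(f"+4% is about {percent_to_semitones(4):.2f} semitones")
```

On this scale a 4% shift is roughly two thirds of a semitone, so the ±4% dichotic conditions separate the two bands by a little over one semitone in total.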

Chimeric sentences (Bird & Darwin, Grantham Meeting 1998)
Fo below 800 Hz; Fo above 800 Hz

Paired sentences' Fos (Low Pass / High Pass)
Conditions: Normal; Same Fo in High; Same Fo in Low; Swapped (gives wrong grouping)
[Figure values: 112, 100]

Segregating sentence pairs by Fo
- all the action is in the low-frequency region (<800 Hz)
- no strong evidence of across-formant grouping

Adding Fo-swapped: inappropriate pairing of Fo only detrimental above 4 semitones

Summary of Fo-differences
Across-formant grouping only significant for large Fo differences (> ~4 semitones).
Most of the improvement with small Fo differences happens in the F1 frequency region.

Another caveat for autocorrelation
Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin).
Autocorrelation would pull out completely wrong envelopes.
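
To make the alternating-harmonic stimulus concrete, here is a small sketch of how the component frequencies of the two vowels could be assigned. The parity rule (which vowel takes odd harmonics of which Fo) and the example Fos of 100 and 106 Hz are assumptions for illustration; the slide does not give these details.

```python
def alternating_pair(f0_a, f0_b, n_harmonics):
    """Return component frequencies for two vowels with interleaved Fos.

    Assumed scheme: vowel 1 takes odd-numbered harmonics from f0_a and
    even-numbered harmonics from f0_b; vowel 2 takes the complement.
    """
    ranks = range(1, n_harmonics + 1)
    vowel_1 = [k * (f0_a if k % 2 else f0_b) for k in ranks]
    vowel_2 = [k * (f0_b if k % 2 else f0_a) for k in ranks]
    return vowel_1, vowel_2


# Hypothetical Fos roughly one semitone apart.
v1, v2 = alternating_pair(100, 106, 4)
print(v1)  # [100, 212, 300, 424]
print(v2)  # [106, 200, 318, 400]
```

Under this scheme neither vowel's components share a single Fo, so a model that groups components by common periodicity would assign them to the wrong vowel, which is the caveat the slide raises.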

No simultaneous effect of FM (different Frequency Modulations of Fo)
Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different Frequency Modulations of Fo. Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

Summary of ∆Fo effects in separating competing voices
- Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)...
- ... but not by ∆Fo in only higher freq. region.
- Across-formant consistency of Fo only important at larger ∆Fo
- FM produces no additional separation

Three types of experiment. A difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.

Tracking by Fo
We can also use continuity of an Fo contour to track a particular sound source over time.

CRM task (tracking a sound source) (Bolia et al., 2000)
- 2 simultaneous sentences, each of the form "Ready (Call Sign) go to (Color) (Number) now."
- Same talker (TT); Same Sex (TS); Different sex (TD)
- Target denoted by Call-Sign "Baron"
- 8 Talkers in corpus, 2048 tokens
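
The token count on this slide can be checked by enumerating the sentence frame. The call-sign, colour and number sets below are not on the slide; they are filled in from the published description of the CRM corpus as commonly cited (8 call signs, 4 colours, 8 numbers) and should be treated as an assumption to verify against Bolia et al.

```python
from itertools import product

# Assumed corpus vocabulary (not listed on the slide).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = range(1, 9)
N_TALKERS = 8  # from the slide

# Every sentence one talker can produce with the fixed frame.
sentences = [
    f"Ready {call_sign} go to {color} {number} now."
    for call_sign, color, number in product(CALL_SIGNS, COLORS, NUMBERS)
]

tokens = len(sentences) * N_TALKERS
target_sentences = [s for s in sentences if s.startswith("Ready Baron")]

print(len(sentences), tokens, len(target_sentences))
```

With these sets each talker contributes 256 sentences, giving the 2048 tokens quoted on the slide, and 32 of each talker's sentences carry the target call sign "Baron".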

CRM task (Bolia et al., 2000)
Listeners responded by selecting the appropriate colored digit with the computer mouse.

CRM task results (Brungart et al)

Effect of change in Fo

Fo contours for 2 individuals
Individuals with the most constant Fo contours show the most improvement with ∆Fo.

Effect of change of VT

Effect of joint change of Fo and VT Original: male

Effect of joint change of Fo and VT Original: female

Superadditivity of ∆Fo and ∆VT
[Figure: predicted d' vs. actual d'; male, female]
∆Fo & ∆VT superadditive … and still less than real different-sex talkers
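
The slide does not say how the predicted d' was obtained. A common signal-detection convention, assumed here, is that two independent cues combine as the Euclidean sum of their separate d' values, so an observed joint d' above that sum would count as superadditive. The sketch below uses that rule together with the standard yes/no formula d' = z(hits) − z(false alarms); the numbers in the example are hypothetical, not data from the talk.

```python
import math
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Yes/no sensitivity index: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def predicted_combined(d_f0: float, d_vt: float) -> float:
    """Predicted joint d' for independent cues (assumed Euclidean-sum rule)."""
    return math.hypot(d_f0, d_vt)


def is_superadditive(d_f0: float, d_vt: float, d_joint: float) -> bool:
    """True when the observed joint d' exceeds the independent-cue prediction."""
    return d_joint > predicted_combined(d_f0, d_vt)


# Hypothetical values for illustration only.
d_f0, d_vt, d_joint = 1.0, 1.0, 2.0
print(round(predicted_combined(d_f0, d_vt), 3), is_superadditive(d_f0, d_vt, d_joint))
```

Other prediction rules (for example simple addition of d' values) set a stricter criterion, so the choice of rule should be stated alongside any claim of superadditivity.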

Conclusions
- Same Fo is not a sufficient condition for dichotic fusion of complementarily filtered speech.
- Intelligibility increase for small ∆Fo is confined to the F1 region; across-formant grouping matters only for larger ∆Fo.
- Fo & VT-size are useful for tracking sources across time, and their effects are superadditive.