Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin University of Sussex With thanks to : Rob Hukin John Culling John Bird MRC.

With thanks to: Rob Hukin, John Culling, John Bird; MRC & EPSRC.

1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices.
2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages.
Something old, something new, something borrowed, background blue.

Three types of experiment. A difference in Fo leads to:
1. binaural separation of sound sources
2. an increase in intelligibility
3. the ability to track a sound source over time

Broadbent & Ladefoged (1957): PAT-generated sentence "What did you say before that?", with F1 and F2 presented to different ears.
When Fo was the same (125 Hz, either natural or monotone), listeners heard:
– one voice only: 16/18
– in one place: 18/18
When Fo was different (125/135 Hz, monotone), listeners heard:
– two voices: 15/18
– in two places: 12/18

... Harvey Fletcher (1953) was there first! (almost) Page 216 describes an experiment (suggested by Arnold) with the signal low-pass filtered at 1 kHz in one ear and high-pass filtered at 1 kHz in the other: speech fuses, but polyphonic music sounds weird, since different notes are heard at different ears.

B & L conclusion: a common Fo integrates
– broadband frequency regions of a single voice
– coming simultaneously to different ears
into a single voice heard in one position.

Is a common Fo sufficient for fusion? Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. Sharply filtered sounds sometimes give the impression of two sound sources even with a common Fo.
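To make "broad low-frequency skirt" concrete, here is a minimal sketch of a single digital formant resonator. It assumes the second-order resonator from Klatt's (1980) cascade/parallel synthesizer, not the analog circuitry of PAT, and the formant frequency, bandwidth and sample rate are hypothetical. Because the gain is normalised to 1 at 0 Hz, the response does not fall away below the peak the way a sharp filter would: well below the formant the gain is still at about its 0 Hz level, so a single formant puts energy into low frequencies far from its centre.

```python
import cmath
import math


def resonator_gain_db(f_hz, formant_hz, bandwidth_hz, fs_hz=10000.0):
    """Gain in dB of a second-order digital formant resonator at f_hz.

    Assumed design (Klatt, 1980): y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
    with A chosen so that the gain at 0 Hz is exactly 1 (0 dB).
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * formant_hz * t)
    a = 1.0 - b - c
    # Evaluate H(z) = A / (1 - B z^-1 - C z^-2) on the unit circle.
    z_inv = cmath.exp(-2j * math.pi * f_hz * t)
    h = a / (1.0 - b * z_inv - c * z_inv ** 2)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    # Hypothetical F2-like resonance: 1500 Hz peak, 90 Hz bandwidth.
    for f in (0, 250, 500, 1000, 1500, 3000, 4500):
        print(f"{f:5d} Hz: {resonator_gain_db(f, 1500, 90):6.1f} dB")
```

Printing the gains shows the asymmetry: a gentle skirt down to 0 Hz, and a falling response above the peak.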

[Figure: formant transfer function T(f) and absolute difference]

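Dichotic, same Fo (apologies to Hideki): the original is resynthesized with PSOLA twice, with Fo shifted by 0%; one copy is low-pass (LP) filtered for the left ear and the other high-pass (HP) filtered for the right ear.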

Dichotic, different Fo: the original is resynthesized with PSOLA at Fo −4% (then LP filtered, left ear) and at Fo +4% (then HP filtered, right ear).
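A ±4% PSOLA shift is easier to compare with the semitone steps used later in the talk once it is converted. A minimal sketch, assuming the usual 12-semitones-per-octave conversion (the function names are illustrative): each shift on its own is under 1 semitone, and the two dichotic halves end up about 1.4 semitones apart.

```python
import math


def semitones(ratio):
    """Convert a frequency ratio to semitones (12 per octave)."""
    return 12.0 * math.log2(ratio)


def shifted_fo_separation(shift_fraction):
    """Semitones between a copy shifted down and a copy shifted up by shift_fraction."""
    return semitones((1.0 + shift_fraction) / (1.0 - shift_fraction))


if __name__ == "__main__":
    print(f"+4% alone:  {semitones(1.04):+.2f} semitones")
    print(f"-4% alone:  {semitones(0.96):+.2f} semitones")
    print(f"-4% vs +4%: {shifted_fo_separation(0.04):.2f} semitones apart")
```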

Complementary LP/HP filters with variable bandwidth

[Figure: complementary LP/HP filter responses (dB)]
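The slides do not give the filter design. As a hypothetical sketch, assume "complementary" means power-complementary (the squared LP and HP gains sum to 1 at every frequency) and "bandwidth" means the width, in octaves, of the overlap region around the crossover. Inside that region the gains follow a cosine/sine ramp on a log-frequency axis, so both filters are at −3 dB at the crossover; with zero bandwidth the pair becomes an ideal brick-wall split.

```python
import math


def complementary_gains(f_hz, crossover_hz=1000.0, bandwidth_oct=0.5):
    """Return (lp_gain, hp_gain) for a hypothetical power-complementary filter pair.

    Below crossover / 2**(bw/2) only the LP passes; above crossover * 2**(bw/2)
    only the HP passes; in between, cosine/sine ramps in log frequency keep
    lp**2 + hp**2 == 1.
    """
    lo = crossover_hz / 2 ** (bandwidth_oct / 2)
    hi = crossover_hz * 2 ** (bandwidth_oct / 2)
    if f_hz <= lo:
        return 1.0, 0.0
    if f_hz >= hi:
        return 0.0, 1.0
    # Position across the overlap band: 0 at lo, 1 at hi (log-frequency scale).
    x = math.log2(f_hz / lo) / math.log2(hi / lo)
    theta = x * math.pi / 2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    # Wider overlap bands let more of the spectrum reach both ears.
    for bw in (0.0, 0.5, 1.0):
        lp, hp = complementary_gains(900.0, bandwidth_oct=bw)
        print(f"overlap {bw:.1f} oct: at 900 Hz LP={lp:.2f}, HP={hp:.2f}")
```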

Dichotic results (female voice), filter crossover at 1 kHz [figure]

Dichotic results (male voice) [figure]

[Axis label: level difference between ears (dB)]

Same Fo: higher filter cut-offs need wider bandwidths for fusion.

Low-frequency overlap (cf. natural ILDs, higher for low-frequency sounds)

ITD, same Fo: the original is resynthesized with PSOLA twice (Fo shifted by 0%); the low-pass (LP) and high-pass (HP) filtered copies go to both ears, lateralized by an interaural delay of ±571 µs.

ITD, different Fo: as for same Fo, but PSOLA shifts Fo by −4% for the LP copy and +4% for the HP copy; interaural delay ±571 µs.
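The ITD manipulation can be sketched as follows. The slides give only the ±571 µs delay; the helper names, the whole-sample rounding and the choice of which half leads in which ear are assumptions for illustration. Plain Python lists stand in for the LP- and HP-filtered PSOLA outputs.

```python
def apply_itd(signal, itd_s, fs_hz):
    """Return (left, right) copies of signal with an interaural time difference.

    Positive itd_s delays the right ear (the left ear leads); negative itd_s
    delays the left ear. The delay is rounded to a whole number of samples,
    and both channels are zero-padded to the same length.
    """
    n = round(abs(itd_s) * fs_hz)
    leading = list(signal) + [0.0] * n
    lagging = [0.0] * n + list(signal)
    return (leading, lagging) if itd_s >= 0 else (lagging, leading)


def itd_stimulus(lp_part, hp_part, itd_s, fs_hz):
    """Hypothetical helper: the LP half leads in the left ear, the HP half in the right.

    lp_part and hp_part are assumed to have equal length.
    """
    lp_left, lp_right = apply_itd(lp_part, itd_s, fs_hz)
    hp_left, hp_right = apply_itd(hp_part, -itd_s, fs_hz)
    left = [a + b for a, b in zip(lp_left, hp_left)]
    right = [a + b for a, b in zip(lp_right, hp_right)]
    return left, right


if __name__ == "__main__":
    fs = 44100  # assumed sample rate
    n = round(571e-6 * fs)
    print(f"571 us at {fs} Hz -> {n} samples = {1e6 * n / fs:.1f} us")
```

At an assumed 44.1 kHz sample rate, 571 µs rounds to 25 samples (about 567 µs), so hitting the nominal value exactly would need a fractional-delay filter.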

ITD Results (female voice) ±570 µs ITD

ITD Results (male voice) ±570 µs ITD

Summary
Dichotic: fusion at same Fo? Low-frequency overlap needed. Fusion at different Fo (±4%)? No.
But what about Fo's ability to separate different voices? (the original B & L question)

∆Fo improves identification of both double vowels and sentences: for double vowels the improvement is over by 1 semitone, whereas sentences keep improving over larger ∆Fos.

Mechanisms of ∆Fo improvement
A. Global: across-formant grouping by Fo (as originally conceived by B & L)
B. Local: better definition of individual formants, especially F1, where harmonics are resolved
At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Is this also true for sentences?

∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)
– Target sentence Fo = 140 Hz
– Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones
– Two sentences (same talker) with only voiced consonants (and very few stops)
– Task: write down the target sentence
Replicates and extends Brokx & Nooteboom.

Chimeric sentences (Bird & Darwin, Grantham Meeting 1998)
Fo pairs (Hz): 100/100, 100/106, 100/112, 100/133, 100/178, with separate Fos for the region below 800 Hz and the region above 800 Hz.
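The chimeric Fo pairs are consistent with the same semitone steps (0, 1, 2, 5, 10) used for the masking sentence, applied to a 100 Hz base and rounded to the nearest Hz. The sketch below just does the equal-tempered conversion f = f0 · 2^(n/12); the function and variable names are illustrative.

```python
def fo_at(base_hz, semitones):
    """Fo lying the given number of semitones above (or, if negative, below) base_hz."""
    return base_hz * 2.0 ** (semitones / 12.0)


STEPS = (0, 1, 2, 5, 10)  # semitone steps listed for the masking sentence

# Chimeric-sentence Fos, rounded to the nearest Hz.
chimeric_fos = [round(fo_at(100.0, s)) for s in STEPS]

# Masking-sentence Fos around the 140 Hz target: (below, above) for each step.
masker_fos = {s: (fo_at(140.0, -s), fo_at(140.0, s)) for s in STEPS}

if __name__ == "__main__":
    print(chimeric_fos)  # [100, 106, 112, 133, 178]
    for s, (lo, hi) in masker_fos.items():
        print(f"±{s:2d} st: {lo:6.1f} Hz / {hi:6.1f} Hz")
```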

Paired sentences' Fos (Hz), given as low-pass / high-pass region:
– Normal: 100/100 with 112/112
– Same Fo in high: 100/100 with 112/100
– Same Fo in low: 100/100 with 100/112
– Swapped: 100/112 with 112/100 (gives wrong grouping)
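The Swapped condition "gives wrong grouping" only for a listener who links the low and high regions purely by matching Fo. The toy rule below makes that logic explicit. It is an idealisation for illustration, not a model proposed in the talk; the condition table simply transcribes the Fo pairs above.

```python
# Each condition lists two sentences as (Fo in low-pass region, Fo in high-pass region), in Hz.
CONDITIONS = {
    "Normal": ((100, 100), (112, 112)),
    "Same Fo in high": ((100, 100), (112, 100)),
    "Same Fo in low": ((100, 100), (100, 112)),
    "Swapped": ((100, 112), (112, 100)),
}


def group_by_fo(pair):
    """Idealised across-region grouping: join each low half to high halves with the same Fo.

    Returns "correct" if every low half matches exactly its own sentence's high half,
    "wrong" if every low half matches exactly the other sentence's high half,
    and "ambiguous" otherwise (no match, several matches, or a mixture).
    """
    outcomes = set()
    for i, (low_fo, _) in enumerate(pair):
        matches = [j for j, (_, high_fo) in enumerate(pair) if high_fo == low_fo]
        if matches == [i]:
            outcomes.add("correct")
        elif len(matches) == 1:
            outcomes.add("wrong")
        else:
            outcomes.add("ambiguous")
    return outcomes.pop() if len(outcomes) == 1 else "ambiguous"


if __name__ == "__main__":
    for name, pair in CONDITIONS.items():
        print(f"{name:16s} -> {group_by_fo(pair)}")
```

Under this rule only the Swapped condition actively mis-pairs the halves; the two "same Fo" conditions simply leave the pairing undetermined.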

Segregating sentence pairs by Fo: all the action is in the low-frequency region (<800 Hz); there is no strong evidence of across-formant grouping.

Adding the Fo-swapped condition: inappropriate pairing of Fo across regions is only detrimental above 4 semitones.

Summary of Fo differences
– Across-formant grouping is only significant for large Fo differences (> ~4 semitones).
– Most of the improvement with small Fo differences happens in the F1 frequency region.

Another caveat for autocorrelation: the improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin), yet autocorrelation would pull out completely wrong envelopes.

No simultaneous effect of different frequency modulations (FM) of Fo
Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations of Fo. Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

Summary of ∆Fo effects in separating competing voices
– Intelligibility is increased by a small ∆Fo in the F1 region alone (and harmonic alternation is tolerated)...
– ... but not by a ∆Fo in the higher-frequency region alone.
– Across-formant consistency of Fo is only important at larger ∆Fos.
– FM produces no additional separation.

Tracking by Fo: we can also use the continuity of an Fo contour to track a particular sound source over time.

CRM task (tracking a sound source) (Bolia et al., 2000)
– 2 simultaneous sentences, each of the form "Ready (Call Sign) go to (Color) (Number) now."
– Same talker (TT); same sex (TS); different sex (TD)
– Target denoted by the call sign "Baron"
– 8 talkers in corpus, 2048 tokens
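The 2048-token figure follows from the structure of the corpus. A sketch, assuming the CRM word lists as usually described (8 call signs, 4 colors, the numbers 1 to 8); the lists are not on the slide, and the helper name is hypothetical.

```python
from itertools import product

# CRM word lists as usually described for Bolia et al. (2000); not taken from the slide.
CALL_SIGNS = ("Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger")
COLORS = ("Blue", "Green", "Red", "White")
NUMBERS = tuple(range(1, 9))
N_TALKERS = 8


def crm_sentence(call_sign, color, number):
    """Fill the fixed CRM carrier phrase."""
    return f"Ready {call_sign} go to {color} {number} now."


phrases = [crm_sentence(c, col, n) for c, col, n in product(CALL_SIGNS, COLORS, NUMBERS)]
tokens_per_corpus = len(phrases) * N_TALKERS  # 8 * 4 * 8 = 256 phrases, x 8 talkers

# Target sentences are the ones whose call sign is "Baron".
target_phrases = [p for p in phrases if p.startswith("Ready Baron ")]

if __name__ == "__main__":
    print(len(phrases), tokens_per_corpus, len(target_phrases))  # 256 2048 32
```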

CRM task (Bolia et al., 2000): listeners responded by selecting the appropriate colored digit with the computer mouse.
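With the mouse response the listener reports a color and a digit together. A minimal scoring sketch, assuming a trial counts as correct only when both match the target (a common convention for the CRM task, not stated on the slide); with 4 colors and 8 digits, random guessing then scores 1/32, about 3%. The response data in the usage block are made up.

```python
from fractions import Fraction

N_COLORS = 4   # assumed CRM response set: 4 colors x 8 digits
N_NUMBERS = 8


def is_correct(response, target):
    """Assumed scoring rule: a (color, number) response is correct only if both match."""
    return response == target


def percent_correct(responses, targets):
    """Percentage of trials on which both color and number were reported correctly."""
    hits = sum(is_correct(r, t) for r, t in zip(responses, targets))
    return 100.0 * hits / len(targets)


CHANCE = Fraction(1, N_COLORS * N_NUMBERS)  # guessing both at random

if __name__ == "__main__":
    targets = [("Blue", 3), ("Red", 7), ("White", 1), ("Green", 5)]
    responses = [("Blue", 3), ("Red", 2), ("White", 1), ("Blue", 5)]
    print(percent_correct(responses, targets), float(CHANCE) * 100)
```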

CRM task results (Brungart et al.)

Effect of change in Fo

Fo contours for two individuals: the individuals with the most constant Fo contours show the most improvement with ∆Fo.
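The slide does not say how the constancy of an Fo contour was quantified. One hypothetical measure is the standard deviation of the voiced Fo values in semitones about their geometric mean, sketched below; treating 0 Hz frames as unvoiced is also an assumption, and the two contours in the usage block are invented.

```python
import math
import statistics


def fo_variability_semitones(fo_hz):
    """SD (in semitones) of the voiced frames of an Fo contour.

    Frames with Fo <= 0 are treated as unvoiced and ignored. Values are
    expressed in semitones relative to the contour's geometric mean.
    """
    voiced = [f for f in fo_hz if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    ref = math.exp(statistics.fmean(math.log(f) for f in voiced))
    semitones = [12.0 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


if __name__ == "__main__":
    flat = [118, 120, 122, 0, 121, 119]
    gliding = [100, 110, 125, 0, 140, 150]
    print(f"flat: {fo_variability_semitones(flat):.2f} st, "
          f"gliding: {fo_variability_semitones(gliding):.2f} st")
```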

Effect of change of VT
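The slide does not say how vocal-tract length was manipulated. As background, a common idealisation treats the tract as a uniform tube closed at the glottis and open at the lips, with resonances F_n = (2n − 1)c / 4L. Every formant is then inversely proportional to L, so shortening the tract by some factor raises all formants by the same ratio. A sketch of that arithmetic; the 35,000 cm/s speed of sound and the 17.5 cm length are typical textbook values, not numbers from the talk.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate value for warm, humid air


def tube_formants(length_cm, n_formants=3, c_cm_s=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c_cm_s / (4.0 * length_cm) for n in range(1, n_formants + 1)]


def formant_scale_factor(old_length_cm, new_length_cm):
    """Factor by which every formant moves when the tube length changes."""
    return old_length_cm / new_length_cm


if __name__ == "__main__":
    print(tube_formants(17.5))        # [500.0, 1500.0, 2500.0]
    print(tube_formants(17.5 * 0.9))  # each formant about 11% higher
    print(formant_scale_factor(17.5, 17.5 * 0.9))
```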

Effect of joint change of Fo and VT Original: male

Effect of joint change of Fo and VT Original: female

Superadditivity of ∆Fo and ∆VT
[Figure: actual d' plotted against predicted d' (both axes 0.00 to 1.50), for male and female original voices]
∆Fo and ∆VT are superadditive… and still less effective than real different-sex talkers.
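The slide does not say how d' or the predicted d' were computed. A sketch under two labelled assumptions: d' uses the standard yes/no formula z(H) − z(F), and the "predicted" value for the joint change is the sum of the d' values for ∆Fo alone and ∆VT alone, so superadditivity means the joint d' exceeds that sum. An alternative prediction for independent cues, the root-sum-of-squares, is included for comparison. The numbers in the usage block are hypothetical.

```python
import math
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard yes/no sensitivity: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def additive_prediction(d_fo, d_vt):
    """Assumed prediction for a joint change: the two single-cue d' values add."""
    return d_fo + d_vt


def independent_cues_prediction(d_fo, d_vt):
    """Alternative: optimal combination of two independent cues."""
    return math.hypot(d_fo, d_vt)


def is_superadditive(d_fo, d_vt, d_joint):
    """True if the joint d' exceeds the (assumed) additive prediction."""
    return d_joint > additive_prediction(d_fo, d_vt)


if __name__ == "__main__":
    d_fo, d_vt, d_joint = 0.3, 0.4, 1.0  # hypothetical values
    print(additive_prediction(d_fo, d_vt), independent_cues_prediction(d_fo, d_vt))
    print(is_superadditive(d_fo, d_vt, d_joint))
```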

Conclusions
– The same Fo is not a sufficient condition for dichotic fusion of complementarily filtered speech.
– The intelligibility increase for small ∆Fo is confined to the F1 region; across-formant grouping contributes only for larger ∆Fo.
– Fo and VT size are useful for tracking sources across time, and their effects are superadditive.
