2
Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin, University of Sussex. With thanks to: Rob Hukin, John Culling, John Bird; MRC & EPSRC.
3
1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices; 2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages. Something old, something new, something borrowed, background blue.
4
Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.
6
Broadbent & Ladefoged (1957). PAT-generated sentence “What did you say before that?” (F1, F2). When Fo the same (125 Hz, either natural or monotone), listeners heard: one voice only, 16/18; in one place, 18/18. When Fo different (125/135 Hz, monotone), listeners heard: two voices, 15/18; in two places, 12/18.
7
... Harvey Fletcher (1953) was there first! (almost) p. 216 describes experiment (suggested by Arnold). Speech fuses, but polyphonic music sounds weird since different notes are heard at different ears. LP @ 1 kHz; HP @ 1 kHz.
8
B & L Conclusion: Common Fo integrates broadband frequency regions of a single voice, coming simultaneously to different ears, into a single voice heard in one position.
9
Is a common Fo sufficient for fusion? Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. Sharply filtered sounds sometimes give the impression of two sound sources even with a common Fo.
10
Formant T(f) & abs difference
11
Dichotic: same Fo. Original -> PSOLA (Fo -> 0%) -> LP filter -> left ear; original -> PSOLA (Fo -> 0%) -> HP filter -> right ear. (Apologies to Hideki.)
12
Dichotic: different Fo. Original -> PSOLA (Fo -> -4%) -> LP filter -> left ear; original -> PSOLA (Fo -> +4%) -> HP filter -> right ear.
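The ±4% Fo shifts on this slide can be related to the semitone units used later in the talk. A minimal sketch of the conversion, assuming the usual 12 semitones per octave; the function name is illustrative and not from the slides.

```python
import math


def percent_to_semitones(percent):
    """Convert a percentage change in Fo to semitones (12 per octave)."""
    return 12.0 * math.log2(1.0 + percent / 100.0)


up = percent_to_semitones(4.0)     # band shifted up by 4%
down = percent_to_semitones(-4.0)  # band shifted down by 4%
separation = up - down             # Fo distance between the two bands

print(f"+4%: {up:.2f} st, -4%: {down:.2f} st, apart: {separation:.2f} st")
```

On this reckoning the two bands end up a little under 1.4 semitones apart.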
13
Complementary LP/HP filters Variable bandwidth
14
Complementary LP/HP filters (dB)
15
Dichotic Results (female voice) Filter X-over @ 1 kHz
16
Dichotic Results (male voice) Dichotic
17
-| Level difference | between ears (dB)
18
Higher filter cut-offs need wider bandwidths Same Fo
19
Low-frequency overlap cf natural ILDs higher for low frequency sounds
20
ITD: same Fo. Original -> PSOLA (Fo -> 0%) -> LP filter; original -> PSOLA (Fo -> 0%) -> HP filter. Left ear, right ear: delay ±571 µs.
21
ITD: different Fo. Original -> PSOLA (Fo -> -4%) -> LP filter; original -> PSOLA (Fo -> +4%) -> HP filter. Left ear, right ear: delay ±571 µs.
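For readers who want to reproduce an interaural delay of this size digitally, the sketch below converts the slide's 571 µs to samples and applies a whole-sample delay. The 44.1 kHz sample rate and both helper names are assumptions; the slides do not say how the delay was implemented.

```python
def itd_in_samples(itd_us, sample_rate_hz):
    """Express an interaural time difference (microseconds) in samples."""
    return itd_us * 1e-6 * sample_rate_hz


def delay(signal, n):
    """Delay a list of samples by n whole samples, zero-padding the start."""
    return [0.0] * n + list(signal)


SAMPLE_RATE = 44100  # Hz; assumed, not stated on the slides

exact = itd_in_samples(571, SAMPLE_RATE)  # fractional number of samples
n = round(exact)                          # nearest whole-sample delay

left = [1.0, 0.5, 0.25]
right = delay(left, n)  # the right ear lags the left ear by n samples
print(f"{exact:.2f} samples, rounded to {n}")
```

A whole-sample delay is only an approximation; a fractional-delay filter would be needed to hit 571 µs exactly at this sample rate.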
22
ITD Results (female voice) ±570 µs ITD
23
ITD Results (male voice) ±570 µs ITD
24
Summary. Dichotic: Fusion at same Fo? Low-frequency overlap needed. Fusion at different Fo (±4%)? No. But what about Fo’s ability to separate different voices? (original B & L question)
25
Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.
26
A difference in Fo improves identification. [Figure panels: double vowels; sentences.] Double vowels: improvement is over by 1 semitone. Sentences: improve for longer.
27
Mechanisms of Fo improvement. A. Global: across-formant grouping by Fo (as originally conceived by B & L). B. Local: better definition of individual formants, especially F1, where harmonics are resolved. At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?
28
Fo difference between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982). Target sentence Fo = 140 Hz. Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones. Two sentences (same talker), with only voiced consonants (and very few stops). Task: write down the target sentence. Replicates and extends Brokx & Nooteboom.
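The masker Fo values implied by these semitone offsets follow from the equal-tempered relation f = f0 · 2^(n/12). A minimal sketch, with an illustrative function name:

```python
def shift_semitones(f0_hz, semitones):
    """Return f0 shifted by a number of semitones (12 per octave)."""
    return f0_hz * 2.0 ** (semitones / 12.0)


TARGET_F0 = 140.0          # Hz, target sentence Fo from the slide
OFFSETS = [0, 1, 2, 5, 10]  # semitone offsets from the slide

for st in OFFSETS:
    above = shift_semitones(TARGET_F0, st)
    below = shift_semitones(TARGET_F0, -st)
    print(f"{st:>2} st: {below:6.1f} Hz / {above:6.1f} Hz")
```

The same relation applied to a 100 Hz base gives roughly 106, 112, 133 and 178 Hz for 1, 2, 5 and 10 semitones.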
29
Chimeric sentences (Bird & Darwin, Grantham Meeting 1998). Fo pairs: 100-100, 100-106, 100-112, 100-133, 100-178. Fo below 800 Hz; Fo above 800 Hz.
30
Paired sentences' Fos (each pair given as Low Pass / High Pass)
Normal: 100 / 100 and 112 / 112
Same Fo in High: 100 / 100 and 112 / 100
Same Fo in Low: 100 / 100 and 100 / 112
Swapped (gives wrong grouping): 100 / 112 and 112 / 100
31
Segregating sentence pairs by Fo: all the action is in the low-frequency region (<800 Hz); no strong evidence of across-formant grouping.
32
Adding Fo-swapped: inappropriate pairing of Fo is only detrimental above 4 semitones.
33
Summary of Fo differences: Across-formant grouping is only significant for large Fo differences (> ~4 semitones). Most of the improvement with small Fo differences happens in the F1 frequency region.
34
Another caveat for autocorrelation: Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin). Autocorrelation would pull out completely wrong envelopes.
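To make the alternating-harmonics manipulation concrete, the sketch below lists the component frequencies each "vowel" would contain. The odd/even assignment and the function name are assumptions for illustration; the slide says only that each vowel is built from alternating harmonics of the two Fos.

```python
def alternating_harmonics(f0_a, f0_b, n_harmonics):
    """Return component frequencies for two vowels whose harmonics
    alternate between two Fos (assumed odd/even assignment)."""
    vowel_1, vowel_2 = [], []
    for k in range(1, n_harmonics + 1):
        if k % 2 == 1:
            # Odd harmonic numbers: vowel 1 uses f0_a, vowel 2 uses f0_b.
            vowel_1.append(k * f0_a)
            vowel_2.append(k * f0_b)
        else:
            # Even harmonic numbers: the assignment is swapped.
            vowel_1.append(k * f0_b)
            vowel_2.append(k * f0_a)
    return vowel_1, vowel_2


v1, v2 = alternating_harmonics(100.0, 106.0, 4)
print(v1)  # [100.0, 212.0, 300.0, 424.0]
print(v2)  # [106.0, 200.0, 318.0, 400.0]
```

Grouping components purely by shared periodicity would collect 100, 200, 300 and 400 Hz together, which mixes the two vowels' spectral envelopes; that is the difficulty the slide raises.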
35
No simultaneous effect of FM (different frequency modulations of Fo). Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations of Fo. Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).
36
Summary of Fo effects in separating competing voices: Intelligibility is increased by a small ∆Fo only in the F1 region (and harmonic alternation is tolerated) … but not by a ∆Fo in only the higher-frequency region. Across-formant consistency of Fo is only important at larger ∆Fo. FM produces no additional separation.
37
Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.
38
Tracking by Fo: We can also use continuity of an Fo contour to track a particular sound source over time.
39
CRM task (tracking a sound source) (Bolia et al., 2000). 2 simultaneous sentences, each of the form "Ready (Call Sign) go to (Color) (Number) now." Same talker (TT); same sex (TS); different sex (TD). Target denoted by call sign "Baron". 8 talkers in corpus, 2048 tokens.
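The figure of 2048 tokens can be checked by enumerating the sentence frame. The call-sign, color and number lists below are not on the slide; they are the sets usually reported for the Bolia et al. (2000) corpus and should be read as an assumption here.

```python
from itertools import product

# Assumed corpus contents (not listed on the slide).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = range(1, 9)
TALKERS = range(8)  # 8 talkers, as stated on the slide


def sentence(call_sign, color, number):
    """Fill in the CRM sentence frame quoted on the slide."""
    return f"Ready {call_sign} go to {color} {number} now."


tokens = [(talker, sentence(cs, col, num))
          for talker, cs, col, num in product(TALKERS, CALL_SIGNS, COLORS, NUMBERS)]

print(len(tokens))  # 8 talkers x 8 call signs x 4 colors x 8 numbers
print(sentence("Baron", "blue", 3))
```

Under these assumed lists the product is 8 × 8 × 4 × 8 = 2048, matching the token count on the slide.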
40
CRM task (Bolia et al., 2000): Listeners responded by selecting the appropriate colored digit with the computer mouse.
41
CRM task results (Brungart et al.)
42
Effect of change in Fo
44
Fo contours for 2 individuals. Individuals with the most constant Fo contours show the most improvement with ∆Fo.
45
Effect of change of VT
46
Effect of joint change of Fo and VT Original: male
47
Effect of joint change of Fo and VT Original: female
48
Superadditivity of ∆Fo and ∆VT. [Figure: predicted d' and actual d', both axes 0.00 to 1.50; male and female original talkers.] ∆Fo & ∆VT are superadditive … and still less than real different-sex talkers.
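The slide does not say how the predicted d' was derived. As a rough sketch only, the code below computes d' from hit and false-alarm rates in the standard yes/no form and tests superadditivity against a plain sum of the two single-cue d' values. Both the d' formula's applicability to the CRM response format and the additive baseline are assumptions, and the numbers in the usage lines are made up for illustration, not taken from the talk.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity index: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def is_superadditive(d_fo, d_vt, d_joint):
    """True if the joint d' exceeds the sum of the single-cue d' values
    (additive baseline assumed; the slides do not specify the prediction)."""
    return d_joint > d_fo + d_vt


# Hypothetical values, for illustration only.
print(is_superadditive(0.3, 0.4, 1.0))  # True
print(is_superadditive(0.3, 0.4, 0.6))  # False
```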
49
Conclusions: Same Fo is not a sufficient condition for dichotic fusion of complementarily filtered speech. The intelligibility increase for small ∆Fo is confined to the F1 region; across-formant grouping matters only for larger ∆Fo. Fo and VT size are useful for tracking sources across time, and their effects are superadditive.