Voice source characteristics in speaker segregation
Patti Adank
Some speaker-related characteristics have been found to be helpful:
- Darwin et al. (2003): F0 (pitch) and vocal tract length (VTL) differences between concurrent speakers help listeners attend to the target speaker
Aim of the project: to establish whether voice source characteristics of speakers are useful to listeners when attending to a target speaker in a multi-speaker situation
Speaker-related differences that might aid listeners:
- style of speech
- voice quality: creaky voice, roughness, breathiness
My experiments:
- establish the possible relevance of one acoustic aspect of creaky voice: jitter
Speaker-related differences that aid listeners (Darwin et al., 2003):
- F0 difference (if > 2 semitones)
- vocal tract length (VTL) difference (if ratio > 1.08)
- effects of F0 and VTL are superadditive
Pitch: periodicity of the voice source
Jitter: aperiodicity of the voice source
Literature:
- McAdams (1989): natural jitter present in a speaker’s voice may be helpful for listeners
- Ellis (1993): a computational model can segregate simultaneously presented vowels using jitter differences alone
How could jitter help listeners?
Auditory Scene Analysis (Bregman, 1990):
- primitive segregation cues: bottom-up, involuntary listening
- schema-driven segregation cues: top-down, voluntary/effortful listening
Pitch =
- primitive segregation cue (Scheffers, 1983; Assmann & Summerfield, 1990; etc.)
- + schema-driven segregation cue (Darwin et al., 2003)
Hypotheses:
0. jitter does not aid the auditory system
1. jitter is only a primitive segregation cue
2. jitter is a primitive cue AND a schema-driven cue
3. jitter is only a schema-driven segregation cue
Experiments:
1. one double-vowel experiment with pitch as the experimental factor, to replicate earlier results for pitch as a primitive cue
2. one double-vowel experiment with jitter as the experimental factor, to establish whether jitter is a primitive cue
3. one experiment modelled on Darwin et al. (2003), with pitch and jitter as factors, to establish whether jitter is a schema-driven cue
Experiment 1:
- double-vowel experiment to test the pitch effect
- synthetic vowels (Klatt 1990): AH, EE, ER, OO, OR, each 200 milliseconds long
- five versions of each vowel: 100 Hz, +1/4 semitone (st), +1/2 st, +1 st, +2 st
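As an illustration of the five F0 versions, the sketch below converts the semitone offsets into Hz. It assumes the offsets are taken relative to the 100 Hz base and use the standard equal-tempered relation (12 semitones per octave); the names `BASE_F0_HZ`, `SEMITONE_OFFSETS` and `semitones_to_hz` are made up for this example and do not come from the experiment scripts.

```python
# Sketch: F0 values for the five vowel versions, assuming offsets are
# relative to the 100 Hz base (an assumption, not stated on the slide).
BASE_F0_HZ = 100.0
SEMITONE_OFFSETS = [0.0, 0.25, 0.5, 1.0, 2.0]  # 0 st is the 100 Hz version


def semitones_to_hz(base_hz, semitones):
    """Shift base_hz upward by the given number of semitones."""
    return base_hz * 2.0 ** (semitones / 12.0)


f0_values = [semitones_to_hz(BASE_F0_HZ, st) for st in SEMITONE_OFFSETS]

for st, f0 in zip(SEMITONE_OFFSETS, f0_values):
    print(f"+{st:g} st -> {f0:.2f} Hz")
```

Under this convention the largest step (+2 st) corresponds to an F0 of roughly 112 Hz.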
Experiment 2:
- double-vowel experiment to test the jitter effect
- synthetic vowels (altered version of Klatt 1990): AH, EE, ER, OO, OR, each 200 milliseconds long
- five versions of each vowel: 100 Hz with no jitter, and with +/-1%, +/-2%, +/-4%, +/-8% jitter
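The slide does not say how jitter was imposed in the altered synthesizer. Purely as a hypothetical sketch, the code below perturbs each glottal period of a 100 Hz source by a uniformly random amount within +/-p% and then measures the resulting local jitter (mean absolute difference between consecutive periods, divided by the mean period). The uniform perturbation, the function names and the seed are all assumptions for illustration.

```python
import random


def jittered_periods(f0_hz, jitter_pct, duration_s, seed=0):
    """Return glottal periods (s) covering duration_s.

    Assumption: each period is the nominal period scaled by a uniform
    random factor in [1 - jitter_pct/100, 1 + jitter_pct/100].
    """
    rng = random.Random(seed)
    nominal = 1.0 / f0_hz
    periods = []
    total = 0.0
    while total < duration_s:
        scale = 1.0 + rng.uniform(-jitter_pct, jitter_pct) / 100.0
        period = nominal * scale
        periods.append(period)
        total += period
    return periods


def local_jitter_pct(periods):
    """Mean absolute consecutive-period difference over mean period, in %."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


for pct in [0, 1, 2, 4, 8]:
    periods = jittered_periods(100.0, pct, 0.2)
    print(pct, len(periods), round(local_jitter_pct(periods), 2))
```

With 0% jitter every period equals 10 ms and the measured jitter is zero; larger percentages spread the periods further around 10 ms.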
Procedure (1 & 2):
- 7 listeners (5 British-English, 2 bilingual)
- categorization pre-test (45 stimuli)
- experiment 1 (or 2): a double vowel is presented (125 combinations); the listener selects one of 15 response options
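The slide gives the counts (125 combinations, 15 options) without a breakdown. One reading that reproduces both numbers, offered here only as an assumption, is that each stimulus pairs a first vowel with a second vowel in one of the five versions (5 x 5 x 5 = 125), and that the response options are the unordered vowel pairs including same-vowel pairs (15). The variable names are hypothetical.

```python
from itertools import combinations_with_replacement, product

VOWELS = ["AH", "EE", "ER", "OO", "OR"]
VERSIONS = [0, 1, 2, 3, 4]  # index of the pitch (or jitter) version

# Assumed design: (first vowel, second vowel, version of the second vowel).
stimuli = list(product(VOWELS, VOWELS, VERSIONS))

# Assumed response set: unordered vowel pairs, same-vowel pairs included.
responses = list(combinations_with_replacement(VOWELS, 2))

print(len(stimuli), len(responses))
```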
Results pitch
Results jitter
Hypotheses:
0. jitter does not aid the auditory system
1. jitter is only a primitive segregation cue
2. jitter is a primitive cue AND a schema-driven cue
3. jitter is only a schema-driven segregation cue
4. jitter is a primitive segregation cue if there is also a pitch difference
Results jitter & pitch
Is there still hope for jitter?
Next experiment: test whether jitter is a schema-driven cue
Setup as in Darwin et al.:
- 2 sentences from the same speaker presented simultaneously
- attend to the target sentence
- report the target words
- vary jitter and pitch of the sentences