Published by Preston Goodwin. Modified over 9 years ago.
2
The long-term retention of fine-grained phonetic details: evidence from a second language voice identification training task
Steve Winters
CAA Presentation, Victoria, BC, October 13, 2010
3
Basic Precepts
Exemplar theory: listeners store in memory every speech experience they have in their lifetime (Johnson, 2007), including all details of those experiences.
Variability forms an inherent (and informative) part of linguistic representations.
Evidence: interactions in speech processing between indexical and linguistic information.
1. Word recognition is easier for familiar voices (Nygaard and Pisoni, 1998).
2. Talker recognition is easier in familiar languages (Goggin et al., 1991; Perrachione et al., 2009).
4
Bilingual Talker Interactions
Winters et al. (2008) tested generalization of bilingual voice recognition across languages.
1. Listeners trained to identify voices speaking in English showed reduced identification accuracy in German (language-dependent knowledge).
2. Listeners trained to identify voices speaking in German showed equivalent identification accuracy in English (language-independent knowledge).
Levi et al. (submitted): listeners trained to identify talkers speaking in German do not show a word recognition advantage for those talkers in English.
5
L2 Speech Perception
Indexical and linguistic information do not seem to interact when listeners learn to identify German voices.
Q: Are L2 stimuli not stored in exemplar fashion? I.e., are phonetic details lost in memory?
Note: non-native sound contrasts can often be difficult for second language learners to acquire.
Japanese listeners have difficulty discriminating between English /l/ and /r/ (Miyawaki et al., 1975).
English listeners have difficulty discriminating between Thai voiced and unaspirated stops (Abramson and Lisker, 1970).
Perhaps listeners only store in memory what they know how to label (Pierrehumbert, 2001).
6
Empirical Ambitions
Thai contains a variety of phonetic features which are not contrastive in English: lexical tones, vowel length, three-way VOT contrast (voiced ~ unaspirated ~ aspirated stops)…
Can listeners encode this information in long-term memory?
Experimental goal: train listeners to identify Thai voices which are associated with a particular phonetic property (an implicit perception task).
7
Experimental Design
Example talker identification training paradigm:
Talker A is associated with Tone 1
Talker B is associated with Tone 2
Talker C is associated with Tone 3, etc.
Q1: How much do these phonetic associations improve talker identification accuracy over a control condition?
Q2: How much is identification accuracy impaired when the tone-talker associations no longer hold?
Generalization:
Talker A is presented with not-Tone 1
Talker B is presented with not-Tone 2, etc.
8
Experimental Design
Four different training conditions:
1. Tone-talker associations
2. VOT-talker associations
3. (Vowel-talker associations)
4. Control: no consistent associations between talkers and phonetic properties
Anticipated hierarchy of talker ID accuracy: Tone associations > Vowel associations > VOT associations (primarily for reasons of cue duration)
9
Exp. 1: Talker-Tone Associations
21 native English listeners learned to identify 5 Thai/English bilingual voices.
Training paradigm: 6 learning sessions (2 on each day): familiarization, training with feedback, testing.
In these training sessions, each voice produced only Thai words with a particular tone: High, Mid, Low, Falling, or Rising.
Final day of experiment: generalization to
1. English words
2. Novel Thai words in which previous tone-talker associations no longer held.
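The design above can be sketched in code. This is only an illustrative sketch, not the experiment's actual stimulus software: the talker labels, the random pairing of talkers with tones, the seed, and the function names are all hypothetical. It shows the one property that matters for the design: in training every talker is heard with a single tone, and in Thai generalization every talker is heard only with tones other than the trained one.

```python
import random

# The five Thai tones named on the slide.
TONES = ["High", "Mid", "Low", "Falling", "Rising"]
# Hypothetical talker labels; the real voice names are not given in the slides.
TALKERS = ["A", "B", "C", "D", "E"]


def assign_tones(talkers, tones, seed=0):
    """Pair each talker with exactly one tone (random pairing is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(tones)
    rng.shuffle(shuffled)
    return dict(zip(talkers, shuffled))


def training_trials(assoc, words_per_talker):
    """Training: each talker only ever produces words with the associated tone."""
    return [
        (talker, tone, word_index)
        for talker, tone in assoc.items()
        for word_index in range(words_per_talker)
    ]


def generalization_trials(assoc, tones):
    """Generalization: each talker is paired with every tone except the trained one."""
    return [
        (talker, tone)
        for talker, trained_tone in assoc.items()
        for tone in tones
        if tone != trained_tone
    ]


assoc = assign_tones(TALKERS, TONES)
train = training_trials(assoc, words_per_talker=3)
gen = generalization_trials(assoc, TONES)
```

With five talkers and five tones, the generalization list holds 5 x 4 = 20 talker-tone pairings, none of which matches a trained association.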
10
Talker-Tone Demo
11
[Tone labels: Rising, Mid, Low, High, Falling]
12
Talker-Tone Results
13
Rapid (and consistent) learning of voices during training.
Generalization:
No effect of language
Worse performance than on initial session
Note: Thai generalization performance statistically equivalent to performance on first feedback session.
Generalization mistakes: 37.6% gave the talker associated with the stimulus tone in training (remember that chance = 1/4 = 25%).
Conclusion: listeners used tone as a cue to voice identity.
14
Exp. 2: Talker-VOT Associations
20 native English listeners learned to identify six Thai/English bilingual voices.
Identical training paradigm (with a few more stimuli).
In training sessions, each voice produced only Thai words with a particular Voice Onset Time (VOT) type: voiced, unaspirated, or aspirated.
Note: two voices were associated with each VOT type.
Generalization: novel English + novel Thai words (without the same Talker-VOT associations)
15
Talker-VOT Demo
16
[VOT labels: Aspirated, Unaspirated, Aspirated, Voiced, Unaspirated]
17
Talker-VOT Results
18
Result #1: Listeners do learn to identify the voices, although the pace of learning is slower than in the Tone condition.
Possible confounds:
More voices to learn in VOT condition (6)
Two voices associated with each VOT type
Result #2: Performance does drop off significantly in generalization.
Listeners use VOT distinctions to identify voices.
VOT distinctions are encoded in memory.
Note: Allen & Miller, 2004; Francis and Driscoll, 2006
19
Talker-VOT Mistakes
In generalization, there are three potential mistake types.
Stimuli: Talker (VOT Type A) - Word (VOT Type B)
Mistake #1: Respond with other talker of Type A. (chance = 1/5)
Mistake #2: Respond with talker of Type B. (chance = 2/5)
Mistake #3: Respond with unrelated talker. (chance = 2/5)
Totals:
Mistake #1 (talker bias): 20.2%
Mistake #2 (stimulus bias): 46.3%
Mistake #3 (neither): 33.4%
VOT similarities are more salient than voice similarities.
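The fractions attached to the three mistake types (1/5, 2/5, 2/5) follow from the design: six talkers, two per VOT type, so any wrong answer is one of five other talkers. A minimal sketch of that arithmetic, set against the reported totals, is given here. The function and variable names are hypothetical, and the ratio is a descriptive comparison only; it is not the statistical analysis used in the study.

```python
from fractions import Fraction


def mistake_chance(n_talkers=6, talkers_per_type=2):
    """Chance share of each mistake type, given that the response is wrong."""
    wrong_options = n_talkers - 1                # every talker except the true one
    same_type_as_talker = talkers_per_type - 1   # the other talker of Type A
    same_type_as_word = talkers_per_type         # both talkers of Type B
    unrelated = wrong_options - same_type_as_talker - same_type_as_word
    return {
        "talker_bias": Fraction(same_type_as_talker, wrong_options),
        "stimulus_bias": Fraction(same_type_as_word, wrong_options),
        "neither": Fraction(unrelated, wrong_options),
    }


# Observed shares of generalization mistakes, as reported on the slide.
observed = {"talker_bias": 0.202, "stimulus_bias": 0.463, "neither": 0.334}

chance = mistake_chance()
# Ratio above 1 means the mistake type occurred more often than chance predicts.
ratio = {name: observed[name] / float(chance[name]) for name in observed}

for name in observed:
    print(f"{name}: observed {observed[name]:.1%}, chance {float(chance[name]):.1%}")
```

Under these assumptions, stimulus-bias mistakes sit above their 40% chance share, talker-bias mistakes sit almost exactly at their 20% share, and unrelated-talker mistakes fall below 40%.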
20
Exp. 3: Control Condition
20 native English listeners learned to identify six Thai/English bilingual voices.
Identical training paradigm to Experiment 2.
No consistent associations in training between voices and particular phonetic properties.
Note: essentially equivalent to German training in Winters et al. (2008)… with fewer speakers and with a different language.
22
Results: Experiments 1-3
In Training:
Tone accuracy > Control + VOT accuracy in all six sessions.
VOT accuracy > Control in sessions 3-6.
In all conditions: accuracy is higher in session 6 than in session 1.
In Generalization:
No differences between learning conditions.
But in Control: accuracy is higher for Thai stimuli than for English stimuli.
23
Discussion
Listeners store in memory low-level acoustic cues to non-native sound contrasts when those cues are associated with talker identity.
Lexical tones provide more salient cues than VOT, but even VOT distinctions can be a cue to talker identity.
Generalization to novel tokens works best in the Control condition, even though the rate of learning is slower in that condition.
24
Conclusions
These results provide further evidence for exemplar-based speech processing.
Listeners encode in memory any potential cue which can be used to perform a listening task,
even if those cues are not distinctive in the listener’s native language,
or are not necessarily accessible to conscious reflection.
Note: a perceptual reliance on highly specific phonetic details can make generalization hard.
25
Thanks! Thanks go to Kelly-Ann Casey, Tara Dainton and Sue Jackson, for all of their work in recording speakers, editing stimuli, analyzing data and running subjects through the listening experiments. This work was supported by a University of Calgary University Research Grants Committee starter grant.
26
Future Directions
1. Stronger test of exemplar-based memory: token recognition of training items
2. Is knowledge of talkers’ voices generalizable across different voice qualities?
3. Which phonetic properties support a familiar talker advantage in word recognition across languages?
4. Does learning to identify talkers associated with particular phonetic properties facilitate the learning of non-native sound contrasts?
28
Experiment 4: Vowels
Still in progress!
9 native English listeners learned to identify six Thai/English bilingual voices.
Identical training paradigm to Experiment 2.
Each talker consistently produced only front, central, or back Thai vowels.
In Generalization: talker-vowel quality associations no longer held.
Voice/name labels were randomized between listeners.
29
The Thai Vowel Space
[Vowel chart; surviving labels: i, u, e, o, a; annotation: two talkers]
Note: there are also long/short vowel contrasts.
30
Performance in the Vowel condition is no better (or worse) than the Control…yet.
31
One Persistent Issue: Talker Distinctiveness
33
One future direction: How much do talker representations depend on voice quality?
34
Imponderables
Q: What cues do the listeners use to make the cross-language transfer?
One future direction: Copy Thai tones onto English words.
Do language-dependent effects emerge: English word recognition? English talker identification?
Also try the same trick with vowel-talker associations.
“Linguistically irrelevant” vs. “Linguistically relevant” language-independent talker information.
35
More Future Directions
A stronger test of exemplar memory:
Listeners store in memory consistent cues to talker identity…
Do they also store in memory inconsistent talker cues (found in particular tokens)?
Plan: train listeners to identify talkers with particular (focused) phonetic associations.
Test them on training token recognition with words that differ in focused and unfocused phonetic properties.
36
More Future Directions
Could talker identification training, with talker-property associations, aid L2 learners in the acquisition of non-native sound contrasts?
Compare a sound identification training regimen that:
1. alternates with talker identification training
2. alternates with a different listening task
Does learning improve more with:
1. One-to-one talker-property associations?
2. Many-to-many talker-property associations?