Imposing native speakers’ prosody on non-native speakers’ utterances: Preliminary studies Kyuchul Yoon Spring 2006 NAELL The Division of English Kyungnam University
2 Contents Acquiring prosody in language learning (3); Previous approaches (4); A new tool (5); Technical details (6); Implications of the technique (15); Preliminary plans for an experiment (16)
3 Acquiring prosody in language learning One of the critical tasks in language learning Prosody as non-segmental features of speech 1. phrase breaks 2. intonation (F0) contour 3. segmental durations 4. intensity contour
4 Previous approaches Explicit teaching of prosodic features such as the intonation contours, segmental durations, etc. Audio aid Listen and repeat! Visual aid in computer software Dr.Speaking ® : F0 contour comparison between native speaker and non-native speaker
5 A new tool A new kind of audio aid in the form of a non-native speaker’s utterance with the prosodic features of a native speaker’s utterance How this works 1. Software presents a native speaker’s utterance 2. A non-native speaker repeats the utterance 3. Software records the non-native speaker’s utterance 4. Software imposes the native speaker’s prosody onto the non-native speaker’s utterance 5. Software presents the processed non-native utterance
6 Technical details Manipulation of 1. segmental durations, including phrase breaks 2. F0 contours 3. intensity contours For 1 and 2: PSOLA (Pitch-Synchronous Overlap and Add; Moulines & Charpentier, 1990), as implemented in Praat For 3: intensity swap in Praat
7 Technical details PSOLA (Moulines & Charpentier, 1990) [Figure panels: original waveform; windowed waveform; shortened waveform; waveform with lower F0]
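The windowing and overlap-add idea illustrated on this slide can be sketched in a few lines. This is a toy illustration only, not Praat's implementation and not the full algorithm of Moulines & Charpentier: it assumes a constant pitch period known in advance, and the function names (`hann`, `psola_toy`) and parameters are hypothetical.

```python
import math

def hann(n_samples):
    """Periodic Hann window; copies spaced n_samples/2 apart sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_samples)
            for i in range(n_samples)]

def psola_toy(signal, period, duration_factor=1.0, pitch_factor=1.0):
    """Toy pitch-synchronous overlap-add for a constant pitch period (samples).

    duration_factor < 1 shortens the signal; pitch_factor < 1 lowers F0.
    Simplification: no amplitude normalisation when grain spacing changes.
    """
    if len(signal) < 2 * period:
        raise ValueError("signal must span at least two pitch periods")
    window = hann(2 * period)
    # Analysis pitch marks: one per period, leaving room for a full grain.
    analysis_marks = list(range(period, len(signal) - period + 1, period))
    out_len = int(round(len(signal) * duration_factor))
    new_period = max(1, int(round(period / pitch_factor)))
    output = [0.0] * out_len
    t = period  # first synthesis pitch mark
    while t + period <= out_len:
        # Pick the analysis grain closest to the matching input time.
        src_time = t / duration_factor
        src = min(analysis_marks, key=lambda m: abs(m - src_time))
        # Window a two-period grain and add it at the synthesis mark.
        for i in range(2 * period):
            output[t - period + i] += signal[src - period + i] * window[i]
        t += new_period
    return output
```

With both factors left at 1.0, the grains are put back where they came from and the interior of the signal is reconstructed unchanged. Changing `duration_factor` repeats or skips grains, and changing `pitch_factor` changes the spacing between them.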
8 Technical details 1 Segmental durations Segmental alignment & PSOLA processing: alignment can be manual or automatic (with the help of speech recognition) [Figure: native and non-native utterances of “…came in…” with aligned segment labels]
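Once the two utterances are segmented with the same labels, the duration manipulation reduces to one scaling factor per segment. The sketch below shows that arithmetic only. The boundary times and the five-segment split of “came in” are made-up illustration values, and `duration_ratios` is a hypothetical helper, not part of Praat.

```python
def duration_ratios(native_bounds, nonnative_bounds):
    """Per-segment factors that stretch or compress non-native segments
    so that each matches the duration of the aligned native segment.

    Both arguments are lists of boundary times in seconds; segment i
    runs from bounds[i] to bounds[i + 1].
    """
    if len(native_bounds) != len(nonnative_bounds):
        raise ValueError("both utterances need the same number of boundaries")
    ratios = []
    for i in range(len(native_bounds) - 1):
        native_dur = native_bounds[i + 1] - native_bounds[i]
        nonnative_dur = nonnative_bounds[i + 1] - nonnative_bounds[i]
        if nonnative_dur <= 0:
            raise ValueError("segment %d has no duration" % i)
        ratios.append(native_dur / nonnative_dur)
    return ratios

# Hypothetical boundaries for five aligned segments of "came in".
native = [0.00, 0.05, 0.20, 0.26, 0.32, 0.40]
nonnative = [0.00, 0.08, 0.20, 0.30, 0.38, 0.50]
ratios = duration_ratios(native, nonnative)
```

A ratio below 1 means the non-native segment has to be shortened, and a ratio above 1 means it has to be lengthened. These factors are the kind of values a duration manipulation would take as input.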
9 Technical details 2 F0 contours PSOLA processing on duration-treated utterance [Figure: native and non-native utterances with segment labels; annotations: higher F0, lower F0]
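Because the F0 step runs on the duration-treated utterance, the two contours share one timeline and can be compared point by point. The sketch below computes how far the non-native F0 would have to move at each time, in semitones. It assumes contours given as (time, Hz) points with strictly increasing times. The helper names and the example values are hypothetical.

```python
import math

def interpolate(contour, t):
    """Linearly interpolate a contour of sorted (time_s, f0_hz) points."""
    if t <= contour[0][0]:
        return contour[0][1]
    if t >= contour[-1][0]:
        return contour[-1][1]
    for (t0, f0), (t1, f1) in zip(contour, contour[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)

def f0_shift_semitones(native, nonnative, times):
    """Shift needed at each time to move non-native F0 onto native F0.

    Positive values mean the native F0 is higher at that point.
    """
    return [12 * math.log2(interpolate(native, t) / interpolate(nonnative, t))
            for t in times]

# Made-up contours: falling native F0, flat non-native F0.
native_f0 = [(0.0, 200.0), (0.4, 100.0)]
nonnative_f0 = [(0.0, 100.0), (0.4, 100.0)]
shifts = f0_shift_semitones(native_f0, nonnative_f0, [0.0, 0.2, 0.4])
```

In this example the native contour starts an octave above the non-native one and meets it at the end, so the required shift falls from 12 semitones to 0.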
10 Technical details 3 Intensity contours Mathematically “neutralize” the non-native speaker’s intensity contour and transfer the native speaker’s intensity contour in Praat (Holger Mitterer, personal communication) [Figure: native and non-native utterances with segment labels]
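One way to read “neutralize and transfer” is as a frame-wise gain: divide out the non-native envelope, then multiply by the native one. The sketch below does this with a simple RMS envelope. It is an assumption about the general idea, not the procedure from the personal communication and not a Praat command. It also assumes the two signals have equal length, as they would after duration treatment. All names are hypothetical.

```python
import math

def rms_envelope(samples, frame):
    """RMS amplitude of each non-overlapping frame."""
    env = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        env.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return env

def swap_intensity(nonnative, native, frame=4, floor=1e-8):
    """Give the non-native samples the frame-wise RMS of the native ones."""
    if len(nonnative) != len(native):
        raise ValueError("signals must have equal length")
    src_env = rms_envelope(nonnative, frame)
    tgt_env = rms_envelope(native, frame)
    out = []
    for i, x in enumerate(nonnative):
        k = i // frame
        # Neutralise (divide by own RMS), then transfer (multiply by target).
        # Frames that are effectively silent are left silent.
        gain = tgt_env[k] / src_env[k] if src_env[k] > floor else 0.0
        out.append(x * gain)
    return out

# Tiny made-up signals: two frames of four samples each.
nonnative_sig = [0.1, -0.1, 0.1, -0.1, 0.4, -0.4, 0.4, -0.4]
native_sig = [0.5, -0.5, 0.5, -0.5, 0.2, -0.2, 0.2, -0.2]
swapped = swap_intensity(nonnative_sig, native_sig, frame=4)
```

A per-frame gain that jumps at frame edges would be audible in real speech, so a practical version would smooth or interpolate the gain between frames.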
11 Technical details Weaknesses 1. Voiceless segments can be made “voiced” in the windowing process (pitch-synchronous technique) 2. Excessive handling results in unnatural synthesis Segment alignment should be fine-tuned according to the voiced/voiceless status of the (sub-)segments for better results
12 Technical details Examples Praat script native utterance non-native utterance synthetic non-native
13 Technical details Comparison before synthesis – duration, F0 & intensity native utterance non-native utterance (blue & yellow)
14 Technical details Comparison after synthesis – duration, F0 & intensity native utterance synthetic non-native (blue & yellow)
15 Implications of the technique The technique can be used in second language education: to facilitate/motivate acquisition of the target language prosody to emphasize the importance of prosody in achieving native speaker fluency ASR (Automatic Speech Recognition) can be employed to automate the segment aligning stage
16 Preliminary plans for an experiment Hypothesis The new type of audio feedback improves the efficiency of language learning, specifically the learning of prosody Method Key idea: (In a listen-and-repeat type of language learning) Contrast the “old” type of audio feedback, i.e. playing native utterances only, with the “new” type of audio feedback, i.e. playing native and synthetic utterances.
17 Preliminary plans for an experiment Method 1. Baseline: Grouping non-native learners into two groups (“good” and “bad”) 2. Administration: Learning either with the “old” type of audio feedback or with the “new” type of audio feedback 3. Evaluation: Evaluate the two types of feedback by examining the recordings of the learners In 1 and 3, a native speaker marks the recordings of the non-native learners on a categorical/numerical scale. In 2, the two groups (good/bad) are divided into four subgroups (good-A/good-B/bad-A/bad-B) so that the A groups are given the “old” type of audio feedback and the B groups are given the “new” type of audio feedback.
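The four-subgroup design can be sketched as a small assignment routine. The slides do not say how learners are split, so the median split on baseline score and the alternating A/B assignment below are assumptions for illustration. The function name and learner IDs are hypothetical.

```python
def assign_groups(scores):
    """Map each learner to good-A, good-B, bad-A or bad-B.

    scores: dict of learner ID -> baseline mark from the native rater.
    Assumed procedure: rank by score, split at the midpoint, then
    alternate A ("old" feedback) and B ("new" feedback) within each half.
    """
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    half = len(ranked) // 2
    levels = {"good": ranked[:half], "bad": ranked[half:]}
    assignment = {}
    for level, members in levels.items():
        for i, name in enumerate(members):
            feedback = "A" if i % 2 == 0 else "B"
            assignment[name] = level + "-" + feedback
    return assignment

# Eight hypothetical learners with baseline marks from 8 down to 1.
baseline = {"s%d" % i: 9 - i for i in range(1, 9)}
groups = assign_groups(baseline)
```

Alternating by rank keeps the A and B subgroups close in baseline ability. Random assignment within each half would be an equally reasonable choice.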