Published by Percival Page. Modified over 8 years ago.
1
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents
Elisabeth Chorianopoulou
MSc in Speech and Language Processing, University of Edinburgh
Supervisor: Prof. D. R. Ladd
External Advisor: Robert Clark (CSTR)
2
Today’s presentation
- Project’s main goal
- Theoretical background
- Hypothesis
- Tools & methods
- Pilot experiment: design, results
- Future work
3
Prosody prediction in modern TTS systems
Abstract level: prosodic structure, prominence, tune
Acoustics → Perception:
- f0 → pitch
- duration → rhythm
- amplitude → loudness
Issues:
- Interaction of correlates not always clear…
- Not necessarily adequate information from text
- Speaker variability (production & perception)
4
F0 prediction
- Global f0 properties: declination, reset
- Local f0 properties: contour shape, tonal targets, alignment
- F0 predictors: syllable properties, word properties, rhythm, syntactic structure, information structure
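The two global f0 properties can be illustrated with a toy baseline: within each phrase the baseline falls linearly (declination), and at each phrase break it jumps back up (reset). This is a minimal sketch under our own assumptions; the linear fall, the full reset and the parameter values (`top_hz`, `decline_hz_per_s`, `step_s`) are hypothetical and are not taken from DEMOSTHeNES or from this project.

```python
from typing import List, Tuple


def baseline_f0(
    phrase_durations_s: List[float],
    top_hz: float = 220.0,           # hypothetical phrase-initial baseline
    decline_hz_per_s: float = 10.0,  # hypothetical declination rate
    step_s: float = 0.5,             # sampling step for the output points
) -> List[Tuple[float, float]]:
    """Return (time_s, f0_hz) points for a toy declining baseline.

    Assumption: the baseline falls linearly inside each phrase and is
    fully reset to top_hz at the start of the next phrase.
    """
    points: List[Tuple[float, float]] = []
    phrase_start = 0.0
    for duration in phrase_durations_s:
        n_steps = int(round(duration / step_s))
        for i in range(n_steps + 1):
            t = i * step_s
            # Declination: the baseline falls with time since the phrase start.
            points.append((round(phrase_start + t, 3), top_hz - decline_hz_per_s * t))
        # Reset: the next phrase starts again from top_hz.
        phrase_start += duration
    return points


for time_s, f0_hz in baseline_f0([1.0, 1.0]):
    print(f"{time_s:.1f} s  {f0_hz:.1f} Hz")
```

The two points at 1.0 s mark the phrase break: the first ends the declining stretch of phrase 1, the second is the reset at the start of phrase 2.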
5
Project’s main goal
- Intonational phonetics & phonology → prosody prediction in synthesis
- Synthetic speech: insight into the role of tonal alignment
- Naturalness judgements: distribution of effects
- TTS system design?
6
Pre-nuclear accents
- Prosodic units: IP (intonational phrase), iP (intermediate phrase)
- An iP contains one or more pitch accents
- The final accent in an iP is the nuclear accent
- All non-final accents are pre-nuclear
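These definitions reduce to a positional rule inside each iP. A minimal sketch, assuming an iP is given simply as the ordered list of its accented words (a hypothetical representation; phrasing itself is not predicted here). The input words are illustrative and are treated as one iP only for the example.

```python
from typing import List, Tuple


def label_accents(ip_accents: List[str]) -> List[Tuple[str, str]]:
    """Label each pitch accent in one intermediate phrase (iP).

    The last accent is nuclear; every earlier accent is pre-nuclear.
    """
    if not ip_accents:
        raise ValueError("an iP contains at least one pitch accent")
    last = len(ip_accents) - 1
    return [
        (word, "nuclear" if i == last else "pre-nuclear")
        for i, word in enumerate(ip_accents)
    ]


# Hypothetical input: accented words of one iP, in order.
print(label_accents(["anonimo", "grama", "anastatose"]))
```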
7
The case of Modern Greek (Arvaniti et al., 1998)
- Tonal targets: scaling & alignment
- Modern Greek pre-nuclear accents: two tonal targets, an L and an H
- Stability of the valley (F0 min) vs. variability of the peak (F0 max)
- Type of accent?
  a. bitonal L* + H
  b. L* accent followed by an H phrase tone
8
The case of Modern Greek (Arvaniti et al., 1998)
[Figure: H and L tonal targets over the segmental string C0 V0* C1 V1, with alignment offsets of -5 ms and +15 ms]
- Tonal targets are independently aligned with specific points in the segmental string.
- Duration & slope of the f0 movement depend on segmental quality.
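Segmental anchoring can be expressed as a rule that places each target at a fixed offset from its own segmental landmark. A minimal sketch, assuming (our reading of Arvaniti et al., 1998) that L sits about 5 ms before the onset of the accented syllable (C0) and H about 15 ms after the onset of the first post-accentual vowel (V1). The class and function names are hypothetical, and the segment times would come from a labelled or synthesised utterance.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed offsets (ms), following our reading of Arvaniti et al. (1998).
L_OFFSET_MS = -5.0  # relative to the onset of C0 (start of accented syllable)
H_OFFSET_MS = 15.0  # relative to the onset of V1 (first post-accentual vowel)


@dataclass
class AccentLandmarks:
    c0_onset_ms: float  # onset of the accented syllable
    v1_onset_ms: float  # onset of the first post-accentual vowel


def anchor_targets(seg: AccentLandmarks) -> Tuple[float, float]:
    """Return (L time, H time) in ms, each anchored to its own landmark."""
    return seg.c0_onset_ms + L_OFFSET_MS, seg.v1_onset_ms + H_OFFSET_MS


# Same accented-syllable onset, but a longer stretch before V1 in the second case.
for seg in (AccentLandmarks(200.0, 320.0), AccentLandmarks(200.0, 380.0)):
    l_ms, h_ms = anchor_targets(seg)
    print(f"L at {l_ms} ms, H at {h_ms} ms, rise lasts {h_ms - l_ms} ms")
```

Because each target follows its own landmark, the rise lasts longer when the segments between C0 and V1 are longer (140 ms vs. 200 ms here), and for a fixed pitch excursion it is correspondingly shallower, which matches the dependence of duration and slope on the segmental material.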
9
What does the project actually involve?
- Presuppose the validity of Arvaniti et al.’s findings
- Apply them in synthetic speech (DEMOSTHeNES Speech Composer)
- Move the alignment points of both L and H (Praat)
- Run perceptual experiments (E-Prime)
10
Original hypothesis
- Shifts in alignment will not significantly affect the perceived naturalness.
- If perception is affected, late alignment of the F0 max is expected to have the greatest influence.
11
Test sentences
- At least one unaccented syllable preceding the accented one
- Accented vowel between nasals or laterals
- At least two syllables before the following accent
Example sentence: Το ανώνυμο γράμμα την αναστάτωσε. (‘The anonymous letter upset her.’)
To ano*nimo gra*ma tin anasta*tose
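Two of the three criteria are positional and can be checked automatically once a sentence is syllabified and its accented syllables are marked. A minimal sketch under that assumption: the (syllable, accented) representation and the function name are hypothetical, the syllabification of the example is ours, "at least two syllables before the following accent" is read as at least two syllables between consecutive accents, and the segmental criterion (nasals or laterals around the accented vowel) is not checked because it needs a phonetic transcription.

```python
from typing import List, Tuple

Syllable = Tuple[str, bool]  # (syllable text, is accented)


def meets_positional_criteria(syllables: List[Syllable]) -> bool:
    """Check the positional test-sentence criteria.

    - at least one unaccented syllable before the first accent
    - at least two syllables between each accent and the next one
    """
    accents = [i for i, (_, accented) in enumerate(syllables) if accented]
    if len(accents) < 2:
        return False  # need a pre-nuclear accent followed by another accent
    if accents[0] < 1:
        return False  # first accent must be preceded by an unaccented syllable
    return all(nxt - cur - 1 >= 2 for cur, nxt in zip(accents, accents[1:]))


# To a-no*-ni-mo gra*-ma tin a-na-sta*-to-se
example = [
    ("to", False), ("a", False), ("no", True), ("ni", False), ("mo", False),
    ("gra", True), ("ma", False), ("tin", False),
    ("a", False), ("na", False), ("sta", True), ("to", False), ("se", False),
]
print(meets_positional_criteria(example))
```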
12
DEMOSTHeNES
- University of Athens, M-PIRO project
- A modular system like Edinburgh’s Festival (HRG, VSERVER, VCOM, VMOD)
- Prosody in DEMOSTHeNES: duration, pitch and amplitude offered as VCOMs linked to the HRG
- Current prosodic model: phrasing & lexical stress
13
Output (Praat)
- f0 declination
- Reset at phrase breaks
- Limited pitch range
- Limited movements
14
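Towards naturalness I
- Apply the results of Arvaniti et al. to the default pitch contour of DEMOSTHeNES.
[Figure: H and L tonal targets over the segmental string C0 V0* C1 V1, with alignment offsets of -5 ms and +15 ms]
- Applied to the second stressed syllable as well as the first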
15
Output (Praat)
- f0 declination
- Same pitch range
- More f0 movements
16
Towards naturalness II: modifications in alignment
- Targets moved independently, earlier or later than their normal alignment points
- L – H combinations: Early – Late, Late – Early, Normal – Late, etc.
- Shift sizes (L – H): 40 – 80 ms, 50 – 100 ms, 60 – 120 ms?
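The manipulation can be sketched as moving each target by a signed amount that depends on its condition. A minimal sketch, assuming that a shift pair such as 50 – 100 ms means L moves by 50 ms and H by 100 ms (as in the "Early L (50ms) Late H (100ms)" output) and that target times are plain millisecond values. The names and the normal alignment points are hypothetical, the actual manipulation was done in Praat, and the guard against H landing at or before L is our own addition.

```python
from typing import Tuple

# Direction of each alignment condition relative to the normal alignment point.
DIRECTION = {"early": -1, "normal": 0, "late": 1}


def shift_targets(
    l_ms: float,
    h_ms: float,
    l_condition: str,
    h_condition: str,
    l_shift_ms: float,
    h_shift_ms: float,
) -> Tuple[float, float]:
    """Move L and H independently; return the new (L, H) times in ms."""
    new_l = l_ms + DIRECTION[l_condition] * l_shift_ms
    new_h = h_ms + DIRECTION[h_condition] * h_shift_ms
    if new_h <= new_l:
        # Our own safeguard: the pre-nuclear accent should remain a rise.
        raise ValueError("shift would place H at or before L")
    return new_l, new_h


# Hypothetical normal alignment points: L at 195 ms, H at 395 ms.
print(shift_targets(195.0, 395.0, "early", "late", 50, 100))  # Early L, late H
print(shift_targets(195.0, 395.0, "late", "early", 50, 100))  # Late L, early H
```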
17
Output: early L (50 ms), late H (100 ms)
18
Output: late L (50 ms), early H (100 ms)
19
Design of pilot perceptual experiment
- 2 sentences: standard vs. modified alignment, i.e. N – N vs. Early – Late, Late – Early, Normal – Late
- Naturalness judgements on pair comparisons
- 12 native Greek speakers, students in Edinburgh
- Aim: which shift size, 40 – 80, 50 – 100 or 60 – 120 ms?
20
Results I
21
Results II
22
Future work
- 10 sentences: standard vs. modified alignment, i.e. N – N vs. all possible combinations of Early, Normal and Late
- Modifications of 40 – 80 and 60 – 120 ms
- Native Greek speakers, Greece, July :-)
- Aim: patterns in the perception of naturalness?
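The size of the planned design follows from these factors. A minimal sketch of the condition grid, assuming each modified condition is an (L position, H position) pair compared against N – N, that every combination except N – N counts as modified, and that each sentence is tested once per condition and shift size, with no repetitions or order counterbalancing. These are our assumptions, not the final design, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

POSITIONS = ("Early", "Normal", "Late")
SHIFT_SIZES_MS = ((40, 80), (60, 120))  # (L shift, H shift)


def modified_conditions() -> List[Tuple[str, str]]:
    """All (L, H) position pairs except the standard Normal - Normal."""
    return [
        (l_pos, h_pos)
        for l_pos, h_pos in product(POSITIONS, repeat=2)
        if (l_pos, h_pos) != ("Normal", "Normal")
    ]


def comparison_pairs(n_sentences: int = 10) -> List[Tuple[int, Tuple[str, str], Tuple[int, int]]]:
    """One standard-vs-modified comparison per sentence, condition and shift size."""
    return [
        (sentence, condition, shift)
        for sentence in range(1, n_sentences + 1)
        for condition in modified_conditions()
        for shift in SHIFT_SIZES_MS
    ]


print(len(modified_conditions()))  # 3 x 3 combinations minus N - N
print(len(comparison_pairs()))     # 10 sentences x 8 conditions x 2 shift sizes
```

Under these assumptions the grid has 8 modified conditions and 160 comparisons. For conditions where one target stays Normal, only the other target's shift is actually applied.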
23
The contribution of this project
- Insight into the role of alignment in perceiving a synthetic utterance as natural
- TTS system design
- Results not restricted to Greek: evidence for segmental anchoring in other languages (studies of Dutch, German, English)
24
Sound files
- DEMOSTHeNES
- Arvaniti et al.
- Early L (50 ms) – Late H (100 ms)
- Late L (50 ms) – Early H (100 ms)