Published by Leticia Joplin. Modified over 10 years ago.
1
SIPCom 8-4 Speech Processing, MM7 - Speech Synthesis - Speech Recognition (Part 1 of 3) Børge Lindberg
2
Text-to-speech synthesis
Pipeline: Text analysis → Prosody generation → Sound generation → Synthetic speech
Resources: Lexicon & rules; Pitch & duration (stød); Diphone database
3
Why is it so difficult?
Text normalisation: “kl 12-14”, “8-3=5”, “ ”, “mio”, “USA”
Morphological analysis: “periferien” vs. “skoleferien”, “hul”
Syntactic analysis: “en mand med hul røst dør bag en dør med hul i” (“a man with a hollow voice dies behind a door with a hole in it”)
Semantic analysis: “The man fed her dog biscuits”
Sound generation: transitions, time- and pitch scaling
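The text normalisation examples above show why the step is hard: the same hyphen means a range in “kl 12-14” and a subtraction in “8-3=5”. A minimal rule-based sketch follows. The abbreviation table, the Danish expansions, and the function name normalise are illustrative assumptions, not the rules used in the lecture's system.

```python
import re

# Hypothetical abbreviation table; the expansions are illustrative assumptions.
ABBREVIATIONS = {"kl": "klokken", "mio": "millioner"}


def normalise(text: str) -> str:
    """Expand a few written forms into speakable words (digits are left as-is)."""
    # Arithmetic must be handled before ranges: "8-3=5" -> "8 minus 3 er lig med 5".
    text = re.sub(r"(\d+)-(\d+)=(\d+)", r"\1 minus \2 er lig med \3", text)
    # Any remaining digit-hyphen-digit is read as a range: "12-14" -> "12 til 14".
    text = re.sub(r"(\d+)-(\d+)", r"\1 til \2", text)

    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isalpha() and token.isupper() and len(token) > 1:
            # Acronyms such as "USA" are spelled letter by letter.
            words.append(" ".join(token))
        else:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    for example in ["kl 12-14", "8-3=5", "mio", "USA"]:
        print(example, "->", normalise(example))
```

The order of the two substitutions matters: applying the range rule first would turn “8-3=5” into a range followed by a stray “=5”. A real system would also need number-to-word conversion and context to decide between readings.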
4
Concatenative synthesis
test = /tEsd/ = /#t/ + /tE/ + /Es/ + /sd/ + /d#/
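The decomposition of /tEsd/ into diphones can be sketched as below. The function name to_diphones is hypothetical, and the sketch assumes one character per phoneme symbol, as in the example on the slide, with # marking the word boundary.

```python
def to_diphones(phonemes: str, boundary: str = "#") -> list[str]:
    """Split a phoneme string into overlapping diphone units.

    Assumes each character is one phoneme symbol and pads the
    sequence with a boundary symbol at both ends.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    # Each diphone spans from one phoneme into the next.
    return [left + right for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    print(" + ".join(f"/{d}/" for d in to_diphones("tEsd")))
```

A sequence of n phonemes yields n + 1 diphones, each of which would be looked up in the unit database and concatenated.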
5
Di-(tri)phone Database
Database of a male speaker
Approx subword units (di- & triphones)
Requires pitch-, di- and triphone segmentation
6
Input to the sound generator
7
Effect of scaling
No scaling
Time scaled
+ pitch scaled
+ energy
+ stød
8
More examples
Normal (aalb.wav)
High speaking rate, normal pitch (fast.wav)
Low speaking rate, normal pitch (slow.wav)
Normal speaking rate, high pitch (light.wav)
Normal speaking rate, low pitch (dark.wav)
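The rate and pitch variants listed above can be thought of as two independent factors applied to the pitch and duration targets of each unit. The sketch below only scales such targets; it does not modify a waveform. The Target class, its field names, and the numeric values are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Target:
    """Hypothetical prosody target for one synthesis unit."""
    unit: str
    duration_ms: float
    f0_hz: float


def scale(targets, rate=1.0, pitch=1.0):
    """Return new targets with speaking rate and pitch scaled independently.

    rate > 1 shortens durations (faster speech); pitch > 1 raises F0.
    """
    if rate <= 0 or pitch <= 0:
        raise ValueError("rate and pitch must be positive")
    return [
        replace(t, duration_ms=t.duration_ms / rate, f0_hz=t.f0_hz * pitch)
        for t in targets
    ]


if __name__ == "__main__":
    # Illustrative values only.
    normal = [Target("#t", 80.0, 120.0), Target("tE", 100.0, 125.0)]
    fast = scale(normal, rate=2.0)    # high speaking rate, normal pitch
    light = scale(normal, pitch=1.5)  # normal speaking rate, high pitch
    print(fast)
    print(light)
```

Keeping the two factors separate mirrors the examples: the speaking rate changes while pitch stays normal, and the reverse.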
9
Evaluation - intelligibility
32 test persons
156 stimuli in carrier sentence: “Det er <keyword>, de siger” (“It is <keyword>, they say”)
10
Evaluation - naturalness
32 test persons
155 stimuli