1
04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University
2
Issues for text-to-speech
It should sound like a person
AND it should sound like a person who can read
AND it should sound like a person who understands what they are reading
3
Credits
FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo
Huang, Acero and Hon: Spoken Language Processing
Many web-based demos
– http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html
– http://www.icsi.berkeley.edu/eecs225d/klatt.html
4
Text-to-speech
Text and Phonetic Analysis: What to say
Prosody: How to say it
Waveform synthesis: Making it sound right
5
Text and phonetic processing
Homographs
Letter-to-sound
Abbreviations
6
Prosody
Pauses
Pitch
Speech rate / relative duration
7
Waveform generation
Articulatory Synthesis
– Simulation of mechanics of speech production
Formant Synthesis
– Source/filter model
Concatenative synthesis
– Limited domain waveform concatenation
– No waveform modification
– With waveform modification
8
Waveform generation
Use linear predictive coding to analyse the signal into a filter and a residual, then excite the filter with an appropriate residual.
Main benefit: compression.
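The filter/residual split can be sketched in a few lines. This is a minimal illustration, not the method of any particular synthesizer: it assumes the autocorrelation method with the Levinson-Durbin recursion, one standard way to estimate the predictor, and every function name and the toy signal are hypothetical.

```python
import math

def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of signal[n] * signal[n + k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Estimate predictor coefficients a[1..order] from autocorrelations r.

    The predictor is s_hat[n] = sum over k of a[k] * s[n - k].
    Returns (coefficients, remaining prediction error energy).
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error

def lpc_residual(signal, coeffs):
    """Inverse-filter the signal: what the predictor fails to explain."""
    out = []
    for n in range(len(signal)):
        pred = sum(c * signal[n - k - 1]
                   for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        out.append(signal[n] - pred)
    return out

def lpc_resynthesize(residual, coeffs):
    """Excite the all-pole filter with a residual to rebuild a waveform."""
    out = []
    for n in range(len(residual)):
        pred = sum(c * out[n - k - 1]
                   for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        out.append(residual[n] + pred)
    return out

# Toy input (an assumption): a single decaying resonance, 200 samples.
signal = [0.97 ** n * math.sin(0.3 * n) for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
res = lpc_residual(signal, coeffs)
rebuilt = lpc_resynthesize(res, coeffs)
```

Exciting the filter with the exact residual reproduces the input; the compression benefit comes from the residual carrying much less energy than the signal, so it can be stored coarsely or replaced by a simpler excitation.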
9
One slide of speech acoustics
Formants - bands of strong energy in the speech signal
Spectrogram - representation of relation between time (x), frequency (y) and intensity
The speech organs consist of a noise source and some resonant cavities.
We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.
10
Sound like a person
Get a person to record whole vocabulary, then splice together the words to make sentences.
But: speech is hard to cut up in such a way that it sews back together nicely.
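One reason splicing is hard: butting two recordings together usually leaves a jump in the waveform at the seam. A common smoothing trick, shown here only as a sketch and not as what any system in these slides does, is a short linear crossfade across the join. The function name and the toy sample values are hypothetical.

```python
def crossfade_join(left, right, overlap):
    """Join two sample lists, blending the last `overlap` samples of `left`
    with the first `overlap` samples of `right` using linear weights."""
    if overlap <= 0:
        return list(left) + list(right)      # hard splice, no smoothing
    overlap = min(overlap, len(left), len(right))
    head = list(left[:len(left) - overlap])
    tail = list(right[overlap:])
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # weight of `right` rises across the seam
        blended.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + blended + tail

def largest_step(samples):
    """Largest jump between neighbouring samples, a crude click detector."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

# Toy "recordings" (assumptions): two constant levels that do not match.
left = [1.0] * 5
right = [0.0] * 5
hard = crossfade_join(left, right, 0)
smooth = crossfade_join(left, right, 3)
```

The crossfade removes the click but does nothing about mismatched pitch, duration or coarticulation across the seam, which is the harder part of making the pieces sew back together.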
11
Sound like a person who can read
Grapheme to phoneme conversion.
Input: text
Output: phoneme string + annotations for stress and intonation.
Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these), exceptions require the use of a dictionary.
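The dictionary-plus-rules arrangement can be sketched as a lookup with a rule-based fallback. Everything here is a toy assumption: the three-word exception lexicon, the ARPAbet-style symbols (digits mark stress), and the deliberately naive letter-to-sound table are illustrative, not taken from Festival or any real lexicon.

```python
# Hypothetical exception dictionary for words the rules get wrong.
LEXICON = {
    "colonel": ["K", "ER1", "N", "AH0", "L"],
    "one": ["W", "AH1", "N"],
    "yacht": ["Y", "AA1", "T"],
}

# Toy letter-to-sound rules: a few digraphs plus one default per letter.
RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"],
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}

def letter_to_sound(word):
    """Greedy left-to-right rule application, longest grapheme first."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phones.extend(RULES[chunk])
                i += size
                break
        else:
            i += 1                           # no rule: skip the character
    return phones

def to_phonemes(word):
    """Consult the exception dictionary first, then fall back to the rules."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    return letter_to_sound(key)
```

With these toy tables, `to_phonemes("ship")` comes from the rules, while `to_phonemes("colonel")` comes from the lexicon; `letter_to_sound("colonel")` on its own produces a spelling pronunciation, which is the failure the dictionary exists to prevent.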
12
Text Normalization
Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
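The example sentence packs several readings of the same letters: "V" as "the fifth", "I" as "one", as a pronoun, as an initial, and as "the first". A minimal sketch of context-based disambiguation follows. The rules, word lists and function names are hypothetical heuristics tuned to this one sentence, and they will misfire on other text.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "X": 10}
CARDINAL = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 10: "ten"}
ORDINAL = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth",
           10: "tenth"}
TITLES = {"Mr.", "Mrs.", "Dr."}                  # assumed list of titles
SECTIONS = {"part", "act", "scene", "chapter"}   # assumed section words

def is_capitalized_word(tok):
    """True for tokens such as 'Henry' or 'I.' (letters only, initial capital)."""
    return tok[:1].isupper() and tok.rstrip(".").isalpha()

def normalize(text):
    """Return the tokens of `text` with Roman-numeral lookalikes expanded."""
    tokens = re.findall(r"[A-Za-z]+\.?|\d+|[^\w\s]", text)
    out = []
    for i, tok in enumerate(tokens):
        core = tok.rstrip(".")
        prev = tokens[i - 1] if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if core not in ROMAN:
            out.append(tok)                          # ordinary token
        elif prev in TITLES:
            out.append(core)                         # "Mr. X": letter name
        elif prev.lower() in SECTIONS:
            out.append(CARDINAL[ROMAN[core]])        # "Part I": cardinal
        elif tok.endswith(".") and is_capitalized_word(nxt):
            out.append(core)                         # "V.I. Lenin": initials
        elif is_capitalized_word(prev):
            out.append("the " + ORDINAL[ROMAN[core]])  # "Henry V": ordinal
        else:
            out.append(tok)                          # pronoun "I"
    return out

result = normalize("Henry V Part I, Act II scene 11, Mr. X is, "
                   "I believe V.I. Lenin and not Charles I.")
```

The rule order matters: the sentence-final "I." after "Charles" is only read as an ordinal because no capitalized word follows it, so a different sentence boundary would flip the result.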
13
Specialized text types
Raw Address
Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689
Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998
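Address text reuses the same abbreviation with different readings: "St" is "Saint" before a name and "Street" after one, and "NE" is a compass direction after "Way" but a state before a ZIP code. The sketch below shows one way to pick a reading from the next token. The lookup tables and the function name are hypothetical and cover only the two addresses above.

```python
import re

STATES = {"TX": "Texas", "NE": "Nebraska"}           # assumed, partial table
DIRECTIONS = {"NE": "Northeast", "NW": "Northwest",
              "SE": "Southeast", "SW": "Southwest"}

def expand_address(raw):
    """Expand ambiguous abbreviations using the token that follows them."""
    tokens = re.findall(r"[A-Za-z]+|\d+(?:-\d+)*|[^\w\s]", raw)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "St":
            # Before a capitalized name read "Saint", otherwise "Street".
            out.append("Saint" if nxt[:1].isupper() else "Street")
        elif tok in STATES and nxt[:1].isdigit():
            out.append(STATES[tok])                  # state code before a ZIP
        elif tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])              # street-suffix direction
        else:
            out.append(tok)
    return out

first = expand_address("3337 St Laurence St, Fort Worth,TX 71611-5484")
second = expand_address("445 Sycamore Way NE, Lincoln, NE 98125-5108")
```

Here `first` reads the two occurrences of "St" differently, and `second` reads the two occurrences of "NE" differently, purely from the following token.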
14
SABLE
See rinss-slides
15
Sound like you understand
Lexical stress and intonation matter very much, and tie in with pragmatics.
The system doesn’t in fact understand enough to get this right.
The best you can do is fake it.
There are lots of cues available in the text, but mistakes are inevitable.
16
Rumpke Advert Rhetorical Systems Definitely wrong Possibly good enough
17
Multilingual and flexible
Festival is open-architecture, and has been extended by lots of people.
It can even (easily) be made to speak in your voice.
18
Prosody
19
Boston
It will be rainy today in Boston
20
Challenges for speech synthesis
Improve overall speech quality
Refine ways of organizing and collecting speech databases
Improve the quality of the control signal
21
Sounds