04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University.


1 04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University

2 04/08/04 Issues for text-to-speech It should sound like a person AND should sound like a person who can read AND it should sound like a person who understands what they are reading

3 04/08/04 Credits FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo. Huang, Acero and Hon: Spoken Language Processing. Many web-based demos: http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html and http://www.icsi.berkeley.edu/eecs225d/klatt.html

4 04/08/04 Text-to-speech Text and Phonetic Analysis: What to say Prosody: How to say it Waveform synthesis: Making it sound right

5 04/08/04 Text and phonetic processing Homographs Letter-to-sound Abbreviations

6 04/08/04 Prosody Pauses Pitch Speech rate/ relative duration

7 04/08/04 Waveform generation Articulatory Synthesis – Simulation of mechanics of speech production Formant Synthesis – Source/filter model. Concatenative synthesis – Limited domain waveform concatenation – No waveform modification – With waveform modification

8 04/08/04 Waveform generation Use linear predictive coding to analyse the signal into a filter and a residual, then excite the filter with the appropriate residual. Main benefit: compression.
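The filter/residual split can be sketched in a few lines. This is a minimal illustration, not the method used by any particular synthesizer: it assumes the autocorrelation method with the Levinson-Durbin recursion (the slide does not say how the filter is estimated), and the toy signal and all function names are made up for the example.

```python
import math

def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of signal[n] * signal[n + lag]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns a[1..order] such that s[n] is predicted as sum_k a[k] * s[n-k].
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)               # remaining prediction error
    return a[1:]

def lpc_residual(signal, coeffs):
    """Analysis (inverse filter): e[n] = s[n] - sum_k a[k] * s[n-k]."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(a * signal[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - predicted)
    return residual

def lpc_synthesize(residual, coeffs):
    """Synthesis: excite the all-pole filter with the residual."""
    out = []
    for n, e in enumerate(residual):
        predicted = sum(a * out[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + predicted)
    return out

# Toy stand-in for speech (an assumption): two decaying resonances.
signal = [0.99 ** n * math.sin(0.3 * n) + 0.5 * 0.97 ** n * math.sin(1.1 * n)
          for n in range(200)]
coeffs = levinson_durbin(autocorrelation(signal, 4), 4)
residual = lpc_residual(signal, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
```

Exciting the filter with the exact residual gives back the input; the compression benefit comes from the residual carrying much less energy than the signal, so it can be coded coarsely or replaced by a simple excitation.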

9 04/08/04 One slide of speech acoustics Formants - bands of strong energy in the speech signal Spectrogram - representation of relation between time (x), frequency (y) and intensity The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.

10 04/08/04 Sound like a person Get a person to record whole vocabulary, then splice together the words to make sentences. But: speech is hard to cut up in such a way that it sews back together nicely.
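One common way to soften a splice is a short crossfade across the join. The sketch below assumes a linear crossfade over plain lists of samples; the function name and the overlap length are illustrative only. It removes the amplitude jump at the cut but does nothing about mismatched pitch or coarticulation, which is why word splicing still sounds wrong.

```python
def crossfade_concat(a, b, overlap):
    """Join waveforms a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b using linear weights."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter length")
    start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # weight of b rises toward 1
        mixed.append((1.0 - w) * a[start + i] + w * b[i])
    return a[:start] + mixed + b[overlap:]

# Two constant "recordings" with an obvious step between them.
joined = crossfade_concat([1.0] * 10, [0.0] * 10, overlap=4)
```

The result is `overlap` samples shorter than the two inputs laid end to end, and the step between them becomes a ramp.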

11 04/08/04 Sound like a person who can read Grapheme to phoneme conversion. Input: text Output: phoneme string + annotations for stress and intonation. Spelling rules get you some of the way, but even in languages with regular spelling (English not among these) exceptions require the use of a dictionary.
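The dictionary-plus-rules arrangement can be sketched as follows. Everything here is a toy assumption: the three-word exception lexicon is hand-written in CMUdict-style ARPAbet symbols, and the letter-to-sound rules are deliberately naive. A real system uses a large lexicon and trained or hand-tuned rules.

```python
# Hypothetical exception dictionary (ARPAbet-style, digits mark stress).
LEXICON = {
    "colonel": ["K", "ER1", "N", "AH0", "L"],
    "yacht": ["Y", "AA1", "T"],
    "one": ["W", "AH1", "N"],
}

# Naive letter-to-sound rules: digraphs first, then single letters.
DIGRAPHS = {"sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}

def to_phonemes(word):
    """Look the word up in the lexicon; fall back to spelling rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.extend(DIGRAPHS[pair])
            i += 2
        else:
            phones.extend(LETTERS.get(word[i], []))  # skip non-letters
            i += 1
    return phones
```

With these tables `to_phonemes("ship")` comes out of the rules alone, while `to_phonemes("yacht")` is only right because the lexicon overrides the rules; without that entry the rules would produce a spelling pronunciation.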

12 04/08/04 Text Normalization Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
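A rough sketch of why this sentence is awkward: the same letters must be read as an ordinal, a cardinal, a letter name, a pronoun or an initial depending on context. The rule below looks only at the preceding word, and the word lists are hypothetical and far too small for real text.

```python
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "X": 10}
CARDINAL = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 10: "ten"}
ORDINAL = {1: "the first", 2: "the second", 3: "the third",
           4: "the fourth", 5: "the fifth", 10: "the tenth"}

# Hypothetical context lists, chosen to cover the example only.
SECTION_WORDS = {"part", "act", "scene", "chapter"}
REGNAL_NAMES = {"henry", "charles", "elizabeth"}

def normalize(text):
    """Expand Roman numerals using the previous word as the only cue."""
    out = []
    prev = ""
    for token in text.split():
        core = token.rstrip(".,;:")
        trail = token[len(core):]
        if core in ROMAN:
            value = ROMAN[core]
            if prev in REGNAL_NAMES:
                token = ORDINAL[value] + trail      # "Henry V"
            elif prev in SECTION_WORDS:
                token = CARDINAL[value] + trail     # "Act II"
            # otherwise leave it: pronoun "I", "Mr. X", and so on
        out.append(token)
        prev = core.lower()
    return " ".join(out)

spoken = normalize("Henry V Part I, Act II scene 11, Mr. X is, "
                   "I believe V.I. Lenin and not Charles I.")
```

On the slide's sentence this gives "Henry the fifth Part one, Act two scene 11, Mr. X is, I believe V.I. Lenin and not Charles the first." The digits and the initials are left for later stages, and a single unlisted name or section word would break it.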

13 04/08/04 Specialized text types Raw address: Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689 Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998
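For text like this, type-specific rules help. As a small sketch, ZIP+4 codes and phone numbers are usually read digit by digit in groups rather than as large numbers. The patterns below assume US formats as they appear in the records above; the names are illustrative, and abbreviations such as "St" (Saint or Street) and "NE" (Northeast or Nebraska) are not handled.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_digits(digits):
    """Read a digit string one digit at a time."""
    return " ".join(DIGIT_WORDS[int(c)] for c in digits if c.isdigit())

PHONE = re.compile(r"\((\d{3})\)\s*(\d{3})-(\d{4})")
ZIP4 = re.compile(r"\b(\d{5})-\s*(\d{4})\b")

def expand_numbers(text):
    """Expand phone numbers and ZIP+4 codes into grouped digit words."""
    text = PHONE.sub(
        lambda m: ", ".join(spell_digits(g) for g in m.groups()), text)
    text = ZIP4.sub(
        lambda m: spell_digits(m.group(1)) + ", " + spell_digits(m.group(2)),
        text)
    return text

example = expand_numbers("TX 71611-5484 (817) 839-3689")
```

The commas mark where a prosody module could place short pauses between digit groups.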

14 04/08/04 SABLE See rinss-slides

15 04/08/04 Sound like you understand Lexical stress and intonation matter very much, and tie in with pragmatics. The system doesn’t in fact understand enough to get this right. Best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.

16 04/08/04 Rumpke Advert Rhetorical Systems Definitely wrong Possibly good enough

17 04/08/04 Multilingual and flexible Festival is open-architecture, and has been extended by lots of people It can even (easily) be made to speak in your voice.

18 04/08/04 Prosody

19 04/08/04 Boston It will be rainy today in Boston

20 04/08/04 Challenges for speech synthesis Improve overall speech quality Refine ways of organizing and collecting speech databases Improve the quality of the control signal

21 04/08/04 Sounds

