04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University.

04/08/04 Issues for text-to-speech: it should sound like a person, AND it should sound like a person who can read, AND it should sound like a person who understands what they are reading.

04/08/04 Credits. FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo. Huang, Acero and Hon: Spoken Language Processing. Many web-based demos: http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html and http://www.icsi.berkeley.edu/eecs225d/klatt.html

04/08/04 Text-to-speech. Text and phonetic analysis: what to say. Prosody: how to say it. Waveform synthesis: making it sound right.

04/08/04 Text and phonetic processing: homographs, letter-to-sound, abbreviations.

04/08/04 Prosody: pauses, pitch, speech rate / relative duration.

04/08/04 Waveform generation. Articulatory synthesis: simulation of the mechanics of speech production. Formant synthesis: source/filter model. Concatenative synthesis: limited-domain waveform concatenation; no waveform modification; with waveform modification.

04/08/04 Waveform generation. Use linear predictive coding (LPC) to analyse the signal into a filter and a residual, then excite the filter with an appropriate residual. Main benefit: compression.
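The filter-plus-residual split can be sketched in a few lines. This is a minimal illustration, assuming the autocorrelation method with the Levinson-Durbin recursion, a prediction order of 4, and a made-up test frame of two sinusoids plus a little noise; none of these choices come from the slides, and a real system would work frame by frame on windowed speech.

```python
import math
import random

def autocorrelation(signal, max_lag):
    """r[lag] = sum of signal[n] * signal[n + lag] over the frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion. Returns a[1..order] such that
    signal[n] is approximated by sum(a[k] * signal[n - k])."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]

def predict(history, coeffs, n):
    """Weighted sum of the samples just before position n."""
    return sum(c * history[n - k]
               for k, c in enumerate(coeffs, start=1) if n - k >= 0)

def analyse(signal, coeffs):
    """Inverse filter: residual = signal minus its linear prediction."""
    return [signal[n] - predict(signal, coeffs, n) for n in range(len(signal))]

def synthesise(residual, coeffs):
    """Excite the all-pole filter with the residual to rebuild the signal."""
    out = []
    for n, e in enumerate(residual):
        out.append(e + predict(out, coeffs, n))
    return out

# Hypothetical test frame (an assumption, not real speech).
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.1 * rng.uniform(-1, 1)
         for n in range(200)]
coeffs = lpc_coefficients(frame, order=4)
residual = analyse(frame, coeffs)
rebuilt = synthesise(residual, coeffs)
```

The residual carries less energy than the original frame, which is where the compression benefit comes from: a handful of filter coefficients plus a low-energy excitation can be coded more cheaply than the raw samples.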

04/08/04 One slide of speech acoustics. Formants: bands of strong energy in the speech signal. Spectrogram: a representation of the relation between time (x), frequency (y) and intensity. The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.
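A toy version of the source-plus-cavity picture: a flat source is passed through a single two-pole resonator, and the output is stronger near the resonance than away from it. This is only a sketch under stated assumptions: the 8 kHz sample rate, the 100 Hz impulse train standing in for the source, and the single formant at 500 Hz with 100 Hz bandwidth are all illustrative values, not taken from the slides.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (assumed)

def impulse_train(pitch_hz, n_samples):
    """Simplified source: one unit pulse per pitch period."""
    period = SAMPLE_RATE // pitch_hz
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(source, centre_hz, bandwidth_hz):
    """Two-pole filter modelling one resonant cavity (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    theta = 2 * math.pi * centre_hz / SAMPLE_RATE
    b1, b2 = 2 * r * math.cos(theta), -r * r
    out = []
    for n, x in enumerate(source):
        y1 = out[n - 1] if n >= 1 else 0.0
        y2 = out[n - 2] if n >= 2 else 0.0
        out.append(x + b1 * y1 + b2 * y2)
    return out

def magnitude_at(signal, freq_hz):
    """Magnitude of the signal's spectrum at a single frequency."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(signal)))

source = impulse_train(100, SAMPLE_RATE)   # one second of pulses
voiced = resonator(source, 500, 100)       # hypothetical formant at 500 Hz

# The source is equally strong at 500 Hz and 1500 Hz; the filtered
# signal is much stronger at 500 Hz, near the resonance.
source_ratio = magnitude_at(source, 500) / magnitude_at(source, 1500)
voiced_ratio = magnitude_at(voiced, 500) / magnitude_at(voiced, 1500)
```

Changing `centre_hz` plays the role of reshaping the cavity: the band of strong energy moves, which is what a spectrogram shows as a moving formant.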

04/08/04 Sound like a person. Get a person to record the whole vocabulary, then splice the words together to make sentences. But speech is hard to cut up in such a way that it sews back together nicely.
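The splicing problem can be seen even on synthetic data. The sketch below joins two made-up "recordings" (plain sinusoids, an assumption standing in for real units) first by butting them together and then with a short linear crossfade. The crossfade removes the click at the join, but it does nothing about mismatched pitch or formants, which is why real concatenative systems need more than this.

```python
import math

def naive_splice(a, b):
    """Butt the two units together with no smoothing."""
    return a + b

def crossfade_splice(a, b, overlap):
    """Blend the last `overlap` samples of a into the first of b."""
    faded = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from near 0 to near 1
        faded.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:len(a) - overlap] + faded + b[overlap:]

def largest_jump(signal):
    """Biggest sample-to-sample step, a crude measure of clicks."""
    return max(abs(y - x) for x, y in zip(signal, signal[1:]))

# Hypothetical units whose waveforms do not line up at the boundary.
unit_a = [math.sin(0.05 * n) for n in range(100)]
unit_b = [math.cos(0.05 * n) for n in range(100)]

rough = naive_splice(unit_a, unit_b)
smooth = crossfade_splice(unit_a, unit_b, overlap=20)
```

Here `largest_jump(rough)` is dominated by the discontinuity at the join, while `largest_jump(smooth)` stays close to the step size of the units themselves.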

04/08/04 Sound like a person who can read. Grapheme-to-phoneme conversion. Input: text. Output: phoneme string plus annotations for stress and intonation. Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these), exceptions require the use of a dictionary.
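A common arrangement, sketched here under heavy simplification, is to look each word up in an exceptions dictionary first and fall back to letter-to-sound rules only when the word is missing. The lexicon entry, the rule tables and the ARPAbet-style symbols below are illustrative assumptions, not the contents of any real system's dictionary.

```python
# Hypothetical exceptions dictionary: words the rules get wrong.
LEXICON = {
    "colonel": ["k", "er1", "n", "ah0", "l"],
}

# Toy letter-to-sound rules: two-letter patterns are tried first.
DIGRAPHS = {"sh": "sh", "th": "th", "ch": "ch", "ph": "f"}
LETTERS = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "aa", "p": "p", "q": "k", "r": "r",
    "s": "s", "t": "t", "u": "ah", "v": "v", "w": "w", "x": "k s",
    "y": "y", "z": "z",
}

def letter_to_sound(word):
    """Rule-based fallback: longest match first, left to right."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters with no rule (digits, hyphens) are skipped.
            phones.extend(LETTERS.get(word[i], "").split())
            i += 1
    return phones

def grapheme_to_phoneme(word):
    """Dictionary lookup with rule-based fallback."""
    return LEXICON.get(word.lower()) or letter_to_sound(word)
```

For a regular word such as "ship" the rules are enough. For "colonel" the rules alone give something like "k aa l aa n eh l", which is why the dictionary entry is needed.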

04/08/04 Text Normalization. Example: "Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I."
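The example sentence is hard because the same characters need different readings: "V" and "I" are ordinals after a monarch's name, cardinals after "Part" or "Act", initials in "V.I.", and "I" is also a pronoun. A minimal sketch of context-dependent expansion follows. The word lists and the rule of looking only at the previous token are assumptions made for illustration, and they would fail on plenty of real text.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "X": 10}
CARDINALS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
             10: "ten", 11: "eleven"}
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth",
            10: "tenth"}
REGNAL_NAMES = {"Henry", "Charles"}        # hypothetical cue list
SECTION_WORDS = {"part", "act", "scene"}   # hypothetical cue list
ABBREVIATIONS = {"Mr.": "Mister"}

def normalize(text):
    """Expand tokens using only the previous token as context."""
    out = []
    previous = ""
    for token in text.split():
        # Separate a word from trailing punctuation; tokens such as
        # "V.I." do not fit this pattern and are kept whole.
        match = re.fullmatch(r"(\w+)([,.;:]*)", token)
        core, trailing = match.groups() if match else (token, "")
        if token in ABBREVIATIONS:
            spoken = ABBREVIATIONS[token]
        elif core in ROMAN and previous in REGNAL_NAMES:
            spoken = "the " + ORDINALS[ROMAN[core]] + trailing
        elif core in ROMAN and previous.lower() in SECTION_WORDS:
            spoken = CARDINALS[ROMAN[core]] + trailing
        elif core.isdigit() and int(core) in CARDINALS:
            spoken = CARDINALS[int(core)] + trailing
        else:
            spoken = token  # no cue: leave the token alone
        out.append(spoken)
        previous = core
    return " ".join(out)

sentence = ("Henry V Part I, Act II scene 11, Mr. X is, "
            "I believe V.I. Lenin and not Charles I.")
spoken = normalize(sentence)
```

With these toy rules the pronoun "I", the letter "X" and the initials "V.I." are left untouched because nothing in the preceding token marks them as numbers.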

04/08/04 Specialized text types. Raw address:
Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689
Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998
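In these records the same abbreviation needs different readings depending on where it sits: "St" is "Saint" before a name and "Street" at the end of the street field, and "NE" is a compass direction in the street field but a state in the state field. The sketch below encodes exactly that positional heuristic. The lookup tables and the rule that a final "St" means "Street" are assumptions for illustration only.

```python
DIRECTIONS = {"NE": "Northeast", "NW": "Northwest",
              "SE": "Southeast", "SW": "Southwest"}
STATES = {"NE": "Nebraska", "TX": "Texas"}  # toy table

def expand_street(field):
    """Expand abbreviations in a street field such as '3337 St Laurence St'."""
    words = field.split()
    out = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        if word == "St":
            # Heuristic: a final 'St' is a street type, otherwise a saint.
            out.append("Street" if is_last else "Saint")
        elif word in DIRECTIONS:
            out.append(DIRECTIONS[word])
        else:
            out.append(word)
    return " ".join(out)

def expand_state(field):
    """Expand the state code that starts a field such as 'NE 98125-5108'."""
    code = field.split()[0]
    return STATES.get(code, code)
```

For example, `expand_street("445 Sycamore Way NE")` reads "NE" as a direction, while `expand_state("NE 98125-5108")` reads the same two letters as a state name.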

04/08/04 SABLE See rinss-slides

04/08/04 Sound like you understand. Lexical stress and intonation matter very much, and tie in with pragmatics. The system does not in fact understand enough to get this right. The best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.

04/08/04 Rumpke Advert; Rhetorical Systems; Definitely wrong; Possibly good enough

04/08/04 Multilingual and flexible. Festival is open-architecture, and has been extended by lots of people. It can even (easily) be made to speak in your voice.

04/08/04 Prosody

04/08/04 Boston: "It will be rainy today in Boston."

04/08/04 Challenges for speech synthesis: improve overall speech quality; refine ways of organizing and collecting speech databases; improve the quality of the control signal.

04/08/04 Sounds
