04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University.

Presentation transcript:


Issues for text-to-speech
– It should sound like a person
– AND it should sound like a person who can read
– AND it should sound like a person who understands what they are reading

Credits
– FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo
– Huang, Acero and Hon: Spoken Language Processing
– Many web-based demos: stuttgart.de/~moehler/synthspeech/examples.html

Text-to-speech
– Text and phonetic analysis: what to say
– Prosody: how to say it
– Waveform synthesis: making it sound right

Text and phonetic processing
– Homographs
– Letter-to-sound
– Abbreviations

Prosody
– Pauses
– Pitch
– Speech rate / relative duration

Waveform generation
– Articulatory synthesis: simulation of the mechanics of speech production
– Formant synthesis: source/filter model
– Concatenative synthesis
   – Limited-domain waveform concatenation
   – No waveform modification
   – With waveform modification

Waveform generation
Use linear predictive coding (LPC) to analyse the signal into a filter and a residual, then excite the filter with an appropriate residual. Main benefit: compression.
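The filter/residual split can be made concrete with a small sketch. It is an illustration only, not the method of any particular synthesizer: it assumes the autocorrelation method with the Levinson-Durbin recursion, and the function names, the test signal and the predictor order of 4 are all choices made for this example.

```python
import math

def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion: returns a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k])."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:]

def residual(signal, coeffs):
    """Analysis: what is left after subtracting the linear prediction."""
    out = []
    for n, x in enumerate(signal):
        predicted = sum(c * signal[n - k]
                        for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x - predicted)
    return out

def synthesize(excitation, coeffs):
    """Synthesis: excite the all-pole filter with a residual signal."""
    out = []
    for n, e in enumerate(excitation):
        fed_back = sum(c * out[n - k]
                       for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + fed_back)
    return out

# Illustrative input: two sinusoids standing in for a voiced speech frame.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
coeffs = lpc_coefficients(signal, order=4)
excitation = residual(signal, coeffs)
rebuilt = synthesize(excitation, coeffs)
```

Exciting the filter with the exact residual rebuilds the original signal. The residual carries less energy than the signal, which is one reason a coder can plausibly spend fewer bits on it.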

One slide of speech acoustics
– Formants: bands of strong energy in the speech signal
– Spectrogram: representation of the relation between time (x), frequency (y) and intensity
The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out stronger and others weaker.

Sound like a person
Get a person to record the whole vocabulary, then splice the words together to make sentences. But: speech is hard to cut up in such a way that it sews back together nicely.
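A minimal sketch of the splicing step, assuming each recorded word is available as a list of samples. The linear crossfade at each join is an assumption made for this example, one simple way to soften the seams; the slides do not say how joins are smoothed, and the function name and overlap length are hypothetical.

```python
def crossfade_concat(units, overlap):
    """Join recorded units, blending `overlap` samples at each boundary
    with a linear crossfade instead of butting them together."""
    if not units:
        return []
    output = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(output), len(unit))
        for i in range(n):
            weight = (i + 1) / (n + 1)       # ramps up towards the new unit
            idx = len(output) - n + i
            output[idx] = (1.0 - weight) * output[idx] + weight * unit[i]
        output.extend(unit[n:])
    return output

# Two toy "recordings": silence followed by a constant tone.
joined = crossfade_concat([[0.0] * 6, [1.0] * 6], overlap=3)
```

A crossfade hides clicks caused by amplitude jumps, but it does nothing about mismatched pitch or formants at the join, which is why cutting and re-sewing speech remains hard.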

Sound like a person who can read
Grapheme-to-phoneme conversion.
Input: text
Output: phoneme string + annotations for stress and intonation.
Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these) exceptions require the use of a dictionary.
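The dictionary-plus-rules arrangement can be sketched as follows. The lexicon, the rule tables and the ARPAbet-style symbols are all invented for this example and are far too small for real use; they only show exception lookup taking priority over letter-to-sound rules.

```python
# Hypothetical exception dictionary; digits mark lexical stress.
LEXICON = {
    "colonel": ["K", "ER1", "N", "AH0", "L"],
}

# Hypothetical, deliberately naive letter-to-sound rules.
DIGRAPHS = {"sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW"}
LETTERS = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}

def letter_to_sound(word):
    """Greedy rule application: two-letter rules first, then single letters."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a rule are skipped.
            phonemes.extend(LETTERS.get(word[i], "").split())
            i += 1
    return phonemes

def to_phonemes(word):
    """Dictionary lookup first; fall back to the spelling rules."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return letter_to_sound(word)
```

With these tables a regular word such as "ship" is handled by the rules alone, while applying the rules to "colonel" gives a pronunciation with an "L" in the middle, so the dictionary entry has to override them.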

Text Normalization
Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
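The example sentence shows why normalization depends on context: the same letters can be an ordinal ("Henry V"), a cardinal ("Act II"), an initial ("V.I."), a placeholder name ("X") or a pronoun ("I"). A rough sketch, assuming a tiny hand-written list of section words, monarch names and abbreviations (all hypothetical and nowhere near complete):

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
CARDINALS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
             6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth",
            6: "sixth", 7: "seventh", 8: "eighth", 9: "ninth", 10: "tenth"}
SECTION_WORDS = {"part", "act", "scene"}     # numeral read as a cardinal
KNOWN_MONARCHS = {"henry", "charles"}        # numeral read as "the <ordinal>"
ABBREVIATIONS = {"Mr.": "Mister"}

def roman_to_int(numeral):
    """Convert a Roman numeral made of I, V and X to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and values[i + 1] > value:
            total -= value                   # subtractive form, e.g. IV, IX
        else:
            total += value
    return total

def normalize(text):
    """Expand abbreviations and Roman numerals using the previous word."""
    out = []
    prev_core = ""
    for token in text.split():
        # Separate the word from any trailing punctuation.
        match = re.match(r"^(.*?)([,.;:!?]*)$", token)
        core, trail = match.group(1), match.group(2)
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[IVX]+", core):
            number = roman_to_int(core)
            if prev_core.lower() in SECTION_WORDS and number in CARDINALS:
                out.append(CARDINALS[number] + trail)
            elif prev_core.lower() in KNOWN_MONARCHS and number in ORDINALS:
                out.append("the " + ORDINALS[number] + trail)
            else:
                out.append(token)            # pronoun "I", "Mr. X", and so on
        else:
            out.append(token)
        prev_core = core
    return " ".join(out)

spoken = normalize("Henry V Part I, Act II scene 11, "
                   "Mr. X is, I believe V.I. Lenin and not Charles I.")
```

The sketch leaves "11" and "V.I." untouched; a fuller normalizer would also need rules for digits and initials, and a much larger name list than this one.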

Specialized text types
Raw address:
Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX (817)
Anderson, W, 445 Sycamore Way NE, Lincoln, NE ,(212)

SABLE
See rinss-slides

Sound like you understand
Lexical stress and intonation matter very much, and tie in with pragmatics. The system doesn’t in fact understand enough to get this right. The best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.

Rumpke Advert
Rhetorical Systems
– Definitely wrong
– Possibly good enough

Multilingual and flexible
Festival has an open architecture and has been extended by lots of people. It can even (easily) be made to speak in your voice.

Prosody

Boston
It will be rainy today in Boston

Challenges for speech synthesis
– Improve overall speech quality
– Refine ways of organizing and collecting speech databases
– Improve the quality of the control signal

Sounds