6/14/20151 Speech Synthesis: Then and Now Julia Hirschberg CS 4706.

Slides:



Advertisements
Similar presentations
Acoustic/Prosodic Features
Advertisements

Speech Synthesis December 4, 2014 Gentle Reminders Final exam: Friday, December 12 th, 3:30 – 5:30 pm In this room! Final exam review: Wednesday, December.
PHONETICS AND PHONOLOGY
General Problems  Foreign language speakers of a target language cause a great difficulty to native speakers because the sounds they produce seems very.
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
Speech Perception Overview of Questions Can computers perceive speech as well as humans? Does each word that we hear have a unique pattern associated.
Introduction to Acoustics Words contain sequences of sounds Each sound (phone) is produced by sending signals from the brain to the vocal articulators.
1 Frequency Domain Analysis/Synthesis Concerned with the reproduction of the frequency spectrum within the speech waveform Less concern with amplitude.
EE 225D, Section I: Broad background Synthesis/vocoding history (chaps 2&3) Recognition history (chap 4) Machine recognition basics (chap 5) Human recognition.
December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.
Bootstrapping a Language- Independent Synthesizer Craig Olinsky Media Lab Europe / University College Dublin 15 January 2002.
CS 4705 Lecture 1 CS4705 Introduction to Natural Language Processing.
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
Introduction to Speech Synthesis ● Key terms and definitions ● Key processes in sythetic speech production ● Text-To-Phones ● Phones to Synthesizer parameters.
Synthetic Audio A Brief Historical Introduction Generating sounds Synthesis can be “additive” or “subtractive” Additive means combining components (e.g.,
Looking at Spectrogram in Praat cs4706, Jan 30 Fadi Biadsy.
Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN Speech and Audio Processing and Recognition 4/27/05.
Back-End Synthesis* Julia Hirschberg (*Thanks to Dan, Jim, Richard Sproat, and Erica Cooper for slides)
6/28/20151 Predicting Phrasing and Accent Julia Hirschberg.
Text-To-Speech Synthesis An Overview. What is a TTS System  Goal A system that can read any text Automatic production of new sentences Not just audio.
Chapter 15 Speech Synthesis Principles 15.1 History of Speech Synthesis 15.2 Categories of Speech Synthesis 15.3 Chinese Speech Synthesis 15.4 Speech Generation.
1 Speech synthesis 2 What is the task? –Generating natural sounding speech on the fly, usually from text What are the main difficulties? –What to say.
Voice Transformations Challenges: Signal processing techniques have advanced faster than our understanding of the physics Examples: – Rate of articulation.
Digital signal Processing Digital signal Processing ECI Semester /2004 Telecommunication and Internet Engineering, School of Engineering, South.
Articulatory Synthesis of Singing Peter Birkholz Institute for Computer Science, University of Rostock Singing Synthesis Challenge 2007 at the Interspeech‘07,
Toshiba Update 04/09/2006 Data-Driven Prosody and Voice Quality Generation for Emotional Speech Zeynep Inanoglu & Steve Young Machine Intelligence Lab.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Arabic TTS (status & problems) O. Al Dakkak & N. Ghneim.
Introduction to Automatic Speech Recognition
Source/Filter Theory and Vowels February 4, 2010.
Speech synthesis Recording and sampling Speech recognition Apr. 5
Speech & Language Modeling Cindy Burklow & Jay Hatcher CS521 – March 30, 2006.
04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University.
Speech & Language Development 1 Normal Development of Speech & Language Language...“Standardized set of symbols and the knowledge about how to combine.
Resonance, Revisited March 4, 2013 Leading Off… Project report #3 is due! Course Project #4 guidelines to hand out. Today: Resonance Before we get into.
Vowels, part 4 March 19, 2014 Just So You Know Today: Source-Filter Theory For Friday: vowel transcription! Turkish, British English and New Zealand.
1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.
Prepared by: Waleed Mohamed Azmy Under Supervision:
Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Bjőrn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto.
Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,
Kishore Prahallad IIIT-Hyderabad 1 Unit Selection Synthesis in Indian Languages (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)
Speech analysis with Praat Paul Trilsbeek DoBeS training course June 2007.
Bernd Möbius CoE MMCI Saarland University Lecture 7 8 Dec 2010 Unit Selection Synthesis B Möbius Unit selection synthesis Text-to-Speech Synthesis.
II. LANGUAGE AND COMMUNICATION DOMAIN I can answer questions and talk with my teacher and friends. I can follow directions. Listening Comprehension Skill.
1 Components of Spoken Dialogue Systems Julia Hirschberg LSA 353.
Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Resonance October 23, 2014 Leading Off… Don’t forget: Korean stops homework is due on Tuesday! Also new: mystery spectrograms! Today: Resonance Before.
Speech Recognition MIT SMA 5508 Spring 2004 Larry Rudolph (MIT)
Speech Synthesis April 12, 2013 Speech Synthesis: A Basic Overview Speech synthesis is the generation of speech by machine. The reasons for studying.
Introduction to Computational Linguistics
Introduction to Computational Linguistics Jay Munson (special thanks to Misty Azara) May 30, 2003.
© 2013 by Larson Technical Services
12/8/20151 Introduction to the Course and to Speech Synthesis Julia Hirschberg.
Introduction to Digital Speech Processing Presented by Dr. Allam Mousa 1 An Najah National University SP_1_intro.
Ways to generate computer speech Record a human speaking every sentence HAL will ever speak (not likely) Make a mathematical model of the human vocal.
Lecture 1 Phonetics – the study of speech sounds
Spectrogram of “Being in a crowd is tiring.” Time Frequency.
IIT Bombay ISTE, IITB, Mumbai, 28 March, SPEECH SYNTHESIS PC Pandey EE Dept IIT Bombay March ‘03.
How can speech technology be used to help people with disabilities?
Mr. Darko Pekar, Speech Morphing Inc.
Phonetics Lauren Dobbs.
Why Study Spoken Language?
Meaningful Intonational Variation
Speech Generation: From Concept and from Text
EXPERIMENTS WITH UNIT SELECTION SPEECH DATABASES FOR INDIAN LANGUAGES
Why Study Spoken Language?
Spoken Language Processing:Summing Up
Artificial Intelligence 2004 Speech & Natural Language Processing
Looking at Spectrogram in Praat cs4706, Jan 30
Presentation transcript:

6/14/20151 Speech Synthesis: Then and Now Julia Hirschberg CS 4706

6/14/20152 Today Then: Early speech synthesizers Now: Overview of Modern TTS Systems Think about: how do we evaluate a synthesizer

6/14/20153 The First ‘Speaking Machine’ Wolfgang von Kempelen, Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine, 1791 (in Deutsches Museum still and playable) First to produce whole words, phrases – in many languages

6/14/20154 Joseph Faber’s Euphonia, 1846

6/14/20155 Constructed 1835 w/pedal and keyboard control –Whispered and ordinary speech –Model of tongue, pharyngeal cavity with manipulable shape –Singing too: “God Save the Queen” Riesz’s 1937 synthesizer with almost natural vocal tract shape Forerunners of Modern Articulatory Synthesis: George Rosen’s DAVO synthesizer (1958) at MIT

6/14/20156

7 World’s Fair in NY, 1939 Requires much training to ‘play’ Purpose: coding/compression –Reduce bandwidth needed to transmit speech, so many phone calls can be sent over single line

6/14/20158

9

10 Answers: –These days a chicken leg is a rare dish. –It’s easy to tell the depth of a well. –Four hours of steady work faced us. ‘Automatic’ synthesis from spectrogram – but can also use hand-painted spectrograms as input Purpose: understand perceptual effect of spectral details

6/14/ Formant/Resonance/Acoustic Synthesis Parametric or resonance synthesis –Specify minimal parameters, e.g. f0 and first 3 formants –Pass electronic source signal thru filter Harmonic tone for voiced sounds Aperiodic noise for unvoiced Filter simulates the different resonances of the vocal tract E.g. –Walter Lawrence’s Parametric Artificial Talker (1953) for vowels and consonants –Gunnar Fant’s Orator Verbis Electris (1953) for vowels –Formant synthesis download (M$demo)Formant synthesis download M$demo

6/14/ Synthesis by Computer Beginnings ~1960; dominant from 1970—

6/14/ Concatenative Synthesis Most common type today First practical application in 1936: British Phone company’s Talking Clock –Optical storage for words, part-words, phrases –Concatenated to tell time E.g. And a ‘similar’ example from Radio Free Vestibule (1994) Bell Labs TTS (1977) (1985)

6/14/ Variants of Concatenative Synthesis Inventory units –Diphone synthesis (e.g. Festival) –Microsegment synthesis –“Unit Selection” – large, variable units Issues –How well do units fit together? –What is the perceived acoustic quality of the concatenated units? –Is post-processing on the output possible, to improve quality?

6/14/ Overview: Synthesizer I/O Front end: From input to control parameters –Acoustic/phonetic representations, naturally occurring text, constrained mark-up language, semantic/conceptual representations Back end: From control parameters to waveform –Articulatory, formant/acoustic, concatenative, (diphone, unit-selection/corpus, HMM) synthesis

TTS Production Levels Knowledge World Knowledge Syntax, semantics, lexicon Phonetics/phonology Acoustics/signal processing Task Text Normalization Pronunciation, intonation assignment Duration, f0, durations Waveform production 6/14/201516

6/14/ Text Normalization Issues Reading is what W. hates most. Reading is what Wilde hated most. The NAACP just elected a new president. In 1996 she sold 2010 shares and deposited $42 in her 401(k). The duck dove supply. Homographs, numbers, abbreviations

Pronunciation Issues Rules for disambiguation in context: bass Lexicon: comb, tomb, Punxsutawney Phil –Letter-to-Sound Rules Hand built Learned from data (pronunciation dictionary) Hard to get good accuracy and coverage – many exceptions –Dictionary of pronunciations More accurate New (Out-of-Vocabulary) words a problem 6/14/201518

6/14/ Intonation Assignment Issues: Phrasing Traditional: hand-built rules –Use punctuation: , New York, NY –Context/function word: no breaks after function word: He went to dinner. He came to and went to dinner. –Syntax: She favors the nuts and bolts approach. She went home and Dave stayed. Current: machine learning on large labeled corpus

6/14/ Intonation Assignment Issues: Accent Hand-built rules –Function/content distinction He went out the back door/He threw out the trash –Complex nominals: Main Street/Park Avenue city hall parking lot (stress shift) Statistical procedures trained on large corpora –Need lots of data –Why learn what you already know?

6/14/ Intonation Assignment Issues: Contours Simple rules –‘.’ = declarative contour –‘?’ = yes-no-question contour unless wh-word present at/near front of sentence Well then, how did he do it? And what do you know? Pretty monotonous in long stretches of speech Problem: no one knows how to assign other contours from text

6/14/ Phonological Specification Issues Task is to produce a phonological representation from phones and intonational assignment Align phones and f0 contour Specify durations and intensity Select/create appropriate acoustic realization from this specification: –Acoustic transformation –Concatenation: diphone, unit selection –HMM

Not Quite There Festival concatenative: Acuvoice concatenative:Acuvoice HMM synthesis (Rob Donovan): Rhetorical unit selection –(acquired by Nuance) AT&T Labs Naturally SpeakingNaturally Speaking 6/14/201523

6/14/ Next Class Project Phase I assigned: building a TTS System Introduction to Festival TTS