5-Text To Speech (TTS) Speech Synthesis


Outline: Speech Synthesis Concept; Phone Units; Phone Sequence to Speech; Speech Naturalness; Concatenative Approaches; Rule-Based Approaches

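Speech Synthesis Concept
Text → Speech
Text → Text to Phone Sequence → Phone Sequence to Speech → Speech
The first stage (text to phone sequence) belongs to Natural Language Processing (NLP). The second stage (phone sequence to speech) belongs to Speech Processing.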
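The two-stage split can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: `LEXICON`, `text_to_phones` and `phones_to_speech` are hypothetical names, the lexicon holds two made-up entries, unknown words fall back to one phone per letter, and the second stage only renders each phone as a short sine tone as a stand-in for a real synthesizer back end.

```python
import math

# Hypothetical two-word lexicon (word -> phone list), for illustration only.
LEXICON = {
    "hello": ["h", "e", "l", "o"],
    "world": ["w", "o", "r", "l", "d"],
}

def text_to_phones(text):
    """Stage 1 (NLP): normalize the text and convert it to a phone sequence."""
    phones = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        # Unknown words fall back to one phone per letter, a crude stand-in
        # for real letter-to-sound rules.
        phones.extend(LEXICON.get(word, list(word)))
    return phones

def phones_to_speech(phones, fs=16000, phone_dur=0.08):
    """Stage 2 (speech processing): placeholder that renders each phone as a
    short sine tone; a real back end would use concatenation or rules."""
    n = int(fs * phone_dur)
    samples = []
    for p in phones:
        freq = 200 + 20 * (ord(p[0]) % 26)   # arbitrary phone-to-pitch mapping
        samples.extend(math.sin(2 * math.pi * freq * i / fs) for i in range(n))
    return samples

phones = text_to_phones("Hello, world!")
print(phones)  # ['h', 'e', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
audio = phones_to_speech(phones)
```

Keeping the two stages as separate functions mirrors the slide: either stage can be replaced (for example, a concatenative or rule-based back end) without touching the other.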

Phone Units (from largest to smallest)
Paragraph
Sentence
Word (depends on the language; usually more than 100,000)
Syllable
Diphone and Triphone
Phoneme (between 10 and 100)

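Phone Units (Cont’d)
Diphone: a unit that models the transition between two adjacent phonemes. For a phoneme sequence p1 p2 p3 p4 p5, each diphone spans the transition from one phoneme into the next (p1-p2, p2-p3, p3-p4, p4-p5).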
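A minimal sketch of how a phoneme sequence maps to diphones and, for comparison, to triphones. The function names and the `left-centre+right` triphone label format are assumptions for illustration. Real systems usually also pad the sequence with a silence symbol so that the first and last phonemes get their own transitions.

```python
def diphones(phonemes):
    """One unit per transition between adjacent phonemes."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

def triphones(phonemes):
    """Each phoneme together with its left and right neighbour."""
    return [f"{left}-{mid}+{right}"
            for left, mid, right in zip(phonemes, phonemes[1:], phonemes[2:])]

seq = ["p1", "p2", "p3", "p4", "p5"]
print(diphones(seq))   # ['p1-p2', 'p2-p3', 'p3-p4', 'p4-p5']
print(triphones(seq))  # ['p1-p2+p3', 'p2-p3+p4', 'p3-p4+p5']
```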

Phone Units (Cont’d)
Farsi has 30 phonemes, so in theory it has 30 × 30 = 900 diphones. In practice, the only diphone that does not occur in Farsi is /zho/.
In theory Farsi has 30 × 30 × 30 = 27,000 triphones, but in practice it has about 15,000.
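The theoretical counts are the number of ordered pairs and ordered triples of phonemes, N² and N³. The sketch below (hypothetical helper names) computes them and shows how diphones that never occur in practice, such as /zho/ in Farsi, could be found by comparing all possible pairs against the pairs observed in a corpus.

```python
from itertools import product

def theoretical_counts(n_phonemes):
    """Upper bounds on the inventory: ordered pairs (diphones) and
    ordered triples (triphones) of phonemes."""
    return n_phonemes ** 2, n_phonemes ** 3

def missing_diphones(phonemes, observed_pairs):
    """Diphones that are possible in theory but absent from the observed data."""
    possible = set(product(phonemes, repeat=2))
    return possible - set(observed_pairs)

print(theoretical_counts(30))  # (900, 27000)
```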

Phone Units (Cont’d)
Syllable = Onset (consonants) + Rhyme. A syllable is a group of phonemes that contains exactly one vowel.
Syllable structures in Farsi: CV, CVC, CVCC. Farsi has about 4,000 syllables.
Syllable structures in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, ... The number of syllables in English is very large.
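The syllable templates can be checked mechanically by mapping each phoneme to C or V. In this sketch the five-vowel set `VOWELS` is a placeholder assumption rather than the real Farsi or English vowel inventory, and `ENGLISH_TEMPLATES` only contains the structures listed above (the English list is open-ended).

```python
# Placeholder vowel set for illustration; a real system would use the
# language's own phoneme inventory.
VOWELS = {"a", "e", "i", "o", "u"}

FARSI_TEMPLATES = {"CV", "CVC", "CVCC"}
ENGLISH_TEMPLATES = {"V", "CV", "CVC", "CCVC", "CCVCC", "CCCVC", "CCCVCC"}

def cv_pattern(phonemes):
    """Map each phoneme to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)

def is_valid_syllable(phonemes, templates):
    """A syllable has exactly one vowel and matches an allowed template."""
    pattern = cv_pattern(phonemes)
    return pattern.count("V") == 1 and pattern in templates

def split_onset_rhyme(phonemes):
    """Onset = consonants before the vowel; rhyme = vowel plus the rest."""
    for i, p in enumerate(phonemes):
        if p in VOWELS:
            return phonemes[:i], phonemes[i:]
    raise ValueError("a syllable must contain a vowel")

word = ["s", "t", "r", "i", "t"]
print(cv_pattern(word))         # CCCVC
print(split_onset_rhyme(word))  # (['s', 't', 'r'], ['i', 't'])
```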

Phone Sequence To Speech
Concatenative approaches: a trade-off between naturalness on one side and memory usage and the variety of desired functions on the other.
Rule-based approaches: the most important rule-based approach is the Klatt method.

Phone Sequence To Speech (Cont’d)
Text → Text to Phone Sequence → Phone Sequence to Primitive Utterance → Primitive Utterance to Natural Speech → Speech
The first stage is handled by NLP; the two later stages are handled by speech processing.

Speech Naturalness
Naturalness requires removing undesirable noise, distortion, and discontinuities from the speech, and generating prosody: speech energy, duration, pitch, intonation, and stress.

Speech Naturalness (Cont’d)
Intonation and stress have a strong effect on speech naturalness.
Intonation: the variation of pitch frequency over the course of an utterance.
Stress: an increase in pitch frequency at a specific point in time.
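A sketch of the two definitions as a frame-level pitch (F0) contour. Intonation is modelled here as a slow linear fall across the utterance, and stress as a local raised-cosine pitch increase around a chosen frame. The function name, the start and end frequencies, the bump size and its width are all arbitrary assumptions, not values from the slides.

```python
import math

def f0_contour(n_frames, f0_start=140.0, f0_end=100.0,
               stressed=(), boost=30.0, width=5):
    """Frame-level pitch contour in Hz.

    Intonation: linear fall from f0_start to f0_end over the utterance.
    Stress: a raised-cosine bump of `boost` Hz centred on each stressed
    frame index, extending `width` frames on either side.
    """
    contour = []
    for t in range(n_frames):
        frac = t / (n_frames - 1) if n_frames > 1 else 0.0
        f0 = f0_start + (f0_end - f0_start) * frac
        for centre in stressed:
            d = abs(t - centre)
            if d <= width:
                f0 += boost * 0.5 * (1 + math.cos(math.pi * d / width))
        contour.append(f0)
    return contour

plain = f0_contour(50)
with_stress = f0_contour(50, stressed=(10,))
```

Comparing `plain` and `with_stress` shows the separation on the slide: the global trend carries the intonation, while the stress only changes the frames near index 10.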

Concatenative Approaches
In these approaches, units of natural speech are stored and used to reconstruct the desired speech. The appropriate phone unit can be selected for synthesis, and compressed parameters can be stored instead of the original waveform.
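A minimal sketch of the concatenation step, assuming the stored units are already available as lists of samples in a dictionary keyed by unit name (a hypothetical `inventory`). Adjacent units are joined with a short linear crossfade to avoid clicks at the boundaries. Real systems also adjust pitch and duration at the joins, which is what methods such as PSOLA are used for.

```python
def concatenate_units(units, inventory, overlap=32):
    """Join stored waveform units end to end.

    units:     sequence of unit names (e.g. diphones)
    inventory: dict mapping unit name -> list of samples (assumed recorded)
    overlap:   number of samples crossfaded at each boundary
    """
    out = []
    for name in units:
        seg = inventory[name]
        if not out:
            out = list(seg)
            continue
        k = min(overlap, len(out), len(seg))
        base = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)            # fade-in weight of the new unit
            out[base + i] = (1 - w) * out[base + i] + w * seg[i]
        out.extend(seg[k:])
    return out

inventory = {"a": [0.0] * 10, "b": [1.0] * 10}
print(len(concatenate_units(["a", "b"], inventory, overlap=4)))  # 16
```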

Concatenative Approaches (Cont’d)
Benefits of storing compressed parameters instead of the original waveform: less memory use; a general representation instead of one specific stored utterance; easier prosody generation.

Concatenative Approaches (Cont’d)
Phone unit vs. type of storage
Phone units: Paragraph, Sentence, Word, Syllable, Diphone, Phoneme
Types of storage: Main Waveform, Coded/Main Waveform, Coded Waveform

Concatenative Approaches (Cont’d)
The Pitch Synchronous Overlap-Add method (PSOLA) is a well-known method for smoothing the transitions between concatenated phonemes. Overlap-add is a standard DSP method, and PSOLA is a basic building block for voice conversion. In the analysis stage, frames are selected synchronously with the pitch marks.
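The analysis/synthesis idea behind PSOLA can be sketched as time-domain PSOLA (TD-PSOLA) for pitch modification. Two-period, Hann-windowed frames are cut around the analysis pitch marks and overlap-added at new synthesis marks whose spacing is the local period divided by `factor`. This is a simplified sketch with a hypothetical function name. It assumes the pitch marks are already known, strictly increasing and at least three in number. It ignores unvoiced segments and keeps roughly the same duration.

```python
import numpy as np

def td_psola_pitch(x, marks, factor):
    """Minimal TD-PSOLA pitch modification.

    x      : 1-D signal (voiced speech)
    marks  : strictly increasing analysis pitch-mark indices (>= 3 marks)
    factor : > 1 raises the pitch, < 1 lowers it
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    y = np.zeros_like(x)
    t = float(marks[1])                      # first synthesis pitch mark
    while t < marks[-2]:
        # Nearest interior analysis mark, so both neighbours exist.
        k = 1 + int(np.argmin(np.abs(marks[1:-1] - t)))
        period = int(marks[k + 1] - marks[k - 1]) // 2   # local pitch period
        start = int(marks[k]) - period
        centre = int(round(t))
        fits_in_x = start >= 0 and start + 2 * period <= len(x)
        fits_in_y = centre - period >= 0 and centre + period <= len(y)
        if fits_in_x and fits_in_y:
            # Two-period, Hann-windowed frame centred on the analysis mark.
            frame = x[start:start + 2 * period] * np.hanning(2 * period)
            y[centre - period:centre + period] += frame
        # Next synthesis mark: spacing = local period / factor.
        t += period / factor
    return y

n = np.arange(8000)
x = np.exp(-(n % 80) / 10.0)        # periodic test signal, period 80 samples
marks = np.arange(0, 8000, 80)
y = td_psola_pitch(x, marks, 2.0)    # output period becomes 40 samples
```

Lowering `factor` below 1 spaces the synthesis marks further apart and lowers the pitch. A duration change would instead repeat or skip analysis frames while keeping their spacing.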

Rule-Based Approach Stages
1. Determine the speech model and its parameters.
2. Determine the type of phone units.
3. Determine the parameter values for each phone unit.
4. Replace the sequence of phone units with the equivalent parameter sequence.
5. Feed the parameter sequence into the speech model.
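A sketch of the third and fourth stages under stated assumptions. The phone units are three vowels, the parameters are the first three formant frequencies, and the values in the hypothetical `FORMANT_TARGETS` table are rough illustrative numbers rather than data from the slides. Each phone is replaced by its parameter vector for a fixed number of frames, with a short linear transition into the next phone. The resulting track is what would then be fed to the speech model.

```python
# Illustrative F1, F2, F3 targets in Hz (not from the slides).
FORMANT_TARGETS = {
    "a": (700, 1220, 2600),
    "i": (280, 2250, 2890),
    "u": (310, 870, 2250),
}

def parameter_track(phones, frames_per_phone=10, transition=3):
    """Replace each phone with its parameter vector, one vector per frame.

    The first (frames_per_phone - transition) frames hold the phone's own
    targets; the last `transition` frames move linearly toward the next
    phone's targets. The final phone simply holds its targets.
    """
    steady = frames_per_phone - transition
    track = []
    for idx, phone in enumerate(phones):
        cur = FORMANT_TARGETS[phone]
        nxt = FORMANT_TARGETS[phones[idx + 1]] if idx + 1 < len(phones) else cur
        for f in range(frames_per_phone):
            if f < steady:
                track.append(cur)
            else:
                w = (f - steady + 1) / (transition + 1)   # 0 < w < 1
                track.append(tuple(c + w * (n - c) for c, n in zip(cur, nxt)))
    return track

track = parameter_track(["a", "i"])
print(len(track))  # 20
```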

KLATT 80 Model

KLATT 88 Model

THE KLSYN88 CASCADE/PARALLEL FORMANT SYNTHESIZER (block diagram)
Laryngeal sound sources: the old KLSYN impulsive source (filtered impulse train), the KLGLOTT88 model (default), and the modified LF model; controls F0, AV, OQ, SQ, FL, DI, SS, CP; spectral tilt low-pass resonator (TL); aspiration noise generator (AH).
Cascade vocal tract model: tracheal pole-zero pair (FTP, FTZ, BTP, BTZ), nasal pole-zero pair (FNP, FNZ, BNP, BNZ), and first to fifth formant resonators (F1-F5, B1-B5, with DF1 and DB1 on the first formant).
Parallel vocal tract model, laryngeal sound sources (normally not used): first difference preemphasis, nasal formant resonator (ANV), tracheal formant resonator (ATV), first to fourth formant resonators (A1V-A4V).
Parallel vocal tract model, frication sound sources: frication noise generator (AF) feeding the second to sixth formant resonators (A2F-A6F, B2F-B6F, F6) and a bypass path (AB).
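The building block of the cascade branch (F1-F5 with bandwidths B1-B5) is a second-order digital resonator. The sketch below uses the standard difference equation of Klatt-style formant synthesizers, y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with B and C set from the formant frequency and bandwidth and A chosen for unity gain at 0 Hz. Only the cascade of formant resonators is shown. The pole-zero pairs, the parallel branch and the sources are left out, and the function names are hypothetical.

```python
import math

def resonator(x, freq, bw, fs):
    """Second-order digital resonator:
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    A = 1.0 - B - C                      # unity gain at 0 Hz
    y1 = y2 = 0.0                        # y[n-1], y[n-2]
    out = []
    for s in x:
        y = A * s + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade_vocal_tract(source, formants, fs):
    """Pass the source through the (frequency, bandwidth) resonators in series."""
    signal = list(source)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

fs = 10000
source = [1.0 if i % 100 == 0 else 0.0 for i in range(2000)]   # 100 Hz impulses
formants = [(500, 60), (1500, 90), (2500, 150), (3500, 200), (4500, 250)]
speech = cascade_vocal_tract(source, formants, fs)
```

The formant and bandwidth values in `formants` are placeholders, not values from the Klatt papers. In a full synthesizer they would be updated every few milliseconds from the parameter track.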

Three Voicing Source Models in KLATT 88
The old KLSYN impulsive source
The KLGLOTT88 model
The modified LF model
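Two of the three sources can be sketched simply. The impulsive source emits one unit impulse per pitch period. For KLGLOTT88, the glottal flow during the open phase is usually described as the cubic polynomial u(t) = a·t² - b·t³, and it is zero during the closed phase. In this sketch a and b are chosen so the flow returns to zero at the end of the open phase (open quotient `oq`) and peaks at `av`. That normalization, the function names, and the omission of the spectral-tilt filter (TL) are assumptions. The modified LF model is not shown.

```python
def impulse_source(f0, fs, n):
    """Impulsive source: one unit impulse per pitch period, n samples long."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def polynomial_glottal_pulse(f0, fs, oq=0.5, av=1.0):
    """One period of a KLGLOTT88-style polynomial glottal-flow pulse.

    u(t) = a*t^2 - b*t^3 during the open phase (0 <= t < oq*T0), 0 after.
    u(t) = t^2 (a - b t) is zero at t = a/b = te and peaks at t = 2*te/3
    with value 4*a*te^2/27, so a = 27*av/(4*te^2) and b = a/te.
    """
    t0 = 1.0 / f0
    te = oq * t0                         # open-phase duration
    a = 27.0 * av / (4.0 * te ** 2)
    b = a / te
    n = int(round(fs * t0))
    pulse = []
    for i in range(n):
        t = i / fs
        pulse.append(a * t * t - b * t ** 3 if t < te else 0.0)
    return pulse

src = impulse_source(100, 10000, 300)
pulse = polynomial_glottal_pulse(100, 10000, oq=0.5, av=1.0)
```

Repeating `pulse` once per period gives a smoother, more natural-sounding excitation than the impulse train, which is the motivation for offering several source models.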