© 2013 by Larson Technical Services


Outline
Grammar-based speech recognition
Statistical language model-based recognition
Speech Synthesis
Dialog Management
Natural Language Processing

Speech Synthesis (Text-To-Speech, TTS)
Each processing stage draws on its own knowledge source:
Structure Analysis, using Structure Rules
Text Normalization, using an Abbreviation and Acronym Database
Text-to-Phoneme Conversion, using a Pronunciation Lexicon
Prosody Analysis, using Prosody Rules
Waveform Production, using a Phoneme-to-Sound Database
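The five stages can be sketched as a chain of functions, each consulting its own knowledge source. This is a toy built on heavy assumptions. The resources, the "St." abbreviation, the phoneme labels, and every function name are hypothetical. Waveform production only concatenates unit labels instead of producing audio.

```python
# Toy knowledge sources (hypothetical values), one per pipeline stage.
STRUCTURE_RULES = (".", "?", "!")  # marks that may end a sentence
ABBREVIATION_DB = {"St.": "street"}
PRONUNCIATION_LEXICON = {
    "main": ["m", "ey", "n"],
    "street": ["s", "t", "r", "iy", "t"],
}
PROSODY_RULES = {".": "falling", "!": "falling", "?": "rising"}
PHONEME_TO_SOUND_DB = {p: f"[{p}]" for p in ["m", "ey", "n", "s", "t", "r", "iy"]}


def structure_analysis(text):
    """Group tokens into sentences; a known abbreviation ends one only at the end of the text."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        is_last = i == len(tokens) - 1
        if token.endswith(STRUCTURE_RULES) and (is_last or token not in ABBREVIATION_DB):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def text_normalization(tokens):
    """Expand abbreviations, then strip punctuation and lowercase."""
    return [ABBREVIATION_DB.get(t, t).strip(".?!,;").lower() for t in tokens]


def text_to_phoneme(words):
    """Look each word up in the lexicon (raises KeyError for unknown words)."""
    return [p for w in words for p in PRONUNCIATION_LEXICON[w]]


def prosody_analysis(tokens):
    """Choose a final pitch contour from the sentence-final mark."""
    return PROSODY_RULES.get(tokens[-1][-1], "falling")


def waveform_production(phonemes, contour):
    """Stand-in for audio: concatenate unit labels and note the contour."""
    return "".join(PHONEME_TO_SOUND_DB[p] for p in phonemes) + f" ({contour})"


def synthesize(text):
    """Run every sentence through the five stages in order."""
    results = []
    for tokens in structure_analysis(text):
        words = text_normalization(tokens)
        phonemes = text_to_phoneme(words)
        contour = prosody_analysis(tokens)
        results.append(waveform_production(phonemes, contour))
    return results
```

For example, `synthesize("Main street? Main St.")` yields two sentences. The first gets a rising contour and the second a falling one. A real engine replaces each toy resource with large rule sets and databases.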

Concatenated vs. Parameter-based Speech Synthesis
Concatenated: isolate phonemes from recorded speech (“The dog barked”: dh eh d ao g b ah er k eh d), then concatenate them to produce new words (“red car”: er eh d k ah er).
Parameter-based: generate speech for the same phonemes (“red car”: er eh d k ah er) from voice parameters.
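Concatenated synthesis can be sketched in a few lines. Phoneme units are isolated from a segmented recording of "The dog barked", and the units for "red car" (er eh d k ah er) are then joined. This rests on assumptions. The segmentation is given rather than computed. The "samples" are placeholder numbers, not audio. Keeping only the first recorded unit per phoneme is a simplification of real unit selection.

```python
# Hypothetical segmented recording of "The dog barked": (phoneme, samples).
RECORDING = [
    ("dh", [0.1, 0.2]), ("eh", [0.3, 0.1]), ("d", [0.0, 0.4]),
    ("ao", [0.5, 0.2]), ("g", [0.1, 0.1]), ("b", [0.2, 0.0]),
    ("ah", [0.3, 0.3]), ("er", [0.4, 0.1]), ("k", [0.2, 0.2]),
    ("eh", [0.3, 0.2]), ("d", [0.1, 0.3]),
]


def build_inventory(segments):
    """Isolate phonemes: keep the first recorded unit for each label."""
    inventory = {}
    for label, samples in segments:
        inventory.setdefault(label, samples)
    return inventory


def concatenate(phonemes, inventory):
    """Join stored units in order; fail if a phoneme was never recorded."""
    missing = [p for p in phonemes if p not in inventory]
    if missing:
        raise KeyError(f"no recorded unit for {missing}")
    waveform = []
    for p in phonemes:
        waveform.extend(inventory[p])
    return waveform


inventory = build_inventory(RECORDING)
red_car = concatenate(["er", "eh", "d", "k", "ah", "er"], inventory)
```

The `KeyError` path shows the main limitation of concatenation: it can only say what its recordings cover. A parameter-based synthesizer would instead compute each phoneme's sound from voice parameters.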

Speech Synthesis Markup Language (SSML)
Stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
Structure Analysis. Markup support: p, s. Non-markup behavior: infer structure by automated text analysis.
Critic: “I don’t want to use TTS; it’s too difficult to understand.”
Response: Developers can play prerecorded audio files, use TTS, or combine both. Developers can rely on the TTS engine’s defaults or specify commands that override them.

Before and after Structure Analysis
Before structure analysis:
Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
After structure analysis:
<p>
  <s>Dr. Smith lives at 214 Elm Dr.</s>
  <s>He weighs 214 lb.</s>
  <s>He plays bass guitar.</s>
  <s>He also likes to fish; last week he caught a 19 lb. bass.</s>
</p>
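Structure analysis for this example can be approximated with a small rule set, and its output wrapped in SSML `p` and `s` elements. This is a sketch under assumptions. The rule is hypothetical: a period-final token ends a sentence when the next token is capitalized, unless the token is a title abbreviation that opens the sentence ("Dr. Smith"). The function names and the abbreviation set are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure rule: these abbreviations usually precede a name.
TITLE_ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms."}


def split_sentences(text):
    """Toy structure analysis: group whitespace-separated tokens into sentences.

    A token ending in '.', '?' or '!' closes the sentence when it is the last
    token or the next token is capitalized, unless it is a title abbreviation
    opening the sentence (as in "Dr. Smith").
    """
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith((".", "?", "!")):
            continue
        if token in TITLE_ABBREVIATIONS and len(current) == 1:
            continue
        if i == len(tokens) - 1 or tokens[i + 1][0].isupper():
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def to_ssml_paragraph(sentences):
    """Wrap each sentence in <s> inside a single <p> element."""
    paragraph = ET.Element("p")
    for sentence in sentences:
        ET.SubElement(paragraph, "s").text = sentence
    return ET.tostring(paragraph, encoding="unicode")


TEXT = ("Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. "
        "He also likes to fish; last week he caught a 19 lb. bass.")
print(to_ssml_paragraph(split_sentences(TEXT)))
```

The rule handles this paragraph, including "19 lb. bass", where the lowercase word that follows keeps the sentence open. It fails on a mid-sentence title such as "I met Dr. Smith". That kind of ambiguity is why engines rely on richer structure rules, and why authors may mark `p` and `s` explicitly.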

Speech Synthesis Markup Language (SSML), continued
Text Normalization. Markup support: say-as for dates, times, etc.; sub for aliasing. Non-markup behavior: automatically identify and convert constructs.

After Text Normalization
<p>
  <s><sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub></s>
  <s>He weighs 214 <sub alias="pounds">lb.</sub></s>
  <s>He plays bass guitar.</s>
  <s>He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass.</s>
</p>
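The `sub` aliases above can be generated from an abbreviation database plus context rules. In this sketch both rules are hypothetical and tuned to this example. "Dr." before a capitalized word is read as "doctor" and otherwise as "drive". "lb." followed by another word is read as "pound" (a modifier, as in "19 lb. bass") and at the end of a sentence as "pounds". The function names are also hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def expand_abbreviation(token, next_token):
    """Return a spoken alias for a known abbreviation, or None.

    Hypothetical context rules for this example only.
    """
    if token == "Dr.":
        # Before a capitalized word: a title; otherwise assume a street suffix.
        return "doctor" if next_token and next_token[0].isupper() else "drive"
    if token == "lb.":
        # Followed by a noun it is a modifier (singular); sentence-final, plural.
        return "pound" if next_token else "pounds"
    return None


def normalize_sentence(sentence):
    """Wrap abbreviations in <sub alias="..."> and return one <s> element."""
    tokens = sentence.split()
    out = []
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        alias = expand_abbreviation(token, next_token)
        if alias is None:
            out.append(escape(token))
        else:
            out.append(f"<sub alias={quoteattr(alias)}>{escape(token)}</sub>")
    return "<s>" + " ".join(out) + "</s>"
```

The input is one sentence already produced by structure analysis. Getting sentence boundaries right first is what lets the "sentence-final" rule for "lb." work.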

Speech Synthesis Markup Language (SSML), continued
Text-to-Phoneme Conversion. Markup support: phoneme, say-as. Non-markup behavior: look up words in a pronunciation dictionary.

After Text-to-Phoneme Conversion
<p>
  <s><sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub></s>
  <s>He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub></s>
  <s>He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar.</s>
  <s>He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>.</s>
</p>
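The two readings of "bass" above could be chosen automatically by comparing each reading's cue words with the words of the sentence. Only the IPA strings come from the slide. The cue-word lists, the overlap-count rule, and the function name are hypothetical, and this is a crude stand-in for the part-of-speech and context analysis a real engine performs.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical homograph entries: (IPA from the slide, assumed cue words).
HOMOGRAPHS = {
    "bass": [
        ("beɪs", {"guitar", "plays", "player", "music"}),  # the instrument
        ("bæs", {"fish", "caught", "catch", "lake"}),      # the fish
    ],
}


def mark_homographs(sentence):
    """Wrap known homographs in <phoneme> using the best-matching reading."""
    words = sentence.split()
    context = {w.strip(".,;?!").lower() for w in words}
    out = []
    for word in words:
        core = word.rstrip(".,;?!")
        trailing = word[len(core):]
        readings = HOMOGRAPHS.get(core.lower())
        if readings is None:
            out.append(escape(word))
            continue
        # Pick the reading whose cue words overlap the sentence the most.
        ipa, _ = max(readings, key=lambda reading: len(reading[1] & context))
        out.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(core)}</phoneme>{trailing}')
    return " ".join(out)
```

On ties, `max` keeps the first listed reading, so the order of entries acts as a default.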

Pronunciation Specification
Within the text: replace “creek” with “krik”.
With the phoneme element: <phoneme alphabet="ipa" ph="krik">creek</phoneme>
In the pronunciation lexicon: <lexeme> <grapheme>creek</grapheme> <phoneme>krik</phoneme> </lexeme>
Designers may prefer a particular pronunciation for some words, e.g., creek, aluminum.
Phonetic spellings sometimes don’t have the desired effect.
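A lexeme like the one above normally lives in a W3C Pronunciation Lexicon Specification (PLS) document. An SSML document can reference that file with `<lexicon uri="..."/>`. This sketch builds such a lexicon with Python's standard library. The function name and the `en-US` default are assumptions, and the document contains only the entries passed in.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(entries, lang="en-US"):
    """Return a PLS 1.0 document mapping each grapheme to an IPA phoneme string."""
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


print(build_lexicon({"creek": "krik"}))
```

Putting the pronunciation in a lexicon applies it everywhere the word appears. The inline `phoneme` element affects only one occurrence.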

Speech Synthesis Markup Language (SSML), continued
Prosody Analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody by analyzing document structure and sentence syntax.

Prosody Analysis (initial text)
<prompt> Environmental control menu. Do you want to adjust the lighting or temperature? </prompt>

Prosody Analysis (after markup)
<prompt>
  Environmental control menu <break/>
  <emphasis level="reduced">do you want to adjust the</emphasis>
  <emphasis level="strong">lighting</emphasis> <break/>
  or <emphasis level="strong">temperature?</emphasis>
</prompt>
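The marked-up prompt follows a pattern that extends to any menu. The pattern is a pause after the menu title, a reduced carrier phrase, strong emphasis on each option, and a pause before the final "or". The sketch below generates it. The function name, the default carrier phrase, and the handling of more than two options are assumptions. It builds the element tree so the output is always well-formed.

```python
import xml.etree.ElementTree as ET


def menu_prompt(title, options, carrier="do you want to adjust the"):
    """Build a prompt with a reduced carrier phrase and strongly emphasized options.

    A break follows the title, a break precedes each later option, and "or"
    precedes the final option, mirroring the hand-marked example.
    """
    prompt = ET.Element("prompt")
    prompt.text = title + " "
    ET.SubElement(prompt, "break")
    phrase = ET.SubElement(prompt, "emphasis", level="reduced")
    phrase.text = carrier
    phrase.tail = " "
    for i, option in enumerate(options):
        is_last = i == len(options) - 1
        if i > 0:
            pause = ET.SubElement(prompt, "break")
            pause.tail = " or " if is_last else " "
        word = ET.SubElement(prompt, "emphasis", level="strong")
        word.text = option + ("?" if is_last else "")
        word.tail = "" if is_last else " "
    return ET.tostring(prompt, encoding="unicode")


print(menu_prompt("Environmental control menu", ["lighting", "temperature"]))
```

The spaces stored in the text and tail fields keep words from running together across element boundaries.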

Speech Synthesis Markup Language (SSML): summary
Structure Analysis. Markup support: paragraph (p), sentence (s). Non-markup behavior: infer structure by automated text analysis.
Text Normalization. Markup support: say-as for dates, times, etc.; sub for aliasing. Non-markup behavior: automatically identify and convert constructs.
Text-to-Phoneme Conversion. Markup support: phoneme, say-as. Non-markup behavior: look up words in a pronunciation dictionary.
Prosody Analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody by analyzing document structure and sentence syntax.
Waveform Production. Markup support: voice, audio (for audio icons, branding, advertising).

Prerecorded messages vs. Speech Synthesis
Prerecorded messages         | Speech Synthesis (TTS)
Natural sounding             | Artificial sounding
Easy to understand           | May be difficult to understand
Static data                  | Computer-generated data
Tedious to record and tag    | Easy to specify
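The comparison suggests mixing the two approaches. Use a prerecorded clip for the static phrase and TTS for the computer-generated data. SSML supports this directly: the text inside an `audio` element is spoken if the clip cannot be played. The sketch below builds such a `speak` document. The file name, the sample wording, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mixed_prompt(clip_uri, clip_text, dynamic_text, lang="en-US"):
    """Play a prerecorded clip for the static phrase, then synthesize the data.

    clip_text is the fallback that TTS speaks if the clip cannot be played.
    """
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    audio = ET.SubElement(speak, f"{{{SSML_NS}}}audio", {"src": clip_uri})
    audio.text = clip_text
    audio.tail = " " + dynamic_text  # synthesized after the clip
    return ET.tostring(speak, encoding="unicode")


print(mixed_prompt("balance_intro.wav", "Your account balance is", "one hundred twelve dollars."))
```

The static phrase keeps the natural sound of a recording. The balance, which changes with every call, is left to the synthesizer.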