Development Challenges of Multilingual Text-to-Speech Systems
Kimmo Pärssinen
Public. © 2005 Nokia

Presentation transcript:

Slide 1: Development Challenges of Multilingual Text-to-Speech Systems
Kimmo Pärssinen, Nokia Technology Platforms / Audio Applications

Slide 2: Outline
- Introduction
- SSML extensions
- Conclusion

Slide 3: Background
Audio Applications, Nokia Specific Symbian Software:
- Working on ASR and TTS
- Also implementing other audio applications and features for Nokia phones
- Group of ~20 people

Slide 4: Text-to-Speech in Mobile Phones
- Nokia Series 60 phones currently support over 40 languages and have some voice user interface built in.
- One example is the speaker-independent name dialing (SIND) system.
- In SIND, the user says a command and hears feedback from a Text-to-Speech system.
- Both the recognizer and the synthesizer in SIND have been internationalized for all Series 60 languages.
- The benefit is that users can use their own mother tongue.
- It is important to be able to provide all customers with the same features regardless of their native tongue.
- The newly launched Nokia 5500 phone (and the E50 phone) also has a more advanced, unit-selection Text-to-Speech system built in.
- It can be used for reading text messages and for giving the user more complex feedback than the formant synthesizer used in SIND allows.

Slide 5: Text-to-Speech in Mobile Phones
- Internationalization (or localization) of a TTS system is a time-consuming process.
- Development requires knowledge of the human speech production system and of the language being developed, as well as good software skills.
- There are between 3,000 and 8,000 languages in the world, and it is impossible to build a native TTS system for all of them.
- If SSML could be used to specify how foreign, "unsupported" words should be pronounced and handled, applications using TTS would benefit.

Slide 6: Use of SSML for Multilingual Text-to-Speech
Extension to the "language element":
- The element could contain a list of languages instead of a single language.
- The additional languages would act as fallback languages in case the first choice is not supported by the system.
- For example, an Italian synthesizer can pronounce Finnish better than an English system can, and a Finnish synthesizer can synthesize Estonian quite well.
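The fallback idea on this slide can be illustrated with a minimal sketch. It assumes the language list arrives as a space-separated string of language tags in order of preference; that format, the function name pick_language, and the example engine are hypothetical and are not part of SSML 1.0 or of any Nokia system.

```python
# Hypothetical sketch: choose a synthesis language from an ordered fallback list.
# The space-separated list format is an assumption, not part of SSML 1.0.

def pick_language(requested, supported):
    """Return the first requested language the engine supports, or None."""
    supported_lower = {lang.lower() for lang in supported}
    for lang in requested.split():
        if lang.lower() in supported_lower:
            return lang
    return None


# An imaginary engine with Finnish and Italian voices is asked to speak
# Estonian text, with Finnish and then English listed as fallbacks.
engine_languages = ["fi", "it"]
chosen = pick_language("et fi en", engine_languages)
print(chosen)
```

Here the engine has no Estonian support, so the first supported entry in the list (Finnish) is chosen, which matches the slide's Finnish-for-Estonian example.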

Slide 7: Use of SSML for Multilingual Text-to-Speech
Element "read":
- This element could be used to control the selection of the pre-processor in a TTS system.
- If, for example, a Finnish word appears in the middle of English text, it would be processed with a Finnish pre-processor to get the correct pronunciation.
- The actual TTS voice would remain the same.
- The read element would separate the spoken language from the read language.
- The benefit is that the voice of the system stays the same, and the user of the new tag needs no knowledge of how the word should be pronounced.
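One way to picture the proposed read element is as a routing step in front of the synthesizer. The sketch below is only an illustration under stated assumptions: the markup, the attribute name lang on read, and the function plan_segments are all hypothetical, since the slide does not define a syntax and read is not an SSML 1.0 element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <read> is the extension proposed on the slide,
# not an SSML 1.0 element; the attribute name "lang" is assumed here.
SSML = (
    '<speak xml:lang="en">I will call '
    '<read lang="fi">Pärssinen</read> tomorrow.</speak>'
)

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def plan_segments(ssml):
    """Return (voice_language, preprocessor_language, text) triples."""
    root = ET.fromstring(ssml)
    voice = root.get(XML_LANG)
    segments = []
    if root.text and root.text.strip():
        segments.append((voice, voice, root.text.strip()))
    for child in root:
        # Only the pre-processor language changes inside <read>;
        # the voice language stays the same for every segment.
        pre = child.get("lang", voice) if child.tag == "read" else voice
        if child.text and child.text.strip():
            segments.append((voice, pre, child.text.strip()))
        if child.tail and child.tail.strip():
            segments.append((voice, voice, child.tail.strip()))
    return segments


for segment in plan_segments(SSML):
    print(segment)
```

Each triple keeps the voice language fixed and varies only the pre-processor language, which is the separation of spoken and read language that the slide describes.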

Slide 8: Conclusions
- A read element to separate spoken and written language
- Don't change the voice (speaker)
- A list of preferred languages (fallback languages)
- A lexicon element (PLS, by Paolo Baggia), for example containing rules for abbreviations (SMS)
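The last point, lexicon rules for SMS abbreviations, could work roughly as in the sketch below. It uses a plain Python dictionary in place of a real PLS document, and the entries and the function name expand_abbreviations are invented for illustration only.

```python
import re

# Hypothetical SMS abbreviation lexicon; the entries are illustrative only
# and stand in for what a PLS lexicon document might provide.
SMS_LEXICON = {"brb": "be right back", "gr8": "great", "thx": "thanks"}


def expand_abbreviations(text, lexicon):
    """Replace each known abbreviation with its spoken form before synthesis."""

    def replace(match):
        word = match.group(0)
        # Look up case-insensitively; unknown words pass through unchanged.
        return lexicon.get(word.lower(), word)

    return re.sub(r"[A-Za-z0-9]+", replace, text)


print(expand_abbreviations("Thx, that was gr8!", SMS_LEXICON))
```

A real system would run such expansion in the text pre-processor, so the synthesizer only ever sees full words.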