Pronunciation Lexicon Background Paolo Baggia, Loquendo W3C SSML Workshop Beijing – 2-3 Nov 2005.

2 W3C SSML workshop 2-3 Nov 05 - Beijing
Overview
– Introduction to Pronunciation Lexicon
– Pronunciation Alphabets
– The PLS language
– Issues for the workshop

3 Introduction to Pronunciation Lexicon Specification
The PLS spec is about pronunciation lexicons:
– How to pronounce words and phrases
– How to deal with the variability of pronunciations by country, region, speaker, etc.
– How to expand abbreviations and acronyms
Two main uses:
– Speech Synthesis (SSML documents)
– Speech Recognition (SRGS grammars)
Other uses are possible (a lexicon embedded in or referenced from other markup).

4 The TTS perspective
A TTS engine's job is to transform input text into speech. This involves a lot of processing, including:
– Text normalization
– Word pronunciation (lexical stress, phonetic transcription)
– Sentence structure (intonation, rhythm)
– Sentence-level modification of the phonetic transcription (co-articulation)
– Computation of prosodic parameters
– Generation of the acoustic signal
SSML documents let authors enhance TTS output by acting, through SSML markup elements, on several of these processing levels.
PLS complements SSML at the text normalization and phonetic transcription levels.
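As a rough illustration of where a lexicon plugs into the first two stages, here is a minimal Python sketch. The in-memory tables `ALIASES` and `PHONEMES` and the `letter_to_sound` fallback are hypothetical stand-ins: a real engine would load its entries from PLS documents and has far richer normalization and letter-to-sound rules.

```python
import re

# Hypothetical in-memory tables standing in for loaded PLS lexicons:
# abbreviation -> expansion (like <alias>) and word -> IPA (like <phoneme>).
ALIASES = {"W3C": "World Wide Web Consortium"}
PHONEMES = {"sepulveda": "səˈpʌlvɪdə", "huge": "hjuːʤ"}


def letter_to_sound(word: str) -> str:
    """Hypothetical stand-in for the engine's built-in rules: just tags the word."""
    return f"<lts:{word}>"


def normalize(text: str) -> list[str]:
    """Text normalization: split into word tokens and expand known abbreviations."""
    tokens = []
    for token in re.findall(r"\w+", text):
        expansion = ALIASES.get(token)
        tokens.extend(expansion.split() if expansion else [token])
    return tokens


def transcribe(tokens: list[str]) -> list[str]:
    """Phonetic transcription: the lexicon is consulted before the default rules."""
    return [PHONEMES.get(t.lower(), letter_to_sound(t)) for t in tokens]


if __name__ == "__main__":
    words = normalize("W3C met on Sepulveda")
    print(words)
    print(transcribe(words))
```

The design point is the lookup order: lexicon entries override the engine's default rules, which is the kind of change a pronunciation lexicon lets an application make without touching the engine itself.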

5 An SSML example document
This is a simple SSML document:
The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Roberto Benigni.
This is an enhancement of the same example:
The title of the movie is: La vita è bella (Life is beautiful), which is directed by Roberto Benigni.

6 An SSML example with PLS
This is a simple SSML document that references an external pronunciation lexicon:
The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Roberto Benigni.
– PLS factors all the pronunciation changes out into an external document
– The TTS engine loads the PLS document(s) and applies them transparently to the SSML document
– An application may define contextual PLS documents to be used at different points of the interaction
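The SSML markup of this example is not preserved in the transcript. As a sketch only, the following Python code builds an SSML 1.0 document whose `<speak>` root starts with a `<lexicon>` reference. The lexicon URI `http://example.com/movies.pls` and the helper name `build_ssml` are hypothetical; `application/pls+xml` is the PLS media type.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(text: str, lexicon_uri: str, lang: str = "en-US") -> str:
    """Wrap plain text in an SSML <speak> root that references a PLS lexicon."""
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    # The <lexicon> reference is placed first, ahead of any text to be spoken.
    ET.SubElement(speak, f"{{{SSML_NS}}}lexicon",
                  {"uri": lexicon_uri, "type": "application/pls+xml"})
    sentence = ET.SubElement(speak, f"{{{SSML_NS}}}s")
    sentence.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml('The title of the movie is: "La vita è bella" '
                     "(Life is beautiful), which is directed by Roberto Benigni.",
                     "http://example.com/movies.pls"))
```

The text itself stays untouched: all pronunciation fixes live in the referenced lexicon, which is what lets the same document be read with different, contextual lexicons.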

7 The ASR perspective
An ASR engine's job is to transform an audio signal into text or into a semantic representation of the meaning of the sentence.
Using SRGS grammars constrains the sentences to be recognized and improves ASR performance.
PLS further improves ASR performance by allowing multiple pronunciations of words, phrases and abbreviations, and by supporting text normalization.

8 An SRGS example grammar
This is a very simple SRGS grammar: its <grammar> element declares xml:lang="en-US", version="1.0", root="city_state" and mode="voice", and its rules list the cities Boston, Miami and Fargo and the states Florida, North Dakota and Massachusetts.
The grammar recognizes sentences like:
– "Boston Massachusetts" or "Miami Florida"
but, because any city can be followed by any state, also:
– "Boston Florida" or "Fargo Massachusetts"
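The grammar's markup is only partly preserved. The sketch below reconstructs a plausible SRGS version (the rule ids `city` and `state` and the use of `<one-of>` and `<ruleref>` are assumptions; only `root="city_state"` and the words survive) and enumerates every sentence it accepts. This makes the over-generation visible: three cities times three states gives nine sentences, including "Boston Florida".

```python
import itertools
import xml.etree.ElementTree as ET

NS = {"g": "http://www.w3.org/2001/06/grammar"}

# Plausible reconstruction of the slide's grammar (rule ids are assumptions).
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="city_state" mode="voice">
  <rule id="city">
    <one-of><item>Boston</item><item>Miami</item><item>Fargo</item></one-of>
  </rule>
  <rule id="state">
    <one-of>
      <item>Florida</item><item>North Dakota</item><item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>"""


def sentences(grammar_xml: str) -> list[str]:
    """Enumerate the language of a grammar whose root rule is a plain sequence
    of local rulerefs, each pointing at a rule that holds a single <one-of>."""
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.findall("g:rule", NS)}
    top = rules[root.get("root")]
    choices = []
    for ref in top.findall("g:ruleref", NS):
        target = rules[ref.get("uri").lstrip("#")]
        choices.append([item.text for item in target.findall("g:one-of/g:item", NS)])
    # Every combination of one choice per ruleref is an accepted sentence.
    return [" ".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for s in sentences(GRAMMAR):
        print(s)
```

This enumerator only understands the tiny subset used here; it is not a general SRGS processor (no repeats, weights, tags or external references).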

9 An SRGS example with PLS
This is the same simple SRGS grammar (same <grammar> attributes, same cities and states), now referencing an external pronunciation lexicon.
Through the lexicon, the grammar accepts different pronunciations of its words, to accommodate many different speakers.

10 PLS allows you…
– to create pronunciation lexicons to be used by both ASR and TTS
– to take into account different usages:
  – For TTS: to improve the reading of proper names
  – For ASR: to provide multiple pronunciations
  – For TTS/ASR: to expand abbreviations and acronyms
– to exchange pronunciation lexicons between different applications (interoperability)
– to use contextual pronunciation lexicons at different points of the application
The PLS is a W3C standard language! PLS saves application developers time and money when creating good speech applications!

11 Phonetic Alphabets
To describe the pronunciation of a word or phrase, you need a phonetic alphabet.
An alphabet contains symbols that represent speech sounds, just as in a dictionary, e.g.:
Cracked /krakt/ adj. 1 having cracks. 2 (predic.) slang crazy
The PLS spec suggests using either:
– a standard pronunciation alphabet, such as IPA (defined by the International Phonetic Association)
– other alphabets: SAMPA, which is an ASCII encoding of IPA, and X-SAMPA, Pinyin, JEITA, etc.

12 IPA – Chart
The IPA (International Phonetic Association) was founded in 1886; it is the major international association of phoneticians.
The IPA alphabet provides symbols that make possible the phonemic transcription of all known languages.
IPA characters can be encoded in Unicode by supplementing ASCII with characters from other ranges, particularly:
– IPA Extensions (0250–02AF)
– Latin Extended-A (0100–017F)
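To see which Unicode blocks a transcription actually draws on, here is a small Python helper (the names `block_of` and `report` are illustrative). The block table covers only ASCII and the two ranges named above; everything else is reported as `other`, which is why θ (U+03B8, Greek block) or the length mark ː (U+02D0, Spacing Modifier Letters) would show up outside the listed ranges.

```python
import unicodedata

# Inclusive code point ranges: ASCII plus the two blocks named on the slide.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin Extended-A": (0x0100, 0x017F),
    "IPA Extensions": (0x0250, 0x02AF),
}


def block_of(ch: str) -> str:
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return "other"


def report(transcription: str) -> list[tuple[str, str, str]]:
    """One (character, code point, block) triple per character."""
    return [(ch, f"U+{ord(ch):04X}", block_of(ch)) for ch in transcription]


if __name__ == "__main__":
    for ch, cp, block in report("θɪŋ"):
        print(ch, cp, block, unicodedata.name(ch))
```

Running it on "θɪŋ" shows all three situations at once: ɪ comes from IPA Extensions, ŋ from Latin Extended-A, and θ from neither.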

13 SAMPA – SAM Phonetic Alphabet
Developed for phonetic transcription in an EU-funded project called Speech Assessment Methods (SAM)
– It is ASCII-based (easy to write): an "ASCII-ization" of IPA
– Prof. John C. Wells proposed an extension called X-SAMPA, which encodes all the IPA symbols in ASCII
A few examples:
– "thin": IPA /θɪn/, X-SAMPA /TIn/
– "thing": IPA /θɪŋ/, X-SAMPA /TIN/
– "flabbergasted": IPA /ˈflæbəgɑːstɪd/, X-SAMPA / /
– "Weltanschauung": IPA /ˈvɛltʔanˌʃaʊʊŋ/, X-SAMPA /"vElt?an%SaUUN/
– en-GB "vice versa": IPA /vaɪsə ˈvɜːsə/, X-SAMPA / /
– it-IT "vice versa": IPA /ˈviʧe ˈvɛrsa/, X-SAMPA /"vitSe "vErsa/
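Because X-SAMPA maps each IPA symbol to an ASCII sequence, converting it is mostly table lookup. The sketch below uses a deliberately tiny table covering only the symbols in the examples above (an assumption, not the full X-SAMPA inventory) and a greedy longest-match scan, so that a multi-character symbol such as `tS` wins over `t` followed by `S`.

```python
# Partial X-SAMPA -> IPA table: only the symbols used in the slide's examples.
XSAMPA_TO_IPA = {
    "T": "θ", "I": "ɪ", "N": "ŋ", "S": "ʃ", "E": "ɛ", "U": "ʊ",
    "?": "ʔ", ":": "ː", '"': "ˈ", "%": "ˌ", "tS": "ʧ",
}
LONGEST = max(len(symbol) for symbol in XSAMPA_TO_IPA)


def xsampa_to_ipa(text: str) -> str:
    """Greedy longest-match conversion; symbols not in the table (e.g. a, l, n,
    t, v and spaces) are copied through unchanged."""
    out = []
    i = 0
    while i < len(text):
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            # No table entry starts here: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["TIn", "TIN", '"vElt?an%SaUUN', '"vitSe "vErsa']:
        print(word, "->", xsampa_to_ipa(word))
```

Letters such as a, l, n, t and v need no entry because they are the same in both alphabets, which is exactly what makes SAMPA-style encodings easy to type.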

14 Phonetic Alphabets – Issues
How can pronunciations be written reliably and easily?
– Problems with fonts, word processors, browsers
– There are very few tools that help with writing pronunciations and let you listen to what you have written
– The standardization process may push the creation of tools and better coverage in word processors
Is IPA of any use for Asian languages?
Are there standard phonetic alphabets for Asian languages, such as Pinyin, Jyutping or JEITA?
Should they be referenced in a standard way, like "ipa"?

15 The PLS language
PLS is an XML language. The container element is <lexicon>, with the attributes:
– version (required): "1.0"
– xmlns (required): the PLS namespace URI
– alphabet (optional): "ipa" (default value)
– xml:lang (optional): "en-US" or "zh-CN" or "ja"
Example: <lexicon version="1.0" xmlns="…" alphabet="ipa" xml:lang="zh-CN">
The current PLS is monolingual!
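A minimal Python sketch of reading the `<lexicon>` root and applying the default alphabet. The namespace URI used here, `http://www.w3.org/2005/01/pronunciation-lexicon`, is the one of the final PLS 1.0 Recommendation and is an assumption for the 2005 draft discussed in these slides; the function name `read_lexicon_header` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed: namespace of the PLS 1.0 Recommendation.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lexicon_header(doc: str) -> dict[str, Optional[str]]:
    """Check the root element and return its attributes, with defaults applied."""
    root = ET.fromstring(doc)
    if root.tag != f"{{{PLS_NS}}}lexicon":
        raise ValueError(f"not a PLS lexicon: {root.tag}")
    if root.get("version") is None:
        raise ValueError("missing required 'version' attribute")
    return {
        "version": root.get("version"),
        "alphabet": root.get("alphabet", "ipa"),  # "ipa" when not specified
        "lang": root.get(XML_LANG),
    }


if __name__ == "__main__":
    print(read_lexicon_header(f'<lexicon version="1.0" xmlns="{PLS_NS}" xml:lang="zh-CN"/>'))
```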

16 The PLS language - metadata
Metadata (annotation of the document for other uses, …) can be of two varieties:
– <meta> element (for compatibility with other markup, like SRGS and SSML)
– <metadata> element (which contains the annotations, either in RDF format or in other formats)
Example of metadata:
<lexicon version="1.0" xmlns="…" alphabet="ipa" xml:lang="en-US">
  <metadata>
    <rdf:RDF xmlns:rdf="…" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about=""
          dc:title="Pronunciation lexicon for W3C terms"
          dc:description="This lexicon contains common pronunciations for many W3C acronyms and abbreviations, such as I18N, WSDL or WAI"
          dc:publisher="W3C"
          dc:language="en-US"
          dc:date="…"
          dc:rights="Copyright 2002 W3C"
          dc:format="application/pls+xml">
        <dc:creator>The W3C Voice Browser Working Group</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</lexicon>
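The following sketch pulls the Dublin Core properties out of an abridged version of the metadata example, using Python's `xml.etree.ElementTree`. It assumes the PLS 1.0 namespace for `<lexicon>` and `<metadata>` and the standard RDF and Dublin Core namespace URIs; `dublin_core` is a hypothetical helper that accepts properties written either as attributes or as child elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "pls": "http://www.w3.org/2005/01/pronunciation-lexicon",  # assumed PLS 1.0 namespace
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about=""
          dc:title="Pronunciation lexicon for W3C terms"
          dc:publisher="W3C"
          dc:format="application/pls+xml">
        <dc:creator>The W3C Voice Browser Working Group</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</lexicon>"""


def dublin_core(doc: str) -> dict[str, str]:
    """Collect dc:* properties from the first rdf:Description under <metadata>."""
    desc = ET.fromstring(doc).find("pls:metadata/rdf:RDF/rdf:Description", NS)
    if desc is None:
        return {}
    dc = "{" + NS["dc"] + "}"
    # Properties written as attributes (dc:title="...").
    props = {k[len(dc):]: v for k, v in desc.attrib.items() if k.startswith(dc)}
    # Properties written as child elements (<dc:creator>...</dc:creator>).
    for child in desc:
        if child.tag.startswith(dc):
            props[child.tag[len(dc):]] = (child.text or "").strip()
    return props


if __name__ == "__main__":
    print(dublin_core(LEXICON))
```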

17 The PLS language – <lexeme>
The <lexeme> element is the container of a lexicon entry. It is composed of:
– One or more <grapheme> elements that indicate the words/phrases to be matched in the input
– One or more <phoneme> or <alias> elements that indicate the possible pronunciations or expansions, respectively
First considerations:
– Several <grapheme> elements may be present → all of them match the same pronunciations
– Several <phoneme> elements may be present → the pronunciations are alternatives
– A mixture of <phoneme> and <alias> elements may be present → a preference mechanism chooses the single one used by TTS
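The matching rules above amount to an index from every grapheme to all the pronunciations and expansions of its lexeme. Here is a minimal Python sketch of that index; the namespace URI is assumed to be the PLS 1.0 one, `build_index` is a hypothetical name, and the sample lexicon reuses the "color"/"colour" and "huge" entries from these slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"  # assumed PLS 1.0 namespace

DOC = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>color</grapheme>
    <grapheme>colour</grapheme>
    <phoneme>ˈkʌlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme>hjuːʤ</phoneme>
    <phoneme>juːʤ</phoneme>
  </lexeme>
</lexicon>"""


def build_index(doc: str) -> dict[str, list[tuple[str, str]]]:
    """Map every grapheme to the (kind, value) entries of its lexeme, in
    document order; kind is "phoneme" or "alias"."""
    index = defaultdict(list)
    for lexeme in ET.fromstring(doc).iter(PLS + "lexeme"):
        entries = [
            (child.tag[len(PLS):], (child.text or "").strip())
            for child in lexeme
            if child.tag in (PLS + "phoneme", PLS + "alias")
        ]
        # Every grapheme of the lexeme shares the same set of entries.
        for grapheme in lexeme.findall(PLS + "grapheme"):
            index[(grapheme.text or "").strip()].extend(entries)
    return dict(index)


if __name__ == "__main__":
    for word, entries in build_index(DOC).items():
        print(word, entries)
```

An ASR engine could use every entry of a word as an alternative, while a TTS engine still has to pick one; that choice is the preference mechanism mentioned above.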

18 The PLS language – <grapheme>
The <grapheme> element contains text that represents orthographies:
– Regional spelling variations, e.g. "colour" and "color"
– Free spelling variations, e.g. "judgment" and "judgement"
– Traditional vs. modern spellings, e.g. in German it is common to replace "ö" with "oe"
– Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji) and syllabic writing systems (Katakana or Hiragana) to represent the orthography of a word or phrase
<lexicon version="1.0" xmlns="…" xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <!-- Here you can insert the pronunciation of "nihongo"; in IPA it could be: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>
Is an explicit "orthography" attribute useful? Is it redundant?

19 The PLS language – <phoneme>
<phoneme> elements are contained inside <lexeme>. A <phoneme> contains text specifying the pronunciation in a given pronunciation alphabet:
– An "alphabet" attribute may be specified to override the alphabet of the whole lexicon
– A "prefer" attribute may be present to indicate precedence among pronunciations
Example of a lexeme for "Sepulveda":
<lexeme>
  <grapheme>Sepulveda</grapheme>
  <phoneme>səˈpʌlvɪdə</phoneme>
</lexeme>
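The preference mechanism can be sketched as follows. The rule implemented here (the first candidate with `prefer="true"`, otherwise the first `<phoneme>` or `<alias>` in document order) is modelled on the final PLS 1.0 Recommendation and is an assumption for this draft; the function `tts_choice`, the assumed namespace URI and the `prefer` placement in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"  # assumed PLS 1.0 namespace


def tts_choice(lexeme_xml: str, lexicon_alphabet: str = "ipa") -> tuple[str, Optional[str]]:
    """Return (text, alphabet) of the single entry a TTS engine would use.

    Assumed rule: the first <phoneme>/<alias> with prefer="true" wins; if none
    is preferred, the first one in document order is used. The alphabet is
    None for an <alias>, which holds orthography rather than phonemes.
    """
    lexeme = ET.fromstring(lexeme_xml)
    candidates = [c for c in lexeme if c.tag in (PLS + "phoneme", PLS + "alias")]
    if not candidates:
        raise ValueError("lexeme has no <phoneme> or <alias>")
    preferred = [c for c in candidates if c.get("prefer") == "true"]
    chosen = (preferred or candidates)[0]
    if chosen.tag == PLS + "phoneme":
        # A per-phoneme alphabet attribute overrides the lexicon-wide one.
        return (chosen.text or "").strip(), chosen.get("alphabet", lexicon_alphabet)
    return (chosen.text or "").strip(), None


if __name__ == "__main__":
    huge = """<lexeme xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <grapheme>huge</grapheme>
  <phoneme>hjuːʤ</phoneme>
  <phoneme prefer="true">juːʤ</phoneme>
</lexeme>"""
    print(tts_choice(huge))
```

An ASR engine, by contrast, would keep all candidates rather than picking one.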

20 The PLS language – Other examples
Example of more than one pronunciation for the word "huge":
<lexicon version="1.0" xmlns="…" xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme>hjuːʤ</phoneme>
    <phoneme>juːʤ</phoneme>
  </lexeme>
</lexicon>
Example of the Japanese word "nihongo" with different spellings:
<lexicon version="1.0" xmlns="…" xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>nɪhɒŋɒ</phoneme>
  </lexeme>
</lexicon>

21 The PLS language – <alias>
<alias> elements are contained inside <lexeme>. An <alias> indicates the pronunciation of an acronym or abbreviated term in the form of another orthography. An <alias> may carry:
– A "prefer" attribute to indicate precedence among pronunciations
Both <phoneme> and <alias> may occur in a <lexeme>.
Example of a lexeme with both <phoneme> and <alias>:
<lexicon version="1.0" xmlns="…" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
    <phoneme>…</phoneme>
  </lexeme>
</lexicon>
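A minimal sketch of how an engine might apply `<alias>` entries during text normalization: collect grapheme-to-alias pairs from a PLS document and replace whole-word matches in the input. It ignores `prefer` and uses the first `<alias>` of each lexeme; the namespace URI (assumed to be the PLS 1.0 one) and the helper names `alias_table` and `expand` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"  # assumed PLS 1.0 namespace

LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def alias_table(doc: str) -> dict[str, str]:
    """Map each grapheme to the first <alias> of its lexeme (prefer is ignored)."""
    table = {}
    for lexeme in ET.fromstring(doc).iter(PLS + "lexeme"):
        alias = lexeme.find(PLS + "alias")
        if alias is None:
            continue
        for grapheme in lexeme.findall(PLS + "grapheme"):
            table[(grapheme.text or "").strip()] = (alias.text or "").strip()
    return table


def expand(text: str, table: dict[str, str]) -> str:
    """Replace whole-word occurrences of each grapheme with its alias."""
    if not table:
        return text
    # Longer graphemes first, so a multi-word entry wins over its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


if __name__ == "__main__":
    print(expand("The W3C published PLS.", alias_table(LEXICON)))
```

The `\b` word boundaries keep "W3C" from matching inside a longer token; graphemes that begin or end with punctuation would need a different boundary rule.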

22 Use Cases/Future Issues
The current version of PLS can deal with:
– Multiple pronunciations for ASR
– Homographs
– Abbreviations
But it cannot deal with:
– Homophones
– Part-of-speech annotations (and other contextual information)
– Grouping lexemes and external references
→ These tasks are too challenging to be solved in PLS version 1.0

23 Issues for the workshop
– Monolingual lexicon?
– Orthography attribute: useful or redundant?
– Mandate new phonetic alphabets?

24 Quick demo of SSML+PLS
Mobile device (with embedded TTS)
Over GPRS, the device connects to a server, which:
– downloads news from a news site (RSS)
– transforms it into SSML
– returns the result to the mobile device
The device then:
– shows the news on the screen
– reads the SSML document (which includes a lexicon) using the TTS engine
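The server-side step of the demo (news feed to SSML) can be sketched in a few lines of Python. This is one possible approach, not the demo's actual code, which the slides do not show: it assumes an RSS 2.0 feed, emits one SSML `<p>` per item title, and references a hypothetical lexicon at `http://example.com/news.pls`; the sample headlines are made up.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rss_to_ssml(rss_xml: str, lexicon_uri: str, lang: str = "en-US") -> str:
    """Turn each <item><title> of an RSS 2.0 feed into an SSML paragraph."""
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    # The lexicon reference comes first, before any spoken content.
    ET.SubElement(speak, f"{{{SSML_NS}}}lexicon", {"uri": lexicon_uri})
    for title in ET.fromstring(rss_xml).iterfind("channel/item/title"):
        paragraph = ET.SubElement(speak, f"{{{SSML_NS}}}p")
        paragraph.text = (title.text or "").strip()
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    feed = """<rss version="2.0"><channel><title>Demo feed</title>
<item><title>W3C workshop opens in Beijing</title></item>
<item><title>New PLS draft published</title></item>
</channel></rss>"""
    print(rss_to_ssml(feed, "http://example.com/news.pls"))
```

Keeping the lexicon as a separate referenced document means the server can regenerate SSML for every news update while the pronunciation fixes (names, acronyms) are maintained in one place.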

25 Use Cases – Multiple pronunciations
More than one pronunciation for a word (very common for ASR)
Example of two pronunciations for the word "Newton":
<lexicon version="1.0" xmlns="…" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>

26 Use Cases – Multiple Orthographies
More than one orthography for a word (common for ASR and TTS)
Example of two orthographies, colour/color:
<lexicon version="1.0" xmlns="…" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>color</grapheme>
    <grapheme>colour</grapheme>
    <phoneme>ˈkʌlə</phoneme>
  </lexeme>
</lexicon>

27 Final Remarks
The usage of PLS:
– Simplifies the development of speech applications
– Improves speech recognition performance (in a standard way)
– Enhances TTS output
A standard language for pronunciation lexicons enables the exchange of pronunciations between applications.