1 SSML The Internationalization of the W3C Speech Synthesis Markup Language SpeechTek 2007 – C102 – Daniel C. Burnett
2 Overview SSML 1.0 Why SSML 1.1? SSML 1.1 scope Selected features Examples –voice/xml:lang –pronunciation alphabets –word-boundary element For more info...
3 SSML 1.0 W3C Recommendation in 2004 Widely implemented – the primary authoring API for TTS engines Many extensions
4 Why SSML 1.1? 1.0 extensions are primarily to address language-related phenomena Workshops in China, Greece, and India to understand motivations for these extensions –How to correct tones for East Asian languages? –How to handle transliteration for Indian languages? –How to indicate word boundaries for written languages that do not display them? –How to precisely control voice and language changes?
5 SSML 1.1 scope Provide broadened language support –For Mandarin, Cantonese, Hindi*, Arabic*, Russian*, Korean*, and Japanese, we will identify and address language phenomena that must be addressed to enable support for the language. Where possible we will address these phenomena in a way that is most broadly useful across many languages. We have chosen these languages because of their economic impact and expected group expertise and contribution. –We will also consider phenomena of other languages for which there is both sufficient economic impact and group expertise and contribution. Fix incompatibilities with other Voice Browser Working Group languages, including PLS, SRGS, and VoiceXML 2.0/2.1. Out of scope: –VCR-like controls: fast-forward, rewind, pause, resume –New values. Collecting requirements for future work is okay * provided there is sufficient group expertise and contribution for these languages
6 SSML 1.1 scope – some workshop topics In scope –Token/word boundaries –Phonetic alphabets –Tones –Part of Speech support –Text w/multiple languages (separate control of xml:lang and voice) –Subword annotation (partial) –Syllable-level markup (partial) Out of scope –Providing number, case, gender info –Simplified/alternate/SMS text –Transliteration –Expressive (emotion) elements –Enhanced prosody rate control
7 Selected new features SSML 1.1 is a Working Draft – everything from this point on is subject to change Improved lexicon activation control Better linkage with PLS lexicons Clearer separation between xml:lang (document text content) and voice selection Improved author control of behavior upon xml:lang/voice selection mismatch Introduction of a Pronunciation Alphabet Registry to allow use of standardized pinyin, jyutping, and other language-specific pronunciation alphabets in addition to the IPA default New element for marking word boundaries
8 Examples – voice/xml:lang Next few examples demonstrate some of the new SSML 1.1 features that provide –Clearer separation between xml:lang (document text content) and voice selection –Improved author control of behavior upon xml:lang/voice selection mismatch
9 Simple example I want a big pepperoni pizza. Will find voices that can read US English each time. Voice changes are scoped, so the same voice is used for “I want” and “pizza.” The “name” and “gender” values are requests only and are not required for voice selection to succeed.
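The markup for this example is not preserved in the slide text. What follows is a minimal sketch of what the description implies, assuming an outer voice element that asks for US English and two inner voice elements that ask for a name and a gender. The voice name "Mike" and the attributes on the speak wrapper are illustrative assumptions, and since SSML 1.1 is a Working Draft the attribute names may change.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "Mike" is a hypothetical voice name. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice languages="en-US">
    I want
    <!-- name is only a request here -->
    <voice name="Mike">a big</voice>
    <!-- gender is only a request here -->
    <voice gender="female">pepperoni</voice>
    <!-- inner scopes have closed: same voice as "I want" -->
    pizza.
  </voice>
</speak>
```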
10 “required” attribute I want a big pepperoni pizza. Now the name and gender attributes, respectively, are required rather than merely requested. The “required” attribute lists *all* required voice selection features, so the two inner voices might not be able to speak English. If one of the inner voices cannot read/speak English, the processor can decide what to do (skip the text, try to read it anyway, or change voice).
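A sketch of how the “required” attribute might be written for this example, assuming the same hypothetical structure (outer US English voice, inner voices selected by name and by gender). The voice name "Mike" is an assumption, not taken from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "Mike" is a hypothetical voice name. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice languages="en-US">
    I want
    <!-- required lists ALL required features, so language is not required here -->
    <voice name="Mike" required="name">a big</voice>
    <!-- likewise, only gender is required for this voice -->
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>
```

Because each inner “required” list names a single feature, a processor may pick a voice that matches the name or gender but cannot read English.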
11 “onlangfailure” attribute I want a big pepperoni pizza. Now, when any text is encountered that cannot be spoken by the currently selected voice, the processor will skip it. The voice *will not* change. Other options are “processorchoice”, “ignorelang”, and “changevoice”.
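A sketch of the skip-the-text behavior described above. Two things here are assumptions, because the slide markup is not preserved: that the value is spelled "ignoretext" (the slide names only the other three options), and that the attribute is placed on the root speak element. "Mike" is again a hypothetical voice name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: value name and attribute placement are assumptions. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US" onlangfailure="ignoretext">
  <voice languages="en-US">
    I want
    <!-- if this voice cannot speak English, "a big" is skipped; the voice does not change -->
    <voice name="Mike" required="name">a big</voice>
    <!-- same rule applies: unreadable text is skipped -->
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>
```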
12 “onvoicefailure” attribute <voice languages="en-US" onvoicefailure="keepexisting"> I want a big pepperoni pizza. </voice> What if the processor can’t find a voice that meets the required criteria? In the above example, the processor will keep the voice it had. This attribute is scoped as well. Other options are “priorityselect” and “processorchoice”.
13 Language and accent <voice languages="zh-cmn:en-US en:en-US" onvoicefailure="keepexisting"> 我想要 a big pepperoni pizza. </voice> The first request is for a voice that can speak both English and Mandarin Chinese with a US-English accent. If voice selection is successful, the voice will be able to speak both the Chinese text and the final “pizza.” Note that the female voice need not speak either language (as written).
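A sketch of the full mixed-language example, assuming the inner voices wrap “a big” and “pepperoni” as in the earlier pizza examples. The inner-voice structure and the name "Mike" are assumptions; only the outer voice tag appears in the slide text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: inner voices and "Mike" are assumptions. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- each entry is language:accent; both entries ask for a US-English accent -->
  <voice languages="zh-cmn:en-US en:en-US" onvoicefailure="keepexisting">
    我想要
    <voice name="Mike" required="name">a big</voice>
    <!-- only gender is required, so this voice need not speak either language -->
    <voice gender="female" required="gender">pepperoni</voice>
    <!-- back to the outer bilingual voice -->
    pizza.
  </voice>
</speak>
```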
14 Examples – pronunciation alphabets Developing a new Pronunciation Alphabet Registry. Experts can register pronunciation alphabets for their languages. Can also register historically used alphabets such as ARPAbet and Worldbet. First entries will likely be pinyin and jyutping. 此<phoneme alphabet="pinyin" ph="chu4">处</phoneme>不准照相。
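To show what a registered alphabet adds over the IPA default, here is a sketch of the same sentence written both ways. The alphabet name "pinyin" is the one used on the slide and depends on the registry being adopted; the IPA string is an approximate transcription supplied for illustration, not taken from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the IPA transcription is approximate. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-cmn">
  <!-- registered alphabet: pinyin syllable with tone number 4 (falling) -->
  <s>此<phoneme alphabet="pinyin" ph="chu4">处</phoneme>不准照相。</s>
  <!-- IPA default: same reading, with the falling tone written as a tone contour -->
  <s>此<phoneme alphabet="ipa" ph="ʈʂʰu˥˩">处</phoneme>不准照相。</s>
</speak>
```

The pinyin form is what a Mandarin-speaking author would type directly, which is the motivation for allowing language-specific alphabets.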
15 Examples – word-boundary element The new word-boundary element helps resolve ambiguities for languages that may not visually separate words. Markup is allowed within the element but does not cause word separation (unlike in the rest of SSML), which allows for sub-word markup. <!-- Ambiguous sentence is 南京市长江大桥 --> 南京市 长江大桥 南京市长 江大桥
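A sketch of the two segmentations above with explicit word markup. The element name is not preserved in the slide text; this sketch assumes it is named "w", and the name may differ in the draft. The English glosses in the comments are added for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the element name "w" is an assumption. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-cmn">
  <!-- Reading 1: "Nanjing City" + "Yangtze River Bridge" -->
  <s><w>南京市</w><w>长江大桥</w></s>
  <!-- Reading 2: "Mayor of Nanjing" + "Jiang Daqiao" (a personal name) -->
  <s><w>南京市长</w><w>江大桥</w></s>
  <!-- Sub-word markup: emphasis inside the word does not split it in two -->
  <s><w>南京<emphasis>市长</emphasis></w><w>江大桥</w></s>
</speak>
```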
16 For more info... Information about the Voice Browser Working Group can be found at Current SSML drafts: – –