SSML 1.1: The Internationalization of SSML Daniel C. Burnett August 9, 2006.

Slides:



Advertisements
Similar presentations
U.S. Government Language Requirements U.S. Government Language Requirements 7 September 2000 Everette Jordan Department of Defense
Advertisements

Tones Review / 四声复习 As you know and we have discussed many times, accurate tones are extremely important in Chinese. The ability to pronounce tones correctly,
University of Michigan Flint Zhong, Yan
Speech Synthesis Markup Language V1.0 (SSML) W3C Recommendation on September 7, 2004 SSML is an XML application designed to control aspects of synthesized.
Speech Synthesis Markup Language SSML. Introduced in September 2004 XML based Assists the generation of synthetic speech Specifies the way speech is outputted.
Applying the Pronunciation Lexicon Specification to ASR & TTS 1 Patrizio Bergallo 1 Monday, August 20, 2007 SpeechTEK ASTS - Advances in Text-to-Speech.
1 SSML The Internationalization of the W3C Speech Synthesis Markup Language SpeechTek 2007 – C102 – Daniel C. Burnett.
SSML extensions for multi-language usage Davide Bonardo W3C Workshop on Internationalizing SSML Crete, May 2006.
HTML5 and CSS3 Illustrated Unit B: Getting Started with HTML
Solutions for Multilingual Literature by XSL Formatter 6,800 known languages.
Project 1 Introduction to HTML.
Unit 2, cont. September 14 HTML,Validating your pages, Publishing your site.
HTML 4 - Introduction HTML stands for Hyper Text Markup Language. It is the standard format for documents on the World Wide Web. Just as Microsoft Word.
1st Project Introduction to HTML.
Chapter 2 Introduction to HTML5 Internet & World Wide Web How to Program, 5/e Copyright © Pearson, Inc All Rights Reserved.
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
Collecting Primary Language Information LINKED-DISC - provincial database system for early childhood intervention Services Herb Chan.
Position Paper for W3C Workshop on Internationalizing SSML The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML Myoung-Wan.
Chapter ONE Introduction to HTML.
Speech Synthesis Markup Language -----Aim at Extension Dr. Jianhua Tao National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese.
UNLIMITED. SIMULTANEOUS. NO CHECK-OUT. eREFERENCE.
Unicode & W3C Jataayu Software C. Kumar January 2007.
Toshiba (China) R&D Center LOU Xiaoyan, LI Jian Research and Development Center, Toshiba China Suggestions on Tone and Word Boundary of Mandarin for SSML.
Pronunciation Lexicon Background Paolo Baggia, Loquendo W3C SSML Workshop Beijing – 2-3 Nov 2005.
JEITA Speech Group1 Issues of SSML in Japanese Wataru IMATAKE (ANIMO LIMITED) Makoto AKABANE (Sony Computer Entertainment Inc.) Kazuyo TANAKA (Tsukuba.
How IPA is Used in SSML and PLS Paolo Baggia, Loquendo Wed. August 9 th, 2006.
W3C Workshop, Beijing, 2nd of November 2005 An extension to the SSML for diacritics auto-completion R&D Centre Vocal Services Section.
Conversational Applications Workshop Introduction Jim Larson.
HTML (HyperText Markup Language)
1 W3C Workshop on Internationalizing SSML SSML Extension for Korean Workshop : 2005/11/02 (Wed) Sang-Jin Kim
XHTML. Introduction to XHTML What Is XHTML? – XHTML stands for EXtensible HyperText Markup Language – XHTML is almost identical to HTML 4.01 – XHTML is.
Language / Locale IDs M. Davis, IBM A. Phillips, webMethods.
English Linguistics: An Introduction
Character Encoding, F onts. Overview Why do character encoding and fonts matter to linguists? How can you identify problems? Why do these problems arise?
1 A Study on Implementation of Southern-Min Taiwanese Tone Sandhi System Iu n Un-gian Lau Kiat-gak Li Sheng-an.
XP 2 HTML Tutorial 1: Developing a Basic Web Page.
WORLD GEOGRAPHY Oct. 21, Today Unit 5 - Language.
Sorting it all out: An introduction to collation Cathy Wissink Michael Kaplan Globalization Infrastructure and Font Technology Windows International Microsoft.
A Look at HTML (and XHTML). Types of Web Applications.
Introduction to Markup Languages January 31, 2002.
Common Terms in the Internet Adnan Iqbal MCS-MIT-WD-A+ The College of Law.
VoiceXML Version 2.0 Jon Pitcherella. What is it? A W3C standard for specifying interactive voice dialogues. Uses a “voice” browser to interpret documents,
Games: XML Presented by: Idham bin Mat Desa Mohd Sharizal bin Hamzah Mohd Radzuan bin Mohd Shaari Shukor bin Nordin.
An Introduction to S3ML Beijing InfoQuick SinoVoice Speech Technology Corp. CHEN Ming, LV Shinan, LI Xiulin.
© 2001, Penn State University Encoding on the Internet Elizabeth J. Pyatt CETS.
Quick Overview on Tones
XP 1 HTML Tutorial 1: Developing a Basic Web Page.
EXPRESS YOURSELF. NEUTRAL ACCENT Neutral accent is a way of speaking a language without regionalism. Accent means variation in pronunciation and it should.
Phonetics, part III: Suprasegmentals October 18, 2010.
Pitch Tracking + Prosody January 19, 2012 Homework! For Tuesday: introductory course project report Background information on your consultant and the.
HTML5 and CSS3 Illustrated Unit B: Getting Started with HTML.
PLS for SSML Paolo Baggia Loquendo Workshop II on Internationalizing SSML.
Presented By Sharmin Sirajudeen S7 CS Reg No :
HTML PROJECT #1 Project 1 Introduction to HTML. HTML Project 1: Introduction to HTML 2 Project Objectives 1.Describe the Internet and its associated key.
Lesson 1 Dialogue 1 Grammar University of Michigan Flint Zhong, Yan.
HTML5 Basics.
RADEXT WG RADIUS Attribute Guidelines
Ni hao, Mandarin – Workshop 2 CESC Asian Studies Term 1, 2015
Project 1 Introduction to HTML.
(2) Suprasegmentals The features such as pitch, stress, and length, which are used simultaneously with units larger than segments, are called “suprasegmentals.”
Job Google Job Title: Linguistic Project Manager
Internationalising SSML Perspectives from the Local Language Speech Technology Initiative Ksenia Shalonova & Roger Tucker Outside Echo Ltd 19 November.
Presentation Title goes here, Long if needed
presentation Title Goes Here
XML.
presentation Title Goes Here
Language Geography.
Presentation Title goes here, Long if needed
University of Michigan Flint Zhong, Yan
Presentation transcript:

SSML 1.1: The Internationalization of SSML Daniel C. Burnett August 9, 2006

SSML 1.0 Widely used Convenient for many languages However,...

Chinese tones Mandarin is syllable-based, with tone movement a distinguishing feature of the syllable 妈 (mā) 麻 (má) 马 (mă) 骂 (mà) IPA is cumbersome when only the tone needs to be corrected –Eg., correcting Tone Sandhi 你好 ni3 hao3  ni2 hao3

Chinese word boundaries Word boundaries are not given in typical writing 這一晚會如常舉行 – 這一 晚會 如常 舉行 means “This banquet is held as usual” – 這一晚 會 如常 舉行 means “Tonight will be held as usual”

Chinese names Chinese characters are pronounced differently (in a consistent manner) in names, particularly family names Cantonese example: 單明明 單 /daan1/  /sin6/ (surname) 明明 /ming4 ming4/  /ming4 ming2/ (given name)

Japanese Ruby Ruby is a typesetter’s annotation used in everyday print media. It –disambiguates Kanji text (Chinese characters) –does this by giving the pronunciation Every Japanese person knows how to read it Why not use it for pronunciation?

Mixed languages “Tonight’s movie is ‘La vita è bella’.” Japanese and Chinese use the same characters, but often with very different meanings How should mixed-language text be annotated? How do you change the language without changing the voice? –What does this question really mean?

“Oh, and one more thing...” Korean/Hungarian need for PoS Sub-word level prosody annotation (eg., contrastive stress at syllable level in Hungarian) Text with missing diacritics (eg., Polish SMS text) Other simplified/non-traditional text Better support for highly-agglutinative languages

SSML 1.1 Two workshops to solicit such examples SSML subgroup of W3C Voice Browser Working Group –Has met twice –Expects to release requirements later this year

SSML subgroup “charter” “... For Mandarin, Cantonese, Hindi*, Arabic*, Russian*, Korean, and Japanese, we will identify and address language phenomena that must be addressed to enable support for the language. Where possible we will address these phenomena in a way that is most broadly useful across many languages. We have chosen these languages because of their economic impact and expected group expertise and contribution....” * provided there is sufficient group expertise and contribution for these languages

Some possible requirements Pronunciation scripts Word boundary Name identification Language indication Lexicon activation

Pronunciation scripts Today, values other than IPA are permitted but not standardized New requirement might be: –to establish registry (eg., at IANA) for standardizing values for Pinyin Jyutping Ruby etc.

Word boundary New requirement might be –to provide mechanism to eliminate word segmentation ambiguities Note that white space is insufficient because –some languages (such as Vietnamese) use white space for syllable segmentation –some languages (such as Urdu) use white space for other purposes

Name identification New requirement might be –to provide a mechanism to identify content as a proper noun or a name

Language indication xml:lang is used in all XML languages to mean the language of the content Successor to RFC3266 clarifies region and dialect encoding language – script – region – variant – extension – private_use “zh-Hans-CN” New requirements might be –To clarify that xml:lang only indicates the language of the content –To specify that selection of voice and language are independent and that TTS vendors must document supported combinations of language and voice

Lexicon activation Today, implicit lexicon activation in SSML based on –Language –Document order New requirement might be –Support explicit author control over which lexicons are used for which portions of the SSML document

Get involved W3C Voice Browser Working Group –Responsible for VoiceXML, SSML, SRGS, and many other speech-related standards SSML subgroup –Seeking experts in Russian, Hebrew, and Arabic Visit for more infohttp://

Divider page title goes here