

Speech Synthesis Markup Language: Aim at Extension
Dr. Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences

Brief Introduction to the Evolution of SSML
- The original SSML (not W3C SSML)
- STML
- JSML
- SABLE
- W3C SSML
- …

The original SSML
- Mark phrase boundaries
- Emphasize words
- Specify pronunciations
- Include other sound files

STML
- Developed by Edinburgh and Bell Labs
- Based on the original SSML
- Aimed at giving listeners the same basic impression on different systems, not at sounding identical

JSML
- Developed by Sun
- XML based
- Includes:
  - Elements to mark paragraphs and sentences
  - Elements to control pronunciation
  - Elements to represent markers

SABLE
- Developed by Edinburgh and Bell Labs
- Based on STML and JSML
- The stated aims:
  - Synthesizer control
    - Text structure
    - Speech pronunciation
  - Multilinguality
  - Ease of use
  - Portability
  - Extensibility

W3C SSML
- Key design criteria:
  - Consistency
  - Interoperability
  - Generality
  - Internationalization
  - Generation and Readability
  - Implementable

What we want from a markup language
- Controlling
- Sharing
- Extension to multimedia

Which level should we focus on?
- Text analysis module
- Prosody module
- Acoustic module

Sharing
[Diagram: two systems, Sys1 and Sys2, each with text-analysis, prosody-analysis, and acoustic modules. Labels: SSML, Data Structure1, Data Structure2.]

Text level for Mandarin
- Word boundary
- Pronunciation with tone
- POS
- Dialect?
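
As a rough illustration of the text-level items above, the Python sketch below serializes segmented Mandarin words with the SSML 1.0 "phoneme" element. The Word class, the to_ssml helper, the POS tags, and the alphabet name "x-example-pinyin" are assumptions made for this example and do not come from the presentation; SSML 1.0 only reserves the "x-" prefix for vendor-defined alphabets. The output is a sentence fragment, not a complete SSML document.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    text: str    # characters of one segmented word
    pinyin: str  # syllables with tone digits, e.g. "yi4 si5"
    pos: str     # part-of-speech tag (hypothetical tag set)


def to_ssml(words):
    """Serialize words as one SSML sentence, one phoneme element per word."""
    sentence = ET.Element("s")
    for word in words:
        # "x-example-pinyin" is a made-up vendor alphabet name.
        phoneme = ET.SubElement(
            sentence, "phoneme",
            {"alphabet": "x-example-pinyin", "ph": word.pinyin},
        )
        phoneme.text = word.text
        # word.pos is dropped here: SSML 1.0 has no attribute to carry it.
    return ET.tostring(sentence, encoding="unicode")


words = [Word("什么", "shen2 me5", "r"), Word("意思", "yi4 si5", "n")]
print(to_ssml(words))
```

In this sketch the word boundary is only implied by having one "phoneme" element per word, and the POS tag is lost on output, which is the kind of gap the list above points at.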

Prosody level for Mandarin
- Tone sandhi
- Rhythm?

Extensions to expressive synthesis
- Emotion and Style
- Others

Current elements related to prosody and style in SSML
- "voice" element
- "emphasis" element
- "break" element
- "prosody" element
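
To make the four elements listed above concrete, here is a minimal Python sketch that builds a small SSML 1.0 document using each of them once. The helper name build_demo, the sample sentence, and the chosen attribute values are assumptions for illustration; the attribute names and value forms (gender, level, time, rate, pitch) are those defined by SSML 1.0.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_demo():
    """Build a small SSML 1.0 document using voice, emphasis, break, prosody."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"}
    )
    # voice: select a speaker by description.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "
    # emphasis: stress the enclosed word.
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "important"
    emphasis.tail = ". "
    # break: insert a pause of a given duration.
    pause = ET.SubElement(voice, "break", {"time": "500ms"})
    pause.tail = " "
    # prosody: adjust rate and pitch for the enclosed text.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = "Please listen carefully."
    return ET.tostring(speak, encoding="unicode")


print(build_demo())
```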

Emotion and Style
- Emotion
  - Anger, happiness, surprise, sadness, fear, …
  - Depends on the speaker's psychological and physical states
  - Local effects on prosody
- Style
  - News, comments, …
  - Depends on the semantics of sentences
  - Global effects on prosody

Personalized Voice
- Element: "voice"
- Attributes: "gender", "age", "name", "variant"
- Sample:
  - 他说：“什么意思？” (He said: "What do you mean?")
  - 她回答：“没什么意思。” (She replied: "Nothing in particular.")
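
One way the sample dialogue might be marked up is sketched below in Python: the narration and each quoted line get their own "voice" element. The helper build_dialogue, the split into narrator and speaker turns, and the gender and age values are assumptions for illustration; the slide does not show the actual markup.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_dialogue(turns):
    """Wrap each (attributes, text) turn in its own voice element."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "zh-CN"}
    )
    for attributes, text in turns:
        voice = ET.SubElement(speak, "voice", attributes)
        voice.text = text
    return ET.tostring(speak, encoding="unicode")


# Attribute values are illustrative only; in SSML 1.0 "age" is an integer.
turns = [
    ({"gender": "neutral"}, "他说："),
    ({"gender": "male", "age": "30"}, "“什么意思？”"),
    ({"gender": "neutral"}, "她回答："),
    ({"gender": "female", "age": "30"}, "“没什么意思。”"),
]
print(build_dialogue(turns))
```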

Extension?
- To make it more expressive
  - Background music
  - VTTS
- Combined with a talking head and other media information
- … For now, the only element we can see is "mark"
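
For reference, the "mark" element places a named marker in the text, and an SSML processor can report that name to its hosting environment when the marker is reached. The Python sketch below inserts one marker before each phrase; a talking head or other media track could, in principle, synchronize on those names. The helper add_marks and the "phrase0", "phrase1" naming scheme are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def add_marks(phrases):
    """Place a named mark before each phrase so other media can sync to it."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"}
    )
    for index, phrase in enumerate(phrases):
        # The mark element is empty; the phrase follows it as tail text.
        mark = ET.SubElement(speak, "mark", {"name": f"phrase{index}"})
        mark.tail = phrase
    return ET.tostring(speak, encoding="unicode")


print(add_marks(["Hello. ", "Nice to meet you."]))
```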

Thanks!

 Element: Level: 0-..; paragraph, phrase, POS:  