Published by Erica Butler. Modified over 9 years ago.
1
Speech Synthesis Markup Language: Aiming at Extension. Dr. Jianhua Tao, National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
2
Brief Introduction to the Evolution of SSML: the original SSML (not W3C SSML); STML; JSML; SABLE; W3C SSML; …
3
The original SSML: mark phrase boundaries; emphasize words; specify pronunciations; include other sound files.
4
STML: developed by Edinburgh and Bell Labs; based on the original SSML; aimed at giving listeners the same basic impression on different systems, rather than sounding identical.
5
JSML: developed by Sun; XML-based. Includes elements to mark paragraphs and sentences, elements to control pronunciation, and elements to represent markers.
6
SABLE: developed by Edinburgh and Bell Labs; based on STML and JSML. Stated aims: synthesizer control; text structure; speech pronunciation; multilinguality; ease of use; portability; extensibility.
7
W3C SSML key design criteria: Consistency; Interoperability; Generality; Internationalization; Generation and Readability; Implementable.
8
What we want from a markup language: controlling; sharing; extension to multimedia.
9
Which level should we focus on? Text analysis module; prosody module; acoustic module.
10
Sharing (diagram): Sys1 and Sys2, each with text-analysis, prosody-analysis, and acoustic modules. Diagram labels: SSML; Data Structure 1; Data Structure 2.
11
Text level for Mandarin: word boundary; pronunciation with tone; POS; dialect?
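The text-level items on this slide can be pictured as a small per-word record. The following is only a sketch under stated assumptions: the `Token` class, the POS tag strings, and the alphabet name `x-pinyin` are hypothetical and not taken from the talk. W3C SSML's `phoneme` element does accept vendor-defined alphabet names beginning with `x-`, so the pronunciation with tone can be carried that way; the POS field stays in the data structure only, because this sketch does not assume any SSML element for it.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Token:
    """One segmented Mandarin word (hypothetical structure for illustration)."""
    text: str    # the word itself, so token edges are the word boundaries
    pinyin: str  # pronunciation with tone numbers, e.g. "zhong1 guo2"
    pos: str     # part-of-speech tag; the tag set here is an assumption


def tokens_to_ssml(tokens):
    """Serialize the pronunciation of each token with the SSML phoneme element."""
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "zh-CN"})
    for tok in tokens:
        # "x-pinyin" is a made-up vendor alphabet name, not a standard value.
        ph = ET.SubElement(
            speak, "phoneme", {"alphabet": "x-pinyin", "ph": tok.pinyin}
        )
        ph.text = tok.text
    return ET.tostring(speak, encoding="unicode")


sentence = [
    Token("中国", "zhong1 guo2", "n"),
    Token("很", "hen3", "d"),
    Token("大", "da4", "a"),
]
ssml = tokens_to_ssml(sentence)
```

Each `phoneme` element wraps exactly one word, so the word boundaries survive the round trip even though they have no markup of their own in this sketch.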
12
Prosody level for Mandarin: tone sandhi; rhythm; ?
13
Extensions to expressive synthesis: emotion and style; others.
14
Current elements related to prosody and style in SSML: 3.2.1 "voice" element; 3.2.2 "emphasis" element; 3.2.3 "break" element; 3.2.4 "prosody" element.
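To make the four elements concrete, here is a minimal sketch that assembles a small W3C SSML fragment with Python's standard `xml.etree.ElementTree`. The sentence text and the chosen attribute values are illustrative assumptions, not material from the talk, and the SSML namespace declaration is left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def build_ssml() -> str:
    """Build an SSML fragment using voice, emphasis, break, and prosody."""
    # Namespace declaration (xmlns) omitted for brevity in this sketch.
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})

    # voice: request a speaker with the given characteristics.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "

    # emphasis: mark a word as prominent.
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "very"
    emphasis.tail = " important."

    # break: insert a pause of a given duration.
    ET.SubElement(voice, "break", {"time": "500ms"})

    # prosody: adjust rate and pitch for the enclosed text.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = "Please listen carefully."

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml()
```

All four elements describe how something is spoken rather than why, which is the gap the following slides on emotion and style point at.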
15
Emotion and Style. Emotion: anger, happiness, surprise, sadness, fear, …; depends on the speaker's psychological and physical state; local effects on prosody. Style: news, comments, …; depends on the semantics of the sentences; global effects on prosody.
16
Personalized Voice. Element: voice. Attributes: "gender", "age", "name", "variant". Sample: 他说:“什么意思?” 她回答:“没什么意思。” (He said: "What do you mean?" She answered: "Nothing much.")
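One way to read the sample is that each quoted line gets its own `voice` element while the narration stays in the default voice. The sketch below assumes that reading; the function name `dialogue_to_ssml`, the segment format, and the choice of `gender` values are illustrative and not from the talk.

```python
import xml.etree.ElementTree as ET


def dialogue_to_ssml(segments, lang="zh-CN"):
    """Wrap quoted speech in voice elements; leave narration in the default voice.

    segments is a list of (gender, text) pairs; gender None means narration.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})
    last = None  # most recently added voice element, if any
    for gender, text in segments:
        if gender is None:
            # Narration is plain text: before the first child it is speak.text,
            # afterwards it is the tail of the preceding voice element.
            if last is None:
                speak.text = (speak.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(speak, "voice", {"gender": gender})
            last.text = text
    return ET.tostring(speak, encoding="unicode")


segments = [
    (None, "他说:"),
    ("male", "“什么意思?”"),
    (None, "她回答:"),
    ("female", "“没什么意思。”"),
]
ssml = dialogue_to_ssml(segments)
```

The other attributes on the slide ("age", "name", "variant") could be passed in the same attribute dictionary if a particular synthesizer supports them.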
17
Extension? To make it more expressive: background music; VTTS; combination with a talking head and other media information; … For this, we can only see the "mark" element.
18
Thanks!
19
Element: Level: 0-..; paragraph, phrase, POS: