Position Paper for W3C Workshop on Internationalizing SSML
The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML
2005. 11. 3.
Myoung-Wan Koo †‡ and Du-Seong Chang †
KT † / KAIT ‡
The Value Networking Company
1/9 Introduction
  Multiple pronunciation problem
    Same word but different pronunciations
      Newton: /nju:tən/ vs. /nu:tən/
    Same spelling but different pronunciations (homograph)
      refuse: /rɪ'fju:z/ vs. /'refju:s/
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    Newton  nju:tən  nu:tən
    refuse  rɪ'fju:z  'refju:s
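The inner markup of the lexicon on this slide did not survive extraction. As a rough sketch, assuming the entries were written with the lexeme, grapheme and phoneme elements of PLS, the example could be loaded like this (the function name load_pronunciations is made up for illustration). It shows that a plain lexicon returns both candidates for "refuse" with nothing to choose between them.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the lexicon element shown on the slide.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumed reconstruction: the lexeme/grapheme/phoneme nesting is not visible
# in the slide text, only the words and their pronunciations are.
LEXICON_XML = """\
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def load_pronunciations(xml_text):
    """Map each grapheme to the list of pronunciations in its lexeme."""
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            table.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return table


pronunciations = load_pronunciations(LEXICON_XML)
for word, candidates in pronunciations.items():
    print(word, candidates)
```

Both words come back with two candidates, so the choice between the verb and noun readings of "refuse" has to be made somewhere outside the lexicon.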
2/9 Multiple pronunciation in SSML & PLS
  SSML: The Speech Synthesis Markup Language Specification Version 1.0
    Pronunciation information in SSML
      Phoneme element
      Lexicon element
  PLS: Pronunciation Lexicon Specification Version 1.0
    Pronunciation information in PLS
      Phoneme element
      Prefer attribute
  Neither fully supports a pronunciation lexicon for multiple pronunciations and agglutinative languages.
  Part-Of-Speech information is needed.
3/9 Pronunciation information in PLS (1/2)
  Pronunciation Lexicon Specification Version 1.0 / Feb 2005 / W3C Voice Browser Working Group
    It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
    It is expected to handle multiple pronunciations.
  Example of PLS
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    tomato  təmei̥ɾou
4/9 Pronunciation information in PLS (2/2)
  Prefer attribute of phoneme element
    Gives one pronunciation high priority among the pronunciation candidates.
    Effective in speech synthesis
      Only for multiple pronunciations of the same orthography
      Not for the homograph problem
        refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
    No information for ASR systems.
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    Newton  nju:tən  nu:tən
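A minimal sketch of how a synthesizer might honour the prefer attribute. The slide text does not show which Newton pronunciation was marked, so the prefer="true" placement below is an assumption, as are the element nesting and the helper name preferred_pronunciation.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: the first candidate carries prefer="true"; the slide does not
# say which of the two pronunciations was preferred.
LEXICON_XML = """\
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
</lexicon>
"""


def preferred_pronunciation(xml_text, word):
    """Return the preferred pronunciation of word, or None if it is not listed.

    Falls back to the first listed pronunciation when nothing is marked.
    """
    root = ET.fromstring(xml_text)
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)]
        if word not in graphemes:
            continue
        phonemes = lexeme.findall("pls:phoneme", PLS_NS)
        for phoneme in phonemes:
            if phoneme.get("prefer") == "true":
                return phoneme.text.strip()
        if phonemes:
            return phonemes[0].text.strip()
    return None


print(preferred_pronunciation(LEXICON_XML, "Newton"))
```

The preference is fixed per word, which is why it cannot separate the verb and noun readings of a homograph such as "refuse": the right answer there depends on context, not on a static ranking.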
5/9 Typical Korean TTS system structure
  Pipeline: Text -> Morphological Analyzer -> Grapheme-to-Phoneme -> Prosody Analysis -> Waveform production -> Speech
  Intermediate data: Morphemes, POS; Phonemes, POS; Phonemes, Prosody; Structural Information
  Morpheme Dictionary (columns: morpheme, POS1, POS2, ...)
  Pronunciation Dictionary (columns: morpheme, POS, Pronunciation)
6/9 POS for resolving multiple pronunciations
  POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  The word "refuse" can have two different pronunciations depending on its POS.
  Proposal: POS attribute
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    refuse  rɪ'fju:z
    refuse  'refju:s
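The proposed attribute could be used roughly as follows. The attribute name pos, its values "verb" and "noun", and the helper build_pos_lexicon are all assumptions made for illustration; the slide text preserves only the two "refuse" entries, not the attribute syntax.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Hypothetical markup: "pos" is not part of PLS, it stands in for the
# POS attribute proposed on the slide.
LEXICON_XML = """\
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def build_pos_lexicon(xml_text):
    """Index pronunciations by (grapheme, pos); pos is None when not given."""
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)]
        for phoneme in lexeme.findall("pls:phoneme", PLS_NS):
            for grapheme in graphemes:
                table[(grapheme, phoneme.get("pos"))] = phoneme.text.strip()
    return table


pos_lexicon = build_pos_lexicon(LEXICON_XML)
print(pos_lexicon[("refuse", "verb")])
print(pos_lexicon[("refuse", "noun")])
```

With the POS supplied by a tagger or morphological analyzer, the lookup becomes a single dictionary access instead of a disambiguation step over all candidates.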
7/9 POS information for LVCSR
  Large vocabulary continuous speech recognition (LVCSR) of agglutinative languages
    The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.
    There are many homographs in the recognition dictionary.
  POS information helps the system get the proper pronunciation from the dictionary as well as resolve multiple pronunciations of some words.
  It reduces the search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
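The search-time argument can be illustrated with a toy sketch. It is not how a real recognizer works: the candidate lattice, the table of allowed POS transitions, and the function expand_paths are hypothetical, and an English example with Penn Treebank tags stands in for the morpheme-level Korean case. The point is only that disallowed connections are dropped while hypotheses are being extended, not afterwards.

```python
def expand_paths(lattice, allowed=None):
    """Enumerate hypotheses through a lattice of (unit, pos) candidates.

    When `allowed` (a set of (previous_pos, next_pos) pairs) is given,
    a connection outside the set is cut as soon as it is proposed.
    """
    paths = [[]]
    for candidates in lattice:
        extended = []
        for path in paths:
            for unit, pos in candidates:
                if allowed is not None and path and (path[-1][1], pos) not in allowed:
                    continue  # wrong connection: pruned in the first stage
                extended.append(path + [(unit, pos)])
        paths = extended
    return paths


# Toy lattice for "they refuse the refuse"; both readings of "refuse"
# are candidates at each position where the word occurs.
LATTICE = [
    [("they", "PRP")],
    [("refuse", "VBP"), ("refuse", "NN")],
    [("the", "DT")],
    [("refuse", "NN"), ("refuse", "VB")],
]

# Hypothetical hard constraints; a real system would more likely use
# a statistical language model than a fixed whitelist.
ALLOWED = {("PRP", "VBP"), ("VBP", "DT"), ("DT", "NN")}

unconstrained = expand_paths(LATTICE)
constrained = expand_paths(LATTICE, ALLOWED)
print(len(unconstrained), len(constrained))
```

Each surviving hypothesis carries a POS per unit, so the pronunciation of every "refuse" in it can be read directly from a POS-indexed dictionary.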
8/9 Proposals
  Proposal 1: POS attribute of phoneme element
    Optional attribute
  Proposal 2: POS element
    The lexeme element contains optional POS elements.
  POS values: language-specific
    Type: allow vendor-specific POS types?
    Notable POS sets: Penn Treebank, Sejong project (Korean)
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    refuse  rɪ'fju:z  verb
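Proposal 2 might look like the sketch below. The element name pos is an assumption (the slide text keeps only the value "verb"), and so are the second "refuse" entry, the untagged Newton entry, and the helper lookup. Because the POS elements are optional, the sketch treats a lexeme without any as a fallback that matches every POS.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Hypothetical markup: a <pos> child of <lexeme> stands in for the
# proposed POS element.
LEXICON_XML = """\
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
  </lexeme>
</lexicon>
"""


def lookup(xml_text, word, pos):
    """Return the pronunciation of word for the given POS, or None.

    A lexeme tagged with the requested POS wins; an untagged lexeme
    for the same word is used only when no tagged lexeme matches.
    """
    root = ET.fromstring(xml_text)
    fallback = None
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = {g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)}
        phoneme = lexeme.find("pls:phoneme", PLS_NS)
        if word not in graphemes or phoneme is None:
            continue
        tags = {t.text.strip() for t in lexeme.findall("pls:pos", PLS_NS)}
        if pos in tags:
            return phoneme.text.strip()
        if not tags and fallback is None:
            fallback = phoneme.text.strip()
    return fallback


print(lookup(LEXICON_XML, "refuse", "noun"))
print(lookup(LEXICON_XML, "Newton", "noun"))
```

Compared with an attribute on phoneme, an element on lexeme lets one entry list several POS values, at the cost of repeating the grapheme for each pronunciation.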
9/9 Conclusion
  No element or attribute for resolving multiple pronunciations in current SSML and PLS
  POS information
    Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
    Can reduce the search time in a large vocabulary recognition system.
    Can be effective in agglutinative languages.
  Proposals
    POS element
    POS attribute