Position Paper for W3C Workshop on Internationalizing SSML The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML 2005. 11. 3. Myoung-Wan.


1 Position Paper for W3C Workshop on Internationalizing SSML
  The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML
  2005. 11. 3.
  Myoung-Wan Koo †‡ and Du-Seong Chang †
  KT † / KAIT ‡

2 The Value Networking Company (1/9) Introduction
  - Multiple pronunciation problem
    - Same word but different pronunciations
      - Newton: /nju:tən/ vs. /nu:tən/
    - Same spelling but different pronunciations (homograph)
      - refuse: /rɪ'fju:z/ vs. /'refju:s/
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    Newton  nju:tən  nu:tən
    refuse  rɪ'fju:z  'refju:s
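The lexicon on this slide lost its inner markup in the transcript. As a minimal sketch, assuming the lexeme, grapheme and phoneme element names of the PLS draft, the two entries could be written out and loaded as follows. The helper name load_pronunciations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the slide's <lexicon> tag.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: the inner element markup is reconstructed, not copied from the slide.
LEXICON_XML = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def load_pronunciations(xml_text):
    """Map each grapheme to the list of pronunciations listed for it."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=PLS_NS)
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", PLS_NS)]
        table.setdefault(grapheme, []).extend(phonemes)
    return table


pronunciations = load_pronunciations(LEXICON_XML)
for word, candidates in pronunciations.items():
    # Both words come back with two candidates and nothing to choose between them.
    print(word, candidates)
```

Both entries end up with two candidates, which is the ambiguity the rest of the deck addresses.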

3 (2/9) Multiple pronunciation in SSML and PLS
  - SSML
    - The Speech Synthesis Markup Language Specification Version 1.0
    - Pronunciation information in SSML
      - Phoneme element
      - Lexicon element
  - PLS
    - Pronunciation Lexicon Specification Version 1.0
    - Pronunciation information in PLS
      - Phoneme element
      - Prefer attribute
  - Neither fully supports a pronunciation lexicon for multiple pronunciations or for agglutinative languages.
  - Part-Of-Speech information is needed.

4 (3/9) Pronunciation information in PLS (1/2)
  - Pronunciation Lexicon Specification
    - Version 1.0 / Feb 2005 / W3C Voice Browser Working Group
    - It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
    - It is expected to handle multiple pronunciations.
  - Example of PLS
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    tomato  təmei̥ɾou

5 (4/9) Pronunciation information in PLS (2/2)
  - Prefer attribute of phoneme element
    - Gives one pronunciation high priority among the pronunciation candidates.
    - Effective in speech synthesis
      - Only for multiple pronunciations of the same orthography
      - Not for the homograph problem
        refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
    - No information for ASR systems.
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    Newton  nju:tən  nu:tən
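A rough sketch of why the prefer attribute does not help with homographs: the lookup below takes only the orthography, so "refuse" gets the same answer whether it is used as a verb or as a noun. The element markup is reconstructed, and which pronunciation carries prefer="true" is an assumption made for illustration. The function name preferred is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: the placement of prefer="true" is illustrative only.
LEXICON_XML = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme prefer="true">nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme prefer="true">rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def preferred(root, word):
    """Return the phoneme marked prefer="true", else the first one listed."""
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        if lexeme.findtext("pls:grapheme", namespaces=PLS_NS) != word:
            continue
        phonemes = lexeme.findall("pls:phoneme", PLS_NS)
        if not phonemes:
            return None
        marked = [p for p in phonemes if p.get("prefer") == "true"]
        return (marked[0] if marked else phonemes[0]).text
    return None


root = ET.fromstring(LEXICON_XML.strip())
newton = preferred(root, "Newton")
# The lookup has no argument for context, so both uses get one answer.
refuse_as_verb = preferred(root, "refuse")
refuse_as_noun = preferred(root, "refuse")
print(newton, refuse_as_verb, refuse_as_noun)
```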

6 (5/9) Typical Korean TTS system structure
  [Figure: processing pipeline]
  Text -> Morphological Analyzer -> Grapheme-to-Phoneme -> Prosody Analysis -> Waveform production -> Speech
  Data passed between stages: Morphemes, POS; Phonemes, POS; Phonemes, Prosody; Structural Information
  Morpheme Dictionary columns: morpheme, POS1, POS2, …
  Pronunciation Dictionary columns: morpheme, POS, Pronunciation

7 (6/9) POS for resolving multiple pronunciations
  - POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  - The word "refuse" has two different pronunciations depending on its POS.
  - Proposal: POS attribute
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    refuse  rɪ'fju:z
    refuse  'refju:s
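The attribute markup of this example did not survive in the transcript. As a sketch only, assuming the proposed attribute is named pos, sits on the phoneme element, and takes values such as "verb" and "noun" (all three are assumptions), a POS-aware lookup might look like this. The function name pronounce is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: attribute name "pos" and its values are hypothetical.
LEXICON_XML = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def pronounce(root, word, pos):
    """Collect pronunciations of `word` whose pos attribute matches `pos`.

    A phoneme without the attribute matches any POS, since the
    attribute is proposed as optional.
    """
    matches = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        if lexeme.findtext("pls:grapheme", namespaces=PLS_NS) != word:
            continue
        for phoneme in lexeme.findall("pls:phoneme", PLS_NS):
            if phoneme.get("pos") in (None, pos):
                matches.append(phoneme.text)
    return matches


root = ET.fromstring(LEXICON_XML.strip())
verb_form = pronounce(root, "refuse", "verb")
noun_form = pronounce(root, "refuse", "noun")
print(verb_form, noun_form)
```

With the tag supplied by a morphological analyzer or tagger, each use of "refuse" resolves to a single pronunciation.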

8 (7/9) POS information for LVCSR
  - Large vocabulary continuous speech recognition (LVCSR) of agglutinative languages
    - The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.
    - The recognition dictionary contains many homographs.
  - POS information helps the system pick the proper pronunciation from the dictionary and resolve multiple pronunciations of some words.
  - It reduces the search time, because POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
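The pruning idea can be illustrated with a toy sketch. It assumes the recognizer keeps a table of allowed POS-to-POS connections. The tag names, the table contents and the morpheme placeholders m1 and m2 are all hypothetical and not taken from any real tag set.

```python
from itertools import product

# Hypothetical connection table: which POS may follow which.
ALLOWED = {("noun", "particle"), ("verb", "ending")}


def connections(prev_candidates, next_candidates, allowed=None):
    """Pair up (morpheme, pos) candidates of two adjacent positions.

    With `allowed` set, pairs whose POS sequence is not in the table
    are dropped before any later stage sees them.
    """
    pairs = product(prev_candidates, next_candidates)
    if allowed is None:
        return list(pairs)
    return [(a, b) for a, b in pairs if (a[1], b[1]) in allowed]


# Two homograph morphemes, each with two POS readings.
first = [("m1", "noun"), ("m1", "verb")]
second = [("m2", "particle"), ("m2", "ending")]

unpruned = connections(first, second)
pruned = connections(first, second, ALLOWED)
print(len(unpruned), len(pruned))
```

In this toy case half of the candidate connections are discarded in the first pass, which is the kind of search-space reduction the slide describes.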

9 (8/9) Proposals
  - Proposal 1: POS attribute of phoneme element
    - Optional attribute
  - Proposal 2: POS element
    - The lexeme element contains optional POS elements.
  - POS values: language-specific
    - Type: allow vendor-specific POS types?
    - Established POS sets: Penn Treebank, Sejong project (Korean)
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    refuse  rɪ'fju:z  verb
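For Proposal 2, a sketch under similar assumptions: the element is taken to be named pos and to sit inside lexeme, next to grapheme and phoneme. The slide's surviving text shows only the verb entry. The noun entry below is added for illustration. The helper name build_index is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: element name "pos" is hypothetical; the noun lexeme is illustrative.
LEXICON_XML = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>
"""


def build_index(xml_text):
    """Index pronunciations by (grapheme, pos).

    A lexeme without any pos element is stored under (grapheme, None),
    since the element is proposed as optional.
    """
    root = ET.fromstring(xml_text.strip())
    index = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=PLS_NS)
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", PLS_NS)]
        tags = [t.text for t in lexeme.findall("pls:pos", PLS_NS)] or [None]
        for tag in tags:
            index.setdefault((grapheme, tag), []).extend(phonemes)
    return index


index = build_index(LEXICON_XML)
print(index[("refuse", "verb")], index[("refuse", "noun")])
```

Compared with the attribute form, the element form tags the whole lexeme, so every pronunciation in that lexeme shares the same POS.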

10 (9/9) Conclusion
  - Current SSML and PLS have no element or attribute for resolving multiple pronunciations.
  - POS information
    - Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
    - Can reduce the search time in a large vocabulary recognition system.
    - Can be effective for agglutinative languages.
  - Proposals
    - POS element
    - POS attribute

