Published by Rosemary Daniels. Modified over 9 years ago.
1
NLP Research Group Meeting (27 March 2004)
Text Processing Front End for Indian Language TTS System
Susmitha & Rohit Kumar

Basic Block Diagram of a Text to Speech System
Input Text -> Text Processing Front End -> Phonetic Units, Prosodic Markings -> Speech Synthesizer -> Speech
2
Basic Design

Font Text -> Converter -> Unicodes -> Text Normalization -> Normalized Unicodes -> Unicode to WX -> WX -> NLP Modules -> Phonetizer -> Phonetic Unit Sequences & Prosodic Markings

Text Normalization expands Non Standard Words to Standard Words.
Indexing is maintained throughout all these conversions.
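One way to picture how indexing could survive a chain of conversions is to compose per-stage index maps. The sketch below is only an illustration: the IndexMap type, the ComposeIndexes name, and the convention that entry i holds the input position behind output element i are all assumptions, not details taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: map[i] is the position in a stage's input
// that produced element i of that stage's output.
using IndexMap = std::vector<std::size_t>;

// Compose two stage maps so the result points from the second stage's
// output all the way back to the first stage's input.
IndexMap ComposeIndexes(const IndexMap& first, const IndexMap& second) {
    IndexMap composed;
    composed.reserve(second.size());
    for (std::size_t mid : second) {
        assert(mid < first.size());  // second must refer to valid outputs of first
        composed.push_back(first[mid]);
    }
    return composed;
}
```

Under this assumption, composing the maps of the Converter, Text Normalization, and later stages in order would let any phonetic unit be traced back to a position in the original font text.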
3
Converters

Base class: IIITH_Converter
Public virtual methods: Font_ID, Unicode2String, String2Unicode, GetIndexes

Derived classes (public, one for each converter): IIITH_AmarUjala, IIITH_Bhaskar, IIITH_Vartha, ……………

List of fonts currently handled: AmarUjala, Bhaskar, Jagran, Naidunia, Shusha, Shashi, Yogesh, Eenadu, Vartha, Hemalatha, WLHemalatha, ISCII, UTF8, WX
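The class layout on this slide might look roughly like the following sketch. Only the class name IIITH_Converter and the four method names come from the slide; every signature, the meaning of the index vector, and the ToyAsciiConverter class (a stand-in for a real font converter) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Method names come from the slide; every signature here is an assumption.
class IIITH_Converter {
public:
    virtual ~IIITH_Converter() = default;
    virtual std::string Font_ID() const = 0;
    virtual std::vector<char32_t> String2Unicode(const std::string& text) = 0;
    virtual std::string Unicode2String(const std::vector<char32_t>& codes) = 0;
    // Assumed meaning: entry i is the input position behind output element i
    // of the most recent conversion.
    virtual const std::vector<std::size_t>& GetIndexes() const = 0;
};

// Hypothetical converter for plain ASCII, standing in for a real font
// converter such as IIITH_AmarUjala.
class ToyAsciiConverter : public IIITH_Converter {
public:
    std::string Font_ID() const override { return "ToyAscii"; }

    std::vector<char32_t> String2Unicode(const std::string& text) override {
        std::vector<char32_t> codes;
        indexes_.clear();
        for (std::size_t i = 0; i < text.size(); ++i) {
            codes.push_back(static_cast<unsigned char>(text[i]));
            indexes_.push_back(i);  // one code per input byte in this toy
        }
        return codes;
    }

    std::string Unicode2String(const std::vector<char32_t>& codes) override {
        std::string text;
        indexes_.clear();
        for (std::size_t i = 0; i < codes.size(); ++i) {
            assert(codes[i] < 128);  // this toy handles ASCII only
            text.push_back(static_cast<char>(codes[i]));
            indexes_.push_back(i);
        }
        return text;
    }

    const std::vector<std::size_t>& GetIndexes() const override { return indexes_; }

private:
    std::vector<std::size_t> indexes_;
};
```

Because callers would hold only an IIITH_Converter reference, a new font could be supported by adding one derived class without touching the rest of the pipeline.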
4
Converters (continued..)

Mapping Table: Index Creation
Specialized Blocks: Index Adjustment
Movement Blocks
Deletion Blocks
Substitution Blocks
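A minimal sketch of how a mapping table could create indexes and how a movement block could then adjust them. The Converted struct, both function names, and the toy glyph table used in the usage note are hypothetical; real font tables and reordering rules are not given in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Converted {
    std::string text;                  // output symbols, one char each here
    std::vector<std::size_t> indexes;  // indexes[i]: input position behind text[i]
};

// Index creation: walk the input, look each glyph up in the mapping table,
// and remember where every output symbol came from.
Converted ApplyMappingTable(const std::string& input,
                            const std::map<char, std::string>& table) {
    Converted out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        auto it = table.find(input[i]);
        // Glyphs missing from the table are copied through unchanged.
        const std::string mapped = (it != table.end()) ? it->second
                                                       : std::string(1, input[i]);
        for (char c : mapped) {
            out.text.push_back(c);
            out.indexes.push_back(i);
        }
    }
    return out;
}

// Movement block with index adjustment: a symbol that the font stores before
// its base character is moved after it, and its index moves with it.
void MoveAfterNext(Converted& conv, char prefix_symbol) {
    assert(conv.text.size() == conv.indexes.size());
    for (std::size_t i = 0; i + 1 < conv.text.size(); ++i) {
        if (conv.text[i] == prefix_symbol) {
            std::swap(conv.text[i], conv.text[i + 1]);
            std::swap(conv.indexes[i], conv.indexes[i + 1]);
            ++i;  // skip the symbol we just moved
        }
    }
}
```

With a toy table mapping 'f' to "i" and 'x' to "kR", the input "fxa" becomes "ikRa" with indexes 0, 1, 1, 2, and MoveAfterNext on 'i' yields "kiRa" with indexes 1, 0, 1, 2. Deletion and substitution blocks could be handled the same way, by erasing or rewriting entries in both vectors together.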
5
Converters (continued..)

IIITH_Converter: string and vector in and out, plus an index vector (Notation1_Index, Notation2_index)

No temporary files, no junk, no system calls, very portable, etc.
Simple, easy to use, pluggable modules (we used them frequently for InXight work and also for PICOPETA).
A Unicode to UTF8 converter has also been developed and is being used in Web Content Unifier.
6
Indexing Example

Font        WX
Aaja        Aja
36.4        CatIsa daSaMloka cAra
°           digarI
C           seMtigreda
ka          kA
tapm aana   tApaMana
hO          hE
.           .
7
Text Normalization

Blocks: Text Normalization Filter, Tokenizer, Token Identifier, Token Expansion
Input: Unicodes, Indexes
Output: Unicodes, Updated Indexes

Types of Token Handled
Numbers (11.221)
Abbreviations (Mr., Dr.)
Punctuations (+, -)
Normal Words
8
Text Normalization (continued…)

IIITH_TextNormalization: NormalizeText
Input: Unicodes (vector), Indexes (vector)
Output: Unicodes (vector), Updated Indexes (vector)

Most normalization operations are language independent.
The language-dependent data (e.g. number tables, abbreviations, etc.) are kept in a separate file in a standard path, and the Text Normalization module loads the appropriate file depending on the Language ID provided to it.
Quite easy to extend to new Indian languages.
Allows continuous improvements with evaluations.
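The split between language-independent logic and language-dependent tables could be sketched as follows. This is an illustration under several assumptions: the LanguageTable struct, the IdentifyToken helper, and the word-level index vector are hypothetical, the table is held in memory rather than loaded from a per-language file, and numbers are read digit by digit for brevity instead of as full number names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

enum class TokenType { Number, Abbreviation, Punctuation, NormalWord };

// Language-dependent data. The slides keep this in a per-language file;
// here it is an in-memory struct to keep the sketch self-contained.
struct LanguageTable {
    std::vector<std::string> digit_names;              // names for 0..9
    std::string point_name;                            // reading of '.'
    std::map<std::string, std::string> abbreviations;  // e.g. "Dr." -> expansion
};

TokenType IdentifyToken(const std::string& tok, const LanguageTable& lang) {
    if (lang.abbreviations.count(tok)) return TokenType::Abbreviation;
    bool has_digit = false, numeric = true, punct = true;
    for (unsigned char c : tok) {
        if (std::isdigit(c)) has_digit = true;
        else if (c != '.') numeric = false;
        if (!std::ispunct(c)) punct = false;
    }
    if (has_digit && numeric) return TokenType::Number;
    if (punct) return TokenType::Punctuation;
    return TokenType::NormalWord;
}

struct Normalized {
    std::vector<std::string> words;
    std::vector<std::size_t> indexes;  // indexes[i]: input token behind words[i]
};

Normalized NormalizeText(const std::string& text, const LanguageTable& lang) {
    assert(lang.digit_names.size() == 10);
    Normalized out;
    std::istringstream in(text);
    std::string tok;
    for (std::size_t pos = 0; in >> tok; ++pos) {
        auto emit = [&](const std::string& w) {
            out.words.push_back(w);
            out.indexes.push_back(pos);
        };
        switch (IdentifyToken(tok, lang)) {
            case TokenType::Abbreviation:
                emit(lang.abbreviations.at(tok));
                break;
            case TokenType::Number:  // simplification: read digit by digit
                for (unsigned char c : tok)
                    emit(c == '.' ? lang.point_name : lang.digit_names[c - '0']);
                break;
            case TokenType::Punctuation:  // dropped in this sketch
                break;
            case TokenType::NormalWord:
                emit(tok);
                break;
        }
    }
    return out;
}
```

Supporting another language in this sketch would mean supplying a different LanguageTable, leaving IdentifyToken and NormalizeText untouched, which mirrors the extensibility claim on the slide.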
9
NLP Modules

Virtual base class: IIITH_NLPModule
Public methods: Process, GetIndex
Derived classes (public): IIITH_HindiIVS, IIITH_Wx2Z

Process
Input: WX (string), Indexes
Output: WX (or Z) (string), Updated Indexes
10
NLP Modules (continued..)

Currently deployed NLP Modules (selected by Lang ID)
Hindi: IIITH_HindiIVS
Telugu: IIITH_Wx2Z (WX to Z)

NLP Modules to be developed / deployed
1. Borrowed / Foreign Words handling
2. Clause Boundary
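Selecting an NLP module by Lang ID could be sketched as below. The class names and the Process and GetIndex method names follow the slides; the signatures, the LangID enum, the MakeModule factory, and the Hindi/Telugu pairing are assumptions, and both module bodies are pass-through stubs because the slides do not describe the actual linguistic processing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Class and method names follow the slides; signatures and bodies are assumptions.
class IIITH_NLPModule {
public:
    virtual ~IIITH_NLPModule() = default;
    // Takes WX text plus its indexes, returns the processed text, and stores
    // the updated indexes for GetIndex().
    virtual std::string Process(const std::string& wx,
                                const std::vector<std::size_t>& indexes) = 0;
    const std::vector<std::size_t>& GetIndex() const { return indexes_; }

protected:
    std::vector<std::size_t> indexes_;
};

// Stub: passes the text through unchanged.
class IIITH_HindiIVS : public IIITH_NLPModule {
public:
    std::string Process(const std::string& wx,
                        const std::vector<std::size_t>& indexes) override {
        assert(wx.size() == indexes.size());
        indexes_ = indexes;
        return wx;
    }
};

// Stub: a real module would rewrite WX into the Z notation here.
class IIITH_Wx2Z : public IIITH_NLPModule {
public:
    std::string Process(const std::string& wx,
                        const std::vector<std::size_t>& indexes) override {
        assert(wx.size() == indexes.size());
        indexes_ = indexes;
        return wx;
    }
};

enum class LangID { Hindi, Telugu };  // hypothetical identifiers

// Assumed dispatch: one module per language, chosen by Lang ID.
std::unique_ptr<IIITH_NLPModule> MakeModule(LangID lang) {
    switch (lang) {
        case LangID::Hindi:
            return std::unique_ptr<IIITH_NLPModule>(new IIITH_HindiIVS());
        case LangID::Telugu:
            return std::unique_ptr<IIITH_NLPModule>(new IIITH_Wx2Z());
    }
    return nullptr;
}
```

New modules such as borrowed-word handling or clause boundary detection could then be added as further derived classes without changing the calling code.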
11
Moving Further…

Currently the Phonetizer is a part of the synthesis engine.
Bringing the Phonetizer & Syllabifier modules outside the core engine, because we can use these modules for several other purposes as well.
Modifying the synthesis engine to support the new phonetizer and indexing.
Thorough testing and evaluation of TN modules & continuous improvements.
Developing a proper API (LIBs and DLLs) for using these.
Integration of new modules with LMDS (for PICOPETA) & with RAVI.
Experimenting with prosodic marking (better pauses for a start).