NLP Research Group Meeting (27 March 2004)
Text Processing Front End for Indian Language TTS System

Susmitha & Rohit Kumar

Basic Block Diagram of a Text to Speech System
Input Text -> Text Processing Front End -> Phonetic Units & Prosodic Markings -> Speech Synthesizer -> Speech

Basic Design
Font Text -> Converter -> Unicodes -> Text Normalization (expands Non-Standard Words to Standard Words) -> Normalized Unicodes -> Unicode to WX -> WX -> NLP Modules -> Phonetizer -> Phonetic Unit Sequences & Prosodic Markings
Indexing is maintained throughout all these conversions.
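The slide says that indexing is maintained throughout all these conversions. One possible way to do this is sketched below. The assumption is that every stage emits a map alongside its output, giving the position of the input unit each output unit came from. Composing the per-stage maps then links any phonetic unit back to the original font text. `IndexMap`, `compose` and `composeAll` are hypothetical names, not part of the original system.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical representation: map[i] is the position, in a stage's input,
// of the unit that produced output unit i of that stage.
using IndexMap = std::vector<std::size_t>;

// Compose two consecutive stages: `first` maps stage-1 output to the original
// input, `second` maps stage-2 output to stage-1 output. The result maps
// stage-2 output directly to the original input.
IndexMap compose(const IndexMap& first, const IndexMap& second) {
    IndexMap result;
    result.reserve(second.size());
    for (std::size_t pos : second) {
        if (pos >= first.size()) {
            throw std::out_of_range("index refers past the previous stage's output");
        }
        result.push_back(first[pos]);
    }
    return result;
}

// Fold the maps of a whole pipeline (converter, normalizer, WX conversion, ...)
// into one map from final output units to original input positions.
IndexMap composeAll(const std::vector<IndexMap>& stages) {
    if (stages.empty()) return {};
    IndexMap total = stages.front();
    for (std::size_t s = 1; s < stages.size(); ++s) {
        total = compose(total, stages[s]);
    }
    return total;
}
```

For example, suppose the converter maps two output code points back to font byte 0. Any later output unit that points at either of those code points then also resolves to byte 0.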

Converters
IIITH_Converter: base class with the public virtual methods Font_ID, Unicode2String, String2Unicode and GetIndexes.
Derived classes (one for each converter, public inheritance): IIITH_AmarUjala, IIITH_Bhaskar, ……………, IIITH_Vartha.
List of fonts currently handled: AmarUjala, Bhaskar, Jagran, Naidunia, Shusha, Shashi, Yogesh, Eenadu, Vartha, Hemalatha, WLHemalatha, ISCII, UTF8, WX.
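The converter hierarchy might be sketched in C++ roughly as follows. Only the class and method names (IIITH_Converter, Font_ID, String2Unicode, Unicode2String, GetIndexes) come from the slide. The signatures, the `IndexPair` type and the `ToyTableConverter` class are assumptions for illustration. The toy class uses a one-byte-per-character table and does not reproduce any real font mapping.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed layout: (position in source notation, position in target notation).
using IndexPair = std::pair<std::size_t, std::size_t>;

// Abstract converter interface; method names follow the slide,
// parameter and return types are assumptions.
class IIITH_Converter {
public:
    virtual ~IIITH_Converter() = default;
    virtual std::string Font_ID() const = 0;
    // Font-encoded bytes -> Unicode code points.
    virtual std::u32string String2Unicode(const std::string& text) = 0;
    // Unicode code points -> font-encoded bytes.
    virtual std::string Unicode2String(const std::u32string& text) = 0;
    // Index pairs recorded by the most recent conversion.
    virtual std::vector<IndexPair> GetIndexes() const = 0;
};

// Hypothetical converter driven by a one-byte-to-one-code-point table.
// Real font converters need many-to-many mappings plus the specialized
// (movement, deletion, substitution) blocks listed for the converters.
class ToyTableConverter : public IIITH_Converter {
public:
    explicit ToyTableConverter(std::map<char, char32_t> table)
        : table_(std::move(table)) {}

    std::string Font_ID() const override { return "TOY"; }

    std::u32string String2Unicode(const std::string& text) override {
        std::u32string out;
        indexes_.clear();
        for (std::size_t i = 0; i < text.size(); ++i) {
            auto it = table_.find(text[i]);
            if (it == table_.end()) continue;  // unmapped byte: dropped
            indexes_.emplace_back(i, out.size());
            out.push_back(it->second);
        }
        return out;
    }

    std::string Unicode2String(const std::u32string& text) override {
        std::string out;
        indexes_.clear();
        for (std::size_t i = 0; i < text.size(); ++i) {
            for (const auto& [byte, cp] : table_) {  // reverse lookup
                if (cp == text[i]) {
                    indexes_.emplace_back(i, out.size());
                    out.push_back(byte);
                    break;
                }
            }
        }
        return out;
    }

    std::vector<IndexPair> GetIndexes() const override { return indexes_; }

private:
    std::map<char, char32_t> table_;
    std::vector<IndexPair> indexes_;
};
```

With this design, adding a font means adding one derived class. The rest of the front end only talks to the IIITH_Converter interface.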

Converters (continued..)
Mapping Table, with Index Creation
Specialized Blocks, with Index Adjustment: Movement Blocks, Deletion Blocks, Substitution Blocks
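One plausible job for a movement block comes from Devanagari. In many legacy Devanagari fonts, the short-i vowel sign (ि) is stored before the consonant because it is drawn to the consonant's left. Unicode stores U+093F after the consonant. The sketch below assumes the mapping table has already produced code points in the font's visual order, together with an index vector of original font positions. It moves the index entries along with the characters, which is one form of index adjustment. The function name `moveShortI` is hypothetical, and consonant clusters are deliberately not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr char32_t kVowelSignI = U'\u093F';  // DEVANAGARI VOWEL SIGN I

// True for the basic Devanagari consonants KA..HA (U+0915..U+0939).
bool isConsonant(char32_t c) { return c >= U'\u0915' && c <= U'\u0939'; }

// Movement block sketch: swap each (short-i sign, consonant) pair from visual
// order into Unicode logical order, swapping the matching index entries too
// so that every character keeps its original font position.
// A short-i sign placed before a whole consonant cluster is not handled.
void moveShortI(std::vector<char32_t>& text, std::vector<std::size_t>& index) {
    if (text.size() != index.size()) {
        throw std::invalid_argument("text and index vectors must be the same length");
    }
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (text[i] == kVowelSignI && isConsonant(text[i + 1])) {
            std::swap(text[i], text[i + 1]);
            std::swap(index[i], index[i + 1]);
            ++i;  // skip the sign just moved into place
        }
    }
}
```

Deletion and substitution blocks can follow the same pattern: whatever is done to the character vector is mirrored on the index vector.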

Converters (continued..)
IIITH_Converter exchanges data as strings and vectors. The indexes are kept as a vector of Notation1_Index / Notation2_index entries.
- No temporary files, no junk, no system calls; very portable, etc.
- Simple, easy to use, pluggable modules (we used them frequently for the InXight work and also for PICOPETA).
- A Unicode to UTF8 converter has also been developed and is being used in the Web Content Unifier.
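The slides do not show how the Unicode to UTF8 converter is implemented. For reference, the standard encoding from code points to UTF-8 bytes (RFC 3629) can be written as below. `toUtf8` is an illustrative name, not the converter's actual interface.

```cpp
#include <stdexcept>
#include <string>

// Encode Unicode code points as UTF-8 bytes (RFC 3629): 1 to 4 bytes per
// code point depending on its range. Surrogates and values above U+10FFFF
// are rejected.
std::string toUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (cp < 0x80) {              // 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {      // 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {    // 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                      // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

All Devanagari and Telugu code points fall in the three-byte range. For example, क (U+0915) becomes E0 A4 95.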

Indexing Example
Font         WX
Aaja         Aja
36.4         CatIsa daSaMloka cAra
°            digarI
C            seMtigreda
ka           kA
tapm aana    tApaMana
hO           hE..
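In the indexing example, one font token can expand into several WX tokens: 36.4 becomes CatIsa daSaMloka cAra. The sketch below takes a hypothetical word-level alignment for the first four font tokens (Aaja, 36.4, °, C), stores it as index pairs, and queries it in both directions. The pair layout and the `sourceOf` / `targetsOf` helpers are assumptions, not the system's actual Notation1_Index / Notation2_index format.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// One entry per aligned pair: (font token index, WX token index).
using Alignment = std::vector<std::pair<std::size_t, std::size_t>>;

// Font token that produced a given WX token, if any.
std::optional<std::size_t> sourceOf(const Alignment& align, std::size_t wxToken) {
    for (const auto& [font, wx] : align) {
        if (wx == wxToken) return font;
    }
    return std::nullopt;
}

// All WX tokens produced by a given font token, in order of appearance.
std::vector<std::size_t> targetsOf(const Alignment& align, std::size_t fontToken) {
    std::vector<std::size_t> result;
    for (const auto& [font, wx] : align) {
        if (font == fontToken) result.push_back(wx);
    }
    return result;
}

// Hand-built alignment (hypothetical) for the first rows of the example:
// font tokens  Aaja 36.4 ° C
// WX tokens    Aja CatIsa daSaMloka cAra digarI seMtigreda
const Alignment kExample{{0, 0}, {1, 1}, {1, 2}, {1, 3}, {2, 4}, {3, 5}};
```

Keeping the alignment as a flat list of pairs allows many-to-one and one-to-many links without a special case for expanded tokens.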

Text Normalization
Components: Tokenizer, Token Identifier, Token Expansion, Filter
Input: Unicodes and Indexes; output: Unicodes and Updated Indexes
Types of token handled:
- Numbers (11.221)
- Abbreviations (Mr., Dr.)
- Punctuation (+, -)
- Normal words
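A token identifier for the four token types listed could look roughly like the sketch below. The classification rules and the names `TokenType` and `identify` are assumptions. The abbreviation list is passed in as a parameter, since such lists are language dependent. Tokens are assumed to be UTF-8 and already split on whitespace. In the default C locale, bytes above 0x7F are never classified as digits or punctuation, so Indian-language words fall through to `Word`; Devanagari digits are not recognized by this sketch.

```cpp
#include <cctype>
#include <set>
#include <string>

enum class TokenType { Number, Abbreviation, Punctuation, Word };

// Classify one whitespace-delimited token (hypothetical rules):
// listed abbreviation -> Abbreviation; digits with optional '.' or ',' ->
// Number; only ASCII punctuation -> Punctuation; anything else -> Word.
TokenType identify(const std::string& token,
                   const std::set<std::string>& abbreviations) {
    if (abbreviations.count(token) > 0) return TokenType::Abbreviation;

    bool hasDigit = false;
    bool numeric = !token.empty();
    for (unsigned char c : token) {
        if (std::isdigit(c)) {
            hasDigit = true;
        } else if (c != '.' && c != ',') {
            numeric = false;
        }
    }
    if (numeric && hasDigit) return TokenType::Number;

    bool allPunct = !token.empty();
    for (unsigned char c : token) {
        if (!std::ispunct(c)) allPunct = false;
    }
    return allPunct ? TokenType::Punctuation : TokenType::Word;
}
```

The identified type would then select the expansion rule applied by the Token Expansion component.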

Text Normalization (continued…)
IIITH_TextNormalization::NormalizeText takes a vector of Unicodes with their Indexes and returns the normalized Unicodes with Updated Indexes.
- Most normalization operations are language independent.
- The language-dependent data (e.g. number tables, abbreviations) is kept in a separate file in a standard path. The Text Normalization module loads the appropriate file depending on the Language ID provided to it.
- Quite easy to extend to new Indian languages.
- Allows continuous improvement through evaluation.
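Below is a minimal sketch of the language-dependent part. It assumes a hypothetical resource format of one `written<TAB>spoken` entry per line. The table is read from a stream; the caller would open the file for the given Language ID from the standard path. A decimal number is then expanded by looking up its integer part, the decimal point and each fractional digit. The names `loadTable` and `expandNumber` are illustrative.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, convenient for feeding loadTable in tests
#include <string>

// Read a language-dependent table: written form -> spoken form.
// Assumed format: one "key<TAB>expansion" entry per line.
std::map<std::string, std::string> loadTable(std::istream& in) {
    std::map<std::string, std::string> table;
    std::string line;
    while (std::getline(in, line)) {
        auto tab = line.find('\t');
        if (tab == std::string::npos) continue;  // skip malformed lines
        table[line.substr(0, tab)] = line.substr(tab + 1);
    }
    return table;
}

// Expand a decimal number: the integer part is looked up as a whole, then the
// decimal point, then each fractional digit separately. If any piece is
// missing from the table, the input is returned unchanged.
std::string expandNumber(const std::string& number,
                         const std::map<std::string, std::string>& table) {
    auto append = [&table](const std::string& key, std::string& acc) {
        auto it = table.find(key);
        if (it == table.end()) return false;
        if (!acc.empty()) acc += ' ';
        acc += it->second;
        return true;
    };
    std::string out;
    std::size_t dot = number.find('.');
    if (!append(number.substr(0, dot), out)) return number;
    if (dot != std::string::npos) {
        if (!append(".", out)) return number;
        for (std::size_t i = dot + 1; i < number.size(); ++i) {
            if (!append(std::string(1, number[i]), out)) return number;
        }
    }
    return out;
}
```

Consider a table with the entries 36 -> CatIsa, . -> daSaMloka and 4 -> cAra, taken from the indexing example. It expands 36.4 to CatIsa daSaMloka cAra. An integer part that is missing from the table (say 37) is left unchanged, so a complete table would also need rules for composing number words.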

NLP Modules
IIITH_NLPModule: virtual base class with the public methods Process and GetIndex.
Derived classes (public inheritance): IIITH_HindiIVS, IIITH_Wx2Z.
Process takes a WX string with its Indexes string and returns the WX (or Z) string with Updated Indexes.

NLP Modules (continued..)
Currently deployed NLP modules, selected according to the Lang ID (Hindi, Telugu): IIITH_HindiIVS and IIITH_Wx2Z (WX -> Z).
NLP modules to be developed / deployed:
1. Borrowed / foreign word handling
2. Clause boundary
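The NLP module framework might be sketched as a virtual base class plus a per-language chain of modules chosen by Lang ID. The signatures are assumptions. Here the updated indexes are returned together with the text instead of through a separate GetIndex call. `AppendMarker` is a hypothetical placeholder standing in for real modules such as IIITH_HindiIVS or IIITH_Wx2Z, whose internals the slides do not describe. Which modules are registered for which language is left to the caller.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Text and its index string travel together through every module.
struct Annotated {
    std::string text;     // WX (or Z) notation
    std::string indexes;  // index string; its format is not specified here
};

// Abstract interface modelled on IIITH_NLPModule (signature assumed).
class IIITH_NLPModule {
public:
    virtual ~IIITH_NLPModule() = default;
    virtual Annotated Process(const Annotated& input) = 0;
};

// Hypothetical placeholder module: appends a fixed marker to the text and
// leaves the indexes unchanged. It only demonstrates chaining.
class AppendMarker : public IIITH_NLPModule {
public:
    explicit AppendMarker(std::string marker) : marker_(std::move(marker)) {}
    Annotated Process(const Annotated& input) override {
        return {input.text + marker_, input.indexes};
    }
private:
    std::string marker_;
};

using Chain = std::vector<std::shared_ptr<IIITH_NLPModule>>;

// Run, in order, the chain registered for a language ID;
// unknown IDs pass the input through unchanged.
Annotated runForLanguage(const std::map<std::string, Chain>& chains,
                         const std::string& langId, Annotated data) {
    auto it = chains.find(langId);
    if (it == chains.end()) return data;
    for (const auto& module : it->second) {
        data = module->Process(data);
    }
    return data;
}
```

Because each module only sees the `Annotated` pair, new modules (e.g. for borrowed words or clause boundaries) could be added to a language's chain without changing the others.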

Moving Further…
- Currently the Phonetizer is part of the synthesis engine.
- Bringing the Phonetizer & Syllabifier modules outside the core engine, because these modules can also be used for several other purposes.
- Modifying the synthesis engine to support the new phonetizer and indexing.
- Thorough testing and evaluation of the TN modules, with continuous improvements.
- Developing a proper API (LIBs and DLLs) for using these modules.
- Integration of the new modules with LMDS (for PICOPETA) and with RAVI.
- Experimenting with prosodic marking (better pauses, for a start).
