Building a Limited Domain Voice Using Festvox
Workshop Talk at IIT Kharagpur, Mar 4-5, 2009
Kishore Prahallad (kishore@iiit.ac.in)
International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University

Objective
To provide an introduction to the inner details of the Festival synthesis system.
Best resources: the documentation of Festival, Festvox and Speech Tools, and their mailing lists.
Topics:
- Festival, Festvox and Speech Tools
- Modules and data structures in Festival
- Synthesis flow
- Building a limited domain voice

Festival & Speech Tools
Festival
- A full text-to-speech system
- Multi-lingual
- A general framework for building new voices in existing and new languages
- APIs: shell level, C++ library, Emacs interface
Speech Tools
- A set of modules for common tasks found in speech processing, for example feature extraction
- Interface: standalone executables and a set of library calls linked into user programs

Festvox
A voice-building tool: an interface created on top of Festival and Speech Tools to build voices.

How Festival, Festvox & Speech Tools Are Related
[Figure: Speech Tools; Festival, the multi-lingual synthesis engine; Festvox, the environment to build voices]

Output of Festvox
[Figure: Speech Tools, Festival (multi-lingual synthesis engine) and Festvox (environment to build voices) producing a Voice]
Festvox uses Speech Tools and Festival to create a new voice. The voice created is put back into the Festival framework to synthesize text.

User Interface with Festival
[Figure: Speech Tools, Festival (multi-lingual synthesis engine), Festvox (environment to build voices), the Voice, and the User World interacting with Festival]

Some Festival-Specific Terminology
- Utterance: the *name* of a data structure used in Festival
- Segment: Festival's term for a phone

Basic Modules of the Festival TTS System
Festival has many modules; the basic modules used for text-to-speech are:
- Token_POS: basic token identification
- Token: applies the token-to-word rules (handles non-standard words)
- POS: a standard part-of-speech tagger
- Phrasify: a chunker that detects phrase boundaries
- Word: looks up pronunciations in the lexicon, falling back to letter-to-sound rules
Tokens:
- White-space separated
- European languages: space, CR, newline, tab, vertical tab, etc.
- Some Asian languages (e.g., Chinese, Japanese) have no white-space separators, so dictionaries are used
Punctuation examples:
- The boy----was usually late-----but arrived on time!!
- We have orange/apple/banana flavors
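The tokenization point above can be illustrated with a minimal Python sketch, assuming a tokenizer that splits on the white-space characters listed for European languages and then strips trailing punctuation into a separate field. The character sets, the `tokenize` name and the `name`/`punc` fields are hypothetical illustrations, not Festival's actual defaults.

```python
import re

# Hypothetical white-space set, following the slide's list for European
# languages: space, carriage return, newline, tab and vertical tab.
WHITESPACE = r"[ \r\n\t\v]+"
# Hypothetical trailing-punctuation set; not Festival's actual default.
PUNCTUATION = "!?.,:;\"')"


def tokenize(text):
    """Split text on white space, then peel trailing punctuation off each
    token and keep it in a separate field."""
    tokens = []
    for raw in re.split(WHITESPACE, text):
        if not raw:  # re.split yields "" for leading/trailing white space
            continue
        name = raw.rstrip(PUNCTUATION)
        tokens.append({"name": name, "punc": raw[len(name):]})
    return tokens


if __name__ == "__main__":
    for sentence in ["The boy----was usually late-----but arrived on time!!",
                     "We have orange/apple/banana flavors"]:
        print([t["name"] for t in tokenize(sentence)])
```

On the two punctuation examples it returns `boy----was`, `late-----but` and `orange/apple/banana` as single tokens: white space alone does not separate them, which is exactly the kind of non-standard input the Token module's token-to-word rules have to handle.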

Basic Modules of the Festival TTS System (contd.)
- Pauses: predicts pauses and inserts silences
- Intonation: predicts accents, i.e., which syllables are accented (stressed)
- PostLex: post-lexical rules that can modify segments based on their context, used for things like vowel reduction and contractions
- Duration: predicts the durations of segments
- Int_Targets: realizes the F0 contour, i.e., given the accents/tones, generates an F0 contour
- Wave_Synth: a general function that in turn calls the appropriate method to actually generate the waveform
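The Int_Targets step turns accent/tone targets into a continuous F0 contour. A minimal sketch, assuming the targets are already given as (time, F0) pairs and the contour is obtained by linear interpolation between them, sampled every 10 ms. Festival offers several intonation methods; this is a simplified illustration, not any one of them, and `f0_contour` is a hypothetical name.

```python
import bisect


def f0_contour(targets, frame_shift=0.01):
    """Linearly interpolate (time_sec, f0_hz) targets, sorted by time,
    into one F0 value per frame of frame_shift seconds."""
    times = [t for t, _ in targets]
    values = [f for _, f in targets]
    n_frames = int(round((times[-1] - times[0]) / frame_shift)) + 1
    contour = []
    for i in range(n_frames):
        t = times[0] + i * frame_shift
        k = bisect.bisect_right(times, t)  # index of first target after t
        if k >= len(times):                # at or past the last target
            f0 = values[-1]
        else:                              # between targets k-1 and k
            t0, t1 = times[k - 1], times[k]
            f0 = values[k - 1] + (t - t0) / (t1 - t0) * (values[k] - values[k - 1])
        contour.append((t, f0))
    return contour


if __name__ == "__main__":
    # Hypothetical targets: a rise to an accent peak, then a fall.
    for t, f0 in f0_contour([(0.0, 100.0), (0.1, 150.0), (0.2, 110.0)], 0.05):
        print(f"{t:.2f}s {f0:.1f}Hz")
```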

Data Structure in Festival
- Utterance: a blackboard-style data structure (all modules read from and write to a common memory)
- The *Utterance* is both the input and the output of every module in Festival
[Figure: a Module reading from and writing to the Utterance]

What Does an Utterance Consist Of?
*Items* and *Relations*
- Item: an object that stores a string representing a word, a segment, etc.
- Relation: a graph that links the items; for example, "Syllable" is a relation that links together the items storing segment names
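The Item/Relation idea can be sketched as a toy Python model. This is not Festival's implementation, which lives in C++ classes of the Edinburgh Speech Tools such as EST_Utterance, EST_Relation and EST_Item. Here items are stored once, a list relation orders them, and a tree relation links a parent item (a syllable) to its daughter segments. The class and method names and the `stress` feature are hypothetical; the phones are those of "June" from the synthesis-flow example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical item: a name (e.g. a word or segment) plus features."""
    name: str
    features: dict = field(default_factory=dict)


@dataclass
class Utterance:
    """Toy utterance: items are stored once and shared between relations."""
    items: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # relation -> [item ids]
    trees: dict = field(default_factory=dict)      # relation -> {parent: [children]}

    def add_item(self, name, **features):
        self.items.append(Item(name, features))
        return len(self.items) - 1

    def append(self, relation, item_id):
        self.relations.setdefault(relation, []).append(item_id)

    def link(self, relation, parent_id, child_ids):
        self.trees.setdefault(relation, {})[parent_id] = list(child_ids)

    def names(self, relation):
        return [self.items[i].name for i in self.relations.get(relation, [])]

    def daughters(self, relation, parent_id):
        children = self.trees.get(relation, {}).get(parent_id, [])
        return [self.items[i].name for i in children]


if __name__ == "__main__":
    utt = Utterance()
    segments = [utt.add_item(p) for p in ("jh", "uu", "n")]
    for seg in segments:
        utt.append("Segment", seg)
    syl = utt.add_item("syl", stress=1)
    utt.append("Syllable", syl)
    utt.link("Syllable", syl, segments)  # the syllable links its segments
    print(utt.names("Segment"), utt.daughters("Syllable", syl))
```

Because the Segment relation and the Syllable links refer to the same item ids, a feature added to a segment is visible through both relations, which is the point of sharing one utterance between modules.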

What Each Module Does to an Utterance
- Each module accesses the *items* and *relations* in an utterance and generates new features, items and relations in the same utterance
- For example, Token_POS:
  - Input: an utterance with one item, a string representing a sentence
  - Output: an utterance with multiple items, each item representing a token
- The synthesis process in Festival is viewed as applying a set of modules to an utterance
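Viewing synthesis as a sequence of modules applied to one utterance can be sketched in a few lines of Python. Everything here is a hypothetical toy (an utterance as a dict from relation name to item names, and two trivial modules). In Festival itself, the module sequence for each utterance type is declared in Scheme, e.g. with `defUttType` as described in the Festival manual.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy utterance: relation name -> list of item names.
Utterance = Dict[str, List[str]]
Module = Callable[[Utterance], Utterance]


def token_pos(utt: Utterance) -> Utterance:
    """Toy tokenizer: one Text item in, one Token item per token out."""
    utt["Token"] = utt["Text"][0].split()
    return utt


def word(utt: Utterance) -> Utterance:
    """Toy word module: reads the Token relation, writes a Word relation."""
    utt["Word"] = [tok.lower() for tok in utt["Token"]]
    return utt


def synthesize(text: str, modules: List[Module]) -> Tuple[Utterance, List[str]]:
    """Apply each module in turn to the same utterance and log which
    relations it added."""
    utt: Utterance = {"Text": [text]}
    log = []
    for module in modules:
        before = set(utt)
        utt = module(utt)
        added = sorted(set(utt) - before)
        log.append(f"{module.__name__}: {', '.join(added) or '-'}")
    return utt, log


if __name__ == "__main__":
    utt, log = synthesize("Now the TIME is", [token_pos, word])
    print("\n".join(log))
    print(utt["Word"])
```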

Synthesis Flow
[Figure, built up over three slides: modules applied to the Text "June 25". Tokenize creates the Token relation; Token2Word creates the Word relation (June, twenty, fifth), whose items carry POS tags (Noun, Num); further modules build the Syllable and Segment relations (jh uu n | t w e n t ii | f i f th); SynthesizeWave produces the Wave.]
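The "June 25" flow can be traced in a small Python sketch: a Token2Word-style rule expands a day number that follows a month name into ordinal words, and a lexicon lookup turns the words into segments. The rule, the function names and the three-word lexicon are hypothetical (only the phone strings are copied from the slide); Festival's real English token rules and lexicon are far more complete.

```python
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
ORDINALS = ["", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]
TENS = {2: ("twenty", "twentieth"), 3: ("thirty", "thirtieth")}

# Hypothetical toy lexicon using the phone strings shown on the slide.
LEXICON = {"june": ["jh", "uu", "n"],
           "twenty": ["t", "w", "e", "n", "t", "ii"],
           "fifth": ["f", "i", "f", "th"]}


def day_ordinal(n):
    """Spell a day of the month (1-31) as ordinal words."""
    if not 1 <= n <= 31:
        raise ValueError(f"not a day of the month: {n}")
    if n < 20:
        return [ORDINALS[n]]
    tens, unit = divmod(n, 10)
    cardinal, ordinal = TENS[tens]
    return [ordinal] if unit == 0 else [cardinal, ORDINALS[unit]]


def token_to_words(tokens):
    """Toy Token2Word rule: a number right after a month name is a date."""
    words = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else ""
        if tok.isdigit() and prev in MONTHS and 1 <= int(tok) <= 31:
            words.extend(day_ordinal(int(tok)))
        else:
            words.append(tok.lower())
    return words


def words_to_segments(words):
    """Toy lexical lookup: concatenate each word's phones."""
    return [ph for w in words for ph in LEXICON[w]]


if __name__ == "__main__":
    words = token_to_words("June 25".split())
    print(words)
    print(words_to_segments(words))
```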

Installation of Festival & Festvox
- Step 1: Install Speech Tools
- Step 2: Install Festival, then synthesize some English text to check the sound card, the rate of speech, etc.
- Step 3: Install Festvox
Detailed notes are available from the course web site.

Building a Limited Domain Voice
- Unit selection is applied to a limited domain with a restricted vocabulary, giving high-quality speech
- Units are words
  - Implementation in Festival: the units are still phones, but they are restricted to come from a specific word
  - /p/ from "Pennsylvania" is distinguished from /p/ from "Pittsburgh"
  - To synthesize "Pittsburgh", all the phones should come from the word "Pittsburgh" (there may be many examples of the same word)
- Examples: talking clock, weather prediction, rail/air inquiry systems
- http://www.cs.cmu.edu/~awb/papers/ICSLP2000_ldom/index.html
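One way to express the "phones restricted to a specific word" idea is to key each recorded phone by phone and word, so /p/ from "Pittsburgh" and /p/ from "Pennsylvania" never compete. A minimal Python sketch under that assumption; the `phone_word` key format, the function names and the phone strings are hypothetical and are not the actual Festvox clunits naming. A real selector would also prefer consecutive units from the same recorded token, which is omitted here.

```python
from collections import defaultdict


def build_inventory(database):
    """Index every recorded phone by a phone-plus-word key.

    database: list of (utt_id, word, phones) for each recorded word token.
    """
    inventory = defaultdict(list)
    for utt_id, word, phones in database:
        for pos, phone in enumerate(phones):
            inventory[f"{phone}_{word}"].append((utt_id, pos))
    return inventory


def candidates(inventory, phone, word):
    """Only units of this phone that were spoken inside this word."""
    return inventory.get(f"{phone}_{word}", [])


if __name__ == "__main__":
    # Hypothetical recordings and phone strings.
    db = [("utt1", "pittsburgh", ["p", "i", "t", "s", "b", "er", "g"]),
          ("utt2", "pennsylvania", ["p", "e", "n", "s", "i", "l", "v", "ei", "n", "y", "a"]),
          ("utt3", "pittsburgh", ["p", "i", "t", "s", "b", "er", "g"])]
    inv = build_inventory(db)
    print(candidates(inv, "p", "pittsburgh"))  # only the two Pittsburgh /p/s
```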

Limited Domain Setup (http://festvox.org/bsv/bsv-ldom-ch.html)
1. Set the environment:
   $FESTVOXDIR/src/ldom/setup_ldom iiit time pra
   # This gives a talking-clock setup.
   # To change it to another domain, replace "etc/time.data"
   # with the domain-specific training sentences.
   # For non-English languages, these sentences are transliterated into English letters.
2. Generate prompts
   - Synthesize the sentences that *you* are going to speak
   - How can you synthesize them? This needs an existing voice, so it mostly applies to English only
   - Why synthesize at all? To *prompt* you on what to speak (the synthesized prompts are also used for labeling in step 4)
   festival -b festvox/build_ldom.scm '(build_prompts "etc/txt.done.data")'
3. Record prompts
   - For new languages, switch off the *playing of the prompt* by commenting out na_play in bin/prompt_them
   bin/prompt_them etc/txt.done.data
4. Label automatically
   - Uses dynamic programming to label the speech
   - Labeling builds the correspondence between the text and the speech
   bin/make_labs prompt-wav/*.wav
4.1 Manually correct the labeling errors:
   emulabel etc/emu_lab time0001
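Step 4 labels the recordings by dynamic programming. A minimal sketch of that idea, assuming the phone boundaries of the synthesized prompt are known and dynamic time warping (DTW) maps them onto the recording of the same sentence. Real features would be cepstral vectors per frame; scalars stand in for them here, and `dtw_align`/`transfer_labels` are hypothetical names. This illustrates the technique, not the `make_labs` script itself.

```python
def dtw_align(ref, rec, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping between two frame sequences.

    Returns the total cost and the optimal path as (ref_frame, rec_frame) pairs.
    """
    n, m = len(ref), len(rec)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], rec[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def transfer_labels(path, ref_labels):
    """Map (label, last_ref_frame) pairs onto the recording's frames."""
    last_rec = {}
    for r, t in path:
        last_rec[r] = max(t, last_rec.get(r, -1))
    return [(label, last_rec[end]) for label, end in ref_labels]


if __name__ == "__main__":
    prompt = [1, 1, 5, 5, 9]           # synthesized prompt "frames"
    recording = [1, 1, 1, 5, 5, 9, 9]  # the same sentence, spoken slower
    total, path = dtw_align(prompt, recording)
    print(total, transfer_labels(path, [("a", 1), ("b", 3), ("c", 4)]))
```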

Limited Domain Setup (contd.)
5. Generate pitch marks
   bin/make_pm_wave wav/*.wav
6. Correct the pitch marks
   bin/make_pm_fix pm/*.pm
7. Generate mel-cepstral coefficients
   bin/make_mcep wav/*.wav
8. Generate the utterance structures
   festival -b festvox/build_ldom.scm '(build_utts "etc/txt.done.data")'
9. Cluster the units
   festival -b festvox/build_ldom.scm '(build_clunits "etc/txt.done.data")'
10. Test the voice
   festival festvox/iiit_time_pra_ldom.scm '(voice_iiit_time_pra_ldom)'
   To see the units selected:
   (set! utt (SayText "abhii samaya hai...."))
   (clunits::units_selected utt "-")
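Steps 5 to 9 are plain commands run in sequence, so they can be scripted. A sketch assuming it is run from the voice directory created by `setup_ldom`; the command lines are copied from the steps above, while the `STEPS` table and `run_steps` are hypothetical helpers. Each file glob is expanded just before its step runs, because `pm/*.pm` only exists after step 5. With `dry_run=True` it only returns the commands.

```python
import glob
import subprocess

PROMPTS = "etc/txt.done.data"
BUILD = ["festival", "-b", "festvox/build_ldom.scm"]

# Hypothetical step table: (description, fixed argv, file glob or None).
# The command lines themselves are the ones from steps 5 to 9.
STEPS = [
    ("generate pitch marks", ["bin/make_pm_wave"], "wav/*.wav"),
    ("correct pitch marks", ["bin/make_pm_fix"], "pm/*.pm"),
    ("mel cepstral coefficients", ["bin/make_mcep"], "wav/*.wav"),
    ("utterance structures", BUILD + [f'(build_utts "{PROMPTS}")'], None),
    ("cluster units", BUILD + [f'(build_clunits "{PROMPTS}")'], None),
]


def run_steps(steps=STEPS, dry_run=True):
    """Run (or, with dry_run, only list) each build step in order."""
    commands = []
    for description, argv, pattern in steps:
        # Expand the glob only now: pm/*.pm exists only after step 5 has run.
        files = sorted(glob.glob(pattern)) if pattern else []
        cmd = argv + files
        commands.append(cmd)
        if not dry_run:
            print(f"== {description}: {' '.join(argv)}")
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    for cmd in run_steps(dry_run=True):
        print(" ".join(cmd))
```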

References
- http://festvox.org
- 11-752 CMU course slides: http://festvox.org/festtut/
- 11-752 CMU course lecture notes: http://festvox.org/festtut/notes/festtut_toc.html
- Building Synthetic Voices: http://www.festvox.org/bsv/
- The Festival Speech Synthesis System: http://www.festvox.org/docs/manual-1.4.3/festival_toc.html
- Edinburgh Speech Tools Library: http://www.festvox.org/docs/speech_tools-1.2.0/book1.htm
