Published by Willis Austin Fitzgerald. Modified over 9 years ago.
Communicating with Robots using Speech: The Robot Talks (Speech Synthesis)
Stephen Cox (sjc@cmp.uea.ac.uk)
Chris Watkins (cw@cmp.uea.ac.uk)
Ibrahim Almajai (ima@cmp.uea.ac.uk)
July 20th 2005
Aims of Session
• To learn something about text-to-speech synthesis, i.e. making a computer read from a text.
• To get the MBROLA speech synthesis system to produce a synthetic utterance by specifying a sequence of phonemes and their durations.
• To adjust the durations (lengths) of the phonemes to make the speech sound more natural.
• To adjust the pitch of the phonemes to alter the meaning of the sentence.
Phonemes
• The sounds of a language can be described by phonemes.
• A phoneme is the smallest sound unit that makes a difference to a word, e.g. “p” and “b” are different phonemes in English because “pat” and “bat” are different words.
• To describe English, we need about 45 different phonemes.
• Linguists use a special set of symbols for each phoneme, and these symbols together form the International Phonetic Alphabet (IPA).
  [IPA transcription of “recognise speech”: symbols not preserved]
• These symbols can’t be typed at the keyboard, so we replace each IPA symbol with one or more keyboard characters (this is called the SAMPA notation).
  “recognise speech”: r e k @ n aI s p i: tS
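The replacement of IPA symbols by keyboard characters can be sketched as a simple lookup table. The table below is a small, partial one covering only the symbols needed here, and the IPA input list is an assumption reconstructed from the SAMPA string on the slide, not the slide's own IPA rendering. The function name `ipa_to_sampa` is hypothetical.

```python
# Partial IPA -> SAMPA table: only the symbols needed for this example.
IPA_TO_SAMPA = {
    "ə": "@",    # schwa, the unstressed vowel
    "aɪ": "aI",  # the vowel in "eye"
    "iː": "i:",  # the long vowel in "see"
    "ʃ": "S",    # the "sh" sound
    "tʃ": "tS",  # the "ch" sound
}

def ipa_to_sampa(symbols):
    """Convert a list of IPA phoneme symbols to a space-separated SAMPA string."""
    # Symbols that are not in the table are passed through unchanged,
    # which covers plain letters such as "r", "k" and "s" in this example.
    return " ".join(IPA_TO_SAMPA.get(symbol, symbol) for symbol in symbols)

# Assumed IPA sequence, chosen to match the SAMPA string given on the slide.
ipa = ["r", "e", "k", "ə", "n", "aɪ", "s", "p", "iː", "tʃ"]
print(ipa_to_sampa(ipa))  # r e k @ n aI s p i: tS
```

Note that one IPA symbol can map to two keyboard characters (for example `tS`), which is why the phonemes are kept separated by spaces.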
How is Speech Synthesis Done (no 1)?
• We record a large database of speech from a single speaker using high-quality equipment.
• We then label the speech with phoneme symbols.
• Here is an example of a fragment of an utterance:
  [figure not preserved]
How is Speech Synthesis Done (no 2)?
• Now suppose we want to synthesise the phrase “recognise speech”.
• First, we have to convert it to a sequence of phoneme symbols. There are dictionaries that can do this for us:
  “recognise speech”: r e k @ n aI s p i: tS
• We should also specify the duration (length) of each phoneme and the pitch (how high or low).
• The speech synthesiser program then searches through its database to find the best sequence of phonemes.
• It joins the waveform segments of these phonemes together and plays out the resulting waveform.
Notice: the speech sounds unnatural because:
→ the durations of the phonemes are all the same
→ the pitch is the same all the way through
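As a rough illustration of the input that produces this flat-sounding result, the sketch below writes the phoneme sequence in the plain-text layout MBROLA reads: one phoneme per line followed by its duration in milliseconds, with `_` standing for silence. The function name `flat_pho`, the 50 ms default and the 100 ms of padding silence are assumptions made for the example.

```python
def flat_pho(phonemes, duration_ms=50):
    """Build phoneme-file text in which every phoneme has the same duration.

    No pitch values are given, so the pitch stays the same throughout.
    """
    lines = ["_ 100"]  # leading silence (assumed length)
    lines += [f"{phoneme} {duration_ms}" for phoneme in phonemes]
    lines.append("_ 100")  # trailing silence (assumed length)
    return "\n".join(lines) + "\n"

# Phoneme sequence for "recognise speech" as given on the slide.
phonemes = "r e k @ n aI s p i: tS".split()
print(flat_pho(phonemes), end="")
```

Every line carries the same number, which is exactly the first cause of unnaturalness listed above; the second cause is the absence of any pitch values.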
Voice Pitch
• Males have deeper voices than females: we call the highness or lowness of the voice its pitch.
• Voice pitch is very similar to pitch in music.
• Pitch is measured in hertz (Hz), the number of vibrations per second. Typically, a male speaker’s pitch is in the range 80-150 Hz and a female speaker’s is in the range 120-280 Hz.
• Pitch is used in speech to convey meaning, emotion, emphasis, etc.
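To make the link with musical pitch concrete, the sketch below converts a frequency in hertz to the nearest note name. It assumes equal temperament with A4 tuned to 440 Hz, which is a common convention and not something stated in the slides; the function name `nearest_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    """Return the name of the equal-tempered note closest to freq_hz."""
    # Each doubling of frequency is one octave, i.e. 12 semitones.
    # Note number 69 is A4 (440 Hz) in the usual MIDI numbering.
    note_number = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = note_number // 12 - 1
    return f"{NOTE_NAMES[note_number % 12]}{octave}"

print(nearest_note(110.0))  # A2, inside the typical male range
print(nearest_note(220.0))  # A3, inside the typical female range
```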
How Can Pitch Affect the Meaning of What is Said?
Suppose someone said to you: “Is Glasgow the capital of Scotland?”
You might reply: “Edinburgh is the capital of Scotland.”
[Figure: pitch of your voice]
Now suppose they said: “Is Edinburgh the capital of England?”
You might reply: “Edinburgh is the capital of Scotland.”
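One way to think about the two replies is as the same words with a different pitch plan. The sketch below is only an illustration under a stated assumption: that the word correcting the question ("Edinburgh" in the first reply, "Scotland" in the second) is spoken at a higher pitch than the rest. The function name `pitch_plan` and the values 100 Hz and 140 Hz are hypothetical, chosen to fall inside the typical male range.

```python
def pitch_plan(words, emphasised, base_hz=100, peak_hz=140):
    """Pair each word with a target pitch, raising it on the emphasised word."""
    return [(word, peak_hz if word == emphasised else base_hz) for word in words]

reply = "Edinburgh is the capital of Scotland".split()

# Answer to "Is Glasgow the capital of Scotland?": the city is corrected.
correcting_city = pitch_plan(reply, "Edinburgh")

# Answer to "Is Edinburgh the capital of England?": the country is corrected.
correcting_country = pitch_plan(reply, "Scotland")

for word, hz in correcting_city:
    print(f"{word:10s} {hz} Hz")
```

Real speech uses smooth pitch movements within words, so a single value per word is a simplification.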
What You Are Going To Do
• Get familiar with the MBROLA speech synthesis software.
• Try synthesising some single words. To do this, you need to:
  → Figure out what the sequence of phonemes in the word is. We have given you some examples to enable you to do this.
  → Type the sequence into the synthesiser software. Use a duration of (say) 50 milliseconds for each phoneme.
  → You don’t need to put in pitch at this stage.
• Play with the phoneme durations to get a more natural sound for the word.
• Now try synthesising the sentence “Edinburgh is the capital of Scotland”. Start with no pitch information.
• Now add some pitch information to make two sentences. One should answer the question “Is Glasgow the capital of Scotland?” and the other the question “Is Edinburgh the capital of England?”.
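The later steps of the exercise, varying the durations and then adding pitch, can be sketched as text generation too. In MBROLA's phoneme-file layout, a line may carry optional pairs after the duration: a position (as a percentage of the phoneme's length) and a pitch in Hz at that position. The helper name `pho_line` and all durations and pitch values below are assumptions picked for illustration, not recommended settings.

```python
def pho_line(phoneme, duration_ms, pitch_targets=()):
    """Format one phoneme-file line: phoneme, duration, then optional pitch pairs."""
    fields = [phoneme, str(duration_ms)]
    for percent, hz in pitch_targets:
        # Each pair says: at this percentage of the phoneme, aim for this pitch.
        fields += [str(percent), str(hz)]
    return " ".join(fields)

# The word "speech" (s p i: tS), with unequal durations and a falling
# pitch across the vowel. All numbers are illustrative guesses.
segments = [
    ("_", 100, []),
    ("s", 90, []),
    ("p", 60, []),
    ("i:", 140, [(0, 120), (100, 100)]),
    ("tS", 110, []),
    ("_", 100, []),
]

pho_text = "\n".join(pho_line(*segment) for segment in segments) + "\n"
print(pho_text, end="")
```

Starting from equal 50 ms durations and changing one number at a time makes it easier to hear what each adjustment does.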