( Text to Speech & Voice Recognition )


1 ( Text to Speech & Voice Recognition )
Course: Multimedia Applications for Translation II (Aplikasi Multimedia untuk Penerjemahan II). Topic: Language Technology ( Text to Speech & Voice Recognition ). Iwan Sonjaya, MT. Slides: Arry Akhmad Arman, Institut Teknologi Bandung

2 How small can you go? Still convenient?

3 What is “Language Technology”?

4 Components of Language Technology
Text to Speech, Speech Recognition, NLP: Language Translator

5 What is “Text to Speech”? Text → Text to Speech → Speech

6 Indonesian Text to Speech System
Text → Text-to-Phoneme Converter → Phonemes → Phoneme-to-Speech Converter → Speech
Supporting components: Intonation Model, Diphone Database

7 Text-to-Speech Conversion
Input: Bapak membeli 5 kerang seharga Rp 200,- ("Father buys 5 clams for Rp 200")
Text-to-Phoneme: Text Normalization → Exception Dictionary Lookup → Letter-to-Phoneme Conversion
Phoneme-to-Speech: Prosody Generation → Speech Parameter Generation → Speech Waveform Production

8 Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- ("I buy 5 clams for Rp 200")
Text Normalization output: saya membeli lima kerang seharga dua ratus rupiah (numbers and currency are spelled out: "5" → "lima", "Rp 200,-" → "dua ratus rupiah")
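A minimal Python sketch of the Text Normalization step, assuming integers below one million and the "Rp <digits>,-" currency pattern used in the example. The names `number_to_words` and `normalize` are hypothetical; a real normalizer would also need dates, abbreviations, decimals, and thousands separators such as "Rp 1.000".

```python
import re

ONES = ["nol", "satu", "dua", "tiga", "empat", "lima",
        "enam", "tujuh", "delapan", "sembilan"]


def number_to_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1_000_000 in Indonesian."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    if n < 10:
        return ONES[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return ONES[n - 10] + " belas"  # 12..19: "<digit> belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        head = ONES[tens] + " puluh"
        return head if rest == 0 else f"{head} {ONES[rest]}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else ONES[hundreds] + " ratus"
        return head if rest == 0 else f"{head} {number_to_words(rest)}"
    thousands, rest = divmod(n, 1000)
    head = "seribu" if thousands == 1 else number_to_words(thousands) + " ribu"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Expand currency amounts and bare integers into words, then lowercase."""
    # "Rp 200,-" -> "dua ratus rupiah" (the ",-" suffix is optional)
    text = re.sub(r"Rp\s*(\d+)(?:,-)?",
                  lambda m: number_to_words(int(m.group(1))) + " rupiah",
                  text)
    # Any remaining integer, e.g. "5" -> "lima"
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group(0))), text)
    return text.lower()


print(normalize("Saya membeli 5 kerang seharga Rp 200,-"))
# saya membeli lima kerang seharga dua ratus rupiah
```

Currency is expanded before bare numbers so that the digits inside "Rp 200,-" are not consumed by the general integer rule first.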

9 Text-to-Speech Conversion
Letter-to-Phoneme Conversion rules, written as left context | letter | right context => phoneme (* = any context, ~ = "not"):
*|s|* => |s|
*|a|* => |a|
*|n|~g => |n|
*|n|g => |ñ|
n|g|* => |blank|
~n|g|* => |g|
Together, the last four rules turn the letter pair "ng" (as in kerang) into the single phoneme |ñ|.
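The context rules above can be sketched as an ordered rule list in which the first matching rule wins. This is a hypothetical Python encoding, not the original system's: a word boundary is treated as an empty neighbour, and any letter without a rule falls back to an assumed default `*|x|* => |x|`.

```python
# Each rule: (left context, letter, right context, output phoneme).
# "*" = any context, "~x" = anything except x, "" output = |blank|.
RULES = [
    ("*", "s", "*", "s"),
    ("*", "a", "*", "a"),
    ("*", "n", "~g", "n"),   # n not followed by g
    ("*", "n", "g", "ñ"),    # n followed by g -> |ñ|
    ("n", "g", "*", ""),     # g after n is silent
    ("~n", "g", "*", "g"),   # g not after n
]


def context_ok(pattern: str, neighbour: str) -> bool:
    """Check one context pattern against a neighbouring letter ('' at a word edge)."""
    if pattern == "*":
        return True
    if pattern.startswith("~"):
        return neighbour != pattern[1:]
    return neighbour == pattern


def letters_to_phonemes(word: str) -> list[str]:
    phonemes = []
    for i, letter in enumerate(word):
        left = word[i - 1] if i > 0 else ""
        right = word[i + 1] if i + 1 < len(word) else ""
        for l_ctx, target, r_ctx, out in RULES:
            if target == letter and context_ok(l_ctx, left) and context_ok(r_ctx, right):
                if out:  # an empty output means the letter is silent
                    phonemes.append(out)
                break
        else:
            # Hypothetical default for letters with no rule: *|x|* => |x|
            phonemes.append(letter)
    return phonemes


print(letters_to_phonemes("kerang"))  # ['k', 'e', 'r', 'a', 'ñ']
```

Rule order matters: the more specific `n` rules are tried before any default, and the `for ... else` branch only runs when no rule matched.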

10 Text-to-Speech Conversion
Exception Dictionary Lookup (words whose pronunciation the letter rules would get wrong):
teknik => /t//E//k/ /n//i//k/
IT => /a//i//t//i/
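A small Python sketch of the exception-dictionary step, assuming the lookup runs before the letter rules and that the rule engine is passed in as a function. The names `EXCEPTIONS`, `word_to_phonemes`, and `naive_rules` are hypothetical, `naive_rules` is only a placeholder for a real rule engine, and the sketch ignores the spacing the slide shows between /k/ and /n/.

```python
from typing import Callable

# Exception dictionary entries from the slide.
EXCEPTIONS: dict[str, list[str]] = {
    "teknik": ["t", "E", "k", "n", "i", "k"],
    "IT": ["a", "i", "t", "i"],  # acronym read letter by letter
}


def word_to_phonemes(word: str, rules: Callable[[str], list[str]]) -> list[str]:
    """Look the word up in the exception dictionary first; otherwise apply the letter rules."""
    # Exact match first so that acronyms such as "IT" survive,
    # then a lowercase match for ordinary words such as "Teknik".
    for key in (word, word.lower()):
        if key in EXCEPTIONS:
            return list(EXCEPTIONS[key])
    return rules(word.lower())


def naive_rules(word: str) -> list[str]:
    """Hypothetical placeholder rule engine: one phoneme per letter."""
    return list(word)


print(word_to_phonemes("IT", naive_rules))      # ['a', 'i', 't', 'i']
print(word_to_phonemes("Teknik", naive_rules))  # ['t', 'E', 'k', 'n', 'i', 'k']
print(word_to_phonemes("saya", naive_rules))    # ['s', 'a', 'y', 'a']
```

Checking the exact spelling before lowercasing matters here: if the text were lowercased first, "IT" would become the ordinary string "it" and miss its dictionary entry.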

11 Text-to-Speech Conversion
Resulting phoneme sequence (|_| marks a pause): |_||s||a||y||a| … |_||k||e||r||a||ñ| |_|…
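The pause-marked sequence can be assembled from per-word phoneme lists, as in this small Python sketch. `assemble` and `render` are hypothetical names, and placing a pause before every word and at the end is an assumption read off the example, not a documented rule of the original system.

```python
def assemble(words: list[list[str]]) -> list[str]:
    """Join per-word phoneme lists, with a pause marker "_" before each word and at the end."""
    sequence: list[str] = []
    for phonemes in words:
        sequence.append("_")
        sequence.extend(phonemes)
    sequence.append("_")
    return sequence


def render(sequence: list[str]) -> str:
    """Format a phoneme sequence in the slide's |x| notation."""
    return "".join(f"|{p}|" for p in sequence)


print(render(assemble([["s", "a", "y", "a"], ["k", "e", "r", "a", "ñ"]])))
# |_||s||a||y||a||_||k||e||r||a||ñ||_|
```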

12 Text-to-Speech Conversion
Prosody Generation (phoneme, duration, pitch):
|_|  100 ms
|s|  60 ms, 97 Hz
|a|  85 ms, 100 Hz
…
|r|  55 ms, 110 Hz
|a|  90 ms, 114 Hz
|ñ|  87 ms, 117 Hz
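A minimal Python sketch of prosody generation, assuming two simplifications: durations come from a small lookup table, and pitch rises linearly across the non-pause phonemes between a start and an end frequency. The table values and the 97 Hz to 117 Hz range are hypothetical, loosely modelled on the numbers above; a real intonation model would shape the contour by phrase and sentence type.

```python
from dataclasses import dataclass
from typing import Optional

VOWELS = set("aiueoE")
# Hypothetical per-phoneme durations in ms, loosely modelled on the slide's numbers.
DURATION_MS = {"_": 100, "s": 60, "r": 55}
DEFAULT_VOWEL_MS, DEFAULT_CONSONANT_MS = 85, 70


@dataclass
class ProsodyUnit:
    phoneme: str
    duration_ms: int
    f0_hz: Optional[float]  # None for pauses, which have no pitch


def generate_prosody(sequence, f0_start=97.0, f0_end=117.0):
    """Assign a duration to every phoneme and a linearly interpolated pitch to non-pause phonemes."""
    n_sounding = sum(1 for p in sequence if p != "_")
    units, k = [], 0
    for p in sequence:
        if p == "_":
            units.append(ProsodyUnit(p, DURATION_MS["_"], None))
            continue
        default = DEFAULT_VOWEL_MS if p in VOWELS else DEFAULT_CONSONANT_MS
        frac = k / (n_sounding - 1) if n_sounding > 1 else 0.0
        f0 = f0_start + frac * (f0_end - f0_start)
        units.append(ProsodyUnit(p, DURATION_MS.get(p, default), round(f0, 1)))
        k += 1
    return units


for unit in generate_prosody(["_", "s", "a", "y", "a", "_"]):
    print(unit)
```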


14 Speech Generation Techniques
• Formant Synthesizer (frequency parameters are specified for each phoneme)
• Concatenation (recordings joined together)
  – Word concatenation (limited)
  – Diphone Concatenation (the technique currently used for Indonesian TTS)
  – Unit Selection (today's most up-to-date TTS)
• Articulatory Model (physical parameters of the human speech organs are specified for each phoneme)

15 [Speech Generation Techniques]
Formant Synthesizer

16 Formant Synthesizer [Speech Generation Techniques]
Formants of /a/: F1 = 180, F2 = 272, F3 = 390
Formant Synthesizer: /a/ module
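A minimal source-filter sketch of a formant synthesizer in Python: an impulse train (standing in for glottal pulses) is passed through one second-order resonator per formant, connected in cascade. Because the slide does not give units for its F1 to F3 values, the sketch takes formant frequencies in Hz as a parameter and uses approximate /a/ values often cited for adult male speakers; the bandwidths, pitch, and sample rate are hypothetical choices.

```python
import math


def resonator(signal, freq_hz, bw_hz, fs):
    """Second-order digital resonator (Klatt-style): y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # unity gain at 0 Hz
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants_hz, bandwidths_hz, f0_hz=100.0, duration_s=0.3, fs=16000):
    """Impulse-train source filtered by cascaded formant resonators, normalised to peak 1.0."""
    n = round(duration_s * fs)
    period = int(fs / f0_hz)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # glottal pulses
    for f, bw in zip(formants_hz, bandwidths_hz):
        signal = resonator(signal, f, bw, fs)  # one resonator per formant
    peak = max(abs(v) for v in signal) or 1.0
    return [v / peak for v in signal]


# Approximate /a/ formants in Hz (adult male); bandwidths are hypothetical.
samples = synthesize_vowel([730, 1090, 2440], [90, 110, 170])
print(len(samples))  # 4800
```

The samples are floats in [-1, 1]; to listen to them they can be scaled to 16-bit integers and written with Python's standard `wave` module.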

17 Diphone Concatenation [Speech Generation Techniques]
Input phonemes /s//a//y//a/ → Diphone Sequencer → _/s, s/a, a/y, y/a, a/_ → Diphone Concatenation Engine
Diphone database: _|s = wav11, s|a = wav23, a|y = wav54, y|a = wav167, a|_ = wav365
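The two boxes on this slide can be sketched in Python as follows. The diphone sequencer pads the phonemes with silence and pairs each phoneme with its successor; the concatenation engine looks each diphone up in the database and appends its samples. The `load` callback, which turns a waveform ID such as "wav23" into samples, is hypothetical, and this plain end-to-end join omits the smoothing and pitch adjustment a production engine would apply at the joins.

```python
from typing import Callable, Dict, List

# Diphone database from the slide: diphone -> recorded waveform ID.
DIPHONE_DB: Dict[str, str] = {
    "_|s": "wav11", "s|a": "wav23", "a|y": "wav54",
    "y|a": "wav167", "a|_": "wav365",
}


def diphone_sequence(phonemes: List[str]) -> List[str]:
    """Diphone sequencer: pad with silence and pair each phoneme with its successor."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}|{b}" for a, b in zip(padded, padded[1:])]


def concatenate(phonemes: List[str], load: Callable[[str], List[float]]) -> List[float]:
    """Concatenation engine: fetch each diphone's samples and append them end to end."""
    samples: List[float] = []
    for diphone in diphone_sequence(phonemes):
        if diphone not in DIPHONE_DB:
            raise KeyError(f"diphone {diphone!r} is missing from the database")
        samples.extend(load(DIPHONE_DB[diphone]))
    return samples


print(diphone_sequence(["s", "a", "y", "a"]))
# ['_|s', 's|a', 'a|y', 'y|a', 'a|_']
```

Diphones run from the middle of one phoneme to the middle of the next, so the joins fall in the relatively stable centre of each sound rather than at the transitions.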

20 SPEECH RECOGNITION

21 Speech recognition is a process by which a computer takes a speech signal (recorded using a microphone) and converts it into words in real time. It is achieved through a sequence of processing steps, and the software that performs them is known as a ‘Speech Recognition System’ (SR system). SR systems are usually implemented as dictation software and intelligent assistants on personal computers, smartphones, web browsers, and many other devices.

22 What is “Speech Recognition”?
Speech → Speech Recognition → Text

23 Speech Recognition System

24 CHALLENGES IN THE DESIGN OF AN SR SYSTEM
SR systems have to deal with a large number of challenges, such as:
• The speaker's voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
• A speaker may use any of a large number of different words, and all of them have to be recognized accurately.
• Accents vary from person to person, which is a very big challenge.
• A speaker may speak very quickly, and every spoken word still has to be recognized individually and accurately.

25 TYPES OF SR SYSTEMS
• Speaker Dependent SR systems: work by learning the unique characteristics of a single person's voice, so they depend on that speaker for training.
• Speaker Independent SR systems: designed to recognize anyone's voice, so no speaker-specific training is required.

26 BASIC PRINCIPLES OF SPEECH RECOGNITION
The smallest unit of spoken language is known as a phoneme. English contains approximately 44 phonemes, covering all the vowels and consonants we use in speech. For example, a typical word such as moon can be broken down into three phonemes: /m/, /uː/, /n/.

27 To interpret speech, we need a way to identify the components of spoken words, and phonemes act as identifying markers within speech. An algorithm is then needed to interpret the speech further; the Hidden Markov Model (HMM) is a commonly used mathematical model for this. To build a speech recognition engine, a large database of models is created, one for each phoneme. During recognition, the spoken input is compared against these stored models, the most likely match is determined, and further computation assembles the matched phonemes into words.
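The "most likely match" step can be illustrated with Viterbi decoding of a toy HMM in Python. This is a deliberately simplified sketch: each phoneme of "moon" is a single state in a left-to-right model, the observations "A", "B", "C" are hypothetical stand-ins for quantised acoustic frames, and all probabilities are made up. Real recognizers use continuous acoustic features, several states per phoneme, and trained emission models.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[t][s]: best log-probability of any path ending in state s at time t
    scores = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
               for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        backptr.append({})
        for s in states:
            prev, best = max(
                ((r, scores[t - 1][r] + log(trans_p[r].get(s, 0))) for r in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + log(emit_p[s].get(observations[t], 0))
            backptr[t][s] = prev
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1], scores[-1][last]


# Toy left-to-right model for "moon"; "u" stands for its vowel. All numbers are hypothetical.
states = ["m", "u", "n"]
start_p = {"m": 1.0}
trans_p = {"m": {"m": 0.5, "u": 0.5}, "u": {"u": 0.6, "n": 0.4}, "n": {"n": 1.0}}
emit_p = {
    "m": {"A": 0.8, "B": 0.1, "C": 0.1},
    "u": {"A": 0.1, "B": 0.8, "C": 0.1},
    "n": {"A": 0.1, "B": 0.1, "C": 0.8},
}
path, log_prob = viterbi(["A", "A", "B", "B", "B", "C"], states, start_p, trans_p, emit_p)
print(path)  # ['m', 'm', 'u', 'u', 'u', 'n']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long utterance.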

28 Popular Voice/Speech Recognition Software
In researching this topic, Dragon NaturallySpeaking appears to be the most popular software. It even has an app for the iPhone! It has a 99% accuracy level, which is the best available. The drawbacks of this software are that it is expensive (about $200) and uses a lot of computer memory.

29 Benefits of Voice/Speech Recognition Software
Voice recognition software helps children with physical and mental disabilities keep pace with their peers and puts them on a more equal footing. They can access the same information as other students even if they have trouble reading, and they can communicate their ideas even if they have trouble writing or typing. It also saves them time: many students with these disabilities would take much longer to read and write without this software, and their results would be less accurate.

30 Weaknesses of Voice Recognition Software
Although voice recognition technology has come a long way, it still has some flaws. For example, even though you can talk fairly conversationally and still get high accuracy, 100% accuracy is never guaranteed, especially if you have a thick accent. You also need to dictate in a quiet room, where background noise doesn't interfere with recognition of what you are saying. In addition, these programs take up a significant amount of storage space, since they need an extensive vocabulary; depending on your computer, this can strain its resources. The software can also have difficulty with homophones: when you say “there,” it could interpret it as “they're” or “their.”

31 The Future of Voice/Speech Recognition Software
Scientists are currently working on a kind of universal voice recognition translator, which would let people speaking any language have what they say translated into any other language, in both speech and text. Further in the future, it may also become possible for computers not only to recognize what you are saying but also to understand it and communicate back with you. (crazy!)

33 Thank you… to the students who stayed awake and kept concentrating through the lecture. See you next week… (in the same course, with the same lecturer)
