( Text to Speech & Voice Recognition )

Presentation transcript:

Course: Aplikasi Multimedia untuk Penerjemahan II (Multimedia Applications for Translation II). Language Technology ( Text to Speech & Voice Recognition ). Iwan Sonjaya, MT. Slides: Arry Akhmad Arman, Institut Teknologi Bandung

How small can you go? Still convenient?

What is “Language Technology”?

Components of Language Technology: Text to Speech, Speech Recognition, NLP (Language Translator)

What is “Text to Speech”? Text → Text to Speech → Speech

Indonesian Text to Speech System: Text → Text to Phoneme Converter → Phonemes → Phoneme to Speech Converter → Speech. Supporting components: Intonation Model, Diphone Database.

Text-to-Speech Conversion
Input: Bapak membeli 5 kerang seharga Rp 200,- (“Father bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization
• Exception Dictionary Lookup
• Letter-to-Phoneme Conversion
Phoneme-to-Speech stage:
• Prosody Generation
• Speech Parameter Generation
• Speech Waveform Production

Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- (“I bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization: saya membeli lima kerang seharga dua ratus rupiah
• Exception Dictionary Lookup
• Letter-to-Phoneme Conversion
Phoneme-to-Speech stage:
• Prosody Generation
• Speech Parameter Generation
• Speech Waveform Production
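
The normalization step on this slide can be sketched in a few lines of Python. This is only an illustration, not the system's actual implementation: the function names `number_to_words` and `normalize` are made up here, the sketch only spells out numbers from 0 to 999, and it only knows the `Rp ...,-` currency pattern that appears in the example sentence.

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]

def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in Indonesian (sketch only)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        words = UNITS[tens] + " puluh"
        return words if rest == 0 else words + " " + UNITS[rest]
    hundreds, rest = divmod(n, 100)
    words = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
    return words if rest == 0 else words + " " + number_to_words(rest)

def normalize(text: str) -> str:
    """Expand currency amounts and bare numbers, then lower-case the text."""
    # "Rp 200,-" becomes "dua ratus rupiah"
    text = re.sub(r"Rp\s*(\d+)(?:,-)?",
                  lambda m: number_to_words(int(m.group(1))) + " rupiah",
                  text)
    # Any remaining bare number, e.g. "5" becomes "lima"
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group(0))), text)
    return text.lower()

print(normalize("Saya membeli 5 kerang seharga Rp 200,-"))
```

The currency pattern is expanded before bare numbers so that the amount and the word "rupiah" end up in spoken order.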

Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- (“I bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization: saya membeli lima kerang seharga dua ratus rupiah
• Exception Dictionary Lookup
• Letter-to-Phoneme Conversion, using rules such as:
  – *|s|* => |s|
  – *|a| => |a|
  – *|n|~g => |n|
  – *|n|g => |ñ|
  – n|g|* => |blank|
  – ~n|g|* => |g|
Phoneme-to-Speech stage:
• Prosody Generation
• Speech Parameter Generation
• Speech Waveform Production
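
The letter-to-phoneme rules on this slide look like context-sensitive rewrite rules. The sketch below assumes a reading of the notation that the slide does not spell out: each rule is `left context | letter | right context`, `*` matches any neighbour, and `~g` means "any neighbour except g". The names `RULES`, `context_matches` and `letters_to_phonemes` are invented for illustration, and letters without a rule are simply mapped to themselves.

```python
# Each rule: (left context, letter, right context, phoneme).
# "*" matches anything; "~x" means "anything except x".
RULES = [
    ("*", "n", "g", "ñ"),    # n before g      -> |ñ|
    ("*", "n", "~g", "n"),   # n anywhere else -> |n|
    ("n", "g", "*", ""),     # g after n       -> blank (already covered by |ñ|)
    ("~n", "g", "*", "g"),   # g anywhere else -> |g|
]

def context_matches(pattern: str, neighbour: str) -> bool:
    """Check one side of a rule against the neighbouring letter."""
    if pattern == "*":
        return True
    if pattern.startswith("~"):
        return neighbour != pattern[1:]
    return neighbour == pattern

def letters_to_phonemes(word: str) -> list[str]:
    """Apply the first matching rule to each letter of the word."""
    phonemes = []
    for i, letter in enumerate(word):
        # "_" stands for the word boundary, as on the slides.
        left = word[i - 1] if i > 0 else "_"
        right = word[i + 1] if i + 1 < len(word) else "_"
        for l_ctx, target, r_ctx, phoneme in RULES:
            if (letter == target
                    and context_matches(l_ctx, left)
                    and context_matches(r_ctx, right)):
                if phoneme:            # a blank phoneme emits nothing
                    phonemes.append(phoneme)
                break
        else:
            phonemes.append(letter)    # default: the letter maps to itself
    return phonemes

print(letters_to_phonemes("kerang"))
```

For "kerang" the `n` is followed by `g`, so it becomes |ñ| and the `g` itself is dropped, which matches the phoneme string shown later in the deck.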

Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- (“I bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization: saya membeli lima kerang seharga dua ratus rupiah
• Exception Dictionary Lookup, with entries such as:
  – teknik => /t//E//k/ /n//i//k/
  – IT => /a//i//t//i/
• Letter-to-Phoneme Conversion, using rules such as:
  – *|s|* => |s|
  – *|a| => |a|
  – *|n|~g => |n|
  – *|n|g => |ñ|
  – n|g|* => |blank|
  – ~n|g|* => |g|
Phoneme-to-Speech stage:
• Prosody Generation
• Speech Parameter Generation
• Speech Waveform Production
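
A minimal sketch of how an exception dictionary might sit in front of the letter-to-phoneme rules. The two entries come from the slide; everything else is assumed for illustration, including the names `EXCEPTIONS`, `naive_letters` and `to_phonemes`, and the placeholder fallback that just splits a word into letters where a real system would apply its rule set.

```python
# Words whose pronunciation the regular rules would get wrong.
# Both entries are taken from the slide.
EXCEPTIONS = {
    "teknik": ["t", "E", "k", "n", "i", "k"],
    "it": ["a", "i", "t", "i"],
}

def naive_letters(word: str) -> list[str]:
    """Placeholder for the rule-based converter: one phoneme per letter."""
    return list(word.lower())

def to_phonemes(word: str, fallback=naive_letters) -> list[str]:
    """Look the word up in the exception dictionary first, then fall back."""
    key = word.lower()
    if key in EXCEPTIONS:
        return list(EXCEPTIONS[key])   # copy so callers cannot edit the table
    return fallback(word)

print(to_phonemes("IT"))      # pronounced as letter names
print(to_phonemes("saya"))    # regular word, handled by the fallback
```

Checking the dictionary first keeps the rule set small: irregular words and abbreviations are listed once instead of forcing special cases into the rules.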

Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- (“I bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization: saya membeli lima kerang seharga dua ratus rupiah
• Exception Dictionary Lookup, with entries such as:
  – teknik => /t//E//k/ /n//i//k/
• Letter-to-Phoneme Conversion, using rules such as:
  – *|s|* => |s|
  – *|a| => |a|
  – *|n|~g => |n|
  – *|n|g => |ñ|
  – n|g|* => |blank|
  – ~n|g|* => |g|
  Resulting phoneme string: |_||s||a||y||a| … |_||k||e||r||a||ñ| |_|…
Phoneme-to-Speech stage:
• Prosody Generation
• Speech Parameter Generation
• Speech Waveform Production

Text-to-Speech Conversion
Input: Saya membeli 5 kerang seharga Rp 200,- (“I bought 5 clams for Rp 200”)
Text-to-Phoneme stage:
• Text Normalization: saya membeli lima kerang seharga dua ratus rupiah
• Exception Dictionary Lookup, with entries such as:
  – teknik => /t//E//k/ /n//i//k/
• Letter-to-Phoneme Conversion, using rules such as:
  – *|s|* => |s|
  – *|a| => |a|
  – *|n|~g => |n|
  – *|n|g => |ñ|
  – n|g|* => |blank|
  – ~n|g|* => |g|
  Resulting phoneme string: |_||s||a||y||a| … |_||k||e||r||a||ñ| |_|…
Phoneme-to-Speech stage:
• Prosody Generation, which assigns a duration and a pitch to each phoneme:
  – |_|, 100 ms
  – |s|, 60 ms, 97 Hz
  – |a|, 85 ms, 100 Hz
  – …
  – |r|, 55 ms, 110 Hz
  – |a|, 90 ms, 114 Hz
  – |ñ|, 87 ms, 117 Hz
  – …
• Speech Parameter Generation
• Speech Waveform Production

Speech Generation Techniques
• Formant Synthesizer (sets frequency parameters for each phoneme)
• Concatenation (recorded words joined together)
  – Word concatenation (limited)
  – Diphone concatenation (the technique currently used for Indonesian TTS)
  – Unit selection (today’s most up-to-date TTS)
• Articulatory Model (sets physical parameters of the human speech organs for each phoneme)

[Speech Generation Techniques] Formant Synthesizer

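[Speech Generation Techniques] Formant Synthesizer: the phoneme /a/ goes to the formant /a/ module, which supplies the formant parameters F1, F2 and F3 to the formant synthesizer. Values shown on the slide (F1 F2 F3): 180 272 390; 171 293 377; 180 272 390.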

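[Speech Generation Techniques] Diphone Concatenation: the phoneme string /s//a//y//a/ goes to the Diphone Sequencer, which produces the diphone sequence _/s, s/a, a/y, y/a, a/_. The Diphone Concatenation Engine then looks up the recording for each diphone (_|s = wav11, s|a = wav23, a|y = wav54, y|a = wav167, a|_ = wav365) and joins them.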
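
The sequencer and lookup on this slide can be sketched as follows. The diphone-to-recording table is copied from the slide; the function names `diphone_sequence` and `lookup_units` are invented for illustration. The sketch stops at listing which recordings to join, whereas a real engine would also load the audio and smooth the joins.

```python
# Diphone database from the slide: (first phoneme, second phoneme) -> recording.
# "_" stands for silence at the start or end of the utterance.
DIPHONE_DB = {
    ("_", "s"): "wav11",
    ("s", "a"): "wav23",
    ("a", "y"): "wav54",
    ("y", "a"): "wav167",
    ("a", "_"): "wav365",
}

def diphone_sequence(phonemes: list[str]) -> list[tuple[str, str]]:
    """Pad with silence and pair each phoneme with its successor."""
    padded = ["_"] + list(phonemes) + ["_"]
    return list(zip(padded, padded[1:]))

def lookup_units(phonemes: list[str]) -> list[str]:
    """Return the recordings to concatenate, in order."""
    return [DIPHONE_DB[pair] for pair in diphone_sequence(phonemes)]

print(diphone_sequence(["s", "a", "y", "a"]))
print(lookup_units(["s", "a", "y", "a"]))
```

Because each unit runs from the middle of one phoneme to the middle of the next, the joins fall in the relatively stable part of each sound, which is the usual argument for diphones over whole phonemes.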

SPEECH RECOGNITION

Speech recognition is a process by which a computer takes a speech signal (recorded using a microphone) and converts it into words in real time. It is achieved by following certain steps, and the software responsible for it is known as a ‘Speech Recognition System’. SR systems are usually implemented in the form of dictation software and intelligent assistants in personal computers, smartphones, web browsers and many other devices.

What is “Speech Recognition”? Speech → Speech Recognition → Text

Speech Recognition System

CHALLENGES IN THE DESIGN OF AN SR SYSTEM
SR systems have to deal with a large number of challenges:
• The speaker’s voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
• A speaker may use many different words, and all of them have to be recognized accurately.
• Accents vary from person to person, which is a major challenge.
• A speaker may talk very quickly, and each spoken word still has to be recognized individually and accurately.

TYPES OF SR SYSTEMS
• Speaker-dependent SR systems: learn the unique characteristics of a single person’s voice, so they must be trained by that speaker.
• Speaker-independent SR systems: designed to recognize anyone’s voice, so no per-speaker training is required.

BASIC PRINCIPLES OF SPEECH RECOGNITION The smallest unit of sound in a spoken language is known as a phoneme. English contains approximately 44 phonemes, covering all the vowel and consonant sounds used in speech. A typical word such as moon can be broken down into three phonemes: m, ue, n (/m/, /uː/, /n/ in IPA).

To interpret speech we must have a way of identifying the components of spoken words; phonemes act as identifying markers within speech. An algorithm is then needed to interpret the speech further, and the Hidden Markov Model (HMM) is a mathematical model commonly used for this. To create a speech recognition engine, a large database of models is built so that each phoneme has a matching model. When a comparison is performed, the most likely match between the spoken phoneme and the stored models is determined, and further computations are performed.
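
As a rough illustration of "finding the most likely match", the sketch below scores an observation sequence against two toy discrete HMMs with the forward algorithm and picks the model with the higher likelihood. Everything in it is invented for the example: the phoneme labels, the two-state left-to-right topology, the `"lo"`/`"hi"` observation symbols and all probabilities. Real recognizers work on continuous acoustic features and far larger models.

```python
# Toy phoneme models: 2 states, left-to-right, discrete observations.
# All numbers are made up for illustration.
TRANS = [[0.6, 0.4],
         [0.0, 1.0]]
MODELS = {
    "m": {"start": [1.0, 0.0], "trans": TRANS,
          "emit": [{"lo": 0.9, "hi": 0.1}, {"lo": 0.8, "hi": 0.2}]},
    "s": {"start": [1.0, 0.0], "trans": TRANS,
          "emit": [{"lo": 0.1, "hi": 0.9}, {"lo": 0.2, "hi": 0.8}]},
}

def forward_likelihood(model: dict, observations: list[str]) -> float:
    """P(observations | model), computed with the forward algorithm."""
    if not observations:
        raise ValueError("need at least one observation")
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)

def recognize(observations: list[str]) -> str:
    """Return the label of the stored model that best explains the input."""
    return max(MODELS, key=lambda name: forward_likelihood(MODELS[name], observations))

print(recognize(["lo", "lo", "lo"]))
print(recognize(["hi", "hi", "lo"]))
```

The forward algorithm sums over every possible state path, so the score reflects how well the whole model explains the input instead of relying on a single best alignment.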

Popular Voice/Speech Recognition Software Research on this topic suggests that Dragon NaturallySpeaking is the most popular software in use. There is even an app for the iPhone! It is said to reach 99% accuracy, which is the best out there. The drawbacks of this software are that it is expensive (about $200) and that it uses a lot of computer memory.

Benefits of Voice/Speech Recognition Software Voice recognition software helps children with physical and mental disabilities stay on par with their peers, and puts them on a more equal level. They are able to get the same information as other students, even if they have trouble reading, and they are able to communicate their ideas, even if they have trouble writing/typing. It saves them time as well, as many students with these disabilities would take much longer to read and write without this software, and not get as accurate results.

Weaknesses of Voice Recognition Software Although voice recognition technology has come a long way, it still has some flaws. For example, even though you can talk fairly conversationally and still get high accuracy, reaching 100% accuracy remains a problem, especially if you have a thick accent. You also need to do speech-to-text in a quiet room, where background noise doesn't interfere with the recognition of what you are saying. In addition, these programs take up a significant amount of storage space, since they need an extensive vocabulary. Depending on your computer, this can hurt its performance. This software can also have difficulty with homophones, so when you say “there,” it could interpret it as “they're” or “their” as well.

The Future of Voice/Speech Recognition Software Scientists are currently working on a universal voice recognition translator of sorts, where people of any language can speak, and what they say can be translated into any language, in both speech and text formats. Though far in the future, it may also be possible for computers to not only recognize what you are saying, but understand what you are saying and communicate back with you as well. (crazy!)

Thank you to all the students who stayed awake and kept their concentration throughout the lecture. See you next week (same course, same lecturer).