HCI: Speech/Speaker Recognition System
Dr. Bharti W. Gawali, Associate Professor, Department of Computer Science & Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad. Email id: bharti_rokade@yahoo.co.in
Dr.Bharti Gawali 06/03/2012
This tutorial will focus on:
- Introduction to speech processing
- Salient features of speech recognition systems
- Feature extraction methods
- Speaker recognition systems
- Some handouts
Introduction
The fundamental purpose of speech is communication, i.e., the transmission of messages. The fundamental analog form of a spoken message is an acoustic waveform, which we call the speech signal.
Production of speech
When we speak, air passes from the lungs through the mouth and nasal cavity, and the tongue and lips restrict and shape this air stream. The contraction and expansion of the lungs drives the air flow, which produces an acoustic wave, a sound. The sounds formed, the vowels and consonants, are usually called phones. Phones are combined into words.
Figure: A block diagram of human speech production
SPEECH CHAIN
The speech chain is the complete process of producing and perceiving speech: from the formulation of a message in the brain of a talker, to the creation of the speech signal, and finally to the understanding of the message by a listener. In short, the chain runs from message, to speech signal, to understanding.
Speech Chain
Message formulation → Language code → Neuromuscular controls → Vocal tract system → Generation of acoustic wave → Transmission channel → Neural transduction (feature extraction) → Language translation → Message understanding
Layers for describing speech: Acoustics, Phonetics, Phonology, Morphology, Syntax, Semantics
Events of Speech
Figure: Speech signal with silence
Digital Representation of Speech
Analog-to-digital conversion (digitization) has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. To measure a wave accurately, at least two samples are needed in each cycle: one measuring the positive part of the wave and one measuring the negative part. The sampling rate must therefore be at least twice the highest frequency present in the signal. Quantization then represents each sampled amplitude as one of a finite set of integer values.
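The two steps can be illustrated with a minimal sketch. The 1 kHz test tone, the 8 kHz sampling rate, the 8-bit resolution, and the function name `sample_and_quantize` are all assumptions chosen for illustration; they are not taken from the tutorial.

```python
import numpy as np

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s=0.01, n_bits=8):
    """Sample a unit-amplitude sine wave and quantize it to signed n_bits integers."""
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz      # sampling instants in seconds
    x = np.sin(2 * np.pi * freq_hz * t)            # amplitude measured at each instant
    levels = 2 ** (n_bits - 1) - 1                 # largest integer level, 127 for 8 bits
    q = np.round(x * levels).astype(np.int16)      # quantization: nearest integer level
    return t, x, q, levels

# Assumed example: a 1 kHz tone sampled at 8 kHz, i.e. 8 samples per cycle,
# comfortably above the minimum of 2 samples per cycle.
t, x, q, levels = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000)

# Quantization error is bounded by half a quantization step.
max_error = np.max(np.abs(x - q / levels))
print(len(q), "samples; max quantization error:", max_error)
```

Raising `n_bits` shrinks the quantization error, while raising `sample_rate_hz` allows higher frequencies to be captured.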
Production of speech sound
The spectrum of a speech sound is shaped by resonances in the vocal tract, called formants. A change in resonance changes the sound. The speech wave is thus the convolution of the source $e(n)$ with the impulse response of the filter $h(n)$: $s(n) = e(n) * h(n)$. In the frequency domain, the convolution becomes a product: $S(\omega) = E(\omega)\,H(\omega)$.
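The source-filter relation can be checked numerically with a small sketch. The impulse-train source, the single 500 Hz decaying resonance, and the 8 kHz sampling rate are hypothetical values chosen only to make the example concrete.

```python
import numpy as np

fs = 8000                                  # assumed sampling rate in Hz

# Hypothetical source e(n): an impulse train, one pulse every 80 samples.
e = np.zeros(400)
e[::80] = 1.0

# Hypothetical filter h(n): one decaying resonance at 500 Hz.
k = np.arange(100)
h = np.exp(-k / 20.0) * np.cos(2 * np.pi * 500 * k / fs)

# Time domain: the speech wave is the convolution of source and filter.
s = np.convolve(e, h)                      # length 400 + 100 - 1 = 499

# Frequency domain: with zero-padding to the full output length,
# the spectrum of s equals the product of the two spectra.
n_fft = len(s)
S = np.fft.rfft(s, n_fft)
E = np.fft.rfft(e, n_fft)
H = np.fft.rfft(h, n_fft)
max_diff = np.max(np.abs(S - E * H))
print("largest difference between S and E*H:", max_diff)
```

The difference is at the level of floating-point rounding, which is the convolution theorem at work.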
Speech processing can be divided into the following categories:
- Speech recognition, which deals with analysis of the linguistic content of a speech signal.
- Speaker recognition, where the aim is to recognize the identity of the speaker.
- Speech coding, a specialized form of data compression that is important in telecommunications.
- Speech synthesis, the artificial synthesis of speech, which usually means computer-generated speech.
- Speech enhancement, which improves the intelligibility and/or perceptual quality of a speech signal, for example through audio noise reduction.
Speech Recognition Basics
Speech recognition is the process of deriving the sequence of speech sounds that best matches the input speech signal. Each speech sound is characterized by the size and shape of the filter (the vocal cavity). The following terms are the basics needed for understanding speech recognition technology: utterance, vocabulary, and training.
Approaches to speech recognition
- Template-based approaches, in which unknown speech is compared against a set of prerecorded words (templates).
- Knowledge-based approaches, in which "expert" knowledge about variations in speech is hand-coded into a system.
- Statistical approaches, in which variations in speech are modeled statistically (e.g., by Hidden Markov Models, or HMMs) using automatic learning procedures.
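A template-based comparison can be sketched with dynamic time warping (DTW), which aligns two sequences spoken at different rates. This is only an illustration: the one-dimensional "feature" sequences and the template words below are made up, and a real system would compare sequences of feature vectors.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] repeats b[j-1]
                                 cost[i, j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1, j - 1])  # both sequences advance
    return cost[n, m]

# Hypothetical prerecorded templates (toy 1-D contours, not real speech).
templates = {
    "start": [1, 3, 4, 3, 1],
    "stop": [4, 2, 1, 2, 4],
}

# An unknown word: the "start" contour spoken more slowly.
unknown = [1, 1, 3, 4, 4, 3, 1]

best_word = min(templates, key=lambda w: dtw_distance(unknown, templates[w]))
print(best_word, dtw_distance(unknown, templates[best_word]))
```

The unknown sequence is matched to the template with the smallest warped distance, regardless of the difference in length.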
Types of Speech Recognition
- Isolated words. Example: "start", "stop", "ON", "OFF".
- Connected words. Example: 9766869081.
- Continuous speech. Example: "Today I am presenting a lecture."
- Spontaneous speech. Example: commentators.
Isolated Word
Continuous Sentences
Block diagram of speech recognition
Signal → Feature Extraction → Matching in the acoustic domain (Acoustic Model) → Matching in the symbolic domain (Language Model) → Sentence Hypothesis
Speech recognition is a special case of pattern recognition.
Feature Extraction Technique
Feature extraction for a classification problem reduces the dimensionality of the input vector while preserving the discriminating power of the signal. In speaker identification and verification systems, the number of training and test vectors needed for classification grows with the dimension of the input, so feature extraction of the speech signal is necessary.
Cont.
Some feature extraction methods:
- Linear Discriminant Analysis (LDA)
- Mel-frequency cepstral coefficients (MFCCs)
- Dynamic time warping
- Independent Component Analysis (ICA)
- Linear predictive coding
- Cepstral analysis
- Filter bank analysis
- Kernel-based feature extraction
- Wavelets
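As one illustration from the list, cepstral analysis can be sketched in a few lines: take the inverse Fourier transform of the log magnitude spectrum of a frame and keep only the first few coefficients. The synthetic two-tone frame, the 256-sample frame length, and the choice of 13 coefficients are assumptions for the example, not values from the tutorial.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))    # taper the frame edges
    spectrum = np.fft.fft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

# Assumed test frame: 256 samples of a synthetic two-tone signal at 8 kHz.
fs = 8000
t = np.arange(256) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

cep = real_cepstrum(frame)

# Dimensionality reduction: 256 samples become 13 low-order coefficients,
# which mainly describe the smooth spectral envelope.
features = cep[:13]
print(features.shape)
```

Keeping only the low-order coefficients is what turns a long frame of samples into a short feature vector.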
Speech Recognition Enables Many Applications
- Voice-based IVR systems and services that can remain available 24x7.
- Indexing and searching of audio recordings, as in internet (Google) search.
- Hands-busy or eyes-busy applications, where the user has objects to manipulate or equipment to control.
- Telephony, where speech recognition is used in spoken dialogue systems for entering digits, recognizing words to accept collect calls, finding airplane or train information, call routing, etc.
- Interaction between computers and people with a disability that leaves them unable to type or unable to speak.
Speech Recognition Software
- CMU Sphinx. Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
- Praat. Homepage: www.fon.hum.uva.nl/praat/download_win.html
- HTK. htk.eng.cam.ac.uk/download.shtml
- Matlab
- SFS
Challenges in Speech Recognition
- Speaking style: clear, spontaneous, slurred or sloppy
- Speaking rate: fast or slow speech; the rate can change within a single sentence
- Emotional state: happy, sad, etc.
- Emphasis: stressed vs. unstressed speech
- Accents, dialects, foreign words
- Environmental or background noise
- Even the same person never speaks exactly the same way twice
- Large vocabulary and infinite language
- Absence of word boundary markers in continuous speech
- Inherent ambiguities: "I scream" or "Ice cream"?
PERFORMANCE OF SYSTEMS
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor.
$WER = \frac{S + D + I}{N}$
where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference.
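The combined count of substitutions, deletions, and insertions can be obtained with a word-level edit distance between the reference and the recognizer output. The sketch below assumes whitespace-separated words; the function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, computed with a word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("am") and one substitution ("a" -> "the") over 6 reference words.
wer = word_error_rate("today i am presenting a lecture",
                      "today i presenting the lecture")
print(round(wer, 3))
```

Because insertions are counted, WER can exceed 1 when the hypothesis is much longer than the reference.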
Speaker Recognition System
Speaker recognition is the process of validating a user's claim to an identity using characteristics extracted from their voice. Work on it started four decades ago. It uses acoustic features of speech that differ between individuals. These acoustic patterns reflect both anatomy and learned behavioral patterns.
Each speaker recognition system has two phases: enrollment and verification. During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. Identification systems compare the utterance against multiple voice prints to determine the best match(es), while verification systems compare an utterance against a single voice print. Because it involves only one comparison, verification is faster than identification.
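The difference between the two tasks can be sketched as follows. Everything here is a simplifying assumption for illustration: voice prints are plain averaged feature vectors, matching uses cosine similarity, the threshold of 0.9 is arbitrary, and the speaker names and numbers are made up.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def enroll(feature_vectors):
    """Enrollment: average the speaker's feature vectors into a voice print."""
    return np.mean(feature_vectors, axis=0)

def verify(utterance, voice_print, threshold=0.9):
    """Verification: one comparison against the claimed speaker's voice print."""
    return cosine_similarity(utterance, voice_print) >= threshold

def identify(utterance, voice_prints):
    """Identification: compare against every voice print and return the best match."""
    return max(voice_prints,
               key=lambda name: cosine_similarity(utterance, voice_prints[name]))

# Hypothetical enrolled speakers with toy 3-dimensional features.
voice_prints = {
    "alice": enroll(np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]])),
    "bob": enroll(np.array([[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]])),
}

utterance = np.array([0.95, 0.15, 0.05])

print(verify(utterance, voice_prints["alice"]))   # claim "alice"
print(verify(utterance, voice_prints["bob"]))     # claim "bob"
print(identify(utterance, voice_prints))
```

Verification performs a single comparison and a threshold test, whereas identification must score the utterance against every enrolled voice print.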
Figure: Block diagram of a typical speaker verification system. Blocks: Input, Signal Processing, Model Generation, Pattern Matching, Threshold Criterion, Decision Logic. Outputs: Accepted, Rejected.
Speaker verification operates in the following modes:
- Text-independent mode (relies on the voice characteristics of the speaker)
- Text-dependent mode (a predetermined text is used)
- Text-prompted speaker verification (the system prompts the speaker with the text)
Technology
The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation, vector quantization, and decision trees.
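Of the techniques listed, vector quantization is simple enough to sketch briefly: each speaker's voice print is a small codebook of representative vectors, and an utterance is attributed to the speaker whose codebook gives the lowest average distortion. The k-means-style training loop, the naive initialization, and the toy 2-D data are assumptions for illustration only.

```python
import numpy as np

def train_codebook(vectors, k, n_iter=10):
    """Build a k-entry codebook with a basic k-means loop."""
    codebook = vectors[:k].copy()          # naive init: first k vectors (assumption)
    for _ in range(n_iter):
        dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        nearest = np.argmin(dists, axis=1)
        for c in range(k):
            members = vectors[nearest == c]
            if len(members) > 0:           # keep the old entry if a cluster empties
                codebook[c] = members.mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codebook entry."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return float(np.mean(np.min(dists, axis=1)))

# Hypothetical training features for two speakers (toy 2-D vectors).
train_a = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0],
                    [1.0, 1.1], [0.0, 0.1], [0.9, 1.0]])
train_b = train_a + 5.0

codebooks = {
    "speaker_a": train_codebook(train_a, k=2),
    "speaker_b": train_codebook(train_b, k=2),
}

# Test utterance: features close to speaker A's training data.
test_vectors = np.array([[0.05, 0.05], [1.0, 1.05]])
best = min(codebooks, key=lambda name: distortion(test_vectors, codebooks[name]))
print(best)
```

The codebook stores far fewer vectors than the training data, which is why vector quantization is attractive for compact voice prints.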
Figure: Farmer IVRS system (flow diagram). Labels: Telephonic Card, Call, Connect to IVRS, Name of Crop, Symptom, Searching IVRS Database, Reply from Machine, Continue Call, Call Ended.
The Speech Recognition Tool
Books for Speech Recognition
- "Fundamentals of Speech Recognition". L. Rabiner, B. Juang. 1993. ISBN: 0130151572.
- "How to Build a Speech Recognition Application". B. Balentine, D. Morgan, W. Meisel. 1999. ISBN: 0967127815.
- "Speech Recognition: Theory and C++ Implementation". C. Becchetti, L.P. Ricotti. 1999. ISBN: 0471977306.
- "Applied Speech Technology". A. Syrdal, R. Bennett, S. Greenspan. 1994. ISBN: 0849394562.
- "Speech Recognition: The Complete Practical Reference Guide". P. Foster, T. Schalk. 1993. ISBN: 0936648392.
- "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition". D. Jurafsky, J. Martin. 2000. ISBN: 0130950696.
- "Discrete-Time Processing of Speech Signals (IEEE Press Classic Reissue)". J. Deller, J. Hansen, J. Proakis. 1999. ISBN: 0780353862.
- "Statistical Methods for Speech Recognition (Language, Speech, and Communication)". F. Jelinek. 1999. ISBN: 0262100665.
- "Digital Processing of Speech Signals". L. Rabiner, R. Schafer. 1978. ISBN: 0132136031.
- "Foundations of Statistical Natural Language Processing". C. Manning, H. Schutze. 1999. ISBN: 0262133601.
- "Designing Effective Speech Interfaces". S. Weinschenk, D. T. Barker. 2000. ISBN: 0471375454.