HCI : Speech /Speaker Recognition System

Presentation transcript:

HCI : Speech /Speaker Recognition System Dr. Bharti W. Gawali Associate Professor Department of Computer Science & Information Technology Dr.Babasaheb Ambedkar Marathwada University Aurangabad Email id: bharti_rokade@yahoo.co.in Dr.Bharti Gawali 06/03/2012

This tutorial will focus on: an introduction to speech processing; salient features of a speech recognition system; feature extraction methods; the speaker recognition system; and some handouts.

Introduction
The fundamental purpose of speech is communication, i.e., the transmission of messages. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we call the speech signal.

Production of speech
When we speak, we let air pass from our lungs through our mouth and nasal cavity, and this air stream is restricted and changed with our tongue and lips. The air stream, driven by the contraction and expansion of the lungs, produces an acoustic wave: a sound. The resulting sound forms, the vowels and consonants, are usually called phones. The phones are combined into words.

[Figure: A block diagram of human speech production]

SPEECH CHAIN
The complete process of producing and perceiving speech, from the formulation of a message in the brain of a talker, to the creation of the speech signal, and finally to the understanding of the message by a listener, is called the speech chain: from message, to speech signal, to understanding.

Speech Chain
Message formulation -> language code -> neuromuscular controls -> vocal tract system -> generation of acoustic wave -> transmission channel -> neural transduction (feature extraction) -> language translation -> message understanding

Layers for describing speech: acoustics, phonetics, phonology, morphology, syntax, semantics.

Events of Speech
[Figure: Speech signal with silence]

Digital Representation of Speech
Analog-to-digital conversion (digitization) has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. To measure a wave accurately, it is necessary to have at least two samples in each cycle: one measuring the positive part of the wave and one measuring the negative part. The sampling rate must therefore be at least twice the highest frequency present in the signal.
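The two steps can be illustrated with a minimal sketch in Python. The function names, the 100 Hz tone, and the 8000 Hz and 16-bit settings are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sampling: measure the amplitude of a unit sine wave at evenly spaced instants."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(samples, n_bits):
    """Quantization: map amplitudes in [-1, 1] to signed integer levels."""
    max_level = 2 ** (n_bits - 1) - 1
    return [round(s * max_level) for s in samples]

# A 100 Hz tone sampled at 8000 Hz gives 80 samples per cycle.
tone = sample_sine(100, 8000, 80)
pcm = quantize(tone, 16)

# The same tone sampled at only 100 Hz gives one sample per cycle, always at
# the same phase, so every sample is (numerically) zero and the wave is lost.
undersampled = sample_sine(100, 100, 8)
```

With fewer than two samples per cycle, as in `undersampled`, the sampled values no longer describe the original wave.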

The speech sound spectrum is shaped by resonances in the vocal tract, called formants. A change in resonance changes the sound. Thus the speech wave is the convolution of the source $e(n)$ with the impulse response of the filter $h(n)$: $s(n) = e(n) * h(n)$. In the frequency domain, the convolution becomes a product: $S(\omega) = E(\omega)\,H(\omega)$.
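A rough sketch of this source-filter view in Python follows. The impulse-train source, the single decaying 500 Hz resonance used as the filter, and the 8000 Hz sampling rate are toy assumptions for illustration, not values from the slides.

```python
import math

def convolve(source, impulse_response):
    """Discrete convolution: s(n) = sum over k of e(k) * h(n - k)."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, e in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += e * h
    return out

# Assumed source e(n): one pulse every 80 samples (100 Hz pitch at 8000 Hz).
source = [1.0 if n % 80 == 0 else 0.0 for n in range(240)]

# Assumed filter h(n): a single decaying resonance near 500 Hz.
h = [math.exp(-n / 20.0) * math.cos(2 * math.pi * 500 * n / 8000)
     for n in range(60)]

# Speech wave s(n): each source pulse excites one copy of the resonance.
speech = convolve(source, h)
```

Changing the resonance frequency in `h` changes the shape of every pitch period in `speech`, which is the sense in which a change in resonance changes the sound.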

Speech processing can be divided into the following categories:
Speech recognition, which deals with analysis of the linguistic content of a speech signal.
Speaker recognition, where the aim is to recognize the identity of the speaker.
Speech coding, a specialized form of data compression that is important in telecommunications.
Speech synthesis, the artificial synthesis of speech, which usually means computer-generated speech.
Speech enhancement, which improves the intelligibility and/or perceptual quality of a speech signal, for example by audio noise reduction.

Speech Recognition Basics
Speech recognition is the process of deriving the sequence of speech sounds best matching the input speech signal. Each sound is characterized by the size and shape of the filter (the vocal cavity). The following terms are the basics needed for understanding speech recognition technology: utterance, vocabularies, and training.

Approaches to speech recognition
Template-based approaches, in which unknown speech is compared against a set of prerecorded words (templates).
Knowledge-based approaches, in which "expert" knowledge about variations in speech is hand-coded into a system.
Statistical approaches, in which variations in speech are modeled statistically (e.g., by hidden Markov models, or HMMs), using automatic learning procedures.
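As an illustration of the template-based approach, the sketch below compares an unknown feature sequence against stored templates using dynamic time warping, one common way to tolerate differences in speaking rate. The one-dimensional feature values and the two templates are invented for illustration; real systems compare sequences of feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of the first i items of a and first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(unknown, templates):
    """Return the word whose template is closest to the unknown sequence."""
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))

# Hypothetical prerecorded templates (made-up feature values).
templates = {
    "start": [1, 3, 5, 3, 1],
    "stop": [5, 4, 1, 1, 4],
}

# The word "start" spoken more slowly: same shape, stretched in time.
unknown = [1, 1, 3, 3, 5, 5, 3, 1]
best_word = recognize(unknown, templates)
```

Here `best_word` is `"start"`, because the warping absorbs the slower speaking rate and the alignment cost against that template is zero.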

Types of Speech Recognition
Isolated words. Example: "start", "stop", "ON", "OFF".
Connected words. Example: 9766869081.
Continuous speech. Example: "Today I am presenting a lecture."
Spontaneous speech. Example: commentators.

Isolated Word

Continuous Sentences

Block diagram of speech recognition
[Figure: signal -> feature extraction -> matching in the acoustic domain (acoustic model) -> matching in the symbolic domain (language model) -> sentence hypothesis]
Speech recognition is a special case of pattern recognition.

Feature Extraction Technique
Feature extraction for a speech classification problem reduces the dimensionality of the input vector while maintaining the discriminating power of the signal. In speaker identification and verification systems, the number of training and test vectors needed for classification grows with the dimension of the input, so feature extraction of the speech signal is needed.

Cont.
Following are some feature extraction methods:
Linear Discriminant Analysis (LDA)
Mel-frequency cepstral coefficients (MFCCs)
Dynamic time warping
Independent Component Analysis (ICA)
Linear predictive coding
Cepstral analysis
Filter bank analysis
Kernel-based feature extraction methods
Wavelets
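As a small illustration of one ingredient of mel-frequency and filter bank analysis, the sketch below converts between hertz and the mel scale and places filter centers evenly in mel. The formula with constants 2595 and 700 is one widely used variant and is an assumption here, as are the function names and the 10-filter, 0 to 4000 Hz setup; the slides do not specify any of them.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (a common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_filters)]

# Ten filter centers between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(10, 0.0, 4000.0)
```

The centers are close together at low frequencies and spread out at high frequencies, which reduces the number of values per frame while keeping more detail where hearing is more sensitive.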

Speech Recognition Enables Many Applications
Voice-based IVR systems and services that can remain available 24x7.
Indexing of audio recordings, for example for internet (Google) search, and possibly searching within audio recordings.
Hands-busy or eyes-busy applications, such as where the user has objects to manipulate or equipment to control.
Telephony, where speech recognition is used, for example, in spoken dialogue systems for entering digits, recognizing words to accept collect calls, finding out airplane or train information, and call routing.
Interaction between computers and humans with a disability that results in the inability to type or the inability to speak.

Speech Recognition Software
CMU Sphinx, homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
Praat, homepage: www.fon.hum.uva.nl/praat/download_win.html
HTK: htk.eng.cam.ac.uk/download.shtml
Matlab
SFS

Challenges in Speech Recognition
Speaking style: clear, spontaneous, slurred or sloppy.
Speaking rate: fast or slow speech; the rate can change within a single sentence.
Emotional state: happy, sad, etc.
Emphasis: stressed speech vs. unstressed speech.
Accents, dialects, foreign words.
Environmental or background noise.
Even the same person never speaks exactly the same way twice.
Large vocabulary and infinite language.
Absence of word boundary markers in continuous speech.
Inherent ambiguities: "I scream" or "ice cream"?

PERFORMANCE OF SYSTEMS
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real time factor.
$\mathrm{WER} = \frac{S + D + I}{N}$
where $S$ is the number of substitutions, $D$ is the number of deletions, $I$ is the number of insertions, and $N$ is the number of words in the reference.
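A minimal sketch of both measures in Python, assuming the standard definitions: WER = (S + D + I) / N, computed from a word-level edit distance, and real time factor = processing time divided by audio duration. The function names and the example sentences are illustrative, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)

def real_time_factor(processing_seconds, audio_seconds):
    """Speed measure: values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds

# One deletion ("am") and one substitution ("a" -> "the") over 6 reference words.
wer = word_error_rate("today i am presenting a lecture",
                      "today i presenting the lecture")
rtf = real_time_factor(processing_seconds=3.0, audio_seconds=6.0)
```

In this example the WER is 2/6 and the real time factor is 0.5.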

Speaker Recognition System
Speaker recognition is the process of validating a user's claim to an identity using characteristics extracted from their voice. Work on it started four decades ago. It uses acoustic features of speech that differ between individuals. These acoustic patterns reflect both anatomy and learned behavioral patterns.

Each speaker recognition system has two phases: enrollment and verification. During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. For identification systems, the utterance is compared against multiple voice prints in order to determine the best match(es), while verification systems compare an utterance against a single voice print. Because of the process involved, verification is faster than identification.
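The difference between the two phases, and between verification and identification, can be sketched as follows. The averaged feature vector used as a voice print, the cosine similarity score, the 0.95 threshold, and the speaker names and numbers are all assumptions made for illustration; the slides do not prescribe a particular model or score.

```python
import math

def enroll(feature_vectors):
    """Enrollment: average the feature vectors into a single voice print."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def verify(utterance, voice_print, threshold=0.95):
    """Verification: one comparison against the claimed speaker's voice print."""
    return cosine_similarity(utterance, voice_print) >= threshold

def identify(utterance, voice_prints):
    """Identification: compare against every voice print, return the best match."""
    return max(voice_prints,
               key=lambda name: cosine_similarity(utterance, voice_prints[name]))

# Hypothetical enrolled speakers with made-up feature vectors.
voice_prints = {
    "alice": enroll([[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]]),
    "bob": enroll([[0.1, 0.9, 0.8], [0.2, 1.0, 0.7]]),
}
utterance = [0.95, 0.25, 0.12]

accepted_as_alice = verify(utterance, voice_prints["alice"])
accepted_as_bob = verify(utterance, voice_prints["bob"])
best_match = identify(utterance, voice_prints)
```

`verify` performs a single comparison, while `identify` performs one comparison per enrolled speaker, which is why verification is the faster of the two.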

Block diagram of a typical speaker verification system
[Figure: input -> signal processing -> pattern matching (against the model from model generation) -> decision logic (threshold criterion) -> accepted / rejected]

There are two basic modes of speaker verification: text-independent mode (relies only on the voice characteristics of the speaker) and text-dependent mode (a predetermined text is used). A further variant is text-prompted speaker verification (the system prompts the speaker with the text to say).

Technology
The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation, vector quantization, and decision trees.
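As one example from this list, vector quantization can store a voice print as a small codebook of representative feature vectors, and an utterance is scored by how far its frames lie from the nearest codewords. The sketch below assumes hand-written two-dimensional codebooks and frames purely for illustration; in practice the codebooks would be trained from enrollment data, for example with a clustering algorithm.

```python
def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def average_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    total = sum(min(squared_distance(frame, codeword) for codeword in codebook)
                for frame in frames)
    return total / len(frames)

def best_speaker(frames, codebooks):
    """Pick the speaker whose codebook represents the frames with least distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))

# Hypothetical per-speaker codebooks (made-up values).
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[4.0, 4.0], [5.0, 5.0]],
}

# Feature frames from an unknown utterance (made-up values).
frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8]]

distortion_a = average_distortion(frames, codebooks["speaker_a"])
match = best_speaker(frames, codebooks)
```

The frames lie close to the codewords of `speaker_a`, so that codebook gives the lower distortion and is selected.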

[Figure: IVRS system for farmers. Labels: farmer, telephonic card, call, connect to IVRS, name of crop, symptom, searching IVRS database, reply from machine, continue call, call ended]

The Speech Recognition Tool

Books for Speech Recognition
"Fundamentals of Speech Recognition". L. Rabiner & B. Juang. 1993. ISBN: 0130151572.
"How to Build a Speech Recognition Application". B. Balentine, D. Morgan, and W. Meisel. 1999. ISBN: 0967127815.
"Speech Recognition: Theory and C++ Implementation". C. Becchetti and L.P. Ricotti. 1999. ISBN: 0471977306.
"Applied Speech Technology". A. Syrdal, R. Bennett, S. Greenspan. 1994. ISBN: 0849394562.
"Speech Recognition: The Complete Practical Reference Guide". P. Foster, T. Schalk. 1993. ISBN: 0936648392.
"Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition". D. Jurafsky, J. Martin. 2000. ISBN: 0130950696.

Books for Speech Recognition (contd.)
"Discrete-Time Processing of Speech Signals (IEEE Press Classic Reissue)". J. Deller, J. Hansen, J. Proakis. 1999. ISBN: 0780353862.
"Statistical Methods for Speech Recognition (Language, Speech, and Communication)". F. Jelinek. 1999. ISBN: 0262100665.
"Digital Processing of Speech Signals". L. Rabiner, R. Schafer. 1978. ISBN: 0132136031.
"Foundations of Statistical Natural Language Processing". C. Manning, H. Schutze. 1999. ISBN: 0262133601.
"Designing Effective Speech Interfaces". S. Weinschenk, D. T. Barker. 2000. ISBN: 0471375454.