Guided by: DINAKAR DAS C.N (Assistant Professor, ECE)
Presented by: ARUN V.S, S7 EC, Roll No: 2

INTRODUCTION
The field of Automatic Speech Recognition (ASR) is about 60 years old. The first speech recognizer was built at Bell Labs in the early 1950s. Development of ASR progressed only gradually until the introduction of Hidden Markov Models (HMMs).

Content
- Speech recognition
- Speech recognition based on HMM
- Architecture of an HMM-based speech recognition system
- Applications
- Advantages and disadvantages
- Conclusion

SPEECH RECOGNITION
- Speech recognition task
- Speech recognition system concept
- Efficiency
- Limitations

Speech recognition task
Getting a computer to understand spoken language. By "understand" we might mean:
- react appropriately
- convert the input speech into another medium, e.g. text

SPEECH RECOGNITION CONCEPT

SPEECH RECOGNITION IN COMPUTERS
- Digitization
- Acoustic analysis of the speech signal
- Linguistic interpretation
(diagram labels: acoustic waveform, acoustic signal, speech recognition)

EFFICIENCY
- Clean environment: 99.5%
- Noisy environment: 88%

LIMITATIONS IN SPEECH RECOGNITION
- Digitization: converting the analogue signal into a digital representation
- Signal processing: separating speech from background noise
- Phonetics: variability in human speech
- Phonology: recognizing individual sound distinctions (similar phonemes)
- Lexicology and syntax: disambiguating homophones; features of continuous speech
- Syntax and pragmatics: interpreting prosodic features
- Pragmatics: filtering of performance errors

Digitization
- Analogue-to-digital conversion: sampling and quantizing
- Use filters to measure energy levels at various points on the frequency spectrum
- Knowing the relative importance of different frequency bands (for speech) makes this process more efficient: e.g. high-frequency sounds are less informative, so they can be sampled using a broader bandwidth (log scale), as in the sketch below
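To make this concrete, here is a minimal NumPy sketch of sampling, 16-bit quantization and log-spaced band energies. The synthetic tone, sampling rate and band edges are illustrative assumptions, not values from the presentation.

```python
import numpy as np

fs = 16000                                    # sampling rate (Hz), common for speech
t = np.arange(fs) / fs                        # one second of sample times
analogue = 0.6 * np.sin(2 * np.pi * 220 * t)  # stand-in for the analogue waveform

# Quantize to 16-bit integers (the sampling itself is implicit in the discrete t above)
digital = np.round(analogue * 32767).astype(np.int16)

# Energy per frequency band on a log-spaced scale: higher bands are wider,
# reflecting that fine high-frequency detail carries less information for speech.
spectrum = np.abs(np.fft.rfft(digital.astype(float))) ** 2
freqs = np.fft.rfftfreq(len(digital), d=1 / fs)
edges = np.logspace(np.log10(100), np.log10(8000), num=11)   # 10 log-spaced bands
band_energy = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
               for lo, hi in zip(edges[:-1], edges[1:])]
print([f"{e:.2e}" for e in band_energy])      # nearly all energy falls in the 220 Hz band
```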

Separating speech from background noise
- Noise-cancelling microphones: two mics, one facing the speaker, the other facing away; the ambient noise is roughly the same for both mics (see the sketch below)
- Knowing which parts of the signal relate to speech: spectrograph analysis
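A toy sketch of the two-microphone idea, with synthetic signals standing in for the two channels. A real noise canceller would use an adaptive filter rather than this plain subtraction, and all the constants here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 180 * t)       # stand-in for the talker
noise = 0.4 * rng.standard_normal(fs)            # ambient noise

front_mic = speech + noise                                   # faces the speaker: speech plus noise
rear_mic = 0.95 * noise + 0.02 * rng.standard_normal(fs)     # faces away: mostly the same noise

cleaned = front_mic - rear_mic                   # subtract the reference channel

def snr_db(clean, noisy):
    """Signal-to-noise ratio of `noisy` relative to the known clean signal."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((noisy - clean) ** 2))

print(f"SNR before: {snr_db(speech, front_mic):.1f} dB, after: {snr_db(speech, cleaned):.1f} dB")
```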

Variability in individuals' speech
Variation among speakers due to:
- vocal range (F0 and pitch range)
- voice quality (growl, whisper, physiological elements such as nasality, adenoidality, etc.)
- accent (especially vowel systems, but also consonants, allophones, etc.)
Variation within speakers due to:
- health, emotional state
- ambient conditions
- speech style: formal read vs. spontaneous

Speaker-(in)dependent systems
Speaker-dependent systems:
- require "training" to "teach" the system your individual idiosyncrasies
- the more training the better, but typically 5 or 10 minutes is enough nowadays; the user is asked to pronounce some key words which allow the computer to infer details of the user's accent and voice (fortunately, languages are generally systematic)
- more robust, but less convenient and obviously less portable
Speaker-independent systems:
- language coverage is reduced, to compensate for the need to be flexible in phoneme identification
- a clever compromise is to learn on the fly

(Dis)continuous speech
- Discontinuous speech is much easier to recognize: single words tend to be pronounced more clearly
- Continuous speech involves contextual coarticulation effects: weak forms, assimilation, contractions

Interpreting prosodic features
- Pitch, length and loudness are used to indicate "stress"
- All of these are relative: on a speaker-by-speaker basis, and in relation to context
- Pitch and length are phonemic in some languages

Performance errors
- Performance "errors" include non-speech sounds, hesitations, false starts and repetitions
- Filtering implies handling at the syntactic level or above
- Some disfluencies are deliberate and have pragmatic effect; this is not something we can handle in the near future

ARCHITECTURE OF AN HMM-BASED SPEECH RECOGNITION SYSTEM

HMM-based speech recognition system
1. Receive and digitize the input speech signal.
2. Extract features from the input speech using the MFCC algorithm, which converts each signal into a feature vector.
3. Classify the feature vectors into phonetic-based categories at each frame using the HMM algorithm.
4. Finally, perform a Viterbi search, an algorithm that computes the optimal (most likely) state sequence in the HMM given a sequence of observed outputs (see the sketch below).
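As an illustration of step 4, here is a minimal Viterbi search over a tiny discrete HMM. The transition, emission and start probabilities are made-up numbers for a three-state left-to-right model; the presented system would instead use MFCC feature vectors with continuous output densities.

```python
import numpy as np

A = np.array([[0.7, 0.3, 0.0],      # transition probabilities (left-to-right)
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
B = np.array([[0.8, 0.1, 0.1],      # P(observed symbol | state)
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])      # always start in the first state
obs = [0, 0, 1, 1, 2, 2]            # a sequence of observed symbols

def viterbi(A, B, pi, obs):
    """Return the most likely state sequence for `obs` (log-domain Viterbi)."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best log-probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers
    delta[0] = np.log(pi + 1e-12) + np.log(B[:, obs[0]] + 1e-12)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A + 1e-12)   # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]] + 1e-12)
    path = [int(delta[-1].argmax())]          # best final state
    for t in range(T - 1, 0, -1):             # follow the back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

print(viterbi(A, B, pi, obs))   # [0, 0, 1, 1, 2, 2]
```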

HMM Model in Speech
The most common model used for speech is constrained, allowing a state to transition only to itself or to a single succeeding state.
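A small sketch of this constrained (left-to-right) topology with illustrative transition probabilities; sampling a state path shows that the state index never decreases, mirroring how speech unfolds in time.

```python
import numpy as np

n_states = 4
A = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    A[i, i] = 0.6          # self-transition: the current sound continues
    A[i, i + 1] = 0.4      # move on to the single succeeding state
A[-1, -1] = 1.0            # the final state only loops on itself

# Sample one state path to see the left-to-right structure
rng = np.random.default_rng(1)
state, path = 0, [0]
for _ in range(12):
    state = int(rng.choice(n_states, p=A[state]))
    path.append(state)
print(path)   # state indices only stay the same or increase
```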

APPLICATIONS
- Banking
- Phone dialing systems
- Computers

ADVANTAGES
- A speech-enabled IVR gives users much greater flexibility
- Call routers become easier for users to use
- Users can provide open-ended input

CONCLUSION
- HMMs treat the speech signal as piecewise stationary, i.e. stationary over short time intervals
- They are popular because they can be trained automatically
- The system was implemented with the help of MATLAB
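The short-time-stationary view is usually realised by cutting the waveform into overlapping frames, sketched below in NumPy with conventional 25 ms frames and a 10 ms shift. The frame sizes are assumptions, not values from the slides, and the presentation's own implementation was in MATLAB.

```python
import numpy as np

fs = 16000
signal = np.random.default_rng(2).standard_normal(fs)       # 1 s stand-in waveform

frame_len, frame_shift = int(0.025 * fs), int(0.010 * fs)   # 25 ms frames, 10 ms shift
n_frames = 1 + (len(signal) - frame_len) // frame_shift
frames = np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                   for i in range(n_frames)])
print(frames.shape)   # (98, 400): one feature vector is later computed per frame
```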

