ECE 8443 – Pattern Recognition Objectives: Acoustic Modeling Language Modeling Feature Extraction Search Pronunciation Modeling Resources: J.P.: Speech.

Presentation transcript:

ECE 8443 – Pattern Recognition
LECTURE 29: EXAMPLES OF PATTERN RECOGNITION IN SPEECH RECOGNITION
Objectives: Acoustic Modeling, Language Modeling, Feature Extraction, Search, Pronunciation Modeling
Resources: J.P.: Speech Recognition - Part I; T.H.: Speech Recognition - Part II; ISIP: ASR Tutorial; ISIP: ASR Resources; IEEE: SP Magazine
Audio: URL:

ECE 8443: Lecture 29, Slide 1 Introduction
- Speech technology has quietly become a pervasive influence in our daily lives despite widespread concerns about research progress over the past 20 years.
- The ability of hidden Markov models to explain (and predict) variations in the acoustic signal has been the cornerstone of this progress.
- Other statistical modeling techniques (e.g., SVMs, finite state machines, entropy-based language modeling) have also had a significant impact.
- Generative models have given way to discriminative models that attempt to directly optimize objective measures such as word error rate.
Why research human language technology?
- “Language is the preeminent trait of the human species.”
- “I never met someone who wasn’t interested in language.”
- “I decided to work on language because it seemed to be the hardest problem to solve.”
Fundamental challenge: diversity of data that often defies mathematical descriptions or physical constraints.
Solution: integration of multiple knowledge sources.
In this lecture we will focus on the use of pattern recognition in high-performance speech recognition systems.

ECE 8443: Lecture 29, Slide 2 Speech Recognition Is Information Extraction
Traditional Output:
- best word sequence
- time alignment of information
Other Outputs:
- word graphs
- N-best sentences
- confidence measures
- metadata such as speaker identity, accent, and prosody
Applications:
- information localization
- data mining
- emotional state
  - stress, fatigue, deception

ECE 8443: Lecture 29, Slide 3 What Makes Acoustic Modeling So Challenging?

ECE 8443: Lecture 29, Slide 4 What Makes Acoustic Modeling So Challenging?
- Figure: comparison of “aa” in “lOck” and “iy” in “bEAt” for conversational speech.
- Regions of overlap represent classification error.
- Reduce overlap by introducing acoustic and linguistic context.

ECE 8443: Lecture 29, Slide 5 Statistical Approach: Noisy Communication Channel Model

ECE 8443: Lecture 29, Slide 6 Information Theoretic Basis
- Given an observation sequence, O, and a word sequence, W, we want minimal uncertainty about the correct answer (i.e., we want to minimize the conditional entropy, H(W|O)).
- To accomplish this, the probability of the word sequence given the observation must increase.
- The mutual information, I(W;O), between W and O satisfies I(W;O) = H(W) - H(W|O), so H(W|O) = H(W) - I(W;O).
- Two choices: minimize H(W) or maximize I(W;O).
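The relationship between H(W), H(W|O), and I(W;O) can be checked numerically. The sketch below is only an illustration: the two joint distributions (`WEAK_FEATURES`, `STRONG_FEATURES`) and the function names are hypothetical, with two words and two observation symbols. In both toy cases H(W) is 1 bit, so the more informative observations are the ones with lower H(W|O) and higher I(W;O).

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_measures(joint):
    """Return (H(W), H(W|O), I(W;O)) in bits.

    joint[w][o] holds P(W=w, O=o); rows index words, columns observations.
    """
    p_w = [sum(row) for row in joint]        # marginal P(W)
    p_o = [sum(col) for col in zip(*joint)]  # marginal P(O)
    h_w = entropy(p_w)
    # H(W|O) = -sum_{w,o} P(w,o) log2 P(w|o), with P(w|o) = P(w,o) / P(o)
    h_w_given_o = -sum(
        p * math.log2(p / p_o[j])
        for row in joint
        for j, p in enumerate(row)
        if p > 0
    )
    # I(W;O) = H(W) - H(W|O)
    return h_w, h_w_given_o, h_w - h_w_given_o


# Hypothetical toy joint distributions (assumptions for illustration only).
WEAK_FEATURES = [[0.30, 0.20], [0.20, 0.30]]    # observations barely separate the words
STRONG_FEATURES = [[0.45, 0.05], [0.05, 0.45]]  # observations separate the words well

if __name__ == "__main__":
    for name, joint in (("weak", WEAK_FEATURES), ("strong", STRONG_FEATURES)):
        h_w, h_cond, mi = information_measures(joint)
        print(f"{name}: H(W)={h_w:.3f}  H(W|O)={h_cond:.3f}  I(W;O)={mi:.3f}")
```

Because H(W) is fixed by the prior over words in this toy setup, the only way to reduce H(W|O) is to raise I(W;O), which is the second of the two choices on the slide.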

ECE 8443: Lecture 29, Slide 7 Relationship to Maximum Likelihood Methods
- Maximizing the mutual information is equivalent to choosing the parameter set to maximize a likelihood ratio.
- Maximization implies increasing the numerator term (maximum likelihood estimation, MLE) or decreasing the denominator term (maximum mutual information estimation, MMIE).
- The latter is accomplished by reducing the probabilities of incorrect, or competing, hypotheses.
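The slide's equation is not available here, so the sketch below assumes the commonly used per-utterance MMIE criterion, log [ p(O|M_w) P(w) / sum over w' of p(O|M_w') P(w') ], where M_w is the model for word sequence w. The word hypotheses, scores, and function names are hypothetical. It shows the two routes named above: raising the numerator (the MLE direction) and lowering a competitor's contribution to the denominator (the MMIE direction).

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(log_likelihoods, log_priors, correct):
    """Assumed MMIE criterion for one utterance.

    Returns log p(O|M_w)P(w) - log sum_{w'} p(O|M_w')P(w'),
    which is the log posterior of the correct hypothesis.
    """
    joint = {w: log_likelihoods[w] + log_priors[w] for w in log_likelihoods}
    return joint[correct] - log_sum_exp(list(joint.values()))


# Hypothetical acoustic log-likelihoods for three competing hypotheses.
LOG_LIK = {"lock": -10.0, "look": -10.5, "luck": -12.0}
LOG_PRIOR = {w: math.log(1.0 / 3.0) for w in LOG_LIK}  # uniform language model

baseline = mmi_objective(LOG_LIK, LOG_PRIOR, "lock")

# MLE direction: make the correct model fit the data better.
raised_numerator = mmi_objective({**LOG_LIK, "lock": -9.5}, LOG_PRIOR, "lock")

# MMIE direction: make a competing model fit the data worse.
lowered_competitor = mmi_objective({**LOG_LIK, "look": -11.5}, LOG_PRIOR, "lock")

if __name__ == "__main__":
    print(f"baseline           {baseline:.4f}")
    print(f"raised numerator   {raised_numerator:.4f}")
    print(f"lowered competitor {lowered_competitor:.4f}")
```

Note that the correct hypothesis also appears in the denominator sum, so the criterion is bounded above by zero and improves only when the correct hypothesis gains ground relative to its competitors.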

ECE 8443: Lecture 29, Slide 8 Speech Recognition Architectures
Core components:
- transduction
- feature extraction
- acoustic modeling (hidden Markov models)
- language modeling (statistical N-grams)
- search (Viterbi beam)
- knowledge sources
Our focus will be on the acoustic modeling components of the system.
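The search component is listed as "Viterbi beam" without further detail. As a rough sketch of what that usually means, the code below runs Viterbi decoding over a tiny discrete HMM and, at each frame, drops hypotheses whose log score falls more than `beam` below the best one. The two-state model (`sil`, `speech`), its probabilities, and the `low`/`high` observation symbols are hypothetical; a real recognizer searches over networks of context-dependent HMMs with continuous features.

```python
import math


def viterbi_beam(observations, states, log_start, log_trans, log_emit, beam=5.0):
    """Return (best_path, best_log_score) using Viterbi with beam pruning."""
    # active maps state -> (log score of best path ending there, that path)
    active = {
        s: (log_start[s] + log_emit[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        # Beam pruning: keep only hypotheses close to the current best score.
        best = max(score for score, _ in active.values())
        survivors = {s: v for s, v in active.items() if v[0] >= best - beam}
        expanded = {}
        for prev, (score, path) in survivors.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][obs]
                # Viterbi recursion: keep the best predecessor per state.
                if s not in expanded or cand > expanded[s][0]:
                    expanded[s] = (cand, path + [s])
        active = expanded
    final_state = max(active, key=lambda s: active[s][0])
    score, path = active[final_state]
    return path, score


def _logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model (all values are assumptions for illustration).
STATES = ["sil", "speech"]
LOG_START = _logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {
    "sil": _logs({"sil": 0.7, "speech": 0.3}),
    "speech": _logs({"sil": 0.2, "speech": 0.8}),
}
LOG_EMIT = {
    "sil": _logs({"low": 0.9, "high": 0.1}),
    "speech": _logs({"low": 0.2, "high": 0.8}),
}
OBS = ["low", "high", "high", "low"]

if __name__ == "__main__":
    path, score = viterbi_beam(OBS, STATES, LOG_START, LOG_TRANS, LOG_EMIT, beam=10.0)
    print(path, round(score, 4))
```

A wide beam reproduces exact Viterbi decoding; a narrow beam saves computation but can prune the path that would have won, so its score can never exceed the exact one.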

ECE 8443: Lecture 29, Slide 9 Signal Processing in Speech Recognition

ECE 8443: Lecture 29, Slide 10 Feature Extraction in Speech Recognition

ECE 8443: Lecture 29, Slide 11 Adding More Knowledge to the Front End

ECE 8443: Lecture 29, Slide 12 Noise Compensation Techniques

ECE 8443: Lecture 29, Slide 13 Acoustic Modeling: Hidden Markov Models

ECE 8443: Lecture 29, Slide 14 Training Recipes Are Complex And Iterative

ECE 8443: Lecture 29, Slide 15 Bootstrapping Is Key In Parameter Reestimation

ECE 8443: Lecture 29, Slide 16 Controlling Model Complexity

ECE 8443: Lecture 29, Slide 17 Data-Driven Parameter Sharing Is Crucial

ECE 8443: Lecture 29, Slide 18 Context-Dependent Acoustic Units

ECE 8443: Lecture 29, Slide 19 Summary
- What we haven’t talked about: duration models, adaptation, normalization, confidence measures, posterior-based scoring, hybrid systems, discriminative training, and much, much more…
- Applications of these models to language (Hazen), dialog (Phillips, Seneff), machine translation (Vogel, Papineni), and other HLT applications.
- Machine learning approaches to human language technology are still in their infancy (Bilmes).
- A mathematical framework for integration of knowledge and metadata will be critical in the next 10 years.
- Information extraction in a multilingual environment: a time of great opportunity!

ECE 8443: Lecture 29, Slide 20 Language Modeling in Speech Recognition

ECE 8443: Lecture 29, Slide 21 Demonstrations
- Audio Demonstrations: Why is speech recognition so difficult?
- Phonetic Units: Context is very important in speech recognition.
- State of the Art: Example of high performance speech recognition systems.

ECE 8443: Lecture 29, Slide 22 Appendix: Relevant Publications
Useful textbooks:
1. X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing - A Guide to Theory, Algorithm, and System Development, Prentice Hall.
2. D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall.
3. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press.
4. L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall.
5. J. Deller, et al., Discrete-Time Processing of Speech Signals, MacMillan Publishing Co.
6. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Second Edition, Wiley Interscience, 2000.
7. D. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press.
Relevant online resources:
1. “Intelligent Electronic Systems,” Center for Advanced Vehicular Systems, Mississippi State University, Mississippi State, Mississippi, USA, June
2. “Internet-Accessible Speech Recognition Technology,” June
3. “Speech and Signal Processing Demonstrations,” ftware/demonstrations, June
4. “Fundamentals of Speech Recognition,” 463, September