
HUMAN AND SYSTEMS ENGINEERING: Bridging the Gap in Human and Machine Performance

Joseph Picone, PhD
Professor, Electrical and Computer Engineering
Human and Systems Engineering

Evolution For Better Infrastructure
- 10 years at MS State
- Public Domain Speech Recognition: jumpstarted in 1997 by a DoD grant
- Center for Advanced Vehicular Systems: state funded to support Nissan
- Three Complementary Thrusts: an extension center colocated with Nissan in Canton, Mississippi; statewide economic development; assisting first-tier suppliers

A Virtual Tour of CAVS at Mississippi State University

Intelligent Electronic Systems At A Glance
- Computer Networking: wireless communications, intelligent sensors, collaborative vehicles
- Intelligent Systems: speech processing, machine learning, dialog systems, human factors and ergonomics
- Integrative Activities: Challenge X, capstone design experiences

Phase I Testbed: Campus Bus Networking
- Instrument the campus bus system to collect real-time data
- Modular architecture to support a variety of sensors and high-speed data communications

Dialog Systems Applications in Automotive
- Noise robustness in both environments to improve recognition performance
- Advanced statistical models and machine learning technology
- In-vehicle dialog systems improve information access
- Advanced user interfaces enhance workforce training and increase manufacturing efficiency

Speaker Verification Via Metadata Extraction
- Recognition of emotion, stress, fatigue, and other voice qualities is possible from enhanced descriptions of the speech signal
- Fundamentally the same statistical modeling problem as other speech applications
- Fatigue analysis from voice is under development under an SBIR (from Shriberg et al., IEEE Spectrum, April 2003)

The Challenge X Program
- Competition created by automotive industry, government, and academic partners
- Challenges university-level engineering students to decrease total energy consumption and emissions while maintaining or exceeding vehicle utility and performance
- Cooperative venture between industry and universities
- Faculty Advisor: G. Marshall Molen

APPLICATIONS OF RISK MINIMIZATION TO SPEECH RECOGNITION
Jon Hamaker, Aravind Ganapathiraju and Joseph Picone
Intelligent Electronic Systems, Human and Systems Engineering

Abstract and Biography

ABSTRACT: Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. In this presentation, we review our attempts to apply notions of risk minimization to pattern recognition problems such as speech recognition. New approaches based on probabilistic Bayesian learning are shown to provide an order-of-magnitude reduction in complexity over comparable approaches based on HMMs and support vector machines.

BIOGRAPHY: Joseph Picone is currently a Professor in the Department of Electrical and Computer Engineering at Mississippi State University and an Academic Thrust Leader at the Center for Advanced Vehicular Systems. For the past 15 years he has been promoting open source speech technology. He was previously employed by Texas Instruments and AT&T Bell Laboratories. Dr. Picone received his Ph.D. in Electrical Engineering from the Illinois Institute of Technology. He is a Senior Member of the IEEE and a registered Professional Engineer.

Generalization and Risk
- With well-separated data, the optimal decision surface is obviously a line.
- Introduce two more data points: depending on where they fall, the optimal decision surface is still a line (good generalization), or it changes abruptly.
- How much can we trust isolated data points?
- Can we integrate prior knowledge about the data, confidence, or willingness to take risk? (See the sketch below.)
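
A minimal, hypothetical scikit-learn sketch of this trade-off (the synthetic data and parameter values are illustrative, not from the talk): an SVM's regularization parameter C encodes how much we are willing to let a single isolated point reshape the decision surface.

```python
# Sketch: sensitivity of an SVM decision boundary to one isolated point.
# Large C trusts every training point (low empirical risk, boundary moves
# abruptly); small C treats the outlier as noise (better generalization).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated classes whose optimal boundary is a vertical line.
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (20, 2)),
               rng.normal([+2.0, 0.0], 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# One isolated class-0 point deep inside class 1's region.
X = np.vstack([X, [[2.5, 0.0]]])
y = np.append(y, 0)

for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    print(f"C={C:6.1f}  w={np.round(w, 2)}  b={b:.2f}  "
          f"support vectors={int(clf.n_support_.sum())}")
```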

Static Pattern Classification With SVMs

Deterding Vowel Data: 11 vowels spoken in "h*d" context; 10 log-area parameters; 528 train, 462 SI test.

| Approach                 | % Error | # Parameters |
|--------------------------|---------|--------------|
| SVM: Polynomial Kernels  | 49%     |              |
| K-Nearest Neighbor       | 44%     |              |
| Gaussian Node Network    | 44%     |              |
| SVM: RBF Kernels         | 35%     | 83 SVs       |
| Separable Mixture Models | 30%     |              |
| RVM: RBF Kernels         | 30%     | 13 RVs       |
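
To make the "# Parameters" column concrete, here is a short, hypothetical sketch (synthetic stand-in data, not the Deterding set) showing how the support-vector count of a trained RBF-kernel SVM is obtained; this is the quantity reported above as "83 SVs".

```python
# Hypothetical sketch: train an RBF-kernel SVM and report its error and
# support-vector count. Synthetic data stands in for the Deterding vowel
# set (528 train / 462 test, 10 log-area features).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=990, n_features=10, n_informative=8,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=528,
                                          random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_tr, y_tr)
print(f"test error: {1.0 - clf.score(X_te, y_te):.1%}, "
      f"support vectors: {int(clf.n_support_.sum())}")
```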

Applications of SVMs to Conversational Speech

| Transcription | Segmentation | HMM: AD | HMM: SWB | Hybrid: AD | Hybrid: SWB |
|---------------|--------------|---------|----------|------------|-------------|
| N-Best        | Hypothesis   |         |          |            |             |
| N-Best        | N-Best       |         |          |            |             |
| N-Best + Ref. | Reference    | 6.6     | —        |            |             |
| N-Best + Ref. |              |         |          |            |             |

Notes:
- SVMs were not exposed to alternative segmentations during training (closed-loop).
- SVM performance is high when there is no mismatch between the training and evaluation conditions.
- Complexity (parameter count) approaches that of HMMs.

Relevance Vector Machines
- A kernel-based learning machine.
- Incorporates an automatic relevance determination (ARD) prior over each weight (MacKay).
- A flat (non-informative) prior over the hyperparameters α completes the Bayesian specification.
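
For reference, the standard RVM specification that this slide summarizes (Tipping, 2001; see the bibliography below), with one hyperparameter $\alpha_i$ per weight:

$$y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i\, K(\mathbf{x}, \mathbf{x}_i) + w_0, \qquad p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\!\big(w_i \mid 0,\, \alpha_i^{-1}\big)$$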

Iterative Reestimation of Hyperparameters
- The goal in training becomes finding the hyperparameters that maximize the marginal likelihood of the targets: $\boldsymbol{\alpha}^{*} = \arg\max_{\boldsymbol{\alpha}}\, p(\mathbf{t} \mid \boldsymbol{\alpha})$.
- Estimation of the "sparsity" parameters is inherent in the optimization; no held-out set is needed!
- A closed-form solution to this maximization problem is not available, so the hyperparameters are reestimated iteratively.
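
A sketch of the standard fixed-point updates used in sparse Bayesian learning (Tipping, 2001), in the usual notation where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean and covariance of the posterior over the weights:

$$\gamma_i = 1 - \alpha_i \Sigma_{ii}, \qquad \alpha_i^{\text{new}} = \frac{\gamma_i}{\mu_i^{2}}$$

Weights whose $\alpha_i \to \infty$ are pruned, which is what drives the sparsity of the final model.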

RVM and SVM Comparison: Static Patterns

Deterding Vowel Data: 11 vowels spoken in "h*d" context; 10 log-area parameters; 528 train, 462 SI test.

| Approach                 | % Error | # Parameters |
|--------------------------|---------|--------------|
| SVM: Polynomial Kernels  | 49%     |              |
| K-Nearest Neighbor       | 44%     |              |
| Gaussian Node Network    | 44%     |              |
| SVM: RBF Kernels         | 35%     | 83 SVs       |
| Separable Mixture Models | 30%     |              |
| RVM: RBF Kernels         | 30%     | 13 RVs       |

RVM and SVM Comparison: Alphadigits

| Approach | Error Rate | Avg. # Parameters | Training Time | Testing Time |
|----------|------------|-------------------|---------------|--------------|
| SVM      | 16.4%      |                   |               | 30 mins      |
| RVM      | 16.2%      | 12                | 30 days       | 1 min        |

- RVMs yield a large reduction in the parameter count while attaining superior performance.
- Computational cost for RVMs lies mainly in training, but it is still prohibitive for larger sets: O(N^3), vs. O(N^2) for SVMs and O(N) for HMMs.

Preliminary Results on Learning

| Approach         | Error Rate | Avg. # Parameters | Training Time | Testing Time |
|------------------|------------|-------------------|---------------|--------------|
| SVM              | 15.5%      | 994               | 3 hours       | 1.5 hours    |
| RVM Constructive | 14.8%      | 72                | 5 days        | 5 mins       |
| RVM Reduction    | 14.8%      | 74                | 6 days        | 5 mins       |

- Data increased to 10,000 training vectors.
- The reduction method has been trained on up to 100k vectors (on a toy task); this is not possible for the constructive method.

Summary: Practical Risk Minimization?
- Reduction of complexity at the same level of performance is interesting: results hold across tasks; RVMs have been trained on 100,000 vectors; results suggest integrated training is critical.
- Risk minimization provides a family of solutions: Is there a better solution than minimum risk? What is the impact on complexity and robustness?
- Applications to other problems? Speech/non-speech classification? Speaker adaptation? Language modeling?

Brief Bibliography

Applications to Speech Recognition:
1. J. Hamaker and J. Picone, "Advances in Speech Recognition Using Sparse Bayesian Methods," submitted to the IEEE Transactions on Speech and Audio Processing, January 2003 (in revision).
2. A. Ganapathiraju, J. Hamaker and J. Picone, "Applications of Risk Minimization to Speech Recognition," to appear in the IEEE Transactions on Signal Processing, August 2004.
3. J. Hamaker, J. Picone and A. Ganapathiraju, "A Sparse Modeling Approach to Speech Recognition Based on Relevance Vector Machines," Proceedings of the International Conference on Spoken Language Processing, vol. 2, Denver, Colorado, USA, September 2002.
4. J. Hamaker, Sparse Bayesian Methods for Continuous Speech Recognition, Ph.D. Dissertation, Department of Electrical and Computer Engineering, Mississippi State University, December 2003.
5. A. Ganapathiraju, Support Vector Machines for Speech Recognition, Ph.D. Dissertation, Department of Electrical and Computer Engineering, Mississippi State University, January 2002.

Influential work:
6. M. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research, vol. 1, pp. 211-244, June 2001.
7. D. J. C. MacKay, "Probable networks and plausible predictions: a review of practical Bayesian methods for supervised neural networks," Network: Computation in Neural Systems, vol. 6, pp. 469-505, 1995.
8. D. J. C. MacKay, Bayesian Methods for Adaptive Models, Ph.D. thesis, California Institute of Technology, Pasadena, California, USA, 1992.
9. E. T. Jaynes, "Bayesian Methods: General Background," in Maximum Entropy and Bayesian Methods in Applied Statistics, J. H. Justice, ed., pp. 1-25, Cambridge University Press, Cambridge, UK, 1986.
10. V. N. Vapnik, Statistical Learning Theory, John Wiley, New York, NY, USA, 1998.
11. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, USA, 1995.
12. C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," AT&T Bell Laboratories, November 1998.

EFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION
Ram Sundaram (BBN Technologies, Cambridge, Massachusetts) and Joseph Picone (Mississippi State University, Mississippi State, Mississippi)

Abstract and Motivation

ABSTRACT: Hidden Markov model-based speech recognition systems use supervised learning to train acoustic models. On difficult tasks such as conversational speech, there has been concern over the impact erroneous transcriptions have on the parameter estimation process. This work analyzes the effects of mislabeled data on recognition accuracy. Training is performed using manually corrupted transcriptions, and results are presented on three tasks: TIDigits, Alphadigits and Switchboard. For Alphadigits, with 16% of the training data mislabeled, the performance of the system degrades by 12% relative to the baseline. On Switchboard, at 16% mislabeled training data, the performance of the system degrades by 8.5% relative to the baseline. An analysis of these results revealed that the Gaussian mixture model contributes significantly to the robustness of the supervised learning training process.

MOTIVATION: Recover an investment of three and a half long years spent retranscribing and resegmenting Switchboard.

Robustness to Transcription Errors: TIDigits
- Introduced random transcription word errors in a controlled fashion on TIDigits (a sketch of this kind of corruption follows below).
- Observed no significant degradation in performance until the transcription error rate (TER) was artificially high (16%).
- What makes an HMM-based speech recognition system so robust to such errors?
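
A minimal, hypothetical Python sketch of this kind of controlled corruption (the function and procedure are illustrative; the slide does not specify the exact mechanism used in the study): randomly substitute a fraction of the word labels, leaving the audio untouched.

```python
# Hypothetical sketch of controlled transcription corruption: replace a
# target fraction of word labels with a different word drawn from the
# task vocabulary. The study's exact corruption procedure may differ.
import random

def corrupt_transcripts(transcripts, vocab, target_ter, seed=0):
    """Return a copy of transcripts with ~target_ter of words mislabeled."""
    rng = random.Random(seed)
    corrupted = []
    for words in transcripts:
        new_words = []
        for w in words:
            if rng.random() < target_ter:
                # Substitute a different word so the label is truly wrong.
                new_words.append(rng.choice([v for v in vocab if v != w]))
            else:
                new_words.append(w)
        corrupted.append(new_words)
    return corrupted

# Example: TIDigits-style vocabulary at a 16% target transcription error rate.
vocab = ["zero", "oh", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
clean = [["one", "five", "nine"], ["oh", "two", "two", "seven"]]
print(corrupt_transcripts(clean, vocab, target_ter=0.16))
```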

Robustness to Transcription Errors: Comparison
- No significant degradation with transcription errors (including Switchboard!).
- Context-dependent phone models are more robust than word models.

Analyze State Occupancies Through Training
- Study the maximum likelihood estimates of the mean and variance for a Gaussian estimator (see the update equations below).
- Analyze how much an incorrect model learns from the erroneous data by examining state occupancies.
- Analyze how much the correct model is influenced by the erroneous transcriptions.
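
For reference, these are the standard occupancy-weighted maximum likelihood updates from Baum-Welch training, where $\gamma_t(j)$ is the occupancy of state $j$ at time $t$ and $\mathbf{o}_t$ is the observation; mislabeled frames perturb the estimates only in proportion to the occupancy mass they receive:

$$\hat{\mu}_j = \frac{\sum_t \gamma_t(j)\,\mathbf{o}_t}{\sum_t \gamma_t(j)}, \qquad \hat{\sigma}_j^{2} = \frac{\sum_t \gamma_t(j)\,\big(\mathbf{o}_t - \hat{\mu}_j\big)^{2}}{\sum_t \gamma_t(j)}$$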

Summary
- Transcription errors do not corrupt the acoustic models significantly: on Alphadigits, at 16% TER, WER degrades by only 12% (relative); on SWB, at 16% TER, WER degrades by only 8.5% (relative).
- Robustness to erroneous data is mainly due to the Gaussian distribution.
- State-tying helps decrease the TER during the context-dependent modeling stage.
- Mixture training adds more robustness by modeling other variations in the correct portion of the data.

Brief Bibliography
1. R. Sundaram and J. Picone, "Effects of Transcription Errors on Supervised Learning in Speech Recognition," submitted to the International Conference on Acoustics, Speech, and Signal Processing, Montreal, Quebec, Canada, May 2004.
2. R. Sundaram, Effects of Transcription Errors on Supervised Learning in Speech Recognition, M.S. Thesis, Department of Electrical and Computer Engineering, Mississippi State University, August 2003.
3. R. Sundaram and J. Picone, "The Effects of Transcription Errors," Proceedings of the Speech Transcription Workshop, Linthicum Heights, Maryland, USA, May 2000.
4. L. Lamel, J. L. Gauvain and G. Adda, "Lightly Supervised Acoustic Model Training," Proceedings of the ISCA ITRW ASR2000, Paris, France, September 2000.
5. G. Zavaliagkos and T. Colthurst, "Utilizing Untranscribed Training Data to Improve Performance," Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, Virginia, USA, February 1998.
6. P. Placeway and J. Lafferty, "Cheating with Imperfect Transcripts," Proceedings of the International Conference on Spoken Language Processing, Philadelphia, Pennsylvania, USA, September 1996.
7. T. Kemp and A. Waibel, "Unsupervised Training of a Speech Recognizer: Recent Experiments," Proceedings of ESCA Eurospeech'99, Budapest, Hungary, September 1999.