PhD Candidate: Tao Ma Advised by: Dr. Joseph Picone Institute for Signal and Information Processing (ISIP) Mississippi State University Linear Dynamic.

Linear Dynamic Model (LDM) for Automatic Speech Recognition
PhD Candidate: Tao Ma
Advised by: Dr. Joseph Picone
Institute for Signal and Information Processing (ISIP), Mississippi State University

An Example of a Kalman Filter (another name for the LDM)
[Figure: noisy observations of a position, with a Kalman filter modeling how the position evolves]
In control systems engineering, the Kalman filter is used successfully to model systems with noisy observations.
– Filtering: estimate the position at the present time (removing the effect of noise)
– Predicting: estimate the position at a future time
– Smoothing: estimate the position at a time in the past

Outline
Why Linear Dynamic Model (LDM)?
Linear Dynamic Model
Pilot experiment: LDM phone classification on Aurora 4
Hybrid HMM/LDM decoder architecture for LVCSR
Future work

HMM & Speech Recognition System
Hidden Markov Models

Is the HMM a perfect model for ASR?
Progress in improving the accuracy of HMM-based systems has slowed over the past decade.
Theoretical drawbacks of the HMM:
–False assumption that frames are independent and stationary
–Spatial correlation is ignored (diagonal covariance matrix)
–Limited discrete state space
[Figure: accuracy versus time, for clean and noisy speech]

Motivation of Linear Dynamic Model (LDM) Research
Motivation:
–A model that reflects the characteristics of speech signals should ultimately lead to a large improvement in ASR performance
–The LDM incorporates frame-correlation information from the speech signal, which has the potential to increase recognition accuracy
–The "filter" characteristic of the LDM has the potential to improve the noise robustness of speech recognition
–Fast-growing computational capacity (thanks to Intel) makes it realistic to build a two-way HMM/LDM hybrid speech engine

State Space Model
The Linear Dynamic Model (LDM) is derived from the state space model.
Symbols used in the state space model equations:
y: observation feature vector
x: corresponding internal state vector
h(): function relating y and x at the current time
f(): function relating the current state to all previous states
epsilon: noise component
eta: noise component
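The symbol definitions on this slide suggest equations of the following general form. This is a hedged sketch rather than the slide's own notation: the frame index t, the subscripts, and the placement of the noise terms inside h() and f() are assumptions.

```latex
\begin{align}
  y_t &= h(x_t, \epsilon_t) \\
  x_t &= f(x_{t-1}, x_{t-2}, \ldots, x_1, \eta_t)
\end{align}
```

Here y_t is the observation at frame t, x_t is the hidden state at frame t, and the state equation depends on the full state history, as the definition of f() states.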

Linear Dynamic Model
Equations of the Linear Dynamic Model (LDM):
–The current state is determined only by the previous state
–H and F are linear transform matrices
–Epsilon and eta are driving components
y: observation feature vector
x: corresponding internal state vector
H: linear transform matrix between y and x
F: linear transform matrix between the current state and the previous state
epsilon: driving component
eta: driving component
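Taken together, the three properties and the symbol list on this slide correspond to the standard linear form below. The frame index t is an assumption, and so is the additive placement of the driving components. A common further assumption, not stated on the slide, is that epsilon and eta are Gaussian with covariances R and Q respectively.

```latex
\begin{align}
  y_t &= H x_t + \epsilon_t \\
  x_t &= F x_{t-1} + \eta_t
\end{align}
```

Compared with the general state space model, h() and f() are replaced by the matrices H and F, and the state history collapses to the single previous state x_{t-1}.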

Kalman filtering for state inference (E-step of EM training)
[Figure: block diagram with the labels "Human Being", "Sound System", "Kalman Filtering", "Estimation", and "e"]
For a speech sound,
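A minimal sketch of the Kalman filter forward pass for an LDM, assuming zero-mean Gaussian driving components with covariances Q (for eta) and R (for epsilon). The function name `kalman_filter`, its argument names, and the returned log-likelihood are illustrative choices, not taken from the slides or from the ISIP software.

```python
import numpy as np

def kalman_filter(Y, F, H, Q, R, x0, P0):
    """Forward pass for x_t = F x_{t-1} + eta_t, y_t = H x_t + eps_t.

    Assumed (not from the slides): eta ~ N(0, Q), eps ~ N(0, R),
    and a Gaussian prior N(x0, P0) on the state at the first frame.
    Returns filtered state means, covariances, and the log-likelihood.
    """
    T, n = Y.shape[0], x0.shape[0]
    xs = np.zeros((T, n))
    Ps = np.zeros((T, n, n))
    loglik = 0.0
    x_pred, P_pred = x0, P0              # prior serves as the first prediction
    for t in range(T):
        if t > 0:                        # predict from the previous frame
            x_pred = F @ xs[t - 1]
            P_pred = F @ Ps[t - 1] @ F.T + Q
        e = Y[t] - H @ x_pred            # innovation (prediction error)
        S = H @ P_pred @ H.T + R         # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        xs[t] = x_pred + K @ e           # corrected (filtered) state
        Ps[t] = (np.eye(n) - K @ H) @ P_pred
        loglik += -0.5 * (e @ np.linalg.solve(S, e)
                          + np.linalg.slogdet(S)[1]
                          + len(e) * np.log(2.0 * np.pi))
    return xs, Ps, loglik
```

The accumulated log-likelihood comes from the innovations, so the same pass that infers the states can also score how well a given LDM explains a segment.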

RTS smoother for better inference
[Figure: standard Kalman filter compared with Kalman filter with RTS smoother]
Rauch-Tung-Striebel (RTS) smoother
–Additional backward pass to minimize inference error
–During EM training, computes the expectations of state statistics
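The backward pass can be sketched as follows, assuming the filtered means and covariances from a forward Kalman pass are already available. The function name `rts_smoother` and the array layout (one row per frame) are hypothetical, and Q is again the assumed covariance of eta.

```python
import numpy as np

def rts_smoother(xs, Ps, F, Q):
    """Rauch-Tung-Striebel backward pass.

    xs: (T, n) filtered state means, Ps: (T, n, n) filtered covariances.
    Returns smoothed means and covariances of the same shapes.
    """
    T = xs.shape[0]
    xs_s = xs.copy()                     # last frame: smoothed == filtered
    Ps_s = Ps.copy()
    for t in range(T - 2, -1, -1):       # walk backward in time
        x_pred = F @ xs[t]               # one-step prediction from frame t
        P_pred = F @ Ps[t] @ F.T + Q
        J = Ps[t] @ F.T @ np.linalg.inv(P_pred)   # smoother gain
        xs_s[t] = xs[t] + J @ (xs_s[t + 1] - x_pred)
        Ps_s[t] = Ps[t] + J @ (Ps_s[t + 1] - P_pred) @ J.T
    return xs_s, Ps_s
```

Each smoothed estimate uses observations from the whole segment rather than only those up to frame t, which is why the backward pass reduces the inference error.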

Maximum Likelihood Parameter Estimation (M-step of EM training)
Nothing but matrix multiplication!
[Figure: one set of LDM parameters per phone: aa ae ah ao aw ay b ch d dh eh er ...]
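The slide does not reproduce the update equations, so the following is only a sketch of the textbook maximum-likelihood updates for F, H, Q, and R, written in terms of expected state statistics from the E-step. The function name `m_step`, the argument layout, and the zero-mean noise assumption are all illustrative; estimation of the initial state and of noise means is omitted.

```python
import numpy as np

def m_step(Y, Ex, Exx, Exx_lag):
    """Closed-form parameter updates from E-step statistics.

    Y:       (T, d) observations
    Ex:      (T, n) smoothed means E[x_t]
    Exx:     (T, n, n) second moments E[x_t x_t^T]
    Exx_lag: (T-1, n, n) cross moments E[x_{t+1} x_t^T]
    Assumes zero-mean driving components (an assumption of this sketch).
    """
    T = Y.shape[0]
    S_lag = Exx_lag.sum(axis=0)          # sum of E[x_t x_{t-1}^T]
    S_prev = Exx[:-1].sum(axis=0)        # sum of E[x_{t-1} x_{t-1}^T]
    S_curr = Exx[1:].sum(axis=0)         # sum of E[x_t x_t^T], t >= 1
    S_all = Exx.sum(axis=0)
    S_yx = Y.T @ Ex                      # sum of y_t E[x_t]^T

    F = S_lag @ np.linalg.inv(S_prev)    # state transition matrix
    Q = (S_curr - F @ S_lag.T) / (T - 1) # covariance of eta
    H = S_yx @ np.linalg.inv(S_all)      # observation matrix
    R = (Y.T @ Y - H @ S_yx.T) / T       # covariance of epsilon
    return F, H, Q, R
```

Every update is a sum of statistics followed by a matrix product and one inverse, which matches the slide's remark that the M-step is essentially matrix multiplication.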

LDM for Speech Classification
[Figure: two panels, "HMM-Based Recognition" and "LDM-Based Recognition"; labels include MFCC features, phone models (aa, ch, eh, ...), x, y, state estimates x̂, and "Hypothesis"]
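One way to read the LDM-based recognition path is as likelihood scoring: each phone has its own LDM, a segment of feature vectors is scored under every model, and the best-scoring phone becomes the hypothesis. The sketch below assumes that reading. The names `ldm_loglik` and `classify`, the dictionary layout of a model, and the argmax decision rule are hypothetical and are not taken from the slides.

```python
import numpy as np

def ldm_loglik(Y, m):
    """Log-likelihood of segment Y under one LDM, via Kalman innovations.

    m is a dict with keys F, H, Q, R, x0, P0 (assumed layout).
    """
    x, P = m["x0"], m["P0"]
    ll = 0.0
    for t, y in enumerate(Y):
        if t > 0:                               # predict the next state
            x = m["F"] @ x
            P = m["F"] @ P @ m["F"].T + m["Q"]
        e = y - m["H"] @ x                      # innovation
        S = m["H"] @ P @ m["H"].T + m["R"]      # innovation covariance
        K = P @ m["H"].T @ np.linalg.inv(S)     # Kalman gain
        ll -= 0.5 * (e @ np.linalg.solve(S, e)
                     + np.linalg.slogdet(S)[1]
                     + len(e) * np.log(2.0 * np.pi))
        x = x + K @ e                           # correct with the observation
        P = P - K @ m["H"] @ P
    return ll

def classify(Y, models):
    """Return the best-scoring phone label and all per-phone scores."""
    scores = {phone: ldm_loglik(Y, m) for phone, m in models.items()}
    return max(scores, key=scores.get), scores
```

A call such as `classify(segment, {"aa": model_aa, "eh": model_eh})` returns the winning label together with the per-phone log-likelihoods, which makes it easy to inspect how close the competing hypotheses were.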

Challenges of Applying LDM to ASR
Segment-based model
–Frame-to-phoneme alignment information is needed before classification
EM training is sensitive to state initialization
–Each phoneme is modeled by an LDM; EM training finds the set of parameters for that specific LDM
–There is no good mechanism for state initialization yet
More parameters than the HMM (2~3x)
–The current model is mono-phone; building a tri-phone model for LVCSR would need more training data

Pilot experiment: phone classification on Aurora 4
Aurora 4: Wall Street Journal + six kinds of noise
–Airport, Babble, Car, Restaurant, Street, and Train
Frame-to-phone alignment is generated by the ISIP decoder (forced-alignment mode)
–Adding a language model gives 93% accuracy on clean data
40 phones, one-vs-all classifier

Model   Clean dataset (Acc)   Noisy dataset (Acc)
HMM     46.9%                 36.8%
LDM     49.2%                 39.2%

Hybrid HMM/LDM decoder architecture for LVCSR
[Figure: decoder architecture; labels include "Confidence Measurement" and "Best Hypothesis"]

Status and future work
Development of the HMM/LDM hybrid decoder is still in progress
–The HMM/LDM hybrid decoder is expected to be done in 2009
–The ISIP HMM/SVM hybrid decoder acts as the reference for the implementation
Future work
–Research has demonstrated nonlinear effects in speech signals
–Investigate the possibility of replacing Kalman filtering with nonlinear filtering (such as the Unscented Kalman Filter or the Extended Kalman Filter)

Thank you! Questions?

References
Digalakis, V., “Segment-based Stochastic Models of Spectral Dynamics for Continuous Speech Recognition,” Ph.D. Dissertation, Boston University, Boston, Massachusetts, USA.
Digalakis, V., Rohlicek, J. and Ostendorf, M., “ML Estimation of a Stochastic Linear System with the EM Algorithm and Its Application to Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 431–442, October.
Frankel, J., “Linear Dynamic Models for Automatic Speech Recognition,” Ph.D. Dissertation, The Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK, 2003.