Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein

Acoustic Modeling

Motivation
- Standard acoustic models impose many structural constraints
- We propose an automatic approach
- Dataset: TIMIT
- Features: MFCCs
- Emissions: full-covariance Gaussians (Young and Woodland, 1994)
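The acoustic model above scores each frame's MFCC vector under a full-covariance (not diagonal) Gaussian. A minimal sketch of fitting and scoring such a Gaussian, using synthetic frames in place of real TIMIT MFCCs (all data here is hypothetical):

```python
import numpy as np

# Toy "MFCC" frames: 500 samples of a 3-dimensional feature with correlated
# dimensions, standing in for real MFCC vectors from TIMIT.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 3)) @ np.array([[1.0, 0.2, 0.0],
                                               [0.0, 1.0, 0.3],
                                               [0.0, 0.0, 1.0]])

def fit_full_cov_gaussian(x):
    """ML estimate of a full-covariance Gaussian from frames (n x d)."""
    mu = x.mean(axis=0)
    sigma = np.cov(x, rowvar=False, bias=True)  # full, not diagonal
    return mu, sigma

def log_density(x, mu, sigma):
    """log N(x; mu, sigma) for a full-covariance Gaussian."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

mu, sigma = fit_full_cov_gaussian(frames)
```

The full covariance captures correlations between MFCC dimensions that a diagonal model would ignore.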

Phone Classification
[Figure: a speech segment with unknown phone label; the answer is /æ/]

HMMs for Phone Classification

Temporal Structure

Standard subphone/mixture HMM: temporal structure plus Gaussian mixture emissions.

Model          Error rate
HMM Baseline   25.1%

Our Model vs. Standard Model: single Gaussians instead of mixtures, fully connected substate structure.

Hierarchical Baum-Welch Training
Error rate over successive split rounds: 32.1%, 28.7%, 25.6%, 23.9%, 21.4%

HMM Baseline     25.1%
5 split rounds   21.4%
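Each split round doubles the number of substates before retraining with EM (Baum-Welch). A toy sketch of one split step for 1-D Gaussian substates; the perturbation scheme and the `eps` value are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def split_states(means, variances, eps=0.1):
    """One hierarchical split round: each substate becomes two children
    whose means are nudged in opposite directions so a subsequent
    Baum-Welch pass can break the symmetry between them."""
    new_means, new_vars = [], []
    for mu, var in zip(means, variances):
        delta = eps * np.sqrt(var)      # small, scale-aware perturbation
        new_means += [mu - delta, mu + delta]
        new_vars += [var, var]          # children inherit the parent variance
    return np.array(new_means), np.array(new_vars)

means, variances = np.array([0.0, 3.0]), np.array([1.0, 4.0])
split_means, split_vars = split_states(means, variances)
```

Repeating split-then-retrain yields the hierarchy of substates whose error rates are tracked above.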

Phone Classification Results

Method                                     Error Rate
GMM Baseline (Sha and Saul, 2006)          26.0%
HMM Baseline (Gunawardana et al., 2005)    25.1%
SVM (Clarkson and Moreno, 1999)            22.4%
Hidden CRF (Gunawardana et al., 2005)      21.7%
Our Work                                   21.4%
Large Margin GMM (Sha and Saul, 2006)      21.1%

Phone Recognition
[Figure: a speech signal whose phone sequence is unknown]

Standard State-Tied Acoustic Models

No more State-Tying

No more Gaussian Mixtures

Fully connected internal structure

Fully connected external structure

Refinement of the /ih/-phone

Refinement of the /l/-phone

Hierarchical Refinement Results

HMM Baseline     41.7%
5 Split Rounds   28.4%

Merging
- Not all phones are equally complex
- Compute the log-likelihood loss from merging
[Figure: split model vs. model merged at one node, over frames t-1, t, t+1]
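The log-likelihood loss from merging can be sketched for 1-D Gaussian substates from their sufficient statistics: pool the two substates' counts, mean, and variance, and measure the drop in maximized log-likelihood. This is a hypothetical simplification of the paper's criterion:

```python
import math

def gauss_ll(n, mean, var):
    """Maximized log-likelihood of n points whose ML Gaussian
    has the given mean and variance."""
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def merge_loss(n1, m1, v1, n2, m2, v2):
    """Log-likelihood loss from merging two 1-D Gaussian substates,
    computed from soft counts (n), means (m), and variances (v)."""
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n                       # pooled mean
    v = (n1 * (v1 + (m1 - m) ** 2)
         + n2 * (v2 + (m2 - m) ** 2)) / n             # pooled variance
    return (gauss_ll(n1, m1, v1) + gauss_ll(n2, m2, v2)) - gauss_ll(n, m, v)
```

Identical substates merge at zero loss, so they are cheap to collapse; well-separated substates incur a large loss and stay split, which is how merging spends substates only on complex phones.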

Merging Criterion
[Figure: split vs. merged substate trellis over frames t-1, t, t+1]

Split and Merge Results

Split Only      28.4%
Split & Merge   27.3%

HMM states per phone

Alignment Results

Hand Aligned   27.3%
Auto Aligned   26.3%

Alignment State Distribution

Inference
- State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
- Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
- Transcription: d-ae-d
Viterbi? Variational?
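The mapping from a decoded substate sequence to a transcription can be sketched as two steps: strip the substate index, then collapse consecutive repeats. The string-suffix encoding of substates here is an assumption for illustration:

```python
def states_to_transcription(states):
    """Substate sequence -> phone transcription: drop substate indices,
    then collapse runs of the same phone."""
    phones = [s.rstrip("0123456789") for s in states]
    out = []
    for p in phones:
        if not out or out[-1] != p:   # keep only phone-change boundaries
            out.append(p)
    return out

# The slide's example sequence:
decoded = "d1 d6 d6 d4 ae5 ae2 ae3 ae0 d2 d2 d3 d7 d5".split()
```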

Variational Inference
Variational approximation built from posterior edge marginals.

Viterbi       26.3%
Variational   25.1%
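The posterior edge marginals that the variational decoder uses are exactly what the forward-backward algorithm computes. A minimal sketch on a toy 2-state HMM; the transition and emission numbers are made up for illustration:

```python
import numpy as np

pi = np.array([0.6, 0.4])                          # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])             # transition probabilities
B = np.array([[0.9, 0.2], [0.1, 0.8], [0.9, 0.2]]) # B[t, s]: emission likelihood

def edge_marginals(pi, A, B):
    """Posterior edge marginals p(z_t = i, z_{t+1} = j | x)
    via the forward-backward algorithm."""
    T, S = B.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = pi * B[0]
    for t in range(1, T):                   # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):          # backward pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    Z = alpha[-1].sum()                     # data likelihood
    xi = np.array([np.outer(alpha[t], B[t + 1] * beta[t + 1]) * A / Z
                   for t in range(T - 1)])
    return xi

xi = edge_marginals(pi, A, B)
```

Each xi[t] is a distribution over state pairs at adjacent frames; decoding from these marginals rather than the single best state path is what separates the variational result from Viterbi above.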

Phone Recognition Results

Method                                                      Error Rate
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)    27.7%
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)     27.1%
Our Work                                                    26.1%
Bayesian Triphone HMM (Ming and Smith, 1998)                25.6%
Heterogeneous Classifiers (Halberstadt and Glass, 1998)     24.4%

Conclusions
- Minimalist, automatic approach: unconstrained and accurate
- Phone classification: competitive with state-of-the-art discriminative methods despite being generative
- Phone recognition: better than standard state-tied triphone models

Thank you!