English vs. Mandarin: A Phonetic Comparison (Experimental Setup)

Presentation transcript:

Variational Inference Algorithms for Acoustic Modeling in Speech Recognition
John Steinberg and Dr. Joseph Picone
Department of Electrical and Computer Engineering, College of Engineering, Temple University

Abstract
The focus of this work is to assess the performance of three new variational inference algorithms for the acoustic modeling task in speech recognition:
• Accelerated variational Dirichlet process mixtures (AVDPM)
• Collapsed variational stick breaking (CVSB)
• Collapsed Dirichlet priors (CDP)
Speech recognition (SR) performance is highly dependent on the data the system was trained on. Our goal is to reduce the complexity and sensitivity of training. Dirichlet process mixtures (DPMs) can learn underlying structure from data and can potentially improve a system's ability to generalize to unseen data. Inference algorithms are needed to make the calculations for DPMs tractable.

Applications
• Mobile technology, auto/GPS, national intelligence, media search
• Other applications: translators, prostheses, language education

Speech Recognition Systems / Gaussian Mixture Models (poster figures)

Probabilistic Modeling: DPMs and Variational Inference

An Example
• Training features: number of study hours, age
• Training labels: previous grades
• QUESTION: Given a new set of features, what is the predicted grade? How many classes are there? 1? 2? 3?

Dirichlet Processes
• DPMs model distributions of distributions.
• They can find the best number of classes automatically [1]; a toy sketch follows.
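The idea that a DPM can discover how many classes the data supports can be illustrated with a small, self-contained sketch. The example below uses scikit-learn's BayesianGaussianMixture, a truncated variational Dirichlet process mixture, purely as a stand-in for the AVDPM, CVSB, and CDP implementations evaluated in this work; the toy 2-D data, the truncation level of 20, and the 0.01 weight threshold are illustrative assumptions, not settings from the original experiments.

```python
# Minimal sketch: a variational DP mixture "finds" the number of classes because
# components the data does not need collapse to near-zero posterior weight.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D features drawn from three underlying classes (unknown to the model).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(300, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(300, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(300, 2)),
])

# Truncated variational DP mixture: n_components is only an upper bound on the
# number of mixtures actually used.
dpm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

# Components that retain non-negligible posterior weight are the "discovered" classes.
active = int(np.sum(dpm.weights_ > 0.01))
print(f"effective number of mixture components: {active}")
```

On toy data like this, the effective component count typically settles near the true number of clusters, which is the behavior the poster relies on when DPMs choose the number of mixtures per phoneme.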
Variational Inference
• DPMs require an infinite number of parameters.
• Variational inference is used to estimate DPM models.

What is a phoneme?
• Word: about
• Syllable: a – bout
• Phonemes: ax – b – aw – t

English
• ~10,000 syllables
• ~42 phonemes
• Non-tonal language

Mandarin
• ~1,300 syllables
• ~92 phonemes
• Tonal language: 4 tones plus 1 neutral tone (e.g., 7 instances of "ma")

Why English and Mandarin?
• Phonetically very different
• Can help identify language-specific artifacts that affect performance

Corpora:
• CALLHOME English (CH-E) and CALLHOME Mandarin (CH-M)
• Conversational telephone speech
• ~300,000 (CH-E) and ~250,000 (CH-M) training samples, respectively

Paradigm:
• Compare DPMs to a baseline Gaussian mixture model (GMM); a sketch of this comparison follows the results below
• Optimize system parameters such as the number of mixtures and word error rate
• Compare model complexity

Results

CALLHOME Mandarin (CH-M), GMM baseline
  k    Error (%) (Val / Evl)
  –    – / 68.63
  –    – / 66.32
  –    – / 68.27
  –    – / 65.30
  –    – / 62.65
  –    – / 63.53
  –    – / 63.57

CALLHOME English (CH-E), GMM baseline
  k    Error (%) (Val / Evl)
  –    – / 63.28
  –    – / 60.62
  –    – / 63.55
  –    – / 61.74
  –    – / 59.69
  –    – / 58.41
  –    – / 58.37

CALLHOME English (CH-E)
  Algorithm   Best Error Rate   Avg. k per Phoneme
  GMM         58.41%            128
  AVDPM       56.65%            3.45
  CVSB        56.54%            11.60
  CDP         57.14%            27.93*
*This experiment has not been fully completed; this number is expected to decrease dramatically.

CALLHOME Mandarin (CH-M)
  Algorithm   Best Error Rate   Avg. k per Phoneme
  GMM         62.65%            64
  AVDPM       62.59%            2.15
  CVSB        63.08%            3.86
  CDP         62.89%            –

Conclusions
• DPMs can optimize the number of mixtures for GMMs.
• AVDPM, CVSB, and CDP yield slightly lower error rates than GMMs.
• AVDPM, CVSB, and CDP found far fewer mixtures than GMMs.
• The CH-E vs. CH-M performance gap is due to the number of class labels.

Future Work
• Assess the computational complexity (CPU time) of AVDPM, CVSB, and CDP.
• Compare error rates on CH-E and CH-M to results from TIMIT.
• Evaluate the effect of collapsing the Mandarin label set to further reduce error rates.
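To make the comparison paradigm above concrete, here is a minimal sketch of how a per-phoneme classification error rate can be computed: fit one mixture model per phoneme class, then label held-out frames by maximum log-likelihood. It assumes pre-extracted feature and label arrays (train_feats, train_phonemes, eval_feats, and eval_phonemes are hypothetical names), uses scikit-learn's GaussianMixture as the baseline GMM, and is not the evaluation code behind the CALLHOME results above.

```python
# Sketch of the GMM-vs-DPM comparison paradigm: one mixture model per phoneme,
# maximum-log-likelihood classification, and an error rate on held-out data.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_per_phoneme_gmms(features, labels, k=8, seed=0):
    """Fit one k-component GMM per phoneme label; returns {label: fitted model}."""
    features, labels = np.asarray(features), np.asarray(labels)
    models = {}
    for phoneme in np.unique(labels):
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              max_iter=200, random_state=seed)
        models[phoneme] = gmm.fit(features[labels == phoneme])
    return models

def error_rate(models, features, labels):
    """Label each frame with the phoneme whose model gives the highest log-likelihood."""
    phonemes = sorted(models)
    scores = np.column_stack([models[p].score_samples(features) for p in phonemes])
    predicted = np.asarray(phonemes)[np.argmax(scores, axis=1)]
    return float(np.mean(predicted != np.asarray(labels)))

# Hypothetical usage with pre-extracted arrays (names are placeholders):
# models = train_per_phoneme_gmms(train_feats, train_phonemes, k=128)
# print(f"evaluation error: {error_rate(models, eval_feats, eval_phonemes):.2%}")
```

Swapping GaussianMixture for the BayesianGaussianMixture used in the earlier sketch gives a DP-mixture variant of the same pipeline, which mirrors the GMM-versus-DPM comparison the results tables summarize.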