
Variational Inference Algorithms for Acoustic Modeling in Speech Recognition

John Steinberg and Dr. Joseph Picone
Department of Electrical and Computer Engineering, College of Engineering, Temple University
www.isip.piconepress.com

Abstract

The focus of this work is to assess the performance of three new variational inference algorithms for the acoustic modeling task in speech recognition:
- Accelerated variational Dirichlet process mixtures (AVDPM)
- Collapsed variational stick breaking (CVSB)
- Collapsed Dirichlet priors (CDP)

Speech recognition (SR) performance is highly dependent on the data the system was trained on; our goal is to reduce the complexity and sensitivity of training. Dirichlet process mixtures (DPMs) can learn underlying structure from data and can potentially improve a system's ability to generalize to unseen data. Inference algorithms are needed to make the calculations required by DPMs tractable.

Speech Recognition Systems

Applications include translators, prostheses, and language education.

English vs. Mandarin: A Phonetic Comparison

What is a phoneme? As an example, the word "about" consists of two syllables (a - bout) and four phonemes (ax - b - aw - t).

English: ~10,000 syllables; ~42 phonemes; a non-tonal language.
Mandarin: ~1,300 syllables; ~92 phonemes; a tonal language (4 tones plus 1 neutral tone; e.g., 7 distinct instances of "ma").

Probabilistic Modeling: DPMs and Variational Inference

An example: suppose the training features are the number of study hours and age, and the training labels are previous grades. How many classes are there? 1? 2? 3? [1] QUESTION: Given a new set of features, what is the predicted grade?
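The "how many classes?" question is exactly what a Dirichlet process mixture answers automatically. A minimal sketch of the stick-breaking construction that underlies such models (illustrative only, not the authors' implementation; the truncation level and concentration values below are assumptions of this sketch):

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    Break points v_k ~ Beta(1, alpha); the k-th weight is v_k times
    the length of the stick that remains after the first k-1 breaks.
    A small alpha tends to concentrate mass on a few components
    (few "classes"); a large alpha spreads it across many.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

def effective_components(weights, threshold=0.01):
    """Count components that receive non-negligible mass."""
    return sum(1 for w in weights if w > threshold)

rng = random.Random(0)
small_alpha = stick_breaking_weights(0.5, 50, rng)
large_alpha = stick_breaking_weights(10.0, 50, rng)
```

With a likelihood attached to each component, inference over these weights lets the data determine how many components are actually used, rather than fixing the count in advance.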
(Further application areas shown in the Speech Recognition Systems figure: media search, national intelligence, automotive/GPS, and mobile technology.)

Dirichlet Processes

DPMs model distributions of distributions and can find the best number of classes automatically.

Variational Inference

DPMs require an infinite number of parameters, so variational inference is used to estimate DPM models tractably.

Experimental Setup

Corpora: CALLHOME English (CH-E) and CALLHOME Mandarin (CH-M); conversational telephone speech with ~300,000 (CH-E) and ~250,000 (CH-M) training samples, respectively.

Why English and Mandarin? The two languages are phonetically very different, which can help identify language-specific artifacts that affect performance.

Paradigm:
- Compare DPMs to a baseline Gaussian mixture model (GMM).
- Optimize system parameters such as the number of mixtures and word error rate.
- Compare model complexity.

Results

Baseline GMM error rates as a function of the number of mixture components k:

CH-E (CALLHOME English)
k     Error % (Val / Evl)
4     63.23 / 63.28
8     61.00 / 60.62
16    64.19 / 63.55
32    62.00 / 61.74
64    59.41 / 59.69
128   58.36 / 58.41
192   58.72 / 58.37

CH-M (CALLHOME Mandarin)
k     Error % (Val / Evl)
4     66.83 / 68.63
8     64.97 / 66.32
16    67.74 / 68.27
32    63.64 / 65.30
64    60.71 / 62.65
128   61.95 / 63.53
192   62.13 / 63.57

Algorithm comparison, CH-E:
Algorithm   Best Error Rate   Avg. k per Phoneme
GMM         58.41%            128
AVDPM       56.65%            3.45
CVSB        56.54%            11.60
CDP         57.14%            27.93*

*This experiment has not yet been fully completed; this number is expected to decrease dramatically.

Algorithm comparison, CH-M:
Algorithm   Best Error Rate   Avg. k per Phoneme
GMM         62.65%            64
AVDPM       62.59%            2.15
CVSB        63.08%            3.86
CDP         62.89%            9.45

Conclusions

- DPMs can optimize the number of mixtures for GMMs.
- AVDPM, CVSB, and CDP yield slightly improved error rates over GMMs.
- AVDPM, CVSB, and CDP found far fewer mixture components than GMMs.
- The CH-E vs. CH-M performance gap is due to the number of class labels.

Future Work

- Assess the computational complexity (CPU time) of AVDPM, CVSB, and CDP.
- Evaluate the tradeoff between error rate and complexity.
- Compare error rates on CH-E and CH-M to results from TIMIT.
- Evaluate the effect of collapsing the Mandarin label set to further reduce error rates.
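The baseline GMMs above fix the number of components k in advance and sweep it by hand. As a hedged illustration of what fitting such a mixture involves, here is a toy one-dimensional EM fit (a sketch on synthetic data, not the actual acoustic-model training pipeline used in this work):

```python
import math
import random

def em_gmm_1d(data, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Means are initialized at evenly spaced quantiles of the data, so
    the fit is deterministic for a given data set.
    """
    data_sorted = sorted(data)
    n = len(data)
    means = [data_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            probs = [
                w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,
            )
    return weights, means, variances

# Synthetic data: two well-separated clusters
gen = random.Random(1)
data = [gen.gauss(-3.0, 0.5) for _ in range(200)] + \
       [gen.gauss(3.0, 0.5) for _ in range(200)]
w, m, v = em_gmm_1d(data, k=2)
```

The DPM-based algorithms evaluated here remove the need to choose k by sweeping: as the result tables show, they settle on far fewer components per phoneme than the best hand-tuned GMM.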