1 USING CLASS WEIGHTING IN INTER-CLASS MLLR
Sam-Joo Doh and Richard M. Stern
Department of Electrical and Computer Engineering
and School of Computer Science
Carnegie Mellon University
October 20, 2000

Carnegie Mellon Robust Speech Group 2 Outline
- Introduction
- Review
  - Transformation-based adaptation
  - Inter-class MLLR
- Application of weights
  - For different neighboring classes
- Summary

Carnegie Mellon Robust Speech Group 3 Introduction
- Goal: better adaptation from a small amount of adaptation data → improved recognition accuracy
- Current approach: reduce the number of free parameters by assuming a transformation function → transformation-based adaptation
- Example: maximum likelihood linear regression (MLLR)

Carnegie Mellon Robust Speech Group 4 Introduction (cont'd)
- Transformation-based adaptation
  - Transformation classes are assumed to be independent
  - With a small amount of adaptation data, it does not yield reliable estimates for multiple classes
- A better idea?
  - Exploit inter-class relationships to obtain more reliable estimates for multiple classes

Carnegie Mellon Robust Speech Group 5 Transformation-Based Adaptation
- Estimate each target parameter (mean vector)
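For reference, a standard statement of this step (the slide's equation was an image in the original deck): in MLLR, every Gaussian mean in a transformation class is re-estimated through a shared affine transform,

```latex
\hat{\mu}_k = A\,\mu_k + b = W \xi_k,
\qquad \xi_k = \begin{bmatrix} 1 \\ \mu_k \end{bmatrix},
\quad W = [\, b \;\; A \,],
```

where A is a d-by-d matrix, b a d-dimensional bias ("shift"), and W the combined transform estimated from the adaptation data.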

Carnegie Mellon Robust Speech Group 6 Transformation-Based Adaptation (cont'd)
- Estimate each transformation function
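The estimation equation on this slide was also an image. As a rough illustrative sketch, the code below fits (A, b) by occupancy-weighted least squares between the original means and the per-Gaussian means of the adaptation data; full MLLR instead maximizes likelihood from EM occupation statistics, which reduces to a similar weighted regression when covariances are ignored. All function and variable names here are hypothetical, not from the talk.

```python
import numpy as np

def estimate_transform(mu, obs_mean, gamma):
    """Fit mu_hat ~= A @ mu + b by occupancy-weighted least squares.

    mu       : (K, d) original Gaussian mean vectors in the class
    obs_mean : (K, d) per-Gaussian sample means of the adaptation data
    gamma    : (K,)   occupation counts (how much data each Gaussian saw)
    """
    K, _ = mu.shape
    xi = np.hstack([np.ones((K, 1)), mu])      # extended means [1, mu]
    w = np.sqrt(gamma)[:, None]                # sqrt weights for least squares
    W, *_ = np.linalg.lstsq(w * xi, w * obs_mean, rcond=None)
    W = W.T                                    # (d, d+1) = [b, A]
    return W[:, 1:], W[:, 0]                   # A (d, d), b (d,)
```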

Carnegie Mellon Robust Speech Group 7 Transformation-Based Adaptation (cont'd)
- Trade-off in choosing the number of transformation classes
  - Fewer classes: better estimation of each transformation function
  - More classes: more detail in the target parameters
[Figure: estimation quality vs. number of transformation classes]

Carnegie Mellon Robust Speech Group 8 Previous Work
- Consider correlations among model parameters, mostly in a Bayesian framework
  - Considering only a few neighboring models: not effective
  - Considering all neighboring models: too much computation
- It is difficult to apply correlations to multi-Gaussian mixtures: no explicit correspondence between Gaussians

Carnegie Mellon Robust Speech Group 9 Previous Work (cont'd)
- Using correlations among model parameters

Carnegie Mellon Robust Speech Group 10 Inter-Class Relationship
- Can we exploit an inter-class relationship among the transformation functions?

Carnegie Mellon Robust Speech Group 11 Inter-Class Relationship (cont'd)
- Conventional assumption: the two classes are independent
[Figure: Class 1 and Class 2 transformed independently]

Carnegie Mellon Robust Speech Group 12 Inter-Class Relationship (cont'd)
- If we know an inter-class transformation g12(.):
  - Transform the class-2 parameters into the class-1 regression
  - Class-2 data now contribute to the estimation of f1(.)
  - More reliable estimation of f1(.) while keeping the characteristics of class 1
  - Similarly, f2(.) can be estimated by transforming the class-1 parameters
[Figure: classes 1 and 2 linked by g12(.), with transformed means μ2k(12) feeding the estimation of f1(.)]
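Written out in the slide's notation, and with g12 taken to be affine as introduced on the next slide, the relationship is:

```latex
\mu_{2k}^{(12)} = g_{12}(\mu_{2k}) = A^{(12)} \mu_{2k} + b^{(12)},
```

so that the class-1 transform f1 is estimated from the class-1 means together with the mapped class-2 means.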

Carnegie Mellon Robust Speech Group 13 Inter-Class MLLR
- Use linear regression for the inter-class transformation
- Estimate (A1, b1) to minimize Q
[Equation: Q and its definitions were shown as a figure in the original slide]
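Since the objective Q is not recoverable from the transcript, the following is only a plausible reconstruction from the surrounding definitions (to be checked against the Doh and Stern paper): a least-squares criterion over the target class and its mapped neighboring classes,

```latex
Q = \sum_{n} \sum_{k \in \text{class } n}
\left\| \hat{\mu}_{nk} - \left( A_1 \mu_{nk}^{(1n)} + b_1 \right) \right\|^2,
\qquad
\mu_{nk}^{(1n)} = A^{(1n)} \mu_{nk} + b^{(1n)},
```

with the convention that the target class enters unmapped. In the maximum-likelihood formulation, each term would additionally be normalized by the corresponding Gaussian covariance and occupation count.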

Carnegie Mellon Robust Speech Group 14 Application of Weights
- Neighboring classes have different contributions to the target class

Carnegie Mellon Robust Speech Group 15 Application of Weights (cont'd)
- Apply weights to the neighboring classes (equations were shown as figures in the original slide)
  - Assume an error model for each neighboring class n
  - Measure the error of applying (A1, b1) in neighboring class n
  - Weighted least-squares estimation: use the variance of the error as the weight
    - Large error → small weight
    - Small error → large weight
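A minimal sketch of this weighting scheme, assuming the per-class error variances are estimated from the residuals of a first-pass unweighted fit and then reused as inverse-variance weights. The names are hypothetical, and a real implementation would work on EM occupation statistics rather than raw mean pairs:

```python
import numpy as np

def interclass_mllr_weighted(classes, n_iter=2):
    """Estimate (A1, b1) from a target class plus its neighboring classes.

    classes : list of (mu, obs_mean) arrays, each (K_n, d); classes[0] is
              the target class, the rest are neighbors whose means have
              already been mapped through their inter-class transforms g_1n.
    """
    weights = np.ones(len(classes))                 # first pass: unweighted
    for _ in range(n_iter):
        rows, targets = [], []
        for w_n, (mu, obs) in zip(weights, classes):
            xi = np.hstack([np.ones((len(mu), 1)), mu])
            rows.append(np.sqrt(w_n) * xi)          # scale class by its weight
            targets.append(np.sqrt(w_n) * obs)
        W, *_ = np.linalg.lstsq(np.vstack(rows), np.vstack(targets), rcond=None)
        A1, b1 = W.T[:, 1:], W.T[:, 0]
        # Re-weight only the neighbors: large residual variance -> small weight
        for n, (mu, obs) in enumerate(classes[1:], start=1):
            err = obs - (mu @ A1.T + b1)
            weights[n] = 1.0 / max(err.var(), 1e-8)
    return A1, b1
```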

Carnegie Mellon Robust Speech Group 16 Number of Neighboring Classes
- Limit the number of neighboring classes
  - Sort the neighboring classes by closeness
  - Set a threshold on the number of samples
  - Use the "closer" neighboring classes first
  - Count the number of samples used
  - Add the next neighboring class until the number of samples exceeds the threshold
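The selection rule on this slide, as a short sketch with hypothetical names, assuming the neighboring classes are already sorted from closest to farthest:

```python
def select_neighbors(sorted_neighbors, sample_threshold):
    """Use the closest neighboring classes first, stopping once the
    accumulated number of adaptation samples exceeds the threshold.

    sorted_neighbors : list of (class_id, n_samples), closest first
    """
    chosen, total = [], 0
    for class_id, n_samples in sorted_neighbors:
        if total >= sample_threshold:
            break                      # enough data: stop adding neighbors
        chosen.append(class_id)
        total += n_samples
    return chosen
```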

Carnegie Mellon Robust Speech Group 17 Experiments
- Test data
  - 1994 DARPA Wall Street Journal (WSJ) task
  - 10 non-native speakers × 20 test sentences (Spoke 3: s3-94)
- Baseline system: CMU SPHINX-3
  - Continuous HMM, 6000 senones
  - 39-dimensional features: MFCC cepstra + delta + delta-delta + power
- Supervised/unsupervised adaptation
  - Focus on small amounts of adaptation data
  - 13 phonetic-based classes for inter-class MLLR

Carnegie Mellon Robust Speech Group 18 Experiments (cont'd)
Supervised adaptation, word error rates (relative improvement over conventional MLLR in parentheses):

Adaptation Method                                  1 Adapt. Sent.   3 Adapt. Sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      24.1%            23.1%
Inter-class MLLR without weights (full + shift)    20.4% (15.4%)    19.6% (15.2%)
Inter-class MLLR with weights (full + shift)       20.2% (16.2%)    19.3% (16.5%)

Carnegie Mellon Robust Speech Group 19 Experiments (cont'd)
Unsupervised adaptation, word error rates (relative improvement over conventional MLLR in parentheses):

Adaptation Method                                  1 Test Sent.     10 Test Sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      26.7%            23.9%
Inter-class MLLR without weights (full + shift)    24.0% (10.1%)    20.1% (15.9%)
Inter-class MLLR with weights (full + shift)       24.3% (9.0%)     19.9% (16.7%)

Carnegie Mellon Robust Speech Group 20 Experiments (cont'd)
- Limiting the number of neighboring classes
[Figure: results for supervised adaptation with 10 adaptation sentences]

Carnegie Mellon Robust Speech Group 21 Summary
- Application of weights
  - Used weighted least-squares estimation
  - Helpful in the supervised case
  - Not helpful in the unsupervised case (with a small amount of adaptation data)
- Number of neighboring classes
  - Use a smaller number of neighboring classes as more adaptation data become available

Carnegie Mellon Robust Speech Group 22 Summary (cont'd)
- Inter-class transformation
  - Can carry speaker-dependent information
  - We may prepare several sets of inter-class transformations and select the appropriate set for a new speaker
- Combination with Principal Component MLLR
  - Did not provide additional improvement

Carnegie Mellon Robust Speech Group 23 Thank you!