Dimensionality Reduction of Speech Features Using Nonlinear Principal Components Analysis
Stephen A. Zahorian, Tara Singh*, Hongbing Hu
Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA
* Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA
Speech Lab, ECE, State University of New York at Binghamton

Introduction
Difficulties in automatic speech recognition
– Large dimensionality of acoustic feature spaces
– Significant load in feature training (the "curse of dimensionality")
Linear dimensionality reduction methods
– Principal Components Analysis (PCA)
– Linear Discriminant Analysis (LDA)
Drawback of linear methods
– Linear methods can result in poor data representations: the straight-line fit obtained by linear PCA does not accurately represent data whose original distribution is curved

NLPCA Approaches
Nonlinear Principal Components Analysis (NLPCA)
– A nonlinear transformation is applied to obtain a transformed version of the data for PCA
Nonlinear transformation
– Φ(x): the transformed feature of a data point x, used for machine learning
– R^M: the M-dimensional feature space, x ∈ R^M
– Φ(·): a neural network mapping that yields data more suitable for linear transformations
Two approaches (NLPCA1 and NLPCA2) were used for training the neural network; minimal sketches of both follow below.
NLPCA1
– The neural network is trained as an identity map
– Mean square error is minimized using targets that are the same as the inputs
– Training with regularization is often needed to "guide" the network to a better minimum in error
NLPCA2
– The neural network is trained as a classifier
– The network is trained to maximize discrimination
In both cases, the input data passes through a bottleneck neural network, and the bottleneck layer outputs form the dimensionality-reduced data.
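As a concrete illustration, here is a minimal sketch of the NLPCA1 idea: a bottleneck network trained as an identity map (targets equal to inputs), with weight decay standing in for the regularization mentioned above. The layer sizes, optimizer, and training loop are illustrative assumptions, not the configuration used in this work.

```python
# Hypothetical sketch of NLPCA1: a bottleneck network trained as an identity map,
# reducing 39-D features to 10-D. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    def __init__(self, n_in=39, n_hidden=20, n_bottleneck=10):
        super().__init__()
        # Encoder: input -> bottleneck (the dimensionality-reduced features)
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_bottleneck))
        # Decoder: bottleneck -> reconstruction of the input
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_in))

    def forward(self, x):
        z = self.encoder(x)                  # reduced features Phi(x)
        return self.decoder(z), z

x = torch.randn(1000, 39)                    # stand-in for DCTC-DCS feature vectors
model = BottleneckAutoencoder()
# weight_decay plays the role of the regularization mentioned above
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
for epoch in range(100):
    opt.zero_grad()
    x_hat, _ = model(x)
    loss = nn.functional.mse_loss(x_hat, x)  # identity-map training: targets = inputs
    loss.backward()
    opt.step()
z = model.encoder(x)                         # dimensionality-reduced data after training
```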
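A corresponding sketch of NLPCA2 reuses the same bottleneck encoder but trains it as a classifier, so the bottleneck features are shaped by discrimination rather than reconstruction; the 10 classes stand in for the 10 vowel targets. Again, the architecture details and hyperparameters are assumptions.

```python
# Hypothetical sketch of NLPCA2: the same bottleneck encoder, trained as a
# classifier so the reduced features maximize discrimination.
import torch
import torch.nn as nn

class BottleneckClassifier(nn.Module):
    def __init__(self, n_in=39, n_hidden=20, n_bottleneck=10, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_bottleneck))
        # Classification head sits on top of the bottleneck
        self.head = nn.Sequential(
            nn.Tanh(), nn.Linear(n_bottleneck, n_classes))

    def forward(self, x):
        z = self.encoder(x)                  # dimensionality-reduced features
        return self.head(z), z

x = torch.randn(1000, 39)                    # stand-in feature vectors
y = torch.randint(0, 10, (1000,))            # stand-in labels for the 10 vowels
model = BottleneckClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(100):
    opt.zero_grad()
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits, y)  # discriminative training
    loss.backward()
    opt.step()
z = model.encoder(x)                         # reduced features for a back-end classifier
```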
Simulation of NLPCA1
Figure: Plot of input and output for semi-random 2-D data; the output is data reconstructed by an NLPCA1-trained neural network with 1 hidden node.
Figure: An example with 3-D data; input and output plots of 3-D Gaussian data before and after using a neural network with 2 hidden nodes.

Experimental Evaluation
Database
– NTIMIT
– Targets (vowels): /ah/, /ee/, /ue/, /ae/, /ur/, /ih/, /eh/, /aw/, /uh/, /oo/
– Training data: 31,300 tokens
– Testing data: 11,625 tokens
– Features: 39 DCTC-DCS
Transformation methods compared
– Original features, LDA, PCA, NLPCA1, and NLPCA2
Classifiers
– Neural network and MXL (a maximum likelihood classifier based on Mahalanobis distance, under a Gaussian assumption); a sketch of the MXL classifier is given at the end of this poster

Experiment 1
– The same training data were used to train the transformations and the classifiers
– The number of features varied from 1 to 39
– Variable percentages of the training data (1%, 2%, 5%, 10%, 25%, 50%, and 100%) were used

Experiment 1 Results
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various types of features, using all available training data.
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of training data, using NLPCA2 (10 features) and original features (10 and 39 features).
Figure: Classification accuracies of original features and NLPCA2-reduced features with 2% (left) and 50% (right) of the training data.
For both classifiers, the highest accuracy was obtained with NLPCA2, especially with a small number of features. NLPCA2 outperforms the 10-D original features when 10% or more of the training data is used, and performs comparably to the 39-D original features. With 50% of the training data, NLPCA2 performs substantially better than the original features, at least for 12 or fewer features.

Experiment 2
– 50% of the training data was used for training the transformations, and a variable percentage (from 1% to 100%) of the other half was used for training the classifiers

Experiment 2 Results
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers using 10% of the classifier training data.
Figure: Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of classifier training data, using 4 features.
For both the neural network and MXL classifiers, NLPCA2 clearly performs much better than the other transformations or the original features. NLPCA2 yields the best performance, with about 68% accuracy for both classifiers. Similar trends were observed for 1, 2, 8, 16, and 32 features.

Conclusions
– The nonlinear technique minimizing mean square reconstruction error (NLPCA1) can be very effective for representing data that lies in curved subspaces, but it does not appear to offer any advantage over linear dimensionality reduction methods for a speech classification task.
– The nonlinear technique based on minimizing classification error (NLPCA2) is quite effective for accurate classification in low-dimensional spaces; the reduced features appear to be well modeled as Gaussian features with a common covariance matrix.
– Nonlinear PCA (NLPCA2) is much more effective than ordinary PCA for reducing dimensionality; however, with a "good" classification method, neither dimensionality reduction method improves classification accuracy.

Acknowledgement
This work was partially supported by JWFC 900.
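For reference, below is a minimal sketch of an MXL-style classifier as described above: maximum likelihood under a Gaussian assumption with a common (pooled) covariance matrix, which with equal class priors reduces to choosing the class mean nearest in Mahalanobis distance. The function names and synthetic data are illustrative assumptions, not the exact implementation used in this work.

```python
# Hypothetical sketch of an MXL-style classifier: maximum likelihood with a
# Gaussian assumption and a pooled (common) covariance matrix; with equal
# priors this is nearest-class-mean in Mahalanobis distance.
import numpy as np

def fit_mxl(X, y, n_classes):
    """Estimate per-class means and the inverse of the pooled covariance."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    centered = X - means[y]              # subtract each sample's class mean
    cov = centered.T @ centered / (len(X) - n_classes)
    return means, np.linalg.inv(cov)

def predict_mxl(X, means, cov_inv):
    """Assign each sample to the class with the smallest Mahalanobis distance."""
    diffs = X[:, None, :] - means[None, :, :]            # shape (N, C, D)
    d2 = np.einsum('ncd,de,nce->nc', diffs, cov_inv, diffs)
    return d2.argmin(axis=1)

# Usage on synthetic 10-D features with 10 classes (illustrative only)
rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=1000)
X = rng.normal(size=(1000, 10)) + y[:, None] * 0.5       # class-dependent shift
means, cov_inv = fit_mxl(X, y, n_classes=10)
accuracy = (predict_mxl(X, means, cov_inv) == y).mean()
```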