Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.


1 Jun-Won Suh, Intelligent Electronic Systems, Human and Systems Engineering, Department of Electrical and Computer Engineering. Speaker Verification System using SVM

2 Page 1 of 12 Research Progress: Jun-Won Suh
Research Activities
Software Release
- Debug the Tcl/Tk Search Demo
- Language Model tester
- Diagnose method for the LanguageModel classes
Speaker Verification System
- Set up parameters for isip_verify
- Run the isip_verify utility to build the SVM baseline
- Propose techniques to improve the system (thesis topic)
Overall
- Speed up my research to graduate in December.

3 SVM baseline DET curve: minor changes in the front end produced slightly better results.

4 Classifying sequences using score-space kernels: The score-space kernel enables SVMs to classify whole sequences. A variable-length sequence of input vectors is mapped explicitly onto a single point in a space of fixed dimension. The score-space is derived from the likelihood score and is defined by two elements: the score-argument, which is a function of the scores of a set of generative models, and the score-mapping operator, which maps the scalar score-argument to the score-space. When the first-derivative operator is chosen, the gradient of the log likelihood with respect to a parameter describes how that parameter contributes to generating a particular speaker model.
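The symbols for the two elements did not survive in the transcript. As a hedged sketch, assuming the notation of Wan's thesis listed in the references (f for the score-argument, \hat{F} for the score-mapping operator, and M_k, \theta_k for the k-th generative model and its parameters), the mapping could be written as:

```latex
% Assumed notation, not taken from the slide itself.
\psi_{\hat{F} f}(X) = \hat{F}\, f\bigl(\{\, p_k(X \mid M_k, \theta_k) \,\}\bigr)

% Example: log-likelihood score-argument with the first-derivative operator
f = \log p(X \mid M, \theta), \qquad \hat{F} = \nabla_{\theta}
\quad\Longrightarrow\quad
\psi(X) = \nabla_{\theta} \log p(X \mid M, \theta)
```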

5 Computing the score-space vectors: Define the global likelihood of a sequence X = {x_1, …, x_{N_l}} as [equation], where N_g is the number of Gaussians that make up the mixture model, N_d is the dimensionality of the input vectors with components [symbols], and [symbols] are the parameters of the GMM. Since the sequence is defined as X = {x_1, …, x_{N_l}}, [equation]. The derivatives are taken with respect to the covariances, means, and priors of the GMM. Let [equation].
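The slide's equations were lost in the transcript. A standard form of the GMM likelihood that is consistent with the definitions above is sketched here. The symbols a_j (prior), \mu_{jk} (mean) and \sigma_{jk}^2 (variance), and the restriction to diagonal covariances, are assumptions rather than the slide's own notation.

```latex
% Global likelihood of the sequence, assuming independent frames
P(X \mid M) = \prod_{i=1}^{N_l} \sum_{j=1}^{N_g} a_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)

% Log likelihood as a sum over the N_l frames
\log P(X \mid M) = \sum_{i=1}^{N_l} \log \sum_{j=1}^{N_g} a_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)

% Diagonal-covariance Gaussian over the N_d components (assumption)
\mathcal{N}(x_i; \mu_j, \Sigma_j) =
  \prod_{k=1}^{N_d} \frac{1}{\sqrt{2\pi\sigma_{jk}^2}}
  \exp\!\left( -\frac{(x_{ik} - \mu_{jk})^2}{2\sigma_{jk}^2} \right)
```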

6 Computing the score-space vectors: The derivative with respect to the j-th prior is [equation]. The derivative with respect to the k-th component of the j-th mean is [equation]. Lastly, the derivative with respect to the k-th component of the j-th covariance is [equation]. The fixed-length score-space vector can be expressed as [equation], where j* runs over the N_g Gaussians of the GMM and k* runs over the N_d dimensions of the input vectors.
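The three derivatives can be illustrated with a minimal pure-Python sketch. It assumes a diagonal-covariance GMM, differentiates with respect to the variances, and treats the priors as unconstrained parameters (the sum-to-one constraint is ignored). The function names and the ordering of the output vector are hypothetical choices for illustration, not the isip_verify implementation.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    log_p = 0.0
    for xk, mk, vk in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vk) + (xk - mk) ** 2 / vk)
    return math.exp(log_p)

def log_likelihood(X, priors, means, variances):
    """log P(X|M): sum over frames of the log mixture density."""
    total = 0.0
    for x in X:
        mix = sum(a * gaussian_diag(x, m, v)
                  for a, m, v in zip(priors, means, variances))
        total += math.log(mix)
    return total

def score_vector(X, priors, means, variances):
    """Gradient of log P(X|M) w.r.t. priors, means and variances.

    Assumed layout: N_g prior derivatives, then N_g*N_d mean derivatives,
    then N_g*N_d variance derivatives, so the length is N_g * (1 + 2*N_d)
    regardless of how many frames X contains.
    """
    n_g, n_d = len(priors), len(means[0])
    d_prior = [0.0] * n_g
    d_mean = [[0.0] * n_d for _ in range(n_g)]
    d_var = [[0.0] * n_d for _ in range(n_g)]
    for x in X:
        dens = [gaussian_diag(x, m, v) for m, v in zip(means, variances)]
        mix = sum(a * p for a, p in zip(priors, dens))
        for j in range(n_g):
            d_prior[j] += dens[j] / mix          # d/da_j
            post = priors[j] * dens[j] / mix     # posterior of Gaussian j
            for k in range(n_d):
                diff = x[k] - means[j][k]
                var = variances[j][k]
                d_mean[j][k] += post * diff / var
                d_var[j][k] += post * 0.5 * (diff ** 2 / var ** 2 - 1.0 / var)
    flat = list(d_prior)
    for row in d_mean:
        flat.extend(row)
    for row in d_var:
        flat.extend(row)
    return flat
```

Sequences of different lengths map to vectors of the same length, which is the property that lets an SVM operate on whole utterances.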

7 Computing the score-space vectors: Using the first-derivative-with-argument score-operator and the same score-argument, the mapping becomes [equation]. This mapping has a minimum test performance equal to that of the original generative model, M. Including the derivatives as “extra features” should give the classifier additional information to use. An alternative score-argument is the ratio of two generative models, M1 and M2: [equation].
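The two mappings described on this slide are missing from the transcript. One way to write them, assuming the log-likelihood score-argument and parameter vectors \theta, \theta_1, \theta_2 (notation chosen here for illustration, following the general form in Wan's thesis), is:

```latex
% First derivative with argument: the score itself is kept as the first
% component, so a linear classifier can at least reproduce the decision
% of the generative model M.
\psi(X) =
\begin{bmatrix}
  \log P(X \mid M, \theta) \\
  \nabla_{\theta} \log P(X \mid M, \theta)
\end{bmatrix}

% Likelihood-ratio score-argument built from two models M_1 and M_2
f = \log \frac{P(X \mid M_1, \theta_1)}{P(X \mid M_2, \theta_2)},
\qquad
\psi(X) =
\begin{bmatrix}
  \log P(X \mid M_1, \theta_1) - \log P(X \mid M_2, \theta_2) \\
  \nabla_{\theta_1} \log P(X \mid M_1, \theta_1) \\
  -\nabla_{\theta_2} \log P(X \mid M_2, \theta_2)
\end{bmatrix}
```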

8 Computing the score-space vectors: The dimensionality of the score-space equals the total number of parameters in the generative models. Hence the SVM can classify complete utterance sequences. The kernel is constructed from dot products in score-space, [equation], where G is the inverse Fisher information matrix in the log-likelihood score-space mapping.
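A kernel of the form K(X, Y) = psi(X)^T G psi(Y) can be sketched as follows. Because the full inverse Fisher information matrix is rarely practical to compute, this sketch assumes a diagonal approximation estimated from the variance of each score-space component over a training set. That approximation and both function names are assumptions for illustration, not necessarily what the slides used.

```python
def diagonal_metric(score_vectors, floor=1e-12):
    """Diagonal stand-in for G: inverse variance of each component.

    `score_vectors` is a list of equal-length score-space vectors from
    training utterances. `floor` guards against division by zero.
    """
    n = len(score_vectors)
    dim = len(score_vectors[0])
    g = []
    for k in range(dim):
        column = [vec[k] for vec in score_vectors]
        mean = sum(column) / n
        var = sum((c - mean) ** 2 for c in column) / n
        g.append(1.0 / max(var, floor))
    return g

def score_space_kernel(psi_x, psi_y, g_diag=None):
    """Weighted dot product psi_x^T G psi_y with diagonal G.

    With g_diag=None the metric is the identity, i.e. a plain dot product.
    """
    if g_diag is None:
        g_diag = [1.0] * len(psi_x)
    return sum(g * a * b for g, a, b in zip(g_diag, psi_x, psi_y))
```

The resulting Gram matrix could then be handed to any SVM trainer that accepts precomputed kernels.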

9 One MIT-LL approach: Phonetic SVM System. It follows the same approach as the score-space method (prosody, word choice, pronunciation, etc.). Using the phone sequence derived from the acoustic information, the system performs speaker verification accurately. This technique uses a likelihood-ratio score-space kernel with no derivative arguments.

10 Phone Sequence Extraction: Phone sequence extraction for the speaker recognition process is performed using the phone recognition system (PPRLM) designed by Zissman for language identification. Each phone is modeled in a gender-dependent, context-independent manner using a three-state HMM. Phone recognition is performed with a Viterbi search using a fully connected null-grammar network of monophones (no explicit language model in decoding). The phone sequences are vectorized by computing the frequencies of N-grams.

11 Bag of N-grams: Produce N-grams by the standard transformation of the stream. Example: for bigrams, the sequence of phones t_1, t_2, …, t_n is transformed to t_1_t_2, t_2_t_3, …, t_{n-1}_t_n. The unique unigrams and bigrams are designated d_1, …, d_M and d_1_d_1, …, d_M_d_M. Then we calculate probabilities and joint probabilities.
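The bigram transformation and the probability step can be sketched in a few lines of Python. The phone labels in the usage note and the function names are made up for illustration, and the probabilities are plain relative frequencies, which is an assumption about what the slide computes.

```python
from collections import Counter

def ngrams(phones, n):
    """Slide a window of length n over the phone stream.

    For n=2 the stream t1, t2, ..., tn becomes t1_t2, t2_t3, ...
    """
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def ngram_probabilities(phones, n):
    """Relative frequency of each unique N-gram in one conversation side."""
    counts = Counter(ngrams(phones, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}
```

For example, `ngram_probabilities(["ah", "b", "ah", "b", "k"], 2)` assigns 0.5 to `ah_b` and 0.25 each to `b_ah` and `b_k`.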

12 Kernel Construction: Suppose that the sequences of N-grams in the two conversation sides are t_1, t_2, …, t_n and u_1, u_2, …, u_m. Also denote the unique set of N-grams as d_1, d_2, …, d_M. The likelihood ratio computation serves as the kernel.
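The kernel formula itself is not preserved in the transcript. A minimal sketch, assuming the likelihood-ratio weighting described in the Campbell et al. paper listed in the references, scales each N-gram probability by the inverse square root of its background probability before taking the dot product. The function name and the dictionary-based inputs are hypothetical.

```python
import math

def likelihood_ratio_kernel(p_a, p_b, p_background):
    """K(A, B) = sum_j p(d_j|A) * p(d_j|B) / p(d_j|background).

    p_a, p_b: N-gram probabilities for the two conversation sides.
    p_background: N-gram probabilities pooled over all conversations.
    N-grams missing from either side contribute zero, and N-grams with
    no background mass are skipped.
    """
    total = 0.0
    for gram, pa in p_a.items():
        pb = p_b.get(gram)
        pbg = p_background.get(gram)
        if pb is None or not pbg:
            continue
        scale = 1.0 / math.sqrt(pbg)
        total += (pa * scale) * (pb * scale)
    return total
```

Writing the weighting as a per-side scaling keeps the kernel an ordinary inner product, so each conversation side can be stored as a single scaled vector.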

13 Conclusion: Speaker verification can be improved by exploiting the cues that human listeners rely on. Phone N-gram and score-space techniques can improve the speaker verification system.

14 References
V. Wan, Speaker Verification using Support Vector Machines, University of Sheffield, June 2003
V. Wan, Building Sequence Kernels for Speaker Verification and Speech Recognition, University of Sheffield
S. Bengio and J. Mariéthoz, Learning the Decision Function for the Speaker Verification, IDIAP, 2001
W.M. Campbell, J.P. Campbell, D.A. Reynolds, D.A. Jones, and T.R. Leek, Phonetic Speaker Recognition with Support Vector Machines, Advances in Neural Information Processing Systems, 2004

