Published by Tyrone Kennedy. Modified over 9 years ago.
1
Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Applications of Support Vector Machines to Speech Recognition
Authors: Aravind Ganapathiraju, Jonathan E. Hamaker, and Joseph Picone (IEEE, 2004)
Advisor: Dr. Hsu
Graduate: Chun Kai Chen
2
Outline
Motivation
Objective
Introduction
Speech Recognition
Support Vector Machines
Experimental Results
Conclusions
Personal Opinion
3
Motivation
A maximum likelihood (ML) formulation has problems in applications such as speech recognition.
─ On higher-dimensional problems, an ML-trained model may never achieve perfect classification.
4
Objective
Apply SVMs to overcome higher-dimensional problems and improve classification accuracy.
Apply SVMs to large-vocabulary speech recognition.
Develop and optimize an SVM/HMM hybrid system.
5
Introduction
Speech Recognition
─ Speech Recognition Process
─ Hidden Markov Model
Application of SVMs to Speech Recognition
─ Review the SVM approach
─ Discuss applications to speech recognition
─ Present experimental results
6
Speech Recognition
7
Speech Recognition Process (MFCC)
8
Hidden Markov Model (1/2)
An HMM is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden).
It is described as a state transition process.
For speech modeling applications, the HMM is a generator of vector sequences.
9
Hidden Markov Model (2/2)
Finite-state machine + probabilistic process
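The idea of an HMM as a finite-state machine that generates vector sequences can be sketched in a few lines of Python. This is only an illustration: the three-state left-to-right topology, the transition probabilities, the Gaussian emissions and the function name `sample_hmm` are all assumptions made for the example, not values from the paper.

```python
import random

def sample_hmm(transitions, means, stddev, n_frames, seed=0):
    """Generate feature vectors from a toy HMM.

    transitions[s] maps state s to {next_state: probability};
    means[s] is the mean vector emitted in state s.
    All values passed in below are illustrative assumptions.
    """
    rng = random.Random(seed)
    state = 0  # assumed start state
    states, frames = [], []
    for _ in range(n_frames):
        # Emission: a Gaussian-distributed vector around the state's mean.
        frames.append([rng.gauss(m, stddev) for m in means[state]])
        states.append(state)
        # Transition: draw the next hidden state from the current row.
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
    return states, frames

# Hypothetical three-state left-to-right model with 2-dimensional emissions.
TRANSITIONS = {0: {0: 0.6, 1: 0.4}, 1: {1: 0.7, 2: 0.3}, 2: {2: 1.0}}
MEANS = {0: [0.0, 0.0], 1: [3.0, 1.0], 2: [6.0, -1.0]}

hidden_states, observed = sample_hmm(TRANSITIONS, MEANS, 0.5, n_frames=10)
```

A recognizer only sees `observed`; the state sequence in `hidden_states` is what makes the process "hidden".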
10
HMM Problems
Maximum likelihood (ML) estimation
─ Estimates the parameters with procedures that guarantee convergence
Expectation-maximization (EM)
─ Estimation with good convergence properties, although it does not guarantee finding the global maximum
Problems with an ML formulation
─ May never achieve perfect classification
11
Global maximum problem
12
Support Vector Machines
13
SVM
The goal of support vector classification is to find a separating hyperplane in a high-dimensional feature space; this separating hyperplane gives the optimal boundary.
ERM and SRM can be used to find a good hyperplane
─ ERM (empirical risk minimization): can be used to find a good hyperplane, although it does not guarantee a unique solution
─ SRM (structural risk minimization): can help choose the best hyperplane by ordering the hyperplanes based on the margin
Real-world classification problems: ANNs
─ Attempt to overcome many of these problems
─ Suffer from slow convergence during training and a tendency to overfit the data
14
A hyperplane classifier
15
Kernels
Allow a dot product to be computed in a higher-dimensional space
─ Linear
─ Polynomial
─ Radial basis function (RBF): slower than polynomial kernels but gives better performance
─ Sigmoid
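The four kernel families listed above can be written out as a minimal Python sketch. The parameter names and default values (`degree`, `coef0`, `gamma`, `kappa`, `delta`) are assumptions chosen for illustration, not the settings used in the paper. The helper `phi_quadratic` is also hypothetical; it shows, for 2-dimensional inputs, that a degree-2 polynomial kernel equals an ordinary dot product in a 3-dimensional space.

```python
import math

def dot(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    return math.tanh(kappa * dot(x, y) + delta)

def phi_quadratic(x):
    """Explicit feature map for the degree-2 kernel with coef0 = 0 (2-D input)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, y = [1.0, 2.0], [3.0, 4.0]
implicit = polynomial_kernel(x, y, degree=2, coef0=0.0)  # computed in 2-D
explicit = dot(phi_quadratic(x), phi_quadratic(y))       # computed in 3-D
```

Here `implicit` and `explicit` agree up to rounding, which is the sense in which a kernel computes a dot product in a higher-dimensional space without ever building the mapped vectors.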
16
One-against-all method
y_i ─ the class assignments
w ─ the weight vector defining the classifier
b ─ a bias term
ε_i ─ the slack variables
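The equation these symbols belong to did not survive on this slide. For orientation, the standard soft-margin SVM training problem can be written with the same symbols as below. The training vectors x_i, the sample count N and the penalty constant C are notation assumed here, and the paper's exact form may differ.

```latex
\min_{w,\,b,\,\varepsilon}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\varepsilon_i
\qquad\text{subject to}\qquad
y_i\,(w\cdot x_i + b)\ \ge\ 1-\varepsilon_i,\qquad \varepsilon_i\ \ge\ 0,\qquad i=1,\dots,N
```

In a one-against-all setup, one such binary classifier is trained per class, with y_i = +1 for examples of that class and y_i = -1 for all others.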
17
Applications to speech recognition
Hybrid approaches
─ SVMs cannot model the temporal structure of speech effectively, so an HMM structure is still needed to model temporal evolution.
─ In hybrid systems, the NN is used only to estimate posterior probabilities.
18
Several issues arise
Posterior estimation
Segmental modeling
N-best list rescoring
19
Posterior estimation
There is significant overlap in the feature space.
SVMs provide a distance or discriminant that can be used to compare classifiers.
Main concern in using SVMs
─ Lack of a clear relationship between the distance from the margin and the posterior class probability
A sigmoid function is used to map the output distances to posteriors.
20
Sigmoid
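The mapping from SVM distances to posteriors can be illustrated with a short Python sketch. The two-parameter form 1 / (1 + exp(a·f + b)) is the commonly used sigmoid for this purpose. The default values of `a` and `b` below are placeholders assumed for the example; in practice they would be estimated from held-out data.

```python
import math

def sigmoid_posterior(distance, a=-1.0, b=0.0):
    """Map an SVM output distance f to p(y = 1 | f) = 1 / (1 + exp(a*f + b)).

    a and b are illustrative placeholders, not fitted values.
    With a < 0, larger positive distances give posteriors closer to 1.
    """
    return 1.0 / (1.0 + math.exp(a * distance + b))

# Hypothetical distances from the decision boundary for three segments.
distances = [-2.0, 0.0, 2.0]
posteriors = [sigmoid_posterior(f) for f in distances]
```

A point on the decision boundary (distance 0) maps to 0.5 with these placeholder parameters, and points farther on the positive side map to higher posteriors.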
21
Segmental Modeling (1/2)
Training at the frame level on all the data available in large corpora is still not computationally feasible.
In our work, we have chosen to use a segment-based approach to avoid these issues.
Segmental data takes better advantage of the correlation in adjacent frames of speech data.
A related problem is the variable length, or duration, problem.
22
Segmental Modeling (2/2)
A simple but effective approach, motivated by three-state HMMs, is to assume that segments are composed of a fixed number of sections.
The first and third sections model the transition into and out of the segment.
The second section models the stable portion of the segment.
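One way to turn a variable-length segment into a fixed-length vector along these lines is to split the frames into three sections, average each section, and concatenate the means. The Python sketch below assumes this averaging scheme and a 3-4-3 split of the segment duration; neither is stated on the slide, and the function name `segment_features` is hypothetical.

```python
def segment_features(frames, proportions=(3, 4, 3)):
    """Average the frames of each section and concatenate the section means.

    frames: list of equal-length feature vectors for one segment.
    proportions: assumed relative durations of the sections.
    """
    n = len(frames)
    k_sections = len(proportions)
    if n < k_sections:
        raise ValueError("segment is shorter than the number of sections")
    total = sum(proportions)
    bounds = [0]
    cumulative = 0
    for k, p in enumerate(proportions, start=1):
        cumulative += p
        ideal = round(n * cumulative / total)
        # Keep at least one frame in this section and in each later one.
        lowest = bounds[-1] + 1
        highest = n - (k_sections - k)
        bounds.append(min(max(ideal, lowest), highest))
    features = []
    for start, end in zip(bounds, bounds[1:]):
        section = frames[start:end]
        dim = len(section[0])
        features.extend(
            sum(frame[d] for frame in section) / len(section) for d in range(dim)
        )
    return features

# Two toy segments of different durations, 2-dimensional frames.
short_segment = [[float(i), 1.0] for i in range(10)]
long_segment = [[float(i), 1.0] for i in range(25)]
short_vector = segment_features(short_segment)
long_vector = segment_features(long_segment)
```

Both segments yield vectors of the same length (three sections times the frame dimension), which is what lets a fixed-input classifier such as an SVM handle segments of varying duration.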
24
The concept of the segmental probability model (SPM)
25
N-best List Rescoring
N-best lists are generated using the HMM system.
An alignment is generated for each hypothesis in the N-best list using the HMM system.
Segment-level feature vectors are generated from these alignments.
The N-best list is reordered based on the likelihood, and the top hypothesis is used to calibrate the performance of the system.
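The reordering step can be sketched in Python as follows. The data layout (a list of dictionaries with `words` and `segments` keys), the function names, and the toy scorer are all hypothetical. In a real system the scorer would be the SVM for each segment label followed by the sigmoid mapping, applied to segment-level feature vectors taken from the HMM alignments.

```python
import math

def rescore_nbest(nbest, segment_scorer):
    """Reorder hypotheses by the sum of their per-segment log scores."""
    scored = []
    for hypothesis in nbest:
        total = sum(
            segment_scorer(label, features)
            for label, features in hypothesis["segments"]
        )
        scored.append((total, hypothesis["words"]))
    # Highest total log score first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

def toy_scorer(label, features):
    """Stand-in scorer: treats the single feature as a posterior for `label`."""
    return math.log(features[0])

# Hypothetical 2-best list; each segment is (label, segment-level features).
nbest = [
    {"words": "b d", "segments": [("b", [0.1]), ("d", [0.9])]},
    {"words": "p d", "segments": [("p", [0.8]), ("d", [0.9])]},
]
ranked = rescore_nbest(nbest, toy_scorer)
best_words = ranked[0][1]
```

With these made-up scores the second hypothesis moves to the top of the list, and `best_words` is the hypothesis that would be used to evaluate the system.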
26
Overview of a hybrid HMM/SVM system
27
Experimental Results
The Deterding vowel data
─ Simple but popular static classification task
─ Used to benchmark nonlinear classifiers
Spoken Letters and Numbers
─ Spoken letters and numbers recorded over long-distance telephone lines
─ OGI Alphadigits (AD)
─ Contains words that are confusable in telephone-quality speech (e.g. “p” vs “b”)
28
Conclusions
A support vector machine is used as a classifier in a continuous speech recognition system.
A hybrid SVM/HMM system has been developed.
The results obtained in the experiments clearly indicate the classification power of SVMs and affirm the use of SVMs for acoustic modeling.
Further research into the segmentation issue is needed.
29
Personal Opinion
I need to study more and more… and I wish God could give me more time.