1
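VQ for ASR
Jyh-Shing Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
Multimedia Information Retrieval Lab, Department of Computer Science, National Tsing Hua University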
2
Construct VQ-based Classifiers
- Design phase: Use the k-means or generalized Lloyd algorithm to design a codebook (a set of code words, also called centroids or prototype vectors) for each class.
- Application phase: For a given utterance, compute the average distortion against each class's codebook and assign the utterance to the class with the minimum average distortion.
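The two phases can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the function names (`train_codebook`, `average_distortion`, `classify`), the squared L2 distortion, the codebook size, and the synthetic 2-D "frames" are all assumptions made for the example. Real systems would use MFCC-like feature frames.

```python
import numpy as np

def train_codebook(frames, k, n_iter=20, seed=0):
    """Design phase: plain k-means (Lloyd iterations) on one class's frames."""
    rng = np.random.default_rng(seed)
    # Start from k distinct frames chosen at random (an assumed initialization).
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared L2 distance from every frame to every code word.
        d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # keep the old code word if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest code word."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def classify(frames, codebooks):
    """Application phase: pick the class with the minimum average distortion."""
    scores = {label: average_distortion(frames, cb) for label, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Synthetic stand-in data: two well-separated classes of 2-D frames.
rng = np.random.default_rng(1)
train = {
    "class_a": rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    "class_b": rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
}
codebooks = {label: train_codebook(x, k=4) for label, x in train.items()}

test_utterance = rng.normal(loc=5.0, scale=1.0, size=(30, 2))
predicted, scores = classify(test_utterance, codebooks)
print(predicted, scores)
```

Note that `classify` never looks at the order of the frames, which is why this kind of classifier needs no time alignment.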
3
Vector Quantization (VQ)
Advantages of VQ for classifier design:
- Reduced storage
- Reduced computation
- Efficient representation that improves generalization capability
4
ASR without Time Alignment
- Good for simple vocabularies of highly distinct words, such as the English digits
- Bad for vocabularies
  - with words that can be distinguished only by their temporal (sequential) characteristics, such as "car" and "rack", or "we" and "you"
  - with complex speech classes that encompass long utterances with rich phonetic content
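The limitation for words such as "car" and "rack" follows from how the average distortion is computed: it is a mean over frames, so the frame order does not matter. The toy example below illustrates this with a made-up codebook and a made-up 4-frame "utterance" (both are assumptions for illustration only). Reversing the frames leaves the score unchanged.

```python
import numpy as np

def average_distortion(frames, codebook):
    """Mean squared L2 distance from each frame to its nearest code word."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

# Hypothetical codebook and utterance; the values carry no acoustic meaning.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
utterance = np.array([[0.1, 0.0], [0.9, 1.2], [2.1, -0.1], [1.8, 0.2]])
reversed_utterance = utterance[::-1]  # same frames, opposite temporal order

d_forward = average_distortion(utterance, codebook)
d_reversed = average_distortion(reversed_utterance, codebook)
print(np.isclose(d_forward, d_reversed))
```

Two words built from similar sounds in a different order therefore receive nearly identical scores from the same codebook.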
5
Centroid Computation in VQ
The codebook for a class should be designed to minimize the average distortion. Different distortion measures lead to different codebook designs:
- L2 distance: mean vector
- Mahalanobis distance: mean vector
- L1 distance: median vector
Other distortion measures and the corresponding centroid computations can be found in Section 5.2.2 of "Fundamentals of Speech Recognition" by L. Rabiner and B.-H. Juang.
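The pairing of distortion measure and centroid can be checked numerically. The sketch below assumes that "L2 distance" means the squared Euclidean distortion and that the median is taken component by component. The skewed random cluster is synthetic and only there to make the mean and the median differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, skewed cluster so that mean and median are clearly different.
cluster = rng.exponential(scale=1.0, size=(500, 3))

def avg_sq_l2(center):
    """Average squared L2 distortion of the cluster about a candidate center."""
    return ((cluster - center) ** 2).sum(axis=1).mean()

def avg_l1(center):
    """Average L1 distortion of the cluster about a candidate center."""
    return np.abs(cluster - center).sum(axis=1).mean()

mean_vec = cluster.mean(axis=0)
median_vec = np.median(cluster, axis=0)  # component-wise median

# The mean is the better centroid under squared L2,
# the median is the better centroid under L1.
print(avg_sq_l2(mean_vec) <= avg_sq_l2(median_vec))
print(avg_l1(median_vec) <= avg_l1(mean_vec))
```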
6
Other Uses of VQ in ASR
VQ can be used to speed up DTW-based comparison:
- VQ-based classifiers can be used as a preprocessor for DTW-based classifiers.
- VQ can be used for efficient computation of the (approximate) DTW distance.
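The slide does not say how the approximate DTW distance is computed. One plausible reading, sketched here as an assumption, is to replace every frame by the index of its nearest code word and to look up local distances in a precomputed table of distances between code words, so that no frame-to-frame distance is evaluated inside the DTW recursion. The helper names, the tiny codebook, and the basic three-predecessor DTW recursion are all illustrative choices.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame to the index of its nearest code word (squared L2)."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def dtw_from_table(idx_a, idx_b, table):
    """Basic DTW where the local cost is a lookup between code word indices."""
    n, m = len(idx_a), len(idx_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = table[idx_a[i - 1], idx_b[j - 1]]
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

# Hypothetical 3-word codebook and its code-word-to-code-word distance table,
# computed once and reused for every pair of utterances.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
table = np.sqrt(((codebook[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2))

a = np.array([[0.0, 0.1], [1.1, 0.0], [2.0, -0.1]])
b = np.array([[0.1, 0.0], [0.0, 0.0], [0.9, 0.1], [2.1, 0.0]])
idx_a, idx_b = quantize(a, codebook), quantize(b, codebook)
approx = dtw_from_table(idx_a, idx_b, table)
print(idx_a, idx_b, approx)
```

The result is approximate because quantization discards the difference between a frame and its code word. Here the two sequences map to the same code word path, so the approximate distance is zero even though the raw frames differ slightly.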
7
DTW for Speaker-independent Tasks
Basic procedures to adapt DTW for speaker-independent tasks:
- Collect massive recordings
- Use MFCC with vocal tract length normalization
- Use modified K-means to find cluster centers (representative utterances) for each class
8
Modified K-means
Goal: Find representative objects in a set whose elements lie in a non-Euclidean space
Steps:
1. Select initial centers
2. Assign each utterance to a cluster
3. Revise cluster centers, using one of:
   - Minimax centers
   - Pseudoaverage centers
   - Segmental version of the above two (segmental VQ)
4. Go back to step 2 until convergence
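The steps above can be sketched on top of a pairwise distance matrix, since in a non-Euclidean space only distances between utterances are available and a center must itself be one of the utterances. The definitions used here are assumptions based on the usual meaning of the terms: the minimax center is the cluster member whose largest distance to the other members is smallest, and the pseudoaverage center is the member whose average distance to the other members is smallest. The segmental variant is not covered. In practice `dist` would hold DTW distances between utterances; the example uses absolute differences of scalars as a stand-in.

```python
import numpy as np

def revise_center(dist, members, rule):
    """Step 3: pick the member index that best represents the cluster."""
    sub = dist[np.ix_(members, members)]
    if rule == "minimax":
        scores = sub.max(axis=1)   # worst-case distance to any other member
    else:                          # "pseudoaverage"
        scores = sub.mean(axis=1)  # average distance to all members
    return members[scores.argmin()]

def modified_kmeans(dist, init_centers, rule="minimax", max_iter=50):
    centers = np.array(init_centers)                # step 1: initial centers
    for _ in range(max_iter):
        labels = dist[:, centers].argmin(axis=1)    # step 2: assign to nearest center
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = np.flatnonzero(labels == j)
            if len(members) > 0:
                new_centers[j] = revise_center(dist, members, rule)
        if np.array_equal(new_centers, centers):    # step 4: stop when centers are stable
            break
        centers = new_centers
    labels = dist[:, centers].argmin(axis=1)
    return centers, labels

# Stand-in "utterances": scalars, with |x - y| playing the role of a DTW distance.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 13.0])
dist = np.abs(points[:, None] - points[None, :])

centers, labels = modified_kmeans(dist, init_centers=[0, 3], rule="minimax")
print(centers, labels)
```

Passing `rule="pseudoaverage"` switches the center revision rule without changing the rest of the loop.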
9
Segmental Vector Quantization
[Figure: Utterance 1, Utterance 2, Utterance 3, ..., Utterance n (utterances of the same class), shown with Codebook 1, Codebook 2, Codebook 3, ...]