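VQ for ASR. 張智星 (Jyh-Shing Roger Jang), Multimedia Information Retrieval Laboratory, Department of Computer Science, Tsing Hua University.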


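1 VQ for ASR
張智星 (Jyh-Shing Roger Jang), jang@cs.nthu.edu.tw, http://www.cs.nthu.edu.tw/~jang
Multimedia Information Retrieval Laboratory, Department of Computer Science, Tsing Hua University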

2 Construct VQ-based Classifiers
- Design phase
  - Use k-means or the generalized Lloyd algorithm to design a codebook (a set of code words, also called centroids or prototype vectors) for each class.
- Application phase
  - For a given utterance, compute the average distortion against each class's codebook and assign the utterance to the class with the minimum average distortion.
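The two phases above can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes squared L2 distortion, plain k-means, and synthetic 12-dimensional "feature frames" standing in for real MFCC frames. The function names (`train_codebook`, `average_distortion`, `classify`) and the class labels are hypothetical.

```python
import numpy as np

def train_codebook(frames, k, iters=20, seed=0):
    """Design phase: plain k-means on the feature frames of one class."""
    rng = np.random.default_rng(seed)
    # Initialise the code words with k distinct training frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared L2 distance from every frame to every code word.
        d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for j in range(k):
            members = frames[nearest == j]
            if len(members):  # an empty cell keeps its previous code word
                codebook[j] = members.mean(axis=0)
    return codebook

def average_distortion(frames, codebook):
    """Mean over frames of the distance to the nearest code word."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def classify(frames, codebooks):
    """Application phase: pick the class whose codebook gives the minimum average distortion."""
    scores = {label: average_distortion(frames, cb) for label, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Synthetic stand-in data (assumption): two well-separated classes.
rng = np.random.default_rng(1)
train = {
    "zero": rng.normal(0.0, 1.0, size=(200, 12)),
    "one": rng.normal(5.0, 1.0, size=(200, 12)),
}
codebooks = {label: train_codebook(x, k=4) for label, x in train.items()}

test_utterance = rng.normal(5.0, 1.0, size=(30, 12))
label, scores = classify(test_utterance, codebooks)
print(label, scores)
```

Only the codebooks need to be stored after training, which is where the storage and computation savings of VQ come from.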

3 Vector Quantization (VQ)
- Advantages of VQ for classifier design:
  - Reduced storage
  - Reduced computation
  - Efficient representation that improves generalization capability

4 ASR without Time Alignment
- Good for simple vocabularies of highly distinct words, such as English digits
- Bad for vocabularies
  - With words that can only be distinguished by their temporal sequential characteristics, such as “car” and “rack”, “we” and “you”
  - With complex speech classes that encompass long utterances with rich phonetic content
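The weakness with temporally distinguished words follows from how the score is computed: the average distortion is a mean over frames, so it ignores frame order. The sketch below illustrates this under assumptions (squared L2 distortion, random vectors standing in for real features, a hypothetical `average_distortion` helper); a time-reversed utterance gets the same score as the original.

```python
import numpy as np

def average_distortion(frames, codebook):
    """Mean over frames of the squared L2 distance to the nearest code word."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 12))      # stand-in codebook (assumption)
utterance = rng.normal(size=(40, 12))    # stand-in feature frames (assumption)
reversed_utterance = utterance[::-1]     # same frames, opposite temporal order

d_forward = average_distortion(utterance, codebook)
d_reversed = average_distortion(reversed_utterance, codebook)
print(d_forward, d_reversed)  # equal up to floating-point rounding
```

Any two words built from similar sounds in a different order are therefore hard to separate with a single codebook per class.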

5 Centroid Computation in VQ
- The codebook for a class should be designed to minimize the average distortion. Different distortion measures lead to different codebook designs.
  - L2 distance → mean vector
  - Mahalanobis distance → mean vector
  - L1 distance → median vector
- Other distortion measures and the corresponding centroid computations can be found in Section 5.2.2 of “Fundamentals of Speech Recognition” by L. Rabiner and B.-H. Juang.
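The distortion-to-centroid pairings above can be checked numerically. The sketch below is an illustration under assumptions: it uses the squared forms of the L2 and Mahalanobis distances with a fixed covariance, a component-wise median for L1, and synthetic data with one outlier so that the mean and the median differ visibly. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
cell = rng.normal(size=(101, 3))       # vectors assigned to one code word (synthetic)
cell[0] = [50.0, 50.0, 50.0]           # one outlier pulls the mean away from the median

mean_vec = cell.mean(axis=0)
median_vec = np.median(cell, axis=0)   # component-wise median

# Fixed inverse covariance for the Mahalanobis distortion (assumption).
cov_inv = np.linalg.inv(np.cov(cell, rowvar=False))

def total_sq_l2(c):
    return ((cell - c) ** 2).sum()

def total_l1(c):
    return np.abs(cell - c).sum()

def total_mahalanobis(c):
    diff = cell - c
    return np.einsum("ni,ij,nj->", diff, cov_inv, diff)

# The mean gives the lower total for squared L2 and Mahalanobis,
# the median gives the lower total for L1.
print(total_sq_l2(mean_vec), total_sq_l2(median_vec))
print(total_mahalanobis(mean_vec), total_mahalanobis(median_vec))
print(total_l1(median_vec), total_l1(mean_vec))
```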

6 Other Uses of VQ in ASR
- VQ can be used to speed up DTW-based comparison:
  - VQ-based classifiers can be used as a preprocessor for DTW-based classifiers.
  - VQ can be used for efficient computation of the (approximate) DTW distance.
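One possible reading of the second point is sketched below; the slides do not spell out the method, so this is an assumption. Each frame is replaced by the index of its nearest code word, and the frame-to-frame local distance inside DTW becomes a lookup in a precomputed table of code-word-to-code-word distances. The DTW variant (steps (1,0), (0,1), (1,1), no path constraints), the data, and all names are hypothetical.

```python
import numpy as np

def quantize(frames, codebook):
    """Index of the nearest code word for every frame."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def dtw(local):
    """Basic DTW over a local-distance matrix with steps (1,0), (0,1), (1,1)."""
    n, m = local.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 12))   # stand-in codebook (assumption)
a = rng.normal(size=(30, 12))          # stand-in utterances (assumption)
b = rng.normal(size=(35, 12))

# Computed once per codebook: 16 x 16 distances between code words.
table = np.sqrt(((codebook[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2))

ia, ib = quantize(a, codebook), quantize(b, codebook)
approx = dtw(table[np.ix_(ia, ib)])    # local distances by table lookup
exact = dtw(np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)))
print(approx, exact)
```

The saving comes from the local distances: stored utterances can be kept as index sequences, and no vector arithmetic between frames is needed at comparison time. How closely `approx` tracks `exact` depends on the codebook size.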

7 DTW for Speaker-independent Tasks
- Basic procedures to adapt DTW for speaker-independent tasks:
  - Massive recordings
  - Use MFCC with vocal tract length normalization
  - Use modified K-means to find cluster centers (or representative utterances) for each class

8 Modified K-means
- Goal: Find representative objects in a set whose elements lie in a non-Euclidean space
- Steps:
  1. Select initial centers
  2. Assign each utterance to a cluster
  3. Revise cluster centers, using one of:
     - Minimax centers
     - Pseudoaverage centers
     - Segmental versions of the above two (segmental VQ)
  4. Go back to step 2 until convergence
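The steps above can be sketched on a precomputed pairwise distance matrix, which is all that is available when utterances are compared by DTW and cannot be averaged. The sketch assumes the usual definitions: a minimax center is the cluster member whose largest distance to the other members is smallest, and a pseudoaverage center is the member whose average distance to the other members is smallest. The segmental variants are not covered. The function name, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def modified_kmeans(D, k, center="minimax", iters=50, seed=0):
    """Cluster n objects given only their n x n distance matrix D.

    Returns the indices of the k representative objects and the cluster labels.
    """
    n = len(D)
    rng = np.random.default_rng(seed)
    centers = rng.choice(n, size=k, replace=False)       # step 1: initial centers
    for _ in range(iters):
        labels = D[:, centers].argmin(axis=1)             # step 2: assign to nearest center
        new_centers = centers.copy()
        for j in range(k):                                # step 3: revise centers
            members = np.flatnonzero(labels == j)
            if len(members) == 0:
                continue
            sub = D[np.ix_(members, members)]
            if center == "minimax":
                score = sub.max(axis=1)    # worst-case distance to a fellow member
            else:                          # "pseudoaverage"
                score = sub.mean(axis=1)   # average distance to the members
            new_centers[j] = members[score.argmin()]
        if np.array_equal(new_centers, centers):          # step 4: stop when centers are stable
            break
        centers = new_centers
    return centers, D[:, centers].argmin(axis=1)

# Toy stand-in: points on a line, with |x - y| playing the role of a DTW distance.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(points[:, None] - points[None, :])
centers, labels = modified_kmeans(D, k=2)
print(points[centers], labels)
```

Because every center is an actual member of the set, the result for each class is a representative utterance that can be used directly as a DTW template.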

9 Segmental Vector Quantization
[Figure: Utterance 1, Utterance 2, Utterance 3, ..., Utterance n (utterances of the same class), with Codebook 1, Codebook 2, Codebook 3, ...]

