Chapter 9 DTW and VQ Algorithm  9.1 Basic idea of DTW  9.2 DTW algorithm  9.3 Basic idea of VQ  9.4 LBG algorithm  9.5 Improvement of VQ.


1 Chapter 9 DTW and VQ Algorithm  9.1 Basic idea of DTW  9.2 DTW algorithm  9.3 Basic idea of VQ  9.4 LBG algorithm  9.5 Improvement of VQ

2 9.1 Basic idea of DTW (1)  The frames of the test utterance T and a reference utterance R_k do not correspond one to one. DTW looks for a non-linear correspondence between them that minimizes the total distance.  Let T(i) be frame i of the test utterance (1 <= i <= N_T) and R_k(j) be frame j of reference utterance k in the vocabulary, written R(j) for short (1 <= j <= N_k). T(i) and R(j) are both feature vectors, and d(i,j) is the distance between T(i) and R(j).

3 Basic idea of DTW (2)  The goal is to find a set of (i,j) pairs (the path) along which the total distance D is minimal:  D = min_{path} Σ_{(i,j) ∈ path} d(i,j)  Let the points on the path be (n_i, m_i), i = 1, ..., N. The path is constrained so that (n_1, m_1) = (1,1) and (n_N, m_N) = (N_T, N_k), with N = N_T. To limit the computation, it is assumed that N_k/2 <= N_T <= 2N_k for every k.

4 Basic idea of DTW (3)  The average slope of the path therefore lies between 0.5 and 2.0. To enforce this, if the current point is (n_i, m_i), the next point is (n_i + 1, m_i + 2), (n_i + 1, m_i + 1) or (n_i + 1, m_i); the last is allowed only if m_{i-1} != m_i, that is, the path may not stay on the same reference frame twice in a row.  Sometimes the initial point is allowed to float to get a better match.
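
The local step rule above can be sketched as a small helper. This is only an illustration: the function name `next_points` and the `prev_m` argument are hypothetical, and allowing a horizontal step at the very first point (where there is no previous point) is an assumption the slides do not state.

```python
def next_points(n, m, prev_m):
    """Candidate successors of path point (n, m) under the slope constraint.

    prev_m is the reference index of the previous path point,
    or None at the start of the path (assumption: no restriction there).
    """
    candidates = [(n + 1, m + 2), (n + 1, m + 1)]
    # A horizontal step (same m) is allowed only if the previous
    # step was not horizontal, i.e. m_{i-1} != m_i.
    if prev_m is None or prev_m != m:
        candidates.append((n + 1, m))
    return candidates


free = next_points(3, 5, prev_m=4)      # previous step moved, so all three steps are open
blocked = next_points(3, 5, prev_m=5)   # previous step was horizontal
```

Here `blocked` lacks the horizontal candidate, which is what keeps the average slope from falling below 0.5.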

5 9.2 DTW algorithm (1)  DTW stands for Dynamic Time Warping.  It uses dynamic programming to implement the idea described in 9.1.  The algorithm is as follows: (1) Set i = 1, j = 1 and D[1,1] = d(T(1),R(1)). (2) Increment i. If i <= N_T, compute j_l and j_h from the path constraints and compute d(T(i),R(j)) for j = j_l to j_h. (3) For every point (i,j_l) to (i,j_h), compute

6 DTW algorithm (2)  D[i,j] = d(T(i),R(j)) + D[i-1,j'], where j' is j, j-1 or j-2, chosen so that D[i-1,j'] = min { D[i-1,j], D[i-1,j-1], D[i-1,j-2] }. Store D[i,j] and the back-pointer j'(i), then return to (2). (4) When i > N_T, stop. Choose the end point j' such that D[N_T,j'] = min_k D[N_T,k], k = N_k, N_k - 1 or N_k - 2. (5) Starting from (N_T,j'), follow the stored back-pointers j'(N_T), j'(N_T - 1), ... to trace back the points on the path.
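
Steps (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' exact procedure: indices are 0-based, d is taken to be the Euclidean distance, the band limits j_l and j_h are omitted (every j is evaluated), and the rule forbidding two horizontal steps in a row is not enforced, matching the plain three-way minimum of step (3). The name `dtw` is hypothetical.

```python
import math


def dtw(test, ref):
    """Return (distance, path) for two sequences of feature vectors."""
    def d(a, b):
        # Assumed frame distance: Euclidean.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    n_t, n_r = len(test), len(ref)
    INF = float("inf")
    D = [[INF] * n_r for _ in range(n_t)]
    back = [[-1] * n_r for _ in range(n_t)]
    D[0][0] = d(test[0], ref[0])              # step (1): start at the first frames

    for i in range(1, n_t):                   # step (2): next test frame
        for j in range(n_r):                  # sketch: no j_l..j_h band
            # step (3): best predecessor among j, j-1, j-2 in column i-1
            best_j = min((jp for jp in (j, j - 1, j - 2) if jp >= 0),
                         key=lambda jp: D[i - 1][jp])
            if D[i - 1][best_j] < INF:
                D[i][j] = d(test[i], ref[j]) + D[i - 1][best_j]
                back[i][j] = best_j

    # step (4): relaxed end point among the last three reference frames
    end_j = min(range(max(0, n_r - 3), n_r), key=lambda j: D[n_t - 1][j])
    if D[n_t - 1][end_j] == INF:
        return INF, []                        # no admissible path

    # step (5): follow the back-pointers
    path, j = [], end_j
    for i in range(n_t - 1, -1, -1):
        path.append((i, j))
        j = back[i][j]
    path.reverse()
    return D[n_t - 1][end_j], path


dist, path = dtw([[0.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]])
```

For two identical sequences the path is the diagonal and the accumulated distance is zero.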

7 DTW algorithm (3)  D[N_T,j'] is the distance between T and the reference R_k. Repeating the same procedure for every reference R_k in the vocabulary and applying the minimal distance principle gives the best-matching word for the input word.  This algorithm was widely used before HMMs came into use. Its disadvantage is the long computing time. The VQ algorithm was introduced to reduce it.
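
The minimal distance decision can be sketched as below. The helper names `dtw_distance` and `recognize` and the toy templates are hypothetical; frames are scalars with absolute difference as d, and only the accumulated distance is kept (no backtrace), all of which are simplifying assumptions.

```python
def dtw_distance(test, ref):
    """DTW distance between two sequences of scalar frames (no path)."""
    INF = float("inf")
    prev = [INF] * len(ref)
    prev[0] = abs(test[0] - ref[0])
    for t in test[1:]:
        cur = [INF] * len(ref)
        for j, r in enumerate(ref):
            best = min(prev[max(0, j - 2): j + 1])   # predecessors j-2, j-1, j
            if best < INF:
                cur[j] = abs(t - r) + best
        prev = cur
    return min(prev[-3:])                            # relaxed end point


def recognize(test, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


templates = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}
best_word = recognize([1, 1, 2, 3, 4], templates)
```

Each recognition runs one full DTW per vocabulary entry, which is where the computing time mentioned above comes from.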

8 9.3 Basic idea of VQ algorithm (1)  VQ stands for Vector Quantization, as opposed to scalar quantization.  The basic idea is to partition the whole feature space into a fixed number (2^n) of regions and to represent any vector that falls into a region by the center of that region. If a table of distances between centers is computed before recognition, distance calculation reduces to a table lookup, which saves computing time.

9 Basic idea of VQ algorithm (2)  To do this, a clustering algorithm such as k-means is used to iteratively update the cluster centers and the membership of the vectors until convergence.  The cluster centers form the codebook. Each vector is assigned the code label of its nearest center (minimal distance principle), so every vector is replaced by a code label.

10 Basic idea of VQ algorithm (3)  The distance between two vectors is approximated by the distance between their two centers. These distances can be pre-calculated as soon as the codebook is obtained, so during recognition, once the codes are known, each distance calculation is a table lookup.
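
A small sketch of the encode-then-look-up idea follows. The three-codeword codebook, the function names, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebook):
    """Code label = index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def build_table(codebook):
    """Pre-calculate all codeword-to-codeword distances once."""
    return [[sq_dist(a, b) for b in codebook] for a in codebook]


codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # toy codebook
table = build_table(codebook)

a = encode((0.1, -0.1), codebook)   # label of a test frame
b = encode((0.2, 1.9), codebook)    # label of a reference frame
approx = table[a][b]                # lookup instead of a vector distance
```

Reference utterances can be stored as label sequences, so at recognition time only the test frames need to be encoded.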

11 Basic idea of VQ algorithm (4)  This speeds up the DTW process.  VQ is used not only in speech recognition but also in speech synthesis and speech coding.  VQ also has many applications in multimedia data compression. In that case the restored data contains some error.

12 9.4 The LBG Algorithm (1)  (1) Let S = { x } be the set of all sample vectors.  (2) Set the maximal number of iterations L.  (3) Set the threshold t.  (4) Set m initial centers y_1(0), y_2(0), …, y_m(0).  (5) Set the initial distortion D(0) = ∞.  (6) Set the iteration number k = 1.  (7) Partition all samples into S_1(k), S_2(k), …, S_m(k) according to the minimum distance principle.

13 The LBG Algorithm (2)  (8) Calculate the total distortion: D(k) = Σ_{i=1}^{m} Σ_{x ∈ S_i(k)} d(x, y_i(k-1))  (9) Calculate the relative improvement of the distortion: δ(k) = |D(k) - D(k-1)| / D(k)  (10) Calculate the new centers (codewords): y_i(k) = (Σ_{x ∈ S_i(k)} x) / N_i(k), i = 1, …, m, where N_i(k) is the number of samples in S_i(k)  (11) If δ(k) < t, go to (13); otherwise go to (12).

14 The LBG Algorithm (3)  (12) If ++k < L, go to (7); otherwise go to (13).  (13) Output the codewords y_1(k), y_2(k), …, y_m(k) and D(k).  (14) End.  This partition is called a Voronoi partition. It implies that the total intra-class distance does not increase from one iteration to the next.
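
Steps (1) to (14) can be sketched as the loop below. Several details are assumptions rather than part of the slides: d is the squared Euclidean distance, an empty cell keeps its previous center, a zero distortion stops the loop to avoid dividing by zero, and the iteration cap is a plain `range(max_iter)`. The function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def lbg(samples, centers, max_iter=50, threshold=1e-3):
    """Refine `centers` on `samples`; return (codewords, last distortion)."""
    centers = [tuple(c) for c in centers]
    prev_distortion = float("inf")                       # step (5)
    distortion = prev_distortion
    for _ in range(max_iter):                            # steps (6) and (12)
        # step (7): nearest-center partition, step (8): total distortion
        cells = [[] for _ in centers]
        distortion = 0.0
        for x in samples:
            i = min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
            cells[i].append(x)
            distortion += sq_dist(x, centers[i])
        # step (10): new codewords; empty cells keep the old center (assumption)
        centers = [mean(cell) if cell else old
                   for cell, old in zip(cells, centers)]
        # steps (9) and (11): relative improvement against the threshold
        if distortion == 0.0:
            break
        if abs(distortion - prev_distortion) / distortion < threshold:
            break
        prev_distortion = distortion
    return centers, distortion                           # step (13)


samples = [(0, 0), (0, 1), (10, 10), (10, 11)]
codewords, final_distortion = lbg(samples, [(0, 0), (10, 10)])
```

As in step (8), the reported distortion is measured against the centers from the previous iteration, so it is one update behind the returned codewords.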

15 The LBG Algorithm (4)  Approaches for setting the initial codebook:  (1) Random initial codebook  The initial centers are chosen arbitrarily, which may give a poor start. One refinement is to require each new center to be farther than a threshold from all centers already chosen; the m initial centers are set this way.
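
One possible reading of the thresholded random initialization is sketched here. Drawing the centers from the samples themselves, the fixed seed, and the name `random_codebook` are assumptions; if the threshold is too large the function simply returns fewer than m centers.

```python
import math
import random


def random_codebook(samples, m, min_dist, seed=0):
    """Pick up to m samples that are pairwise farther apart than min_dist."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)                      # arbitrary visiting order
    centers = []
    for x in order:
        # Accept x only if it is far enough from every center chosen so far.
        if all(math.dist(x, c) > min_dist for c in centers):
            centers.append(x)
            if len(centers) == m:
                break
    return centers


samples = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
initial = random_codebook(samples, m=3, min_dist=1.0)
```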

16 The LBG Algorithm (5)  (2) Splitting approach  First compute the center y_0 of all samples, then take y_1 as the sample with maximal distance to y_0 and y_2 as the sample with maximal distance to y_1. Partition the samples into S_1 and S_2 around y_1 and y_2.  Applying the same procedure to S_1 and S_2 splits them into 4 sets. Repeating this for B iterations gives m = 2^B initial centers.  This approach is often used.
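
The splitting approach might be sketched as follows. The slides do not say whether the initial centers are the far points y_1, y_2 themselves or the centroids of the resulting sets; this sketch returns the far points chosen at the last level, which is an assumption, as are the function names and the error raised when a set cannot be split.

```python
import math


def centroid(vectors):
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def split_once(cell):
    """Split one set into two around its two mutually distant samples."""
    y0 = centroid(cell)
    y1 = max(cell, key=lambda x: math.dist(x, y0))   # farthest from the center
    y2 = max(cell, key=lambda x: math.dist(x, y1))   # farthest from y1
    s1 = [x for x in cell if math.dist(x, y1) <= math.dist(x, y2)]
    s2 = [x for x in cell if math.dist(x, y1) > math.dist(x, y2)]
    if not s2:
        raise ValueError("set has fewer than two distinct samples")
    return (y1, s1), (y2, s2)


def splitting_codebook(samples, bits):
    """Return 2**bits initial centers after `bits` rounds of splitting."""
    level = [(centroid(samples), list(samples))]
    for _ in range(bits):
        next_level = []
        for _, cell in level:
            next_level.extend(split_once(cell))
        level = next_level
    return [y for y, _ in level]


samples = [(0, 0), (1, 0), (10, 0), (12, 0)]
two = splitting_codebook(samples, bits=1)
four = splitting_codebook(samples, bits=2)
```

The resulting centers would normally be handed to the LBG iteration as y_1(0), …, y_m(0).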

17 9.5 The improvement of VQ (1)  (1) VQ system with tree search (codebook structure)  In VQ recognition, every vector needs a search to find its code (label). In general the vector must be compared with every center, which takes time. If the codebook is organized as a tree and the codewords of all levels are kept, the search becomes much cheaper. The cost is roughly double the storage.

18 The improvement of VQ (2)  It can be built like this: first a codebook of size 2 is generated, with codewords y_0 and y_1 and their corresponding subsets. Then, for each of these subsets, the next level of centers is created: y_00, y_01, y_10 and y_11. This is the second level.

19 The improvement of VQ (3)  Repeating this for k steps creates a k-level tree with 2^k codewords at the bottom level.  The advantages: the search cost drops to 2k (vs 2^k) distance calculations and k (vs 2^k - 1) comparisons. The training cost is also reduced, since every split only involves a two-codeword codebook.  The disadvantages: the average distortion is worse than with a full-search codebook, and the storage is doubled.  Besides a binary tree, other numbers of codewords per level can be used.
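
The search side of a tree codebook can be sketched like this. The `Node` class, the hand-built two-level tree and the returned distance count are illustrative assumptions; training the tree is not shown.

```python
import math


class Node:
    """One codeword in the tree; leaves have no children."""
    def __init__(self, codeword, children=()):
        self.codeword = codeword
        self.children = list(children)


def tree_search(x, root):
    """Descend to a leaf; return (leaf codeword, distance calculations used)."""
    node, n_dist = root, 0
    while node.children:
        n_dist += len(node.children)        # one distance per child at this level
        node = min(node.children, key=lambda c: math.dist(x, c.codeword))
    return node.codeword, n_dist


# Toy tree with k = 2 levels and 2^2 = 4 leaf codewords.
root = Node(None, [
    Node((2.0, 0.0), [Node((1.0, 0.0)), Node((3.0, 0.0))]),
    Node((8.0, 0.0), [Node((7.0, 0.0)), Node((9.0, 0.0))]),
])
leaf, cost = tree_search((6.5, 0.0), root)
```

With k = 2 the search uses 2k = 4 distance calculations, the same as a full search over 4 codewords; the saving appears for larger k, for example 20 instead of 1024 at k = 10.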

20 The improvement of VQ (4)  (2) Tree codebook formed from a full-search codebook  First a full-search codebook is created with the LBG algorithm. Then the m codewords are grouped into m/2 pairs by minimal distance and the center of each pair is computed. The next level up is created in the same way. After k steps the tree is formed. This works better than the previous method, but the tree search can still occasionally make mistakes.
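
One way to read the bottom-up construction is sketched below. Greedy closest-pair matching, the unweighted midpoint as the pair center, and the requirement that m be a power of two are all assumptions; the slides only say that pairs are formed by minimal distance.

```python
import math
from itertools import combinations


def pair_level(codewords):
    """Greedily pair the closest codewords; return the parent centers."""
    remaining = list(codewords)
    parents = []
    while len(remaining) > 1:
        # Closest pair among the codewords not yet paired.
        a, b = min(combinations(remaining, 2), key=lambda p: math.dist(*p))
        remaining.remove(a)
        remaining.remove(b)
        # Assumed pair center: unweighted midpoint.
        parents.append(tuple((u + v) / 2 for u, v in zip(a, b)))
    return parents


def build_levels(codewords):
    """Return the levels from the leaves (index 0) up to the single root."""
    levels = [list(codewords)]
    while len(levels[-1]) > 1:
        levels.append(pair_level(levels[-1]))
    return levels


leaves = [(1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (9.0, 0.0)]
levels = build_levels(leaves)
```

The leaves stay exactly the full-search codewords, so only the descent through the upper levels can send a vector to a leaf other than its nearest one.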

21 The improvement of VQ (5)  (3) Multi-level VQ systems  (4) Split VQ (suitable for LSP vectors)  (5) Fast search in a full-search system

