
1 Dynamic Time Warping and training methods
Final Exam Project by Hesheng Li. Instructor: Dr. Kepuska. Department of Electrical and Computer Engineering, © Florida Institute of Technology

2 Overview
Introduction
DTW and implementation
Robust training and clustering training
Result

3 Introduction The same speech utterance is seldom realized at the same speed across the entire utterance. Speaking-rate fluctuation therefore needs to be normalized out for the comparison of utterances to be meaningful before a recognition decision can be made.

4 Introduction

5 DTW For isolated word recognition we need to determine the distance or similarity between the reference sequence R = {r(1), r(2), …, r(I)} and the test sequence T = {t(1), …, t(J)}. A local distance is defined between the two frames brought together by each warping-function pair, and the DTW distance of R and T is defined as the minimum overall distance among all admissible paths.
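The local-distance and path-distance formulas on this slide were figures in the original; below is a hedged sketch of the standard formulation (following Rabiner & Juang), where a warping path is a sequence of index pairs (i_k, j_k), k = 1, …, K, with i_k indexing R and j_k indexing T, w(k) is a path weight and N a normalization factor (both introduced on the following slides).

```latex
% Local distance between reference frame r(i) and test frame t(j):
\[
  d(i, j) = \lVert\, r(i) - t(j) \,\rVert
\]
% DTW distance: minimum normalized accumulated distance over all admissible
% warping paths \{(i_k, j_k)\}_{k=1}^{K}:
\[
  D(R, T) = \min_{\{(i_k, j_k)\}} \; \frac{1}{N} \sum_{k=1}^{K} d(i_k, j_k)\, w(k)
\]
```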

6 DTW Type (a), symmetric: Wa(k) = [t(k) - t(k-1)] + [r(k) - r(k-1)]
Type (b), asymmetric: Wb1(k) = t(k) - t(k-1); Wb2(k) = r(k) - r(k-1)

7 DTW Type (c): Wc(k) = min{[t(k) - t(k-1)], [r(k) - r(k-1)]}
Type (d): Wd(k) = max{[t(k) - t(k-1)], [r(k) - r(k-1)]}

8 Normalization factor For weighting function type (a): Na = T + R
For type (b1): Nb1 = T. For type (b2): Nb2 = R. For types (c) and (d) the factor depends on the shape of the path.
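As a hedged side note on why the type-(a) factor is path independent: with the convention t(0) = r(0) = 0, the symmetric weights telescope along any path from (1, 1) to (T, R), as sketched below. Types (c) and (d) take a min or max at each step, so the sum no longer telescopes and the factor depends on the path.

```latex
\[
  N_a = \sum_{k=1}^{K} w_a(k)
      = \sum_{k=1}^{K} \Bigl( \bigl[ t(k) - t(k-1) \bigr] + \bigl[ r(k) - r(k-1) \bigr] \Bigr)
      = t(K) + r(K) = T + R
\]
```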

9 Path limitations Monotonic condition: the path never turns back on itself; both the i and j indices either stay the same or increase, they never decrease. Continuity condition: the path advances one step at a time; i and j can each increase by at most 1 on any step along the path. Boundary condition: the path starts at the bottom left and ends at the top right.

10 Path limitations Adjustment window condition: a good path is unlikely to wander very far from the diagonal, so the distance the path is allowed to stray is limited by the window width r. Slope constraint condition: the path should be neither too steep nor too shallow, which prevents very short sequences from matching very long ones. The condition is expressed as a ratio n/m, where m is the number of steps in the x direction and n the number in the y direction: after m consecutive steps in x the path must take a step in y, and vice versa.
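A minimal sketch of the adjustment-window test (hypothetical helper name; the slope constraint would be enforced inside the recursion by limiting runs of purely horizontal or purely vertical steps):

```python
def in_adjustment_window(i, j, r):
    """Adjustment-window condition: cell (i, j) may be visited only if it
    lies within r cells of the diagonal i = j."""
    return abs(i - j) <= r
```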

11 DTW-DP Algorithm If D(i, j) is the global distance up to and including point (i, j), and the local distance at (i, j) is given by d(i, j), then D(i, j) = min [D(i-1, j-1), D(i-1, j), D(i, j-1)] + d(i, j). The final global distance D(M, N) at the end of the path gives the overall lowest matching score of the template against the utterance, where M is the number of vectors in the utterance and N the number of vectors in the template. Given the initial condition D(1, 1) = d(1, 1), we can compute D(i, j) recursively over increasing i and j within the adjustment window.
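A minimal Python sketch of this recursion; it is not the author's implementation and assumes Euclidean local distances between feature vectors plus an optional adjustment window r:

```python
import numpy as np

def dtw_distance(template, utterance, r=None):
    """DTW via D(i, j) = min(D(i-1, j-1), D(i-1, j), D(i, j-1)) + d(i, j).

    template, utterance: 2-D arrays of feature vectors (frames x coefficients).
    r: optional adjustment-window half-width; it must be at least
       |len(template) - len(utterance)| for the end cell to be reachable.
    """
    n, m = len(template), len(utterance)
    D = np.full((n, m), np.inf)
    D[0, 0] = np.linalg.norm(template[0] - utterance[0])      # D(1, 1) = d(1, 1)
    for i in range(n):
        for j in range(m):
            if (i, j) == (0, 0) or (r is not None and abs(i - j) > r):
                continue                                      # skip start / outside window
            d_ij = np.linalg.norm(template[i] - utterance[j])  # local distance
            best_prev = min(D[i - 1, j - 1] if i and j else np.inf,
                            D[i - 1, j] if i else np.inf,
                            D[i, j - 1] if j else np.inf)
            D[i, j] = best_prev + d_ij
    return D[n - 1, m - 1]                                    # global distance at path end
```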

12 DTW-DP Algorithm Proof:

13 DTW-DP Algorithm

14 DTW-DP Algorithm Implementation steps:
1. Calculate the global distance for the bottom-most cell of the left-most column (column 0). Column 0 is then designated predCol (the predecessor column). The calculation proceeds upward in column 0.
2. Calculate the global distance to the bottom-most cell of the next column, column 1 (designated curCol, the current column).
3. Calculate the global distance of the remaining cells of curCol. For example, at cell (i, j) this is the local distance at (i, j) plus the minimum of the global distances at (i-1, j), (i-1, j-1), and (i, j-1).
4. curCol becomes predCol, and steps 2 and 3 are repeated until all columns have been calculated.
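A sketch of these steps in Python, keeping only predCol and curCol in memory (same assumptions as the earlier sketch: Euclidean local distance, rows indexing the template, columns indexing the utterance):

```python
import numpy as np

def dtw_two_columns(template, utterance):
    """Column-by-column DTW keeping only predCol and curCol in memory."""
    n, m = len(template), len(utterance)

    def d(i, j):                          # local distance at cell (i, j)
        return np.linalg.norm(template[i] - utterance[j])

    # Step 1: column 0, computed upward; only vertical moves are possible.
    pred_col = np.empty(n)
    pred_col[0] = d(0, 0)
    for i in range(1, n):
        pred_col[i] = pred_col[i - 1] + d(i, 0)

    # Steps 2-4: each new column needs predCol and the cell just below in curCol.
    for j in range(1, m):
        cur_col = np.empty(n)
        cur_col[0] = pred_col[0] + d(0, j)                # bottom-most cell
        for i in range(1, n):
            cur_col[i] = min(pred_col[i - 1], pred_col[i], cur_col[i - 1]) + d(i, j)
        pred_col = cur_col                                # curCol becomes predCol
    return pred_col[-1]                                   # global distance at path end
```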

15 Training: creation of reference patterns
How to construct the reference patterns:
1. Robust training
2. Clustering training: Unsupervised Clustering Without Average (UWA), Modified K-means algorithm (MKM)

16 Robust Training Let X1 = (x1, x2, x3, …, xn) be the first token. Compare it with X2 via a DTW process, resulting in a distortion score D(X1, X2). If D(X1, X2) < (threshold), the reference pattern is computed as a warped average of X1 and X2.
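A rough sketch of this warped-averaging step. Here dtw_align is an assumed helper returning the DTW distortion and the alignment path as (i, j) index pairs that satisfy the boundary and continuity conditions (so every frame of X1 appears on the path at least once); names are illustrative, not the author's code:

```python
import numpy as np

def robust_average(x1, x2, dtw_align, threshold):
    """Warped average of two tokens (sketch of the robust-training step).

    x1, x2: 2-D arrays of feature vectors (frames x coefficients).
    Returns the new reference pattern, or None if D(X1, X2) is not below threshold.
    """
    dist, path = dtw_align(x1, x2)
    if dist >= threshold:
        return None
    warped = np.zeros_like(x1, dtype=float)   # frames of x2 mapped onto x1's time axis
    counts = np.zeros(len(x1))
    for i, j in path:
        warped[i] += x2[j]
        counts[i] += 1
    return (x1 + warped / counts[:, None]) / 2.0   # frame-by-frame average
```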

17 Robust Training Drawbacks: 1. A single template is inadequate for words that have more than one mode, such as the word "eight". 2. The inadequacy becomes even more serious when dealing with speaker-independent tasks.

18 Clustering The task of clustering is to group the L training patterns into N clusters such that, within each cluster, the patterns are highly similar under the pattern dissimilarity measure chosen for the recognizer design and hence can be efficiently represented by a typical template. The main advantages of pattern clustering are the statistical consistency of the generated templates and their ability to cope with a wide range of individual speech variations in a speaker-independent environment.

19 Unsupervised Clustering Without Average
Distance (dissimilarity) matrix D: let X = {x1, x2, …, xL} be the set of training patterns, and define a dissimilarity between a pair of patterns as d(xi, xj). Then an L x L matrix D is defined with ij-th entry dij = 0.5 * [d(xi, xj) + d(xj, xi)]. Minmax "center": the minmax center of a set is the pattern in the set whose maximum distance to any other pattern in the set is the smallest.
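A small sketch of these two definitions; `d` is an assumed pairwise distortion function (e.g. the DTW distance sketched earlier) and the function names are illustrative:

```python
import numpy as np

def dissimilarity_matrix(tokens, d):
    """L x L symmetrised dissimilarity matrix: d_ij = 0.5 * [d(x_i, x_j) + d(x_j, x_i)]."""
    L = len(tokens)
    D = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            D[i, j] = D[j, i] = 0.5 * (d(tokens[i], tokens[j]) + d(tokens[j], tokens[i]))
    return D

def minmax_center(D, members):
    """Minmax center: the member whose maximum distance to any other member is smallest."""
    members = list(members)
    sub = D[np.ix_(members, members)]
    return members[int(np.argmin(sub.max(axis=1)))]
```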

20 Unsupervised Clustering Without Average
The UWA clustering algorithm:
1. Initialization: set j = 0 and k = 0, and let the current training set be the full set X.
2. Determine the minmax center of the current training set by making use of D, the distortion matrix.
3. Form the cluster by including all patterns in the current training set that are within a distance threshold of the minmax center.

21 Unsupervised Clustering Without Average
4. Determine the new minmax center of the cluster by making use of D.
5. If the new center is the same as the previous one, that is, the cluster composition is unchanged after one iteration, convergence is obtained: increment j (j = j + 1), remove the clustered patterns, and form the new partial training set. If that set is not empty and j is smaller than the maximum number of clusters allowed, go back to step 2 and iterate; otherwise, stop. If the center has changed, increment k and recompute the cluster from step 3.

22 UWA clustering flowchart: initialization → compute distance matrix D → compute minmax center of the training set, k = 0 → determine cluster set → compute minmax center of the cluster → if the center changed and k ≤ kmax, set k = k + 1 and recompute the cluster; otherwise set j = j + 1 → if the remaining set is not empty and j < N, return to cluster formation; otherwise done.
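A compact sketch of the loop in this flowchart, reusing the minmax_center helper from the sketch above; threshold, the cluster cap N, and the inner-iteration cap kmax are the user-supplied parameters:

```python
def uwa_cluster(D, threshold, n_clusters, k_max=20):
    """Unsupervised Clustering Without Average (sketch).

    D: L x L dissimilarity matrix; threshold: user-prescribed cluster radius
    around the minmax center; n_clusters: maximum number of clusters N;
    k_max: cap on inner iterations. Returns (clusters, outliers) as index lists.
    """
    remaining = list(range(len(D)))
    clusters = []
    for _ in range(n_clusters):                              # at most N clusters
        if not remaining:
            break
        center = minmax_center(D, remaining)                 # step 2
        for _ in range(k_max):
            cluster = [i for i in remaining
                       if D[center, i] <= threshold]         # step 3
            new_center = minmax_center(D, cluster)           # step 4
            if new_center == center:                         # step 5: composition unchanged
                break
            center = new_center
        clusters.append(cluster)
        remaining = [i for i in remaining if i not in cluster]
    return clusters, remaining                               # leftovers are outliers
```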

23 Unsupervised Clustering Without Average
Some inherent problems: 1. A distance threshold has to be prescribed by the user to define the compactness of a cluster. 2. Depending on the distance threshold, the procedure does not automatically guarantee coverage of the entire training set.

24 Example Clustering the 400 tokens of "operator" into 6 clusters:

Center of cluster   277   375   134   385   149   103   Outliers
Number in cluster   219    92    60    10     6     4        9

25

26

27 Result Testing result:

                  "Wildfire"                    "Operator"
Score      Error number   Error rate     Correct number   Correct rate
 900            23           0.15              79             0.53
 950            33           0.22              94             0.62
1000            39           0.26             105             0.69
1050            46           0.31             116             0.76

28 Reference Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001. Rabiner and Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.

