Published by Janis Hood. Modified over 8 years ago.
1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition
Isolated word recognition
Task: build an isolated ‘word’ recogniser, e.g. for voice dialling on mobile phones
Method:
1. Record, parameterise and store a vocabulary of reference words
2. Record the test word to be recognised and parameterise it
3. Measure the distance between the test word and each reference word
4. Choose the reference word ‘closest’ to the test word
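The four-step method can be sketched as a nearest-reference search. This is only an illustration: `recognise`, `mean_gap` and the toy vocabulary are hypothetical names and data, and `mean_gap` is a stand-in for whatever distance measure is actually used (the later slides develop DTW for that role).

```python
def recognise(test, references, distance):
    """Return the label of the stored reference word closest to `test`.

    `references` maps a word label to its stored parameterisation;
    `distance` is any function returning a non-negative number.
    """
    return min(references, key=lambda label: distance(test, references[label]))


# Toy stand-in distance (hypothetical): gap between the mean values.
def mean_gap(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Toy vocabulary (hypothetical): one parameter per frame.
vocabulary = {"yes": [4, 7, 4], "no": [1, 1, 2]}
print(recognise([5, 3, 9, 7, 3], vocabulary, mean_gap))  # yes
```

Passing the distance in as a function keeps the search separate from the choice of distance measure, so the same loop works once a DTW distance is available.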
2 Words are parameterised on a frame-by-frame basis
Choose a frame length over which speech remains reasonably stationary
Overlap frames, e.g. 40ms frames with a 10ms frame shift
We want to compare frames of test and reference words, i.e. calculate distances between them
[Figure: overlapping frames, labelled 40ms and 20ms]
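A minimal sketch of overlapping framing with the 40ms frame length and 10ms shift mentioned on the slide. The function name `frame_signal`, the 8 kHz sampling rate and the dummy signal are assumptions for illustration; the parameterisation applied to each frame is not shown.

```python
def frame_signal(samples, sample_rate, frame_ms=40, shift_ms=10):
    """Split `samples` into overlapping frames; an incomplete final frame is dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples between frame starts
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, shift)]


# 100 ms of dummy audio at an assumed 8 kHz sampling rate.
signal = list(range(800))
frames = frame_signal(signal, sample_rate=8000)
print(len(frames), len(frames[0]))  # 7 320
```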
3 Calculating Distances
Easy: sum the differences between corresponding frames
Problem: the number of frames won’t always correspond
4 Solution 1: Linear Time Warping
Stretch the shorter sound
Problem? Some sounds stretch more than others
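Linear time warping can be sketched as stretching the shorter sequence by linear interpolation and then summing the differences between corresponding frames. The names `stretch` and `linear_warp_distance` are hypothetical, interpolation is one possible way of stretching, and the one-value-per-frame sequences are the Test and Reference values used in the DTW slides.

```python
def stretch(seq, new_len):
    """Linearly interpolate `seq` onto `new_len` evenly spaced points."""
    if new_len == 1 or len(seq) == 1:
        return [seq[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (len(seq) - 1) / (new_len - 1)   # position on the old time axis
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def linear_warp_distance(a, b):
    """Stretch the shorter sequence, then sum differences of corresponding frames."""
    n = max(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(stretch(a, n), stretch(b, n)))


print(stretch([4, 7, 4], 5))                             # [4.0, 5.5, 7.0, 5.5, 4.0]
print(linear_warp_distance([5, 3, 9, 7, 3], [4, 7, 4]))  # 8.0
```

Every part of the shorter word is stretched by the same factor here, which is exactly the weakness the slide points out.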
5 Solution 2: Dynamic Time Warping (DTW)
Using a dynamic alignment, make the most similar frames correspond
Find the distance between two utterances using these corresponding frames
Test: 5 3 9 7 3
Reference: 4 7 4
6 Digression: Dynamic Programming
The shortest route from Dublin to Limerick goes through:
–Kildare
–Monasterevin
–Portlaoise
–Mountrath
–Roscrea
–Nenagh
Now consider the shortest route from Dublin to Nenagh
–What towns does the route go through?
–The same towns as far as Roscrea: each part of a shortest route is itself a shortest route
7 Intercity Example
9 Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix:

Test value  Test frame     c=1  c=2  c=3
    3           5           1    4    1
    7           4           3    0    3
    9           3           5    2    5
    3           2           1    4    1
    5           1           1    2    1
Reference frame             1    2    3
Reference value             4    7    4

Compute the minimum distance to each point and place it in the mindist matrix:
mindist(5,3) = min {1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}

Test value  Test frame     c=1  c=2  c=3
    3           5          11    8    5
    7           4          10    4    7
    9           3           7    4    9
    3           2           2    5    4
    5           1           1    3    4
Reference frame             1    2    3
Reference value             4    7    4

We can also find the path through the grid that minimises the total cost of the path
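The mindist recurrence on this slide can be sketched as follows, using the absolute difference as the frame distance for the one-dimensional Test and Reference values. The function names `dtw_mindist` and `best_path` are hypothetical, and indices are 0-based here, so mindist(5,3) on the slide is `mindist[4][2]` in the code.

```python
def dtw_mindist(test, reference):
    """Return the cumulative minimum-distance matrix (0-based indices)."""
    rows, cols = len(test), len(reference)
    mindist = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = abs(test[r] - reference[c])          # local frame distance
            candidates = []
            if r > 0:
                candidates.append(mindist[r - 1][c])
            if c > 0:
                candidates.append(mindist[r][c - 1])
            if r > 0 and c > 0:
                candidates.append(mindist[r - 1][c - 1])
            # The start cell has no predecessor, so it costs only its own distance.
            mindist[r][c] = d + (min(candidates) if candidates else 0)
    return mindist


def best_path(mindist):
    """Trace back from the final cell to (0, 0) via the cheapest predecessor."""
    r, c = len(mindist) - 1, len(mindist[0]) - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        steps = []
        if r > 0 and c > 0:
            steps.append((mindist[r - 1][c - 1], r - 1, c - 1))
        if r > 0:
            steps.append((mindist[r - 1][c], r - 1, c))
        if c > 0:
            steps.append((mindist[r][c - 1], r, c - 1))
        _, r, c = min(steps)
        path.append((r, c))
    return path[::-1]


m = dtw_mindist([5, 3, 9, 7, 3], [4, 7, 4])
print(m[4][2])        # 5
print(best_path(m))   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
```

In the slide's 1-based numbering the recovered path is (1,1), (2,1), (3,2), (4,2), (5,3), whose frame distances 1 + 1 + 2 + 0 + 1 sum to 5.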
10 Examples so far are uni-dimensional
Speech is multi-dimensional
e.g. two dimensions, using points (4,3) and (5,2)
[Figure: the two points plotted on axes numbered 1 to 5]
Distance equation for 2 dimensions: [equation not preserved]
Distance equation for multi-dimensional: [equation not preserved]
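The slide's equations did not survive in this transcript. Assuming the usual Euclidean distance was intended (an assumption, not confirmed by the text), the two-dimensional case is d = sqrt((x1 - x2)^2 + (y1 - y2)^2), and the multi-dimensional case sums the squared differences over every dimension. A minimal sketch, with `euclidean` as a hypothetical name:

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The two points from the slide: differences of 1 in each dimension.
print(euclidean((4, 3), (5, 2)))  # sqrt(2), about 1.414
```

With one dimension this reduces to the absolute difference used in the earlier grid example.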
11 Constraints
Global
–Endpoint detection
–Path should be close to diagonal
Local
–Must always travel upwards or eastwards
–No jumps
–Slope weighting
–Consecutive moves upwards/eastwards
12 Global Constraints
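One common way to keep the path close to the diagonal is to allow only grid cells inside a fixed-width band around it. The slide's figure is not preserved, so the band shape, the name `in_band` and the `width` parameter below are assumptions for illustration rather than the slide's own definition.

```python
def in_band(r, c, n_test, n_ref, width):
    """True if cell (r, c) lies within `width` rows of the diagonal (0-based indices)."""
    # Row where the straight line from (0, 0) to the final cell crosses column c.
    # max(..., 1) only guards against division by zero for a one-frame reference.
    diagonal_r = c * (n_test - 1) / max(n_ref - 1, 1)
    return abs(r - diagonal_r) <= width


# Allowed cells for a grid of 5 test frames by 3 reference frames.
allowed = [(r, c) for r in range(5) for c in range(3) if in_band(r, c, 5, 3, width=1)]
print(allowed)
```

Cells outside the band can be skipped when filling the mindist matrix, which also reduces computation.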
13 Local Constraints
[Figure: cell mindist(r,c) with its predecessors mindist(r,c-1), mindist(r-1,c) and mindist(r-1,c-1); weights 1, 1, 2]
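Slope weighting can be sketched by multiplying the local frame distance by a weight that depends on the move taken. The assignment below (weight 1 for the moves from mindist(r-1,c) and mindist(r,c-1), weight 2 for the diagonal move from mindist(r-1,c-1)) is an assumption read off the order of the labels on the slide, and the start cell is given weight 1, also an assumption. `weighted_dtw` is a hypothetical name.

```python
def weighted_dtw(test, reference, w_up=1, w_east=1, w_diag=2):
    """DTW total cost with a weight applied to the local distance for each move type."""
    rows, cols = len(test), len(reference)
    inf = float("inf")
    mindist = [[inf] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = abs(test[r] - reference[c])          # local frame distance
            if r == 0 and c == 0:
                mindist[r][c] = d                    # assumed weight of 1 at the start
                continue
            best = inf
            if r > 0:                                # move from (r-1, c)
                best = min(best, mindist[r - 1][c] + w_up * d)
            if c > 0:                                # move from (r, c-1)
                best = min(best, mindist[r][c - 1] + w_east * d)
            if r > 0 and c > 0:                      # diagonal move from (r-1, c-1)
                best = min(best, mindist[r - 1][c - 1] + w_diag * d)
            mindist[r][c] = best
    return mindist[-1][-1]


test_word, reference_word = [5, 3, 9, 7, 3], [4, 7, 4]
print(weighted_dtw(test_word, reference_word))            # 8
print(weighted_dtw(test_word, reference_word, 1, 1, 1))   # 5, the unweighted grid example
```

Charging the diagonal move double removes its built-in advantage: a diagonal step advances both sequences at once, so without the weight it is cheaper than the equivalent pair of upward and eastward steps.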
14 Points to Note
DTW is really only suitable for small vocabularies and/or speaker-dependent recognition
Should normalise for reference length
Can use multiple utterances and cluster them
Poor performance if recording environment changes
High computation cost
15 Evaluation
Performance of designs is only comparable by evaluation
Use a test set
For single word recognition we can simply quote % accuracy: [equation not preserved]
In error analysis, it can be helpful to use a confusion matrix
16 Confusion Matrix

                 references
test tokens      yes    no
    yes           24     2
    no             3    21
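Percentage accuracy can be read off the confusion matrix as the diagonal (correct) count divided by the total number of test tokens. The sketch below uses the counts from the slide; treating rows as test tokens and columns as references is an assumption about the original layout, and it does not affect the accuracy figure. `accuracy` is a hypothetical name.

```python
# Rows: test tokens; columns: references (orientation assumed from the slide layout).
confusion = {
    "yes": {"yes": 24, "no": 2},
    "no": {"yes": 3, "no": 21},
}


def accuracy(matrix):
    """Percentage of tokens on the diagonal, i.e. recognised correctly."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


print(accuracy(confusion))  # 90.0
```

The off-diagonal cells (2 and 3) show which confusions account for the remaining 10%.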