
DTW for Speech Recognition
J.-S. Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
MIR Lab (Multimedia Information Retrieval Lab)
CS Dept., Tsing Hua Univ.

-2- Dynamic Time Warping (DTW)
- Characteristics:
  - Pattern-matching-based approach
  - Requires less memory/computation
  - Suitable for speaker-dependent recognition
  - Suitable for small to medium vocabulary
  - Suitable for microprocessor/chip implementation
- Applications:
  - Speaker identification & verification for surveillance
  - Voice commands for mobile phones, toys

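-3- Dynamic Time Warping: Type 1
[Figure: DTW grid with axes i and j; the input axis is labeled t(i-1), t(i) and the reference axis r(j-1), r(j)]
- t: input MFCC matrix (each column is a frame's feature)
- r: reference MFCC matrix
- Local paths: 27-45-63 degrees
- DTW recurrence: [equation image not preserved in this transcript]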
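For 27-45-63 local paths, the recurrence is commonly written as below. This is a reconstruction under assumptions, not the slide's own equation: D(i, j) denotes the accumulated distance of the best path ending at cell (i, j), and d(., .) is a local frame distance such as Euclidean distance. Both symbols are introduced here for illustration.

```latex
D(i, j) = d\bigl(t(i), r(j)\bigr) + \min
\begin{cases}
  D(i-1,\, j-2) \\
  D(i-1,\, j-1) \\
  D(i-2,\, j-1)
\end{cases}
```

Each step advances both indices by at least one, so the path slope stays between 1/2 and 2.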

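-4- Dynamic Time Warping: Type 2
[Figure: DTW grid with axes i and j; the input axis is labeled t(i-1), t(i) and the reference axis r(j-1), r(j)]
- t: input MFCC matrix (each row is a frame's feature)
- r: reference MFCC matrix
- Local paths: 0-45-90 degrees
- DTW recurrence: [equation image not preserved in this transcript]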
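For 0-45-90 local paths, a commonly used recurrence is D(i, j) = d(t(i), r(j)) + min{D(i-1, j), D(i-1, j-1), D(i, j-1)}. The sketch below is one way to compute it with both corners fixed. The names `frame_dist` and `dtw_type2` are hypothetical, the Euclidean local distance is an assumption, and each sequence is taken to be a list of frames (one feature vector per frame).

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2(t, r):
    """Total distance of the best 0-45-90 path from the first to the last cell."""
    n, m = len(t), len(r)
    INF = math.inf
    # One layer of padding: row 0 and column 0 are infinite, so the inner
    # loop needs no boundary checks. D[i][j] holds the cost of cell (i-1, j-1).
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # virtual cell that only the first real cell can reach cheaply
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(t[i - 1], r[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]
```

For example, `dtw_type2([[0.0], [2.0]], [[0.0], [1.0], [2.0]])` aligns the extra reference frame at a cost of 1.0.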

-5- Local Path Constraints
- Type 1: 27-45-63 local paths
- Type 2: 0-45-90 local paths

-6- Path Penalty for Type-1 DTW
- Path penalty
  - No penalty for the 45-degree path
  - Some penalty for paths that deviate from 45 degrees
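A minimal sketch of type-1 DTW with a path penalty follows. It assumes a single constant `penalty` added to the two non-diagonal moves, Euclidean local distance, and sequences given as lists of frames. The function names and the penalty form are illustrative assumptions, not the slide's definition.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1(t, r, penalty=0.0):
    """Total distance of the best 27-45-63 path with both corners fixed."""
    n, m = len(t), len(r)
    INF = math.inf
    if n == 0 or m == 0:
        return INF
    # Two layers of infinite padding, because the moves reach back two cells.
    # D[i][j] holds the cost of cell (i-2, j-2).
    D = [[INF] * (m + 2) for _ in range(n + 2)]
    D[2][2] = frame_dist(t[0], r[0])  # the path must start at the first cell
    for i in range(2, n + 2):
        for j in range(2, m + 2):
            if i == 2 and j == 2:
                continue
            best = min(
                D[i - 1][j - 1],            # 45-degree move, no penalty
                D[i - 2][j - 1] + penalty,  # about 27 degrees (i drawn horizontally)
                D[i - 1][j - 2] + penalty,  # about 63 degrees
            )
            D[i][j] = frame_dist(t[i - 2], r[j - 2]) + best
    return D[n + 1][m + 1]
```

With `penalty=0.5`, aligning `[[0.0], [1.0], [2.0]]` against `[[0.0], [2.0]]` needs one 27-degree move and costs 0.5 instead of 0. When one sequence is far more than twice as long as the other, no allowed path reaches the end corner and the function returns infinity.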

-7- DTW Paths of "Match Corners"
[Figure: alignment grid with axes i and j]
- We assume the speed of a user's acoustic input falls between 1/2 and 2 times that of the intended sentence.
- Both corners are fixed. (Endpoint detection is critical.)
- Suitable for voice command applications

-8- DTW Paths of "Match Anywhere"
[Figure: alignment grid with axes i and j]
- No fixed anchored positions
- Suitable for retrieval of personal spoken documents
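One possible reading of "match anywhere" is sketched below: a short query may start and end at any frame of a longer reference. The sketch assumes 0-45-90 local paths, Euclidean local distance, and sequences given as lists of frames. The function names and the returned tuple are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_match_anywhere(query, ref):
    """Return (total distance, start index, end index) of the best match in ref."""
    n, m = len(query), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")
    # Free start: the first query frame may align with any reference frame.
    prev_cost = [frame_dist(query[0], ref[j]) for j in range(m)]
    prev_start = list(range(m))
    for i in range(1, n):
        cur_cost = [math.inf] * m
        cur_start = [0] * m
        for j in range(m):
            # Candidate predecessors: (i-1, j), (i-1, j-1), (i, j-1).
            cands = [(prev_cost[j], prev_start[j])]
            if j > 0:
                cands.append((prev_cost[j - 1], prev_start[j - 1]))
                cands.append((cur_cost[j - 1], cur_start[j - 1]))
            best_cost, best_start = min(cands)
            cur_cost[j] = frame_dist(query[i], ref[j]) + best_cost
            cur_start[j] = best_start  # remember where this path began
        prev_cost, prev_start = cur_cost, cur_start
    # Free end: pick the cheapest cell for the last query frame.
    end = min(range(m), key=lambda j: prev_cost[j])
    return prev_cost[end], prev_start[end], end
```

For instance, searching for `[[5.0], [6.0], [7.0]]` inside `[[0.0], [0.0], [5.0], [6.0], [7.0], [0.0]]` returns a distance of 0.0 with the match spanning reference frames 2 to 4.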

-9- Other Variants
- Local constraints
- Start/ending area

-10- Implementation Issues
- To save memory
  - Use a 2-column table for type-1 DTW
  - Use a 1-column table for type-2 DTW
- To avoid too many if-then statements
  - Pad type-1 DTW with two-layer padding
  - Pad type-2 DTW with one-layer padding
- To find a suitable path
  - Minimizing total distance
  - Minimizing average distance
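The 1-column idea for type-2 DTW can be sketched as follows, assuming only the final distance is needed and not the path itself. A single column is updated in place, with one saved value standing in for the diagonal predecessor. The step count is tracked so the total can be divided by the path length, which is one simple normalization and not necessarily what the slide means by minimizing average distance. All names are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2_one_column(t, r):
    """Return (total distance, path length) using O(len(r)) memory."""
    n, m = len(t), len(r)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")
    col = [0.0] * m   # col[j] holds D(i, j) for the current input frame i
    steps = [0] * m   # number of cells on the best path ending at (i, j)
    acc = 0.0
    for j in range(m):  # first input frame: only moves along r are possible
        acc += frame_dist(t[0], r[j])
        col[j], steps[j] = acc, j + 1
    for i in range(1, n):
        diag = (col[0], steps[0])  # D(i-1, 0), saved before it is overwritten
        col[0] += frame_dist(t[i], r[0])
        steps[0] += 1
        for j in range(1, m):
            left = (col[j], steps[j])           # D(i-1, j), not yet overwritten
            below = (col[j - 1], steps[j - 1])  # D(i, j-1), already updated
            best_cost, best_steps = min(left, diag, below)
            diag = left  # becomes D(i-1, j-1) for the next j
            col[j] = frame_dist(t[i], r[j]) + best_cost
            steps[j] = best_steps + 1
    return col[-1], steps[-1]
```

A caller could compute `total / length` from the returned pair to compare utterances of different durations.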

-11- DTW Path of "Match Corners" [figure]

-12- DTW Path of "Match Anywhere" [figure]

-13- DTW Path of "Match Anywhere" [figure]

-14- DTW for Spoken Document Retrieval
- Applications
  - Voice-based audio/video retrieval
- Issues in SDR using DTW
  - Speaker normalization
  - Vocal tract length normalization (VTLN)
  - Frequency warping
  - Efficiency

-15- DTW for Speaker-independent Voice Command Recognition
- Applications
  - Digit recognition
- Technical highlights
  - Extensive recordings
  - Clustering within each command
  - Some indexing methods for DTW
  - Suitable for small-vocabulary applications

