
DTW for Speech Recognition
J.-S. Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
MIR Lab (Multimedia Information Retrieval Lab), Department of Computer Science, Tsing Hua University

-2- Dynamic Time Warping (DTW)
• Characteristics:
  - Pattern-matching-based approach
  - Requires less memory and computation
  - Suitable for speaker-dependent recognition
  - Suitable for small to medium vocabularies
  - Suitable for microprocessor/chip implementation
• Applications:
  - Speaker identification and verification for surveillance
  - Voice commands for mobile phones and toys

-3- Dynamic Time Warping: Type 1
• t: input MFCC matrix (each column is a frame's feature vector)
• r: reference MFCC matrix
• Local paths: 27-45-63 degrees
• DTW recurrence: [equation not captured in transcript]
[Figure: DTW grid with axes i and j, marking t(i-1), t(i), r(j-1), r(j)]
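A minimal sketch of type-1 DTW, under stated assumptions: the 27-45-63 degree local paths are read as the predecessors (i-2, j-1), (i-1, j-1), and (i-1, j-2), with i on the horizontal axis; the local distance is Euclidean; and both corners are fixed. Each sequence is passed as a list of frames regardless of how the MFCC matrix is laid out. The names `frame_dist` and `dtw_type1` are hypothetical and do not come from the slides.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1(t, r):
    """Total distance of the best 27-45-63 path from (0, 0) to (m-1, n-1).

    Returns math.inf when no such path exists.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                  # 45-degree step
            if j >= 2:
                best = min(best, D[i - 1][j - 2])   # 63-degree step
            if i >= 2:
                best = min(best, D[i - 2][j - 1])   # 27-degree step
            D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

Because every step advances both i and j, each input frame is matched to at most one reference frame along the path, and some frames may be skipped.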

-4- Dynamic Time Warping: Type 2
• t: input MFCC matrix (each row is a frame's feature vector)
• r: reference MFCC matrix
• Local paths: 0-45-90 degrees
• DTW recurrence: [equation not captured in transcript]
[Figure: DTW grid with axes i and j, marking t(i-1), t(i), r(j-1), r(j)]
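A minimal sketch of type-2 DTW, assuming the 0-45-90 degree local paths correspond to the predecessors (i-1, j), (i-1, j-1), and (i, j-1), a Euclidean local distance, and fixed corners. Sequences are passed as lists of frames. The names `frame_dist` and `dtw_type2` are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2(t, r):
    """Total distance of the best 0-45-90 path from (0, 0) to (m-1, n-1)."""
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = frame_dist(t[i], r[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = math.inf
            if i > 0:
                best = min(best, D[i - 1][j])       # 0-degree step
            if j > 0:
                best = min(best, D[i][j - 1])       # 90-degree step
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])   # 45-degree step
            D[i][j] = d + best
    return D[m - 1][n - 1]
```

Unlike the 27-45-63 variant, these steps never skip a frame, and a single frame of one sequence may be matched to several frames of the other.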

-5- Local Path Constraints
• Type 1:
  - 27-45-63 local paths
• Type 2:
  - 0-45-90 local paths

-6- Path Penalty for Type-1 DTW
• Path penalty:
  - No penalty for the 45-degree path
  - Some penalty for paths that deviate from 45 degrees
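One way to express the path penalty is to add a constant to the accumulated distance whenever a 27- or 63-degree step is taken. The sketch below assumes a single shared `penalty` value for both off-diagonal steps and a Euclidean local distance; the slides do not give the actual penalty values, and the function names are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1_penalized(t, r, penalty):
    """Type-1 DTW where 27- and 63-degree steps each cost an extra `penalty`."""
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                            # 45 degrees: no penalty
            if j >= 2:
                best = min(best, D[i - 1][j - 2] + penalty)   # 63 degrees
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + penalty)   # 27 degrees
            D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

With `penalty` set to 0 this reduces to plain type-1 DTW; larger values favor paths that stay close to the diagonal.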

-7- DTW Paths of “Match Corners”
• We assume the speed of the user's acoustic input is between 1/2 and 2 times that of the intended sentence.
• Both corners are fixed. (Endpoint detection is critical.)
• Suitable for voice command applications
[Figure: DTW grid with axes i and j]
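The speed assumption can be checked before running DTW. Assuming the 27-45-63 local paths are the steps (2, 1), (1, 1), and (1, 2), a path from corner (0, 0) to corner (m-1, n-1) exists only when neither index difference is more than twice the other. The helper name `corners_reachable` is hypothetical.

```python
def corners_reachable(m, n):
    """True if a 27-45-63 path can join (0, 0) to (m-1, n-1).

    m and n are the frame counts (each at least 1) of the input and the reference.
    """
    di, dj = m - 1, n - 1
    return di <= 2 * dj and dj <= 2 * di
```

For example, a 3-frame input can be aligned with a 5-frame reference using two 63-degree steps, while a 2-frame input cannot reach the far corner of a 5-frame reference.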

-8- DTW Paths of “Match Anywhere”
• No fixed anchored positions
• Suitable for retrieval of personal spoken documents
[Figure: DTW grid with axes i and j]
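A minimal sketch of one reading of "match anywhere": the whole input t is matched against any contiguous stretch of the reference r, so the path may start and end at any reference frame. This interpretation, the use of 0-45-90 local paths, the Euclidean local distance, and the function names are all assumptions, not details taken from the slides.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_match_anywhere(t, r):
    """Return (best total distance, index of the reference frame where the match ends)."""
    n = len(r)
    # Free start: the first input frame may align with any reference frame.
    prev = [frame_dist(t[0], r[j]) for j in range(n)]
    for i in range(1, len(t)):
        cur = [math.inf] * n
        for j in range(n):
            best = prev[j]                                  # 0-degree step
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])   # 45- and 90-degree steps
            cur[j] = frame_dist(t[i], r[j]) + best
        prev = cur
    # Free end: pick the cheapest cell in the last input row.
    end = min(range(n), key=prev.__getitem__)
    return prev[end], end
```

Tracing back from the returned end index would recover where in the reference the match begins; that step is omitted here to keep the sketch short.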

-9- Other Variants
• Local constraints
• Start/ending area

-10- Implementation Issues
• To save memory:
  - Use a 2-column table for type-1 DTW
  - Use a 1-column table for type-2 DTW
• To avoid too many if-then statements:
  - Pad type-1 DTW with two-layer padding
  - Pad type-2 DTW with one-layer padding
• To find a suitable path:
  - Minimizing total distance
  - Minimizing average distance
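The memory and padding ideas can be sketched for type-2 DTW as follows. This assumes 0-45-90 local paths with predecessors (i-1, j), (i-1, j-1), (i, j-1), fixed corners, and a Euclidean local distance; it keeps a single column that is updated in place, with one padding cell holding infinity so the inner loop needs no boundary tests. The function names are hypothetical, and only the total distance (not the path) is returned.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2_one_column(t, r):
    """Type-2 DTW total distance using one column plus a one-cell pad."""
    n = len(r)
    # col[j + 1] stores D(i, j); col[0] is the padding cell and stays at inf.
    col = [math.inf] * (n + 1)
    acc = 0.0
    for j in range(n):                 # first input frame: only 90-degree steps
        acc += frame_dist(t[0], r[j])
        col[j + 1] = acc
    for i in range(1, len(t)):
        diag = math.inf                # D(i-1, j-1); for j = 0 this is the pad
        for j in range(n):
            left = col[j + 1]          # D(i-1, j), about to be overwritten
            # col[j] already holds D(i, j-1), or the pad when j = 0.
            col[j + 1] = frame_dist(t[i], r[j]) + min(left, diag, col[j])
            diag = left
    return col[n]
```

The same idea extends to type-1 DTW with two stored columns and two layers of padding, since its steps reach back two positions in each index.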

-11- DTW Path of “Match Corners”

-12- DTW Path of “Match Anywhere”

-13- DTW Path of “Match Anywhere”

-14- DTW for Spoken Document Retrieval
• Applications:
  - Voice-based audio/video retrieval
• Issues in SDR using DTW:
  - Speaker normalization
    - Vocal tract length normalization (VTLN)
    - Frequency warping
  - Efficiency

-15- DTW for Speaker-independent Voice Command Recognition
• Applications:
  - Digit recognition
• Technical highlights:
  - Extensive recordings
  - Clustering within each command
  - Some indexing methods for DTW
• Suitable for small-vocabulary applications
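A minimal sketch of template-based command recognition, assuming each command keeps several reference templates (for instance, one representative per cluster of recordings) and the input is assigned to the command whose template has the smallest DTW distance. The 0-45-90 local paths, the Euclidean local distance, and the names `dtw` and `recognize` are assumptions; clustering and indexing are not shown.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(t, r):
    """Type-2 (0-45-90) DTW total distance with both corners fixed."""
    prev = None
    for ti in t:
        cur = []
        for j, rj in enumerate(r):
            if prev is None:
                # First input frame: only moves along the reference axis.
                best = cur[j - 1] if j > 0 else 0.0
            else:
                best = prev[j]
                if j > 0:
                    best = min(best, prev[j - 1], cur[j - 1])
            cur.append(frame_dist(ti, rj) + best)
        prev = cur
    return prev[-1]

def recognize(query, templates):
    """Return (command, distance) for the closest template.

    `templates` maps each command name to a list of reference sequences.
    """
    best_cmd, best_d = None, math.inf
    for cmd, refs in templates.items():
        for ref in refs:
            d = dtw(query, ref)
            if d < best_d:
                best_cmd, best_d = cmd, d
    return best_cmd, best_d
```

The cost grows linearly with the number of stored templates, which is one reason this approach is better suited to small vocabularies.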
