Published by Anis Bennett. Modified over 9 years ago.

DTW for Speech Recognition
J.-S. Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
MIR Lab (多媒體資訊檢索實驗室)
CS, Tsing Hua Univ. (清華大學 資工系)

Dynamic Time Warping (DTW)
Characteristics:
  Pattern-matching-based approach
  Requires less memory and computation
  Suitable for speaker-dependent recognition
  Suitable for small to medium vocabularies
  Suitable for microprocessor/chip implementation
Applications:
  Speaker identification and verification for surveillance
  Voice commands for mobile phones and toys
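
Dynamic Time Warping: Type 1
t: input MFCC matrix (each column is a frame’s feature)
r: reference MFCC matrix
Local paths: 27-45-63 degrees
DTW recurrence: [equation not preserved in this transcript]
Figure: DTW grid with axis i for input frames t(i-1), t(i) and axis j for reference frames r(j-1), r(j)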
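
With 27-45-63 degree local paths, the cell (i, j) can be reached from (i-1, j-2), (i-1, j-1), or (i-2, j-1), so the recurrence is commonly written as D(i, j) = |t(i) - r(j)| + min{D(i-1, j-2), D(i-1, j-1), D(i-2, j-1)}. The following is a minimal Python sketch of that form, not code from the slides. It assumes Euclidean frame distance, a path anchored at both corners, and sequences stored as lists of frames; the names `frame_dist` and `dtw_type1` are illustrative.

```python
import math

INF = float("inf")

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1(t, r):
    """Total distance of the best corner-to-corner path with 27-45-63 degree steps.

    t, r: non-empty sequences of frames (each frame is a sequence of floats).
    Returns INF when no admissible path joins the two corners.
    """
    m, n = len(t), len(r)
    D = [[INF] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                 # 45-degree step
            if j >= 2:
                best = min(best, D[i - 1][j - 2])  # step (1, 2)
            if i >= 2:
                best = min(best, D[i - 2][j - 1])  # step (2, 1)
            if best < INF:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

For example, `dtw_type1([[0], [2]], [[0], [1], [2]])` aligns the two-frame input with the three-frame reference by skipping the middle reference frame.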
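
Dynamic Time Warping: Type 2
t: input MFCC matrix (each row is a frame’s feature)
r: reference MFCC matrix
Local paths: 0-45-90 degrees
DTW recurrence: [equation not preserved in this transcript]
Figure: DTW grid with axis i for input frames t(i-1), t(i) and axis j for reference frames r(j-1), r(j)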
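
With 0-45-90 degree local paths, the cell (i, j) can be reached from (i-1, j), (i-1, j-1), or (i, j-1), which gives the familiar form D(i, j) = |t(i) - r(j)| + min{D(i-1, j), D(i-1, j-1), D(i, j-1)}. A minimal Python sketch under the same assumptions as before (Euclidean frame distance, both corners fixed, sequences as lists of frames); `dtw_type2` is an illustrative name, not one used in the slides.

```python
import math

INF = float("inf")

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2(t, r):
    """Total distance of the best corner-to-corner path with 0-45-90 degree steps."""
    m, n = len(t), len(r)
    D = [[INF] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = frame_dist(t[i], r[j])
            if i == 0 and j == 0:
                D[i][j] = d                        # fixed starting corner
                continue
            best = INF
            if i > 0:
                best = min(best, D[i - 1][j])      # advance input only
            if j > 0:
                best = min(best, D[i][j - 1])      # advance reference only
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])  # 45-degree step
            D[i][j] = d + best
    return D[m - 1][n - 1]
```

Unlike the type-1 sketch, every pair of non-empty sequences has an admissible path here, because one frame may be matched to arbitrarily many frames of the other sequence.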

Local Path Constraints
Type 1: 27-45-63 degree local paths
Type 2: 0-45-90 degree local paths

Path Penalty for Type-1 DTW
Path penalty:
  No penalty for the 45-degree path
  Some penalty for paths that deviate from 45 degrees
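
One way to realize such a penalty is to add a constant to the accumulated cost whenever a 27-degree or 63-degree step is taken. The sketch below does this for type-1 DTW. The slide does not give the penalty's form or value, so the additive constant `penalty` and the function name are assumptions made for illustration.

```python
import math

INF = float("inf")

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1_penalized(t, r, penalty=0.0):
    """Type-1 DTW where each non-45-degree step adds `penalty` (hypothetical parameter)."""
    m, n = len(t), len(r)
    D = [[INF] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                           # 45 degrees: no penalty
            if j >= 2:
                best = min(best, D[i - 1][j - 2] + penalty)  # step (1, 2): penalized
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + penalty)  # step (2, 1): penalized
            if best < INF:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

With `penalty=0.0` this reduces to plain type-1 DTW; larger values push the optimal path toward the diagonal.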

DTW Paths of “Match Corners”
We assume the speed of a user’s acoustic input falls between 1/2 and 2 times that of the intended sentence.
Both corners are fixed. (Endpoint detection is critical.)
Suitable for voice command applications
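
The speed assumption translates into a simple length check when type-1 local paths are used: with steps (1, 1), (1, 2), and (2, 1), a corner-to-corner path exists only if neither sequence advances more than twice as many frames as the other. A small sketch of that pre-check follows; the function name is hypothetical, and the check could be used to skip reference templates before running DTW at all.

```python
def corners_reachable(num_input_frames, num_ref_frames):
    """Return True if a type-1 path can join (first, first) to (last, last).

    Assumes both sequences have at least one frame and that the allowed
    steps are (1, 1), (1, 2) and (2, 1), as in type-1 DTW.
    """
    di = num_input_frames - 1   # total advance along the input axis
    dj = num_ref_frames - 1     # total advance along the reference axis
    return di <= 2 * dj and dj <= 2 * di
```

For instance, a 3-frame input can be matched against a 5-frame reference, but not against a 6-frame one.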

DTW Paths of “Match Anywhere”
No fixed anchor positions
Suitable for retrieval of personal spoken documents
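
A common way to drop the anchors is to let the path start and end at any frame of the longer sequence. The sketch below does this with 0-45-90 degree local paths: the whole query is matched against any contiguous region of a document. The slide does not say which axis is left free or which local paths are used, so both choices, along with the function names, are assumptions.

```python
import math

INF = float("inf")

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_match_anywhere(query, doc):
    """Match all of `query` against any contiguous region of `doc`.

    Returns (best total distance, index of the doc frame where the match ends).
    """
    n = len(doc)
    # Free start: the first query frame may align with any doc frame at no extra cost.
    prev = [frame_dist(query[0], doc[j]) for j in range(n)]
    for i in range(1, len(query)):
        cur = [INF] * n
        for j in range(n):
            best = prev[j]                              # advance query only
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = frame_dist(query[i], doc[j]) + best
        prev = cur
    # Free end: pick the cheapest ending position along the doc axis.
    end = min(range(n), key=prev.__getitem__)
    return prev[end], end
```

Calling `dtw_match_anywhere([[5], [6]], [[0], [1], [5], [6], [9]])` finds the query embedded in the middle of the document.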

Other Variants
Local constraints
Start/ending area

Implementation Issues
To save memory:
  Use a 2-column table for type-1 DTW
  Use a 1-column table for type-2 DTW
To avoid too many if-then statements:
  Pad type-1 DTW with two-layer padding
  Pad type-2 DTW with one-layer padding
To find a suitable path:
  Minimize total distance
  Minimize average distance
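
The memory and padding ideas can be combined for type-1 DTW. The sketch below keeps only two buffers (the two most recent lines of the table) and pads each with two infinite cells so the inner loop needs no boundary tests. It returns the total distance only, so no path is recovered. Buffer layout, loop order, and names are illustrative choices, not taken from the slides.

```python
import math

INF = float("inf")

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1_two_columns(t, r):
    """Corner-to-corner type-1 DTW distance using two padded buffers.

    Buffer index j + 2 holds D(i, j); cells 0 and 1 are permanent INF padding.
    """
    n = len(r)
    a = [INF] * (n + 2)          # holds D(i-1, .)
    b = [INF] * (n + 2)          # holds D(i-2, .); all INF acts as the padding line
    a[2] = frame_dist(t[0], r[0])
    for i in range(1, len(t)):
        # Go from high j to low j so b[jp - 1] still holds D(i-2, j-1)
        # when it is read; b is overwritten in place with D(i, .).
        for j in range(n - 1, -1, -1):
            jp = j + 2
            b[jp] = frame_dist(t[i], r[j]) + min(a[jp - 1], a[jp - 2], b[jp - 1])
        a, b = b, a              # a is now D(i, .), b is D(i-1, .)
    return a[n + 1]
```

Because unreachable cells stay at infinity (a finite distance plus INF is still INF), the padding replaces the explicit `if` checks used in a full-table version.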

DTW Path of “Match Corners”

DTW Path of “Match Anywhere”

DTW for Spoken Document Retrieval
Applications:
  Voice-based audio/video retrieval
Issues in spoken document retrieval (SDR) using DTW:
  Speaker normalization
    Vocal tract length normalization (VTLN)
    Frequency warping
  Efficiency

DTW for Speaker-independent Voice Command Recognition
Applications:
  Digit recognition
Technical highlights:
  Extensive recordings
  Clustering within each command
  Some indexing methods for DTW
Suitable for small-vocabulary applications