A Single End-Point DTW Algorithm for Keyword Spotting
  Yong-Sun Choi and Soo-Young Lee
  Brain Science Research Center and Department of Electrical Engineering and Computer Science
  Korea Advanced Institute of Science and Technology
Contents
  Keyword Spotting
    Meaning & Necessity
    Problems
  Dynamic Time Warping (DTW)
    Advantages of DTW
    Some conventional types & proposed DTW type
  Experimental Results
    Verification of proposed DTW performance
    Standard threshold setting
    Results under various conditions
  Conclusions
Keyword Spotting
  Meaning
    Detection of pre-defined keywords in continuous speech
    Example) Keywords : ‘open’, ‘window’
             Input : “um…okay, uh… please open the…uh…window”
  Necessity
    Humans may say OOV (Out-Of-Vocabulary) words and sometimes stammer
    But the machine needs only some specific words for recognition
Problems & Goal
  Difficulties of the process
    End-point detection (EPD) of the speech segment
    Rejection of OOVs
  Difficulties of implementation
    A heavy computation load
    Complex algorithm
    Hard to build a real hardware system
  Goal
    Simple & fast algorithm
DTW for Keyword Spotting
  Hidden Markov Model (HMM)
    A statistical model : needs a large amount of training data
    Complex algorithm : hard to implement in a hardware system
    Many parameters : can cause memory problems
  Dynamic Time Warping (DTW)
    Advantages
      Small amount of training data
      Simple algorithm (addition & multiplication)
      Small amount of stored data
    Weak points
      Needs an EPD process
      Many calculations
General DTW Process
  Both end points are known
  Repeated searches
  Finds corresponding frames
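The general DTW process above can be illustrated with a minimal sketch. It assumes that both end points are fixed (the first frames are matched to each other, and so are the last frames), that each frame is a single number, and that the local frame distance is the absolute difference. The function name `dtw_distance`, the `dist` parameter, and the symmetric step set (diagonal, horizontal, vertical) are illustrative choices, not taken from the slides.

```python
import math

def dtw_distance(ref, test, dist=None):
    """Classic DTW between two sequences with both end points fixed.

    ref, test : sequences of frames (plain numbers in this sketch).
    dist      : local frame distance; absolute difference is an assumption.
    """
    if dist is None:
        dist = lambda a, b: abs(a - b)
    n, m = len(ref), len(test)
    # D[i][j] = smallest accumulated distance aligning ref[:i] with test[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(ref[i - 1], test[j - 1])
            # Every cell of the grid is visited: this is the "many calculations" cost.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A time-stretched copy of the reference aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The full grid of `len(ref) * len(test)` cells is filled, which is why this form needs the end points in advance and carries a large computation load.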
Advanced DTW
  Myers, Rabiner and Rosenberg
  No EPD process
  Series of small-area searches
    Global search within one area
    Next area is set around the best match point of the local area
  Reduces the amount of calculation, but it is still large
  Tested in isolated word recognition
Proposal – Shape & Weights
  No EPD process
  Only one path
    Select the best match point and search again from that point
    Less computation
  Modified weights
    To compensate for weight-sum differences
    For search
    For distance accumulation
Proposal – End Point
  Small search area
    Successive local searches
    Start the search at one point
  End condition
    When the point is on the last frame of the reference pattern
    The end point is set automatically
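The single-path idea (start at one point, pick the best local match, search again from there, stop at the last reference frame) might look roughly like the sketch below. The slides do not give the local step set, the modified weights, or the length-based distance correction, so everything here is an assumption for illustration: the function name `greedy_single_path`, the three unweighted moves, scalar frames, and the absolute-difference frame distance. It is not the authors' exact algorithm.

```python
def greedy_single_path(ref, test, start, dist=None,
                       moves=((1, 1), (1, 0), (0, 1))):
    """Follow ONE warping path by successive local searches.

    ref   : reference pattern (frames are plain numbers in this sketch)
    test  : input frames
    start : index in `test` where the search begins (assumed given)
    moves : candidate (ref step, test step) pairs; an assumed step set
    Returns (accumulated distance, path, end frame index in `test`).
    """
    if dist is None:
        dist = lambda a, b: abs(a - b)
    i, j = 0, start
    total = dist(ref[i], test[j])
    path = [(i, j)]
    # End condition: the current point lies on the last reference frame.
    while i < len(ref) - 1:
        candidates = []
        for di, dj in moves:
            ni, nj = i + di, j + dj
            if ni < len(ref) and nj < len(test):
                candidates.append((dist(ref[ni], test[nj]), ni, nj))
        # Keep only the best local match; ties go to the earliest move listed.
        d, i, j = min(candidates, key=lambda c: c[0])
        total += d
        path.append((i, j))
    return total, path, j

total, path, end = greedy_single_path([1, 2, 3], [9, 9, 1, 2, 3, 9], start=2)
print(total, path, end)
```

Only a few cells around the current point are evaluated per step, so the cost grows with the path length rather than with the full grid, and the end frame in the input falls out of the search instead of being supplied by an EPD stage.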
Proposal – Distance
  Modified distance
    Uses the difference in pattern lengths
    Pattern lengths of the same word are similar to each other
DTW – Computation Loads 3 types
Database & Experimental Set
  DB
    RoadRally
      For keyword spotting
      Based on a telephone channel
    Usage
      11 keywords (434 occurrences in total)
      Read speech from 40 male speakers (47 min. in total) in Stonehenge
  Set construction
    4 subsets (about 108 keywords / set)
    3 sets for training, 1 set for test
    2 reference patterns / keyword / set
Verification Result
  Isolated word recognition
  3 sets for training, 1 set for test
  Recognition rate (%), General DTW / Proposed DTW
    Test set 1 : 96.3 / 98.2
    Test set 2 : 100.0 / 99.1
    Test set 3 : 95.4
    Test set 4 : 97.2
    Avg.       : 97.5
Experimental Setup
  Assumption
    Any frame can be the last frame of a keyword
  Threshold
    To reject OOVs
    1 threshold / reference
    Standard threshold : no false alarm in the training set
  Result presentation
    ROC (Receiver Operating Characteristic)
    X-axis : false alarms / hour / keyword
    Y-axis : recognition rate
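The threshold rule and the ROC x-axis unit can be made concrete with a short sketch. It assumes that a detection is accepted when its DTW distance is at or below the reference's threshold, and that the "standard" threshold is placed just under the smallest distance produced by any non-keyword (impostor) match in the training set. The function names, the `margin` parameter, and the example numbers are hypothetical.

```python
def standard_threshold(impostor_distances, margin=1e-9):
    """Threshold for one reference that yields no false alarm on training data.

    impostor_distances : distances between this reference and training
                         segments that are NOT the keyword (assumed available).
    """
    return min(impostor_distances) - margin

def is_detected(distance, threshold):
    """Accept a candidate when its distance does not exceed the threshold."""
    return distance <= threshold

def false_alarms_per_hour_per_keyword(n_false_alarms, duration_minutes, n_keywords):
    """Normalised false-alarm rate used on the ROC x-axis."""
    hours = duration_minutes / 60.0
    return n_false_alarms / (hours * n_keywords)

thr = standard_threshold([5.0, 3.2, 7.1])   # hypothetical impostor distances
print(is_detected(3.0, thr), is_detected(3.2, thr))
print(false_alarms_per_hour_per_keyword(22, 30, 11))
```

Raising the threshold above this standard value trades extra false alarms for a higher recognition rate, which is what sweeping along the ROC curve shows.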
Threshold Setting & Recognition Rate on the Training Set
  Training set = Test set (no false alarm)
    Keyword      Right   Total   %
    Mountain     21      40      52.5
    Secondary    38      40      95.0
    Middleton    27      37      73.0
    Boonsboro    32      39      82.1
    Conway       33      40      82.5
    Thicket      30      39      77.0
    Primary      34      40      85.0
    Minus        25      39      64.1
    Interstate   37      40      92.5
    Waterloo     35      40      87.5
    Retrace      36      40      90.0
    Total        368     434     84.8
Result – DTW & HMM ROC Curve
Changing Conditions
  Number of keywords
  Number of references
Conclusion
  Proposed DTW
    Advantages
      Simple structure : addition & multiplication (good for hardware)
      No EPD processing
      Very small computation load
      Small amount of stored data : small memory
      Only keyword information is needed
  Keyword Spotting
    Good performance
    Better than HMM when the amount of training data is small