1
Keyword Spotting Dynamic Time Warping
Ali Akbar Jabini, Alexandre Mercier-Dalphond
Spring 2006
2
Introduction
Speech recognition: a computer can interpret speech
  Needs an input device to digitize sound: a microphone
  People can speak faster than they can type
  Commercial systems have been available since the 1990s
  People prefer physical interaction: keyboard/mouse, on/off switch
  Low accuracy (50%) for large vocabularies in noisy conditions
3
Introduction
Speech recognition is increasingly used with smaller vocabulary banks
  Credit card systems
  Simple switching commands
  Directory assistance
Cheap to implement
High accuracy
Can verify their interpretation
Idea: speech recognition for household appliances
4
OUTLINE
  Area of investigation
  Concrete task/Goal
  Schematic
  Feature extraction
  DTW
  Training
  Evaluation metrics
  Conclusion
5
Area of Investigation
Keyword spotting:
  Subfield of speech recognition
  Grammar constrained
Keyword spotting in isolated word recognition:
  Keyword utterances
  Keywords separated by silence
Main technique is DTW
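Because the keywords are separated by silence, individual utterances can in principle be isolated before any matching is done. The following is a minimal sketch of that idea using short-time energy, written in Python for illustration rather than the MATLAB planned in the slides. The function name `split_on_silence`, the frame length and the energy threshold are all assumptions chosen for the toy signal, not values from the presentation.

```python
def split_on_silence(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample-index pairs for runs of non-silent frames.

    A frame counts as silent when its mean squared amplitude is at or
    below `threshold`. Both parameters are illustrative assumptions.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                    # a word begins here
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # the word ended
            start = None
    if start is not None:                            # word runs to the end
        segments.append((start, n_frames * frame_len))
    return segments


# Toy signal: silence, a loud "word", silence, a quieter "word", silence.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8 + [0.3, -0.3] * 2 + [0.0] * 4
print(split_on_silence(signal))  # [(8, 16), (24, 28)]
```

In a noisy room a fixed threshold like this would likely need to be replaced by one estimated from the background level.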
6
Concrete task/Goal
Goal: develop a robust, speaker-independent keyword spotting scheme to operate household appliances
Concrete tasks:
  Digitize the sound inputs
  Implementation in MATLAB
  Train the model with the grammar
  Analyze the performance of our scheme
7
Schematic
Microphone -> A/D -> Feature extraction -> DTW -> Output
Grammar -> DTW
8
Feature extraction
Pre-emphasis: flattening the spectrum of the signal
Blocking into frames: length of the Fourier transform
Windowing: sample window (maybe Hamming)
Mel-frequency cepstral coefficients (MFCC): more reliable than LPC coefficients
These will be the input to the DTW algorithm
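The first three steps on this slide (pre-emphasis, blocking into frames, windowing) can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' MATLAB code: the pre-emphasis coefficient 0.97, the frame length, the hop size and all function names are assumptions, and the Hamming window is used only because the slide names it as a candidate. The MFCC step is left out to keep the sketch short.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """First-order filter y[n] = x[n] - alpha * x[n-1], which boosts high
    frequencies and so flattens the spectrum. alpha is an assumed value."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]


def block_into_frames(samples, frame_len, hop):
    """Split into overlapping frames of frame_len samples, hop samples apart.
    Trailing samples that do not fill a whole frame are dropped."""
    return [
        samples[s:s + frame_len]
        for s in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(samples, frame_len=8, hop=4, alpha=0.97):
    """Pre-emphasize, block into frames, then taper each frame."""
    window = hamming(frame_len)
    emphasized = pre_emphasize(samples, alpha)
    return [
        [x * w for x, w in zip(frame, window)]
        for frame in block_into_frames(emphasized, frame_len, hop)
    ]


frames = windowed_frames([1.0] * 16)
print(len(frames), len(frames[0]))  # 3 8
```

The frame length here is tiny so the output is easy to inspect. For real speech the frame length would be set by the Fourier transform length mentioned on the slide.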
9
DTW
Idea: find the smallest distance between an input and the training bank
Cepstrum features
Dynamic programming: the time axis is not linear, to account for timing differences between utterances
  t0 -> t0+5
  t1 -> t1-2
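The dynamic-programming idea can be made concrete with the textbook DTW recurrence, where each cell holds the cheapest cumulative cost of aligning the two sequences up to that point. The sketch below is in Python and makes several assumptions not stated on the slides: Euclidean distance between feature vectors, the basic three-way step pattern (match, insertion, deletion), and no path constraints or length normalization.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (lists of floats), for example
    one cepstral vector per frame. Local cost is Euclidean distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays (stretch b)
                cost[i][j - 1],      # b advances, a stays (stretch a)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


short = [[0.0], [1.0], [2.0]]
drawn_out = [[0.0], [1.0], [1.0], [1.0], [2.0]]  # same shape, middle held longer
print(dtw_distance(short, drawn_out))  # 0.0
```

The two toy sequences differ only in how long the middle value is held, so the warped distance is zero even though their lengths differ. This is the property that lets a drawn-out utterance match a shorter template.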
10
DTW
11
DTW
12
Training
Need to create our own grammar
  On: Onnn, Honnn, open, opeeenn
  Off: Hooofff, Hoff, offfff, close
As many potential utterances as possible
Use this data with DTW
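One way to use such a bank with DTW is nearest-template matching: compare the input against every stored utterance and return the command whose template is closest. The sketch below assumes that approach, which the slides do not spell out. The function names and the one-dimensional "features" in `bank` are made-up placeholders standing in for real cepstral sequences.

```python
def dtw(a, b):
    """DTW cost between two 1-D sequences (absolute difference as local cost)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]


def spot_keyword(features, bank):
    """Return (label, distance) of the closest template in the bank."""
    best_label, best_dist = None, float("inf")
    for label, templates in bank.items():
        for template in templates:
            d = dtw(features, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# Placeholder bank: several variants per command, as the slide suggests.
bank = {
    "on": [[1.0, 3.0, 1.0], [1.0, 3.0, 3.0, 3.0, 1.0]],
    "off": [[5.0, 2.0, 2.0], [5.0, 5.0, 2.0, 2.0, 2.0]],
}
print(spot_keyword([1.0, 3.0, 3.0, 1.0], bank))  # ('on', 0.0)
```

Storing many variants per command ties in with "as many potential utterances as possible": each extra template is one more chance for a new speaker's pronunciation to land close to something in the bank. A rejection threshold on the returned distance would be needed to ignore words outside the grammar.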
13
Evaluation metrics
Accuracy:
  High noise
  Low noise
  Independent speaker
  Training data speaker
Would like to obtain 80% or more
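Accuracy under each condition is simply the fraction of test utterances recognized correctly, tallied per condition and compared against the 80% target. A minimal Python sketch follows. The trial tuples are invented placeholders used only to exercise the function, not measured results, and the condition labels and function name are assumptions.

```python
def accuracy_by_condition(trials):
    """trials: iterable of (condition, expected_keyword, recognized_keyword).

    Returns {condition: fraction of trials recognized correctly}.
    """
    totals, correct = {}, {}
    for condition, expected, recognized in trials:
        totals[condition] = totals.get(condition, 0) + 1
        correct[condition] = correct.get(condition, 0) + (expected == recognized)
    return {c: correct[c] / totals[c] for c in totals}


# Placeholder trials for illustration only (not experimental data).
trials = [
    ("low noise", "on", "on"),
    ("low noise", "off", "off"),
    ("low noise", "on", "on"),
    ("low noise", "off", "on"),
    ("low noise", "off", "off"),
    ("high noise", "on", "off"),
    ("high noise", "off", "off"),
]
TARGET = 0.80  # the "80% or more" goal from the slide
for condition, acc in accuracy_by_condition(trials).items():
    print(condition, acc, "meets target" if acc >= TARGET else "below target")
```

The same tally works for the speaker split (training-data speaker versus independent speaker) by using the speaker group as the condition label.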
14
Conclusion
Early stage:
  No code implemented yet
  Many challenges ahead
  Our methodology may change slightly
There is a big potential market for such a technique -> influence on everyday life.