Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments 張智星
2 Reference: Jialin Shen, Jeihweih Hung, and Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments,” International Conference on Spoken Language Processing, Sydney, 1998.
3 Summary: An entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments. It outperforms energy-based algorithms in both detection accuracy and recognition performance. Error reduction: 16%.
4 Motivation: Energy-based endpoint detection becomes less reliable when dealing with non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks. Spectral entropy is effective in distinguishing speech segments from non-speech parts.
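5 Spectral Entropy: PDF (normalization of the spectrum): $p_i = s(f_i) / \sum_{k=1}^{N} s(f_k)$, $i = 1, \dots, N$, where $s(f_i)$ is the spectral energy of frequency component $f_i$. Spectral entropy: $H = -\sum_{i=1}^{N} p_i \log p_i$.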
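The two steps named on this slide (normalizing the spectrum into a PDF, then taking its entropy) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the use of the FFT power spectrum, the natural logarithm, and the function name `spectral_entropy` are assumptions.

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Spectral entropy of one frame of samples (assumed: FFT power spectrum, natural log)."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2   # spectral energy per frequency bin
    total = spectrum.sum()
    if total <= eps:
        return 0.0                               # silent frame: return 0 by convention (assumption)
    p = spectrum / total                         # normalize so the bins sum to 1 (the PDF)
    p = p[p > 0]                                 # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# A pure tone concentrates energy in one bin (low entropy);
# white noise spreads it over all bins (high entropy).
n = np.arange(256)
tone = np.sin(2 * np.pi * 8 * n / 256)
noise = np.random.default_rng(0).standard_normal(256)
tone_h = spectral_entropy(tone)
noise_h = spectral_entropy(noise)
```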
6 Properties of Entropy [Figure: entropy plots for N=2 and N=3, generated by entropyPlot.m]
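The properties that the N=2 and N=3 plots illustrate can be checked numerically. The sketch below is in Python and is not the original entropyPlot.m script. It evaluates the entropy of a two-outcome distribution over a grid and compares a uniform and a degenerate three-outcome distribution. The helper name `entropy` and the natural logarithm are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution, with 0 * log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# N=2: sweep the first probability from 0 to 1; the second is its complement.
p1 = np.linspace(0.0, 1.0, 101)
h2 = np.array([entropy([a, 1.0 - a]) for a in p1])
peak_index = int(np.argmax(h2))       # index of the entropy maximum on the grid

# N=3: the uniform distribution reaches log(3); a one-hot distribution gives 0.
h3_uniform = entropy([1 / 3, 1 / 3, 1 / 3])
h3_onehot = entropy([1.0, 0.0, 0.0])
```

Entropy is largest when all outcomes are equally likely and falls to zero when one outcome carries all the probability.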
7 Entropy Weighting: A set of weighting factors can be applied to the spectral entropy. These weighting factors are statistically estimated from a large collection of speech signals.
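The slide does not preserve the exact form of the weighting, so the sketch below assumes the simplest one: each term of the entropy sum is multiplied by its own weight. The function name `weighted_spectral_entropy` and the example weights are hypothetical. In the paper the weights are estimated from speech data, which is not reproduced here.

```python
import numpy as np

def weighted_spectral_entropy(p, w):
    """Entropy with per-bin weights: -sum(w_k * p_k * log(p_k)). The per-term form is an assumption."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    if p.shape != w.shape:
        raise ValueError("p and w must have the same shape")
    mask = p > 0                                  # treat 0 * log(0) as 0
    return float(-(w[mask] * p[mask] * np.log(p[mask])).sum())

p = np.array([0.5, 0.25, 0.25, 0.0])              # example PDF over four bins
plain = weighted_spectral_entropy(p, np.ones(4))  # unit weights: ordinary entropy
muted = weighted_spectral_entropy(p, np.zeros(4)) # zero weights: every bin ignored
```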
8 Endpoint Detection: The sum of the spectral entropy values over a window of 20 frames is first evaluated and then smoothed by a median filter. Thresholds are then used to detect the beginning and ending boundaries of the embedded speech segments. A short period of background noise is first taken as the reference for the initial boundary detection. Short speech segments (< 100 ms) are rejected.
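The steps above can be sketched as a small pipeline. Only the 20-frame sum and the 100 ms minimum duration come from the slide. The frame step (`frame_ms`), median filter order (`kernel`), number of reference noise frames (`noise_frames`), and the decision rule (deviation from the noise reference by more than `k` standard deviations plus a relative margin) are all assumptions made for illustration. They are not the paper's settings.

```python
import math
import numpy as np

def running_sum(x, window):
    """Sum over `window` frames, edge-padded so the output keeps the input length."""
    left = window // 2
    padded = np.pad(x, (left, window - 1 - left), mode="edge")
    return np.convolve(padded, np.ones(window), mode="valid")

def median_smooth(x, kernel):
    """Median filter of odd length `kernel`, edge-padded."""
    padded = np.pad(x, kernel // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel)
    return np.median(windows, axis=1)

def detect_endpoints(entropy, frame_ms=10.0, window=20, kernel=5,
                     noise_frames=10, k=3.0, rel_margin=0.05, min_ms=100.0):
    """Return (start, end) frame-index pairs, end exclusive, for detected segments."""
    entropy = np.asarray(entropy, dtype=float)
    smoothed = median_smooth(running_sum(entropy, window), kernel)
    ref = smoothed[:noise_frames]                 # leading frames assumed to be noise only
    threshold = k * ref.std() + rel_margin * abs(ref.mean())
    active = np.abs(smoothed - ref.mean()) > threshold
    # Locate rising and falling edges of the active mask.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    min_frames = math.ceil(min_ms / frame_ms)     # reject segments shorter than min_ms
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])
            if e - s >= min_frames]

# Synthetic entropy track: noise level near 5, a "speech" region with lower entropy.
track = 5.0 + 0.01 * np.sin(np.arange(200))
track[80:130] -= 2.0
segments = detect_endpoints(track)
```

Because the running sum spans 20 frames, the detected boundaries in this sketch extend a few frames beyond the region where the per-frame entropy actually changes.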
9 Experiment Settings: Speech database: isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the other 90 for training). Speech features: 12th-order MFCC and 12th-order delta MFCC. Models: continuous-density HMM, 6 states per digit, 3 mixtures per state.
10 Experiment Settings: Noise: NOISEX-92 noise-in-speech database (white noise, pink noise, Volvo noise (car noise), F16 noise, machine gun noise). Sound artifacts: breath noise, cough noise, and mouse click noise.
11 Example
12 Experimental Results
13 Experimental Results
14 Something Not Clear… What are the sample rate and bit resolution? What are the frame size and overlap? What is the order of the median filter? How is the “short period of background noise” used? What are the spectral entropy threshold values for determining boundaries? What are the values for 1 and 2?