1
Endpoint Detection
Jyh-Shing Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan
2
Intro to Endpoint Detection
Endpoint detection (EPD)
  Goal: determine the start and end of voice activity
  Also known as voice activity detection (VAD)
Importance
  Acts as a preprocessing step for speech-based applications
  Should require as little computing power as possible
Two recording modes for speech-based applications
  Push to talk
    Offline EPD
    Example: voice command
  Continuously listening
    Online EPD
    Example: dialog systems such as Siri
    Cell phones too!
Quiz!
3
Types of Features for EPD
Time-domain
  Volume only
  Volume and ZCR (zero-crossing rate)
  Volume and HOD (high-order difference)
  …
Frequency-domain
  Variance of spectrum
  Entropy of spectrum
  Spectrum
  MFCC
  …
Some features belong to both!
4
Typical Approaches to EPD
Thresholding
  Simple thresholding
    Compute a feature (e.g., volume) from each frame
    Select a threshold vth to identify frames of voice activity
  Combined thresholding
    Use two features (e.g., volume and ZCR) to make the decision
Static classification
  Extract features
  Perform binary classification
    Negative: silence or noise
    Positive: voice activity
Sequence alignment
  Use hidden Markov models (HMM) for sequence alignment
You need to use these approaches in the EPD programming competition.
5
Performance Evaluation for EPD (1/2)
Two types of errors (typical of all binary classification)
  False negative (aka false rejection): positive classified as negative
  False positive (aka false acceptance): negative classified as positive
Confusion matrix/table
Quiz!
6
Performance Evaluation for EPD (2/2)
Typical methods
  Start and end position accuracy
  Frame-based accuracy
Quiz!
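The two error types and frame-based accuracy can be illustrated with a small sketch. It assumes per-frame boolean labels (True = voice activity); the function names and the sample labels are hypothetical and are not taken from the slides.

```python
def confusion_counts(truth, predicted):
    """Count TP, FN, FP, TN over per-frame labels (True = voice activity)."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(truth, predicted))
    return {
        "tp": sum(1 for t, p in pairs if t and p),          # voice kept as voice
        "fn": sum(1 for t, p in pairs if t and not p),      # false rejection
        "fp": sum(1 for t, p in pairs if not t and p),      # false acceptance
        "tn": sum(1 for t, p in pairs if not t and not p),  # silence kept as silence
    }


def frame_accuracy(truth, predicted):
    """Fraction of frames whose predicted label matches the ground truth."""
    counts = confusion_counts(truth, predicted)
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical labels for eight frames.
truth     = [False, False, True, True, True, True,  False, False]
predicted = [False, True,  True, True, True, False, False, False]
counts = confusion_counts(truth, predicted)
accuracy = frame_accuracy(truth, predicted)
```

Here one frame is falsely accepted and one is falsely rejected, so six of the eight frames are labeled correctly.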
8
EPD by Volume Thresholding
The simplest method for EPD
Volume is the sum of absolute sample values in a frame.
Four intuitive ways to select vth:
  vth = vmax*α
  vth = vmedian*β
  vth = vmin*γ
  vth = v1*δ
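A minimal Python sketch of frame volume and the four threshold rules follows. It is not the course's MATLAB code; the function names, the non-overlapping framing, the multiplier values, and the reading of v1 as the volume of the first frame are all assumptions made for illustration.

```python
import statistics


def frame_volumes(signal, frame_size, hop):
    """Volume of each frame: the sum of absolute sample values."""
    volumes = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        volumes.append(sum(abs(x) for x in frame))
    return volumes


def simple_thresholds(volumes, alpha, beta, gamma, delta):
    """The four candidate thresholds; v1 is assumed to be the first frame's volume."""
    return {
        "max": max(volumes) * alpha,
        "median": statistics.median(volumes) * beta,
        "min": min(volumes) * gamma,
        "first": volumes[0] * delta,
    }


# Hypothetical signal: quiet, loud, quiet (three frames of four samples).
signal = [0, 1, -1, 0, 4, -6, 5, -3, 1, 0, -1, 0]
volumes = frame_volumes(signal, frame_size=4, hop=4)
thresholds = simple_thresholds(volumes, alpha=0.1, beta=2, gamma=3, delta=3)
voiced = [v > thresholds["median"] for v in volumes]
```

With these made-up multipliers, only the middle frame exceeds the median-based threshold.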
9
How Do They Fail?
Unfortunately, all the thresholds fail one way or another. Under what situations do they fail?
  vth = vmax*α: plosive sounds
  vth = vmedian*β: silence too long
  vth = vmin*γ: all-zero frame
  vth = v1*δ: unstable frame
We need a better strategy…
10
A Better Strategy for Threshold Finding
A presumably better way to select vth
  vlower = 3rd percentile of volumes
  vupper = 97th percentile of volumes
  vth = (vupper-vlower)*k+vlower
Why do we need to use percentiles?
  To deal with plosive sounds
  To deal with all-zero frames
Does it fail? Yes, still, in certain situations…
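The percentile-based rule can be sketched as below. The linear-interpolation percentile is only one of several common conventions and is an assumption here (MATLAB's own percentile routine may give slightly different values); the function names, the value of k, and the sample volumes are hypothetical.

```python
def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (one common convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_threshold(volumes, k, lower_pct=3, upper_pct=97):
    """vth = (vupper - vlower) * k + vlower, using percentiles instead of min/max."""
    v_lower = percentile(volumes, lower_pct)
    v_upper = percentile(volumes, upper_pct)
    return (v_upper - v_lower) * k + v_lower


# Hypothetical volumes: one all-zero frame, one plosive-like spike, 98 ordinary frames.
volumes = [0] + list(range(10, 108)) + [1000]
v_lower = percentile(volumes, 3)
v_upper = percentile(volumes, 97)
vth = percentile_threshold(volumes, k=0.1)
```

Neither the all-zero frame nor the spike moves vlower or vupper much, which is the point of using percentiles rather than the minimum and maximum.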
11
Example: EPD by Volume
epdByVol01.m
12
How to Enhance EPD by Volume?
Major problems of EPD by volume
  Threshold is hard to determine
    Corpus-based fine-tuning
  Unvoiced parts are likely to be ignored
    We need a feature to enhance the unvoiced parts
    This can be achieved by ZCR or HOD
14
ZCR for Unvoiced Sound Detection
ZCR: zero-crossing rate
  Number of zero crossings in a frame
  ZCRvoiced < ZCRsilence < ZCRunvoiced
Example: epdShowZcr01.m
Quiz: If frame=[ ], what is its ZCR?
Quiz!
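Counting zero crossings can be sketched as follows. A crossing is taken here to be a strict sign change between adjacent samples; how samples that are exactly zero should be treated varies between implementations, so that choice is an assumption. The frame is a made-up example, not the one from the quiz.

```python
def zero_crossings(frame):
    """Number of adjacent sample pairs with strictly opposite signs."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


# Hypothetical frame: sign changes at 3 -> -2, -1 -> 4, and 5 -> -6.
frame = [3, -2, -1, 4, 5, -6]
zcr = zero_crossings(frame)
```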
15
EPD by Volume and ZCR
Determine initial endpoints by the upper volume threshold τu
Expand the initial endpoints based on the lower volume threshold τl
Further expand the endpoints based on the ZCR threshold τzc
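The three steps can be sketched in Python as below. This is a simplified illustration that assumes a single voice segment per recording and per-frame volume and ZCR lists of equal length; the function name, parameter names, and sample values are hypothetical, and the referenced MATLAB example may differ in detail.

```python
def epd_vol_zcr(volumes, zcrs, tau_u, tau_l, tau_zc):
    """Return (start, end) frame indices of one voice segment, or None if none is found."""
    if len(volumes) != len(zcrs):
        raise ValueError("volumes and zcrs must have the same length")
    # Step 1: initial endpoints from frames above the upper volume threshold.
    above = [i for i, v in enumerate(volumes) if v > tau_u]
    if not above:
        return None
    start, end = above[0], above[-1]
    last = len(volumes) - 1
    # Step 2: expand outward while the neighboring frame exceeds the lower volume threshold.
    while start > 0 and volumes[start - 1] > tau_l:
        start -= 1
    while end < last and volumes[end + 1] > tau_l:
        end += 1
    # Step 3: expand further while the neighboring frame exceeds the ZCR threshold,
    # which tends to recover low-volume unvoiced sounds at the edges.
    while start > 0 and zcrs[start - 1] > tau_zc:
        start -= 1
    while end < last and zcrs[end + 1] > tau_zc:
        end += 1
    return start, end


# Hypothetical per-frame features for ten frames.
volumes = [1, 1, 2, 6, 20, 25, 18, 5, 1, 1]
zcrs    = [3, 4, 12, 5, 2, 2, 3, 4, 11, 3]
segment = epd_vol_zcr(volumes, zcrs, tau_u=15, tau_l=4, tau_zc=10)
```

In this example the upper threshold selects frames 4 to 6, the lower threshold widens that to frames 3 to 7, and the ZCR threshold adds one high-ZCR frame on each side.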
16
Example: EPD by Volume and ZCR
epdByVolZcr01.m
18
EPD by Volume and HOD
Another feature to enhance unvoiced sounds: high-order difference (HOD)
  Order-1 HOD = sum(abs(diff(s)))
  Order-2 HOD = sum(abs(diff(diff(s))))
  Order-3 HOD = sum(abs(diff(diff(diff(s)))))
  …
Quiz: If frame=[ ], what is its order-n HOD when n is 1, 2, and 3?
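The HOD formulas translate directly into a short Python sketch. The helper `diff` mirrors the adjacent-difference operation used in the formulas; the function names and the sample frame are hypothetical (the frame is not the one from the quiz).

```python
def diff(seq):
    """Differences between adjacent elements, one element shorter than the input."""
    return [b - a for a, b in zip(seq, seq[1:])]


def high_order_difference(frame, order):
    """Order-n HOD: sum of absolute values after applying diff n times."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)


# Hypothetical frame.
frame = [1, 3, 2, 5, 4]
hod = [high_order_difference(frame, n) for n in (1, 2, 3)]
```

For this frame the successive differences are [2, -1, 3, -1], [-3, 4, -4], and [7, -8], so the HOD grows with the order because the frame oscillates rapidly.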
19
Example: Plots of Volume and HOD
highOrderDiff01.m
20
Example: EPD by Vol. and HOD
epdByVolHod01.m
21
Hard Example: EPD by Vol. and HOD
A hard example: epdByVolHod02.m
23
Spectrogram
Goal
  Describe the energy distribution in each frame over time
MATLAB command
  [S,F,T] = spectrogram(signal, frameSize, overlap, fftSize, fs);
Facts
  Real signals for FFT → complex-conjugate (symmetric) spectrum
  Take the first fftSize/2+1 points (frameSize/2+1 without zero padding) when we consider magnitude only
  Use zero padding to have a larger fftSize → finer frequency resolution
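The facts about conjugate symmetry and zero padding can be checked with a naive DFT. This Python sketch is for illustration only: it is an O(N²) transform rather than an FFT, it handles a single frame rather than a full spectrogram, and the function names and the sample frame are hypothetical.

```python
import cmath


def dft(frame, fft_size):
    """Naive DFT of a frame zero-padded to fft_size points."""
    if fft_size < len(frame):
        raise ValueError("fft_size must be at least the frame length")
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / fft_size)
            for n, x in enumerate(padded))
        for k in range(fft_size)
    ]


def magnitude_spectrum(frame, fft_size):
    """Magnitudes of the first fft_size//2 + 1 bins (the rest mirror them for real input)."""
    spectrum = dft(frame, fft_size)
    return [abs(c) for c in spectrum[:fft_size // 2 + 1]]


# Hypothetical frame: one cycle of a cosine at a quarter of the sampling rate.
frame = [1.0, 0.0, -1.0, 0.0]
full = dft(frame, 4)
mag4 = magnitude_spectrum(frame, 4)  # 3 bins, no zero padding
mag8 = magnitude_spectrum(frame, 8)  # 5 bins, zero-padded to 8 points
```

Zero padding does not add information; it samples the same underlying spectrum on a denser frequency grid, which is why the peak stays at the same frequency (bin 1 of 4, bin 2 of 8).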
24
EPD by Spectrum
epdShowSpec01.m
epdShowSpec02.m
25
How to Aggregate Spectrum?
How can we aggregate the spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is diversified?
  Entropy function
  Geometric mean over arithmetic mean
26
Entropy Function (1/2)
Entropy function
Property
Quiz!
27
Entropy Function (2/2)
Proof by taking derivative
Quiz!
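The slide's own equations are not preserved in this transcript. As a sketch, assuming the usual entropy function over a probability vector and the property that it is maximized by the uniform distribution, one standard derivative-based argument (using a Lagrange multiplier for the sum-to-one constraint) runs as follows; the slide's actual proof may take a different route.

```latex
\begin{aligned}
H(p_1,\dots,p_n) &= -\sum_{i=1}^{n} p_i \ln p_i,
  \qquad \sum_{i=1}^{n} p_i = 1,\; p_i \ge 0 \\
L(p,\lambda) &= -\sum_{i=1}^{n} p_i \ln p_i + \lambda\Big(\sum_{i=1}^{n} p_i - 1\Big) \\
\frac{\partial L}{\partial p_i} &= -\ln p_i - 1 + \lambda = 0
  \;\Longrightarrow\; p_i = e^{\lambda - 1} \quad \text{for every } i \\
\sum_{i=1}^{n} p_i = 1 &\;\Longrightarrow\; p_i = \frac{1}{n},
  \qquad H_{\max} = -\sum_{i=1}^{n} \frac{1}{n}\ln\frac{1}{n} = \ln n
\end{aligned}
```

Because H is concave, this stationary point is the maximum, so 0 ≤ H ≤ ln n with the upper bound reached only when all p_i are equal.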
28
Plots of Entropy Function
entropyPlot.m n=3
29
Spectral Entropy
PDF: normalization
Spectral entropy:
Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments”, International Conference on Spoken Language Processing, Sydney, 1998
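The slide's formulas are not preserved in this transcript. A minimal sketch of the idea, assuming the spectrum of a frame is normalized to sum to 1 so that it can be treated as a PDF and the entropy is then taken with the natural logarithm, is given below; the function name and the choice between magnitude and power values as input are assumptions.

```python
import math


def spectral_entropy(spectrum):
    """Entropy of a non-negative spectrum after normalizing it to sum to 1."""
    total = sum(spectrum)
    if total <= 0:
        return 0.0  # an all-zero frame has no distribution; 0 is returned by convention here
    pdf = [s / total for s in spectrum]
    # Bins with zero probability contribute nothing (p * ln p tends to 0 as p goes to 0).
    return -sum(p * math.log(p) for p in pdf if p > 0)


# Hypothetical four-bin spectra.
flat_entropy = spectral_entropy([1, 1, 1, 1])    # energy spread evenly
peaked_entropy = spectral_entropy([4, 0, 0, 0])  # energy in a single bin
```

A flat spectrum reaches the maximum ln 4, while a spectrum concentrated in one bin gives 0, so the value tracks how diversified the spectral energy is.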
30
Geometric/Arithmetic Means
Arithmetic and geometric means
Property
Proof…
Quiz!
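The ratio of geometric mean to arithmetic mean can be sketched as below. It relies on the standard inequality that the arithmetic mean of positive numbers is at least their geometric mean, with equality only when all values are equal. The function name is hypothetical, and requiring strictly positive inputs is an assumption made to keep the logarithm defined.

```python
import math


def gm_over_am(values):
    """Geometric mean divided by arithmetic mean for strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    # Averaging logarithms avoids overflow or underflow of a long product.
    gm = math.exp(sum(math.log(v) for v in values) / len(values))
    am = sum(values) / len(values)
    return gm / am


# Hypothetical four-bin spectra.
flat_ratio = gm_over_am([2, 2, 2, 2])     # evenly spread energy
peaked_ratio = gm_over_am([1, 1, 1, 13])  # energy dominated by one bin
```

The ratio is 1 for a flat spectrum and falls toward 0 as the energy concentrates in a few bins.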
32
Classification Based EPD
Classify each frame as silence or non-silence
Feature of a frame
  Magnitude/power spectrum
  Others: ZCR, HOD, entropy, gm/am, …
Static classifiers to separate S (silence) from UV (unvoiced/voiced)
  KNNC, NBC, SVM, NN, …
Sequence aligner to find the boundaries of SUV & UVS
  HMM, CRF, …
Use Machine Learning Toolbox!