Endpoint Detection (端點偵測), Jyh-Shing Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

2 Endpoint Detection (端點偵測)
Jyh-Shing Roger Jang (張智星), http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

3 Intro to Endpoint Detection
- Endpoint detection (EPD, 端點偵測)
  - Goal: Determine the start and end of voice activity
  - Also known as voice activity detection (VAD)
- Importance
  - Acts as a preprocessing step for many recognition tasks
  - Should require as little computing power as possible
- Two activation modes for speech-based applications
  - Push to talk once → Offline EPD
    - Example: Voice command
  - Push for continuous listening → Online EPD
    - Example: Dictation machine
Quiz candidate!

4 Types of Features for EPD
- Time-domain
  - Volume only
  - Volume and ZCR (zero crossing rate)
  - Volume and HOD (high-order difference)
  - …
- Frequency-domain
  - Variance of spectrum
  - Entropy of spectrum
  - MFCC
  - …

5 Typical Frameworks for EPD
- Thresholding
  - Simple thresholding
    - Compute a feature (e.g., volume) from each frame
    - Select a threshold v_th to identify positive frames
  - Combined thresholding
    - Use two features (e.g., volume and ZCR) to make the decision
- Static classification
  - Take features
  - Perform binary classification
    - Negative → silence or noise
    - Positive → sound activity
- Sequence alignment
  - Use hidden Markov models (HMM) for sequence alignment

6 Performance Evaluation for EPD
- Two types of errors (typical for all binary classification)
  - False negative (aka false rejection): positive → negative
  - False positive (aka false acceptance): negative → positive
- Performance evaluation
  - Start & end position accuracy
  - Frame-based accuracy
Quiz candidate!
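The two error types and the frame-based accuracy can be counted directly from per-frame labels. The following is a minimal sketch, assuming ground truth and EPD output are given as equal-length lists of booleans (True = sound activity); the function name `frame_errors` is hypothetical and not part of the slides.

```python
def frame_errors(truth, predicted):
    """Count frame-level EPD errors.

    truth, predicted: equal-length lists of booleans (True = sound activity).
    Returns (false_negatives, false_positives, accuracy).
    """
    if not truth or len(truth) != len(predicted):
        raise ValueError("label sequences must be non-empty and equally long")
    # False negative: a positive frame that was labelled negative.
    false_neg = sum(1 for t, p in zip(truth, predicted) if t and not p)
    # False positive: a negative frame that was labelled positive.
    false_pos = sum(1 for t, p in zip(truth, predicted) if p and not t)
    accuracy = 1.0 - (false_neg + false_pos) / len(truth)
    return false_neg, false_pos, accuracy


# Illustrative labels (made up for this sketch).
truth = [False, True, True, True, False, False]
predicted = [False, False, True, True, True, False]
print(frame_errors(truth, predicted))
```

Here the detected segment is shifted one frame late, which produces one false negative at the start and one false positive at the end.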

7 EPD by Volume Thresholding
- The simplest method for EPD
  - Volume is based on the abs sum of each frame.
- Four intuitive ways to select v_th:
  - v_th = v_max * α
  - v_th = v_median * β
  - v_th = v_min * γ
  - v_th = v_1 * δ
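As a rough illustration of volume thresholding, the sketch below computes abs-sum volume per frame and derives the threshold from the maximum volume. The non-overlapping framing, the function names, and the ratio value 0.1 (called `alpha` here) are assumptions made for this example only.

```python
def frame_volumes(signal, frame_size):
    """Split the signal into non-overlapping frames; return abs-sum volume of each."""
    return [
        sum(abs(s) for s in signal[i:i + frame_size])
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]


def threshold_from_max(volumes, alpha=0.1):
    """Volume threshold as a fraction of the maximum volume (alpha is illustrative)."""
    return max(volumes) * alpha


def active_frames(volumes, v_th):
    """Mark frames whose volume exceeds the threshold as positive."""
    return [v > v_th for v in volumes]


# Toy signal: quiet, loud, loud, quiet (4 samples per frame).
signal = [0, 1, -1, 0, 5, -6, 7, -5, 8, -9, 7, -8, 1, 0, -1, 0]
volumes = frame_volumes(signal, 4)
v_th = threshold_from_max(volumes, alpha=0.1)
print(volumes, v_th, active_frames(volumes, v_th))
```

The other three choices (median, minimum, first frame) only change how the reference volume is picked; the comparison against v_th stays the same.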

8 How Do They Fail?
- Unfortunately…
  - All the thresholds fail one way or another.
  - Under what situations do they fail?
    - v_th = v_max * α → Plosive sounds
    - v_th = v_median * β → Silence too long
    - v_th = v_min * γ → Total-zero frame
    - v_th = v_1 * δ → Unstable frame
- We need a better strategy…

9 A Better Strategy for Threshold Finding
- A presumably better way to select v_th
  - v_lower = 3rd percentile of volumes
  - v_upper = 97th percentile of volumes
  - v_th = (v_upper - v_lower) * α + v_lower
- Why do we need to use percentiles?
  - To deal with plosive sounds
  - To deal with total-zero frames
- Does it fail? Yes, still, in certain situations…
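The percentile-based threshold can be sketched as follows. The nearest-rank percentile definition, the ratio value 0.1 (called `alpha` here), and the function names are assumptions for illustration; the slides do not say which percentile convention is used.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (one of several conventions)."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def percentile_threshold(volumes, alpha=0.1, low_pct=3, high_pct=97):
    """v_th = (v_upper - v_lower) * alpha + v_lower, using percentiles as bounds."""
    v_lower = percentile(volumes, low_pct)
    v_upper = percentile(volumes, high_pct)
    return (v_upper - v_lower) * alpha + v_lower


# 100 made-up volumes: one total-zero frame, one plosive-like spike, the rest ordinary.
volumes = [0] + list(range(10, 108)) + [1000]
print(percentile(volumes, 3), percentile(volumes, 97), percentile_threshold(volumes))
```

In this toy data neither the zero frame nor the spike at 1000 influences the bounds, which is the reason for preferring percentiles over the raw minimum and maximum.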

10 Example: EPD by Volume
- epdByVol01.m

11 How to Enhance EPD by Volume?
- Major problems of EPD by volume
  - Threshold is hard to determine
    → Corpus-based fine-tuning
  - Unvoiced parts are likely to be ignored
    → We need a feature to enhance the unvoiced parts
    → This can be achieved by ZCR or HOD

12 ZCR for Unvoiced Sound Detection
- ZCR: zero crossing rate
  - No. of zero crossings in a frame
  - z_voiced ≤ z_silence ≤ z_unvoiced
- Example: epdShowZcr01.m
Quiz: If frame=[-1 2 -2 3 5 2 -2 1], what is its ZCR?
Quiz candidate!
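Counting zero crossings amounts to counting sign changes between adjacent samples. The sketch below assumes no sample is exactly zero (zeros would need an explicit convention, which the slides do not state) and applies the count to the quiz frame; the function name is hypothetical.

```python
def zero_crossings(frame):
    """Number of adjacent sample pairs with opposite signs (zeros are not counted)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


frame = [-1, 2, -2, 3, 5, 2, -2, 1]
print(zero_crossings(frame))  # prints 5
```

The sign changes occur at -1→2, 2→-2, -2→3, 2→-2 and -2→1.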

13 EPD by Volume and ZCR
1. Determine initial endpoints by τ_u
2. Expand the initial endpoints based on τ_l
3. Further expand the endpoints based on ZCR threshold τ_zc
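The three steps can be sketched as below for a recording with a single sound segment. This is an illustrative reading of the procedure, not the code behind the slides: the threshold names `tau_u`, `tau_l`, `tau_zc`, the single-segment assumption, and the toy feature values are all assumptions.

```python
def epd_vol_zcr(volumes, zcrs, tau_u, tau_l, tau_zc):
    """Return (start, end) frame indices of one sound segment, or None if no frame is loud.

    volumes, zcrs: per-frame volume and zero-crossing count (equal length).
    """
    # Step 1: initial endpoints are the first and last frames above the upper volume threshold.
    loud = [i for i, v in enumerate(volumes) if v > tau_u]
    if not loud:
        return None
    start, end = loud[0], loud[-1]

    # Step 2: expand outward while neighbouring frames stay above the lower volume threshold.
    while start > 0 and volumes[start - 1] > tau_l:
        start -= 1
    while end < len(volumes) - 1 and volumes[end + 1] > tau_l:
        end += 1

    # Step 3: expand further while neighbouring frames have a high ZCR (likely unvoiced sound).
    while start > 0 and zcrs[start - 1] > tau_zc:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > tau_zc:
        end += 1
    return start, end


# Made-up features: a voiced core with quiet, high-ZCR frames on both sides.
volumes = [1, 2, 6, 20, 25, 18, 5, 2, 1]
zcrs = [3, 12, 4, 2, 2, 3, 4, 11, 3]
print(epd_vol_zcr(volumes, zcrs, tau_u=15, tau_l=4, tau_zc=10))
```

With these values the endpoints grow from frames 3..5 (step 1) to 2..6 (step 2) and finally to 1..7 (step 3), picking up the low-volume, high-ZCR frames that volume alone would miss.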

14 Example: EPD by Volume and ZCR
- epdByVolZcr01.m

15 EPD by Volume and HOD
- Another feature to enhance unvoiced sounds:
  - High-order difference
    - Order-1 HOD = sum(abs(diff(s)))
    - Order-2 HOD = sum(abs(diff(diff(s))))
    - Order-3 HOD = sum(abs(diff(diff(diff(s)))))
    - …
Quiz: If frame=[-1 2 -2 3 -3 2 -2 1], what is its order-1 HOD?
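The HOD formulas translate directly into repeated differencing followed by an abs sum. The sketch below is a plain-Python equivalent of the sum(abs(diff(...))) expressions, applied to the quiz frame; the function names are hypothetical.

```python
def diff(seq):
    """First-order difference: seq[i+1] - seq[i] for each adjacent pair."""
    return [b - a for a, b in zip(seq, seq[1:])]


def high_order_difference(frame, order=1):
    """Abs sum of the order-th difference of the frame."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)


frame = [-1, 2, -2, 3, -3, 2, -2, 1]
print(high_order_difference(frame, order=1))  # prints 30
print(high_order_difference(frame, order=2))  # prints 54
```

The first difference of the quiz frame is [3, -4, 5, -6, 5, -4, 3], whose absolute values sum to 30.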

16 Example: Plots of Volume and HOD
- highOrderDiff01.m

17 Example: EPD by Vol. and HOD
- epdByVolHod01.m

18 Hard Example: EPD by Vol. and HOD
- A hard example: epdByVolHod02.m

19 EPD by Spectrum
- epdShowSpec01.m
- epdShowSpec02.m

20 How to Aggregate Spectrum?
- How can we aggregate the spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is more diversified?
  - Entropy function
  - Geometric mean over arithmetic mean

21 Entropy Function
- Entropy function
- Property
- Proof…
Quiz candidate!
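The slide's own formula is not preserved in this transcript. As a sketch, the code below assumes the standard Shannon entropy of a discrete distribution, E(p) = -Σ p_i ln p_i, whose well-known property is that it is maximized (at ln N) by the uniform distribution and is zero when all mass sits on one outcome. The function name is hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution; zero terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


print(entropy([0.5, 0.5]))        # uniform, N=2
print(entropy([0.9, 0.1]))        # skewed, N=2
print(entropy([1 / 3] * 3))       # uniform, N=3
```

A skewed distribution scores lower than the uniform one with the same N, which is what makes entropy usable as a measure of how spread out a spectrum is.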

22 Plots of Entropy Function
- N=2 entropyPlot.m
- N=3

23 Spectral Entropy
- PDF:
- Normalization
- Spectral entropy:
Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998
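The formulas on this slide are not preserved in the transcript. The sketch below shows one common way to realize the idea, under stated assumptions: the power spectrum of a frame is normalized to sum to one so that it can be treated as a PDF, and the Shannon entropy of that PDF is the feature. The naive DFT, the use of power rather than magnitude, and returning 0.0 for a silent frame are choices made for this example and may differ from the cited paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of DFT bins 0..n/2, computed with a naive O(n^2) DFT for self-containment."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(frame):
    """Entropy of the spectrum after normalizing it into a PDF (0.0 for an all-zero frame)."""
    spec = power_spectrum(frame)
    total = sum(spec)
    if total == 0:
        return 0.0  # assumption: treat a silent frame as zero entropy
    pdf = [s / total for s in spec]
    return -sum(p * math.log(p) for p in pdf if p > 0)


# A pure tone concentrates energy in one bin; an impulse spreads it evenly over all bins.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
print(spectral_entropy(tone), spectral_entropy(impulse))
```

The tone yields an entropy close to zero while the impulse, with its flat spectrum, reaches the maximum ln(17) for 17 bins.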

24 Geometric/Arithmetic Means
- Arithmetic & geometric means
- Property
- Proof…
Quiz candidate!
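The slide's formulas are not preserved here. The sketch below relies on the standard AM-GM inequality: for non-negative values the geometric mean never exceeds the arithmetic mean, with equality only when all values are equal. The ratio GM/AM is therefore 1 for a flat spectrum and approaches 0 for a peaky one. The function name and the handling of zero entries are assumptions for this example.

```python
import math


def geo_over_arith(values):
    """Ratio of geometric mean to arithmetic mean for non-negative values (in [0, 1])."""
    if not values or any(v < 0 for v in values):
        raise ValueError("values must be non-empty and non-negative")
    arith = sum(values) / len(values)
    if arith == 0 or any(v == 0 for v in values):
        return 0.0  # geometric mean is zero as soon as one entry is zero
    # Compute the geometric mean in the log domain to avoid overflow in the product.
    geo = math.exp(sum(math.log(v) for v in values) / len(values))
    return geo / arith


print(geo_over_arith([2, 2, 2, 2]))    # flat distribution
print(geo_over_arith([1, 1, 1, 97]))   # energy concentrated in one entry
```

Applied to spectral bin energies, a high ratio suggests noise-like frames and a low ratio suggests frames with pronounced spectral peaks.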

