Endpoint Detection ( 端點偵測 ) Jyh-Shing Roger Jang ( 張智星 ) MIR Lab, CSIE Dept National Taiwan Univ., Taiwan.

Endpoint Detection ( 端點偵測 ) Jyh-Shing Roger Jang ( 張智星 ) http://mirlab.org/jang MIR Lab, CSIE Dept National Taiwan Univ., Taiwan

-2- Intro to Endpoint Detection
- Endpoint detection (EPD, 端點偵測)
  - Goal: Determine the start and end of voice activity
  - Also known as voice activity detection (VAD)
- Importance
  - Acts as a preprocessing step for many recognition tasks
  - Should require as little computing power as possible
- Two activation modes for speech-based applications
  - Push to talk once → Offline EPD
    - Example: voice command
  - Push for continuous listening → Online EPD
    - Example: dictation machine
Quiz candidate!

-3- Types of Features for EPD
- Time-domain
  - Volume only
  - Volume and ZCR (zero crossing rate)
  - Volume and HOD (high-order difference)
  - …
- Frequency-domain
  - Variance of spectrum
  - Entropy of spectrum
  - MFCC
  - …

-4- Typical Frameworks for EPD
- Thresholding
  - Simple thresholding
    - Compute a feature (e.g., volume) from each frame
    - Select a threshold v_th to identify positive frames
  - Combined thresholding
    - Use two features (e.g., volume and ZCR) to make the decision
- Static classification
  - Take features from each frame
  - Perform binary classification
    - Negative → silence or noise
    - Positive → sound activity
- Sequence alignment
  - Use hidden Markov models (HMM) for sequence alignment

-5- Performance Evaluation for EPD
- Two types of errors (typical for all binary classification)
  - False negative (aka false rejection): positive → negative
  - False positive (aka false acceptance): negative → positive
- Performance evaluation
  - Start & end position accuracy
  - Frame-based accuracy
Quiz candidate!
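Frame-based accuracy and the two error types above can be counted as in the following minimal Python sketch. It assumes per-frame boolean labels (True = sound activity); the function name `frame_errors` is illustrative and does not come from the slides.

```python
def frame_errors(truth, predicted):
    """Compare per-frame labels (True = sound activity).

    Returns (false_negatives, false_positives, accuracy).
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    # False negative: a positive frame labeled negative.
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    # False positive: a negative frame labeled positive.
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    accuracy = 1.0 - (fn + fp) / len(truth)
    return fn, fp, accuracy
```

Start and end position accuracy would instead compare the detected boundary indices with the labeled ones, which this sketch does not cover.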

-6- EPD by Volume Thresholding
- The simplest method for EPD
  - Volume is the sum of absolute sample values in each frame.
- Four intuitive ways to select v_th:
  - v_th = v_max * α
  - v_th = v_median * β
  - v_th = v_min * γ
  - v_th = v_1 * δ
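A minimal Python sketch of the volume feature and the four candidate thresholds above. The frame size, hop size, function names, and the multiplier names `alpha`, `beta`, `gamma`, `delta` are illustrative assumptions and are not taken from the slides' MATLAB scripts.

```python
from statistics import median


def frame_volumes(signal, frame_size, hop):
    """Volume of each frame = sum of absolute sample values."""
    return [
        sum(abs(x) for x in signal[i:i + frame_size])
        for i in range(0, len(signal) - frame_size + 1, hop)
    ]


def simple_thresholds(volumes, alpha, beta, gamma, delta):
    """Four candidate thresholds, each a reference volume times a constant."""
    return {
        "max": max(volumes) * alpha,
        "median": median(volumes) * beta,
        "min": min(volumes) * gamma,
        "first": volumes[0] * delta,  # v_1: volume of the first frame
    }


def positive_frames(volumes, v_th):
    """Indices of frames whose volume exceeds the threshold."""
    return [i for i, v in enumerate(volumes) if v > v_th]
```

In practice the multipliers would have to be tuned on labeled recordings.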

-7- How Do They Fail?
- Unfortunately…
  - All the thresholds fail one way or another.
  - Under what situations do they fail?
    - v_th = v_max * α → Plosive sounds
    - v_th = v_median * β → Silence too long
    - v_th = v_min * γ → Total-zero frame
    - v_th = v_1 * δ → Unstable frame
- We need a better strategy…

-8- A Better Strategy for Threshold Finding
- A presumably better way to select v_th
  - v_lower = 3rd percentile of volumes
  - v_upper = 97th percentile of volumes
  - v_th = (v_upper - v_lower) * α + v_lower
- Why do we need to use percentiles?
  - To deal with plosive sounds
  - To deal with total-zero frames
- Does it fail? Yes, still, in certain situations…
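The percentile-based threshold can be sketched in Python as follows. The linear-interpolation percentile and the names `percentile`, `percentile_threshold`, and `ratio` are assumptions made for illustration; the slides do not say which percentile definition is used.

```python
def percentile(values, p):
    """p-th percentile (0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def percentile_threshold(volumes, ratio, lower_pct=3, upper_pct=97):
    """v_th = (v_upper - v_lower) * ratio + v_lower."""
    v_lower = percentile(volumes, lower_pct)
    v_upper = percentile(volumes, upper_pct)
    return (v_upper - v_lower) * ratio + v_lower
```

Because the extremes are replaced by the 3rd and 97th percentiles, a single very loud frame or a single all-zero frame no longer moves the threshold much.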

-9- Example: EPD by Volume
- epdByVol01.m

-10- How to Enhance EPD by Volume?
- Major problems of EPD by volume
  - Threshold is hard to determine → Corpus-based fine-tuning
  - Unvoiced parts are likely to be ignored → We need a feature to enhance the unvoiced parts → This can be achieved by ZCR or HOD

-11- ZCR for Unvoiced Sound Detection
- ZCR: zero crossing rate
  - Number of zero crossings in a frame
  - z_voiced ≤ z_silence ≤ z_unvoiced
- Example: epdShowZcr01.m
Quiz: If frame = [-1 2 -2 3 5 2 -2 1], what is its ZCR?
Quiz candidate!
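A minimal Python sketch of ZCR as a count of sign changes between adjacent samples. How samples that are exactly zero should be treated is not stated on the slide; this sketch assumes they do not count as a crossing, and the function name is illustrative.

```python
def zero_crossing_rate(frame):
    """Number of adjacent sample pairs with strictly opposite signs."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
```

Calling `zero_crossing_rate` on the quiz frame is one way to check an answer worked out by hand.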

-12- EPD by Volume and ZCR
1. Determine initial endpoints by the upper volume threshold τ_u
2. Expand the initial endpoints based on the lower volume threshold τ_l
3. Further expand the endpoints based on the ZCR threshold τ_zc
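The three steps can be sketched in Python as below. This is a simplified single-segment version written for illustration, not the logic of the slides' MATLAB scripts: it assumes one utterance per recording, takes precomputed per-frame volumes and ZCRs, and uses the hypothetical names `tau_u`, `tau_l`, and `tau_zc` for the three thresholds.

```python
def epd_by_vol_zcr(volumes, zcrs, tau_u, tau_l, tau_zc):
    """Return (start_frame, end_frame) of the detected segment, or None."""
    # Step 1: initial endpoints from frames above the upper volume threshold.
    above = [i for i, v in enumerate(volumes) if v > tau_u]
    if not above:
        return None
    start, end = above[0], above[-1]
    last = len(volumes) - 1

    # Step 2: expand outward while volume stays above the lower threshold.
    while start > 0 and volumes[start - 1] > tau_l:
        start -= 1
    while end < last and volumes[end + 1] > tau_l:
        end += 1

    # Step 3: expand further while ZCR is high (likely unvoiced sound).
    while start > 0 and zcrs[start - 1] > tau_zc:
        start -= 1
    while end < last and zcrs[end + 1] > tau_zc:
        end += 1
    return start, end
```

Step 3 is what recovers low-volume unvoiced consonants at the edges of an utterance, which volume thresholding alone tends to cut off.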

-13- Example: EPD by Volume and ZCR
- epdByVolZcr01.m

-14- EPD by Volume and HOD
- Another feature to enhance unvoiced sounds:
  - High-order difference
    - Order-1 HOD = sum(abs(diff(s)))
    - Order-2 HOD = sum(abs(diff(diff(s))))
    - Order-3 HOD = sum(abs(diff(diff(diff(s)))))
    - …
Quiz: If frame = [-1 2 -2 3 -3 2 -2 1], what is its order-1 HOD?
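The MATLAB-style expressions above translate to Python as in this minimal sketch. The helper names `diff` and `high_order_difference` are illustrative; `diff` mirrors the adjacent-difference behavior of MATLAB's `diff` on a vector.

```python
def diff(seq):
    """Adjacent differences: seq[1]-seq[0], seq[2]-seq[1], ..."""
    return [b - a for a, b in zip(seq, seq[1:])]


def high_order_difference(frame, order=1):
    """Order-n HOD: sum of absolute values after differencing n times."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)
```

Rapidly oscillating frames, as in unvoiced sounds, give large differences even when their volume is low, which is why HOD complements volume.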

-15- Example: Plots of Volume and HOD
- highOrderDiff01.m

-16- Example: EPD by Vol. and HOD
- epdByVolHod01.m

-17- Hard Example: EPD by Vol. and HOD
- A hard example: epdByVolHod02.m

-18- EPD by Spectrum
- epdShowSpec01.m
- epdShowSpec02.m

-19- How to Aggregate Spectrum?
- How can we aggregate a spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is diversified?
  - Entropy function
  - Geometric mean over arithmetic mean

-20- Entropy Function
- Entropy function
- Property
- Proof…
Quiz candidate!
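The formulas on this slide did not survive in the transcript. Assuming the slide refers to the standard Shannon entropy of a discrete distribution, the definition and its usual bound can be written as follows (the symbols p_i and N are chosen here for illustration):

```latex
% Entropy of a discrete distribution p = (p_1, ..., p_N)
H(p) = -\sum_{i=1}^{N} p_i \log p_i,
\qquad p_i \ge 0,\quad \sum_{i=1}^{N} p_i = 1

% Standard property: bounded between 0 and \log N
0 \le H(p) \le \log N,
\qquad H(p) = \log N \iff p_i = \tfrac{1}{N} \text{ for all } i
```

The upper bound is reached only by the uniform distribution, and the lower bound only when all probability mass sits on a single outcome.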

-21- Plots of Entropy Function
- N=2 (entropyPlot.m)
- N=3

-22- Spectral Entropy
- PDF:
- Normalization
- Spectral entropy:
Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998
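A minimal Python sketch of spectral entropy, assuming the usual construction: normalize the non-negative spectral values of a frame so they sum to 1, treat the result as a PDF, and take its entropy. The function name and the choice of natural logarithm are assumptions; the exact formulas on the slide are not preserved.

```python
import math


def spectral_entropy(spectrum):
    """Entropy of a non-negative spectrum after normalizing it to a PDF."""
    if any(s < 0 for s in spectrum):
        raise ValueError("spectral values must be non-negative")
    total = sum(spectrum)
    if total == 0:
        return 0.0  # all-zero frame: no energy, defined as zero entropy here
    probs = [s / total for s in spectrum]
    # Terms with p = 0 contribute nothing (limit of p*log(p) as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A flat spectrum gives the largest value (log of the number of bins), while a spectrum concentrated in one bin gives zero.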

-23- Geometric/Arithmetic Means
- Arithmetic & geometric means
- Property
- Proof…
Quiz candidate!
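A minimal Python sketch of the geometric-over-arithmetic mean ratio as a spectrum aggregate. It relies on the standard inequality that the geometric mean of non-negative numbers never exceeds their arithmetic mean; the function name and the handling of zero values are assumptions made for illustration.

```python
import math


def geo_arith_ratio(values):
    """Geometric mean divided by arithmetic mean of non-negative values."""
    if not values or any(v < 0 for v in values):
        raise ValueError("need a non-empty list of non-negative values")
    if any(v == 0 for v in values):
        return 0.0  # a single zero makes the geometric mean zero
    n = len(values)
    arithmetic = sum(values) / n
    # Compute the geometric mean in the log domain to avoid overflow.
    geometric = math.exp(sum(math.log(v) for v in values) / n)
    return geometric / arithmetic
```

The ratio lies between 0 and 1: it equals 1 when all values are equal (a flat spectrum) and drops toward 0 as the energy concentrates in a few bins.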
