Published by Dana Black. Modified over 9 years ago.

Endpoint Detection
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University, Taiwan

Intro to Endpoint Detection
- Endpoint detection (EPD)
  - Goal: determine the start and end of voice activity
  - Also known as voice activity detection (VAD)
- Importance
  - Acts as a preprocessing step for many recognition tasks
  - Should require as little computing power as possible
- Two activation modes for speech-based applications
  - Push to talk once: offline EPD
    - Example: voice command
  - Push for continuous listening: online EPD
    - Example: dictation machine
Quiz candidate!

Types of Features for EPD
- Time-domain
  - Volume only
  - Volume and ZCR (zero-crossing rate)
  - Volume and HOD (high-order difference)
  - …
- Frequency-domain
  - Variance of spectrum
  - Entropy of spectrum
  - MFCC
  - …

Typical Frameworks for EPD
- Thresholding
  - Simple thresholding
    - Compute a feature (e.g., volume) from each frame
    - Select a threshold v_th to identify positive frames
  - Combined thresholding
    - Use two features (e.g., volume and ZCR) to make the decision
- Static classification
  - Extract features from each frame
  - Perform binary classification
    - Negative: silence or noise
    - Positive: sound activity
- Sequence alignment
  - Use hidden Markov models (HMM) for sequence alignment

Performance Evaluation for EPD
- Two types of errors (typical for all binary classification)
  - False negative (aka false rejection): a positive frame is classified as negative
  - False positive (aka false acceptance): a negative frame is classified as positive
- Performance evaluation
  - Start and end position accuracy
  - Frame-based accuracy
Quiz candidate!
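The two error types and the frame-based accuracy listed above can be illustrated with a small Python sketch. The function name `frame_errors` and the 0/1 label encoding are assumptions made here for illustration; the slides do not prescribe an implementation.

```python
def frame_errors(truth, predicted):
    """Count frame-level errors for binary EPD labels.

    Assumed encoding: 1 = sound activity (positive), 0 = silence or noise (negative).
    Returns (false_negatives, false_positives, frame_accuracy).
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("label sequences must be non-empty and of equal length")
    # Positive frames that were rejected.
    false_neg = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    # Negative frames that were accepted.
    false_pos = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    accuracy = 1.0 - (false_neg + false_pos) / len(truth)
    return false_neg, false_pos, accuracy


truth     = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
print(frame_errors(truth, predicted))  # (1, 1, 0.75)
```

Start and end position accuracy would instead compare the detected boundary positions with the labeled ones, which this sketch does not cover.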

EPD by Volume Thresholding
- The simplest method for EPD
  - Volume is computed as the sum of absolute sample values in each frame.
- Four intuitive ways to select v_th:
  - v_th = v_max * …
  - v_th = v_median * …
  - v_th = v_min * …
  - v_th = v_1 * …
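As a rough Python sketch of the ideas on this slide (the slides' own example scripts are MATLAB), the code below computes abs-sum volume per frame and forms the four candidate thresholds. The function names, the non-overlapping framing, and all four ratio parameters are hypothetical: the coefficients are not given in this transcript, so the values in the usage lines are arbitrary placeholders.

```python
import statistics


def frame_volumes(signal, frame_size):
    """Abs-sum volume of each non-overlapping frame (a trailing partial frame is dropped)."""
    volumes = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        volumes.append(sum(abs(x) for x in frame))
    return volumes


def candidate_thresholds(volumes, ratio_max, ratio_median, ratio_min, ratio_first):
    """Four candidate thresholds; every ratio is a caller-supplied, hypothetical coefficient."""
    return {
        "v_max": max(volumes) * ratio_max,
        "v_median": statistics.median(volumes) * ratio_median,
        "v_min": min(volumes) * ratio_min,
        "v_1": volumes[0] * ratio_first,  # volume of the first frame
    }


signal = [0, 1, -1, 0, 5, -4, 6, -5, 0, 0, 1, 0]
volumes = frame_volumes(signal, frame_size=4)
print(volumes)  # [2, 20, 1]
# Placeholder ratios, chosen only so the example runs.
print(candidate_thresholds(volumes, 0.5, 2, 10, 5))
```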

How Do They Fail?
- Unfortunately…
  - All the thresholds fail one way or another.
  - Under what situations do they fail?
    - v_th = v_max * …: plosive sounds
    - v_th = v_median * …: silence too long
    - v_th = v_min * …: total-zero frame
    - v_th = v_1 * …: unstable frame
- We need a better strategy…

A Better Strategy for Threshold Finding
- A presumably better way to select v_th
  - v_lower = 3rd percentile of volumes
  - v_upper = 97th percentile of volumes
  - v_th = (v_upper - v_lower) * … + v_lower
- Why do we need to use percentiles?
  - To deal with plosive sounds
  - To deal with total-zero frames
- Does it fail? Yes, still, in certain situations…
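The percentile-based rule above can be sketched in Python as follows. Two things here are assumptions rather than part of the slides: the nearest-rank percentile definition (MATLAB's `prctile` interpolates instead, so values may differ slightly) and the `ratio` parameter, which stands in for the coefficient that is not given in this transcript.

```python
def percentile(values, pct):
    """Nearest-rank percentile: one of several common definitions (an assumption here)."""
    ordered = sorted(values)
    # Ceiling of pct * n / 100 via floor division, then clamp to at least rank 1.
    rank = max(1, int(-(-pct * len(ordered) // 100)))
    return ordered[rank - 1]


def percentile_threshold(volumes, ratio, lower_pct=3, upper_pct=97):
    """v_th = (v_upper - v_lower) * ratio + v_lower, with ratio supplied by the caller."""
    v_lower = percentile(volumes, lower_pct)
    v_upper = percentile(volumes, upper_pct)
    return (v_upper - v_lower) * ratio + v_lower


# One total-zero frame and one plosive-like spike among otherwise ordinary volumes.
volumes = [0] + list(range(10, 108)) + [5000]
print(percentile(volumes, 3), percentile(volumes, 97))
print(percentile_threshold(volumes, ratio=0.5))  # ratio is a placeholder value
```

In the usage lines, neither the zero frame nor the spike is selected as `v_lower` or `v_upper`, which is the motivation the slide gives for using percentiles instead of the minimum and maximum.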

Example: EPD by Volume
- epdByVol01.m

How to Enhance EPD by Volume?
- Major problems of EPD by volume
  - The threshold is hard to determine: corpus-based fine-tuning
  - Unvoiced parts are likely to be ignored: we need a feature to enhance the unvoiced parts
    - This can be achieved by ZCR or HOD

ZCR for Unvoiced Sound Detection
- ZCR: zero-crossing rate
  - Number of zero crossings in a frame
  - z_voiced ≤ z_silence ≤ z_unvoiced
- Example: epdShowZcr01.m
- Quiz: If frame = [-1 2 -2 3 5 2 -2 1], what is its ZCR?
Quiz candidate!
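A minimal Python sketch of counting zero crossings in a frame is shown below. It assumes one particular convention: a crossing is a strict sign change between consecutive samples, so zero-valued samples never count. Other conventions exist, and the slides do not say which one the MATLAB examples use.

```python
def zero_crossing_count(frame):
    """Count strict sign changes between consecutive samples of a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


frame = [-1, 2, -2, 3, 5, 2, -2, 1]
print(zero_crossing_count(frame))  # 5 under this convention
```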

EPD by Volume and ZCR
1. Determine initial endpoints by the upper volume threshold τ_u
2. Expand the initial endpoints based on the lower volume threshold τ_l
3. Further expand the endpoints based on the ZCR threshold τ_zc
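The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the slides' MATLAB implementation: it handles a single sound segment, takes per-frame volume and ZCR lists as input, and uses strict "greater than" comparisons. The function name and the parameter names `upper_vol`, `lower_vol`, and `zcr_thresh` are hypothetical.

```python
def epd_by_volume_and_zcr(volumes, zcrs, upper_vol, lower_vol, zcr_thresh):
    """Return (start, end) frame indices of one segment, or None if nothing exceeds upper_vol."""
    # Step 1: initial endpoints are the first and last frames above the upper volume threshold.
    above = [i for i, v in enumerate(volumes) if v > upper_vol]
    if not above:
        return None
    start, end = above[0], above[-1]

    # Step 2: expand outward while neighbouring frames stay above the lower volume threshold.
    while start > 0 and volumes[start - 1] > lower_vol:
        start -= 1
    while end < len(volumes) - 1 and volumes[end + 1] > lower_vol:
        end += 1

    # Step 3: expand further while neighbouring frames have ZCR above the ZCR threshold,
    # which is intended to pick up low-volume unvoiced sounds at the boundaries.
    while start > 0 and zcrs[start - 1] > zcr_thresh:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > zcr_thresh:
        end += 1
    return start, end


volumes = [1, 1, 2, 4, 9, 10, 8, 3, 1, 1]
zcrs    = [2, 2, 9, 3, 1, 1, 1, 2, 8, 2]
print(epd_by_volume_and_zcr(volumes, zcrs, upper_vol=7, lower_vol=2.5, zcr_thresh=5))  # (2, 8)
```

In the usage lines, step 1 gives frames 4 to 6, step 2 widens this to 3 to 7, and step 3 adds the high-ZCR frames 2 and 8.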

Example: EPD by Volume and ZCR
- epdByVolZcr01.m

EPD by Volume and HOD
- Another feature to enhance unvoiced sounds:
  - High-order difference (HOD)
    - Order-1 HOD = sum(abs(diff(s)))
    - Order-2 HOD = sum(abs(diff(diff(s))))
    - Order-3 HOD = sum(abs(diff(diff(diff(s)))))
    - …
- Quiz: If frame = [-1 2 -2 3 -3 2 -2 1], what is its order-1 HOD?
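The MATLAB-style expressions above translate directly into plain Python. The helper names `diff` and `high_order_difference` are chosen here for illustration; `diff` mirrors the consecutive-difference behaviour of MATLAB's `diff` on a vector.

```python
def diff(seq):
    """Consecutive differences: seq[i + 1] - seq[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def high_order_difference(frame, order=1):
    """Order-n HOD: apply diff n times, then sum the absolute values."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(x) for x in d)


frame = [-1, 2, -2, 3, -3, 2, -2, 1]
print(high_order_difference(frame, order=1))  # 30
print(high_order_difference(frame, order=2))  # 54
```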

Example: Plots of Volume and HOD
- highOrderDiff01.m

Example: EPD by Vol. and HOD
- epdByVolHod01.m

Hard Example: EPD by Vol. and HOD
- A hard example: epdByVolHod02.m

EPD by Spectrum
- epdShowSpec01.m
- epdShowSpec02.m

How to Aggregate Spectrum?
- How can we aggregate the spectrum into a single feature that is larger (or smaller) when the spectral energy distribution is diversified?
  - Entropy function
  - Geometric mean over arithmetic mean

Entropy Function
- Entropy function
- Property
- Proof…
Quiz candidate!
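For reference, the textbook entropy function of a discrete distribution and its standard bounds are written out below. This is offered on the assumption that it matches the definition and property the slide refers to; the symbols p_i and N are chosen here for illustration, and 0 log 0 is taken to be 0.

```latex
H(p_1, \dots, p_N) = -\sum_{i=1}^{N} p_i \log p_i,
\qquad p_i \ge 0, \quad \sum_{i=1}^{N} p_i = 1
\\[4pt]
0 \le H(p_1, \dots, p_N) \le \log N,
\qquad \text{with the maximum attained when } p_i = \tfrac{1}{N} \text{ for all } i
```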

Plots of Entropy Function
- N = 2 (entropyPlot.m)
- N = 3

Spectral Entropy
- PDF:
- Normalization:
- Spectral entropy:
Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998
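A minimal Python sketch of the general idea follows: normalize a frame's spectrum so it sums to one, treat the result as a PDF, and take its entropy. This is a simplified illustration and not necessarily the exact formulation in the cited paper; the function name, the use of the natural logarithm, and returning 0.0 for an all-zero spectrum are assumptions made here.

```python
import math


def spectral_entropy(power_spectrum):
    """Entropy of a non-negative spectrum after normalizing it to sum to 1."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0  # assumed convention for an all-zero frame
    pdf = [s / total for s in power_spectrum]
    # Bins with zero probability contribute nothing (0 log 0 taken as 0).
    return -sum(p * math.log(p) for p in pdf if p > 0)


flat = [1, 1, 1, 1]    # energy spread evenly over all bins
peaked = [0, 0, 8, 0]  # energy concentrated in a single bin
print(spectral_entropy(flat), spectral_entropy(peaked))
```

A flat spectrum gives the largest value (log of the number of bins) and a single-bin spectrum gives zero, so the feature grows as the spectral energy distribution becomes more diversified.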

Geometric/Arithmetic Means
- Arithmetic and geometric means
- Property
- Proof…
Quiz candidate!
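As a hedged Python sketch, the ratio of geometric mean to arithmetic mean can be computed as below. It relies on the standard AM-GM inequality (for positive values the geometric mean never exceeds the arithmetic mean, with equality only when all values are equal), which is presumably the property this slide refers to. The function name and the requirement of strictly positive inputs are assumptions made here.

```python
import math


def geometric_over_arithmetic(values):
    """Ratio of geometric mean to arithmetic mean for strictly positive values (in (0, 1])."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    # Geometric mean computed in the log domain to avoid overflow in long products.
    geometric = math.exp(sum(math.log(v) for v in values) / len(values))
    arithmetic = sum(values) / len(values)
    return geometric / arithmetic


print(geometric_over_arithmetic([2, 2, 2, 2]))   # close to 1 for a flat spectrum
print(geometric_over_arithmetic([1, 1, 1, 13]))  # well below 1 for a peaked spectrum
```

Applied to a spectrum, this ratio behaves like the entropy feature: it is largest when the energy is spread evenly and shrinks as the energy concentrates in a few bins.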