VAD (Voice Activity Detector)
Supervised by Dr. Kepuska
By Preetham Nosum
VAD
- Part of the front-end module (spectrum, MFCC)
- Detects speech using several features
- Overall flag is set when all individual feature flags are ON
Tools
- Visual C++: parameters written to files with their respective extensions
- Matlab: graphs plotted from the data
VAD Features
- Energy Feature
- MFCC Feature
- Spectrum Feature
- MFCC Enhanced Feature
Energy Feature
- Input comes from the previous module: frame energy
- Mean frame energy is calculated
- The mean is compared to the current frame energy
- Energy flag is set if the difference is large
- The mean is not updated while the VAD is ON
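A minimal sketch of the energy check, for illustration only. The class name `EnergyFlag`, the exponential running mean, and the `threshold` and `alpha` values are assumptions; the slides do not say how the mean is computed or what counts as a "high" difference.

```python
class EnergyFlag:
    """Hypothetical energy detector: compares each frame to a running mean."""

    def __init__(self, threshold=2.0, alpha=0.9):
        self.threshold = threshold  # assumed margin above the mean
        self.alpha = alpha          # assumed smoothing factor for the mean
        self.mean = None            # running mean frame energy

    def update(self, frame_energy, vad_on):
        """Return the energy flag for one frame.

        vad_on is the overall VAD state; the mean is frozen while it is True.
        """
        if self.mean is None:
            # First frame only seeds the mean.
            self.mean = frame_energy
            return False
        flag = (frame_energy - self.mean) > self.threshold
        if not vad_on:
            self.mean = self.alpha * self.mean + (1 - self.alpha) * frame_energy
        return flag
```

Freezing the mean while the VAD is ON keeps speech frames from raising the background estimate.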
MFCC Feature
- MFCC feature is calculated from the MFCC vector
- Each frame is compared to the overall mean MFCC
- Deviation from the mean sets the MFCC flag
- The mean is not updated while the VAD is ON
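One way the MFCC check could look, as a hedged sketch. Euclidean distance as the deviation measure, the exponential running mean, and the names `MfccFlag`, `threshold`, and `alpha` are all assumptions, not details taken from the slides.

```python
import math


class MfccFlag:
    """Hypothetical MFCC detector: distance of each frame from a running mean vector."""

    def __init__(self, threshold=1.0, alpha=0.9):
        self.threshold = threshold  # assumed distance threshold
        self.alpha = alpha          # assumed smoothing factor for the mean
        self.mean = None            # running mean MFCC vector

    def update(self, mfcc, vad_on):
        """Return the MFCC flag for one frame; the mean is frozen while vad_on is True."""
        if self.mean is None:
            # First frame only seeds the mean vector.
            self.mean = list(mfcc)
            return False
        # Assumed deviation measure: Euclidean distance to the mean vector.
        dist = math.sqrt(sum((c - m) ** 2 for c, m in zip(mfcc, self.mean)))
        flag = dist > self.threshold
        if not vad_on:
            self.mean = [self.alpha * m + (1 - self.alpha) * c
                         for c, m in zip(mfcc, self.mean)]
        return flag
```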
Spectrum Feature
- Uses variance for detection
- Measures how much the signal changed from one frame to the next
- The mean is compared to the variance
- Flag is set once a certain threshold is crossed
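The slide leaves the exact computation open, so the following is only one possible reading. It assumes the variance is taken over the frame-to-frame spectral difference, that the "mean" is a running mean of that variance, and that the flag is set when the current variance exceeds the mean by a fixed ratio. `SpectrumFlag`, `ratio`, `alpha`, and `floor` are hypothetical names, and updating the mean only on unflagged frames is also an assumption.

```python
from statistics import pvariance


class SpectrumFlag:
    """Hypothetical spectrum detector based on variance of frame-to-frame change."""

    def __init__(self, ratio=3.0, alpha=0.9, floor=1e-12):
        self.ratio = ratio      # assumed threshold: variance / mean variance
        self.alpha = alpha      # assumed smoothing factor for the mean variance
        self.floor = floor      # avoids a zero threshold on perfectly flat input
        self.prev = None        # previous frame's spectrum
        self.mean_var = None    # running mean of the variance

    def update(self, spectrum):
        """Return the spectrum flag for one frame (a list of bin magnitudes)."""
        if self.prev is None:
            self.prev = list(spectrum)
            return False
        # How much each bin changed since the previous frame.
        diff = [s - p for s, p in zip(spectrum, self.prev)]
        self.prev = list(spectrum)
        var = pvariance(diff)
        if self.mean_var is None:
            self.mean_var = var
            return False
        flag = var > self.ratio * max(self.mean_var, self.floor)
        if not flag:
            # Assumption: only unflagged frames update the mean.
            self.mean_var = self.alpha * self.mean_var + (1 - self.alpha) * var
        return flag
```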
MFCC Enhanced Feature
- Uses a hybrid of the Spectrum and MFCC features
- Two ways to detect: mean of the variance and variance of the variance
- Most sensitive of all the features
- Flag is set ON/OFF based on two different conditions
Logic
- Overall flag is set when all the feature flags are ON
- Overall flag is turned off when any one of the feature flags turns OFF
- Waits a certain number of frames to make sure it is speech
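The decision logic above can be sketched as a small state machine. The name `OverallVad` and the `confirm_frames` value are assumptions; the slides only say that a certain number of frames is waited.

```python
class OverallVad:
    """Hypothetical overall decision: AND of feature flags plus a confirmation wait."""

    def __init__(self, confirm_frames=3):
        self.confirm_frames = confirm_frames  # assumed number of frames to wait
        self.count = 0                        # consecutive frames with all flags ON
        self.on = False                       # overall VAD flag

    def update(self, flags):
        """flags: iterable of booleans, one per feature. Returns the overall flag."""
        if all(flags):
            self.count += 1
            if self.count >= self.confirm_frames:
                self.on = True
        else:
            # Any single feature flag going OFF drops the overall flag at once.
            self.count = 0
            self.on = False
        return self.on
```

With `confirm_frames=3`, the overall flag turns ON at the third consecutive frame in which every feature flag is ON, and a short burst that trips all detectors for only one or two frames is ignored.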
Future
- Needs more refinement
- Test the MFCC and MFCC Enhanced features with different sets of MFCC vector values
- Test with more data
References
- Thomas F. Quatieri, Discrete-Time Speech Signal Processing
Questions?