Acoustic Modeling
Jacob Zurasky
ECE5526 – Spring 2011
Project Goals
- Generate an acoustic model for digit recognition
- Create a MATLAB tool to transcribe the training data used in model generation
- Create a MATLAB tool to generate an acoustic model from the training data and transcriptions
What is an Acoustic Model?
- A statistical representation of the sounds that make up a word
- Built from feature vectors extracted from the training data
Feature Vectors
- Raw speech samples cannot be compared directly to identify their characteristics
- Instead, each window of speech (~10 ms) is compressed into a 39-element MFCC feature vector
MFCCs
- Mel Frequency Cepstral Coefficients
- Model the sound samples in a way similar to how the human ear perceives sound
- The utterance is broken into windows of ~10 ms
- A Hamming window is applied to each section (see the sketch below)
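To make the framing and windowing step concrete, here is a minimal MATLAB sketch, not the project's actual code: it splits a signal into ~10 ms frames and applies a Hamming window computed directly so that no toolbox is needed. The sample rate and the dummy signal are assumptions.

% Minimal sketch (assumed values, not the project's code): split a speech
% signal into ~10 ms frames and apply a Hamming window to each frame.
fs     = 8000;                      % assumed sample rate (Hz)
speech = randn(fs, 1);              % placeholder: 1 s of dummy audio

frame_len = round(0.010 * fs);      % ~10 ms frame
n = (0:frame_len-1)';
w = 0.54 - 0.46 * cos(2*pi*n/(frame_len-1));   % Hamming window coefficients

num_frames = floor(length(speech) / frame_len);
frames = zeros(frame_len, num_frames);
for k = 1:num_frames
    seg = speech((k-1)*frame_len + (1:frame_len));
    frames(:, k) = seg(:) .* w;     % windowed frame, ready for the FFT stage
end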
MFCCs
- FFT to obtain the log frequency spectrum
- Mel filters applied to the log FFT output
- DCT applied to the Mel filter output
- Result is 12 MFCCs
- The energy of the signal is also computed (the full chain is sketched below)
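The remaining stages can be sketched in MATLAB as follows. This is a hedged illustration of a standard MFCC pipeline for a single frame, not the project's exact implementation; the FFT size, the number of Mel filters, and the placeholder frame are assumptions, and the DCT is written out so no toolbox is required.

% Hedged sketch of the MFCC stages for one windowed frame:
% FFT -> Mel filterbank -> log -> DCT -> 12 coefficients, plus frame energy.
fs      = 8000;                          % assumed sample rate (Hz)
frame   = randn(80, 1);                  % placeholder windowed ~10 ms frame
nfft    = 256;                           % assumed FFT size
num_mel = 26;                            % assumed number of Mel filters
num_cep = 12;                            % 12 cepstral coefficients, as on the slide

% Triangular Mel filterbank
mel  = @(f) 2595 * log10(1 + f/700);
imel = @(m) 700 * (10.^(m/2595) - 1);
fpts = imel(linspace(mel(0), mel(fs/2), num_mel + 2));
bins = floor((nfft + 1) * fpts / fs) + 1;
H = zeros(num_mel, floor(nfft/2) + 1);
for m = 1:num_mel
    for k = bins(m):bins(m+1)-1          % rising edge of filter m
        H(m, k) = (k - bins(m)) / (bins(m+1) - bins(m));
    end
    for k = bins(m+1):bins(m+2)-1        % falling edge of filter m
        H(m, k) = (bins(m+2) - k) / (bins(m+2) - bins(m+1));
    end
end

% Per-frame MFCCs
spec  = abs(fft(frame, nfft)).^2;        % power spectrum of the frame
spec  = spec(1:floor(nfft/2) + 1);       % keep the non-negative frequencies
fbank = log(H * spec + eps);             % log Mel filter energies
c = zeros(num_cep, 1);
for i = 1:num_cep                        % DCT-II written out (no toolbox needed)
    c(i) = sum(fbank .* cos(pi * i / num_mel * ((1:num_mel)' - 0.5)));
end
energy   = log(sum(frame.^2) + eps);     % log energy of the frame
mfcc_vec = [c; energy];                  % 12 MFCCs plus energy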
Training Data – 'mark_words.m'
- Obtain recordings containing the words you wish to build an acoustic model for
- Create a transcription file for each utterance (an assumed example layout is shown below)
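The slides do not show the transcription file format, so the sketch below only illustrates one plausible per-utterance layout (label, start sample, end sample); the file name, labels, and sample indices are all hypothetical.

% Hypothetical example of the kind of per-utterance transcription a tool
% like mark_words.m could produce; the actual format is an assumption.
labels = {'one', 'sil', 'three'};        % example segment labels (hypothetical)
starts = [ 800, 5200,  9600];            % segment start samples (hypothetical)
stops  = [5100, 9500, 14400];            % segment end samples (hypothetical)

fid = fopen('utt01.trans', 'w');         % hypothetical file name
for i = 1:numel(labels)
    fprintf(fid, '%s %d %d\n', labels{i}, starts(i), stops(i));
end
fclose(fid);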
Training Data – 'model.m'
- For each part of a word to be modeled, collect all of the associated feature vectors
- Repeat for each training utterance
- Calculate the mean and variance of each dimension of the feature vector for that part of the word (see the sketch below)
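A minimal sketch of the per-dimension statistics described above, assuming the collected feature vectors are stacked as rows of a matrix; the placeholder data and the variance floor are assumptions, not part of the project.

% Gather all 39-dimensional feature vectors assigned to one part (state)
% of a word, then take per-dimension means and variances.
vectors = randn(200, 39);                % placeholder training features (rows = frames)

mu     = mean(vectors, 1);               % 1x39 mean per dimension
sigma2 = var(vectors, 0, 1);             % 1x39 variance per dimension
sigma2 = max(sigma2, 1e-6);              % floor variances to avoid zeros (assumed safeguard)

% The acoustic model for this word part is the pair (mu, sigma2); the full
% model is one such pair for every part of every digit.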
Classification – 'evaluate.m'
- Implements classification of sounds
- Compares the current feature vector to the acoustic model
- Determines whether the probability is high enough to transition between states (illustrated below)
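One way to realize this comparison is a diagonal-Gaussian log-likelihood score for the current and next word parts, transitioning when the next part scores higher. The scoring rule, the transition criterion, and the placeholder model values below are assumptions, not necessarily what evaluate.m does.

% Hedged sketch: score the current feature vector against the model of the
% current word part and of the next part, then decide whether to transition.
x       = randn(1, 39);                              % current feature vector (placeholder)
mu_cur  = zeros(1, 39);    s2_cur  = ones(1, 39);    % current-state model (placeholder)
mu_next = 0.5*ones(1, 39); s2_next = ones(1, 39);    % next-state model (placeholder)

% Log-likelihood under a diagonal Gaussian
loglik = @(v, mu, s2) -0.5 * sum(log(2*pi*s2) + ((v - mu).^2) ./ s2);

score_cur  = loglik(x, mu_cur,  s2_cur);
score_next = loglik(x, mu_next, s2_next);

if score_next > score_cur                            % assumed transition criterion
    fprintf('transition to the next part of the word\n');
else
    fprintf('stay in the current part\n');
end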
Issues Encountered
- Small training data set
- Some digits need more than two sections
- Time constraints limited how many transcriptions could be made
- Models for 1, 3, 5, and 7 are fairly reliable
- Models for 2, 4, and 6 need refinement
Future Goals
- Implement a true HMM system for classification
- Add automated training and re-evaluation of the HMM
- Apply these techniques to the analysis of breathing sounds for sleep disorder recognition