Audio classification: discriminating speech, music and environmental audio
Rajas A. Sambhare, ECE 539
Objective
Discriminate between speech, music and environmental audio (special effects) using short 3-second samples.
Extract a relevant set of feature vectors from the audio samples.
Develop a pattern classifier that can successfully discriminate the three classes based on the extracted vectors.
Feature extraction
[Figure: frequency centroid and bandwidth]
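The slide does not preserve the formulas for the two spectral features, so the following is only a minimal sketch under a common assumption: the frequency centroid is taken as the power-weighted mean frequency of a frame, and the bandwidth as the power-weighted standard deviation around that centroid. The function name `centroid_and_bandwidth` and the 1 kHz test tone are illustrative, not part of the original work.

```python
import numpy as np

def centroid_and_bandwidth(frame, sample_rate):
    """Return (centroid_hz, bandwidth_hz) for one frame.

    Assumed definitions: power-weighted mean frequency and
    power-weighted standard deviation around that mean.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        # A completely silent frame has no meaningful centroid.
        return 0.0, 0.0
    centroid = (freqs * power).sum() / total
    bandwidth = np.sqrt((((freqs - centroid) ** 2) * power).sum() / total)
    return float(centroid), float(bandwidth)

# Usage: a Hanning-windowed 1 kHz tone should have its centroid close to
# 1 kHz and a bandwidth that is narrow compared with the 11 kHz Nyquist range.
sample_rate = 22050
t = np.arange(512) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t) * np.hanning(512)
fc, bw = centroid_and_bandwidth(tone, sample_rate)
```

A narrowband signal such as a sustained note gives a small bandwidth, while noise-like environmental sounds spread power widely and give a large one, which is why the pair is useful for discrimination.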
Feature extraction
1. Start from a 3-second audio sample (22050 Hz).
2. Split it into 512-sample frames (about 23.2 ms each, 25% overlap, Hanning window).
3. Take a 512-point FFT of each frame.
4. Extract the centroid, the energy in 22 critical bands, and the bandwidth.
5. Calculate the log power ratio in each band.
6. Calculate the mean and SD of the centroid, log power ratios and bandwidth across all frames.
7. Calculate the silence ratio (SR).
8. Concatenate the mean and SD of the centroid, log power ratios and bandwidth with the silence ratio.
9. Save the 49-dimension feature vector (2 x (1 centroid + 22 log power ratios + 1 bandwidth) + 1 silence ratio = 49).
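The pipeline above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the original implementation. The slide does not give the band edges, the exact form of the log power ratio, or the silence criterion, so the sketch assumes the first 22 critical bands of the Bark scale (Zwicker's edges), a ratio of band energy to total frame energy, and a silence threshold of 10% of the mean frame energy (`silence_fraction` is a hypothetical parameter).

```python
import numpy as np

SAMPLE_RATE = 22050
FRAME_LEN = 512
HOP = FRAME_LEN - FRAME_LEN // 4  # 25% overlap gives a 384-sample hop
# Assumed band edges in Hz: first 22 critical bands of the Bark scale.
BAND_EDGES = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                       1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                       4400, 5300, 6400, 7700, 9500], dtype=float)
EPS = 1e-12  # guards against division by zero and log(0)

def extract_features(signal, silence_fraction=0.1):
    """Return a 49-dimension feature vector for one audio clip."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = signal[idx]  # shape (n_frames, 512)

    # Hanning window, then power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames * np.hanning(FRAME_LEN), axis=1)) ** 2
    freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / SAMPLE_RATE)
    total = power.sum(axis=1) + EPS

    centroid = (power * freqs).sum(axis=1) / total
    spread = (freqs[None, :] - centroid[:, None]) ** 2
    bandwidth = np.sqrt((power * spread).sum(axis=1) / total)

    # Energy in each of the 22 bands, then log ratio to total frame energy.
    band_energy = np.stack(
        [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
         for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:])], axis=1)
    log_ratio = np.log10((band_energy + EPS) / total[:, None])

    # Silence ratio: share of frames well below the clip's mean energy.
    frame_energy = (frames ** 2).mean(axis=1)
    silence_ratio = np.mean(frame_energy < silence_fraction * frame_energy.mean())

    per_frame = np.column_stack([centroid, log_ratio, bandwidth])  # 24 columns
    return np.concatenate(
        [per_frame.mean(axis=0), per_frame.std(axis=0), [silence_ratio]])

# Usage on a synthetic clip: 1 s of silence followed by 2 s of white noise.
rng = np.random.default_rng(0)
clip = rng.standard_normal(3 * SAMPLE_RATE)
clip[:SAMPLE_RATE] = 0.0
features = extract_features(clip)
print(features.shape)  # (49,)
```

The vector holds 24 means, then 24 standard deviations, then the silence ratio. With a 384-sample hop, a 3-second clip yields 171 frames.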
Neural network development
Create a database of 135 training and 45 testing samples.
Develop the neural network using MATLAB.
Dynamically partition the training samples, using 25% for tuning.
Decide on the network architecture (number of hidden layers and neurons).
Decide on network parameters like [...] and [...].
Attempt classification using various combinations of feature vectors.
Feedforward multi-layer perceptron with back-propagation training.
Designed network: [...]
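The original network was built in MATLAB and its final architecture is not preserved on the slide. The following is therefore only a rough Python/NumPy sketch of the same idea: hold out 25% of the training samples for tuning, then train a one-hidden-layer perceptron by back-propagation. The hidden-layer size, learning rate, momentum, epoch count and softmax output are all assumptions chosen for illustration, and the data is synthetic placeholder data, not the audio database.

```python
import numpy as np

def split_train_tune(X, y, tune_fraction=0.25, rng=None):
    """Randomly hold out a fraction of the training set for tuning."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(X))
    n_tune = int(round(tune_fraction * len(X)))
    tune, train = order[:n_tune], order[n_tune:]
    return X[train], y[train], X[tune], y[tune]

def train_mlp(X, y, n_hidden=10, n_classes=3, lr=0.05, momentum=0.8,
              epochs=300, rng=None):
    """Batch back-propagation with momentum (all hyperparameters assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    targets = np.eye(n_classes)[y]  # one-hot class labels
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output.
        hidden = np.tanh(X @ W1 + b1)
        logits = hidden @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Backward pass: gradients of the mean cross-entropy loss.
        d_logits = (probs - targets) / len(X)
        d_hidden = (d_logits @ W2.T) * (1.0 - hidden ** 2)
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0),
                 hidden.T @ d_logits, d_logits.sum(axis=0)]
        # In-place momentum update of every parameter array.
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v
    return params

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Usage with synthetic stand-in data: 135 samples, 49 features, 3 classes.
rng = np.random.default_rng(0)
y_all = np.repeat(np.arange(3), 45)
class_means = rng.normal(0.0, 1.0, (3, 49))
X_all = class_means[y_all] + 0.5 * rng.standard_normal((135, 49))
X_train, y_train, X_tune, y_tune = split_train_tune(X_all, y_all, rng=rng)
params = train_mlp(X_train, y_train, rng=rng)
tune_accuracy = np.mean(predict(params, X_tune) == y_tune)
```

The tuning subset plays the role described on the slide: candidate architectures and parameter settings are compared by their accuracy on it, and the 45 testing samples are kept aside for the final classification rate. Real feature vectors would normally be standardized before training, since the centroid and bandwidth are in Hz while the silence ratio lies between 0 and 1.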
Results
Classification rate of 82.37% using critical sub-band ratios, frequency centroid, bandwidth and silence ratio.
Classification rate of 79.78% using only critical sub-band ratios.
Classification rate of 84.44% using only frequency centroid, bandwidth and silence ratio, but with extremely slow training and variable results (2.34% std. dev. in classification rate).
Baseline study: Zhang and Kuo [1] reported a classification rate of ~90% using a rule-based heuristic. However, better results are expected for the present classifier on increasing the database size.
References:
[1] Tong Zhang, C.-C. Jay Kuo, "Hierarchical System for Content-based Audio Classification and Retrieval," Proc. SPIE Vol. 3527, Multimedia Storage and Archiving Systems III, 1998.