Audio classification: Discriminating speech, music and environmental audio
Rajas A. Sambhare, ECE 539

2 Objective
- Discriminate between speech, music and environmental audio (special effects) using short 3-second samples
- Extract a relevant set of feature vectors from the audio samples
- Develop a pattern classifier that can successfully discriminate the three classes based on the extracted vectors

3 Feature extraction
[Figure: frequency axis with the centroid and bandwidth marked]
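The slide names the frequency centroid and bandwidth without defining them. A widely used definition, assumed here rather than taken from the slides, treats each frame's power spectrum |X(k)|^2 as a weighting over the bin frequencies f_k = k f_s / N, with f_s = 22050 Hz and N = 512 (so bins are about 43.07 Hz apart). The symbols FC and BW are illustrative:

```latex
\mathrm{FC} = \frac{\sum_{k=0}^{N/2} f_k \, |X(k)|^2}{\sum_{k=0}^{N/2} |X(k)|^2},
\qquad
\mathrm{BW} = \sqrt{\frac{\sum_{k=0}^{N/2} \left(f_k - \mathrm{FC}\right)^2 |X(k)|^2}{\sum_{k=0}^{N/2} |X(k)|^2}}
```

Under this reading the centroid is the power-weighted mean frequency of the frame and the bandwidth is the power-weighted spread around it.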

4 Feature extraction
1. Start from a 3-second audio sample (22050 Hz).
2. Split it into 512-sample frames (23.22 ms each) with 25% overlap and a Hanning window.
3. Take a 512-point FFT of each frame.
4. Extract the frequency centroid, the energy in 22 critical bands, and the bandwidth.
5. Calculate the log power ratio in each band.
6. Calculate the mean and SD of the centroid, log power ratios and bandwidth across all frames.
7. Calculate the silence ratio (SR).
8. Concatenate the means and SDs of the centroid, log power ratios and bandwidth with the silence ratio.
9. Save the resulting 49-dimension feature vector (2 × (1 + 22 + 1) + 1 = 49 values).
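The pipeline above can be sketched in NumPy as follows. This is a minimal illustration, not the author's MATLAB code, and several details the slide leaves open are assumptions: the critical-band edges (based on Zwicker's Bark bands below 9.5 kHz, which give exactly 22 bands under the 11025 Hz Nyquist limit), the log power ratio taken as log10 of band energy over total frame energy, the silence ratio taken as the fraction of frames whose RMS falls below a hypothetical threshold `SILENCE_RMS`, and the order in which the statistics are concatenated. With 25% overlap the hop is 384 samples, so a 3-second clip (66150 samples) yields 171 frames.

```python
import numpy as np

FS = 22050     # sampling rate from the slide
FRAME = 512    # frame length and FFT size from the slide
HOP = 384      # 25% overlap -> hop of 0.75 * 512 samples

# Assumption: critical-band edges from Zwicker's Bark scale (the slides do not list them).
# 23 edges give the 22 bands that fit below the 11025 Hz Nyquist frequency.
BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
              1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500]

SILENCE_RMS = 0.01  # hypothetical silence threshold on frame RMS (signal scaled to [-1, 1])
EPS = 1e-12         # guards against log(0) and division by zero


def frame_signal(x):
    """Split x into overlapping 512-sample frames with a 384-sample hop."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n_frames)[:, None]
    return x[idx]


def extract_features(x):
    """Return a 49-dimensional feature vector for one audio clip."""
    frames = frame_signal(np.asarray(x, dtype=float))
    windowed = frames * np.hanning(FRAME)
    power = np.abs(np.fft.rfft(windowed, n=FRAME, axis=1)) ** 2  # shape (frames, 257)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)                   # bin frequencies in Hz

    # Power-weighted mean frequency (centroid) and spread around it (bandwidth).
    total = power.sum(axis=1) + EPS
    centroid = (power * freqs).sum(axis=1) / total
    spread = (power * (freqs[None, :] - centroid[:, None]) ** 2).sum(axis=1)
    bandwidth = np.sqrt(spread / total)

    # Assumption: log power ratio = log10(band energy / total frame energy).
    band_energy = np.stack(
        [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
         for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:])], axis=1)
    log_ratios = np.log10(band_energy / total[:, None] + EPS)    # shape (frames, 22)

    # Assumption: silence ratio = fraction of frames whose RMS is below the threshold.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    silence_ratio = np.mean(rms < SILENCE_RMS)

    # 24 per-frame features -> 24 means + 24 SDs + 1 silence ratio = 49 values.
    per_frame = np.column_stack([centroid, log_ratios, bandwidth])
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0), [silence_ratio]])


rng = np.random.default_rng(0)
clip = 0.1 * rng.standard_normal(3 * FS)  # 3 s of synthetic noise
print(frame_signal(clip).shape)           # (171, 512)
print(extract_features(clip).shape)       # (49,)
```

Computing every per-frame quantity as a column of one matrix keeps step 6 of the pipeline to a single mean and a single SD call.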

5 Neural network development
- Create a database of 135 training and 45 testing samples
- Develop the neural network using MATLAB
- Dynamically partition the training samples, using 25% for tuning
- Decide on the network architecture (number of hidden layers and neurons)
- Decide on the back-propagation training parameters
- Attempt classification using various combinations of feature vectors
- Network type: feedforward multi-layer perceptron with back-propagation training
[Figure: designed network, 49-20-3 (49 inputs, 20 hidden neurons, 3 outputs)]

6 Results
- Classification rate of 82.37% using critical sub-band ratios, frequency centroid, bandwidth and silence ratio
- Classification rate of 79.78% using only critical sub-band ratios
- Classification rate of 84.44% using only frequency centroid, bandwidth and silence ratio, but with extremely slow training and variable results (2.34% std. dev. in classification rate)
- Baseline: in a study by Zhang and Kuo [1], a classification rate of ~90% was reported using a rule-based heuristic
- However, better results are expected as the database size increases

References:
[1] T. Zhang and C.-C. J. Kuo, "Hierarchical System for Content-based Audio Classification and Retrieval," Proc. SPIE, vol. 3527, Multimedia Storage and Archiving Systems III, pp. 398-409, 1998.