Automatic Speech Processing Methods For Bioacoustic Signal Analysis: A Case Study Of Cross-Disciplinary Acoustic Research
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, University of Florida, Gainesville, FL

ABSTRACT

Automatic speech processing research has produced many advances in the analysis of time series. Knowledge of the production and perception of speech has guided the design of many useful algorithms, and automatic speech recognition has been at the forefront of the machine learning paradigm. In contrast to the advances made in automatic speech processing, analysis of other bioacoustic signals, such as those from dolphins and bats, has lagged behind. In this paper, we demonstrate how techniques from automatic speech processing can significantly impact bioacoustic analysis, using echolocating bats as our model animal. Compared to conventional techniques, machine learning methods reduced detection and species classification error rates by an order of magnitude. Furthermore, the signal-to-noise ratio of an audible monitoring signal was improved by 12 dB using techniques from noise-robust feature extraction and speech synthesis. The work demonstrates the impact that speech research can have across disciplines.

BIBLIOGRAPHY

[1] M. K. Obrist, “Flexible bat echolocation: the influence of individual, habitat and conspecifics on sonar signal design,” Behav. Ecol. Sociobiol., vol. 36, pp. 207-219, 1995.
[2] S. Parsons and G. Jones, “Acoustic identification of twelve species of echolocating bat by discriminant function analysis and artificial neural networks,” J. Exp. Biol., vol. 203, pp. 2641-2656, 2000.
[3] M. D. Skowronski and J. G. Harris, “Acoustic detection and classification of microchiroptera using machine learning: lessons learned from automatic speech recognition,” J. Acoust. Soc. Am., 2005, submitted.
[4] M. B. Fenton and G. P. Bell, “Recognition of species of insectivorous bats by their echolocation calls,” J. Mammal., vol. 62, no. 2, pp. 233-243, May 1981.
[5] M. J. O'Farrell, B. W. Miller, and W. L. Gannon, “Qualitative identification of free-flying bats using the Anabat detector,” J. Mammal., vol. 80, no. 1, pp. 11-23, Jan. 1999.
[6] M. K. Obrist, “Flexible bat echolocation: the influence of individual, habitat and conspecifics on sonar signal design,” Behav. Ecol. Sociobiol., vol. 36, pp. 207-219, 1995.
[7] M. K. Obrist, R. Boesch, and P. F. Fluckiger, “Variability in echolocation call design of 26 Swiss bat species: consequences, limits and options for automated field identification with a synergetic pattern recognition approach,” Mammalia, vol. 68, no. 4, pp. 307-322, Dec. 2004.
[8] R. F. Lance, B. Bollich, C. L. Callahan, and P. L. Leberg, “Surveying forest-bat communities with Anabat detectors,” in Bats and Forests Symposium, R. M. R. Barclay and R. M. Brigham, eds., Res. Br., B.C. Min. For., Victoria, B.C., CA, pp. 175-184, 1996.
[9] D. Russo and G. Jones, “Identification of twenty-two bat species (Mammalia: Chiroptera) from Italy by analysis of time-expanded recordings of echolocation calls,” J. Zool., Lond., vol. 258, no. 1, pp. 91-103, Sept. 2002.
[10] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, eds., Kaufmann, San Mateo, CA, pp. 267-296, 1990.
[11] S. Parsons, A. M. Boonman, and M. K. Obrist, “Advantages and disadvantages of techniques for transforming and analyzing chiropteran echolocation calls,” J. Mammal., vol. 81, no. 4, pp. 927-938, Nov. 2000.
DETECTION

Conventional method [1,2]: frame-energy thresholding.
  x_k(n) - frame k of the raw signal x(n)
  L - frame length (~1 ms)
  E(k) - energy in frame k: E(k) = Σ_{n=0..L-1} x_k(n)²
  θ - energy threshold
  d(k) - detection decision for frame k: d(k) = 1 if E(k) > θ, else 0

Gaussian mixture model (GMM) method [3]:
  x_k - input feature vector for frame k: spectral peak amplitude, frequency at peak amplitude, and first- and second-order temporal derivatives
  ω_i - signal class: i = 1 for background frames, i = 2 for call frames
  p(x_k|ω_i) - class-conditional probability density of x_k given class ω_i: p(x_k|ω_i) = Σ_m w_{i,m} · G(x_k; μ_{i,m}, Σ_{i,m})
  G - Gaussian kernel with mean vector μ and covariance matrix Σ, estimated from hand-labeled data
  w_{i,m}, μ_{i,m}, Σ_{i,m} - mixture weight, mean, and covariance of the m-th kernel for class ω_i
  θ - likelihood threshold
  d(k) - detection decision for frame k: frame k is labeled a call when the log likelihood ratio log p(x_k|ω_2) − log p(x_k|ω_1) exceeds θ

Detection results (105,090 detection blocks of 20 ms; confusion matrices taken at equal sensitivity and specificity):

  Detector         | Hits  | False alarms | Misses | Correct rejections | Sensitivity
  GMM              |  4314 |        3985  |   176  |             96615  |   0.96
  Peak energy      |  4132 |        7819  |   358  |             92781  |   0.92
  Broadband energy |  3047 |       31891  |  1443  |             68709  |   0.68

[Figure: detector output examples for Pipistrellus bodenheimeri, Tadarida brasiliensis, Lasiurus borealis, and Molossus molossus. Each gray column is a hand-labeled call from a pass of 25 calls from L. borealis; the black horizontal line marks θ at equal sensitivity and specificity.]

CLASSIFICATION

Features [2,4-8]: minimum frequency, maximum frequency, frequency at peak amplitude, and duration, extracted from hand-labeled calls using noise-robust methods [3].

Conventional classifier [2,7-9]: discriminant function analysis (DFA) with stratified covariance matrices (quadratic).

Gaussian mixture model (GMM) classifier: same as the GMM detector, except that the classes ω_i represent individual species. The log likelihood, averaged over all K frames of a call, is computed for each class, and the classifier output is the label of the class with the maximum averaged log likelihood.

Hidden Markov model (HMM) classifier [10]: a state model of a nonstationary signal in which each state represents a pseudo-stationary probability density function modeled with a GMM. One model per species was trained with the Baum-Welch algorithm on hand-labeled calls. Testing used the Viterbi dynamic programming algorithm, which computes the log likelihood of the single most likely state sequence through a model.

Classification confusion matrices (rows: true species; columns: classifier output; entries in %, average ± std. dev. over 20 trials of randomly selected test and train calls, 50% test / 50% train):

  HMM |     Pb     |     Mm     |     Lb     |     Lc     |     Tb     |   Total
  Pb  | 99.8 ± 0.3 |   0 ± 0    |   0 ± 0    |   0 ± 0    | 0.2 ± 0.3  | 99.8 ± 0.3
  Mm  | 0.03 ± 0.2 | 95.6 ± 1.6 |   0 ± 0    |   0 ± 0    | 4.3 ± 1.6  | 95.6 ± 1.6
  Lb  |   0 ± 0    |   0 ± 0    | 99.8 ± 0.1 | 0.2 ± 0.1  |   0 ± 0    | 99.8 ± 0.1
  Lc  |   0 ± 0    |   0 ± 0    | 0.2 ± 0.2  | 99.8 ± 0.2 |   0 ± 0    | 99.8 ± 0.2
  Tb  |   0 ± 0    | 0.2 ± 0.2  |   0 ± 0    |   0 ± 0    | 99.8 ± 0.2 | 99.8 ± 0.2
  Overall: 99.4 ± 0.2

  DFA |     Pb     |     Mm     |     Lb     |     Lc     |     Tb     |   Total
  Pb  | 97.1 ± 0.8 | 0.2 ± 0.2  | 2.7 ± 0.8  |   0 ± 0    |   0 ± 0    | 97.1 ± 0.8
  Mm  | 0.6 ± 0.5  | 76.7 ± 3   | 4.1 ± 2    | 17.3 ± 3   | 1.3 ± 0.6  | 76.7 ± 3
  Lb  | 1.2 ± 0.4  | 16.9 ± 1.5 | 79.6 ± 1.3 | 0.3 ± 0.3  | 2.1 ± 0.5  | 79.6 ± 1.3
  Lc  |   0 ± 0    | 1.1 ± 0.9  | 0.3 ± 0.5  | 89.7 ± 1.4 | 8.8 ± 0.9  | 89.7 ± 1.4
  Tb  |   0 ± 0    | 6.6 ± 1.5  | 5.4 ± 1.4  | 16.5 ± 3   | 71.4 ± 3   | 71.4 ± 3
  Overall: 83.1 ± 1.1

The GMM and HMM classification results were statistically indistinguishable (t-test, p > 0.9).

SYNTHESIS

Conventional methods: frequency division, time expansion, zero crossings, heterodyne [11].

Synthetic method: for each frame i, noise-robust features [amplitude(i), frequency(i)] are extracted, with spectral mean subtraction applied for noise robustness, and used to synthesize an audible monitoring signal.
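The GMM detector in the DETECTION section above labels each analysis frame as call or background by comparing class-conditional GMM likelihoods against the threshold θ. The following is a minimal sketch of that idea, not the authors' implementation: it assumes per-frame features are already extracted as NumPy arrays, uses scikit-learn's GaussianMixture in place of the hand-trained kernels, and the function names and 4-dimensional placeholder data are hypothetical.

# Minimal sketch of GMM-based call/background detection, assuming per-frame
# features (e.g., peak amplitude, peak frequency, and temporal derivatives)
# are already available as NumPy arrays. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_detector(background_frames, call_frames, n_kernels=4):
    """Fit one GMM per class (omega_1 = background, omega_2 = call)."""
    gmm_bg = GaussianMixture(n_components=n_kernels, covariance_type="full")
    gmm_call = GaussianMixture(n_components=n_kernels, covariance_type="full")
    gmm_bg.fit(background_frames)   # hand-labeled background frames, shape (N1, D)
    gmm_call.fit(call_frames)       # hand-labeled call frames, shape (N2, D)
    return gmm_bg, gmm_call

def detect_frames(frames, gmm_bg, gmm_call, theta=0.0):
    """d(k) = 1 when log p(x_k|omega_2) - log p(x_k|omega_1) > theta."""
    llr = gmm_call.score_samples(frames) - gmm_bg.score_samples(frames)
    return (llr > theta).astype(int)

# Usage with random placeholder data (4 features per frame):
rng = np.random.default_rng(0)
bg = rng.normal(0.0, 1.0, size=(500, 4))
calls = rng.normal(3.0, 1.0, size=(200, 4))
gmm_bg, gmm_call = train_detector(bg, calls)
print(detect_frames(rng.normal(3.0, 1.0, size=(10, 4)), gmm_bg, gmm_call))

Sweeping theta trades sensitivity against specificity, which is how the detectors above were compared at the equal-error operating point.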
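The GMM species classifier in the CLASSIFICATION section scores a call by the log likelihood averaged over its K frames under one mixture model per species. A minimal sketch under assumed data shapes follows; the function names and the use of scikit-learn are illustrative, not the authors' code.

# Sketch of per-species GMM classification by averaged frame log likelihood.
# Assumes each call is a (K, D) array of noise-robust frame features.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_species_models(calls_by_species, n_kernels=4):
    """calls_by_species: dict mapping species label -> list of (K, D) call arrays."""
    models = {}
    for species, calls in calls_by_species.items():
        frames = np.vstack(calls)   # pool all frames from that species' training calls
        models[species] = GaussianMixture(n_components=n_kernels,
                                          covariance_type="full").fit(frames)
    return models

def classify_call(call_frames, models):
    """Return the species whose GMM gives the highest averaged log likelihood."""
    scores = {sp: gmm.score_samples(call_frames).mean() for sp, gmm in models.items()}
    return max(scores, key=scores.get)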
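The HMM classifier adds temporal structure: each species model is trained with Baum-Welch and scored at test time by the Viterbi log likelihood of the best state sequence. The sketch below assumes the third-party hmmlearn package, whose GMMHMM gives each state a GMM emission density; data shapes, hyperparameters, and function names are illustrative assumptions, not the authors' configuration.

# Sketch of per-species HMM classification (Baum-Welch training, Viterbi scoring),
# assuming the hmmlearn package is available.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_species_hmms(calls_by_species, n_states=3, n_mix=2):
    """calls_by_species: dict mapping species label -> list of (K, D) call arrays."""
    models = {}
    for species, calls in calls_by_species.items():
        X = np.vstack(calls)                  # concatenated frames of all training calls
        lengths = [len(c) for c in calls]     # frames per call, so fit() knows boundaries
        hmm = GMMHMM(n_components=n_states, n_mix=n_mix,
                     covariance_type="diag", n_iter=25)
        models[species] = hmm.fit(X, lengths)  # Baum-Welch (EM) training
    return models

def classify_call(call_frames, models):
    """Label = species whose model gives the best Viterbi-path log likelihood."""
    scores = {}
    for species, hmm in models.items():
        logprob, _ = hmm.decode(call_frames, algorithm="viterbi")
        scores[species] = logprob
    return max(scores, key=scores.get)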
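The SYNTHESIS section's equations are not reproduced in this transcription, so the sketch below is one plausible reading rather than the authors' exact method: per-frame [amplitude(i), frequency(i)] tracks drive a phase-continuous sinusoidal oscillator, with an assumed frequency-division factor to bring the ultrasonic call into the audible band, and spectral mean subtraction shown as the generic log-spectral normalization it names.

# Sketch of resynthesizing an audible monitoring signal from per-frame
# [amplitude(i), frequency(i)] tracks. Illustrative assumptions throughout.
import numpy as np

def spectral_mean_subtraction(log_spectrogram):
    """Subtract the per-bin mean (over frames) from a log-magnitude spectrogram."""
    return log_spectrogram - log_spectrogram.mean(axis=0, keepdims=True)

def synthesize(amplitude, frequency, fs=44100, frame_len=0.001, divide=10.0):
    """amplitude, frequency: per-frame feature tracks (frequency in Hz)."""
    samples_per_frame = int(round(frame_len * fs))
    phase = 0.0
    out = []
    for a, f in zip(amplitude, frequency):
        f_audible = f / divide                     # assumed division into the audible band
        for _ in range(samples_per_frame):
            phase += 2.0 * np.pi * f_audible / fs  # accumulate phase for continuity
            out.append(a * np.sin(phase))
    return np.asarray(out)

# Usage with a synthetic downward-sweeping "call" (40 kHz -> 20 kHz over five 1 ms frames):
freq_track = np.linspace(40000.0, 20000.0, 5)
amp_track = np.ones_like(freq_track)
y = synthesize(amp_track, freq_track)
print(y.shape)   # (220,) at fs = 44100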