Published by Lawrence Hampton. Modified over 9 years ago.
1
Statistical automatic identification of microchiroptera from echolocation calls
Lessons learned from human automatic speech recognition
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida
Gainesville, FL, USA
November 19, 2004
2
Overview
  Motivations for bat acoustic research
  Review bat call classification methods
  Contrast with 1970s human ASR
  Experiments
  Conclusions
3
Bat research motivations
  Bats are among:
    the most diverse,
    the most endangered,
    and the least studied mammals.
  Close relationship with insects
    agricultural impact
    disease vectors
  Acoustical research
    non-invasive, significant domain (echolocation)
  Simplified biological acoustic communication system (compared to human speech)
4
Echolocation calls
  Features (holistic)
    Frequency extrema
    Duration
    Shape
    Number of harmonics
    Call interval
  [Figure: Mexican free-tailed calls, concatenated]
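The holistic features listed on this slide can be computed once a call's frequency contour is available. The following is a minimal sketch, assuming a call is stored as a list of (time in ms, frequency in kHz) points; the function names, the data layout, the mean-slope stand-in for "shape", and the example values are all hypothetical and do not come from the slides.

```python
def holistic_features(contour):
    """Summarize one call's frequency contour with holistic features.

    contour: list of (time_ms, freq_khz) pairs sorted by time (assumed format).
    """
    times = [t for t, _ in contour]
    freqs = [f for _, f in contour]
    duration = times[-1] - times[0]
    return {
        "duration_ms": duration,
        "f_min_khz": min(freqs),
        "f_max_khz": max(freqs),
        # Mean slope is a crude stand-in for "shape" (assumption).
        "mean_slope_khz_per_ms": (freqs[-1] - freqs[0]) / duration,
    }


def call_intervals(onsets_ms):
    """Time between successive call onsets."""
    return [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]


# Hypothetical downward-sweeping call; values chosen for illustration only.
call = [(0.0, 40.0), (2.0, 32.0), (4.0, 28.0), (6.0, 26.0), (8.0, 25.0)]
features = holistic_features(call)
intervals = call_intervals([0.0, 100.0, 250.0])
```

Each call collapses to a handful of numbers, which is what makes these features easy to feed to a decision tree or discriminant function but also discards the frame-by-frame detail of the call.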
5
Current classification methods
  Expert spectrogram readers
    Manual or automatic feature extraction
    Comparison with exemplar spectrograms
  Automatic classification
    Decision trees
    Discriminant function analysis
  Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).
6
Acoustic phonetics
  [Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]
  Bottom up paradigm
    Frames, boundaries, groups, phonemes, words
  Manual or automatic feature extraction
    Determined by experts to be important for speech
  Classification
    Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path
7
Acoustic phonetics limitations
  Variability of conversational speech
  Complex rules, difficult to implement
  Feature estimates brittle
  Variable noise robustness
  Hard decisions, errors accumulate
  Human ASR shifted to the information-theoretic (machine learning) paradigm, which is better able to account for the variability of speech and noise.
8
Information theoretic ASR
  Data-driven models from computer science
    Non-parametric: dynamic time warp (DTW)
    Parametric: hidden Markov model (HMM)
  Frame-based
    Expert information in feature extraction
    Models account for feature, temporal variability
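To make the non-parametric option concrete, here is a minimal sketch of dynamic time warping used as a nearest-template classifier. It assumes one-dimensional frame features (a frequency contour in kHz), absolute difference as the local cost, and the basic three-way step pattern; the template labels and values are hypothetical, and the slides do not specify which DTW variant the authors used.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Local cost is the absolute difference; allowed steps are
    diagonal, vertical, and horizontal.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # both sequences advance
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]


# Hypothetical frequency contours in kHz, for illustration only.
templates = [
    ("sweep", [40.0, 35.0, 30.0, 25.0]),
    ("flat", [25.0, 25.0, 25.0, 25.0]),
]
query = [40.0, 40.0, 35.0, 30.0, 30.0, 25.0]  # time-stretched sweep
label = classify(query, templates)
```

The stretched query aligns with the sweep template at zero cost, which illustrates how the warping absorbs temporal variability. Because every stored template is compared against the query, classification cost grows with the number of templates.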
9
Data collection
  UF Bat House, home to 60,000 bats
    Mexican free-tailed bat (vast majority)
    Evening bat
    Southeastern myotis
  Continuous recording
    90 minutes around sunset
    ~20,000 calls
  Equipment:
    B&K mic (4939), 100 kHz
    B&K preamp (2670)
    Custom amp/AA filter
    NI 6036E 200 kS/s A/D card
    Laptop, Matlab
10
Experiment design
  Hand labels
    436 calls (2% of data)
    Four classes, a priori: 34, 40, 20, 6%
  All experiments on hand-labeled data only
  No hand-labeled calls excluded from experiments
11
Experiments
  Baseline
    Features
      Zero crossing
      MUSIC super resolution frequency estimator
    Classifier
      Discriminant function analysis, quadratic boundaries
  DTW and HMM
    Features
      Frequency (MUSIC), log energy, first derivatives (HMM only)
    HMM
      5 states/model
      4 Gaussian mixtures/state
      diagonal covariances
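As an illustration of the zero-crossing feature used in the baseline, the sketch below estimates the dominant frequency of a short frame from the spacing of its upward zero crossings. It is a generic estimator under stated assumptions, not the authors' implementation: the function name, the linear interpolation step, and the 40 kHz test tone are hypothetical, while the 200 kS/s rate matches the A/D card listed on the data collection slide.

```python
import math


def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate the dominant frequency from upward zero-crossing spacing."""
    crossings = []
    for i in range(1, len(samples)):
        prev, curr = samples[i - 1], samples[i]
        if prev < 0.0 <= curr:  # upward crossing between samples i-1 and i
            # Linear interpolation gives a sub-sample crossing time.
            frac = -prev / (curr - prev)
            crossings.append((i - 1 + frac) / sample_rate_hz)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    # Consecutive upward crossings are one period apart.
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / mean_period


fs = 200_000  # 200 kS/s sampling rate
tone_hz = 40_000.0  # hypothetical test tone
signal = [math.sin(2 * math.pi * tone_hz * n / fs + 0.3) for n in range(400)]
estimate = zero_crossing_frequency(signal, fs)
```

On a clean tone the estimate is essentially exact, but a single noise-induced crossing shifts the measured period. That sensitivity is one reason to prefer a subspace estimator such as MUSIC when robustness matters more than speed.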
12
Results
  Baseline, zero crossing
    Leave one out: 72.5% correct
    Repeated trials: 72.5 ± 4% (mean ± std)
  Baseline, MUSIC
    Leave one out: 79.1%
    Repeated trials: 77.5 ± 4%
  DTW, MUSIC
    Leave one out: %
    Repeated trials: 74.1 ± 4%
  HMM, MUSIC
    Test on train: %
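The leave-one-out figures come from holding out each labeled call once, training on the rest, and scoring the held-out prediction. A minimal sketch of that procedure follows, assuming a one-dimensional feature and a nearest-class-mean classifier as a stand-in; the classifier, the data, and the labels are hypothetical and much simpler than the classifiers compared on the slides.

```python
def nearest_mean_predict(train, x):
    """Predict the label whose training mean is closest to x (1-D feature)."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda lab: abs(x - sums[lab] / counts[lab]))


def leave_one_out_accuracy(data, predict):
    """Hold out each example once, train on the rest, score the prediction."""
    correct = 0
    for i, (value, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # everything except example i
        if predict(train, value) == label:
            correct += 1
    return correct / len(data)


# Hypothetical 1-D features (e.g. minimum frequency in kHz) with class labels.
data = [
    (24.0, "A"), (25.0, "A"), (26.0, "A"), (31.0, "A"),
    (34.0, "B"), (35.0, "B"), (36.0, "B"), (29.0, "B"),
]
accuracy = leave_one_out_accuracy(data, nearest_mean_predict)
```

Here the two outlying examples (31.0 in class A and 29.0 in class B) are misclassified when held out, giving 6 of 8 correct. Testing on the training set instead would let each example influence its own class mean, which is why a test-on-train score is not directly comparable to a leave-one-out score.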
13
Confusion matrices
(classes 1 to 4; each row ends with the percent correct for that class)
  Baseline, zero crossing: 72.5% overall
    1 2 3 4
    107 38 72.3%
    21 134 16 76.6%
    29 57 64.8%
    18 72.0%
  Baseline, MUSIC: 79.1% overall
    1 2 3 4
    110 36 74.3%
    12 149 85.1%
    18 66 75.0%
    20 80.0%
  DTW, MUSIC: 74.5% overall
    1 2 3 4
    115 29 77.7%
    32 131 11 74.9%
    5 20 63 71.6%
    16 64.0%
  HMM, MUSIC: 85.3% overall
    1 2 3 4
    118 25 5 79.7%
    10 154 6 88.0%
    12 75 85.2%
    100%
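The per-class and overall percentages follow directly from the correct counts and the class sizes. The sketch below reproduces the baseline zero-crossing figures. The correct counts are the slide's values; the class sizes (148, 175, 88, 25) are not stated on the slide and are an assumption, inferred from the reported percentages and the 436 hand-labeled calls.

```python
# Correct counts per class for the baseline zero-crossing classifier,
# as they appear on the slide.
correct = [107, 134, 57, 18]

# Class sizes are NOT given on the slide: inferred from the per-class
# percentages and the total of 436 hand-labeled calls (assumption).
class_sizes = [148, 175, 88, 25]

# Percent correct within each class.
per_class = [round(100.0 * c / n, 1) for c, n in zip(correct, class_sizes)]

# Overall percent correct pools all calls, so large classes dominate.
overall = round(100.0 * sum(correct) / sum(class_sizes), 1)
```

Because the overall figure is weighted by class size, the smallest class (about 6% of the calls) has little effect on it, so the per-class column is the better guide to how rare classes fare.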
14
Conclusions
  Human ASR algorithms applicable to bat echolocation calls
  Experiments
    Weakness: accuracy of class labels
    HMM most accurate, undertrained
    MUSIC frequency estimate robust, slow
  Machine learning
    DTW: fast training, slow classification
    HMM: slow training, fast classification
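One way to get a feel for the "undertrained" remark is to count the emission parameters each HMM must estimate. The sketch below is a rough count under stated assumptions: the 5 states and 4 diagonal-covariance mixtures per state come from the slides, while the 4 feature dimensions (frequency, log energy, and their first derivatives) are an assumption, and transition probabilities are left out because the slides do not give the model topology.

```python
def gmm_hmm_emission_params(n_states, n_mixtures, n_dims):
    """Count free emission parameters of an HMM with diagonal-covariance GMM states."""
    means = n_states * n_mixtures * n_dims
    variances = n_states * n_mixtures * n_dims  # diagonal covariance only
    weights = n_states * (n_mixtures - 1)       # weights sum to 1 in each state
    return means + variances + weights


# 5 states and 4 mixtures per state are from the slides;
# n_dims=4 is an assumed feature dimension.
params_per_model = gmm_hmm_emission_params(n_states=5, n_mixtures=4, n_dims=4)
```

Under these assumptions each model carries 175 emission parameters. With only 436 hand-labeled calls and the smallest class near 6% of them, the rarest class likely has few calls from which to estimate its model, which is consistent with the slide's caution.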
15
Further information
  http://www.cnel.ufl.edu/~markskow
  DTW reference: L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
  HMM reference: L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296. Kaufmann, San Mateo, CA, 1990.