Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition Mark D. Skowronski.


1 Statistical automatic identification of microchiroptera from echolocation calls
Lessons learned from human automatic speech recognition
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
November 19, 2004

2 Overview
Motivations for bat acoustic research
Review bat call classification methods
Contrast with 1970s human ASR
Experiments
Conclusions

3 Bat research motivations
Bats are among the most diverse, the most endangered, and the least studied mammals.
Close relationship with insects
  agricultural impact
  disease vectors
Acoustical research
  non-invasive, significant domain (echolocation)
Simplified biological acoustic communication system (compared to human speech)

4 Echolocation calls
Features (holistic)
  Frequency extrema
  Duration
  Shape
  Number of harmonics
  Call interval
Figure: Mexican free-tailed calls, concatenated

5 Current classification methods
Expert spectrogram readers
  Manual or automatic feature extraction
  Comparison with exemplar spectrograms
Automatic classification
  Decision trees
  Discriminant function analysis
Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).

6 Acoustic phonetics
Example phoneme labels: DH AH F UH T B AO L G EY EM IH Z OW V ER
Bottom up paradigm
  Frames, boundaries, groups, phonemes, words
Manual or automatic feature extraction
  Determined by experts to be important for speech
Classification
  Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

7 Acoustic phonetics limitations
Variability of conversational speech
  Complex rules, difficult to implement
Feature estimates brittle
  Variable noise robustness
Hard decisions, errors accumulate
Shifted to information theoretic (machine learning) paradigm of human ASR, better able to account for variability of speech, noise.

8 Information theoretic ASR
Data-driven models from computer science
  Non-parametric: dynamic time warp (DTW)
  Parametric: hidden Markov model (HMM)
Frame-based
  Expert information in feature extraction
  Models account for feature, temporal variability
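To make the non-parametric option concrete, here is a minimal sketch of a dynamic time warp distance between two frame sequences. It assumes one scalar feature per frame (for example a frequency estimate) and an absolute-difference local cost; the function name `dtw_distance` and these choices are illustrative and are not taken from the slides.

```python
def dtw_distance(a, b):
    """Classic DTW alignment cost between two sequences of scalars.

    Assumptions (not from the slides): absolute difference as the local
    cost, and the standard step pattern (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b (repeat b[j-1])
                cost[i][j - 1],      # stretch a (repeat a[i-1])
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]
```

A test call would then be assigned the class of the stored template with the smallest distance, which is why this family of methods needs no training but compares against every template at classification time.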

9 Data collection
UF Bat House, home to 60,000 bats
  Mexican free-tailed bat (vast majority)
  Evening bat
  Southeastern myotis
Continuous recording
  90 minutes around sunset
  ~20,000 calls
Equipment:
  B&K mic (4939), 100 kHz
  B&K preamp (2670)
  Custom amp/AA filter
  NI 6036E 200 kS/s A/D card
  Laptop, Matlab

10 Experiment design
Hand labels
  436 calls (2% of data)
  Four classes, a priori: 34, 40, 20, 6%
All experiments on hand-labeled data only
No hand-labeled calls excluded from experiments

11 Experiments
Baseline
  Features
    Zero crossing
    MUSIC super resolution frequency estimator
  Classifier
    Discriminant function analysis, quadratic boundaries
DTW and HMM
  Features
    Frequency (MUSIC), log energy, first derivatives (HMM only)
  HMM
    5 states/model
    4 Gaussian mixtures/state
    diagonal covariances
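The slides do not say how the zero-crossing feature was computed, so the following is only a sketch of one common approach: count sign changes in a frame and convert the count to a frequency. The function name `zero_crossing_frequency` is hypothetical, and the 200 kS/s rate in the usage example is borrowed from the A/D card listed on the data collection slide.

```python
import math


def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate the dominant frequency of a frame from its sign changes.

    Assumption (not from the slides): a roughly sinusoidal frame crosses
    zero twice per period, so frequency = crossings / (2 * duration).
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration_s = (len(samples) - 1) / sample_rate_hz
    return crossings / (2.0 * duration_s)


# Usage: a synthetic 40 kHz tone sampled at 200 kS/s for 10 ms.
rate = 200_000
tone = [math.sin(2 * math.pi * 40_000 * n / rate + 0.1) for n in range(2000)]
estimate_hz = zero_crossing_frequency(tone, rate)
```

An estimator like this is cheap but sensitive to noise and harmonics, which is consistent with the deck comparing it against the slower MUSIC estimate.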

12 Results
Baseline, zero crossing
  Leave one out: 72.5% correct
  Repeated trials: 72.5 ± 4% (mean ± std)
Baseline, MUSIC
  Leave one out: 79.1%
  Repeated trials: 77.5 ± 4%
DTW, MUSIC
  Leave one out: %
  Repeated trials: 74.1 ± 4%
HMM, MUSIC
  Test on train: %
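For readers unfamiliar with the leave-one-out figures, the procedure can be sketched as follows: each labeled call is held out in turn, the classifier is built from the remaining calls, and the held-out call is scored. The nearest-neighbor classifier and the toy data here are stand-ins chosen for brevity; they are not the discriminant function analysis, DTW, or HMM classifiers used in the experiments.

```python
def leave_one_out_accuracy(features, labels, classify):
    """Fraction of samples classified correctly when each is held out once.

    `classify(train_x, train_y, x)` is any function returning a label.
    """
    features = list(features)
    labels = list(labels)
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_neighbor(train_x, train_y, x):
    """Illustrative stand-in classifier: label of the closest scalar feature."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[best]


# Toy data (hypothetical): one scalar feature per call, two classes.
toy_features = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_accuracy = leave_one_out_accuracy(toy_features, toy_labels, nearest_neighbor)
```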

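13 Confusion matrices
(One row per class, 1 to 4. Counts are listed in transcript order; blank cells on the slide did not survive, so column positions are not shown. The last value in each row is the percent correct for that class.)

Baseline, zero crossing (overall 72.5%)
  1: 107 38 | 72.3%
  2: 21 134 16 | 76.6%
  3: 29 57 | 64.8%
  4: 18 | 72.0%

Baseline, MUSIC (overall 79.1%)
  1: 110 36 | 74.3%
  2: 12 149 | 85.1%
  3: 18 66 | 75.0%
  4: 20 | 80.0%

DTW, MUSIC (overall 74.5%)
  1: 115 29 | 77.7%
  2: 32 131 11 | 74.9%
  3: 5 20 63 | 71.6%
  4: 16 | 64.0%

HMM, MUSIC (overall 85.3%)
  1: 118 25 5 | 79.7%
  2: 10 154 6 | 88.0%
  3: 12 75 | 85.2%
  4: 100%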
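The per-class and overall percentages can be checked from the correct (diagonal) counts alone. The sketch below assumes class totals of 148, 175, 88 and 25 calls; these totals are not stated on the slides but are inferred from the counts and percentages above, and they sum to the 436 hand-labeled calls. The function name `accuracy_summary` is illustrative.

```python
def accuracy_summary(correct_counts, class_totals):
    """Per-class and overall percent correct, rounded to one decimal."""
    per_class = [
        round(100.0 * c / t, 1) for c, t in zip(correct_counts, class_totals)
    ]
    overall = round(100.0 * sum(correct_counts) / sum(class_totals), 1)
    return per_class, overall


# Inferred class sizes (assumption, see lead-in); they sum to 436.
class_totals = [148, 175, 88, 25]

# Correct counts for the zero-crossing baseline, as read from the slide.
zero_crossing_correct = [107, 134, 57, 18]
per_class, overall = accuracy_summary(zero_crossing_correct, class_totals)
```

With these inputs the function reproduces the slide's 72.3, 76.6, 64.8 and 72.0 percent per class and 72.5 percent overall, which supports the inferred class sizes.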

14 Conclusions
Human ASR algorithms applicable to bat echolocation calls
Experiments
  Weakness: accuracy of class labels
  HMM most accurate, undertrained
  MUSIC frequency estimate robust, slow
Machine learning
  DTW: fast training, slow classification
  HMM: slow training, fast classification

15 Further information
http://www.cnel.ufl.edu/~markskow
DTW reference: L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
HMM reference: L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296. Morgan Kaufmann, San Mateo, CA, 1990.

