Statistical automatic identification of microchiroptera from echolocation calls
Lessons learned from human automatic speech recognition

Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
November 19, 2004
Overview
– Motivations for bat acoustic research
– Review of bat call classification methods
– Contrast with 1970s human ASR
– Experiments
– Conclusions
Bat research motivations
Bats are among the most diverse, the most endangered, and the least studied mammals:
– ~1,000 species, roughly 25% of all mammal species
– Close relationship with insects: agricultural impact, disease vectors
– Acoustical research is non-invasive and covers a significant behavioral domain (echolocation)
– A simplified biological acoustic communication system (compared to human speech)
Bat echolocation
– Ultrasonic, brief chirps
– Determine range and velocity of nearby objects (clutter, prey, conspecifics)
– Tailored to task and environment
– Example species: Tadarida brasiliensis (Mexican free-tailed bat)
(Audio in original slides: 10x time-expanded search calls.)
Echolocation calls
Two characteristics:
– Frequency modulated (FM): range
– Constant frequency (CF): velocity
Holistic features:
– Frequency extrema
– Duration
– Shape
– Number of harmonics
– Call interval
(Figure in original slides: Mexican free-tailed calls, concatenated.)
Current classification methods
Expert sonogram readers:
– Manual or automatic feature extraction
– Comparison with exemplar sonograms
Automatic classification:
– Decision trees
– Discriminant function analysis
– Artificial neural networks
– Spectrogram correlation
This parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, the cognitive approach).
Acoustic phonetics
Bottom-up paradigm:
– Frames, boundaries, groups, phonemes, words
Manual or automatic feature extraction:
– Formants, voicing, duration, intensity, transitions
Classification:
– Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path
Example phoneme transcription: DH AH F UH T B AO L G EY EM IH Z OW V ER
Acoustic phonetics limitations
– Variability of conversational speech: complex rules, difficult to train
– Boundaries difficult to define (coarticulation)
– Feature estimates are brittle, with variable noise robustness
– Hard decisions, so errors accumulate
The field shifted to the information theoretic paradigm of human ASR, which better accounts for the variability of speech and noise.
Information theoretic ASR
Data-driven models from computer science:
– Non-parametric: dynamic time warping (DTW)
– Parametric: hidden Markov model (HMM)
Frame-based:
– Expert information in feature extraction
– Models account for feature and temporal variability
Information theoretic ASR dominates state-of-the-art speech understanding systems.
Data collection
UF Bat House, home to 60,000 bats:
– Mexican free-tailed bat (vast majority)
– Evening bat
– Southeastern myotis
Continuous recording:
– 90 minutes around sunset
– ~20,000 calls
Equipment:
– B&K microphone (4939), 100 kHz bandwidth
– B&K preamp (2670)
– Custom amplifier/anti-aliasing filter
– NI 6036E 200 kS/s A/D card
– Laptop running Matlab
Experiment design
Design and assumptions:
– All recorded bats are Mexican free-tailed
– Calls divided into distinct intraspecies call classes
– All calls are search phase
– Hand-labeled call detection is complete (no discarded calls)
Hand labels:
– Narrowband spectrogram
– Endpoints and class label
– 436 calls in 261 half-second sequences (2% of the data)
– Four classes, a priori: 34, 40, 20, 6%
– All experiments on hand-labeled data only
Experiments
Baseline:
– Features: Fmin, Fmax, Fmax_energy, and duration, from zero crossings and MUSIC
– Classifier: discriminant function analysis with quadratic boundaries
DTW and HMM:
– Frame-based features: fundamental frequency (MUSIC super-resolution estimate), log energy, and temporal derivatives (HMM only)
– DTW: MUSIC frequencies, 10% endpoint range
– HMM: 5 states/model, 4 Gaussian mixtures/state, diagonal covariances
Tests:
– Leave-one-out
– 75% train, 25% test, 1000 trials
– Test on train (HMM only)
Results
Baseline, zero crossing:
– Leave-one-out: 72.5% correct
– Repeated trials: 72.5 ± 4% (mean ± std)
Baseline, MUSIC:
– Leave-one-out: 79.1%
– Repeated trials: 77.5 ± 4%
DTW, MUSIC:
– Leave-one-out: 74.5%
– Repeated trials: 74.1 ± 4%
HMM, MUSIC:
– Test on train: 85.3%
Confusion matrices
(Rows: true class 1–4; columns: predicted class 1–4; rightmost column: per-class accuracy.)

Baseline, zero crossing (72.5% overall):
      1    2    3    4
1   107   38    1    2   72.3%
2    21  134   16    4   76.6%
3     2   29   57    0   64.8%
4     4    3    0   18   72.0%

Baseline, MUSIC (79.1% overall):
      1    2    3    4
1   110   36    1    1   74.3%
2    12  149   12    2   85.1%
3     4   18   66    0   75.0%
4     3    2    0   20   80.0%

DTW, MUSIC (74.5% overall):
      1    2    3    4
1   115   29    0    4   77.7%
2    32  131   11    1   74.9%
3     5   20   63    0   71.6%
4     5    4    0   16   64.0%

HMM, MUSIC (85.3% overall):
      1    2    3    4
1   118   25    0    5   79.7%
2    10  154    5    6   88.0%
3     1   12   75    0   85.2%
4     0    0    0   25   100%
Conclusions
Human ASR algorithms are applicable to bat echolocation calls.
Experiments:
– Weakness: accuracy of the class labels
– No labeled calls were excluded
– HMM most accurate, though undertrained
– MUSIC frequency estimate is robust but slow
Machine learning trade-offs:
– DTW: fast training, slow classification
– HMM: slow training, fast classification
Future work
Find robust features of bat echolocation calls that match the assumptions of machine learning algorithms:
– Noise robust
– Distribution well modeled by Gaussian mixtures
Use the hand-labeled subset of the data to create a call detection algorithm.
Explore unsupervised learning:
– Self-organizing maps
– Clustering
Real-time portable detection/classification system on a laptop PC.
Further information
http://www.cnel.ufl.edu/~markskow
markskow@cnel.ufl.edu
DTW reference:
– L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
HMM reference:
– L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296, Kaufmann, San Mateo, CA, 1990.